The Mind of Jay

prepare for a brain dump

Archive for the ‘Internet’ Category

Making Shortened URLs Safe

I was reading a post about internet security by Cory Doctorow titled “Persistence Pays Parasites”, and it got me thinking about how more & more people now perceive shortened URLs as potential security risks, given the growing complexity of the interwebs and all the angles from which phishing attacks may spawn.

The inherent problem with shortened URLs is that we don’t know where the links will lead UNTIL we click on them.  That degrades part of the usefulness of URL shortening, which is typically meant to crunch a URL down to save precious characters - such as to fit within the 140-character limit of Twitter status updates or the 400-character limit of Facebook updates.  There are also other, less pure purposes, such as affiliate marketers using shortened URLs to hide affiliate codes.

Shortened URLs work by deploying an HTTP redirect.  It’s nothing magical, and the resource overhead for a service running a URL-shortening site is pretty low.  A lot of web site owners will deploy their own URL shortening by owning a short domain and using that, but most people are not site owners or web developers.  Most of the time, it’s one consumer sending another consumer a URL; their choice of URL shortener is limited, and that is often what phishing attacks exploit.
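To make the mechanics concrete, here is a minimal sketch of what a shortener does per request, written Node.js-style; the lookup table, domain and port are hypothetical, not any particular service’s implementation.

// Minimal sketch of a redirect-based URL shortener (hypothetical data).
var http = require('http');

// Maps a short path to the long landing URL it stands for.
var table = { '/aZTRIP': 'http://example.com/some/very/long/landing/page' };

http.createServer(function (req, res) {
  var target = table[req.url];
  if (target) {
    // The entire "service" is one redirect response per request.
    res.writeHead(301, { 'Location': target });
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
  }
  res.end();
}).listen(8080);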

Here’s what I propose:

Most of the purpose behind URL shortening is about delivery, not a limitation of the technology used to display the URL.  For example, Twitter imposes a 140-character limit solely because they only want to transport 140 characters.  Once the message is delivered to the recipient, there is no longer a need to limit how the message is displayed.  If someone sends you a tweet and you view it on your phone, more often than not your phone will automatically turn any URL in that message into a live link.

So, if I want to send a link to http://bit.ly/aZTRIP in a message, the device displaying it makes it a live link like so http://bit.ly/aZTRIP.

However, the device that renders the message is typically not limited to 140 characters.  Remember, that’s just a transport limitation.  So, there is no reason not to support a mechanism in the display device that pre-fetches the URL via an HTTP HEAD request to determine what the ultimate landing URL will be, and then displays that URL in some way so that the recipient can learn where they will end up BEFORE clicking the link.  In desktop environments where mouse hovering is available, it could be as simple as adding a title attribute to the link whose value is the landing URL.
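As a sketch of that pre-fetch step (again Node.js-style, and purely an assumption about how a display device might implement it), a single HEAD request is enough to learn the landing URL without ever downloading the page:

// Resolve a shortened URL to its landing URL via an HTTP HEAD request,
// so the destination can be shown before the user clicks.
var http = require('http');
var url = require('url');

function resolveShortUrl(shortUrl, callback) {
  var parts = url.parse(shortUrl);
  var req = http.request({
    method: 'HEAD',          // headers only; the page body is never fetched
    host: parts.hostname,
    path: parts.path || '/'
  }, function (res) {
    // A shortener answers with a 301/302 whose Location header is the
    // landing URL; fall back to the original if there is no redirect.
    callback(res.headers.location || shortUrl);
  });
  req.end();
}

resolveShortUrl('http://bit.ly/aZTRIP', function (landingUrl) {
  // e.g. set this as the link's title attribute, or display it inline.
  console.log('This link leads to: ' + landingUrl);
});

If a shortener chains to another shortener, the same lookup would simply be repeated until a non-redirect response comes back.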

For example, the live link above does not have such a title attribute, but this one does: http://bit.ly/aZTRIP. If you hover over it, notice that it tells you where the shortened URL will lead.

For mobile devices, that gets a little harder to mimic since there is no equivalent of a “mouse hover”, but there are any number of other ways to accomplish the same thing.

All of this could easily be made possible by having the device that displays the message pre-fetch only the redirect behind the shortened URL, without any need to pre-fetch the actual page.

This can also be done by service providers before messages are delivered, but it’s generally not a good idea for delivery services to edit the content of messages, so this really should be left to the display devices themselves.

The caveats are added network bandwidth for both the recipient and URL shortening services, and click stats for URL shortening services might get skewed higher, but ultimately it would allow URLs from such services to be more easily trusted as there will be methods in place to preempt the effectiveness of phishing attempts on unsuspecting consumers.  The tradeoff is well worth it.

So who will be the first to deploy this on a broadly deployed device or platform?

    Even when you “opt out” of Facebook’s new “Like” button program, here’s how Facebook still exploits your privacy.

    While you have the new feature of “Like Everything” or “Instant Personalization” (or whatever they’re calling it) set to “on”, you will be alerted at least once per site whether you want to opt in for that site to be enhanced with “Like” links. You can choose to opt out, so it seems safe, right? Or, you can even turn off the feature entirely and not get bugged to make that choice at all and, again, it seems safe, right?

    Wrong.

    Here’s why:

    In order for Facebook to actually know whether you’ve opted in or not on any particular site, the JavaScript they’ve talked all these sites into integrating still has to run on those sites. In other words, when you go to Yelp or Pandora or CNN or any other site that has made such an arrangement with Facebook, even if you’ve opted out, Facebook itself still knows you are visiting that site. This is also true a lot of the time with the Facebook Connect login features, but at least in that scenario live JavaScript served directly by Facebook is not necessarily running on those sites.
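    To illustrate why, here is a deliberately simplified, hypothetical widget script - NOT Facebook’s actual code - showing that by the time any opt-out check runs, the browser has already requested the script from the third party’s servers, handing over that domain’s cookies and the address of the page you’re reading.

    // Hypothetical third-party widget script (NOT Facebook's real code).
    // The request that delivered this file already told the third party
    // which page you are on; the opt-out below only affects what renders.
    (function () {
      // Hypothetical opt-out cookie name.
      var optedOut = /widget_optout=1/.test(document.cookie);
      if (optedOut) {
        return;   // nothing is rendered, but the visit was still logged
      }
      // Otherwise inject a "Like"-style button into a placeholder element.
      var slot = document.getElementById('like-slot');   // hypothetical id
      if (slot) {
        slot.innerHTML = '<button>Like</button>';
      }
    })();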

    To put it simply, even if you’ve opted out, when you visit such sites Facebook STILL knows you are visiting the site and will follow your every move on that site. If you think that Facebook is not doing anything with that data, you are fooling yourself.

    Facebook is competing with Google for eyeballs. Part of the purpose of capturing eyeballs is monetizing them. Google’s largest revenue source is AdWords, which is supported heavily by a service called Google Analytics that many big sites on the web have deployed. But in the case of many of those sites embedding Google’s JavaScript, Google doesn’t know who the person behind the browser is. Facebook, on the other hand, DOES know. Even if you are not logged in at the time, they can track you with a cookie, and when you log in again they will tie all your surfing habits to YOU. ADVERTISERS WILL LOVE THIS. This is their grand scheme: it’s their first step in an attempt to co-opt Google’s share of analytics services and potentially overtake that market by exploiting the private surfing habits of members - did you opt into THAT? No? Too bad, you’re opted in whether you like it or not, and all your surfing habits will end up in a database owned by a corporation with a history of mismanaging the privacy of member data.

    The kicker is - you have NO IDEA which sites will eventually deploy this hidden Facebook integration.

    Given Facebook’s abysmal track record at taking member privacy seriously, do you seriously trust potentially the entire web to be “watched” by Facebook, with Facebook having the unprecedented potential not only to see people’s surfing habits but also to tie all those habits to their personal and private info?

    I’ve been on an anti-Google tear recently (privately, on Facebook, nothing really public).  The more time that goes by, and the more the company is observed, the more they are seen to go against their self-proclaimed “don’t be evil” creed.

    An example from about 7 months ago is a blog post by Matt Cutts, the guy every SEO-obsessive type seems to listen to the most:

    http://www.mattcutts.com/blog/pagerank-sculpting/

    In the post, he describes the algorithm change Google applied some time in 2008 with regard to rel=”nofollow” links.  I’ll get to why it’s all a big clusterfuck in a moment, but first some background on me:

    I’m the kind of guy who doesn’t spend a lot of time on “SEO” concerns or jumping through every hoop those types jump through.  I focus on creating good ideas, good content, and high-value web-driven resources, and the rest SHOULD sort itself out.

    The problem is that the kind of content I maintain is followed by a lot of people who link back through blogs, fora, social sites, etc.  When Google announced the deployment of rel=”nofollow”, within a year a lot of web frameworks, blog & forum software and hosted service sites had deployed rel=”nofollow” on all user-submitted content, and webmasters all over went a step further and applied rel=”nofollow” to almost everything.  The impression was that it was a way to salvage or enhance “link juice”, when the main thing Google was (allegedly) trying to do with it was reduce the amount of spam links they inevitably followed.  It was a network resource saver for them, as well as a way to reduce the processing load of figuring out the rankings of billions of pages.

    The result, when the dust settled, was that people who don’t concern themselves with SEO and simply manage good content got hurt.  Organic search rankings were affected, and it forced those of us in that camp to start paying attention to SEO.  My theory is that Google’s strategy was to force more web sites to play the SEO game and thereby create an army of webmasters who jump through every one of their hoops, reinforcing the web’s reliance on their search dominance.

    How nofollow originally worked: if a page with a rank of 10 had 10 links on it, a rank of 1 was passed down the pipe to each of those links.  If 5 of the links were set to nofollow, the rank would be divided among the remaining links (2 each), which caused SEO people all around to “hoard” link juice by using nofollow on even more links as a way to “shape” page rank throughout their sites.  Well, as Matt described in his blog post, the new method as of 2008 causes rank to leak even to those nofollow links, leaving the remaining links without nofollow with less “link juice”.
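    As a toy illustration of the difference, using the numbers above (real PageRank is of course far more involved than a straight division):

    // Toy model of the rank passed per followed link, before and after the
    // 2008 change described by Matt Cutts (numbers are from this post).
    function passedRankOld(pageRank, totalLinks, nofollowLinks) {
      // Old behavior: nofollow links are ignored, so the rank is split
      // among the followed links only.
      return pageRank / (totalLinks - nofollowLinks);
    }

    function passedRankNew(pageRank, totalLinks) {
      // New behavior: every link counts in the split, and the share "sent"
      // to nofollow links simply evaporates.
      return pageRank / totalLinks;
    }

    console.log(passedRankOld(10, 10, 5));  // 2 per followed link
    console.log(passedRankNew(10, 10));     // 1 per followed link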

    The end result is that they first made using nofollow necessary and have now made it less than useless.  In fact, it’s now downright detrimental.  What they’ve FURTHER done, to dodge people simply removing nofollow, is to actually count links AGAINST sites if those links point to stuff their engine thinks is spam or bad content.  Technically, this should NOT affect me, since I don’t allow active links in user-submitted content in the first place, rather than allowing links to sites I may not perceive as valuable.  But it DOES affect me, because it further degrades the value of links coming INTO my sites: I’m pretty sure a lot of the sites that link to my resources play the “SEO game” too much and have therefore degraded the value of links throughout their sites by following the nofollow mantra too closely.

    I have NOTHING to do with the sites that link to me, but having links come back to me is SUPPOSED to be a sign of authority.  The way Google has tiered this deployment, though, the organic link value was first reduced when rel=”nofollow” was deployed, and now it will degrade even more because the remaining links pass less rank.

    Cut in half once, then cut in half again.

    Do you see what’s wrong with this picture?

    But I have a solution, one that I’m sure Google will eventually work around.  They want to retain value in their search engine, and their way of doing that is to make people jump through hoops to play their game.  If they want people to play the SEO game, then I’ll be on the side of the people who take it upon themselves to out-game them.

    It’s a simple matter of math.

    If Google passes rank based on the number of links on a page, still counts nofollow links when doing that division, but no longer gives the diluted rank back to the links without nofollow, then the way around it is NOT to remove rel=”nofollow” from those links unless you know for sure they don’t point to something Google considers spam.  Since you have no way to figure that out (who does?), removing rel=”nofollow” isn’t your option: doing so won’t HELP you and it may even HURT you.  But if you get value from those links (perceived value for your visitors in sending them somewhere useful, or they are paid links, or they point to other web sites you run) but have no need to pass rank, then you need a solution that lets you STILL link to those resources without them counting AGAINST you and without them diluting the other links on your pages.

    I don’t like this but the solution in those cases, where possible, is to re-code those links with JavaScript.  95% or more of people now have JavaScript enabled.  For those without JavaScript enabled, a simple noscript containing a URL that isn’t hyperlinked will suffice.  That addresses usability issues in a reasonable way.

    On my own sites, I simply don’t allow people to leave links in the first place.  I’ve also avoided using rel=”nofollow” except in the rare circumstances where it made the most sense.  I can only do things that improve how I pass rank, simply because my focus all along has been maintaining high-value resources, something Google keeps telling us is what they want (yeah, right).

    However, since I am likely affected by this whole nofollow fiasco in the context of outside sites linking into mine, I will give all you SEO-obsessed guys this little solution:

    <script type="text/javascript"><!--
    document.write('<a href="http://outsidesite/">Great Resource</a>');
    //--></script>
    <noscript>Great Resource: http://outsidesite/</noscript>

    Now the (allegedly intended) spirit of rel=”nofollow” is, er, followed, and the remaining active links on your site get to retain more “link juice” than they were left with after the most recent change.  Those of you who want to render via the DOM can do so; the above is just the most basic example to start with.
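    For those who prefer the DOM route, here is a minimal sketch of the same link rendered without document.write() (the container id is hypothetical):

    // DOM-based variant of the JavaScript link above (hypothetical id).
    var container = document.getElementById('resource-links');
    var link = document.createElement('a');
    link.href = 'http://outsidesite/';
    link.appendChild(document.createTextNode('Great Resource'));
    container.appendChild(link);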

    Now a page with a rank of 10 that has 5 direct links and 5 links rewritten this way will have its rank divided among the direct links only.  Further, it gives you a means to dodge some future spanking they might deploy if they decide to degrade the rank of your pages/site based even on links you’ve designated nofollow.  It dodges the issue entirely, and it lets you go back to “re-shaping” without causing grievance to the sites you do link to directly, because the rank you pass them is back to being “shape-able”.

    This will likely get fucked in a year or so when Google is able to start following JavaScript links and treat them as raw HTML, but that’s just another hoop for Uncle Google that you’ll have to jump through, isn’t it?

    I hope I’m not re-inventing someone else’s wheel…

    While coding a new page design, I finally got fed up with calling external JavaScript code that uses any form of the document.write() method, since the page won’t “finish” loading from the client’s perspective until the JavaScript has been downloaded & executed.  Placing such scripts in the <head> of the page is even worse, as the page won’t even begin to render until everything in the <head> has been retrieved.

    The newer methodology designers are using is AJAX (JavaScript + the XMLHttpRequest object) to inject additional page content (by updating the content of tags with .innerHTML or other parts of the DOM), since the page finishes rendering without “lagging” while that code executes.  However, this still doesn’t address the issue of the browser having to wait for the library itself to be fetched & loaded in the first place.

    I’m not certain if anyone else is doing this, but I decided to try a new method based on AJAX to further speed up page loading.  The caveat of the method is that a page might finish loading before some JavaScript features have fully rendered, but my feeling is that a faster-loading page in general is much more acceptable to a visitor than one that loads slower, even if the faster loading doesn’t display the enhanced functionality immediately.

    Here is what I’m doing:

    Let’s say you have a bit of AJAX functionality inside an external JS file.  You still have to wait for that file to be fetched before the browser starts to execute it.  By embedding a simple AJAX function into the HTML itself, the JS file can be fetched via AJAX (at essentially the same speed the browser would fetch it anyway), and multiple JS files can be fetched in a single call.  I then wrap the returned JS inside <script> tags, embed it inside an empty <div> with an id of “ajax”, and loop through this code:

    // Grab the <script> blocks that the AJAX call placed inside the
    // placeholder <div id="ajax"> and execute each one in turn.
    var div = document.getElementById('ajax');
    var scripts = div.getElementsByTagName('script');
    for (var i = 0; i < scripts.length; i++) {
      eval(scripts[i].text);
    }

    What this effectively does is allow the page to avoid waiting for external JS files to load before it is considered finished loading.  This may be more of an improvement on older browsers, but even on Firefox 3.5 and Internet Explorer 8 I’ve noticed an improvement.  It also dodges an issue with Firefox causing pages to lag when the Avast virus scanner is running on the client computer.
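    For completeness, here is a sketch of the fetch step described above (the function name and script path are hypothetical; the response is assumed to be the <script> blocks that get placed into the placeholder div):

    // Fetch an external JS bundle via XMLHttpRequest, drop it into the
    // placeholder <div id="ajax">, then execute the embedded <script> blocks
    // (setting innerHTML alone never runs them, hence the eval loop).
    function loadScripts(path) {
      var xhr = window.XMLHttpRequest
        ? new XMLHttpRequest()
        : new ActiveXObject('Microsoft.XMLHTTP');   // older IE fallback
      xhr.open('GET', path, true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          var div = document.getElementById('ajax');
          div.innerHTML = xhr.responseText;
          var scripts = div.getElementsByTagName('script');
          for (var i = 0; i < scripts.length; i++) {
            eval(scripts[i].text);
          }
        }
      };
      xhr.send(null);
    }

    loadScripts('/js/enhancements.js');   // hypothetical bundled script path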

    The main thing still to test is caching, as this method would be moot if the browser re-fetched the JS file on every page load, but I believe all modern browsers treat an XMLHttpRequest as if the browser had requested the file normally and will cache the result.  If you don’t want caching - for instance, if your JavaScript is generated dynamically on the server side, such as by a CGI - then just append a random string as a URL parameter to force a fresh request every time.
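    A concrete version of that cache-busting trick, building on the hypothetical loadScripts() sketch above:

    // Append a throwaway query parameter so each request looks unique to the
    // browser cache (the parameter name is arbitrary).
    loadScripts('/js/enhancements.js?nocache=' + new Date().getTime());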

    Facebook recently updated their privacy features, which were supposed to enhance member privacy - in particular, who might be able to see you in the context of networks, as well as who has access to see your friend list.

    One public relations issue they had was confusion over how the options were described, in particular for members who had not previously defined a specific privacy setting for certain things and who were then presented with an option to make their friend list visible to “everyone”.  The default assumption for most people is that “everyone” means only their friends, when in reality it means, literally, everyone.  That confusion alone caused many members to expose their friend list in a way they weren’t intending.

    Beyond that, there are those who believe that, even having understood the settings, their friend list is in fact private if they set it that way - when, in reality, it’s not as private as they think.  I will explain in a moment; once the issue is understood, the hope is that it will push people to pressure Facebook to take privacy seriously, and not simply the perception of it.

    For the sake of argument, presume you have a Facebook account and a friend list of 100 people.  Let’s presume you’ve set your privacy settings very strictly, so that only your direct friends can see your friend list.  Or you’ve even made your friend list totally hidden from everyone.  The perception you are given is that your friend list is now “private”, invisible to outside prying eyes.  And, to a casual observer, it would be - anyone with access to your account, or even just knowledge that you have a Facebook account, would never see who is on your friend list, even if that account exposes other things you do want to expose.

    Here is where this “privacy” setting fails, and it fails in exactly the way it is probably intended NOT to fail: it does not protect your privacy from outside “crawl” companies nearly as well as it does from a casual person perusing your profile.  For simplicity’s sake, presume that Google is a third party interested in finding out as much as it can about every Facebook profile it could.  Now consider that even though YOUR friend list is set to private, the friend lists of your friends may not be.  You cannot control their privacy settings any more than you can control what they do with the information you make available to them.

    For a company like Google, with the time & resources to crawl just about anything made available to the public (and even things not publicly available, depending on their deals with various sites), crawling all your friends’ profiles and seeing THEIR friend lists means they can see you on those lists and, through inference, build up a profile of your friend list - a list you have effectively set to “private”.  How much of the list gets exposed is limited only by how many of your friends have set the same privacy settings as you.  Judging by the default behavior of most people, it’s fair to guess that a large majority - perhaps 80% - of any given person’s friend list on Facebook would be exposed even if that person’s friend list is set to the most private setting available.
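    As a toy illustration of the inference (all names and lists below are hypothetical): even though “you” hide your own list, you still appear on the public lists of friends who didn’t.

    // Toy reconstruction of a hidden friend list from friends' public lists.
    var publicFriendLists = {
      alice: ['you', 'bob', 'dana'],
      bob:   ['alice', 'you'],
      carol: []                      // carol hides her list, like you do
    };

    var inferredFriends = [];
    for (var person in publicFriendLists) {
      var list = publicFriendLists[person];
      for (var i = 0; i < list.length; i++) {
        if (list[i] === 'you') {
          inferredFriends.push(person);   // almost certainly one of your friends
        }
      }
    }

    console.log(inferredFriends);   // ['alice', 'bob'] - most of the hidden list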

    Facebook could solve this, but it’s unlikely they will: in order to get value out of their network, they are under pressure to ensure that even if they offer “privacy” settings, those settings remain easy for outside crawlers to break down.  It’s highly likely they’ve even provided such outside parties an exact road map for making these inferences.

    How can Facebook solve this?  It would be as simple as offering an additional privacy setting that lets members decide whether their appearance on their friends’ friend lists is visible to anyone beyond those people’s direct friends.

    I recently got asked by a friend why I might want to limit people knowing who my friends are.  It’s got nothing to do with that.  I have no problem with my friends knowing who my other friends are, but I don’t want Google or other companies of questionable trust or intentions to have this information unless I allow them to have it.  This would be within my ability to control if Facebook provided a privacy option letting me define how DEEPLY I am visible on others’ friend lists beyond those people’s direct friends.

    Keep this in mind the next time you consider whether large corporations like Facebook or Google are really giving you control over your privacy, or whether what they’re really doing is pulling the wool over your eyes.  Consider that they are making you believe they’re protecting your privacy when what they’re actually doing is setting up private pacts with each other that allow them to expose whatever it is about you they want to expose to each other, in order to maximize the value of your data for their own self-interest.
