18 September 2006

Web Privacy - How to get it, how it can be taken away, how we could get more of it

Scott McNealy famously said, "You have zero privacy anyway. Get over it." That shouldn't stop you from fighting to keep every bit of privacy you can. In this post, we look at some old and new techniques that web sites can use to track your identity, and at some ways you can protect your identity and your use of the internet.

Cookies: It's well known that cookies can reduce your privacy by allowing a web site to track your identity across multiple visits, separated by minutes, days or years (Google cookies last until 2038; see here for issues with this and a workaround). In theory, a cookie is accessible only to the web site that created it (so yahoo.com can't access a cookie for google.com), but some techniques (like Jookies and link-colour spying, see below) do allow one site to spy on your activity on another site.
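
As a rough illustration of the cookie mechanism described above, here is a minimal Python sketch of how a server might issue a long-lived identifying cookie and read it back on a later visit. The cookie name `visitor_id`, the expiry date and the helper names are assumptions made up for this example; they are not taken from any real site.

```python
from http.cookies import SimpleCookie
from typing import Optional

# Hypothetical cookie name and far-future expiry, chosen only for illustration.
COOKIE_NAME = "visitor_id"
FAR_FUTURE = "Sun, 17 Jan 2038 19:14:07 GMT"


def make_tracking_cookie(visitor_id: str) -> str:
    """Build the Set-Cookie header line a site could send on a first visit."""
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = visitor_id
    cookie[COOKIE_NAME]["expires"] = FAR_FUTURE
    cookie[COOKIE_NAME]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def read_tracking_cookie(cookie_header: str) -> Optional[str]:
    """Extract the visitor id from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    return morsel.value if morsel is not None else None
```

Because the browser returns the same value on every later request to that site, visits that are years apart can be linked to one visitor.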

IP address monitoring: Every request your computer makes arrives at the web site from an IP address. A web site can easily store this address and use it to track your activity. You get some privacy from the fact that:
  • your address may change over time (depending on your Internet provider and any DHCP settings), and
  • organisations almost always have an edge router or proxy, which is typically all that a web-site can see
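
A minimal sketch of the server side of this, assuming a hypothetical `build_visit_log` helper and made-up addresses from the documentation ranges: the site simply groups requests by the address they came from. It also shows why a shared proxy helps, since two different people behind it look like a single visitor.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_visit_log(requests: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group requested paths by the IP address the site saw for each request."""
    log = defaultdict(list)
    for ip_address, path in requests:
        log[ip_address].append(path)
    return dict(log)


# Two different people behind one office proxy (203.0.113.7) are
# indistinguishable to the site; the home user (198.51.100.20) is not.
example_requests = [
    ("203.0.113.7", "/home"),
    ("198.51.100.20", "/home"),
    ("203.0.113.7", "/search"),
]
visit_log = build_visit_log(example_requests)
```
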
Jookies: Cookies are not the only way to identify you; expect to see more web sites use the interesting technique described by Mukund (among others) and discussed recently on Slashdot. Normally JavaScript files are unchanging; the same file is served to all users. But by generating a JavaScript file for each user, the web site can serve you a file that contains your identity. Any web page from that site simply has to use this JavaScript to pass your identity back to the server. Since the script file is marked to remain in your cache for a long period, it will stay around on your computer just like a cookie.
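
The per-user script idea can be sketched in a few lines of Python. This is only an assumed shape for such a server: the function name `build_identity_script`, the JavaScript variable `visitorId` and the one-year cache lifetime are all invented for illustration.

```python
import uuid
from typing import Dict, Optional, Tuple

ONE_YEAR_SECONDS = 365 * 24 * 60 * 60


def build_identity_script(visitor_id: Optional[str] = None) -> Tuple[Dict[str, str], str]:
    """Return (response headers, script body) for a per-visitor JavaScript file."""
    # A new visitor gets a fresh random identifier baked into the script.
    visitor_id = visitor_id or uuid.uuid4().hex
    body = f'var visitorId = "{visitor_id}";\n'
    headers = {
        "Content-Type": "application/javascript",
        # Ask the browser to keep this copy for a long time, so the
        # identifier persists in the cache much as a cookie would.
        "Cache-Control": f"max-age={ONE_YEAR_SECONDS}",
    }
    return headers, body
```

Every page on the site would then include this script and send `visitorId` back with its requests; clearing cookies alone would not remove it.
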

Link colour spying: Another method of spying is to use JavaScript to query the colour of a hyperlink; because visited links are usually displayed in a different colour from unvisited ones, a web site can determine whether you have opened a given site previously.

Cache timestamp spy: A slightly more complex technique is to use the date-stamp on files in your cache. When a web site serves your browser a file, it can choose to give the file a last-modified date. Then when your browser next requests that file, it passes that value back using the If-Modified-Since request header. If the web site constructs an arbitrary unique date (say differing only by a second) for each user, this value may be used to identify you.
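
To make the timestamp trick concrete, here is a minimal Python sketch that hides a user number in the seconds of a date and recovers it from the value the browser sends back. The base date and both function names are assumptions for this example only.

```python
import calendar
from email.utils import formatdate, parsedate

# Arbitrary base date (assumption): 1 January 2005, 00:00:00 GMT.
BASE_EPOCH = calendar.timegm((2005, 1, 1, 0, 0, 0))


def last_modified_for(user_number: int) -> str:
    """Encode a user number as an HTTP date, one second per user."""
    return formatdate(BASE_EPOCH + user_number, usegmt=True)


def user_from_if_modified_since(header_value: str) -> int:
    """Recover the user number from the date the browser echoes back."""
    parsed = parsedate(header_value)
    if parsed is None:
        raise ValueError(f"not a valid HTTP date: {header_value!r}")
    return calendar.timegm(parsed) - BASE_EPOCH
```

For example, user 42 would be served a date ending in `00:00:42 GMT`, and any later conditional request carrying that date identifies the same cache, and so probably the same person.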

Cache timing: Another way to spy on the cache is to use JavaScript to load files and time them; a file that is in the cache will be retrieved much faster than the same file fetched over the network, so by measuring the load time a web site can determine whether you have visited the site that hosts the file. See this web timing article (PDF) for more details.
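
The real attack runs as JavaScript inside the browser, but the measurement logic itself is simple. The Python sketch below shows only that logic; the function names and the 50 ms threshold are assumptions, and a real attacker would have to calibrate the threshold for each connection.

```python
import time
from typing import Callable


def timed_load(load_resource: Callable[[], object]) -> float:
    """Return how many seconds a single load of the resource took."""
    start = time.perf_counter()
    load_resource()
    return time.perf_counter() - start


def looks_cached(load_resource: Callable[[], object],
                 threshold_seconds: float = 0.05) -> bool:
    """Guess that the resource was cached if it loaded under the threshold."""
    return timed_load(load_resource) < threshold_seconds
```

Here `load_resource` stands in for whatever fetches the file; a fast return suggests a cache hit, a slow one suggests a trip over the network.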

Cross-site cookies: Internet Explorer and Firefox both employ partial techniques to prevent a web site from accessing cookies created by another web site, but they each have different weaknesses:
  • IE only allows the web page's main site (the top-level frame domain) to set (create) a cookie, but that cookie may be read when any page accesses a file (e.g. an image, JavaScript or CSS file) from that site
  • Conversely, Firefox will let any site set a cookie, but will only allow the web page's main site to read a cookie
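
The two policies can be written down as a small decision function. This is only a model of the behaviour as described in the two points above, not of the browsers' actual code; the function name and the `"ie"` / `"firefox"` labels are invented for the sketch.

```python
def cookie_action_allowed(browser: str, action: str,
                          top_level_site: str, resource_site: str) -> bool:
    """Model whether resource_site may 'set' or 'read' its cookie on this page."""
    first_party = top_level_site == resource_site
    if action not in ("set", "read"):
        raise ValueError(f"unknown action: {action}")
    if browser == "ie":
        # Only the page's main site may set; any embedded site may read.
        return first_party if action == "set" else True
    if browser == "firefox":
        # Any embedded site may set; only the page's main site may read.
        return True if action == "set" else first_party
    raise ValueError(f"unknown browser: {browser}")
```

Under this model a third-party image server on a news page could read its existing cookie in one browser and plant a new one in the other, which is the weakness each policy leaves open.
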
Finally, of course, you send an identity "signature" (HTTP headers) to each web site you visit, containing the following information:
  • what media-types you accept (effectively, what plug-ins you have installed)
  • what language(s) you accept (e.g. en-US means English US)
  • (potentially, but few if any current browsers do this) your mail address
  • what kind of browser ("user agent") you are running; for example, my browser reveals the following: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
  • JavaScript running in your browser can also send your operating system and screen resolution to a web site
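
To see how these headers add up to a signature, here is a minimal Python sketch that combines a few of them into one hash. The choice of headers, the use of SHA-256 and the function name are all assumptions for illustration; a real site could combine many more signals.

```python
import hashlib
from typing import Mapping

# Headers assumed to contribute to the signature in this sketch.
SIGNATURE_HEADERS = ("User-Agent", "Accept", "Accept-Language")


def header_signature(headers: Mapping[str, str]) -> str:
    """Hash the chosen request headers into a single identifying string."""
    parts = [f"{name}={headers.get(name, '')}" for name in SIGNATURE_HEADERS]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()
```

Two visitors with identical browsers and settings get the same signature, so this alone rarely identifies one person, but it narrows the field considerably when combined with an IP address.
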
So what else can browsers do to make it easier for users to maintain some level of privacy? Here are some thoughts (with examples focussing on Firefox because it is a platform that enables extensions to rapidly evolve or customise the browser to make it easier to use, safer and more powerful):
  1. Block spurious cookies: From Martin Pool: "There may be more fine-grained controls, such as only accepting cookies from the same server as the top-level page currently viewed and not from servers for subsidiary requests such as images or frames." In my view, this feature should be an option that is integrated directly into each browser's cookie control mechanism rather than provided via an extension.
  2. Don't send the referrer URL: when you access a web page, your browser sends the referrer URL (in the HTTP Referer header) to tell the site what page you clicked from; in the case of search engines, this will also pass the keywords you searched for. Although there are many legitimate uses for this information (web site maintainers and bloggers use it to find out what their readers were looking for so that they can try to provide more of that), it can also reduce privacy. If the browser allowed the user to not send the referrer URL for specific sites (e.g. http://www.google.com), this would immediately provide greater privacy. Extensions like RefControl (and others) for Firefox can do this for you.
  3. Don't cache certain (types of) files from certain sites: if you know that a site is using caching techniques to identify you, you could stop caching files from that site. If you knew that it was just a Javascript file or a specific image file with a certain timestamp, you could refine it to just block those specific files from that site from the cache. Although there are many shortcuts to manually clear the cache, I'm aware of no Firefox extension that provides this precise feature, but I'll post an update here when one becomes available; my suggestion for a name: CacheBlock. In the meantime, you can get some benefit by using the Firefox AdBlock extension to block specific image files, but you have to figure out which ones are being used for timestamp spying; what is really needed is a web-based service that returns the URL patterns of such files that may appear on a given web-site. Alternatively you can use Stanford SafeCache which divides your cache by domain (the main web-site, e.g. google.com) so that a hosted file that is included on different web pages from different domains will be retrieved separately for each domain. This technique has the downside that it slows down some of your browsing experience and it uses your bandwidth to download a file multiple times. I suggest that an alternative approach would be to artificially delay retrievals from the cache where the main web page is different from the one that originally caused an included file to be cached.
  4. Block certain data: most web sites don't need precise details of the browser you are running (because they generate at most 2 or 3 flavours of HTML); the User-Agent HTTP header could be generalised to just provide Internet Explorer or Mozilla. Suggested extension name: PrivacyBlock.
  5. Custom security for JavaScript: the Java programming language has very fine-grained security control that allows the user or installer of an application to define exactly what a program can and can't do; however most JavaScript runtimes support only two modes: on or off. It would be very useful if a browser came with a number of JavaScript profiles that the user could choose as a default and (where necessary) for an individual web site. The profile would choose from a list of capabilities that are either granted or denied. Firefox has no such capability or extension that I'm aware of, but again I'll update this post when it does; suggested extension name: ScriptBlock.
  6. Specifically disable the :visited CSS pseudo-class: this could be handled by a general-purpose (future) extension like ScriptBlock, but you can get some benefit using Stanford SafeHistory: "offsite visited links [are] marked only if the browser's history database contains a record of the link being followed from the current site"; this means that a web site can't spy on your accesses to other web sites unless your history records that you followed the link from that site (so some spying is still possible, unless you keep clearing your history). I suggest that a specific feature to disable :visited would be safer.
  7. Consider using an anonymising proxy: There are several implementations of anonymising proxies, such as Tor; one downside is that some sites block such proxies because of potential abuses (e.g. spamming via mail, blogs or wikis).
  8. Use Internet Explorer and Firefox: This is a simple but powerful technique that I came up with; if you access different sites using different browsers, you have multiple independent caches and sets of cookies; for example, you could use Google search and your Blogger blog via Firefox but access other Google services like Mail and Calendar via Internet Explorer. While using the two browsers, you effectively have two independent identities (although sites can still track your IP address, and multi-sites like Google can cooperate behind the scenes to "merge" the identity information into a common picture). This might seem difficult or onerous to do; however, Firefox has extensions like IE Tab and IE View which will do this automatically for you, provided you access the site originally within Firefox.
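
Suggestions 2 and 4 above both amount to editing request headers before they leave the browser, which can be sketched in Python as follows. This is an assumed design, not how RefControl or any real extension works: the host list, the function names, and the reading of suggestion 2 as "drop the referrer when the page you came from is on a listed site" are all choices made for this example.

```python
from typing import Dict
from urllib.parse import urlsplit

# Hypothetical user-maintained list of sites whose URLs should never be
# passed on as a referrer (e.g. because they contain search keywords).
PRIVATE_REFERRER_HOSTS = {"www.google.com"}


def generalise_user_agent(user_agent: str) -> str:
    """Reduce a detailed user agent string to just a browser family."""
    return "Internet Explorer" if "MSIE" in user_agent else "Mozilla"


def scrub_request_headers(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers with identifying details reduced."""
    scrubbed = dict(headers)
    referer = scrubbed.get("Referer")
    if referer and urlsplit(referer).hostname in PRIVATE_REFERRER_HOSTS:
        del scrubbed["Referer"]
    if "User-Agent" in scrubbed:
        scrubbed["User-Agent"] = generalise_user_agent(scrubbed["User-Agent"])
    return scrubbed
```

With this in place, a click from a Google results page would arrive at the destination with no referrer and a user agent of just `Mozilla`, while referrers from unlisted sites pass through unchanged.
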
So the good news is that there is a lot you can do to improve your privacy, but currently there is no "big privacy switch" you can use to just turn it on and forget about it.

Finally, be cautious about using web privacy or safety features from big companies unless you know how they work. Google's Toolbar includes an anti-phishing "safebrowsing" feature that could send your personal or financial details in a visible way (cleartext) across the internet. I'm not sure I want Google to have this information, and I certainly don't want anyone else to have it.

1 comment:

SpacerGuy said...

You're a mine of information. Interesting reporting on Googles lack of text security.