Colm's Beta Blog: September 2006

28 September 2006

Enough with the Mork jokes already

I recently had occasion to want to parse Firefox's history file and uncovered a lot of history surrounding the file format, called Mork. While some of the design decisions leading to this format are questionable (to say the least), I have to say I'm appalled at the number of folks who have jumped on the bandwagon of slagging off the developer of Mork, David McCusker.

This started with folks like Jamie Zawinski, who referred to the developer in a published work as a "complete barking lunatic". You probably don't know who Zawinski is, but he was at one time a respected Netscape engineer.

Unfortunately, when Zawinski started this, it became open season on Mork and its luckless developer. Everyone who wanted to do something with the format found Zawinski's article and chimed in with their own insults. What's worse is that Zawinski exposed a series of private e-mails with the developer of Mork. To what purpose, I can't fathom: they offer no guidance concrete enough to have steered the actual file format away from its flaws. All of Zawinski's relevant critiques (and maybe even libelous criticisms) were written after the format was implemented, in spite of his own quote, "The awful thing about getting it right the first time is that nobody realizes how hard it was".

Perhaps the worst part is the hypocrisy of it all: the Mork format has been in use since 1999 by the Mozilla/Firefox history and address book. If it was so bad, why hasn't it been re-written by now?

So what do I think of Mork? I agree with several of Zawinski's technical comments, but none of his personal ones. But having looked myself at the format in some detail, I wonder where the complexity is supposed to be? It is basically a serialized series of dictionaries (hash-maps) where a dictionary can represent either the meta-data (database columns), a data-table, or a transaction (an update to a data-table that is appended to the file, to allow for rapid writing). The only actually awful thing about it is the encoding of wide-chars.
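To show how little machinery the "dictionary" part needs, here is a minimal Python sketch that parses a single Mork-style dictionary such as `<(80=url)(81=Name)>` in one pass, without regular expressions. It assumes the escaping rules as I understand them: a backslash escapes the next character, and `$XX` is a hex-encoded byte. It also uses a simple heuristic to spot the UTF-16 "wide-char" values that Firefox writes with `$00` bytes. It ignores tables, rows and transaction groups, so treat it as an illustration rather than a complete parser.

```python
def decode_value(raw: str) -> str:
    """Decode a Mork cell value (escaping rules assumed, see lead-in)."""
    out = bytearray()
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            nxt = raw[i + 1]
            if nxt != "\n":                  # backslash-newline is a line continuation
                out += nxt.encode("utf-8")
            i += 2
        elif ch == "$" and i + 2 < len(raw):
            out.append(int(raw[i + 1:i + 3], 16))   # $XX -> one raw byte
            i += 3
        else:
            out += ch.encode("utf-8")
            i += 1
    # Heuristic (assumption): an even-length value containing NUL bytes is
    # treated as UTF-16LE, the "wide-char" form Firefox uses for titles.
    if b"\x00" in out and len(out) % 2 == 0:
        return out.decode("utf-16-le")
    return out.decode("utf-8", errors="replace")


def parse_dict(text: str) -> dict[str, str]:
    """Parse the first <...> dictionary in text into {hex_id: value}."""
    result: dict[str, str] = {}
    i = text.index("<") + 1
    depth = 1
    while i < len(text) and depth > 0:
        ch = text[i]
        if ch == "<":                        # nested meta-dictionary, e.g. <(a=c)>
            depth += 1
            i += 1
        elif ch == ">":
            depth -= 1
            i += 1
        elif ch == "(" and depth == 1:       # a cell of the form (id=value)
            j = i + 1
            while j < len(text) and text[j] != ")":
                j += 2 if text[j] == "\\" else 1   # step over escaped characters
            key, _, raw = text[i + 1:j].partition("=")
            result[key] = decode_value(raw)
            i = j + 1
        else:                                # whitespace, or meta-dictionary content
            i += 1
    return result


print(parse_dict("< <(a=c)> (80=url)(81=T$00e$00s$00t$00)>"))
# {'80': 'url', '81': 'Test'}
```

Every character is visited once, which is the main reason a hand-written scanner like this should comfortably beat a regex-heavy approach on a file of a megabyte or two.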

But the worst part about Mork for me has little to do with the format:

the only open-source Java code that effectively parses the format is derived from Python code that is in turn derived from Perl code that made heavy use of regular expressions.
this algorithm takes 27 seconds to parse a 1.4 MB history file, something that Mozilla's own Mork implementation does in under a second.

Now the same file converted to CSV and preserving all attributes can also be parsed by Java in less than 1 second! But perhaps the most striking thing is that the original Perl code with the algorithm that runs so slowly was written by... Jamie Zawinski.

To David McCusker, I hope that you're laughing at all this debate about something you developed years ago. Maybe once in a while you scratch your head about a couple of aspects of the resulting format which may have been a result of too-rapid coding rather than exceptionally poor design as has been assumed by many.

And to the hordes of developers who have heaped insults on David - shame on you. I wonder how you would feel to have your worst code exposed in a widely used application, and then mocked by all and sundry? Open-source means that you have access to the developers, but that is a privilege, not an open opportunity to tear someone's reputation to pieces.

By all means have a go at Microsoft (for example) as a large corporate entity that has perpetrated some awful file formats (try parsing the .lnk file format or, God forbid, a Word document; in contrast, see how easy it is to work with their .url file format) and some very poor performance-related decisions. But leave individuals alone, in part because the responsibility rarely rests on just one person, but also because a developer is a person. You know - with feelings and stuff. No, really.

23 September 2006

Kano Model and Product Innovation

I'm in a mind-mapping mood today - here's one about the Kano model and how it can guide the features of a product. Your product or service needs a unique recipe of all three types of features: basic (what I call duh or doh!), performance (good) and excitement (great, wow).

Gemba/customer visits (a contribution to Wikipedia)

I've decided that Wikipedia probably gets more readers than my blog (yes, really), so I should contribute there where possible. Here's my contribution to the article on Gemba visits. I learned this technique at Sun back in 1998; it was a real eye-opener to sit with open eyes and ears but closed mouth, watching end-users working with Sun products (on that occasion, it was Sun's desktop environment, CDE).

A Gemba visit is often simply called a customer visit. The hallmarks that make it uniquely useful are:

the purpose is firstly to observe, occasionally to question, rarely to guide or direct
the visit occurs in the context where the product or service is used, which allows direct observation of problems that arise, workarounds that are applied, and capabilities or services that are simply never used
sometimes the customer (or client or user) is asked to describe what they are doing while they are doing it; this provides insight into their thought processes as they work, which often reveals differences between their mental model and the model of the developers or providers of the product or service
the customer will often express wishes or needs while they are working in context that they would forget or suppress in a different context such as a structured interview or sales meeting

So what happens when you do a Gemba (customer) visit? My advice - prepare to be amazed. The people who develop products work in a closed environment; they share so much (maybe approximate age, education, interests, culture, not to mention information and work experiences) that their mental model of how a product is used is surprisingly out-of-whack with what a real live customer will do.

If for some reason you can't do Gemba visits (e.g. the product is used only in secure environments, or you simply don't know who your real customer is because your distribution channel obscures them), you can do something that may be more or less effective: a structured usability test. It is similar to a Gemba visit in that the customer is observed while using a product or service, but they are doing so in a slightly artificial environment:

they may be at your premises rather than their own
they are given specific tasks to complete rather than pursuing their own goals
there may be certain constraints, such as to try to complete a task in a given time or to do as many of a list of tasks as possible
they may be video-taped for later analysis by a wider group

So tell me - how do you go about researching how to improve your company's product or service?

20 September 2006

Big companies investing to deny science of Climate Change

It's a bit like living in the Matrix; our reality is shaped by forces beyond our control and the truth is far grimier than we would like to admit.

The Guardian posts an extract from Heat by George Monbiot that highlights specific individuals like Frederick Seitz who have used pseudo-science and highly selective quotation of sources (often obsolete or demonstrably inaccurate) to "demonstrate" that climate change is unfounded.

I posted earlier how Michael Crichton is likely to have been funded from similar sources, since he apparently succeeded in convincing folks like Seth Godin that man-made climate change is a fiction. Unfortunately the truth is rather stranger than that...

19 September 2006

O'Reilly Code Search - A valuable developer's resource

It goes without saying that Google are not merely good but great, but that doesn't mean that they own "search" as O'Reilly have just gone on to prove.

The O'Reilly Code Search provides a means to search through the code in their extensive catalogue. The results show the code in the context of its page in the book, and there is a link to the respective book's home page where you can buy the book or just download the examples first.

It supports attribute-based searching, so you can look for specific authors, ISBN numbers and titles, as well as the natural option to use code keywords or API names.

Of course this helps the sale of O'Reilly books, but it's also a real value-added service to their customers, current and future. My hat is off to O'Reilly for the concept, the openness and its execution.

18 September 2006

Web Privacy - How to get it, how it can be taken away, how we could get more of it

Scott McNealy said that "you have no privacy, get over it", but that shouldn't stop you from fighting to keep every bit of privacy you can get. In this post, we look at some old and new techniques that web sites can use to track your identity and some ways that you can protect your identity and use of the internet.

Cookies: It's well known that cookies can reduce your privacy by allowing a web site to track your identity across multiple visits, separated by minutes, days or years (Google cookies last until 2038, see here for issues with this and a workaround). In theory cookies are only accessible by the web-site that created them (so yahoo.com can't access a cookie for google.com), but some techniques (like Jookies and link-colour spying, see below) do allow one site to spy on your activity on another site.
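Here is a minimal server-side sketch of how a long-lived identifying cookie works, using Python's standard `http.cookies` module. The cookie name `visitor_id` is hypothetical, and the January 2038 expiry is chosen only because it is the largest 32-bit Unix timestamp; the post does not say which exact date Google uses.

```python
import uuid
from datetime import datetime, timezone
from email.utils import format_datetime
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"   # hypothetical name; real sites pick their own

def set_cookie_header(cookie_header: str = "") -> tuple[str, str]:
    """Return (visitor_id, Set-Cookie value), reusing any ID the browser sent."""
    jar = SimpleCookie()
    jar.load(cookie_header)                  # parse the incoming Cookie header
    if COOKIE_NAME in jar:
        visitor_id = jar[COOKIE_NAME].value  # returning visitor: same identity
    else:
        visitor_id = uuid.uuid4().hex        # new visitor: mint a random ID
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    # A far-future expiry keeps the same ID on disk for decades.
    expiry = datetime(2038, 1, 19, 3, 14, 7, tzinfo=timezone.utc)
    out[COOKIE_NAME]["expires"] = format_datetime(expiry, usegmt=True)
    out[COOKIE_NAME]["path"] = "/"
    return visitor_id, out[COOKIE_NAME].OutputString()
```

Each visit that sends the cookie back gets the same `visitor_id`, which is all a site needs to link visits that are days or years apart.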

IP Address monitoring: Every computer connected to the internet communicates through an IP address. A web site can easily store this address and use it to track your activity. You get some privacy from the fact that:

your address may change over time (depending on your Internet provider and any DHCP settings), and
organisations almost always have an edge router or proxy, whose address is typically all that a web-site can see

Jookies: Cookies are not the only way to identify you; expect to see more web sites use the interesting technique described by Mukund (among others) and discussed recently on Slashdot. Normally JavaScript files are static; the same file is served to all users. But by generating a JavaScript file for each user, the web site can serve you a file that contains your identity. Any web page from that site simply has to include this script to pass your identity back to the server. Since the script file is marked to remain in your cache for a long period, it will stay around on your computer just like a cookie.
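A minimal Python sketch of the server side of a "jookie" follows, under the assumption that every page includes the same script URL (hypothetically `/id.js`) and that a hypothetical `/track` endpoint receives the ID. The point is the combination of a per-visitor body with a long cache lifetime.

```python
import json
import uuid
from urllib.parse import parse_qs, urlsplit

def identity_script() -> tuple[str, dict[str, str], str]:
    """Build a per-visitor script; returns (visitor_id, headers, body)."""
    visitor_id = uuid.uuid4().hex
    body = (
        f"var visitorId = {json.dumps(visitor_id)};\n"
        # Report the ID on every page view via a tiny image request.
        "new Image().src = '/track?id=' + encodeURIComponent(visitorId);\n"
    )
    headers = {
        "Content-Type": "application/javascript",
        # Keep this copy for a year, so later pages that include the same
        # script URL reuse the cached, ID-bearing version.
        "Cache-Control": "public, max-age=31536000",
    }
    return visitor_id, headers, body

def visitor_from_track_request(url: str) -> str:
    """Recover the ID that the cached script reported."""
    return parse_qs(urlsplit(url).query)["id"][0]
```

Deleting cookies does nothing here. Only clearing the cache (or blocking that script) removes the identity.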

Link colour spying: Another method of spying is to use Javascript to query the colour of a hyperlink; because visited links are often displayed in a different colour to a non-visited one, a web-site can determine if you have opened a site previously.

Cache timestamp spy: A slightly more complex technique is to use the timestamps on files in your cache. When a web site serves your browser a file, it can give the file a Last-Modified timestamp. Then when your browser checks whether its cached copy is still current, it passes that value back in the If-Modified-Since request header. If the web site constructs an arbitrary unique timestamp (say differing only by a second) for each user, this value may be used to identify you.
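The timestamp trick can be sketched in a few lines of Python. The base date and the "visitor number" scheme are hypothetical. The assumption is that the server answers later revalidations with 304 Not Modified, so the browser keeps the original, ID-bearing timestamp.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Hypothetical base date: visitor N is given BASE + N seconds.
BASE = datetime(2006, 1, 1, tzinfo=timezone.utc)

def last_modified_for(visitor_number: int) -> str:
    """Value for the Last-Modified response header, unique per visitor."""
    return format_datetime(BASE + timedelta(seconds=visitor_number), usegmt=True)

def visitor_from_if_modified_since(header_value: str) -> int:
    """Recover the visitor number from the If-Modified-Since request header."""
    stamp = parsedate_to_datetime(header_value)
    return int((stamp - BASE).total_seconds())

print(last_modified_for(42))   # Sun, 01 Jan 2006 00:00:42 GMT
```

One second per visitor gives 86,400 distinct identities per day of "fake" timestamps, which is plenty for most sites.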

Cache timing: Another way to spy on the cache is to use Javascript to load the files; a file that is in the cache will be retrieved much faster than the same file accessed over the web, so if the load time is measured, a web site can determine if you have visited the site that hosts the file. See this web timing article (PDF) for more details.

Cross-site cookies: Internet Explorer and Firefox both employ partial techniques to prevent a web-site from accessing cookies created by another web-site, but they each have different weaknesses:

IE only allows the web page's main site (the top-level frame domain) to set (create) a cookie, but that cookie may be read when any page accesses a file (e.g. an image, javascript or CSS file) from that site
Conversely, Firefox will let any site set a cookie, but will only allow the web page's main site to read a cookie
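To see why each policy leaks in a different way, here is a toy Python model of the two rules exactly as described above. It is not the browsers' actual code, and `tracker.net` is a hypothetical third-party site embedded on every page.

```python
def may_set(browser: str, cookie_site: str, top_site: str) -> bool:
    """May cookie_site create a cookie while top_site is the page being viewed?"""
    if browser == "IE":
        return cookie_site == top_site      # only the main site may create cookies
    if browser == "Firefox":
        return True                         # any site, including embedded ones
    raise ValueError(f"unknown browser: {browser}")

def may_read(browser: str, cookie_site: str, top_site: str) -> bool:
    """Is cookie_site's existing cookie sent back while top_site is viewed?"""
    if browser == "IE":
        return True                         # sent with any request to cookie_site
    if browser == "Firefox":
        return cookie_site == top_site      # only to the main site
    raise ValueError(f"unknown browser: {browser}")

def tracker_hits(browser: str, tracker: str, top_sites: list[str]) -> int:
    """Count page views on which the tracker gets its own cookie back."""
    has_cookie = False
    hits = 0
    for top in top_sites:
        if has_cookie and may_read(browser, tracker, top):
            hits += 1                       # request carries the existing cookie
        if not has_cookie and may_set(browser, tracker, top):
            has_cookie = True               # response plants a new cookie
    return hits
```

In this model, IE leaks once you have visited the tracker's own site directly: `["tracker.net", "a.com", "b.com"]` gives 2 hits. Firefox lets an embedded tracker plant a cookie that it reads when you later visit it directly: `["a.com", "tracker.net"]` gives 1 hit.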

Finally of course, you send an identity "signature" (http headers) to each web site you visit that contains the following information:

what media-types you accept (effectively, what plug-ins you have installed)
what language(s) you accept (e.g. en-US means English US)
(potentially, but few if any current browsers do this) your e-mail address
what kind of browser ("user agent") you are running; for example, my browser reveals the following: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7
Javascript running in your browser can also send your operating system and screen resolution to a web-site
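Even without cookies, these values can be combined into a fingerprint. A minimal Python sketch follows. The choice of headers is an assumption, and the dictionary lookup is case-sensitive for simplicity. A fingerprint only identifies you to the extent that your particular combination is rare.

```python
import hashlib
from typing import Optional

# Headers that commonly differ between users (names as sent on the wire).
FINGERPRINT_HEADERS = ("User-Agent", "Accept", "Accept-Language")

def fingerprint(headers: dict[str, str], screen: Optional[str] = None) -> str:
    """Hash identifying headers (plus an optional screen size reported by
    JavaScript) into a short, stable identifier."""
    parts = [headers.get(name, "") for name in FINGERPRINT_HEADERS]
    if screen:
        parts.append(screen)
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()[:16]
```

The same browser on the same machine yields the same value on every visit. Changing any one field, such as the language, yields a different one.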

So what else can browsers do to make it easier for users to maintain some level of privacy? Here are some thoughts (with examples focussing on Firefox because it is a platform that enables extensions to rapidly evolve or customise the browser to make it easier to use, safer and more powerful):

Block spurious cookies: From Martin Pool: "There may be more fine-grained controls, such as only accepting cookies from the same server as the top-level page currently viewed and not from servers for subsidiary requests such as images or frames." In my view, this feature should be an option that is integrated directly into each browser's cookie control mechanism rather than provided via an extension.
Don't send the referer URL: when you access a web page, your browser sends the referrer URL to tell the site what page you clicked from; in the case of search engines, this will also pass the keywords you searched for. Although there are many legitimate uses for this information (web site maintainers and bloggers use this to find out what their readers were looking for so that they can try to provide more of that), it can also reduce privacy. If the browser allowed the user to not send the referrer URL for specific sites (e.g. http://www.google.com), this would immediately provide greater privacy. Extensions like RefControl (and others) for Firefox can do this for you.
Don't cache certain (types of) files from certain sites: if you know that a site is using caching techniques to identify you, you could stop caching files from that site. If you knew that it was just a Javascript file or a specific image file with a certain timestamp, you could refine it to just block those specific files from that site from the cache. Although there are many shortcuts to manually clear the cache, I'm aware of no Firefox extension that provides this precise feature, but I'll post an update here when one becomes available; my suggestion for a name: CacheBlock. In the meantime, you can get some benefit by using the Firefox AdBlock extension to block specific image files, but you have to figure out which ones are being used for timestamp spying; what is really needed is a web-based service that returns the URL patterns of such files that may appear on a given web-site. Alternatively you can use Stanford SafeCache which divides your cache by domain (the main web-site, e.g. google.com) so that a hosted file that is included on different web pages from different domains will be retrieved separately for each domain. This technique has the downside that it slows down some of your browsing experience and it uses your bandwidth to download a file multiple times. I suggest that an alternative approach would be to artificially delay retrievals from the cache where the main web page is different from the one that originally caused an included file to be cached.
Block certain data: most web sites don't need precise details of the browser you are running (because they generate at most 2 or 3 flavours of HTML); the user agent http header could be generalised to just provide Internet Explorer or Mozilla. Suggested extension name: PrivacyBlock.
Custom security for JavaScript: the Java programming language has very fine-grained security control that allows the user or installer of an application to define exactly what a program can and can't do; however most JavaScript runtimes support only two modes: on or off. It would be very useful if a browser came with a number of JavaScript profiles that the user could choose as a default and (where necessary) for an individual web-site. The profile would choose from a list of capabilities that are either granted or denied. Firefox has no such capability or extension that I'm aware of, but again I'll update this post when it does; suggested extension name: ScriptBlock.
Specifically disable the :visited CSS pseudo-class: this could be handled by a general-purpose (future) extension like ScriptBlock, but you can get some benefit using Stanford SafeHistory: "offsite visited links [are] marked only if the browser's history database contains a record of the link being followed from the current site". This means that a web site can't spy on your visits to other web sites unless you reached them from that site (so some spying is still possible unless you keep clearing your history). I suggest that a specific feature to disable :visited would be safer.
Consider using an anonymizing proxy: There are several implementations of anonymising proxies such as Tor; one downside is that some sites block such proxies because of potential abuses (e.g. spamming via mail, blogs or wikis).
Use Internet Explorer and Firefox: This is a simple but powerful technique that I came up with. If you access different sites using different browsers, you have multiple independent caches and sets of cookies; for example, you could use Google search and your Blogger blog via Firefox but access other Google services like Mail and Calendar via Internet Explorer. While using the two browsers, you effectively have two independent identities (although sites can still track your IP address, and multi-site companies like Google can cooperate behind the scenes to "merge" the identity information into a common picture). This might seem difficult or onerous, but Firefox has extensions like IE Tab and IE View which will do this automatically for you, provided you access the site originally within Firefox.
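Several of the suggestions above (RefControl-style referrer control, a PrivacyBlock-style generic user agent, SafeCache-style cache partitioning) boil down to small request-time rules. Here is a minimal Python sketch of those rules, not of any real extension. The `NO_REFERER_HOSTS` list and the generic user-agent strings are hypothetical. The Referer rule covers both readings of "for specific sites": drop the header if either the destination or the referring page is on a listed host.

```python
from urllib.parse import urlsplit

NO_REFERER_HOSTS = {"www.google.com"}   # hypothetical user setting

def filter_request_headers(url: str, headers: dict[str, str]) -> dict[str, str]:
    """Return a copy of the outgoing headers with less identifying detail."""
    out = dict(headers)
    referer = out.get("Referer")
    if referer is not None:
        hosts = {urlsplit(url).hostname, urlsplit(referer).hostname}
        if hosts & NO_REFERER_HOSTS:
            del out["Referer"]          # e.g. don't pass search keywords on
    ua = out.get("User-Agent")
    if ua is not None:                  # reduce to "Internet Explorer or Mozilla"
        out["User-Agent"] = ("Mozilla/4.0 (compatible; MSIE)"
                             if "MSIE" in ua else "Mozilla/5.0")
    return out

def cache_key(top_level_url: str, resource_url: str) -> tuple[str, str]:
    """Cache entries keyed by main site as well as URL, so one site cannot
    probe whether another site's pages caused a file to be cached."""
    return (urlsplit(top_level_url).hostname or "", resource_url)
```

The cache key shows the SafeCache trade-off mentioned above. A shared file is stored once per main site, which costs bandwidth but defeats the cache timing and timestamp tricks across sites.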

So the good news is that there is a lot you can do to improve your privacy, but currently there is no "big privacy switch" you can use to just turn it on and forget about it.

Finally, be cautious about using web privacy or safety features from big companies unless you know how they work. Google's Toolbar includes an anti-phishing "safebrowsing" feature that could send your personal or financial details in a visible way (cleartext) across the internet. I'm not sure I want Google to have this information; I certainly don't want anyone else to.

15 September 2006

Internet comic art - for the cynic in all of us

We've still got Dilbert (though the Dilbert blog is better). We've got A Softer World, gapingvoid, toothpastefordinner, and more than you can shake a stick at.

Now give a big hi to xkcd.

DRM - Bend over down under?

The ever fascinating Slashdot reports that Australia is enacting pro-DRM laws as part of its Free Trade Agreement with the US. Mod-chipping hardware, multi-region DVDs, in fact any mechanism that circumvents technological measures designed to limit distribution and use, will be subject to severe penalties. This gives consumer devices and media protection similar to that afforded to software (which is equally subject to circumvention by using crackz and serialz that are available on sites of dubious repute).

Although many will complain, people's "rights" are invariably in contention with copyright. Scott McNealy said many years ago "you have no privacy - get over it"; equally I would say you have no rights to use a product, except as the license allows - get on with it.

Consumers have only one power - the ability to buy. If you don't like it, don't buy it - the way you spend speaks louder than your vote.

Windows Vista to create 50,000 new jobs in Europe

When News.com posted an article where IDC claim that Vista will create 50000 new jobs in Europe, it wasn't long before the wags on Slashdot chimed in. But many commenters there missed the audacity of it all; IDC claim that these will be new jobs.

Why is IDC's claim partly right?

1. More software. New versions of Windows and Office provide a new look-and-feel that makes older apps look dated (in the same way that the fashion industry does). This encourages both development and sales of other updated products, though unfortunately Windows has not improved technically to the same degree, and many users report that features like XP's wizard-based search facility in Explorer are clumsier to use than their Windows 2000 equivalents.

Aside: compare this with Solaris and Linux, which have new features that improve performance and reliability; Solaris in particular has innovated with capabilities like ZFS and DTrace.

2. More hardware. Beta-testers of Vista have reported that it requires a significantly bigger jump in hardware power than previous Windows upgrades. This could be regarded as a destructive event, possibly an example of the "broken window" phenomenon in economics (reported on Slashdot). This will slow down initial demand for the OS, but it will eventually lead to greater demand for desktops and notebooks capable of running it well, which will certainly help distributors and supporters of PCs and of components like memory and graphics cards.

3. More support. A new version of Windows stimulates demand for updated educational materials, books, training.

So ok, why might this be wrong?

1. It's still Windows. In some sense, Vista simply cannibalises the network of people and companies that work on previous versions of Windows.

2. It's not just Windows. The Microsoft near-monopoly is declining - slowly. It was thought that Linux would draw users from Microsoft by providing an alternative desktop OS to Windows; though that hasn't occurred as fast as some expected, it is happening. Apple Mac has seen a slight increase in sales due to "drag" from iPod - a hardware-devouring Windows release may make the Mac look not just attractive but value for money.

3. The online alternative. Finally, "Web 2.0" has spawned an array of increasingly rich online applications; while none are an Office replacement, they provide rich ways to communicate and coordinate. In tandem with a free portable suite like OpenOffice.org, you can have a rich desktop on (old) Windows XP or new Linux or Solaris. If you really want the look of Windows Vista, you'll be able to get some of that through theming plug-ins for Windows (and Java for that matter).

Microsoft Works won't impede Google's online application suite

Microsoft appear to be mulling over the option to release their Works "suite" online in some form. Works has languished for some time, in part due to the popularity of the full-blown Office suite but also because of competition from quality free products like OpenOffice.

Meanwhile Web 2.0 apps from several sources, and significantly the emerging online suite (Mail, Calendar, Spreadsheet, with Mail acting as a slim word-processor) from Google, are providing many of the features and most of the usability of Works. The only significant gaps are a database and a rich editor with mail-merge capabilities; hardly beyond the capabilities or resources of Google.

So what are the possible forms that a Works re-release might take? That depends on how much Microsoft are willing to invest; ranging from nearly nothing to say 4 person years of work, here are some options:

free online download of Works, possibly supplemented by contextual ads (say based on the content of your document)
a pseudo-online solution where Works would be offered as an ActiveX plug-in within Internet Explorer; different Works app-lets would be loaded into different plug-in instances
Works with integration to Windows Live Spaces
Works Extra Lite, with some infrequently used but bulky features like spell-checking and file conversions off-loaded to a Microsoft server

But in the end, I think MS would be wasting their (and our) time with this. Works is just a low-end (even crippled) office suite compared to OpenOffice.org, and it will never be a Web 2.0 app that you can access anywhere.

On the other hand, if they decided to make OneNote freely available online, add a spreadsheet, simple database and blogging capability - now that would be interesting as it would offer a rich user experience and a different feature set.

There are a few ways this could play out:

MS could offer OneNote as I outline above
Google could decide to offer richer offline but web-integrated versions of their apps, with integrated Google (re)search facilities
OpenOffice.org could add journalling capabilities

There's even a fourth option - a dark horse using Java for rich offline capability and natural browser integration as an applet or via Web Start. That's what I've been wanting to develop for 10 years now; maybe this idea has found its time.

Zune - Microsoft's iPod Beater?

Although pieces of the puzzle have been floating around for weeks, Zune has finally come into focus.

Zune is a hard disk-based player with an 802.11b/g wireless network adaptor for downloading and sharing music files, an FM tuner and a 3 inch LCD for navigation and video playback.

Zune Marketplace will be Microsoft's music download service; you can either pay by the file or subscribe to get all you want.

When you share music files, it's the opposite of what happens in The Ring movie - the media dies 3 days after they watch it. That's because Zune will add a DRM time-to-live wrapper to copies of files you share.

On the upside, Zuneinsider reports that Zune will support your existing, unprotected, music library (MP3, AAC and WMA files), as well as video formats (WMV, H.264, MPEG4, 320x240), and JPG photo files.

Interestingly, Microsoft may offer a 'competitive migration' option - you will be able to download free copies of music that you previously downloaded from iTunes to your iPod.

So far, sounds good - but we don't yet know the price for this potential iPod beater, and users will be very critical if the hands-on user experience fails to live up to the high standard set by Apple. Also, many users are only interested in solid state players that are lighter and more robust than the HDD variants.

I'll certainly take a hands-on look when this little handful hits retail, but I wonder about one thing - why didn't Microsoft call it Xune to create a brand connection with its other consumer entertainment device - the Xbox? And what capabilities will it have in connection with XP Media Center?

Beta Blogger

This is Colm on Blogger Beta, which offers tagging (labels) for posts and WYSIWYG template editing. While my current home on the web is still http://colmsmyth.blogspot.com, this gives me a chance to play with some new technology and try a different sort of blogging - brief, topical and universal rather than long and niche.
