28 September 2006

Enough with the Mork jokes already

I recently had occasion to want to parse Firefox's history file and uncovered a lot of history surrounding the file format, called Mork. While some of the design decisions leading to this format are questionable (to say the least), I have to say I'm appalled at the number of folks who have jumped on the bandwagon of slagging off the developer of Mork, David McCusker.

This started with folks like Jamie Zawinski who referred to the developer in a published work as a "complete barking lunatic". You probably don't know who Zawinski is but he was at one time a respected Netscape engineer.

Unfortunately, when Zawinski started this, it became open season on Mork and its luckless developer. Everyone who wanted to do something with the format found Zawinski's article and chimed in with their own insults. What's worse is that Zawinski exposed a series of private e-mails with the developer of Mork. To what purpose I can't fathom, as they offer no guidance concrete enough to have steered the actual file format away from its flaws. All of Zawinski's relevant critiques (and maybe even libelous criticisms) were written after the format was implemented (in spite of his quote "The awful thing about getting it right the first time is that nobody realizes how hard it was").

Perhaps the worst part is the hypocrisy of it all: the Mork format has been in use since 1999 by the Mozilla/Firefox history and address book. If it was so bad, why hasn't it been re-written by now?

So what do I think of Mork? I agree with several of Zawinski's technical comments, but none of his personal ones. But having looked myself at the format in some detail, I wonder where the complexity is supposed to be? It is basically a serialized series of dictionaries (hash-maps) where a dictionary can represent either the meta-data (database columns), a data-table, or a transaction (an update to a data-table that is appended to the file, to allow for rapid writing). The only actually awful thing about it is the encoding of wide-chars.
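To make that description concrete, here is a minimal Python sketch of the model as I read it: one dictionary for the column meta-data, one for the data-table, and a transaction that is merged in afterwards. This is not Mork's on-disk syntax and not anybody's actual parser; the names (`columns`, `table`, `apply_transaction`, `resolve`) and the sample values are made up purely for illustration.

```python
# Conceptual model only: this is NOT Mork's file syntax.
# Every name and value below is hypothetical, chosen for illustration.

# Meta-data dictionary: short column ids -> column names.
columns = {"c1": "URL", "c2": "LastVisitDate"}

# Data-table dictionary: row id -> {column id -> value}.
table = {
    "r1": {"c1": "http://example.org/", "c2": "1159400000"},
}


def apply_transaction(table, transaction):
    """Merge an appended update into the table, cell by cell.

    Rows that already exist have only the listed cells overwritten;
    rows that do not exist yet are created.
    """
    for row_id, cells in transaction.items():
        table.setdefault(row_id, {}).update(cells)
    return table


def resolve(table, columns):
    """Replace column ids with column names to get readable rows."""
    return {
        row_id: {columns[col_id]: value for col_id, value in cells.items()}
        for row_id, cells in table.items()
    }


# A transaction appended later: update one cell, add one new row.
transaction = {
    "r1": {"c2": "1159500000"},
    "r2": {"c1": "http://example.com/"},
}

apply_transaction(table, transaction)
rows = resolve(table, columns)
```

The point of the sketch is only that a writer can append a small transaction rather than rewrite the whole table, and a reader replays the transactions in order to get the current state.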

But the worst part about Mork for me has little to do with the format:
  1. the only open-source Java code that effectively parses the format is derived from Python code that is in turn derived from Perl code that made heavy use of regular expressions.
  2. this algorithm takes 27 seconds to parse a 1.4 MB history file, something that Mozilla's own Mork implementation does in under a second.

Now the same file converted to CSV and preserving all attributes can also be parsed by Java in less than 1 second! But perhaps the most striking thing is that the original Perl code with the algorithm that runs so slowly was written by... Jamie Zawinski.

To David McCusker, I hope that you're laughing at all this debate about something you developed years ago. Maybe once in a while you scratch your head about a couple of aspects of the resulting format which may have been a result of too-rapid coding rather than exceptionally poor design as has been assumed by many.

And to the hordes of developers who have heaped insults on David - shame on you. I wonder how you would feel to have your worst code exposed in a widely used application, and then mocked by all and sundry? Open-source means that you have access to the developers, but that is a privilege, not an open opportunity to tear someone's reputation to pieces.

By all means have a go at Microsoft (for example) as a large corporate entity that has perpetrated some awful file formats (try parsing the .lnk file format or, god forbid, a Word document, but in contrast see how easy it is to work with their .url file format) and some very poor performance-related decisions. But leave individuals alone, partly because the responsibility rarely rests with just one person, and partly because a developer is a person. You know - with feelings and stuff. No, really.

1 comment:

SpacerGuy said...

I think it's fair to say that when a person has fallen off their ivory tower, it can be hard watching other new rising stars of the Internet World advancing and shining brightly above you. Wikipedia quotes J. Zawinski: "the single most braindamaged file format..."; you can read the rest under Mork. If this is true, anyone who has a sister or brother, son or daughter with brain damage would find that particular quote quite wounding, to say the least. The author of that quote does not convince me of his assessment of Mork at all, not even in the slightest.