Saturday 25 December 2010

Brevity leading to obfuscation

I ran into this article today. In 25 lines of code a guy created a Python spelling checker.

http://norvig.com/spell-correct.html

Now it's impressive to be able to do that, but at the end of the 4,500 word explanation I realised that the code had been obfuscated for the sake of brevity. He has given functions and variables reasonable names, and I could initially guess at what some things were doing.

So what are the obvious readability problems with the code?

The explanation mentions edit distances; the code doesn't. It mentions edits1.
Even though I knew about edit distances, and my initial thought was that the code must use them somehow, you really have to guess that edits1 returns the strings at an edit distance of 1 from the original string.
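
To show what I mean, here is a minimal sketch of my own (not the article's code) where the name and docstring say what comes back. The function name, the ALPHABET constant and the choice to count a swap of two adjacent letters as a single edit are all my assumptions.

```python
import string

# Assumption: only lowercase English letters are considered.
ALPHABET = string.ascii_lowercase


def strings_at_edit_distance_one(word):
    """Return the set of strings exactly one edit away from `word`.

    One edit is a single deletion, adjacent transposition,
    substitution or insertion of a letter from ALPHABET.
    """
    # Every way of cutting the word into a left and a right part.
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]

    deletions = {left + right[1:] for left, right in splits if right}
    transpositions = {
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    }
    substitutions = {
        left + letter + right[1:]
        for left, right in splits
        if right
        for letter in ALPHABET
    }
    insertions = {
        left + letter + right
        for left, right in splits
        for letter in ALPHABET
    }

    candidates = deletions | transpositions | substitutions | insertions
    # Substituting a letter with itself gives back the word (distance 0),
    # so drop it.
    return candidates - {word}
```

It is longer than a terse version would be, but a reader no longer has to guess what the function produces.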

The explanation uses Bayes' theorem, something I am familiar with, but the code doesn't mention probability, Bayes, or anything of a statistical nature.

The code has functions like train, but it doesn't say what it's trying to produce or how it's training. Is it creating a list of probabilities? From the code you have to guess.
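
A name and a docstring can answer that question up front. The sketch below is my own illustration, not the article's code, and it assumes the goal of training is a table of word counts from which a probability can be estimated. The names count_word_frequencies and word_probability are hypothetical.

```python
import re
from collections import Counter


def count_word_frequencies(text):
    """Return a Counter mapping each lowercase word in `text` to its count."""
    # Assumption: a "word" is a run of the letters a-z.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words)


def word_probability(word_counts, word):
    """Estimate P(word) as its share of all counted words (0.0 if none)."""
    total = sum(word_counts.values())
    return word_counts[word] / total if total else 0.0
```

With names like these, the return type and the statistical intent are both visible without reading the explanation.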

The code leaves no hints as to what data structures it is using other than maybe lists.

He ran into a bug or two, but that's not a big deal; nearly all code has bugs. He also did it quite rapidly, so for that I salute him.

But as for being maintainable by another programmer who might have to pick it up, I seriously doubt it. This code requires 4,500 words of comments to be maintainable.
