[Updated June 7 2009]
Easy reading is hard writing. This is true of software, not just prose.
I work really hard at writing simple, elegant, easy-to-read code.
And the details really matter…
I was discussing just this point with my friend David, who teaches numerical computing and software engineering, and we concluded that students are rarely exposed to well-written software and are seldom required to write it. Then they graduate into a world where the speed of software development trumps all: fastest coder wins, every time, readability be damned.
But there is a false dichotomy here: literate programming can save time in the long run for anyone willing to invest the time to learn some simple principles.
What follows is a bit of a rant.
Structural Inconsistency
Consider the following:
```c
enum {
    multitex   = (1 << 0), // Use multiple textures
    no_normals = (1 << 1), // Do not generate normals
    make_test  = (1 << 2), // Generate a constraint test rig
    open_out   = (1 << 3), // Door opening outward
    open_in    = (1 << 4), // Door opening inward
};
```
The stumbling point is this: the options list is
do
do not
do
do
do
That is, one of these options is a negative test. Which one? Don’t remember, have to look it up.
And sure enough, when I examined the code, I found that the if-else block for no_normals could easily have been reversed to test against something like use_normals.
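Here is a minimal sketch of that reversal. It is written in Python rather than the original C, and the names MeshOption, USE_NORMALS, and build_steps are hypothetical stand-ins, not the code I was reading. With the flag renamed, every branch asks the same kind of question.

```python
from enum import IntFlag


class MeshOption(IntFlag):
    """Hypothetical Python mirror of the C enum, with every flag phrased positively."""
    MULTITEX = 1 << 0     # use multiple textures
    USE_NORMALS = 1 << 1  # generate normals (replaces the negative no_normals)
    MAKE_TEST = 1 << 2    # generate a constraint test rig
    OPEN_OUT = 1 << 3     # door opens outward
    OPEN_IN = 1 << 4      # door opens inward


def build_steps(options: MeshOption) -> list[str]:
    """Return the (hypothetical) build steps implied by the option flags."""
    steps = []
    if options & MeshOption.MULTITEX:
        steps.append("multiple textures")
    # The positive test reads the same way as every other branch:
    # "if this option is set, do the thing."
    if options & MeshOption.USE_NORMALS:
        steps.append("generate normals")
    if options & MeshOption.MAKE_TEST:
        steps.append("constraint test rig")
    return steps
```

A caller now writes build_steps(MeshOption.MULTITEX | MeshOption.USE_NORMALS) and never has to remember which flag means "don't."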
Tip: When structural inconsistency cannot be avoided, group like items together, and place the negative conditions (which are harder for the human brain to process) at the end of the structure.
Like this:
do
do
do
do
do not
Placing the “do not” at the end means the reader’s mind shifts gears only once, rather than twice (from “do” to “do not” and then back to “do”).
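Applied to the enum above, the tip might look like the following sketch, again in Python with a hypothetical MeshOption class and describe helper: the four "do" flags come first and the lone "do not" flag comes last, so any listing of the options reads the same way.

```python
from enum import IntFlag


class MeshOption(IntFlag):
    """Hypothetical flag set: positive options first, the lone negative one last."""
    MULTITEX = 1 << 0    # use multiple textures
    MAKE_TEST = 1 << 1   # generate a constraint test rig
    OPEN_OUT = 1 << 2    # door opens outward
    OPEN_IN = 1 << 3     # door opens inward
    NO_NORMALS = 1 << 4  # do NOT generate normals (the one negative, kept at the end)


def describe(options: MeshOption) -> list[str]:
    """List the set flags in declaration order, so any negative flag appears last."""
    return [member.name.lower() for member in MeshOption if member in options]
```

One caution about this sketch: reordering here also renumbered the bits. If the flag values are ever stored or sent anywhere, keep the old values and reorder only the declarations.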
Semantic contradiction
Here’s an excellent example paraphrased from some code I recently read, in an otherwise extremely well-written Python-based web application:
```python
from google.appengine.ext import db


class Person(db.Model):
    name = db.StringProperty(default='')
    hunger = db.IntegerProperty(default=4)      # 0 = hungry, 10 = full
    loneliness = db.IntegerProperty(default=4)  # 0 = lonely, 10 = opposite of lonely
```
OK, so what's the problem?
Well, think carefully about this statement indicating maximum hunger: hunger = 0! Or this one indicating maximum loneliness: loneliness = 0. It gets even tougher, as the code uses predefined variables MAXIMUM_HUNGER and MAXIMUM_LONELINESS, which, respectively, mean... not hungry and not lonely.
What's happening is that the variable describing the person's state carries the opposite meaning of the state it describes. The programmer is free to map a variable's values onto an object's states however they like, but here the mapping is inverted: when hunger has a high value, the person has a low level of hunger.
A general principle here is to make the meaning of each variable match the object state it represents. This saves digging through one or more source files to figure out how the variable's values map to the object's state.
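A minimal sketch of semantic agreement for the Person example, using a plain Python dataclass instead of the App Engine model. The scale is flipped so that a larger number means more of whatever the field is named after; the constants and the eat and is_starving methods are hypothetical illustrations. The original default of 4 on the inverted scale corresponds to 6 here.

```python
from dataclasses import dataclass

# Hypothetical scale: larger numbers mean MORE of the thing the field is named after.
MAXIMUM_HUNGER = 10      # starving
MAXIMUM_LONELINESS = 10  # as lonely as it gets


@dataclass
class Person:
    name: str = ""
    hunger: int = 6      # 0 = full, 10 = starving
    loneliness: int = 6  # 0 = content, 10 = very lonely

    def eat(self, amount: int) -> None:
        """Eating lowers hunger, but never below zero."""
        self.hunger = max(0, self.hunger - amount)

    def is_starving(self) -> bool:
        """Maximum hunger now really does mean maximally hungry."""
        return self.hunger == MAXIMUM_HUNGER
```

The other way out is to keep the 0-to-10 scale as it was and rename the field (satiety instead of hunger, for example). Either way, the name and the number point in the same direction.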
Success... or Fail?
I once worked on a project using a custom language, where functions would return either "success" (a noun) or "fail" (a verb). That's just wack to me. Either of the following pairs has a nicer ring:
Succeed/Fail: verb
Success/Failure: noun
The language itself, overall, was actually very nice, the language designer more brilliant than not. But it's still jarring. I try to avoid mixing parts of speech like this.
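In Python, one way to keep both outcomes in the same part of speech is an Enum whose members are both nouns. The Outcome type and the parse_port function below are hypothetical, only there to show how the pair reads at a call site.

```python
from enum import Enum


class Outcome(Enum):
    """Both members are nouns, so they read alike at every call site."""
    SUCCESS = "success"
    FAILURE = "failure"


def parse_port(text: str) -> tuple[Outcome, int]:
    """Hypothetical example: parse a TCP port number and report an Outcome."""
    if text.isdigit() and 0 < int(text) < 65536:
        return Outcome.SUCCESS, int(text)
    return Outcome.FAILURE, 0
```

A caller then writes "if outcome is Outcome.FAILURE:", which reads as plainly as the success case.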
Does literate programming really matter?
It does to me. Poorly written code costs me money because it takes more time to understand. Code is hard enough to read without dealing with arbitrary logical inconsistencies in the structure. Like an "accounting irregularity," such structural inconsistency always gives me a vaguely queasy feeling: "What's really going on here?" or "Why did the programmer do something different for apparently no reason?"
See, we use programming languages because we can't just speak to computers and have them do what we want. For a lot of programmers, if they spoke the way they programmed, people would rightly regard them as illiterate!
Most programmers I know would be better served by some serious study of Strunk and White's The Elements of Style than by brushing up on the latest fad in esoteric syntax.
There's more: not only can most programmers not write, most don't even read that well.
Here's proof: every "word processing" application I have ever used, including Microsoft Word, WordPerfect, OpenOffice, Google Documents, and a host of JavaScript-based web editors, defaults to small type, single spacing, small margins, and long lines with high character counts. Long lines with lots of characters are hard to read. Worse, it's impossible to speed read when your eyes have to scan back and forth across a page instead of taking in lines and paragraphs as blocks from top to bottom.
(My hunch is that the formatting model these programs use is the programmer's last college term paper, which, with 1-inch margins, 12-point type, and double line spacing, was readable at arm's length.)
Contrast these programs with LaTeX, whose default settings produce outstandingly readable text, very similar to the WordPress Kubrick theme.
Basically, programmers have given us tools to write unreadable documents: it's not possible to speed read a document when there are more than about 60 characters per line.
Contrast this with the explosive growth of WordPress, obviously designed by readers, where nearly all themes limit lines to around 60 characters.
Two little principles for coding
These two little principles will help you write much more understandable programs, in any language:
Structural consistency: Don't write logically related blocks of code where "One of these things isn't like the others."
Semantic agreement: Make variables and functions behave intuitively, the way common sense would suggest. The meaning of something like hunger = 0 should be intuitively obvious. An added bonus of semantic agreement is reducing the need for documentation: the code means what it says.
{ 1 comment }
I agree this do – do not – do – do – do pattern is probably poorly thought-out in this particular case, but sometimes it’s not so obvious.
Say you need to implement a command-line option for an imaginary program, that enables or disables Nagle’s algorithm on sockets owned by that program. How would you name it?
Would you call it “enable TCP_NODELAY”, which disables Nagle’s algorithm (normally enabled by default)? And wouldn’t that be confusing?
Would you call it “disable Nagle’s algorithm”, which takes the negative route, and thus would clash with the other options?