About eight months ago on my blog, I said thanks to the person who sent me a copy of The Art of Unix Programming off of my Amazon Wishlist. Since then I've read the whole book cover to cover and thought overall it was worth the read.

One of the big points that stuck in my mind, for whatever reason, was data-driven programming. When I read it, something just clicked and it made perfect sense. Over and over in the last few months, I've found myself applying this style quite a bit. At work, I've written a handful of Perl scripts to help with some tasks, and many of these have been quite data-oriented. In my free time, I've hacked quite a bit at my Onkyo receiver control program, and I thought I would point out a few examples there of how data-driven programming can make things a lot easier.

This commit sums up a lot of the changes I made to the command parsing just a few days ago in moving from a procedural to a data-driven style of programming. The code parses listening modes and inputs from the user, for example the "dvd" portion of "input dvd" once we have already determined that this is an input command. Rather than having conditional after conditional like so:

if(strcmp(dup, "DVR") == 0 || strcmp(dup, "VCR") == 0)
  ret = cmd_attempt(prefix, "00");
else if(strcmp(dup, "CABLE") == 0 || strcmp(dup, "SAT") == 0)
  ret = cmd_attempt(prefix, "01");
else if(strcmp(dup, "TV") == 0)
  ret = cmd_attempt(prefix, "02");
...

We can do something more like the following:

static const char * const inputs[][2] = {
  { "DVR",       "00" },
  { "VCR",       "00" },
  { "CABLE",     "01" },
  { "SAT",       "01" },
  { "TV",        "02" },
  ...
};

/* compile-time constant */
loopsize = sizeof(inputs) / sizeof(*inputs);
for(i = 0; i < loopsize; i++) {
    if(strcmp(dup, inputs[i][0]) == 0) {
        ret = cmd_attempt(prefix, inputs[i][1]);
        break;
    }
}

The difference is pretty clear: adding another input name and its corresponding code is now a one-line change to the table. This is a lot cleaner than adding yet another two lines of boilerplate strcmp() calls and other code.
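To make the idea easier to try in isolation, here is a minimal sketch of the same table lookup wrapped in a function. The names input_table and lookup_input_code are hypothetical and not taken from the receiver program, and the table holds only the five entries shown above. It returns the code string instead of calling cmd_attempt(), so it compiles on its own.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical table: only the entries shown in the post. */
static const char * const input_table[][2] = {
    { "DVR",   "00" },
    { "VCR",   "00" },
    { "CABLE", "01" },
    { "SAT",   "01" },
    { "TV",    "02" },
};

/* Return the code for a name, or NULL if the name is not in the table. */
static const char *lookup_input_code(const char *name)
{
    /* element count is computed by the compiler from the table itself */
    const size_t count = sizeof(input_table) / sizeof(*input_table);
    size_t i;

    for (i = 0; i < count; i++) {
        if (strcmp(name, input_table[i][0]) == 0)
            return input_table[i][1];
    }
    return NULL;
}
```

With this shape, supporting a new input means adding one row to input_table; the loop never changes. The comparison is case-sensitive, so the caller is assumed to have upper-cased the name already.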

The code snippet I highlighted above is relatively recent. However, I converted the receiver status codes to be data-driven some time ago. Doing this was a no-brainer. Although initially I only had 21 statuses in the array, the list has since swollen to 96. In addition, as I've added data, it has been incredibly easy to change the way it is used. Originally the status parsing was a simple strcmp() loop like the above code. It then moved to a linked-list data structure traversal where the comparison was done with precomputed hash values on the strings rather than repeated strcmp() calls. Since this is my one chance to spend way too much time eking out little performance gains, it was worth it. :)

In the last few days, I moved away from the linked-list implementation to an array-based one (commit) as we know our size requirements ahead of time. This saves on malloc() calls and removes the overhead of a pointer for each status (those 384 bytes are important!).

struct status *statuses = status_list;
unsigned long hashval = hash_sdbm(sptr);
/* this depends on the sentinel entry at the end of the list (the {NULL, NULL} keypair), whose hash is 0 */
while(statuses->hash != 0) {
    if(statuses->hash == hashval) {
        ret = strdup(statuses->value);
        break;
    }
    statuses++;
}
if(ret) {
    return(ret);
}
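
For anyone who wants to experiment with this pattern, here is a self-contained sketch of a sentinel-terminated status array with precomputed hashes. It rests on several assumptions: the struct layout, the status_init() and status_lookup() names, and the two placeholder entries are all hypothetical, since the post does not show the real definitions. The hash is the widely published sdbm string hash, which is presumably what hash_sdbm() refers to.

```c
#include <stddef.h>

/* Hypothetical layout; the real struct status is not shown in the post. */
struct status {
    unsigned long hash;   /* filled in once by status_init() */
    const char *key;
    const char *value;
};

/* Placeholder entries for illustration, not real receiver codes. */
static struct status status_list[] = {
    { 0, "KEY_A", "value a" },
    { 0, "KEY_B", "value b" },
    { 0, NULL,    NULL      },   /* sentinel: its hash stays 0 */
};

/* The widely published sdbm string hash. */
static unsigned long hash_sdbm(const char *str)
{
    unsigned long hash = 0;
    unsigned long c;

    while ((c = (unsigned char)*str++) != 0)
        hash = c + (hash << 6) + (hash << 16) - hash;
    return hash;
}

/* Precompute the hash of every key once, at startup. */
static void status_init(void)
{
    struct status *s;

    for (s = status_list; s->key != NULL; s++)
        s->hash = hash_sdbm(s->key);
}

/* Walk the array until the sentinel; return the value or NULL. */
static const char *status_lookup(const char *key)
{
    const unsigned long hashval = hash_sdbm(key);
    const struct status *s;

    for (s = status_list; s->hash != 0; s++) {
        if (s->hash == hashval)
            return s->value;
    }
    return NULL;
}
```

Unlike the snippet above, this sketch returns the stored pointer rather than a strdup() copy, to stay within standard C. Like the snippet, it compares hashes only, so two keys that happen to collide would be indistinguishable, and a key that hashes to 0 (the empty string, for instance) would be mistaken for the sentinel. With a small fixed table, both cases can be checked once when the table changes.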

The most important thing to take home is that keeping the data separate from the algorithms is cleaner, more flexible, and carries little performance penalty (in my case, I was able to make things faster by separating them).

P.S. If it wasn't obvious enough, having just added code coloring to my blog gave me some incentive to write a post that would take advantage of it.