One of the main paradigms I’m interested in working with when writing new bits of code is the concept of ‘data-driven’ logic. On the surface this seems simple enough, but for the uninitiated, the premise is that the input data should dictate the logic of a routine, with little (if any) additional coding.

For instance, I recently ran into a situation where I needed to manage a series of input and output files, moving them from point A to point B. Most of the time we just need to rename the files en route, but occasionally other manipulation might be required, such as encrypting/decrypting the files, filtering subsets of data out, or performing other tasks.

In an initial incarnation of a similar set of code, the author had a pair of mapping files: one for inbound data, one for outbound. Each file listed a source file type and destination file type, as well as source and destination file names. The codebase was pretty marginal and wasn’t scalable enough to handle my needs, so I went about building my own script. In the process, however, I opted to keep the ‘source type/source file’ and ‘destination type/destination file’ constructs, which let me know, each step of the way, what format the file should be in.

This also presented me with a need to identify whether or not any processing needed to be done between the source and destination. One option was to evaluate the source and destination types and explicitly check whether an additional routine needed to be executed, using something like this:

if( $file->{'intype'} eq 'PGP' && $file->{'outtype'} eq 'CSV' )
{
    &pgp2csv( $file );
}
elsif( $file->{'intype'} eq 'XML' && $file->{'outtype'} eq 'CSV' )
{
    &xml2csv( $file );
}

But this approach means I have to modify my logic, as well as add a new function, for every file combination where post-processing needs to be done, which quickly gets messy. Right now this code only handles two generic scenarios; in reality, I’ll need to handle a few dozen, with variations for each vendor. You can see where this gets to be nasty…

The data-driven approach, however, eliminates the need for the explicit logic entirely. Instead, all we need to do is make sure that the source and destination types in the configuration file are correct, and that we have added a function to handle the conversion. Sure, we still have to write the conversion routine, but now the logic block simply looks like this:

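# Build the routine name from the two types, e.g. 'xml2csv'
my $conversion = lc( $file->{'intype'} ) . '2' . lc( $file->{'outtype'} );
# Only call the routine if a function with that name actually exists
if( defined( &$conversion ))
{
    &{\&$conversion}( $file );
}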

$conversion is built using the source and destination file types, joined by the number 2, giving us strings like ‘xml2csv’ — and then we check to see if a function matching that name exists. If it does, and only if it does, then we call that function and pass it the $file hash (which is simply a parsed representation of the line from the configuration we’re currently working on).
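To see the whole thing run end to end, here is a minimal sketch. The pipe-delimited line format, the ‘infile’ and ‘outfile’ keys, the parse_line and convert helpers, and the stub conversion routines are all assumptions of mine for illustration; the real configuration format and routines will differ.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the real conversion routines.
sub xml2csv { my ($file) = @_; return "xml2csv: $file->{'infile'} to $file->{'outfile'}"; }
sub pgp2csv { my ($file) = @_; return "pgp2csv: $file->{'infile'} to $file->{'outfile'}"; }

# Assumed line format: intype|infile|outtype|outfile
sub parse_line
{
    my ($line) = @_;
    chomp $line;
    my %file;
    @file{qw( intype infile outtype outfile )} = split /\|/, $line;
    return \%file;
}

# Same logic as above, wrapped in a function so it can be reused.
sub convert
{
    my ($file) = @_;
    my $conversion = lc( $file->{'intype'} ) . '2' . lc( $file->{'outtype'} );
    if( defined( &$conversion ))
    {
        return &{\&$conversion}( $file );
    }
    return;    # no matching routine, so nothing to do beyond the move/rename
}

# Made-up configuration lines; normally these would be read from the mapping file.
my @config = (
    'XML|orders.xml|CSV|orders.csv',
    'CSV|stock.csv|CSV|inventory.csv',
);

foreach my $line ( @config )
{
    my $file   = parse_line( $line );
    my $result = convert( $file );
    print defined $result ? "$result\n" : "no conversion for $file->{'infile'}\n";
}
```

Supporting a new combination now means adding one function with the right name and one line of configuration; the convert routine itself never changes.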

The savings here are more modest than in some other cases, but it is a clear example of how data itself can be used to control program flow and logic. It doesn’t free you from writing code, but it reduces the amount of code you have to (re)write, and moves some level of control out of the code and into the hands of whoever maintains the configuration, which is generally a good thing.

Of course, it could be abstracted even further by writing a set of generic functions for handling data translations (for instance, an encrypt and decrypt pair, a CSV generator, an XML handler, etc.) and then using a second datafile to define which steps a specific translation goes through, and which parameters to pass to each step. That would allow the whole process to be managed via a configuration editor (maybe even a *gasp* GUI!) and further minimize the amount of code that has to be written. This will be left as an exercise for the reader, as it exceeds the scope of my little project here…
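For the curious, a rough starting point for that exercise might look like the sketch below. Note that it swaps the function-name lookup for a plain hash of code references (a dispatch table). The step names, the ‘psv2csv’ translation, and the toy string manipulations are all hypothetical; real steps would encrypt, decrypt, or parse, and the translation table would be loaded from that second datafile rather than hard-coded.

```perl
use strict;
use warnings;

# Generic, reusable steps. Each takes the data plus a hash of parameters
# and returns the transformed data. These are toy stand-ins only.
my %steps = (
    replace => sub { my ($data, $p) = @_; $data =~ s/\Q$p->{'from'}\E/$p->{'to'}/g; return $data; },
    prefix  => sub { my ($data, $p) = @_; return $p->{'text'} . $data; },
    upcase  => sub { my ($data, $p) = @_; return uc $data; },
);

# Which steps each translation runs, in order, and with which parameters.
# Hard-coded here for brevity; this is the part that would live in data.
my %translations = (
    'psv2csv' => [
        [ 'replace', { from => '|', to => ',' } ],
        [ 'prefix',  { text => "id,name\n" } ],
    ],
);

sub translate
{
    my ($name, $data) = @_;

    # No entry for this translation means the data passes through untouched.
    my $pipeline = $translations{$name} or return $data;

    foreach my $step ( @$pipeline )
    {
        my ($step_name, $params) = @$step;
        my $code = $steps{$step_name}
            or die "Unknown step '$step_name' in translation '$name'\n";
        $data = $code->( $data, $params );
    }
    return $data;
}

print translate( 'psv2csv', "1|widget\n" );
```

A side benefit of the hash lookup is that the configuration can only ever reach the steps listed in %steps, rather than any function that happens to have a matching name.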

Cheers, and happy coding!