Wednesday, October 30, 2013

The FizzBuzz of scripting

I have read a lot of articles on codinghorror, joelonsoftware and the like about the FizzBuzz test. But when I actually have to conduct a face-to-face interview and the candidate has 6-8+ years of experience, I hesitate to start with FizzBuzz. Instead, I start with a casual conversation about the candidate's past work and see if I can probe further on any of the keywords from the resume. Recently I had to interview an engineer for a QA role, and all I was asked to check was how good he was at Perl. Now, I don't want to get into a debate on Perl, its TIMTOWTDI philosophy and so on. After the first few minutes of talking, the idea struck me to ask him about a variant of FizzBuzz, which I will call The FizzBuzz of scripting, i.e.,

Split an input [text] file on the following criteria:
if line matches foo write it out to /tmp/foo.out
if line matches bar write it out to /tmp/bar.out
if line matches foo and bar write it out to /tmp/foobar.out

The candidate started out like: "I can use multiple greps, and write a simple shell script..". Then I added a few more constraints: it has to be fast, the input is large, and I want a more robust and maintainable script.
The initial Perl script he came up with, using regexes, was fine. I wanted a little more tweaking, so I started asking where in the code optimization was possible. At some point he gave up, saying that no further optimization was possible.

All I was looking for was the simplest change: you can compile the regex once and then match many times over!

/.*mypattern$/ and print; # Does a compile and match

It will not be obvious to many programmers who haven't tried different flavors of regexes in different languages that applying a regex is a two-stage process:
  1. Compile
  2. Match

$re = qr/.*mypattern$/; # Compile..
/$re/ and print; # Match

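Usually the compile is the compute-intensive step, so if the pattern isn't going to change, it's best to compile once and store the result. (Perl already does this for you when the pattern is a plain literal; the saving shows up when the pattern is built from a variable.) Some languages, such as Python, clearly separate the steps: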

re.compile()
..
re.match()

Of course, this is how we would do it in C using POSIX regex as well:
regcomp()
..
regexec()
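
Pulling this together for the original exercise, here is a minimal Python sketch of the splitter with both patterns compiled once, outside the loop. It is only one way to do it, and it rests on a few assumptions of mine: the names split_file and out_dir are made up for illustration (the exercise writes to /tmp), and a line that matches both foo and bar goes only to foobar.out, which the problem statement leaves open.

```python
import re
from pathlib import Path

# Compile each pattern once, up front, instead of inside the loop.
FOO = re.compile(r"foo")
BAR = re.compile(r"bar")


def split_file(src, out_dir):
    """Route each line of src to foo.out, bar.out or foobar.out in out_dir.

    Assumption (not stated in the exercise): a line matching both
    patterns is written only to foobar.out. Lines matching neither
    pattern are dropped.
    """
    out_dir = Path(out_dir)
    with open(src) as infile, \
            open(out_dir / "foo.out", "w") as foo_out, \
            open(out_dir / "bar.out", "w") as bar_out, \
            open(out_dir / "foobar.out", "w") as foobar_out:
        # Single pass over the input: the file is read once, line by line.
        for line in infile:
            has_foo = FOO.search(line) is not None
            has_bar = BAR.search(line) is not None
            if has_foo and has_bar:
                foobar_out.write(line)
            elif has_foo:
                foo_out.write(line)
            elif has_bar:
                bar_out.write(line)
```

Calling split_file("input.txt", "/tmp") would produce the three files from the exercise. Python's re module also keeps a small internal cache of recently compiled patterns, so here the explicit compile mostly makes the intent visible and skips the cache lookup on every line.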

The point I am trying to make is that certain things are not obvious. Being a bit of a polyglot helps; failing that, a lot of reading does. Maybe I am belabouring the obvious, but sometimes the obvious isn't too obvious! :)
