Wednesday, October 30, 2013

The FizzBuzz of scripting

I have read a lot of articles on codinghorror, joelonsoftware and the like about the FizzBuzz test, but when I actually have to conduct a face-to-face interview and the candidate has 6-8+ years of experience, I hesitate to start with FizzBuzz. Instead, I start with a casual conversation about the candidate's past work and see if I can probe further on any of the keywords from the resume. Recently, I had to interview an engineer for a QA role, and all I was asked to check was how good he is at Perl. Now, I don't want to get into a debate on Perl, its TIMTOWTDI philosophy etc. A few minutes into the conversation, the idea struck me to ask him a variant of FizzBuzz, which I will call The FizzBuzz of scripting, i.e.,

Split an input [text] file on the following criteria:
if line matches foo write it out to /tmp/foo.out
if line matches bar write it out to /tmp/bar.out
if line matches foo and bar write it out to /tmp/foobar.out

The candidate started out like: "I can use multiple greps, and write a simple shell script..". Then I added a few more constraints: it has to be fast, the input is large, and I want a more robust and maintainable script.
The initial Perl script he came up with, using regexes, was fine. I then wanted a little more tweaking, so I started asking about the points in the code where optimization was possible. At some point he gave up, saying that no further optimization was possible.

All I was looking for was the simplest change: you can compile the regex once and then match many times over!

/.*mypattern$/ and print; # Does a compile and match

It will not be obvious to many programmers who haven't tried different flavors of regexes in different languages that applying a regex is a two-stage process:
  1. Compile
  2. Match
$re = qr/.*mypattern$/; # Compile..
/$re/ and print; # Match

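Usually the compile is compute intensive, and if the pattern isn't going to change, then it’s best to compile once and store. (Perl does this on its own for a literal pattern with no interpolated variables; the explicit qr// matters mostly when the pattern is built from variables at run time.) Some languages clearly separate the steps, like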

re.compile ()
..
re.match()

Of course this is how we would do it in C using POSIX regex as well
regcomp()
..
regexec()
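
As an illustration only, here is how the compile-once idea might look when applied to the file-splitting exercise above, sketched in Python rather than Perl. The names classify and split_file are made up for this sketch, and it assumes that a line matching both foo and bar goes only to foobar.out (the exercise does not say either way).

```python
import re

# Patterns from the exercise, compiled once, outside the per-line loop.
FOO = re.compile(r"foo")
BAR = re.compile(r"bar")


def classify(line):
    """Return 'foobar', 'foo', 'bar', or None for a single line."""
    has_foo = FOO.search(line) is not None
    has_bar = BAR.search(line) is not None
    if has_foo and has_bar:
        return "foobar"  # assumption: such lines go only to foobar.out
    if has_foo:
        return "foo"
    if has_bar:
        return "bar"
    return None


def split_file(in_path, out_dir="/tmp"):
    """Write each matching line of in_path to out_dir/<bucket>.out."""
    outs = {}
    try:
        with open(in_path) as src:
            for line in src:
                bucket = classify(line)
                if bucket is None:
                    continue
                if bucket not in outs:
                    # Open each output file lazily, on its first match.
                    outs[bucket] = open(f"{out_dir}/{bucket}.out", "w")
                outs[bucket].write(line)
    finally:
        for handle in outs.values():
            handle.close()
```

Python's re module also keeps an internal cache of recently compiled patterns, so the explicit re.compile() here is as much about making the two stages visible as about speed.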

The point I am trying to make is that certain things are not obvious. Being polyglot'y helps; if not, a lot of reading does. Maybe I am belabouring the obvious, but sometimes the obvious isn't too obvious! :)

Friday, June 21, 2013

A simple documentation generator

No, I don't want to create yet another documentation generator like Doxygen; there are lots of them, each with its specific purpose and strengths. But mostly (at least AFAIK), all of them are intended to be embedded in the host language and to document the APIs of that same language. I.e., if one is writing some C code and wants to document the APIs of the same C code, all is well! But I wanted something different (ah! as usual!).
I am embedding Lua into our application, and I am writing libraries in C to be used from Lua. I want to document the Lua APIs that I am building by adding the usual Doxygen-like comments in the C source files.
I have 2 choices at this point: either I define some comment format/tags (like @something \something-else), extract them out of the source, then parse them and generate the docs, or ... I use something which is well suited for data description. Aha! I'm anyway learning/using Lua, which has its roots in data description, so why not embed Lua code into my C comments?
Of course!
All I need, at the minimum, is a brief description of what the function does, its arguments and return types. I can capture it in Lua, like:
   mydoc {
       name = "str_len", -- function name
       args =  { string = 's' }, -- input arg and its type
       ret = { len = 'i' }, -- return val and its type
       brief = "return the length of input string",
   }

And a way to quickly fetch it from the source - make the comment begin with '*!' (or something similar)
/*
 *!   mydoc {
 *!       name = "str_len", -- function name
 *!       args =  { string = 's' }, -- input arg and its type
 *!       ret = { len = 'i' }, -- return val and its type
 *!       brief = "return the length of input string",
 *!   }
 */
And how to convert this to docs? Simple! Just grep out these comments from the C source (dropping the leading '*!'), and then define the mydoc function in Lua:
  function mydoc (t) -- excluding err handling and formatting
    if t.name then
        print (libname .. ":" .. t.name , "\n") -- libname also from source
    end
    if t.brief then
        print (" -- ", t.brief, "\n")
    end
    if t.args then
        print ("\tTakes " .. tablelength(t.args)  .. " arguments")
        describeargs(t.args) -- writes the values and their types
    else
        print "\tTakes no arguments\n"
    end
    if t.ret then
        print ("\tReturns " .. tablelength(t.ret)  .. " values")
        describeargs(t.ret)
    end
  end

Run all of this (the mydoc function, and the extracted blocks) together in Lua, and there you go! Generating the text doc was so easy, so how difficult could Markdown be? I added both as options (to the bash wrapper, which extracts the comments and runs Lua on them) :-)
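
The extraction step can be sketched roughly as follows. The real wrapper described here is a bash script whose contents are not shown, so this Python version is only an illustration under assumptions: the helper name extract_doc_blocks is made up, and it assumes every doc line starts with optional whitespace followed by '*!'.

```python
import re

# A doc line is optional whitespace, '*!', then the embedded Lua text.
DOC_LINE = re.compile(r"^\s*\*!\s?(.*)$")


def extract_doc_blocks(c_source):
    """Return the Lua text embedded in '*!' comment lines of c_source."""
    lua_lines = []
    for line in c_source.splitlines():
        match = DOC_LINE.match(line)
        if match:
            # Keep only what follows the '*!' marker.
            lua_lines.append(match.group(1).rstrip())
    return "\n".join(lua_lines)
```

The returned text is plain Lua (a series of mydoc { ... } calls), ready to be appended to the file that defines mydoc and handed to the Lua interpreter.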

Tuesday, January 08, 2013

Presentations on a text editor!?

My teammate was making fun of me (for favoring all things text!) and joked that I should be making presentations/slides in text as well, and present with vim! :)
While this was a joke, I kept pondering the idea; someone must have already tried this, given the plethora of scripts available for vim, and indeed I found one which works well: presen.vim.
The text that you write is in Markdown, i.e., you compose your usual structured text with sections, sub-sections etc., and presen.vim converts it into a presentation.
Combined with the DrawIt vim plugin (to put in some neat diagrams and rectangles), I could get started with making presentations in vim! It works, and it's cool :)
While this might save you from PowerPoint Hell, you might land up in PlainText Hell! :)

And BTW, Markdown in itself is cool; I've been composing all my text (README files etc.) in Markdown these days. You can download the package from the author's website, and there are many Markdown editors available for Windows.

--Edit: 9-May-2013--
I found TPP much better than the vim presentation! It's in Ruby, and has some funky PowerPoint features, like text scrolling in from left/right/top etc. :). Try it!

Sunday, December 16, 2012

Edit/open files quick


As developers, we edit a lot of source files, each with a specific extension,
and mostly we either type the names out or let the shell do the auto-completion.
Most of the source files that I work on follow the long_underscore_separated_names convention, and I hate to even press TAB TAB TAB...
When I know a file name, or can see it on the console (from a previous ls), I want to be able to open the file (in the current directory) with as few keystrokes as possible.

I came up with a small Perl script which, when combined with shell aliases,
can be a real time saver (and save you from RSI as well! :) )
The same effect can be achieved with sed/awk or any other language with good
regex support, but Perl is my favorite.

Approach

Given a file like my_lisp_parser.c, I want to be able to call it as
m l p or just mlp (first letters of the words, assuming the words in the file name
are separated with underscores or hyphens).
The extensions, let's not worry about for now...
So, how do we go about it?
The regex way...
Build a regex from these characters and then do a quick look-up in the current dir,
like m<something>_l<something>_p<something>.
How to build this regex?
Simple: split the input if it's one word, or concat if it's multiple words, but
before joining, add [^_-]* to each piece; that's the equivalent of our <something> above.

What to do in case of ambiguity? I.e., if we also have a file like
my_lex_print.c (some hypothetical name, duh!),
then using mlp will have a problem, so we should say
m le p
where le is the disambiguator.
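
To make the regex construction concrete, here is a small sketch of the idea in Python (the actual script is in Perl). The function name build_pattern is made up for this illustration, and the extension is passed in as-is because it is allowed to be a character class such as p[ly].

```python
import re


def build_pattern(words, ext):
    """Build the initials regex for words like ['mlp'] or ['m', 'le', 'p']."""
    # A single argument such as 'mlp' is split into its characters;
    # several arguments are used as they are.
    parts = list(words[0]) if len(words) == 1 else list(words)
    # Each piece may be followed by any non-separator characters,
    # and the pieces are joined by one '-' or '_'.
    body = "[^_-]*[-_]".join(re.escape(p) for p in parts)
    return re.compile("^" + body + "[^_-]*[.]" + ext + "$")
```

With this, build_pattern(["mlp"], "c") matches both my_lisp_parser.c and my_lex_print.c, which is exactly the ambiguity described above, while build_pattern(["m", "le", "p"], "c") matches only my_lex_print.c.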

So, now we can edit files with extensions; what about those with
no extension? I usually edit a lot of Makefiles, and those I can handle as a
special case.

The editor and the extension are given as arguments to the script (editor first, then extension), but we do not
invoke the script directly; we use shell aliases, so typing them
will not be necessary. I use bash BTW, so here are my bash aliases for the most commonly used file extensions (vim here; substitute your own editor)

alias vc='~/short_open.pl vim c'
alias vh='~/short_open.pl vim h'
alias vp='~/short_open.pl vim p[ly]'
alias vy='~/short_open.pl vim yml'
alias vd='~/short_open.pl vim diff'
alias vs='~/short_open.pl vim sh'
alias vl='~/short_open.pl vim l[ou][ga]'
alias vm='~/short_open.pl vim mk'
 

I can open a .c file like vc mlp, or a .h file like vh mlp

The full script is here:

#!/usr/bin/perl
# Usage: short_open.pl EDITOR EXT WORD...
use strict;
use warnings;

my $editor = shift;    # editor command to run
my $ext    = shift;    # extension; may be a character class such as p[ly]
my ( $in, @p );

if ( scalar(@ARGV) == 1 ) {
    $in = shift;

    # Single-letter shortcuts for common files without an extension
    if ( $in eq 'm' ) {
        system("$editor [Mm]akefile");
        exit 0;
    }
    if ( $in eq 'R' ) {
        system("$editor README");
        exit 0;
    }
    if ( $in eq 'w' ) {
        system("$editor wscript");
        exit 0;
    }
    @p = split( //, $in );    # 1 arg - pick and split each char
}
else {
    @p = @ARGV;               # multi arg - use them as is
}

# Open the first file in the current directory whose name matches $re
sub open_first_match {
    my ($re) = @_;
    foreach my $file ( glob("*.$ext") ) {
        if ( $file =~ $re ) {
            print " [$file] \n";
            system("$editor $file");
            exit 0;
        }
    }
}

# Pieces separated by - or _, each followed by any non-separator chars
my $res = '^' . join( '[^_-]*[-_]', @p ) . '[^_-]*[.]' . $ext . '$';
open_first_match(qr/$res/);

# If it were not really a long name with underscore, then try
# a plain short name (assume 1 arg)
exit 1 unless defined $in;

# deliberate unbounded match so that any part of the file name can be input
$res = $in . '.*[.]' . $ext;
open_first_match(qr/$res/);