Code Rants: October 2016

One of my favorite subjects during college days was Parsing and compiler construction (remember The dragon book?). Ever since the first time I tried lex/yacc (flex/bison) it was a mysterious piece of software that I wanted to master using!

Zoom forward 10 years, a lot has changed in the field: LR(1) is passé, GLR and LL(1) are in-vogue (thanks to tools like ANTLR)

When I thought about venturing into writing a parser for the YANG modelling language (RFC 6020), I first did some study+survey, and I formed some new opinions about parsing:

- Don't roll out a RDP in desperation

- No separate tokenization phase (scanner-less)

- Not LR(1) - too limited

- Not ANTLR! (though its a fantastic tool - it leans a lot towards Java!)

- Not Bison GLR! (no proper documentation!)

- Use PEG (yet another learning opportunity :-))

- Not C! (more about it later)

PEG is cool, and with a love for regex PEG attracted me way too much :-) I tried many PEG based parser generators.

To start with I picked up Waxeye , though it has a choice of many targets, the C port is not optimal (author himself told, that more work's needed).

Then I tried peg/leg, though its very similar to lex/yacc in usage, I was finding it hard to debug/modify the grammar. Part of the problem could have also been the YANG ABNF grammar (as defined in the RFC). There are no reserved words in YANG. I tried peg/leg much more than Waxeye, but eventually I gave up once I realized that I was getting nowhere! So, C is gone - as I had only 2 options that had a C target!

Lua/LPeg: Though I'm familiar with Lua, this was my first tryst with LPeg. I had heard a lot of praise for Roberto's work, and I got to know why, quite soon. Within a few hours I was able to create a matcher (deliberately avoiding the word parser here) that could recognize 80% of the YANG files at my disposal! (had to tweak the matcher for the remaining 20%). This time, I chose a different approach, instead of using the YANG ABNF grammar, I created some simple relaxed rules.

Here is a brief description of the approach taken.

YANG statements can be either of the form:

1. identifier [value];

Or:

2. identifier [value] {

One or more of 1. (depending on identifier)

}

Note that value is optional. Since there are no reserved words, it leads to a simple matcher (~30lines!). Even though there are no reserved words, we have some keywords, and based on that we should be able to say whether the nesting is right or wrong – I call this as the validation phase. And, that too, can be simplified to triviality, if we keep the AST as a simple Lua array (table).

For e.g: A leaf can be under a container, but not vice-versa:

Valid:

container C1{

leaf l1 {

type string;

}

Invalid:

leaf l1 {

container C1 {

}

I keep a list of allowed-children for each type of node. Additional checks are performed with custom functions per node-type. And, since these children are arranged as arrays – we can also ensure order.

That’s it! :-) - take a look at: https://github.com/aniruddha-a/lyang

Comments/feedback welcome.

Code Rants

Monday, October 10, 2016

How (not) to write a parser

About Me

Blog Archive