Many a time, I come across text that is not in any known format, but it looks vaguely familiar, or simple enough to transform. The reason I would want to transform it is, of course, to work with it: to load it into my scripting environment and then analyze it, consume it, or apply some complex programmatic logic to it.
Here, I give some examples and show the conversions. This could help in recognizing
raw text and transforming it into its closest [known] cousin.
Case 1
Let's start with something simple. Suppose you see a file like this:
Alice:
sal=20000
age=23
role=engineer
Bob:
sal=21000
age=28
role=engineer
and you want to load this into your preferred programming environment (as a
Python dict, Lua table, or Perl hash) to work with it. As it stands, it is not in a directly usable format! But if we make a small change to the data, say, turn each line that ends with a colon into a bracketed [name] line, we get:
[Alice]
sal=20000
age=23
role=engineer
[Bob]
sal=21000
age=28
role=engineer
Now this is a valid .ini file (a format popular in the Windows world). And there are libraries
for most languages to load and work with INI files!
All you need is a little Perl or a sed regex to convert the former into the latter! And
don't let Jamie's popular quote about regexes scare you (a regex is fine when it is a
good fit, but make sure you really understand regexes to wield one when needed).
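The same one-line substitution can be sketched in Python instead of Perl or sed. This is only an illustration under a couple of assumptions: the sample data is pasted in as a string, and a section name is taken to be a single word followed by a colon on a line of its own. The standard library's configparser then does the loading.

```python
import configparser
import re

raw = """\
Alice:
sal=20000
age=23
role=engineer
Bob:
sal=21000
age=28
role=engineer
"""

# Turn every line that is just a name plus a trailing colon into an
# INI section header: "Alice:" becomes "[Alice]".
ini_text = re.sub(r"^(\w+):[ \t]*$", r"[\1]", raw, flags=re.MULTILINE)

config = configparser.ConfigParser()
config.read_string(ini_text)

print(config.sections())            # section names, in file order
print(config["Alice"]["sal"])       # values come back as strings
print(config.getint("Bob", "age"))  # getint converts on request
```

Note that configparser hands every value back as a string unless you ask for a conversion, so numeric fields such as sal and age need getint (or int) before you do arithmetic on them.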
Case 2
If you have seen router configs (like a JUNOS config) or BibTeX entries, then the following will look familiar:
interface {
    eth0 {
        ip4 10.1.1.2;
        bia aa:11:22:11:00:11;
    }
}
Again, this may not be directly loadable into your environment, but look at it again: doesn't it look
close to JSON?
As JSON:
{
    "interface" : {
        "eth0" : {
            "ip4" : "10.1.1.2",
            "bia" : "aa:11:22:11:00:11"
        }
    }
}
Or, Lua table:
interface = {
    eth0 = {
        ip4 = '10.1.1.2',
        bia = 'aa:11:22:11:00:11'
    }
}
Again, both of these can be achieved with minimal changes.
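To give a feel for how minimal those changes are, here is a rough Python sketch of the JSON conversion. The helper name braces_to_json is made up for this post, and the sketch assumes the config uses only the two line shapes shown above ("name {" and "key value;") plus closing braces, with no values that themselves contain braces.

```python
import json
import re

raw = """\
interface {
eth0 {
ip4 10.1.1.2;
bia aa:11:22:11:00:11;
}
}
"""

def braces_to_json(text):
    """Hypothetical helper: rewrite a brace-nested config as JSON text."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        block = re.fullmatch(r"(\w+)\s*\{", line)     # e.g. "eth0 {"
        leaf = re.fullmatch(r"(\w+)\s+(.+?);", line)  # e.g. "ip4 10.1.1.2;"
        if block:
            parts.append(json.dumps(block.group(1)) + ": {")
        elif leaf:
            key, value = leaf.group(1), leaf.group(2)
            parts.append(json.dumps(key) + ": " + json.dumps(value) + ",")
        elif line == "}":
            parts.append("},")
        else:
            raise ValueError("unrecognised line: " + repr(line))
    body = " ".join(parts)
    # JSON forbids a comma right before a closing brace, so drop those.
    body = re.sub(r",\s*\}", "}", body)
    return "{" + body.rstrip(",") + "}"

data = json.loads(braces_to_json(raw))
print(data["interface"]["eth0"]["ip4"])  # 10.1.1.2
```

Every value ends up as a JSON string here, which matches the JSON shown above. A real JUNOS config has more line shapes than this, so treat the sketch as a starting point rather than a converter for arbitrary configs.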
Case 3
This might look very similar to Case 1, but observe the nesting and the richer data set!
[Alice]
sal=20000
age=23
role=[current=engineer;previous=DevOps,TAC]
[Bob]
sal=21000
age=28
role=[current=engineer;previous=]
Now, .ini no longer seems to be a good fit! Can we convert it to something else? Say I do this:
Alice:
  sal: 20000
  age: 23
  role:
    current: engineer
    previous:
      - DevOps
      - TAC
Bob:
  sal: 21000
  age: 28
  role:
    current: engineer
    previous:
Aha, now this is valid YAML! And there are libraries
for most languages to load and work with YAML.
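Here is a small Python sketch of that conversion. The function name to_yaml is my own invention, and the rules are assumptions read off the sample: a [Name] line starts a person, a value wrapped in [ ] holds sub-fields separated by semicolons, and a sub-field value containing commas becomes a list.

```python
import re

raw = """\
[Alice]
sal=20000
age=23
role=[current=engineer;previous=DevOps,TAC]
[Bob]
sal=21000
age=28
role=[current=engineer;previous=]
"""

def to_yaml(text):
    """Hypothetical helper: rewrite the sample format as YAML text."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        section = re.fullmatch(r"\[(\w+)\]", line)
        if section:
            out.append(section.group(1) + ":")
            continue
        key, _, value = line.partition("=")
        if value.startswith("[") and value.endswith("]"):
            # A bracketed value holds nested "name=value" pairs.
            out.append("  " + key + ":")
            for field in value[1:-1].split(";"):
                sub_key, _, sub_value = field.partition("=")
                if "," in sub_value:
                    # Comma-separated values become a YAML list.
                    out.append("    " + sub_key + ":")
                    out.extend("      - " + item for item in sub_value.split(","))
                else:
                    # A single (or empty) value stays a plain scalar.
                    out.append(("    " + sub_key + ": " + sub_value).rstrip())
        else:
            out.append("  " + key + ": " + value)
    return "\n".join(out) + "\n"

yaml_text = to_yaml(raw)
print(yaml_text)
```

The sketch only produces YAML text, so it needs nothing outside the standard library. To actually load the result, a third-party library such as PyYAML (yaml.safe_load) would be the usual choice in Python. One caveat of the comma rule: a person with exactly one previous role would get a scalar rather than a one-item list, so you may prefer to always emit lists for that field.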
Case 4
We all know CSV. But what if the data were like this:
a:b:c:"d"
or
a|b|"c"|d
Isn't it simple to change the delimiter to a comma (',') so that you can work with CSV libraries?
Bonus - if you have to send the data to a suit
Note: the regex should be careful enough to handle quoting! (that applies to all cases listed above)
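For this case, Python lets you sidestep the regex altogether, so the sketch below swaps the regex for the standard csv module, which accepts any single-character delimiter and already understands quoting. The helper name redelimit is made up for this example.

```python
import csv
import io

def redelimit(text, delimiter):
    """Hypothetical helper: re-emit delimited text as comma-separated CSV."""
    rows = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"')
    out = io.StringIO()
    # The writer adds quotes only where the output needs them,
    # for example around a field that itself contains a comma.
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

print(redelimit('a:b:c:"d"\n', ":"))     # a,b,c,d
print(redelimit('a|"b|c"|d,e\n', "|"))   # a,b|c,"d,e"
```

The quotes around "d" disappear in the first output because a plain d needs no quoting in CSV. In the second, the quoted pipe is kept as data and the field containing a comma gets quoted, which is exactly the kind of care a hand-written regex would have to take.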
To summarize, you don't need a complicated parser to load text into your favorite language, to analyze it, or to apply programmatic transformations to it. All you need is to recognize the format and find the closest known format you can convert it to, so that you can conveniently work with it. The following table might make it easier to remember:
Text                              | Easily converted to
Delimited (line oriented)         | CSV
Grouped, and simple key-value     | INI
Indented, multi level, with lists | YAML
Brace nested, and key-value       | JSON/Py-dict/Lua-table