No problem!
A friend of mine had a problem, she had to merge two [huge] Excel workbooks, by matching names in one, with the names in another. Something like:
WorkBook1
+---------+---------+----------+----------+
| Name1 | field1 | field2 | ... |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
WorkBook2
+---------+---------+----------+----------+
| field2_1| field2_2| Name2 | ... |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
If they were Database tables, then we could have done something on the lines of:
SELECT * FROM WorkBook1,WorkBook2 WHERE Name1=Name2;
But, these are excel sheets, and worse yet, the names in WorkBook1 are in format:
"FirstName LastName"
and names in WorkBook2 are in format:
"LASTNAME, FIRSTNAME"
(i.e, uppercase, and with comma) Duh! and there will be many names with 3-4 words - imagine the permutations.
Excel experts could say this can be solved with some cool macros, or maybe VB scripts!, but I am old-school Unix text guy!. I think in terms of text filters only!
To solve the problem of name matching, Take this [hypothetical] name:
Shankaran Ganesh Krishnan
the permutations will be:
Shankaran Krishnan Ganesh
Krishnan Shankaran Ganesh
Krishnan Ganesh Shankaran
Shankaran Ganesh Krishnan
Ganesh Shankaran Krishnan
Ganesh Krishnan Shankaran
Some names can also contain the initials [with a period], like:
Ganesh K. Shankaran
So, how can we do the name matching ? ... for a moment I thought of using a permuter
and then saving all permutations (stupid!), but that's not required!
Lets say we do the following,
- Remove dots and commas
- Change to lowercase (trim spaces too)
- Sort the names by words
If you had "Shankaran, Ganesh Krishnan" in WorkBook1, and "GANESH, SHANKARAN KRISHNAN" in WorkBook2, then both will become: "ganesh krishnan shankaran"
Now, the only problem that remains, is to save the .xls as .csv, so that I can load it to Perl (Parse::CSV). Unfortunately, Excel doesn't have an option to save all the sheets in the workbook
at once to CSVs, I had to do that manually for each sheet and then merge. Other than that, its pretty straight forward.
If you are about to say: Show me teh codez!
here you go ...
What good is coding skills, if you cannot put it to use, at the right time, to help friends!? :-)
A friend of mine had a problem, she had to merge two [huge] Excel workbooks, by matching names in one, with the names in another. Something like:
WorkBook1
+---------+---------+----------+----------+
| Name1 | field1 | field2 | ... |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
WorkBook2
+---------+---------+----------+----------+
| field2_1| field2_2| Name2 | ... |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
| ... | | | |
+---------+---------+----------+----------+
If they were Database tables, then we could have done something on the lines of:
SELECT * FROM WorkBook1,WorkBook2 WHERE Name1=Name2;
But, these are excel sheets, and worse yet, the names in WorkBook1 are in format:
"FirstName LastName"
and names in WorkBook2 are in format:
"LASTNAME, FIRSTNAME"
(i.e, uppercase, and with comma) Duh! and there will be many names with 3-4 words - imagine the permutations.
Excel experts could say this can be solved with some cool macros, or maybe VB scripts!, but I am old-school Unix text guy!. I think in terms of text filters only!
Shankaran Ganesh Krishnan
the permutations will be:
Shankaran Krishnan Ganesh
Krishnan Shankaran Ganesh
Krishnan Ganesh Shankaran
Shankaran Ganesh Krishnan
Ganesh Shankaran Krishnan
Ganesh Krishnan Shankaran
Some names can also contain the initials [with a period], like:
Ganesh K. Shankaran
So, how can we do the name matching ? ... for a moment I thought of using a permuter
and then saving all permutations (stupid!), but that's not required!
Lets say we do the following,
- Remove dots and commas
- Change to lowercase (trim spaces too)
- Sort the names by words
If you had "Shankaran, Ganesh Krishnan" in WorkBook1, and "GANESH, SHANKARAN KRISHNAN" in WorkBook2, then both will become: "ganesh krishnan shankaran"
Now, the only problem that remains, is to save the .xls as .csv, so that I can load it to Perl (Parse::CSV). Unfortunately, Excel doesn't have an option to save all the sheets in the workbook
at once to CSVs, I had to do that manually for each sheet and then merge. Other than that, its pretty straight forward.
If you are about to say: Show me teh codez!
here you go ...
No comments:
Post a Comment