

The 5 lines that mystified O’Reilly - how to use a spreadsheet in bash - nailer
http://venturecake.com/the-5-lines-that-mystified-oreilly-how-to-use-a-spreadsheet-in-bash/

======
ibsulon
To parse a single CSV file with well defined rules is easy. However, different
spreadsheet vendors handle escaped characters and various issues differently.
[http://www.rtraction.com/blog/devit/excel-csv-to-mysql-
hell....](http://www.rtraction.com/blog/devit/excel-csv-to-mysql-hell.html)
speaks of one problem that can occur. If you're trying to give example code
that will work with a random CSV in the wild, it's going to be more than five
lines. I'm guessing such a discussion was probably edited out as non-germane.

------
jcl
Heh... Looks like he didn't test against a spreadsheet that contains a value
with a comma in it. And cells with carriage returns, although rarer, would
also parse incorrectly. The O'Reilly book is correct that the Python/Perl
library would cope better.

The reason the O'Reilly book says that it is difficult to define CSV is
because many software authors assumed that exporting to CSV is easy (like this
article), so they implemented the finer points in slightly different,
incompatible ways.

~~~
nailer
The simplest way to use a spreadsheet with a value with commas, is to pick a
different delimiter - eg, '%' or ':'.

Then change the SEPARATOR variable in the included scripts accordingly.

Cells with carriage returns would indeed be a problem but as you mentioned,
they're quite unlikely.

~~~
jcl
I can see how picking a different delimiter character is a simple solution,
but it's not practical... The point of CSV is that it is easy to export
directly from a spreadsheet program, so that you can handle large amounts of
data automatically. If you have to scour your spreadsheet each time to make
sure your chosen delimiter is not present in the data, this defeats the
purpose of an automated approach. (And some spreadsheet programs will not let
you choose the export delimiter. And your code will still not handle, say, a
cell with a quotation mark in it.)

It's OK for a quick-and-dirty solution and a limited amount of data, but it's
not much harder to write a solution that correctly handles all input using the
Python library.

~~~
nailer
> If you have to scour your spreadsheet each time to make sure your chosen
> delimiter is not present in the data, this defeats the purpose of an
> automated approach.

'scour' is a little exaggerated, don't you think?

Most data doesn't have commas in it, if you think it might, use a different
delimiter.

> It's not much harder to write a solution that correctly handles all input
> using the Python library.

Sure, you can use Python, but I'm writing the article for Bash users - people
getting into scripting for the first time - and the solution works reliably.

Edit: removed the '99% of cases' as provided there is at least one character
that does not appear in your data (eg, %, #, etc) the solution simply works -
and otherwise you have an extreme edge-case scenario.

~~~
scott_s
"The solution works reliably for 99% of cases" is another way of saying the
solution is not reliable. It's the 1% cases that make the problem harder, and
that's what the author was referring to.

edit: As a pedagogical exercise it's fine, but I wouldn't rely on it in
practice.

~~~
nailer
> "The solution works reliably for 99% of cases" is another way of saying the
> solution is not reliable.

No, it's a way of saying the solution does not work in extreme edge case
scenarios, specifically where there is no single character that in usused in
your content and therefore no delimiter available.

~~~
scott_s
Except I think that situation is more likely than you think.

