The reason the O'Reilly book says CSV is difficult to define is that many software authors assumed exporting to CSV is easy (as this article does), so they implemented the finer points in slightly different, incompatible ways.
Then change the SEPARATOR variable in the included scripts accordingly.
Cells with carriage returns would indeed be a problem, but as you mentioned, they're quite unlikely.
It's OK for a quick-and-dirty solution and a limited amount of data, but it's not much harder to write a solution that correctly handles all input using the Python library.
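For what it's worth, here is a minimal sketch of what "using the Python library" could look like, assuming the standard `csv` module is the library meant. The helper name `parse_csv` and the sample data are made up for illustration.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text into a list of rows (hypothetical helper name)."""
    # newline="" leaves line endings untouched so the csv module can
    # deal with line breaks inside quoted cells itself.
    return list(csv.reader(io.StringIO(text, newline="")))


# Made-up sample: a quoted comma, doubled quotes, and a cell with a line break.
sample = 'name,note\r\n"Smith, John","said ""hi"""\r\n"multi\nline",ok\r\n'

rows = parse_csv(sample)
for row in rows:
    print(row)
# ['name', 'note']
# ['Smith, John', 'said "hi"']
# ['multi\nline', 'ok']
```

The point is only that the awkward cases (embedded commas, doubled quotes, line breaks inside a cell) need no special handling on the caller's side.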
'scour' is a little exaggerated, don't you think?
Most data doesn't have commas in it; if you think yours might, use a different delimiter.
> It's not much harder to write a solution that correctly handles all input using the Python library.
Sure, you can use Python, but I'm writing the article for Bash users - people getting into scripting for the first time - and the solution works reliably.
Edit: removed the '99% of cases' claim: provided there is at least one character that does not appear in your data (e.g., %, #), the solution simply works; otherwise you have an extreme edge-case scenario.
Maybe a little, but it is an extra manual step you'd need to take each time you export CSV data, simply because the bash parser isn't equipped to handle it. And, as I said, you'd also need to check for quotation marks in the data -- Excel escapes them by doubling them up -- which the bash parser also won't handle as expected. These are not exactly uncommon characters in text data -- hardly extreme edge cases.
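To make the quoting issue concrete, here is a rough sketch comparing a plain split on the separator with a quote-aware parse. `naive_split` is a hypothetical stand-in for what a simple separator-based script does, not the article's actual code, and the sample line is invented.

```python
import csv


def naive_split(line, separator=","):
    # Hypothetical stand-in for a plain separator split: it knows
    # nothing about quotes, so quoted separators break the cell apart.
    return line.split(separator)


# One record with two cells, written the way Excel would export it:
# the comma is protected by quotes and the inner quotes are doubled.
line = '"Smith, John","said ""hi"""'

naive = naive_split(line)
proper = next(csv.reader([line]))

print(naive)   # ['"Smith', ' John"', '"said ""hi"""']
print(proper)  # ['Smith, John', 'said "hi"']
```

The plain split reports three cells and leaves the quote characters in the data; the quote-aware parse returns the two cells that were actually exported.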
...Not that I mean to harp on the issue, but you did describe O'Reilly's statement that parsing CSV is "harder than it sounds" as "complete and utter horseshit". In light of the article's oversights, that strikes me as a little exaggerated. ;)
Having quotes around them is common, and is shown in the second example for this reason.
I think O'Reilly should provide a solution that works for their audience (primarily system administrators, judging from the book's other contents) if they have a chapter on the topic. Preferably in Bash, as that's what the book is focused on.
Mentioning specific parsing tools built into Python would also be of use (other chapters in the same book do pull out other languages if deemed necessary).
This chapter stands out as it exists in the TOC and has no code at all, when a very simple solution will handle common admin tasks easily.
edit: As a pedagogical exercise it's fine, but I wouldn't rely on it in practice.
No, it's a way of saying the solution does not work in extreme edge-case scenarios, specifically where no single character is unused in your content and therefore no delimiter is available.
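Checking whether such a character exists can be automated. This is a rough sketch only: the function name `find_unused_delimiter` and the candidate list are assumptions for illustration, not part of the article's scripts.

```python
def find_unused_delimiter(text, candidates="|%#~^\t"):
    """Return the first candidate character absent from text, or None.

    The candidate list is an arbitrary assumption; adjust it to taste.
    """
    for ch in candidates:
        if ch not in text:
            return ch
    # Every candidate occurs somewhere in the data: the edge case above.
    return None


data = "a,b|c%d"
print(repr(find_unused_delimiter(data)))        # '#'
print(repr(find_unused_delimiter("|%#~^\t")))   # None
```

Whatever character this returns could then be used as the export delimiter and as the value of the SEPARATOR variable in the scripts.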