Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Oh gosh yes. Companies love their CSV exports! I think the reason is that it lets them make glue - they can take data from something, play with it, analyse it, produce graphs, using tools they understand (almost invariably Excel), and sometimes then import into another system.

Yes, my employer's customers like their CSV imports too. If you're looking at a big data entry feature and they ask you how the interface works and you say "you give it a CSV file with these columns", they're happy.



You're lucky. We always try to explain that CSV is a mostly-clean format while XLS and XLSX are a mess.

Yet, most of the time customers insist on XLS export, and we are happy if they at least acccept XLSX instead.


But CSV isn't a mostly clean format. It's a collection of dozens of formats, some of them clean, some of them not, and with no way of labeling which one of the dozens you have in the file itself. Whereas XLS and XLSX, for all their binary cruft, are at least capable of unambiguously (AFAIK) representing, say, a string with a quote and a newline in it. (If there are encoding issues, well, same with CSV.)

I suspect you're thinking of something like "CSV as outputted by Excel with the default settings", which may be clean and unambiguous, but is far from the only CSV in the world.

For that reason (and I'm just saying this generally, not targeted at the comment I'm replying to), always use the most mature CSV output library you can find for your stack. Never write your own CSV export code by bashing strings together unless you have no other choice because you're working in some stack that doesn't have or can't use existing CSV libraries.

(You can ignore the RFC for CSV too; it's not "wrong", it's just one particular flavor that was declared long after the horses had left the barn, learned to form a horse civilization, discovered the atom bomb and bombed themselves back to being plain horses again. It barely rises to the level of being "advisory" in practice.)


XLS and XLSX are neither clear nor unambiguous and I would take 12 CSV formats over either any day. I actually haven't ever met anyone who seriously had to extract data from XLS/X frequently that didn't revile it for its ambiguities. Not to mention the ways that the application munges the user input prior to saving it, leaving the biggest ambiguity of all, what the heck is ACTUALLY in that cell.

EDIT: I will add that EXPORTING to XLS/X is usually quite straightforward and makes users happier than getting a CSV often times. Until they start wanting to get crazy with template design and then you get into Excel styling application and you're back to hating it all over again.


The single biggest problem is date / time formats.

Excel encodes dates and times as floating point numbers. The date / time-ness of a cell is encoded in its format. Some of the formats are hard-coded, some float depending on locale, and some use explicit format-pictures embedded in the styles.

But of course there's no guarantee that the user has formatted a cell with a date as a date. Sometimes the user has pasted a US-format date into a copy of Excel running in a UK locale, so dates that fit have day and month swapped, while dates that don't fit show up as strings. This is non-trivial to handle in the general case. I've had to add customer-specific workarounds for this case in particular.


You've unleashed a flood of horrible memories of reading XLS and XLSX files with dates in. Thanks a bunch.

You'd think when Microsoft came up with their new shiny file formats they'd actually make them new and shiny and good.

But nope.


Actually, there are some commercial libraries that make working with even complex Excel spreadsheets pretty straightforward. I've used Aspose.Cells a lot and was very happy with it:

http://www.aspose.com/products/cells

[NB I have no connection with Aspose, just been a happy customer for many years].


Let me create a headache for you: there is no "CSV as created by Excel with the default settings". There are a lot of different ones for each language setting of Excel, and they are incompatible. So if you're going to work anywhere but the US or even have just some foreigners in the US as your users you're in for a total nightmare doing "Excel CSV"

The worst thing Microsoft ever did to Excel was even translating the formula functions in different languages. So SUM() isn't actually called SUM() in half of Europe...


Evil pro-tip : you can output a csv file and name it "foo.xls". Excel will open it fine, with just a click-thru complaint dialog that no one will read anyway.


Israeli bank websites export an HTML table and call the file "foo.xls", and it works quite well.


I don't talk about XLS or XLSX, I just say Excel and give them SpreadsheetML files.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: