Hacker News new | past | comments | ask | show | jobs | submit login

> Well what would be a more accurate title? "CSV format should only be for external interchange or archival; columnar formats like Parquet or Arrow better for performance"?

Something more boring, like "Consider whether other options make more sense for your data exports than CSV". Plenty of people have suggested other good options in comments on this submission, such as for example sqlite. I think the post comes off as if I'm trying to sell a particular file format for all uses cases, when what I had in mind when writing it was to discourage using CSV as a default. CSV has a place, certainly, but it offloads a lot of complexity on the people who are going to consume the data, in particular, they need to figure out how to interpret it. This can't necessarily be done by opening the file in an editor and looking at it, beyond a certain size you're going to need programming or great tools to inspect it anyway.

I was given an initial export of ~100 poor quality CSV files totaling around 3TB (~5-6 different tables, ~50 columns in each) in size a few years back, and had to automate ingestion of those and future exports. We could've saved a lot of work if the source was able to export data in a friendlier format. It happened more than once during that project that we were sent CSVs or Excel sheets that had mangled data, such as zip codes or phone numbers with leading 0s removed. I think it is a good thing to inform people of these problems and encourage the use of formats that don't necessity guessing data types. :shrug:

> People are busy; instead of hinting "something more robust than CSV", mention the alternatives and show a comparison (load time/search time/compression ratio) summary graph. (Where is the knee of the curve?)

This might be an interesting thing to follow up later, but would require a lot more work.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: