You left out the absolute worst bit: Excel tries to "helpfully" guess the correct data format for each cell in a CSV file. This can lead to silent data loss. Long strings of numbers are converted to a Numeric type that only stores a certain number of significant figures. The string "1-5" is converted to a Date field with the value "5-Jan"
Sure, it's a nice trick, but now it's not really a CSV file any more. If I have to create a file that can only be read by Excel it might as well be XLSX.
There's a slight difference between having to strip "=" from the column values for other applications to handle it, and require those other applications to be able to handle XLSX..
"Are the rules for Excel CSV parsing written down anywhere?"
They are loosely described in ECMA-376 section 18.8.30 under the "General" format code. But like with most details surrounding XLS and XLSX, the documentation is barely coherent and incorrect in some places.
Except that when users click on or download a csv-file, it Excel opens it automatically, bypassing any options and not even bothering to check if the contents of the csv file matches its defaults.
This is 90% of use cases with regular users.
For the other 10%, the interface for specifying the field types sucks, doesn't cover certain options and you can't set your preferences.
Imagine being a user and having to do that every day in order to import your daily whatever report, and having to start all over if you make one little mistake.
And that's just for the options Excel supports. Don't even get me started on all the other quirks. Ever tried to read a csv file who's first line starts with "ID"?
… and precisely how many users do you think know to do this? They might if it's happened before, been correctly diagnosed and explained before and it's obvious as soon as you open the file. It's more common, however, that the Excel bug only causes data loss or invalid results in a percentage of records and, when people notice it, it's corrected by hand.
If you need to do this a lot, generate xlsx - it's not that hard and there are decent libraries for most languages. Relying on users correctly setting column types every time is like making your daily commute depend on having airbags.
Any numbers with leading zeroes, i.e. 0000123, excel will interpret as the integer 123 even if you change the column format to "text". It's infuriating.
The only way I found to get around this was to open a new workbook, change the column type to "text" and then paste the data in. I believe this was excel 2010 on windows.
Indeed. I don't know what version of Excel the parent is using, but Excel for Mac 2010 handles csv files just fine, even ones that have weird delimiters. I'd love to see an example of a non-readable csv file.
Edit: I haven't tried different Unicodes yet, so granted that may not work.