Very glad to see this here, hopefully it'll help with increasing adoption of the RFC.
I find one of the biggest pains of working with CSV is that all too often, someone gets Excel involved, and then Excel absolutely butchers the file. Off the top of my head: it strips leading zeros (even if quoted), converts big numbers to scientific notation, sometimes interprets non-date values as dates, and when saving, keeps the mangled values, doesn't preserve quotes, and uses regional settings -- to the point that in some locales it becomes a "semicolon-separated file"! Even just opening and then saving a .csv (without editing) can mangle it to the point it's unusable. Sometimes Excel can't even open a file saved by Excel without mangling it even more.
To be fair, the latest version of Excel has improved handling of CSV files immensely.
Whenever you have the option, use tab-delimited if the data ever has to go into Excel reliably.
If I remember correctly, non-Anglo CSVs using a semicolon as a delimiter are entirely Excel's fault.
Though CSV directly implies commas, I find it helpful to consider it a synonym for “delimited text”. The specifics of the delimiter don’t bother me so much, so long as the format is consistent.
Now, whether it would've been better to stick with the comma and just properly quote all the values in locales where the comma is the decimal separator (and thus often used in spreadsheet columns) is a moot point; the damage has been done.
Personally I just call it "character-separated values" anyway. I quite like tab-separated values: it's often quite readable, and I think it's the default output of Postgres' COPY command.
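For what it's worth, the stdlib `csv` module already has a TSV dialect built in (and Postgres' text-format `COPY` does default to tab delimiters). A minimal sketch:

```python
import csv
import io

# Writing TSV with the stdlib's built-in "excel-tab" dialect.
buf = io.StringIO()
writer = csv.writer(buf, dialect="excel-tab")
writer.writerow(["id", "name", "note"])
writer.writerow([1, "Ada", "likes, commas"])

# With a tab delimiter, embedded commas need no quoting at all.
print(buf.getvalue())
```

The second row comes out as `1<TAB>Ada<TAB>likes, commas` with no quotes, which is part of why TSV survives round-trips that CSV doesn't.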
I wish Excel supported CSV better, but we're in the 0.001% of Excel users who even know that it doesn't do a good job.
XLSX (Office Open XML) is full of safety valves that let Microsoft's existing products continue doing whatever poorly documented or undocumented stuff they were doing previously. Microsoft didn't want to go back to features that already worked in Office and either rip them out or re-implement them, and it didn't have documentation for those features that anybody else would be able to implement‡. So in those cases Office Open XML just says: well, here's a blob of data, and good luck unless you're Microsoft Office.
This is tolerable for exporting to Excel. I can emit compliant Office Open XML that gets my numbers into Excel reliably. So that's nice.
But when importing from Excel you're fighting that impedance mismatch. Rather than explain to users "Something about your document is incompatible and I swear it's Microsoft's fault" it's just better to say "Use CSV".
‡ e.g. suppose there's a line in Excel which defines a function FOO() by calling into some particular Windows DLL. Well that's not a useful thing to standardise. So do you call the relevant MS department and ask them to paste all their documentation for that DLL into your "spreadsheet" standard? No. You write "Implementation defined" and it becomes a black box.
It's one of the reasons Microsoft tried to shoehorn in their own trash "open" document format: one that suited them and let them keep backwards compatibility with stone-age versions of their office suite. https://www.computerweekly.com/news/2240225262/Microsoft-att...
It's like saying everyone should have adapted to Internet Explorer rather than following proper web standards. Even if XLSX were some kind of standard, it would be a bad idea, because at the end of the day MS shouldn't be trusted with standards.
Turns out it didn't handle them well. It didn't escape them or anything. Out spits a line in the CSV with more columns than expected, and everything from that field onwards was off by one. Most teams were lucky: their code barfed when presented with a line with too many fields. Some teams were unlucky, and it didn't. How unlucky _they_ were varied somewhat based on how much data validation they did.
That caused a big headache for a lot of people trying to recover from the actions of the genius who thought running that against the production endpoint instead of the test endpoint was a good idea.
That incident pretty much solidified my reputation in the team as some kind of weird voodoo type of chaos monkey, because my shifts _always_ had a significantly higher sev2 rate caused by things failing that I'd not touched or had any influence over.
To be fair, it had been running that way for a long time without anyone doing that to it.
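That off-by-one failure mode is easy to reproduce: a field containing the delimiter, written with a naive string join instead of a real CSV writer, grows a phantom column on read-back. A minimal Python sketch:

```python
import csv
import io

row = ["42", "Smith, John", "active"]

# Naive writer: join with commas. The embedded comma creates a
# phantom fourth column, shifting everything after it by one.
naive = ",".join(row)
assert naive.split(",") == ["42", "Smith", " John", "active"]

# Proper writer: the csv module quotes the offending field,
# so the row round-trips intact.
buf = io.StringIO()
csv.writer(buf).writerow(row)
parsed = next(csv.reader(io.StringIO(buf.getvalue())))
assert parsed == row
```

Whether downstream code barfs on the four-field line or silently accepts it is exactly the lucky/unlucky split described above.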
I love the ones that present a REST API but still can't handle commas, because they've wrapped some ancient importer in a front end that simply writes the request data to a CSV! I had to write one once, but I double-quoted all fields and had to translate UTF-8 back to some other encoding the old ERP used! It was junk, but I believe it's still chugging away years later...
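A sketch of that kind of shim, assuming cp1252 as a stand-in for whatever legacy encoding the ERP actually wanted (the function name and sample data are made up for illustration):

```python
import csv
import io

def rows_to_legacy_csv(rows, encoding="cp1252"):
    # Quote every field, as described above, then transcode.
    # cp1252 is an assumed stand-in for the ERP's real encoding;
    # errors="replace" keeps unmappable characters from crashing it.
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_ALL).writerows(rows)
    return buf.getvalue().encode(encoding, errors="replace")

data = rows_to_legacy_csv([["id", "name"], ["1", "Ångström, A."]])
print(data)
```

`QUOTE_ALL` sidesteps the embedded-comma problem entirely, at the cost of slightly larger files.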
EDIT or absence thereof
 which is a python port of Google's libphonenumber: https://github.com/google/libphonenumber
There is also the non-maintained version that the company spun out from:
Sounds like CleverCSV could determine and generate a schema automatically given a CSV file.
I wonder if, for 90% of users, this seems like a critical UI feature!
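The dialect-detection half of that already exists in rough form in the stdlib as `csv.Sniffer`; CleverCSV's detector plays the same role with a more robust algorithm. A sketch using the stdlib version so it runs anywhere:

```python
import csv
import io

# A "semicolon-separated" sample, like Excel produces in some locales.
sample = "id;name;score\n1;Alice;3,5\n2;Bob;4,1\n"

# Sniffer guesses the dialect from a sample of the file;
# CleverCSV's detector does the same job more reliably.
dialect = csv.Sniffer().sniff(sample)
rows = list(csv.reader(io.StringIO(sample), dialect=dialect))
print(dialect.delimiter, rows[0])
```

Inferring column types on top of the detected dialect (the "schema" part) is where a tool like CleverCSV goes beyond this.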