
> JSON was an explicit design, and it's much better than CSV.

Interesting. Is it better, or is it easier (and cheaper) to parse?




The text parsing[1] is not really relevant, I feel. It's what comes after parsing that's a problem.

With CSV you have to guess what the delimiter is, you have to guess whether the first row is column names, you have to guess what type each element is (string, date, integer, etc.), and you have to guess how escaped characters are represented.

The good news is that with a medium-sized input, all of these are pretty simple to guess while parsing each cell. The bad news is that, if you receive only a single row, you can't very well tell WTH the types are.
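
For what it's worth, Python's stdlib automates exactly this kind of guessing with csv.Sniffer, and it has the same limitation: it's a heuristic over a sample, so a small or single-row input can fool it. A minimal sketch (the sample data here is made up):

    import csv
    import io

    sample = "name;joined;score\r\nAda;2021-03-01;97\r\nGrace;2020-11-15;88\r\n"

    sniffer = csv.Sniffer()
    dialect = sniffer.sniff(sample)        # guesses delimiter/quoting
    print(dialect.delimiter)               # ';' for this sample
    print(sniffer.has_header(sample))      # heuristic; here the first row's
                                           # cells look unlike the rest

    for row in csv.reader(io.StringIO(sample), dialect):
        print(row)                         # note: every cell is still a string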

With JSON you at least get to read numbers as numbers and strings as strings (and special symbols as special symbols, like `null`).
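
Concretely (a tiny illustration, not a benchmark):

    import json

    record = json.loads('{"name": "Ada", "score": 97, "note": null}')
    print(type(record["score"]))     # <class 'int'>
    print(record["note"])            # None -- `null` survives as a real value

    cells = "Ada,97,".split(",")
    print(type(cells[1]))            # <class 'str'> -- is "97" a number or a zip code?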

The downside of JSON is that, like CSV, you still have to specify the structure of the data to a recipient, except that, unlike CSV, doing so is more difficult and error-prone.
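
To make that concrete: for CSV the de facto contract can be as small as the header line, whereas for JSON you typically reach for something like JSON Schema. Both snippets below are hand-rolled illustrations, not from any spec:

    # CSV: the implicit contract is often just the header row.
    csv_contract = "name,joined,score"

    # JSON: the equivalent contract tends to be a schema document,
    # e.g. JSON Schema -- more expressive, but more to get wrong.
    json_contract = {
        "type": "object",
        "properties": {
            "name": {"type": "string"},
            "joined": {"type": "string", "format": "date"},
            "score": {"type": "integer"},
        },
        "required": ["name", "joined", "score"],
    }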

[1] Admittedly, parsing CSV is so simple you can frequently just write the parser yourself.
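
A minimal sketch of such a hand-rolled parser: quote-aware, handles "" escapes and delimiters or newlines inside quotes, and deliberately not hardened against malformed input:

    def parse_csv(text, delim=","):
        # Minimal RFC-4180-ish parser: quoted fields may contain the
        # delimiter, newlines, and "" as an escaped quote. Assumes \n
        # or \r\n record separators.
        rows, row, field = [], [], []
        i, n = 0, len(text)
        while i < n:
            c = text[i]
            if c == '"':
                i += 1
                while i < n:
                    if text[i] == '"' and i + 1 < n and text[i + 1] == '"':
                        field.append('"'); i += 2       # escaped quote
                    elif text[i] == '"':
                        i += 1; break                    # closing quote
                    else:
                        field.append(text[i]); i += 1
            elif c == delim:
                row.append("".join(field)); field = []; i += 1
            elif c == "\n":
                row.append("".join(field)); field = []
                rows.append(row); row = []; i += 1
            elif c == "\r":
                i += 1                                   # swallow the CR of CR-LF
            else:
                field.append(c); i += 1
        if field or row:
            row.append("".join(field)); rows.append(row)
        return rows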


The problem with CSV is that it doesn't specify the encoding at the data layer, which is somewhat counterintuitive given that it has the word "comma" right in its name.

No, it's more correctly thought of as a protocol for representing tabular structures as "delimited text", but DTTF doesn't have the same ring to it, unfortunately.

This faffing around with the specifics makes CSV as a concept more flexible and "well defined enough" for its main user base, at the cost of simplicity and portability.


The CSV RFC is oriented toward CSV being a MIME type. The line separator in CSV is required to be CR-LF. This can occur in the middle of a quoted datum, and the spec doesn't say whether it represents an abstract newline character or those two literal bytes.


My understanding was that this would terminate the record unless enclosed by "encapsulators", in which case it would indeed be interpreted as literal text.

Though the RFC defines the record separator as CRLF, presumably for interoperability, you are typically free to define alternative record separators, as well as field separators and encapsulators, and most modern implementations are smart enough to work with this.
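
For illustration, Python's stdlib csv module behaves this way: a CR-LF inside a quoted field is kept as field content, while an unquoted one terminates the record (newline="" is the documented way to stop Python translating line endings before the reader sees them):

    import csv
    import io

    data = 'id,comment\r\n1,"line one\r\nline two"\r\n2,plain\r\n'
    for row in csv.reader(io.StringIO(data, newline="")):
        print(row)
    # ['id', 'comment']
    # ['1', 'line one\r\nline two']
    # ['2', 'plain']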


> With CSV you have to guess what the delimiter is, you have to guess whether the first row is column names, you have to guess what type each element is (string, date, integer, etc.), and you have to guess how escaped characters are represented.

I don't think I've been in a situation where I wasn't either writing the CSV export I'm ingesting myself (because everything else is in CSV), or automatically rectifying imports based on earlier discovery (because I didn't have a choice).

IMHO the biggest problem with CSV is that the schema you need exists as a poorly converted and maintained Word doc in a "Beware of the Leopard" location.


It's interesting to imagine a counterfactual world: there, "JSON" data is lousy with non-spec-compliant data. Single-quoted strings, bare keys, arrays with no commas, EBCDIC, everyone hates it, no one can get rid of it.

Meanwhile every CSV is pristine RFC 4180. You'd be laughed out of the room if you tried to fob off anything else as CSV.

There are reasons we're in this world and not that one, but the reasons aren't obvious.


CSV is not standardized: it might use a comma or a semicolon as the separator, there are different conventions for quoting and escaping, and headers might or might not be included.

JSON is also hierarchical, which makes it appropriate for a wider range of data structures.

That said, I'm not sure there is any agreed-upon standard for representing data tables in JSON, so it might not be any better for the CSV use case.
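
For instance, these are two common-but-unofficial conventions you see in the wild (the key names here are illustrative, not from any spec):

    # Row-oriented: self-describing, but repeats every key per record.
    rows = [
        {"name": "Ada", "score": 97},
        {"name": "Grace", "score": 88},
    ]

    # Column-oriented: compact, but the consumer has to zip headers
    # and rows back together, and nothing enforces equal lengths.
    table = {
        "columns": ["name", "score"],
        "data": [["Ada", 97], ["Grace", 88]],
    }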


I think most people hold RFC 4180 to be the CSV standard, even if it's not officially a standard. It is sufficiently parametrised to suit all variants while at the same time providing a language and structure that is portable across all of them.

Don't get hung up on the fact that RFCs aren't official standards. It's a publicly accessible standard which many people fall back on as a reference, which is more than you can say for a lot of ISO standards.


Oh, you can hold it all you want; popular implementations still break it.


Exactly: the files my users submit are rarely spec-compliant. I have to process them properly regardless of correctness.
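
Same here. In case it helps anyone, a sketch of the kind of defensive ingestion this forces on you (read_lenient is a made-up helper, not any library API, and it cheats by splitting on lines, so it would break quoted fields containing newlines):

    import csv

    def read_lenient(text, candidates=(",", ";", "\t", "|")):
        # Try plausible delimiters; keep the first one that yields a
        # consistent multi-column table instead of trusting RFC 4180.
        for delim in candidates:
            rows = [r for r in csv.reader(text.splitlines(), delimiter=delim) if r]
            widths = {len(r) for r in rows}
            if len(widths) == 1 and next(iter(widths)) > 1:
                return rows
        return list(csv.reader(text.splitlines()))  # last resort: assume commas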



