> CSV uses decimal representations of numeric data, which means you are getting 3.5 bits of data for every 8 bits of storage space (and that's assuming you are using a reasonably compact text encoding... if you are using UTF-16, it's 16 bits). Using a binary representation you can store 8 bits of data for every 8 bits of storage space.
XML, JSON, and YAML all have this issue, too.
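For concreteness, here's a minimal Python sketch of the density gap the quoted point is describing (the value is arbitrary):

```python
import struct

n = 1234567890
as_text = str(n)                  # decimal digits, as CSV/JSON/XML would store them
as_binary = struct.pack("<q", n)  # 8-byte little-endian signed integer

print(len(as_text.encode("utf-8")))  # 10 bytes of digits (20 in UTF-16)
print(len(as_binary))                # always 8 bytes, regardless of magnitude
```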
> CSV uses a variety of date-time formats, but a prevalent one is YYYY-MM-DDThh:mm:ss.sssZ. I'll leave it as an exercise for the reader to determine whether that is as compact as an 8-byte millis since the epoch value.
This is also identical to XML, YAML and JSON.
And I know what you're about to argue, but JSON's datetime format is not in the spec. The common JSON datetime format is convention, not standard.
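Either way, the size comparison the quote leaves "as an exercise" is easy to sketch in Python (the timestamp itself is arbitrary):

```python
import struct
from datetime import datetime, timezone

dt = datetime(2024, 1, 15, 12, 34, 56, 789000, tzinfo=timezone.utc)

iso = dt.strftime("%Y-%m-%dT%H:%M:%S.") + f"{dt.microsecond // 1000:03d}Z"
millis = struct.pack("<q", round(dt.timestamp() * 1000))

print(iso, len(iso))  # 2024-01-15T12:34:56.789Z -> 24 bytes as text
print(len(millis))    # 8 bytes as a binary epoch-millis value
```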
> CSV also requires escaping of separator characters, or quoting of strings (and escaping of quotes), despite ASCII (and therefore UTF-8) having a specific unit separator character already reserved. So you're wasting space for each escape, and effectively wasting symbol space as well (and that's ignoring the other bits of space for record separators, group separators, etc.).
This is also identical to XML (escaping XML entities, sometimes having to resort to CDATA), YAML (escaping dashes) and JSON (escaping double quotes).
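A quick Python sketch of the trade-off both sides are pointing at, CSV quoting versus the reserved ASCII separators (the field contents are made up):

```python
import csv, io

# ASCII reserves control characters for exactly this job:
US = "\x1f"  # unit separator (between fields)
RS = "\x1e"  # record separator (between rows)

rows = [["name", "remark"], ["Ada", 'said "hi", then left']]

# CSV has to quote the second field and double its embedded quotes...
buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(repr(buf.getvalue()))

# ...whereas US/RS need no escaping at all, provided the data
# never contains the separator bytes themselves:
print(repr(RS.join(US.join(fields) for fields in rows)))
```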
All you've shown is that CSV has the same limitations that XML, YAML, and JSON have, and those three formats were specifically designed and intended for data serialization. Yes, the other formats do have other advantages, but they don't eliminate those three limitations, either.
This is for data serialization, which means it's going to potentially be used with data systems that are wholly foreign, separated by great distances or great timespans. What data serialization format are you comparing CSV to? What do you think CSV is actually used for?
Are you arguing for straight binary? You know that CSV, XML, YAML and JSON all grew out of the reaction to how inscrutable both binary files and fixed width files were in the 80s and 90s, right? Binary has all sorts of lovely problems you get to work with like endianness and some systems getting confused if they encounter a mid-file EOF. If you don't like the fact that two systems can format text differently, you're going to have a whole lot of fun when you see how they can screw up binary formatting. Never mind things like, "Hey, here's a binary file from 25 years ago... and nothing can read it and nobody alive knows the format," that you just don't get with plain text.
Yes, you do end up with wasted space, but the file is in plain text and ZIP compression is a thing if that's actually a concern.
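For instance, a rough sketch with gzip (the same DEFLATE family as ZIP) over a made-up, repetitive CSV payload:

```python
import gzip

# A repetitive CSV payload, the kind CSV is typically used for:
csv_text = "id,price,ts\n" + "\n".join(
    f"{i},19.99,2024-01-15T12:34:56.789Z" for i in range(1000)
)
raw = csv_text.encode("utf-8")

print(len(raw))                 # tens of kilobytes of plain text
print(len(gzip.compress(raw)))  # a small fraction of that after compression
```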
Yes. Though to their credit, some of those work with hex numbers, which at least gets you 4 bits out of every 8 bits.
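That density is easy to see: a hex digit carries exactly one nibble per character.

```python
data = bytes([0x00, 0x01, 0x02, 0x03])
print(data.hex())       # "00010203": 8 text characters carrying 4 bytes,
print(len(data.hex()))  # i.e. 4 bits of payload per 8-bit character
```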
> And I know what you're about to argue, but JSON's datetime format is not in the spec. The common JSON datetime format is convention, not standard.
I'm not sure what argument you thought I was making, or why that comment is relevant.
> All you've shown is that CSV has the same limitations that XML, YAML, and JSON have, and those three formats were specifically designed and intended for data serialization. Yes, the other formats do have other advantages, but they don't eliminate those three limitations, either.
I'm not sure what you mean by "eliminate", or why you think it matters that there are other formats with the same design trade-offs.
> This is for data serialization, which means it's going to potentially be used with data systems that are wholly foreign, separated by great distances or great timespans. What data serialization format are you comparing CSV to? What do you think CSV is actually used for?
CSV is used for a variety of purposes. The context of the article is using it for data transfer.
The claim was that it was a compact format for data transfer, which is demonstrably not true.
> Are you arguing for straight binary? You know that CSV, XML, YAML and JSON all grew out of the reaction to how inscrutable both binary files and fixed width files were in the 80s and 90s, right?
I'm not sure what "straight binary" means to you. JSON is, for the most part, a binary encoding standard (just not a particularly good one).
You've got the heritage a bit wrong, as XML was not originally designed for data transfer at all. It was an attempt to simplify the SGML document markup language, and the data transfer aspects were subsequently grafted on. JSON & YAML have a slightly more complicated heritage, but neither was intended as a data transfer format. They've all been pressed into service for that purpose, for a variety of reasons that can charitably be described as tactically advantageous but strategically flawed.
> Binary has all sorts of lovely problems you get to work with like endianness and some systems getting confused if they encounter a mid-file EOF.
I don't know how to break this to you, but text formats can have endianness too (UTF-16 comes in big- and little-endian flavors, and, insanely, even UTF-8 files sometimes carry a byte-order mark), and systems can get just as confused about whether they are at EOF.
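A quick sketch of that encoding point: the same three characters serialize to different bytes depending on UTF-16 byte order, and UTF-8 files can carry the BOM signature.

```python
text = "CSV"
print(text.encode("utf-16-le").hex())     # 430053005600
print(text.encode("utf-16-be").hex())     # 004300530056 (same text, other byte order)
print("\ufeffCSV".encode("utf-8").hex())  # efbbbf435356 (the UTF-8 BOM signature)
```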
> Yes, you do end up with wasted space, but the file is in plain text and ZIP compression is a thing if that's actually a concern.
Wouldn't ZIP be a binary format, with all the problems and concerns you have with binary formats?
So to summarize what you are saying... "CSV is a compact format because you can compress it if you are concerned about all the space it wastes".
Would it be fair to say then that any binary format is a text format because you can convert the binary into a text representation of the data? ;-)