
> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

Wrong. CSV is horrible to parse with its string quoting rules. JSON accepts only UTF-8 and is not misled by \ or ",". I've done both: JSON is much simpler and can represent nested structures, arrays or maps. CSV importers are mostly broken, and should be left to Excel folks only.

> It's that application logic shouldn't be operating on transport data formats

Nobody is talking about application logic here but you. JSON is one of the best transport data formats, and this library makes it much easier to encode/decode from/to JSON and C++.

We are not talking about JavaScript's hack to prefer JSON over full objects. Of course an internal representation is always better than an external one, especially such a simplified one. But for transport, simple formats are preferred over complicated serialized ones, because the complicated ones make it easy for a man-in-the-middle to trigger stack overflows or abuse the side effects of creating arbitrary objects.



Exactly this.

If you didn't do a compilers class and you want a simple language to play around with for lexing/parsing, JSON works great. Here's the core of a probably correct JSON lexer, albeit a super inefficient one, that I whipped up in <110 lines of Python a few years ago out of curiosity[1].

By comparison, check out the state machine transitions at the heart of the cpython implementation of the csv module[2]. It's not really a fair comparison (my JSON lexer is written in Python and uses regexes, the csv parser is written in C and does not use regexes) but even ignoring how nicely the csv parser handles different csv dialects, I still find it strictly more complex.

[1]: https://github.com/chucksmash/jsonish/blob/master/jsonish/to...

[2]: https://github.com/python/cpython/blob/41c57b335330ff48af098...
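
To give a feel for the regex approach without clicking through: the following is a fresh minimal sketch, not the code linked in [1]. The names `TOKEN_RE` and `lex` are made up for illustration, and it only splits the input into tokens; it does not check that the tokens form a valid document (that is the parser's job).

```python
import re

# One alternative per token class; the group name doubles as the token type.
TOKEN_RE = re.compile(r"""
    (?P<ws>[ \t\n\r]+)
  | (?P<punct>[{}\[\],:])
  | (?P<string>"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*")
  | (?P<number>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<literal>true|false|null)
""", re.VERBOSE)


def lex(text):
    """Return a list of (kind, lexeme) pairs, skipping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "ws":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Usage is just `lex('{"a": [1, true]}')`. The whole grammar of tokens fits in five alternatives, which is roughly why the lexer stays small.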


You misread me as well. I'm not saying that CSV is a good format. It's not, because it is ill-specified.

All I'm saying is that flat database tuples are even easier to parse than JSON (which is nested, so requires a runtime stack). It was a total side note (in parentheses!).
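
A rough sketch of the difference, under stated assumptions: the flat format here is a hypothetical tab-separated one where fields never contain the separator or a newline, and `parse_flat_row` and `check_nesting` are illustrative names. The JSON side only tracks bracket nesting (skipping over string contents), which is the part that needs a stack.

```python
def parse_flat_row(line, sep="\t"):
    # Assumed format: fields never contain sep or a newline, so no state is needed.
    return line.rstrip("\n").split(sep)


CLOSERS = {"}": "{", "]": "["}


def check_nesting(text):
    """Validate JSON-style bracket nesting and return the maximum depth."""
    stack = []          # one entry per currently open { or [
    deepest = 0
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
            deepest = max(deepest, len(stack))
        elif ch in CLOSERS:
            if not stack or stack.pop() != CLOSERS[ch]:
                raise ValueError("mismatched bracket")
    if stack or in_string:
        raise ValueError("unterminated input")
    return deepest
```

The flat row is one `split` call with no memory of what came before; the nested case has to remember every open bracket until it closes.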

My main argument is that JSON is a mess to validate.


>> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

> Wrong. CSV is horrible to parse with its string quoting rules.

I made a reasonable attempt at preventing exactly this misunderstanding, but I guess if people want to misread you they will.

> Nobody is talking about application logic here but you.

The application logic (I'm including internal data representation here) is the meat of all efforts, so you should absolutely be considering it, instead of needlessly pouring isolated effort into optimizing peripheral components. Parsing JSON should IMHO be a total side concern, and using an oversized library like the one we're talking about fails to acknowledge that. Such a library is either a wasted effort (if its data types aren't used throughout the application), or else (if they are) use of the library leads to massive problems down the road.


CSV as a format doesn't really exist. CSV is a family of similar but not always compatible data formats each with their own special rules and edge cases.

Note that the quote is talking about a well-specified CSV, not any CSV in general. A well-specified CSV would indeed be fairly easy to parse.


> CSV as a format doesn't really exist.

RFC-4180


That was created after how many years of CSV in the wild? Nobody disagrees here that parsing CSV in practice is a horrible minefield with lots of manual adjustments.


RFC-4180 is dated 2005 - so your statement that a standard "doesn't exist" has been out of date for 14 years.

Yes of course there was no recognised standard before that. Just like before Greenwich Mean Time there was no recognised standard for universal time coordination ...


It's not my statement, and also please let's not split hairs but look at the actual situation in practice. (Also, RFC-4180 sucks. It only codifies a subset of existing - bad - practice).


Apologies for the misattribution.

The situation in practice is that when people want a “standard” way to do CSV, there is in fact a standard they can use, that does cover most sensible things you’ll want to do with CSV, and addresses the most common corner cases (e.g. delimiter in field) in a fairly sensible way.
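
To make the delimiter-in-field case concrete, here is a small sketch using Python's standard `csv` module. The sample data and the helper names `read_rows` and `write_row` are made up. I'm not claiming the module is a full RFC 4180 implementation, only that its default dialect agrees with the RFC on the two rules shown: a field containing a comma is wrapped in double quotes, and a double quote inside a quoted field is doubled.

```python
import csv
import io


def read_rows(text):
    # newline="" passes line endings through to the csv module untouched.
    return list(csv.reader(io.StringIO(text, newline="")))


def write_row(fields):
    buf = io.StringIO(newline="")
    csv.writer(buf).writerow(fields)
    return buf.getvalue()


sample = 'name,quote\r\n"Smith, J.","He said ""hi"""\r\n'

# A quoting-aware reader recovers two fields per record.
rows = read_rows(sample)

# A naive split on commas cuts the quoted field in two.
naive = sample.splitlines()[1].split(",")
```

Here `rows[1]` comes back as two fields with the comma and the quotes intact, while `naive` has three pieces, which is the usual way hand-rolled importers go wrong.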

You are still free to make whatever proprietary extensions you like, at the risk of losing compatibility, just as with any other standard.



