Then you should be enlightened: Of the myriads of existing transfer or serializa...

josefx · on Sept 29, 2019

> JSON is still the only secure by default one (if you ignore the two later updates, which made it insecure),

And enough people put it into eval that several companies started prepending while(1); to their JSON messages. Don't blindly trust user input no matter what format it comes in.

> JSON is by far the easiest to parse (small and secure, no references),

We all know its definition fits on a business card, which is the reason we have at least seven different specifications[1] on the details omitted from the card.

[1]http://seriot.ch/parsing_json.php

minitech · on Oct 7, 2019

> And enough people put it into eval that several companies started prepending while(1); to their JSON messages.

while(1); existed to work around a browser vulnerability where the same-origin policy could be bypassed by including cross-origin JSON as a <script>. Nothing to do with eval.

jstimpfle · on Sept 29, 2019

> JSON is still the only secure by default one (if you ignore the two later updates, which made it

How can a data format be insecure?

> JSON is by far the easiest to parse (small and secure, no references),

It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be). But it also offers no structural integrity, which means that you need to augment the parsing code with terribly ugly validation code. (I don't really use JSON frequently, but the attempts at providing a principled validation framework on top of JSON that I've seen were... unsatisfying).

> In every application, any dependency to XML should be minimized, contained and preferably eliminated.

That's missing the point. It's not about JSON vs XML vs whatever. It's that application logic shouldn't be operating on transport data formats. And even if JSON has a simple representation as runtime data objects (in most scripting languages), that representation is pretty far from ideal compared to a representation tailored towards your specific application logic.

__MatrixMan__ · on Sept 29, 2019

> How can a data format be insecure?

If it tries to be too featureful, then it requires parsers to be very flexible. Malicious input can trick a too-flexible parser into doing things that the developer of whatever function called the parser probably didn't want to happen.

For example, YAML:

https://arp242.net/yaml-config.html

rurban · on Sept 29, 2019

I have this overview:

https://metacpan.org/pod/Cpanel::JSON::XS#SECURITY-CONSIDERA...

which just misses details on stack-overflows on overlarge nesting levels, or denial of service attacks on overlarge strings, arrays or maps.

Better formats which prepended sizes do have an advantage here, such as msgpack. But msgpack has no CRC or digest verification to detect missing or cut-off tails. JSON (its secure 1st RFC 4627) must be properly nested, it does not need this. From the 2nd RFC 7159 on it became insecure, and the 3rd RFC 8259 is merely a joke, as it didn't fix the known issues, only removed a harmless feature.

enbea · on Sept 29, 2019

Could you elaborate on how are the 2nd and 3rd versions insecure and a joke? I've reread them and see no issues with either. Basically apart from clarifications and fluff about limits, security, and interoperability the only differences in the JSON spec itself are allowing any value at the top level and requiring UTF-8 for cross-system exchange.

Since UTF-8 is the only sensible format for JSON it makes little sense to require UTF-16 and UTF-32 support. ( unless you have some special requirements on encoding, in which case you can just disregard that part and convert it on both ends yourself )

The only "issue" with non-object values I see is the one mentioned in the above link where naively concatenating JSON might lead to errors when you send two consecutive numbers but that's going to rarely happen so your system can just reject top-level numbers if it doesn't expect them. And even then the simple solution is to just add whitespace around it.

rurban · on Sept 30, 2019

The 2nd version made it insecure, as scalars are not delimited anymore, and MITM or version mismatches can change the value.

schmorp wrote this:

> For example, imagine you have two banks communicating, and on one side, the JSON coder gets upgraded. Two messages, such as 10 and 1000 might then be confused to mean 101000, something that couldn't happen in the original JSON, because neither of these messages would be valid JSON.

> If one side accepts these messages, then an upgrade in the coder on either side could result in this becoming exploitable.

The 3rd version was a joke, because the outstanding problems were not addressed at all, and removing BOM support for the 4 other encodings is just a joke. First, you cannot remove a feature once you explicitly allowed it, esp. since it's a minor and an almost unused one.

And remember that http://seriot.ch/parsing_json.php was already published then, and the most egregious spec omissions had been known for years already. such as undefined order of keys, or undefined if duplicate keys are allowed. Allowing unsorted keys is also a minor security risk, as it exposes the internal hash order, which can lead to hash seed calculation.

rurban · on Sept 29, 2019

> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

Wrong. CSV is horrible to parse with its string quoting rules. JSON accepts only utf8 and is not misleaded by \, nor ",". Done both, JSON is much simpler and can represent nested structures, arrays or maps. CSV importers are mostly broken, and should be left to Excel folks only.

> It's that application logic shouldn't be operating on transport data formats

Nobody is talking about application logic here but you. JSON is one of the best transport data formats, and this library makes it much easier to encode/decode from/to JSON and C++.

We are not talking about javascript's hack to prefer JSON over full objects. Of course is an internal representation always better than an external, esp. such a simplified one. But for transports simple ones are preferred over complicated serialized ones, because then you can easily do MITM stack-overflows or abusing side effects on creating arbitrary objects.

chucksmash · on Sept 29, 2019

Exactly this.

If you didn't do a compilers class and you want a simple language to play around with for lexing/parsing, JSON works great. Here's the core of a probably correct JSON lexer, albeit a super inefficient one, that I whipped up in <110 lines of Python a few years ago out of curiosity[1].

By comparison, check out the state machine transitions at the heart of the cpython implementation of the csv module[2]. It's not really a fair comparison (my JSON lexer is written in Python and uses regexes, the csv parser is written in C and does not use regexes) but even ignoring how nicely the csv parser handles different csv dialects, I still find it strictly more complex.

[1]: https://github.com/chucksmash/jsonish/blob/master/jsonish/to...

[2]: https://github.com/python/cpython/blob/41c57b335330ff48af098...

jstimpfle · on Sept 29, 2019

You misread me as well. I'm not saying that CSV is a good format. It's not, because it is ill-specified.

All I'm saying is that flat database tuples are even easier to parse than JSON (which is nested, so requires a runtime stack). It was a total side note (in parentheses!).

My main argument is that JSON is a mess to validate.

jstimpfle · on Sept 29, 2019

>> It's relatively easy to parse (while not as easy as a sane well-specified version of CSV would be).

> Wrong. CSV is horrible to parse with its string quoting rules.

I made a reasonable attempt at preventing exactly this misunderstanding, but I guess if people want to misread you they will.

> Nobody is talking about application logic here but you.

The application logic (I'm including internal data representation here) is the meat of all efforts, so you should absolutely be considering it, instead of needlessly putting isolated efforts in optimization of side actors. Parsing JSON should IMHO be a total side concern, and using an oversized library like the one we're talking about fails to acknowledge that. Such a library is either a wasted effort (if its data types aren't used throughout the application), or else (if they are) use of the library leads to massive problems down the road.

haileys · on Sept 29, 2019

CSV as a format doesn't really exist. CSV is a family of similar but not always compatible data formats each with their own special rules and edge cases.

Note that the quote is talking about a well-specified CSV, not any CSV in general. A well-specified CSV would indeed be fairly easy to parse.

rusk · on Sept 29, 2019

> CSV as a format doesn't really exist.

RFC-4180

jstimpfle · on Sept 29, 2019

That was created after how many years of CSV in the wild? Nobody disagrees here that parsing CSV in practice is a horrible minefield with lots of manual adjustments.

rusk · on Sept 30, 2019

RFC-4180 is dated 2005 - so your statement that a standard "doesn't exist" has been out of date for 14 years.

Yes of course there was no recognised standard before that. Just like before Greenwich Meantime there was no recognised standard for universal time coordination ...

jstimpfle · on Sept 30, 2019

It's not my statement, and also please let's not split hairs but look at the actual situation in practice. (Also, RFC-4180 sucks. It only codifies a subset of existing - bad - practice).

rusk · on Sept 30, 2019

Apologies for the misattribution.

The situation in practice is that when people want a “standard” way to do CSV, there is in fact a standard they can use, that does cover most sensible things you’ll want to do with CSV, and addresses the most common corner cases (eg delimeter in field) in a fairly sensible way.

You are free yet to make whatever proprietary extensions or otherwise, at the risk of losing compatibility just as you are with any other standard.

legulere · on Sept 29, 2019

> How can a data format be insecure?

XML external entities allow for arbitrary file inclusion: https://en.wikipedia.org/wiki/XML_external_entity_attack

You can make a badly configured XML parser allocate memory until it crashes: https://en.wikipedia.org/wiki/Billion_laughs_attack

You cannot host user-generated xmls on a domain without making yourself vulnerable to cross site scripting attacks. Browsers will happily execute any javascript you include: https://stackoverflow.com/questions/384639/how-to-include-ja...

taffer · on Sept 29, 2019

> And even if JSON has a simple representation as runtime data objects (in most scripting languages), that representation is pretty far from ideal compared to a representation tailored towards your specific application logic.

Often you have control over the sending and receiving parts of an API and can design the transport format to fit your application logic.

For example, I am currently using a GraphQL API and the data structures I get from this API are exactly what I need in my application.

kyberias · on Sept 29, 2019

> In every application, any dependency to XML should be minimized, contained and preferably eliminated.

Of course, it's not limited to JSON. For applications, it should be a tiny insignificant detail what format is used for serialization. Be it JSON or some superior format, such as XML.