
Parsing JSON is a Minefield - beefburger
http://seriot.ch/parsing_json.php
======
s_q_b
Well, first and most obviously, if you are thinking of rolling your own JSON
parser, stop and seek medical attention.

Secondly, assume that parsing your input will crash, so catch the error and
have your application fail gracefully.

This is the number one security issue I encounter in "security audited" PHP.
(The second being the "==" vs. "===" debacle that is PHP comparison.)

As one example, consider code that opens a session, sets the session
username, then parses some input JSON before the password is evaluated. If
the script crashes when json_decode() fails, the session is left open, and
the attacker can log in as anyone.

Third, parsing everything is a minefield, including HTML. We as a community
invest a lot of collective effort in improving those parsers, but this article
does serve as a useful reminder of a lot of the infrastructure we take for
granted.

Takeaways: Don't parse JSON yourself, and don't let calls to the parsing
functions fail silently.
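For instance (a minimal sketch in Python, purely illustrative; `handle_request` and the response shape are made up for the example):

```python
import json

def handle_request(raw_body):
    # Parse up front, before any session state is touched, and fail loudly.
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as e:
        # Reject the request outright; never continue with a half-open session.
        return {"status": 400,
                "error": "invalid JSON at line %d, column %d" % (e.lineno, e.colno)}
    return {"status": 200, "payload": payload}

print(handle_request('{"user": "alice"}'))  # accepted
print(handle_request('{"user": '))          # rejected with a 400, not a crash
```

The point is that the failure path is explicit and runs before anything security-sensitive happens.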

~~~
mikeash
A parser should never crash on bad input. If it does, that's a serious bug
that needs immediate attention, since that's at least a DoS vulnerability and
quite likely something that could result in remote code execution. You
definitely need to assume that the parser could _fail_, but that's different.
Unless you're using "crash" in some way I'm not familiar with?

~~~
herge
What about raising an exception?

~~~
marxidad
Exceptions should only be used for exceptional cases. For a parser, bad input
should be expected.

~~~
esrauch
You say that as fact but you must know it is a matter of opinion: I would say
you should match the language idioms. For example, Python iterators and
generators work by raising/throwing when there are no more items: it is fully
expected and will always happen when you write a for-in loop.
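Concretely, the end-of-iteration signal in Python is itself an exception:

```python
# A for-in loop is driven by StopIteration under the hood: the exception is
# the normal, fully expected end-of-iteration signal, not an error.
it = iter([1, 2])
assert next(it) == 1
assert next(it) == 2
try:
    next(it)
except StopIteration:
    print("iterator exhausted -- expected, not exceptional")
```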

~~~
iopq
That seems like bad design. Rust iterators return an Option<T>, for example.

~~~
andy_ppp
In Python a lot of flow control uses exceptions: it's cheaper to ask
forgiveness than permission, since after asking permission you'd still have
to deal with errors anyway.

------
mi100hael
> In conclusion, JSON is not a data format you can rely on blindly.

That was definitely not my take-away from the article. More like "JSON is not
a data format you can rely on blindly if you are using an esoteric edge-case
and/or an alpha-stage parsing library." I haven't ever run into a single JSON
issue that wasn't due to my own fat fingers or trying to serialize data that
would have been better suited to something like BSON.

~~~
Tepix
You are not considering a hostile environment like the internet where an
attacker will use edge cases to get unforeseen results.

~~~
mi100hael
I couldn't really care less if someone POSTs some garbage JSON that results in
them getting a 500 response. Better than someone POSTing an XML bomb and
affecting other peoples' requests. Please enlighten me if you know of a
serialization format with libraries for all common languages that lacks any
gotchas or edge cases.

~~~
Someone
All software has edges, so edge cases are unavoidable.

The best you can do is:

- interpret the spec to the letter.

- for every fragment of a statement you write, consider whether it might
conceivably go wrong, and handle those cases (in the simplest manner
possible, because 'handling' means writing code, and that code, too, needs to
go through this process).

For example, a json parser must be prepared to handle missing values and
extremely long keys and values (integers may have thousands of digits; think
long about the question whether 64 bits always is enough for storing a string
length; and so on).

- if you are truly paranoid, have very stringent security requirements, or
expect to be heavily attacked, run the parser in a separate process.

- fuzz your implementation.
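To make the "extreme inputs" point concrete, here's how one stdlib parser (Python's json, just as an illustration) copes with inputs that would break an int64 or fixed-size-key assumption:

```python
import json

# A JSON integer with thousands of digits: Python parses it exactly as a
# bigint, but a parser that assumed int64 would overflow here.
big = json.loads("1" * 4000)
assert big == int("1" * 4000)

# A very long key is also legal JSON; nothing in the grammar caps key length.
long_key = "k" * 100_000
doc = json.loads('{"%s": 1}' % long_key)
assert long_key in doc
```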

~~~
gmfawcett
> think long about the question whether 64 bits always is enough for storing a
> string length, etc.), etc.

I'm struggling to think of any realistic scenario where this isn't true!

~~~
Someone
So do I, but you should still consciously decide whether to add an overflow
check.

Let's do a quick estimate: one can read on the order of 2^24 bytes/second from
disk. A day has on the order of 2^16 seconds, so that's 2^40 bytes/day.

=> You will need 2^24 days to read 2^64 bytes. I think that's around 50k
years. That an attacker will try to generate a buffer overflow this way is a
risk I would take, even if I thought the hardware had room to store that
string.
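The arithmetic checks out; as a sanity check (in Python, purely illustrative):

```python
# Reading 2^24 bytes/second for 2^16 seconds/day gives 2^40 bytes/day,
# so a 2^64-byte string takes 2^24 days to read.
days = 2**64 // (2**24 * 2**16)
assert days == 2**24
print(days // 365)  # on the order of 50,000 years
```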

The only way I can foresee a real risk is when an optimizer can optimize away
the computation of a string whose length it is asked to compute by an
attacker.

That's still very much far-fetched, and if no string gets allocated it's hard
to see how it could become a security issue, but it could be a reason to be
extra careful, for example when providing online access to a C++ compiler,
with its template metaprogramming capabilities.

------
SloopJon
Figures that something like this would be posted on my day off. I put this
through a parser that I cover, and found that the only failures were for top-
level scalars, which we don't support, and for things we accept that we
shouldn't. I'll look through the latter tomorrow, as well as the optional "i_"
tests.

Test suites are a huge value add for a standard, so thank you, Nicolas, for
researching and creating this one. I was surprised that JSON_checker failed
some of the tests. I use its test suite too.

------
gcirino42
The correct answer to parsing JSON is... don't. We experimented last hackday
with building Netflix on TVs without using JSON serialization (Netflix is very
heavy on JSON payloads) by packing the bytes by hand to get a sense of how
much the "easy to read" abstraction was costing us, and the results were
staggering. On low end hardware, performance was visibly better, and data
access was lightning fast.

Michael Paulson, a member of the team, just gave a talk about how to use
flatbuffers to accomplish the same sort of thing ("JSOFF: A World Without
JSON"), linked in this thread:
[https://news.ycombinator.com/item?id=12799904](https://news.ycombinator.com/item?id=12799904)

~~~
dvt
Not sure what your point is (or the point of that presentation, for that
matter).

Of course there are binary serialization formats that are faster than XML or
JSON, and of course they're less error-prone. This has been known for about 40
years now.

JSON/XML are used precisely _because_ people want a human-readable interchange
format. For high-performance uses, consider Google's Protocol Buffers or
Boost::serialize. You're acting like you just hackathoned the biggest thing
since sliced bread, but that's exactly how payloads have been sent (until
high-bandwidth made us all lazy) since the inception of the Internet.

~~~
userbinator
From experience, I think the whole "human-readable" idea is a bit overrated.
All it means is that the format is entirely/mostly in ASCII. But if you have a
hex editor, like all good programmers should, binary formats are not any less
human-readable (or writable) nor more difficult to work with; and for some,
even a text editor with CP437 or some other distinctive SBCS will suffice
after a while. It's somewhat like learning a language; and if you are the one
developing the format, it's a language that you create.

Then again, I grew up working with computers at a time when writing entire
apps in Asm/machine language was pretty normal as well as other things which
would be considered horribly impossible by many developers of the newer
generation, and can mentally assemble/disassemble x86 to/from ASCII, so my
perspective may be skewed... just a tiny little bit. ;-)

~~~
Volt
But the phrase is "human-readable" and not "programmer-readable".

~~~
Whitestrake
A minor gripe with your comment, but as a programmer conceivably must be
human, both conditions are satisfied when a programmer is capable of reading
it.

------
DanielRibeiro
Wow! This was a great practical analysis of existing implementations, as well
as a great technical overview of the spec(s). Thanks for open-sourcing the
analysis code[1] and for the extended results[2].

[1]
[https://github.com/nst/JSONTestSuite](https://github.com/nst/JSONTestSuite)

[2] [http://seriot.ch/json/parsing.html](http://seriot.ch/json/parsing.html)

------
kstenerud
I did write my own parser, but for a reason: I need it to be able to recover
as much data as possible from a damaged, malformed, or incomplete file.

Turns out that a good chunk of these tests are for somewhat malformed, but not
impossible to reason about files. Extra commas, unescaped characters, leading
zeroes... I'd rather just accept those kinds of things rather than throw an
error in the user's face. It's a big bad world out there, and data is by
definition corrupt.

And this is borne out when I plug my parser into this test suite: Many, many
yellow results, which is exactly how I want it.

------
paulddraper
Lots of issues are trivially answered.

---

> Scalars..In practice, many popular parsers do still implement RFC 4627 and
> won't parse lonely values.

Right. RFC 7159 expanded the definition of a JSON text.

> A JSON text is a serialized value. Note that certain previous specifications
> of JSON constrained a JSON text to be an object or an array.

If RFC 7159 wasn't different from 4627, there'd be no reason for 7159. Same
with RFC 1945 and 7230 for HTTP. (Of course, HTTP is versioned...maybe he just
means to repeat the earlier versioning criticism.)

---

> it is unclear to me whether parsers are allowed to raise errors when they
> meet extreme values such 1e9999 or 0.0000000000000000000000000000001

And then he quotes the relevant part of the RFC 7159 grammar, which answers
the question:

> This specification allows implementations to set limits on the range and
> precision of numbers accepted. Since software that implements IEEE 754-2008
> binary64 (double precision) numbers [IEEE754] is generally available and
> widely used, good interoperability can be achieved by implementations that
> expect no more precision or range than these provide, in the sense that
> implementations will approximate JSON numbers within the expected precision.
> A JSON number such as 1E400 or 3.141592653589793238462643383279 may indicate
> potential interoperability problems, since it suggests that the software
> that created it expects receiving software to have greater capabilities for
> numeric magnitude and precision than is widely available.

Parsers may limit this however they like. And so may serializers. This
includes yielding errors. (Though approximating the nearest possible 64-bit
double is IMO the better choice.)
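Python's stdlib, for one, takes exactly that approach (shown here only as an illustration of the RFC's interoperability note):

```python
import json

# Out-of-range numbers are approximated rather than rejected.
assert json.loads("1e400") == float("inf")  # overflows binary64 -> inf

# Extra precision is silently rounded to the nearest double.
pi = json.loads("3.141592653589793238462643383279")
assert pi == 3.141592653589793
```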

---

So yeah, in the end there is a fair amount of flexibility in standard JSON.

To summarize:

> An implementation may set limits on the size of texts that it accepts.

> An implementation may set limits on the maximum depth of nesting. [this one
> was never mentioned though]

> An implementation may set limits on the range and precision of numbers.

> An implementation may set limits on the length and character contents of
> strings.

Most implementations on 32-bit platforms will not parse 5GB JSON texts.

------
peatmoss
What ever happened with EDN (pronounced "eden") from the Clojure people?
[https://clojure.github.io/clojure/clojure.edn-
api.html](https://clojure.github.io/clojure/clojure.edn-api.html)
[https://github.com/edn-format/edn](https://github.com/edn-format/edn)

I always thought that seemed like a nice alternative data format to JSON.
Anyone using it in the wild?

~~~
arohner
Clojure programmers use it everywhere. I suspect almost nobody else does
though.

~~~
tragic
Even if you're a clojure shop, you've got the issue that at your system
boundary, everything else in the world accepts json and/or XML, and nothing
supports EDN/transit. So your data needs to be serialisable to one of those
anyway, at least if it crosses the boundary. There is such a thing as a
network effect, even in data serialisation formats...

------
dgreensp
An informative article. The point is not that parsing JSON is "hard" in any
sense of the word. It's that it's underspecified, which leads to parsers
disagreeing.

Although the syntax of JSON is simple and well-specced:

* The _semantics_ are not fully specified

* There are multiple specs (which is a problem even if they are 99% equivalent)

* Some of the specs are needlessly ambiguous in edge cases

* Some parsers are needlessly lenient or support extensions

------
Confusion
There was a great article at some point that explained why 'be liberal in what
you accept' is a very bad engineering practice in certain circumstances, such
as setting a standard, because it causes users to be confused and annoyed when
a value accepted by system A is subsequently not accepted by supposedly
compatible system B, leading to pointless discussions about what the spec
'intended' and to subtle incompatibilities. Anyone know what article I mean?

~~~
webmaven
Perhaps "The Harmful Consequences of Postel's Maxim"?:

[https://news.ycombinator.com/item?id=9824638](https://news.ycombinator.com/item?id=9824638)

If not, perhaps one of these:

[http://programmingisterrible.com/post/42215715657/postels-
pr...](http://programmingisterrible.com/post/42215715657/postels-principle-is-
a-bad-idea)

[https://bitworking.org/news/There_are_no_exceptions_to_Poste...](https://bitworking.org/news/There_are_no_exceptions_to_Postel_s_Law_)

[http://trevorjim.com/postels-law-is-not-for-
you/](http://trevorjim.com/postels-law-is-not-for-you/)

~~~
Confusion
Yes, thank you, it was the one by 'programmingisterrible' and the linked paper
by Patterson, Sassaman, and Bratus.

------
mmagin
"NaN and Infinity"

Yeah. And I learned this the hard way with the Perl module JSON::XS. It
successfully encodes a Perl NaN, but its decoder will choke on that JSON.
(I reported it to the maintainer, who insists that it's consistent with the
documentation and won't fix it.)

~~~
TazeTSchnitzel
Similarly, Python's encoder violates the JSON specification by default, as it
produces `Infinity`, `-Infinity` and `NaN`, which other JSON parsers choke on.
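This is easy to reproduce, and the stdlib does offer a strict mode:

```python
import json

# By default Python's encoder emits non-standard literals for non-finite
# floats, which a strictly RFC-compliant parser will reject.
assert json.dumps(float("inf")) == "Infinity"
assert json.dumps(float("nan")) == "NaN"

# allow_nan=False restores strict behavior by raising instead of emitting them.
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError:
    print("strict mode rejects NaN, as the spec requires")
```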

~~~
joesb
I don't get it. Why would unaware JSON parsers choke on `Infinity`, `NaN` or
`-NaN`? JSON has no concept of schema.

So if a parser sees "Infinity", which it doesn't have any concept of, why
would it do anything except treat it as the string "Infinity"?

~~~
cdmckay
Because it's not quoted like a string; it's a bare literal.

------
realkitkat
If JSON is comparable to a minefield, then I guess XML and ASN.1 are nothing
short of nuclear Armageddon in complexity and in one's ability to shoot
oneself in the foot ;-)

~~~
Impossible
Also, as someone who has written an XML parser, according to some of the
comments in this thread I'm way beyond medical help, and should give up on
life :).

~~~
lameexcuse333
Depends on when you did this crazy thing. Was it in the dark days before they
were "ubiquitous"? (for lack of a better word. also it's just a fun word)

Then you're a hero to those who made use of it.

If someone were to do that today, though, then yes, seek help.

;)

------
kowdermeister
I still love JSON regardless :) Client / server side languages have first
class support for serialization and in most cases the data structures are
rather easy.

I'd be very skeptical if someone suggested an alternative format for a web-
based project, though I can imagine such situations.

------
indexerror
> In conclusion, JSON is not a data format you can rely on blindly.

What does HN suggest for configuration files (to be written by a human
essentially)?

I am looking at YAML and TOML. My experience with JSON based config files was
horrible.

~~~
robert_tweed
Funnily enough, as I've been experimenting with Chef and trying to stick to
JSON config files where allowed, I was again struck that (a) it's not a good
choice for config files (b) it's an OK choice though (c) lots of people are
using it anyway (d) nearly everyone that does so (including Chef) allows
comments, so in reality are not actually using JSON at all.

Point (d) is the important one. I really think we need a standard for json-
with-comments. JSONC or whatever, but it should have a different standard
filename and it should have an RFC dictating what is and isn't allowed.
Personally I would allow only // comments because there are too many subtle
issues with C-style comments, but it may be too late to agree on that.

Half the point of JSON is that if application A stores its data as JSON then
application B can parse that without any nasty surprises. Except, there are
now probably thousands of noncompliant implementations in the wild that only
exist because the standard doesn't allow comments. Each of those
implementations adds subtle differences (in addition to the comments
themselves), depending largely on how they remove the comments before passing
to the standards-compliant JSON parser (assuming they do that, which, being
DC's recommended approach, is as close to a standard as currently exists).

~~~
pilif
_> (b) it's an OK choice though_

I really think it's not an OK choice. A config file format that doesn't allow
comments provides some of the worst possible UX.

One of the nice things about config files is that normally they are self-
documenting, explaining the meaning of the various directives and providing
possible values.

Without comments, you have to constantly switch between the documentation and
the config file.

Also, the restriction on trailing commas is another really bad issue for a
config file language as it pollutes diffs, makes moving lines around
needlessly difficult and is one more landmine waiting to happen for the
sysadmin editing a file.

No. JSON is not at all OK as a config file language.

~~~
mSparks
How does json "not support comments"?

{"comment":"default values for this object"}

~~~
pilif
Since when is in-band signalling a good idea? What if one of your
configuration keys is named "comment"?

~~~
mSparks
{"notes to self":["Don't edit config files by hand","use a decent hierarchy"]}

//Http://jsoneditoronline.org

~~~
DonHopkins
You could even write comments as a linear RSS feed of nested OPML outlines, by
converting all that XML to JSON.

[http://convertjson.com/xml-to-json.htm](http://convertjson.com/xml-to-
json.htm)

~~~
mSparks
Yep, I went through the process of replacing all our XML objects with json
ones about 6 years ago now. Smaller files and much easier to read, manipulate,
store and transfer.

And while I personally quite liked XSLT, javascript is a much more flexible
and reliable option.

------
metafunctor
The page has been taken down for some reason (getting a 403).

Google cache:
[http://webcache.googleusercontent.com/search?q=cache:8jVuBmx...](http://webcache.googleusercontent.com/search?q=cache:8jVuBmxKEkQJ:seriot.ch/parsing_json.php)

~~~
lucb1e
Works for me.

Edit: But you are not alone. Elsewhere in the thread:
[https://news.ycombinator.com/item?id=12797032](https://news.ycombinator.com/item?id=12797032)

~~~
metafunctor
Yep, looks like it was down only for a few minutes.

------
dep_b
Now the mess that is JavaScript dates has crept into every system imaginable
in the world. I can understand we needed to go for the lowest common
denominator, but Crockford's card really could have crammed in one more line
with a date-time string format.

~~~
inimino
I think adding a date type would have probably doubled the complexity of JSON
and taken it right out of the sweet spot that has made it popular.

~~~
niftich
This. Datetime is hard, because RFC 3339 is underspecified [1], and TOML
wrestled with this a lot.

[1]
[https://news.ycombinator.com/item?id=12364393#12364805](https://news.ycombinator.com/item?id=12364393#12364805)

------
eridius
Speaking as someone who wrote a JSON parser, this article and the accompanying
test suite looks to be very valuable, and I will be adding this test suite to
my parser's tests shortly.

That said, since my parser is a pure-Swift parser, I'm kind of bummed that the
author didn't include it already, but instead chose to include an apparently
buggy parser by Big Nerd Ranch instead. My parser is
[https://github.com/postmates/PMJSON](https://github.com/postmates/PMJSON)

------
jayd16
tl;dr JSON with a bunch of shitty extensions is awful. The error handling
among JSON parsers is inconsistent.

------
ohstopitu
When I didn't know better, I wrote my own JSON parser for Java (it was years
back and I didn't know about java libraries). From experience: DON'T. DO. IT.

That said, if you have decided to do it....

1) know fully well that it'll fail and build it with that assumption.

2) Please, please, please...give useful error messages when it does fail or
you'd be spending way too much time over something simple.

------
SFJulie
By sheer coincidence I was thinking about this just today: I made some code
to highlight where the stdlib json module sees the mistakes when decoding
JSON in Python.

I used the exception string ("blabla at line x, col y, char(c - d)") to
actually highlight (with ANSI colors) WHERE the mistakes were.

[https://gist.github.com/jul/406da833d99e545085dac2f368a3b850...](https://gist.github.com/jul/406da833d99e545085dac2f368a3b850#file-
test-png)

I played a tad with it, and the highlighted areas for missing separators,
unfinished strings, and missing identifiers made no sense. I thought I was
having a bug. Checked and re-checked. But no.
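A sketch of the same idea, using the position attributes the exception already carries rather than parsing its message (`highlight_error` and the color codes are my own names for this example):

```python
import json

RED = "\x1b[31;1m"
RESET = "\x1b[0m"

def highlight_error(text):
    """Reprint a bad JSON document with the reported error position in red.

    json.JSONDecodeError exposes pos/lineno/colno directly, so no string
    parsing of the error message is needed.
    """
    try:
        json.loads(text)
        return text  # valid: nothing to highlight
    except json.JSONDecodeError as e:
        return text[:e.pos] + RED + text[e.pos:e.pos + 1] + RESET + text[e.pos + 1:]

print(highlight_error('{"a": 1 "b": 2}'))  # marks where a ',' was expected
```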

I made this tool because, whatever linters exist, I was always wondering why
I was not able to easily edit or validate json (especially json generated by
tools coded by idiots).

I thought I was stupid for thinking json was complex.

Thanks.

------
austincheney
Writing parsers is hard and takes some experience, but it's not as hard or as
impossible as most of these comments make out. JSON is dead simple to parse,
even in the face of certain edge-case ambiguities.

I can say this from experience after having written an HTML/XML parser that
provides support for various template schemes: Twig, Elm, Handlebars, ERB,
Apache Velocity, JSP, Freemarker, and many more. I have written a JavaScript
parser that supports React JSX, JSON, TypeScript, C#, Java, and many more
things.

In the years I have been programming I frequently hear whining like "it's too
hard". Don't care. While you are wasting oxygen crying about how hard life is,
somebody else will roll a solution you will ultimately consume.

~~~
aikah
where are all these goodies you boast about then ?

~~~
austincheney
[http://prettydiff.com/](http://prettydiff.com/)

------
novaleaf
when parsing human constructed JSON, use JSON5 for the win.

[http://json5.org/](http://json5.org/)

------
Tepix
The important lesson is that you can't blindly rely on your JSON parser to
save your ass when you are dealing with untrusted input.

If sending 1000 "["s will crash your application, you have a problem.

I hope the JSON parser authors will improve their parsers.
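Python's stdlib is one example of a parser that survives this particular attack (shown only as an illustration; the hedged except clause is there because the exact error type is an implementation detail):

```python
import json

# Deeply nested input is a classic parser DoS vector. CPython's json guards
# against stack exhaustion by raising rather than crashing the process.
try:
    json.loads("[" * 100_000)
except (RecursionError, ValueError):
    print("rejected deep nesting without crashing the process")
```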

------
RangerScience
This is fantastic. However, it looks like the detailed conclusion is "exactly
matching the RFC is a minefield".

About a month ago (for the third time, since I don't own the first two
implementations) I made a very forgiving (and very error-unprotected) JSON
parser:
[https://github.com/narfanator/maptionary](https://github.com/narfanator/maptionary)

The core of JSON parsing, from that experience, seems really simple; it's
catching all the edge cases that's hard.

In any event, I look forward to taking the time to test against this test
suite!

------
nitwit005
People tend to screw up the unicode aspects more than the general parsing.
And, indeed, the example JSON parser provided checks for a UTF-8 byte order
mark, but doesn't validate that the data is valid UTF-8, so it will let
through strings that might cause an application problems.

Although there is a commented out method to validate a code point, so I guess
he understood that it was an issue.

------
RX14
Fixes for many of the issues raised by this post have since made it into
crystal master: [https://github.com/crystal-
lang/crystal/commit/7eb738f550818...](https://github.com/crystal-
lang/crystal/commit/7eb738f550818825786e90389ac84d2a2eb13e13)

~~~
devmunchies
I love how quickly things move in crystal right now.

------
RangerScience
Do you have an explanation anywhere of why each (or any) of the edge cases is
supposed to succeed or fail, or why it commonly does what it's not supposed to
do?

I realize that's almost as much work as writing each test case in the first
place, but even a subset of the test cases having that explanation would be
valuable.

------
nickpsecurity
Crap like this is why people should just use older formats that work. Some of
the issues I see in the comments weren't present in Sun's XDR:

[https://tools.ietf.org/html/rfc4506](https://tools.ietf.org/html/rfc4506)

Or even LISP s-expressions if you want organized text.

------
cbhl
In practice, people use increasingly smaller subsets of JavaScript to transmit
data.

For example, a common pattern is to transmit (numeric) user IDs as strings so
that they don't get mangled by floating-point precision issues with large
numbers. You see both Twitter and Facebook APIs do this, for example.
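The precision cliff sits at 2^53. Here's a demonstration in Python, using `parse_int=float` to simulate a parser that stores every number as a double (as JavaScript does):

```python
import json

# A 64-bit ID above 2^53 can't survive a double round-trip.
doc = json.loads('{"id": 9007199254740993}', parse_int=float)
assert doc["id"] == 9007199254740992.0  # off by one: precision silently lost

# Sent as a string, the ID is preserved exactly.
doc = json.loads('{"id": "9007199254740993"}')
assert doc["id"] == "9007199254740993"
```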

~~~
zodiac
I've read that you should treat IDs as strings anyway, because you want to
discourage incrementing IDs, adding IDs together, etc. Anything else you want
to do with integer IDs, e.g. comparing, can be done with string IDs as well.

------
newsat13
Anyone else seeing forbidden?

Forbidden

You don't have permission to access to this document on this server.

~~~
gt2
yes, same here

~~~
lucb1e
You are not the only ones, but it works for me. Google cache link is posted
elsewhere in the thread:
[https://news.ycombinator.com/item?id=12797047](https://news.ycombinator.com/item?id=12797047)

------
vhost-
You definitely can't rely on it. Just the other day I was given a task to take
a request payload from our front end and do some stuff with it on our backend.
The payload looked like this: {"thing": [{"values": ["foobar"], "type": "blah
blah"}, "some identifier"], "other thing": "some string"}. It's mixing types
in arrays, which is problematic for most statically typed languages.

Tips for Go: Don't use map[string]interface{} and circumvent the type system
(I've seen this a lot in production). The fix involves the UnmarshalJSON and
MarshalJSON interfaces. This lets you put the data into a structure that's
sane and re-encode it back to something the other system expects.

------
77pt77
> it won't parse u-escaped invalid codepoints: ["\ud800"]

How is this not expected behaviour?

The string is not well-formed.

Same thing with decent XML parsers. They croak when you give them invalid
codepoints.
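Not every parser croaks, though, which is exactly why the test exists. Python's stdlib, for instance, accepts the escape and defers the pain downstream:

```python
import json

# Python's json happily parses the escape into a lone surrogate...
s = json.loads('["\\ud800"]')[0]
assert len(s) == 1 and ord(s) == 0xD800

# ...but the result is not well-formed Unicode: it can't even be encoded.
try:
    s.encode("utf-8")
except UnicodeEncodeError:
    print("parse succeeded, but downstream encoding fails")
```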

------
redleggedfrog
Crockford needs to write "JSON, The Good Parts."

~~~
seagreen
No he doesn't. "JSON, the good parts" is just JSON. The problems the post is
describing have to do with RFC 7159 allowed extensions.

------
boggydepot
So what's the alternative? If you had a time machine and went back in time,
what would you recommend/bring as an alternative to JSON?

------
metaloha
Am I wrong in seeing that PHP seems to fail the least weirdly in the full
results?

~~~
TazeTSchnitzel
It's more complicated in practice. Prior to PHP 7.0, the official JSON
extension was replaced with a completely different one in many distributions
due to licensing issues, and since PHP 7.0, PHP's official JSON extension is
yet another completely different implementation.

------
amelius
One of the biggest flaws of JSON is that it doesn't support "undefined". This
means translating Javascript structures to and from JSON doesn't preserve the
original value. Sigh.

~~~
indubitably
But undefined is specific to Javascript… there are lots of other Javascript
things that JSON doesn't handle either, like Set or Map objects. It's not
intended to serialize arbitrary JS objects; it's intended to serialize a
least-common-denominator which has proven useful through experience.

------
singularity2001
JSON = require('json5')

And you can even use comments!! (no comment)

// [http://json5.org/](http://json5.org/)

------
ninjakeyboard
unless for fun, rolling your own json parser is like writing bubble sort for
use in your prod app.

------
tofupup
It is an improvement, but when I am in these situations I usually grab the
first library out there.

------
chrismarlow9
This is kind of a vulnerability developer's wet dream, especially that graph...

------
michaelp983
There are alternatives to JSON that are open source, available in many
languages, actively supported, and both CPU- and memory-efficient.

FlatBuffers & Netflix [https://www.youtube.com/watch?v=K_AfmRc-
TLE&feature=youtu.be...](https://www.youtube.com/watch?v=K_AfmRc-
TLE&feature=youtu.be&t=21m30s)

------
andrewvijay
Bad thing to read when I'm writing a Sass-to-JSON module.

~~~
lucb1e
The encoding takeaway seems simple: \u-escape every character outside the
printable ASCII range /[ -~]/ (regex) and you'll be pretty much fine. Set the
encoder to utf-8, don't leave [dangling,commas,], and a few other things that
are obvious from json.org.
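Python's encoder does exactly this out of the box (shown as an illustration of the escape-everything approach):

```python
import json

# With the default ensure_ascii=True, everything outside ASCII is \u-escaped,
# so the output survives any transport that mangles non-ASCII bytes.
out = json.dumps({"name": "héllo"})
assert out == '{"name": "h\\u00e9llo"}'
assert all(ord(c) < 128 for c in out)

# And it round-trips back to the original regardless.
assert json.loads(out) == {"name": "héllo"}
```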

~~~
inimino
Escaping everything outside of ASCII is terribly human-unfriendly and wasteful
in bytes as well.

------
rahrahrah
Oh boy this is another one of those threads..

Party A: X is harder than it looks

Party B: X isn't as hard as you're making it look

Boring.

------
mjpa
Wrote my own JSON parser
([https://github.com/MJPA/SimpleJSON](https://github.com/MJPA/SimpleJSON)) a
while ago... not sure how it's a minefield unless I'm missing something?

~~~
SloopJon
Perhaps you'd be interested to know that your JSONDemo program fails the
following tests:

hang:

y_number_huge_exp.json

segfault:

n_structure_100000_opening_arrays.json

n_structure_open_array_object.json

fail:

n_number_then_00.json

n_string_unescaped_tab.json

n_structure_capitalized_True.json

~~~
mjpa
I would :)

Was going to run the tests myself later but needed a box with python3 on it!

------
edem

        JSON is the de facto standard when it comes to (un)serialising and exchanging data in web and mobile programming. 

I disagree. Take protobuf for example. You get schemas, data structures, and a
parser in one package which is actually a lot smaller than JSON and compiles
to nearly all the commonly used languages. Ever since I started using it, my
life has become so much easier! If you don't want your data to be human-readable (which
is very common) you should not use JSON as a data interchange format.

~~~
djur
JSON is still the de facto standard, regardless of whether it should be.

~~~
edem
I don't think so. Can you back it up with facts?

~~~
oldmanjay
Since you haven't demonstrated your premise beyond talking about what you
personally prefer, that particular ball is still in your court. The world
isn't obliged to accept your unsupported opinions as truth until you're
convinced otherwise.

~~~
edem
I don't need to demonstrate anything. It _is not_ the de facto standard: if
it were, everybody would be using it, which is not the case (Google is the
best example). Look up what the term means and you will understand.

~~~
paulddraper
Google uses JSON for their APIs.

[https://developers.google.com/drive/v2/reference](https://developers.google.com/drive/v2/reference)

------
youdontknowtho
You know what is more like a minefield...a minefield...

[http://www.afghan-network.net/Landmines/](http://www.afghan-
network.net/Landmines/)

Not trying to be a dork, but thought this would be a good place to bring
up...if anyone is interested...in the usage of landmines in current conflicts
and the way that they tend to linger.

Call this a comment factoid. Off topic, but interesting.

