
JSON ⊄ JS - rpsubhub
https://medium.com/joys-of-javascript/42a28471221d
======
fdr
Perhaps more pervasive and despairingly problematic is number representation.
JSON, per the specification, supports arbitrary-precision numeric
representation. However, JavaScript is -- also per specification (as I
understand it) -- entirely floating point. That makes working with money in
JavaScript somewhat hazardous. While there are various arbitrary-precision
libraries out there for JavaScript to assuage this, the problem is that most
JSON-parsing routines force one through a floating-point conversion anyway,
so a loss of precision is more or less inevitable.

While swiftly coercing raw off-the-wire number representations to one's
arbitrary-precision library of choice can avoid most cases of noticeable
accumulated error, it is irksome that the only way everyone seems to get by is
by crossing their fingers that any loss of precision caused by "JSON.parse" is
meaningless to their application.

Or, the problem can be soundly avoided by using strings in the JSON payload,
which is lame but effective and probably one's best practical choice. It is
clearly an example of corruption spreading from Javascript to an otherwise
reasonable feature of the data representation format JSON.
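For illustration, a minimal sketch of both the precision loss and the string
workaround (using BigInt as a stand-in for an arbitrary-precision library,
which assumes a modern engine):

```javascript
// JSON.parse coerces every number to an IEEE 754 double, so integers
// beyond 2^53 silently lose precision.
const lossy = JSON.parse('{"amount": 9007199254740993}'); // 2^53 + 1
console.log(lossy.amount); // 9007199254740992 -- off by one

// The string workaround: ship the exact value as a string and convert
// to an arbitrary-precision type (BigInt here) after parsing.
const safe = JSON.parse('{"amount": "9007199254740993"}');
const amount = BigInt(safe.amount);
console.log(amount + 1n); // 9007199254740994n -- exact
```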

~~~
mikeash
JavaScript uses IEEE 754 doubles for numbers, so as long as you can count on
all of your parsers to do that, then you can rely on any integer up to 2^53
being represented exactly. For money, work in cents.
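A quick sketch of why integer cents behave better than fractional dollars:

```javascript
// Fractional dollars accumulate binary floating-point error:
console.log(0.1 + 0.2); // 0.30000000000000004

// Integer cents stay exact as long as totals stay below 2^53:
const totalCents = 10 + 20; // $0.10 + $0.20
console.log(totalCents); // 30
console.log((totalCents / 100).toFixed(2)); // "0.30" -- formatting only
```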

~~~
beagle3
The "work in cents" thing usually works for same currency, but not when you
involve foreign exchange:

Up until 4-5 years ago, EUR/USD, GBP/USD and GBP/EUR were quoted with 4
decimals; nowadays, there are quite a few places where you have to deal with
the 5th decimal to make the books balance. It's going to be a round-off error
either way, but it might be a round-off error that keeps bean counters,
auditors and therefore eventually IT people awake at night.

~~~
dev_jim
"Works in cents" can be generalized for an integer with implied decimals. 4 or
5 decimal places gives you plenty of room to work with if you are just using a
double to exactly represent an integer value.

~~~
beagle3
That's almost never true using 32-bit math, and is less and less true for
64-bit math (e.g., if you need 5 decimal places, you're left with just 13
digits - that's only trillions!). But most importantly, it's a support
nightmare - a lot of software written until 2002 or so assumed 4 digits was
enough - Microsoft stuff (e.g. Visual Basic, I think other stuff too) had a
"money" 4-decimal-digit fixed-point type.

And then the market adopts a 5th digit - and you have to essentially rewrite
all the math stuff.

~~~
mikeash
I'd hope you would abstract the representation so that you don't have to
rewrite everything just to add another digit of precision....

~~~
beagle3
Well, yes and no.

If you're using language support (e.g., in VB until 2004 "dim cost as Money"
was a declaration that was good for every purpose), then you're out of luck,
and you do need a rewrite.

But even if you don't - this is far from trivial: Is your own implemented
money type a fixed point type? If so, what is your fixed point? (e.g. Yen only
needs 3 after the dot, but trillions in the front is not enough; GBP/USD needs
5 after the dot, but trillions usually IS enough).

Is it fixed per currency? Is it a floating decimal point? The abstraction here
is far from trivial if you care about performance.

Of course, if you've used a language or a library with a usable decimal type
(I'm sure there are others, but I've only been happy with Python), the
abstraction has been taken care of.

But generally, abstracting money (value+currency) is not as simple as one
would assume, and it is very rare that a production system gets it both right
and future proof.

~~~
Dylan16807
>(e.g. Yen only needs 3 after the dot, but trillions in the front is not
enough; GBP/USD needs 5 after the dot, but trillions usually IS enough).

At that point screw it, 64.32 for 18.9 digits and round it to 128 bits with a
currency code. Comes with built-in unit checking.

------
Rauchg
Magnus Holm did a great job describing this after he fixed the problem in
Rack::JSONP. Good reference as well for those interested in the topic:

<http://timelessrepo.com/json-isnt-a-javascript-subset>

~~~
0x0
What a shame that \u hack isn't acceptable in javascript,
<http://code.google.com/p/v8/issues/detail?id=1939#c4>

~~~
coolj
> The abstract operation Quote(value) wraps a _String_ value in double quotes
> and escapes characters within it.

It doesn't look like <http://es5.github.com/#Quote> has anything to do with
serializing to JSON string--am I missing something?

------
lucian1900
Pretty much all web technologies appear to be made of twine and duct tape.
It's so sad that this is in many cases the best we have.

~~~
Already__Taken
That just seems to be what happens when so many people have to collaborate.

Plus it's still just mega young. Look at legal systems: they've been around
forever, and it can be your life's work just to understand them well enough to
participate.

The miracle of software isn't that it works, it's that it does anything at
all.

~~~
rwmj
That's ridiculous. JSON is replacing XML which is a well-established
interchange technology that doesn't have all the problems of JSON (like no
good way to represent 64 bit numbers, weak typing, barely any types in the
first place, etc).

~~~
masklinn
> like no good way to represent 64 bit numbers

Yep, it only has strings either way.

> weak typing

Yep, it only has strings either way (also, "weak typing" does not mean jack
shit, and talking about typing strength for a serialization format makes zero
sense)

> barely any types in the first place

Yep, it only has one.

~~~
aurelianito
If you look at the XML Schema spec
(<http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#decimal>) you will see
that numbers have been standardized.

Things in XML do not need to be just strings.

PS: I still don't like XML, but your comment is technically incorrect.

Edit: Typo FIX.

~~~
masklinn
> Things in XML do not need to be just strings.

Things in XML _are_ just strings.

Schemas are metadata, annotations to tell processors "treat this string as
[some other datatype]" (note how it's not going to work if you're not using
the schema _and_ a schema-aware processor?)

And guess what? Nothing stops you from doing exactly the same thing in JSON.
In fact you don't have much of a choice for the datatypes JSON doesn't
natively support (dates being the most common one, but not the only one by any
mean). And good JSON interfaces provide for embedding transcodings directly in
the parsing or dumping (that's what the `reviver` and `replacer` arguments do
in JSON.parse and JSON.stringify) for exactly that purpose.
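For instance, a sketch of round-tripping dates with a `reviver` (the
ISO-string convention here is an assumption on top of JSON, not part of it):

```javascript
// Dates aren't a native JSON type; Date#toJSON emits ISO 8601 strings,
// and a reviver can turn them back into Date objects on the way in.
const payload = JSON.stringify({ when: new Date(Date.UTC(2012, 0, 15)) });
// payload is {"when":"2012-01-15T00:00:00.000Z"}

const obj = JSON.parse(payload, function (key, value) {
  // Hypothetical convention: any ISO-8601-looking string becomes a Date.
  if (typeof value === 'string' && /^\d{4}-\d{2}-\d{2}T/.test(value)) {
    return new Date(value);
  }
  return value;
});
console.log(obj.when instanceof Date); // true
```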

> PS: I still don't like XML, but your comment is technically incorrect.

Nope. Specific XML dialects may have non-string datatypes (XML-RPC certainly
does), but XML only has strings. In the same way CSV only has strings, but
specific CSV uses may have more. That's the plain facts of the matter.

~~~
jeffdavis
"Things in XML are just strings."

XML and JSON are representations -- so they are all just bits. It's
meaningless to say that, however.

The metadata and surrounding standards are what give those bits more meaning.
So compare what the standards have to offer.

(Aside: I don't like XML and I think JSON is way over-hyped and under-
delivers.)

~~~
masklinn
> So compare what the standards have to offer.

Which I did. The XML standard only offers strings, and you can add schemas to
JSON.

------
xmas_project
The "uniname" command in the uniutils package (apt-get install uniutils) is
great for checking "invisible" or weird Unicode characters. `echo '{"str":
"own ed"}' | uniname` will show the LINE SEPARATOR (\u002028) hidden in the
string.

------
rpearl
I have to comment on this line:

> Some libraries implement an unsafe, but fast, JSON parse using “eval” for
> older browsers

eval is not fast! In fact it is the opposite of fast. Most JIT optimizations
go away in the face of eval()! Do not use it even if you _know_ it's safe. Use
JSON.parse instead.

~~~
shod
"Older browsers" means those that predate native JSON methods. Libraries often
use eval() as a fallback.
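A sketch of that era-typical fallback (the validation regexes are lifted from
Crockford's json2.js; `parseJSON` is a hypothetical wrapper name):

```javascript
function parseJSON(text) {
  // Prefer the native parser: fast, safe, and JIT-friendly.
  if (typeof JSON !== 'undefined' && JSON.parse) {
    return JSON.parse(text);
  }
  // Legacy path for browsers predating JSON.parse: validate the text
  // (json2.js-style) before evaluating it, since raw eval of untrusted
  // input is an injection vector.
  if (/^[\],:{}\s]*$/.test(
        text.replace(/\\(?:["\\\/bfnrt]|u[0-9a-fA-F]{4})/g, '@')
            .replace(/"[^"\\\n\r]*"|true|false|null|-?\d+(?:\.\d*)?(?:[eE][+\-]?\d+)?/g, ']')
            .replace(/(?:^|:|,)(?:\s*\[)+/g, ''))) {
    return eval('(' + text + ')');
  }
  throw new SyntaxError('parseJSON: invalid JSON');
}
```

On anything modern this only ever takes the first branch; the eval path exists
purely for browsers that predate the native JSON methods.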

------
javajosh
This is a good catch, but the presentation is awfully passive aggressive. Yes,
if JSON wants to strictly be a subset of JS, then those two characters need to
be treated specially - either excluded from JSON, or specially escaped in JSON
libraries. The simplest solution, clearly, is to exclude them from JSON. (You
can still use those characters, but you have to escape them). There's no
compelling reason not to do this - and I doubt that this will cause a problem
anyway.
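A minimal sketch of the escaping approach (the helper name is hypothetical):

```javascript
// JSON.stringify emits U+2028/U+2029 raw, which is valid JSON but breaks
// a JavaScript parser that treats them as line terminators. Escaping
// them yields output that is valid as both JSON and JS.
function toJsSafeJSON(value) {
  return JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
}
console.log(toJsSafeJSON({ s: 'own\u2028ed' })); // {"s":"own\u2028ed"}
```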

~~~
gurkendoktor
If you can paste these invisible characters into a chat box to break a website
that uses eval(), then someone will inevitably do it, simply because it is the
internet.

------
JulianMorrison
The other reason: if you are using a Turing-complete language to eval your
nominally context free grammar as if it were code, you are an idiot who is
asking to be hacked, whether through quirks like this or otherwise. (This also
goes for PHP and ERB.) <http://www.cs.dartmouth.edu/~sergey/langsec/occupy/>

~~~
Rauchg
It's actually trivial to make evaluation safe for JSON input as seen in the
very short `parseJSON` implementation in jQuery:

[https://github.com/jquery/jquery/blob/master/src/core.js#L52...](https://github.com/jquery/jquery/blob/master/src/core.js#L523)

Also, for the JSONP technique to work, it _has_ to be valid JavaScript so the
escaping is necessary.

~~~
chimeracoder
I don't know enough Javascript to give the analogous exploit myself, but that
approach looks dangerously similar to some attempts I have seen at making
Python's "eval" safe - attempts which, I might add, all fail to capture corner
cases that any sufficiently determined or motivated attacker would eventually
discover.

~~~
fforw
First of all, if native JSON parsing exists, jQuery will use that.

The validation code they use in case there is no native JSON implementation
available is borrowed from Douglas Crockford's json2.js
([https://github.com/douglascrockford/JSON-js/blob/master/json2.js#L448](https://github.com/douglascrockford/JSON-js/blob/master/json2.js#L448)),
which was the inspiration for the native JSON implementations and should
really be solid by now, both in terms of correctness and in circumventing
regexp weaknesses in some engines.

~~~
JulianMorrison
It's still the Wrong Way™. You want to parse JSON, you write a lexer that
recognizes its symbols and a parser that consumes them and spits out the JS
equivalent. Otherwise you are playing an eternal game of whack-a-mole with the
JS eval parser. No. Just no.

~~~
fforw
If you can find a case in which the validator fails, it's wrong. Otherwise, if
it looks like a JSON parser and works like a JSON parser, i.e. is
indistinguishable from a "proper" parser, reworking it serves no purpose but
satisfying OCD.

Note that the code is now only a work-around for older browsers. Every modern
browser supports native JSON parsing anyway.

------
sil3ntmac
I often pass inline data from server -> client JS using a meta tag. In rails3
it would look like:

<meta name='blah' content="<%= @data.to_json %>" />

However this has always seemed unclean to me. Does anyone else have a better,
alternative method of inlining data? I'd rather not use inline scripts for the
exact reason they mention.

~~~
ricardobeat
Use data-* attributes [1], or print it in a script tag with a text/anything
type:

    
    
        <script type="text/json" id="mydata">
            { "data": "..." }
        </script>
    
        var data = JSON.parse(document.getElementById('mydata').textContent)
    

[1] [https://developer.mozilla.org/en-
US/docs/HTML/Global_attribu...](https://developer.mozilla.org/en-
US/docs/HTML/Global_attributes#attr-data-*)

~~~
benmmurphy
If you put it in a script tag you need to ensure you escape </script>. Though,
it is probably safer to escape <, >, and & just to be sure.

~~~
fforw
The </script> issue is why JSON and Javascript both allow you to escape the
slash character with a backslash.

So just going <\/script> is enough.
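A quick sketch of that escape:

```javascript
// "</script>" inside an inline <script> would end the element early.
// JSON allows any "/" to be escaped as "\/", and both JSON.parse and
// eval decode it back to a plain slash.
var embedded = JSON.stringify({ html: '</script>' }).replace(/\//g, '\\/');
console.log(embedded);                  // {"html":"<\/script>"}
console.log(JSON.parse(embedded).html); // </script>
```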

------
powrtoch
I feel really dense, but I don't understand why the example line is throwing
an error. The article mentions line terminators, but it doesn't seem to
contain any, and I also don't understand why "owned" would be escaped the way
the author says... it looks as though the interpreter is just rejecting the
use of quotes around the key. But I'm sure I'm just missing something, so I'd
be much obliged if someone could enlighten me.

~~~
jfarmer
There's an invisible-to-the-eye Unicode character in the string "owned," if
you copy and paste the text from the website.

JSON is fine with these characters, but JavaScript is not.

For plain-jane JSON this is usually fine, since you're not just evaluating the
JSON as JavaScript, but are running the returned data through a JSON parser. A
properly-designed JSON parser will escape any JSON-valid-but-JavaScript-
invalid characters.

JSONP, however, works differently and will use the JavaScript parser. Womp
womp.

The blog post also lists two other cases, although the first case -- parsing
JSON using eval -- is both insecure and incorrect. I haven't seen people do
that in ages and ages.

~~~
mhotchen
Just an fyi, jQuery's core JSON parser actually uses eval (well, new
Function(), which is almost the same, but with scope protection).

[https://github.com/jquery/jquery/blob/master/src/core.js#L52...](https://github.com/jquery/jquery/blob/master/src/core.js#L529)

~~~
troels
Looks like someone should send a pull request then.

~~~
jgeralnik
To be fair, it does escape characters and verify with a regex that the data is
actually JSON before eval-ing it.

------
andrewcooke
anyone else see the strangest mess on view source for that page? (chrome on
ubuntu)

[edit: i guess it's confused by the line break characters. i reported an
issue.]

[edit2: whoa, wget seems to show the same thing when i look at the source in
emacs...]

[edit3: ok, i am an idiot. it's just quotes in a bunch of meta tags. sorry.
move along. nothing to see here.]

~~~
prezjordan
I'm curious why every paragraph is named like that.

------
beatgammit
One more thing that really bugs me about JSON is that it doesn't support
Infinity, -Infinity and NaN. Python's JSON library does, which leads to some
interesting breakages.

Sure, for NaN you can use null, but for Infinity, you have to use really
large/small numbers, which can also lead to other problems.

~~~
udp
If Python's JSON library accepts Infinity by default, that's a bug and should
be reported.

~~~
JulianWasTaken
It isn't. It's documented behavior. See
<http://docs.python.org/2/library/json.html>. (simple)JSON deviates from the
spec in a few minor ways.

~~~
andrewcooke
more exactly, it's an option you can disable.

------
lacker
Another problem is unicode characters that are not in the basic multilingual
plane. JSON supports those, but many Javascript environments (like basically
all browsers) do not.
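For illustration: an astral-plane character survives JSON.parse, but pre-ES6
string APIs only see its UTF-16 surrogate halves:

```javascript
// U+1F600 (a grinning-face emoji) arrives as the surrogate pair
// \ud83d\ude00 in a JSON string escape.
const s = JSON.parse('"\\ud83d\\ude00"');
console.log(s.length);                     // 2 -- code units, not characters
console.log(s.charCodeAt(0).toString(16)); // "d83d" -- a lone surrogate half
```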

------
Xion
The other thing is that JSON scoffs at trailing comma (both in object and
array literals), while JavaScript engines are perfectly happy to accept them.

~~~
troels
Not all of them are. IE6 will choke.

~~~
martin-adams
And IE7, and IE8.

~~~
masklinn
IE8 (maybe IE7 as well, not sure anymore) will accept trailing commas in
objects. All IE will also "accept" trailing commas in arrays, but _they will
create a final `undefined` element_
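A quick illustration of the mismatch (modern-engine behavior):

```javascript
// JavaScript engines ignore a trailing comma in array literals...
console.log([1, 2, 3, ].length); // 3 (old IE reported 4, padding with
                                 // a final undefined element)
// ...but strict JSON rejects it outright:
try {
  JSON.parse('[1, 2, 3, ]');
} catch (e) {
  console.log(e instanceof SyntaxError); // true
}
```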

------
ot
Previous discussion: <http://news.ycombinator.com/item?id=2550145>

------
michaelmcmillan
Judofyr has covered this in depth on his blog:
<http://timelessrepo.com/json-isnt-a-javascript-subset>

------
Toshio
YAML is the new JSON dontcha know.

