Json ⊄ js (medium.com)
216 points by rpsubhub 1577 days ago | 102 comments

Perhaps more pervasive and despairingly problematic is number representation. JSON, per its specification, supports arbitrary-precision numeric representation. However, Javascript is -- also per specification (as I understand it) -- entirely floating point. That makes working with money in Javascript somewhat hazardous. While there are various arbitrary precision libraries out there for Javascript to assuage this, the problem is that most JSON-parsing routines will force one through a floating point conversion anyway, so a loss of precision is more or less inevitable.

While swiftly coercing raw off-the-wire number representations to one's arbitrary precision library of choice can avoid most cases of noticeable accumulated error, it is irksome that the only way everyone seems to get by is by crossing their fingers that any loss in precision caused by "JSON.parse" is meaningless to their application.

Or, the problem can be soundly avoided by using strings in the JSON payload, which is lame but effective and probably one's best practical choice. It is clearly an example of corruption spreading from Javascript to an otherwise reasonable feature of the data representation format JSON.
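
A minimal sketch of the precision loss and of the string workaround described above. The field name `amount` and the payloads are made up for illustration, and `BigInt` is an assumption about the runtime (it postdates this discussion); a library type such as big.js would fill the same role.

```javascript
// 2^53 + 1 has no exact IEEE 754 double representation,
// so JSON.parse silently rounds it to a neighbouring value.
const lossy = JSON.parse('{"amount": 9007199254740993}');
console.log(lossy.amount === 9007199254740992); // true: the last digit changed

// Sending the number as a string keeps every digit intact on the wire.
const safe = JSON.parse('{"amount": "9007199254740993"}');

// Convert the string with an arbitrary-precision type instead of Number.
// BigInt is assumed to exist here, and it only covers integers.
const exact = BigInt(safe.amount);
console.log(exact.toString()); // "9007199254740993"
```

The string form costs a conversion step on both ends, but nothing between the producer and the consumer gets a chance to round the value.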

JavaScript uses IEEE754 doubles for numbers, so as long as you can count on all of your parsers to do that, then you can rely on any integer up to 2^54 being represented exactly. For money, work in cents.
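
A rough sketch of what "work in cents" can look like in practice, assuming amounts arrive as decimal strings. The helper names `toMinorUnits` and `fromMinorUnits` are hypothetical, and the `decimals` parameter is there so the same code covers more than two implied decimal places.

```javascript
// Convert a decimal string such as "19.99" into an integer count of
// minor units, without ever passing through a fractional double.
// `decimals` is the number of implied decimal places (2 for cents).
function toMinorUnits(text, decimals) {
  const match = /^(-?)(\d+)(?:\.(\d+))?$/.exec(text);
  if (!match) throw new Error('Not a decimal number: ' + text);
  const [, sign, whole, frac = ''] = match;
  if (frac.length > decimals) throw new Error('Too many decimal places: ' + text);
  const units = Number(whole + frac.padEnd(decimals, '0'));
  if (!Number.isSafeInteger(units)) throw new Error('Out of exact range: ' + text);
  return sign === '-' ? -units : units;
}

// Format an integer count of minor units back into a decimal string.
function fromMinorUnits(units, decimals) {
  const digits = String(Math.abs(units)).padStart(decimals + 1, '0');
  const split = digits.length - decimals;
  const body = decimals > 0 ? digits.slice(0, split) + '.' + digits.slice(split) : digits;
  return (units < 0 ? '-' : '') + body;
}

const total = toMinorUnits('19.99', 2) + toMinorUnits('0.01', 2);
console.log(total);                      // 2000
console.log(fromMinorUnits(total, 2));   // "20.00"
console.log(toMinorUnits('1.23456', 5)); // 123456
```

Rejecting input with too many decimal places, rather than rounding it, is a design choice in this sketch; a real system would need an explicit rounding policy.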

The "work in cents" thing usually works for same currency, but not when you involve foreign exchange:

Up until 4-5 years ago, the EUR/USD or GBP/USD or GBP/EUR were quoted with 4 decimals; nowadays, there are quite a few places where you have to deal with the 5th decimal to make the books balance. It's going to be a round-off error either way, but it might be a round-off error that keeps bean counters, auditors and therefore eventually IT people awake at night.

"Works in cents" can be generalized for an integer with implied decimals. 4 or 5 decimal places gives you plenty of room to work with if you are just using a double to exactly represent an integer value.

That's almost never true using 32 bit math, is increasingly less and less true for 64 bit math (e.g., if you need 5 decimal places, you're left with just 13 digits - that's only trillions!) - but most importantly, a support nightmare - a lot of software written until 2002 or so assumed 4 digits is enough - Microsoft stuff (e.g. Visual Basic, I think other stuff too) had a "money" 4-decimal-digit fixed point type.

And then the market adopts a 5th digit - and you have to essentially rewrite all the math stuff.

I'd hope you would abstract the representation so that you don't have to rewrite everything just to add another digit of precision....

Well, yes and no.

If you're using language support (e.g., in VB until 2004 "dim cost as Money" was a declaration that was good for every purpose), then you're out of luck, and you do need a rewrite.

But even if you don't - this is far from trivial: Is your own implemented money type a fixed point type? If so, what is your fixed point? (e.g. Yen only needs 3 after the dot, but trillions in the front is not enough; GBP/USD needs 5 after the dot, but trillions usually IS enough).

Is it fixed per currency? Is it floating decimal point? The abstraction here is far from trivial if you care about performance.

Of course, if you've used a language or a library with a usable decimal type (I'm sure there are others, but I've only been happy with Python), the abstraction has been taken care of.

But generally, abstracting money (value+currency) is not as simple as one would assume, and it is very rare that a production system gets it both right and future proof.

>(e.g. Yen only needs 3 after the dot, but trillions in the front is not enough; GBP/USD needs 5 after the dot, but trillions usually IS enough).

At that point screw it, 64.32 for 18.9 digits and round it to 128 bits with a currency code. Comes with built-in unit checking.

I don't think any of the common business programming languages let you add new types to the numeric tower. C++ does, through operator overloading. Haskell has very nice support for this; to get started all you need is "newtype Money = Money Rational deriving (Show, Eq, Ord, Num, Fractional)" (with the GeneralizedNewtypeDeriving extension enabled). If you later want to make the type more restrictive than Rational, you can do that without changing any calling code (which will work on normal Num instances, most likely).

Unfortunately, "Num" isn't precisely what you want in money - you don't multiply dollars by dollars to get dollars...

But it works well enough.

Rather than arbitrary multiplication, you should probably have an OrderUnits type, then define a function of type Money -> OrderUnits -> Money that calculates the price of multiple units.

(Money 3 * 2 will return Money 6 if fromIntegral 2 = Money 2. But that is not really the operation you want to perform, so you might consider not implementing Num for Money.)
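
The same idea carries over to JavaScript, roughly. This is a sketch under assumptions, not anyone's library: the `Money` class and its method names are hypothetical, amounts are integer minor units, and there is deliberately no Money-times-Money operation.

```javascript
// Hypothetical Money type: an integer amount of minor units plus a currency code.
class Money {
  constructor(minorUnits, currency) {
    if (!Number.isSafeInteger(minorUnits)) {
      throw new RangeError('Amount must be a safe integer');
    }
    this.minorUnits = minorUnits;
    this.currency = currency;
  }

  // Money + Money is defined only within a single currency.
  plus(other) {
    if (!(other instanceof Money) || other.currency !== this.currency) {
      throw new TypeError('Can only add Money of the same currency');
    }
    return new Money(this.minorUnits + other.minorUnits, this.currency);
  }

  // Money * quantity -> Money. Passing a Money here is rejected.
  times(quantity) {
    if (!Number.isInteger(quantity)) {
      throw new TypeError('Quantity must be an integer number of units');
    }
    return new Money(this.minorUnits * quantity, this.currency);
  }
}

const unitPrice = new Money(300, 'USD');
const orderTotal = unitPrice.times(2);
console.log(orderTotal.minorUnits, orderTotal.currency); // 600 USD
```

Since JavaScript has no operator overloading, the unit checking happens at run time in the methods rather than in the type system.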

Right, exactly.

That's faster, but I think can be a poor trade-off because:

1. The speed gain is negligible for most programs

2. The addition of any pricing that requires fractional cents will require careful work to handle unit conversion to maintain integral representation.

#2 becomes ugly when one has an API many people use, and one must bother them to update their code paths to use the new, higher precision that can handle all money as integers. This is not even counting the case where one tries to cut a corner and someone ends up not doing the conversion they ought to.

Also, programs often are more lucid when operating in terms of the frequent units of choice, such as dollars or fractional dollars. Few domains price everything in cents by preference, because in aggregation often dollars -- sometimes many -- are exchanged. The problem gets worse if one needs weird units like 0.1c to regain integral numbers.

There are ways around these, but falling back on strings seems to me the lowest-maintenance option.

On the other hand, integral representations have few dependencies (a compliant javascript interpreter), which is also a pretty big plus for that.

Work in 1/100ths of cents? You will have to convert those strings to numbers at some point (or use a specialised library format).

The latter. Such as big.js (not an endorsement, my understanding is too weak for that).

Just about every practically used programming environment from every walk of life has such a thing in the standard library, and I suppose they are there for good reasons. However, the fact is that Javascript doesn't have one, so one will have to weigh dependencies into their decision.

Go - math/big

Java - BigDecimal

Python - decimal

Ruby - BigDecimal

PostgreSQL - numeric

Interesting mention: decimal in C#, deemed sufficient for monetary calculations at 128 bits of precision.

Another interesting mention: Oracle, which in my understanding handles just about all its numbers this way, by default. This might tell you something about their early customer base.

CS history lesson -- don't use rationals for money. You have no business messing with the denominator, so the extra freedom of using arbitrary rationals is just more rope to hang yourself with.

Incidentally, 1/100th of a cent is a "milray", defined in A Connecticut Yankee In King Arthur's Court.

It's 2^53 (9 quadrillion).

Math.pow(2, 53) === Math.pow(2, 53) + 1 // true

Math.pow(2, 53) === Math.pow(2, 53) - 1 // false
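
The two comparisons above can be turned into a guard. This sketch assumes an ES2015 or later runtime, where `Number.MAX_SAFE_INTEGER` and `Number.isSafeInteger` exist; the `addExact` helper is a hypothetical name.

```javascript
// 2^53 - 1 is the largest integer n such that n and n + 1 are both exact.
console.log(Number.MAX_SAFE_INTEGER === Math.pow(2, 53) - 1); // true
console.log(Number.isSafeInteger(Math.pow(2, 53) - 1));       // true
console.log(Number.isSafeInteger(Math.pow(2, 53)));           // false

// Add two integers, refusing to return a result that may have been rounded.
function addExact(a, b) {
  const sum = a + b;
  if (!Number.isSafeInteger(a) || !Number.isSafeInteger(b) || !Number.isSafeInteger(sum)) {
    throw new RangeError('Integer arithmetic left the exactly representable range');
  }
  return sum;
}

console.log(addExact(1999, 1)); // 2000
```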

Yep, remembered "53" but then misremembered it as the number of mantissa bits.

Lua does not provide integers, either. From their mailing list I have learnt that the IEEE rules guarantee that e.g. doubles have absolutely no problem representing 32-bit integers. You write the same here. It seems to not be intuitive, but IEEE took care of it.

But I wonder why you write that for values in 1/100 one should multiply by 100! Wouldn't an IEEE double have no problems with values with two decimal points, too? Or are there many problems with that, e.g. that $0.01 + $0.01 might already cause rounding errors which do not even out by the clever IEEE rules?

Of course 2^54 wouldn't be the highest value where one shouldn't worry anymore, but something smaller.

1/5 (0.2 in decimal) is 0.00110011... repeating in binary, therefore not exactly representable, therefore it could lead to errors. Not big errors, but enough that smart people would prefer not to use it for money.

To give a concrete example, 19.99 can't be exactly represented in floating-point, and ends up being 19.989something. If you have a price of $19.99 that's represented in floating-point, and if you truncate rather than round as you pass it off for display, you'll end up displaying a price of $19.98. With enough manipulation, the error can accumulate to the extent that rounding the result won't save you either.
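
The $19.99 case is easy to reproduce. A small sketch; the variable names are just for illustration.

```javascript
const price = 19.99;

// Printing extra digits reveals that the stored double is not exactly 19.99.
console.log(price.toFixed(20));

// Scaling to cents lands just below 1999, so truncation loses a cent.
const truncatedCents = Math.floor(price * 100);
const roundedCents = Math.round(price * 100);
console.log(truncatedCents); // 1998
console.log(roundedCents);   // 1999

// Errors also show up in plain sums of decimal fractions.
console.log(0.1 + 0.2 === 0.3); // false
```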

You mean, for American money, work in cents.

The name cent shouldn't throw you off here as applying only to US currency. Many other currencies have a notional 1/100th unit of the basic one, that usually also exists as a cash value.

So s/cent/atomic subunit/, because "for money, work in cents" is meaningless for yen and ambiguous for Bitcoin ("bitcents" or "satoshis"?)

If only people reading my comment had some sort of brain with which to interpret it instead of being doomed to apply only the exact literal meaning.

I'm afraid you ask too much of the Internet.

Yes, he clearly should have taken into account any possible subunit denomination, including exotic native tribe money units like sea shells, because international programmers reading HN are not expected to think of that on their own.

Such vitriol! Can your currency code handle sea shells? It should! :)

I only mention it because the yen thing surprised me once as a newbie programmer. (A tip: "%0.2f" is no substitute for actual i18n.) It never hurts to be aware of the edge cases.

>Can your currency code handle sea shells?

I'm afraid mine can only handle sea shells!

>I only mention it because the yen thing surprised me once as a newbie programmer.

So what's the yen thing? How does it differ?

The yen has no fractional units (anymore), so when I added support for that currency but continued to use the same format string as all the others, prices like ¥1000.00 looked unnatural.
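
For what it's worth, a sketch of letting the platform pick the number of decimals instead of hard-coding "%0.2f". It assumes the ECMAScript Internationalization API (`Intl.NumberFormat`) is available in the runtime; `formatMoney` is a hypothetical helper name.

```javascript
// Intl.NumberFormat looks up how many fraction digits a currency normally uses.
function formatMoney(amount, currency, locale) {
  return new Intl.NumberFormat(locale, { style: 'currency', currency }).format(amount);
}

// resolvedOptions() exposes the digits the formatter settled on.
function fractionDigits(currency) {
  return new Intl.NumberFormat('en-US', { style: 'currency', currency })
    .resolvedOptions().maximumFractionDigits;
}

console.log(fractionDigits('JPY')); // 0
console.log(fractionDigits('USD')); // 2

console.log(formatMoney(1000, 'JPY', 'en-US'));  // no fractional part
console.log(formatMoney(19.99, 'USD', 'en-US')); // two decimals
```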

The historical subunit of the yen is the "sen" (銭), which you often see mentioned in prewar books. Postwar inflation made sen-denominated currency obsolete, but...

[A bit of googling also turns up the "rin" (厘), which is a thousandth of a yen...]

Many domains, from gas stations to stock markets, require sub-penny pricing.

If sub-penny pricing is needed, usually myridollars (10^-4) suffice. Your example requires millidollars. I'm curious if there's a natural use case requiring sub-myri resolution.

Securitized bonds present many such cases.

The idea is that you have income from each loan, which then pays into bonds based on whatever rules may have been set. These rules often have terms like a fixed interest on the outstanding principal for senior bond issues, and then various divisions for the junior, with a weird "IO" piece that gets the leftovers that don't divide neatly. The rules can be anything that they were structured to be. The result surprisingly frequently is something where the allocation of the final penny in billions of dollars can be impacted by floating point ambiguity. (And the prospectus seldom will clarify this - the ultimate control lies in whatever the servicer's computer program does.)

Hopefully we won't see billions of dollars being handled in javascript.

Even with securitized bonds, myridollars are sufficient to represent the payments.

Do not confuse "unambiguous" with "correct". If you use myridollars and the servicer used floating point with roundoff errors, the servicer's implementation is by definition correct.

All of the interchange formats that I can think of at 1 AM (including FIX and exchange-proprietary formats) are very specific regarding the data. For example, the FIX floating point specifically limits the number of significant figures to 15

Minor electronic components like surface mount capacitors have tiny prices. I just looked up a model that costs $0.00745.

EDIT: Millage taxes on property are denoted in mills (thousandths). The taxes are often fractional mills, like 2.225. So you need 10^-8 resolution at the least.

I'm guessing those capacitors are sold by the reel, maybe 4,000 to the reel.


The only real solution for high precision numbers in JavaScript is a string-based "big num" class, unfortunately. Sending important numbers (banking, gas prices, etc) as the built-in number type is just setting yourself up for pain.

JavaScript allows all the same numeric literals in its syntax, it just imposes additional semantics on them.

If I remember correctly, this was a problem with Facebook user IDs. I believe they ended up making the value a string in the API to deal with this (and padding new user IDs by some large power of ten so that developers would hit this code-path early enough to catch it).

> However, Javascript is -- also per specification (as I understand it) -- entirely floating point. [...] While there are various arbitrary precision libraries out there for Javascript to assuage this

Interesting! I hadn't seen those before.

That strongly suggests that browsers ought to add some built-in extension types for arbitrary-precision arithmetic, using fast native libraries like GMP. That would then allow such JavaScript libraries to use the native types when available, and their existing pure-JavaScript support when not.

Browsers shouldn't do it. Ecmascript should do it.

Eventually, sure; I'd love to see some standardized signed/unsigned integers, for that matter. However, in the meantime, a browser could add an extension that existing JavaScript libraries could transparently use when available.

Magnus Holm did a great job describing this after he fixed the problem in Rack::JSONP. Good reference as well for those interested in the topic:


What a shame that \u hack isn't acceptable in javascript, http://code.google.com/p/v8/issues/detail?id=1939#c4

> The abstract operation Quote(value) wraps a String value in double quotes and escapes characters within it.

It doesn't look like http://es5.github.com/#Quote has anything to do with serializing to JSON string--am I missing something?

Pretty much all web technologies appear to be made of twine and duct tape. It's so sad that this is in many cases the best we have.

That just seems to be what happens when so many people have to collaborate.

Plus it's still just mega young. Look at legal systems: they've been around forever, and it can be your life's work just to understand them well enough to participate.

The miracle of software isn't that it works, it's that it does anything at all.

That's ridiculous. JSON is replacing XML which is a well-established interchange technology that doesn't have all the problems of JSON (like no good way to represent 64 bit numbers, weak typing, barely any types in the first place, etc).

> like no good way to represent 64 bit numbers

Yep, it only has strings either way.

> weak typing

Yep, it only has strings either way (also, "weak typing" does not mean jack shit, and talking about typing strength for a serialization format makes zero sense)

> barely any types in the first place

Yep, it only has one.

If you look at the XML schema specs (http://www.w3.org/TR/2000/CR-xmlschema-2-20001024/#decimal) you will see that they have standardized numbers.

Things in XML do not need to be just strings.

PS: I still don't like XML, but your comment is technically incorrect.

Edit: Typo FIX.

> Things in XML do not need to be just strings.

Things in XML are just strings.

Schemas are metadata, annotations to tell processors "treat this string as [some other datatype]" (note how it's not going to work if you're not using the schema and a schema-aware processor?)

And guess what? Nothing stops you from doing exactly the same thing in JSON. In fact you don't have much of a choice for the datatypes JSON doesn't natively support (dates being the most common one, but not the only one by any means). And good JSON interfaces provide for embedding transcodings directly in the parsing or dumping (that's what the `reviver` and `replacer` arguments do in JSON.parse and JSON.stringify) for exactly that purpose.
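
A short sketch of those two hooks. The conventions here are assumptions for illustration only: dates travel as ISO 8601 strings (which is what `Date.prototype.toJSON` produces), and `Map` objects are flattened to arrays of entries. The function names are hypothetical.

```javascript
// Reviver: turn strings that look like ISO 8601 timestamps back into Dates.
const ISO_DATE = /^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z$/;
function reviveDates(key, value) {
  return typeof value === 'string' && ISO_DATE.test(value) ? new Date(value) : value;
}

// Replacer: encode a type JSON lacks (Map) as something it does have.
function replaceMaps(key, value) {
  return value instanceof Map ? Array.from(value.entries()) : value;
}

const wire = JSON.stringify(
  { created: new Date(Date.UTC(2013, 0, 15)), tags: new Map([['a', 1]]) },
  replaceMaps
);
console.log(wire); // {"created":"2013-01-15T00:00:00.000Z","tags":[["a",1]]}

const parsed = JSON.parse(wire, reviveDates);
console.log(parsed.created instanceof Date); // true
```

Pattern-matching on string shape is the weak point of this sketch: any string that happens to look like a timestamp gets converted, which is exactly the kind of thing a schema would pin down.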

> PS: I still don't like XML, but your comment is technically incorrect.

Nope. Specific XML dialects may have non-string datatypes (XML-RPC certainly does), but XML only has strings. In the same way CSV only has strings, but specific CSV uses may have more. That's the plain facts of the matter.

"Things in XML are just strings."

XML and JSON are representations -- so they are all just bits. It's meaningless to say that, however.

The metadata and surrounding standards are what give those bits more meaning. So compare what the standards have to offer.

(Aside: I don't like XML and I think JSON is way over-hyped and under-delivers.)

> So compare what the standards have to offer.

Which I did. The XML standard only offers strings, and you can add schemas to JSON.

That's quite a conclusion to jump to. Maybe you haven't noticed but the Web is running great, especially when developed by thoughtful, competent programmers.

It's sad how many non-web developers still complain about how the Web is sour grapes.

It's not that the people working on it haven't been great, it's that there's so much path dependence, it's frustrating. Which is not at all to say that there isn't path dependence in plenty of other domains.

The "uniname" command in the uniutils package (apt-get install uniutils) is great for checking "invisible" or weird Unicode characters. `echo '{"str": "own<U+2028>ed"}' | uniname`, where <U+2028> stands for the literal character, will show the LINE SEPARATOR (U+2028) hidden in the string.

I have to comment on this line:

> Some libraries implement an unsafe, but fast, JSON parse using “eval” for older browsers

eval is not fast! In fact it is the opposite of fast. Most JIT optimizations go away in the face of eval()! Do not use it even if you know it's safe. Use JSON.parse instead.

"Older browsers" means those that predate native JSON methods. Libraries often use eval() as a fallback.

This is a good catch, but the presentation is awfully passive aggressive. Yes, if JSON wants to strictly be a subset of JS, then those two characters need to be treated specially - either excluded from JSON, or specially escaped in JSON libraries. The simplest solution, clearly, is to exclude them from JSON. (You can still use those characters, but you have to escape them). There's no compelling reason not to do this - and I doubt that this will cause a problem anyway.

If you can paste these invisible characters into a chat box to break a website that uses eval(), then someone will inevitably do it, simply because it is the internet.

The other reason: if you are using a Turing-complete language to eval your nominally context free grammar as if it were code, you are an idiot who is asking to be hacked, whether through quirks like this or otherwise. (This also goes for PHP and ERB.) http://www.cs.dartmouth.edu/~sergey/langsec/occupy/

It's actually trivial to make evaluation safe for JSON input as seen in the very short `parseJSON` implementation in jQuery:


Also, for the JSONP technique to work, it _has_ to be valid JavaScript so the escaping is necessary.
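
A minimal sketch of that escaping on the producing side. `toJsonpBody` is a hypothetical helper, and validating the callback name is left out and would be needed in real code.

```javascript
// JSON.stringify leaves U+2028 and U+2029 as raw characters inside strings.
// Escaping them keeps the output valid JSON and also safe to run as script.
function toJsonpBody(callbackName, value) {
  const json = JSON.stringify(value)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return callbackName + '(' + json + ');';
}

const body = toJsonpBody('handle', { str: 'own\u2028ed' });
console.log(body.includes('\u2028')); // false: only the escaped form remains
```

The escaped form parses back to the same string, so consumers that use a real JSON parser see no difference.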

I don't know enough Javascript to give the analogous exploit myself, but that approach looks dangerously similar to some attempts I have seen at making Python's "eval" safe - attempts which, I might add, all fail to capture corner cases that any sufficiently determined or motivated attacker would eventually discover.

First of all, if native JSON parsing exists, jQuery will use that.

The validation code they use when no native JSON implementation is available is borrowed from Douglas Crockford's json2.js ( https://github.com/douglascrockford/JSON-js/blob/master/json... ), which was the inspiration for the native JSON implementations and should really be correct by now, both in validating input and in circumventing regexp weaknesses in some engines.

It's still the Wrong Way™. You want to parse JSON, you write a lexer that recognizes its symbols and a parser that consumes them and spits the JS equivalent. Otherwise you are playing an eternal game of whack-a-mole with the JS eval parser. No. Just no.
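
For a sense of scale, here is roughly what the lexer-plus-parser route looks like, folded into one recursive-descent function. This is a sketch written for this thread, not code from jQuery or json2.js; `parseJson` is a hypothetical name and error messages are minimal.

```javascript
// Minimal recursive-descent JSON parser: no eval, no regex validation pass.
function parseJson(text) {
  let i = 0;
  const fail = (msg) => { throw new SyntaxError(msg + ' at position ' + i); };
  const ws = () => { while (i < text.length && ' \t\n\r'.includes(text[i])) i++; };
  const expect = (ch) => { if (text[i] !== ch) fail('Expected ' + ch); i++; };
  const SIMPLE = { '"': '"', '\\': '\\', '/': '/', b: '\b', f: '\f', n: '\n', r: '\r', t: '\t' };

  function value() {
    ws();
    let result;
    if (text[i] === '{') result = object();
    else if (text[i] === '[') result = array();
    else if (text[i] === '"') result = string();
    else if (text.startsWith('true', i)) { i += 4; result = true; }
    else if (text.startsWith('false', i)) { i += 5; result = false; }
    else if (text.startsWith('null', i)) { i += 4; result = null; }
    else result = number();
    ws();
    return result;
  }
  function object() {
    const out = {};
    expect('{'); ws();
    if (text[i] === '}') { i++; return out; }
    for (;;) {
      ws();
      const key = string();
      ws(); expect(':');
      // defineProperty so that a "__proto__" key becomes an ordinary property.
      Object.defineProperty(out, key,
        { value: value(), writable: true, enumerable: true, configurable: true });
      if (text[i] === ',') { i++; continue; }
      expect('}');
      return out;
    }
  }
  function array() {
    const out = [];
    expect('['); ws();
    if (text[i] === ']') { i++; return out; }
    for (;;) {
      out.push(value());
      if (text[i] === ',') { i++; continue; }
      expect(']');
      return out;
    }
  }
  function string() {
    expect('"');
    let out = '';
    while (i < text.length && text[i] !== '"') {
      if (text[i] < ' ') fail('Control character in string');
      if (text[i] !== '\\') { out += text[i++]; continue; }
      const esc = text[++i];
      if (esc === 'u') {
        const hex = text.slice(i + 1, i + 5);
        if (!/^[0-9a-fA-F]{4}$/.test(hex)) fail('Bad \\u escape');
        out += String.fromCharCode(parseInt(hex, 16));
        i += 5;
      } else if (Object.prototype.hasOwnProperty.call(SIMPLE, esc)) {
        out += SIMPLE[esc]; i++;
      } else fail('Bad escape');
    }
    expect('"');
    return out;
  }
  function number() {
    const m = /^-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?/.exec(text.slice(i));
    if (!m) fail('Unexpected token');
    i += m[0].length;
    return Number(m[0]);
  }

  const result = value();
  if (i < text.length) fail('Trailing characters');
  return result;
}

console.log(parseJson('{"a": [1, 2.5e1, {"b": null}], "c": "own\u2028ed"}').c.length); // 6
```

The raw U+2028 inside the string is accepted because the JSON grammar only forbids characters below U+0020 there, which is the mismatch the article is about.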

If you can find a case in which the validator fails, it's wrong. Otherwise, if it looks like a JSON parser and works like a JSON parser, i.e. is indistinguishable from a "proper" parser, reworking it serves no purpose beyond satisfying OCD.

Note that the code is now only a work-around for older browsers. Every modern browser supports native JSON parsing anyway.

I often pass inline data from server -> client JS using a meta tag. In rails3 it would look like:

<meta name='blah' content="<%= @data.to_json %>" />

However this has always seemed unclean to me. Does anyone else have a better, alternative method of inlining data? I'd rather not use inline scripts for the exact reason they mention.

Use data-* attributes [1], or print it in a script tag with a text/anything type:

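    <script type="text/json" id="mydata">
        { "data": "..." }
    </script>

    <script>
        // The text/json block is inert; read it back and parse it as data.
        var data = JSON.parse(document.getElementById('mydata').textContent);
    </script>

[1] https://developer.mozilla.org/en-US/docs/HTML/Global_attribu...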

If you put it in a script tag you need to ensure you escape </script>. Though, it is probably safer to escape <, >, and & just to be sure.

The </script> issue is why JSON and Javascript both allow you to escape the slash character with a backslash.

So just going <\/script> is enough.
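
A sketch of the broader variant (escaping `<`, `>` and `&` rather than only the slash), done on the JSON text before it is written into the page. `jsonForScriptTag` is a hypothetical helper name.

```javascript
// In JSON.stringify output, <, > and & can only occur inside strings,
// so replacing them with \uXXXX escapes keeps the JSON equivalent.
function jsonForScriptTag(value) {
  return JSON.stringify(value)
    .replace(/</g, '\\u003c')
    .replace(/>/g, '\\u003e')
    .replace(/&/g, '\\u0026');
}

const payload = { html: '</script><script>alert(1)</script>' };
const inlined = jsonForScriptTag(payload);

console.log(inlined.includes('<'));                     // false
console.log(JSON.parse(inlined).html === payload.html); // true
```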

Can you give it directly to the JS working on that data by setting a var (var x = <%= @data.to_json %>;) or passing it in as a parameter to an initializing function call (APP.init(<%= @data.to_json %>);)? Edit: 1. I hope it's not user generated data. 2. The output in this case wouldn't need to be JSON syntax, it could be object literal notation, saving space on all those double quotes but I doubt you have a helper method available for that.

you can put anything inside class attribute

<div id="data" class="{json: 'x'}"/>

you can mix actual classes with JSON, just put JSON at the end:

<div class="header inline {json: 'x'}"/>

CSS selectors will work just fine

I feel really dense, but I don't understand why the example line is throwing an error. The article mentions line terminators, but it doesn't seem to contain any, and I also don't understand why "owned" would be escaped the way the author says... it looks as though the interpreter is just rejecting the use of quotes around the key. But I'm sure I'm just missing something, so I'd be much obliged if someone could enlighten me.

There's an invisible-to-the eye unicode character in the string "owned," if you copy and paste the text from the website.

JSON is fine with these characters, but JavaScript is not.

For plain-jane JSON this is usually fine, since you're not just evaluating the JSON as JavaScript, but are running the returned data through a JSON parser. A properly-designed JSON parser will escape any JSON-valid-but-JavaScript-invalid characters.

JSONP, however, works differently and will use the JavaScript parser. Womp womp.

The blog post also lists two other cases, although the first case -- parsing JSON using eval -- is both insecure and incorrect. I haven't seen people do that in ages and ages.

That makes sense. I was confused because I was also getting the error when I typed the line into the interpreter by hand. The issue being that I forgot to assign it to a variable (facepalm). Thanks for the explanation.

Just an fyi, jQuery's core JSON parser actually uses eval (well, new Function(), which is almost the same, but with scope protection).


Looks like someone should send a pull request then.

To be fair, it does escape characters and verify with a regex that the data is actually JSON before eval-ing it.

The line terminator is there, it just doesn't render as anything visible. (It's inserted directly as a unicode character, rather than its HTML-escaped equivalent, so you won't even see it if you view source!)

However, if you take a hex dump of the page, it becomes quite apparent:


Note: The file is UTF-8 encoded, so you'd be looking for E2 80 A8 instead of \u2028.

anyone else see the strangest mess on view source for that page? (chrome on ubuntu)

[edit: i guess it's confused by the line break characters. i reported an issue.]

[edit2: whoa, wget seems to show the same thing when i look at the source in emacs...]

[edit3: ok, i am an idiot. it's just quotes in a bunch of meta tags. sorry. move along. nothing to see here.]

I'm curious why every paragraph is named like that.

One more thing that really bugs me about JSON is that it doesn't support Infinity, -Infinity and NaN. Python's JSON library does, which leads to some interesting breakages.

Sure, for NaN you can use null, but for Infinity, you have to use really large/small numbers, which can also lead to other problems.
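
For reference, this is how JavaScript's own JSON object treats these values. A small sketch; the property names are arbitrary.

```javascript
// Non-finite numbers are silently written as null.
const encoded = JSON.stringify({ a: NaN, b: Infinity, c: -Infinity });
console.log(encoded); // {"a":null,"b":null,"c":null}

// The bare tokens are not part of the JSON grammar, so parsing rejects them.
let rejected = false;
try {
  JSON.parse('{"a": NaN}');
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true

// A "really large number" stand-in overflows to Infinity when parsed into a double.
console.log(JSON.parse('1e999') === Infinity); // true
```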

If Python's JSON library accepts Infinity by default, that's a bug and should be reported.

It isn't. It's documented behavior. See http://docs.python.org/2/library/json.html. (simple)JSON deviates from the spec in a few minor ways.

more exactly, it's an option you can disable.

You can try to report it as a bug, but the docs indicate that this is by design.

Another problem is unicode characters that are not in the basic multilingual plane. JSON supports those, but many Javascript environments (like basically all browsers) do not.
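
A sketch of where the friction shows up: JavaScript strings are sequences of UTF-16 code units, so a character outside the BMP occupies two of them. `Array.from` and `codePointAt` assume an ES2015 or later runtime.

```javascript
// U+1F600 written as a JSON surrogate-pair escape.
const emoji = JSON.parse('"\\ud83d\\ude00"');

console.log(emoji.length);                      // 2: length counts UTF-16 code units
console.log(Array.from(emoji).length);          // 1: iteration goes by code point
console.log(emoji.codePointAt(0).toString(16)); // "1f600"

// Cutting the string by code units can leave a lone surrogate behind.
const broken = emoji.slice(0, 1);
console.log(broken.charCodeAt(0).toString(16)); // "d83d"
```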

The other thing is that JSON scoffs at trailing comma (both in object and array literals), while JavaScript engines are perfectly happy to accept them.

Not all of them are. IE6 will choke.

And IE7, and IE8.

IE8 (maybe IE7 as well, not sure anymore) will accept trailing commas in objects. All IE will also "accept" trailing commas in arrays, but they will create a final `undefined` element
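
The difference is easy to see side by side in a standards-compliant engine (so not the old IE behaviour described above). A small sketch:

```javascript
// JavaScript literals tolerate a trailing comma.
const arrayLiteral = [1, 2, ];
const objectLiteral = { a: 1, };
console.log(arrayLiteral.length);                // 2
console.log(Object.keys(objectLiteral).length);  // 1

// The same text is rejected as JSON.
function isValidJson(text) {
  try { JSON.parse(text); return true; } catch (err) { return false; }
}
console.log(isValidJson('[1, 2, ]'));   // false
console.log(isValidJson('{"a": 1, }')); // false
console.log(isValidJson('[1, 2]'));     // true
```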

Judofyr has covered this in depth on his blog: http://timelessrepo.com/json-isnt-a-javascript-subset

YAML is the new JSON dontcha know.
