
I was attempting to solve this very problem in the Rust BigDecimal crate this weekend. Is it better to just let it crash with an out-of-memory error, or to have a compile-time constant limit (I was thinking ~8 billion digits) and panic with a more specific error message if any operation would exceed it (though does that mean it's no longer arbitrary-precision?)? Or keep some kind of overflow-state/NaN, but then the complexity is shifted into checking for NaNs, which I've been trying to avoid.
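
For what it's worth, the third option might look something like this (purely illustrative, not the crate's actual API):

    use bigdecimal::BigDecimal;

    // Illustrative shape only: arithmetic stays total, but every
    // downstream use has to match on the Overflow arm - exactly the
    // NaN-checking burden described above.
    enum Checked {
        Value(BigDecimal),
        Overflow, // an operation would have exceeded the digit limit
    }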

Sounds like Haskell made the right call: put warnings in the docs and steer the user in the right direction. Keeps implementation simple and users in control.

To the point of the article, serde_json support is improving in the next version of BigDecimal, so you'll be able to decorate your BigDecimal fields and it'll parse numeric fields from the JSON source, rather than json -> f64 -> BigDecimal.

    #[derive(Serialize, Deserialize)]
    pub struct MyStruct {
      #[serde(with = "bigdecimal::serde::json_num")]
      value: BigDecimal,
    }
Whether or not this is a good idea is debatable[^], but it's certainly something people have been asking for.

[^] Is every part of your system, or your users' systems, going to parse with full precision?
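
For illustration, here's a round trip under the assumption that the json_num adapter behaves as described (it hasn't shipped yet):

    use bigdecimal::BigDecimal;

    fn main() {
        // 0.1 should arrive as the exact decimal 0.1, not the nearest
        // f64 approximation, if the adapter parses the digits directly.
        let parsed: MyStruct = serde_json::from_str(r#"{"value": 0.1}"#).unwrap();
        let expected: BigDecimal = "0.1".parse().unwrap();
        assert_eq!(parsed.value, expected);
    }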




It's best if your parser fails.

Serde has an interface that allows failing; that one should fail. There is also another that panics, and AFAIK it will automatically turn any parse failure into a panic.

Do not try to handle huge values, do not pretend your parser is total, and do not pretend it's a correct value.

If you want to create a specialized parser that handles huge numbers, that's great. But any general one must fail on them.
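
A sketch of what that could look like with serde, assuming serde_json's "arbitrary_precision" feature (so the raw literal is available); the name bounded_decimal and the cap are made up:

    use std::str::FromStr;
    use bigdecimal::BigDecimal;
    use serde::de::{Deserialize, Deserializer, Error};

    const MAX_LITERAL_LEN: usize = 64; // illustrative cap, not a crate default

    // Wire in with #[serde(deserialize_with = "bounded_decimal")].
    fn bounded_decimal<'de, D: Deserializer<'de>>(d: D) -> Result<BigDecimal, D::Error> {
        // With "arbitrary_precision", serde_json keeps the raw literal,
        // so it can be rejected by length before any big allocation.
        let n = serde_json::Number::deserialize(d)?;
        let lit = n.to_string();
        if lit.len() > MAX_LITERAL_LEN {
            return Err(D::Error::custom("numeric literal too long"));
        }
        BigDecimal::from_str(&lit).map_err(D::Error::custom)
    }

One caveat: a short literal like 1e999999999 passes a length check and parses cheaply, but explodes in later arithmetic, so a real guard should bound the exponent too.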


This isn't about parsing so much as letting the users do "dangerous" math operations. The obvious one is dividing by zero, but when the library offers arbitrary precision, addition becomes dangerous too, because it allocates all the digits between a small and a large value:

  1e10 + 1e-10 = 10000000000.0000000001
  1e10000000000000000000 + 1e-10000000000000000000 = ...
It's tough to know where to draw the lines between "safety", "speed", and "functionality" for the user.

[EDIT]: Oh I see - fix the parser to disallow such large numbers from entering the system in the first place; then you don't have to worry about adding them together. Yeah, that could be a good first step towards safety. Though I don't know how to parametrize the serde call.


If you are using a library with this kind of number representation, computing any rational number with a repeating decimal representation will use up all your memory. 1/3=0.33333… It will keep allocating memory to store infinite copies of the digit 3. (In practice it stores it using binary representation but you get the idea.)


For the Rust crate, there is already an arbitrary limit (defaulting to 100 digits) for "unbounded operations" like square roots, inversion, and division. That's a compile-time constant. And there's a Context object for runtime configuration, which you can set with a precision (stop after `prec` digits).
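
Concretely (digits beyond the precision are dropped rather than looping forever, and with_prec re-rounds to a caller-chosen precision):

    use bigdecimal::BigDecimal;

    fn main() {
        // Division stops at the compile-time default precision.
        let third = BigDecimal::from(1) / BigDecimal::from(3);
        println!("{}", third);               // 0.333... (100 threes by default)
        println!("{}", third.with_prec(10)); // 0.3333333333
    }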

But for addition, the idea is that `a + b` gives you the complete number, while `ctx.add(a, b)` keeps the result within the context's precision. After the discussions here, though, maybe that's too unsafe... should `a + b` use the default precision (or a slightly larger one) in the name of safety, with a compile-time flag to disable it? Hmm...
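
One possible middle ground, sketched with existing accessors (the name checked_add is hypothetical, and the width estimate is rough - a carry can add one more digit):

    use bigdecimal::BigDecimal;

    // Estimate the exact sum's width before allocating; None lets the
    // caller decide whether to round, error out, or fall back to ctx.add.
    fn checked_add(a: &BigDecimal, b: &BigDecimal, max_digits: u64) -> Option<BigDecimal> {
        let (_, sa) = a.as_bigint_and_exponent(); // scale: digits right of the point
        let (_, sb) = b.as_bigint_and_exponent();
        let width = a.digits().max(b.digits()) + (sa - sb).unsigned_abs();
        if width > max_digits {
            None
        } else {
            Some(a + b)
        }
    }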


I'd strongly recommend against this default - it's a major blocker for using the Haskell library with web APIs, as it turns JSON RPC into a readily available denial-of-service attack.

8 billion digits is far more than should be used; something on the order of 100 bits would cover most real needs.

Would it be possible to use const generics to expose a `BigDecimal<N>` or `BigDecimal<MinExp, MaxExp, Precision>` type with bounded precision for serde, and disallow this unsafe `BigDecimal` entirely?

If not, I expect BigDecimal will be flagged in a CVE in the near future for causing a denial of service.
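
Something along these lines might work as a wrapper without touching the core type (hypothetical sketch; note that digits() only caps the mantissa, so a real version should bound the exponent as well):

    use bigdecimal::BigDecimal;

    #[derive(Debug)]
    struct TooManyDigits;

    // Hypothetical bounded wrapper: the const parameter caps significant
    // digits at construction, so a Deserialize impl on this type could
    // refuse oversized inputs instead of exposing unbounded BigDecimal.
    struct BoundedDecimal<const MAX_DIGITS: u64>(BigDecimal);

    impl<const MAX_DIGITS: u64> BoundedDecimal<MAX_DIGITS> {
        fn new(d: BigDecimal) -> Result<Self, TooManyDigits> {
            if d.digits() > MAX_DIGITS {
                Err(TooManyDigits)
            } else {
                Ok(Self(d))
            }
        }
    }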


I think that's the use case for the rust_decimal crate, which is a 96-bit decimal number (~28 significant digits): safer and faster than the bigdecimal crate (which at its heart is a Vec<u64> - unbounded, and geared more toward things like calculating sqrt(2) to 10,000 places). Still, people are using it for serialization, and I try to oblige.

Having user-set generic limits would be cool, and it's something I considered when const generics came out, but there's a lot more work to do on the basics, and I'm worried about making the interface too complicated. (And I don't want to reimplement everything.)

I'd also like a customizable parser struct, with things like localization, allowing grouping delimiters and such (1_000_000 or 1'000'000 or 10,00,000). That could also return some kind of OutOfRange parsing error to disallow "suspicious" out-of-range values. I'm not sure how to make that generic over the serde parser, but I may add some safe limits to the auto-serialization code.

Especially with JSON, I'd expect there are only two kinds of numbers: normal "human" numbers, and exploit attempts.


I think Haskell's warning-in-the-docs approach is not strong enough. I'd be in favor of distinguishing small and huge values using the type system: have a Rust enum that contains either a small-ish number (absolute value 10^100 or less, say, though the threshold should be configurable, preferably as a type parameter) or a huge number. Then the user is required to handle it. Most of the time the user does not want huge numbers, so they will fail the parse explicitly when they match and find one.
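
A sketch of that shape, with the threshold hardcoded for brevity (a type parameter, as suggested, would be the nicer version):

    use bigdecimal::BigDecimal;

    enum Parsed {
        Small(BigDecimal), // |x| <= 1e100: fine for ordinary arithmetic
        Huge(BigDecimal),  // caller must opt in to these explicitly
    }

    fn classify(d: BigDecimal) -> Parsed {
        // "1e100" parses to mantissa 1 with a large exponent - cheap to build.
        let limit: BigDecimal = "1e100".parse().unwrap();
        if d.abs() <= limit {
            Parsed::Small(d)
        } else {
            Parsed::Huge(d)
        }
    }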


That seems to be the sentiment here. I'll take it into consideration. Thanks.


I don't think there is any "sensible limit" which is big enough for everyone's needs, but low enough you won't blow out memory.

An 8-billion-digit number is about 3.3 GB in binary (8e9 digits x log2(10) ≈ 2.7e10 bits). All I need to do is shove 1,000 of those in a JSON array, and I'll cause an out-of-memory anyway.

On the other hand, any limit low enough that I can't blow up memory by making an array of 100K or so of them is going to be too low for some people (including me; I often work with numbers a few million digits long).

Providing some way of setting a limit seems sensible, but maybe just make a LimitedBigDecimal type, so that throughout the whole program there's a cap on how much memory BigDecimals can take up? (I haven't looked at the library in detail, sorry.)


If I understand the situation correctly, in Haskell an unbounded number is the default that you get if you do something similar to JSON.parse(mystr). That means you can have issues basically anywhere. Whereas in Rust with Serde you would only get an unbounded number if you explicitly ask for one. That's a pretty major difference. Only a small number of places will explicitly ask for BigDecimal, and in those cases they probably want an actual unbounded number. And they should be prepared to deal with the consequences of that.

My 2 cents, anyway.


Nope, you didn't understand the situation correctly. First, almost nobody parses directly from a string to a JSON AST: people almost always parse into a custom type using either Template Haskell or generics. Second, parsing isn't the issue; doing arithmetic on the number is the issue.


Surely the generics approach would go via an aeson Value as an intermediate format, and thus possibly store an unbounded Scientific.


Storing it isn't the problem.



