Hacker News new | past | comments | ask | show | jobs | submit login
Toward an API for the Real Numbers (acolyer.org)
171 points by azhenley 5 months ago | hide | past | favorite | 81 comments



Fun fact: The Boehm of this paper is the same Boehm as the garbage collector

Exact Reals are an interesting topic. They go beyond fractions and and algebraic numbers by representing numbers as some kind of object that supports querying an arbitrary number of digits. This object may be a stream, generator, or function. There is some similarity between its implementation techniques and those of automatic differentiation. The theory behind exact reals goes quite deep.

Some other interesting things to look at: - https://hackage.haskell.org/package/exact-real - https://www.youtube.com/watch?v=h7g4SxKIE7U - https://github.com/dpsanders/ExactReals.jl


> Boehm

Also the creator of the rope data structure for storing and efficiently mutating large strings (used by many text editors); as well as the c memory model.


Simon Tatham has an exact real calculator called “spigot” https://www.chiark.greenend.org.uk/~sgtatham/spigot/


The API itself is perhaps less interesting than the implementation strategy. I'm presently developing a C library for exact real and complex numbers (http://fredrikj.net/calcium/) that goes much further to make equality decidable even in complicated cases while also enabling efficient computations. Still working on implementing functionality and writing up a formal paper.


Looks like a cool library. I'm curious why the Android calculator app needs reliable exact equality tests. It doesn't appear to have an equality test function.

Edit: the paper claims that you need exact equality tests to avoid printing unnecessary zeros at the end of decimal expansions, among other things.

Interesting that this work was done by Hans-J. Boehm, better known for the widely used Boehm garbage collector.


The equality test is equivalent to deciding if something is 0. You can probably ask the app to calculate sqrt(2) * sqrt(2). If that doesn't show a 2 as answer, you've lost. Same goes for sin(arcsin(1/2)). You want the exact answer.


Equality is also needed if you want to know whether "x<y".

Proof: 1) suppose you can decide whether "x<y" for all x,y. 2) Then, for all x,y you can decide whether x==y because that's just "!(x<y) && !(y<x)"


Quote: "Operations complete in under a millisecond, not GFLOPS performance, but certainly good enough for many applications".

Then you have a scientist reading this, wants it for research and then wonders why a simple 3 lines in Python code takes 3 days to complete. True story!


If these 3 lines of Python give you correct results...

I keep wondering and never have this answered to my satisfaction: if so many people are just using floating point numbers in scientific calculations, and arguably even most scientists don't understand FP math any way better than the users of Android's Calculator app, then how many bad results are out there, that could be attributed to FP errors?

Related: I don't understand why the default in scientific computing isn't some kind of scheme with error band propagation. What I mean is, when dealing with measurements, the result of 2.1 * 2.12345 isn't 4.459245, because the result exceeds the precision of the first factor. I'd like a common API that handles this, whether implicitly (through notation) or explicitly (through requiring you to state the precision of every number you use).


It seems strange that they don't use a rational type, those are much easier for "normal" people to reason about.

What you want would maybe be better as probability distributions (maybe parameterized with rational numbers)


Probability distributions are a related concepts, and I'd love to work with them as first-class objects. But just carrying the confidence intervals around would be a huge win; even ignoring the underlying distribution, it matters whether your result is 1.23 ± 0.0001, or 1.23 ± 20.


In "safe" languages we often talk about how you can't ignore errors in return values. Your program "leaks" if that happens. But with integer overflows and floating point precision, we just ignore the leaks as something that programmers already know about and have to consider while coding. Of course, it's not an easy problem to deal with, as computers don't have infinite memory and they need to be performant, but we really need to spend more time thinking about and supporting improvements like this. If it only takes 6k lines to provide an API for more precise real numbers, then almost every programming language should have it. The ideas of allowing programmers to get information about the real and potential precision errors in operations are also interesting. Well, in general the ability to define the kind of numbers you want to be working with and their precision is fairly poor in computer science as a whole, there's still a long road ahead.


> the ability to define the kind of numbers you want to be working with and their precision is fairly poor in computer science...

It's poor because it's undecidable. We would ideally want to work with the real numbers, but no computational model can represent all real numbers (by a trivial cardinality argument).

Obviously, we can make useful progress in practice and evolve beyond the floating point representation (at least for general tasks), similar to what we did for other theoretically undecidable problems (like garbage collectors), but let's not pretend it's something that's easy to fix.

The paper is interesting and talks about very real problems, but it's still a far cry from something that can be used in production.


> but no computational model can represent all real numbers (by a trivial cardinality argument)

No computational model can represent all integers, either, by a simple "number of atoms in the universe" argument.

Cantor isn't the problem here.

----

Separately, I'm almost certain that exact real arithmetic is being used in Houdini to do boolean intersections. That's as "production" as you can get.


By the same token computers are finite state machines? Yes, technically they are, but it's not really useful to conceptualise them this way.

It's relatively easy to make a model that works essentially the same as the actual integers (most modern programming languages have a built-in or stdlib variant of bignum), while it's prohibitively difficult to do the same with real numbers. And yes, computability is in fact the difference here, because as long as you are able to keep up the illusion of infinite memory, the integer version problem is trivial, but the real version stays just as hard.

-- edit answering the edit: I'm not familiar with that application, but if I'm not mistaken graphical software uses a different model than the one outlined in the OP, based on algebraic approximations. I'm not saying this stuff doesn't exist, obviously. I was talking about the general rationale behind the OP, i.e. using a model like this one instead of floating point computation for generic software.


The issue is one of representation and algorithms. For integers, you can provide a simple representation together with simple algorithms for addition, subtraction, multiplication, division with modulo, printing in decimal form, comparison which one is larger, etc.

For real numbers, we can come up with precise representations as well. Basically ASTs of permitted operations like addition, root of this polynomial, integral of this function, etc. If your set of permitted operations is large enough, you can represent any number that human language can encode. But operations on those representations are non-trivial.

Basically, for reals there is no representational space which is easy for "powerful" operations to navigate. For integers, there is the n-ary representation. But even that representation makes some operations non-trivial. What is the next larger prime number? What are a given integer's prime factors? Switching the representation to represent integers through their prime factors makes answering that question easier, but then suddenly other questions become non-trivial like addition of two numbers.

None of the two issues has anything to do with cantor.


> no computational model can represent all real numbers (by a trivial cardinality argument).

I agree with your points about this problem being difficult. But the cardinality argument is neither here nor there. The cardinality argument is based on classical logic, and has no relevance to computation.

In constructive mathematics it is perfectly possible that all real numbers can be represented by finite algorithms. And even that all functions defined on all the reals are continous: a finite constraint of the output can be computed from a finite constraint on the input.

But, starting from a definition of the reals as a Cauchy sequence we have a long way to go to get something which can be practically computed on. I think RealPCF [0] was a step in the right direction.

[0]: https://www.cs.bham.ac.uk/~mhe/papers/realpcf.pdf


Again, I don't disagree with any of this. The argument advanced by the OP is that floating point numbers are not intuitive to use for the general public because they don't work like the unique complete ordered field up to isomorphism.

All I'm saying with the cardinality argument is that because real numbers are fundamentally unlike integers, obvious methods of doing things won't work, and we are stuck (at least for the time being) with leaky abstractions.

I make no claim about the existence of better/more intuitive solutions. My only counterpoint is that while the ideas outlined in the paper are certainly valuable, I don't think the framework is mature enough to replace the default representation for real numbers in a programming language like Java, at least not without further work and/or proof that it can be adapted to a more general setting without disastrous performance decay.

> But, starting from a definition of the reals as a Cauchy sequence...

I trust that what you're saying is true, but I'm not familiar with that topic, I'm sorry. Thanks for sharing the paper.


> The paper is interesting and talks about very real problems, but it's still a far cry from something that can be used in production.

It’s used in production in the default Android calculator app. Does that not count?


It does count, and it's certainly "points" for the ideas there (not that the author needs my validation...)

But an API for a language like Java needs more generality than that. The use case was (relatively) simple, (relatively) predictable, and (relatively) self-contained.

Crucially, equality in general is still undecidable (it wouldn't play nice with Collections), and it's not obvious that numerical algorithms would in general play nice with it. I'm also not really convinced that having random functions that could either terminate or not in your API is really more intuitive than floating points, but de gustibus non disputandum est.

There's also the issue of performance in non-toy examples. With the world going more and more the route of ML, efficiency of numerical algorithms is not a secondary concern.


> it's not obvious that numerical algorithms would in general play nice with it

Perhaps. Some algorithms in numerical analysis are only "backwards stable" as opposed to forwards stable. This includes all possible algorithms for things like Singular Value Decomposition. See: https://mathoverflow.net/questions/369930/why-is-uncomputabi...

> With the world going more and more the route of ML, efficiency of numerical algorithms is not a secondary concern.

ML is going in the direction of very low precision floating point numbers, like 16 bit floats. High precision doesn't seem to matter there.


Surely working with computable reals is sufficient? There's no point in writing a computer program that attempts to work with non-computable reals, for obvious reasons. And computable reals are countable.


It certainly is. The point is, is it necessary? (To be really pedantic one could ask which computable reals, but let's assume the usual definition, i.e. the set of reals defined by a computable Dedekind cut.)

Although computable reals have many properties shared by the actual real numbers (e.g. they are a field, which is not that trivial of a property), they are also different in very significant ways, for example, they aren't computably comparable, they aren't computably enumerable, their order is not computable.

Also, real analysis works very differently on that set, for example, it is not guaranteed that the least upper bound of a bounded increasing sequence of elements of the computable reals is itself a computable real.

Even ignoring feasibility in practice, working with non-standard mathematical structures is very unintuitive, much more than floating point numbers can be. (Let alone reasoning about the time-complexity of operations in that setting!)


That may be a valid theoretical objection, but Java does not have a "least upper bound" function, so its computability is not of practical interest to designing a real number Java API.

Real Java programs produce numbers via arithmetic and finite applications of java.lang.Math functions. Those are the numbers that this API hopes to handle.


It does have Double::compare though, doesn't it?

The issue isn't writing Java methods that do stuff, the issue is writing Java methods that expose an API that can be reasonably used to model a real-world problem that uses real numbers.

I'm not saying it can't be done. Much in the same way as writing a garbage collector is theoretically an undecidable problem and yet we have no lack of garbage collectors around, I expect research in this topic to move forward and eventually produce algorithms and methods that can be put in production software.

The point I'm trying to make is that (yet again, pretty much like garbage collectors) you are serving multiple masters here. We want the API to be a reasonable approximation to the actual real arithmetic, while at the same time being efficient enough that it's usable in practice, while at the same time being intuitive enough to match people's expectation of how real numbers 'should' work.

Again, we already have a solution to that problem. It's called 'using floating points'. The case being made is that 'using floating points' is hard and prone to error, because it satisfies requirements (1) and (2) but fails spectacularly at (3).

Of course people with (relatively) sophisticated math knowledge are going to be able to work it out no matter what, the same way people are able to work with floating points right now, and very efficiently so. We all took Numerical Analysis in college. The issue is that the kind of problems raised by using floating points are hard to think about, require very specific knowledge, and cause bugs that are hard to even detect.

Because a model that considers all computable real numbers would be similarly unintuitive, we should restrict ourselves to a smaller set that works in a more intuitive way. How should that set work is a HARD problem.


> Because a model that considers all computable real numbers would be similarly unintuitive, we should restrict ourselves to a smaller set that works in a more intuitive way. How should that set work is a HARD problem.

I'd argue that we actually have a pretty good answer: decimal floating point satisfies all three of your requirements.


Equality of computable reals is undecidable.

https://mathoverflow.net/questions/40618/uncomputability-of-...


So what? Equality of floating point numbers is only "decidable" if you don't care about getting the right answer.

Likewise, equality of functions is also undecidable, but you never see people bringing that up in the context of Higher Order Functions.


> It's poor because it's undecidable. We would ideally want to work with the real numbers, but no computational model can represent all real numbers (by a trivial cardinality argument).

This is a fallacy. The standard model for Computable Real Analysis, Type Two Effectivity, supports computation over all real numbers. See: https://en.wikipedia.org/wiki/Computable_analysis#Real_numbe...

Essentially, in TTE, a real number is an arbitrary stream of digits that can come from any source. Therefore any real number can be computed on.

Additionally, the ability to decide equality of floating point numbers is an illusion. Rounding errors can make numbers that ought to equal each other not equal each other.


You can compute on any real number you're given, but IIUC that's not what "computability" means. By the cardinality argument above (there are more infinite streams of digits than finite streams of digits, in an important sense) there will exist streams of digits that you cannot produce by any program (regardless of how you encode it).


You can produce an arbitrary infinite stream using a random number generator, for instance.


The probability of generating any given infinite stream is 0, though.


But, assuming you have a true random number generator, so is the probability that a stream you generate is computable.


Yes.


Where is the fallacy? I don't doubt type-2 effectivity is used in the way you mention, but that wasn't what I was talking about. See my other reply.


> In "safe" languages we often talk about how you can't ignore errors in return values.

It’s pretty important to define what you mean by “error”. We should not conflate state or execution errors with approximation errors, they are very very different kinds of errors. It is quite often not an error condition to have an approximation, and so would not count as a “leak”.

> If it only takes 6k lines to provide an API for more precise real numbers, then almost every programming language should have it.

There are a lot of libraries in a lot of languages for supporting arbitrary precision, big numbers, and other various representations. I’d venture to say that almost every programming language does have it, if you count a third party or open source library as having it.

On the other hand, high precision reals is a performance drain of enormous magnitude if you don’t always need it. Do keep in mind that even with all their problems, 32 bit floats and 64 bit doubles are really fast compared to software number libraries, the IEEE formats have broad hardware support that doesn’t exist for high precision reals.


> There are a lot of libraries in a lot of languages for supporting arbitrary precision, big numbers, and other various representations. I’d venture to say that almost every programming language does have it, if you count a third party or open source library as having it.

Arbitrary precision is very different from exact real arithmetic.


Sure I agree completely and didn’t mean to suggest otherwise. Most languages have implementations of all the above - attempts at certain types of exact real, arbitrary precision, and big integers. It’s worth noting that this library is made out of big nums and rationals, so it tries to be exact when it can, but it can’t always, so it may not be accurate to call it exact real arithmetic. That’s why the title is “toward” real numbers and didn’t claim to be exact.

And I’m curious whether you meant that exact real arithmetic is different in such a way that it would change either of the points that I made? Does it mean that performance is good enough that all languages should have it built in? Does it mean that approximation errors should then be considered the same as execution errors?


> But with integer overflows

Some languages give exceptions here, unless specially told not to.


(sin⁻¹(cos⁻¹(tan⁻¹(tan(cos(sin(9))))))−9)^−1 produces "Timeout. Undefined?". That's the closest I could get to baffling or tricking it. (it's division by zero)


Is there a typo in that? Where's the division by zero?

Wolfram alpha has that as exactly equal to

    1/(3pi - 18)
https://www.wolframalpha.com/input/?i=%28sin%E2%81%BB%C2%B9%...


It's multi-valued, I suppose. tan^-1 (tan (x)) is equal to x plus any integer multiple of pi (or of 180°). Same for the other pairs of functions. But one of the values results in a division by zero.


You did it in radians. Try degrees.

Edit: (Or use 0.15708 instead of 9 in your link.)


cos is not the inverse of sin, cos(sin(9)) is not 9


I think the intended simplification is : (sin⁻¹(cos⁻¹(tan⁻¹(tan(cos(sin(9))))))−9)⁻¹ -> (sin⁻¹(cos⁻¹(cos(sin(9))))−9)⁻¹ -> (sin⁻¹(sin(9))−9)⁻¹ -> (9−9)⁻¹ -> 0⁻¹ -> 1/0


I see, right. This is curious.


Tangentially related, the stock Android calculator app is fantastic. It has good accuracy as mentioned in the article and an easy to explore and use history. It is one of the few things that will make me hunt around for my phone when I am working on my desktop.


As a refugee from Windows Phone I actually use Uno Calculator on Android, which is a port of Windows Calculator. It also has good accuracy and easy to explore history, and also sane interface without Android's idiosyncrasies. And you can install it on your presumably non-Windows desktop too.


Why? Your operating system of choice lacks a proper calculator on your desktop PC?


Yes. I use gnome personally and macOS at work and both default calculators are not nearly as nice to use.


Then it's probably a good time to write one yourself. Or use wine to run the one from Windows.


Another example of how to solve this problem is the rational number library used in the Microsoft Calculator. It is called “RatPak” and it is open source:

https://github.com/microsoft/calculator/blob/master/src/Calc...


Rationals are a strict subset of the reals.

There are also the irrationals, and most of those cannot even be expressed (fortunately, since they cannot be expressed, there's no chance of you bumping into them by accident - so it's somewhat of a moot point).

The subset that you want to aim for is not the rationals, but the definable reals: https://en.wikipedia.org/wiki/Definable_real_number.


The comments on that article are frustrating because they clearly didn't read it, by proposing that a rational type is sufficient.

"We solved this 40 years ago in SmallTalk!"


I am surprised not to see any mention of the numerical tower, used by some lisps. Did I just miss it?


Common Lisp and Scheme don’t have built-in support for exact real arithmetic


Doesn't the nature of the tower mean you could add it in and it would just slow down computations that required it? I'm guessing the dynamic nature of the languages is the factor in leaning on?

That is, both have the surprise for most people that (/ 1 3) doesn't get rounded or truncated by default, due to the default of supporting rationals. You have to coerce into a floating point, right? Couldn't you do the same with exact reals? Put them at the top of the tower and maintain existing projections through the tower?

Edit:. Note that I don't think this precludes the algorithms here. Just that a type system that could use the tower could be augmented to get here. If that makes sense. Hence why I would think it would be mentioned. It is the expectations that things project in one direction that folks want? Right?


Aren't there many applications where Java's BigDecimal is a simple and suitable implementation to avoid many of Float's problems?


What about NaN values? Are the runtime errors or do they propagate down the chain of execution? Are they comparable?


For a high-level API, shouldn't this be expressed with types such as a `Result<T,E>` at the library level?

edit: now that I think about it, that would require some really nice optional chaining syntax to be even slightly usable.


Oh now I'm curious if IEEE 754 NaN rules are isomorphic to a number optional type with monadic chaining syntax


Basically. And you can stick error codes in the value too, so you can know down the line what caused the NaN.


There are no NaN values in the real numbers.


You need some way to cope with operations that don’t have real results for some values, such as log and sqrt of negative numbers


you can just represent those as a pair of real numbers, one for the real part, and one for the imaginary part. As long as you only (recursively) apply exponential and trig functions on those numbers, I believe you'll stay within the realm of complex numbers.


True, but I thought this was about real arithmetic not complex arithmetic


0/0, tan(pi/2), log(0)


I was replying to the sqrt and log of negative numbers. For undefined functions, I'd throw an exception


The paper says the Java code throws exceptions in such cases.


Representing reals as fractions (rationals) goes a long way toward achieving this goal.


The point of the paper is that is nowhere near enough for common calculator operations like trigonometry or exponentials


The real numbers are notoriously non-equivalent to the rational numbers.


sure, but supporting rational numbers solves for 2/3 of the types of "real" numbers in a dramatically better way than just making less crappy decimal representations.


I haven't read the paper yet, but if I understand the post correctly, this interface represents numbers as the product of a rational and a recursive real, where the latter may be 1.


IEEE 754. If you're going to do any serious computation in floating point, serious meaning people's lives are going to depend on the outcome or a lot of money is on the line, then you MUST know what you're doing. We have standards in place so people can learn it, people creating math libraries can adhere to it, and those creating these kinds of applications can have confidence that things are going to work. I'm sorry addition isn't commutative, there are known ways for dealing with it. I took two courses in Numerical Computing in college. Trust me, the people working in this space are serious, and really, there's a few basic rules of the road you need to learn and you're on your way.


> I'm sorry addition isn't commutative...

My understanding is that in IEEE 754 compliant implementations of floating-point arithmetic, commutativity over "+" holds but not associativity. (With the possible exception to commutativity for the NaNs, though I don't remember offhand whether this caveat is necessary).


You are correct. It's associativity that doesn't hold, not commutativity. It really throws people off.


It's a lot more than "a few basic rules of the road". Why squander your attention budget on a numeric type with so many caveats and edge cases? What do you gain from it? If you actually need the performance of IEEE 754, then sure, use it, and have fun debugging all of the problems it has. But it's an extremely poor default for general-purpose programming.


I disagree. IEEE 754 is a fantastic solution if you want a no-brainer plug-and-play approximation of the real numbers.

Most people who complain really just want a rational number type.


Immaturity - it's the plague of our industry. The experts in floating point arithmetic, each having years of real-world experience, came together and hashed-out a standard embodying their collective experience. Yes, it's complicated. Yes, you need to learn it. No, you're highly unlikely to come up with anything better. Moreover, even if you did come up with something better it WILL have bugs, there WILL be edge cases that catch everyone by surprise. So instead of using a standard having 35 years worth of collective experience, whose edge cases are well-known, whose implementation is literally etched in silicon, and thousands upon thousands of libraries utilizing it - we'd rather start over and re-invent. That way we can spend the next 35 years uncovering a whole new and unknown set of issues. Sigh.


A cultish devotion to "performance" and a refusal to revisit tradeoffs that were for hardware literally 10 million times slower than today's systems is the real plague of our industry. Yes, IEEE 754 was a towering technical achievement in its day (though its use is marred in practice by an industry that implemented only the most problematic of its rounding modes and ignored the trapping mode that should have been the default as soon as hardware performance improved a little), no-one's denying that. But it's got a lot of sharp edges (that were necessary at the time; indeed IEEE 754 makes significant performance compromises compared to e.g. Cray non-754 floating point modes) that a mature industry would not expose by default.




Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: