Unum Computing: An Energy Efficient and Massively Parallel Approach to Numerics (slideshare.net)
57 points by bane on July 24, 2015 | 54 comments

Are there any more details on unums available online (i.e. without purchasing his book)? The presentation is quite heavy on promotion, but a bit sparse on details.

I also note that he dedicates a page on his website to "Gustafson's law". http://johngustafson.net/glaw.html

Not many. I did go ahead and purchase his book. It's... kind of a weird read, honestly. Useful - it clarified some things about the proposed Unum format - but also written very casually, like "pop science" prose, which seems inappropriate - who would buy a book about a floating-point number format if they didn't want a dry, boring book full of technical details? Some of the arguments are made as though to convince a non-technical audience - maybe Gustafson wants managers (or former engineers who haven't worked as engineers for a long time) to read his book, but I think it would have been better to publish all of the details without any fluff, first.

Also worth noting that half of the book is about the "ubox" method for solving optimization problems - also cool, but may be overkill if you are just interested in the numeric format itself. Personally, I've been working on an implementation of the format that I can toy around with - I have no real interest in learning a lot about the cool algorithms I could do with it until I can show myself that it works for basic arithmetic, etc, as well as the author claims.

Gustafson also makes the code available (I think [here](https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gus...)). It's Mathematica code... there is a free viewer for that format (if you don't have Mathematica) which can print out a PDF with richly formatted equations.

Also, Googling for that link led me to this [Python implementation someone whipped up](https://github.com/jrmuizel/pyunum).

Some would call it an enjoyable read. Definitely not the typical stodgy format of papers, but addressing the reader and throwing a few jokes in doesn't hurt the technical validity. He wrote it so it could be accessible and defends it as such in the prologue. Dry papers get passed over these days, I'm afraid.

That's true - and I did enjoy it. I'm biased - still in university, so I'm used to dry papers. The lack of stodginess wouldn't have bothered me if I had been able to obtain key details on the format for free. I was a little annoyed to have to buy a 50-dollar paperback book instead of just downloading a short paper via the college library, though - I was convinced of the potential benefits of the format by one of Gustafson's earlier presentations and want to help in the efforts towards software/hardware implementations, so I didn't need enhanced accessibility to remain interested in the book.

If you are not in HPC, you may not have heard about Gustafson's law, but it is a legit and important thing... His 1988 paper is still a very worthy read (and it won the first Gordon Bell award for showing a 1000x speedup, which was previously thought impossible).

Fair enough: a pity he doesn't explain what it is though.

Wikipedia is your friend... https://en.wikipedia.org/wiki/Gustafson%27s_law
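For anyone who doesn't want to click through: the law is a one-liner. Here's a minimal sketch of the scaled-speedup formula as given on that Wikipedia page (the function name is mine):

```python
def gustafson_speedup(n_procs, serial_fraction):
    """Gustafson's law: S(N) = N + (1 - N) * s, where s is the
    fraction of the (scaled) workload that stays serial."""
    return n_procs + (1 - n_procs) * serial_fraction

# With a 1% serial fraction, 1024 processors still give ~1014x speedup,
# because the parallel part of the problem grows with the machine,
# unlike Amdahl's law, where the problem size is held fixed.
print(gustafson_speedup(1024, 0.01))
```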

Would be happy to explain it further, but about to get on a flight.

Are there architectural or other challenges to building processors for this? How compatible are unums with modern processors? Would LAPACK et al have to be re-written?

It requires more complexity than what is in modern FPUs, but it is arguably more efficient in operation, since it can deliver the same accuracy while using fewer bits.

Most people don't realize that data movement is the most expensive thing in a processor... It takes 100 picojoules to do a double-precision (64-bit) floating point operation, but a humongous 4200 picojoules to actually move those 64 bits from DRAM to your registers. The really crazy thing is that around 60% of the power used to move the data is wasted in the processor itself, in the logic powering the hardware cache hierarchy. My startup (http://rexcomputing.com) is solving this with our new processor, and we are working with John Gustafson on experimenting with unums for future generations of our chip.
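Just to make the disparity concrete, here is the back-of-envelope arithmetic implied by the figures quoted above (the numbers are the commenter's claims, not independent measurements):

```python
# Energy figures as quoted in the comment above (claims, not measurements):
flop_pj = 100        # one 64-bit floating point operation
dram_move_pj = 4200  # moving those 64 bits from DRAM to a register

print(dram_move_pj / flop_pj)    # moving the data costs 42x the compute
print(dram_move_pj * 0.60)       # ~2520 pJ of the move is burned in the
                                 # cache-hierarchy logic itself
```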

There's an even worse disparity in the amount of time required for those two processes. How is that ratio affected by your technology?

We bring the ratio down as well... A load/store to a core's local scratchpad (our software-managed and power-efficient version of a traditional L1 cache) is 1 cycle, compared to 4 cycles for an Intel processor. Add in the fact that we have 128KB of memory per scratchpad (compared to 16 to 32KB of L1 D$ for Intel), and you don't need to go to DRAM as much, greatly increasing performance/throughput on top of the 10x+ efficiency gain.

Even in the case of a core accessing another core's local scratchpad when they are on opposite corners of the chip, it takes only one cycle per hop on the Network on Chip... meaning for our 256-core chip, you can go all the way across the chip (and access a total of 32MB of memory) in 32 cycles, less than the ~40 cycles it takes to access L3 cache on an Intel chip.
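A rough model of the claim, assuming a square 2-D mesh NoC at one cycle per hop (the topology details are my assumption, not something stated in the comment):

```python
import math

def worst_case_noc_cycles(n_cores):
    """Corner-to-corner latency on an assumed square 2-D mesh,
    at 1 cycle per hop: roughly one full traversal per dimension."""
    side = int(math.sqrt(n_cores))  # e.g. 256 cores -> 16x16 mesh
    return 2 * side

print(worst_case_noc_cycles(256))  # ~32 cycles, matching the figure above
```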

Comparing a 1 cycle scratchpad latency to Intel's 4 cycle L1 latency is misleading. Are you making chips that operate at up to 4GHz, or is this just 128KB of local SRAM attached to a ~1GHz core?

If you care about efficiency, then you are burning significantly more energy to run at 4GHz, and are still moving the same amount of data in and out of your local memory. If that is your bottleneck, then you are running at 4x the clock speed for no real gain, as your memory can't keep up with the speed of your functional units.

But to answer your question directly, we are targeting 1GHz conservatively...we think it could do more, but as we are focused on efficiency, we think it is a good middle ground between performance and energy usage. We'll be able to make a more informed decision (and possibly change that) when we have silicon in hand.

If you have 4 GHz of IEEE processing but get a wrong answer at the end, how do you compare that to a smaller number of FLOPS with a rigorously-bounded answer? I'm not saying they're apples and oranges; I'm saying you can't judge the quality of two foods on calorie count alone.

Smart that you all used a scratchpad. I've been encouraging their use for years now for the higher efficiency and predictability.


The same way you store some "metadata" by attaching x10^23 to the value 6.022... here you are adding metadata about precision (certainty), resolving a ton of NaN and error issues with IEEE.

My understanding is that it's not compatible with modern processors, and new hardware would be required to efficiently execute it.

It seems to be a superset of floating point (and fixed point?) though, so maybe LAPACK could still work? I may be misreading or misunderstanding this however.

New hardware would be needed. I think libraries like LAPACK basically are fast because they do a good job taking advantage of the hardware implementation of floating point math, and are written with sufficient understanding of the format to mitigate approximation errors. Therefore you would probably need to rewrite huge parts of such libraries to use the new Unum hardware or software - though doing so might be relatively straightforward - as you say, Unum can be seen as a superset of floating point and it does support all of the same operations.

The big issue I see with these is that the number of bits is dependent on the stored value. It's also not a power of two. This has a lot of problematic consequences. Indexing into a list of them wouldn't be constant time, for example. You'd need to unpack them (into fixed size unums?) first.

That said, information here is sparse and I'm not an expert on numerical computing (although I do graphics at work and know some about the subject).

There is a whole chapter on "fixed-size unums" and the issues of making them just as easy to index in arrays as floats are. It really is not an obstacle; it's just more work for the CPU to manage pointers and variable sizes, very similar to the way computers made the transition to displaying variable-width fonts in the 1980s, at a time when many asserted that it would be horribly difficult and inefficient and require pixel-accurate placement instead of landing letters in 6x8 pixel blocks.

For any unum system there's an "architecture spec" that puts parameters on the maximum sizes of the exponent and mantissa. You can have values that consume fewer bits: the format contains a tag that says how many bits you are consuming for each part of your number. For exact values, a smaller size just means fewer bits are needed; for uncertain values, it means the uncertainty is higher.

You could pad your values to achieve constant time indexing.
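A minimal sketch of the padding idea (the byte widths and helper names here are illustrative, not the real unum encoding): packed variable-length values lose O(1) indexing, but padding each one to the environment's maximum size restores it at the cost of some wasted space.

```python
# Hypothetical sketch: variable-length unum bitstrings, modeled as Python
# bytes. Packed tightly they can't be indexed in constant time, but padding
# each to the environment's maximum width gives fixed-size slots.
MAX_BYTES = 8  # assumed maximum unum size for this environment

def pad(unum_bits: bytes) -> bytes:
    """Right-pad a packed unum to the fixed slot size."""
    assert len(unum_bits) <= MAX_BYTES
    return unum_bits + b"\x00" * (MAX_BYTES - len(unum_bits))

def index(arr: bytes, i: int) -> bytes:
    """O(1) lookup into an array of padded unums."""
    return arr[i * MAX_BYTES:(i + 1) * MAX_BYTES]

# Three unums of different packed widths, stored in one padded array:
arr = pad(b"\x3f\x80") + pad(b"\x41") + pad(b"\x40\x49\x0f\xdb")
print(index(arr, 1))  # the second slot, zero-padded to 8 bytes
```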

This is exactly right - there's an environmental configuration. Think of the way it's ridiculous to play angrybirds with 32 or 64bit precision on your phone, you could set the precision appropriately. Or on the other side, think of how IEEE lops off data from every multiply or divide (any precision) without any point of reference for uncertainty.

Slide 21:

> I have been unable to find a problem that breaks unum math.

Then perhaps try (x+y) != x, where x is a very large and y is a very small positive number.

Suppose x and y are exact numbers. Then x+y is represented as the open interval (x, x+ULP) where x+ULP is the smallest representable exact unum greater than x. Since x is disjoint from the open interval (x, x+ULP), it satisfies the inequality.
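The argument above can be illustrated with a toy model (this is my sketch, not the real unum encoding): represent a result as a value plus a "ubit" flag, where a set ubit means the true value lies in the open interval between the value and its successor.

```python
# Toy illustration: (value, ubit), where ubit=True means "the true result
# lies strictly inside the open interval (value, nextafter(value))" rather
# than being exactly `value`.
def unum_add(x, y):
    s = x + y
    exact = (s - x == y) and (s - y == x)  # did the double add lose anything?
    return (s, not exact)

big, tiny = 1e300, 1e-300
value, ubit = unum_add(big, tiny)
# The double sum collapses back to x, but the ubit records inexactness,
# so the unum result is the open interval (x, x+ULP), which excludes x,
# and the inequality (x + y) != x survives.
print(value == big, ubit)
```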

Folks, if you don't want to buy the book, Amazon lets you do a "look inside the book" that gets enough introduction to explain the unum format, for free.

This is funny, but even Mathematica gives a bad answer for that equation from slide 25...

There are actually quite a number of places where Mathematica gives a wrong answer and unum math does not! For example, if you ask Mathematica to find all real values of x for which 1 == 1, it returns the empty set. Unum math correctly returns the entire real number line.

Great! Welcome to HN. Since you're here and I posted your slides, I'm wondering what you see as the most significant challenges towards implementation of this idea and what kind of performance delta (plus or minus) we might see over the existing standards?

I suppose that's the consequence of the ubiquity of IEEE floating point standard... I worry that we will have to live with that for some time still...

> Complete representation of all real numbers using a finite number of bits

He's 1/3 of the way to three impossible things before breakfast by slide 15.

All real numbers are representable, but not necessarily to arbitrary precision.

You should really read the book. A lot of the atomic operations seem cumbersome, because the unum doesn't have a fixed size representation, but you just get over that when you remember that the real problem is shuttling data over buses. More compute transistors is not a problem.

If I had a number system of -100, 0, and 100, could I claim all real numbers are represented, just not to arbitrary precision? With n bits, there are 2^n possible representations max. If you can say all real numbers are represented for n around 30, or 60, or even 1,000,000, then can't you say the same for n equal to 1 or 2?

Now I think we can say some representations are better than others (at least for certain applications). It may even be possible that some representation with a given n can be better in all cases than a representation with a larger n. So an n equals 1 system will obviously be terrible compared to many systems with n around 30 or 60. But I'm not really concerned with that point, just the claim of being able to represent all real numbers.

Sure, you can claim that but it doesn't mean squat unless your underlying circuits correctly process operations (*,+,/etc) on the representation in a way that is faithful to your claim.

In the book he talks about a simple unum with values -inf, (-inf, -2), -2, (-2, -1), -1, (-1, 0), 0, (0, 1), 1, (1, 2), 2, (2, inf), and inf; intervals like (0, 2), (-2, 0), and (1, inf) also emerge when you intentionally use less precision.

This is a mathematically closed system that represents all real numbers and is even useful!

No. The set of "representable" (i.e. computable) numbers is countable (because the set of programs is countable), so in fact almost all real numbers cannot be represented.

I got the impression that the unum explicitly modeled a range of values, so I'm not sure that applies.

Not an expert here though.

Regardless of whether it is representing numbers, ranges, or a mix of both, there are still 2^n maximum possible representations for n bits. Perhaps you could create a mix of numbers and ranges equal to (2^n)-2, then add one range running from the lowest number represented down to negative infinity, and another that does the same for positives. But I can do that system with -, 0, +, NaN. See, I just represented all numbers. It isn't as useful, but boy are there savings on storage, cost, and computation time.

Unums shouldn't be seen as better than ints, longs, floats, or doubles anymore than a double is seen as better than an int. They have different uses and strengths, and there are problems where none are a good choice. Perhaps unums are the best choice for some problems. Perhaps they are clearly and significantly better in some areas. But they should be thought of as an alternative to the existing formats and not a superior replacement.

I would argue that unums are a "superior replacement" for doubles in many cases, though: in the case that you support unums that are "wide" enough, you can represent doubles exactly, plus you have additional values, plus some nice rules about when approximation error occurs/is propagated, and not as many bits need to be stored or moved around on buses. It'll be a while before there's an implementation anywhere near as fast as existing FPUs, but Gustafson makes a good argument for his format. Personally, I'm more interested in the correctness benefits than the space/power/time savings. Even if unums are never faster than 64-bit floats, they present an interesting way to do real-number arithmetic, and my brief exposure to them leaves me much more confident that I could write numerical algorithms correctly than with doubles. I do numerical/statistical algorithms with doubles in my work, and it's really a pain to reason about things that the "uncertain" (open-interval) values of the unum format would greatly simplify.

They're also an inferior replacement in the case that you want to take advantage of highly-optimized hardware, and that getting a correct answer doesn't really matter. I don't see unums replacing floats for, say, video game graphics. But for numerical computation, it seems like the only real flaw with unums compared to doubles is the nonexistence of a hardware implementation, and the existing popularity of doubles.

Agreed. For neural networks, I would argue the opposite is true: you should just have a 16-bit float that silently casts really large values to infinity without throwing errors, with a logistic lookup that maps "inf" to +/- 1... A mathematically incorrect float is operationally superior to the correct one.

Bonus points if values near 0 are treated as 0 (encourages sparsity!)
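A sketch of the behavior being proposed (this is a hypothetical format, not IEEE half-precision semantics; the thresholds are borrowed from binary16 but the silent-saturate and flush-to-zero rules are the commenter's proposal):

```python
import math

MAX16 = 65504.0  # largest finite IEEE half-precision magnitude
TINY = 6.1e-5    # roughly the smallest normal half-precision magnitude

def clamp16(x):
    """Hypothetical NN-friendly float: silently saturate to +/- inf
    on overflow, flush tiny values to zero to encourage sparsity."""
    if abs(x) > MAX16:
        return math.copysign(math.inf, x)
    if abs(x) < TINY:
        return 0.0
    return x

def activation(x):
    """Squashing activation with the proposed 'inf -> +/- 1' lookup."""
    if math.isinf(x):
        return math.copysign(1.0, x)
    return math.tanh(x)

print(activation(clamp16(1e6)))  # overflow is harmless: saturates to 1.0
print(clamp16(1e-9))             # flushed to exact zero
```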

For hardware without an FPU, you can do floats in software when accuracy is more important than speed (vs. fixed point); one can imagine that if this thing actually works, then for hardware without a "UPU" one can just do unums in software when needed.

That's fine, but then you're representing dyadic intervals, not real numbers. Simple question: what's unum[sqrt(2)]?

Roughly speaking, it would be 1.414... plus a bit that indicates the result is between two adjacent exact values. This bit would not be set if the result were exact.

It's not the same as dyadic interval math, it's kind of a hybrid.

And you could set the environmental variables to be as precise as you need. If you need 29 digits of precision, you can quickly show that you know 1.4142135623730950488016887242 and nothing more.

A reasonable metaphor would be that you cannot draw all of the Mandelbrot diagram, but you can render any piece of it to any resolution you choose. Being able to describe it precisely, and to know the limits of your accuracy, is useful.

I am not sure that I understand what you're saying. Given a charitable reading of the presentation, the author seems to be saying that his standard explicitly specifies range of values, as opposed to the IEEE Float. It did not seem like he was saying that he could explicitly represent an infinite amount of exact, distinct values using a finite number of bits. Low level data structures are not my area of expertise, so did I misunderstand what he was saying or what you are saying?

Under that reading, he's saying nothing: just represent the interval [-inf, inf] as "0" and call it a day.

So assuming that he's saying anything at all, he's at least being imprecise, and the actual claim should be something like "represent any dyadic interval using a finite number of bits".

He's a bit showy about the format. Wish he would just put out a technical paper.

Anyway, I guess his motivation might be "you can represent any real number (with finite bits, and therefore finite precision)". In the book, he presents an interesting case: little 4-bit versions of the Unum that can represent:

-inf, (-inf, -2), -2, (-2, -1), (-1, -1/2), -1/2, (-1/2, -0), -0, 0, (0, 1/2), 1/2, (1/2, 1), 1, (1, 2), 2, (2, inf), inf.

Putting together a pair of them, the book outlines simple interval arithmetic (where pairs of numbers can represent any interval between numbers on the line above, and single numbers can represent some of the closed intervals as above). The reason these are kind of neat is that using the standard Unum algorithms (without any fudging), you can get "correct" (albeit terribly imprecise) results for many real number computations. Questions like "is there a number satisfying a numerical predicate in some range?" or the value of a trigonometric or exponential expression will come out "correct" (but you might get an answer like (-inf, inf)). If things work out as well as he claims (and demonstrates for some cases), then you can basically do the math to figure out how precise you want to be and choose an appropriate specialization of the format, or take advantage of the format's flexibility and do computations starting at a low precision, increasing precision until you are satisfied. In particular, it's kind of cool that you can do computations with little 8-bit intervals and possibly circumvent more expensive computations (e.g. if you test whether a property holds anywhere in the Unum range and it holds nowhere, then, assuming you (and Gustafson) have done the math right, you can avoid doing more expensive checks at increased precision).

Anyway, point is, the presentations are kind of flashy and misleading - and you're right, you can't represent any real number (just finitely representable dyadic intervals)... but the format itself _does_ seem promising...

If I add (1/2, 1) to (1, 2), then I should get (3/2,3), which is not representable with any of the above values. So what is the result and in what sense is it correct?

As I understand it, it's just "floating" interval arithmetic, so as in interval arithmetic you should read a representation [a,b] as 'the number is contained in the interval [a,b]'. In other words, interval arithmetic with carefully assigned floating-point limits, such that they don't violate the interval arithmetic and stay concise.

Not sure how it would go; I think it depends on whether you maintain the precision or increase it after this operation. I guess 2-bit: (1/2,1)+(1,2)=(1,inf) (?) and 4-bit (2 exponent and 2 mantissa?): (1/2,1)+(1,2)=(3/2,3). Maybe there's a built-in check to see how short your interval can get, stopping at a reasonable precision (in this case there's no point going for more than 4 bits).

Honestly I find it quite elegant, and it is at least trying to solve a big issue with bandwidth limitations. I do wish the exposition were more clear/straightforward.

I think the correct answer is (0,inf) which is a value you can get by dropping to a lower resolution within the same unum environment.

Ah, that makes sense. The 4-bit representations do not suffice by themselves, but imply the support of some or all of the smaller representations as well, and I guess you always need a representation for [-inf,inf].

First of all, a correction: the 4-bit unums can represent any of

-inf, (-inf, -2), -2, (-2, -1), (-1, 0), 0, (0, 1), (1, 2), 2, (2, inf), inf, and both quiet and signaling NaN. They do not represent ±1/2 or use ±1/2 as an endpoint.

If you add, say, 1 to (1,2), you get (2, inf). The open interval means it does not CONTAIN infinity, but ends at a finite value too large to represent. If the largest positive real you can represent is 2, then (2, inf) is a mathematically correct answer.
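The addition rule above can be sketched in a few lines. This is a toy model of that environment (it ignores the open/closed distinction at the endpoints for brevity, so treat it as illustrative only):

```python
import math

# Exact values of the small environment described above; everything else
# falls into the open interval between two adjacent lattice neighbours.
LATTICE = [-math.inf, -2.0, -1.0, 0.0, 1.0, 2.0, math.inf]

def round_out(lo, hi):
    """Widen an interval outward to the nearest representable endpoints."""
    new_lo = max(v for v in LATTICE if v <= lo)
    new_hi = min(v for v in LATTICE if v >= hi)
    return (new_lo, new_hi)

# 1 + (1, 2): the true sum lies in (2, 3), but the largest finite exact
# value is 2, so the correct (if imprecise) answer is (2, inf).
print(round_out(1 + 1, 1 + 2))
```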

Your list should include -1 and 1 as well, shouldn’t it?

Edit: here was the slide http://i.imgur.com/0eQvvK1.png

Those would be among the 3-bit unums.

D'oh, missed the NaNs and wrote out a list without referring to the book... then checked the length of the list to make sure it had 8 things. Silly me!

If it delivers on the slides' claims, it's pretty awesome stuff. All the floating point nonsense has always aggravated me; I dodged it where I could since I wasn't doing scientific computation. I appreciate any improvements, especially efficient ones.
