
Unum Computing: An Energy Efficient and Massively Parallel Approach to Numerics - bane
http://www.slideshare.net/insideHPC/unum-computing-an-energy-efficient-and-massively-parallel-approach-to-valid-numerics
======
simonbyrne
Are there any more details on unums available online (i.e. without purchasing
his book)? The presentation is quite heavy on promotion, but a bit sparse on
details.

I also note that he dedicates a page on his website to "Gustafson's law".
[http://johngustafson.net/glaw.html](http://johngustafson.net/glaw.html)

~~~
methodOverdrive
Not many. I did go ahead and purchase his book. It's... kind of a weird read,
honestly. Useful - it clarified some things about the proposed Unum format -
but also written very casually, like "pop science" prose, which seems
inappropriate - who would buy a book about a floating-point number format if
they didn't want a dry, boring book full of technical details? Some of the
arguments are made as though to convince a non-technical audience - maybe
Gustafson wants managers (or former engineers who haven't worked as engineers
for a long time) to read his book, but I think it would have been better to
publish all of the details without any fluff, first.

Also worth noting that half of the book is about the "ubox" method for solving
optimization problems - also cool, but may be overkill if you are just
interested in the numeric format itself. Personally, I've been working on an
implementation of the format that I can toy around with - I have no real
interest in learning a lot about the cool algorithms I could do with it until
I can show myself that it works for basic arithmetic, etc, as well as the
author claims.

Gustafson also makes the code available (I think
[here](https://www.crcpress.com/The-End-of-Error-Unum-Computing/Gustafson/9781482239867)).
It's Mathematica code... there is a free viewer for that format (if you don't
have Mathematica) which can print out a PDF with richly formatted equations.

Also, Googling for that link led me to this [Python implementation someone
whipped up](https://github.com/jrmuizel/pyunum).

~~~
davisourus
Some would call it an enjoyable read. Definitely not the typical stodgy format
of papers, but addressing the reader and throwing a few jokes in doesn't hurt
the technical validity. He wrote it so it could be accessible and defends it
as such in the prologue. Dry papers get passed over these days, I'm afraid.

~~~
methodOverdrive
That's true - and I did enjoy it. I'm biased - still in university, so I'm
used to dry papers. The lack of stodginess wouldn't have bothered me if I had
been able to obtain key details on the format for free. I was a little annoyed
to have to buy a 50-dollar paperback book instead of just downloading a short
paper via the college library, though - I was convinced of the potential
benefits of the format by one of Gustafson's earlier presentations and want to
help in the efforts towards software/hardware implementations, so I didn't
need enhanced accessibility to remain interested in the book.

------
cossatot
Are there architectural or other challenges to building processors for this?
How compatible are unums with modern processors? Would LAPACK et al have to be
re-written?

~~~
trsohmers
It requires more complexity than what is in modern FPUs, but it is arguably
more efficient when actually operating, since it can deliver the same
accuracy while using fewer bits.

Most people don't realize that data movement is the most expensive thing in a
processor... It takes about 100 picojoules to do a double precision (64-bit)
floating point operation, but a humongous 4200 picojoules to actually move
those 64 bits from DRAM to your registers. The really crazy thing is that
around 60% of the power used to move the data is wasted in the processor
itself, in the logic powering the hardware cache hierarchy. My startup
([http://rexcomputing.com](http://rexcomputing.com)) is solving this with our
new processor, and we are working with John Gustafson on experimenting with
unums for future generations of our chip.
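To make those two figures concrete, here's the back-of-envelope arithmetic
(treating the quoted picojoule numbers as representative, not measured):

```python
# Energy cost of moving data vs. computing on it, using the figures
# quoted above (assumed representative ballpark numbers, not measurements).
op_pj = 100      # one 64-bit floating point operation, in picojoules
move_pj = 4200   # moving those 64 bits DRAM -> registers, in picojoules

print(move_pj / op_pj)   # 42.0: one DRAM move costs as much as ~42 flops
print(0.60 * move_pj)    # 2520.0 pJ of each move burned in the cache hierarchy
```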

~~~
semi-extrinsic
There's an even worse disparity in the amount of time required for those two
processes. How is that ratio affected by your technology?

~~~
trsohmers
We bring the ratio down as well... A load/store to a core's local scratchpad
(our software-managed and power-efficient version of a traditional L1 cache)
is 1 cycle, compared to 4 cycles for an Intel processor. Add in the fact that
we have 128KB of memory per scratchpad (compared to 16 to 32KB of L1 D$ for
Intel), and you don't need to go to DRAM as much, greatly increasing
performance/throughput on top of the 10x+ efficiency gain.

Even in the case of a core accessing another core's local scratchpad when
they are on opposite corners of the chip, it takes only one cycle per hop on
the Network on Chip... meaning for our 256-core chip, you can go all the way
across the chip (and access a total of 32MB of memory) in 32 cycles... less
than the ~40 cycles it takes to access L3 cache on an Intel chip.

~~~
Scaevolus
Comparing a 1 cycle scratchpad latency to Intel's 4 cycle L1 latency is
misleading. Are you making chips that operate at up to 4GHz, or is this just
128KB of local SRAM attached to a ~1GHz core?

~~~
trsohmers
If you care about efficiency, then you are burning significantly more energy
to run at 4GHz, and are still moving the same amount of data in and out of
your local memory. If that is your bottleneck, then you are running at 4x the
clock speed for no real gain, as your memory can't keep up with the speed of
your functional units.

But to answer your question directly, we are targeting 1GHz
conservatively... we think it could do more, but since we are focused on
efficiency, we think it is a good middle ground between performance and energy
usage. We'll be able to make a more informed decision (and possibly change
that) when we have silicon in hand.

------
yoklov
The big issue I see with these is that the number of bits is dependent on the
stored value. It's also not a power of two. This has a lot of problematic
consequences. Indexing into a list of them wouldn't be constant time, for
example. You'd need to unpack them (into fixed size unums?) first.

That said, information here is sparse and I'm not an expert on numerical
computing (although I do graphics at work and know _some_ about the subject).

~~~
dnautics
For any unum system there's an 'architecture spec' that puts parameters on
the maximum sizes of the exponent and mantissa. Values can consume fewer
bits: the format contains a tag that says how many bits you are using for
each part of your number. For exact values, a smaller size means less storage
is needed; for uncertain values, it means the uncertainty is higher.

You could pad your values to achieve constant time indexing.
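A tiny sketch of that indexing tradeoff (the bit strings here are
placeholders, not real unum encodings):

```python
# Variable-width fields: finding the i-th value requires scanning all
# earlier lengths (O(n)). Padding every field out to the maximum width
# restores O(1) indexing at the cost of space. Padding is layout only;
# a real unum keeps its size tag, so pad bits would simply be ignored.
values = ["101", "1", "10111", "11"]   # stand-ins for packed unums

packed = "".join(values)
def nth_packed(i):
    # must walk all earlier lengths to find the offset
    offset = sum(len(v) for v in values[:i])
    return packed[offset:offset + len(values[i])]

WIDTH = max(len(v) for v in values)
padded = "".join(v.ljust(WIDTH, "0") for v in values)
def nth_padded(i):
    # constant-time slice into fixed-size slots
    return padded[i * WIDTH:(i + 1) * WIDTH]

print(nth_packed(2))   # 10111
print(nth_padded(2))   # 10111
```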

~~~
davisourus
This is exactly right - there's an environmental configuration. Think of how
ridiculous it is to play Angry Birds with 32- or 64-bit precision on your
phone; you could set the precision appropriately. Or on the other side, think
of how IEEE lops off data from every multiply or divide (at any precision)
without any point of reference for the uncertainty.

------
amelius
Slide 21:

> I have been unable to find a problem that breaks unum math.

Then perhaps try (x+y) != x, where x is a very large and y is a very small
positive number.
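For the record, that's the classic absorption failure in IEEE doubles
(Python here, but any IEEE-754 language behaves the same):

```python
# With doubles, adding a small number to a large one can change nothing.
x = 1e16   # very large
y = 1.0    # very small relative to x (less than one ULP of x)
print(x + y == x)   # True: y is absorbed, so (x+y) != x fails
```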

~~~
jlgustafson
Suppose x and y are exact numbers. Then x+y is represented as the open
interval (x, x+ULP) where x+ULP is the smallest representable exact unum
greater than x. Since x is disjoint from the open interval (x, x+ULP), it
satisfies the inequality.
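A sketch of that answer, using `math.nextafter` on IEEE doubles to stand in
for the ULP step (illustration only, not the actual unum encoding):

```python
import math
from fractions import Fraction

x = 1e16   # exact double
y = 1.0    # exact, smaller than one ULP of x (ULP here is 2.0)

# The unum-style answer to x + y: the open interval (x, x + ULP).
lo, hi = x, math.nextafter(x, math.inf)

true_sum = Fraction(x) + Fraction(y)           # exact rational arithmetic
print(Fraction(lo) < true_sum < Fraction(hi))  # True: the sum lies inside
print(lo < x < hi)                             # False: x itself does not,
                                               # so (x+y) != x is satisfied
```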

Folks, if you don't want to buy the book, Amazon lets you do a "look inside
the book" that gets enough introduction to explain the unum format, for free.

------
haddr
This is funny, but even Mathematica gives a bad answer for that equation from
slide 25...

~~~
jlgustafson
There are actually quite a number of places where Mathematica gives a wrong
answer and unum math does not! For example, if you ask Mathematica to find all
real values of x for which 1 == 1, it returns the empty set. Unum math
correctly returns the entire real number line.

~~~
bane
Great! Welcome to HN. Since you're here and I posted your slides, I'm
wondering what you see as the most significant challenges towards
implementation of this idea and what kind of performance delta (plus or minus)
we might see over the existing standards?

------
stephencanon
> Complete representation of _all_ real numbers using a finite number of bits

He's 1/3 of the way to three impossible things before breakfast by slide 15.

~~~
dnautics
All real numbers are representable, but not necessarily to arbitrary
precision.

You should really read the book. A lot of the atomic operations seem
cumbersome, because the unum doesn't have a fixed size representation, but you
just get over that when you remember that the real problem is shuttling data
over buses. More compute transistors is not a problem.

~~~
Lawtonfogle
If I had a number system of -100, 0, and 100, could I claim all real numbers
are represented, just not to arbitrary precision? With n bits, there are 2^n
possible representations max. If you can say all real numbers are represented
for n around 30, or 60, or even 1,000,000, then can't you say the same for n
equal to 1 or 2?

Now I think we can say some representations are better than others (at least
for certain applications). It may even be possible that some representation
with a given n can be better in all cases than a representation with a larger
n. So an n equals 1 system will obviously be terrible compared to many systems
with n around 30 or 60. But I'm not really concerned with that point, just the
claim of being able to represent all real numbers.

~~~
dnautics
Sure, you can claim that, but it doesn't mean squat unless your underlying
circuits correctly process operations (*, +, /, etc.) on the representation in
a way that is faithful to your claim.

In the book he talks about a simple unum set with the values -inf, < -2, -2,
-2...-1, -1, -1...0, 0, 0...1, 1, 1...2, 2, > 2, and inf; wider intervals
like 0...2, -2...0, and 1...inf emerge when you intentionally use less
precision.

This is a mathematically closed system that represents all real numbers and is
even useful!
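That closure over the reals is easy to demonstrate: every real (or infinity)
lands in exactly one slot of the lattice. A toy classifier (the slot names
are mine, not the book's encoding):

```python
import math

def classify(x):
    """Map any real number (or infinity) to its slot in the toy lattice."""
    if math.isinf(x):
        return "inf" if x > 0 else "-inf"
    exacts = [-2, -1, 0, 1, 2]
    if x in exacts:
        return str(int(x))           # an exact representable point
    if x < -2:
        return "(-inf,-2)"
    if x > 2:
        return "(2,inf)"
    for lo, hi in zip(exacts, exacts[1:]):
        if lo < x < hi:
            return f"({lo},{hi})"    # an open interval between exact points

print(classify(math.pi))    # (2,inf)
print(classify(0.5))        # (0,1)
print(classify(-1.0))       # -1
```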

------
nickpsecurity
If it delivers on the slides' claims, it's pretty awesome stuff. All the
floating point nonsense has always aggravated me. I dodged it where I could,
since I wasn't doing scientific computation. I appreciate any improvements,
especially efficient ones.

