
Computing 10,000X more efficiently (2010) [pdf] - luu
http://web.media.mit.edu/~bates/Summary_files/BatesTalk.pdf
======
hershel
Singular Computing has built a chip based on this tech and deployed it in a
military UAV. It enabled tracking of objects (which was impossible before due
to power constraints) at 6400x the performance/power of the other best known
methods.

[http://www.defensetechbriefs.com/component/content/article/1...](http://www.defensetechbriefs.com/component/content/article/17021)

[http://proceedings.spiedigitallibrary.org/proceeding.aspx?ar...](http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1693583)

From the paper: "The hardware is designed to perform high dynamic range
arithmetic with roughly 1% error per operation. Singular has developed and
studied varied algorithms for producing high quality results despite the
approximate hardware. These studies use a perfect simulation of the
accelerator’s arithmetic. Tasks that have been explored include summing
thousands of numbers without accumulating error, k-nearest neighbor
classification (KNN), foreground/background separation using Gaussian mixture
models, iterative reconstruction tomography, deblurring using the Richardson-
Lucy deconvolution algorithm, FFTs, radar processing, neural net learning, and
other tasks. Most of these algorithms need slight adaptations to prevent
cumulative effects of the 1% error, but with those adaptations all perform as
desired."
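
The paper doesn't say which adaptations Singular used, but pairwise summation
is one standard trick: it cuts the number of noisy operations each input
passes through from O(n) to O(log n). A rough simulation of the idea, with
the 1% error modeled as a random relative perturbation on every add:

    import random

    def noisy_add(a, b, err=0.01):
        # One simulated accelerator op: exact add, then ~1% relative error.
        return (a + b) * (1.0 + random.uniform(-err, err))

    def naive_sum(xs):
        # Sequential accumulation: early errors get rescaled by every later add.
        total = 0.0
        for x in xs:
            total = noisy_add(total, x)
        return total

    def pairwise_sum(xs):
        # Tree-shaped accumulation: each input sees only ~log2(n) noisy adds.
        if len(xs) == 1:
            return xs[0]
        mid = len(xs) // 2
        return noisy_add(pairwise_sum(xs[:mid]), pairwise_sum(xs[mid:]))

    xs = [1.0] * 10000
    print(naive_sum(xs))     # often off by 10% or more
    print(pairwise_sum(xs))  # typically within ~1% of 10000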

~~~
samstave
And these are precisely the things, about which we know little, that the NSA
may be exploiting with extreme effectiveness. That scares the shit out of me.

I recall an article from _2005_ that described a system that could track
_BULLETS IN MID-FLIGHT_ over a vast swath, in the dark, and provided a
detailed view of all ballistics in a fighting theater...

~~~
m_darkTemplar
It's probably not of much use to the NSA, where breaking encryption requires
precise math that will fail quickly if operations are off by 1%.

~~~
sophacles
I can, however, see lots of applications in various pattern-analysis
techniques based on machine learning. There are lots of places in those where
a bit of fuzz won't really matter, at least for first-pass filtering.

Similarly, voice recognition would probably have to deal with far larger
errors from the transmission and capture of sound anyway.

Just because it's not good for decryption, doesn't mean it isn't good for the
overall set of NSA operations.

~~~
klipt
Yeah, even human _brains_ can do voice recognition, and they're noisy as fuck.

------
valarauca1
This topic was already discussed, albeit satirically, in another publication,
namely "The Slow Winter." [1] While comedic, it makes good points about
hardware architecture.

[1]
[https://www.usenix.org/system/files/1309_14-17_mickens.pdf](https://www.usenix.org/system/files/1309_14-17_mickens.pdf)

------
ced
Off-topic: if an integral is an approximation of a physical process, then that
process is an approximation of the integral. I.e. we could in principle have a
"chip" running a miniature lab experiment (like water flowing) with sensors
that would output "an approximate solution of the Navier-Stokes equations".
Numerical analysts would then write new versions of their algorithms assuming
"an efficient hardware solution of N-S". Has this concept ever been used
anywhere? I guess quantum computing would qualify.

~~~
smoyer
We used to use a carefully biased diode in the feedback loop of an op-amp to
create logarithmic signals. I was expecting the article to dispense with
digital electronics (in the core of the machine) and use tuned physical
properties to actually do the calculations.

What a disappointment that the first few slides imply such a breakthrough,
and then we find that the research is simply about reducing precision to the
minimum needed. And it's patented? (rolls eyes)

~~~
foxhill
it's patented? 16 bit floats have been a thing for a long time.. :/

edit: never mind. somehow convinced myself that this was just 16 bit floating
point arithmetic. it is not.

~~~
smoyer
I wrote an IEEE single-precision-compatible floating point library for
embedded 8088/8086 processors in the mid-to-late '80s ... my employer was
very cheap!

On the other hand, I learned a ton about efficient ways to implement various
algorithms and remember a great way to do square roots that involved an
initial multiplication (using normalized binary FP numbers) and then converged
in about six iterations.
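
Presumably some flavor of Newton's method seeded from the normalized
exponent; a sketch of the general idea (not the original routine):

    import math

    def approx_sqrt(x, iters=6):
        # Split x > 0 into mantissa and exponent: x = m * 2**e, 0.5 <= m < 1.
        m, e = math.frexp(x)
        # Seed by halving the exponent; Newton's method then converges
        # quadratically, so ~6 iterations are plenty for single precision.
        y = math.ldexp(1.0, e // 2)
        for _ in range(iters):
            y = 0.5 * (y + x / y)
        return y

    print(approx_sqrt(2.0), math.sqrt(2.0))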

------
squigs25
I think human brains are probably wired for speed over accuracy too. This
might have a lot of potential in machine learning and eventually AI, where
information processing speed is often more important than accuracy.

~~~
daenz
If you believe in evolution and natural selection, then your functionality is
just fast and accurate enough to keep you from dying before mating. That's as
good as the optimization gets, except by random accident.

~~~
lkbm
My genes are more likely to be passed on for multiple generations if I survive
long enough to mate AND care for my children. If they die at age five, my
mating did nothing.

~~~
AnimalMuppet
True, but his point still stands - if you believe in evolution and natural
selection, your brain is designed for "good enough fast enough" rather than
"truly correct" reasoning.

------
acd
There is probably an efficiency reason the human brain's 85,000,000,000 (85
billion) neurons, which operate in parallel with lossy compression, draw less
than 20 W of power.

~~~
byerley
Probably more relevant is the greatly reduced clock speed (ignoring that the
brain doesn't so much have a centralized clock - neurons only fire at <1 kHz,
compared to the ~2 GHz processors used in clusters).

The technology also seems somewhat incomparable. The purely chemical signal
processing at the synaptic cleft presumably saves a lot of power.

~~~
agumonkey
maybe it's more a throughput thing: if the brain is massively parallel, it's a
giant pipeline with many things in flight all the time

------
throwaway_yy2Di
I think this guy successfully patented half-precision floating point. No
really, look:

[http://www.google.com/patents/US8150902](http://www.google.com/patents/US8150902)

What a troll!

------
ChuckMcM
This guy went on to found a company, as I recall, which I think was then
acquired. It's a pretty clever way to make fast floating point.

~~~
hershel
Who acquired them?

------
nobodysfool
It would be nice to see where he is with this now - that was 2010, and it
appears nothing has been written about it since.

~~~
ChuckMcM
An update from 2013, apparently:
[http://www.bdti.com/InsideDSP/2013/10/23/SingularComputing](http://www.bdti.com/InsideDSP/2013/10/23/SingularComputing)

~~~
hoilogoi
It would be funny if a technology pioneered by "Singular Computing" were a
primary contributor to reaching the Singularity.

~~~
eli_gottlieb
It wouldn't be funny, it would probably be the intended point of the name.

------
dfc
Can anyone explain what page eleven, "Tomography," is supposed to mean? The
example on the previous page seemed straightforward, but when I got to page
eleven I had no idea what I was looking at.

~~~
vladtaltos
The Radon transform is used to produce tomography images. Just take a look at
the Wikipedia page... s/he's talking about that...

~~~
dfc
I guess(??) I could have been clearer.

I understand what tomography is. I am curious how to interpret the three
images and the differences. What does this page demonstrate to the reader? To
me they look almost exactly the same. The one big difference is that the
original has the intersecting arrows. (I am also not really sure what the
arrows are supposed to convey.)

~~~
hbar
It's a simulation showing how 1% FP error doesn't have much effect on the
result. They look almost the same... that's the point.

If you don't know what the arrows represent, then maybe you don't understand
what tomography is after all.

------
pointernil
I understand that the speed and energy efficiency of the proposed design can
only be reached with specially designed hardware, but couldn't some of the
expected effects be achieved on existing CPU cores by exploiting the lower
precision requirements?

Also, this could be a nice code-golf idea: solve algorithm xyz with only a
strictly limited CPU instruction set, chosen to be those operations that are
very fast or very energy efficient on a target platform...
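
You can get a taste of this on stock hardware with half-precision storage
(assuming NumPy is acceptable; most CPUs still widen to float32 for the
arithmetic itself, so this mostly models the memory-traffic side):

    import numpy as np

    x = np.random.rand(1_000_000).astype(np.float32)
    x16 = x.astype(np.float16)  # quantize: ~0.05% relative error per element

    # Compare sums; the float16 version carries the storage quantization error.
    print(x.sum(dtype=np.float64))
    print(x16.sum(dtype=np.float64))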

~~~
trhway
Specifically, it would be interesting to compare a GPU (rather than a classic
CPU) against the proposed imprecise approach.

------
jejones3141
Reminds me of Albert Edgar and Samuel Lee's paper in CACM in 1979, "FOCUS
Microcomputer Number System".

------
GFK_of_xmaspast
1% error is, in some sort of half-assed sense, comparable to a signal-to-noise
ratio of 40 dB (the noise sits at 20*log10(0.01) = -40 dB relative to the
signal), which strikes me as a lot of noise to be adding to a problem.

------
fenollp
This HW is a lot like the Parallella's:
[http://www.parallella.org/](http://www.parallella.org/)

------
IvanK_net
This guy is totally stupid. I just read the 4th slide.

1% error: what does that mean? Error in the mantissa? A shorter mantissa?
Simplified IEEE?

He writes O(5K) and presumes it to be "better" than 500K ... wtf? Does he know
what O() means?

So, he basically "rounds" the values in images, mentions some recent
technologies and says "if we had better engineers, we would have flying cars
blah blah blah ...".

~~~
varelse
This design is utter computational tripe that completely ignores Amdahl's Law
or any notion of data-parallelism.

His 1% error comes from relying on 7-bit logarithmic floating point used in a
task-parallel manner. This is a non-starter for HPC. While most theoretical
models certainly have 1% or greater underlying errors in accuracy, a 1% or
worse error in precision is going to doom these algorithms to numerical
instability without Herculean additional effort that would obliterate the
computational advantage here.
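
For readers unfamiliar with logarithmic number systems: values are stored as
fixed-point logs, so multiplication becomes integer addition, while addition
needs a correction term (a lookup table in hardware). A toy sketch, with bit
widths picked for illustration rather than taken from Singular's design:

    import math

    FRAC_BITS = 4  # fractional bits of the log; ~2% quantization steps

    def encode(v):
        return round(math.log2(v) * (1 << FRAC_BITS))

    def decode(r):
        return 2.0 ** (r / (1 << FRAC_BITS))

    def lns_mul(a, b):
        return a + b  # exact: log(x*y) = log(x) + log(y)

    def lns_add(a, b):
        # log2(x + y) = max + log2(1 + 2**(min - max)); hardware uses a LUT.
        lo, hi = min(a, b), max(a, b)
        return hi + encode(1.0 + decode(lo - hi))

    a, b = encode(3.0), encode(5.0)
    print(decode(lns_mul(a, b)))  # ~15, off by a couple percent
    print(decode(lns_add(a, b)))  # ~8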

Neural networks? See The Vanishing Gradient Problem.

Molecular dynamics? It's numerically unstable without 48-bit or better force
accumulation as proven by D.E. Shaw.

NVIDIA, AMD, and Intel have invested a huge sum in manycore processors that
are already too hard to program for most engineers. These processors are a
cakewalk in comparison to what's proposed here.

Finally, even if you did find a task amenable to this architecture (and I'd
admit there may be some computer vision tasks that might work here), where's
the data bus that could keep it fed? We're already communication-limited with
GPUs for a lot of tasks. Why do we even need such a wacky architecture?

~~~
jblow
Computational approximation has been shipped en masse very successfully.

You mention GPUs but did you know that GPUs already do a lot of approximate
math, for example, fast reciprocal and fast reciprocal square root?
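
The canonical software example being the fast inverse square root that
shipped in Quake III; a Python transcription for illustration:

    import struct

    def fast_rsqrt(x):
        # Reinterpret the float32 bits as an integer, apply the magic
        # constant, then refine with a single Newton-Raphson step.
        i = struct.unpack('<I', struct.pack('<f', x))[0]
        i = 0x5F3759DF - (i >> 1)
        y = struct.unpack('<f', struct.pack('<I', i))[0]
        return y * (1.5 - 0.5 * x * y * y)

    print(fast_rsqrt(4.0))  # ~0.4992, vs the exact 0.5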

You mention how approximation must be impossible in all these applications
(because REASONS), but all methods that numerically integrate some desired
function are doing refined approximation anyway. If you have another source of
error that lives inside the integration step, it may be fine so long as your
refinement is still able to bring the error to zero as the number of steps
increases.

Your diagnosis of "utter computational tripe" and the accompanying vitriol
seem completely inappropriate.

~~~
varelse
Really? I don't know about GPUs? That's news to me! Did you know that the
precision of the fast reciprocal square root on NVIDIA GPUs is 1 ulp out of
23 bits? That's a world away from 1 ulp out of 7 bits. I wouldn't touch a
7-bit floating point processor. Life is too damned short for that.

And that's because I have spent days chasing and correcting dynamic range
errors that doomed HPC applications that tried to dump 64-bit double-precision
for 32-bit floating point. It turns out in the end that while you _can_ do
this, you often need to accumulate 32-bit quantities into a 64-bit
accumulator. Technically, D.E. Shaw demonstrated you can do it with 48 bits,
but who makes 48-bit double precision units?
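
The failure mode in miniature (NumPy as a stand-in for the HPC kernels): once
a float32 accumulator is large, small addends fall below its ulp and vanish
entirely; widening the accumulator to 64 bits fixes it:

    import numpy as np

    # One big value followed by 100,000 small ones.
    xs = np.concatenate(([1.0e8], np.full(100_000, 1.0))).astype(np.float32)

    acc32 = np.float32(0.0)
    acc64 = np.float64(0.0)
    for v in xs:
        acc32 = np.float32(acc32 + v)  # 1.0 < ulp(1e8)/2, so it rounds away
        acc64 += np.float64(v)         # widen first, then accumulate

    print(acc32)  # 1e+08 -- the hundred thousand small terms vanished
    print(acc64)  # 100100000.0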

I stand by the computational tripe definition (with the caveat that Hershel
has now posted an app where this architecture is possibly optimal). My
objections are to the broad, extraordinary claims made in the presentation
above.

And hey, you're a game developer, so let me give you an analogy: would you
develop a software renderer these days if you were 100% constrained to relying
on mathematical operations on signed chars? It's doable, but would you bother?
Start with Chris Hecker's texture mapper from Game Developer back in the
1990s; I'm guessing madness would ensue shortly thereafter. Evidence: HPC apps
on GPUs that rely entirely on 9-bit subtexel precision to get crazy 1000x
speedups over traditional CPU interpolation do not generally produce the same
results as the CPU. If the result is visual, it's usually OK. If it's
quantitative, no way.

~~~
stephencanon
> who makes 48-bit double precision units?

IIRC Cray did (but they called it “single”). =)

Snark aside, I agree broadly with the points you're making here. This isn't
especially groundbreaking; it's using the fact that logarithmic number
representations don't require much area to implement if you don't need high
accuracy and are willing to trade latency for throughput (something that FPGA
programmers have been taking advantage of since forever), and then going
shopping for algorithms that can still run correctly in such an environment.

