
Computing 10,000x more efficiently (2010) [pdf] - Moshe_Silnorin
http://www.gwern.net/docs/2010-bates.pdf
======
jules
Unfortunately this solves the wrong problem. The bottleneck isn't arithmetic,
it's data movement. The number of transistors doing arithmetic is already a
very tiny fraction of a modern chip. Reducing that tiny fraction to an even
tinier fraction by making arithmetic inaccurate isn't a good trade-off.

~~~
minthd
The bottleneck actually is arithmetic. "GPUs have much higher ALU throughput
since the GPU chip area is almost entirely ALU"

[http://devblogs.nvidia.com/parallelforall/bidmach-machine-
le...](http://devblogs.nvidia.com/parallelforall/bidmach-machine-learning-
limit-gpus/)

Also on the horizon is 3D chip manufacturing technology (monolithic 3D), with
extremely large bandwidth between the two layers of the chip, possibly GPU +
DRAM.

~~~
exascale1
The bottleneck hasn't been arithmetic for a long time; it's data movement.
Arithmetic is practically free nowadays. See the presentation by Horst Simon
(Deputy Director of Lawrence Berkeley National Laboratory), "No exascale for
you!" [0]

The energy cost of transferring a single data word a distance of 5mm _on-
chip_ (20 pico-Joules/bit) is higher than the cost of a single FLOP. 5mm is
roughly the distance to L2 cache or another CPU core. The cost of transferring
data off-chip (3D chip and/or RAM) is orders of magnitude higher; see the
graph in the slides.

[0]
[http://iwcse.phys.ntu.edu.tw/plenary/HorstSimon_IWCSE2013.pd...](http://iwcse.phys.ntu.edu.tw/plenary/HorstSimon_IWCSE2013.pdf)
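
A rough back-of-the-envelope (taking the 20 pJ/bit figure as the per-bit cost
of the ~5mm hop, and assuming a round ~50 pJ per double-precision FLOP - my
own placeholder, not a number from the slides):

    # Data movement vs. arithmetic energy, order-of-magnitude only.
    PJ_PER_BIT_5MM = 20        # pJ per bit moved ~5mm on-chip (figure above)
    BITS_PER_WORD  = 64        # one double-precision word
    PJ_PER_FLOP    = 50        # assumed cost of one DP FLOP (placeholder)

    word_move_pj = PJ_PER_BIT_5MM * BITS_PER_WORD   # 1280 pJ
    print(word_move_pj, "pJ to move one 64-bit word 5mm")
    print(word_move_pj / PJ_PER_FLOP, "x the assumed cost of one FLOP")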

------
Quanticles
Our company works on this kind of stuff for those who are interested
([http://isosemi.com](http://isosemi.com))

We're seeing more like 10-100x improvements in energy efficiency and
performance, not 10000x, unless the comparison point is a full blown CPU/GPU.

~~~
nickpsecurity
I was recently digging into analog papers to try to figure out how to apply
the technology in a more general-purpose way, or at least tap into it for
special-purpose functions. I'm not hardware-trained so much as a systems guy
who knows enough to give others tips on what to look into. Here are some links
I discovered:

[https://www.schneier.com/blog/archives/2015/07/friday_squid_...](https://www.schneier.com/blog/archives/2015/07/friday_squid_bl_488.html#c6701962)

Accidentally running into another group using analog selectively for
acceleration is pretty neat. The coprocessor was a believable improvement
showing the power of analog. What are your thoughts on the free-space,
no-transistor computing stuff? Do those other links come off as bogus to a
pro, or plausible enough to encourage local college students to try something
with them? I think there's vast untapped potential in shifting certain
functions back to analog and improving the integration of the two. Maybe in
general-purpose computing, too. Almost certainly in INFOSEC, with analog
supporting obfuscation and tamper detection.

~~~
Quanticles
There is a lot of interesting research out there on analog computing, analog
neural networks, and far-out stuff like transistor-free computing. Going from
research project to product on Digikey is a really huge leap for most research
though. Designing a chip is very expensive, so the product better be a slam
dunk. Most of these analog neural network projects can do some sort of
learning with small black and white patterns, which does not approach the
accuracy or scale of software neural networks.

What we're working on is an accelerator for the convolutional neural networks
that are winning competitions like ILSVRC. Even that by itself is insufficient
for a business case, though. You also have to have an end application in mind,
and that application had better be power-intensive or performance-constrained
enough that software cannot accomplish what you need it to do. Because if
software is good enough, why take a risk on a fancy new hardware component?

~~~
p1esk
Right now, software (GPU-based) implementations of neural networks are
acceptable because the models are constantly changing. Whatever you build in
hardware today will be obsolete in a year (unless your hardware is flexible
enough, but then it loses a lot of its efficiency, and GPUs will probably
catch up with you soon).

However, as we discover more algorithms for general intelligence, we will
reach a point where the model can learn on its own - just like a human baby
does. That will be the point where we will need size, speed, and power
efficiency, rather than flexibility. That will be a good moment to offer a
hardware solution, and that's when an analog chip will suddenly become more
attractive than a digital one.

~~~
Quanticles
The products that we are creating are reprogrammable and reconfigurable, just
like a GPU or FPGA. Updates are like a firmware update. Our hardware would be
no more obsolete over time than a GPU or CPU running in its place, and given
the huge improvements over CPU/GPU, it would be many years before CPU/GPU
would catch up to any particular product anyway.

They are not able to learn on-chip - that is a non-starter and not
particularly useful anyway. Customers don't want self-driving cars that need
to learn how to drive; they want self-driving cars that already know how to
drive.

~~~
p1esk
_it would be many years before CPU/GPU would catch up to any particular
product anyway_

Can you back up your claims with actual performance numbers? I looked at your
website, and I don't see any products - do they exist? What is the flops/W for
your best CNN implementation? How many ImageNet images can it process per
second? What is the accuracy (assuming you can only do 8-bit precision)?

Also, how much does your chip cost?

~~~
Quanticles
We're in the process of fabricating a prototype and are not publicly releasing
detailed estimates at this time.

I can say that the cost depends on what you want to do - systems can range
from less than 1mm^2 to the entire reticle depending on how much performance
you want.

~~~
p1esk
Wait, you haven't even built a prototype? How can you possibly know if your
chip will even work, let alone be better than any existing GPU?

I'm sure you're aware that since Mead's retina chip there have been dozens of
attempts to build NN chips, both analog and digital, and very few of them got
further than the simulation stage (ETANN or ANNA chips come to mind), and no
one managed to produce a commercially successful product.

The Nvidia Tegra X1 claims 1 Tops @ 10W at 16-bit precision, and the cost is
probably under $100. They can probably double that performance if they drop
precision to 8 bits. That's what they ship today, and next year they will
release the Pascal version, which will undoubtedly be bigger, faster, and
more efficient. What makes you sure you can compete with them?

~~~
Quanticles
Actually, I can give you a better answer...

An ASIC is always going to be at least 10x better than a CPU/GPU for
performing the same algorithm. The question isn't whether or not an ASIC can
beat NVIDIA, the question is whether the target market is large enough to
support an ASIC company.

At Isocline we assume that this market IS big enough to support an ASIC. Our
competition is not NVIDIA; it's the future all-digital ASIC company that can
do the same thing, but without all of the whiz-bang technology. If we have to,
we could probably fall back to being that all-digital company, but I'd prefer
to maintain our technology advantage.

NVIDIA's advantage is flexibility; there's always going to be a lot of demand
for that.

~~~
p1esk
_An ASIC is always going to be at least 10x better than a CPU/GPU for
performing the same algorithm._

In theory, this has always been the case. Yet every single neural net ASIC
built in the last 25 years has failed in the marketplace, for the same reason:
the "silicon steamroller". Invariably, by the time the ASIC was ready to ship,
which was almost always much later than hoped, general-purpose chips had
caught up in performance.

I'm not attacking your startup in particular. I'm just pointing out the
history behind the field of specialized neural hardware.

p.s. Your competition _is_ Nvidia (or Intel, or Xilinx, etc.), because they
are well-known, big players who produce reliable products and have huge
development infrastructure and expertise. Nvidia specifically has been
focusing on deep learning applications; they are already targeting computer
vision for cars with their mobile GPUs. If I'm Ford or Toyota, who would I
consider for a partnership when I need chips potentially making life-or-death
decisions on the road? If your technology really works (a big "if", because
you haven't built anything yet), then your best hope is that one of those big
players acquires you.

~~~
Quanticles
These are all good questions/points

History is something we need to contend with, not just for neural networks,
but also for analog computing, which has a similarly troubled past.

For NN history, there has not actually been a market for NN accelerators until
recently. You can see this because:

1. No NN algorithm was worth accelerating until AlexNet came along in 2012.

2. What commercial products even use NNs now? Currently it is mostly just
voice recognition, which is processed server-side.

Right now we are not attempting to go after any markets for which a GPU would
be sufficient, for the reasons you mention; we're sticking to products that
can only work with our technology. By the time we went after an overlapping
market, our credibility would be established and that wouldn't be an issue.

~~~
p1esk
Yes, the market for NN-based products is still in its infancy. It could
explode if Apple or Samsung decide to do image or voice processing locally on
a smartphone, using a coprocessor/accelerator chip alongside the CPU/GPU. It
could make sense considering the expense (power, time, bandwidth costs) of
sending every image off to a datacenter for processing.

I'm curious, have you considered using analog weights (e.g. floating gate
transistors, or DRAM capacitors)? This could reduce multiplication from 32
transistors to just one!
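
The usual trick, as I understand it: store each weight as a conductance, drive
the inputs as voltages, and let the column currents do the multiply-accumulate.
A deliberately idealized sketch - no device variation, noise, or ADC/DAC
costs, and obviously not a description of anyone's actual design:

    import numpy as np

    # Idealized analog crossbar MAC: each cell's current is G * V (Ohm's law)
    # and the column wire sums those currents (Kirchhoff's current law).
    rng = np.random.default_rng(0)
    V = rng.uniform(0.0, 1.0, size=8)         # input voltages (activations)
    G = rng.uniform(0.0, 1e-6, size=(8, 4))   # cell conductances (weights), siemens

    I = V @ G                                  # column currents = 4 dot products, amperes
    print(I)

One storage device per weight, instead of a full digital multiplier per MAC,
is where the savings would come from.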

~~~
Quanticles
Analog weights can save a lot of delay/power/cost if you can implement them
right - easier said than done.

------
msandford
I might also ask: why only go for a 1% number? It seems like it'd be pretty
doable to get a 0.1% approximation, as that's only a 30 dB SNR versus a 20 dB
SNR. Maybe I'm super naive, but it doesn't seem like it should be tremendously
difficult, even if it does cut your core count by 20-50%.
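
(Those dB figures treat the precision ratio as a power ratio; on the amplitude
convention they'd be 40 dB and 60 dB:)

    from math import log10

    for rel_err in (0.01, 0.001):              # 1% and 0.1% precision
        print(f"{rel_err:.1%}: "
              f"{10 * log10(1 / rel_err):.0f} dB as a power ratio, "
              f"{20 * log10(1 / rel_err):.0f} dB as an amplitude ratio")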

Part of the reason I argue for this is that there are tons of sensors out
there that are 0.1% sensors, and if you can offer the rest of the
computational pipeline at 0.1%, then (so long as your errors don't accumulate)
you don't lose any accuracy processing your information this way.

It also seems like this would be pretty great for graphics cards, no? I mean
it'd take a lot of work to make OpenGL run on it, but once you did you could
have either very inexpensive cards, very powerful cards, or both.

~~~
oh_sigh
> so long as your errors don't accumulate

I think that may end up being the hardest part of all this.
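
A toy Monte Carlo of why (purely illustrative, not a model of any real
hardware): if every add in a long running sum picks up an independent ~1%
relative error, the total's error random-walks well past 1%:

    import numpy as np

    rng = np.random.default_rng(1)
    N, rel_err = 1000, 0.01                   # 1000 adds, ~1% error injected per add
    x = rng.uniform(0.0, 1.0, size=N)

    errors = []
    for _ in range(200):                      # repeat the experiment
        s = 0.0
        for xi in x:
            s = (s + xi) * (1 + rng.normal(0, rel_err))   # noisy accumulate
        errors.append(abs(s - x.sum()) / x.sum())

    # Typically well above 1%: independent per-op errors grow roughly like sqrt(N).
    print(f"median relative error of the sum: {np.median(errors):.1%}")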

~~~
msandford
Totally agree.

------
sklogic
Many GPUs used to have a tiny, fast, and very imprecise SFU - it's OK for GLES
but useless for anything else.

~~~
boxfire
Any problem where the data is imprecise or the model itself is very
approximate: machine-learning-type problems, machine vision, interferometry
(as the paper shows), lossy image processing. I can think of a few more data
processing problems, but you get the point. Using a full-precision processor,
even at single precision, when your input data already carries 5%+ noise is a
huge waste. This has immense application.
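
A toy illustration (a made-up weighted sum, nothing from the paper): once the
inputs carry ~5% noise, adding ~1% arithmetic error on top barely moves the
result:

    import numpy as np

    rng = np.random.default_rng(2)
    w = rng.uniform(0.0, 1.0, size=256)       # some fixed positive weights
    clean = rng.uniform(0.5, 1.5, size=256)   # the "true" signal
    ref = clean @ w                            # exact answer on noiseless data

    e_exact, e_approx = [], []
    for _ in range(500):
        noisy = clean * (1 + rng.normal(0, 0.05, 256))   # ~5% sensor noise
        e_exact.append(abs(noisy @ w - ref) / ref)       # exact arithmetic
        lossy = noisy * (1 + rng.normal(0, 0.01, 256))   # ~1% error per multiply
        e_approx.append(abs(lossy @ w - ref) / ref)

    print(f"mean error, exact arithmetic: {np.mean(e_exact):.2%}")
    print(f"mean error, ~1% arithmetic:   {np.mean(e_approx):.2%}")
    # Both errors are dominated by the 5% input noise; the cheap arithmetic
    # barely registers.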

~~~
sklogic
Yes, makes sense... These units are accessible via (undocumented) intrinsics
in some OpenCL implementations (or in GLSL), but, unfortunately, there is no
portable solution. And the FP precision requirements in the OpenCL standard
are way too high, even for the FP16 extension.

------
asgard1024
I have wondered why we don't use only the exponent of a floating point number
as the representation. If the base is close to 1, you can represent a mantissa
of any precision with it. It seems that the article is suggesting something
similar.
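
That's essentially a logarithmic number system, at least as I read it: keep
only an integer count of exponent steps, so multiplication is just integer
addition and the step size sets the precision. A toy sketch:

    from math import log2

    STEP = 1 / 64        # exponent granularity: base 2**(1/64) ~ 1.011, i.e. ~1% steps

    def encode(x):       # positive value -> integer number of exponent steps
        return round(log2(x) / STEP)

    def decode(e):
        return 2.0 ** (e * STEP)

    def mul(a, b):       # multiplication becomes integer addition
        return a + b

    print(decode(mul(encode(3.7), encode(0.021))), 3.7 * 0.021)
    # ~0.0776 vs 0.0777: the error stays within about one step (~1%).

The catch is that addition is no longer cheap in a log representation, which
is presumably why we don't do this everywhere.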

------
bra-ket
what happened to this project since 2010?

~~~
psilence
Ongoing, by the look of the founder's LinkedIn:

[https://www.linkedin.com/pub/joseph-
bates/2/853/aa3](https://www.linkedin.com/pub/joseph-bates/2/853/aa3)

And his company's patents,

[http://patents.justia.com/assignee/singular-computing-
llc](http://patents.justia.com/assignee/singular-computing-llc)

Ah, I get the name now. "The Singularity Is Near" came out in 2005
[https://en.wikipedia.org/wiki/The_Singularity_Is_Near](https://en.wikipedia.org/wiki/The_Singularity_Is_Near)

~~~
bjd2385
I read that book recently. It was alright, though it seemed a bit far-fetched
to me. I mean, obviously we're moving along at a fast pace, but by the time
just _some_ of us (let alone _all_) experience the author's projections, we'll
probably be in the real-life Star Trek era.

------
karmakaze
Very interesting. Wondering if the Apple ISA will have anything to do with
this.

