
Microsoft Bets Its Future on a Reprogrammable Computer Chip - benaadams
https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/
======
xt00
It's a little odd saying that Microsoft is betting its future on
reprogrammable chips. From reading the article, they are simply now using
FPGAs where previously they (along with many other companies) were not. It
would be like saying Amazon is betting its future on UPS trucks. What you put
in the FPGA matters. The reprogrammability is really just to let them
implement algorithms they use heavily that are not executed efficiently on
current Intel chips. Basically, what this shows us is that the pace of
development in the software and algorithms that power the cloud is fast
enough that they need the ability to improve performance on data and
computation flows that currently (and maybe in the future) are not well
supported by current silicon. FPGAs are not very cost efficient, so it's
unlikely they would buy that many compared to the number of Intel processors
and Nvidia GPUs. They basically fill in gaps that are not currently well
served. Intel now owns Altera, so this internal push at MSFT may be why they
bought them.

~~~
RandomOpinion
> _Intel now owns Altera, so this internal push at MSFT may be why they
> bought them._

Intel openly acknowledges it. From the article:

" _Microsoft’s services are so large, and they use so many FPGAs, that they’re
shifting the worldwide chip market. The FPGAs come from a company called
Altera, and Intel vice president Diane Bryant tells me that Microsoft is why
Intel acquired Altera last summer—a deal worth $16.7 billion, the largest
acquisition in the history of the largest chipmaker on Earth. By 2020, she
says, a third of all servers inside all the major cloud computing companies
will include FPGAs._ "

Altera strikes me as an odd choice though. I'd have thought Intel would buy
out Xilinx, the industry leader, instead.

~~~
mastax
> _Altera strikes me as an odd choice though. I'd have thought Intel would
> buy out Xilinx, the industry leader, instead._

Intel had been fabbing (some of?) Altera's chips for a few years before the
acquisition, so from this angle it makes more sense than Xilinx. As to why
Altera had this partnership and not Xilinx, who knows. Perhaps being in second
place motivates you to shake things up.

~~~
totalZero
My understanding is that Intel has a service called Intel Custom Foundry that
gives smaller partner companies access to Intel's fabrication pipeline. Altera
was a client of Intel Custom Foundry, and the two companies started to build
some FPGA-accelerated x86 products together, so it was a pretty natural
acquisition versus a potential Xilinx deal.

------
nickpsecurity
I'm surprised some cloud vendor hasn't acquired eASIC yet. They easily throw
together designs whose price/performance sits between FPGAs and standard
ASICs. They have a bunch of machines for rapid chip prototyping, from FPGAs
to their S-ASICs at 90nm, 45nm, and 28nm. Any IP for networking, storage,
security, whatever, could be turned into a chip easily.

A vendor could pull an IBM or SGI with little boards pairing high-end CPUs
with FPGAs or S-ASICs acting as various coprocessors. Not sure if it's
ultimately the best idea, but I'm surprised I haven't seen anyone try it.
Wait, I just looked up their press releases and it seems they're doing
_something_ through OpenPOWER:

[http://www.easic.com/easic-joins-the-openpower-foundation-
to...](http://www.easic.com/easic-joins-the-openpower-foundation-to-offer-
custom-designed-accelerator-chips/)

~~~
ttul
eASIC is a great little niche-filler. You can get a design fabbed in test
quantities for as little as $150K or so, which is akin to "free" next to a
real ASIC fabrication run.

~~~
ttul
Note that Altera also offers a similar service but it costs more.

------
foobarcrunch
Whaaa? Isn't this what Tilera and Tabula were chasing? Maybe they were too
early and/or didn't have the momentum to drive the industry? It does seem
like an inevitable direction for the technology to evolve; however,
compilers, debuggers, and so on will need to optimize for an almost entirely
new set of constraints.

The thing with using FPGAs in systems (they're great for low-volume, high-
priced items where ASICs would be too costly) is that they end up just
emulating logic which could be more cheaply implemented as actual execution
units (as many modern FPGAs already do with hard blocks like cache, ROM, and
ALUs). That is, it's expensive flexibility that isn't really all that useful.
Sure, you could reconfigure a "computer" from doing database things to
suddenly adding more GPU cores to play games, but how useful or power/cost
efficient would that be? It's nice to cut down on ASICs and to be able to
upgrade them after the fact, but it seems more like category development than
a practical advantage solving a real problem. Maybe a super-fast HPC-on-a-
chip would be possible, but I don't see that we're storage or compute
constrained; we may, however, be bandwidth and latency constrained when it
comes to shrinking clusters down to a single rack of ridiculously power-
hungry reprogrammable chips.

Instead of infinitely customizable, arbitrary logic, you might have a crap-
ton of simplified RISC cores with some memory and lots of interconnect
bandwidth, or something in between an FPGA and an MPPA.

[https://en.wikipedia.org/wiki/Massively_parallel_processor_a...](https://en.wikipedia.org/wiki/Massively_parallel_processor_array)

~~~
Eridrus
I keep hearing that FPGAs suck, but they seem like a reasonable middle ground
between CPUs and ASICs in terms of cost efficiency as this article mentions.

------
spydum
Curious to hear what exactly these devices do. The article hints at
compression and such, and the fact that they've moved the devices to the edge
of the machine to handle network connectivity makes me think it's a shift
toward better interaction with SDN more than anything else. I don't get how
it has much to do with AI, though.

~~~
gradys
FTA

> ... in the coming weeks, they will drive new search algorithms based on deep
> neural networks—artificial intelligence modeled on the structure of the
> human brain—executing this AI several orders of magnitude faster than
> ordinary chips could.

~~~
aab0
That raises as many questions as it answers. Why FPGAs, and not GPUs, which
can run just about any deep neural network, and usually faster and more
efficiently?

~~~
emcq
GPUs have worse performance per watt than a tuned FPGA. Some newer FPGAs have
400 megabits of on-chip RAM. That's huge, significantly larger than the
128-256 KB of cache typically available on-chip for a GPU, and it turns into
big energy savings.
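
For a sense of scale, here's a back-of-the-envelope sketch in Python (the
400 megabit and 256 KB figures are the ones above; the ratio is just
arithmetic):

    # Rough scale comparison using the figures from this comment.
    fpga_on_chip_bits = 400e6                    # ~400 Mb of FPGA block RAM
    fpga_on_chip_bytes = fpga_on_chip_bits / 8   # = 50 MB
    gpu_cache_bytes = 256e3                      # ~256 KB on-chip GPU cache
    print(fpga_on_chip_bytes / gpu_cache_bytes)  # ~195x more on-chip memory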

~~~
p1esk
_GPUs have worse performance per watt than a tuned FPGA_

Citation needed.

The Maxwell-based Jetson TX1 is claimed to achieve 1 TFLOPS FP16 at <10W, and
the soon-to-be-released Pascal-based replacement will probably be even more
efficient.

~~~
emcq
While I don't have any external publications addressing this general claim,
this is taken from my current and past experience with internal studies
focused on neural networks implemented on the TX1, other GPUs, custom ASICs,
and FPGAs. In terms of power efficiency it generally goes ASIC > FPGA > GPU >
CPU. If you're doing just FP32 BLAS it's hard to beat a GPU, but it turns out
many problems have features you can optimize for.

The TX1's power consumption, including DRAM and other subsystems, peaks at
20-30W. Typical usage is 10-15W if you're running anything useful.

That 1 TFLOPS figure counts an FMA instruction as 2 flops. While that's
accurate, and useful for, say, dot products, for other workloads the
throughput will be half that number.
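
To make the accounting concrete (a minimal sketch; the 1 TFLOPS figure is
the marketing number discussed above):

    # An FMA (d = a*b + c) is one instruction but counts as 2 flops.
    # Workloads that can't use FMA get only one flop per instruction.
    headline_tflops = 1.0                # FP16 figure, FMA counted as 2 flops
    instr_per_sec = headline_tflops / 2  # 0.5 tera-instructions/sec
    non_fma_tflops = instr_per_sec * 1   # 1 flop/instruction => 0.5 TFLOPS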

An example of an FPGA performing significantly better than the TX1 is DeePhi
[0].

[0]
[http://www.deephi.com/en/technology/](http://www.deephi.com/en/technology/)

~~~
p1esk
In that link, where's the comparison of the FPGA vs the TX1?

~~~
emcq
If you click on Papers, there is a link to "Going Deeper with Embedded FPGA
for Convolutional Neural Network", which compares against the TK1:
[https://nicsefc.ee.tsinghua.edu.cn/media/publications/2016/F...](https://nicsefc.ee.tsinghua.edu.cn/media/publications/2016/FPGA2016_None_6tAJnDW.pdf)

While not the TX1-vs-FPGA result you want, this is very close. For example,
they aren't using the latest FPGA or GPU, they aren't using TensorRT on the
GPU, and on the FPGA side they're using fat 16-bit weights on an older FPGA
rather than the lower precision you can use on newer parts (which further
improves the FPGA's efficiency, since it has more high-speed RAM collocated
with the computation, versus a GPU, which mostly works out of off-chip
memory).

If you want to learn more about this stuff, I suggest a presentation by one
of the students of Bill Dally (NVIDIA's chief scientist): [http://on-
demand.gputechconf.com/gtc/2016/presentation/s6561...](http://on-
demand.gputechconf.com/gtc/2016/presentation/s6561-song-han-deep-
compression.pdf)

~~~
p1esk
Thanks, but the TK1 is using FP32 weights, as opposed to FP16 on the FPGA. If
you double the GOP/s number for the TK1 to account for that, you end up with
pretty much identical performance, and the paper claims they both consume
~9W.

I'm not saying you're wrong, just that to make a convincing claim that FPGAs
are more power efficient than GPUs, one needs to do an apples-to-apples
comparison.
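
Here's a minimal sketch of the normalization I mean (the numbers are
placeholders, not the paper's):

    # Credit FP32 ops at 2x when comparing against FP16 throughput.
    # All numbers below are hypothetical, not taken from the paper.
    fpga_gops_fp16, fpga_watts = 140.0, 9.0
    tk1_gops_fp32, tk1_watts = 70.0, 9.0
    tk1_fp16_equiv = tk1_gops_fp32 * 2   # one FP32 op ~ two FP16 ops
    print(fpga_gops_fp16 / fpga_watts)   # ~15.6 GOP/s per watt
    print(tk1_fp16_equiv / tk1_watts)    # ~15.6 GOP/s per watt: a wash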

And of course, let's not forget about price: the Zynq ZC706 board is what,
over $6k? And the Jetson TK1 was what when it was released, $300? If you need
to deploy a thousand of these chips in your datacenter to save a million per
year on power, you will need several years to break even, and by that time,
you will probably need to upgrade.
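
The break-even arithmetic, sketched (the board prices are the rough figures
above; the $1M/year power saving is a hypothetical):

    # Back-of-envelope break-even on the extra up-front cost.
    n_boards = 1_000
    fpga_board_cost = 6_000              # rough ZC706 price, per above
    gpu_board_cost = 300                 # rough TK1 launch price, per above
    power_savings_per_year = 1_000_000   # hypothetical $1M/year
    extra_capex = n_boards * (fpga_board_cost - gpu_board_cost)  # $5.7M
    print(extra_capex / power_savings_per_year)  # ~5.7 years to break even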

It just seems that GPUs are a better deal currently, with or without looking
at power efficiency.

------
andreyk
Brief summary: Microsoft is now using FPGAs (flexible hardware that can be
'programmed' to implement various chips) as part of its cloud tech stack. The
FPGAs can run their algorithms with better speed and energy efficiency than
CPUs, but are less flexible (a pain to alter). The article does not explain
it very well; I think MS itself lays it out quite clearly:
[https://www.microsoft.com/en-us/research/project/project-
cat...](https://www.microsoft.com/en-us/research/project/project-catapult/)

Also, annoyingly, this article does not link to the paper about this, which
explains it better than the article does. I recall this one from MS about how
they use FPGAs in Bing; I was pretty impressed by it at the time.
[https://www.microsoft.com/en-
us/research/publication/a-recon...](https://www.microsoft.com/en-
us/research/publication/a-reconfigurable-fabric-for-accelerating-large-scale-
datacenter-services/)

