
A Cloud-Scale Acceleration Architecture: FPGAs in Microsoft’s Datacenters - eDameXxX
https://www.microsoft.com/en-us/research/publication/configurable-cloud-acceleration/
======
CalChris
TL;DR

    
    
      Microsoft uses FPGAs. Google uses ASICs. MapD uses GPUs.
      FPGAs are more flexible. ASICs are faster.
    

I think the FPGA approach is generally more useful and will trickle out into
the marketplace. Indeed they are already available. So this article is more
about Microsoft's fabric.

    
    
      https://www.mrcy.com/products/ensemble-fcn8213-server-class-fpga-processing-blade-advancedtca/
    

It's not impossible to think this would make its way into an Azure cloud
product. That won't happen with ASICs. But Google will clearly have an
advantage on the problems they focus on.

~~~
Cyph0n
FPGAs are already powering key parts of both Azure and Bing. I asked Doug
Burger after his presentation two weeks back at my university, and he said
that currently the fabric is not reconfigurable. In other words, hardware
acceleration is not available to customers yet, but it looks like they're
working towards that.

A nice little tidbit from the talk. They were able to translate all of
Wikipedia from English to Russian using 90% of the deployed FPGAs in 0.1
seconds.

------
jackyinger
The reuse of their network infrastructure with this approach is elegant, BUT
it takes about half of the FPGA to support the network pass through and
lightweight transport layer.

FPGAs are about as expensive as high-end CPUs; can you afford to buy a high-
end server and burn half the cores on your OS?

On the flip side, if Microsoft can really eat that cost and offer FPGA space
as a public cloud service, as an FPGA dev, I'd rather pay to play than bust 5K
on a high end dev board.

~~~
astrodust
FPGAs are really expensive today, but I think we're starting to see a real
shift in production volumes, especially with Intel jumping in, which might
rapidly cut costs on them.

Exciting opportunities ahead for those that can take advantage of hardware
like this.

~~~
jackyinger
It certainly is exciting. Tech that began by replacing discrete logic ICs (TI
7400 series) is getting powerful enough to pack a computing punch.

Look up systolic array architectures for matrix multiplication if you're not
already familiar. They play nicely with FPGAs (which are design-space limited
in comparison to ASICs) and can efficiently implement algorithms that are
prone to combinatoric explosion by amortizing expensive off-chip memory
accesses over multiple operations as the data passes through the systolic
array.
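The amortization described above can be sketched in plain Python (a software simulation, not HDL): in an output-stationary systolic-style multiply, each value fetched from "off-chip" memory is streamed past a whole row or column of processing elements, so one fetch serves many multiply-accumulates.

```python
# Minimal sketch (plain Python, not HDL) of an output-stationary
# systolic-style matrix multiply. Each A/B value is fetched from
# "off-chip" memory once per wavefront and reused by every PE in
# its row/column, amortizing the expensive memory access.

def systolic_matmul(A, B):
    n, k = len(A), len(A[0])
    m = len(B[0])
    fetches = 0
    C = [[0] * m for _ in range(n)]          # one "PE" accumulator per output
    for t in range(k):                        # one wavefront per step of the shared dim
        col = [A[i][t] for i in range(n)]     # fetched once...
        row = [B[t][j] for j in range(m)]     # ...then streamed past every PE
        fetches += n + m
        for i in range(n):
            for j in range(m):
                C[i][j] += col[i] * row[j]
    return C, fetches

C, fetches = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
# n*m*k = 8 multiplies are served by only (n+m)*k = 8 fetches here;
# the ratio improves as the matrices grow.
```

The point is the fetch-to-multiply ratio: for square n×n matrices the array does n³ multiplies on only 2n² fetches, which is why the structure maps so well to bandwidth-starved FPGA designs.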

If the FPGA manufacturers want to open these advantages to all comers, they'd
make their design software free (as in beer, not freedom) and pay for it with
silicon sales.

~~~
dom0
> It certainly is exciting. Tech that began by replacing discrete logic ICs
> (TI 7400 series) is getting powerful enough to pack a computing punch.

<nitpick> Those were PALs, basically fuse-programmed logic-in-tables. Even a
single one of the later CPLDs could already replace a board or two filled with
SSI logic. FPGAs took that to replacing an entire cabinet.

------
trhway
Looks like a new class of hardware component is emerging: a NIC with an
additional FPGA chip. Or maybe a NIC that is only an FPGA, one larger than the
NIC's logic alone would require. Or just an FPGA card with NIC ports on it.

~~~
kijiki
NetFPGA. Been around in various revisions since 2007.

[https://en.wikipedia.org/wiki/NetFPGA](https://en.wikipedia.org/wiki/NetFPGA)

~~~
xellisx
I see they haven't done 25 Gb / 100 Gb stuff yet.

~~~
pyvpx
you can get a nice "SmartNIC" from Silicom that has a Tile Gx-72 and a Virtex
FPGA on-board. I have no idea how much they cost; only that you can't buy just
one :(

[http://www.silicom-usa.com/pr/server-
adapters/programmable-f...](http://www.silicom-usa.com/pr/server-
adapters/programmable-fpga-server-adapter/capture-server-
adapters/ts-100-gigabit-capture-server-adapters/pe3100g2f2tstc4/)

------
ddorian43
So what algorithms/data-structures/dbs do they use the fpgas for?

~~~
algorithmsRcool
They are using the FPGAs for specializing network hardware on the fly. This
lets them decrease latency across their network as well as allow centralized
management.

Edit:

It looks like they are also allowing applications to use the FPGA-to-FPGA
communication layer to reduce latency for distributed systems.

> _" All network traffic is routed through the FPGA, allowing it to accelerate
> high-bandwidth flows. An independent PCIe connection to the host is also
> provided, allowing the FPGA to be used as a local accelerator..."_

> _" By enabling the FPGAs to generate and consume their own networking
> packets independent of the hosts, each and every FPGA in the datacenter can
> reach every other one (at a scale of hundreds of thousands) in a small
> number of microseconds, without any intervening software. This capability
> allows hosts to use remote FPGAs for acceleration with low latency,
> improving the economics of the deployment, as hosts running services that do
> not use their local FPGAs can donate them to a global pool and extract value
> which would otherwise be stranded."_

------
user5994461
> [From the article] In this paper we propose a new cloud architecture that
> uses reconfigurable logic to accelerate both network plane functions and
> applications. This Configurable Cloud architecture places a layer of
> reconfigurable logic (FPGAs) between the network switches and the servers,
> enabling network flows to be programmably transformed at line rate...

Network planes have been built with FPGAs and ASICs for a long time. Microsoft
is late to the game.

~~~
LargoLasskhyfv
Maybe late rolling it out in production. Experimenting with it not so much.
2007 [https://www.microsoft.com/en-
us/research/project/emips/](https://www.microsoft.com/en-
us/research/project/emips/) Even open sourcing it in 2011
[http://blog.netbsd.org/tnf/entry/support_for_microsoft_emips...](http://blog.netbsd.org/tnf/entry/support_for_microsoft_emips_extensible)

------
adrianratnapala
Is this entirely about how Microsoft is using FPGAs for some of its own
internal datacenter needs? In that case I can understand why Google might see
ASICs as a better way to do the same thing.

Where I see FPGAs shining on the cloud is if vendors started putting them
online so that clients could program them for their own applications. But I
don't know enough about such things to know if there is yet a market for that.

~~~
youdontknowtho
I think that it's only a matter of time until you see that from Amazon and
Azure. Financial companies have been offloading hot paths in code to FPGA for
about five years.

------
gumby
I'm not convinced that FPGAs are effective in general purpose computation.
FPGAs are quite expensive in terms of power, density and overall cost per
function. It's not like this hasn't been tried many times before (Google even
bought a company that used FPGAs for a reconfigurable network stack processor,
but as far as I can see the product never made it).

------
webaholic
I wish they would work out how to run the chisel toolchain on those fpgas.
From what I understand, you have to program in verilog currently which is a
huge pain.

Please enable chisel and put it in your Azure cloud. It will be a great
platform for custom software.

~~~
DigitalJack
Can you recommend some resources on chisel? I've been an ASIC/FPGA designer
for 20 years. We mostly do VHDL and SystemVerilog for design, and
SystemVerilog for verification.

I've only heard about chisel briefly, used in reference to a risc design I
think.

I've considered making a lisp like language for design on a few occasions. But
maybe chisel solves the pain points?

~~~
aseipp
Chisel is the language used to describe the RISC-V Rocket core, which is the
current primary RISC-V implementation and reference (with real, synthesized
boards coming out of it).

> I've considered making a lisp like language for design on a few occasions.
> But maybe chisel solves the pain points?

To be honest, I've only been doing FPGA design for a few weeks but, like:
basically _anything_ is better than VHDL/Verilog IMO. :)

Personally, I prefer CLaSH as opposed to Chisel. (Chisel is a DSL built using
Scala - Clash is actually a compiler, not a DSL, from Haskell to Hardware. I
think CLaSH conceptually is very nice.)

But honestly it really does seem to me that anything is a vast improvement
over the current languages, for the most part. Your life is simply vastly
improved by being able to do things like use a REPL in Scala/Haskell, or
having cosimulation support in your toolchains (so you can run your designs as
cycle-accurate software simulations, with no code changes, before doing
synthesis).

Even then -- the intuition to use, or build something like Lisp isn't
necessarily misplaced! Hardware description and functional programming share
some interesting commonalities, it seems.

~~~
blackguardx
The key to learning HDL is to understand that it isn't software. There are no
instructions to a state machine. You are describing the functionality of the
state machine itself.
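The point above can be illustrated even in software (a plain Python sketch, not HDL): the machine is *described* as data, a next-state function over (state, input), and "clocking" it is a separate mechanical step, much as synthesis turns an HDL description into registers and logic.

```python
# Illustrative sketch (plain Python, not HDL): the state machine is
# described as a table rather than written as a sequence of instructions.
# A Mealy machine detecting the bit sequence 1-0-1 on a serial input.

NEXT = {
    ("idle", 0): "idle",  ("idle", 1): "got1",
    ("got1", 0): "got10", ("got1", 1): "got1",
    ("got10", 0): "idle", ("got10", 1): "got1",
}

def clock(state, bit):
    # output asserts on the cycle the final 1 of "101" arrives
    out = 1 if (state == "got10" and bit == 1) else 0
    return NEXT[(state, bit)], out

state, outs = "idle", []
for bit in [1, 0, 1, 1, 0, 1]:        # the "clock loop" just applies the description
    state, out = clock(state, bit)
    outs.append(out)
# outs == [0, 0, 1, 0, 0, 1]
```

Nothing in the description itself executes step by step; the tables define what the machine *is*, and the loop at the bottom plays the role the clock plays in hardware.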

~~~
aseipp
Not to sound rude, but what are you addressing with this statement? I've been
doing functional programming for over a decade; I'm quite familiar with the
concept of declarative programming, so circuits were not a big leap from this
point (one of the many ways in which FP nicely dovetails with hardware
design.)

My point wasn't really about state machines or whatever. Languages like
Verilog and VHDL are absolutely terrible at things like abstraction
capabilities. You don't have a real module system, you don't have a REPL, you
have almost no abstraction facilities at all, on top of the languages being
ridiculously verbose. They sit in some bizarre uncanny valley where
anything takes a surprising amount of code to write, yet is terrible in terms
of reuse, modularity, understandability, etc.

Example: A few weeks ago I ran into a case where I couldn't easily wrap a cell
library primitive by wrapping it in a Verilog module, and then re-use that
module a few times. The primitive was a PLL on my board. Oops, can't do that
-- because the synthesis toolchain disliked a module around the PLL primitive,
as the primitive had some arguments which were required to be constants
(despite the fact all call sites to the wrapping module had constant
arguments). So I had to use a Verilog macro which _generated_ a Verilog
module, and I had to instantiate that macro like 10 times to create 10
different modules, each one corresponding to a different clock speed, with no
reuse outside of a macro. Because some parameters had to be constant. If
Verilog metaprogramming wasn't just a rip-off of C macros, it might not be so
terrible. But shit like this is just tedious.
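What an ordinary higher-order generator buys over C-style macros can be sketched in plain Python (the names here are hypothetical, not any real toolchain's API): the clock multiplier is captured at elaboration time, so every generated "module" hands the primitive a constant by construction.

```python
# Sketch (plain Python; names are hypothetical, not a real toolchain API):
# a "module generator" as an ordinary function. The parameter is baked in
# at elaboration time, so the wrapped primitive only ever sees constants --
# the property the synthesis tool in the anecdote above insisted on.

def make_pll_wrapper(multiplier):
    # runs at "elaboration time"; the returned description is fully constant
    return {
        "name": f"pll_x{multiplier}",
        "primitive": "PLL",
        "params": {"MULT": multiplier},   # constant by construction
    }

# Ten differently-clocked variants from one ordinary function,
# instead of ten hand-instantiated macro expansions:
plls = [make_pll_wrapper(m) for m in range(1, 11)]
```

This is essentially what Scala gives Chisel and Haskell gives Clash: the host language's functions, loops, and data structures do the metaprogramming, and only a fully-elaborated constant netlist reaches synthesis.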

I can't even do things like use higher order functions, or strong typing
capabilities, or things like leverage a real module system (with boundary and
abstraction capabilities) -- even when I am completely in control of their use
and completely understand how they will synthesize. Tools like REPLs vastly
improve time-to-iterate with an interactive environment. Cosimulation being a
primary feature of these tools means I generally get fairly accurate software
simulations directly from my HDL itself, independent of any EDA toolchain etc.

The languages _are just bad languages_. Can you get work done in them? Yes,
and everybody does. Do they accurately reflect the fact you're declaratively
describing a hardware circuit? Yes. That doesn't mean Verilog/VHDL, as
languages, aren't objectively terrible.

I understand why this is. The cost models associated with these tools -- and
process -- are fundamentally different (for hardware, it's not about
"determinism" like software programmers care about -- but things like MTTF,
lead time, and optimizing around that), and Verilog is really a small portion
of the overall lifecycle of developing a chip (you still need post-synth
testing, possibly do gate-level debugging, post-synth modifications, on-board
testing, etc). And tools like Clash/Chisel are _not_ perfect by any means, and
I still have plenty of slack to pick up by myself, nonetheless.

The languages are still bad, despite all that.

~~~
blackguardx
I think we are approaching this from opposite perspectives. I've been doing
FPGA development for 10 years and am an EE. C was the first programming
language that I learned over 20 years ago as a teenager. I've only very
recently been learning high-level languages and associated abstractions.

A lot of the trouble with FPGA development in my opinion isn't the HDL, but
the toolchains themselves. I haven't met anyone that likes FPGA toolchains.
Things like PLL instantiation problems are likely to do with the tool and not
the language.

A lot of people (I'm not saying this is you) coming from a software background
tend to approach FPGAs as if it were a software problem. They seem very
similar, after all. You write some code into a window, hit a button, and bam,
you have some functionality. To some extent, I think it leads them to tend to
shove a square peg in a round hole. Just like there are programming paradigms
governing software design, there are paradigms governing FPGA design as well.
Trying to force certain abstractions can lead to headache.

Abstractions are very important in any design process, be it analog hardware,
digital hardware, software, bridge design, etc. Software tends to favor
abstractions because they are powerful, but also very cheap. One can afford to
waste some CPU cycles if it reduces development time and the occurrence of
bugs. In FPGA and ASIC design, abstractions are just as important but one
can't afford to waste clock speed or chip area to get it. FPGAs are expensive
and ASICs even more so. The design paradigms of HDL exist for that reason, as
frustrating as they may be.

In a sense, I've just grown accustomed to Verilog's warts. I was appalled at
the FPGA development process when I first started but at this point I just
take it for granted. I think things will get better, but it will take time.
Even so, I think it is best to stick with Verilog/VHDL for the time being.
Just like if one wanted to do web development, one would have to learn
JavaScript, which is also considered terrible.

~~~
ZenoArrow
> "Even so, I think that it is best to stick with Verilog/VHDL for the time
> being."

If you do it for your job, perhaps. However, if you're a hobbyist, that
doesn't make as much sense. As a hobbyist you have the luxury of being able to
use the tools you like the best. In any case, FPGA design shouldn't be closely
coupled to any one language, they're just a tool that allows you to express
the design you have in mind. If Clash or Chisel can help you express that
design more succinctly and clearly than the competition, there's no need to
stick to the other tools. The need only exists for professionals, as changing
to languages with lower market share carries risk.

