Hardware is the new software (acolyer.org)
70 points by ingve 124 days ago | 44 comments

> A careful reading of Intel patents suggests that SGX instructions are implemented entirely in microcode. If new ISA features are new microcode, that opens up the possibility of supporting them on existing CPUs.

The SGX instructions are essentially key management for full memory encryption. Does he really think memory encryption could be implemented in microcode?

That's an obvious flaw of his proposal.

I'm really surprised no paper reviewer caught that. What happened to peer review?

> Does he really think memory encryption could be implemented in microcode?

Yes? Why not? This argument rather relies on knowing what the (trade secret) capabilities of the microcode are. I don't see why this warrants a smear on peer review.

The argument the paper makes does not rest on whether memory encryption is implemented solely in microcode. All the problems with SGX lie in the management part, not the encryption part; I'm referring to side-channel attacks and the lack of secure counters.

If you read the paper carefully, there are many other arguments one can make against it. However, it is an interesting paper and its message is worth considering.

Peer review is not a perfect process. I think the research community is better off for publishing such a paper.

Why don't we see CPUs with an integrated FPGA yet?

- No popular applications that use this yet (see history of 3D graphics)

- Tooling is terrible and non-free

- Power consumption

- Tiny die area FPGA not that useful, huge die FPGA is wasting die you could use for cache

- Cost of the nonintegrated solution is huge e.g. http://www.nallatech.com/store/fpga-accelerated-computing/pc...

I want to emphasize the "tooling is terrible and non-free" point.

It is really hard to get a young programmer excited about programming FPGAs when the tools are archaic, buggy, proprietary, and non-portable. The higher-level interfaces (C abstractions etc.) are also lacking.

Indeed. The whole field is missing 20-40 years of experimental research with different abstractions. It's like trying to write FORTRAN-77. About the only thing that's ahead of this is "Chisel".

There's also a disease of people trying to write "C to HDL" converters without thinking about the very different paradigms involved. While it can be made to work, it will always be extremely inefficient.

Tooling is actually okay; it seems archaic because with an FPGA you are designing the very fundamentals of computing. There is no hand-holding there.

It would be the equivalent of a newbie programmer looking at GCC and saying: "This program that makes other programs looks so archaic, it runs on a '70s computer terminal."

I disagree; getting accustomed to one of these toolchains takes a really long time. You can run a pipeline for hours only to have it fail with some unreadable message. Everything is layered over with lots of poor, deep-menu UI, but that doesn't save you from having to understand the pipeline in its entirety. Oh right, I forgot that depending on what you are doing, you may have to open up several different tools manually.

Oh, and the insistence on feature wizards, which somehow work fine for the one example case but fail in mysterious ways, or aren't capable of expressing what you actually want to do.

It's really a long way from "gcc hello.c ; ./a.out".

Then multiply that by the tool licensing and various feature sets, the IP licensing, support for different parts, and entirely different tooling for different vendors.

Part of it, as you say, is that the reality is inherently a lot more complicated than software, but the tooling really is a mess. Time and cost are a huge barrier to entry.

You are probably mixing up RTL design with place and route; those are two completely different things.

You can design and simulate without your FPGA; most of the tools allow simulation without opening a GUI. It is no different from calling GCC. Worst case, you can always use the open source Icarus Verilog simulator, which is good enough if you are only designing for FPGAs: http://iverilog.icarus.com/
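For a sense of how close that loop is to the gcc one, here is a minimal sketch using Icarus Verilog (the file names are placeholders of my own, not from any real project):

```shell
# Compile a design plus its testbench into a simulation binary, then run it.
# counter.v / counter_tb.v are hypothetical file names; any design and
# matching testbench would follow the same two-step flow.
iverilog -o counter_sim counter.v counter_tb.v
vvp counter_sim
```

No synthesis, no place and route, no GUI: just compile and run, with waveforms optionally dumped for a viewer like GTKWave.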

Again, you don't need any FPGA to do the design, and you shouldn't even use one; it's a waste of time synthesizing the design every time you make a change.

Place and route comes after the design reaches a complete state, and you should only need to do it once.

If you are doing something different from this flow, you are most likely doing it wrong.

I almost certainly am. Made the mistake of listening to the AE.

+1. VHDL is much harder to reason about than C. With FPGAs you don't write a flow of instructions; instead you write logic that all executes at the same time.

It's good for things that are compute heavy, but not for anything else.

I'd also wager that ASICs might make more sense than FPGAs, which seems to be the path we are going down.

VHDL (or Verilog) isn't harder to reason about than C in my experience, it's just different.

I haven't used VHDL for work yet (just skimmed the existing code base), but I just took a course on it, and this was my main takeaway. There were some gotchas in the language (perhaps in the error reporting of the particular implementation more than in the language itself) that took some getting used to. But the hardest conceptual problem for the class (most of us were programmers, and the hardware guys were green and came from non-CMPE backgrounds) was the idea that many of the statements you write happen concurrently, or are scheduled to happen at a future point. So code like:

    process(clk)
    begin
      if rising_edge(clk) then
        X <= X + 1;
        Y <= X;
      end if;
    end process;
With X = 4 at the start of a clock tick, you end up with (X, Y) = (5, 4): both right-hand sides are evaluated against the old value of X before either signal updates.

Once you start wrapping your head around these concepts reasoning isn't so hard (and having been a CMPE major once I had the basis for this academically, but it was far in the past so had been forgotten). Certainly not harder than trying to understand distributed systems and coordination between them, in fact it's a lot of the same problems just applied at a different scale.
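One way to internalize those signal-assignment semantics is to model them in ordinary software: every right-hand side is evaluated against a snapshot of the values from before the clock edge, and all signals then update at once. A minimal Python sketch (the dict-of-lambdas encoding is my own illustration, not full VHDL semantics):

```python
# Signals in a clocked VHDL process update only at the end of the delta
# cycle: every right-hand side sees the OLD values, then all signals
# take their new values simultaneously.
def clock_tick(signals, assignments):
    old = dict(signals)  # snapshot of pre-edge values
    return {name: rhs(old) for name, rhs in assignments.items()}

signals = {"X": 4, "Y": 0}
assignments = {
    "X": lambda s: s["X"] + 1,  # X <= X + 1;
    "Y": lambda s: s["X"],      # Y <= X;  (sees the OLD X, i.e. 4)
}
signals = clock_tick(signals, assignments)
print(signals)  # {'X': 5, 'Y': 4}
```

The snapshot-then-commit step is exactly what trips up programmers expecting sequential assignment, and it is the same trick used to reason about any synchronous update of shared state.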

The problem with programmers trying to design hardware is that when they see code, they think of instructions. If you start shifting your mindset to actual hardware constructions, like flip-flops and combinational logic, the hardware language will start making a lot of sense.

There is also IceStorm though (for use with iCE40 FPGAs).

Added to this list:

- most programmers have a poor grasp on what the real advantages and disadvantages of FPGAs are, instead treating them as magic sauce

- you need to go and benchmark: https://electronics.stackexchange.com/questions/140618/can-f... also https://electronics.stackexchange.com/questions/132034/how-c...

Commercially they're mostly used for low-latency predictable-throughput signal processing.

What if you could make the FPGA and the cache the same thing? FPGAs, CPUs, and GPUs are all merging into computational goo. DSPs already got handed their asses once the FPGA vendors included hard multiplier blocks; that was all it took to kill the entire architecture.

Could FPGA be configured as cache out of the box, or is there too much overhead for this to be at all meaningful?

Too much overhead. FPGAs normally have dedicated memory blocks, since making memory out of "normal" logic cells needs way too much space (registers, temporary buffers for single values, etc. can often be embedded in the logic design, but not larger blocks of memory).

The synthesis tools even try very hard to optimize dumb registers out of your design (you typically blow away ~40 gate equivalents for one bit of simple memory implemented in logic cells). Clock-gated registers are undesirable not only because they make the design harder to understand, but also because they tend to break these optimizations performed by the synthesis toolchain.
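To put the ~40-gate-equivalents figure in perspective (taking the number above at face value as a rule of thumb, not a datasheet value), even a small buffer built out of fabric registers costs over a million gate equivalents:

```python
# Back-of-envelope cost of building memory from FPGA logic cells.
GATES_PER_BIT = 40           # rough figure quoted above, an assumption
buffer_kib = 4               # a modest 4 KiB buffer
bits = buffer_kib * 1024 * 8
gate_equivalents = bits * GATES_PER_BIT
print(gate_equivalents)      # 1310720
```

That is why the dedicated block RAM exists: a few KiB of fabric-register memory would swallow a large fraction of a small FPGA.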

For Intel, I guess no clear business case, or something. Integrating FPGA support with operating systems seems like an interesting problem too, and perhaps one which needs some effort before desktop/server systems add this.

In the embedded world they're available and not even super new, the Zynq[1] from Xilinx has been out for a few years. It combines an FPGA fabric with hard ARM CPU core(s).

Intel (through the acquisition of Altera) also has such products, like Stratix[2], which also combines FPGA and ARM. I seem to remember seeing some x86+FPGA for embedded applications from Intel, but didn't find it with a quick search.

[1] https://www.xilinx.com/products/silicon-devices/soc/zynq-700... [2] https://www.altera.com/products/soc/portfolio/stratix-10-soc...

> In the embedded world they're available and not even super new,

I feel you might be understating this! FPGAs and ASICs have in fact been in use for decades now; they are very much ingrained into many high-end products. Any digital oscilloscope has an FPGA, and has had for a very long time, for example. These are often used in combination with a CPU for the UI part. The CPU might be an IC on the PCB, or, quite popular, a soft CPU synthesized into the FPGA itself.

ARM, for example, have a whole architecture specifically designed for optimal FPGA synthesis.

There are a number of relatively cheap SBCs out there with an FPGA too. http://linuxgizmos.com/tiny-sbc-runs-linux-on-xilinx-zynq-ar...

The Stratix/Arria series are ungodly expensive, however. The low-end ARM/FPGA combo series for Intel is the Cyclone V[1], which is much closer in specs/capabilities to the ordinary Zynq platforms.


> I seem to remember seing some x86+FPGA for embedded applications from Intel, but didn't find it with a quick search.

I suspect these have gone out to "select customers".


When a problem that seems like it might drive FPGA adoption becomes sufficiently popular, it gets dedicated hardware instead. That's why we have GPUs, AES instructions, and CPU-hardware-based random number generators, and not onboard FPGAs. FPGAs basically get evaporatively-cooled out of the CPU.

> [when] FPGA adoption becomes sufficiently popular, it gets dedicated hardware instead.

I always thought this too, but turnaround time (time to market) in hardware isn't fast enough, and as the article argues, Intel has a vested interest in adding instructions. Most large companies are motivated to add complexity and keep users spending their time on the company's projects. It doesn't pay to make simpler products that might become commodities.

Examples that I don't think would move from FPGA to CPU:

- Specialized algorithms that don't have enough users.

- Changing algorithms: encryption, an HTTP/2 parser, data structure/index scanning

- Tied to specific software: a game that's popular, but not enough to justify CPU instructions

- Portable FPGA: imagine an M.2 or USB style connection fast enough to run an add-on FPGA. It could be the start of making a reconfigurable computer; just add the chips for your software.

Then you fall off the other end of the scale: You don't have enough people clamoring for hardware acceleration of this whose needs aren't met by some already-existing plugin card or something.

You aren't left with a niche of people who make CPU-buying decisions based on what FPGAs the CPU has, as a group with sufficiently similar specs that they look like a single market to Intel or another chip maker. (That is, someone who just wants "a couple of instructions" and someone who wants "an FPGA for parsing HTTP/2 requests" are dissimilar enough not to be the same market.)

In this case, being able to name a multiplicity of edge use cases isn't an advantage; it is another reason why it doesn't happen. You need one big use that enough people agree on that it's economical to centralize it in the CPU, not a ton of scattered ones with radically different implications for die size, how data flows in and out, what percentage of users are using it, and so on. In fact, if we take your use cases seriously and place each FPGA where it would maximize performance (because otherwise why bother), the FPGAs you're calling for don't even all live in the same place.

And when you get that one use case, you get specialized hardware, like GPUs.

(Incidentally, GPUs are themselves another reason. Many of the things you might previously have cited as reasons to have FPGAs twenty years ago are now efficiently implemented in GPUs. No, not by any means all such problems, but enough that it carves another massive hunk of the use-case-space out of the FPGA-on-the-CPU argument.)

Another way of looking at it is that the fundamental problem that FPGAs have is that it is physically impossible for them to beat dedicated hardware in the same configuration that you'd load into an FPGA, so once you have that popular use case, it gets put into normal silicon instead and outperforms the FPGAs.

Interfacing FPGAs to PCI or PCIe isn't hard, there are plenty of dev boards available to let you add whatever you want.

The ARM+FPGA chips work well too, the IO blocks are decoupled enough from the CPU that you can add your own offload engines.

Ok, but what happened to "general purpose computing"?

I don't understand the question.

Unless you are using word-thinking and think that "general purpose computing" is some sort of promise that maximally general-purpose hardware is guaranteed to be available, in accordance with some idiosyncratic definition of "maximally general-purpose hardware"? General purpose computing refers to your ability to run any program you choose, rather than to particular hardware capabilities. It stands in opposition to platforms where the programs are locked down, like gaming consoles.

On CPUs, the end of Moore's law. But having dedicated accelerators for special tasks is not exactly new, and they free up the general-purpose resources to do things that actually require them, instead of wasting time and heat budget on easily-optimized tasks.

That's what graphics cards are for.

/joking, mostly

Besides the reasons already mentioned above (adding my own emphasis on suboptimal power efficiency), CPUs have GHz-scale clock domains, thanks to pipeline stages that minimize the number of gates traversed each cycle. FPGA designs often run at frequencies ~10x slower. While one could instantiate pipeline stages using the FPGA's logic cells, it's a badly inefficient use of silicon. Maintaining separate clock domains, on the other hand, also consumes lots of silicon, due to the need to isolate the domains to avoid injecting noise into shared power/ground planes.

With things like AWS F1 instances[1], it seems anyone who wants to experiment with, or make production use of, FPGA-accelerated computing can do so without a large capital or time investment in the hardware platform itself.

That being said, as many others have pointed out in this thread, there are still toolchain and conceptual hurdles to doing this effectively. If/when FPGA acceleration becomes more commonplace, I think you'll see tighter integration of the FPGA with the server hardware platforms, perhaps to the point of it being on-package, if not on-die. The demand of the marketplace will dictate the pace and level of integration. That is, of course, assuming the major CPU makers are in tune with and responsive to these market demands. I wouldn't be shocked if Amazon, Google, etc. were designing their own server CPU chipsets from scratch, to include some features they can't get from Intel/AMD. But at least with the Intel acquisitions of Nervana[2] and Altera[3], I'd say Intel seems to be aware of evolving computing needs.

But the major data center operators have been building their own solutions further up the "hardware stack" for some time. Amazon, Facebook and Google have bypassed the major equipment makers (Dell, HP, Cisco, EMC, etc.) and gone ahead with their own server and switch hardware designs to suit their specific datacenter needs[4].

1 - https://aws.amazon.com/ec2/instance-types/f1/

2 - https://www.recode.net/2016/8/9/12413600/intel-buys-nervana-...

3 - https://newsroom.intel.com/news-releases/intel-completes-acq...

4 - https://www.wired.com/2016/03/google-facebook-designing-open...

Intel is working on that; it's most of why they bought Altera.

Xilinx and Altera have/had FPGAs with CPUs (ARM and PPC I think.)

Intel is slowly getting on board. The Altera acquisition was likely geared toward this, and they already had reconfigurable "coprocessors" on certain products.

See US20170090956A1

Usually it's the other way around. That is, FPGAs with onboard hard CPU cores. Great for making custom SoCs.

Cypress PSoC?

The SmartFusion2 from Microsemi is also a contender for Cortex-M based FPGA/CPU combination boards, on the extreme low end. I haven't used a Cypress yet. (The SF2 dev kits actually come with about 25k logic cells, which is a fair amount for such a small board with an M3).

Tailfins for CPUs. Do not want.

MIPS did something like that. Each MIPS variant required different compiler flags to get good performance. Software distribution was a huge pain, with multiple versions. MIPS peaked in the 1990s.

"absent improvements in microarchitecture they [CPUs] won’t be substantially faster, nor substantially more power efficient, and they will have about the same number of cores at the same price point as prior CPUs. Why would anyone buy a new CPU?"
