Hacker News
Intel Programmable Systems Group takes step towards FPGA based system in package (newelectronics.co.uk)
43 points by zxv on July 30, 2016 | 16 comments

I'm a newbie, so please excuse my ignorance... Cypress has had these things for a bit; they call them PSoC. Basically, an ARM core plus some programmable logic. Is the difference here that the processor and PLD are both beefier? I'm generally confused about the difference between PLD vs FPGA vs CPLD. There seems to be no precise definition, and it changes based on who is talking.

I would consider Cypress's PSoC an ARM kit with a small logic block embedded. Apart from what is already present in the chip (DAC, CAN, ...), it is impossible to implement any complex solution on the limited logic cells present in PSoCs. It's basically a glue layer between the ARM core and the other components in a design.

The closest thing might be Xilinx Zynq or Altera Stratix 10. Both are comparable to PSoC in that they come with an ARM core, but paired with a high-density FPGA instead of a small CPLD.

The MX series from the article adds DRAM with larger capacity and higher bandwidth than what is currently available.

My understanding of PLD/CPLD vs FPGA is that a CPLD's logic cells are based on EEPROM, programmed before deployment (with exceptions), so they are operational as soon as they are powered up. They are generally small in terms of the number of logic cells they offer.

FPGAs, on the other hand, are based on RAM. They are programmed at power-on, typically from internal or external flash, and become operational once the design is transferred from flash into the RAM-based logic cells. They offer a large number of logic cells, which allows the implementation of complex designs.
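To make the "RAM-based logic cells" point concrete, here is a minimal sketch (in Python, purely illustrative — real cells are hardware, not objects) of the look-up table (LUT) that sits at the heart of an FPGA logic cell: its truth table lives in volatile RAM and must be loaded from a bitstream at every power-on before the cell computes anything.

```python
# Illustrative model of a RAM-based FPGA logic cell (a 4-input LUT).
# The truth table is volatile: it has to be loaded from a bitstream
# (normally kept in flash) every time the device powers up.

class Lut4:
    def __init__(self):
        # Power-on state: the RAM holds nothing useful yet.
        self.ram = [0] * 16

    def load(self, truth_table):
        """Load the 16 truth-table bits from the 'bitstream'."""
        assert len(truth_table) == 16
        self.ram = list(truth_table)

    def eval(self, a, b, c, d):
        # The four inputs simply address one of the 16 RAM bits.
        return self.ram[a | (b << 1) | (c << 2) | (d << 3)]

# Configure the LUT as a 4-input AND gate: only input 1111 maps to 1.
lut = Lut4()
lut.load([0] * 15 + [1])
print(lut.eval(1, 1, 1, 1))  # 1
print(lut.eval(1, 0, 1, 1))  # 0
```

A CPLD cell, by contrast, would keep its configuration in non-volatile EEPROM, so the `load` step happens once at programming time rather than at every boot.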

Hmm, this made me wonder: what would happen if you made a system that supported multiple architectures (ARC, PPC, x86, FPGA, ...) at the same time?

We're working on it - check out Popcorn Linux: http://www.popcornlinux.org

Basically, it's a Linux variant that launches a separate kernel on each ISA island (think: a set of cores), links the kernels together with a custom messaging layer and page-coherency protocol to create a single system image for applications, and then provides a compiler and a set of runtime tools so developers can write code as they would for a traditional single-ISA SMP machine while still taking advantage of the different ISAs in the system.

The short story is that there are performance and power benefits to be had, but only if you can support quick and efficient migration between architectures. I'm not at liberty to say right now exactly how that works (we're still publishing the work), but suffice it to say: we've hooked up an x86-64 machine and an aarch64 machine, made them look like a single system, and migrated applications between the two. It's pretty cool to watch processes move back and forth between architectures :-)

A general-purpose CPU paired with an FPGA to offload specialized workloads onto seems like a really sweet deal - that is, until you realize that configuring the FPGA with a new bitstream is pretty slow (so live reconfiguration would be infrequent) and that the toolchains for building code that controls interoperation between the CPU and whatever you've placed in the programmable fabric are poor (so designing good custom hardware accelerators is a time-consuming dev task).

I spent a semester working with a Xilinx SoC, and the experience was enlightening. My computer engineering friends were perfectly comfortable with gate-level schematics, debugging with input/output wires, and waiting literal hours between test cycles. I was the only software engineer in the room, and all I could do was ask myself how anyone could be OK with this awful tooling situation. It really befuddled me - I was especially frustrated by the high-level synthesis tools, which take C++ and convert it into a functioning hardware description (alleviating the need to rewrite business logic in VHDL or Verilog). They would take well-formed C++ code with a simple API and produce a pretty good hardware description (sometimes, with a little optimizing, with better perf than a handwritten equivalent), but fail to generate a corresponding API for it on the associated CPU for anything beyond simple register access (despite starting from what was likely the desired software API)! IMO, FPGA tooling could use a lot of TLC, but maybe I just had a bad experience.

When you write C++ you think sequentially, but hardware doesn't work that way. I think Verilog or VHDL makes more sense than trying to get C++ to work for hardware, or having to write extra C++ to account for the way hardware works.

Things are changing though: some modern FPGAs can be configured by writing to SRAM, making the process of configuring much quicker.

Some vendors even provide an OpenCL API/SDK with which you can express your algorithms at a higher level than VHDL.

FPGAs are awesome :)

And long before that, Xilinx made FPGAs with embedded POWER processors.

This could be a real breakthrough in computing technology, and just in time, as the improvement of desktop and server CPUs has almost completely stalled.

FPGA and CPU together is nothing new. AMD started offering Fiji + HBM in 2015 and recently nVidia joined with their Pascal + HBM2. Intel is lagging in general due to lack of competition.


I never said that. Thanks for -1.

That's not mine. You wrote about Fiji and Pascal, but AFAIK they have nothing to do with FPGA.

Site seems down. I tried to find an alternative source but didn't come up with much.

* Jordan Inkeles, Altera's director of product marketing for high end FPGAs

Speaking in 2012, Danny Biran – then Altera’s senior VP for corporate strategy – said he saw a time when the company would be offering ‘standard products’ – devices featuring an FPGA, with different dice integrated in the package. “It’s also possible these devices may integrate customer specific circuits if the business case is good enough,” he noted.

There was a lot going on behind the scenes then; already, Altera was talking with Intel about using its foundry service to build ‘Generation 10’ devices, and Altera was eventually acquired by Intel in 2015.

Now the first fruit of that work has appeared in the form of Stratix 10 MX. Designed to meet the needs of those developing high end communications systems, the device integrates stacked memory dice alongside an FPGA die, providing users with a memory bandwidth of up to 1Tbyte/s.

“A few years ago,” said Jordan Inkeles, director of product marketing for high end FPGAs, “we partnered with Intel for lithography and were very excited. We also looked at Intel’s packaging technology and asked ‘can we use that?’. The answer was ‘yes’. The combination has allowed us to do things we thought were not possible.”

The concept is based on what Altera – now Intel’s Programmable Systems Group (PSG) – calls ‘tiles’. Essentially, these are the dice which sit alongside the FPGA. Tiles are connected to the FPGA using Intel’s EMIB – embedded multi-die interconnect bridge – technology. “It’s not a traditional silicon interposer,” Inkeles explained. “It’s a little bridge chip which is used where you need to connect two pieces of silicon.”

* Stratix 10 MX is said to combine the programmability and flexibility of Stratix 10 FPGAs with integrated 3D stacked high bandwidth memory devices

Stratix 10 MX devices are designed to help engineers solve demanding memory bandwidth challenges which can’t be addressed using conventional memory solutions. The parts integrate four stacks of HBM2 DRAM, each with up to four memory dice. PSG says the parts are suitable for use where bandwidth is paramount. Apart from providing 10 times more memory bandwidth than conventional solutions, Stratix 10 MX devices are said to be smaller and to use less power.

“This idea of integrated chips opens up things,” Inkeles said. “FPGAs are trying to be everything to everyone. They have to support wireless, wired, networking, radar and high performance computing, amongst others. We saw divergence in what was possible.”

PSG started thinking about transceivers. “If we had transceivers in separate tiles, we could come out with devices for different markets,” Inkeles continued. “It also makes sense for analogue, which doesn’t move at the same pace as digital, and for design reuse. So we could use a tile that meets today’s needs – say a 28G transceiver – then come out in the future with a 56G PAM4 tile and a 28G NRZ tile. In the same process node time frame, we can deliver two very different types of product.”

This is the concept underpinning the MX. “Parallel memory is becoming a huge challenge,” Inkeles observed. “You can continue to use parallel interfaces, but with the memory right next to the FPGA to maintain signal integrity and reduce power. But, while Hybrid Memory Cube (HMC) is a good solution, it has to be serial,” he continued, “as you can’t get signal integrity on a 72bit wide datapath. Or you can put memory in the package.

“By providing up to four stacks of four DRAM dice, we’re providing a memory bandwidth never seen before. Each stack can run to 256Gbyte/s, so four stacks give 1Tbyte/s. That’s unprecedented and can’t be achieved with HMC.

“Power consumption is reduced because the memory is right next to the FPGA and drive strength is much smaller – only pJ/bit – because you’re not driving signals to a memory that could be 6in away.”

There is a downside, however; it’s an expensive solution. “You’re paying for bandwidth,” Inkeles admitted. “But customers complain about the effort it takes to do board layout and to get the DDR chips right. We’ve solved that without using any I/O or transceivers. And if 16Gbyte of DRAM in package isn’t enough, you still have transceivers and I/O available for use with external components.”

Inkeles pointed to three broad application areas for the MX device. “There’s high performance computing (HPC), cloud computing and data centres, but they all look for different things.

“HPC says ‘give me everything’, while cloud says it’s worried about the cost per bit. Data centres can build algorithms in logic, which is quicker than a GPU, but need the memory bandwidth to ‘feed the beast’.”

Apart from imaging applications, such as medical and radar, Inkeles says there are applications in wireline communications. “Gone are the days of just routing traffic,” he said. “Everyone is now looking to differentiate their products, for example, by providing statistics on the data being handled. So they need to hold a piece of traffic for a moment to analyse what it is, then send it onwards. This couldn’t be done before because there wasn’t the bandwidth.”

MX is the first implementation of PSG’s strategy and the interesting thing is ‘what comes next?’. It’s quite possible that optical functionality might appear at some point in Intel PSG’s Stratix 10 parts.

Five years ago, Altera announced plans to integrate optical interfaces into its FPGAs as a way to cope with increasing communications bandwidth. Despite demonstrating the technology later in 2011, the idea remained on the shelf. Inkeles said: “We have continued to evolve the technology, but haven’t gone public with the developments.”

Inkeles noted: “Although PAM4 offers a way to stay in the electrical domain, we will, at some point, run out of capability and we’ve been preparing for that transition. Now we have transceivers on tiles, we can take out one tile and replace it with an optical interface.

“We’ve been working behind the scenes,” Inkeles continued, “but the right time to put a product into the market will depend on the economics.”

Altera’s acquisition by Intel also gives it access to silicon photonics technology. “We have exciting capabilities,” Inkeles added.

* Heterogenous 3D system in package integration could enable a new class of FPGA based devices

Another potential step is integrating such components as analogue, ASICs and CPUs alongside an FPGA. Intel PSG says EMIB offers a simpler manufacturing flow by eliminating the use of through silicon vias and specialised interposers. The result, it claims, will be integrated systems in package that offer higher performance, less complexity and better signal and power integrity.

Inkeles sees this as potentially a new market. “ASICs have become smaller and faster, but not cheaper. Unless you’re going to sell millions, you will have a tough time,” he said. “ASSPs are going away, unless you can find more customers or more volume.”

Is it possible that Biran’s vision of ‘standard products’ might be close to reality and could that even include custom versions of a Stratix 10? “Will we do custom?” Inkeles wondered. “It’s within our ability. It’s not something we’re promoting, but we are engaging with customers.

“We have a range of options. Now we’re part of Intel, the ‘sky’s the limit’. As Altera, we developed HardCopy and had an ASIC team, but it wasn’t our core competence. But Intel Foundry can do ASIC,” he concluded.
