You're responsible for the time-critical and/or DSP things that are core system architecture, and that will never be outsourced (it's key IP; you'd be stupid to outsource that).
Don't let anyone know, we have a good thing going on.
That doesn't usually stop companies. At best it creates an opening for new companies to recapture some high-value markets once the dust from the off-shoring stampede settles.
I don't suppose there's an appropriate tutorial on egghead.io.
EDIT: You may also hear the phrase "making bank" to mean making (earning, not printing) a lot of money.
I've never seen electrical engineers without a college degree.
I’m glad I don’t have to job surf, it would be hard without paper, but my experience would be very valuable in my tiny field.
I expect the main workflow will be using OpenCL to offload arbitrary work to the coprocessor along with a few Intel-provided modules capable of common tasks.
The great thing about having the FPGA in-package via UPI is that the cache coherency, decreased latency and massive bandwidth will allow much more granular offloading of work, compared with a PCIe coprocessor, where it only makes sense to offload larger chunks of work and minimise the communication and data passing between the two.
The greater the granularity of work that we can offload, the more viable the OpenCL/high-level synthesis/heterogeneous computing type stuff will be, as it will integrate more seamlessly into existing software development methods. This is the holy grail at the moment for FPGA vendors: getting to the point where software developers can program them on their own.
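To make the granularity point concrete, here's a toy break-even model (all the numbers and helper names below are made up for illustration, not measured figures): an offload pays off only when the fixed link latency plus transfer time plus accelerator compute time beats just doing the work on the CPU, so a lower-latency coherent link lets much smaller chunks clear the break-even point.

```c
#include <assert.h>

/* Back-of-envelope offload model (all numbers hypothetical):
 * total offload time = fixed link latency
 *                    + bytes moved / link bandwidth
 *                    + work / accelerator throughput.
 * Offloading wins when that beats work / cpu throughput. */
typedef struct {
    double latency_us;      /* fixed round-trip cost of one offload */
    double bw_bytes_per_us; /* link bandwidth */
} Link;

static double offload_us(Link l, double bytes, double work_ops,
                         double accel_ops_per_us) {
    return l.latency_us + bytes / l.bw_bytes_per_us
         + work_ops / accel_ops_per_us;
}

static double cpu_us(double work_ops, double cpu_ops_per_us) {
    return work_ops / cpu_ops_per_us;
}

/* 1 when offloading this chunk is a win, 0 otherwise */
static int worth_offloading(Link l, double bytes, double work_ops,
                            double accel_ops_per_us, double cpu_ops_per_us) {
    return offload_us(l, bytes, work_ops, accel_ops_per_us)
         < cpu_us(work_ops, cpu_ops_per_us);
}
```

With hypothetical numbers (say, a ~50 µs driver round trip for a PCIe-style path vs ~0.1 µs for a coherent link), a small chunk of work loses on the slow link and wins on the fast one, while a huge chunk wins on both, which is exactly the "only large chunks make sense over PCIe" effect.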
As to your point though I guess we'll find out soon what the dev tools for this will actually look like.
From what I've seen the only applications for this will be pre-canned FPGA images that were written in-house by Intel for things like encryption or FEC.
A former colleague is now an Altera/Intel FAE, I will ask how things are going the next time we meet up for a beer.
Even very well-oiled products like the Amazon F1 and Intel's new acceleration cards are pretty non-trivial to use unless you're an experienced engineer, and they clearly spent a lot of time polishing off the roughest edges to make it as approachable as possible. Amazon more or less paid a team of seriously experienced people to make the whole flow as painless as possible (including a pretty good software SDK, and a lot of tooling around Vivado), and it's still non-trivial!
Some of the ARM/FPGA combos are a lot more approachable overall, but the tooling, BSPs etc are normally huge pains the moment you want to get creative, and the SDKs are comparatively worse, vs the "high end" ones listed above, in my opinion. Normally I just end up replacing them with my own Linux BSP, more or less, if I need the actual host side.
A really annoying aspect of all this, though, is that most kits which could offer features like high-speed peripherals (PCIe, etc.) are pretty expensive for hobbyist developers to acquire and use, so there's certainly a bit of a self-fulfilling prophecy going on here.
The translation from Software -> Flip-Flops isn't nearly as natural, and it's easy to try to apply SE techniques that, while possible, are totally unsuited for an FPGA.
It's an interesting space for sure, and I'm definitely curious how these hybrid systems shake out.
I have my hobby project in SpinalHDL up at https://craigjb.com
Edit: also, GTKWave is pretty good! It’s a simple and straightforward waveform viewer that works on all platforms.
Grab a Lattice CPLD/FPGA demo board ($25-$50) and have at it!
If you are coming from a software background then maybe look at getting a board with one of the ARM+FPGA hybrid chips: you can run a proper OS on the ARM CPU, and I would guess that the bus interface to the FPGA fabric will be simpler than using Intel's UPI.
Have a look at nGraph: http://ngraph.nervanasys.com/docs/latest/optimize/generic.ht...
Co-design for hardware+software is tough, for sure. But the reality is that hardware has to be present to build the software on top of it. People need something to play with. So the "bag on the side" of FPGAs here is kind of like Lego blocks for cache / acceleration. If you are running a DNN for inference, for example, cache is usually your bottleneck. Rent GPUs to train the model, figure out your bottleneck, and build your own isolated and local system for the "expensive lot of folks" to create the valuable IP.
Over time I suppose they could have some machine learning that automatically configures the FPGA based on which programs you are running and the types of computations they have historically used.
The evolved solution involved using only 37 out of the 100 gates available, no clock, and some units were logically disconnected yet disabling them caused the chip to fail at its task, indicating it was relying on EM effects particular to the chip. Truly amazing (and perhaps a bit creepy too).
Nor do I think they will be invented soon. Machine learning is bad at doing discrete things, bad at things requiring zero mistakes, and bad at problems for which the answer can't be perfectly verified (due to needing to solve the halting problem).
Xilinx is already pushing its Vivado HLx, which is pretty much that. I'm not aware of Altera's current offerings, but they won't overlook this so easily.
These (almost by necessity) make you write code that doesn't look like any other C code. You can't just use some pre-existing C library and compile it to use FPGA resources. At that point, why not go a step further and just use a proper HDL?
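For a flavour of the problem, here's a hedged sketch of what HLS-style C tends to look like (the pragma spellings are in the style of Xilinx Vivado HLS and are purely illustrative, not a working tool flow): fixed-size static arrays instead of pointers or malloc, static control flow, and tool pragmas steering how loops become hardware. A normal C compiler simply ignores the unknown pragmas, which at least makes functional testing easy:

```c
#include <assert.h>

#define TAPS 4

/* A streaming FIR filter, written the way HLS tools want it:
 * the static array becomes a chain of registers, the unrolled
 * loop becomes TAPS parallel multipliers. */
int fir(int sample, const int coeff[TAPS]) {
    static int shift_reg[TAPS]; /* persists between calls, like hardware state */
    int acc = 0;

    /* shift in the newest sample */
    for (int i = TAPS - 1; i > 0; i--) {
#pragma HLS PIPELINE II=1      /* ignored by a normal C compiler */
        shift_reg[i] = shift_reg[i - 1];
    }
    shift_reg[0] = sample;

    /* multiply-accumulate across all taps */
    for (int i = 0; i < TAPS; i++) {
#pragma HLS UNROLL             /* asks the tool for TAPS parallel multipliers */
        acc += shift_reg[i] * coeff[i];
    }
    return acc;
}
```

Even this tiny kernel shows the point: it's legal C, but nobody writes application C this way, and an off-the-shelf library certainly isn't structured like this.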
You're assuming that you're starting with an FPGA-only project. In reality, most projects that are going to be accelerated with these Xeon+FPGA systems are existing software where only a handful of hotspots will have to be ported to the limited C dialect, which is significantly less effort than rewriting the entire algorithm in an HDL.
It took me a while to grok HDL and write working code (good, maintainable code aside) even though I took digital design courses.
FPGAs don’t get firmware.
HDL is a description of hardware.
HDL can be referred to as code. And I do know what HDL is and what FPGAs are, thank you.
No, it can't. That's why the caps are there and I'm sticking to it. You basically justified it ;)
Code is something that you design to run sequentially. HDL is a description of gates and logic that runs all at the same time.
If you wanted to argue that the bitstream is firmware - that's a tougher discussion.
Unlike Xilinx, though, Intel (very very recently) just started offering their OpenCL-for-FPGA SDK for free, and it works on most of their device families, including Arria/Xeon and Cyclone/ARM. I always found it disappointing that the OpenCL SDKs were normally licensed, since they're more-or-less a logical extension of HLS support. So that's nice of Intel.
They don't have any equivalent to Xilinx SDSoC, though, but for datacenter targets they're shipping a different set of SDKs anyway (called "OPAE"), so maybe in the future they'll build something on top of OPAE and the OpenCL support (e.g. a single-source model, like SYCL).
A lot of flailing about, as the ship goes down.
(Speaking simply generally, about this "word". Not about the topic(s) in this thread.)
It's not really that hard to get rolling at this point.
So the non-recurring engineering (NRE) costs are very high, and you don't get to recover them over much volume. Even at 10x the cost, the vendor is still losing money.
Intel isn't making a lot of money on those boards if they aren't actually taking a loss.
1) A PCI-e board is non-trivial. It requires some engineer to sit down and do some serious signal integrity work.
2) A board with a fine-pitch BGA is non-trivial. These boards generally have via-in-pad and blind vias on the top two layers. The boards are also generally 8-12 layers.
3) The support circuitry on those boards is non-trivial. There are high-speed, fine-pitch connectors that are probably in the $100-each range themselves.
4) Really, you're paying for customer support. You will hit bugs. You will have to communicate with Intel. You probably won't order enough volume to make it worthwhile for Intel.
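As a back-of-envelope illustration of the NRE point (all figures below are hypothetical), amortising the engineering cost over a small production run dominates the per-unit price long before the bill of materials does:

```c
#include <assert.h>

/* Toy board-economics model (every number here is made up):
 * per-unit cost = amortised non-recurring engineering + bill of materials. */
static double unit_cost(double nre_dollars, long units, double bom_dollars) {
    return nre_dollars / (double)units + bom_dollars;
}
```

E.g. a hypothetical $500k of NRE spread over only 1,000 boards adds $500 per unit on its own; spread over 100,000 boards it's $5. Low-volume dev boards live at the wrong end of that curve.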
Xilinx is a much better choice in FPGAs.
That could be a real problem for getting performance out of these systems.
Sounds a lot like what OpenCAPI (https://opencapi.org) offers.
(disclosure: work for IBM on CAPI/OpenCAPI firmware enablement)
A number of obvious interesting applications jump out at me: crypto acceleration for mining, machine learning acceleration, JIT acceleration for interpreted languages… How doable is all this? Would you have to roll your own code or are there libraries?
Wrt interpreted languages, I use Ruby fairly often, and now that there's MRuby I wonder if Ruby could be made to run blazingly fast on something like this Xeon+FPGA thing?
Oh to have a spare $€£￥ to get me one of these.
This is important because, unlike in software, performance doesn't just scale transparently: on an FPGA you would have to decompose every matrix multiplication into 11x16-style tile multiplies. They don't mention this overhead in their specs.
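To put some (hypothetical) numbers on that decomposition overhead: if the fabric natively performs one fixed-size tile multiply, the tiling bookkeeping and the padded edge tiles look roughly like this sketch (tile sizes and matrix shapes here are illustrative, not from any datasheet):

```c
#include <assert.h>

static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Number of native TMxTK * TKxTN tile operations needed to cover
 * an MxK * KxN matrix product. */
static long tile_ops(long M, long K, long N, long TM, long TK, long TN) {
    return ceil_div(M, TM) * ceil_div(K, TK) * ceil_div(N, TN);
}

/* Fraction of the issued multiply-accumulates that are padding waste:
 * edge tiles are padded up to full size, and that padding never shows
 * up in peak-throughput specs. */
static double pad_waste(long M, long K, long N, long TM, long TK, long TN) {
    double issued = (double)tile_ops(M, K, N, TM, TK, TN) * TM * TK * TN;
    double useful = (double)M * K * N;
    return 1.0 - useful / issued;
}
```

For example, with a hypothetical 11x16x16 native tile, a 100x100x100 product needs 490 tile ops and wastes roughly a quarter of the issued MACs on padding, while a matrix that fits the tiles exactly wastes nothing.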
The model is just too different for these things to make sense 99% of the time. It's like trying to run general purpose code on a GPU, but even worse.
Typical programming languages run code one line at a time from the top to the bottom, occasionally calling functions or looping.
FPGAs instead have code which specifies a kind of circuit diagram of what is connected to what.
You can make something to translate one to the other, but by translating a programming language to a circuit diagram you usually end up creating a bit of circuitry for each line of code, and then extra circuitry which activates each bit in sequence.
That typically leaves 99.9% of the circuitry unused (deactivated) at any point in time. FPGAs don't have much space for circuitry, so by translating a program from C to an FPGA directly you'll usually end up with a huge design with very low throughput, since the vast majority of the circuit is sitting idle most of the time.
Real FPGA designs will typically aim to use circuitry very sparingly, and keep as much of it in use as possible all the time to get maximal performance. Complex, sequential and non-performance critical tasks are typically not a good fit for an FPGA and are usually offloaded to a programmable processor instead.
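A toy cycle-count model of that contrast (the numbers and function names are illustrative only): a naive line-by-line translation enables one block per cycle, so S-1 out of every S blocks sit idle, while a pipelined design keeps every stage busy and retires one result per cycle in steady state:

```c
#include <assert.h>

/* Cycles to produce R results when a one-hot sequencer enables
 * one of S circuit blocks per cycle: every result takes S cycles. */
static long naive_cycles(long stages, long results) {
    return stages * results;
}

/* Cycles to produce R results through a full S-stage pipeline:
 * pay the fill latency once, then retire one result per cycle. */
static long pipelined_cycles(long stages, long results) {
    return stages + (results - 1);
}

/* Fraction of blocks doing useful work each cycle, steady state. */
static double utilisation(long busy_blocks, long total_blocks) {
    return (double)busy_blocks / (double)total_blocks;
}
```

With 1,000 "lines" of sequenced circuitry, utilisation is 1/1000 — the "99.9% unused" figure above — and the naive design is roughly three orders of magnitude slower for a long run of inputs.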
The more useful way to write hardware using programming languages is to create a model of hardware and then use the language features to manipulate that model, and use it to create abstractions for actual hardware patterns.
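A minimal sketch of that idea (a homegrown toy, not the API of SpinalHDL, migen, Chisel or any real embedded HDL): gates are plain data, the host language builds and manipulates the netlist, and a separate evaluator "runs" it. Real embedded HDLs do this at vastly larger scale and then emit Verilog/VHDL from the model:

```c
#include <assert.h>

enum op { OP_INPUT, OP_AND, OP_XOR };

/* One gate: an operation plus the ids of the signals it reads. */
struct gate { enum op op; int a, b; };

struct netlist { struct gate g[32]; int n; };

/* Append a gate and return its signal id (no bounds check: toy code). */
static int add_gate(struct netlist *nl, enum op op, int a, int b) {
    nl->g[nl->n] = (struct gate){ op, a, b };
    return nl->n++;
}

/* Evaluate every signal, consuming primary inputs in creation order. */
static void eval(const struct netlist *nl, const int *inputs, int *value) {
    int next_input = 0;
    for (int i = 0; i < nl->n; i++) {
        const struct gate *g = &nl->g[i];
        switch (g->op) {
        case OP_INPUT: value[i] = inputs[next_input++];       break;
        case OP_AND:   value[i] = value[g->a] & value[g->b];  break;
        case OP_XOR:   value[i] = value[g->a] ^ value[g->b];  break;
        }
    }
}
```

Building a half adder is then ordinary host-language code: two inputs, `sum = xor(a, b)`, `carry = and(a, b)` — and because the circuit is just data, you can write host-language functions that generate whole families of adders, which is exactly the abstraction leverage the comment above is describing.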
"...The input of Hastlayer can be a program written in dozens of programming languages (including several of the most popular ones as C#, C++, Python, PHP..."