I have dabbled with Arduinos but have never used an FPGA.
Can you say whether this tutorial would still work on something that is not an IceStick?
The claimed $40 is not expensive enough to put me off, and if it means I have a tutorial to learn from instead of trying to wing it on my own, it's probably worth the extra.
STM32s are worth looking into, with "blackpill" boards at around $2 on aliexpress. Investigate stm32-base.org if you're curious.
>Can you say whether this tutorial would still work on something that is not an IceStick?
Yes, absolutely. Compatibility across the board is excellent for the whole iCE40 family, at least while using the open flow (icestorm, yosys and nextpnr). I've never tested the proprietary vendor tools. For the suggested iCESugar, you'll have to map the pins (edit the pcf constraint files to match the pins on the iCESugar board), but this is a basic operation you'll have to do for any design outside of a tutorial. You'll also have to tell nextpnr-ice40 to target the UP5K instead of the HX1K.
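To make the pin-mapping step concrete, here's a rough sketch of what the edited constraints and the build commands might look like (pin numbers and file names are placeholders, not the real iCESugar assignments; take the actual pinout from the board's documentation):

    # icesugar.pcf -- map the tutorial's signal names onto iCESugar pins
    # (pin numbers below are made up for illustration; use the board's pinout)
    set_io clk 35
    set_io led 40

    # synthesize, then place & route for the UP5K instead of the HX1K
    yosys -p "synth_ice40 -top top -json top.json" top.v
    nextpnr-ice40 --up5k --package sg48 --pcf icesugar.pcf --json top.json --asc top.asc
    icepack top.asc top.bin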
As the external clock source on the iCESugar is 12MHz, the same as the iCEStick, you won't even have to adjust any clock-related parameters.
I'd suggest using PMOD 2/3 as they don't share any pins with the onboard functionality. PMOD 1 can be freed by removing the jumpers connecting the serial port, and by not using the FPGA-dedicated USB port.
Overall, the iCESugar has a lot of I/O, whereas it is easy to run out of usable pins on the iCEStick.
There are some differences between the FPGA chips, but the tooling is the same. Relative to the HX1K, among other niceties, the UP5K is much newer (UltraPlus is the newest subfamily), has two internal oscillators, and has far more logic blocks (5K vs 1K), so it can fit a lot more logic, plus some hard blocks for basic peripherals (which stay out of the way unless your design wires them up) and more sysMEM blocks. This has proven extremely useful in my projects.
The only aspect in which the UP5K is at a disadvantage is slower propagation speed (HX is the "high performance, power doesn't matter" subfamily), so the maximum clock for a given design will be lower: say, 120MHz vs 90MHz on the same design (in practice the gap is usually not as dramatic as that). This will typically not matter in most tutorials out there, which seldom use the PLLs at all, so the clock is always the non-scaled 12MHz source.
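If a design does need a faster clock, icestorm's icepll utility will calculate the PLL settings for you and can even emit a ready-to-use Verilog module. A quick sketch (the 48MHz target and the output file name are just examples):

    # compute PLL parameters to turn the 12MHz input into ~48MHz
    # and write a Verilog wrapper module to pll.v
    icepll -i 12 -o 48 -m -f pll.v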
I've got a BlackICE II from a couple of years ago. Any reason to upgrade? I was thinking of the ULX3S ECP5 board, but it seems a bit pricey. https://www.crowdsupply.com/radiona/ulx3s
The nanoDLA's a cheap OSHW 24MHz logic analyzer that works with pulseview (sigrok) and is sure to be invaluable when trying to debug design issues with the FPGA.
Lattice has a worthwhile ~$10 iCE40 series. I'm playing with the iCE40HX8K at the moment (Olimex iCE40HX8K-EVB board). I was able to configure an RV32I-based SoC with APB3, UART, a timer, RAM and an SDRAM controller that fits in 2500 LUTs. I'm using SpinalHDL, which turned out to be a very convenient object-oriented way of defining your hardware. SpinalHDL is a Scala-based HDL (Chisel, from UC Berkeley and backed by SiFive, is another one, also Scala-based) that compiles to a very long Verilog file, which you then feed to the IceStorm toolchain (Yosys) to synthesize a bitstream for the FPGA. It also allows incorporating an RV32I binary that lands right in the FPGA's BRAM at startup. I think the iCE40 is a very good FPGA for starting to learn open source design tools.
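Not my SoC, but to give a flavour of what SpinalHDL looks like, here is a minimal sketch of a counter-driven LED blinker that elaborates to Verilog for the yosys/icestorm flow (module and object names are my own, purely illustrative, and it assumes the SpinalHDL library is on your classpath):

    import spinal.core._

    // Minimal SpinalHDL example: a 24-bit counter whose top bit drives an LED.
    class Blinky extends Component {
      val io = new Bundle {
        val led = out Bool()
      }
      val counter = Reg(UInt(24 bits)) init(0)
      counter := counter + 1
      io.led := counter.msb   // toggles at clk / 2^24
    }

    // Elaborate to Blinky.v, which can then go through yosys/nextpnr as usual.
    object BlinkyVerilog extends App {
      SpinalVerilog(new Blinky)
    }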
Thanks, I'll check it out. I've only been playing with FPGAs for a couple of weeks; RISC-V was the driving force for me. Currently I'm trying to figure out which is better, SpinalHDL or Chisel. I find SpinalHDL's syntax more convenient and less verbose, but Chisel has much broader support from SiFive and the surrounding community.
I think in general Chisel has more people working on it. However, most people using Chisel are targeting ASICs, so SpinalHDL wins when it comes to supporting features that matter for FPGAs. Chisel also has nothing close to the excellent VexRiscv core, which was developed specifically for FPGAs.
The most developed cores written in Chisel are much bigger and more complex as they target ASICs.
See the chipyard repository, which tries to lower the learning curve a little bit: https://github.com/ucb-bar/chipyard/
This is very interesting. I thought Berkeley's BOOM was a much more advanced RISC-V implementation than VexRiscv, and it is written in Chisel3. I haven't played with BOOM yet, though, so I can't say how deeply it can be configured. But you seem to be right: BOOM does not look like the most efficient core in terms of LUT utilisation. Another thing I disliked in Chisel3 is the dependency on a new intermediate representation of hardware (FIRRTL), which adds one more layer of abstraction and compilation.
> I thought Berkeley's BOOM was a much more advanced RISC-V implementation than VexRiscv, and it is written in Chisel3.
It is a lot more advanced, and thus harder to get started with, than VexRiscv.
> BOOM does not look like the most efficient core in terms of LUT utilisation.
Part of the problem is that the BOOM design is targeted at ASICs. The developers do not generally synthesize BOOM for FPGAs, except through FireSim, which uses FPGAs to run fast simulations (multiple MHz of target frequency) in order to get more accurate performance figures from real-world benchmarks.
None of the developers are interested in using BOOM as a computer on an FPGA and thus no one has provided support for that.
> Another thing I disliked in Chisel3 is the dependency on a new intermediate representation of hardware (FIRRTL), which adds one more layer of abstraction and compilation.
I really enjoy working with firrtl. It is generally easy to inspect and quite human readable.
With firrtl you can:
- automatically add coverage instrumentation:
- for fuzzing: https://github.com/ekiwi/rfuzz/tree/master/instrumentation/src/rfuzz
- for simulator independent coverage [wip]: https://github.com/freechipsproject/treadle/pull/263
Well, as I said, all this stuff is very new to me and I don't understand many of the solutions, so thanks for explaining. Using C++ as an HDL is also an interesting idea; for many people it could lower the entry barrier.
Because icestorm came first, the iCE40 is now the FPGA family with the most mature open-flow support. They're also cheap and have cheap OSHW development boards, which helps.
Besides the iCE40 (project icestorm), there's also the ECP5 (project trellis) and the QuickLogic eFPGA (support provided by the vendor itself!), all in good shape.
Then there are some more, like the GW1N (project apicula) and the Xilinx 7-series (project x-ray), in a partially working state.
It's partly economies of scale, but also that small/cheap FPGAs are sized much closer to what you'd expect from a microcontroller (and that is quite sufficient for many designs).
Supporting PCIe and other high-speed interfaces (10Gb ethernet and beyond, etc) requires physical transceivers which look a whole lot more like "wireless communications over a PCB trace" and less like the traditional "drive a digital signal over a GPIO pin". These interfaces also typically drive requirements for data buffering as well as higher clock rates: all of these things increase size and cost.
That said, you can get basic chips that will do this for much less than "thousands of dollars" - Xilinx's Artix line is optimized for low cost plus a relatively high number of transceivers compared to the number of logic cells. You'd probably be interested in a development board like the PicoEVB, which is in the USD $200 range and provides an M.2 form factor / PCIe interface to an FPGA. The FPGA itself can be had for less than that... but the cost of the PCB, connectors, DDR memory, etc. does start to add up.
The most reasonable board for hobby PCIe (< $100) right now is probably the SQRL Acorn; it was designed for mining but turned out to be pretty useless for that purpose. There are cheaper FPGAs with PCIe support coming onto the market, but they tend to be low on fabric since they are designed for low-end applications (Lattice CrossLink-NX).
I think there are a few fundamental problems with FPGA compute offload. First, as niche products, FPGAs are always a node or two behind the leading edge, so their logic gates run slower than CPUs and GPUs; even their "hard" compute blocks are not as fast. Second, the fundamental nature of FPGAs as a "sea of logic" means that routing delays reduce your max frequency, or make pipelining necessary, thereby incurring latency. Third is memory; historically, FPGAs have not supported high-bandwidth GDDRx, and if you're crunching on something you generally want bandwidth. The latest high-end FPGAs do have HBM, but they are quite expensive.
So why would anyone want something that's slower and more expensive? Well, you have to be doing something special, like a lot of custom parallel processing pipelines, or with hard realtime requirements. Basically, a niche, and that doesn't lend itself to economies of scale.
The problem is floating-point math. Mostly we want GPUs (and TPUs) to crunch floating-point math really quickly. However, FP units are complicated things and take a disproportionate amount of FPGA fabric. Add to this the lower clock speed and suddenly GPUs start to look really cheap.
Some FPGAs are commodities. Here’s one with a PCB for $13. Can be used with an open tool chain too. I could probably find an even cheaper one if I looked harder.
I suppose the price is a bit misleading because the AX also requires the $12 programmer so with shipping+tax to CA it's closer to $30 total out of pocket.
Personally I use the BX because having the flash, and a bit more logic, is convenient.
I know hardly anything about actually using FPGAs (though learning would probably make my life easier), but when I was doing research a while back I found some Lattice ones which were ~$200. No idea if they are capable of what you'd want to do.
That chip is $967 on Digikey, yet that entire dev kit is $259. It's annoying that the only way to get reasonably priced FPGAs is to launder them through China.
Projects like this are getting me excited for a possible future where you can run the entire MCU in simulation. Instead of buying a given MCU for an embedded project and dealing with crappy, weird peripheral issues or undocumented behavior, you can choose a RISC-V core and architectural implementation. Then you download the core and simulate it in software with an open source qemu module, peripherals and all, using an open source RTOS, perhaps with some commercial RISC-V add-ons for your project. Then you burn it to an FPGA and test it in the real world, or, if you need or want to, purchase an ASIC version made by a third party. Of course we're probably far from that in practice, but in theory it's achievable.
You can already do a lot with https://renode.io. I ran a full (slooow) Verilator simulation of a small RISC-V before I had hardware in hand.
Verilator [1] converts Verilog to C++ and runs simulations very fast. You can also use 8bitworkshop [2] to simulate Verilog in the browser. I believe they use Verilator to convert to C++ and then to javascript via emscripten.
Check aliexpress for the iCESugar. That's an iCE40 UP5K development board that's half the price (~$30) of the iCEStick and much more powerful.
It also works fine with the open stack.
Full disclosure: I own iCEStick, iCESugar, BlackICE MX and TinyFPGA-BX.