I have dabbled with Arduinos but have never used an FPGA.
Can you say whether this tutorial would still work on something that is not an IceStick?
The claimed $40 is not expensive enough to put me off, and if it means I have a tutorial to learn from instead of trying to wing it on my own, it's probably worth the extra.
STM32s are worth looking into, with "blackpill" boards at around $2 on aliexpress. Investigate stm32-base.org if you're curious.
>Can you say whether this tutorial would still work on something that is not an IceStick?
Yes, absolutely. Compatibility across the board is excellent for the whole iCE40 family, at least while using the open flow (icestorm, yosys and nextpnr). I've never tested the proprietary vendor tools. For the suggested iCESugar, you'll have to map the pins (edit the pcf constraint files to match the pins on the iCESugar board), but this is a basic operation you'll have to do for any design outside of a tutorial. You'll also have to tell nextpnr-ice40 to target the UP5K instead of the HX1K.
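To make the pin-mapping step concrete, here's a rough sketch of what the edited constraints and the build commands might look like (pin numbers and file names are placeholders, not the real iCESugar assignments; take the actual pinout from the board's documentation):

    # icesugar.pcf -- map the tutorial's signal names onto iCESugar pins
    # (pin numbers below are made up for illustration; use the board's pinout)
    set_io clk 35
    set_io led 40

    # synthesize, then place & route for the UP5K instead of the HX1K
    yosys -p "synth_ice40 -top top -json top.json" top.v
    nextpnr-ice40 --up5k --package sg48 --pcf icesugar.pcf --json top.json --asc top.asc
    icepack top.asc top.bin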
As the external clock source on the iCESugar is 12MHz, the same as the iCEStick, you won't even have to adjust any clock-related parameters.
I'd suggest using PMOD 2/3 as they don't share any pins with the onboard functionality. PMOD 1 can be freed by removing the jumpers connecting the serial port, and by not using the FPGA-dedicated USB port.
Overall, the iCESugar has a lot of I/O, whereas it is easy to run out of usable pins on the iCEStick.
There are some differences between the FPGA chips, but the tooling is the same. Relative to the HX1K, among other niceties, the UP5K is much newer (UltraPlus is the newest subfamily), has two internal oscillators, and has far more logic blocks (5K vs 1K), so it can fit a lot more logic, plus some hard blocks for basic peripherals (which stay out of the way unless your design wires them up) and more sysMEM blocks. This has proven extremely useful in my projects.
The only aspect in which the UP5K is at a disadvantage is slower propagation speed (HX is the "high performance, power doesn't matter" subfamily), so the maximum clock for a given design will be lower: say, 120MHz vs 90MHz on the same design (in practice the gap is usually not as dramatic as that). This will typically not matter in most tutorials out there, which seldom use the PLLs at all, so the clock is always the non-scaled 12MHz source.
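If a design does need a faster clock, icestorm's icepll utility will calculate the PLL settings for you and can even emit a ready-to-use Verilog module. A quick sketch (the 48MHz target and the output file name are just examples):

    # compute PLL parameters to turn the 12MHz input into ~48MHz
    # and write a Verilog wrapper module to pll.v
    icepll -i 12 -o 48 -m -f pll.v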
I've got a BlackICE II from a couple of years ago. Any reason to upgrade? I was thinking of the ULX3S ECP5 board, but it seems a bit pricey. https://www.crowdsupply.com/radiona/ulx3s
The nanoDLA's a cheap OSHW 24MHz logic analyzer that works with pulseview (sigrok) and is sure to be invaluable when trying to debug design issues with the FPGA.
Lattice has a worthwhile ~$10 iCE40 series. I'm playing with the iCE40HX8K at the moment (Olimex iCE40HX8K-EVB board). I was able to configure an RV32I-based SoC with APB3, UART, a timer, RAM and an SDRAM controller that fits in 2500 LUTs. I'm using SpinalHDL, which turned out to be a very convenient object-oriented way of defining your hardware. SpinalHDL is a Scala-based HDL (Chisel, from UC Berkeley and backed by SiFive, is another one, also Scala-based) that compiles to a very long Verilog file, which you then feed to the IceStorm toolchain (Yosys) to synthesize a bitstream for the FPGA. It also allows incorporating an RV32I binary that lands right in the FPGA's BRAM at startup. I think the iCE40 is a very good FPGA for starting to learn open source design tools.
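Not my SoC, but to give a flavour of what SpinalHDL looks like, here is a minimal sketch of a counter-driven LED blinker that elaborates to Verilog for the yosys/icestorm flow (module and object names are my own, purely illustrative, and it assumes the SpinalHDL library is on your classpath):

    import spinal.core._

    // Minimal SpinalHDL example: a 24-bit counter whose top bit drives an LED.
    class Blinky extends Component {
      val io = new Bundle {
        val led = out Bool()
      }
      val counter = Reg(UInt(24 bits)) init(0)
      counter := counter + 1
      io.led := counter.msb   // toggles at clk / 2^24
    }

    // Elaborate to Blinky.v, which can then go through yosys/nextpnr as usual.
    object BlinkyVerilog extends App {
      SpinalVerilog(new Blinky)
    }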
Thanks, I'll check it out. I've only been playing with FPGAs for a couple of weeks; RISC-V was the driving force for me. Currently I'm trying to figure out which is better, SpinalHDL or Chisel. I find SpinalHDL's syntax more convenient and less verbose, but Chisel has much broader support from SiFive and the surrounding community.
I think in general Chisel has more people working on it. However, most people using Chisel are targeting ASICs, so SpinalHDL wins when it comes to supporting features that matter for FPGAs. Chisel also has nothing close to the excellent VexRiscv core, which was developed specifically for FPGAs.
The most developed cores written in Chisel are much bigger and more complex as they target ASICs.
See the chipyard repository, which tries to lower the learning curve a little bit: https://github.com/ucb-bar/chipyard/
This is very interesting. I thought Berkeley's BOOM was a much more advanced RISC-V implementation than VexRiscv, and it is written in Chisel3. I haven't played with BOOM yet, though, so I can't say how deeply it can be configured. But you seem to be right: BOOM does not look like the most efficient core in terms of LUT utilisation. Another thing I disliked in Chisel3 is the dependency on a new intermediate representation of hardware (FIRRTL), which adds one more layer of abstraction and compilation.
> I thought Berkeley's BOOM was a much more advanced RISC-V implementation than VexRiscv, and it is written in Chisel3.
It is a lot more advanced, and thus harder to get started with, than VexRiscv.
> BOOM does not look like the most efficient core in terms of LUT utilisation.
Part of the problem is that the BOOM design is targeted at ASICs. The developers do not generally synthesize BOOM for FPGAs, except through FireSim, which uses FPGAs to run fast simulations (multiple MHz of target frequency) in order to get more accurate performance figures from real-world benchmarks.
None of the developers are interested in using BOOM as a computer on an FPGA and thus no one has provided support for that.
> Another thing I disliked in Chisel3 is the dependency on a new intermediate representation of hardware (FIRRTL), which adds one more layer of abstraction and compilation.
I really enjoy working with firrtl. It is generally easy to inspect and quite human readable.
With firrtl you can:
- automatically add coverage instrumentation:
- for fuzzing: https://github.com/ekiwi/rfuzz/tree/master/instrumentation/src/rfuzz
- for simulator independent coverage [wip]: https://github.com/freechipsproject/treadle/pull/263
Well, as I said, all this stuff is very new to me and I don't understand many of the solutions, so thanks for explaining. Using C++ as an HDL is also an interesting idea; for many people it could lower the entry barrier.
Because icestorm came first, the iCE40 is now the FPGA family with the most mature open-flow support. They're also cheap and have cheap OSHW development boards, which helps.
Besides the iCE40 (project icestorm), there's also the ECP5 (project trellis) and the QuickLogic eFPGA (support provided by the vendor itself!), all in good shape.
Then there are some more, like the GW1N (project apicula) and the Xilinx 7-series (project x-ray), in a partially working state.
It's partly economies of scale, but also that small/cheap FPGAs are sized much closer to what you'd expect from a microcontroller (and that is quite sufficient for many designs).
Supporting PCIe and other high-speed interfaces (10Gb ethernet and beyond, etc) requires physical transceivers which look a whole lot more like "wireless communications over a PCB trace" and less like the traditional "drive a digital signal over a GPIO pin". These interfaces also typically drive requirements for data buffering as well as higher clock rates: all of these things increase size and cost.
That said, you can get basic chips that will do this for much less than "thousands of dollars" - Xilinx's Artix line is optimized for low cost plus a relatively high number of transceivers compared to the number of logic cells. You'd probably be interested in a development board like the PicoEVB, which is in the USD $200 range and provides an M.2 form factor / PCIe interface to an FPGA. The FPGA itself can be had for less than that... but the cost of the PCB, connectors, DDR memory, etc. does start to add up.
The most reasonable board for hobby PCIe (< $100) right now is probably the SQRL Acorn; it was designed for mining but turned out to be pretty useless for that purpose. There are cheaper FPGAs with PCIe support coming onto the market, but they tend to be low on fabric since they are designed for low-end applications (Lattice CrossLink-NX).
I think there are a few fundamental problems with FPGA compute offload. First, as niche products, FPGAs are always a node or two behind the leading edge, so their logic gates run slower than CPUs and GPUs; even their "hard" compute blocks are not as fast. Second, the fundamental nature of FPGAs as a "sea of logic" means that routing delays reduce your max frequency, or make pipelining necessary, thereby incurring latency. Third is memory; historically, FPGAs have not supported high-bandwidth GDDRx, and if you're crunching on something you generally want bandwidth. The latest high-end FPGAs do have HBM, but they are quite expensive.
So why would anyone want something that's slower and more expensive? Well, you have to be doing something special, like a lot of custom parallel processing pipelines, or with hard realtime requirements. Basically, a niche, and that doesn't lend itself to economies of scale.
The problem is floating-point math. Mostly we want GPUs (and TPUs) to crunch floating-point math really quickly. However, FP units are complicated things and take a disproportionate amount of FPGA fabric. Add to this the lower clock speed and suddenly GPUs start to look really cheap.
Some FPGAs are commodities. Here’s one with a PCB for $13. Can be used with an open tool chain too. I could probably find an even cheaper one if I looked harder.
I suppose the price is a bit misleading because the AX also requires the $12 programmer so with shipping+tax to CA it's closer to $30 total out of pocket.
Personally I use the BX because having the flash, and a bit more logic, is convenient.
I know hardly anything about actually using FPGAs (though learning would probably make my life easier), but when I was doing research a while back I found some Lattice ones which were ~$200. No idea if they are capable of what you'd want to do.
That chip is $967 on Digikey, yet that entire dev kit is $259. It's annoying that the only way to get reasonably priced FPGAs is to launder them through China.
Projects like this are getting me excited for a possible future where you can run the entire MCU in simulation. Instead of buying a given MCU for an embedded project and dealing with crappy, weird peripheral issues or undocumented behavior, you can choose a RISC-V core and architectural implementation. Then you download the core and simulate it in software with an open source qemu module, peripherals and all, using an open source RTOS, perhaps with some commercial RISC-V add-ons for your project. Then you burn it to an FPGA and test it in the real world, or, if you need or want to, purchase an ASIC version made by a third party. Of course we're probably far from that in practice, but in theory it's achievable.
You can already do a lot with https://renode.io. I ran a full (slooow) Verilator simulation of a small RISC-V before I had hardware in hand.
Verilator [1] converts Verilog to C++ and runs simulations very fast. You can also use 8bitworkshop [2] to simulate Verilog in the browser. I believe they use Verilator to convert to C++ and then to javascript via emscripten.
Check aliexpress for the iCESugar. That's an iCE40 UP5K development board that's half the price (~$30) of the iCEStick and much more powerful.
It also works fine with the open stack.
Full disclosure: I own iCEStick, iCESugar, BlackICE MX and TinyFPGA-BX.