Hacker News
Exploring FPGA Graphics (2020) (projectf.io)
136 points by rbanffy on Feb 2, 2021 | 38 comments

Original author here. I’ll do my best to answer any questions. Be gentle; this is my first time on the receiving end of Hacker News. :)

Nice series! I've considered doing something similar due to the dearth of decent material on FPGA and hardware design on the internet though never quite got around to it...

For the sprites, did you consider a part on dealing with multiple overlapping sprites? (I just skimmed, so apologies if you did this somewhere and I missed it!) There are at least two approaches here, and I think it's an interesting thing to explore to discuss design trade-offs.

The two approaches I can see are:

1. Load the relevant line of each sprite into flops during the blanking interval. When drawing, determine which sprites produce a pixel for a given output pixel, then apply ordering, taking transparent pixels into account, to determine which one wins. This requires a simultaneous read of all the sprite flops for each output pixel.

2. Pre-render all of the sprite pixels for a line into a memory buffer during the blanking interval. If you can read multiple sprite pixels per cycle from memory, and take advantage of the fact that your FPGA can probably run at some multiple of the pixel clock (say 100 MHz vs the ~25 MHz pixel clock for 640x480), you can do quite a lot in the blanking interval.
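As a rough illustration of approach 1, here is a Python behavioural model (not HDL) of the per-pixel combination stage. It assumes colour index 0 means transparent and that sprites are listed in priority order with index 0 on top; the names `combine` and `sprite_pixel` are hypothetical. Hardware would evaluate every sprite in parallel and feed a priority encoder; the loop here just expresses the same priority rule.

```python
from typing import List, Optional, Sequence, Tuple

TRANSPARENT = 0  # assumption: colour index 0 marks a transparent sprite pixel

# A sprite's current line: (screen x of its left edge, colour indices for that line)
Sprite = Tuple[int, Sequence[int]]


def sprite_pixel(sprite: Sprite, x: int) -> Optional[int]:
    """Colour this sprite contributes at column x, or None if it doesn't cover x or is transparent there."""
    sprite_x, line = sprite
    offset = x - sprite_x
    if 0 <= offset < len(line) and line[offset] != TRANSPARENT:
        return line[offset]
    return None


def combine(sprites: List[Sprite], x: int) -> Optional[int]:
    """Approach 1: pick the highest-priority opaque sprite pixel at column x.

    sprites are in priority order (index 0 on top). Hardware would read all
    sprite registers at once and use a priority encoder; this loop models
    the same rule sequentially. None means "show the background".
    """
    for sprite in sprites:
        colour = sprite_pixel(sprite, x)
        if colour is not None:
            return colour
    return None
```

For example, `combine([(2, [0, 3, 3, 0]), (0, [1, 1, 1, 1, 1, 1])], 3)` returns `3`: the top sprite's opaque pixel wins over the sprite below it, while at column 2 the top sprite is transparent and the lower sprite's `1` shows through.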

I reckon 2 wins when you're wanting more sprites per line and higher bit-depths per sprite pixel (8-bit 256 colour sprites vs 2/4/8 colour sprites).

I've got a ~60% complete version of 2; maybe I should follow your lead, finish it off and write about how it works.

I’d strongly encourage you to finish your write up and post it. As you say, there is a dearth of decent FPGA material, especially for hobbyists and those trying to learn hardware design.

You can do the pre-rendering on the previous line if you can spare two buffers. You just render into a buffer, then during the blanking interval latch it into a shift register. That gives you a lot more time to render.
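A Python behavioural sketch of approach 2 with the two-buffer scheme described above: each line's sprites are drawn into one line buffer while the other buffer is being displayed, then the buffers swap. The function names, the 16-pixel default width, and the "0 is transparent" convention are assumptions for illustration. Drawing sprites from lowest to highest priority lets higher-priority pixels overwrite lower ones, replacing the parallel priority mux of approach 1 with sequential memory writes.

```python
from typing import List, Sequence, Tuple

TRANSPARENT = 0  # assumption: colour index 0 is transparent
Sprite = Tuple[int, Sequence[int]]  # (screen x of left edge, colours on this line)


def render_line(sprites: List[Sprite], width: int) -> List[int]:
    """Pre-render one line of sprites into a line buffer.

    sprites are in priority order (index 0 on top); drawing them in reverse
    lets higher-priority pixels overwrite lower-priority ones.
    """
    buf = [TRANSPARENT] * width
    for sprite_x, line in reversed(sprites):
        for i, colour in enumerate(line):
            x = sprite_x + i
            if colour != TRANSPARENT and 0 <= x < width:
                buf[x] = colour
    return buf


def scan_frame(frame: List[List[Sprite]], width: int = 16) -> List[List[int]]:
    """Model two ping-pong line buffers over a frame.

    While line y is shifted out of the front buffer, line y + 1 is rendered
    into the back buffer; the buffers swap at the end of each line. Hardware
    does both at once; this model does them one after the other per line.
    """
    buffers = [render_line(frame[0], width), [TRANSPARENT] * width]
    front = 0
    displayed = []
    for y in range(len(frame)):
        back = 1 - front
        if y + 1 < len(frame):
            buffers[back] = render_line(frame[y + 1], width)  # render next line
        displayed.append(list(buffers[front]))  # shift out the current line
        front = back  # swap during horizontal blanking
    return displayed
```

The longer rendering window matters because `render_line` touches every pixel of every sprite on the line; spreading that work across a whole line time, rather than just the blanking interval, leaves room for more or wider sprites.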

You're right and I think I'm actually doing that on my 60% complete prototype. Haven't touched it for a couple of months and I've forgotten some of the details!

Having now actually looked at the code, it looks like you did a version of 1, but as you've only got 1-bit pixels, the combination stage is trivial (if any of the 8 sprites is on, output a sprite pixel)?

I assume you’re talking about FPGA Ad Astra? https://projectf.io/posts/fpga-ad-astra/

The first sprites post, https://projectf.io/posts/hardware-sprites/ does include 4-bit sprites, but again no overlapping.

I am planning another sprite post with overlapping coloured sprites combined with a framebuffer background. I will probably use approach 2, as you describe, but with hardware design, I’ve found that the best algorithm can surprise you in practice.

What’s your opinion about SystemVerilog? Can it be used as a higher-level language than VHDL, or is it still at the same level?

Doing VHDL designs for a living here, using SystemVerilog once in a while when absolutely required. Take my words with a good pinch of salt!

SV is anyway a hardware description language, so you still think and code in terms of modules and signals (wires) to be connected. But if you want to create structures that use variables and have a more "procedural" style, I find that SV is more agile than VHDL for this specific purpose.

If you want real high level languages for hardware there are tools that provide synthesis from C/C++, Python and more software oriented languages.

> there are tools that provide synthesis from C/C++, Python and more software oriented languages

That sounds extremely interesting!

Beware, it's much harder to get a correct hardware design using these tools.

If that doesn't matter, then go for it!

I would imagine that.

A functional language like Haskell would probably be easier to compile to hardware.

SystemVerilog has several useful additions for FPGA engineers, but tool support has been slow to materialise. For the moment, I’ve kept to the few features that Xilinx Vivado and Yosys http://www.clifford.at/yosys/ are both happy with, such as enums, logic data type, and always_ff etc.

I’ve not used VHDL in anger, so can’t give you a useful opinion there.

I suggest increasing the text contrast on your site. It's barely readable on my screen as it is.

It’s something I’m looking into. The font seems to render thinner on some platforms, making it hard to read.

Do you have any learning materials recommendations for digital design and fpga programming?

Alas, this is a surprisingly hard question to answer, at least for FPGA design.

Here are a few sites that helped me learn:

* https://www.fpga4fun.com

* http://fpgacpu.ca

* https://www.nandland.com/verilog/tutorials/index.html

* https://1bitsquared.com/pages/chat (Discord)

And a few interesting blogs:

* https://blackmesalabs.wordpress.com/

* https://tomverbeure.github.io

* http://labs.domipheus.com/blog/

None of the introductory FPGA books have impressed me, so I don’t have any recommendations there. Maybe some other HN readers will have some suggestions?

Thanks! I feel the best way to learn could be a set of labs of increasing difficulty, each with a testbench for self-checking simulation. This specialization https://www.coursera.org/specializations/fpga-design has something like that, but unfortunately it's fairly shallow.

I bought several books, but this is the one I found most accessible: "RTL Hardware Design Using VHDL" by Pong Chu https://ieeexplore.ieee.org/book/5237648

Not the author, but I regularly get asked this question too, and I usually recommend Digital Design and Computer Architecture, 2nd Ed [1], supplemented with practical work on an FPGA.

[1] - https://www.elsevier.com/books/digital-design-and-computer-a...


Any plans on making an actual GPU/accelerator? It looks like you already have the beginnings of one.

We’ll see :D

Forthcoming posts will cover simple 3D modelling with texturing and lighting in hardware. But that’s still a long way from a GPU.

I love the possibilities of custom display hardware. This makes me think hexagonal pixels, vector graphics, and input-to-photon latency so low it bends reality.

Just skimmed, but isn't collision detection in sprite hw usually incorporated in the overlay logic instead of just checking coordinates? If you have round sprites you don't want them to collide as rectangular shapes. The logic of collisions at the pixel level is more intuitive as well.
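To illustrate the difference, a Python sketch comparing a bounding-box test with a pixel-level test that mimics checking for overlap while scanning out the screen. The 4x4 "round" bitmap and the function names are hypothetical. In hardware the pixel test reduces to ANDing the two sprites' opaque signals at the current pixel and latching a collision flag; the loop over the screen stands in for the raster scan.

```python
from typing import List

Bitmap = List[List[int]]  # rows of pixels; non-zero means opaque

# Hypothetical 4x4 "round" sprite: the corners are transparent.
ROUND: Bitmap = [
    [0, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 1, 1, 0],
]


def opaque_at(bitmap: Bitmap, sx: int, sy: int, x: int, y: int) -> bool:
    """Is screen pixel (x, y) an opaque pixel of a sprite drawn at (sx, sy)?"""
    row, col = y - sy, x - sx
    return 0 <= row < len(bitmap) and 0 <= col < len(bitmap[row]) and bitmap[row][col] != 0


def box_collision(a: Bitmap, ax: int, ay: int, b: Bitmap, bx: int, by: int) -> bool:
    """Rectangle overlap test on the sprites' bounding boxes."""
    return (ax < bx + len(b[0]) and bx < ax + len(a[0])
            and ay < by + len(b) and by < ay + len(a))


def pixel_collision(a: Bitmap, ax: int, ay: int, b: Bitmap, bx: int, by: int,
                    width: int = 640, height: int = 480) -> bool:
    """Scan the screen as the video output would; collide only where both sprites are opaque."""
    for y in range(height):
        for x in range(width):
            if opaque_at(a, ax, ay, x, y) and opaque_at(b, bx, by, x, y):
                return True  # hardware would latch a collision flag here
    return False
```

With one `ROUND` at (0, 0) and another at (3, 3), `box_collision` returns `True` because the boxes share one pixel, while `pixel_collision` returns `False` because that pixel is a transparent corner of both sprites.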

Did you look at the sprite implementation in the various C64 and Amiga FPGA re-implementations? I haven't, but I'm a bit curious about how they do the pixel merging (the trivial implementation, I think, is a prioritized 8-to-1 mux controlled by some priority selection logic).

If you're referring to the original URL https://projectf.io/posts/fpga-graphics/, then there are no sprites involved. We define a square mathematically, then check whether it has collided with the edge of the screen. It's a quick way to produce animated graphics with minimal knowledge and code.
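A Python sketch of that idea: the square exists only as a position and size, the drawing logic tests each pixel against it, and once per frame the position is updated, reversing direction if the next step would leave the screen. The 32-pixel size and the function names are hypothetical, not taken from the Project F code.

```python
from typing import Tuple

H_RES, V_RES = 640, 480
SIZE = 32  # hypothetical square size in pixels


def in_square(px: int, py: int, qx: int, qy: int) -> bool:
    """Drawing test evaluated for every pixel: is (px, py) inside the square at (qx, qy)?"""
    return qx <= px < qx + SIZE and qy <= py < qy + SIZE


def step(qx: int, qy: int, dx: int, dy: int) -> Tuple[int, int, int, int]:
    """Once-per-frame update: reverse direction if the next move would cross a screen edge."""
    if not 0 <= qx + dx <= H_RES - SIZE:
        dx = -dx
    if not 0 <= qy + dy <= V_RES - SIZE:
        dy = -dy
    return qx + dx, qy + dy, dx, dy
```

For instance, `step(608, 1, 2, 1)` returns `(606, 2, -2, 1)`: moving right again would push the square past the right edge, so it bounces back.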

The second post on Pong https://projectf.io/posts/fpga-pong/ includes collision checking between the ball and paddles at the drawing level.

I plan to cover full sprite collisions and overlap in a future post. In the meantime, there is a post that covers sprites with transparency: https://projectf.io/posts/hardware-sprites/

I haven’t looked at the FPGA re-implementations, but I am familiar with Amiga hardware: http://amigadev.elowar.com/read/ADCD_2.1/Hardware_Manual_gui...

Maybe a stupid question, but it wasn't clear to me how this is able to work when the system clock doesn't run fast enough to keep up with 60 Hz. I was wishing there was more elaboration about what a PLL is and how it allows a person to get free clock cycles.

PLLs are common to many electronic systems; if you’re looking for a primer, try https://www.analog.com/en/analog-dialogue/articles/phase-loc...

However, from the point of view of these designs, the PLL is an internal black box that allows us to generate an (almost) arbitrary clock frequency.

We choose a frequency to meet I/O or performance requirements. In this case a pixel clock of ~25 MHz: we have to meet this to generate a valid 640x480 video signal.
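The ~25 MHz figure follows from the standard 640x480 timings, where blanking makes each line 800 pixels long and each frame 525 lines. A Python sketch of the arithmetic and of the sync signals a counter-based timing generator would produce; the porch and sync values are the commonly published 640x480 timings (nominally 25.175 MHz at 59.94 Hz), and the function names are hypothetical.

```python
# Commonly published 640x480 timings (in pixels and lines)
H_ACTIVE, H_FRONT, H_SYNC, H_BACK = 640, 16, 96, 48
V_ACTIVE, V_FRONT, V_SYNC, V_BACK = 480, 10, 2, 33

H_TOTAL = H_ACTIVE + H_FRONT + H_SYNC + H_BACK  # 800 pixels per line
V_TOTAL = V_ACTIVE + V_FRONT + V_SYNC + V_BACK  # 525 lines per frame


def pixel_clock_hz(refresh_hz: float) -> float:
    """Every pixel position, including blanking, takes one pixel-clock cycle."""
    return H_TOTAL * V_TOTAL * refresh_hz


def timing_signals(sx: int, sy: int):
    """(hsync, vsync, data_enable) at counter position (sx, sy); sync is active low for 640x480."""
    hsync = not (H_ACTIVE + H_FRONT <= sx < H_ACTIVE + H_FRONT + H_SYNC)
    vsync = not (V_ACTIVE + V_FRONT <= sy < V_ACTIVE + V_FRONT + V_SYNC)
    data_enable = sx < H_ACTIVE and sy < V_ACTIVE
    return hsync, vsync, data_enable


if __name__ == "__main__":
    f = pixel_clock_hz(60)
    print(f"{f / 1e6:.1f} MHz, {1e9 / f:.2f} ns per pixel")
```

At 60 Hz this gives 800 × 525 × 60 = 25.2 MHz, or about 39.68 ns per pixel.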

None of the Exploring FPGA Graphics designs uses a CPU: the logic is all in hardware, often a finite state machine. The trick is ensuring the hardware logic completes within one clock cycle. At 25 MHz, each clock cycle is 40 ns. If a design is too complex to meet timing, we can break it into multiple steps, similar to pipelining on a CPU[1].

You can also run different parts of a design at different clock frequencies, but this introduces the challenge of clock domain crossing[2].

For a low-end FPGA, like an Artix-7, you can expect to run a reasonably complex design at 100-200 MHz.

[1] https://en.wikipedia.org/wiki/Classic_RISC_pipeline

[2] https://en.wikipedia.org/wiki/Clock_domain_crossing

PLLs ("phase-locked loops") are entirely normal in both the FPGA and microcontroller worlds. They involve a high-frequency voltage-controlled oscillator and a frequency divider. The fast clock is divided down (the reference clock may be divided down too), and the two signals are compared. When they are out of phase, one is ahead of the other, and the fast oscillator is adjusted accordingly.

This is also how PC BIOSes let you configure various CPU and DRAM speeds from a single physical crystal.

They're accurate enough for most practical purposes. They can take a large number of clock cycles to acquire "lock" and start producing the correct frequency, which rarely matters unless you need to go from 0 to X00MHz very quickly.

> system clock doesnt run fast enough to keep up with 60hz

It doesn't actually say that as far as I can tell?

This is likely not a useful explanation if you do not specifically know what a Phase-Locked Loop is, but I want to say that it is yet another example of how negative feedback can give you an implementation of the inverse of a component.

In this instantiation of the concept, a frequency divider (dividing by a whole number) is something that's easy to build: it's a counter. For example, to divide by 100, count input clock cycles; when you increment from 49, wrap around to 0 and toggle the output. Toggling every 50 input cycles gives one full output period per 100 input cycles.
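A Python model of that counter-based divider, assuming an even divisor; `divide_clock` is a hypothetical name. It returns the output level after each input clock cycle.

```python
from typing import List


def divide_clock(n_cycles: int, divisor: int = 100) -> List[int]:
    """Counter-based clock divider: toggle the output every divisor // 2 input cycles.

    Assumes an even divisor. Returns the output level after each input clock cycle.
    """
    half = divisor // 2
    count, out = 0, 0
    levels = []
    for _ in range(n_cycles):
        if count == half - 1:   # e.g. incrementing from 49 when dividing by 100
            count = 0
            out ^= 1            # toggle the output
        else:
            count += 1
        levels.append(out)
    return levels
```

Over 1000 input cycles the output toggles 20 times, giving 10 full output periods: 1/100 of the input frequency.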

Putting that divider in a negative feedback loop gives you a multiplier.
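A Python sketch of that feedback idea, simplified to a frequency-locked loop: each step compares the divided-down oscillator frequency with the reference and nudges the oscillator to shrink the error. A real PLL compares phase and filters the error through a loop filter; the gain, starting frequency, and step count here are arbitrary assumptions.

```python
def frequency_lock(f_ref: float, n: int, gain: float = 0.2,
                   f_start: float = 1.0, steps: int = 200) -> float:
    """Simplified frequency-locked loop with a divide-by-n counter in the feedback path.

    Each step compares f_vco / n with the reference and nudges the oscillator
    to shrink the difference. The error shrinks by a factor of (1 - gain) per
    step, so f_vco converges to n * f_ref.
    """
    f_vco = f_start
    for _ in range(steps):
        error = f_ref - f_vco / n      # detector, simplified to a frequency comparison
        f_vco += gain * n * error      # adjust the oscillator against the error
    return f_vco
```

Calling `frequency_lock(25.0, 4)` settles at 100 MHz-equivalent (100.0): the divide-by-4 in the loop has become a multiply-by-4.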

Another common instantiation: if you put a voltage divider (easy to build with a couple of resistors) in the feedback path of an operational amplifier, you get a voltage multiplier.
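The same idea in numbers, as a small Python sketch of a non-inverting amplifier: with open-loop gain A and a divider feeding back a fraction β = R_g / (R_f + R_g) of the output, the closed-loop gain is A / (1 + Aβ), which approaches 1/β = 1 + R_f/R_g when A is large. The resistor values and the function name are illustrative.

```python
def closed_loop_gain(open_loop_gain: float, r_f: float, r_g: float) -> float:
    """Non-inverting amplifier with a resistor divider (r_f over r_g) in the feedback path.

    The divider feeds back beta = r_g / (r_f + r_g) of the output, so the
    gain is A / (1 + A * beta), which tends to 1 / beta = 1 + r_f / r_g for large A.
    """
    beta = r_g / (r_f + r_g)
    return open_loop_gain / (1 + open_loop_gain * beta)
```

With A = 1e6, R_f = 9 kΩ and R_g = 1 kΩ, the result is within 0.001 of 10: the divide-by-10 in the feedback path becomes a multiply-by-10.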

Another instantiation with op-amps: Put a capacitor in the feedback path, and you "get" an inductor. This one is also a big deal because, practically speaking, it's easier to engineer close-to-ideal capacitors than it is inductors, which are heavy and lossy.

And if you really want to take a ride on this conceptual train, consider that it's easy to square a number. Need the square root? Use negative feedback: https://en.wikipedia.org/wiki/Newton%27s_method#Square_root
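A Python sketch of that square-root example: each iteration squares the current guess, feeds back the error x*x - a, and corrects x by that error divided by the slope 2x, which is Newton's method.

```python
def newton_sqrt(a: float, iterations: int = 30) -> float:
    """Square root by Newton's method, viewed as negative feedback.

    Squaring the guess is easy; the error x*x - a is fed back to correct x,
    scaled by the slope 2x of x*x.
    """
    if a < 0:
        raise ValueError("a must be non-negative")
    if a == 0:
        return 0.0
    x = max(a, 1.0)  # start at or above the root; the iteration then converges from above
    for _ in range(iterations):
        x -= (x * x - a) / (2 * x)
    return x
```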

The problem with clocks is not how to generate them fast enough, but rather how to keep them exact and consistent.

A phase-locked loop can use a reference clock (generated outside the chip by an oscillating crystal, like in a watch) to generate many other clocks of different frequencies, each of which is corrected and guided by this reference clock.

In an FPGA there is no single "system clock": each module can use its own clock at the frequency it needs, all defined and routed by the designer.

FPGAs are the common man’s TSMC!!!

I agree! I'm a computer architect, and I love playing around with FPGAs for doing 'fantasy architecture' projects (like a 16-core Z80 laptop, or a USB adapter for an old nCube supercomputer). A modern FPGA is like being able to tape out your own 350-nm chip for <$100 (and in many respects much better, when you get to make use of hard-macros like SRAM or PCIe/DDR controllers).

...not really? They're pretty niche and wildly outperformed by actual hardware.

They are actual hardware

Probably should have written "dedicated" or the good old "application specific integrated circuit".

Either way, the distance is considerable. FPGAs do allow you to do unique hardware interfacing things (good luck getting a PCI microcontroller), but even there microcontrollers are catching up: the RP2040 can bitbang DVI with its peripherals.

This is awesome, thank you. Hybrid FPGAs are the future and how we will create a parallel Internet, or what I call the Solar Bitcoin Radio 2050.

If you’re into hardware, please consider the applications that FPGAs could help make transformative.

If any hardware hackers are out there, please contact me at satoshi@137.lol. To clarify, I’m obviously not Satoshi Nakamoto but I know who she is.

This reads like it was generated by GPT.
