Understanding Linear Feedback Shift Registers in FPGAs (adiuvoengineering.com)



I have used an LFSR counter as the program counter in my relay CPU. Instead of building a 12-bit incrementer, which would have required 12 relays, I got away with only 3 XORs and 3 relays. When assembling, the linker just scrambles program memory according to the LFSR counting pattern, and I get +1 counting behaviour :) This technique is used in some very tiny hard-coded CPUs (where program memory is not even ROM, but directly decoded datapath control, like decoded micro-ops in real CPUs) that really need to save on area.
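
Roughly, in Python terms (an illustrative 4-bit version, not the actual 12-bit taps):

    # Tap stages [4,3]: a maximal 4-bit LFSR, so the program counter
    # visits all 15 non-zero states before repeating.
    def lfsr_step(pc, width=4, taps=(3, 2)):
        fb = 0
        for t in taps:
            fb ^= (pc >> t) & 1
        return ((pc << 1) | fb) & ((1 << width) - 1)

    # The linker-side scramble: instruction i is stored at the address
    # the counter reaches on step i, so code executes in written order.
    def scramble(program, reset_pc=1):
        mem, pc = {}, reset_pc
        for insn in program:
            mem[pc] = insn
            pc = lfsr_step(pc)
        return mem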


How about Gray code? It's a one-bit change to 'increment'. I wonder if that would be about as simple.

FYI, converting normal binary to Gray code is simply XORing successive pairs of bits; the reverse direction needs a running XOR (see the sketch below).
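
In Python, to make both directions concrete:

    def bin_to_gray(b):
        return b ^ (b >> 1)      # each output bit XORs two adjacent input bits

    def gray_to_bin(g):
        b = 0
        while g:                 # the reverse is a running XOR down the bits
            b ^= g
            g >>= 1
        return b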


Gray code is heavy on combinational logic to implement; its goal is not to reduce logic but to force counting behaviour such that only 1 bit changes between transitions, as opposed to normal binary (e.g. 4'b0111 + 1 flips 4 bits). This is useful when crossing clock domains or to add extra built-in error detection capabilities.


Used to use it for contact switches. You turn a knob and, say, four brushes slide across copper contacts to encode the knob position. If you coded position in binary you'd be sparking several bit changes, which for some instants would jump all over the number range, e.g. going from three (011) to four (100) could instantaneously read as seven (111) or two (010), etc.

Gray code would at most oscillate between the two values, e.g. from 010 (three in Gray code) to 110 (four in Gray code).


Gray code can be implemented with an ordinary binary counter and an XOR with a right-shifted count. It isn't particularly "heavy".


Sorry, after doing extremely low-area designs, every extra useless gate is „heavy“ :)


The thing you're missing is that an LFSR counter can be 20% of the gates of a binary counter.


The original comment was precisely about avoiding an ordinary binary counter.


Nintendo's CIC did a similar thing for the same reason, using a polynomial counter: https://forums.nesdev.org/viewtopic.php?p=55288#p55288


I have a more mathematical / software oriented article about LFSRs here:

https://www.moria.us/articles/demystifying-the-lfsr/


FWIW - The DVD CSS scheme is built upon a couple of LFSRs. Their outputs are then combined in 4 different ways (XOR of the streams, each taken as-is or inverted).

Three of the combinations are used for the "authentication" portion (as used by s/w), the other combination for the "encryption" portion.
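
Schematically, in Python (placeholder widths and made-up taps, just to show the structure described above, not the real CSS parameters):

    def lfsr_bytes(state, width, taps):
        # placeholder byte-at-a-time LFSR keystream; taps here are made up
        mask = (1 << width) - 1
        while True:
            for _ in range(8):
                fb = 0
                for t in taps:
                    fb ^= (state >> t) & 1
                state = ((state << 1) | fb) & mask
            yield state & 0xFF

    def css_like_stream(mode):
        a = lfsr_bytes(1, 17, (16, 13))   # CSS is usually described as
        b = lfsr_bytes(1, 25, (24, 21))   # using 17- and 25-bit registers
        inv_a = 0xFF if mode & 1 else 0x00
        inv_b = 0xFF if mode & 2 else 0x00
        for x, y in zip(a, b):                # the modes: each stream
            yield (x ^ inv_a) ^ (y ^ inv_b)   # as-is or inverted, then XOR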


Huh, that's stated in completely different terms than how I approached the problem. When you throw around terms like "ring theory" my eyes just glaze over.

I ignored polynomials, implemented a "jump" function via bit-matrix exponentiation (unsupported by most BLAS-like libraries!), then did prime factorization via the Cunningham tables, and checked for partial periods of their inverses.
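
Roughly, the jump worked like this (a from-memory Python sketch of the technique, not the actual code):

    # One LFSR step as a k x k matrix over GF(2); rows packed into ints.
    def step_matrix(width, taps):
        # row 0 computes the feedback bit; row i copies old bit i-1
        return [sum(1 << t for t in taps)] + [1 << (i - 1) for i in range(1, width)]

    def mat_mul(a, b):
        # GF(2) product: row i of a*b is the XOR of the rows of b
        # selected by the set bits of row i of a
        out = []
        for row in a:
            acc, j = 0, 0
            while row:
                if row & 1:
                    acc ^= b[j]
                row >>= 1
                j += 1
            out.append(acc)
        return out

    def mat_pow(m, n):
        # square-and-multiply: advance n steps in O(log n) matrix products
        result = [1 << i for i in range(len(m))]   # identity
        while n:
            if n & 1:
                result = mat_mul(result, m)
            m = mat_mul(m, m)
            n >>= 1
        return result

    def jump(state, mp):
        # apply a precomputed matrix power to a packed state,
        # e.g. jump(s, mat_pow(step_matrix(24, taps), 10**12))
        out = 0
        for i, row in enumerate(mp):
            out |= (bin(row & state).count("1") & 1) << i
        return out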

Some minor points I remember:

* there are several boolean choices in how you implement the LFSR: left-vs-right, forward-vs-backward, xor-vs-xnor ... in particular you have a choice of "no all-zero state" or "no all-one state" (see the sketch after this list).

* when checking whether advancing by a given number is a period, it's much more efficient to advance both sides to minimize the number of 1 bits.

* an informative period failure is 15 = 6 + 6 + 3. Doesn't this conflict with something in the article?

* the info you need for 4096 bits is not in the Cunningham tables but is available elsewhere on the internet. But due to some of the simplification rules, some much-larger periods are easily derived from the Cunningham tables (assuming you can properly parse them, of course).

* tradition makes people produce LFSRs with very few taps, but if you're doing it in software there's no real reason to do anything other than keep the tap count even and use the end bits. I never collected statistics on how long it takes to guess a good one, though.
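
On the xor-vs-xnor point, a quick Python sketch (the 8-bit tap set is just a standard maximal one, nothing specific):

    # XOR feedback locks up at all-zeros; XNOR locks up at all-ones,
    # so each variant excludes exactly one of the 2^n states.
    def step(state, width=8, taps=(7, 5, 4, 3), xnor=False):
        fb = 0
        for t in taps:
            fb ^= (state >> t) & 1
        if xnor:
            fb ^= 1
        return ((state << 1) | fb) & ((1 << width) - 1)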


> when checking whether advancing by a given number is a period, it's much more efficient to advance both sides to minimize the number of 1 bits

I’m missing something here… what do you mean by “advance both sides?”

> an informative period failure is 15 = 6 + 6 + 3. Doesn't this conflict with something in the article?

You’re talking about, say, P(x) = x^4 + x^2 + 1, for a four-bit LFSR? This will have three nontrivial cycles, with length 6, 6, and 3. Rather than talk about this specific polynomial / specific set of taps, we can consider that the initial (non-zero) state is in one of those three cycles. There are three possibilities for the length of a cycle:

1. Maximum period 2^k-1. In this case, 15.

2. Period which is a factor of 2^k-1, such as 3.

3. Period which is not a factor of 2^k-1, such as 6.

We can distinguish these cases by picking an initial state and then doing a series of checks, where we advance the LFSR by N steps, and then see if it is in the initial state again.

For exactly the LFSRs with maximum period, the LFSR will be in the initial state after advancing 2^k-1 steps, but it will not be in the initial state after advancing (2^k-1)/p steps for prime factors p of 2^k-1. In the case of a 4-bit LFSR, we check N=15, N=5, and N=3. The cycles of length 6 will fail the test because they will be in a different state after advancing 15 steps. The cycles of length 3 will fail because they will be in the same state after advancing 3 steps. The end result is that the LFSR with cycles of length 6, 6, and 3 will be rejected, no matter which cycle the initial state is in.
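
In code, the whole test is something like this (Python sketch; naive stepping for clarity, assuming a one-argument step function, where in practice you'd advance with matrix powers or polynomial arithmetic):

    def is_max_period(step, k, prime_factors):
        # prime_factors: the distinct prime factors of 2**k - 1
        n = (1 << k) - 1

        def advance(s, steps):
            for _ in range(steps):
                s = step(s)
            return s

        s0 = 1                       # any non-zero initial state works
        if advance(s0, n) != s0:     # must return after 2^k - 1 steps...
            return False
        # ...and must NOT return after any (2^k - 1)/p steps
        return all(advance(s0, n // p) != s0 for p in prime_factors)

    # 4-bit example from above: is_max_period(step4, 4, [3, 5])
    # checks N=15, then N=5 and N=3, rejecting the 6/6/3 cycles.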

> tradition makes people produce LFSRs with very few taps, but if you're doing it in software there's no real reason to do anything other than keep the tap count even and use the end bits.

The reason I was looking at this in the first place was out of a kind of retrocomputing interest, and using something like 0x100001b as your polynomial for a 24-bit LFSR can make the implementation slightly shorter / faster, on some older CPUs and with memory constraints.

It was easy to find a lot of handwaving and code / numbers copied from place to place, and I wanted to know how to generate the polynomials / pick the taps in the first place. The program I wrote just goes through the polynomials in lexicographic order, so it’s not actually trying to pick the fewest number of taps, but it will prefer sets of taps where the taps are all clustered on one side.
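
For illustration, a Galois-form step with that constant is just a shift and a conditional XOR, which is where the size/speed win comes from (Python sketch; whether 0x100001b in particular is maximal is exactly the kind of thing the search program has to verify):

    POLY = 0x100001b            # x^24 + x^4 + x^3 + x + 1, the example above
    MASK = (1 << 24) - 1

    def galois_step(state):
        state <<= 1
        if state > MASK:        # a 1 fell off the top:
            state ^= POLY       # fold it back in through the taps
        return state & MASK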


> I’m missing something here… what do you mean by “advance both sides?”

Instead of checking `state == advance(state, 2^32-1)`, which takes 32 multiplies, it's cheaper to do `advance(state, 1) == advance(state, 2^32)`, which only takes 2 multiplies (they both still take 32 squares unless you cache those). And actually the special case of 1 didn't require constructing the matrix at all, allowing a cheaper direct computation.

> The cycles of length 6 will fail the test because they will be in a different state after advancing 15 steps.

Okay, now this rings a bell. It has been quite a while since I actually played with this all.

The only other thing that was surprising was that you wrote out your taps including both ends (thus they appear to have an odd rather than even number of taps), but that's purely a matter of convention.


> Instead of checking `state == advance(state, 2^32-1)`, which takes 32 multiplies, it's cheaper to do `advance(state, 1) == advance(state, 2^32)`, which only takes 2 multiplies (they both still take 32 squares unless you cache those). And actually the special case of 1 didn't require constructing the matrix at all, allowing a cheaper direct computation.

Ah! That makes sense. But for N=2^32-1, there are also checks for N/3, N/5, N/17, N/257, and N/65537.


I think the rule is: as long as there is a sequence of at least 3 adjacent 1-bits, you can replace them with 2 1-bits (one on each side of the comparison). The new 1-bit can of course merge into an existing run.

This only fails to decrease the needed work for the N/3 and N/5 cases here, which form small regular patterns. For larger regular patterns, or for irregular patterns (2^odd-1), chances are that there will be at least some simplification (unless the pattern is mostly zeros in which case it isn't needed anyway).


I used LFSRs all the time in early FPGAs (XC2000 and XC3000 series). These did not have fast carry chains, so an LFSR made a much faster counter than a binary one. I used them for video timing generators.


also FYI: so called "golden codes" that are used to modulate the GPS information bitstream are also a bunch of LFSR counters, chosen in such a way that the whole set of them (around 30? don't remember) has a very low correlation factor between any two of them


> so called "golden codes"

A minor point, but they are called Gold codes -- after Robert Gold, who developed the technique in the 60s.

https://en.wikipedia.org/wiki/Gold_code


There are also the related Kasami codes. Both are made by selecting uncorrelated LFSR sequences and combining them. That has the advantage of extending code length, while maintaining cross-correlation and whiteness properties. Really helpful in early error correction.
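
A rough Python sketch of the Gold construction (using a classic width-5 preferred pair rather than the actual GPS polynomials, which use 10-bit registers; each relative shift of the two m-sequences yields another code in the family):

    def m_seq(width, taps, seed=1):
        # one period (2^width - 1 bits) of a maximal-length LFSR sequence;
        # taps are 0-indexed register bits (stage number minus 1)
        mask = (1 << width) - 1
        state, out = seed, []
        for _ in range(mask):
            out.append((state >> (width - 1)) & 1)
            fb = 0
            for t in taps:
                fb ^= (state >> t) & 1
            state = ((state << 1) | fb) & mask
        return out

    def gold_code(shift):
        # classic preferred pair for width 5: tap stages [5,2] and
        # [5,4,3,2] (octal 45 and 75 in the usual notation)
        a = m_seq(5, (4, 1))
        b = m_seq(5, (4, 3, 2, 1))
        n = len(a)
        return [a[i] ^ b[(i + shift) % n] for i in range(n)]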


This is how PRBS works more generally. It's in everything. Use the same factors and set the start point.

IIUC what you're describing is using different sets of factors, which would guarantee low correlation, albeit with differing spectra.

https://en.wikipedia.org/wiki/Pseudorandom_binary_sequence
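
E.g. PRBS7 is just one period of the maximal sequence from x^7 + x^6 + 1 (Python sketch):

    # PRBS7: 127-bit maximal-length sequence, taps at stages 7 and 6
    def prbs7(seed=0x7F):
        state, out = seed, []
        for _ in range(127):
            out.append(state & 1)
            fb = ((state >> 6) ^ (state >> 5)) & 1
            state = ((state << 1) | fb) & 0x7F
        return out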


Yes, for LFSRs it is often the high ratio of auto- to cross-correlation for the particular encoding phase, along with the ease of generation and deconvolution for single-bit signaling at low SNR. I don't know that most other pseudo-random codes have this property.



