
PCI Express Retimers vs. Redrivers: An Eye-Popping Difference (2019) - tragiclos
https://www.asteralabs.com/2019/06/26/pci-express-retimers-vs-redrivers-an-eye-popping-difference/
======
lizknope
I worked on a PCIE Gen3 retimer. The problem was that you were trying to fool
both ends into thinking you weren't there, which kind of violated the spec.
PCIE Gen4 explicitly defines the terms, allowing you to make a chip that
retimes the signal while staying compliant with the spec.

~~~
segfaultbuserr
> _fool both ends into thinking you weren't there and was kind of violating
> the spec._

Long-range USB (e.g. for conference rooms) has the same problem. The maximum
one-way propagation delay is a hard restriction in the standard. You have to
somehow cheat the host into thinking the device is busy before the peripheral
at the far side is able to respond. To this day, I believe no standard
solution exists; it seems everyone's reinventing the wheel in their FPGAs.
Since it's only a niche application, standardization is unlikely [0]. BTW,
FireWire did have a perfect solution to this problem, because good networking
was one of its design goals.

[0] And for relatively short runs it's not usually a problem - cascading USB
hubs is good enough; it's how most active extension cables work, as one-port
hubs.

~~~
marcan_42
USB sucks. Bigtime. Way, _way_ more than PCIe. PCIe is great. PCIe will run
over wet string*. USB... USB is a world of pain.

This is the same problem any kind of USB virtualization/network tunnelling
has. Also, anyone trying to build a USB2 to USB3 transaction translator (which
the spec, unforgivably, omitted). AFAIK there is one USB2 to USB3 TT chip on
the market and it's unobtainium.

It works if you lift things back to the URB level to some extent, and
"virtualize" the lower protocol layers, but there are corner cases once you
introduce a frame of latency or more, which you have to. The host polls for
data, then you're polling the device for data. You get back some data, but the
next frame the host has stopped polling for data. Now you have some data you
have to drop on the floor. Not good. You could choose to delay a successful
ACK until you get it from the other side (and thus pretend like the first time
the data was sent it was corrupted, even though it was forwarded
successfully), but now you've massively decimated your maximum throughput. The
same problem happens in the other direction too.
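The race described above can be sketched in a toy discrete-frame model (this is a hypothetical illustration, not a real USB stack: frame granularity, the `simulate` helper, and the poll/data schedules are all invented for the example):

```python
# Sketch: why one frame of store-and-forward latency breaks USB-style
# polling. The host polls an IN endpoint each frame; the extender needs a
# full frame to fetch data from the far side, so a response can arrive
# after the host has already stopped polling.

def simulate(host_polls, device_data):
    """host_polls[n]  -> True if the host sends an IN token in frame n
       device_data[n] -> payload the device has ready in frame n (or None)"""
    frames = len(host_polls)
    in_flight = [None] * (frames + 1)   # data arriving back at the extender
    delivered, dropped = [], []
    for n in range(frames):
        # Data requested in frame n-1 arrives from the far side now.
        data = in_flight[n]
        if data is not None:
            if host_polls[n]:
                delivered.append(data)   # host happened to poll again
            else:
                dropped.append(data)     # host gave up: drop on the floor
        # Forward this frame's poll; the response comes back next frame.
        if host_polls[n] and device_data[n] is not None:
            in_flight[n + 1] = device_data[n]
    return delivered, dropped

# Host polls in frames 0 and 1, then stops. Device has data every frame.
delivered, dropped = simulate(
    host_polls=[True, True, False, False],
    device_data=["A", "B", "C", "D"],
)
print(delivered)  # -> ['A']  made it back while the host still polled
print(dropped)    # -> ['B']  fetched for a poll the host never repeated
```

"B" is exactly the data you have to drop on the floor (or delay-ACK at the cost of throughput): it was requested in good faith, but by the time it crossed the link, the host had moved on.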

Thankfully, in _practice_ these race conditions often hit software anyway
(i.e. if you cancel a submitted URB and data had already come back, it was
already ACKed and will be dropped on the floor), so you can get away with it,
since drivers should be designed not to screw this up. But it's still easier at the
software level (e.g. virtualizing) because you actually know what the host is
trying to do. I wrote the QEMU virtual xHCI code and the original prototype,
which ignored the QEMU USB subsystem and passed through straight to the host,
worked _very_ well (better than anything VMware/virtualbox were doing) because
xHCI is high level enough. But at the wire level, you don't even know when the
host has cancelled a transaction other than just... because it chooses not to
send packets in a given frame... and what if it just decided not to poll that
frame but it will poll later? Now you need timeout heuristics... it's
horrible.

* seriously, I've tunnelled PCIe over 115200 baud RS232. You can do all kinds of horrible things to PCIe and it will still work.

~~~
segfaultbuserr
> _USB sucks. PCIe is great._

> _This is the same problem any kind of USB virtualization /network tunnelling
> has._

+1.

I've been using QubesOS on my workstation for a while, and I found the problem
associated with USB virtualization/tunnelling also has a major impact on
QubesOS's usability (USBility?). Long story short, QubesOS's security model
depends on the ability to isolate individual peripherals at the hypervisor
level. For PCI-E it's easy, but for USB it's a headache. It has various
security, compatibility and performance problems. When one runs into such a
problem, the only workaround is assigning an entire EHCI or xHCI USB host
controller to a VM at the PCI-E level, and this is only feasible on machines
with multiple USB controllers.

> _I wrote the QEMU virtual xHCI code and the original prototype, which
> ignored the QEMU USB subsystem and passed through straight to the host,
> worked very well_

I also got the idea to build something similar, but in hardware - put 4
individual PCI-E USB host controllers behind a PCI-E switch, so that each USB
port can be seen as an individual PCI-E device. Now the USB problem in Qubes
is solved forever...

~~~
pedrocr
Isn't the PCIe card you're looking for this:

[https://www.startech.com/Cards-Adapters/USB-3.0/Cards/PCI-Ex...](https://www.startech.com/Cards-Adapters/USB-3.0/Cards/PCI-Express-USB-3-Card-4-Dedicated-Channels-4-Port~PEXUSB3S44V)

Seems to have 4 USB ports, each served by its own controller with dedicated
PCIe lanes.

~~~
segfaultbuserr
Yes, this is exactly what I was describing, and it can be the solution.
However, I was thinking of a downscaled USB 2.0 version - it would cost only a
fraction as much, and it wouldn't eat all my PCI-E lanes (20 Gbps vs
1.92 Gbps; the latter needs only a PCI-E x1 Gen 2 link).
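The arithmetic behind that comparison checks out (back-of-envelope sketch; the link rates are the usual published figures, and "payload" for Gen 2 just subtracts 8b/10b line-coding overhead):

```python
# Four USB 2.0 high-speed ports vs. one PCI-E x1 Gen 2 lane.
usb2_port_mbps = 480                             # USB 2.0 high speed, per port
four_ports_gbps = 4 * usb2_port_mbps / 1000
print(four_ports_gbps)                           # -> 1.92 (Gbps aggregate)

pcie_gen2_x1_gt_s = 5.0                          # 5 GT/s raw line rate
pcie_gen2_x1_gbps = pcie_gen2_x1_gt_s * 8 / 10   # 8b/10b -> 4 Gbps payload
print(pcie_gen2_x1_gbps)                         # -> 4.0

# A single Gen 2 lane comfortably covers four USB 2.0 host controllers.
assert four_ports_gbps < pcie_gen2_x1_gbps
```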

------
jackyinger
TL;DR: You want a retimer since it doesn’t amplify noise and distortion like
a redriver does.

~~~
rkagerer
At the cost of a bit more latency.

~~~
Junk_Collector
And money, and power.

A retimer is a much simpler device (thus cheaper) and can be tested using
linear methods. To test a line with a redriver you really need a full on BERT
system + a VNA whereas you can get by with just the VNA for the retimer. To
accurately characterize PCIe Gen 4 and 5 you need to go to almost 50 GHz of
bandwidth on your test equipment so you are talking big money on hardware for
test, design, and validation.

Meanwhile, a redriver doubles your power budget on the driver since it
literally is a second set of drivers in the middle of your PCIe line.

This is an ad from a company that sells redrivers. It's an informative ad, but
still an ad, so it should be viewed with a critical eye.

~~~
lurker_primo
You have flipped the definitions of redriver and retimer.

Also, while I agree with big money for hardware, do you see a way around if
your design needs a retimer?

~~~
Junk_Collector
You're right, I flipped the names in my head. I probably shouldn't be posting
so late so I'm going to bed after this comment.

You can go a long ways with careful design and quality redrivers, but at some
point it will be better to go to a retimer. Which you should use is a matter
of use case and engineering decisions. Just please be skeptical of the
conclusion that "Usage [of redrivers] is highly discouraged; use at own risk"
from a company that specifically designs and sells retimers.

------
qiqitori
Nice article. Didn't even consider you'd ever need something like this in a
single system. Also never considered that a "redriver" would be able to help
at all with multi-GHz signals.

(Note: I'm very unknowledgeable about this topic.)

~~~
kingosticks
I would guess they might use pcie in some physically large systems such as
super computers, enterprise networking stuff etc.

~~~
Dylan16807
With PCIe 3 yeah.

With 4, the limits on how long you can make PCB traces are pushed or surpassed
inside a normal desktop.

When 5 doubles the frequencies it's going to get really hard to make
reasonable connections.

And you also need increasingly complicated multiplexing chips for any
motherboard that can divide the primary x16 slot into two x8 slots.

~~~
kingosticks
Won't some of that be solved by better performing pcb materials filtering down
to consumer desktops when the price allows?

------
amelius
> A retimer is a mixed signal analog/digital device that is protocol-aware and
> has the ability to fully recover the data, extract the embedded clock and
> retransmit a fresh copy of the data using a clean clock.

How does the retimer recover the underlying clock if the signal is (say) all
zeroes for an extended period of time?

And why is clock recovery from the data signal necessary in the first place?
Can't the clock signal be recovered from the clock signal itself? Iow, why is
the clock signal not one of the inputs to the retimer?

~~~
andyv
> How does the retimer recover the underlying clock if the signal is (say) all
> zeroes for an extended period of time?

Data is encoded such that all-zeros or all-ones doesn't happen. For Gen3,
eight bits are encoded with ten bits; look up "8b/10b encoding" for details.
For Gen4+, blocks of 128 bits are scrambled (XORed with a pseudo-random
sequence), plus a two-bit sync header on each block. This is "128b/130b"
encoding.
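The effect of scrambling on clock recovery can be shown with a toy LFSR sketch (the tap positions and seed below are illustrative assumptions; the actual PCIe scrambler polynomial, seeding, and per-lane rules are defined in the spec):

```python
# Sketch: scrambled data keeps a clock-recovery PLL fed with bit
# transitions even when the payload is pathological (all zeros).
# The 23-bit Fibonacci LFSR taps here are an assumption for
# illustration, not the standard's exact values.

def lfsr_stream(nbytes, state=0x1FFFFF, taps=(23, 21, 16, 8, 5, 2)):
    """Generate pseudo-random scrambling bytes from a Fibonacci LFSR."""
    out = []
    for _ in range(nbytes):
        byte = 0
        for _ in range(8):
            bit = 0
            for t in taps:                       # XOR the tapped stages
                bit ^= (state >> (t - 1)) & 1
            state = ((state << 1) | bit) & ((1 << 23) - 1)
            byte = (byte << 1) | bit
        out.append(byte)
    return out

def transitions(byte_seq):
    """Count 0->1 / 1->0 edges in the serialized (MSB-first) bit stream."""
    bits = [(b >> i) & 1 for b in byte_seq for i in range(7, -1, -1)]
    return sum(a != b for a, b in zip(bits, bits[1:]))

payload = [0x00] * 16                    # 128 bits of zeros: a PLL would starve
scrambled = [p ^ s for p, s in zip(payload, lfsr_stream(len(payload)))]
print(transitions(payload))              # -> 0
print(transitions(scrambled))            # plenty of edges to lock onto
```

Because the LFSR never sits in one state, the XORed stream can't produce long runs of identical bits, which is exactly the guarantee the CDR circuit in a retimer relies on.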

> And why is clock recovery from the data signal necessary in the first place?
> Can't the clock signal be recovered from the clock signal itself?

Recovering the clock signal from the data avoids the need for an additional
clock wire.

~~~
lizknope
Close, but PCIE Gen3 was the first version that switched to 128/130 encoding.

[https://en.wikipedia.org/wiki/PCI_Express#History_and_revisi...](https://en.wikipedia.org/wiki/PCI_Express#History_and_revisions)

------
anticristi
This article makes me realize how messy the underlying world of digital
communication really is. As a software engineer, I send a file "into the
cloud" and assume that it is going to be magically copied bit-by-bit with zero
errors while going from my disk through the PCIe bus, the Ethernet cable, the
fiber and similarly on the other side. Turns out "digital" is just an
abstraction of messy, noisy, distorting, attenuating physical reality.

~~~
robotnikman
When you really think about it, it makes you realize it's amazing that it even
works at all in the first place.

~~~
anticristi
Right? I'm happy someone else persevered and invented the PCIe bus. People
like me would still be using smoke signals to this day. :)))

------
cushychicken
How does a retimer deal with spread spectrum clocking?

Would it achieve a lock at its internal receiver, then apply the spread
spectrum on the retransmitted clock?

