
PCI Express on the Raspberry Pi 4 - trollied
http://mloduchowski.com/en/blog/raspberry-pi-4-b-pci-express/
======
buildbuildbuild
Fun. If you want PCIe on an SBC without the soldering, I highly recommend
perusing Hackerboards. I'm very happy with my RockPi 4 (4GB RAM, PCIe, USB 3,
6 cores), which I discovered through their excellent database.

[https://www.hackerboards.com/search_boarddb.php](https://www.hackerboards.com/search_boarddb.php)

~~~
beatgammit
I think you mean the RockPro64. The Rock64 only has 4 cores and no PCIe.

That being said, I missed the PCIe in the specs last time I was comparing SoCs,
and I had forgotten about hackerboards; thanks for the reminder!

~~~
buildbuildbuild
[http://rockpi.org/](http://rockpi.org/) :)

~~~
justinclift
Not seeing a mention of PCIe there on the board?

~~~
gtyras2mrs
Scroll down a bit. M.2 on the backside.

Listed under Storage in the specs.

~~~
justinclift
Isn't M.2 storage-specific?

e.g. not a useful PCIe slot for anything other than plugging in an SSD

~~~
hipboi
M.2 has a full four lanes of PCIe; you can use an M.2-to-PCIe x4 adapter and
use all the standard PCIe cards.

~~~
blackflame7000
M.2 is a connector specification; it has nothing to do with speed. M.2 supports
applications such as Wi-Fi, USB, SATA, and PCIe. M.2 SSDs are faster and store
more data than most mSATA cards. M.2 SSDs support PCIe 3.0, SATA 3.0, and USB
3.0 interfaces, while mSATA only supports SATA. M.2 SATA SSDs have similar
performance to mSATA cards, but M.2 PCIe cards are faster. SATA SSDs have a
maximum speed of 600 MB per second, while M.2 PCIe cards can hit 4 GB per
second.

PCIe support also allows M.2 cards to take advantage of the nonvolatile memory
express (NVMe) protocol, which brings a large performance advantage over other
types of interfaces due to reduced latency, increased IOPS and lower power
consumption.
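
Those headline figures fall out of simple line-rate arithmetic; here's a
back-of-the-envelope sketch (Python, purely illustrative) of where the
~600 MB/s and ~4 GB/s numbers come from:

    # SATA 3.0: 6 Gbit/s line rate with 8b/10b encoding.
    sata3_mb_s = 6e9 * (8 / 10) / 8 / 1e6
    print(f"SATA 3.0: ~{sata3_mb_s:.0f} MB/s")        # ~600 MB/s

    # PCIe Gen3 x4 (typical NVMe M.2): 8 GT/s per lane, 128b/130b encoding.
    gen3_x4_gb_s = 4 * 8e9 * (128 / 130) / 8 / 1e9
    print(f"PCIe Gen3 x4: ~{gen3_x4_gb_s:.2f} GB/s")  # ~3.94 GB/s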

------
Jonnax
That's really cool. I'm curious: would it be possible to use a modern GPU
(running at x1) on an ARM-based board?

Would the open-source drivers that are part of the kernel work out of the box
on ARM?

~~~
qdot_me
Hack's creator here - it's on my list of things to try. GPUs are notoriously
hard to get working on non-Intel systems; I've tried to bring a few up on
Alpha and Itanium in the past.

The VideoBIOS expects to run, and expects a well-behaved Intel CPU to do the
power-up. That said, X can sometimes emulate these quite well. On ARM we'd also
run into alignment issues and likely other quirks - but in principle...

~~~
floatboth
VBIOS is often not necessary for running a GPU in the OS. The amdgpu driver
can POST a GPU by itself just fine.

Still… X86EmulatorPkg allows running an amd64 VBIOS in UEFI on an aarch64
machine :)

AFAIK the bigger problem on embedded boards is half-assed Synopsys DesignWare
host controllers. I have a Radeon running on my Marvell MACCHIATObin, on
FreeBSD even. But from what I've heard, the Rockchip RK3399 has a worse version
of the controller, and people trying GPUs on the ROCKPro64 saw errors related
to insufficient BAR space or something.

UPD: yeah, someone in the thread mentioned BAR space issues with NXP i.MX SoCs;
that's probably what's happening on Rockchip. Would be amazing if the Broadcom
chip in the Pi turns out to be the one with enough BAR space! :D

~~~
markus92
Could you clarify for me / give a short definition of BAR space? For obvious
reasons it's a bit hard to search for :)

~~~
cesarb
In PCI, BAR stands for Base Address Register: a register in the PCI device's
configuration space that defines where in the machine's physical memory
address space that particular window of memory and/or I/O will be mapped (a
single device can have several BARs; for instance, a simple graphics card could
have one for its control registers and one for the framebuffer). So "BAR
space" is shorthand for "the region of the physical memory address space which
can be used to map the PCI devices' memory through their Base Address
Registers". The size of this region is limited, and graphics cards in
particular tend to have rather large BARs.

(See for yourself in your machine: run "lspci -v", the lines starting with
"Memory at ..." or "I/O ports at ..." are the BARs.)

------
zaro
For anybody interested in PCI Express on ARM, there are already boards with a
PCI Express connector, like [https://store.pine64.org/?product=rockpro64-4gb-
single-board...](https://store.pine64.org/?product=rockpro64-4gb-single-board-
computer)

------
segfaultbuserr
It looks like an unreliable modification: with a GHz-level interface running
over jumper wires, it's almost impossible to control the impedance. It's a
cool proof of concept, though.

But is it possible to bring the project to the next level? Is it possible to
make a daughterboard matching the QFN footprint? If so, one could make a
pin-compatible daughterboard with an extension connector. To use it, just
desolder the USB chip, solder the new daughterboard in its place, and you're
ready to go. It would be one of the coolest Raspberry Pi projects!

~~~
monocasa
I've seen PCIe literally run over a metal clothes hanger soldered to the
board. It's extremely tolerant of terrible-quality connections.

~~~
benj111
How did you reach the situation where the solution was soldering a clothes
hanger to a board for PCIe???

~~~
monocasa
We were seeing some issues, and one engineer was blaming signal integrity. We
didn't have access to a high-enough-speed oscilloscope to get a clean eye
diagram, so another engineer literally disappeared for an hour and had the
boards running over the clothes hangers on the old firmware, to say: no, it's
not a questionable-signal-integrity issue, go back and fix your code.

------
dwheeler
It'd be nice if there were an easier way to do this (vs. removing a chip!).
E.g., maybe a dedicated pinout and an easy way to disable the existing use
(since the pins can't be shared).

------
noobermin
Now this is the content I come to HN for. A serious hack just days after the 4
was released. Kudos to the OP.

I envy people like OP for their tenacity. I barely have time to follow what's
happening in IT, much less get ahead of the pack by doing cool hacks like
this.

~~~
anderspitman
I'm sure you are, but just in case: are you aware of hackaday.com?

~~~
noobermin
Of course! HN is just a good digest of everything from the maker side to
fairly abstract CS stuff to math and physics and all in between. It sure beats
the political stuff, which becomes tiresome (even though I succumb and engage
in it like many posters here).

------
baybal2
Raspberry has 2x Gigabit RGMIIs on those SoCs, but they don't wire them out.
It's a waste, I think.

------
wang_li
Aren't PCIe lanes shared? Why would I need to remove the USB 3.0 chip rather
than just hooking right to the pins on the device where it's soldered in
place?

E: Apparently it's the PCI bus that is shared, not PCI Express lanes. Ty.

~~~
strmpnk
AFAICT lanes are not shared, but there are chipsets which can break lanes out
into other sets of lanes that are then routed back onto the original set. So
if your CPU has 16 lanes, you can hang a chip off of it which then provides
more lanes, which are then signaled back to the CPU over some subset of those
lanes.

It's not clear whether the lanes themselves can be multiplexed with packets
from many devices, but the number of assigned lanes can change after
initialization, so a clever chipset could probably allocate lanes dynamically
as used.

~~~
dgaudet
motherboard features such as x16 or 2x8 are achieved with "pcie mux" chips.
these are devices which select which of N pairs of differential wires is
attached to the input/output differential pair. a search for "pcie mux" will
find many, such as [0]. if you look at the diagram you'll see that it connects
wire pair A+/A- to either B+/B- or C+/C- based on the value of the SEL line.

these are generally basic passive devices operating at the analog signal
level, with no higher-layer activity required. however, some exist which
operate as "retimers", which do participate in the lowest layer of the PCIe
electrical protocols (generally to extend reach). these are unlikely to be
used for a typical x16 <-> 2x8 sort of motherboard feature though.

the example i picked here is 4 lanes, and you would need 4 such chips to do an
x16 <-> 2x8. (spoiler: you mux lanes 8-15 from slot X to lanes 0-7 of slot Y,
and there are both TX and RX pairs which need muxing; see the toy sketch
after the links below.)

there do exist devices called "pcie switches" which operate at all layers of
the pcie protocols and allow for all sorts of sharing of the point-to-point
links. examples at microsemi [1] ... for example, a 48-lane switch could be
used to connect two 16-lane GPUs to a 16-lane slot. this would allow either of
the GPUs to burst to the full 16 lanes, or, if both GPUs are communicating
with the host, each would see on average 8 lanes of bandwidth. there's a
picture of such a dual-GPU card in this article [2]; you can see the PCIe
switch ASIC centered between the two GPUs, above and to the right of the edge
connector.

[0] [http://www.ti.com/product/HD3SS3412](http://www.ti.com/product/HD3SS3412)

[1] [https://www.microsemi.com/product-directory/ics/3724-pcie-
sw...](https://www.microsemi.com/product-directory/ics/3724-pcie-switches)

[2] [https://graphicscardhub.com/dual-gpu-graphics-
cards/](https://graphicscardhub.com/dual-gpu-graphics-cards/)
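
to make the x16 <-> 2x8 spoiler concrete, here's a toy model (Python,
illustrative only; the lane numbering is an assumption) of how SEL rewires
CPU lanes between the two slots:

    # SEL=0: all 16 CPU lanes go to slot X (x16 mode).
    # SEL=1: CPU lanes 0-7 stay on slot X; lanes 8-15 are muxed over to
    #        lanes 0-7 of slot Y (2x8 mode). Each 4-lane mux chip flips
    #        four of these pairs, for both TX and RX.
    def route(sel):
        if sel == 0:
            return {cpu: ("X", cpu) for cpu in range(16)}
        routing = {cpu: ("X", cpu) for cpu in range(8)}
        routing.update({cpu: ("Y", cpu - 8) for cpu in range(8, 16)})
        return routing

    print(route(1)[12])  # CPU lane 12 lands on ('Y', 4)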

------
Millennium
If a Pi is capable of this already, why not replace the Ethernet, charging,
micro-HDMI, and USB ports with a boatload of type-C Thunderbolt ports (plus
support for the HDMI 1.4 alt mode)? Would 8xUSB-C cost that much more than
1xUSB-C+1xEthernet+2xMicro-HDMI+2xUSB3+2xUSB2 (with no PCI Express), in
exchange for a considerably more flexible device?

~~~
kllrnohj
Because there's nowhere remotely close to enough PCIe lanes off of the SoC to
do that.

Thunderbolt 1/2 requires a PCIe Gen2 x4 link to have enough bandwidth. The SoC
in the Pi 4, the Broadcom BCM2711, has just a single Gen2 PCIe lane: 1/4 the
required bandwidth for Thunderbolt 1/2, and a mere 1/8 the requirement for
Thunderbolt 3.

To get a full 8x Thunderbolt 3 connectors you need a staggering 32 PCIe Gen3
lanes off of the CPU. This is out of reach of all but the HEDT & enterprise
platforms, to say nothing of the $5 ARM SoCs for SBCs. Well, in theory you
could also use something like a Ryzen 3000 and split out the 24 PCIe Gen4
lanes into 48 Gen3 lanes, and then you could have your 8x Thunderbolt 3
connectors, too. But that's expensive, of course.
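
The lane math is easy to reproduce; a quick back-of-envelope sketch (Python;
nominal per-lane rates, illustrative only):

    def lane_mb_s(gt_per_s, encoding):
        # Per-lane throughput in MB/s after line-code overhead.
        return gt_per_s * encoding * 1000 / 8

    gen2 = lane_mb_s(5, 8 / 10)      # ~500 MB/s (the Pi 4's single lane)
    gen3 = lane_mb_s(8, 128 / 130)   # ~985 MB/s

    tb3_upstream = 4 * gen3          # TB3 controller upstream: Gen3 x4
    print(f"TB3 needs ~{tb3_upstream / gen2:.0f}x the Pi 4's link")  # ~8x
    print(f"8 TB3 ports: {8 * 4} Gen3 lanes")                        # 32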

~~~
gnode
Thunderbolt 3 controllers have an x4 link to provide one or two ports (or x2
in the case of the JHL6240). Additionally, PCIe is designed to support
backwards compatibility and link scaling. I don't see any reason why the x1
Gen2 lane of the Pi 4 couldn't host a Thunderbolt 3 port; it would just
severely bottleneck the bandwidth of tunnelled PCIe links.

Even though it would be limited, a Thunderbolt 3 port would expand the
connectivity of the Pi, and very few devices, if any, require the maximum
bandwidth to operate at all.

~~~
kllrnohj
Sure but "hey here's 8x thunderbolt 3 ports just don't ever attempt to use an
entire one at once kthx" isn't exactly going to be a great product story,
either.

> I don't see any reason why the x1 Gen2 lane of the Pi 4 couldn't host a
> Thunderbolt 3 port; it would just severely bottleneck the bandwidth of
> tunnelled PCIe links.

But that's kind of literally the reason? An entire ecosystem of products
assumes a reasonably high amount of bandwidth from the connector. That's its
singular reason to exist. If you take away the bandwidth from Thunderbolt 3,
it just becomes USB, and at that point why not just offer USB connectors,
which have even broader support and fewer cabling restrictions?

~~~
gnode
I agree that 8x Thunderbolt 3 is probably excessive, and I wouldn't want to
trade away the current connectivity options as was suggested.

> If you take away the bandwidth from Thunderbolt 3, it just becomes USB

It becomes low-bandwidth Thunderbolt / PCIe. You could still use it to attach
PCIe devices which don't need a lot of bandwidth. GPUs can be attached for
high-performance compute where CPU-GPU bandwidth isn't critical. PCIe has
non-bandwidth benefits over USB, such as DMA and interrupts.

> why not just offer USB connectors, which have even broader support and fewer
> cabling restrictions?

You can't attach PCIe devices via USB, but you can attach USB and PCIe devices
via Thunderbolt.

~~~
kllrnohj
You could also do all that with just a PCIe x1 slot, using the x1-to-remote-
x16 connector referenced in the blog post to extend it. No reason to mess with
Thunderbolt just to have any PCIe capability at all.

------
peterburkimsher
Nice work! Would this be compatible with an M.2 to PCIe adaptor? [1]

Being able to attach an Intel 660P and get 2 TB of fast SSD storage on a
Raspberry Pi would be sweet.

[1] [https://www.amazon.com/EZDIY-FAB-Express-Adapter-
Support-221...](https://www.amazon.com/EZDIY-FAB-Express-Adapter-
Support-22110/dp/B01GCXCR7W)

~~~
elFarto
Technically yes, but it's only a PCIe x1 Gen2 link, so only 500 MB/s of
bandwidth (x4 Gen3 is ~4 GB/s). You'd be better off with a USB 3.0 to M.2
adapter.

~~~
londons_explore
The bandwidth works out about the same, but the USB and AHCI controllers will
add quite a bit of latency (and CPU load).

I'd like to see benchmarks, but my guess is that single-thread random 4K read
performance will more than double via PCI Express rather than USB.
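
If anyone wants to test that guess, here's a rough, Linux-only micro-benchmark
sketch (Python; the file path is hypothetical and must be pre-created, e.g. a
1 GiB file on the target drive; O_DIRECT bypasses the page cache and needs a
page-aligned buffer, which mmap provides):

    import mmap, os, random, time

    PATH = "/mnt/ssd/testfile"   # hypothetical pre-created test file
    BLOCK, READS = 4096, 2000

    fd = os.open(PATH, os.O_RDONLY | os.O_DIRECT)
    size = os.fstat(fd).st_size
    buf = mmap.mmap(-1, BLOCK)   # anonymous mapping -> page-aligned buffer

    start = time.perf_counter()
    for _ in range(READS):
        # Read one random, block-aligned 4K chunk per iteration.
        os.preadv(fd, [buf], random.randrange(size // BLOCK) * BLOCK)
    elapsed = time.perf_counter() - start
    os.close(fd)

    print(f"{READS / elapsed:.0f} IOPS, {elapsed / READS * 1e6:.0f} us/read")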

------
Zenst
This is something that will probably prove more accessible with the Zero
flavour of the 4.

------
mng2
Wow, that was quick. Given that this is Broadcom, I don't suppose there is any
visibility into the Root Complex? When troubleshooting PCIe it'd be nice to
have the LTSSM state at least. Would be really cool to get eye diagrams...

------
yetihehe
Hmm, how about using this as a fast interconnect for making RPi clusters?

~~~
hvidgaard
Wouldn't the gigabit LAN be a better fit for this? If you want to make a
cluster, you need to make some custom hardware to connect to that can
facilitate the communication. At this point you're likely spending more than
if you just bought a real desktop for more performance. I can see the fun
factor in hacking the system together, though.

~~~
qdot_me
There are two ways of doing clusters: one is a message-passing paradigm, which
you can do over Ethernet (to an extent - I'd still take USB 3 for 4x the
bandwidth), and the other is direct memory access à la Cray.

What really motivated me to do this hack is the relative abundance of stuff I
can now plug into an FPGA :)

~~~
hvidgaard
I was thinking Ethernet because, A: it's cheap to buy a switch and cluster 100
RPis; B: you can have a desktop with a faster NIC and keep the RPis busy.

But as with everything high-performance, it depends entirely on the use case.

~~~
qdot_me
True. And with the RPi 4 having 1000BASE-T, it's not as painful as it seems.
Perhaps the driver can even be coaxed into some form of DMA and MPI with a bit
lower latency than the IP stack.

With a secondary IP layer on 802.11, it might actually work reasonably well.

~~~
consp
Or use the PCIe mod and use an InfiniBand card for low latency and high
throughput.

~~~
hvidgaard
At that price point, you could probably get more performance out of a server
with 2-4 sockets.

------
moftz
These closeup shots of the VL805 are probably good enough to figure out most
of the pins.

[https://www.viagallery.com/via-labs-vl805/](https://www.viagallery.com/via-
labs-vl805/)

------
epynonymous
this is too awesome! that's quite a lot of work to get the pcie exposed;
soldering and such i try to stay away from, so good on the author.

the form factor of pcie devices doesn't really play well with rpi, but there's
definitely a need for faster, more stable persistent storage. i have heard of
a lot of issues with microsd cards based on wear leveling and such. it would
be really nice if rpi could develop an m.2 interconnect where i could install
an nvme ssd within the form factor of an rpi; that would make for a truly
incredible little machine.

------
nereid
Interesting. Install a SATA card and make a NAS; I think that's better than
USB.

~~~
beatgammit
I would love something with 4x SATA ports for a NAS, like this one [1]. I've
seen PCIe on devices like these, but I've heard there are issues getting
drivers to work properly. I haven't actually tried it, but other limitations
(RAM, CPU, Ethernet) have prevented me from giving it a shot (I want ZFS,
which is a bit memory-hungry). The Pi has just enough that I think it's
doable.

I would absolutely love it if the Raspberry Pi foundation made a version with
PCIe instead of USB.

[1] [https://www.newegg.com/syba-si-pex40064-sata-
iii/p/N82E16816...](https://www.newegg.com/syba-si-pex40064-sata-
iii/p/N82E16816124064)

~~~
IronWolve
That RockPi order page had a 4x-SATA-to-M.2 adapter that looks interesting.

[https://shop.allnetchina.cn/products/m-2-pci-e-protocol-
to-4...](https://shop.allnetchina.cn/products/m-2-pci-e-protocol-
to-4x-sata3-0-expansion)

------
boyadjian
Why not, but I will wait for a classic ATX ARM motherboard instead. It should
happen this year.

~~~
hawski
Is Mini-ITX for $550 acceptable? [https://www.solid-run.com/nxp-
lx2160a-family/honeycomb-works...](https://www.solid-run.com/nxp-
lx2160a-family/honeycomb-workstation/)

16-core A72 CPU, up to 64GB RAM, PCIe x8, 10GbE SFP+ ports and 1GbE RJ45,
SATA.

It's early-access hardware, so it may have some pain points. Normal units will
be available from the end of the year for $750.

If I had the time and money right now, that's what I would buy.

~~~
floatboth
It's only pre-order right now, and the firmware isn't done. They promise SBSA
compliance (which includes ECAM PCIe working via a generic ACPI attachment)
but they haven't passed the full test suite yet. Some experts are skeptical
about whether full compliance is possible on that NXP chip…

I hope the PCIe works fine. And I hope the firmware will be FOSS like on their
MACCHIATObin.

One thing they revealed is that the chip is overclockable (including memory),
which is awesome. IIRC they got 2.5ish GHz core clock working. Would be
amazing if it does like 3GHz with a voltage boost. (I don't expect software
voltage control… but there's always hard mods :D)

------
mallets
I think I will wait for the Pi 4 Compute Module, tyvm.

~~~
pickle-wizard
I really hope the Pi 4 Compute Module breaks out the PCIe on the edge
connector.

------
teamski
Hijacking the thread:

I develop remotely on VPSes because I like to have an always-on box reachable
from any client. I am wondering if an RPi 4 offers a similar experience at
lower cost.

Does anyone use an RPi for this?

~~~
numlock86
I do this with an RPi 3 and it's working well, so it's doable. It strongly
depends on your setup and development environment, though. Do you want vi to
work over SSH, or full VNC access to a machine with GNOME and Eclipse? Or
something in between, like X forwarding? Also, is aarch64 even an option as a
host system? (compilation, software availability, etc.)

~~~
teamski
I use tmux and vim over ssh. You mean Arch? Yeah why not.

Then also a lot of webpack and Docker: I'm wondering if they would get the Pi
stuttering when compiling/building, and if vim is still smooth then (which
isn't the case with my $20 VPS).

~~~
Narishma
AArch64 is what the 64-bit version of the ARM instruction set is called.

~~~
teamski
Ah right, I think all the software I use is available as AArch64 binaries.

~~~
floatboth
You can get AArch64 in Amazon EC2 by the way… up to 16 A72 cores, which is
nice.

------
mrb
Ohh, I see where this is going, qdot_me: I would love to be able to hook up a
GPU to an RPi 4 to crypto-mine. So useful for so many applications! I will
seriously fund you if you can make this happen; hit me up by email. My contact
info is on zorinaq.com

~~~
mrb
Whoa, downvoted to minus 4. No idea why. In my 9 years on HN, this is my most
heavily downvoted comment. Maybe people thought I was being off-topic?

Let me clarify for the public here why the comment is relevant: in the
crypto-mining community, some groups are looking into what minimal single-
board computer can provide a PCIe signal to connect a single GPU. The idea is
to be cheap and reliable. If you have a many-PCIe board failure
([http://bitcoin.zorinaq.com/many_pcie/](http://bitcoin.zorinaq.com/many_pcie/))
you have 10-20 GPUs going down at once. Not good. By isolating each GPU on its
own motherboard, you can isolate failures and thus increase mining profits.
When I saw the OP mention cryptocurrency in the blog post, I thought: hey,
maybe that's what he's looking to do...

