
PCIe 4.0 will be twice as fast as today's slots - ingve
https://www.engadget.com/2017/06/09/pcie-4-0-twice-as-fast/
======
likelynew
Can someone enlighten me on what the bottleneck for transfer speed is from a
hardware point of view? Is it the material used in the wires, the transistors,
or something else? And also, since transfer speeds keep increasing, what have
the hardware improvements been over, say, the last 10 years?

~~~
nuand
Take a look at slides 9 through 15 of this presentation:
[https://pcisig.com/sites/default/files/files/PCI_Express_Ele...](https://pcisig.com/sites/default/files/files/PCI_Express_Electrical_Basics.pdf)

PCIe signals are generated by transceivers -- devices within chips that are
specialized in signal conditioning, e.g. echo cancellation, emphasis/de-emphasis,
and dynamic impedance matching. These transceivers and the analog and digital
techniques they implement get better with time. This is easily measurable by
looking at the bit error rate of the data or by looking at eye diagrams (see
slide 15). As data rates increase, things like drive strength, impedance
mismatches, and a number of other properties of the silicon "close the eye",
meaning the transmitted "0"s and "1"s are no longer distinct enough for a
receiver to tell them apart often enough to successfully decode a packet.
(PCIe is packet based; it's surprisingly similar to Ethernet.) But essentially,
as our understanding of and processes for manufacturing semiconductor devices
improve, we're able to "open the eye" more, at which point the industry
decides to increase data rates.
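
A toy sketch of the idea (purely illustrative numbers, nothing PCIe-specific):
push random bits through a channel with some inter-symbol interference and
noise, and watch the bit error rate climb as the "eye" closes:

    import random

    def ber(num_bits=100_000, isi=0.3, noise_sigma=0.2):
        """Toy link: the previous symbol smears into the current one (ISI),
        plus Gaussian noise; the receiver decides with a simple 0 V threshold."""
        random.seed(0)
        errors = 0
        prev = 0.0
        for _ in range(num_bits):
            bit = random.randint(0, 1)
            level = 1.0 if bit else -1.0
            rx = (1 - isi) * level + isi * prev + random.gauss(0, noise_sigma)
            prev = level
            if (rx > 0) != (bit == 1):
                errors += 1
        return errors / num_bits

    for isi in (0.1, 0.3, 0.5):   # more ISI = a more "closed" eye
        print(f"ISI {isi}: BER ~ {ber(isi=isi):.1e}")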

~~~
bicubic
Would these problems persist if we switched to optical signaling instead of
RF? Can we expect an optical PCIe equivalent one day?

~~~
frozenport
No, mostly because the wavelength of light is large and therefore has poor
information density compared to electrons.

~~~
jhoechtl
I doubt this is true, as HF is also modulated onto the wire ... it's not the
electrons per se which wander but the modulated HF. Sure, the carrier is the
electron, but it's not like a stream of water.

~~~
frozenport
Basically, in the optical regime it isn't possible to propagate the E field in
a conduit smaller than the wavelength. This is because a small conduit doesn't
have the right boundary conditions to support the fields (like trying to fit
waves into a didgeridoo). So you're always going to have these massive,
massive ~1 µm structures compared to state-of-the-art nm-scale semiconductors.

There are advantages to optics, including that light moves faster than
electrons (important for HPC, where the figure of merit is latency in µs
between nodes, etc.) and typically has higher fidelity. But the size of these
structures is orders of magnitude larger than conventional semiconductor
features.
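
To put rough numbers on that (assumed values: ~1550 nm telecom light, ~10 nm
transistor features):

    # Back-of-envelope scale comparison; the exact numbers are assumptions,
    # the point is the ratio.
    wavelength_nm = 1550   # free-space wavelength of common telecom light
    feature_nm = 10        # rough feature size of a modern transistor
    print(f"waveguide scale / transistor scale ~ {wavelength_nm / feature_nm:.0f}x")
    # Even confined in a high-index material like silicon (n ~ 3.5), the guide
    # can only shrink to roughly wavelength / n, still hundreds of nm wide.
    print(f"in silicon: ~{wavelength_nm / 3.5:.0f} nm")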

------
igravious
Engadget reporting is so … underwhelming for a site with such financial
backing.

The associated graphic is some random “Maximum PC Magazine via Getty Images”
thing. You mean they have nobody who can take a photo of the inside of a PC?

Then the source of the article is not PCI-SIG itself but a TechReport article:
[https://techreport.com/news/32064/pcie-4-0-specification-fin...](https://techreport.com/news/32064/pcie-4-0-specification-finally-out-with-16-gt-s-on-tap)
which, frankly, is much more packed with info and deets. In contrast to
Engadget there's a nice infographic showing the evolution in bandwidth over
the years _plus_ there's a table with PCI specs 1 through 5. Finally, the
source there is PCI-SIG itself: [http://pcisig.com/](http://pcisig.com/) and
from there you can root through the spec revisions:
[http://pcisig.com/specifications/review-zone](http://pcisig.com/specifications/review-zone)
– which are, as someone else pointed out here, "PCI Express Base Specification
Revision 4.0, Version 0.9" and "PCI Express Base Specification Revision 5.0,
Version 0.3".

I mean, I do like Engadget, I've been going there for years, but sometimes I
ask myself why I do. Their tech-event live-blogs are pretty damn decent I
guess, but I wish their articles had more meat on them like Ars Technica or
AnandTech or TechReport or …

~~~
thoughtsimple
And they got the reason why manufacturers are going to move to PCIe 4 wrong.
It has nothing to do with bottlenecks to video cards and everything to do with
freeing up lanes. Pretty awful article.

------
Retr0spectrum
> Imagine what a video card could do with that.

Last time I checked, even very high-end cards are far from being bottlenecked
by an x16 3.0 bus.

~~~
highd
True for graphics, though it would be nice for HPC. See
[https://news.ycombinator.com/item?id=14508928](https://news.ycombinator.com/item?id=14508928)

Though it may be that expensive high-end VR could be the luxury home theater
of tomorrow, and you put 4 GPUs in a box to get 90FPS 8K seamless VR. Faster
PCIe could be nice to have in that case.

~~~
valarauca1
Multi-GPU doesn't really exist for anything other than ML or HPC.

Vulkan fully supports it; Metal and DX12 don't.

The real issue for real-time AR/VR/144FPS+ is knowing what you can and can't
offload, what the transfer latency is, etc. This will change based on cards,
generations, CPUs, library versions, driver versions, and vendors.

It is a nightmare.

Even with SLI/CrossFire, where you know the cards and drivers are identical,
you still only see a ~10-20% perf gain for 50% more resources.

~~~
jandrese
I've wondered whether it would be possible to design a dual-GPU setup to
handle VR applications more efficiently. VR tends to be kind of hard on
current GPUs due to the need to render the scene from two viewpoints. With a
dual-GPU setup you could effectively dedicate one GPU per eyeball.

~~~
spectralblu
That's actually not true. I believe with the latest generation of Nvidia cards
(the 10 series), they made rendering multiple similar viewpoints dirt cheap by
tweaking the hardware and the rendering pipeline.

From the Ars Technica review of the 1060: "GPU Boost 3.0, Fast Sync, HDR, VR
Works Audio, Ansel, and preemption make a return too, as well as the ability
to render multiple viewpoints in a single render-pass."

[https://arstechnica.com/gadgets/2016/07/nvidia-gtx-1060-review/](https://arstechnica.com/gadgets/2016/07/nvidia-gtx-1060-review/)

From my limited understanding, VR is difficult because of the tolerances
required. For regular gaming, slight frame drops were annoying, but didn't
break the experience. Thus, it was reasonable to ship a game that was able to
hit 60fps 99% of the time, and just write off the remaining 1% of the time.
For VR, not only do we need to hit at least 75fps, but the tolerance for frame
drops is much, much lower (a stutter while you're watching a monitor is
annoying; the same stutter in VR could make you lose your balance). To aim to
hit 75fps and guarantee that you'll hit that 99.9%, 99.99%, or 99.999% of the
time is where the difficulty lies. I'm sure most of the HN audience has
experience with just how difficult it is to tack on an additional 9.
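
To make those extra nines concrete (assuming a 75fps target and an hour-long
session), the arithmetic looks roughly like this:

    frames_per_hour = 75 * 3600   # 270,000 frames at 75 fps
    print(f"frame budget at 75 fps: {1000 / 75:.1f} ms")
    for nines in (0.99, 0.999, 0.9999, 0.99999):
        allowed = frames_per_hour * (1 - nines)
        print(f"{nines:.5f} -> at most {allowed:.0f} dropped frames per hour")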

------
chx
Eh... for everyday use PCIe 3.0 x4 is plenty. If you wanted to help the
everyday user, then make these new PIO slots standard
[https://world.taobao.com/item/535467249119.htm?spm=a312a.770...](https://world.taobao.com/item/535467249119.htm?spm=a312a.7700714.0.0.T1USiR)
and, once you have the GPU parallel to the motherboard, make the coolers
user-replaceable so that you can put a tower cooler on your GPU. What lunacy
is it that we have cubes 12-16 cm on edge, containing something like a
kilogram of metal and two or three fans in push-pull, to cool ~150W CPUs,
while the 250W GPU cooling system struggles for air? x4 is enough in
practically all cases; just don't forget to feed 75W into it...

Once that's done, add a few more x4 slots. Even consumer CPUs have enough
lanes for it; there's no need to waste x16 on the GPU. TB3 is x4, U.2 is x4,
everything big and useful is x4, so more of that is helpful. The only things
that need x8 are HPC and 40Gbps+ Ethernet, both of which are clearly server
land.
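
As a rough sketch of the lane math (assuming the ~16 CPU-attached lanes a
typical consumer desktop CPU dedicates to the graphics slot):

    cpu_lanes = 16          # assumed: lanes normally fed to a single x16 slot
    link_width = 4
    devices = ["GPU", "TB3 controller", "U.2 SSD", "spare"]
    links = cpu_lanes // link_width
    print(f"{cpu_lanes} lanes -> {links} x{link_width} links: {', '.join(devices[:links])}")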

~~~
colejohnson66
Four 3.0 lanes may be enough for a GPU, but if you're using a 3.0-capable GPU
in a 2.0 system, you'll need 8 lanes. So my guess for why GPUs have 16-lane
connectors is backwards compatibility. What motherboard manufacturers could do
is use open-ended x4 slots. But from what I've observed, people don't seem to
know that you can put an x1 card into an x4 slot.

~~~
chx
Not really. 2.0 x4 is already enough.
[https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080_...](https://www.techpowerup.com/reviews/NVIDIA/GeForce_GTX_1080_PCI_Express_Scaling/25.html)
PCI Express 1.1 x4 is within shouting distance of 3.0 x16, and there's no
noticeable difference between 2.0 x4 and 3.0 x16.

And the future actually means _less_ bandwidth demand:

> Performance doesn't even drop with newer DirectX 12 and Vulkan games,
> including titles like "DOOM," which are known to utilize virtual texturing
> ("mega textures," an API feature analogous to Direct3D tiled-resources). If
> anything, mega textures has reduced the GPU's bandwidth load on the PCI-
> Express bus.

So: 3.0 x4 will be plenty for the foreseeable future.

------
jandrese
So it took 6 years to not yet release PCIe 4 and they're saying they will have
PCIe 5 out in two years?

Have they fixed some problem with the development process that will make it
take 1/3 of the time?

~~~
brohee
4.0 and 5.0 are developed concurrently, 4.0 being at revision 0.9 and 5.0
being at 0.3.

~~~
ksec
So why would anyone use 4.0? Or is 4.0 more of a stopgap for HPC and
networking applications, where higher-bandwidth interconnects are urgently
needed? Intel doesn't even plan to have 4.0 on selected CPUs until late 2018,
and AMD isn't moving to 4.0 until 2019.

They might as well wait a year and have 5.0.

~~~
XzetaU8
"If PCI-SIG hits its target goal of a 2019 standard finalization date, PCIe
5.0 could be in-market by 2020 or 2021"

[https://www.extremetech.com/computing/250640-pci-sig-
announc...](https://www.extremetech.com/computing/250640-pci-sig-announces-
plans-launch-pcie-5-0-2019-4x-bandwidth-pcie-3-0)

------
astrodust
It would be interesting if the new standard were just multiple USB-C
connectors for the cards: one, two, four, or eight of them depending on the
bandwidth requirements.

~~~
lightedman
The alignment issues of multiple on-board plugs would just be a headache
waiting to happen. Even with SMT components typically aligning themselves
during solder reflow, there are always discrepancies in manufacturing, and
components end up ever so slightly out of whack. On the GPU side, those
connectors would probably be straight because they'd be laid flat against the
board, perpendicular to the motherboard. The motherboard mountings would be
the problematic ones.

~~~
wlesieutre
You could plug all the sockets into a large retainer (mimicking the role of a
GPU) during soldering to ensure they're spaced correctly.

It doesn't sound as solid as a slot, though; the slot supports some of the
card's weight in addition to carrying data. I'd be concerned about USB sockets
getting torqued off if you looked at them wrong while installing a card, or if
you took out the mounting screws without having the card properly supported.

~~~
astrodust
USB-style sockets could require less force to insert than a typical card, so
they're less likely to break, but that's also a function of design. A plastic
sleeve to guide the card into place and ensure a good fit isn't that hard a
concept.

~~~
wlesieutre
Insertion force isn't what I'm worried about; it's snapping it off with
perpendicular force. Double-height GPUs are heavy, and most of that weight is
supported by hanging off the slot.

~~~
astrodust
Any heavyweight GPU would have 4 or 8 sockets holding it up, so that shouldn't
be an issue.

If all that weight were hinging on one socket, an accommodation like some kind
of flange wouldn't be too hard to incorporate.

~~~
wlesieutre
True, you could still run the PCB footprint out to the edge and have a support
socket on the motherboard to grab it.

Then we could eliminate the cost of the USB components by building contacts
into the support socket that connect to pads on the PCB!

I dunno, ease of inserting and removal just isn't on my list of desires for a
GPU. I install one maybe every 2 years, and I don't want to worry about it
falling out and smashing around inside my case if I pick up my computer or
something.

Whatever mounting shenanigans are necessary to lock it in place with weaker
connectors feels like a solution looking for a problem. The somewhat high
force of PCIe slots is a feature, not a bug.

------
mrmondo
I'm actually more interested in any latency decreases and improvements to
direct CPU/memory access that may be included in the new standard.

------
mhh__
Would this, or current technology, be fast enough to have a (professional-grade)
oscilloscope as a PCIe card? It's my understanding (I don't work around
top-of-the-line gear, so I'm probably wrong) that even the best scopes don't
have particularly amazing interfacing with regular PCs because of the low
bandwidth of their interfaces.
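
For a rough feel for the numbers (all parameters here are made up for
illustration, not taken from any real scope):

    # Hypothetical digitizer: 2 channels, 5 GSa/s, 8 bits per sample.
    channels, samples_per_s, bits_per_sample = 2, 5e9, 8
    need_gbs = channels * samples_per_s * bits_per_sample / 8 / 1e9   # GB/s
    pcie3_x8 = 8 * 8 * (128 / 130) / 8        # ~7.9 GB/s usable
    pcie4_x16 = 16 * 16 * (128 / 130) / 8     # ~31.5 GB/s usable
    print(f"continuous streaming need: ~{need_gbs:.1f} GB/s")
    print(f"PCIe 3.0 x8: ~{pcie3_x8:.1f} GB/s, PCIe 4.0 x16: ~{pcie4_x16:.1f} GB/s")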

~~~
21
Today's top-of-the-line oscilloscopes (in the $250k+ range) are basically a
Windows computer with some custom hardware:

[https://www.youtube.com/watch?v=dx596o8t_TY](https://www.youtube.com/watch?v=dx596o8t_TY)

[https://www.youtube.com/watch?v=WgbAESoFDAY](https://www.youtube.com/watch?v=WgbAESoFDAY)

~~~
c3833174
But Agilent had Win98-based oscilloscopes more than 15 years ago.

~~~
baobrien
I have a DOS-based LeCroy oscilloscope with an embedded 386 and an IDE hard
drive. The 'oscilloscope' part is hooked up through an ISA card.

------
phkahler
How is this possible? I assume GT per second means transfers, not bits. You
can't do any kind of handshake at that rate over more than a centimeter or
two. Do they allow multiple requests to be issued before the first transfer is
complete? The speed of light is definitely coming into play here as an upper
limit.

~~~
21
In this context, a transfer is almost the same thing as a bit. For every 8
bits of useful payload, PCIe sends 10 bits of encoded payload. The GT/s figure
counts the total number of transmitted bits per lane, which includes this 20%
overhead. So the actual useful bitrate is 0.8 * GT/s.

~~~
shaklee3
That's true for PCIe 2.0. PCIe 3.0 uses 128b/130b encoding, so it's much, much
more efficient.
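
Putting both encodings into numbers (line rates from the published specs;
per-lane payload rate is the line rate times the encoding efficiency):

    gens = {
        # generation: (line rate in GT/s, encoded bits, payload bits)
        "1.x": (2.5, 10, 8),
        "2.0": (5.0, 10, 8),
        "3.0": (8.0, 130, 128),
        "4.0": (16.0, 130, 128),
    }
    for gen, (gt, enc, pay) in gens.items():
        gbps = gt * pay / enc          # useful Gb/s per lane
        print(f"PCIe {gen}: ~{gbps:.2f} Gb/s/lane, x16 ~= {gbps * 16 / 8:.1f} GB/s")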

------
franciscop
It would seem to me that 2x is not such a huge jump as to warrant 6 years
(compare USB 3 => 3.1), and that it isn't the bottleneck anyway. But we might
be approaching the physical speed limits of PCIe (as with transistor size).

~~~
shaklee3
It is the bottleneck. This is why Nvidia invented nvlink.

------
tiku
PCIe 5.0 will be twice as fast as tomorrow's slots...

