
Behold the data tsunami: IEEE begins work on 400Gbps and 1Tbps Ethernet - evo_9
http://www.extremetech.com/computing/134757-behold-the-data-tsunami-ieee-begins-work-on-400gbps-and-1tbps-ethernet
======
iyulaev
As a hardware designer I'm worried about this more and more. PCB technology
hasn't really advanced all that much in the past ~10 years, while line rates
keep going higher and higher. Back in the ~100MHz days a 1ns edge was about 6
inches long in 370HR-based circuit boards, which was long enough to barely
matter. Now, at 10Gbps, your bits are half an inch long, and your high-end
harmonics are on the scale of your via lengths. PCB design has gotten
appreciably harder in the past few years as a result - maybe three years ago,
when I laid out my first 10Gbps traces, I remember thinking that this was
bleeding-edge stuff, and some of the techniques we used were pretty exotic
for the time. Now 10Gbps is run-of-the-mill and 28.05 is on the horizon
(maybe even coming into town!). It seems like we've really taken up all of
the slack in PCB tech, and I wonder what will happen next. Maybe academia's
optical dreams from the last millennium (silicon ICs with on-die/co-die
optical transceivers) will finally be realized. Without a revolution in the
way PCBs are processed and designed, I doubt we'll be able to push much past
28.05Gbps in copper.
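
A back-of-the-envelope check on those lengths (Dk = 4.0 is my assumed
dielectric constant for an FR-4-class laminate; swap in your own stackup's
number):

```python
# Electrical length of signal features on a PCB trace.
# Dk = 4.0 is an assumed dielectric constant, typical of FR-4-class
# laminates (such as 370HR) - adjust for your actual stackup.
C = 3.0e8              # speed of light in vacuum, m/s
DK = 4.0               # assumed effective dielectric constant
M_PER_INCH = 0.0254

v = C / DK ** 0.5      # propagation velocity in the board, m/s

def length_inches(seconds: float) -> float:
    """Physical length on the trace of a feature lasting `seconds`."""
    return v * seconds / M_PER_INCH

edge_1ns = length_inches(1e-9)       # a 1 ns rise time
bit_10g  = length_inches(1 / 10e9)   # one bit period at 10 Gbps
print(f"1ns edge: {edge_1ns:.1f} in, 10Gbps bit: {bit_10g:.2f} in")
```

The ~6 inch and ~half inch figures fall straight out of the propagation
velocity c/sqrt(Dk).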

------
cs702
Clearly, there is pent-up demand from US businesses and consumers for such
high speeds, but AFAIK the country's long-haul and last-mile infrastructure is
nowhere near ready for handling end-user traffic at 400Gbps, let alone 1Tbps.

On the long-haul side, service providers like Level 3 Communications appear
unlikely to invest aggressively to add a lot more capacity any time soon,
because they are still dealing with the financial consequences of their
over-investment during the 1990s telecom boom (remember Worldcom?). On the last-
mile side, service providers like Comcast also appear unlikely to invest
aggressively to add a lot more capacity any time soon, because their
traditional business models are under threat from companies like Netflix --
our last-mile providers _fear_ higher speeds.

So, we have demand for ever-faster speeds on one hand, and constrained supply
on the other hand. Economics 101 suggests Internet service pricing is bound to
stabilize or even (gasp!) _increase_ over the coming years.

~~~
amalcon
This is not for end users, it's for datacenters. Consumer hardware never has
fiber ethernet as it is, because there's simply no need. Not only do
consumers have no use for that amount of bandwidth, but typical hardware has
trouble saturating a gigabit connection anyway. Adding more unusable
bandwidth isn't going to help. Of course, for some crazy reason consumer
hardware often has fiber _audio_ jacks. People are weird.

In core routers, the landscape is entirely different. It's actually very easy
to saturate a 40-gig connection coming out of a major interchange, and there
are lots of BGP tricks to work around that. Individual ISPs routinely only
have one, for political reasons. It would be difficult to saturate 400. (For
now.)

~~~
theatrus2
Fiber audio (TOSLINK) isn't for bandwidth or even distance (both suck on
TOSLINK for the record), but for allowing a ground isolated digital audio
connection.

Plus the red LED looks cool.

~~~
Wingman4l7
For the truly lazy: <http://en.wikipedia.org/wiki/TOSLINK>

------
djtriptych
The implications of this are fascinating to me. I'm really, really interested
in how Comcast is going to figure out how to make sure the US never sees this
technology widely deployed.

~~~
ams6110
Well we're not still on dial-up are we?

~~~
Lost_BiomedE
No, but that was due to the '96 telecom act. The incumbents wanted it so they
could get into long distance, but that backfired: as soon as others started
making meaningful inroads in deploying DSL, they were crushed. Essentially
the cat was let out of the bag, but it did cost taxpayers billions in
subsidies for fiber to the home that never happened, and left us with the
'duopoly' of cable and DSL.

------
mbell
The transport definition is nice and all, but I'd love to see what the plan
is for routing/switching at these speeds. Given the speed at which 100GbE has
limped into the market, I wouldn't expect to see anything capable of handling
single links at these speeds for quite a long time. 1Tbps with a 1500-byte
MTU is ~83 million packets a second. Just for illustrative comparison, a CPU
at 3.5GHz would have 42 clock cycles per packet[0].

[0] I know a general purpose CPU would never be used for something like this,
just putting the numbers into context.
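
The arithmetic behind those figures, as a quick sanity check (the worst case
with minimum-size packets is far more demanding):

```python
# Packet-rate arithmetic for the comparison above.
link_bps = 1e12          # 1 Tbps
mtu_bits = 1500 * 8      # full 1500-byte MTU packets
cpu_hz   = 3.5e9         # a 3.5 GHz clock

pps = link_bps / mtu_bits                 # packets per second
cycles_per_packet = cpu_hz / pps          # cycles available per packet
print(f"{pps / 1e6:.1f} Mpps, {cycles_per_packet:.0f} cycles/packet")

# Minimum-size 64-byte packets (ignoring preamble/IFG) are ~23x worse:
min_pps = link_bps / (64 * 8)
print(f"{min_pps / 1e6:.0f} Mpps worst case")
```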

~~~
jsnell
You might be selling general purpose CPUs short. To me the computing hardly
seems like the daunting part; it's the IO (both the wire interface and all
the internal buses).

To verify that intuition, I just did a quick experiment with an Intel 10Gbps
NIC and a Xeon E5506. The workload was roughly similar to switching: read
packet, parse the headers, extract some fields from the headers and use that
as a key to a hash table lookup, write packet out to a different interface.

This was done using code from a much more complicated system that I didn't
spend any time tuning for just this case, so it's absurdly wasteful (e.g.
spending 10% of the runtime just on useless calls to RDTSC).

The result was 4.3 Mpps, or about 500 cycles / packet. That seems far from 83
Mpps / 42 cycles, but it was using a single core and a single rx queue on the
NIC. Make it parallel with 16 cores / queues, and you're in the right
ballpark. And then remember that we're talking of a 3.5 year old CPU, which
was already low end when it launched.
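
For illustration, here's a minimal sketch of that kind of per-packet work
(hypothetical code, not the benchmark itself; it assumes untagged Ethernet II
frames and a toy forwarding table):

```python
# Hypothetical sketch of switch-like per-packet work: parse header
# fields, use one as a hash-table key, pick an output interface.
# Offsets assume an untagged Ethernet II frame; this is illustrative
# only, not the benchmark code from the experiment above.
import struct

forwarding_table: dict[bytes, int] = {}   # dst MAC -> output interface

def handle_packet(frame: bytes, default_if: int = 0) -> int:
    dst_mac = frame[0:6]                              # destination MAC
    ethertype, = struct.unpack_from("!H", frame, 12)  # e.g. 0x0800 = IPv4
    # The hash-table lookup is the "switching" decision.
    return forwarding_table.get(dst_mac, default_if)

# Toy usage: a learned MAC goes to port 3, unknown MACs fall back to port 0.
frame = bytes(6) + bytes(6) + b"\x08\x00" + bytes(50)
forwarding_table[bytes(6)] = 3
print(handle_packet(frame))  # -> 3
```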

~~~
noselasd
You're right that it's the IO (buses and memory) that's the main limit. The
problem is that a design around a central CPU doesn't scale: there's not
enough IO, not enough pins. DMAing packets from 24 interfaces into main
memory and having different cores operate on them is just infeasible with
current designs.

------
luckydude
Cool. I still remember arguing for 100Mbit ethernet at Sun and being told that
it couldn't be done, use FDDI.

Personally, I'd love ethernet to be used as the interconnect for more stuff.
USB is kinda weird to me; I'd much rather have my disk drives be PoE devices.

~~~
voltagex_
iSCSI + PoE. Sign me up please.

~~~
luckydude
Yeah, something like that. I was arguing for that when iSCSI was pretty new so
I went with a remote DMA model that was somewhat simpler.

I had a whole startup sketched out, it's pretty obvious when you think about
it.

Desktops get a software only version that will do reasonably well and is
cheap.

Power users/small servers get a slightly more expensive card (maybe 2x the
cost of a PCI card) that does the remote DMA in hardware; this card has one
ethernet port and only handles one DMA at a time.

Big servers get a multi port card that will run multiple streams at once.

I tried to get the disk drive people interested because there is already a
general purpose processor down on the drives (there are two, actually: one
that controls the disk arm, which is special, and one that manages the cache
and read-ahead/write-behind, which could run a stripped-down Linux or
whatever and do the other side of the protocol).

The drive people never got excited because the cost savings of el cheapo
ethernet wasn't enough to offset the cost of the extra memory they felt they
would need.

------
ken
In 2005, Jeff Atwood reported that 100M ethernet is 10x as fast as 10M
ethernet, but 1G ethernet is only about 30% faster than 100M, even for a
simple loopback test.

[http://www.codinghorror.com/blog/2005/07/gigabit-ethernet-
an...](http://www.codinghorror.com/blog/2005/07/gigabit-ethernet-and-back-of-
the-envelope-calculations.html)

Is this still true? What's the bottleneck?

~~~
zaphoyd
Assuming you aren't CPU or memory limited, the bottleneck was originally
largely due to gigabit Ethernet's support for unswitched hubs, which requires
CSMA/CD to detect collisions. For this to work there is a minimum transmit
timeslot, ensuring that collisions can be detected within the full round-trip
delay of a ~100m network.

Effectively, a 1Gbit Ethernet node must occupy the wire for at least 512
bytes during its timeslot. If it has that much to send, it all goes out at
1Gbit. If it only has a few bytes to send (a TCP ACK, for example), it must
idle for a bit to make sure there wasn't a collision.

In theory gigabit Ethernet should perform at 1Gbit for sustained transfers
but much worse for workloads with sporadic bursts of smaller packets. In
practice, especially on core switches and routers, modern techniques like
frame bursting (sending multiple small packets bound for the same destination
back to back) even things out.

10Gbit Ethernet and above drop carrier-sense collision detection altogether
and require the entire network to be point-to-point with switches, so the
minimum timeslot goes away (only the ordinary 64-byte minimum frame remains).
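
To put that 512-byte minimum timeslot in numbers - a back-of-the-envelope
sketch that ignores preamble and inter-frame gap overhead:

```python
# Worst-case wire efficiency of half-duplex gigabit Ethernet's
# 512-byte slot time: a minimum 64-byte frame must still occupy a
# full 512-byte timeslot (carrier extension), so the wire is mostly
# idle. Preamble and inter-frame gap are ignored for simplicity.
SLOT_BYTES = 512

def slot_efficiency(frame_bytes: int) -> float:
    """Fraction of the occupied timeslot carrying actual frame data."""
    occupied = max(frame_bytes, SLOT_BYTES)
    return frame_bytes / occupied

print(f"64B frame:  {slot_efficiency(64):.1%}")    # 12.5%
print(f"512B frame: {slot_efficiency(512):.1%}")   # 100.0%
```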

------
ajays
The sad part is that on the consumer side, we are still paying what we were
paying in 2005 for roughly the same amount of bandwidth. If consumer-side
speeds had kept a similar pace, either the cost should have been halving
every 18 months (modulo a certain fixed infrastructure cost), or the speed
should have been doubling every 18 months at the same price point.

Google Fiber is a great first step, but I think the only long-term viable
solution is to have Municipal Fiber (in dense cities like SF and NY). There is
no reason why we (I live in SF) should not have access to the same speeds as
folks in Seoul and Tokyo, at similar prices.

There's a nascent effort in SF to bring Municipal Fiber to the City (I am just
a mailing list subscriber): <http://sffiber.info/> It could sure use some
love.

~~~
ghshephard
In the Bay Area we are finally starting to see a bit of competition at the
$70/month level: Comcast will deliver you 30Mbit/s, or AT&T (if they've
hooked up your neighborhood yet) will sell you 24Mbit/s.

I was paying $150/month for 2x64k ISDN in 1998/1999. So my price-performance
went from 0.85 kbit/$ to ~429 kbit/$ - about a 500x increase in 13 years,
which is a bit better than the ~400x you'd expect from doubling every 18
months over that span.

Things have plateaued recently, but I expect 1Gbit to the home to be pretty
standard in the Bay Area by 2020 for $70 - a 33x improvement, which is
completely in line with 18-month price-performance doubling. It's just a
little bursty.
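
Running the numbers (my arithmetic, from the stated $150 for 2x64k ISDN and
$70 for 30Mbit cable):

```python
# Re-running the price/performance arithmetic from the figures above.
isdn_kbit_per_dollar  = 128 / 150        # 2x64k ISDN at $150/month
cable_kbit_per_dollar = 30_000 / 70      # 30 Mbit at $70/month
ratio = cable_kbit_per_dollar / isdn_kbit_per_dollar

years = 13
expected = 2 ** (years / 1.5)            # doubling every 18 months
print(f"{isdn_kbit_per_dollar:.2f} kbit/$ -> {cable_kbit_per_dollar:.0f} kbit/$")
print(f"{ratio:.0f}x actual vs {expected:.0f}x expected")
```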

------
ender7
Can anyone explain how these speeds, if possible, are even useful? For
example, the bandwidth of the L2 cache on a Core i7 is about 70GB/s, which
doesn't come close to 1Tbit Ethernet's theoretical 125GB/s.

~~~
noselasd
Imagine you have a rack of servers. Around 40 of them fit in a rack; you
equip each of them with 10Gb ethernet, and they're running HDFS. You plug
these 40 ethernet cables into one switch so they can all talk to each other,
serving files to clients, shuffling segments around, etc. Now you want to
connect that rack to another rack of 40 servers on another floor of your
building. Do that with a single 10Gb cable and you do not have full bandwidth
between all your 80 servers. Even bonding a handful of 40Gb links will not be
enough - and it'll be a lot of cables to manage. Do it with one 1Tb link and
you start having full bandwidth. Or just imagine connecting 2 buildings.
Fewer cables. Higher bandwidth.

The key is that you don't feed all this into 1 server running a commodity CPU
and OS; you feed it into blinking boxes that distribute the data to/from a
high-speed link onto many lower-speed links to reach ordinary servers.
Switches/routers can do that at line rate if you implement it properly in
hardware.
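
The oversubscription math for that example (assumed figures: 40 servers per
rack, each with a 10Gb NIC):

```python
# Oversubscription ratios for the rack-to-rack example above.
servers_per_rack = 40
nic_gbps = 10
aggregate_gbps = servers_per_rack * nic_gbps   # 400 Gbps of in-rack demand

def oversubscription(uplink_gbps: float) -> float:
    """How many times the rack's demand exceeds the uplink capacity."""
    return aggregate_gbps / uplink_gbps

print(oversubscription(10))       # single 10Gb uplink -> 40.0x
print(oversubscription(4 * 40))   # four bonded 40Gb links -> 2.5x
print(oversubscription(1000))     # single 1Tb uplink -> 0.4x: full bandwidth
```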

~~~
mbubb
I can imagine it as I am building out a 30 node hdfs cluster with 10g BNT
switches. How do you design 40 nodes to a rack? Just curious...

~~~
noselasd
Racks come in all heights; we have three 42U racks here populated with 1U
boxes. There are also many designs around smaller half-depth servers,
enabling you to fill a rack from both the front and the back.

------
timc3
About time really.

------
guscost
Tsunamiii!!

