Hacker News
Intel wants to kill the traditional server rack with 100Gbps links (arstechnica.com)
101 points by orrsella on Apr 10, 2013 | 56 comments

They have the parts to do this:

- High-speed copper physical layers are cheap - Thunderbolt today has 40Gbps PHYs, and Infiniband and 10GbE are both extremely inexpensive compared to a few years ago.

- Their chips are getting smaller and more power efficient, and the Atom finally has an ECC part.

- As clock speed is no longer hockey-sticking upward, people are transitioning in droves to multi-system parallel task models, which favor lots of medium-speed, efficient cores.

In short, in the near future general purpose computing is going to look more like HPC clusters, because the software side of things has, in most cases, caught up.

Latency is a significant issue with this approach. An SSD doing 100,000 IOPS that's 3 feet from the CPU is noticeably faster than one that's 100 feet from the CPU.

HPC tends to solve this with Infiniband, which is optimized for very low latency.

I'd assume that Intel's long term trajectory is to have CPU/cache/memory/flash all on one system-on-a-chip "module" that is the smallest replaceable subunit. Larger dedicated storage is delivered over the network interconnect in a tiered architecture.

Since the speed-of-light delay for 100 feet is 100 ns, and SSD latencies are on the order of 10-100 us, this shouldn't be a fundamental issue. Maybe it's caused by the latency to the disk cache (DRAM)? In which case perhaps you could separate that from the disk, and move it closer to the CPU.

At 2 GHz, the speed-of-light-in-a-vacuum delay is one clock cycle every 15 cm. Current fiber and copper transmission runs at about 70% of c, so that's a clock cycle every 10.5 cm.

A few clock cycles might not matter for bulk storage, but Intel is also talking about separating main memory from individual processors, where individual clock cycles do matter. Witness the rise of low-latency premium RAM.

We are already doing that, in many steps.

100 feet / speed of light ≈ 101 nanoseconds; round trip, that's 202 nanoseconds. But signals in copper travel at ~66% of the speed of light, so it's really ~300 nanoseconds, or 0.3 us. (Fiber is also about a third slower than light in a vacuum, both because it's not a vacuum and because the path is not straight.)

Now, 10 us vs 10.3 us might not sound like much, but it's still 3% slower. And it gets worse when you look at high-end DRAM-based SSDs, which can be faster than 10 us.
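A quick sanity check of the arithmetic above, as a sketch (assuming a ~66% velocity factor and the thread's optimistic 10 us SSD latency; the function name is just for illustration):

```python
# Round-trip cable delay vs. SSD access latency, using the numbers
# discussed above (66% of c in copper, 10 us SSD access time).
C = 299_792_458        # speed of light in a vacuum, m/s
FEET_TO_M = 0.3048

def round_trip_us(distance_feet, velocity_factor=0.66):
    """Round-trip signal delay in microseconds over a cable run."""
    one_way_s = (distance_feet * FEET_TO_M) / (C * velocity_factor)
    return 2 * one_way_s * 1e6

extra = round_trip_us(100)          # ~0.31 us for a 100-foot run
ssd_latency_us = 10.0               # optimistic SSD access time
slowdown = extra / ssd_latency_us   # ~3% added latency
```

This matches the 0.3 us / ~3% figures in the comment: negligible for a slow disk, but not for DRAM-class latencies.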

"but electricity is ~66% of speed of light"

According to Wikipedia, it depends on the insulation:

== SNIP ==

Propagation speed is affected by insulation, so that in an unshielded copper conductor it ranges from 95 to 97% of the speed of light, while in a typical coaxial cable it is about 66% of the speed of light.

== SNIP ==

For LMR-400 (a very common cable for ham radio), the velocity factor is about 85%, according to http://www.febo.com/reference/cable_data.html

The net-net is that there are opportunities to get closer to the speed of light if latency is really, really important.

At these data rates, copper links have to be treated as transmission lines. This is why CAT-5e has stricter requirements on the twisted pairs than CAT-3 (or plain old phone line).

66% turns out to be a surprisingly consistent approximation for both copper and fiber.

In a world of commoditized storage, isn't latency always the issue (rather than capacity)? In a traditional setup it's surely harder to tune the latency requirements for a particular app.

I know nothing about hardware, but I'm curious: isn't a whole lot of the speed we're getting out of systems these days a ramification of putting everything on the same piece of silicon? If each component is physically separated, isn't that going to impact latency?

Think of the cache pyramid.[0] Latency may be higher, but if there is enough work / data then it's still efficient to do work remotely / put data farther away.

From what I understand, the article is about two things:

1. Hot swappable components.

Oversimplified, but imagine being able to add / remove CPU and RAM to your server like you can with disk space and USB thumb drives.

2. Shareable resources / load balancing.

If one server is using all its CPU and another is using all its RAM, they can now use each other's resources.

You know how AWS is on-demand, scalable computing? This is the same thing but at a hardware level with CPU, RAM, disk, and network resources.

[0]: http://static.ddmcdn.com/gif/computer-memory-pyramid.gif

Memory (except on-chip caches), disks, and various other peripherals aren't on the same chip in traditional servers.

They're basically taking the backplane of a motherboard and spreading it across a whole rack instead of keeping it inside a single server. Memory especially would take a latency hit if all you did was extend the "cable" between the CPU and memory, which is why they also need a much faster interconnect.

No, in recent years speed gains have come from adding more processors. I don't have the chart on hand, but a couple of years ago the growth rate of single-chip speed dropped off sharply.

Even without crazy fabric setups, how have we been stagnating at 1GbE for 10+ years now? Shouldn't 10GbE, with a normal ports-and-switches setup, become normal someday soon?

The standard thing to do now is 2-4GE per server, link aggregated (possibly split across two switches for redundancy), using multiple 48-port GE switches with multiple 10GE uplinks, and then usually something like the Juniper $10-20k 40-port 10GE switches. Some specific servers like SANs go direct 10GE, particularly if you're going to have 1-2 SAN interfaces and a bunch of clients (so you can use the 10GE uplink ports for it).

Enough things are CPU/memory/etc. bound that the "cheap" building block of a 1-2CPU server with 4GE, RAM, and some local disk is still more appropriate than buying a Xeon E7 with 10GE HBAs, in most cases. There might be an exception if you have per-host vs. per-core licensing for expensive stuff, or other artificial constraints, or a specific component (database?) which doesn't horizontally scale.

I predict $5k 40-port 10GE switches and commodity server on-board 10GE NICs in a couple years, though. Although at that point, you need something crazy to uplink the switches. 40GE is emerging, or you could use a non-ethernet option. SANs are the big application for 10GE now since you can comfortably fit all the clients and servers on a standard 10GE switch and don't need to uplink most of the traffic. Part of the issue with the higher speed ethernets is lack of a copper cabling option, particularly one which works with existing cable plants. (less of a concern within racks).
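As a back-of-the-envelope sketch of the topology described above (48-port GE switches with 10GE uplinks; the function name and the 4-uplink count are illustrative, not from the comment):

```python
# Oversubscription ratio: possible edge demand vs. available uplink
# capacity, for a leaf switch in the topology described above.
def oversubscription(edge_ports, edge_gbps, uplink_ports, uplink_gbps):
    """Ratio of worst-case edge traffic to uplink capacity."""
    return (edge_ports * edge_gbps) / (uplink_ports * uplink_gbps)

# 48-port GE switch with 4 x 10GE uplinks:
ratio = oversubscription(48, 1, 4, 10)   # 1.2 : 1
```

A ratio near 1:1 is why this design works in practice: the uplinks only become the bottleneck if nearly every edge port bursts at line rate at once.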

I think mainly there's not the demand for it - you'd need a fast SSD to saturate 1GbE, and nothing remotely consumer-level can saturate 10GbE yet.

> you'd need a fast SSD to saturate 1GbE

Or pretty much any decently spec'ed new RAID array. Most of our "recent" arrays are in the 300MB/sec to 500MB/sec range and were not particularly expensive.

But you're probably right - while it'd be nice to be able to hang those arrays off iSCSI or export network filesystems that can reach those kind of speeds, it's rare-ish to need it. Saturating GbE in a way that doesn't make it just as easy to "just" bond a couple of interfaces together or spread your IO load over a couple extra servers is a pretty special case still.

I'd love to be able to justify 10GbE in our network, but I can't until it costs almost as little as GbE.

> and nothing remotely consumer-level can saturate 10GbE yet.

Not much consumer level even saturates Fast Ethernet. Especially given how much consumer equipment today still hangs off 54Mbps or below wifi... I don't think that's really an argument - most 1GbE equipment likely still goes to corporate networks.

1Gbps (125MB/sec) is pretty easy from RAID (or even really good drives), and obviously trivial out of RAM.

Standard modern HDD serves up ~150MB/sec, SATA 6Gbps SSD over 500MB/sec for large sequential I/Os. Both easily saturate 1GbE.
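A minimal sketch of the comparison above, using the round numbers from the thread (line rates only; real-world throughput is lower after protocol overhead):

```python
# Drive throughput vs. Ethernet line rate, numbers from the thread.
GBE_MB_S     = 1000 / 8     # 1GbE line rate ~ 125 MB/s
TEN_GBE_MB_S = 10000 / 8    # 10GbE line rate ~ 1250 MB/s

hdd_seq_mb_s  = 150         # modern HDD, large sequential I/O
sata_ssd_mb_s = 500         # SATA 6Gbps SSD, large sequential I/O

hdd_saturates_gbe = hdd_seq_mb_s > GBE_MB_S       # True
ssd_saturates_10g = sata_ssd_mb_s > TEN_GBE_MB_S  # False
```

Which is the point being made: even one commodity drive outruns 1GbE, while even a fast SATA SSD is nowhere near filling 10GbE on its own.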

At home, I'm running an 8-port SATA RAID controller (but with ZFS, so not using the RAID firmware). I'd say it could saturate 10GbE. But it's true I'm not the average consumer.

You could have at least /glanced/ at current hardware before saying this. The cheap SSDs on the market now can saturate four times 1GbE. There is definitely demand for SSDs, and they're getting to the point where it is not expensive to just have a pretty big SSD and no spinning disk at all. So to me it is surprising there is no inexpensive 10GbE option. People always like it when they don't have to see a loading bar, and even gigabit ethernet is slower than HDD to HDD.

Who says it has to come from hard drives? Look at things like redis or memcached that are serving out of RAM.

But (ordinary) people don't run redis on their home network. 10Gb Ethernet is very common on servers in datacenters today, but it hasn't yet caught on to the consumer market.

Many of the currently available SATA hard drives will saturate 1GbE.

A single HDD can almost saturate 1GbE with sequential reads from the disk with no caching. What if you have 2 HDDs, or are serving data that's cached?

SATA (a single-disk link technology for consumer gear) went from 3 to 6 Gbps years ago, before SSDs went mainstream.

10Gbit is very commonly used in SANs now.

One of the reasons is stagnating internet connection speeds. In 13 years my net connection speed has only slightly more than doubled (10 to 24 Mbps). We should be getting gigabit internet connections and running 10 Gbit home/office networks.

A 4K HD broadcast is 40 Mbit/s. There are just very few use cases where you need anything faster than 100 Mbit, much less 1 Gbit.

LAN file transfers at 100 Mbit/s are very slow, even by today's standards.

I always hate getting big files from our NAS at 7-8 MB/s when a proper gigabit device can achieve 10x that. And that isn't even shared; if 2 or more people try to get files from the thing, you can basically forget it.

About three months ago, we had an IT maintenance event over the weekend where our phones were replaced with VoIP systems. When I came in on Monday morning, I had a brand new Polycom CX-600 on my desk, which was great - but all of a sudden, Visio, VMware, and all of the Apps that were working out of my home directory - were really, really sluggish. Not unusable, but really long lags every time I tried to open a file.

Turns out the VoIP phones were now acting as switches for all the Desktops, and my Desktop had been plugged into the phone, which, you guessed it - only had a 100 Mbit interface.

Moving my desktop back to my (still lit) Gigabit port returned me to my regular speedy connection.

Gigabit makes a big difference. I'd hate to throw around multi-100 megabyte files on 100 megabit network connections.
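To put numbers on the difference, here's a sketch for a hypothetical 500 MB file (link rates only, ignoring protocol overhead; the file size is mine, not from the comment):

```python
# Transfer time for a large file at different link rates.
def transfer_seconds(size_mb, link_mbit):
    """Seconds to move size_mb megabytes over a link_mbit Mbit/s link."""
    return size_mb * 8 / link_mbit

fast_ethernet = transfer_seconds(500, 100)    # 40 s at 100 Mbit/s
gigabit       = transfer_seconds(500, 1000)   # 4 s at 1 Gbit/s
```

The 10x gap is exactly the 7-8 MB/s vs. ~80 MB/s difference described a few comments up.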

This sure looks familiar. Intel wants to build mainframes, basically.

I don't see anything wrong with this. The PC (including servers) world went down a path of high physical integration- the bus, the CPU, the memory, and the peripherals are all one unit that couldn't really be carved up.

Now people are realizing they can build a server from a bunch of parts connected by fast fabric, and you can pull/place items in the fabric and use them immediately, or take them apart. The Multics machine was actually carved into two pieces live, every night, to run two instances, then re-merged in the morning(!)

Hardware virtualization - and component aggregation - were always great ideas, and now the technology exists to deploy it at the middle-tier commodity server level.

I came in to say this and didn't want to get all the heat. Thanks for stepping up. :)

Turns out it was a popular sentiment! :)

To be clear, I like mainframes. They make certain problems much easier. I worked in the company that maintains the world's largest DB2 installation and it was amazing and terrifying.

Isn't this sort of like turning an entire rack into just a big blade server enclosure?

Yes, actually.

Instead of a dumb rack -- four posts and screw holes in the right places -- you get a blade chassis 72U high. Centralized but redundant power supplies with integrated monitoring and per-feed software control. A built-in KVM. Large slow(er) fans that push more air more efficiently and more quietly, or perhaps a liquid cooling system that provides a clean, standardized disconnect for each component. Assignable resources -- instead of running virtual machines, you run re-configurable real machines, where you start by selecting a number of processors with associated RAM and add in storage.

On the one hand, this will have less in common with consumer hardware, so economies of scale will not be shared. On the other hand, server hardware is already substantially different from high-end consumer hardware, so it's probably not that big a deal.

Heh - I was working with a MIPS processor designer on this exact model of device in 1999 - we basically just sketched out the idea and talked about what it would take to have a fabric rack with a standard set of interconnects so that various vendors could build cards to go into it.

We discussed the challenges of various signaling that would prevent companies from being willing to participate. But this idea is really old.

Is it that different, though? Form factor of the machine/fans is different for a server and the Xeon chips have multiprocessor and ECC support. That's about it, right?

Sounds like Intel is validating AMD's acquisition of SeaMicro and their "Fabric Compute Systems".


Yep, tempted to call it a SeaMicro clone (if they build it some day).

This doesn't sit so well on a front page with a "fusion drive to Mars" link. This story has almost as many ifs and whens about it - developing new silicon photonics, a reference architecture in 2014. It sounds more like an Intel territorial fight than an actual product.

Probably a nice idea though

Whenever a high-profile company gets an article in a prominent publication touting a "new" feature/hardware/system that really is already in use in the industry for years, then you know that feature/hardware/system has arrived. What they announced is just a different spin on pre-existing technology.

Just like the way Cisco et al. are co-opting SDN: http://www.lightreading.com/blog/software-defined-networking...

Is it just me, or does this seem very much like mainframe architecture from 30 years ago? Not that a 30-year-old design is bad; more that all trends/technologies run in cycles.

Seems like there are two architectures that are commonly adopted. One is to make the server ever larger and more modular, e.g. separating processors/memory and storage into separate entities that are individually configurable (like this article). The other is to make the server smaller and more parallel, e.g. thousands of single-socket/single-HDD servers running in aggregate.

I see this flying about as well as blades do. There are reasons to use blades, but a 42U stack of pizza boxes with a commodity interconnect (even 10Gbps) is almost always cheaper than a single blade center.

One of Intel's advantages is being a process node ahead of other foundries: 22nm vs 32nm.

This already exists. It already existed in the 90s, and was in fact very normal. Sun, DEC/Compaq, HP, IBM, etc. all sold systems like this. Why does Intel always pretend that doing something 20 years late is innovation?

That's true, but really only in a specious way. If you want to argue that sort of logic, then people were building machines "like this" back in the '60s, when the "rack" was a big card cage into which you could plug (sometimes) arbitrary amounts of application-specific CPU/memory/storage/IO resources.

But what's actually happening is that Intel is building a fabric interconnect which can serve data at DRAM-bandwidth-or-higher speeds. And that's certainly something that hasn't been done in the modern world. The headline might lead you to believe it's the architectural idea that's the new thing here, but it's really the pure technology that's the interesting bit.

>But what's actually happening is that Intel is building a fabric interconnect which can serve data at DRAM-bandwidth-or-higher speeds

Yes, that is exactly what I am talking about. It was perfectly normal to purchase systems in the 90s that worked that way. A cabinet powered by a single power node, and you plugged in CPU nodes, memory nodes, I/O nodes, etc.


Right. And yet people stopped doing that when 512 bit wide cache lines pulling at 30-60MHz became the norm. The interconnect just wasn't fast enough, and everything moved onto a single board and stayed there. For almost 20 years the best we've been able to do are things like Infiniband that are far slower than on-board DRAM.

Now (apparently) there's a new interconnect that can do the job. That's news, not "same old boring stuff".

> For almost 20 years the best we've been able to do are things like Infiniband that are far slower than on-board DRAM

The SGI shared-memory big iron machines (Origin & Altix) are more recent and can have memory-only nodes. The most recent NUMA Altix was launched in 2009, I'm not sure if the later machines managed to keep the Origin 2k era goal of having as much remote memory bandwidth as local.


What do you mean right? You said this is different, I pointed out that it is literally exactly the same. That is not agreement.

The rest of your post is simply factually incorrect. People did not stop doing that; such systems continued to exist through the 90s and 2000s, and they still exist right now. They became less common, but that had nothing to do with "the interconnect just wasn't fast enough". It was because Intel CPUs became the fastest available, and racks full of small Intel-based systems were (and are) massively cheaper and entirely sufficient for the vast majority of uses. The speed of interconnect fabric kept up just fine.

Sun currently builds systems like this at the high end of its M range (M8000, M9000). I'm sure that IBM and everyone else make them as well.

Not Sun anymore, sadly, but yeah, Oracle, IBM, and HP all still make systems like that.
