
The term “non-blocking” may well originate with circuit switching, but in the Ethernet world, switches that can sustain full line rate between any combination of inputs and outputs have been described as “non-blocking” for a long time. (I wouldn’t call myself an expert on switching, but I learned IOS before there was a thing called iOS, and I think this usage predates me by a decent amount.)

> With ethernet you put a frame on the wire and hope.

This is not really true. With Ethernet, applications and network stacks (the usual kind — see below) ought to do their best to control congestion, and, subject to congestion control, they put frames on the wire and hope. But network operators read specs and choose and configure hardware to achieve a given level of performance, and they expect their hardware to perform as specified.

But increasingly you can get guaranteed performance on Ethernet even outside a fully non-blocking context, and in some cases guarantees that go beyond merely “non-blocking”. You are fairly likely to have been on an airplane with controls over Ethernet. Well, at least something with a strong resemblance to Ethernet:

https://en.m.wikipedia.org/wiki/Avionics_Full-Duplex_Switche...

There are increasing efforts to operate safety critical industrial systems over Ethernet. I recall seeing a system to allow electrical stations to reliably open relays controlled over Ethernet. Those frames are not at all fire-and-hope — unless there is an actual failure, they arrive, and the networks are carefully arranged so that they will still arrive even if any single piece of hardware along the way fails completely.
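I don't know exactly which mechanism that system used, but a common way to get that property in substation automation (e.g. IEC 62439-3 PRP) is to transmit every frame on two independent LANs and let the receiver drop the duplicate. A rough sketch of that duplicate-discard idea, with hypothetical send_on_lan_a/send_on_lan_b callables standing in for the real transmit paths:

    # Sketch of PRP-style redundancy: send each frame on two independent LANs,
    # deliver the first copy that arrives, drop the duplicate.
    # send_on_lan_a / send_on_lan_b and the 2-byte sequence prefix are placeholders.

    class RedundantSender:
        def __init__(self, send_on_lan_a, send_on_lan_b):
            self.send_a = send_on_lan_a
            self.send_b = send_on_lan_b
            self.seq = 0

        def send(self, payload: bytes) -> None:
            frame = self.seq.to_bytes(2, "big") + payload
            self.seq = (self.seq + 1) % 65536
            self.send_a(frame)   # both copies go out; losing either LAN
            self.send_b(frame)   # (or any switch on it) loses nothing

    class RedundantReceiver:
        def __init__(self):
            self.seen = set()    # a real implementation bounds this window

        def receive(self, frame: bytes):
            seq = int.from_bytes(frame[:2], "big")
            if seq in self.seen:
                return None      # duplicate from the other LAN
            self.seen.add(seq)
            return frame[2:]     # first copy wins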

Here’s a rather less safety critical example of better-than-transmit-and-hope performance over genuine Ethernet:

https://en.m.wikipedia.org/wiki/Audio_Video_Bridging

(Although I find AVB rather bizarre. Unless you need extremely tight latency control, Dirac seems just fine, and Dirac doesn’t need any of the fancy switch features that AVB wants. Audio has both low bandwidth and quite loose latency requirements compared to the speed of modern networks.)
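To put rough numbers on that last claim (assuming uncompressed stereo PCM; the figures are purely illustrative):

    # Uncompressed stereo PCM vs. a gigabit link.
    sample_rate_hz  = 48_000
    bits_per_sample = 24
    channels        = 2

    audio_bps   = sample_rate_hz * bits_per_sample * channels  # 2,304,000 bit/s
    gigabit_bps = 1_000_000_000

    print(audio_bps / 1e6)           # ~2.3 Mbit/s
    print(audio_bps / gigabit_bps)   # ~0.0023, i.e. about 0.2% of one gigabit port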




> With Ethernet, applications and network stacks (the usual kind — see below) ought to do their best to control congestion

Exactly. Network endpoints infer things about how to behave optimally. Then they put their frame on the wire and hope. The things that make it possible to use those networks at high load ratios live in the smart endpoints: pacing, entropy, flow rate control, etc. It has nothing at all to do with the network itself. The network is not gold plated; it's very basic.
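To make the "smart endpoints" point concrete, here is a minimal token-bucket pacer of the sort an endpoint might use to smooth its transmit rate. This is a generic sketch, not any particular stack's implementation; nic_send is a hypothetical transmit call:

    import time

    class TokenBucketPacer:
        """Endpoint-side pacing: release bytes at a target rate instead of bursting."""

        def __init__(self, rate_bytes_per_sec: float, burst_bytes: float):
            self.rate = rate_bytes_per_sec
            self.burst = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def wait_to_send(self, frame_len: int) -> None:
            # Refill tokens for the elapsed time, then block until the frame fits.
            while True:
                now = time.monotonic()
                self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
                self.last = now
                if self.tokens >= frame_len:
                    self.tokens -= frame_len
                    return
                time.sleep((frame_len - self.tokens) / self.rate)

    # Pace a flow at ~50% of a 10Gbps link, allowing a burst of ~64 MTU-sized frames.
    pacer = TokenBucketPacer(rate_bytes_per_sec=0.5 * 10e9 / 8, burst_bytes=64 * 1500)
    # pacer.wait_to_send(1500); nic_send(frame)   # nic_send is hypothetical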


> high load ratios

A ratio has a numerator and a denominator. I can run some fancy software and get (data rate / nominal network bandwidth) to be some value. But the denominator is a function of the network. I can get a few tens of 10Gbps links all connected together with a non-gold-plated nonblocking switch that’s fairly inexpensive (and even cheaper in the secondary market!), and each node can get, say, 50% of 10Gbps out as long as no node gets more than, say, 50% of 10Gbps in. That’s the load ratio.
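Spelled out with illustrative numbers:

    # Load ratio on a non-blocking switch: achieved rate / nominal link rate.
    link_gbps     = 10
    achieved_gbps = 5            # each node sends ~50% of line rate

    load_ratio = achieved_gbps / link_gbps
    print(load_ratio)            # 0.5 -- sustainable as long as no node is
                                 # offered more than its own 10Gbps port can drain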

Or I can build a non-gold-plated network where each rack is like this and each rack has a 100Gbps uplink to a rather more expensive big switch for the whole pile of racks. That works until I run out of ports on the big switch, as long as each rack doesn't try to exceed the load ratio times 100Gbps in aggregate. Maybe this is okay for some use case, but maybe not. Netflix would certainly not be happy with this for their content streaming nodes.
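The same arithmetic for the oversubscribed rack, with a hypothetical port count:

    # A rack of 10Gbps nodes behind a single 100Gbps uplink.
    nodes_per_rack = 40          # hypothetical
    node_link_gbps = 10
    uplink_gbps    = 100

    oversubscription = nodes_per_rack * node_link_gbps / uplink_gbps
    print(oversubscription)      # 4.0

    # If most traffic leaves the rack, the fair share per node is:
    print(uplink_gbps / nodes_per_rack)   # 2.5 Gbps, a load ratio of 0.25 on its own link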

But this all starts to break down a bit with really large networks, because no one makes switches with enough ports. So you can build a gold-plated network that actually gets each node its full 10Gbps, or you can choose not to. Regardless, this isn't cheap in absolute terms at AWS scale. (But amortized per node, it may well be cheap even at AWS scale.)

And my point is that egress does not add meaningful cost here. A 10Gbps egress link costs exactly as much, in internal network terms, as a compute or object storage node with a 10Gbps link. For an egress node, the total is whatever 10Gbps of transit or peering costs, plus the amortized internal network cost; for a compute node, it's the cost of the node, space, power, cooling, maintenance, etc., plus that same internal network link.
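As a back-of-the-envelope sketch of that comparison (every number below is a made-up placeholder, not a real quote):

    # Hypothetical monthly cost of one 10Gbps port, egress node vs. compute node.
    internal_fabric_per_port = 200    # amortized share of the internal network ($/mo, made up)
    transit_per_10g          = 1000   # 10Gbps of transit/peering ($/mo, made up)
    compute_node             = 1500   # server + space + power + cooling ($/mo, made up)

    egress_port_total  = transit_per_10g + internal_fabric_per_port
    compute_port_total = compute_node + internal_fabric_per_port

    # The internal-network term is identical in both; only the part outside the
    # cloud's own network differs.
    print(egress_port_total, compute_port_total)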

So I think there is no legitimate justification for charging drastically more for egress than for internal bandwidth, especially if the cost is blamed on the network cost within the cloud. And the costs of actual egress in a non-hyperscaler facility aren’t particularly secret, and they are vastly less than what the major clouds charge.


I think one thing that's different about GCP is that Google itself is such an overwhelming tenant. If you read their SDN papers, their long-distance networks are operated near 100% nameplate capacity. For Google, I don't think squeezing customer flows is free.



