Google NetStack: IPv4 and IPv6 userland network stack in Go (github.com)
122 points by yarapavan 8 months ago | 44 comments

A search for "ledbat" in GitHub doesn't find any reference, which is a shame. Apple is the only one with ledbat support in the kernel[0] afaik, and it's functionality that's sorely needed -- it's ridiculous that there's no way to say "data on this connection is not urgent, only give it bandwidth as long as it doesn't interfere with other connections". Some programs, e.g. rsync, let you limit bandwidth - but that's wasteful; I want this 2TB rsync job to be downloading at full speed at all times when I'm not doing anything else.

[0] yes, this is userland, but it's for fuchsia, so "kernel functionality" as far as its users are concerned.

From my brief look into ledbat, it seems like it serves a similar role to BBR, which is in the Linux kernel now.

Either way, it looks like the congestion control is just New Reno.


Also, this isn't an official Google product, per the disclaimer; just some code that happens to be owned by Google. Where are you seeing the reference to Fuchsia?

edit: oh, I see. Fuchsia codebase has references to this netstack.

LEDBAT is similar but works on L7 instead of L4 and needs support on both ends. BBR is excellent. It's the first congestion control strategy to really focus on fairness rather than trying to send packets as fast as possible.

Combined with FQ-CoDel, BBR finally offers a way to load balance traffic on your NAT without allowing someone to monopolize bandwidth.

Wow, this is awesome. It bothers me that one computer on a LAN making a big download takes over from the other devices, causing high latency and connection disruptions. I mean, we are talking about connectivity; it should not happen.

Yup, it's because for the longest time TCP was as aggressive as possible to get the most bandwidth.

Over the last 10 years network buffers have become gigantic, and sending packets as fast as you can causes standing queues that can be a dozen seconds long. This is where the lag comes from when someone on your LAN is uploading/downloading.

The lag actually gets worse with higher connection speed because the devices tend to have even bigger buffers.

BBR and to some extent Vegas are smarter about going about things. They attempt to maintain constant latency instead of maximum bandwidth. For a few percent less connection speed, all the buffers stay empty and delay stays minimal.
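A back-of-the-envelope sketch of where the standing-queue delay comes from: a full buffer drains at the link rate, so the added delay is roughly buffer size divided by link rate. The buffer size and uplink rate below are hypothetical numbers picked for illustration, not measurements of any real device.

```go
package main

import (
	"fmt"
	"time"
)

// queueDelay returns how long a full buffer of bufBytes takes to drain
// over a link that carries linkBitsPerSec bits per second.
func queueDelay(bufBytes, linkBitsPerSec int64) time.Duration {
	bits := bufBytes * 8
	return time.Duration(bits * int64(time.Second) / linkBitsPerSec)
}

func main() {
	// Hypothetical: a 1 MiB modem buffer in front of a 1 Mbit/s uplink.
	fmt.Println(queueDelay(1<<20, 1000000)) // 8.388608s
}
```

With those assumed numbers every packet behind the bulk transfer waits over eight seconds, which is the kind of lag described above.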

Experimenting with other congestion control algorithms is definitely on the TODO list. BBR is interesting.

Good to know. Though it is still through an undocumented option and considered experimental.

My understanding from a talk at Ignite 2017 is that certain OS functionality (like some Windows Updates) uses it, and that it would get more official support later... but there's nothing I can find yet, so as you say, for the developer it's still an undocumented 'experimental' feature. Hopefully this has changed in the latest build; I will have to look more.

https://view.officeapps.live.com/op/embed.aspx?src=https%3A%... slide 31

LEDBAT typically runs over UDP and doesn't need kernel support. Windows has something similar with the Background Intelligent Transfer Service (BITS), but LEDBAT works just fine outside the kernel.

But you can apply it to TCP as well (only at the kernel/raw socket level), which Apple and Microsoft have already done. That makes it simple for software to use; Apple uses it to download OS updates in the background.

Implementing it in the kernel also makes programs like “trickle” much easier.

Are you sure they didn't use UDP? It seems like the kernel flow control in TCP would interact badly with LEDBAT

How would LEDBAT help for downloads unless the server supports it? Even then, how does the client indicate to the server that "data on this connection is not urgent, only give it bandwidth as long as it doesn't interfere with other connections"?

If it's TCP, adjust the congestion control to stop sending acks and slow the connection when others have priority. Can't do it for anything like udp though.

> adjust the congestion control to stop sending acks

Huh? Congestion control is enforced on the sender side while ACKs are generated by the receiver.

Apple has a patent on throttling lower priority TCP flows by advertising a reduced receive window based on inter-arrival jitter. I feel this is a cleaner way than delaying ACKs.

Anyways, my point was that LEDBAT wouldn't help.

It's not implementable by local devices: your LAN is much faster than your WAN link, in most cases. Your router/modem does the packet dropping. But only your router/modem knows how much unused throughput is available at any moment, so rsync can't know how much it can use. Sure, rsync can look at TCP packet retransmits, but that will still cause other streams to experience an (assuming a lot of things) "equal" amount of drops too.

So, you get a lot of complexity for not a lot of benefit. However, Fuchsia is destined for mobile devices, which may not have the LAN-WAN bottleneck, so this could see some benefit.

The way ledbat works is by measuring the derivative of round trip time with respect to a throttled receive bandwidth. As long as RTT is constant or decreasing, keep increasing bandwidth; if RTT is increasing, decrease bandwidth.
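A minimal sketch of the rule as stated in this comment, not of the real protocol: LEDBAT as specified in RFC 6817 works from one-way delay measurements against a target queuing delay, while this toy only compares successive RTT samples. The type `rateController`, the additive step of 10000 bytes/s, and the halving on backoff are all assumptions made up for illustration.

```go
package main

import (
	"fmt"
	"time"
)

// rateController is a hypothetical delay-based throttle: it raises the
// allowed rate while RTT is flat or falling and cuts it when RTT rises.
type rateController struct {
	rate    float64       // current allowed rate, bytes per second
	minRate float64       // floor so the transfer never stalls completely
	maxRate float64       // ceiling on the allowed rate
	lastRTT time.Duration // previous RTT sample; zero means no sample yet
}

// update feeds one RTT sample and returns the new allowed rate.
func (c *rateController) update(rtt time.Duration) float64 {
	if c.lastRTT != 0 && rtt > c.lastRTT {
		c.rate /= 2 // RTT rising: a queue is building, back off
	} else {
		c.rate += 10000 // RTT flat or falling (or first sample): probe upward
	}
	if c.rate < c.minRate {
		c.rate = c.minRate
	}
	if c.rate > c.maxRate {
		c.rate = c.maxRate
	}
	c.lastRTT = rtt
	return c.rate
}

func main() {
	c := &rateController{rate: 100000, minRate: 10000, maxRate: 1000000}
	for _, ms := range []int{50, 50, 48, 70} {
		fmt.Println(c.update(time.Duration(ms) * time.Millisecond))
	}
}
```

A receiver could turn the returned rate into an advertised window, but how that mapping is done is left out here.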

The server has to give you an indication of round trip for that to work - although I suspect there’s some clever way to get it using some kind of URG or OOB or keep-alive even if the server doesn’t natively support TCP ledbat.

If you've negotiated tcp timestamps, and you assume the server will send more data immediately when it gets an ack, you have a convenient round trip time signal in the echoed timestamps. If you don't have tcp timestamps, you can advertise a reduced window and time the round trip between acks that open up the window and packets with sequence numbers in the window.
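A small sketch of the timestamp-echo idea: if the receiver stamps its ACK with a timestamp value and the server echoes that value back on its next segment, the difference between the local clock and the echoed value approximates the round trip. The 1 ms tick granularity is an assumption for this example (real stacks pick their own), and `rttTicks` is a made-up helper name.

```go
package main

import "fmt"

// rttTicks returns the ticks elapsed between sending a timestamp value and
// seeing it echoed back. Unsigned 32-bit subtraction wraps around, so the
// result stays correct when the timestamp clock rolls over.
func rttTicks(nowTicks, echoedTimestamp uint32) uint32 {
	return nowTicks - echoedTimestamp
}

func main() {
	fmt.Println(rttTicks(1042, 1000))    // 42
	fmt.Println(rttTicks(5, 4294967290)) // 11 (clock wrapped between send and echo)
}
```

This only yields a clean RTT under the assumption stated above, that the server sends more data as soon as the ACK arrives.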

A nice article about kernel vs. userland network stacks at Cloudflare: https://blog.cloudflare.com/why-we-use-the-linux-kernels-tcp...

As opposed to the fuchsia netstack written in Go by Google https://github.com/fuchsia-mirror/netstack

Why so many netstacks in Go? I would imagine even the tiny GC pauses Go has would be undesirable for a netstack.

Fuchsia's netstack is based off this netstack. Fuchsia's netstack is essentially a wrapper of this netstack with link endpoint implementations to access NICs via magenta.

GC systems programming languages have been a thing since Xerox PARC. The only part missing so far was a well-known company with the political willingness to push them into the mainstream at whatever cost, regardless of what the anti-GC crowd thinks.

TCP was designed a long time ago when computers were slow, so precise timing is not a hard requirement; also you can design a stack so it doesn't allocate in the fast path (GC or not, allocation is slow). That said, Go is still a weird choice as the runtime is not ideal for system applications.
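One common way to keep allocation off the per-packet path in Go is to reuse buffers through `sync.Pool`. This is only a sketch of that general technique, not a claim about how netstack itself manages memory; `handlePacket` and the 1500-byte `mtu` constant are made up for the example. The pool can still allocate when it is empty, for instance after a GC cycle clears it.

```go
package main

import (
	"fmt"
	"sync"
)

const mtu = 1500 // assumed maximum packet size for this sketch

// bufPool hands out reusable packet buffers so the per-packet path
// does not need a fresh slice each time.
var bufPool = sync.Pool{
	New: func() interface{} {
		b := make([]byte, mtu)
		return &b // store a pointer so Put does not allocate a slice header
	},
}

// handlePacket copies payload into a pooled buffer, where header parsing
// would happen, then returns the buffer to the pool.
// It reports how many bytes were handled (at most mtu).
func handlePacket(payload []byte) int {
	bp := bufPool.Get().(*[]byte)
	defer bufPool.Put(bp)
	n := copy(*bp, payload)
	// ... parse headers from (*bp)[:n] here ...
	return n
}

func main() {
	fmt.Println(handlePacket([]byte("hello"))) // 5
}
```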

I specifically said gc was not an issue but that Go's weird threaded runtime was more of a problem. You missed Mirage from your list, among others.

I specifically stated some examples. The purpose wasn't to list them all.

Mirage OS has the problem that it still relies on Xen, so listing it sometimes gives ammunition to those who read that dependency as a lack of support in OCaml for this kind of programming, rather than as a convenience.

Go's weird threaded runtime is nothing new. That is how Active Oberon tasks (aka Active Objects) are implemented.

I would guess that Go's tiny pauses would be less of a concern than its overhead compared to C, C++ and Rust, where there is more room for low-level optimizations not available in Go.

Not particularly important, I'm just curious as to how much the GC is really responsible for slowness as opposed to just correlating with languages that don't allow controlling heap vs stack allocations.

That paper was written by high school students; the only thing MIT about that paper is the hostname. It reads like a fun project but after skimming I would not consider it something authoritative on whether golang (or csp) is a better mechanism for writing a userspace network stack.

I'm somewhat sad they didn't include a few things:

- a box/whisker plot in the latency comparison graph — esp. if we're to talk about Go's GC...

- some discussion/arguments why the particular C implementation was chosen ("tapip") — I'm not an expert in this area so I don't have the slightest idea how notable it is;

- how did they detect/measure the claimed memory leaks in C? also, some statistics about the claimed crashes?

- it isn't clear to me whether they used some "well known" load testing tool or a homemade framework (e.g. "siege" is a tool I've read about more than once);

- more details on how exactly the "correctness was determined by testing against Linux kernel [implementation]".

That said, my initial loose conclusions from this seem to be:

- it appears it may be easier to write a correct implementation in Go (the implementation seems to be written just ad hoc by the article authors?) than in C;

- it appears it may be easier in Go than in C to write an implementation scaling well w.r.t. average latency & throughput, assuming the need is for a multi-threaded & user-space implementation.

This is the same network stack (with a little bit of drift).

We intend to merge them into one repository soon. I am just a little disorganized.

So I'm not an expert in kernel context switches... Would running a network stack in userland improve performance in any meaningful way for your average network service written in Go? Or is there some other purpose for this?

No, tcp is unlikely to be the bottleneck for a typical Go network server.

It's the same netstack used in Fuchsia OS

I wonder if one could abstract tcp on top of udp using this.

If by "abstract tcp" you mean implementing reliability and in-order delivery, then yes, you can. QUIC[0] is the most promising of such approaches.

[0]: https://en.wikipedia.org/wiki/QUIC
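To make the "in-order delivery" half concrete, here is a toy reorder buffer of the kind you would put on top of an unordered datagram transport. It is not how QUIC or this netstack does it; the `reorderer` type and its methods are hypothetical, and the reliability half (acks, retransmission, timeouts) is left out entirely.

```go
package main

import "fmt"

// reorderer buffers packets that arrive early and releases payloads
// strictly in sequence-number order.
type reorderer struct {
	next    uint32            // next sequence number to deliver
	pending map[uint32]string // packets that arrived ahead of next
}

func newReorderer() *reorderer {
	return &reorderer{pending: make(map[uint32]string)}
}

// receive accepts one packet and returns the payloads that are now
// deliverable in order. Already-delivered duplicates are ignored.
func (r *reorderer) receive(seq uint32, payload string) []string {
	if seq < r.next {
		return nil // old duplicate
	}
	r.pending[seq] = payload
	var out []string
	for {
		p, ok := r.pending[r.next]
		if !ok {
			break // gap: wait for the missing packet
		}
		out = append(out, p)
		delete(r.pending, r.next)
		r.next++
	}
	return out
}

func main() {
	r := newReorderer()
	fmt.Println(r.receive(1, "b")) // [] (held back until 0 arrives)
	fmt.Println(r.receive(0, "a")) // [a b]
	fmt.Println(r.receive(0, "a")) // [] (duplicate)
	fmt.Println(r.receive(2, "c")) // [c]
}
```

The sketch ignores sequence-number wraparound and puts no bound on how much it buffers, both of which a real implementation would need to handle.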

Not sure to what extent keeping up with the Joneses is a factor here (or indeed which party is the one trying to keep up), but Apple recently moved to a userspace/in-process TCP/IP stack in iOS 11 (and in the corresponding watchOS and tvOS versions).

Seeing that this is for fuchsia makes that seem relevant.

Yes! I've been waiting for a fast userland OSS network stack forever. There are things like DPDK for IP, but nobody has taken up the monumental task of getting TCP working and reliable.

f-stack, based on libuinet, works over DPDK. https://github.com/F-Stack/f-stack

Libuinet is the FreeBSD TCP stack in user space (including over netmap) https://github.com/pkelsey/libuinet

There is also work in FD.io on userspace TCP & UDP. https://github.com/FDio/tldk

Can someone explain what the use cases for this are?

a few things come to mind :

  - unikernel go
  - making better parallel abstractions over network streams without having to deal with the mess of a kernel interface
  - as a platform to implement more integrated network policy (like congestion control, or state management for ddos protection)
  - ultimately might be really helpful for portability by requiring only a raw packet interface from the host OS
  - much easier environment for tying in w/ SDN and QOS (which google seems to be pretty gung ho on)

there are probably a lot more. I looked a little at the repo, and they went for the most readable version that leans very heavily on the rich go runtime...which I think is a great call if you want to play around. messing with the linux network stack is a bit fussy and painful, there is a lot of .. stuff.. in there

How does Google use this internally? Is it for cloud?

This seems to have been started as a personal project by some Googlers. Now it is used in Google's upcoming operating system Fuchsia: https://github.com/fuchsia-mirror/netstack

No, mist.
