Hacker News
Let's code a TCP/IP stack, 1: Ethernet & ARP (2016) (saminiir.com)
287 points by maastaar on June 15, 2018 | 47 comments

I use SLiRP over SSH as a poor man's VPN. It's ancient. The bulk of the code is a userspace TCP stack that performs NAT. I can use any SSH host as a full VPN with only basic user privileges. I spent quite a bit of time poking around the code to try to improve performance. Increasing the receive window was sufficient to achieve >1 Mb/s over a high-latency link, but I can't go much higher unless latency is under 100 ms. It turns out that SLiRP never implemented window scaling; it was unnecessary for the links of that era. The code has been reused for virtualization applications like VirtualBox, but window scaling was never needed there either, since client-host latency is basically zero. Digging through old network code gives an appreciation for how far we've come.

Sounds like sshuttle! Low bandwidth, as you said, but very useful. I like to use it to VPN to my home, where the gateway router is an SSH server.

sshuttle is great too! Great for road-warrior applications where a basic SOCKS proxy won't do, but last time I tried it I found it unstable under heavy load, and the iptables routing to a user process can be somewhat restrictive. Still super cool though. I don't know why, but network tunneling is just fun.

SLiRP, that's something that was used back in the early '90s, with Trumpet Winsock and the one and only NCSA Mosaic browser.

Operating System Design: Internetworking with Xinu by Douglas Comer (1987) is a nice explanation, with lots of C code, of how to add a networking stack to an operating system, in this case his educational Unix-inspired Xinu.

I actually can't find the book you are referencing. I do see:

"Operating System Design: The Xinu Approach" and also the books "Internetworking with TCP /IPvolumes1-3." Might you have a link to the title you are referencing here? I have read the Internetworking series which is excellent.

This seems to be the book - https://www.amazon.com/Operating-System-Design-Vol-Internetw...

It looks like the second volume that goes with "Operating System Design".

I can't be 100% certain, as I didn't find a table of contents for either of the 1st edition books, but the 2nd edition of "Operating System Design" includes a section on implementing Ethernet, so perhaps the two volumes got combined into one for the second edition.

That's indeed the book I was referring to. Thank you for clarifying.

I see that now. Great, this looks interesting. Thanks for checking.

Highly recommend. The first TCP/IP stack I wrote was before this book, and I had to write it off of the RFCs.

But man the three volume Comer books made things so much easier.

For those who would rather learn how to implement it in a safer language, Fuchsia's TCP/IP stack is written in Go.


Isn’t garbage collection going to introduce jitter?

Not necessarily; there are several OSes written in GC-enabled systems programming languages.

Just because a language has a GC doesn't mean the GC is the only means to allocate memory.

Go also allows for global statics, stack allocation, and plain old C-style manual allocations. It is a matter of learning how to use them.

And make use of profilers as well.

For example, on performance critical paths always use a standard for loop, never a for range one.

GC will always introduce jitter.

You can choose not to use it and allocate statically etc, but that does not mean that use of a GC does not introduce jitter.

Also don't confuse high performance with latency-sensitive. Highly performance-optimised code will still have issues with GC jitter if you or someone else is causing it.

There is a reason why the GC-enabled OSs aren't used for anything in reality.

BTW: There should be no difference between for and for-range loops given an optimiser that's working.

> GC will always introduce jitter.

Hard real-time garbage collectors (including real-time reference counting approaches) can be used in such a way as to eliminate memory-management jitter.

Region allocation - a form of GC - can be used in such a way as to avoid memory-management jitter.

> GC-enabled OSs aren't used for anything in reality.

You might have a narrow definition of what you consider to be an OS. I would argue that things like the JVM, the Erlang BEAM, and even the browser are OS-like enough to qualify.

> GC will always introduce jitter.

> You can choose not to use it and allocate statically etc, but that does not mean that use of a GC does not introduce jitter.

It only introduces jitter if it runs at all.

If one really wants to be drastic in performance critical code, in most GC runtimes it is possible to just turn it off.

In Go's case, debug.SetGCPercent(-1) (from the runtime/debug package) will take care of that.

> There is a reason why the GC-enabled OSs aren't used for anything in reality.

And me thinking I had two running on my phones, go figure.

And the default is 100, which means the GC only runs once newly allocated memory doubles the live heap. I have written several servers that, after starting up (and uselessly running the GC once), never need to run it again.

Care to say which GC-enabled OS you have on your phone?

I'm not talking about GC-enabled runtimes such as a JVM or ObjC/Swift runtime, they're application-level.

Both Mach (iOS) and Linux (Android) are non-GC-enabled OSs.

> For example, on performance critical paths always use a standard for loop, never a for range one.

I am curious, why would one use a normal for loop instead of a range one?

A range one might introduce boxing depending on the types being iterated.

Or use a language with reified generics :)

Go maps are reified generics, and very efficient ones at that.

What you mean is that it's a set of C macros, except built into the compiler. Great if it's exactly what you need, and you're prepared for severe WTF moments (e.g. equality on map values is not the same as equality on variables of the same type containing the same thing). Also, expect the whole thing to be useless if you need slight complexity: you can use strings as keys, but not byte slices ([]byte), though a fixed-size array like [5]byte works.

Oh ... you actually need it to be a red-black tree, or a custom hash function ? Tough.

Not in general, though of course lots of production language runtimes do involve a certain amount of GC-related jitter. But even where such jitter is introduced, it might not be a very big deal for something like TCP/IP.

Not directly related to the article, but there is an experimental effort to develop a userland TCP stack in .NET right now.

It's interesting to see how this looks in a higher level language.


In addition to that, here are some other examples.

Go in Fuchsia

Oberon in Oberon (network stack but not TCP/IP though)

Active Oberon in A2 - BlueBottle OS (TCP/IP stack)

Mesa at Xerox PARC (Courier RPC, XNS)

Sing# on Singularity

Start at sourceCode\sourceCode\base\Libraries\System.Net\Sockets

MirageOS, written in OCaml, has a TCP/IP stack as well. The marshal and unmarshal code is very succinct compared to C implementations.

Thanks, I forgot about that one; it is even part of Docker for macOS.


Are there any blog posts about this? It looks like "Magma" is also something to do with Minecraft so it's really hard to find info about it.

Does anyone know why you would use an unsigned char array to store the MAC address instead of uint8_t? Char is one byte, but a byte is not guaranteed to be 8 bits, yet a MAC is defined as 48 bits.

Unless you are adding TCP to your CDC6600 or PDP-8, bytes are 8 bits. Any suggestion to the contrary in standards is fantasy.

That said, since the next field is a uint16_t, you may as well stay in Rome and call them uint8_t. short really is variable on some live architectures.

(Ok, or some DSPs only address words larger than 8 bits, but there's no reason for your compiler not to pick up the slack.)

Also, TIL that __attribute__((packed)) assumes the struct can be malevolently aligned and for some architectures generates very large and slow code to handle that. If you must pack, also add the ",aligned(2)" or whatever you can get away with to mitigate this.

I don't see any reason in modern C to use int/short/long over explicit int32_t/int8_t/int64_t other than they're shorter to type.

I tend to use int for loops (where I know there will always be less than 2^16 iterations!), return codes etc. purely because it allows the compiler to pick a fast word size everywhere (e.g. if somebody compiles your code for AVR, and your loops are all int32_t, your API returns int32_t, they're gonna have a worse time). Otherwise, I fully agree.

Maybe you could use int_fast16_t? (That means, pick the fastest signed integer type which is at least 16 bits.)

Of course, "int" is less typing than "int_fast16_t". But, int_fast16_t possibly makes it more obvious why you've chosen the type you did (i.e. you only need 16-bits, but aren't relying on any overflow so a bigger type can freely be substituted if that gives better performance.)

That's certainly a valid approach, but I've never actually seen those used in practice. I guess, like me, people are lazy and prefer typing just int, which is guaranteed to be at least 16 bits, and is generally the fastest int type (well, excepting some 8-bit archs.. :P).

Where can I read tips like that about modern C? Do you have a book to recommend or a blog to follow? Thank you.

I can't speak to the utility of either, but a couple of resources I've found recently:



I wish I did. I write C for my day job, and pick up most of this from my teammates.

I have another TIL for you... maybe you're a desktop/server developer... there are many embedded processor platforms where a byte is not 8 bits.

TI's C2000 microcontrollers - a byte is 16 bits. Other TI DSPs also have 16 bit bytes.

Did you mean char?

In C, char is defined to be 1 byte, so it's the same here. But they have 16-bit bytes. This is one of the rare cases where byte != octet.

Might I suggest reading https://www.ietf.org/rfc/rfc4042.txt? There they talk about bytes being 9 bits.

It might not be the author's intention, but I think there is now a push for MAC addresses to be 64 bits instead of 48.

Since the fixed-width integer types are optional, and if available must be 8, 16, 32 and 64 bits (and two's complement), I don't think CHAR_BIT > 8 is allowed if stdint.h types are supported, since sizeof(uint8_t) must be 1.

They're individually optional - so uint16_t can exist without uint8_t.

But you're right - if uint8_t exists, it must be the same as unsigned char.

In case you didn't realize it at first (I didn't until I happened to click on "/home"), this is the first of a 5-part series:

1. Ethernet & ARP (this post)

2. IPv4 & ICMPv4: http://www.saminiir.com/lets-code-tcp-ip-stack-2-ipv4-icmpv4...

3. TCP Basics & Handshake: http://www.saminiir.com/lets-code-tcp-ip-stack-3-tcp-handsha...

4. TCP Data Flow & Socket API: http://www.saminiir.com/lets-code-tcp-ip-stack-4-tcp-data-fl...

5. TCP Retransmission: http://www.saminiir.com/lets-code-tcp-ip-stack-5-tcp-retrans...

I've written three TCP/IP stacks, and the first was pretty bad, as I wrote it off of the RFCs before the Comer books.

But then I purchased the Comer books and they made it so much easier. Highly recommend buying the Comer books if you're really interested in writing a TCP/IP stack.

Also, if you really want to learn something, write an implementation. To this day it makes it so much easier to deal with IP problems, configuration, buying products, etc.



