Let's Code a TCP/IP Stack: TCP Retransmission (saminiir.com)
385 points by ingve on July 5, 2017 | 30 comments

I don't really intend to promote this yet but anyway here's my in-progress TCP/IP stack implementation:


It is written in C++14, is platform-independent, and works within a single-threaded reactor environment. Microcontrollers are the primary target; for example, the stack does no dynamic memory allocation at all. I consider the TCP part to be "pretty complete" at this point; it even has PMTUD, which lwIP lacks. The TCP code is also quite well commented.

C++14, single-threaded and no dynamic allocations sound like good key parameters! I think reasonable alternatives to lwip for embedded systems would be a good thing.

Can something like this be used on a Linux PC? Does the input come from reading raw Ethernet frames from a file handle, as in these tutorials?

It is already possible. The current test setup involves my APrinter software (firmware for 3D printers) ported to run on Linux, including its web interface. It does indeed use a TAP interface, and yes, it reads/writes Ethernet frames (the stack does implement Ethernet/ARP).

Linux specific code is here: https://github.com/ambrop72/aprinter/tree/master/aprinter/ha...

However, note that LinuxTapEthernet does not talk to my TCP/IP stack directly; there is a driver abstraction in between, which is also the part that configures/instantiates the stack (aprinter/net/IpStackNetwork.h).

It is possible to try it out if you have the Nix package manager (since the configuration/build system is based on that):

  python -B config_system/generator/generate.py | nix-build - -o ~/aprinter-build
  ./aprinter-build/aprinter.elf -ttap0
That will result in trying to get an IP address using DHCP (yes, a DHCP client is included). A static IP can be configured using these commands:

  M926 INetworkDhcpEnabled V0
  M926 INetworkIpAddress V192.168.64.10
  M926 INetworkIpNetmask V255.255.255.0
  M926 INetworkIpGateway V192.168.64.1
The software will run a command line usable via nc on port 23, and also an HTTP server on port 80. To get the HTTP server to actually serve files, an SD card image with a FAT filesystem needs to be created and the web interface files put into it (I can explain if anyone wants to try).

What remains to be done is a minimal example of integrating the stack, with corresponding documentation: for example, how to integrate with the reactor, which is abstract from the perspective of the stack but must follow a specific interface. The stack itself only uses timers, since other event sources would be used for I/O by e.g. interface drivers. Currently there are two event loops provided: one for Linux based on epoll (aprinter/system/LinuxEventLoop.h) and one that uses busy looping, meant for microcontrollers (aprinter/system/BusyEventLoop.h).
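To make the "stack only needs timers" point concrete, here is a minimal sketch of a timer-only reactor in Python. It is not APrinter's actual interface: `TimerLoop`, `call_later`, `run_due` and `next_deadline` are names made up for this illustration, and the clock is injectable only so the example can be driven without sleeping.

```python
import heapq
import itertools
import time


class TimerLoop:
    """Hypothetical timer-only reactor; not the APrinter event loop API."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._heap = []                    # entries: (deadline, tie_breaker, callback)
        self._order = itertools.count()    # keeps equal deadlines in insertion order

    def call_later(self, delay, callback):
        """Schedule callback to run once, delay seconds from now."""
        deadline = self._clock() + delay
        heapq.heappush(self._heap, (deadline, next(self._order), callback))

    def next_deadline(self):
        """Earliest pending deadline, or None if no timer is armed."""
        return self._heap[0][0] if self._heap else None

    def run_due(self):
        """Run every callback whose deadline has passed; never blocks."""
        now = self._clock()
        while self._heap and self._heap[0][0] <= now:
            _, _, callback = heapq.heappop(self._heap)
            callback()
```

A busy loop for a microcontroller would simply call `run_due()` over and over, while an epoll-style loop could use the time remaining until `next_deadline()` as its wait timeout and handle I/O events in between.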

The easiest way would be using a tun/tap interface.
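For anyone wondering what that looks like in practice, here is a rough Linux-only sketch in Python. Each read from a TAP file descriptor yields one raw Ethernet frame, which then has to be split into header and payload. The helper names `open_tap` and `parse_ethernet` are made up for this example, the `TUNSETIFF` value is the one used on common architectures such as x86 and ARM, and opening the device normally needs root or CAP_NET_ADMIN.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA  # ioctl request number on x86/ARM Linux (assumption: differs on some other architectures)
IFF_TAP = 0x0002        # Ethernet-level device: reads return frames, not bare IP packets
IFF_NO_PI = 0x1000      # do not prepend the extra packet-info header to each frame


def open_tap(name="tap0"):
    """Attach to a TAP interface and return a file descriptor for frame I/O."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    ifr = struct.pack("16sH", name.encode(), IFF_TAP | IFF_NO_PI)
    fcntl.ioctl(fd, TUNSETIFF, ifr)
    return fd


def parse_ethernet(frame):
    """Split a raw frame into (dst_mac, src_mac, ethertype, payload)."""
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])
    return dst, src, ethertype, frame[14:]
```

With a descriptor from `open_tap()`, something like `parse_ethernet(os.read(fd, 2048))` would give you one frame at a time; an ethertype of 0x0806 is ARP and 0x0800 is IPv4.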

If you're going to write a TCP stack to see how it works, it's useful to log all packets that don't advance the connection. This includes all rejected packets, plus any duplicates or ACK-only packets that don't advance the connection sequence. Examining the contents of the bit bucket is useful. I did that in the early days of TCP, mostly to debug other implementations that had interoperability trouble.
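A rough sketch of that idea in Python, assuming a much simplified model: a segment "advances" the connection if it delivers in-order bytes at `rcv_nxt` or acknowledges new data beyond `snd_una`. The function names are invented for this example, the comparisons use 32-bit wraparound arithmetic, and SYN/FIN/RST handling and window checks are left out.

```python
import logging

log = logging.getLogger("tcp.bitbucket")
MASK = 0xFFFFFFFF


def seq_lt(a, b):
    """True if sequence number a comes before b, modulo 2**32."""
    return ((a - b) & MASK) >= 0x80000000


def seq_le(a, b):
    return a == b or seq_lt(a, b)


def advances(rcv_nxt, snd_una, snd_nxt, seg_seq, seg_len, seg_ack):
    """Does this segment move either direction of the connection forward?"""
    seg_end = (seg_seq + seg_len) & MASK
    # New in-order data: the segment covers the next byte we expect.
    new_data = seg_len > 0 and seq_le(seg_seq, rcv_nxt) and seq_lt(rcv_nxt, seg_end)
    # New acknowledgment: it acks something sent but not yet acked.
    new_ack = seq_lt(snd_una, seg_ack) and seq_le(seg_ack, snd_nxt)
    return new_data or new_ack


def log_if_stalled(rcv_nxt, snd_una, snd_nxt, seg_seq, seg_len, seg_ack):
    """Log segments that do not advance the connection; return True if logged."""
    if advances(rcv_nxt, snd_una, snd_nxt, seg_seq, seg_len, seg_ack):
        return False
    log.info("no progress: seq=%d len=%d ack=%d (rcv_nxt=%d snd_una=%d)",
             seg_seq, seg_len, seg_ack, rcv_nxt, snd_una)
    return True
```

Duplicates, stale ACKs and out-of-order segments all end up in the log this way, which is usually where the interesting interoperability problems show up.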

What a great series of posts! Thanks for sharing.

Minor nitpick - I wish the author would link to the previous installments in the series at the bottom of the page :)

Thanks! I'll add the links.

I love this series - it is what inspired me to have a crack at it in Rust (https://github.com/rthomas/rust_net) - however I am only at the stage of responding to arping, and other shiny things have come along to distract me.

Very informative indeed. Minor nitpick: one should avoid strcpy(). Believe it or not, Microsoft outlawed it in all the code they wrote, which drastically reduced the number of buffer overflows.

I wonder, is TCP generally integrated into Ethernet chips, or is it mainly software?

A major tenet of TCP/IP and the Internet architecture is the end-to-end principle; taking TCP/IP out of software fits poorly with that idea.

Of course in the real world people have tried it, but it hasn't caught on. There are some niche "specialized hardware talking to specialized hardware" applications where it lives on (RDMA in HPC for example).

The limited TCP acceleration features in ethernet cards that are only used in amenable circumstances have been significant sources of headache and mysterious data corruption bugs, but they have eventually become pretty common HW features - not sure how often the feature is actually used.

The need has also decreased in end-user hardware, since broadband has largely stopped getting faster - we don't have the 10-100 Gbit Internet connections that the curve from the early 2000s would have led to. Instead we transitioned to choppy 4G on iPhones and left broadband to stagnate...

TCP and IP header checksum offloading is pretty ubiquitous, along with segmentation offloads (i.e. https://en.wikipedia.org/wiki/Large_send_offload).
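For reference, the checksum being offloaded is the 16-bit ones' complement Internet checksum (RFC 1071). A small Python sketch of what the NIC computes on your behalf; `internet_checksum` is just a name chosen here, and `SAMPLE_HEADER` is an illustrative IPv4 header with its checksum field zeroed, not a captured packet.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                      # pad odd-length input with a zero byte
    words = struct.unpack("!%dH" % (len(data) // 2), data)
    total = sum(words)
    while total >> 16:                       # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


# Illustrative 20-byte IPv4 header (UDP, 192.168.0.1 -> 192.168.0.199),
# with the checksum field (bytes 10-11) set to zero.
SAMPLE_HEADER = bytes.fromhex("4500 0073 0000 4000 4011 0000 c0a8 0001 c0a8 00c7")
```

For this sample header the function returns 0xb861. Running it again over the header with that value filled in gives 0, which is how a receiver verifies the field. The TCP checksum uses the same arithmetic but also covers a pseudo-header and the payload.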

There is a more heavy handed offload in what Windows calls "TCP Chimney Offload" (https://docs.microsoft.com/en-us/windows-hardware/drivers/ne...), but it isn't as commonly used and is generally disfavored.

There have been NICs that integrate TCP/IP offload features[1]. I'm not sure how widely used they are anymore. It seems when CPU clock frequencies leveled off, and core count started to increase (~10 years ago), there was less rationale to use dedicated hardware versus consuming more available CPU/core resources to deal with the network protocol processing in software.

1 - https://en.wikipedia.org/wiki/TCP_offload_engine

I worked for multiple years writing drivers and low-level libraries for companies building network cards (mainly in finance). I also do a lot of networking/electronics at home, including building simple network cards. To answer your question "I'm not sure how widely used they are anymore": I would say it's almost impossible nowadays to find an Ethernet-only chip that does not have some kind of TCP offload ("TOE" for short). 90% of the time the chip also handles ARP and ICMP for you. Most recent drivers are able to offload most of the work to the chip, and fall back on software for complementary features (most on-chip TOEs are still bare-bones) or when no TOE is present.

All Intel NICs do TCP offload, they are quite widely used.

Do you have a source for more information? I know Intel NICs are mostly regarded pretty highly.

I don't know much about it, but it's briefly described here: https://www.intel.com/content/www/us/en/support/network-and-... (ctrl-f offload)

https://wiki.linuxfoundation.org/networking/toe suggests it's done mostly in closed-source firmware.

An ethernet chip deals with ethernet, at the link layer. TCP is higher up in the stack and not necessary to process ethernet. A few unique network interfaces are designed to process some aspects of IP, and sometimes TCP, to support specific performance demands. TCP will generally be implemented in software/firmware because it's complex and connection-oriented.

A fair number of ethernet chips will handle all the TCP protocol themselves, the upper layers just pass in the address of a large buffer and the chip will take care of chopping it up into packets.

The feature is usually called a "TCP Offload Engine".

That's hardly all the TCP protocol though, it's just the segmentation and checksumming part.
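To show how little the segmentation part involves on its own, here is a toy Python sketch of what a segmentation offload does with a large send buffer: cut it into MSS-sized pieces and give each piece its sequence number. `segment` is a hypothetical helper; real hardware also replicates and fixes up the IP/TCP headers and checksums for every piece.

```python
MASK = 0xFFFFFFFF


def segment(payload: bytes, mss: int, start_seq: int):
    """Yield (sequence_number, chunk) pairs, each chunk at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    for offset in range(0, len(payload), mss):
        seq = (start_seq + offset) & MASK    # sequence numbers wrap at 2**32
        yield seq, payload[offset:offset + mss]
```

For example, a 3000-byte buffer with an MSS of 1460 comes out as three segments of 1460, 1460 and 80 bytes. Everything else TCP does (retransmission, congestion control, connection state) stays in software with this kind of offload.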

Segmentation and checksumming are very common, with the segmentation often called TSO (TCP Segmentation Offload) for send and LRO (Large Receive Offload) for receive. However, when people refer to a TCP Offload Engine (TOE), they usually do mean pretty much the whole protocol, or at least a functional subset of it. Windows and FreeBSD support TOE with some NICs. Linux has rejected full TOE in mainline for a number of pretty solid reasons:


This is an outstanding way to step through understanding TCP. Thank you Sami!

Yes, and in addition, this is a good way to choose projects. We need to "reinvent the wheel" a lot more rather than just building on top of frameworks that we have inherited from the past. The more work we do at the lower layers of our stacks, the better the quality of our software will be, in terms of its usefulness. There are so many coders around now that there's not enough work at the top of our stacks anymore.

Problem with rolling your own TCP stack is all the RFCs that compose what we collectively know as TCP.

Congestion control and flow control for instance, there are many RFCs that define them.

I like the explanation I've heard: rolling your own TCP stack is a very educational and worthwhile experience, but what makes TCP so difficult in practice is that there are so many home-rolled stacks out there in the wild that all have different quirks and errors.

So yes, please roll your own. But no, please do not add to the wild zoo of errors by deploying it.

That's literally how the original TCP/IP implementations were tested: by running them against each other in "bake-offs":


I think there's a lot to be said for testing network code against the myriad of real world implementations out there.

Very interesting and yes, interoperability is hard.


Very interesting. I suppose you could reimplement the Ethernet part of the layer on an FPGA?
