
What happens if you write a TCP stack in Python? - jvns
http://jvns.ca/blog/2014/08/12/what-happens-if-you-write-a-tcp-stack-in-python/
======
tptacek
The idea that Python is so slow that it's confusing TCP sounds wrong to me. I
think it's more likely that your packet capture scheme is slow. It looks like
you're using scapy, which I assume is in turn using libpcap... which may be
buffering (in high-performance network monitoring, the packet capture
interface goes out of its way to buffer). Which is something you can turn off.

About 13 years ago, I wrote my own programming language expressly for the
purpose of implementing network stacks, and had a complete TCP in it; I didn't
have this problem. But I do have this problem all the time when I write direct
network code and forget about buffering.

Similarly: "Python is so slow that Google reset the network connection" seems
a bit unlikely too. Google, and TCP in general, deals with slower, less
reliable senders than you. :)

What's the time between your SYN and their SYN+ACK?

~~~
cbhl
The Google homepage is only about 20000 bytes... if we assume a maximum
segment size of ~1400 bytes, then 14 or 15 packets is about right.
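
That estimate is easy to check. A quick sketch, taking the 20000-byte page size and ~1400-byte MSS above as given (the function name is only illustrative):

```python
import math

PAGE_BYTES = 20_000   # approximate page size quoted above
MSS_BYTES = 1_400     # assumed maximum segment size


def segments_needed(payload_bytes: int, mss: int) -> int:
    """Number of segments needed to carry the payload, counting a partial last one."""
    return math.ceil(payload_bytes / mss)


print(segments_needed(PAGE_BYTES, MSS_BYTES))  # 15
```

Headers and any retransmissions are ignored here, so this is only a lower bound on what shows up in a capture.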

I wouldn't be surprised if Google is sending the packets all at once and
ignoring the ACKs altogether.

Heck, there's even a 2010 paper from Google on the subject of sending a bunch
of packets at the beginning of the connection: _An Argument for Increasing
TCP's Initial Congestion Window_[0]

[0][http://research.google.com/pubs/pub36640.html](http://research.google.com/pubs/pub36640.html)

------
sly010
If an 8MHz microcontroller is fast enough to implement TCP, then Python should
be fast enough too.

Here are my two cents on the experiment:

1\. You don't really have to ack every packet, you have to order them, drop
duplicates and ack the last one.

2\. Google ignores the TCP flow control algorithm and sends the first few
hundred packets very quickly without waiting for acks. They do this to beat
the latency of the flow control algorithm. That's why you end up with so many
packets on your side. You could just try anything but google, and you would
probably see that you have a less insane packet storm.
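
Point 1 can be sketched roughly as follows. This is a simplified illustration, not how any real stack does it: it assumes segments never partially overlap, ignores sequence-number wraparound, and `reassemble` is a made-up name.

```python
def reassemble(segments, initial_seq):
    """Order segments, drop duplicates, and compute a single cumulative ACK.

    segments: iterable of (seq, payload_bytes) in arrival order.
    Returns (in_order_data, ack_number), where ack_number is the next byte
    expected and therefore acknowledges everything before it.
    """
    pending = {}
    for seq, payload in segments:
        # A segment with an already-seen seq is a duplicate; keep the first copy.
        pending.setdefault(seq, payload)

    next_seq = initial_seq
    data = bytearray()
    # Consume contiguous segments only; stop at the first gap.
    while next_seq in pending:
        payload = pending.pop(next_seq)
        data += payload
        next_seq += len(payload)
    return bytes(data), next_seq


arrived = [(1005, b"world"), (1000, b"hello"), (1000, b"hello")]
data, ack = reassemble(arrived, initial_seq=1000)
print(data, ack)
```

One ACK carrying `ack` covers all three arrivals, which is the point: the receiver does not have to answer each packet individually.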

~~~
Dylan16807
Even 8MHz is massive overkill. It's almost fast enough to bit-bang ethernet.

~~~
jacquesm
I'd like to see you bit-bang a 10MHz signal with an 8 MHz clock.

You'd need something substantially faster (Nyquist and such).

~~~
TickleSteve
TCP != ethernet. A TCP stack can run over serial and other link layers (PPP,
SLIP, etc). PPP is absolutely doable on very small microcontrollers.

~~~
Sami_Lehtinen
Problem is that using PPP or SLIP doesn't relieve you from having to implement
at least some kind of crude TCP stack.

But if you use TCP offloading, then you can communicate using TCP as cheaply
as doing any traditional serial communication. I've seen those crude TCP stack
implementations and those are barely LAN capable. Using those over the Internet
wouldn't be a great idea. Often these TCP stacks are used just as serial-to-TCP
converters, with an RWIN of 16 bytes (max) due to UART buffering limits, not
sending an ACK before the whole 16 bytes are cleared, and so on.

So with really low-end devices, it's better to offload TCP to custom
hardware. Many embedded devices use that approach. In such cases you only
call an IP address and port, and when it connects, you have your TCP/IP
connection over plain serial. TCP modems aren't any different from good old
Hayes modems. It's just dial-up over Internet and TCP.

Btw. If the system is able to run CPython, Linux, or something similar, it's
already a pretty powerful system.

~~~
Dylan16807
Even without any kind of offloading, you can manage a perfectly good TCP
implementation and super-minimal web browser with a few kilobytes of ram and
100KHz.

------
kashif
I have actually implemented a TCP stack in python, unfortunately I can't share
it publicly just yet.

The author's problems are because she is not using RAW_IP_SOCKETS.

Making TCP packets using Python's struct module is a breeze. I can post
specific examples in code if anyone is interested.
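
For anyone curious, here is a minimal sketch of what that could look like (an illustration, not the unreleased code mentioned above). It packs a 20-byte header with no options and leaves the checksum at zero; the function name and the default window are assumptions.

```python
import struct

SYN = 0x02  # TCP flag bit for SYN


def build_tcp_header(src_port, dst_port, seq, ack, flags, window=8192):
    """Pack a 20-byte TCP header without options.

    The checksum is left as 0 here; a real stack must compute it over the
    pseudo-header, the TCP header, and the payload before sending.
    """
    data_offset = 5 << 4   # header length: 5 32-bit words, stored in the high nibble
    checksum = 0
    urgent_ptr = 0
    # "!" selects network byte order with no padding between fields.
    return struct.pack(
        "!HHIIBBHHH",
        src_port, dst_port, seq, ack,
        data_offset, flags, window, checksum, urgent_ptr,
    )


header = build_tcp_header(12345, 80, seq=1000, ack=0, flags=SYN)
print(len(header), header.hex())
```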

Finally you can write a proper TCP stack in python, there is no reason not to.
Your user-space interpreted stack won't be as fast as a compiled kernel-space
one - but it won't be feature short.

PS: I guess Google is probably sending her an SSL/TLS handshake which she isn't
handling.

Edit: Corrected author's gender as mentioned by kind poster.

~~~
blutoot
I believe the author is a "she". It says so in the banner of the blog.

~~~
blutoot
I think this downvoting thing is out of control. What was even remotely
offensive about this correction?

~~~
dragonwriter
"offensive" isn't the only reason for downvoting. "Not a substantive addition
to the conversation" is one (of many) other bases. If you are going to
complain about downvoting [1] why do so based on the unwarranted assumption
that the downmod must be for offensiveness?

[1] And you shouldn't, see the Guidelines [2] under "In Comments"

[2]
[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

~~~
vacri
In my opinion, if HN doesn't want puzzled people to ask why they've been
downmodded, then instead of putting a line in the Guidelines saying "don't do
this", they should separate the downmod into "I disagree" and "flag this as a
bad comment", with the latter item the one that reduces the text weight. It's
clearly broken - people have been doing this for as long as I've been here,
and it's not going to change (sadly, 'it' means both the behaviour, and HN's
braindead downmod mechanism).

------
smutticus
I'm the author of hexcap ([http://www.hexcap.org](http://www.hexcap.org)), an
ncurses libpcap file editor and packet generator. I've also written many Scapy
applications like this one
([https://github.com/smutt/mcastClients](https://github.com/smutt/mcastClients)).
I rewrote the DHCPv4 client in Scapy since the stock one is broken. Also, as
part of hexcap, I have made numerous fixes to dpkt. Needless to say, I've done a
lot with Python and packets.

If you're interested in writing a TCP/IP stack in Python I would recommend you
use Python raw sockets, or possibly dnet[1] or pcapy[2]. The Scapy level of
abstraction is too high for your needs.

I agree with other posters who mention buffering in libpcap. Read the man page
for pcap_dispatch to get a better idea of how buffering works in libpcap. Also
try capturing packets with tcpdump with and without the '-l' switch. You'll
see a big difference if your pkts/sec is low.

Don't do arp spoofing. If you're writing a TCP/IP stack then you need to also
code up iparp. If you don't want to do that, then use raw sockets and query
the kernel's arp cache.
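
On Linux, one way to query the kernel's ARP cache without extra libraries is to read `/proc/net/arp`. A minimal sketch, assuming the usual six-column layout of that file; the function names are illustrative.

```python
def parse_arp_table(text):
    """Map IP address -> MAC address from /proc/net/arp style text."""
    entries = {}
    for line in text.splitlines()[1:]:          # first line is the column header
        fields = line.split()
        if len(fields) >= 6:
            ip, _hw_type, _flags, mac, _mask, _device = fields[:6]
            entries[ip] = mac
    return entries


def read_kernel_arp_cache(path="/proc/net/arp"):
    """Read and parse the live ARP cache (Linux only)."""
    with open(path) as f:
        return parse_arp_table(f.read())


SAMPLE = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.1.1      0x1         0x2         aa:bb:cc:dd:ee:ff     *        eth0
192.168.1.20     0x1         0x2         11:22:33:44:55:66     *        eth0
"""

print(parse_arp_table(SAMPLE))
```

The sample text stands in for the real file so the parser can be tried on any platform; `read_kernel_arp_cache()` is the call that touches the actual cache.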

On second thought you really need to use raw sockets if you want this to work.
Using anything pcap based will still leave the kernel's TCP/IP stack involved,
which is not what you want.

[1] [http://libdnet.sourceforge.net/pydoc/private/dnet-module.htm...](http://libdnet.sourceforge.net/pydoc/private/dnet-module.html)

[2] [http://corelabs.coresecurity.com/index.php?module=Wiki&actio...](http://corelabs.coresecurity.com/index.php?module=Wiki&action=view&type=tool&name=Pcapy)

------
jnbiche
This is a fun write-up. If you enjoy this kind of playing around with
networking in a dynamic language, and don't want to have to worry about ARP
spoofing to do these kinds of experiments, you may want to take a look at Snabb
Switch. It provides userland networking in Lua, connecting to the NIC directly
(only a handful of popular NICs currently supported) [0].

I've not used it yet, but I've read over the documentation and am itching for
an opportunity to do so.

0\.
[https://github.com/SnabbCo/snabbswitch](https://github.com/SnabbCo/snabbswitch)

~~~
voltagex_
I can't find the list of supported NICs in the documentation, but I found
[https://github.com/SnabbCo/snabbswitch/blob/master/src/lib/h...](https://github.com/SnabbCo/snabbswitch/blob/master/src/lib/hardware/pci.lua#L54)
which suggests that only a very small subset of Intel NICs are supported. One
of those may be emulated by VirtualBox, though.

------
mholt
ARP spoofing... clever. This was an amusing read and really informative, too.
There is definitely something to be said for explaining lower-level concepts
(e.g. TCP handshakes) using the common tongue. IMO, not a bad way to begin
learning. Someone could perform the same experiment now and use Wireshark to
see the raw packets, then draw conclusions about what is happening.

Anyone know why the Python program is so slow? I'm looking at the code and my
first guess would be this part[1] but I can't explain why, overall, it would
be so slow that a remote host would close the connection.

[1]
[https://github.com/jvns/teeceepee/blob/7e93970e16fbb0c3d4bee...](https://github.com/jvns/teeceepee/blob/7e93970e16fbb0c3d4bee54bb9fb019b702fae5e/teeceepee/tcp.py#L144)

~~~
delluminatus
I would be interested in seeing if using asyncio would resolve the network
issues.

------
blutoot
This wins the Internet today for me... "my kernel: lol wtf I never asked for
this! RST! my Python program: ... :("

------
kfnic
Shouldn't the TCP handshake look like this:

\---- SYN ---->

<\-- SYN/ACK --

\---- ACK ---->

rather than having the client send two SYNs to the server?
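
For reference, the sequence and acknowledgment numbers in that exchange can be sketched like this. The initial sequence numbers are arbitrary example values and the function name is hypothetical.

```python
SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def three_way_handshake(client_isn, server_isn):
    """Return the three handshake packets as (sender, flags, seq, ack) tuples."""
    syn = ("client", "SYN", client_isn, 0)
    # The SYN consumes one sequence number, so the server acks client_isn + 1.
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % SEQ_MOD)
    # The server's SYN also consumes one, so the client acks server_isn + 1.
    ack = ("client", "ACK", (client_isn + 1) % SEQ_MOD, (server_isn + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


for packet in three_way_handshake(client_isn=100, server_isn=300):
    print(packet)
```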

~~~
jvns
absolutely! Fixed :)

------
malone
I like your solution to prevent the kernel from interfering with your packets.

An alternative method I've used in the past is to add an iptables rule to
silently drop the incoming packets. libpcap sees the packets before they are
dropped, so you'll still be able to react to them, but the kernel will ignore
them (and therefore won't attempt to RST them).
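
A rough sketch of such a rule, built as an argument list in Python. It assumes the standard `iptables` command-line tool and root privileges to actually apply it; the address and port are placeholders from the documentation range, and `drop_rule_argv` is a made-up helper.

```python
def drop_rule_argv(remote_ip, remote_port):
    """argv for an iptables rule that drops inbound TCP from one remote endpoint."""
    return [
        "iptables", "-A", "INPUT",       # append to the INPUT chain
        "-p", "tcp",
        "-s", remote_ip,                 # packets coming from this address
        "--sport", str(remote_port),     # and from this source port
        "-j", "DROP",                    # silently discard, so no RST is generated
    ]


argv = drop_rule_argv("203.0.113.5", 80)
print(" ".join(argv))
# To apply it (as root): subprocess.run(argv, check=True)
```

Building the list separately from running it makes it easy to print the rule and review it before touching the firewall.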

------
kalleboo
It seems odd to me that Google would time out that quickly. You could never
reach Google from a GPRS connection if that was the case. I'd investigate the
ACKs you're sending. Are you missing packets or sending them in the wrong
order?

In Uni we had a networking course where we got to build a network web server
software stack from the bottom up, starting with Ethernet/MAC, TCP/IP, and
then on the top HTTP, all being flashed onto a small network device (IIRC it
was originally a CD-ROM server). It was an extremely enlightening exercise. I
recommend you go deeper instead of just using a premade Python library for
TCP!

------
benjamincburns
Please keep writing, Julia. The next time somebody asks me why I spend so much
of my life absorbed by a screen, I'll point them at your blog and say "because
discovery is exciting!"

------
beering
I wonder if it would have a better success rate on a site other than Google's,
since I've heard that Google's done extensive tuning of their TCP stack to
send page data faster.

Somebody's oversubscribed $3/month shared PHP hosting might not ramp up the
speed as quickly.

~~~
wmf
TCP is TCP. If you advertise a certain window size you better be prepared to
receive that data.

------
latiera
It's not Python that's slow, but scapy, which is _dog slow_. In fact, it is
so slow that it should come with big WARNINGs that it's only really meant for
interactive use. Do the dissection yourself or use something built for that
purpose.

It's really surprising to me that lots of ppl are using scapy for things that
require performance but then again if you look at the scapy website or the
docs, it's not immediately apparent that their tool is not meant for this.
Which I guess says a lot about the scapy developers rather than the scapy
users.

tl;dr Scapy is a joke, performance-wise.

------
peterwwillis
You don't need to spoof a different MAC or IP to implement your own stack on a
raw socket, Python is not too slow to handle negotiating a connection, and
your interpretation of how tcp/ip works is flawed. I highly recommend you read
a good book about tcp/ip and learn how the kernel works with network
applications of different types.

In terms of using Scapy for your packet crafting, here are some guides with
examples that may help you work around your issues. (Hint: use the Scapy-
defined sending and receiving routines and don't implement your own, or stop
using Scapy and implement your own raw packet sockets)
[http://securitynik.blogspot.com/2014/05/building-your-own-tc...](http://securitynik.blogspot.com/2014/05/building-your-own-tcp-3-way-handshake.html)
[https://github.com/yomimono/client-fuzzball/blob/master/fuzz...](https://github.com/yomimono/client-fuzzball/blob/master/fuzz_tcp.py)
[https://www.sans.org/reading-room/whitepapers/detection/ip-f...](https://www.sans.org/reading-room/whitepapers/detection/ip-fragment-reassembly-scapy-33969)
[http://www.lopisec.com/2011/12/learning-scapy-syn-stealth-po...](http://www.lopisec.com/2011/12/learning-scapy-syn-stealth-port-scanner.html)

~~~
jtakkala
I think you'll find that either publishing an ARP entry or filtering incoming
packets in the kernel is required for handling a TCP stream over raw sockets.

As the outbound TCP SYN is manually crafted and sent over a raw socket,
without any corresponding state table entry on the sender's kernel, incoming
TCP responses will be rejected by the kernel with a RST.

I suggested to Julia that she manually publish an ARP entry for another IP
which she could send and receive on. The kernel not having an interface with
that IP assigned to it would ignore responses while also passing them to the
raw socket. An alternative would be to use an iptables rule to drop incoming
packets for the relevant flow - although that may be more difficult to manage
depending on what you're doing.

------
js2
All three volumes of TCP/IP Illustrated may be found on the Internet in pdf
form, but they are well worth buying.

Tangent: One of my favorite interview questions is to ask how traceroute
works. The question works best when the candidate doesn't actually know. Then
we can start to discuss bits and pieces of how TCP/IP works, until they can
puzzle out the answer.

~~~
fivre
Do you know if there's a good IPv6 equivalent? I have IPv6 Core Protocols
Implementation but like the writing style of TCP/IP Illustrated much more.

~~~
wmf
There is a 2nd edition.

------
srean
Too late to join the story, but I am really curious if datacenter nodes
intended for heavy mapreduce use implement this layer in user space.

The bottleneck for such processes is typically network I/O and I can imagine
that taking control of the network in the user space might offer some modest
to significant wins. For Hadoop in particular, network packets need to
traverse quite a few layers before they are accessible to the application.

Has anyone done this sort of thing for mapreduce? Pointers to any writeup
would be awesome.

In fact TCP itself might be an overkill for mapreduce. The reduce functions
used are typically associative and commutative. So as long as the keys and
values are contained entirely within a datapacket, proper sequencing is not
even needed. Any sequence would suffice.
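
A toy illustration of that point, assuming each packet carries complete (key, value) pairs and the reduce is a plain sum. The names are hypothetical and this has nothing to do with Hadoop's actual shuffle.

```python
import random
from collections import Counter


def reduce_counts(packets):
    """Sum values per key; addition is associative and commutative."""
    totals = Counter()
    for key, value in packets:
        totals[key] += value
    return totals


packets = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]

# Simulate packets arriving in an arbitrary order.
shuffled = packets[:]
random.shuffle(shuffled)

print(reduce_counts(packets) == reduce_counts(shuffled))  # True
```

Ordering is the only guarantee that drops out here: lost or duplicated packets would still change the sums, so some form of reliability is needed either way.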

~~~
wmf
There has been some work about using zero-copy RDMA and RoCE (which bypass the
kernel as a side effect) for analytics. Hadoop in particular is so slow that
the kernel is unlikely to be a bottleneck, but more optimized runtimes like
Spark might benefit.

~~~
srean
I don't understand why Hadoop is so freaking slow. I am no fan of Java (to put
it mildly) but Java does fairly well, staying within 70~80% of the speed of C++
code at the cost of 4 to 5 times more memory. My experience is that Hadoop is
4 to 6 times slower.

Is it because of a bad choice of internal algorithms, a bad choice of internal
data structures? Bad I/O design? Given the popularity it enjoys, and given
its age, it's a little frightening how much worse Hadoop is in comparison to
some proprietary implementations. My hunch is that Hadoop's slowdown is in the
shuffle phase, which is where faster network data transfer can help.

I like the abstractions that Spark exposes but it still needs a lot of
engineering to catch up. I have anecdotes where Spark is slower than Hadoop by
quite a bit.

All my experience is with Hadoop 1.x. Is Hadoop 2.x much better ?

------
lukego
Cool :).

I also learned a lot about networking by writing a TCP/IP stack in Common
Lisp.
[http://lukego.livejournal.com/4993.html](http://lukego.livejournal.com/4993.html)
if you are interested.

------
lnkmails
As someone who worked on writing protocol specs as code for simulation
purposes, I can see how much fun this is. A pure python network simulator (ns2
is C++ and painfully hard to debug) would actually be nice and encourage a lot
of theoreticians to get into real programming. I've spent a reasonable amount
of time in the industry building distributed systems and I can say with
confidence understanding low level protocols improves your thinking.

------
nodivbyzero
Check this out:
[http://gafferongames.com/networking-for-game-programmers/sen...](http://gafferongames.com/networking-for-game-programmers/sending-and-receiving-packets/)

[http://gafferongames.com/networking-for-game-programmers/rel...](http://gafferongames.com/networking-for-game-programmers/reliability-and-flow-control/)

C++, but very detailed articles.

------
beagle3
While not immediately relevant, if you find this discussion interesting, have
a look at sshuttle:

sshuttle[0] is a pure-python one-way TCP VPN solution that is very well
behaved and quite efficient. The source is highly readable as well. +1 to
everything Avery Pennarun has released (including wvdial, bup)

[0]
[https://github.com/apenwarr/sshuttle](https://github.com/apenwarr/sshuttle)

------
tjz77
What if every python virtual machine had a full TCP/IP stack?

IPv6 has enough address space. Object storage takes care of disk access.
Generally it might be way less efficient? What would the OS look like? It seems
like a lot of OS services would disappear? You'd have a cloud of processes.
Each process vm would be like a cell in a body. Maybe each process vm would
load an auth module. Or not.

------
Sami_Lehtinen
There's no reason to ACK every packet. It's very common to deal with things
selectively. SACK:
[http://packetlife.net/blog/2010/jun/17/tcp-selective-acknowl...](http://packetlife.net/blog/2010/jun/17/tcp-selective-acknowledgments-sack/)
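
As a rough illustration of the idea, the receiver reports one cumulative ACK plus the byte ranges it holds beyond the gap. This sketch ignores sequence wraparound and the limit on how many blocks fit in the option; `ack_and_sack` is a made-up name.

```python
def ack_and_sack(received, next_expected):
    """Compute a cumulative ACK and SACK blocks.

    received: list of (start, end) byte ranges already held, end exclusive.
    Returns (ack_number, sack_blocks), where sack_blocks are the merged
    ranges that lie beyond the first missing byte.
    """
    # Merge overlapping or touching ranges.
    merged = []
    for start, end in sorted(received):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    ack = next_expected
    sack_blocks = []
    for start, end in merged:
        if start <= ack:
            ack = max(ack, end)              # contiguous data advances the ACK
        else:
            sack_blocks.append((start, end)) # data past a gap is reported via SACK
    return ack, sack_blocks


ranges = [(1000, 1100), (1200, 1300), (1300, 1400), (1600, 1700)]
print(ack_and_sack(ranges, next_expected=1000))
# (1100, [(1200, 1400), (1600, 1700)])
```

With this information the sender only needs to retransmit bytes 1100-1199 and 1400-1599 rather than everything after 1100.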

------
norswap
If you're interested in the matter, here's more background info on userland
TCP stacks:

[http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013...](http://perso.uclouvain.be/olivier.bonaventure/blog/html/2013/12/01/sandstorm.html)

------
fragmede
Unfortunately, most people are going to read the headline, read the conclusion
(that Python is too slow for TCP), and not realize that it's wrong.

Hopefully someone will blog a response post that gets popular on HN proving
just how wrong it is.

------
wslh
And in Squeak (Smalltalk)? Take a look at:
[http://www.squeaksource.com/@Hl1Cdo4NwCmLqQl0/Im9cEg0J](http://www.squeaksource.com/@Hl1Cdo4NwCmLqQl0/Im9cEg0J)

~~~
tonyg
I'm guessing you were linking to
[http://www.squeaksource.com/Net.html](http://www.squeaksource.com/Net.html)
\- Seaside uses non-persistent URLs by default, leading to problems like
this...

------
philangist
Fun read. Does anyone here know how to deal with the problem of Python being
slow at sending ACK packets? Or is it a built-in limitation that comes with
dealing with high level languages?

~~~
crazypyro
Implement it in C.

Also it's likely more a Python problem than a "high level" language problem.

~~~
mkonecny
This has nothing to do with Python or C. More likely he has a bug in his code.

~~~
crazypyro
Python being orders of magnitude slower than C for networking code has a lot to do with
Python, actually. Other smarter people have already explained it better than I
can up above.

------
c_plus_minus
Ha! Nice read, perfect after lunch material with my coffee :) Good job

------
kernelwaste
This is not an implementation of a TCP stack.

------
cookiemonster11
This is not a TCP stack.

~~~
prht
Indeed

~~~
cookiemonster11
It confuses me that people make such a big deal of their little 20 lines of
code toy projects.

~~~
michaelhoffman
It's not a little 20-line toy project. It's an engaging and accessible
writeup of some basic parts of TCP that happens to include some easy-to-
understand code.

It's pointless to people who understand how TCP works in depth, but the
majority of programmers don't.

~~~
peterwwillis
It's an engaging and accessible writeup, true. Unfortunately, there are also
several glaringly incorrect/misleading points in the article. The fact that
it's getting upvoted is just..... strange.

------
zzzeek
My first thought was "it'll be slow as shit", then I clicked the article to
confirm. Win!

------
prht
I don't know what a TCP stack is, but in case you are interested in
implementing a TCP/IP stack:
[http://git.savannah.gnu.org/cgit/lwip.git/tree/src](http://git.savannah.gnu.org/cgit/lwip.git/tree/src)

------
kesor
And there is no mention of ScaPy?
[http://www.secdev.org/projects/scapy/](http://www.secdev.org/projects/scapy/)

~~~
ayrx
"I was much more comfortable with Python than C and I’d recently discovered
the scapy networking library which made sending packets really easy."

Did you even _read_ the link?

