
Nginx optimization: Understanding sendfile, tcp_nodelay and tcp_nopush - dirtyaura
https://t37.net/nginx-optimization-understanding-sendfile-tcp_nodelay-and-tcp_nopush.html
======
Animats
_To avoid network congestion, the TCP stack implements a mechanism that waits
for the data up to 0.2 seconds so it won’t send a packet that would be too
small. This mechanism is ensured by Nagle’s algorithm, and 200ms is the value
of the UNIX implementation._

Sigh. If you're doing bulk file transfers, you never hit that problem. If
you're sending enough data to fill up outgoing buffers, there's no delay. If
you send all the data and close the TCP connection, there's no delay after the
last packet. If you do send, reply, send, reply, there's no delay. If you do
bulk sends, there's no delay. If you do send, send, reply, there's a delay.

The real problem is ACK delays. The 200ms "ACK delay" timer is a bad idea that
someone at Berkeley stuck into BSD around 1985 because they didn't really
understand the problem. A delayed ACK is a bet that there will be a reply from
the application level within 200ms. TCP continues to use delayed ACKs even if
it's losing that bet every time.

If I'd still been working on networking at the time, that never would have
happened. But I was off doing stuff for a startup called Autodesk.

John Nagle
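
To make the send, send, reply case concrete, here is a rough back-of-the-envelope model in Python. It is an editorial sketch, not part of the original comment. The function name `second_write_hold_ms` and its parameters are made up for illustration. It assumes two small writes issued back to back, a receiver that has nothing to say until both arrive, no packet loss, and a delayed-ACK timer that runs its full 200ms.

```python
def second_write_hold_ms(rtt_ms: int,
                         nagle: bool = True,
                         delayed_ack: bool = True,
                         ack_delay_ms: int = 200) -> int:
    """Roughly how long the second small write sits in the sender's buffer.

    Simplified model: the first write leaves at once because nothing is in
    flight. With Nagle on, the second write is held until the first is ACKed.
    """
    if not nagle:
        # Nagle off: the second small segment goes out immediately.
        return 0
    # The ACK for the first segment needs one round trip to come back...
    hold = rtt_ms
    if delayed_ack:
        # ...plus the time the receiver sits on the ACK, hoping to piggyback
        # it on a reply that the application is not going to send yet.
        hold += ack_delay_ms
    return hold


if __name__ == "__main__":
    for nagle, dack in [(True, True), (True, False), (False, True)]:
        print(nagle, dack, second_write_hold_ms(10, nagle, dack))
```

With a 10ms round trip, the model gives 210ms with both mechanisms on, 10ms with delayed ACKs off, and 0ms with Nagle off, which is the interaction described above.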

~~~
moe
Are you John Nagle, or was that a quote?

~~~
dirtyaura
He is. Check his Animats blog. And it's kind of awesome that here we have a
bare-bones internet forum, in which we can have uninformed discussion about
Nagle's algorithm only to be enlightened by Mr. Nagle. Hooray for HN and John
:)

~~~
moe
Indeed! Thanks for confirming :)

~~~
Animats
Yes, it's me. I did my networking work at Ford Aerospace in the early 1980s.
But I left in 1986. It still bothers me that the Nagle algorithm (which I
called tinygram prevention) and delayed ACKs interact so badly.

That fixed 200ms ACK delay timer was a horrible mistake. Why 200ms? Human
reaction time. That idea was borrowed from X.25 interface devices, where it
was called an "accumulation timer". The Berkeley guys were trying to reduce
Telnet overhead, because they had thousands of students using time-sharing
systems from remote dumb terminals run through Telnet gateways. So they put in
a quick fix specific to that problem. That's the only short fixed timer in
TCP; everything else is geared to some computed measure such as round trip
time.

Today, I'd just turn off ACK delay. ACKs are tiny and don't use much
bandwidth, nobody uses Telnet any more, and most traffic is much heavier in
one direction than the other. The case in which ACK delay helps is rare today.
An RPC system making many short query/response calls might benefit from it;
that's about it. A smarter algorithm in TCP might turn on ACK delay if it
notices it's sending a lot of ACKs which could have been piggybacked on the
next packet, but having it on all the time is no longer a good thing.

If you turn off the Nagle algorithm and then rapidly send single bytes to a
socket, each byte will go out as a separate packet. This can increase traffic
by an order of magnitude or two, with throughput declining accordingly. If you
turn off delayed ACKs, traffic in the less-busy direction may go up slightly.
That's why it's better to turn off delayed ACKs, if that's available.

One of the few legit cases for turning off the Nagle algorithm is for an FPS
game running over the net. There, one-way latency matters; getting your shots
and moves to the server before the other players affects gameplay. For almost
everything else, it's round-trip time that matters, not one-way.
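
For reference, a minimal Python sketch of the two knobs discussed above (an editorial addition, not from the original comment). `TCP_NODELAY` is the standard option for turning off the Nagle algorithm. `TCP_QUICKACK` is a Linux-specific option that asks the kernel to ACK promptly; it is not available everywhere, and on Linux it is not permanent, so the helper below treats it as best effort. The helper names are made up for illustration.

```python
import socket


def disable_nagle(sock: socket.socket) -> None:
    """Turn off the Nagle algorithm: small writes go out immediately."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


def request_quick_acks(sock: socket.socket) -> bool:
    """Ask for immediate ACKs where the platform supports it.

    Returns True if the option was set, False if it is unavailable.
    Assumption: on Linux the kernel may leave quick-ACK mode again later,
    so callers that care usually re-apply it after each receive.
    """
    opt = getattr(socket, "TCP_QUICKACK", None)  # Linux only
    if opt is None:
        return False
    try:
        sock.setsockopt(socket.IPPROTO_TCP, opt, 1)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    disable_nagle(s)
    print("quick ACKs requested:", request_quick_acks(s))
    s.close()
```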

------
dugmartin
One note about sendfile - if you are using VirtualBox and serving files out of
a shared folder with nginx (or I assume Apache) you'll need to disable it or
you won't see any updates you do to files once the first version is sent.

~~~
dirtyaura
This VirtualBox bug was the reason that I started reading more about sendfile
and encountered this article about nginx optimizations. The bug actually
causes really weird behavior, in which extra bytes are added at the end of the
file.

~~~
lsaferite
I JUST ran into this issue this weekend! Glad to know I'm not going insane.

Do you have any links specifically discussing the issue?

~~~
falcolas
[https://www.virtualbox.org/ticket/9069](https://www.virtualbox.org/ticket/9069)

Google for 'virtualbox sendfile' and there's a lot of discussion.

Worth noting, and this bit me recently, Go's http file handler also uses
sendfile beneath the covers.

------
justincormack
The "modern" way to do things is to use the splice syscall and related calls
instead of sendfile, which can deal with copying between sockets and appending
headers more directly.

Igor said in a talk I went to that Nginx 1.x was written for FreeBSD first,
while 2.0 will be written for Linux first so perhaps some of these things may
change (hence "nopush" in the config file, the freebsd term).

~~~
dirtyaura
Is splice supported in nginx already? How do you configure it?

~~~
justincormack
No I don't think so, not at present.

------
onn
I always cringe a bit inside, when reading articles like this.

nodelay and cork are different, indeed, but opposites? They both try to
achieve the same effect, put more data in before sending a packet.

> [...] This mechanism is ensured by Nagle’s algorithm, and 200ms [...]

Absolutely not. Nagle's algorithm does not have any delay or timer built in.
It simply holds back non-full packets, when there is data in flight (not
acked). The second half of the problem is delayed acks, but this is not
mentioned in the article, instead it goes on saying

> [...] but Nagle is not relevant to the modern Internet [...]

which is indeed popular belief, but a very superficial analysis that holds no
water if you study it further.
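
The rule described above (hold back non-full packets while unacknowledged data is in flight) fits in a few lines. This is an illustrative Python sketch, not kernel code; the function name and parameters are invented for the example, and it ignores refinements such as the Minshall variant and window limits.

```python
def nagle_can_send(pending_bytes: int, mss: int, unacked_bytes: int) -> bool:
    """Decide whether buffered data may be sent now under the basic Nagle rule.

    pending_bytes: data the application has written but TCP has not sent yet
    mss:           maximum segment size, i.e. what counts as a full packet
    unacked_bytes: data already sent but not yet acknowledged
    """
    if pending_bytes >= mss:
        # A full segment is always allowed out.
        return True
    # A small segment may go only if nothing is currently in flight.
    return unacked_bytes == 0


if __name__ == "__main__":
    print(nagle_can_send(100, 1460, 0))     # first small write on an idle connection
    print(nagle_can_send(100, 1460, 100))   # second small write, first not ACKed
    print(nagle_can_send(1460, 1460, 100))  # full segment
```

Note that there is no timer anywhere in the rule: a held-back small segment is released by an incoming ACK or by the buffer filling up to a full segment.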

The feeling I always get from articles like this is they border on "technical
religion". It sounds correct, it is technical, it isn't even false, but it
doesn't paint a clear picture, instead it mystifies things further.

The problems nginx had:

1\. nagle and http keepalive don't play nice together, the last bit of data
might be artificially delayed, especially when delayed acks come into play.
nodelay seems needed here. (It is not though, see that Minshall bit.)

2\. how to send headers and use sendfile for the body, and fill the first
packet with more than just the headers? nopush (tcp_cork) is a solution.
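
A hedged sketch of point 2 in Python, for readers who want to see the socket calls involved. It is not how nginx is implemented; it only shows the general pattern of corking the socket, writing headers and body, then uncorking. `TCP_CORK` is Linux-specific (FreeBSD has the similar `TCP_NOPUSH`), so the code falls back to plain writes where it is missing. The function name `send_corked` is made up for this example.

```python
import socket

# Linux exposes TCP_CORK; on other platforms this attribute may not exist.
_CORK = getattr(socket, "TCP_CORK", None)


def send_corked(sock: socket.socket, header: bytes, body: bytes) -> None:
    """Send header and body so they can share packets instead of the
    header going out alone in a small first packet."""
    if _CORK is not None:
        # Cork: the kernel holds back partial packets while this is set.
        sock.setsockopt(socket.IPPROTO_TCP, _CORK, 1)
    try:
        sock.sendall(header)
        sock.sendall(body)
    finally:
        if _CORK is not None:
            # Uncork: whatever is still buffered is flushed now.
            sock.setsockopt(socket.IPPROTO_TCP, _CORK, 0)
        # Assumption: enable TCP_NODELAY afterwards so a small tail on a
        # keepalive connection is not held back by Nagle.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)


if __name__ == "__main__":
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    cli = socket.create_connection(srv.getsockname())
    conn, _ = srv.accept()
    send_corked(conn, b"HTTP/1.1 200 OK\r\n\r\n", b"hello")
    conn.close()
    print(cli.recv(4096))
    cli.close()
    srv.close()
```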

------
pjwal
Does anyone else use and/or have thoughts on some of the boilerplate nginx
configuration projects?

[https://github.com/Umkus/nginx-boilerplate](https://github.com/Umkus/nginx-boilerplate)

[https://github.com/h5bp/server-configs-nginx](https://github.com/h5bp/server-configs-nginx)

------
shirro
Is there any point to optimisations like sendfile anymore when increasingly
sites are being served over TLS?

------
nodesocket
Our nginx configurations use:

    sendfile on;
    tcp_nopush on;
    tcp_nodelay off;

With tcp_nodelay off. Is the author suggesting we turn it on?

~~~
dirtyaura
Yes, as far as I understood, "tcp_nodelay on" is more reasonable for the modern
web; the whole delay business of TCP was reasonable for terminals.

As the latter part of the article describes, "tcp_nodelay on" is at odds with
"tcp_nopush on", as they are mutually exclusive, but nginx has special behavior:
if you have "sendfile on", it uses tcp_nopush for everything but the last
packet, and then turns nopush off and enables nodelay to avoid the 0.2 sec
delay.

------
frankzinger
sendfile is also not necessarily zero-copy:
[https://svnweb.freebsd.org/base?view=revision&revision=25560...](https://svnweb.freebsd.org/base?view=revision&revision=255608)
[https://git.kernel.org/cgit/linux/kernel/git/stable/linux-st...](https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?id=485ddb4b9741bafb70b22e5c1f9b4f37dc3e85bd)
(sendfile calls splice; the splice manpage also states that it might copy
instead of move pages.)

That said, it equates to a single kernel-to-kernel copy instead of a kernel-
to-userspace plus a userspace-to-kernel copy as in the read/write case.
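
To illustrate the two paths being compared, here is a small Python sketch (an editorial example; the function names are invented). `socket.sendfile()` uses the `sendfile` syscall where the platform provides it and otherwise falls back to ordinary sends, so whether any copy is actually avoided depends on the OS, as noted above. The read/write version pulls each chunk into a userspace buffer before handing it back to the kernel.

```python
import socket


def send_with_sendfile(sock: socket.socket, path: str) -> int:
    """Let the kernel move file data to the socket; returns bytes sent."""
    with open(path, "rb") as f:
        # Uses os.sendfile() when available, plain send() otherwise.
        return sock.sendfile(f)


def send_with_read_write(sock: socket.socket, path: str,
                         chunk_size: int = 65536) -> int:
    """Classic loop: kernel-to-userspace read, then userspace-to-kernel write."""
    sent = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # copy into a userspace buffer
            if not chunk:
                break
            sock.sendall(chunk)         # copy back into the kernel
            sent += len(chunk)
    return sent
```

Both functions expect a connected, blocking stream socket and deliver the same bytes; the difference is only in how many times the data crosses the kernel boundary.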

------
raptium
Nagle Angle and Naggle ?

