
Nginx reverse proxies retry PUT/POST/DELETE on response timeout by default - JensRantil
https://trac.nginx.org/nginx/ticket/488#comment:4
======
JensRantil
This essentially became the laser of death the other day and led to a cascading
failure which eventually brought down our system. That's why I'm posting this.

Very few people know about this and it's really scary. I'm happy people are
voting this up to increase some awareness of this.

Potential workarounds: You might think removing `timeout` from
`proxy_next_upstream` will do, but that also disables retries on connection
timeouts, which is not what you want!

Increasing `proxy_connect_timeout` is not an option either, because then you
risk piling up too many connections in the nginx instance if the upstream
server swallows SYN packets or whatnot.
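The tension between those two workarounds can be sketched in configuration.
The directives below are real nginx directives, but the values are only
illustrative:

```nginx
location /api/ {
    proxy_pass http://backend;

    # Retry the next upstream only on connection-level errors.
    # Dropping "timeout" here stops nginx from re-sending a request
    # whose *response* timed out -- but, as noted above, it also
    # stops retries when the *connection attempt* times out.
    proxy_next_upstream error;

    # Keeping this short limits how long a stalled connection
    # attempt can tie up resources in the nginx instance.
    proxy_connect_timeout 3s;
}
```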

The real workaround: Use haproxy. Seriously.

~~~
mhotchen
We've hit 3 problems with nginx:

1. Exactly this: we had mystery double trades from our clients, and it took us
a long time to realise it was nginx assuming we had timed out and routing the
traffic to the next server

2. It doesn't do active health checks. When a server goes down, it will send 1
out of every 8 real requests to the down server to see if it responds. Having
disabled resubmitting of requests to avoid the double-trade issue above, this
means that when one of our servers is down, 1 out of every 8 requests will hit
an nginx proxy error, which is significant when you have multiple API calls on
a single page

3. This isn't something I've personally hit, so I can't explain the nitty-
gritty, but it's something one of my coworkers dealt with: Outlook webmail does
something weird where it opens a connection with a 1GB content length, then
sends data continually through that connection, sort of like a push-
notification hack. Nginx, instead of passing the traffic straight through,
will collect all the data in the response until it reaches the content length
given in the header (or until the connection is closed). I don't know if nginx
is to blame for this one or not, but I do feel that when I send data through
the proxy, it should go right through to the client, not be held at the proxy
until more data is sent.
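For what it's worth, the buffering behavior described in point 3 can be turned
off per location. This is only a sketch using real nginx directives; whether
it fully fixes the Outlook webmail case is not something the thread confirms:

```nginx
location /push/ {
    proxy_pass http://backend;

    # Forward upstream response data to the client as it arrives,
    # instead of buffering it until the body is complete.
    proxy_buffering off;

    # Long-lived streaming responses generally also want HTTP/1.1
    # to the backend, with the default "close" connection header cleared.
    proxy_http_version 1.1;
    proxy_set_header Connection "";
}
```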

HAProxy also solved our issues and is now my go-to proxy. Data goes straight
through, it has separate health checks, and it better adheres to HTTP
standards. It can also be used for other network protocols which is a bonus.

~~~
ende42
3. is the reason why NGinX is the recommended proxy in front of webapps with
scarce parallelism (for example Ruby with Unicorn; see
[http://unicorn.bogomips.org/PHILOSOPHY.html](http://unicorn.bogomips.org/PHILOSOPHY.html)
for an explanation) when "slow clients" are to be expected. NGinX is
protecting the webapp from blocked workers by slow clients and Outlook Webmail
seems to behave just like one. I don't know by heart how to tune this behavior
if one wants to avoid it but this property is the main reason we use NGinX.

~~~
vsl
That's a… unique - and wrong - spelling of the name. (Pet peeve of mine;
people spell my app's name in all sorts of bizarre ways too.)

------
owengarrett
This is unfortunate behavior on timeout, and we've shared a workaround using
maps. There's a configuration example in this Gist:
[https://gist.github.com/thresheek/2fa6479ffb7aca710493](https://gist.github.com/thresheek/2fa6479ffb7aca710493).

We're also going to prioritize a complete fix in the product, and encourage
your comments and input on this ticket:
[https://trac.nginx.org/nginx/ticket/488](https://trac.nginx.org/nginx/ticket/488)

Disclaimer: I work @ NGINX. Thanks, Owen

~~~
JensRantil
Double post. See answer here:
[https://news.ycombinator.com/item?id=11221392](https://news.ycombinator.com/item?id=11221392)

------
colanderman
How is one supposed to take seriously web infrastructure software that
exhibits such a basic failure of understanding core web standards? From even a
cursory reading of the HTTP RFCs one will understand that "POST = unsafe =
don't retry after request sent = return 504 on reply timeout".

I mean, a bug's a bug; but this was known for two years!

~~~
coldtea
> _How is one supposed to take seriously web infrastructure software that
> exhibits such a basic failure of understanding core web standards?_

How? Probably based on the fact that otherwise it's a frigging great app that
powers like 15% of the web, including some of the biggest sites out there.

~~~
jimjag
Plus, it is obvious that some fanboys will promote it no matter what...

~~~
coldtea
Yes, please continue calling 40+ year old developers with ancient unix
experience "fanboys".

Because obviously we're all 20yo in HN...

------
nicolas_t
I've been bitten more than once by this.

Example situation: you have a request to process an uploaded file, which is
only for admin purposes, so you didn't take the time to use a queuing system
to do the heavy lifting in the background. Then your customer uploads a file
that takes much longer than normal, the request times out, that file is then
sent multiple times to the app server, and the user sees multiple uploads.

The behavior in terms of errors should definitely be different depending on
whether sending the request failed (in which case resending to the next
upstream is fine) or receiving the response failed (in which case resending is
often not a good idea)

------
jamiesonbecker
This behavior _is_ non-compliant[1] with the RFC.

Although (from the standpoint of the RFC), everything on the server side
(including nginx itself) is considered the web application, nginx probably
takes the implicit position that dealing with multiple requests on non-
idempotent methods such as POST is really a problem that the proxied web app
itself should cope with.

But then nginx puts the web app in an untenable position. Consider the example
of non-idempotent POST to create a new user account. The new user account
includes a username, email address, and password. Because it's proxied, nginx
creates a duplicate request for this new user account in the circumstances
described in this bug report.

How should the web app deal with the duplicate request?

a. Accept the first request (200 OK) and decline the second request since the
account was already created (i.e., 409 Conflict), or

b. Create two duplicate user accounts (200 OK for both)

Obviously, the _ONLY_ correct response is the first one, but what happens next
is really up to nginx: will the client receive the 409 Conflict (etc) or will
it receive the 200?

Well, who knows?! It's completely indeterminate.

If the client gets the 200 OK, great. But what if it doesn't? These duplicate
requests seem like they could lead to an nginx race condition as well. And
what gets logged?
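Option (a) can be sketched at the application level. Everything below is
hypothetical and illustrative, not anything nginx or this thread prescribes:

```python
# Hypothetical sketch of option (a): a "create user" handler that
# detects a duplicate (possibly proxy-retried) POST and answers
# 409 Conflict instead of creating a second account.
accounts = {}  # username -> account record (illustrative in-memory store)

def create_user(username, email):
    if username in accounts:
        # The same request arriving again: refuse to create a duplicate.
        return 409, accounts[username]
    accounts[username] = {"username": username, "email": email}
    return 200, accounts[username]

status, _ = create_user("alice", "a@example.com")        # first attempt
retry_status, _ = create_user("alice", "a@example.com")  # proxy retry
assert status == 200 and retry_status == 409
```

Which of those two status codes the client actually sees is, as argued above,
up to the proxy.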

This behavior _clearly violates_ both the spirit and the letter of RFC 7231
(as well as being an obviously poor engineering decision!).

Note also the long time (years!)[2] that this has been a known, outstanding
bug without any action taken. Another commenter actually said it caused a
cascading failure that brought down their system.

Bottom line... nginx is a great, fast static server, but definitely _not_ a
good proxy for dynamic apps. We're trying to figure out how fast we can
migrate Userify (plug: SSH key management for EC2)[3] from nginx to HA-Proxy,
since we use it to front-end our REST API.

1. [https://tools.ietf.org/html/rfc7231#page-23](https://tools.ietf.org/html/rfc7231#page-23)

2. [https://trac.nginx.org/nginx/ticket/488#comment:3](https://trac.nginx.org/nginx/ticket/488#comment:3)

3. [https://userify.com](https://userify.com)

~~~
stormbrew
By default[1], nginx only talks to backends in http/1.0, so the operative rfc
is (sadly)
[https://tools.ietf.org/html/rfc1945](https://tools.ietf.org/html/rfc1945).
Though it did establish GET/HEAD as safe and other methods as not, the idea of
idempotence itself was not yet present and it doesn't have any language I'm
aware of to restrict client retries on non-safe methods.

That said, I don't know if nginx does any better if you set it to http/1.1
mode on this issue. I assume not, to be honest.

[1]
[http://nginx.org/en/docs/http/ngx_http_proxy_module.html#pro...](http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_http_version)
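For reference, switching the backend protocol is a small configuration change
(the directives are real; whether it changes the retry behavior is, as noted,
unknown):

```nginx
location / {
    proxy_pass http://backend;
    proxy_http_version 1.1;          # the default is 1.0
    proxy_set_header Connection "";  # clear the "close" that 1.0 proxying implies
}
```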

~~~
dragonwriter
> By default[1], nginx only talks to backends in http/1.0, so the operative
> rfc is (sadly)
> [https://tools.ietf.org/html/rfc1945](https://tools.ietf.org/html/rfc1945).
> Though it did establish POST/PUT/etc. as 'safe'

No, only GET and HEAD are safe in RFC 1945.

> the idea of idempotence itself was not yet present and it doesn't have any
> language I'm aware of to restrict client retries on non-safe methods.

That actually doesn't really change the situation that much: without an
idempotence guarantee, there is no protocol-level basis for a proxy (reverse
or otherwise) to assume that a non-safe method is repeatable. Under HTTP 1.0,
by the RFC alone, there's no justification for treating anything other than
GET or HEAD as reliably repeatable. (Except perhaps that the operations
described by PUT and DELETE are at least arguably, as specified, idempotent,
even though the term is not invoked and the guarantee is not made express.)

~~~
stormbrew
> No, only GET and HEAD are safe in RFC 1945.

Brainfart typo, corrected.

------
marcosnils
We had this same problem some time ago in our company. That's why we came up
with this.

[https://github.com/xetorthio/nginx-upstream-idempotent](https://github.com/xetorthio/nginx-upstream-idempotent)

~~~
JensRantil
Cool. Did you ever consider patching nginx upstream instead?

~~~
xetorthio
Yes. Before implementing this module we contacted the nginx developers, and
they didn't think it was a problem. That's why we had to create our own module.

~~~
laurent123456
Did they explain why they don't think it's a problem?

------
jimjag
This almost seems non-compliant w/ the actual HTTP spec.

~~~
SilasX
Per my other comment [1], it's only noncompliant for POST; for PUT (and
DELETE) it's _more_ compliant than people want or expect!

[1]
[https://news.ycombinator.com/item?id=11217686](https://news.ycombinator.com/item?id=11217686)

------
Rabidgremlin
Yeah, tripped over this beauty a few years ago... Took days to figure out what
was going on :( Was using nginx as internal load balancer across a RESTful
services layer.

------
muraiki
Forgive me for not fully understanding this. If I'm just using proxy_pass to a
single server (vs using proxy_pass with round robin or using
proxy_next_upstream for failover) would this still affect me? In my experience
with proxy_pass, a timeout on upstream was reported to the client and the POST
was not retried.

~~~
JensRantil
No, then you are safe.

------
billpg
Some APIs are not designed very well in the face of the possibility of a time-
out on a POST, because the client can't be sure if the request was successful
or not.

[http://blog.hackensplat.com/2014/07/is-your-api-broken.html](http://blog.hackensplat.com/2014/07/is-your-api-broken.html)

------
verytrivial
Am I the only Chrome on Android user who received a PKCS#12 access request for
trac.nginx.org? I have honestly never seen that before.

~~~
JensRantil
Yeah, I got it, too. That was a first-timer...

------
typewriter_t
I can't reproduce it. I have nginx proxy_pass'ing to two upstreams and
configured with proxy_next_upstream timeout;

One of the upstreams is running iptables -A OUTPUT -p tcp --sport 8080
--tcp-flags PSH PSH -j DROP

CURLing the nginx location configured for proxy_pass'ing returns 504
GATEWAY_TIMEOUT on half of the requests, as expected.

------
Zikes
This is why idempotence is so important.

~~~
SilasX
But it's only the POST that this is a problem for, right? PUT and DELETE are
supposed to be idempotent so retries are okay, yes?

~~~
lukeschlather
Sure, but the point is you should design your POSTs to be idempotent as well.

~~~
joosters
Well, they are defined as non-idempotent, so there's no reason why you
'should' design this way. You can't make every request idempotent.

It's best to design so that duplicate POSTs are handled sensibly (e.g. you
don't make a user pay for the same product twice), but the response to the
second POST is unlikely to be the same as the first one, so they aren't
idempotent.

More difficult cases are where an action could legitimately be performed
twice, e.g. adding an item to a shopping cart. You must differentiate between
a wrongly-duplicated request, and a real request to add the second item. One
way to do this is to add parameters to the POST so that it can be identified
as a duplicate. But it can be tricky to do this without holding a lot of extra
state in the server application, and there are all sorts of concurrency
problems when you have a cluster of servers.
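One common mitigation (not from the thread; a hedged sketch with made-up
names) is a client-supplied idempotency key, which lets the server tell a
retried request apart from a genuine second one:

```python
# Hypothetical sketch of client-supplied idempotency keys: the client
# attaches a unique token to each POST, and the server replays the
# stored response for any token it has already seen.
import uuid

seen = {}  # idempotency key -> stored response

def add_to_cart(cart, item, idempotency_key):
    if idempotency_key in seen:
        return seen[idempotency_key]  # duplicate: replay, don't re-add
    cart.append(item)
    response = ("added", len(cart))
    seen[idempotency_key] = response
    return response

cart = []
key = str(uuid.uuid4())
assert add_to_cart(cart, "book", key) == ("added", 1)
assert add_to_cart(cart, "book", key) == ("added", 1)   # wrongly-duplicated request
key2 = str(uuid.uuid4())
assert add_to_cart(cart, "book", key2) == ("added", 2)  # genuine second add
assert cart == ["book", "book"]
```

The key moves the deduplication state to one small table, though the
concurrency problems the parent mentions remain once that table must be
shared across a cluster.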

~~~
33degrees
I think most of these cases can be handled via PUT, i.e. update the cart so it
contains these items. That way you keep all the state on the client.

~~~
thedufer
That's great unless people want to shop in multiple browser tabs or anything
like that. The real solution here is to use the fact that we have POST which
is specced as non-idempotent and not depend on the very small set of
technologies that purposefully disobeys the spec.

------
drdaeman
This applies to uWSGI (uwsgi_pass) as well, right?

~~~
Ralfp
I'm wondering about this myself. I'm looking into alternatives to Apache with
mod_wsgi.

~~~
drdaeman
Seems that it does. And I would have a solution if only nginx had
$request_uri_without_args...

The trick for uWSGI is to set `uwsgi_param PATH_INFO` not to $document_uri
(which won't work due to `rewrite ^ @nonidem last;`) but to the originally
requested URI. $request_uri _almost_ does it, but fails when the URI has query
arguments.

On the nginx mailing lists there is a suggestion to strip it myself with
Lua[1], but I'm surely not going to throw in Lua just for this.

[1]
[https://forum.nginx.org/read.php?2,215192,215195#msg-215195](https://forum.nginx.org/read.php?2,215192,215195#msg-215195)
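One Lua-free possibility is an nginx map with a regex capture. This is only a
sketch, and the variable name is made up:

```nginx
# Derive the request URI with its query string stripped,
# for use as PATH_INFO.
map $request_uri $request_uri_no_args {
    "~^(?<path>[^?]*)" $path;
}
```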

------
lziest
That's why I use openresty. I can use balancer_by_lua to customize my upstream
selection/retry strategy.

