Very few people know about this, and it's really scary. I'm happy people are voting this up to raise awareness of it.
Potential workarounds: you might think disabling `proxy_next_upstream timeout` will do, but that will also disable connection-timeout retry, which is not what you want!
Increasing `proxy_connect_timeout` is not an option either, because then you risk tying up too many connections in the nginx instance if the upstream server swallows SYN packets or whatnot.
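For reference, a minimal sketch of the knobs in question (upstream addresses and timeouts are made up). The catch is that the `timeout` condition of `proxy_next_upstream` covers connect timeouts and read timeouts alike, so you can't drop the dangerous retry without also dropping the harmless one:

    upstream app {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        location / {
            proxy_pass http://app;

            # Default is "error timeout": a read timeout on one backend causes
            # the (possibly non-idempotent) request to be replayed on the next.
            proxy_next_upstream error timeout;

            # Dropping "timeout" stops that replay, but it also stops retrying
            # backends that merely time out on connect, the retry you do want:
            # proxy_next_upstream error;

            # Connect and read timeouts are separate directives, but the retry
            # condition above does not distinguish between them.
            proxy_connect_timeout 5s;
            proxy_read_timeout 60s;
        }
    }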
The real workaround: Use haproxy. Seriously.
1. Exactly this. We had mystery double trades from our clients, and it took us a long time to realise that nginx was assuming we had timed out and routing the request to the next server.
2. It doesn't do health checks. When a server goes down, it will send 1 out of every 8 real requests to the down server to see if it responds. Having disabled resubmitting of requests to avoid the double-trade issue above, this means that when one of our servers is down, 1 out of every 8 requests gets an nginx proxy error, which is significant when you have multiple API calls on a single page (see the config sketch after this list for the relevant nginx knobs).
3. This isn't something I've personally hit, so I can't explain the nitty gritty, but it's something one of my coworkers dealt with: Outlook webmail does something weird where it opens a connection with a 1 GB content length, then sends data continually through that connection, sort of like a push-notification hack. Nginx, instead of passing traffic straight through, will collect all the data in the response until it reaches the content length given in the header (or until the connection is closed). I don't know if nginx is to blame for this one or not, but I do feel that when I send data through the proxy, it should go right through to the client, not be held at the proxy until more data is sent.
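For what it's worth, the closest nginx-side knobs for points 2 and 3 are the passive max_fails/fail_timeout parameters and proxy_buffering. A rough sketch with made-up addresses; note that open-source nginx has no active health checks, so a "down" backend is still probed with live client traffic once fail_timeout expires:

    upstream app {
        # Passive checks only: after max_fails failures the server is skipped
        # for fail_timeout, then the next real client request is the probe.
        # There is no out-of-band health check in open-source nginx.
        server 10.0.0.1:8080 max_fails=3 fail_timeout=30s;
        server 10.0.0.2:8080 max_fails=3 fail_timeout=30s;
    }

    server {
        location / {
            proxy_pass http://app;

            # Stream the response to the client as it arrives instead of
            # accumulating it in proxy buffers (relevant to the long-lived
            # "push"-style responses described in point 3).
            proxy_buffering off;
        }
    }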
HAProxy also solved our issues and is now my go-to proxy. Data goes straight through, it has separate health checks, and it adheres better to HTTP standards. It can also be used for other network protocols, which is a bonus.
Why is this not what you want? Are you using the reverse proxy as a load balancer in front of multiple servers? Otherwise, if it's a 1:1 proxy (for something like SSL termination), wouldn't it be acceptable for nginx to fail/time out when the server does?
That's extremely likely.
Somebody with an Nginx reverse proxy is probably using it for high availability, load balancing, and static-file caching, likely all at the same time. That is what it is good for.
However, what would really need to happen is for proxy_next_upstream to be disabled only when data has already been written to or read from the backend (preferably configurable per backend or location for either of those two conditions). Right now you basically lose the redundancy on non-idempotent requests and immediately return the error. Or maybe I read the configuration incorrectly.
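One way to approximate that split today is to scope the setting per location: keep the failover for idempotent traffic and switch it off where a replayed request would be dangerous. A sketch with assumed location names (and, if I remember right, nginx 1.9.13 later stopped retrying non-idempotent methods by default unless `non_idempotent` is explicitly added to `proxy_next_upstream`):

    upstream app {
        server 10.0.0.1:8080;
        server 10.0.0.2:8080;
    }

    server {
        # Idempotent reads: keep the failover behaviour.
        location / {
            proxy_pass http://app;
            proxy_next_upstream error timeout;
        }

        # Endpoints where a replayed POST could double-submit: never try the
        # next upstream, just surface the error to the client.
        location /orders {
            proxy_pass http://app;
            proxy_next_upstream off;
        }
    }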
We're also going to prioritize a complete fix in the product, and encourage your comments and input on this ticket: https://trac.nginx.org/nginx/ticket/488
Disclaimer: I work @ NGINX. Thanks, Owen
I mean, a bug's a bug, but this was known for two years!
How? Probably based on the fact that otherwise it's a frigging great app that powers like 15% of the web, including some of the biggest sites out there.
Because obviously we're all 20yo on HN...
I can imagine a future world based on pure functional programming where this is no longer the case. You'd need to rewrite the operating system too, which is the explicit goal of the Urbit project.
Example situation: you have a request that processes an uploaded file, which is only for admin purposes, so you didn't take the time to use a queuing system to do the heavy lifting in the background. Then your customer uploads a file that takes much longer than normal and the request times out; that file is then sent multiple times to the app server and the user sees multiple uploads.
The behavior in terms of errors should definitely be different depending on whether sending the request failed (in which case resending to the next upstream is fine) or receiving the response failed (in which case it's often not a good idea to resend).
Although, from the standpoint of the RFC, everything on the server side (including nginx itself) is considered the web application, nginx probably takes the implicit position that dealing with multiple requests on non-idempotent methods such as POST is really a problem that the proxied web app itself should cope with.
But then nginx puts the web app in an untenable position. Consider the example of a non-idempotent POST to create a new user account. The new user account includes a username, email address, and password. Because it's proxied, nginx creates a duplicate request for this new user account in the circumstances described in this bug report.
How should the web app deal with the duplicate request?
a. Accept the first request (200 OK) and decline the second request since the account was already created (i.e., 409 Conflict), or
b. Create two duplicate user accounts (200 OK for both)
Obviously, the ONLY correct response is the first one, but what happens next is really up to nginx: will the client receive the 409 Conflict (etc) or will it receive the 200?
Well, who knows?! It's completely indeterminate.
If the client gets the 200 OK, great. But what if it doesn't? These duplicate requests seem like they could lead to an nginx race condition as well. And what gets logged?
This behavior clearly violates both the spirit and the letter of RFC 7231 (as well as being an obviously poor engineering decision!).
Note also the long time (years!) that this has been a known, outstanding bug without any action taken. Another commenter actually said this caused a cascading failure that killed their app.
Bottom line... nginx is a great, fast static server, but definitely not a good proxy for dynamic apps. We're trying to figure out how quickly we can migrate Userify (plug: SSH key management for EC2) from nginx to HAProxy, since we use it to front-end our REST API.
That said, I don't know if nginx does any better on this issue if you set it to HTTP/1.1 mode. I assume not, to be honest.
No, only GET and HEAD are safe in RFC 1945.
> the idea of idempotence itself was not yet present and it doesn't have any language I'm aware of to restrict client retries on non-safe methods.
That actually doesn't really change the situation that much: without an idempotence guarantee, there is no protocol-level basis for a proxy (reverse or otherwise) to assume that a non-safe method is repeatable. Under HTTP 1.0, by the RFC alone, there's no justification for treating anything other than GET or HEAD as reliably repeatable. (Except perhaps that the operations described by PUT and DELETE are at least arguably, as specified, idempotent, even though the term is not invoked and the guarantee is not made express.)
Brainfart typo, corrected.
With nginx retrying:
1. nginx times out while the server is still processing the request
2. nginx makes the request to a second server, and the second server returns "account already exists"
Without the retry:
1. nginx times out while the server is still processing the request and returns an error
2. the user attempts to create the account again and the server returns "account already exists"
There is no unique identifier in a line-item count, and thus no equivalent of the "account already exists" check.
Worst-case scenario: the user gets extra item(s) in their cart, doesn't really look at the totals, and orders, pays for, and receives them.
One of the upstreams is running `iptables -A OUTPUT -p tcp --sport 8080 --tcp-flags PSH PSH -j DROP`.
curl-ing the nginx location configured for `proxy_pass` returns 504 Gateway Timeout on half of the requests, as expected.
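For context, a minimal sketch of the kind of setup being exercised here (addresses, ports, and timeouts are assumed, not taken from the original report). One backend accepts the TCP connection but, thanks to the iptables rule above, never delivers response data, so nginx's read timeout fires; whether that timed-out request is then replayed on the healthy backend or surfaced as a 504 depends on the proxy_next_upstream setting, which is the whole point of the thread:

    upstream test_pool {
        server 127.0.0.1:8080;   # iptables silently drops outgoing PSH packets here
        server 127.0.0.1:8081;   # healthy backend
    }

    server {
        listen 8000;

        location / {
            proxy_pass http://test_pool;
            # Keep the test short: fail fast when round-robin picks the
            # black-holed backend.
            proxy_read_timeout 5s;
        }
    }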
But there's a layer beneath HTTP as well. If all you get back is a TCP RST, did the request succeed or fail? How about if you get an ICMP unreachable or just a timeout ... should you retry?
So, the Internet being what it is, it is probably not a bad idea to aim for idempotence for the critical bits.
Edit: turns out I was wrong and assumed PUT should fail if the resource doesn't exist, which isn't how it works. (Probably because of writing apps that deprecate it in favor of PATCH.)
Idempotence only means that the same single method repeated additional times on its own will not produce different end states. It doesn't necessarily guarantee this for combinations of methods in different orderings.
It doesn't stop PUT/DELETE/PUT/DELETE from having different results than PUT/DELETE/DELETE/PUT to the same resource. (You can of course ensure that these are equivalent in a particular HTTP-compliant application, but it goes beyond the base semantics of HTTP to do so.)
I think you're equating my claims about what methods are idempotent with my claims about what reorderings matter.
I'd say that anybody that sends a pair of PUT/DELETE requests in fast succession over the web and expects a stable result is a fool. This should have no effect on practice, because nobody should be relying on the ordering anyway.
>>The PUT method requests that the enclosed entity be stored under the supplied Request-URI. If the Request-URI refers to an already existing resource, the enclosed entity SHOULD be considered as a modified version of the one residing on the origin server. If the Request-URI does not point to an existing resource, and that URI is capable of being defined as a new resource by the requesting user agent, the origin server can create the resource with that URI
https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html (section 9.6)
* Literally the first thing RFC 7231 says about PUT is "The PUT method requests that the state of the target resource be created or replaced […]". RFC 7231 takes into account many changes in HTTP practice over the past decade (even bizarre ones like POST-to-GET on a 301 redirect); if create-on-PUT were frowned upon, it would be called out.
* PUT as described in RFC 7231 is the same thing as UPSERT in an RDBMS, or a write operation in a key-value store. These are certainly not uncommon DB operations; their REST analogue is similarly useful.
Here are some examples:
* PUT is how documents are created in WebDAV. WebDAV is multi-user, so two users may decide to create a document with the same name, just like on any file system. If-None-Match: * is the only way to support the O_EXCL flag on POSIX open(2).
* A resource which represents attributes of arbitrary external resources will have a URI named after the external resource (e.g. UPC or SHA-1, etc.), and therefore must be created with PUT. If-None-Match: * is the only way to prevent lost updates when the external resource is first made known to the system.
PUT-as-create is sound design supported by precedent for any system where the keys have a priori meaning.
People are writing custom HTTP application servers and making PUT do anything and everything. Idempotence doesn't usually enter the picture.
Huh??? First of all, that's a pretty serious statement to be making with no evidence to support it. Plus, that is no excuse for any server that is NOT compliant.
The very fact that nginx decides to create HTTP status codes, willy-nilly, for its own use at least raises the suspicion that strict compliance is not a priority for them. Thankfully, it is for other web servers.
It's best to design so that duplicate POSTs are handled sensibly (e.g. you don't make a user pay for the same product twice), but the response to the second POST is unlikely to be the same as the first one, so they aren't idempotent.
More difficult cases are where an action could legitimately be performed twice, e.g. adding an item to a shopping cart. You must differentiate between a wrongly duplicated request and a real request to add a second item. One way to do this is to add parameters to the POST so that it can be identified as a duplicate. But it can be tricky to do this without holding a lot of extra state in the server application, and there are all sorts of concurrency problems when you have a cluster of servers.
But then again, some things must not be idempotent: e.g. "shuffle this deck of cards in an order that is random to me".
Edit: On second thought, you could make that idempotent too, albeit at the cost of increased server load and added architectural complexity -- you would just have to verify that the deck has had some reordering since that client's request, and not make any further reorderings in response to that client, since from their perspective it's still randomized.
The trick for uWSGI is to set `uwsgi_param PATH_INFO` not to $document_uri (which won't work due to `rewrite ^ @nonidem last;`) but to the originally requested URI. $request_uri almost does it, but fails when the URI has query arguments.
On the nginx mailing lists there is a suggestion to strip it myself with Lua, but I'm certainly not going to throw in Lua just for this.
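One Lua-free option, assuming a map is acceptable: capture everything before the query string from $request_uri and hand that to PATH_INFO. A sketch only; the socket path is made up, and $request_uri is still the raw, URL-encoded URI, which may or may not be good enough for the app:

    # Strip the query string from $request_uri without Lua.
    map $request_uri $request_path {
        "~^(?<path>[^?]*)" $path;
    }

    server {
        location / {
            uwsgi_pass unix:/run/app.sock;
            # ...the usual uwsgi_param lines (QUERY_STRING, REQUEST_METHOD, etc.)...
            uwsgi_param PATH_INFO $request_path;
        }
    }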