

In search of performance – how we shaved 200ms off every POST request - Sinjo
https://gocardless.com/blog/in-search-of-performance-how-we-shaved-200ms-off-every-post-request/

======
dap
This is a great write-up of a tough technical problem, and I definitely agree
with their conclusion that "it's worth taking the time to understand your
stack." I especially liked that it wasn't preachy. It just explained their
problem and how they solved it. (It seems a lot of write-ups end up offering
very specific technical advice that grossly overgeneralizes from their
experience.)

My only question is how they got to haproxy as the root cause so quickly. In
my experience, comparing what's different between production and staging is a
long shot because there's so much that's different. Obviously workload can
matter a lot, and so can uptime and time since upgrade. So I'm curious if
haproxy was the first thing they saw that was different or if they just didn't
write about the dead ends.

~~~
edwhitesell
With proper tracking it is not difficult to see different versions of packages
installed. Checking run books/ops logs would also lead to a quick
understanding that staging was updated and production wasn't. Either of those
two things would make it relatively easy to find the discrepancy in versions.
Then, as they noted, a review of changelogs would have led to the patch they
found.

If you don't have processes and documentation to quickly point out those kinds
of differences between environments, you're doing something wrong.

~~~
dap
It's not that you can't find _some_ differences that way. As others have
pointed out, there should be little difference between the software deployed
in staging and production. But there can be differences in performance and
hardware configuration (unless you can afford to mirror your production
deployment exactly). Most unavoidably, there are problems you only hit after
cumulative amounts of uptime or load[1], which are nearly impossible to
simulate in a preproduction environment.

[1] For an example, see our recent outage related to having seen 200M
PostgreSQL transactions: [https://www.joyent.com/blog/manta-
postmortem-7-27-2015](https://www.joyent.com/blog/manta-postmortem-7-27-2015)

~~~
edwhitesell
Fair enough. I guess my point should have been: With proper tracking of
changes and work done between the environments, finding differing versions
could be the lowest-hanging fruit.

That was my response to your original question of how they got to haproxy as
the root cause so quickly.

Sure, mirroring hardware and performance can help, if you can afford it (which
is rare). However, you'd probably agree it's also pretty rare to find an issue
caused by cumulative amounts of uptime or load. Mismatched software versions
or configurations are a far more likely culprit in my experience.

Horses not zebras...

------
markbnj
>> To make things worse, Net::HTTP doesn't set TCP_NODELAY on the TCP socket
it opens, so it waits for acknowledgement of the first packet before sending
the second. This behaviour is a consequence of Nagle's algorithm.

I think it's less a consequence of Nagle's algorithm than the exact situation
the algorithm is intended to optimize: an app generating many small packets.
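For reference, this is roughly what setting TCP_NODELAY on a raw Ruby socket
looks like (a minimal sketch using a local throwaway server; the address and
port are just for the demo, not from the article):

```ruby
require "socket"

# Stand up a local server on an ephemeral port purely so we have
# something to connect to in this demo.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])

# Disable Nagle's algorithm: small writes go out immediately instead
# of being buffered while earlier data is still unacknowledged.
client.setsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY, 1)

# Read the option back to confirm it's set.
opt = client.getsockopt(Socket::IPPROTO_TCP, Socket::TCP_NODELAY)
puts opt.bool
```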

------
geofft
What are the cases where you want Nagle's algorithm enabled? It seems like the
primary use case is telnet and telnet-like services (like SSH), but even
there, Mosh does much better at balancing performance against congestion. For
non-interactive protocols, do you ever want Nagle's algorithm?

~~~
toast0
Nagle's algorithm would actually help in this case if there were a bit more
data to send: since Ruby is passing data a few bytes at a time, buffering it
until you have a full packet would be nice. It's just that the first bit is
sent right away, the second bit doesn't fill a packet, and there is no third
bit. If you were pipelining requests, the algorithm would be helpful.

~~~
geofft
But the client code already has information about whether it's planning to
write some more or not, so you could just depend on that, instead of using
heuristics about the peer's behavior. Even regular stdio-style buffering plus
an explicit flush would help more than Nagle's algorithm.
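A sketch of that buffer-then-flush idea: assemble the whole request before
writing, so headers and body leave in a single write rather than two small
packets that trip over Nagle plus delayed ACK. (The path, host, and body here
are made-up illustrative values, not the article's actual request.)

```ruby
require "socket"

# Hypothetical request pieces; body is 13 bytes, matching Content-Length.
headers = "POST /payments HTTP/1.1\r\nHost: example.com\r\n" \
          "Content-Length: 13\r\n\r\n"
body    = "amount=100GBP"

# Local throwaway server so the demo is self-contained.
server = TCPServer.new("127.0.0.1", 0)
client = TCPSocket.new("127.0.0.1", server.addr[1])
peer   = server.accept

# One write: headers and body are handed to the kernel together,
# so there is no small unacknowledged first segment to stall on.
client.write(headers + body)
client.close

received = peer.read   # the server sees the full request at once
puts received
```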

~~~
toast0
The client doesn't always have that information (especially if you're doing
Unixy things and piping through netcat), but when it knows, it should
certainly indicate that, which is why there's TCP_NODELAY and friends; I
couldn't find out exactly when TCP_NODELAY showed up, but old BSD man pages
[1] have the option and a date in 1986. In the absence of explicit information
from the client, I think Nagle's algorithm is a decent heuristic.

[1]
[https://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sekt...](https://www.freebsd.org/cgi/man.cgi?query=tcp&apropos=0&sektion=4&manpath=2.10+BSD&arch=default&format=html)

------
throwaway64908
"We use Ruby."

~~~
Ono-Sendai
That was their first problem.

~~~
aikah
> That was their first problem.

Where do you think you are, on reddit?

