
How and why to make your software faster - jstanley
http://incoherency.co.uk/blog/stories/how-to-make-software-faster.html
======
neverminder
> A process at my day job handles millions of URLs per day. As part of
> handling each URL, it checked whether the URL was present in the database,
> and then logged something if it was. This was implemented so that we could
> get interesting stats on how many of the URLs were already present at report
> time. It turned out that it was spending 30% of its time doing these
> database queries, there was still no automated process to generate the
> stats, and humans had long since stopped checking. Deleting that 1 line of
> code knocked 30% off the execution time of every URL.

Or you can make the logging part asynchronous (provided the server itself is
not maxed out) and cut that 30% off the request time.
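
Roughly, in Python (a toy sketch, not the actual system): push the log write
onto a background thread so the request path only pays for an in-memory
enqueue.

```python
import queue
import threading
import time

log_queue = queue.Queue()

def write_stats_to_db(url):
    # Stand-in for the slow database query from the quoted story.
    time.sleep(0.01)

def log_worker():
    # A single background thread drains the queue, so the request
    # path never waits on the database round trip.
    while True:
        write_stats_to_db(log_queue.get())
        log_queue.task_done()

threading.Thread(target=log_worker, daemon=True).start()

def handle_url(url):
    # ... the real per-URL work goes here ...
    log_queue.put(url)  # returns immediately; logging happens off-path
```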

~~~
sethammons
And toss in a bloom filter to test for membership and reduce the likelihood of
even needing to call the db.
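
A toy sketch of the idea in Python (size and hash count are arbitrary): a
Bloom filter never gives false negatives, so a "no" answer lets you skip the
DB query entirely, and only "probably yes" answers need a confirming query.

```python
import hashlib

class BloomFilter:
    """No false negatives; tunable rate of false positives."""

    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

urls = BloomFilter()
urls.add("http://example.com/seen-before")
# False means "definitely absent" -- skip the DB query entirely;
# True means "probably present" -- confirm with a real query.
print(urls.might_contain("http://example.com/seen-before"))  # True
print(urls.might_contain("http://example.com/brand-new"))    # almost surely False
```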

------
majke
> If I can't make your worst-performing page load 50% faster, you don't have
> to pay the bill.

Interesting approach. This is also nicely recursive, with each successive run
being more expensive. I would prefer that statement to have a cost cap, say
"... in three days of work".

~~~
collyw
Just buy a bigger box with double the memory?

~~~
mootothemax
>Just buy a bigger box with double the memory?

For what it's worth, a quite staggering number of sites I've fixed up have had
databases with varchar ID columns, and no indexes _anywhere_.

Thankfully the DB sizes have been small enough that fixing this has taken a
few seconds to a few minutes max of DB downtime, with instant, seriously-
impressive-looking results afterwards.
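
For illustration, here's what the before/after looks like in SQLite, driven
from Python (table and column names invented): adding the missing index turns
a full-table scan into an index search.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id VARCHAR PRIMARY KEY, email VARCHAR)")

# Without an index, every lookup on email scans the whole table:
print(db.execute("EXPLAIN QUERY PLAN SELECT * FROM users "
                 "WHERE email = 'a@example.com'").fetchall())
# -> [(..., 'SCAN users')]

db.execute("CREATE INDEX idx_users_email ON users (email)")
print(db.execute("EXPLAIN QUERY PLAN SELECT * FROM users "
                 "WHERE email = 'a@example.com'").fetchall())
# -> [(..., 'SEARCH users USING INDEX idx_users_email (email=?)')]
```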

~~~
jimcsharp
I wonder when this type of work will go away. I see it _everywhere_.

~~~
beachstartup
it never will. databases are hard and people are idiots. never mind something
as hard as SQL optimization and index design -- there are LOTS of people out
there who think creating a new logical database for every new customer is a
good idea, resulting in 2000+ databases on a system when there should be 4.
people who think HA is a waste of money because nothing has ever failed on
them. cheapskates who will under-spec deployments and then yell at you when it
doesn't fit their data. people are _stupid_. do not underestimate the
stupidity of your average tech industry "professional". this is why i think
there should be a traditional engineering certification process for this
stuff.

also, i simply don't buy the assertion that any ops or dev can be automated in
any meaningful way. the only thing it's doing is making it easier for morons
to shoot themselves in the foot by ignoring the experts and then run to the
same experts like a little child when everything catches on fire. see: recent
story of a guy who deleted 1500 systems at once. oops.

are we all doing more, or less work than 5 years ago? has all this cloud and
deployment tech made any less work for anyone, or destroyed any jobs? lol no.
i repeat emphatically LOL NO. such a claim is just absurd in my view. anyone
who claims such a nonsensical statement has never done actual ops work where
you get called if shit goes south.

the smartest thing amazon ever did was simply give people a button to push to
give them more money when they're faced with their own overwhelming stupidity.
ain't nobody to blame but yo'self then.

~~~
leesalminen
> there are LOTS of people out there who think creating a new logical database
> for every new customer is a good idea, resulting in 2000+ databases on a
> system when there should be 4.

I'm genuinely curious as to why you feel this way.

~~~
bpchaps
An ungodly number of reasons. The ability to manage that many, for one. Those
poor devops folk who have to deal with that.

On the more technical side, as an example, the number of open files can only
be so high. You really start to hit the limit when you do things like that,
which is when the most interesting kinds of application crashes happen. "Just
increase the file count." results in "Our report shows that an OldServer did
not include a necessary open file limit config change. Engineer #582 has been
fired."

~~~
leesalminen
CLI tooling has solved a lot of the problems that arise from having that
many. Scripts for migration, host transfer, creation, and deletion took some
time to build out, but they do work fine.

I suppose we are lucky to have a very consistent workload per logical
database.

This allowed us to calculate our costs ahead of time. We found that we could
keep DB costs at ~$1/customer/month. For a B2B SaaS product we were satisfied
with that cost.

------
basemi
"The key to making programs fast is to make them do practically nothing." I
believe the quote original is here[0]

Maybe: s/practically nothing/less

[0] [https://lists.freebsd.org/pipermail/freebsd-current/2010-Aug...](https://lists.freebsd.org/pipermail/freebsd-current/2010-August/019310.html)

~~~
majewsky
Exactly. I've heard a similar quote: "The only way to make a program run
faster is to make it do less."

~~~
kps
Do less _unnecessary_ work. Unfortunately that needs to be stated explicitly
in the age of function-follows-form pseudo-minimalism.
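
One concrete way a program can literally "do less": remember answers it has
already computed. A minimal Python sketch:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def expensive(n):
    # Imagine a genuinely slow computation here. Repeat calls with the
    # same argument now cost a dict lookup instead of a recomputation.
    return sum(i * i for i in range(n))

expensive(10_000_000)  # slow the first time
expensive(10_000_000)  # near-instant: the work simply isn't done again
```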

------
nxzero
>> "If I can't make your worst-performing page load 50% faster, you don't have
to pay the bill."

It's unlikely that the worst-performing page is the most valuable page, which
is to say that offers should focus on providing value.

~~~
jstanley
Fair point.

A page that takes 10s to load but that you never look at is a much worse
candidate than one that takes 8s to load and that you look at 20 times a day.

~~~
DiabloD3
I'm sorry, but if I had to look at a page 20 times a day that took 8 seconds
to load, I would help that web app commit suicide.

There is absolutely no call for a web app that badly written to exist in 2016.
Period.

~~~
mseebach
OK, that's fair.

So how about this: You go over there (points to corner) and make angry and
dogmatic proclamations of what should and shouldn't be, while the rest of us
go out into the world, where these apps do exist in large numbers and see
heavy usage supporting very valuable workflows, and collect obscene salaries
making them better.

------
jsingleton
Nice post, but why call out C and Perl when talking about web apps? Maybe
highlight some profilers for some more common web stacks.

For example, I'm currently writing a book on web app performance (focusing on
ASP.NET Core) and apart from the stuff built into Visual Studio there are also
tools such as Glimpse[0] and Prefix[1].

[0] [http://getglimpse.com](http://getglimpse.com)

[1] [http://www.prefix.io](http://www.prefix.io)

~~~
geerlingguy
And for PHP, XHProf and Blackfire (among others).

------
sph130
What's your thought on MongoDB? I've recently started a project that is now
running in prod for a customer, and I spent almost two weeks deciding between
MySQL and MongoDB. I ultimately went with MongoDB, but it's not my forte, and
I've been thinking that at some point the calls to the database are going to
slow down. I have no idea where to start looking to improve MongoDB
performance (it's a MEAN stack application). I'm sure I'll be doing a bit of
research on that soon, once I get the base feature set working.

~~~
jstanley
I've never used Mongo in anger, but that doesn't matter; my answer is:

Profile the application. Find out what it's spending too much time on, and
figure out how to make it do less of that.
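
For example, with Python's built-in profiler (the handler here is just a
stand-in for whatever code path your app spends its time in):

```python
import cProfile
import pstats

def handler():
    # Stand-in for the code path you suspect is slow.
    return sum(i * i for i in range(10**6))

cProfile.run("handler()", "profile.out")
# Show the ten entries with the most cumulative time:
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)
```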

~~~
pjc50
Can't upvote this enough. You cannot reliably improve what you cannot reliably
measure, and Amdahl's Law will tell you where to work.
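
For reference, Amdahl's Law: if the part you're optimizing is a fraction p of
total runtime and you speed it up by a factor s, the overall speedup is
1 / ((1 - p) + p/s). A quick check in Python, using the article's 30%
logging example:

```python
def amdahl(p, s):
    # p: fraction of runtime spent in the part being optimized
    # s: speedup factor achieved on that part
    return 1 / ((1 - p) + p / s)

# Deleting work that was 30% of runtime (s -> infinity, as in the
# article's logging story) caps the overall win at ~1.43x:
print(amdahl(0.30, 1e12))  # ~1.4286
# Doubling the speed of a 5% hotspot is barely measurable:
print(amdahl(0.05, 2))     # ~1.0256
```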

I wish you every success in this business - if it takes off, I might try it
myself. I found `oprofile` to be my preferred Linux profiler, and haven't
worked out what the corresponding Windows solution is yet.

------
known
Do you offer
[https://en.wikipedia.org/wiki/Code_refactoring](https://en.wikipedia.org/wiki/Code_refactoring)?

~~~
jstanley
Sure, email address in my profile.

------
Undertow_
Can someone ELI5 what profiling data is?

~~~
Roboprog
A common technique for looking at a program (not a database query) is "stack
sampling". A timer goes off N times a second, and records the call stack of
the process/threads. Then, statistics are gathered from the set of call
stacks.
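
A minimal Python sketch of the technique (Unix-only, and it samples
wall-clock time rather than CPU time, which a real profiler would do):

```python
import collections
import signal
import traceback

samples = collections.Counter()

def take_sample(signum, frame):
    # Record the current call stack as a tuple of function names.
    stack = tuple(f.name for f in traceback.extract_stack(frame))
    samples[stack] += 1

signal.signal(signal.SIGALRM, take_sample)
signal.setitimer(signal.ITIMER_REAL, 0.01, 0.01)  # ~100 samples/sec

def busy():  # stand-in workload under test
    return sum(i * i for i in range(5_000_000))

busy()
signal.setitimer(signal.ITIMER_REAL, 0, 0)  # stop the timer

# The most frequent stacks are where the time went:
for stack, count in samples.most_common(3):
    print(count, " -> ".join(stack))
```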

E.g. - "60% of the time is in the sequence main() -> process_unit() ->
validate_input() and the things called from there" or "45% of the time is in
all of the call sequences which then lead back into write_line()" or things
like that. Usually, you see patterns of a small number of slow functions that
everything depends upon and/or, arranging the call stacks into a sort of
"tree" of calls, a branch of the tree that you spend an inordinate amount of
time in.

There are newer tools that provide graphical representations of this data as
well (e.g. - show the tree as sort of a topographical map, with hotspots as
peaks, and the call tree as geological strata).

Of course, if your program has a giganto routine that it never leaves, you
might not learn much -- "do_all_the_work_inline() is executing 95% of the
time!", unless you start looking at things at the line number level. Blech.

~~~
Roboprog
For an (SQL) database, you would want to look at something called a query
planner (e.g. - "explain" command).

It describes the order in which tables are accessed, which indices (if any)
are used for each access, "and so on".
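
For example, SQLite's version of this, driven from Python (schema invented
for illustration): the plan lists the table access order and the index each
step uses.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")
db.execute("CREATE INDEX idx_orders_user ON orders (user_id)")

for row in db.execute("""
    EXPLAIN QUERY PLAN
    SELECT users.name, orders.id
    FROM users JOIN orders ON orders.user_id = users.id
    WHERE users.name = 'alice'
"""):
    print(row)
# e.g. (..., 'SCAN users')
#      (..., 'SEARCH orders USING INDEX idx_orders_user (user_id=?)')
```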

------
tombert
I don't agree that abstraction layers inherently hide inefficiencies.

In C++, `unique_ptr` is an "abstraction layer" with no runtime overhead, that
will probably make your code faster due to the better safety.

~~~
Piskvorrr
I don't see such claim anywhere in the article (your "inherently" vs. author's
"often").

~~~
tombert
That's true, but he makes the point that you should remove the abstractions,
and I'd argue that no, you should _fix_ the abstractions.

~~~
jstanley
Actually, that's not the point I was trying to make!

I was trying to say: "consider the abstractions, and make them more leaky when
necessary" (e.g. allow a flag to say "don't do the expensive calculation"),
not "remove the abstractions".

