
Ask HN: What are the 20% tips that will get my code to perform 80% faster? - echo272
I'm looking for resources that can dramatically improve the execution of my code. The 20% of tricks that will make my code perform 80% faster.
======
sokoloff
I wish I knew whose quote this was: "Use your intuition to ASK questions, not
to answer them."

Specific to this topic:

Use a profiler to determine where you're spending your time. Don't guess;
measure.

When you find a candidate area to work on, determine whether caching or a
fundamental algorithm change could work. Those will usually get you far
greater gains than optimizing code as-is.
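In Python, for instance, "measure, don't guess" is a few lines with the standard library's cProfile (the `slow_sum` function here is just an illustrative stand-in for your own hotspot):

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately naive: builds a throwaway list on every iteration."""
    total = 0
    for i in range(n):
        total += sum([i] * 100)
    return total

profiler = cProfile.Profile()
profiler.enable()
result = slow_sum(10_000)
profiler.disable()

# Show only the top few entries: that's where your time actually goes.
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
```

The report will name `slow_sum` as the hotspot; only then is it worth asking whether caching or a different algorithm applies there.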

~~~
api
I generally agree but there are a few caveats:

(1) A profiler won't tell you that there is an algorithm that is O(n) while
the one you're using is O(n^2). Only reasoning about your problem will do
that.

(2) Most profilers won't tell you that much about cache effects, and those can
be _huge_ in some cases. This is more of an issue with tight-loop math-heavy
code like codecs and crypto and primitive data structures and the like.

(3) Some code can be "uniformly slow" or "structurally slow everywhere." An
example would be C++ code that over-uses inheritance patterns and defines
every single method as virtual. In that case you are losing massive cycles to
indirect fetch/call instructions, and a profiler will not tell you this. It
will only show you the relative hotspots, not the absolute overall slowness of
this type of code. Another example would be over-engineered code. A profiler
will help you optimize down the problem areas in a pile of over-engineered
spaghetti, but it will not help you make the overall code base more elegant to
realize large overall gains. This is often a special case of point #1.

~~~
sokoloff
1. Yes, but if you use a representative-size dataset and the profiler doesn't
show that as a hotspot, there's no point in changing algorithms.

2. Agreed, but remember the value of the profiler is telling you, "You're
spending 97% of your time in this small subset of your code," and your job is
to conclude, "Hmm, I should look there for improvements and not at the whole
rest of the program." It can't tell you what changes to make, but it keeps you
from spending time optimizing what's in the 3% of runtime.

3. Agreed, with the caveat that most profilers I've used did tell me the
absolute overall slowness in addition to the relative slowness, but that
wasn't the major thrust of your comment, so I'll just observe that small
difference and otherwise agree.

------
aaronchall
Code that you don't write consumes zero clock cycles.

Learn your language well, so that you can use data structures and builtin
functions as they are intended, and you can expect them to be pre-optimized
for you. Learn in principle what is optimal, and write in that fashion.
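A tiny Python illustration of leaning on pre-optimized builtins instead of hand-rolling the loop (sizes are illustrative):

```python
# Repeated string += can copy the accumulated string over and over,
# while str.join makes a single pass in optimized C.
parts = [str(i) for i in range(1_000)]

concatenated = ""
for p in parts:
    concatenated += p      # can degrade to quadratic behavior

joined = "".join(parts)    # the builtin, pre-optimized way

assert concatenated == joined
```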

Know the documentation, and follow best practices that the docs recommend -
when the compiler/language gets updated, you'll get free optimizations.

Use semantic names, none of this i, j, x, y stuff (unless you're writing
totally abstract math) and don't use nebulous names like "helper," when you
can be more precise. Better names help you think more precisely about your
code and what you want it to do, so you can avoid unnecessary transformations,
materializations, and calculations.

When you've finished the program (all the unittests and acceptance tests
pass), _then_ profile. Focus on the bottlenecks. In this way, you'll get your
optimal Pareto performing code.

~~~
tbirdz
Code that you didn't write but that is being executed still consumes clock
cycles. Standard library functions and data structures still have code that
executes on the machine; you can't just handwave it away as consuming zero
cycles just because you didn't write it.

~~~
saalweachter
I believe the intention was more Zen -- don't write code that isn't really
necessary -- and the "use standard libraries" code was separate. (If you are
using a language where the standard libraries are so poorly implemented that
you can easily write better versions, you might want to consider using a
different language.)

I would also add the cargo cult suggestion of "precompute and cache everything
you can". Whatever can be done in an indexing process instead of a search
process should be; whatever doesn't need to happen at serving time shouldn't.

~~~
esailija
> If you are using a language where the standard libraries are so poorly
> implemented that you can easily write better versions, you might want to
> consider using a different language.

Standard libraries are typically generalized for a wide array of use cases;
you can easily write a much faster implementation specialized to your needs if
you have any idea what you are doing.

Sometimes standard libraries handle such ridiculous cases (e.g. many ES5 array
methods and Function.prototype.bind) that your needs don't even need to be
that special.

------
falcolas
This is fairly specific optimization advice which applies to only a subset of
people's code, but: Write your own SQL queries.

ORMs are notorious for writing inefficient queries once you get more
complicated than selecting a single row from a simple table, and trying to
optimize them without writing a SQL query will make your code all but
unreadable. This isn't to say don't use an ORM, but if you need to optimize,
your code speed and code clarity will be better off with raw SQL.
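A sketch of the classic N+1 query pattern a naive ORM loop can generate, next to one hand-written join (using the standard library's sqlite3; the schema and data are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE books (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Bentley'), (2, 'Knuth');
    INSERT INTO books VALUES (1, 1, 'Programming Pearls'), (2, 2, 'TAOCP');
""")

# N+1 style: one query per author, the shape a lazy-loading ORM often emits.
n_plus_1 = []
for (author_id, name) in conn.execute(
        "SELECT id, name FROM authors ORDER BY id"):
    for (title,) in conn.execute(
            "SELECT title FROM books WHERE author_id = ? ORDER BY id",
            (author_id,)):
        n_plus_1.append((name, title))

# Hand-written SQL: one round trip, one join.
joined = conn.execute("""
    SELECT a.name, b.title FROM authors a
    JOIN books b ON b.author_id = a.id
    ORDER BY a.id, b.id
""").fetchall()

assert n_plus_1 == joined
```

With two authors the difference is invisible; with thousands of rows the per-row queries dominate, and the join stays one statement.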

More generalized performance advice: Keep heap memory allocations to an
absolute minimum. Heap access is more likely to result in cache misses, and
the bookkeeping related to allocating and freeing memory can quickly add up
(even if you're not doing it explicitly).

Globals and stack allocations are almost universally faster.

A resource which popped up on HN yesterday might offer a bit more insight on
how to design for optimal performance:
[https://news.ycombinator.com/item?id=8978081](https://news.ycombinator.com/item?id=8978081)

~~~
mgkimsal
Log and time your SQL queries, don't assume they'll always be worse. Sometimes
you get bad SQL generated because ... the tables/objects were not set up
properly in the first place.

I'm an ORM-first person, and look for slow queries at certain points in
development. Usually about 90+% of the queries from the ORM layer are fine and
10% need to be rerolled by hand for better performance.

The other thing about ORMs I've learned in the last few years is that often
you don't really need an object back in the first place, if you're just
reading the results vs. modifying/saving. Hibernate projections (IIRC) just
give you back a map, and save the overhead of creating an object (or
object graph) to hand back.

------
mmccaff
It's a bit unique to each situation, but a few topics that come to mind
immediately:

* Profile & measure to narrow down where time is being spent

* Memoize repeated function calls

* Cache data that doesn't frequently change

* Minify web assets

* Gzip compress web output (a la mod_deflate)

* Load balance

* Shard

* Strategic database indexing

* Asynchronous workers when beneficial

* Use the correct data structures

* Minimize Disk I/O

^ I suppose the last few are architecture-related more so than code.
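The memoization bullet above is nearly free in many languages; in Python it is one decorator (the `fib` function is just an illustrative stand-in):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=None)
def fib(n):
    global calls
    calls += 1  # only counts cache misses: the body runs once per distinct n
    return n if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(30)
# Without the cache this takes well over a million calls; with it, only 31.
```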

These topics are mostly specific to web, but, here is a "curated list of Web
Performance Optimization"

[https://github.com/davidsonfellipe/awesome-wpo](https://github.com/davidsonfellipe/awesome-wpo)

Since nobody has quoted Donald Knuth yet (and this quote sometimes gets bent
to fit situations that it shouldn't):

"We should forget about small efficiencies, say about 97% of the time:
premature optimization is the root of all evil"

------
jacquesm
Use a profiler.

Make your code do less work.

Use less memory so more of your code/data fits in the CPU cache.

Understand the algorithms that you use, their strengths and weaknesses.

No magic tricks, just hard work.

------
chrisbennet
Trust no one, not even yourself: Profile your code.

That said, if you aren't profiling already you are probably trolling us to get
suggestions for your next blog article. ;-)

As others have mentioned, use better algorithms, e.g. it makes no sense to
speed up a bubble sort routine instead of switching to a quicksort approach.

A lot of speed improvements come from just _not_ doing stuff in the first
place vs. speeding up a naive or brute force approach.

For example, suppose you want to determine when something enters the frame of
a camera image that is 640x480 pixels big.

The naive approach would be to compare every pixel of the frame to every pixel
of the previous frame. The area of the frame is Width x Height: 307,200
pixels.

A more efficient approach is to process just the edges (perimeter) of the
image/frame, since nothing can enter the frame without crossing the perimeter
first. The perimeter is 2 x Width + 2 x Height: 2,240 pixels.

The 1st approach is roughly 137 times slower (307,200 / 2,240) and no amount
of optimizing it is going to make it as fast as the 2nd approach.
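A minimal sketch of the perimeter-only check (the frames here are illustrative all-zero arrays; counting each corner pixel exactly once, a 640x480 perimeter is 2·(640 + 480) − 4 = 2,236 pixels):

```python
W, H = 640, 480

def perimeter_coords(w, h):
    """Yield each border pixel of a w x h frame exactly once."""
    for x in range(w):
        yield (x, 0)
        yield (x, h - 1)
    for y in range(1, h - 1):
        yield (0, y)
        yield (w - 1, y)

def something_entered(prev, curr):
    # Compare only the border pixels of the two frames.
    return any(prev[y][x] != curr[y][x] for (x, y) in perimeter_coords(W, H))

frame_a = [[0] * W for _ in range(H)]
frame_b = [[0] * W for _ in range(H)]
frame_b[0][10] = 255  # something crosses the top edge

assert len(list(perimeter_coords(W, H))) == 2 * (W + H) - 4  # 2,236 pixels
assert not something_entered(frame_a, frame_a)
assert something_entered(frame_a, frame_b)
```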

~~~
vardump
If your pixels are a byte each and the data is 64-byte aligned in memory,
you're still reading 640 * 2 + 128 * (480 - 2) = 62,464 bytes from memory,
far more than the perimeter pixel count. That's because you can't read less
than a cache line, which is almost always 64 bytes. The performance win could
be a lot less than what you expected.
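The arithmetic above can be checked directly (assuming 1-byte pixels, 64-byte cache lines, and 64-byte-aligned rows):

```python
CACHE_LINE = 64
W, H = 640, 480

# Top and bottom rows are read in full: ceil(W / 64) cache lines each.
full_row_bytes = -(-W // CACHE_LINE) * CACHE_LINE   # 640 bytes

# Each middle row touches one cache line at the left edge and one at the right.
middle_row_bytes = 2 * CACHE_LINE                   # 128 bytes

total = 2 * full_row_bytes + (H - 2) * middle_row_bytes
print(total)  # 62464
```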

------
14e3d901150a21d
Respect CPU caches and locality. Keeping your data structures in cache or even
packed together can result in a nice performance gain.

I've seen benchmarks where inserting into an array was significantly faster
than inserting into a linked list. This can be surprising considering most
algorithm texts I've seen say the complexity of the latter is better. From
what I understand locality is part of the reason inserting into the array is
faster.

------
bane
\- Learn about big-O complexity.

\- Learn the big-O complexity of your language's data structures.

\- Benchmark places in your code that do lots of things with those data
structures, especially loops. Write a couple of variations with the structures
with the best big-O complexity to find the fastest one for your use case (it
can sometimes be surprising).

\- Conditionals in loops can hide all kinds of complexity. If you are
computing the same thing on each evaluation of a loop, compute it once before
entering the loop instead.

\- Try to do as much with the data that's in memory as possible before
replacing it with stuff further down the memory hierarchy. If your language
lets you care about L1/L2 cache optimization, learn about that. Nothing will
slow code down like having to move data up and down the memory hierarchy.

\- If you have to operate on lots of data, look for ways to parallelize work
on that data. Modern CPUs have lots of cores; single-threading only uses a
fraction of your computational power.

\- If you have complex conditionals, try to find ways to optimize and limit
the branching: short-circuit booleans, eliminate expensive-to-compute
conditionals, etc. Sometimes complex business rules can be rewritten in
simpler ways that greatly reduce branch mispredictions.

\- Try to squeeze as much performance out of a single machine as possible. As
soon as you start sharding your code across multiple systems, your memory
hierarchy problem becomes a network bandwidth problem, which is about as bad
as it gets. If your problem fits on one machine, it can almost always be made
faster there than by splitting it across n machines.

\- Spend time really thinking about your algorithms. I've seen all kinds of
cases where a slow algorithm could be trivially sped up by taking a different
approach.
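The loop-hoisting point above, sketched in Python (the threshold computation is an illustrative stand-in for any loop-invariant work):

```python
import math

data = list(range(100_000))

# Before: the loop-invariant threshold is recomputed on every iteration.
slow = [x for x in data if x > math.sqrt(50_000) * 2]

# After: the invariant work is hoisted out and computed once.
threshold = math.sqrt(50_000) * 2
fast = [x for x in data if x > threshold]

assert slow == fast
```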

------
JoulinRouge
I suggest you read a book about algorithms. This one is pretty good:
"Introduction to Algorithms" by Cormen, Leiserson, Rivest, and Stein.

~~~
andrewstuart2
There is also a lot of great material on Coursera, MIT OpenCourseWare, etc. on
algorithms.

------
edw519
Stop worrying about how well your code performs and focus on more important
things.

Seriously.

I'm not saying that code performance isn't important; of course it is. It's
just that if you're already fairly competent, then code performance is
probably not your biggest problem. You're looking for a magic bullet that may
not be there.

I'd be more concerned with:

\- How easily can another programmer maintain your code? We all tend to
believe that our programs will be just right when we finish, but the sad truth
is that others will probably change them significantly over the years. Your
wish to "squeeze a little more performance" out of your code may actually be
counterproductive; no one will understand your clever optimizations and your
code will become unmaintainable. Remember, you're writing for the next
programmer as much as for the machine.

\- How well are your databases designed? Do this well and you won't have to
worry about optimizing your code as much. Take care of the microseconds so
that you won't have to worry about the nanoseconds.

\- How well are your variables named? Again, this doesn't affect performance,
but boy does it ever help maintenance.

\- DRY (Don't repeat yourself.) For many reasons, including performance.

\- How precise is your definition of "what" to build? Building the wrong thing
to run very fast is much less important than building the right thing to run
fairly fast.

\- Are you doing the right things on the client and the right things on the
server? For example, if you're validating data on the client and building the
UI on the server (under some conditions), then it really doesn't matter how
fast it runs. You're probably doing it wrong.

\- Are you especially careful about recursion, early exits, sorting,
filtering, parsing, and other fairly standard stuff? Do these things right so
you won't have to worry about performance. Do them wrong and all the fine
tuning in the world won't help.

\- When all else fails, find a peer reviewer whose opinion you respect.
They'll probably find the stupid things your blind eye misses. You'll remember
their feedback and probably never make that mistake again.

Have fun doing the important things well and stop worrying about fine tuning:
a philosophy that has worked very well for me for years. Hope this thinking
helps you too.

~~~
maccard
> Stop worrying about how well your code performs and focus on more important
> things.

It's very naive to think that that's an acceptable tradeoff in every
situation. Would you accept it if chrome/FF/<browser of choice> suddenly
consumed an extra 30% of memory and CPU time after a single update, because
the developers decided that the HTML rendering code wasn't readable enough? Or
if you're writing code for cars that needs a response time in sub millisecond
ranges to delay for hundreds of milliseconds before engaging traction control
because the code wasn't well labelled, so it was rewritten?

There are plenty of cases where performance is absolutely critical, and it is
the most important thing.

~~~
fat0wl
Yep, there are also a lot of situations in enterprise where you don't exactly
know how your code will be used in the near future (within 5 years it could
grow from soft releases to select test customers to exponentially more
customers using it).

Because of that I'd say: it's not like I see a lot of people get raked over
the coals for it, but there is definitely some organizational disappointment
when a silly scaling bug crops up in production[1]. I think as a mature dev it
is proper to internalize as many performance-related techniques as possible,
so that you can write more performant code with each subsequent project.
Sometimes it doesn't even affect the number of initial coding hours; it is
just one of the freebie gains that comes with experience.

[1] Many more such bugs would probably occur if the company didn't
occasionally get around to load-testing, so that should be part of CI or dev
cycle if possible.

------
lectrick
Have a test suite that is 95% unit tests. And structure it in a way that tests
can run in parallel, across all cores.

FROM THE BEGINNING. Because it's nearly impossible to make this change after
the fact.

Also, use a code profiler and commit the profile data to the repo! This will
allow you to plot performance issues over time and watch for big upticks.

This alone will make your code orders of magnitude better (and likely
faster). You will catch concurrency issues before they become a huge problem,
your test suite runtime won't be as much of an impediment to your development
process, and you will simply write better, more modular, more maintainable
code by having to unit-test every component in isolation.

------
Swizec
When writing if-statements that have an AND, always put the quickest executing
condition first.

Don't use ORM's naively.

Be careful about what you call in a loop.

Those three alone account for most of the execution speed issues I've
encountered. And they're all low-hanging fruit.

~~~
jessermeyer
>always put the quickest executing condition first.

Isn't it to put the condition that's most probably true first?

~~~
Someone
When choosing between

    A and B

and

    B and A

you want to compare expected execution times, which are:

    E(A) + p(A) E(B)

(E(X) = expected execution time of X, p(X) = probability that X evaluates to
true) and

    E(B) + p(B) E(A)

The first is larger if

    (1 - p(B)) E(A) > (1 - p(A)) E(B)

End result is that it may be better to put the expensive call first, if the
probability of the cheap call being true is much higher than that of the
expensive call.

[Feel free to extend this model by adding the time needed to make or not make
the branch around the second part]

In practice, if one is a function call and the other is not, put the function
call last, then, if performance is inadequate, measure.
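A quick numeric check of the model above (the costs and probabilities are made-up numbers, chosen to show both outcomes):

```python
def expected_cost(e_first, p_first, e_second):
    # Cost of "first and second": always evaluate the first condition;
    # evaluate the second only when the first came out true.
    return e_first + p_first * e_second

# Case 1: the cheap, usually-false condition should go first.
E_A, p_A = 5.0, 0.9   # expensive, usually true
E_B, p_B = 1.0, 0.1   # cheap, usually false
assert expected_cost(E_B, p_B, E_A) < expected_cost(E_A, p_A, E_B)  # 1.5 < 5.9

# Case 2: the expensive call first can win, if it is rarely true
# while the cheap one is almost always true.
E_A, p_A = 5.0, 0.01
E_B, p_B = 1.0, 0.99
assert expected_cost(E_A, p_A, E_B) < expected_cost(E_B, p_B, E_A)  # 5.01 < 5.95

# Both cases agree with the inequality above:
# "A and B" costs more than "B and A" iff (1 - p(B)) E(A) > (1 - p(A)) E(B).
```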

~~~
TheLoneWolfling
This is also one of those things that an optimizing compiler with profiler
(or, equivalently, a JITter) can do. (I don't know if any compilers currently
actually do so.)

------
eterm
Reduce round trips.

That might be round trips to a database, round trips to a caching layer, or
round trips from client to server.

Generally speaking, grab as much data as you'll need (but not more).
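A sketch of the database case in Python (sqlite3 here is in-process, so there is no real network; the same shape applies to client/server databases and HTTP APIs, where each round trip adds real latency):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

rows = [(i, f"user{i}") for i in range(1000)]

# One statement per row would mean 1000 round trips:
#     for row in rows:
#         conn.execute("INSERT INTO users VALUES (?, ?)", row)

# Batched: a single call carrying all the data you need to send.
conn.executemany("INSERT INTO users VALUES (?, ?)", rows)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
assert count == 1000
```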

------
lebinh
Measuring and profiling the code will always be the best thing to do, but for
simple 20%-effort tips I think it would be:

* Don't write your own data structures; stick to the standard library or, in some rare cases, a popular and well-known third-party lib.

* Always prefer a Hash or a Set over anything else.

* Beware of any nested loops.

* Have some local / external caches and try memoization.

* Try parallelizing or, better, non-blocking waits for all I/O operations.

------
andrewstuart2
Use hash tables, hash maps, dictionaries, JavaScript objects, etc. (whatever
it's called in your language of choice) for indexing.

This gives you constant time access to data by whatever properties you want to
index on (usually by id).
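For instance, in Python (the records are made up for illustration):

```python
records = [
    {"id": 7, "name": "ada"},
    {"id": 3, "name": "linus"},
    {"id": 9, "name": "grace"},
]

# Build the index once: O(n)...
by_id = {r["id"]: r for r in records}

# ...then each lookup is amortized constant time, instead of
# scanning the whole list for every lookup.
assert by_id[3]["name"] == "linus"
```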

Also, use libraries and reuse your own code/routines. If you find a faster way
to do something, _all_ the code that used those routines will benefit
immediately.

~~~
troels
Corollary - sometimes hash tables aren't the fastest data structure. You can
often replace a hash with a trie and see a considerable improvement (caveat
emptor ...)

------
IMTDb
Use the correct data structure!

Don't use an array when you really need a Set. Store your data in hash maps
if possible. Etc.

------
gchavez2
I/O latency is usually a major "hidden" cost on programs that deal with large
sets of data, not only when writing the solution to disk or loading the data,
but memory access in particular. Bottlenecks in the memory hierarchy are
something to pay close attention to.

------
SuperChihuahua
Avoid sqrt (you can always use "x^2 < 2^2", instead of "sqrt(x^2) < 2")
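A common instance is distance checks, where comparing squared values avoids the sqrt entirely (the points and radii below are illustrative; both sides of the comparison are non-negative, so the two forms are equivalent):

```python
import math

def within_radius_slow(dx, dy, r):
    return math.sqrt(dx * dx + dy * dy) < r

def within_radius_fast(dx, dy, r):
    # Compare squared distances: same result, no sqrt call.
    return dx * dx + dy * dy < r * r

for dx, dy, r in [(3, 4, 5), (3, 4, 6), (1, 1, 2)]:
    assert within_radius_slow(dx, dy, r) == within_radius_fast(dx, dy, r)
```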

------
Thasc
When interacting with a (well-designed and indexed) database, stick as much
filtering and sorting as you can in the database query, rather than using a
less specific query and then crunching the result some more in your own
code.

------
tomp
Use the correct algorithms. O(log n) beats O(n) beats O(n log n) beats O(n^2)
almost every time. A common mistake is e.g. using the `in` operator with lists
in Python, instead of converting the list to a set first.
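Concretely, in Python (the sizes here are illustrative):

```python
import timeit

haystack = list(range(100_000))
needle = 99_999            # worst case for the list: scan to the end

as_set = set(haystack)     # one-time O(n) conversion cost

t_list = timeit.timeit(lambda: needle in haystack, number=100)  # O(n) each
t_set = timeit.timeit(lambda: needle in as_set, number=100)     # O(1) average

assert (needle in haystack) == (needle in as_set)
# Expect t_list to dwarf t_set here; but for a small list, or a single
# lookup, the set conversion itself can cost more than it saves.
```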

~~~
Alex3917
> A common mistake is e.g. using the `in` operator with lists in Python,
> instead of converting the list to a set first.

Is this true even for small lists without many duplicates?

~~~
raverbashing
Not really

You may be able to compromise and use an ordered list and a binary search.
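In Python that middle ground is the standard library's `bisect` module over a sorted list (the data here is illustrative):

```python
import bisect

sorted_items = sorted([42, 7, 19, 3, 88, 51])

def contains(sorted_seq, value):
    # Binary search: O(log n) lookups, no extra memory beyond the list.
    i = bisect.bisect_left(sorted_seq, value)
    return i < len(sorted_seq) and sorted_seq[i] == value

assert contains(sorted_items, 19)
assert not contains(sorted_items, 20)
```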

------
lycoperdon
The book Code Complete, by Steve McConnell, has a chapter with many great
tuning techniques and general good advice on this topic.
[http://cc2e.com/](http://cc2e.com/)

------
troels
Offload heavy tasks to run in the background. Use caching. Make CPU-intensive
processes parallelisable. (That's not a word, is it?)

------
jkot
Depends on the platform. In general: use the right data structures, optimize
hot spots, and have basic hardware awareness.

------
gren
Profile and identify bottleneck?

~~~
lclarkmichalek
Nah, applying these cargo cult recommendations is obviously a better idea.

~~~
raverbashing
Write in assembly!! That's faster!

~~~
jacquesm
Especially the writing bit...

------
twsted
My first advice is always:

Check your loops; that's where your mistakes get amplified.

------
JoeAltmaier
The slowest processor is the wetware reading the code. Write straightforward
code, using common design patterns, with semantic comments (don't describe
what it's doing; say why it's doing it).

------
vram22
A book by Jon Bentley [1], "Writing Efficient Programs", is very good for this
area. It is probably out of print and was written many years ago, but IMO it
is still likely to be very useful for a _wide_ range (not all) of performance
issues.

This is because he approaches the problem somewhat scientifically, and also
gives many rules of optimization that can be applied in various common
situations. There are lots of code examples in a language similar to (or
actually) Pascal (I forget which; I read it some years ago), but that should
not be a problem for any competent programmer. There are many real-life war
stories of performance tuning throughout the book.

In some of his examples he manages to improve some initial code (e.g. for the
travelling salesman problem) by one or more orders of magnitude (1 order
being 10X), via repeated application of various performance rules to
succeeding versions of the same code, optimized by one rule at a time. One of
his war stories is about a team that optimized a quicksort (?) algorithm on a
supercomputer by either 100,000 or 1 million times (for speed) over the
initial version, by working at several levels of the stack, from algorithms
through code down to hardware. Great story.

At the end of the book he also gives all the rules or techniques again in
summary form (each rule is illustrated with one or more case studies in the
body of the book), along with guidelines on when and when not to apply each.
You might be able to get a used copy of the book on Amazon even though it is
out of print.

Code hoisting, common sub-expression elimination, loop unrolling (with a good
binary search example from Knuth), trading space for time and vice versa,
using interpreters (to save space - he gives an example of someone who wrote
an interpreter for an interpreter in a space-constrained old IBM machine
environment, resulting in huge saving of space) - those are some of his
techniques / war stories that I remember off the top of my head. Really worth
reading, IMO.

He is also the author of Programming Pearls and More Programming Pearls - also
pretty good books.

I just looked up his Wikipedia page again:

[http://en.wikipedia.org/wiki/Jon_Bentley](http://en.wikipedia.org/wiki/Jon_Bentley)

and saw these excerpts (among other stuff):

Jon Louis Bentley (born February 20, 1953 in Long Beach, California)[1] is a
researcher in the field of computer science. He is credited with the invention
of the k-d tree. ... After receiving his Ph.D., he joined the faculty at
Carnegie Mellon University as an assistant professor of computer science and
mathematics. ... At CMU, his students included Brian Reid, John Ousterhout
(inventor of Tcl/Tk), Jeff Eppinger, Joshua Bloch, and James Gosling (inventor
of Java), and he was one of Charles Leiserson's advisors. Later, Bentley moved
to Bell Laboratories. ... He wrote the Programming Pearls column for the
Communications of the ACM magazine, and later collected the articles into two
books of the same name. ... Bentley received the Dr. Dobb's Excellence in
Programming award in 2004.

~~~
vram22
This post is related and interesting too:

[http://googleresearch.blogspot.in/2006/06/extra-extra-read-all-about-it-nearly.html](http://googleresearch.blogspot.in/2006/06/extra-extra-read-all-about-it-nearly.html)

In the book Writing Efficient Programs, Bentley talks about teaching a class
of experienced programmers (the class was about algorithms), in which he asked
all of them to write a binary search algorithm. Then he showed that almost all
of them had bugs. The above post is by Joshua Bloch (of Sun, author of
Effective Java, who was one of the students in that class), and the post says
that Bentley's own program for the search also had a bug, found later.

According to Bloch, Bentley's bug escaped detection for two decades, and
further, Bloch's code for the same binary search, written for the JDK (for
java.util.Arrays), also had a bug that was detected nine years later.

------
jessermeyer
#pragma omp parallel

------
larrykubin
caching and database indexing

------
bibabo
Cache.

------
z3t4
Write simpler code.

------
a3voices
Just use dictionaries and arrays for almost everything.

