
A millisecond isn't fast (and how we made it 100x faster) - Symmetry
http://jvns.ca/blog/2015/09/10/a-millisecond-isnt-fast-and-how-we-fixed-it/
======
Jemaclus
I love this stuff. I'm currently working on a project where, left to my own
devices, I would have come up with some complicated scheme that would reduce
an O(n) (or even O(n^2)) task to O(log n). Fortunately, the guidelines I've
been given (government, of course) offer a series of extremely-hard-to-
understand-but-O(1) methods for retrieving the data. I haven't benchmarked it
yet, but I'd imagine my own scheme would take about 5ms per record, while the
O(1) process will probably take something on the order of 5 microseconds per
record.

I'm learning all kinds of tricks that older developers can probably do in
their sleep. Very cool.

(Note: the reason I say "older" instead of "experienced", even though for
these purposes they're interchangeable, is that the tricks I'm learning are
coming from a COBOL codebase written back in the 70s that I'm modernizing into
Go.)

I also learned a long time ago that lookup tables for common requests are a
no-brainer. If a formula has only 256 possible answers, just stick them in a
lookup table.
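
For instance, a minimal sketch in Java (the formula here is made up):

    // Precompute all 256 answers once; every later "evaluation" is one
    // array read.
    static final double[] TABLE = new double[256];
    static {
        for (int i = 0; i < 256; i++)
            TABLE[i] = Math.exp(-i / 32.0);  // hypothetical formula
    }

    static double lookup(int input) {
        return TABLE[input & 0xFF];  // mask keeps the index in bounds
    }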

Love it.

~~~
molmalo
I don't want to be the guy rejecting change... But is someone really
expecting a project in Go to last 40+ years, as that COBOL codebase has? Wow!

~~~
chipaca
It's not because COBOL was good that things written in it lasted; it's because
it was _there_. The things written in it made COBOL last, not vice-versa.

------
cozzyd
Lookup table seems silly when you can take advantage of the floating point
format. In C, I think you can more or less just do something like:

    #include <stdint.h>
    #include <string.h>

    uint64_t e = (uint64_t)(1023 - mj) << 52;  /* exponent bits of 2^-mj */
    double exp;
    memcpy(&exp, &e, sizeof exp);  /* type-pun without strict-aliasing UB */

edit: Whether or not there's a way to express that in Scala, I don't know...

~~~
wiml
In C you can use scalbn(), which a good compiler would hopefully optimize to
something equivalent to what you wrote. Java has scalb(), presumably based on
the C function.
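
For reference, a one-line Java sketch (using the mj from the snippet
upthread):

    double exp = Math.scalb(1.0, -mj);  // 1.0 * 2^-mj, no Math.pow needed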

~~~
stephencanon
That's not a legal optimization, as it gets all sorts of edge cases wrong.
For example, scalbn(0, anything) is zero, whereas simply adding to the
exponent field will produce some arbitrary power of two or infinity.
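
A quick Java illustration of that zero edge case:

    double viaScalb = Math.scalb(0.0, 10);             // 0.0, as required
    long bits = Double.doubleToLongBits(0.0) + (10L << 52);
    double viaBits = Double.longBitsToDouble(bits);    // 2^-1013, not 0.0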

------
coldcode
Optimizations are so fun. But the key is always to measure from end to end
before you investigate, and after you change things, in a comparable manner.
Too many people guess or theorize or make random changes.

~~~
brianwawok
And you need to know how to perf test. The OP shows a test that runs for a
fraction of a second. The JVM will not even think about JIT-compiling a piece
of code until it has run 5-10k times... so this is likely measuring how fast
the interpreted code runs, which has little to no value.
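
On the JVM, a harness like JMH handles warmup and JIT for you. A minimal
sketch (the workload is a stand-in, not the post's actual code):

    import org.openjdk.jmh.annotations.*;

    @State(Scope.Benchmark)
    @Warmup(iterations = 5)       // let the JIT compile the hot path first
    @Measurement(iterations = 5)
    public class PowBench {
        double base = 0.5;

        @Benchmark
        public double mathPow() {
            return Math.pow(base, 10);  // stand-in workload
        }
    }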

~~~
kasey_junk
To be fair, the blog post just shows the command output of a test run. That
could have been the test runner extrapolating the run time from a longer
running benchmark.

The linked pull request references some Benchmark annotations and all manner
of Scala magic I don't want to interpret right now.

That said, your point holds and I'd be very skeptical that this benchmark
measures what they think it does.

------
z3t4
Optimization is a drug. If you do it too much you will get addicted and it
will ruin your productivity. But damn it feels great :P

------
SapphireSun
The reason Math.pow is so slow is that underneath, it's really doing:

    pow(x, n) = exp(n * ln(x))

This is how you ensure that power operations take constant time regardless of
the input. However, exp and ln are relatively slow in numerical-computing
land, so if you're using small powers, simply doing the multiplication out
will be faster, and if you're using powers of two, so will bit shifting.
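
For example, in Java (values arbitrary):

    double x = 1.7;
    double cubed = x * x * x;         // small integer power: multiply out
    long pow2 = 1L << 20;             // integer power of two: shift
    double general = Math.pow(x, 3);  // general path; typically much slower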

~~~
caipre
* Why does this ensure constant time?

* Why is constant time desirable here, if the operation can be quicker for some inputs?

~~~
SapphireSun
It ensures constant time because exp and ln each take a fixed number of
operations, regardless of n. Compare the naive implementation in the general
case:

x^6 = x * x * x * x * x * x // five multiplications, linear in n

Here's a (possibly slightly wrong) optimized version of that, which is
O(log n), written as Java:

    // Exponentiation by squaring; assumes n >= 0.
    static double pow(double x, long n) {
        if (n == 0) return 1;
        if (n == 1) return x;
        if (n % 2 == 0) return pow(x * x, n / 2);  // even: square, halve n
        return x * pow(x * x, (n - 1) / 2);        // odd: peel off a factor
    }

The nice thing about the constant-time version is that it's relatively
speedy (< 1ms) and is guaranteed to handle even large inputs gracefully[1].
Some implementations might add an additional optimization like n == 2: x * x
or n == 3: x * x * x, since those are super common, but if you're really
crunched, you've added if statements or function calls where you could have
just inlined it.

[1] Though you'll start running into quantization error at some point from ln.

~~~
tveita
Java's Math.pow accepts non-integer exponents so your example doesn't handle
all cases.

~~~
SapphireSun
Right, it only works on non-negative integers, but my original example using
exp and ln works in nearly all cases (though you probably want to do some
rounding in the case of an integer input). That's actually another point in
its favor: it's constant time and it handles the odd cases.

------
drv
This looks like an ideal case for something like ldexp(), but I couldn't find
an equivalent function in Java at first glance.

A lookup table seems like a bit of a shame for something that should be
inherently fast in binary floating point, but it is probably the next best
thing if poking around in the floating point bit representation isn't
possible.

~~~
stephencanon
ldexp( ) is a surprisingly expensive operation because it has to handle edge
cases correctly. Even when you don't hit those edge cases, the best
implementation of the fast path on most architectures is approximately:
extract exponent, add adjustment, check result, branch to a fixup if out of
range, insert exponent. If the range of adjustments you need to support is
small, and you're not totally load-store bound, a lookup table is faster.
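
In Java terms, that fast path looks roughly like this (a sketch, not any
library's actual implementation):

    static double ldexpSketch(double x, int n) {
        long bits = Double.doubleToRawLongBits(x);
        int exp = (int) ((bits >>> 52) & 0x7FF);  // extract biased exponent
        int adjusted = exp + n;                   // add adjustment
        if (exp == 0 || exp == 0x7FF || adjusted <= 0 || adjusted >= 0x7FF)
            return Math.scalb(x, n);  // fixup: zero/subnormal/inf/NaN/range
        return Double.longBitsToDouble(           // insert adjusted exponent
                (bits & ~(0x7FFL << 52)) | ((long) adjusted << 52));
    }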

(That's not to say that Java shouldn't have ldexp( ), just that it's not a
fix-all.)

------
tmd83
Yeah, a millisecond is a long time, even in Java. A couple of years ago I was
writing a Java backend server for caching + computation. I was severely
memory-strapped and was designing the daily expiration of data. The core data
structure was a map<map>, and the keys I needed for expiration were in the
2nd level, so I was troubled by the memory usage of a secondary index for
fast expiry, plus the complexity of managing it. In the end I said what the
heck and just did a full iteration of the 2nd-level maps, doing a hash lookup
against the expired keys, kind of reversing the logic (it's essentially what
a DB hash join does). I don't remember if I did a hashmap benchmark first or
not. Even on my very old desktop (compared to current machines) I was doing
0.5M lookups (2 hash lookups each, actually) per second per core, and the
rest is history.
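
A rough Java sketch of that reversed lookup (all names hypothetical):

    import java.util.*;

    // Instead of a secondary expiry index, scan the inner maps once and
    // probe a hash set of expired keys: one O(1) lookup per entry, much
    // like the probe phase of a DB hash join.
    static void expire(Map<String, Map<String, Object>> cache,
                       Set<String> expiredKeys) {
        for (Map<String, Object> inner : cache.values()) {
            inner.keySet().removeIf(expiredKeys::contains);
        }
    }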

So I think, yeah, with layers upon layers of libraries we tend to forget how
fast a computer is, and that's often sad. The misconception around premature
optimisation has had a bad side effect: people often ignore performance when
designing things. The core parts of most designs, especially library or
shared infrastructure code, should have some concept of overhead/cost, even
if this shouldn't be overblown. Unconsciously bad designs are then often
followed by super-efficient but complicated algorithms to fix the design
inefficiency.

One thing I very much agree with the blog post on: the more conscious you
become of these things, the more you can guess where a bottleneck exists or
where you should take care. You won't always be right, and you should still
benchmark, especially if it's critical enough or you are changing the design
a lot for performance. But a general awareness can be very helpful.

When I completely rewrote that backend server a few months ago, I started
with a very simple design, simple data structures and simple algorithms
(linear scanning with a bit of indexing through plain Java
HashMap/ConcurrentHashMap), and it worked flawlessly the first time (in terms
of performance). It is serving over a million requests per day of fairly
complicated dashboard generation at ~0ms latency and 2 YGC/hr, I think. Some
might call it premature optimisation, but I think you need to have an
understanding of where performance is critical and design for that. That
doesn't mean you should treat everything as performance-critical and start
optimising every line of your code.

------
f_
Does anyone know whether the JVM/JIT generates SIMD instructions for code like
this? Looks like a pretty obvious target to me.

------
foobar2020
Is it denormalised numbers [1]? Or is it just math.pow being slow?

[1] [http://stackoverflow.com/a/9314926](http://stackoverflow.com/a/9314926)

------
lasryaric
I started reading your post thinking I would never get it, and I am now
amazed! Thanks for sharing.

------
analognoise
This is such basic optimization I have no clue how this got any interest at
all.

The level of "yeah, no shit" is pretty high here.

~~~
powera
Some people reading Hacker News are interested in basic optimizations.

~~~
analognoise
Everyone wants everything pre-chewed for them. This is SO basic it's Sophomore
CS. This guy is doing "machine learning" but WOW A TABLE LOOKUP IS FAST?!

This sounds like "code bootcamp" level trash.

Never mind that optimizations as a whole have a rich history and there are
fascinating examples aplenty that can actually teach you something.

Like, for example, this:
[http://files.righto.com/calculator/sinclair_scientific_simul...](http://files.righto.com/calculator/sinclair_scientific_simulator.html?m=1)

About getting a scientific calculator to run on a $100 chip, when the chip's
creator (TI) said it was impossible.

Or how the genesis of some of our optimizations (and architectural
improvements) date back to parallelizing work...with different colored punch
cards at Los Alamos by Feynman:
[http://blogs.msdn.com/b/msr_er/archive/2011/02/04/celebratin...](http://blogs.msdn.com/b/msr_er/archive/2011/02/04/celebrating-richard-feynman-at-tedxcaltech.aspx)

If you're a hacker, but everything has to be pre-chewed for you and this
garbage rates as "interesting", what the fuck are you doing with your time?
Buying "geeky" TShirts and playing ping pong? Jesus wept, what next, amazing
Excel spreadsheet calculations? If this person is doing "machine learning",
the machine doesn't stand a chance.

Sorry. I use HN as my 'smart things' filter - if I wanted 'for Dummies' I'd be
on Reddit (and I have tons of stupid shit questions, and I ask them
elsewhere). It makes me irrationally angry for some reason. Like, THIS is now
the level of achievement that requires a blog post - and it gets to the front
page!? The 'everybody gets a trophy!' level shit is here to stay. I have to do
some serious thinking about why this pisses me off so much. Everybody is now a
hacker, hackers are now at the level of blog posts over table lookups. I guess
I'm angry over the dilution, commercialization and sanitation of the whole
thing. Like the pool is now so shallow and full of piss, what the fuck
happened? Everybody is a maker, a hacker, a tech-tropeneur, but subversive,
active, hard-working, horizon seeking is largely gone. People want to reinvent
the wheel in Javascript for the distributed machine learning {buzzword} cloud
with great UI, blog about it, and cash out. It isn't about enlightenment
anymore - personal or otherwise, unless it's dropping out to "find yourself"
making artisanal bread and birdhouses.

How did we get so lost? Maybe it's an existential crisis. Sorry again. Table
lookups are great. Go team.

~~~
mobiuscog
I'd agree, wholeheartedly.

However, it's Friday. Take a nice deep breath. Relax.

Be happy that at least they are investigating the code and trying to find
solutions (no matter how 'obvious' they may be) instead of just replacing it
with another open-source library from somewhere else that happens to be faster
in this case.

Everyone has to learn through their own experience, and I'm always happy when
people realise how things can be better - even if it doesn't happen nearly
enough these days.

Time for coffee :D

