
How much your computer can do in a second - srirangr
http://computers-are-fast.github.io/
======
gizmo
Pretty cool, but a number of the questions are totally unknowable.

For instance, the question about web requests to Google. Depending on your
internet connection you've got more than an order of magnitude difference in
the outcome.

In the question about SSD performance the only hint we have is that the
computer has "an SSD", but a modern PCIe SSD like in the new Macbook pro is
over 10 times faster than the SSDs we got just 5 years ago.

The question about JSON/Msgpack parsing is just about the implementation. Is
the python msgpack library a pure python library or is the work of the entire
unpackb() call done in C?

The bcrypt question depends entirely on the number of rounds. The default
happens to be 12. Had the default been 4 the answer would have been 1000
hashes a second instead of 3. Is the python md5 library written in C? If so,
the program is indistinguishable from piping data to md5sum from bash.
Otherwise it's going to be at least an order of magnitude slower.
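
A quick sanity check on that claim: bcrypt's cost parameter is a log2 work factor, so dropping from the default 12 to 4 cuts the work by 2^8 = 256x, which is exactly how ~3 hashes/s becomes ~1000.

```python
# bcrypt's cost parameter is a log2 work factor: each +1 doubles the hashing work.
default_cost, low_cost = 12, 4
slowdown = 2 ** (default_cost - low_cost)
print(slowdown)                                # 256

hashes_per_s_at_default = 3
print(hashes_per_s_at_default * slowdown)      # 768, i.e. roughly 1000/s at cost 4
```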

So I liked these exercises, but I liked the C questions best because there you
can look at the code and figure out how much work the CPU/Disk is doing.
Questions that can be reduced to "what language is this python library written
in" aren't as insightful.

~~~
dragontamer
This is exactly my main complaint about this.

Still, it was an intriguing idea, even if a lot of the answers were based on
things not described. I did learn a few things... I don't do much parsing so I
didn't realize how slow JSON was compared to other packaged formats.

~~~
geogriffin
It's the required escaping of quotation marks!! If you have non-English users
and didn't turn off escaping of non-ASCII, then you're in for a real surprise
:) Length-prefixed strings would solve this.
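
The escaping cost is easy to see with the stdlib json module, whose default `ensure_ascii=True` expands every non-ASCII character into a six-byte `\uXXXX` sequence:

```python
import json

s = "héllo wörld"
escaped = json.dumps(s)                  # ensure_ascii=True (the default)
raw = json.dumps(s, ensure_ascii=False)  # keep characters as UTF-8

print(escaped)                           # "h\u00e9llo w\u00f6rld"
print(len(escaped.encode("utf-8")))      # 23 bytes
print(len(raw.encode("utf-8")))          # 15 bytes
```

Two accented characters blow the payload up from 15 to 23 bytes; with mostly non-Latin text the inflation is far worse.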

Also, memory allocation penalties can hit you hard, depending on your
environment, since it's hard to predict what you'll need to allocate when
encoding/decoding JSON.

------
realo
Yes, modern computers are fast. How fast?

The speed of light is about 300,000 km/s. That translates to roughly 1 ns per
foot (yeah, I mix up my units... I'm Canadian...)

THUS, a computer with a clock speed of 2 GHz will be able to execute, on a
single core/thread, about 4 (four!) single-clock instructions between the
moment photons leave your screen and the moment they arrive at your eye 2
feet (roughly) later.
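
The back-of-the-envelope arithmetic checks out (taking 2 feet as 0.61 m):

```python
C = 299_792_458          # speed of light in vacuum, m/s
CLOCK_HZ = 2e9           # 2 GHz clock
distance_m = 2 * 0.3048  # 2 feet in metres

travel_time_s = distance_m / C
cycles = travel_time_s * CLOCK_HZ
print(f"{cycles:.1f} cycles")   # ~4.1 single-clock instructions
```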

_That_ should give you an idea of how fast modern computers really are.

And I _still_ wait quite a bit when starting up Microsoft Word.

~~~
benchaney
Most modern CPUs can execute more than one instruction per clock cycle on a
single core or hardware thread. They do a lot more than 4 instructions in the
time light takes to travel 2 feet.

~~~
tachyonbeam
If you want to get technical, modern CPUs can dispatch (begin executing) 4 to
6 instructions every clock cycle. However, many instructions take more than
one cycle to complete. For instance, integer multiply takes 3-4 cycles on an
Intel Sandy Bridge chip, and FMUL takes 5 cycles to complete.

That being said, there are several instructions, like mov, integer addition
and compare, and bitwise operations, which can absolutely complete in a single
cycle.

See this document for more details:
[http://www.agner.org/optimize/instruction_tables.pdf](http://www.agner.org/optimize/instruction_tables.pdf)

------
munificent
If, like me, you spend most of your time in high-level, garbage collected
"scripting" languages, it's really worth spending a little time writing a few
simple C applications from scratch. It is _astonishing_ how fast a computer is
without the overhead most modern languages bring in.

That overhead adds tons of value, certainly. I still use higher level
languages most of the time. But it's useful to have a sense of how fast you
_could_ make some computation go if you really needed to.

~~~
TeMPOraL
Worth checking out Common Lisp. It's as high-level a language as you can get,
and yet good compilers (like SBCL, or the commercial ones from Franz and
LispWorks) can compile it to tight assembly with performance very close to
that of C++ (you need to disable some runtime checks for that, but you can do
so on a per-function level, so it's much less of a problem than one might
think).

~~~
runeks
Haskell is also really fast. Most Stack Overflow questions asking "why is
this Haskell program slow" have a detailed answer with an implementation
that's about as fast as -- or even faster than -- C:

[http://stackoverflow.com/questions/42771348/why-is-haskell-so-slow-compared-to-c-for-fibonacci-sequence](http://stackoverflow.com/questions/42771348/why-is-haskell-so-slow-compared-to-c-for-fibonacci-sequence)

[http://stackoverflow.com/questions/6964392/speed-comparison-with-project-euler-c-vs-python-vs-erlang-vs-haskell](http://stackoverflow.com/questions/6964392/speed-comparison-with-project-euler-c-vs-python-vs-erlang-vs-haskell)

[http://stackoverflow.com/questions/29875886/haskell-why-does-int-performs-worse-than-word64-and-why-my-program-is-far-slow](http://stackoverflow.com/questions/29875886/haskell-why-does-int-performs-worse-than-word64-and-why-my-program-is-far-slow)

~~~
Johnny_Brahms
None of those are very good examples. They spend a lot of time optimizing
already dead-slow algorithms, and they end up with an optimized dead-slow
Haskell version that is faster than the non-optimized dead-slow C version. Not
very impressive.

I thought I would implement some naive versions:

For the second link, I just whipped up this naive version in Chez Scheme, and
it runs in 0.5s; that should really be the baseline:
[https://pastebin.com/JXGLA4TR](https://pastebin.com/JXGLA4TR) Incidentally,
this seems to be on par with the fastest Haskell version posted that does not
use precomputed primes for factorisation. It could be made a lot faster by
using only primes, but I couldn't be bothered.

And I doubt you will find a Haskell version that finds the longest Collatz
sequence faster than this:
[https://pastebin.com/FAdiHA3X](https://pastebin.com/FAdiHA3X) (0.01s using
gcc -O2). Yet again, a simple algorithm, but this time with bolted-on
memoization.

Edit: The first link contains the most naive and slow Fibonacci function. It
might be mathematically elegant, but it is dirt slow.

    
    
        (define (fib n)
          (let loop ([n n] [a 0] [b 1])
            (if (zero? n)
                a                ; return a so (fib 0) = 0 and (fib 1) = 1
                (loop (- n 1) b (+ a b)))))
    

That one takes 0.1s calculating the 100,000th Fibonacci number, and there are
even faster ways.
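
The same iterative algorithm translates directly to Python (a sketch; big-integer arithmetic means even the 100,000th Fibonacci number, a ~20,000-digit result, comes back quickly):

```python
def fib(n):
    """Iterative Fibonacci with fib(0) = 0, fib(1) = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

print(fib(10))                   # 55
print(len(str(fib(100_000))))    # digit count of the 100,000th Fibonacci number
```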

------
userbinator
Alternatively, this could be titled "do you know how much your computer
_could_ do in a second but isn't because of bad design choices, overengineered
bloated systems, and dogmatic adherence to the 'premature optimisation' myth?"

Computers are fast, but not if all that speed is wasted.

A recent related article:
[https://news.ycombinator.com/item?id=13940014](https://news.ycombinator.com/item?id=13940014)

~~~
sametmax
It's wasted only if it's not traded for something else. But it is.

A lot of systems would simply not exist if we had waited for people to do
them properly, because there is a limited pool of very skilled experts and
the demand for IT far exceeds our ability to supply it. Plus, writing good
code takes a lot of time and resources, but our society now changes so fast
that the code may very well be rewritten next year.

Hence, we are trading computer power to compensate for our limited human resources.

E.g.: wondering why you see Electron apps everywhere now? Because until now,
making a beautiful, powerful and modern app with a portable GUI was something
only a few people were able to do. Now any web dev can do it, and they are
able to solve problems that weren't solved before because nobody was
available to do it. At the price of performance, memory usage and the use of
the ugliest programming language we ever made popular.

It's the same deal as before. When C arrived, WordPerfect died because they
stuck to assembly. For them, C was wasting resources.

When Java arrived, expert systems turned to it because it was more productive
than writing them in C. We didn't need them to be fast, but the companies
wanted their tools produced quickly.

Then when the Web arrived, people were laughing at the quality of PHP
(initially a Perl hack!). So much bad code, so many security failures, just a
terrible, terrible stack of programming debt. But it created the web we have
today, because it was easy to use and suddenly a lot of people could just
write their own web forum. Hell, I learned my job with EasyPhp.exe way before
becoming a Python expert.

Nothing here is wasted. It's just implicitly and involuntarily invested.

~~~
aschampion
> Because until now making a beautiful, powerful and modern app with a
> portable GUI was something only a few people would be able to do.

Except it's none of these things: Electron apps suffer the same presentation-
before-content problems as most of the web, they inherit all of the state bugs
of web apps, and they're painfully slow despite usually being just menus of
nested lists and text boxes. It passes a burden from developers to users,
which, given the huge asymmetry typical between these two groups (at least 3
orders of magnitude, and often 6 or more), is an enormous inefficiency.

I'm all for selecting tools and platforms to better optimize developer
productivity, but it's not clear to me the web platform has actually
accomplished that for all but a few types of applications, and pushing it to
the desktop is resulting in cumbersome, unresponsive, needlessly fragmented
interfaces.

Put simply: Amazon Music Desktop has untold development investment and a
complete visual redesign every 6 months, but WinAmp circa 2001 is somehow a
thoroughly better way of searching and playing music.

~~~
sametmax
Let's be clear: I dislike slow apps, I think the current size of behemoth web
pages is a monstrosity, and every time I start an Electron app (minus the
excellent VS Code), I scream in my head.

Yet.

Most Electron apps I've tried have a far better result/effort ratio than any
other solution for the dev.

~~~
panic
Why do we care so much about the dev? There are way more people who use a
typical program than there are people who develop it, and these people often
use the program more frequently than the developers make changes to it.

For example, if a feature is used every day for a year by 10,000 people,
speeding it up by 200ms is worth over a month of developer time (200ms for
10,000 people over a year is 203 hours of time wasted, which is about 25
8-hour days, or five 40-hour weeks).
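
The arithmetic behind that estimate, spelled out:

```python
users = 10_000
saved_s_per_use = 0.2      # the 200ms speedup
uses_per_year = 365        # once a day for a year

total_s = users * saved_s_per_use * uses_per_year
print(total_s / 3600)      # ~202.8 hours saved
print(total_s / 3600 / 8)  # ~25 eight-hour days, i.e. about five 40-hour weeks
```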

~~~
BHSPitMonkey
Who is funding the developer? Why should they spend $5k (plus the opportunity
cost of not using that dev's time on more fruitful pursuits) on 2 weeks of
micro-optimization so that their 10,000 users will each experience a speedup
so small they'll never even notice (and certainly never pay extra for)? How do
you justify that expense?

~~~
panic
I'm making a more abstract argument that doesn't have anything to do with
money. The time people spend using a piece of software shouldn't be worth less
than the time developers spend writing it.

~~~
BHSPitMonkey
Okay, but under those conditions your argument can only be applied to a more
abstract reality that isn't the one we actually live our lives in.

------
chacham15
Be careful what conclusions you draw from examples when you aren't sure
exactly what is happening. These examples are actually very wrong and
misleading.

Take, for example, the first code snippet about how many loops you can run in
1 second. The OP fails to realize that since the loop isn't producing anything
that actually gets used, the compiler is free to optimize it out. You can see
that that's exactly what it does here:
[https://godbolt.org/g/NWa5yZ](https://godbolt.org/g/NWa5yZ) All it does is
call strtol and then exit. It isn't even running a loop.

~~~
jvns
We automatically generated all the results in this quiz from the programs on
the site. None of the loops were optimized out, we ran basically a binary
search to figure out the maximum number of iterations you could run in a
second.

Results and compiler optimizations will of course vary across computers, but
they were correct on my laptop on the day that we ran them (in Sept. 2015).

If you want to reproduce this on your computer and see how the results are
different, you can clone
[https://github.com/kamalmarhubi/one-second](https://github.com/kamalmarhubi/one-second)
and run `run_benchmarks.py`

~~~
Johnny555
The -O2 flag wasn't added to the benchmark script until Sept 20th in commit
4a31931; I suspect you ran at least sum.c without that flag, since any modern
gcc should have optimized away that entire loop.

I downloaded and ran the benchmarks and indeed the search for the 1 second
runtime for sum.c never completed -- it never found an answer because the
_sum_ binary runs in constant time regardless of the iteration count. In fact,
atoi() in sum.c quickly overflowed once the iteration count exceeded 2^31
(since atoi() returns a signed int). I removed the -O2 flag and then I got the
same 550429840 iter count as you on my laptop.

The benchmark script isn't really doing a binary search, it's doing a linear
search since it's increasing the iteration count by 10% with each loop.
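
That growth strategy can be sketched like this (my reconstruction of the idea, not the actual `run_benchmarks.py` code):

```python
import time

def iterations_in_one_second(work, start=1000, growth=1.1):
    """Grow the iteration count by ~10% per attempt until one run takes
    at least a second -- a linear search over attempts, not a binary one."""
    n = start
    while True:
        t0 = time.perf_counter()
        work(n)
        if time.perf_counter() - t0 >= 1.0:
            return n
        n = int(n * growth) + 1
```

Each attempt multiplies n geometrically, but the search still only ever narrows in from one side, so "binary search" is indeed the wrong name for it.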

------
dom0
More impressively, sum.c could likely go an order of magnitude or so faster
when optimized.

> Friends who do high performance networking say it's possible to get network
> roundtrips of 250ns (!!!),

Well stuff like Infiniband is less network, and more similar to a bus (e.g.
RDMA, atomic ops like fetch-and-add or CAS).

> write_to_memory.py

Is also interesting because this is dominated by inefficiencies in the API and
implementation and not actually limited by the memory subsystem.

> msgpack_parse.py

Again, a large chunk goes into inefficiencies, not so much the actual work.
This is a common pattern in highly abstracted software. msgpack-c mostly works
at >200 MB/s or so (obviously a lot faster if you have lots of RAWs or STRs
and little structure). Funnily enough, if you link against it and traverse
stuff, then a lot of time is spent doing traversals, and not the actual
unpacking (in some analysis I've seen a ~1/3 - 2/3 split). So the cost of
abstraction also bites here.

If you toy around with ZeroMQ you can see that you'll be able to send around 3
million msg/s between threads (PUSH/PULL) from C or C++, around 300k using
pyzmq (this factor 10 is sometimes called "interpreter tax"), but only around
7000 or so if you try to send Python objects using send_pyobj (which uses
Pickle). That's a factor 430.

~~~
p1esk
How would you optimize sum.c to be faster?

~~~
gravypod
GCC with -O3 might try to unroll this loop into a single constant. O(0) is
pretty fast.

~~~
supergarfield
That's very likely. Clang 8.0 collapses the loop with any optimizer setting
other than -O0.

But if you had to do it manually, loop unrolling and SIMD instructions
(although not part of standard C) would be good bets and can probably get you
an order of magnitude.

~~~
loeg
Clang 8.0?

~~~
dom0
Apple Clang version number is _approximately_ the same as the Xcode version
number, and not identical to the Clang release it's based on.

~~~
loeg
So what version of upstream Clang is Xcode 8.0 based on?

------
Eliezer
What an excellent teaching pattern - you're far more likely to remember what
you learned if you first stop to think and record your own guess, and this is
excellent UI and UX for doing that routinely and inline.

~~~
panic
This page from the New York Times is another great example of teaching through
interaction:
[https://www.nytimes.com/interactive/2017/01/15/us/politics/you-draw-obama-legacy.html](https://www.nytimes.com/interactive/2017/01/15/us/politics/you-draw-obama-legacy.html)

------
bane
This is awesome. The real lesson here is, when you make a thing, compare its
performance to these kinds of expected numbers and if you're not within the
same order of magnitude speedwise, you've probably screwed up somewhere.

My favorite writeups are the ones that gloat about achieving hundreds of pages
served per second per server. That's terrible, and nobody today even
understands that.

------
alkonaut
Don't some of these examples run in O(1) time because the value in the loop
isn't used? E.g., in the first example 0 is returned instead of the sum.

Obviously we are talking about real world c compilers with real world
optimizations so presumably we'd have to also consider whether the loop is
executed at all?

------
paulsutter
That's nothing. Here's code that does 77 GFLOPS on a single Broadwell x86
core. Yes, that's 77 billion operations per second.

[http://pastebin.com/hPayhGXP](http://pastebin.com/hPayhGXP)

~~~
voltagex_
Interesting that that code looks more like assembly than C to me.

------
asrp
This reminds me of "Latency Numbers Every Programmer Should Know"

[https://gist.github.com/jboner/2841832](https://gist.github.com/jboner/2841832)

Edit: Just realized halfway through that there's already a link to this from
their page!

------
gburt
The `bcrypt` question seems out-of-place. It has a configurable cost
parameter, so almost any of the answers is correct.

------
bch
Hard to believe there are 124 comments here and nobody has brought up Grace
Hopper's talk[0][1] yet. With good humour she gives an example of what various
devices' latencies are, and a simple tool for comprehending the cost and
orders of magnitude.

    
    
      [0] short - https://www.youtube.com/watch?v=JEpsKnWZrJ8
      [1] long - https://www.youtube.com/watch?v=ZR0ujwlvbkQ

------
gibsjose
I'm curious to see the data collected on guesses. Some were quite difficult to
guess, like hashes per second with bcrypt not knowing the cost factor, but I
guess we can assume some sane default.

I would have really liked to see all these numbers in C, and other languages
for that matter. Perhaps add a dropdown box to select the language from a
handful of options?

------
tomc1985
One second on what?

A Core i7? A Raspberry Pi? A weird octo-core dual-speed ODROID? An old
i915-based Celeron? My cell phone? An Arduino?

"Your computer" has meant all the above to me, just in the last few weeks. The
author's disinclination to describe the kind of hardware this code is running
on -- other than "a new laptop" \-- strikes me as kind of odd.

------
alcuadrado
This reminds me of this email from LuaJIT's list:

Computers are fast, or, a moment of appreciation for LuaJIT
[https://groups.google.com/forum/#!msg/snabb-devel/otVxZOj9dLA/rgCojUohBGMJ](https://groups.google.com/forum/#!msg/snabb-devel/otVxZOj9dLA/rgCojUohBGMJ)

------
norswap
Brilliant! I'd like to see those numbers summarized somewhere though, a bit
like the latency numbers every programmer should know:
[https://gist.github.com/jboner/2841832](https://gist.github.com/jboner/2841832)
(visual: [https://i.imgur.com/k0t1e.png](https://i.imgur.com/k0t1e.png))

------
partycoder
Computers are fast unless your algorithm is quadratic or worse, then there's
no computer to help you.

------
Lxr
Why isn't the first Python loop (that does nothing but pass) optimised away
completely?

~~~
adrianratnapala
Compilers often don't optimise loops away because they assume the user put
them there for a reason -- after all, they are so obvious that the user could
have removed them herself.

One use for such a loop is a delay -- not so common nowadays, but it used to
be a mainstay of DOS based games etc.

I bet that if GCC started aggressively optimising out empty loops, it would
interact with some subtlety of concurrency to break a spinlock or three in
various kernels.

~~~
danellis
Compilers can only optimize away loops if they know that the loop has no side
effects. Compilers like GCC have to have a list of standard functions that
they know are pure.

This can trip you up sometimes. For example, if you use memset to zero out
sensitive data when you're done with it, the compiler can say "the result of
memset is never used, and the reference to that memory is lost, so there's no
need to actually make the call", and you're left with secrets in memory.

~~~
saagarjha
Forgive me if I don't know what I'm talking about here, but could volatile be
used here to force the compiler to perform the memset?

~~~
HorizonXP
You're correct. ([https://barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword](https://barrgroup.com/Embedded-Systems/How-To/C-Volatile-Keyword))

But again, you're telling the compiler that you know what you're doing and to
skip its normal optimization. I'm not well-versed in compiler design, but I
think that's the tradeoff you have to make between trusting the user's design
choices or not.

------
ge96
I came across this "article" before; I feel like I remember it under a
different title, like "language speed differences" or something. Or maybe
that's another article by the same author/site/format.

------
sriku
The grep example should search for one character. Grep can skip bytes, so
longer search strings are faster to search for. On my machine, I see 22%-35%
more time taken if I change "grep blah" to "grep b".
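
The skipping trick is the Boyer-Moore family of algorithms (GNU grep famously uses a variant): on a mismatch, the last character of the current window tells you how far you can jump, up to the full pattern length, so longer patterns skip more bytes per comparison. A minimal Horspool sketch:

```python
def bm_horspool_search(text, pattern):
    """Boyer-Moore-Horspool: on a mismatch, shift by up to len(pattern),
    which is why longer patterns let grep skip more of the input."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return -1 if m else 0
    # Shift table: distance from each pattern char (except the last) to the end.
    shift = {c: m - i - 1 for i, c in enumerate(pattern[:-1])}
    i = 0
    while i <= n - m:
        if text[i:i + m] == pattern:
            return i
        # Jump based on the character aligned with the pattern's last position.
        i += shift.get(text[i + m - 1], m)
    return -1

print(bm_horspool_search("hello blah world", "blah"))  # 6
```

A one-character pattern can never shift by more than 1, which matches the slowdown observed above.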

------
thomastjeffery
Or "how fast can one of my 8 CPU cores run a for loop?" To put that in
perspective: all 8 cores together give me about 40gflops. I have 2 GPUs that
each give me more than 5000gflops.

------
urza
Anyone care to rewrite these in C#? I am really surprised at how fast these
Python scripts are, and I would like to see a comparison with equivalent
tasks in C# to see where it stands.

------
kobeya
I was disappointed to find that nearly all the examples were Python and shell
script. I'm not interested in knowing random trivia about how slow various
interpreters are.

------
samirm
The last question seems really misleading. Most modern CPUs have a cache size
of 8MB (max), yet the answer is >300MB?

------
tim333
Or running Windows Vista you can right click and display a menu in one second
plus about 29 other seconds.

------
wtbob
Well, my computer won't display an image apparently inserted with JavaScript,
although it _could_ if I wanted to grant execute privileges on it to
computers-are-fast.github.io

Does anyone have a link to the image(s)?

------
brianwawok
> GitHub Pages is temporarily down for maintenance.

Ironic?

------
ilaksh
NVMe SSD can be up to 10X faster than SATA.

------
d--b
or "computers are fast, so we might just slow things down by using python for
numerical calculations"

------
joelthelion
This could make a pretty good hiring test. I wouldn't expect perfect answers,
but a rough correlation with the results, and some good explanations.

------
grepthisab
Edit: I'm an idiot

~~~
eridius
Where's the typo?

A millisecond is one thousandth of a second. If you can run something 68
million times in a second, you can run it 68 thousand times in a millisecond.
And that's what the text says.

