
How Much Processing Power Does it Take to be Fast? - blasdel
http://prog21.dadgum.com/68.html
======
elblanco
This is really a fantastically important principle. When we put into
perspective what modern machines should be capable of, it's really quite
disappointing what they end up doing.

Recently I was arguing with one of my company's engineers when I noticed that
selecting a few tens of thousands of things on-screen took 1-2 seconds. He
responded that there's a lot going on there. I responded that I'm selecting a
few tens of thousands of things on-screen on a machine that's capable of
billions of CPU operations per second and trillions if I count in the GPU.
What I'm doing isn't even a rounding error when compared against those
staggering numbers. It _should_ be so fast that those things are selected
before I even move my finger from the mouse button. But it's not, I can
actually count _seconds_ before something happens...and I know that _we_
aren't doing billions of things in the meantime.

All this talk of canvas demo this and browser app that fails to bring into
focus that the things we are fawning over, simple graphics and interactivity
in a postcard-sized rendering window, running on a multi-core, multi-
_GIGAHERTZ_ machine with assisting co-processors for everything from floating
point to window rendering, are things we were all doing on machines clocked
in the dozens of MHz not even 15 years ago. That gap shows just how tall the
software stack between the hardware and the user has become.

It's not that my engineer's code was bad, far from it; it's just that it was
lazy. He was relying on libraries built on top of libraries built on top of
libraries, on down the line. Basically it's turtles all the way down, and we're
all building on top of that giant turtle pile when we could be tearing away at
relativistic speeds around a pulsar.

I actually get a little depressed when I think about it.

~~~
three14
I've been wondering if anyone has actually checked - is the performance loss
really just because of all the layers? Is there a straightforward alternative?

~~~
jheriko
The obvious alternative is to not use all the layers, but that is anything but
straightforward.

One place the slowness of libraries becomes obvious is during optimisation.
As a simple example, memory copying can be optimised if you are doing large
copies. The C library's memcpy has to work in the general case, and afaik it
typically just loops over the bytes and copies them one by one, which is
probably optimal if you are copying a small number of bytes like 2 or 3
(probably a common case). But on modern CPUs you can get substantial speed-ups
by writing your own partially unrolled loop that copies 4 bytes at a time, or
even more if you are willing to write assembler, where you can copy 16 bytes
at a time and use non-temporal cache hints. Think about how many routines copy
memory about using this library... and this is just one example. In an actual
software-rendering use case I used this to copy a 320x240 framebuffer, and my
final, assembler-optimised version was a good 15% faster than memcpy.
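A minimal sketch of the word-at-a-time idea (the name copy32 is mine, not his code; it ignores overlap and alignment, which a real memcpy must handle, and strictly speaking unaligned uint32_t access is undefined behaviour in C even though it works on x86):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wide copy: moves 4 bytes per iteration instead of 1.
   Assumes dst and src do not overlap; ignores alignment for clarity. */
void copy32(void *dst, const void *src, size_t n)
{
    uint32_t *d = dst;
    const uint32_t *s = src;
    size_t words = n / 4;

    while (words--)
        *d++ = *s++;               /* one 32-bit move per iteration */

    /* copy any trailing 0-3 bytes one at a time */
    unsigned char *db = (unsigned char *)d;
    const unsigned char *sb = (const unsigned char *)s;
    for (n &= 3; n--; )
        *db++ = *sb++;
}
```

Real implementations go much further (SIMD registers, alignment fix-up, non-temporal stores), but even this captures why a copy tuned to one known case can beat a fully general routine.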

The problem is that libraries are convenient, but they have to work in a large
number of cases, which may prevent them from using the algorithm that is
optimal for your problem. Even just being in a library imposes a small
slowdown, because the calls cannot be inlined; e.g. the C standard library
math functions can be beaten simply by writing equivalents that can be
inlined. The gain per call is small, but it still exists.
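A sketch of the inlining point (function names are mine; real libm implementations vary): a hand-written equivalent the compiler can see and fold into the caller, versus a call across the library boundary.

```c
#include <math.h>

/* Calls into libm: without link-time optimisation the compiler
   cannot inline fabs, so every call pays call/return overhead. */
double scale_lib(double x) { return fabs(x) * 0.5; }

/* Hand-written equivalent visible to the compiler, which can
   inline it and compile the whole expression as straight-line code. */
static inline double my_fabs(double x) { return x < 0.0 ? -x : x; }
double scale_inline(double x) { return my_fabs(x) * 0.5; }
```

(In practice compilers often recognise fabs as a builtin and inline it anyway; the point generalises to library routines the compiler treats as opaque.)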

I'm not 100% sure, but the C library math functions may also do extra work to
paper over limitations of the FPU; e.g. the fsin instruction fails for values
over 2^64 iirc, and the library function might do expensive operations to get
around this, in which case the gain from using a single fsin instruction will
be significant, perhaps more than twice as fast as the equivalent C library
function.

Some of this is the rationale behind my FridgeScript language (which tries to
be fast at floating-point ops); it is measurably faster than the MS C++
compiler provided that the code is clean (FridgeScript does, to a very good
approximation, no optimisation, so things like foo+1+bar+1 mostly end up as
three additions instead of two).

~~~
three14
Thanks. I'm still not sure where the thousandfold speed advantage over old
processors is going, though; 15% doesn't account for it unless there are a
lot of layers. 50 or so, no?

[http://www.google.com/search?hl=en&q=1.15^50](http://www.google.com/search?hl=en&q=1.15^50)

------
wheaties
Yeah, but back then it took one guy 6 months of bit fiddling in assembly/C to
develop that kind of game. Now we have art departments, storyboard artists,
script editors, a legion of QA, and of course the developers. Any time a
development process gets that layered, the performance-at-all-costs people
eventually get pushed to the side. Although I do agree with the spirit of the
article.

~~~
wanderr
This. The performance improvements in modern hardware are primarily beneficial
to developers, not users. Developers have always aimed for "fast enough," and
the less time and effort it takes to reach that goal, the more functionality
can be added in the same number of man hours.

The problem is that people have different definitions of what fast enough
means, and edge cases might not meet anyone's definition of fast enough if
they just weren't considered.

------
hxa7241
I suppose it is not entirely surprising, since modern programming has drifted
away from its most fundamental purpose -- processing data in a certain amount
of space and time.

Look at common programming languages: the control of how much storage space is
used is in the background, and the control of time taken is non-existent. The
dominant, and almost only, concern is representational structure -- how the
software is understandable and manipulable.

This is reasonable -- representation is important and comparatively valuable
-- but the riches of modern hardware speed have brought a kind of decadence to
programming.

------
extension
Programmers get first dibs at surplus computing resources and they rarely
leave much behind for users. It's not really fair.

Games are the exception.

~~~
silentbicycle
Games are not the _only_ exception. Pretty much anything soft-realtime that
needs to do elaborate graphics (3D rendering, etc.) is pushed towards similar
trade-offs.

