

Writing the Fastest Code, by Hand, for Fun (2005) - gaius
http://www.nytimes.com/2005/11/28/technology/28super.html

======
rudiger
Mr. Goto's name seems aptly suited to his work, writing the instructions given
to microprocessor chips.

<http://en.wikipedia.org/wiki/Aptronym>

~~~
agosnell
Yes, it seems unlikely one would consider Mr. Goto harmful.

------
chrisaycock
GotoBLAS2 was quite an achievement and featured performance metrics
competitive with Intel's MKL [1]. When Prof Goto stopped working on it, the
code was forked as OpenBLAS [2], which is used by the Julia programming
language [3].

[1] <http://eigen.tuxfamily.org/index.php?title=Benchmark>

[2] <https://github.com/xianyi/OpenBLAS>

[3] <http://julialang.org/>

------
brohee
Funny, at the time of the article GotoBLAS was about to be open-sourced. Since
then that not only happened, but the project also went unmaintained.

<http://www.tacc.utexas.edu/tacc-projects/gotoblas2>

~~~
fdej
There is an actively maintained fork of GotoBLAS2, OpenBLAS:
<https://github.com/xianyi/OpenBLAS>

------
acqq
The software that the article mentions:

<http://www.tacc.utexas.edu/tacc-projects/gotoblas2>

~~~
chubot
The page says it's portable across a few processors. I'm curious if there is
much that is portable or if each one has to be optimized separately. I'm also
curious how long code like this lasts. If the chip vendor changes their
underlying architecture, does the code have to be rewritten (for speed, or
correctness)?

~~~
gcp
The TLB optimizations that he started from are fairly generic. But the kernels
themselves are hand-optimized assembly, so they obviously don't port. No need
to rewrite for correctness, only for speed.

------
mmphosis
_He said his next big challenge was to expose chip designers to his ideas to
help speed their processors.

"Computer architects are stubborn," he observed. "They have their own ideas."
His ideas on computing efficiency, he said, speak for themselves._

------
rms25
Old article, but amazing stuff. I'm assuming everything of his is written in
assembly.

------
danso
Er, so what this "John Henry" was capable of besting computers at, if I'm
reading between the confusing lines correctly, was writing the most optimized
subroutines? When I saw the phrase "John Henry" or the OP's headline, I
immediately thought of someone writing more lines of code than...well, I guess
I don't even know what kind of contest that would be.

~~~
jameskilton
He is basically able to see how data flows through code and can rewrite
routines so that as much data as possible stays in the CPU's local caches
(L1 / L2 / L3) while the calculations run. In short, he's one of the best at
writing code to minimize CPU cache misses and thus one of the best at
writing blazingly fast code.
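
To make the cache point concrete, here is a minimal sketch of loop blocking (tiling) for matrix multiplication in C. This is not Goto's code: his kernels are hand-written assembly, and per the thread his approach starts from TLB behavior. The function names and the block-size parameter `bs` are made up for illustration. The blocked version does the same arithmetic as the naive one, but works on small tiles so that data already pulled into cache gets reused before it is evicted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names for illustration; not taken from GotoBLAS. */

/* Reference version: C = A * B for n x n row-major matrices.
 * The inner loop strides down a column of B, which is unfriendly to caches
 * once the matrices are larger than the cache. */
void matmul_naive(size_t n, const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < n; k++)
                sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}

static size_t min_sz(size_t x, size_t y) { return x < y ? x : y; }

/* Blocked version: same result, but the work is split into bs x bs tiles.
 * Each tile of A, B and C is small enough (for a suitable bs) to be reused
 * while it is still resident in cache. */
void matmul_blocked(size_t n, size_t bs,
                    const double *a, const double *b, double *c)
{
    assert(bs > 0);

    for (size_t i = 0; i < n * n; i++)
        c[i] = 0.0;

    for (size_t ii = 0; ii < n; ii += bs) {
        size_t i_end = min_sz(ii + bs, n);
        for (size_t kk = 0; kk < n; kk += bs) {
            size_t k_end = min_sz(kk + bs, n);
            for (size_t jj = 0; jj < n; jj += bs) {
                size_t j_end = min_sz(jj + bs, n);
                /* Multiply one tile of A by one tile of B,
                 * accumulating into the matching tile of C. */
                for (size_t i = ii; i < i_end; i++) {
                    for (size_t k = kk; k < k_end; k++) {
                        double aik = a[i * n + k];
                        for (size_t j = jj; j < j_end; j++)
                            c[i * n + j] += aik * b[k * n + j];
                    }
                }
            }
        }
    }
}
```

The right `bs` depends on the cache sizes of the particular chip, which is one reason tuned code like this tends to stay correct across processors but needs retuning for speed.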

