
Only fast languages are interesting - spindritf
http://scottlocklin.wordpress.com/2011/11/30/only-fast-languages-are-interesting/
======
weavejester
There are a few things wrong with this benchmark. In Lush, he's measuring the
time it takes to sum 3 million pre-computed random numbers. In Clojure, he's
measuring the time it takes to generate 3 million random numbers, and then sum
them.

It also looks like he's using two different data structures. In Clojure, it's
a lazily generated linked list of objects; in Lush, it appears he's using a
vector pre-initialized to the right size.

He also doesn't mention which versions of Clojure and Lush he's using, or how
much memory Lush uses. He complains that a JVM heap of 130MB is too small, but
the 30-million-element array in Lush would have been almost twice that size if
it were populated with 64-bit doubles.

For reference, it takes me 571ms to run the 300 thousand number example that
took him 861ms. If I factor out the random-number generation from the Clojure
benchmark, it cuts the time down to 120ms.

If then I use a pre-generated fast array and the specialized areduce function
(to match his specialized idx-sum function), I get 16ms for 300k, 22ms for 3m
and 230ms for 30m.

So performance is on par with Lush, so long as you compare like for like.

~~~
lutorm
"The point is, Lush has datatypes for fast numerics: it’s designed to do fast
numerics. Clojure doesn’t have such datatypes..."

He knows he's not comparing like for like. What he apparently doesn't know is
that there _is_ a better data type.

~~~
Nrsolis
Whats the better datatype? I'm looking at doing some numeric programming and I
wondered about Clojure for this.

~~~
weavejester
If your aim is raw speed, not a lot beats a standard JVM array.
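
As a rough illustration, here is a sketch in plain Java of the like-for-like
benchmark discussed above (the 3-million-element size and the fixed seed are
my own choices, not from the original post): fill a primitive double[] once,
then time only the summation.

```java
import java.util.Random;

// Sketch of the "raw JVM array" approach: sum a primitive double[],
// timing only the summation, not the random-number generation.
public class SumBench {
    public static void main(String[] args) {
        int n = 3000000;
        double[] xs = new double[n];
        Random rng = new Random(42); // arbitrary fixed seed
        for (int i = 0; i < n; i++) {
            xs[i] = rng.nextDouble();
        }

        long start = System.nanoTime();
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += xs[i];
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println("sum = " + sum + " in " + elapsedMs + "ms");
    }
}
```

This is essentially what the double-array/areduce Clojure version boils down
to once the type hints take effect.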

------
jwr
This is flamebait, especially the headline.

There are various tools you can use, and using a screwdriver instead of a
hammer won't help you much if what you have is nails.

From my point of view, you may get back to me when Lush has data structures
for concurrent programming and does (nearly) pauseless GC on multicore
machines with heaps of 12GB composed of complex data structures, not just
vectors of numbers. This is what I get with Clojure and for my applications it
works great. Oh, we also do numerics, quite a bit in fact. But whenever I need
actual raw-metal double-adding performance, I use an appropriate tool, e.g. an
extension/library. For certain string operations we even went down to x86-64
assembly, with great results.

What you describe is a JVM limitation, not a Clojure limitation — and you're
right in only one thing: Clojure is not the right tool for you.

~~~
ajross
Digression here, but I like the "(nearly)". This bit always amuses me about
garbage collection wonks. The pauseless bit is a real time requirement. Saying
your GC has great latencies or is "(nearly) pauseless" is tantamount to
telling a real time engineer your system only fails some of the time. It makes
you look dumb.

GC is great. GC makes a ton of things simpler. GC as implemented in popular
environments still sucks for real time use.

~~~
Confusion
The 'pauseless' bit is not just a real-time requirement. A website that stalls
for 2 seconds once every 10 minutes is unacceptable to me. That application
thus requires a garbage collector that is _nearly_ pauseless: the pauses are
small enough that your users won't notice them. So I don't think it makes
anyone look dumb at all to speak of a 'nearly pauseless' garbage collector.

------
exDM69
I can relate to this. Nothing is more disappointing than putting together a
nice piece of code and then seeing it run too slow to be practical.

A while ago, I wrote some physics simulation code in Python. It turned out
really neat and quite idiomatic, but it ran very slowly: I was seeing only
10-20 frames per second when I was shooting for something closer to 120. All
I was doing was simulating a cube suspended over a plane on four dampened
springs, and I was only simulating one of those where I wanted lots of 'em.

The problem was that my code was allocating and freeing way too many objects.
The proper solution would have been to go from a neat idiomatic approach where
the code resembles the mathematical equations it simulates to some kind of
structure of arrays -style code with Numpy. Needless to say, there isn't much
fun in using Python that way, so I might as well write the damn thing in C.

Later I went on to rewrite that code in C and to run it on my GPU with
OpenCL. I did write it in a structure-of-arrays style, so maybe the Python
experience taught me a valuable lesson. Now it runs fast enough for my purposes.
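
The structure-of-arrays layout mentioned above can be sketched in Java (all
names and constants here are hypothetical, not from the original Python code):
one primitive array per field instead of one object per spring, so the inner
loop allocates nothing and the GC stays idle.

```java
// Hypothetical structure-of-arrays sketch of a dampened-spring update:
// parallel primitive arrays instead of one object per spring, so the
// integration step performs no allocation at all.
public class Springs {
    final int n;
    final double[] pos, vel; // displacement and velocity per spring

    Springs(int n) {
        this.n = n;
        this.pos = new double[n];
        this.vel = new double[n];
    }

    // One explicit-Euler step for a dampened spring: a = -k*x - c*v
    void step(double k, double c, double dt) {
        for (int i = 0; i < n; i++) {
            double a = -k * pos[i] - c * vel[i];
            vel[i] += a * dt;
            pos[i] += vel[i] * dt;
        }
    }
}
```

This is the same idea as NumPy's array-per-field layout, just spelled out by
hand.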

~~~
maximusprime
> nice piece of code

Too many developers are paying way too much attention to the code being "nice"
or it being in a "nice" language, and not enough attention to "does it work"
and "is it fast".

~~~
omegaworks
Nice code is maintainable code. High-level languages exist for the sole
purpose of making code "nice."

You pay enough attention to "is it fast" and you'll find yourself writing
everything in assembler. There's a time and place for that, sure, but when you
have a high level language that can theoretically optimize idiomatic code into
something faster, it should opt to.

~~~
buff-a
_You pay enough attention to "is it fast" and you'll find yourself writing
everything in assembler_

Not everything. Not even then. Just the good bits maybe. Sometimes none of it.
It might make me prefer C# over Java for example, because C# can lay down
structs linearly in memory so I can throw them to the graphics card directly.
Or maybe I'd do all the hard work in C and SWIG it so I could use Ruby.

But what I certainly wouldn't do is attempt to write a _physics simulation_ in
_python_ , "idiomatic" or not.

~~~
rbanffy
> But what I certainly wouldn't do is attempt to write a physics simulation
> in python, "idiomatic" or not.

<http://numpy.scipy.org/>

Life can be good.

~~~
probably
But that's the thing - I would argue that NumPy code is not "idiomatic"
Python, which uses built-in operations/structures like list comprehensions,
tuples, and dictionaries. I had a similar experience during extensive NumPy
coding where I thought: why am I not just writing this in Fortran?

------
spacemanaki
The title is link bait to the point of nearly trolling. He didn't do his
research; he should have jumped on the Google group or Stack Overflow or
something before writing this. Someone named Mike in the comments on the OP
comes up with a fix that uses Java primitives and is very fast, in the range
of his Lush examples.

<http://scottlocklin.wordpress.com/2011/11/30/only-fast-languages-are-interesting/#comment-2529>

There's almost nothing to see here, except maybe that there should be better
documentation on this kind of optimization.

------
skew
All the Clojure examples test linked lists!

The first example lets lazy evaluation of the random numbers leak into the
timing; toss a (dorun tmp) in before the timing for a little more sense.

The type annotation ^doubles is useless here: it tells Clojure to expect the
definition to yield an array of doubles, but the code then binds it to the
same old linked list.

~~~
bretthoerner
Can you fix his code and paste it? Why not have him run the bench and post it
as an update?

~~~
rjn945
Based on the work of a previous commenter, I posted this code in the blog
comments:

    (defn add-rands []
      (let [ds (double-array 30000000)]
        (dotimes [i 30000000] (aset ds i (Math/random)))
        (time (areduce ds i res 0.0 (+ res (aget ds i))))))

This adds 30,000,000 numbers in 73ms on my machine. His Lush code added
30,000,000 in 180ms. I estimate my computer is twice as fast as his, putting
them on par. Hopefully he will run the code so we can see all the run times on
the same machine. (I never could get his Lush code running.)

Of course, the Clojure code here is fairly involved for such basic stuff, but
if you did things like this often it would not be hard to add some nice
syntactic sugar over it.

------
fpgeek
To me, this reads like someone who used C to add a large list of numbers
recursively and concluded C was trash because they ran out of stack.

There's no denying there is a specific problem. People are certainly entitled
to decide that solving that problem is too much hassle for them. But how
seriously should we take their opinions when they do that?

I'd say it depends on the quality of the effort they did invest. In this case,
the OP admits that they used the wrong data structure in Clojure and the right
one in Lush, which makes the entire exercise meaningless.

~~~
anonymous
<http://pastebin.com/2fGefex4>

    $ gcc -o addalot --std=c99 -Os addalot.c
    $ ./addalot
    [1500117.315354] 0.000032 per iteration on average.

Works fine for me.

~~~
dchest
This is not a correct C program; it's a C program for a compiler with
tail-call optimization (or with a stack large enough for the recursion to
complete). There's no TCO in the C standard.

~~~
stonemetal
There is no restriction against TCO in the C standard either. TCO is a valid
optimization for a compiler to apply. So what is incorrect about it?

~~~
dchest
Because it relies on a particular C compiler implementation, not on the C
standard. Take the -Os flag away, or use a compiler that doesn't do TCO, and
the program has a bug.

See also:

* using memcpy() for overlapping regions (relies on the particular implementation of memcpy(); was exposed by a change in glibc - <http://lwn.net/Articles/414467/>);

* memory aliasing assumptions;

* passing NULL, 0 to memcpy() (<http://code.google.com/p/spiped/source/detail?r=8>);

etc.

~~~
stonemetal
It is ANSI-standard C code. There is an implementation limit (stack size)
that this program runs into, but that doesn't violate the C standard in any
way, because the standard doesn't specify available stack space.

So the program is compatible with certain C implementations in certain
configurations. Given that the C standard allows an implementation to pick
any stack size it likes (including zero), we could claim that about every
valid C program that makes any use of the stack.
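
The same portability trap exists on the JVM, which, like the C standard, does
not guarantee tail-call elimination. A sketch of the difference (sizes and
names are my own, chosen to mirror the thread's 3-million-element benchmark):

```java
// The JVM does not perform tail-call elimination, so a tail-recursive sum
// over millions of elements overflows the stack while the equivalent loop
// runs in constant stack space.
public class TailSum {
    // Tail-recursive form: correct logic, but each call consumes a frame.
    static double sumRec(double[] xs, int i, double acc) {
        if (i == xs.length) return acc;
        return sumRec(xs, i + 1, acc + xs[i]); // not eliminated by the JVM
    }

    // Iterative form: same result, constant stack.
    static double sumLoop(double[] xs) {
        double acc = 0.0;
        for (double x : xs) acc += x;
        return acc;
    }

    public static void main(String[] args) {
        double[] xs = new double[3000000];
        java.util.Arrays.fill(xs, 1.0);
        System.out.println(sumLoop(xs)); // prints 3000000.0
        try {
            System.out.println(sumRec(xs, 0, 0.0));
        } catch (StackOverflowError e) {
            System.out.println("recursive version blew the stack");
        }
    }
}
```

With a default thread stack, the recursive version fails exactly the way the
C program does without -Os.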

------
_delirium
Languages where the upgrade path to fastness isn't horrible are okay with me
also. In Common Lisp, for example, you can start adding in type declarations
in speed-critical areas, which Clojure also has something similar to (not as
familiar with it, so not sure how it compares to CL's type declarations).

------
shadowfox
Probably should have added "for large-scale numeric computations" to the title
:P

~~~
jwr
Yes, but then the flamebait/trolling factor would be gone. The title is such
obvious flamebait that I hesitated whether to flag it.

------
luriel
Here is another place where Go hits the sweet spot between performance and
expressiveness/simplicity.

Also, by giving you control over how memory is laid out, it lets you be much
more efficient in both space and time, and when you need to go even faster
you can optimize things quite tightly.

------
teyc
Guy Steele said it better, though: he wants not the fastest language, but the
fastest way to get to the solution. That may mean a language that isn't as
fast to run, but is faster to program in.

------
carsongross
Interesting that the people who complain about performance are almost always
in, ahem, 'quantitative finance': the LMAX stuff, this guy...

I'm just a simple caveman, but I can't help but think the world wouldn't be
dramatically worse off if the HFT bots ran a touch slower.

------
erikb
Learn to be more calm and patient. If you use another language and another
VM, maybe you need to think in another way. Maybe this language or VM really
isn't strong at what you want to do, but is strong at other tasks.

Also, you might stop thinking in absolutes. There are no fast languages.
Every good and well-known language has its strengths and weaknesses, and
every optimization is a tradeoff: if you get stronger at one thing, you MUST
get worse at something else. So if you have a fixed set of problems, you
really just need to find out which language solves your problem space best.
And if you find it, that doesn't mean this language is better than the others.

------
InclinedPlane
Synthetic benchmarks are rarely useful.

If you want to benchmark two languages you need to do this:

Find N coders skilled in language X and N coders equally skilled in language
Y. Have all of them code several small but realistic systems of various kinds,
then benchmark the results and compare. Better yet, benchmark real, live
systems of comparable purpose built using different languages. Anything else
leaves you open to having your results dominated by differences in level of
coding skill, quirks specific to your synthetic benchmark, and effects that
are only prominent for tiny, overly simplistic programs.

------
mhansen
Very well-written, but it's a non-story.

Check the comments. Mike has contributed a Clojure solution using `double-
array` that runs fast, without blowing up the heap with a linked list.

------
swah
But there are no fast languages, only fast implementations...

~~~
skew
... and benchmarks that compare linked lists to contiguous vectors.

------
phzbOx
Not sure what he means by "interesting". I'm often delighted to learn new
languages.. and even ones that are only on paper.

------
mbq
One more benchmark for Clojure fans:

    R> system.time(sum(runif(3000000)))
       user  system elapsed
      0.100   0.000   0.101
    R> system.time(sum(runif(30000000)))
       user  system elapsed
      1.037   0.020   1.058

Not as good as Lush, yet human syntax and Scheme scoping can compensate.

------
tokipin
Mathematica:

    In[1]:= AbsoluteTiming[Nest[RandomReal[] + # &, 0, 3000000]]
    Out[1]= {0.3276005, 1.50067*10^6}

    In[2]:= AbsoluteTiming[Nest[RandomReal[] + # &, 0, 30000000]]
    Out[2]= {2.9952052, 1.5001*10^7}

Not bad, I think.

------
lloeki
Oh god, you know what's just as bad as a slow language? That Wordpress iPad
theme. Please stop using that, it's ridiculously unusable.

------
zanst
So I think you should stop with clojure and start to code in asm.

------
wtracy
This guy should look at Scala: Functional, runs on the JVM, and has
performance comparable to "normal" Java code.

~~~
darklajid
You mean like this guy? <https://news.ycombinator.com/item?id=3292555>

Scala vs. Clojure is not something to discuss for this use case, I guess. As
<https://news.ycombinator.com/item?id=3293261> points out in that other
thread, you won't write (recognizable) Scala (or Java. Or Clojure) code
anymore if you want the best performance.

~~~
stewbrew
With respect to the second link: it always helps to know your data structures
and their purposes, no matter what language you use, recognizable or not.

With respect to performance: you might want to try writing a C-like solution
and running the code distributed over several machines.

------
derleth
A language can't be fast; a language is a notation. Only implementations can
be fast or, perhaps, generate fast code.

For an example, look at GHC versus some toy-Haskell whipped up in Common Lisp.
Now, with those two implementations to hand, is Haskell 'fast' or 'slow'?

~~~
danieldk
_A language can't be fast; a language is a notation. Only implementations can
be fast or, perhaps, generate fast code._

Theoretically, maybe. But it is nearly impossible to come up with an
implementation of Ruby that is as performant as a compiled C program (and
retains the characteristics of Ruby).

So it is definitely true that some languages are easier to compile to fast
machine or byte code than others. As a consequence, people will call Ruby
slow, and C fast.

~~~
rbanffy
> But it is nearly impossible to come up with an implementation for Ruby, that
> is as performant as a compiled C program

PyPy is getting there.

JIT, runtime type detection, and path optimization go a long way. And it's
much easier to write correct Ruby than correct C.

When I went to college, the HP-41 was HP's top-of-the-line calculator and all
the rich kids had them. I had a BASIC-programmable CASIO PB-700. In the end, I
was able to finish tests in less time than it took the HP guys to program
their calculators to spit the answer. I think it was the first time (1986 or
so) I realized a more expressive programming language could be a decisive
advantage over your competition.

~~~
jemfinch
"And it's much easier to write correct Ruby than correct C."

No, it isn't. C may be spartan, but its failure modes are predictable and
manageable. A Ruby program can be incorrect for a host of reasons completely
unrelated to the program itself (interpreter bugs, garbage collection
problems, etc.), and the language as a whole provides features (dynamic
typing, method injection, exceptions) that make it harder to write correct
programs.

Ruby may make it easier to write _programs_, but it definitely doesn't make
it easier to write _correct_ programs.

~~~
rayiner
Eh, Ruby's failure modes are predictable too. I've never tracked a bug down to
a "garbage collection problem", whatever that is. But I certainly have come
across problems with C's lack of type safety, lack of memory safety, lack of
GC, etc.

------
maximusprime
Or you could just use java...

The title says only fast languages are interesting, then rules out using Java
because it's not lispy enough :/

Either you want speed, or you want your favorite syntax and high level stuff.
You can't have both.

FWIW, my favorite syntax happens to be C/Java/assembly-type stuff. I'd hate
to be one of the developers who hates those; it must put you at a big
disadvantage, as shown in the original post.

~~~
JoachimSchipper
Why not? Apparently Lush gets it right: C-like speed on numerics with a
Lisp-ish syntax this guy likes. Yes, C-style stacks are going to beat
Lisp-style closures, but there is no reason why a "fancy" language can't do
fast matrix operations, which is 99.9% of numerical code anyway.

Even Scheme has (rather un-Lisp-ish) vectors in addition to the standard
linked lists.

~~~
nemoniac
> C-style stacks are going to beat Lisp-style closures

It has been a long time since the "λ the ultimate..." series of papers gave
the lie to this myth.

------
mjwalshe
Remember, kids: "real" programmers use FORTRAN.

~~~
eric_t

    real, dimension(30000000) :: a
    call random_number(a)
    print *, sum(a)

That's 30 million reals; takes 0.04s on my machine.

