
Performance comparison of Functional C++ 11, Java 8, JavaScript and Wolfram - npalli
http://unriskinsight.blogspot.com/2014/06/fast-functional-goats-lions-and-wolves.html
======
dxbydt
It is rather sad that none of the 49 comments (at this time of posting) have
made a mention of the underlying math problem, which is super interesting, and
instead focus on dubious speed metrics (dubious in the sense these metrics
change with the chip/cache/RAM/compiler/lang/coding-style & aren't that
relevant anyways, compared to the underlying problem)

Lemme rephrase this rather interesting problem: There are 2055 startups, 2006
bigcorps & 2017 zombies. If the startup gets bought out by the bigcorp, the
bigcorp has wasted its well earned money and soon becomes a zombie. If the
zombie folks instead join hands with the startup, they suddenly make pots of
money & the startup becomes a bigcorp. Finally, if the bigcorp uses its cash
prudently and buys the zombie, it starts innovating & becomes a startup.

So the claim is that if you let this economy play out in all its glory, there
will be 1.448 billion buyouts. To arrive at this giant figure of 1,448,575,636
takes anywhere between 335 seconds for the C++ hacker to 7000 seconds for the
JS guys.

Now give it your best shot!
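
For anyone who wants a starting point, here is a minimal Python sketch of the brute-force search under the rules above. The function names, the `(startups, bigcorps, zombies)` tuple layout and the small starting economy are assumptions made for illustration; this is not the code from the article.

```python
# Each buyout removes one member from two groups and adds one to the third.
# State layout (an assumption of this sketch): (startups, bigcorps, zombies).
BUYOUTS = (
    (-1, -1, +1),  # bigcorp buys startup -> the bigcorp becomes a zombie
    (-1, +1, -1),  # zombie joins startup -> the startup becomes a bigcorp
    (+1, -1, -1),  # bigcorp buys zombie  -> the bigcorp becomes a startup
)


def next_economies(economies):
    """Apply every possible buyout to every economy; the set removes duplicates."""
    result = set()
    for state in economies:
        for move in BUYOUTS:
            candidate = tuple(n + d for n, d in zip(state, move))
            if min(candidate) >= 0:  # no group may go negative
                result.add(candidate)
    return result


def is_stable(state):
    """No buyout is possible once at least two of the three groups are empty."""
    return sum(1 for n in state if n == 0) >= 2


def find_stable(start):
    """Advance one buyout at a time until some economy is stable (or none are left)."""
    economies = {start}
    while economies and not any(is_stable(s) for s in economies):
        economies = next_economies(economies)
    return sorted(s for s in economies if is_stable(s))


if __name__ == "__main__":
    # Scaled-down economy: 55 startups, 6 bigcorps, 17 zombies.
    print(find_stable((55, 6, 17)))
```

With the scaled-down economy this prints `[(0, 23, 0)]`, i.e. 23 bigcorps are left. The full-size input would likely take a very long time in a pure Python loop like this one.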

~~~
chapel
The code is very inefficient to say the least.

I made some modifications (diverging further from the "functional" nature, if
you could call it that) and as you can see it is much faster[0].

Output:

    
    
      $ node new-magicForest.js 2017 2055 2006
      total forests: 6128
      { goats: 0, wolves: 0, lions: 4023 }
      total time: 20ms
    

[0]
[https://gist.github.com/chapel/1c038b2bf64b3037aaea](https://gist.github.com/chapel/1c038b2bf64b3037aaea)

~~~
skratky
Your code contains a bug in the function getForestKey. See gist.

------
mobocat
But the C++11 solution is not functional. It uses state variables: look at the
while loop, it updates a variable. Look at next_forests.reserve and
std::back_inserter. These are not clearly functional constructs. Of course it
is possible to model memory state with monads :) but... The C++ code is quite
different from the other versions. So I was surprised by Java, which is only 3
times slower than C++ while running much less efficient code.

~~~
pkolaczk
Glad someone pointed that out. It is cheating. It is doing a lot of state
transformations like in-place sorting / filtering. I'd like to see some real
C++ functional code here, using immutable data structures, not an imperative
program using lambdas. I guess it would be both much harder to write (C++ is
not really a functional language) and much slower, because dynamic memory
allocation in C++ is more costly than in Java.

~~~
taeric
Isn't this just evidence of how broad the definition of "functional"
programming is? It is the "no true Scotsman" of programming debates. (Though, I
suspect any "paradigm" example will fall into this trap.)

------
zik
It's pretty astonishing how easily C++ trounces everything else including
Java. I guess there's still something to be said for compiling directly to
optimised binaries.

~~~
mtdewcmu
Supposedly, Java is going to be faster than native code any day now. It's been
said for years. The case was somewhat credible at one time, because the
opportunity exists to optimize using runtime information. I think the reason
it didn't go that way is:

1\. CPUs have gotten very good at doing runtime optimization kinds of things
on their own, like predicting branches and reducing the cost of virtual
function calls.

2\. Java only does optimizations that can be done quickly, since the optimizer
has to compete with the executing program itself.

3\. The claim was overblown to begin with, and Java is trying to do too many
other things, like be secure, that interfere with performance.

~~~
Afforess
I have never seen Sun or Oracle make that claim. You seem to simply be ranting
on about a strawman argument, with a rather strange java-hate obsession.

I mean, Fortran did worse than Java. Where is your writeup for that?

~~~
georgemcbay
There was certainly a lot of talk within the Java development community about
how Java was going to meet or overtake native code as the HotSpot VM matured.
See, for example (from 1998):

[http://www.artima.com/designtechniques/hotspotP.html](http://www.artima.com/designtechniques/hotspotP.html)

"According to Sun Microsystems, the Hotspot virtual machine, Sun's next-
generation Java virtual machine (JVM), promises to make Java "as fast as
C++.""

~~~
cbsmith
I'd say that statements like that are subject to some degree of
interpretation. It's hard for one runtime to be definitively faster than
another runtime. It is, however, quite possible for one runtime to have cases
where it is better, cases where it is worse, and cases where it is equivalent
such that it is reasonable to say that it is "as fast as" the other. Java
tends to be a bit slower than C++ still, but the differences between it and
"as fast as" are trivial enough that a number of HFT systems, for example, are
written in Java.

~~~
kbenson
While I think the comment here leaves that open as a possibility, the second
sentence of the article it's from makes it pretty clear. "Specifically, Sun
says that a platform-independent Java program delivered as bytecodes in class
files will run on Hotspot at speeds on par with an equivalent C++ program
compiled to a native executable."

There's not a lot of wiggle room there.

~~~
jdmichal
Considering that I can get different performance for the _same_ C++ program
simply by using a different compiler, or even different compiler _options_,
I'd say that there's a lot of wiggle room.

------
xxs
Few major points:

Using sort to filter duplicates is horribly worse compared to hashing
(javascript for instance). The Java version has a rather poor implementation;
it would be interesting to see the GC and allocation cost and the GC type used.
The C++ version does not really use functional programming in
find_stable_forests and meal...

~~~
panic
_Using sort to filter duplicates is horribly worse compared to hashing
(javascript for instance)._

Is this really true? A sort plus a linear scan has very low constant time
factors, good cache locality (depending on the sorting algorithm used), and no
need to allocate if you're sorting in place. I've seen good results using sort
to filter duplicates in my own performance-sensitive code. Are you saying this
technique is "horribly worse" based on your experience or intuition?
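
To make the two techniques being compared concrete, here is a small Python sketch of both. The function names and the random sample are assumptions for illustration, and timings from Python say little about how the C++ or JavaScript versions behave; the point is only to show what "sort plus linear scan" and "hashing" each do.

```python
import random
import timeit


def dedup_by_sort(forests):
    """Sort in place, then keep each element that differs from its predecessor."""
    forests.sort()
    unique = forests[:1]
    for forest in forests[1:]:
        if forest != unique[-1]:
            unique.append(forest)
    return unique


def dedup_by_hash(forests):
    """Insert everything into a hash set; the order of the result is unspecified."""
    return list(set(forests))


if __name__ == "__main__":
    random.seed(0)
    sample = [(random.randrange(50), random.randrange(50), random.randrange(50))
              for _ in range(100_000)]
    # Both approaches must agree on the set of unique forests.
    assert dedup_by_sort(list(sample)) == sorted(dedup_by_hash(sample))
    for fn in (dedup_by_sort, dedup_by_hash):
        seconds = timeit.timeit(lambda: fn(list(sample)), number=3)
        print(f"{fn.__name__}: {seconds:.3f} s for 3 runs")
```

The sort-based version needs no extra container beyond the output list, which is the property the comment above is pointing at; which one wins in practice depends on the language, the data and the memory layout.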

~~~
FreezerburnV
Just to add to the discussion: you're likely correct, based on this SO
question:

[http://stackoverflow.com/questions/11227809/why-is-processin...](http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array)

It will likely get faster the more duplicates you have, as it allows the CPU
to predict a branch more reliably several times before being wrong and having
to re-do some of its prediction stuff. (note that I'm not experienced with
optimizing stuff, dealing with cache optimizations, etc., but I have read a
decent amount, and I did take a class about CPU architecture) I would also
suspect that hashing might end up messing with memory all over the place,
causing the cache to be partly useless.

~~~
ygra
That question (and answer) is probably not relevant here. The important thing
is likely cache locality, not branch prediction (as the test case is very
different in that question, amplifying branch misprediction issues, which
won't be the issue here).

------
defg
I was curious how a non-functional version would fare, so I wrote one in
Nimrod and it's a lot faster than the functional C++:
[https://gist.github.com/def-/8187448ea7a5c8da8265](https://gist.github.com/def-/8187448ea7a5c8da8265)

    
    
      Goats Wolves Lions    C++11  Nimrod
         17     55     6     0.00    0.00
        117    155   106     0.17    0.01
        217    255   206     0.75    0.01
        317    355   306     2.16    0.01
        417    455   406     5.28    0.01
        517    555   506    10.75    0.01
        617    655   606    19.15    0.02
        717    755   706    31.58    0.02
        817    855   806    46.52    0.02
        917    955   906    67.94    0.02
       1017   1055  1006    93.75    0.02
       2017   2055  2006   731.42    0.04

~~~
qznc
The important word here is "functional" C++. This is not about "fastest" C++.
What does your comparison tell us?

~~~
taeric
That functional might not be the best damned paradigm on the planet for every
use case? :)

------
platz
The term "Functional" here is used _extremely_ liberally. In all 3 cases.

------
taliesinb
Some of the WL code leaves a little to be desired. Here's a more idiomatic and
readable Meal:

    
    
       Meal[forests_] := 
         Outer[Plus, forests, Permutations[{1, -1, -1}], 1] // Catenate // 
           DeleteDuplicates // Select[AllTrue[NonNegative]];

------
bottled_poe
A message to the blog owner: get rid of that swipe to go forward/backward
between posts when viewing on mobile. It looks good, but it is functionally
frustrating. Why would a user expect different swipe behaviour in one
direction to another?

~~~
eshyong
I've seen this on other blogs too hosted by Blogger/Blogspot. I'm pretty sure
it's their doing, not the blog owner himself.

------
twfarland
I had a quick go at this with Racket:
[https://gist.github.com/twfarland/a9d8ce9eff22b39d3136](https://gist.github.com/twfarland/a9d8ce9eff22b39d3136)

I'm not sure if I got the problem right, because it solves the hardest case
almost instantly (2006 lions, 2055 wolves, 2017 goats -> 4023 lions), in 0.8s
on my macbook air.

I used a general search algo that I've also used in the past for the
missionaries and cannibals and snake cube puzzles.

It uses a set to store the past states seen, instead of deduping a list.
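
The idea of a general search with a set of seen states can be sketched in Python roughly as follows. This is not the Racket code from the gist; the names `search`, `successors` and `is_stable`, and the `(lions, wolves, goats)` tuple layout, are assumptions made for this illustration.

```python
from collections import deque


def search(start, successors, is_goal):
    """Breadth-first search that remembers every state it has already queued."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if is_goal(state):
            return state
        for nxt in successors(state):
            if nxt not in seen:  # the set replaces deduping a list
                seen.add(nxt)
                queue.append(nxt)
    return None


# Forest-specific parts; the state layout (lions, wolves, goats) is an assumption.
MEALS = (
    (+1, -1, -1),  # wolf eats goat -> becomes a lion
    (-1, +1, -1),  # lion eats goat -> becomes a wolf
    (-1, -1, +1),  # lion eats wolf -> becomes a goat
)


def successors(forest):
    """Yield every forest reachable by one meal without going negative."""
    for meal in MEALS:
        nxt = tuple(n + d for n, d in zip(forest, meal))
        if min(nxt) >= 0:
            yield nxt


def is_stable(forest):
    """A forest is stable when at least two species have died out."""
    return sum(1 for n in forest if n == 0) >= 2


if __name__ == "__main__":
    # Scaled-down case: 6 lions, 55 wolves, 17 goats.
    print(search((6, 55, 17), successors, is_stable))
```

For the scaled-down case this prints `(23, 0, 0)`. Unlike the programs in the article, the sketch returns only the first stable forest it finds, not all of them.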

------
Dn_Ab
Here is my natural (without any extra effort thinking about speed) F#
solution: (for calibration I have the C++, unchanged, times)

    
    
       | 217 | 255 | 206 | 0.5 s  (C++ 1.6 s)
       | 317 | 355 | 306 | 1 s    (C++ 4.8 s)
       | 617 | 655 | 606 | 3.5 s  (C++ 35 s)
       | 917 | 955 | 906 | 10 s   (C++ 117.5 s)
    

code:

    
    
        let actions = [|[|-1; -1; 1|]; [|-1;1;-1|]; [|1; -1;-1|]|]
    
        let stable [| g ; w; l|] = (g = 0 && (w = 0 || l = 0)) || (w = 0 && l = 0)
    
        let isSound = function | [| x ; _; _|] | [|_; x; _|] | [|_; _; x|] when x < 0 -> false | _ -> true
    
        let stateChange state =  Array.map (Array.map2 (+) state) actions 
    
        // Seq.distinct compares states structurally; grouping by hash alone would merge colliding states
        let deduplicate sequence = sequence |> Seq.distinct
    
        let forest start =
          let rec search curforest =  
                let nextforest = Array.collect stateChange curforest
                                          |> Array.filter isSound 
                                          |> deduplicate 
                                          |> Seq.toArray  
                                    
                if nextforest.Length = 0 then curforest 
                else match Array.tryFind stable nextforest with 
                        | Some _ -> nextforest
                        | None -> search nextforest
    
          search (stateChange start) |> Array.filter stable

------
frik
It would be interesting to see how LuaJIT, PyPy and HHVM (PHP JIT) score
against JS v8 and Java 8 on that test environment.

Also, the clang C++ compiler is (a lot) slower than the GNU C++ compiler. (We
had a benchmark on HN that showed that clang C++ is 5% slower and asm.js is 10%
slower than GNU g++.) The comparison should be executed on Linux or Windows, as
OS X is known for shipping with older versions of Unix tools/applications.

------
dschiptsov
SBCL or at least Haskell by any chance? And then in terms of lines of code.)

~~~
hiker
Here's a Haskell translation of the C++ version. Runtime is around 10x that of
the original (probably vector vs list cache thrashing and vector sort/unique vs
list to set/set to list), code reduction is 3x.

    
    
      import qualified Data.Set as S
      
      data Forest = F Int Int Int
        deriving (Eq, Ord, Show)
      
      meal forests = (S.toList . S.fromList)
        [nextForest |
         forest <- forests,
         meal <- possibleMeals,
         let nextForest = forest <+> meal,
         valid nextForest]
       where
        possibleMeals = [
          F (-1) (-1)   1,
          F (-1)   1  (-1),
          F   1  (-1) (-1)]
        F x y z <+> F x' y' z' = F (x+x') (y+y') (z+z')
        valid (F x y z) = x >= 0 && y >= 0 && z >= 0
      
      findStable forest = iter [forest]
       where
        iter forests | not (done forests) = iter (meal forests)
                     | otherwise          = filter stable forests
        done forests = null forests || any stable forests
        stable (F _ 0 0) = True
        stable (F 0 _ 0) = True
        stable (F 0 0 _) = True
        stable _         = False
      
      main = print $ findStable (F 117 155 106)

------
SNvD7vEJ
Strangely, if the hashCode() method in the Java version is replaced with one
generated by Eclipse, the java program runs much slower.

In fact, the execution time is more than doubled.

Why is this so?

The Eclipse generated version of hashCode():

    
    
    		@Override
    		public int hashCode() {
    			final int prime = 31;
    			int result = 1;
    			result = prime * result + goats;
    			result = prime * result + lions;
    			result = prime * result + wolves;
    			return result;
    		}
    

The version in the original code:

    
    
    		@Override
    		public int hashCode() {
    			final int magic = 0x9e3779b9;
    			int seed = 0;
    			seed ^= this.goats + magic + (seed << 6) + (seed >> 2);
    			seed ^= this.lions + magic + (seed << 6) + (seed >> 2);
    			seed ^= this.wolves + magic + (seed << 6) + (seed >> 2);
    			return seed;
    		}
    

The two hashCode() methods both have about the same execution time.

Example:

Forest.makeForest(517, 555, 506)

With original hashCode(): 8.177 s

With Eclipse generated hashCode(): 19.237 s

(100% repeatable with only a few 100ms diff between executions)
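
One possible explanation, offered as a guess and not as something measured here, is the number of hash collisions. The Python sketch below emulates Java's 32-bit `int` arithmetic for both methods and counts how many distinct hash values each produces over a small grid of forests. The helper names (`to_int32`, `eclipse_hash`, `combine_hash`, `distinct_hashes`) and the grid size are assumptions, and the sketch does not model how `HashMap` spreads hash values over buckets.

```python
def to_int32(x):
    """Wrap a Python int to Java's signed 32-bit int range."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def eclipse_hash(goats, lions, wolves):
    """Emulates the Eclipse-generated hashCode() shown above."""
    result = 1
    for field in (goats, lions, wolves):
        result = to_int32(31 * result + field)
    return result


def combine_hash(goats, lions, wolves):
    """Emulates the original hashCode() with the 0x9e3779b9 constant."""
    magic = 0x9E3779B9
    seed = 0
    for field in (goats, lions, wolves):
        # Python's >> on negative ints is an arithmetic shift, like Java's >>.
        seed ^= to_int32(field + magic + (seed << 6) + (seed >> 2))
    return seed


def distinct_hashes(hash_fn, limit):
    """Count distinct hash values over all forests with counts below `limit`."""
    return len({hash_fn(g, l, w)
                for g in range(limit)
                for l in range(limit)
                for w in range(limit)})


if __name__ == "__main__":
    limit = 40  # arbitrary grid size chosen for this sketch
    print("forests:     ", limit ** 3)
    print("eclipse_hash:", distinct_hashes(eclipse_hash, limit))
    print("combine_hash:", distinct_hashes(combine_hash, limit))
```

For small counts the Eclipse-style hash is just 29791 + 961*goats + 31*lions + wolves, so it squeezes all forests into a narrow range of values and many different forests must share a hash. Whether that fully accounts for the measured slowdown is not verified here.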

------
curveship
If anyone is still reading this thread, I optimized the javascript solution
and algorithm, speeding it up by almost 20,000x and beating the C++11 one by
almost 1,000x. Post is here:
[https://news.ycombinator.com/item?id=7858485](https://news.ycombinator.com/item?id=7858485).

------
FloNeu
Hmm... it doesn't mention anything about the VM settings used. The system has
8 GB of RAM, but as I remember the Java VM is restricted in its use of RAM, as
is Node.js (I guess this test was performed in Node). To use the full 8 GB of
RAM you would have to run multiple Node.js instances? Am I wrong?

------
jaytaylor
I'd love to see how Go stacks up here.

~~~
kyrra
This post seems to focus heavily on using map/filter type commands, and Golang
does not have functional programming features like this. The Go way of solving
this problem would not look like the other solutions here.

~~~
AYBABTME
Go has all the 'functional' features that Javascript, Java and C++ have
demonstrated here. The difference is in the library implementation of
collections/containers; the default ones in Go don't provide the filter/map
methods (or any methods for that matter), but nothing prevents you from using
that style with a collection that implements it.

That is, there's nothing inherent to Go, as a programming language, that makes
it less functional than Java or C++. Go has first class support for functions
and methods as objects, with proper closures, anonymous funcs and such.

------
jryan49
Do the Java benchmarks include the start-up time for the JVM? If so, I
wouldn't say it's a completely fair comparison. (At least for the times that
are small).

------
m_mueller
Yeah, I'd like to see that Fortran code. There are some trivial mistakes that
can be made, by messing up array access orders for example.

~~~
short_circut
I am almost certain it wasn't included for some reason like that. I also
suspect that they were using an outmoded version of Fortran. Clicking the
extra links would tend to validate that suspicion.

Also, the performance of Fortran code is quite dependent on the compiler and
compiler options, something that was not well described.

------
Shorel
I would like to have DLang added to that benchmark.

------
mike_ivanov
Haskell, anyone?

~~~
hiker
See my comment here
[https://news.ycombinator.com/item?id=7857316](https://news.ycombinator.com/item?id=7857316)

------
Groxx
Neat results, and a neat comparison of the languages.

I'm a bit curious how an asm.js port (via whatever means. maybe c++ ->
emscripten?) ends up performing.

edit: heh.

> _Also note that FORTRAN is not included in the list [of relative speedups],
> because no sane person would switch to FORTRAN from another programming
> language voluntarily._

~~~
mtdewcmu
It's embarrassing that Fortran gets beat by Java.

~~~
pkolaczk
It's embarrassing that Fortran got beaten by very unoptimized Java code. But
this is probably not because Fortran is slow and Java is fast, but because
the benchmark results are pretty much a coincidence of bad coding, and if
someone else wrote it, the results could just as well be completely reversed.
E.g. the Java code uses some of the worst performance antipatterns possible in
Java, like creating small objects everywhere (Optional), while the C++ version
uses primitives like ints and bools.

------
xkarga00
Although all these languages have some functional characteristics, I don't
think they should be called functional languages. It seems like
the OP was more concerned with getting the results he wanted rather than
answering the “Can a functional language do this as well?” question.

------
alok-g
See also:
[https://news.ycombinator.com/item?id=7850394](https://news.ycombinator.com/item?id=7850394)
(Code length comparison for several languages)

------
exabrial
Actually surprised Java lagged so far behind... It's usually the case
that "Java is 95% as fast as CPP, but can be written with a fraction of the
violence."

Any real explanations?

~~~
zmmmmm
I've never heard anybody say that Java is that fast. It can be in very
specific situations. Actually, being only 2 - 3 times slower than C++ is pretty
awesome, especially since that is hardly low level Java, rather quite abstract.
I suspect low level Java could get it down to a flat 2x in this case. For a
huge range of uses, 2x slower is effectively as fast as C, especially since
taking advantage of multicore / parallelized operations is significantly easier
(in a cross platform manner) than it is in C/C++.

------
zurn
It would be very neat to see a Cuda or OpenCL solution

------
itsbits
what did you use for Javascript??...nodejs??

~~~
Flenser
How each program was compiled/run is documented in the source:
[http://www.unisoftwareplus.com/download/blog/2014-06/magicFo...](http://www.unisoftwareplus.com/download/blog/2014-06/magicForest.js)

------
scope
I don't know if I am mistaken or not, but I think I read somewhere that V8 got
beaten by native.

