
ARMv7 vs. x86-64: Pathfinding benchmark of C++, D, Go, Nim, Ocaml, and more - logicchains
https://github.com/logicchains/LPATHBench/blob/master/writeup.md
======
lispm

        (defun get-longest-path (nodes node-id visited)
          (declare (optimize (speed 3) (space 0) (debug 0) (safety 0)
                             (compilation-speed 0)
                             #+lispworks (fixnum-safety 0))
                   (type fixnum node-id)
                   (type (vector node) nodes)
                   (type (vector atom) visited))
          (setf (aref visited node-id) t)
          (let ((max (loop for neighbour of-type route across (node-neighbours (aref nodes node-id))
                           unless (aref visited (route-dest neighbour))
                           maximize (the fixnum
                                         (+ (the fixnum (route-cost neighbour))
                                            (the fixnum (get-longest-path nodes (route-dest neighbour) visited)))))))
            (declare (fixnum max))
            (setf (aref visited node-id) nil)
            max))
    

The Common Lisp version above improves the runtime from 8.5 to 3.6 seconds in
SBCL and from 30 seconds to 2 seconds in LispWorks 64-bit. Computer: i7 Mac mini.

~~~
logicchains
Nice, re-running the benchmark with the new code now.

*Edit: and, it's done.

~~~
lispm
here is a much faster version:

[https://gist.github.com/lispm/6066e1eeadf943910c47](https://gist.github.com/lispm/6066e1eeadf943910c47)

You might want to adapt it...

~~~
logicchains
I'm off to bed now, I'll pull it in the morning.

------
userbinator
I'd really like to see a "pull out all the stops" benchmark using highly-
optimised Asm for the two architectures, as then it's just a matter of how
much you can squeeze out of the CPU itself and not something limited by the
thick layers of language abstractions on top of that. That would be a nice
theoretical maximum to compare against.

Edit: I tested the C++ version on my 5-year-old i7, with an even older
compiler (just had to modify the code to not use C++11 features), and with the
max optimisation level, it produces a result of 1465ms - which is _pretty damn
amazing_ , considering that this is a 16-year-old compiler generating 32-bit
code and the most recent CPU it had knowledge of was the Pentium Pro (P6)! I'm
convinced that an Asm version could be <1s though, so there's still plenty of
room for improvement.

~~~
nly
You have to give Intel/AMD a lot of credit for improving the performance of
existing binaries. They've reduced CPI and improved ILP a lot in 16 years, not
to mention vastly improved various forms of prediction, caching and
speculation. A lot of the performance improvement gleaned from modern
compilers comes from better lowering of the C++ language itself, e.g. smarter
inlining, LTO, devirtualisation, etc.

------
Symmetry
So if it's a "Galaxy S3 with 2GB of ram and a quad-core 1.3ghz processor" then
this should be the version with the Exynos SoC. That means the cores inside
are A9s. The Intel cores are from the Westmere generation.

I wonder how much of the difference we're seeing between various languages is
the quality of the code their compilers generate for various backends and how
much is due to the different languages benefiting more from architectural
differences between the two chips.

I imagine that languages that generate code with more indirection are going to
exercise the prefetcher and branch predictor of the core they're running on
much more than languages that generate code with simpler control flow. Both
the A9 and the Nehalem cores are out of order but the Nehalem has a much, much
more sophisticated set of facilities for that. I predict that if you were to
re-run the benchmarks on an iPhone 5S you'd see much less of a difference
between the various ARM times. And if you were to run it on a cheap Android
phone with A7 or A53 cores you'd see a much larger difference.

~~~
alain94040
Correct. Not to mention that benchmarking on a mobile phone is tricky. You
need to make sure your CPU won't throttle due to heat. Your average mobile CPU
can't run at 100% for very long. The more benchmarks you run back to back, the
slower you get, so your measurements can easily be completely wrong.

~~~
acqq
Exactly, the speed of mobiles is limited to short bursts to preserve the
battery. And sometimes worse code gets more CPU time simply because it hits
the right patterns.

The timings are also presented confusingly, for my taste.

------
pja
Based on profiling the C++, this seems to be more a benchmark of function call
cost in various languages than anything else - this code makes 42975348 calls
to

    
    
      int getLongestPath<16>(std::vector<node, std::allocator<node> > const&, int, std::bitset<16>)
    

Which is fine, but seems a slightly limited platform / language comparison!

------
rwmj
Which version of OCaml? The ARMv7 backend was rewritten about 2 years ago, and
merged in 4.00.

[http://caml.inria.fr/mantis/view.php?id=5433](http://caml.inria.fr/mantis/view.php?id=5433)

The new backend is supposed to be considerably faster on floating point code.
This code looks integer only, and I don't have relative performance of old/new
backend for integer code.

As a wider question: Who cares much about ARMv7? ARMv8 is a completely
different beast, requiring a different backend, with much better raw
performance (on the same terms as x86-64). That's where languages should be
concentrating their current efforts.

~~~
jensnockert
Except it's really hard to benchmark on ARMv8 since non-Apple hardware is
expensive.

~~~
lazyjones
> _hard to benchmark on ARMv8 since non-Apple hardware is expensive._

If Android will do - Cortex-A53 smartphones like the Huawei Ascend Y550 cost
120€ nowadays in Europe (about 4x the price of a Raspberry Pi).

~~~
rwmj
Also Nexus 9, for quite a bit more money. The problem with these phones and
tablets is they are quite unlike server hardware:

- Far too little memory (2GB vs 16/32GB+)

- Slow flash vs SATA disks

- The "zoo" of u-boot/proprietary kernel crap, instead of UEFI, ACPI and
standard upstream kernels

- Nonsense like locked bootloaders (the Nexus 9 disables HYP mode in the
bootloader!!!)

------
IsTom
> Functional code in Haskell/OCaml can be faster than imperative code using
> iorefs.

IORefs involve locking. They are bad performance-wise. An algorithm like this
should be done either fully functionally, without any mutation at all, or in ST.

~~~
eschew
GHC, at least, uses the same implementation for IORef and STRef:
[http://hackage.haskell.org/package/base-4.3.1.0/docs/src/GHC...](http://hackage.haskell.org/package/base-4.3.1.0/docs/src/GHC-
IORef.html#IORef)

Poking through GHC's runtime source, I see a write barrier but no locking --
did I miss something?

~~~
IsTom
Looking deeper, you're right: it seems to use atomic CAS operations. Still,
mutable data is bad for GC and for the optimisation passes (expressions can't
be shared, etc.).

------
timthorn
This is a comparison of specific implementations of the two processor
architectures. The benchmark is still interesting work, but it isn't a
straight comparison of language performance across architectures. It might be
more enlightening to know the number of opcodes each run executed.

------
FraaJad
No love for Nim in this thread? Nim is as fast as C++ with Clang.
[http://www.reddit.com/r/programming/comments/2pvf68/armv7_vs...](http://www.reddit.com/r/programming/comments/2pvf68/armv7_vs_x8664_pathfinding_benchmark_of_c_d_go/cn0fkn1)

~~~
def-
I think the Nim code is also pretty readable:
[https://github.com/def-/LPATHBench/blob/master/nim.nim](https://github.com/def-/LPATHBench/blob/master/nim.nim)

And Nim doesn't look bad in the other statistics (on an x86_64 Intel Core2Quad
Q9300):

    
    
        Lang    Time [ms]  Memory [KB]  Compile Time [ms]  Compressed Code [B]
        Nim          1400         1460                893                  486
        C++          1478         2717                774                  728
        D            1518         2388               1614                  669
        Rust         1623         2632               6735                  934
        Java         1874        24428                812                  778
        OCaml        2384         4496                125                  782
        Go           3116         1664                596                  618
        Haskell      3329         5268               3002                 1091
        LuaJit       3857         2368                  -                  519
        Lisp         8219        15876               1043                 1007
        Racket       8503       130284              24793                  741
    

Code size measured with `gzip -9 < nim.nim | wc -c`. Removed unused functions in Haskell.

------
brucer
I'm really impressed you implemented this in so many languages!

I was curious how fast Javascript would run this, so I ported the Lua example
over; it ran in 2200ms on my Mac laptop. I'm not sure it's a great comparison
benchmark, however, because there are some obvious optimizations you can make.
I added a simple cache to avoid recalculating travel costs for leaves and it
now runs in 150ms.


    
    
      var fs = require('fs');
    
      var nodes = [], visited = [];
      var visitCache = {};
    
      function readPlaces() {
        var lines = fs.readFileSync('agraph','utf8').split('\n');
        var numnodes = lines[0]|0;
        for(var i=0;i<numnodes;i++) {
          nodes[i] = [];
          visited[i] = false;
        }
        for(i=1;i<lines.length;i++) {
          var line = lines[i].split(' ');
          nodes[line[0]|0].push({dst: line[1]|0, cost: line[2]|0});
        }
      }
    
      function getLongestPath(nodes, nodeid, visited) {
        visited[nodeid] = true;
        var neighbours = nodes[nodeid];
        var max = 0
        for(var i=0;i<neighbours.length;i++) {
          if (!visited[neighbours[i].dst]) {
            var dist = neighbours[i].cost + getLongestPath(nodes, neighbours[i].dst, visited);
            if (dist > max) {
              max = dist
            }
          }
        }
        visited[nodeid] = false;
        return max;
      }
    
      function getLongestPathCached(nodes, nodeid, visited, depth) {
        var idx;
        visited |= 1 << nodeid;
        if (depth == nodes.length - 7) {
          idx = (visited * 32) + nodeid;
          if (visitCache[idx]) {
            return visitCache[idx];
          }
        }
        var neighbours = nodes[nodeid];
        var max = 0
        for(var i=0;i<neighbours.length;i++) {
          if (!(visited & (1 << neighbours[i].dst))) {
            var dist = neighbours[i].cost + getLongestPathCached(nodes, neighbours[i].dst, visited, depth+1);
            if (dist > max) {
              max = dist
            }
          }
        }
        if (idx) {
          visitCache[idx] = max;
        }
        if (typeof visited != 'number') {
          visited[nodeid] = false;
        }
        return max;
      }
    
      readPlaces();
      var start = Date.now();
      var length;
      if (nodes.length < 32) {
        length = getLongestPathCached(nodes, 0, 0, 0);
      } else {
        length = getLongestPath(nodes, 0, visited);
      }
      console.log(length+' LANGUAGE Javascript: '+(Date.now() - start));

~~~
logicchains
Nice, your cached version is the fastest by far, although it uses a different
algorithm. Running with `node js.js` without the cache, it takes around 6
seconds, close to Dart and Haskell.

~~~
brucer
I just realized I overthought the caching scheme, and removing one check made
it 3x faster :)

    
    
      function getLongestPathCached(nodes, nodeid, visited, depth) {
        var idx;
        visited |= 1 << nodeid;
        idx = (visited * 32) + nodeid;
        if (visitCache[idx]) {
          return visitCache[idx];
        }
        var neighbours = nodes[nodeid];
        var max = 0
        for(var i=0;i<neighbours.length;i++) {
          if (!(visited & (1 << neighbours[i].dst))) {
            var dist = neighbours[i].cost + getLongestPathCached(nodes, neighbours[i].dst, visited, depth+1);
            if (dist > max) {
              max = dist
            }
          }
        }
        visitCache[idx] = max;
        return max;
      }

~~~
logicchains
Nice! That's almost as fast as the C++ one using caching; your algorithm must
be better.

------
nly
Seeing as they're likely the most widely distributed Java runtimes on ARM, I
would have liked to see Dalvik and ART benchmarks for the Java code.

Key takeaway for me is that statically typed languages that are compiled to
native code are still 2-3x faster than the fastest JITs. On both platforms.

~~~
logicchains
Interestingly, now that someone submitted a change to the Java implementation,
such that it doesn't use classes (unboxing is simulated by using arrays for
the data instead), it runs much closer to native speed.

~~~
0xFFC
Can you explain a little further?

~~~
logicchains
I think it's easier to read the code:

[https://github.com/logicchains/LPATHBench/blob/master/jv.jav...](https://github.com/logicchains/LPATHBench/blob/master/jv.java)

Instead of a vector of node classes, there's a

    
    
        static final int[][] nodes;
    

Which is used in a similar manner to a vector of node classes, but due to
containing primitives (ints) is unboxed.
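
As a rough sketch of the idea (hypothetical names and a made-up graph, not the actual jv.java code): each node's neighbours get flattened into a single int[] of alternating (dest, cost) pairs, so the hot loop reads only primitives instead of chasing per-neighbour object pointers:

```java
// Boxed layout: every neighbour is a separate heap object with a header.
class Route {
    final int dest, cost;
    Route(int dest, int cost) { this.dest = dest; this.cost = cost; }
}

public class Flattened {
    // Flattened layout: node i's neighbours as alternating (dest, cost) ints
    // in one contiguous array - no object headers, no pointer chasing.
    static int[][] nodes = {
        {1, 5, 2, 3},  // node 0: ->1 cost 5, ->2 cost 3
        {2, 4},        // node 1: ->2 cost 4
        {},            // node 2: no outgoing edges
    };

    static int getLongestPath(int nodeID, boolean[] visited) {
        visited[nodeID] = true;
        int max = 0;
        int[] neighbours = nodes[nodeID];
        for (int i = 0; i < neighbours.length; i += 2) {
            int dest = neighbours[i], cost = neighbours[i + 1];
            if (!visited[dest]) {
                int dist = cost + getLongestPath(dest, visited);
                if (dist > max) max = dist;
            }
        }
        visited[nodeID] = false;
        return max;
    }

    public static void main(String[] args) {
        System.out.println(getLongestPath(0, new boolean[3]));
    }
}
```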

~~~
igouy
When "the longest path problem" is reduced to indexed-access to an integer-
sequence, does it become a duplicate of fannkuch?

[http://benchmarksgame.alioth.debian.org/u64q/performance.php...](http://benchmarksgame.alioth.debian.org/u64q/performance.php?test=fannkuchredux)

------
jarcane
The Racket and Lisp comments are a bit odd. To the best of my knowledge Typed
Racket does support gradual typing, and comparing its compatibility to Scheme
is a category error: Racket is not Scheme anymore; that's why it's called
Racket now and not PLT Scheme.

~~~
logicchains
>To the best of my knowledge Typed Racket does support gradual typing

If you take a Racket program and change the language to Typed Racket, you'll
get flooded with compiler errors. Or at least that's been my experience.

>Racket is not Scheme anymore

I'll update the post to note this, thanks.

~~~
kuschku
You can use typed/racket with optional types, it’s kinda easy.

~~~
klibertp
How? The docs say ([http://docs.racket-lang.org/ts-guide/more.html#\(part._when-annotations~3f\)](http://docs.racket-lang.org/ts-guide/more.html#\(part._when-annotations~3f\)))
that you _have to_ provide type annotations in certain cases.

------
kev009
I'm not really sure what this data means, because amd64 and ARMv7 are ISAs.
For instance, you could make a very deep and superscalar ARMv7 chip that blows
a typical amd64 chip out of the water if you sacrifice size and power. Is the
intent simply to show that some language backends are not optimized?
Otherwise, without something like "these two chips, clock-for-clock or
watt-for-watt, compare like this", it seems meaningless.

~~~
logicchains
It's not meant to compare AMD64 and ARMv7 architectures, it's meant to compare
the performance of various language compilers/runtimes on two common AMD64 and
ARMv7 chips.

~~~
kev009
Not sure why the downvote as that was my request for clarification.

Point being, if you want to benchmark language backends, it doesn't make a lot
of sense to cross chips without making the benchmark multi-dimensional (i.e.
adding in clock or power vectors).

After rereading closely, the intent of the article is saying "50% slowdown for
backend X on x86, 70% slowdown for backend X on ARM." The subtle difference is
that you are keeping comparisons of slowdown in the family and only making
analogy to the other arch by slowdown percentage.

It's still susceptible to the ISA implementation (for instance, an in-order
Atom might fare a lot worse for typical backends), but mildly interesting.

------
atoponce
The LuaJIT results don't surprise me. I've always been impressed with LuaJIT.
The OpenJDK results also don't surprise me. If you work in a Java shop, you
learn very quickly to throw out OpenJDK in favor of Sun/Oracle Java. OpenJDK
is indeed a "steaming pile of crap".

However, I would have liked to see Julia and Javascript benchmarks in those
results. I've heard great things about Julia, and knowing just how incredibly
far we've brought the Javascript VMs over the past decade, it wouldn't
surprise me to see Javascript fairly high on the list.

~~~
logicchains
>OpenJDK is indeed a "steaming pile of crap"

Just to clarify, it performed as well as the Oracle JVM on x86. Its poor
performance on ARM is just due to its lack of JIT compilation.

>However, I would have liked to see Julia and Javascript benchmarks in those
results.

I'm happy to include Javascript or Julia implementations if someone supplies
them. I wasn't comfortable with Julia enough to write one myself.

~~~
atoponce
>> OpenJDK is indeed a "steaming pile of crap"

> Just to clarify, it performed as well as the Oracle JVM on x86. Its poor
> performance on ARM is just due to its lack of JIT compilation.

Understood. Admittedly, I'm a system administrator first, developer second.
However, my experience has shown that the Sun/Oracle JVM usually outperforms
OpenJDK. Even in development, on the Java teams I've had to support, OpenJDK
is never preferred or wanted.

Further, RHEL ships the OpenJDK compiled with the GNU compiler for Java (GCJ),
as well as GNU's classpath. From what I've seen supporting these Java teams,
and not being a Java developer, it's unstable and slow.

So, in this specific instance, OpenJDK may have performed as well as the
Sun/Oracle JVM on x86-64, but in the broader scope, it doesn't seem to hold
up. Just my experiences though. Take it with a grain of salt.

~~~
imanaccount247
>However, my experience has shown that Sun/Oracle JVM usually out performs
OpenJDK. Even in development

Your experience seems to be filtered through your misconceptions. It is the
same code base.

~~~
exabrial
I wonder what magic sauce Oracle does then with their JDK? We can measure a
very very very significant difference in performance both on x86 and ARM.

------
gaius
This is some quality work. The Java results are particularly striking! It
would be interesting to see some more languages that have more than one
implementation compared.

What is the difference between FSharp and F#?

~~~
logicchains
FSharp instead of F# is just a way of getting around issues with using the #
character in the shell scripts that run the benchmark.

~~~
gaius
But he says _F#, Haskell, Rust and Dart send their apologies. F# didn't have
an Arch Linux package for ARM, and when I built it myself the compiler and
runtime segfaulted upon use_

~~~
logicchains
Wow, you're right, no idea where that FSharp entry on the ARM results is
coming from.

------
icefox
Something not measured that is just as important as speed (if not more so) is
how long it took to create a working program in each language.

~~~
wtf_is_up
Agreed. We must also consider what a pain in the ass it was to set up the
toolchains and/or runtimes for several of the tested languages.

I'm willing to trade a little performance and do a little extra error checking
if it means I can just do:

    
    
        GOARCH=arm go build
    

and then copy the binary over.

~~~
logicchains
> Agreed. We must also consider what a pain in the ass it was to set up the
> toolchains and/or runtimes for several of the tested languages.

The Arch Linux package repository proved really awesome here: for the vast
majority of things (everything on x86) I could just `pacman -S myLang`. The
hardest to install was SBCL on ARM, as I needed another Common Lisp to compile
it, so I had to get ClozureCL and use that.

------
steveklabnik
> [Rust] 0.12 is so much prettier; vec[i] instead of vec.get(i), for instance

That was just a momentary bit of weirdness, vec[i] has since returned.

EDIT: Further down, the author acknowledges that this has been fixed. That's
what I get for commenting before scrolling the whole way down!

~~~
smosher_
I wish there was more substance to the Rust section. Most of it is in strike-
out and the remaining text has more to say about how you would do things in
OCaml than Rust.

------
0x0
It seems kinda unfair to compare 32bit ARMv7 against 64bit x86-64. Wouldn't it
be much fairer to compare ARMv7 against the register-starved x86, or to
compare AArch64 against x86-64?

~~~
logicchains
It's not comparing the architectures against each other, it's comparing
language implementations against each other on those architectures. As x86-64
is the most common x86 architecture nowadays, and ARMv7 the most common ARM
architecture (at least on mobile), comparing languages' performance on them
should hopefully be more relevant as more developers use those platforms.

------
bluecalm
Obligatory question: what compiler options did you use for C++? A good first
try is -O3 -march=native. It really makes a huge difference in the case of
C/C++.

~~~
logicchains
It's in the makefile in that repo:

    
    
        	g++ cpp.cpp -std=c++11 -Wall -O2 -march=native -o cpp
    

I was originally using -O3 but then someone found that -O2 is actually faster
in this case.

~~~
vmarsy
When visited was changed from a vector<bool> to a bitset<T>, why is it not
passed by reference anymore?

~~~
pbsd
The most common call in the benchmark is to `getLongestPath<16>`. Passing 16
bits by value is cheaper and easier for the compiler to optimize (i.e., no
aliasing) than passing by reference. I'd expect graphs with many nodes to
behave differently.
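
The same idea can be sketched outside C++ (Java here, with a made-up graph, purely as an illustration): pack the visited set into a primitive that is copied on every call. Each callee gets its own copy, so nothing aliases the caller's state and nothing needs to be unset when the call returns:

```java
public class BitmaskVisited {
    // Hypothetical graph: nodes[i] holds alternating (dest, cost) pairs.
    static int[][] nodes = { {1, 5, 2, 3}, {2, 4}, {} };

    // 'visited' is a primitive long passed by value: every recursive call
    // works on its own copy, so there is no shared flag to reset on the way
    // back out (contrast a shared boolean[], which must be unset).
    static int longest(int nodeID, long visited) {
        visited |= 1L << nodeID;
        int max = 0;
        int[] e = nodes[nodeID];
        for (int i = 0; i < e.length; i += 2) {
            int dest = e[i], cost = e[i + 1];
            if ((visited & (1L << dest)) == 0) {
                int dist = cost + longest(dest, visited);
                if (dist > max) max = dist;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        System.out.println(longest(0, 0L));
    }
}
```

The trade-off mirrors the C++ one: this only works while the node count fits in the primitive (64 bits here), which is why graphs with many nodes would need the by-reference representation.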

------
arthursilva
I updated rust versions and submitted PR
[https://github.com/logicchains/LPATHBench/pull/33](https://github.com/logicchains/LPATHBench/pull/33)

------
kyrra
In the Go code, if you want to ignore an err you can write _ in its place.

Another thing, which I'm not sure you can even do anymore, is disabling bounds
checking by adding "-gcflags -B" to the compile.

~~~
logicchains
> In the Go code, if you want to ignore an err you can write _ in its place.

Not a good habit to get into though.

------
mrottenkolber
That is some horrible Common Lisp code. I have to try this out....

~~~
mrottenkolber
Nope sorry. I tried to fix this up a little but this is too convoluted for me
to work with.

Q1: Why do you need an adjustable array?

Q2: Why do you need a structure with a single slot (node)?

If you want to benchmark a piece of code, please write a _nice_ version and
then optimize it. How can I reason about a benchmark result if the code is not
understandable?

~~~
wtf_is_up
Many others were able to send pull requests. What's your issue?

~~~
mrottenkolber
The original code was badly written so that is was hard to understand the
algorithm. I tried to rewrite it but it took longer than I had patience for.

The snarky tone comes from the fact that people often post "benchmarks"
including languages they can't program in resulting in a misrepresentation.

E.g. Lisp jumped from the bottom to the center (as expected) of the benchmarks
after somebody _donated_ a sane implementation.

------
melling
"The F# was however nowhere near as fast as the OCaml"

I find this a bit odd. Microsoft should be able to build a better compiler. I
thought F# was getting a lot of traction.

~~~
profquail
F# _is_ quite fast (I've written a good deal of F# code) and it has seen a big
uptick in adoption over the last couple of years.

I had a look at the F# code and it's not all that efficient (there are a
number of intermediate data structures being created that don't need to be).
I'll have a go at optimizing the code to see if I can improve the performance.

------
w8rbt
C++ wins again. There's really no competition.

~~~
cbd1984
Meh. Only if you value performance above correctness and maintainability.

Besides, C wasn't tested. C would win if it were, I'm pretty sure.

~~~
userbinator
I'm pretty sure you can write correct and maintainable code in any language.
Whether its users do is a different question.

~~~
MichaelGG
If even top-notch teams like Mozilla cannot ship large C/C++ codebases without
security holes, it seems legitimate to state that people cannot write correct
code in that language.

~~~
ShawnyShawny
Is Mozilla really a "top-notch team", though? When I look at their track
record, it's not what I'd expect from a top-notch team:

* They've alienated many Firefox users thanks to many bad UI changes. They've continued to do this even after the users have strenuously objected to these unwanted changes. Firefox's share of the market has thus dropped from 35% to probably sub-10% these days.

* Many of the remaining Firefox users still point out that Firefox is slower and more bloated than other browsers. Although Mozilla often rejects or ignores these complaints, my years of software development experience have taught me that when many users say there's a problem, there very likely is one, even if we the developers can't reproduce it.

* Firefox for Android hasn't been picked up by many users.

* Firefox isn't an option on iOS. (Although I guess we can't fully blame Mozilla for this.)

* Firefox OS is floundering. The devices available so far have fared very poorly in reviews. Some of these reviews are among the harshest I've ever seen for any software or hardware product.

* Thunderbird is on life support.

* Rust is still pre-1.0, and will be like this for several more months, at the very least.

* Servo depends on Rust, so it being a viable option is still years away.

* There was that whole Eich debacle. It was pathetic, no matter how you look at it.

* Bugzilla is long forgotten these days.

* Despite absolutely massive funding from Google and now Yahoo, Mozilla hasn't managed to put out any other product that people actually want to use.

When I look at that track record, it's just one failure or disaster after
another. It's not top-notch at all. So I'm not surprised that they have
trouble using C and C++. They seem to be having severe trouble with pretty
much everything they do!

~~~
kibwen
I have no particular love for Mozilla (no more than any other corporation),
but your points all seem facile.

1) Regardless of whether anyone thinks Firefox's UI changes have been for the
better, it's not possible to alienate Firefox users via UI changes because
Firefox still offers the most customizable UI of any browser you've heard of.
What are users going to say? "Damn you Mozilla, you made your browser look
just like Chrome! I hate that so much, I'm switching to Chrome!"

2) Many of the remaining Firefox users point out that they have switched back
to Firefox because Chrome has become bloated and slow. (Personally I think
that everyone making this argument, on all sides, merely fails to appreciate
what sort of benefit it brings to a browser to have a totally fresh user
profile.) In overall benchmarks of memory usage and browser engine/Javascript
engine speed, neither Chrome nor Firefox is better by any significant margin.

3) Firefox for Android has between 50 and 100 million downloads on Google
Play, and has a higher user rating than Chrome for Android (4.4 to 4.2).

4) Mozilla has announced earlier this month that they'll be shipping a Firefox
for iOS, but given the crippling of third-party browsers on iOS I doubt
Firefox will be any less hobbled than Chrome for iOS, and will certainly be
worse than Safari.

5) I have no sales stats on Firefox OS, but given that they're still
persisting in setting up new carrier partnerships I'd say they're better off
than at least the Ubuntu phone. I'll probably never need a Firefox OS phone,
but honestly if it weren't for the audacity of Mozilla trying to penetrate the
OS market (and hence trying to end their reliance on the willing participation
of third-party platforms to host their browser (which isn't so "willing" these
days with the advent of locked-down platforms like iOS and WinRT (is that
still even a thing?))), then I'd have already written off Firefox as dead in
the water.

6) Thunderbird was never a moneymaker nor key to Mozilla's strategy,
especially after the meteoric rise of web-based email clients.

7) Rust is the most interesting systems language to emerge in years, and its
influence will be felt on every future systems programming language to come
(though I am certainly biased here).

8) Servo is Mozilla's _other_ project, aside from Firefox OS, that is so
unbelievably audacious that I can't help but cheer them on. I have spoken with
its developers and they're all astounded with the performance they're seeing,
though they're holding off on releasing concrete numbers until the feature set
is comparable with more complete browsers. Having seen the Servo devs in
action, I can assure you they are on top of their game.

All this said, your original point was that you don't think that Mozilla has a
top-notch team of C++ developers. And here's the thing: all of this is
irrelevant to whether or not Mozilla's C++ developers are top-notch. I bet
John Carmack's team at Id software was as top-notch a team of C++ developers
as will ever be assembled, and yet Rage was still a commercial failure. To
know whether or not the team is good, you have to look at their code and you
have to look at their process.

~~~
TheLoneWolfling
Regarding 1, although a single data point does not a trend make, I will say
that I switched away from Firefox due to multiple UI changes and removing
customization options. I switched to Pale Moon.

In other words, personally, you are incorrect. FF no longer "still offers the
most customizable UI of any browser you've heard of". (Simple enough: FF
removed options that PM kept. Hence, FF is not more customizable than PM.)

~~~
Jakuv2000
I switched away from Firefox for the same reasons. Even if the UI was as
customizable as is claimed, I don't want to waste my time recustomizing it
every six weeks, or whenever a new Firefox release comes out.

------
mda
That's some alien Java code indeed.

~~~
logicchains
Any improvements you can see? The 'inner loop' of the code is:

    
    
        int getLongestPath(ArrayList<node> nodes, int nodeID, boolean[] visited) {
            visited[nodeID] = true;
            int dist, max = 0;
            for (route neighbour : nodes.get(nodeID).neighbours) {
                if (!visited[neighbour.dest]) {
                    dist = neighbour.cost + getLongestPath(nodes, neighbour.dest, visited);
                    if (dist > max) {
                        max = dist;
                    }
                }
            }
            visited[nodeID] = false;
            return max;
        }
    

The ArrayList of nodes could be changed to an array, but I don't imagine
that'd be much faster (I've always heard that ArrayLists are just as fast as
arrays, except for primitives).

*Edit: someone found a massive improvement by replacing the node class with arrays to simulate unboxing.

~~~
bsaul
I don't think this improvement is really fair, though. Everybody heard the
same thing you did: ArrayLists are OK performance-wise. The optimisation of
using arrays and static global variables and removing the use of any objects
isn't something that leads to readable code in the long run, nor is it the
code you regularly see in everyday software.

You should include both your old Java code and the new one. Call your version
"enterprise Java" and the other "optimized Java". If anything, that could be
useful to people coding in Java and having performance issues (otherwise
they'd have to look for it in the git history, which they'd have no reason to
do).

~~~
logicchains
It is idiomatic in highly performance-sensitive Java code though. Hopefully it
won't be necessary in Java 9 or 10 when unboxing support is brought in.

I've added a note regarding the change, but I'm off to bed in a moment (AEST
timezone). I'll separate the Java versions tomorrow.

~~~
exabrial
Do you have a link to Java 9/10 class unboxing? I've been googling for it and
can't find any information on it.

~~~
Dewie
Have you googled something like _value classes_?

------
WhyYes
Wow, Dart is impressive: above Lisp and close to F#.

~~~
mrottenkolber
It's not; look again. The benchmark code was bad.

------
nkurz
Not that the article does this, but since my current bandwagon is that
assessing performance by comparing the source of two programs without
consideration of the compiler and target processor is silly, I decided to try
out the C++ version with several compilers and options. Renaming the file to
'lpath.cpp' and compiling with 'cc lpath.cpp -std=gnu++11 -Wall -Oxxx
-march=native -o lpath-cc-Oxxx', here's what I found on an i7 Haswell running
at 3.4 GHz:

    
    
      lpath-clang3.5-O2 8981 LANGUAGE C++ 763
      lpath-gcc4.7-O2 8981 LANGUAGE C++ 769
      lpath-gcc4.8-O2 8981 LANGUAGE C++ 746
      lpath-icpc14-O2 8981 LANGUAGE C++ 750
      lpath-icpc15-O2 8981 LANGUAGE C++ 735
      lpath-clang3.5-O3 8981 LANGUAGE C++ 734
      lpath-gcc4.7-O3 8981 LANGUAGE C++ 943
      lpath-gcc4.8-O3 8981 LANGUAGE C++ 946
      lpath-icpc14-O3 8981 LANGUAGE C++ 664
      lpath-icpc15-O3 8981 LANGUAGE C++ 655
    

The last column is the time reported by the program in milliseconds. What this
shows is that the same source compiled with Intel's icpc 14 or 15 -O3 is about
50% faster than the same source compiled with g++ 4.7 or 4.8 and -O3, and
about 20% faster than g++ -O2. The point isn't that Intel's compiler is so
much better, but that this degree of variation is normal for a benchmark like
this. There are times when Clang or GCC will come out ahead by the same
margin. The lesson is that you aren't benchmarking source code, you are
benchmarking a particular compiler with particular options running on a
particular processor.

The article is wonderfully specific about what was used, but one should be
very careful extrapolating to different combinations. In addition to the
compiler differences, note for example that although I did my tests on a
processor running less than 1.5x faster, I got runtimes that were almost 2-3x
faster. Most likely, this is because the Haswell processor I tested on is more
efficient than the several generation old Westmere that the author used.
There's nothing right or wrong about either choice, but the degree of
difference is why it's always important to specify.

I glanced briefly at the code with 'perf record -F10000', and my quick
conclusion was that the Intel version was running faster because it was making
better use of the branchless cmov's than the other compilers, and thus has
10,000,000 fewer branch prediction errors. At 20 cycles per miss, this
accounts for over half the difference between icpc and the others. The use of
the new fast variable shift instructions (shlx) and the once-again fast bit
test (bt) instruction is probably the rest of the difference.

The difference between g++ -O2 and -O3 seems to be that the -O3 version is
doing almost everything off the stack rather than in registers. It's bad
enough that this is probably a performance bug rather than intended behavior.

~~~
igouy
>>The article is wonderfully specific about what was used…<<

I'm missing something -- where does it say which version of JDK was used?

