
Java vs. Scala: Divided We Fail - mritun
http://shipilev.net/blog/2014/java-scala-divided-we-fail/
======
skybrian
The conclusion I take away from this is that good performance isn't
necessarily more useful than predictable performance. For many language
implementations, the performance characteristics are quite fragile, and you'd
better be benchmarking after every change or you won't notice a regression.

Even setting up a benchmark is tricky because the same binary can have
dramatically different performance depending on the environment. At the
machine level, caching effects can vary widely across processors. But
compilers and runtimes can easily make things even more unpredictable,
requiring tricks like warmup time to even hope to get an idea of what's
going on.
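As a minimal sketch of the warmup point (hand-rolled timing only; a real benchmark should use a harness like JMH, and the workload and iteration counts here are made up):

```java
// Naive before/after-warmup timing (illustrative; a real benchmark should
// use a harness like JMH rather than hand-rolled System.nanoTime() calls).
public class WarmupSketch {
    // A made-up CPU-bound workload, just to give the JIT something to compile.
    static long work(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += (long) i * i % 7;
        }
        return sum;
    }

    public static void main(String[] args) {
        long t0 = System.nanoTime();
        long cold = work(1_000_000);
        long coldNanos = System.nanoTime() - t0;

        // Warmup: run the workload repeatedly so the JIT gets a chance to
        // profile and compile it before the "real" measurement.
        for (int i = 0; i < 20; i++) work(1_000_000);

        long t1 = System.nanoTime();
        long warm = work(1_000_000);
        long warmNanos = System.nanoTime() - t1;

        System.out.println("cold: " + coldNanos + " ns, warm: " + warmNanos + " ns");
    }
}
```

The warm number is typically much lower than the cold one, which is exactly why a single timed run of a JVM program tells you very little.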

I think Go starts out a bit closer to achieving predictable performance; any
ahead-of-time compiler has an advantage here. Other than the use of garbage
collection, the whole language seems to be designed with predictable
performance in mind over ease of use, and you can avoid GC where it matters.
Also, recently they replaced segmented stacks with copying of contiguous
stacks, which both improves performance and makes it more predictable (while a
goroutine is running).

~~~
nostrademons
I remember reading a thread once about Kenton trying to optimize some protobuf
deserialization routines. After much profiling, the outcome was basically "I
can't believe the compiler is doing this. It means it is basically impossible
to predict the performance characteristics of source code." At this point, a
number of old compiler guys jumped in (we have a number of folks on the C++
standards committee at Google, and most of the team who wrote HotSpot), and
said "Yeah. The best way to make sure your program runs fast is to get it into
a compiler team's benchmark."

Chrome/V8 has this problem as well - if you talk to really skilled web
developers, they have a lot of performance "tricks" in their head, and the
pitfall is that this knowledge decays much more rapidly than people think,
so what was common knowledge in 2008 (or even 2013) is no longer true. One
major problem we faced with Google Search, and specifically Instant, was
that it was optimized for the performance characteristics of the browsers
of 2008; pretty much none of those rules apply to modern Chrome and Safari
on mobile networks, and so performance is ridiculously bad on mobile.

Go achieves predictable performance only because it's relatively new. It's
true that the team has tried very hard to keep things simple and predictable,
but the problem is that the _hardware_ that Go runs on keeps changing as well.
As a language it was designed to take advantage of multicore chips that are
just coming into use now. What if in 5 years we're all using quantum
computers, or memristors, or flash memory, or the memory hierarchy no longer
applies? What if everything is peer-to-peer over mobile devices?

~~~
pcwalton
> It's true that the team has tried very hard to keep things simple and
> predictable, but the problem is that the hardware that Go runs on keeps
> changing as well. As a language it was designed to take advantage of
> multicore chips that are just coming into use now.

This has already happened; what many people don't realize is that we're
arguably past the multicore era. Desktop-class CPUs aren't adding more cores
as quickly as we had predicted, because people aren't using them (although
note that this is different in mobile). What they _are_ adding is wider and
wider SIMD units and more and more SIMD instructions. To get maximum
performance on the CPUs of today, as well as those of the future, using
SIMD effectively is every bit as important as using multiple cores
effectively, or more so.

In my opinion, programming languages have been fairly slow at responding to
this.

~~~
seanmcdirmid
Have CPU SIMD units been very successful? In many cases where you can use
SIMD, you can also just go and use the GPU (the ultimate SIMD) instead and
get a much better speedup. Larrabee (or whatever it's called now) hasn't
seen very much use either.

~~~
pcwalton
Very much so. SIMD is part of the cost of admission for video codecs, for
example.

~~~
seanmcdirmid
Why aren't the video codecs GPU accelerated by default these days? Is there
something about decoding video that prevents it from mapping well to the GPU?

~~~
wtallis
Intel is only on their second generation of OpenCL-capable GPUs, so there are
still tons of machines out there that have no GPGPU capability and don't have
even a CPU-based OpenCL implementation installed (except on OS X). APIs giving
access to fixed-function transcoding engines on GPUs are also not as standard
and universal as they should be, and those fixed function engines also have
poorer output quality than the good software implementations.

------
mbell
I'm left with a feeling something else must be going on here in addition to
the 'fast mod' optimization. It looks like the optimization would only be used
on the first 2 or 4 calls to isEvenlyDistributed() which doesn't strike me as
enough to cut runtime in half, nor does it seem to make sense that you'd see
roughly the same performance difference for all values of lim (I'd expect the
performance to get closer as lim grows). Am I missing something?

~~~
Perseids
Maybe you underestimate how many numbers are weeded out just by checking
divisibility by 3, 4 and 5? The first check regarding 2 will always return
true, as `val` is incremented by 2. The check for divisibility by 3 will
cause an abort for two out of three calls. The check for divisibility by 4
will cause an abort for half of the remaining calls (as `val` is always
even). And I guess the check for 5 will weed out an additional 4 out of 5.
Thus just after checking 3, 4 and 5, only 1/3 * 1/2 * 1/5 = 1/30 of the
`val`s will continue down the recursive calls. So the majority of calls
terminate in the highly optimized static division code.

Now you can still argue that the remaining calls that go deep into the
recursion could account for a lot of time. I don't believe this is the case
though, as each recursive call terminates a constant fraction of the
remaining calls, so the drop-off should be exponential.

Another interesting observation is that running the recursion backwards
(beginning with the large divisors) might greatly decrease the runtime, as
a more significant fraction of the `val`s would be falsified by the larger
divisors in the first calls of the recursion.
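The shape of the check and the 1/30 estimate can be sketched out concretely (a rough reconstruction: the exact code is in the article, and this structure and argument order are my assumption):

```java
// Rough reconstruction of the kind of check under discussion (the exact
// code is in the article; this shape and the argument order are my guess).
public class EvenlyDistributed {
    // Returns true only if val is divisible by every divisor up to lim.
    // Each level either aborts or recurses, so every divisor check removes
    // a constant fraction of the surviving calls.
    static boolean isEvenlyDistributed(int val, int divisor, int lim) {
        if (divisor > lim) return true;
        if (val % divisor != 0) return false;   // most calls abort early here
        return isEvenlyDistributed(val, divisor + 1, lim);
    }

    public static void main(String[] args) {
        // Quick check of the 1/3 * 1/2 * 1/5 estimate: among even candidates,
        // only those divisible by 3, 4 and 5 (i.e. by 60) survive the first
        // three non-trivial checks.
        int evens = 0, survivors = 0;
        for (int v = 2; v <= 100_000; v += 2) {
            evens++;
            if (v % 3 == 0 && v % 4 == 0 && v % 5 == 0) survivors++;
        }
        // 1666 of 50000, i.e. about 1/30 of the even candidates
        System.out.println(survivors + " of " + evens + " even candidates survive");
    }
}
```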

~~~
mbell
I see what you're saying, and that was the type of thing I was getting at.
If I understood the article correctly, the claim was that it was faster
because the first 2-4 checks could be optimized by avoiding the idiv. But
even if you optimize those checks down to zero time, that is a small number
of checks compared to the number of checks needed as lim increases (even at
lim = 20, only 10%-20% of the checks would be optimized). It doesn't follow
that the runtime would be cut in half based on the article's conclusions
alone.

------
chton
While it felt a bit chaotically written (maybe due to my lack of in-depth
JVM knowledge), it was a very interesting article.

Is there a way to detect when these cases happen? If there was, the compiler
could choose to ignore the @tailrec directive if it wouldn't give any
meaningful speed increase.

~~~
bad_user
The purpose of `@tailrec` is not necessarily for performance, but more for
correctness, otherwise you can end up with recursive functions that blow up
the stack pretty fast on large enough inputs. Plus this benchmark only does at
most 20 iterations per loop - which means nothing if you want to measure the
cost of method calls.
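The correctness point can be made concrete. Scala's `@tailrec` rejects, at compile time, any annotated call that isn't actually in tail position; the same hazard shown here in Java (which has no tail-call elimination; these function names are made up for illustration):

```java
// The stack-safety issue @tailrec guards against, shown in Java: the
// recursive shape overflows on large inputs, while the loop form -- what
// Scala emits for a verified tail call -- does not.
public class TailDemo {
    // NOT stack-safe: each call adds a frame; the `1 +` runs only after
    // the recursive call returns, so this cannot be a tail call.
    static long depthRecursive(long n) {
        if (n <= 0) return 0;
        return 1 + depthRecursive(n - 1);
    }

    // Stack-safe equivalent: the loop rewrite a verified Scala @tailrec
    // call is guaranteed to get at compile time.
    static long depthLoop(long n) {
        long count = 0;
        while (n > 0) { count++; n--; }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(depthLoop(10_000_000L));  // fine: constant stack
        try {
            depthRecursive(10_000_000L);             // overflows the default stack
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError from the non-tail version");
        }
    }
}
```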

~~~
chton
hmm that's a very good point, I didn't consider that. We'd need additional
benchmarking to know what the actual impact is and at what point the @tailrec
version becomes more desirable.

------
_3u10
tl;dr: When trying to make code run faster, use a profiler rather than doing
things you think make the code faster.

------
integricho
I just love the ending sentence in the conclusion.

~~~
jimbishopp
Wish that was the first sentence; would have saved me some time.

~~~
octo_t
it is true for basically every programming language, from haskell to C++ to
python to ruby.

~~~
JimmyM
I think it was a joke, but I'm genuinely not sure.

------
michaelochurch
I'm going to comment on the political aspect and leave the technical JVM-
centric bit alone.

 _In this example, did we learn what is faster, Java or Scala? Nope. But we
learned a lot digging for explanations why the results are different, which
will hopefully result in having better cohesion between languages and the
underlying platform._

Performance is often FUD when it comes to languages. Clojure, Scala, Haskell,
and even Common Lisp are plenty fast enough for most purposes. Hell, Python
and Ruby are fast enough for most applications. Besides, runtime performance
has more to do with how the code is written than anything else. You can
write very fast C++, if you spend a lot of time tweaking it and hire
experienced and extremely expensive ($300k/year and up) programmers, but
typical C++ isn't any faster than well-written code in other languages.
(I've seen C++ projects fail
for performance reasons related to maintainability issues that arguably
wouldn't exist if Haskell were used.) However, invoking _performance_ is a
great (if unreliable, given who it brings on to the field) way to scare
decision-making business people (toddlers with guns) into taking your side on
an issue they know nothing about.

This is a place where it'd be better if programmers were a little more
politically savvy. (Bringing The Business into a technical dispute is not
politically savvy. It ruins everything, in the long run. Never invite
executives, also known as toddlers with guns, to anything. As many a Chinese
noble learned about opening The Wall and letting the Mongols in to fight one's
battles, it's impossible to get them out after it's done.) Let's say you have
a team of 5 programmers who want to write something in Python, which is (for
most purposes) fast enough. One of them stands up and says, "oh no, we can't
do this in _Python_ because _if we end up running this on 100,000 boxes_ it
will be too expensive, so we can only use C++" (premature optimization). If he
were more politically savvy, he'd build the thing in Python and _then_ , if
the software were to run on 100,000 boxes, rewrite performance-critical pieces
in C++, and justify a bonus for himself by pointing to the 20,000 CPUs that
were just deprovisioned. And a year later, a month before bonuses are
disbursed, he can rewrite another performance-critical component. This is good
for him (he actually gets recognized, instead of being that annoying guy who
bludgeoned a team into writing C++ and taking 4x as long to deliver an MVP)
and for the business (only performance-critical components, with price tags
large enough for him to care, get rewritten).

 _Oh, and if you think that part is constructive and boring, and you came here
for a language holy-war, I do think both Java and Scala suck as programming
languages, because they both allow me to write stupid programs with
performance problems._

I assume this is sarcasm, but I actually worry about the fact that Those of Us
Who Care About Languages are too divided over minutiae like Haskell's
syntactic whitespace and Clojure's parentheses, and that may be why The
Business comes in, mushroom stamps us and says, "I'm sick of your shit,
programmers. Everything has to be in Java."

(Actually, if you've watched _Orange is the New Black_ , you know that The
Business has taken to calling software engineers "inmate", but that's another
discussion.)

I like Haskell and I like Clojure for very different reasons, and they are
very different languages, but the day-to-day real-world differences between
them are small compared to the very real risk of The Business overhearing our
flamewar and saying, "fuck you guys, Java all the way, now lick my SCRUM or
it's minus-5 story points for you."

One of the things that worries me about Clojure is that, while I'd argue that
it's the best (for a definition of "best" that includes short-term business
viability; this enables me to exclude obscure niche languages that may be
better on paper, but that are just too numerous for me to know anything about)
dynamically-typed language-- a great language in its own right, but sitting on
top of the JVM and having access to those libraries-- there's been a mind-
share split between Clojure and Scala. And while Scala/Java and Clojure/Java
interop aren't bad, Clojure/Scala interop is a mess. On top of this, while
Odersky's brilliant, I think Scala has taken in a little too much of the
Java culture for its own good. Scala's a fine language to write in, but
large Scala codebases are generally things that I'd rather not risk my
sanity and career by being anywhere near. If Scala had wanted to be
"Haskell with Java libraries" it would have been a different and harder
fight... but then again, it
might not have taken off at all without the "slightly better Java" crowd, so
maybe the way things happened was the only possibility.

The mind-share split between Clojure and Scala scares me, because it generates
a very real risk that the intellectual energy that I'd like to see benefitting
both languages, or at least consistently benefitting one of them, might fall
back down the tree onto Java. The real risk, to me, is "Clojure vs. Scala:
Divided We Fall". But I don't know exactly what to do about it.

~~~
YZF
> You can write very fast C++, if you spend a lot tweaking it and hire
> experience and extremely expensive ($300k/year and up) programmers, but
> typical C++ isn't any faster than well-written code in other languages.

I disagree and so does Andrei Alexandrescu:

"The going word at Facebook is that 'reasonably written C++ code just runs
fast,' which underscores the enormous effort spent at optimizing PHP and Java
code. Paradoxically, C++ code is more difficult to write than in other
languages, but efficient code is a lot easier [to write in C++ than in other
languages]." – Herb Sutter at //build/, quoting Andrei Alexandrescu

Here's the thing about performance. Sometimes you care and sometimes you don't
care. When you don't care- you don't care. You can write it in Python and even
though it runs 10000 times slower, you still don't care. When you do care a
factor of 10000 for Python vs. C++ or a factor of 3-5 (or more) of JVM
languages vs. C++ can mean running 5 million servers instead of 1 million
servers, or 10 frames per second vs. 60 frames per second, and the success or
failure of your business. This is why pretty much all the big players who care
about performance use C++ (from games to web at scale). The maintainability of
large C++ projects is pretty much field proven and C++ is also evolving and
while not quite fixing some of the causes for grief (because it maintains
backwards compatibility) it offers new ways of doing things that are safer,
more maintainable, and just as fast.

The only thing you said that I can slightly identify with is that you need
good people in order to build things in C++ (and no, they don't cost
$300k/year. I wish.) This is not a con, this is a pro. You want that
regardless of language and good people will be expensive. Yes, you can get
cheap people to write bad code in any language.

EDIT: Another data point. I worked on a huge Python project where performance
did turn out to be an issue and it was virtually impossible to find a
"critical" part to apply C++ to. It was just slow throughout, it was built
over a huge base of meta-programming and Python specific magic. There was
simply no single piece you could point to that if written in C++ would make it
go significantly faster. My point there is that while it's certainly possible
in a well designed system to mix languages while applying fast languages to
performance critical portions it's not always possible after the fact.
Language choice is an engineering decision and there's no single answer but
you have to be very careful with the attitude of just throwing something
together in the mistaken hope that it can be fixed later.

EDIT2: I could say more to defend C++'s "honour" but it doesn't need me as a
champion... The choice of programming languages though is important and we
need a way to eliminate some of the FUD. Part of that is through sharing real
world successes and failures. Naturally there is some cognitive dissonance
happening, that is if I chose language X, therefore I'm smart, therefore
language X is the best, therefore other people who chose language Y don't have
a clue. Where this turns from religion to data is when we can share data
about a project N years later that is somehow comparable to other projects
and people can try to gather some insight from that data. The nice thing about
performance is that it has an objective component to it, that is if we look at
a certain problem we can get some numbers that we can compare. It's
quantifiable. Factors such as development time, maintainability etc. are less
quantifiable. Developer salaries, while quantifiable, are also hard to compare
but are definitely a factor in making language decisions.

~~~
pkolaczk
I think you're exaggerating the performance differences. Reasonably written
Python is not 10000 times slower than reasonably written C++, nor is
reasonably written Java 3-5x slower. And while I agree that creating a
super-optimized tight loop of a program in C++, C or (better) assembly
might be much easier than in higher-level JITted languages, this does not
scale to huge and complex codebases. At some point you'll be struggling
with overall project
complexity, and architecture / high level design can affect performance more
than those low-level bits. A higher level language will be more amenable to
refactoring and fixing high-level performance problems than a low-level
language like C or mid-level language like C++. So then, you might be much
better choosing a high level fast language like Haskell or Scala or even
C#/Java than C++.

~~~
YZF
Not every Python program is slower than C++ by a factor of 10000. Some are.
Some Python can be reasonably fast as long as all the heavy lifting is done in
native code, e.g. numpy or scipy. Let's say it would typically be more in the
x250 range for algorithmic code that uses native Python types. You get to the
10k range when you create your own types and do meta-programming and try to do
algorithms over that. YMMV.

The point I was making though was that if you don't care then you don't care.
I wrote some Python for a friend who wanted to scrape and process financial
information from various web sites. I'd be an idiot to do that in C++. None of
us cared how long it took to run (as long as it wasn't weeks). Having access
to various Python libraries made this task a breeze and maintainability wasn't
much of a concern either.

Andrei wouldn't have made that statement about C++ vs. Java if the difference
was in the noise. I think x3-5 is a reasonable rough number to put on it but
while I can point to benchmarks I can't point to a comprehensive study that
shows "reasonably" written across different domains. You're welcome to take
those numbers with a grain of salt and do your own investigation. Another data
point there is that C++ is the dominant language in Google Code Jam and
TopCoder SRMs where writing fast and correct code quickly is a competitive
advantage.

EDIT: I find a lot of people will underestimate the performance advantage of
native languages vs. JIT or interpreted environments. The x10000 is something
I've seen in a real world system. Another thing to consider is that in a
native language you can drop to assembler to optimize performance critical
sections. You have 100% absolute control over your hardware. Anyone know how
this: https://code.google.com/p/h264j/
compares to the original implementation?

~~~
pkolaczk
"The point I was making though was that if you don't care then you don't
care."

I disagree with this. You always care, _to some degree_. If I write a one
time, 100-LOC script to process a 1GB of data, I don't care if it takes 1
second or 1 hour. But I would care if it took 1 week or 1 month. That's why
Java performance is good enough for 99% of programs I write, Python is also
good for many, but if I completely didn't care for performance and wrote
sloppy code in Python (or Java) I'd get into 10000x performance penalty region
and this would be unacceptable in almost all cases.

"I think x3-5 is a reasonable rough number to put on it but while I can point
to benchmarks I can't point to a comprehensive study that shows "reasonably"
written across different domains" YMMV. The micro-benchmarks in Great Language
shootout disagree with this. Most of them are within 2x range and the one
outstanding is actually a benchmark of particular regular expression engine.
Ok, we should not believe microbenchmarks, so what about real, optimized
applications? Compare performance of Netty vs nginx. Or Tomcat vs Apache. This
is a tie. Or Hypertable vs HBase (yeah, despite huge expectations and
marketing, even in Hypertable own benchmarks, Hbase comes only... 50% - 2x
slower). Or Jake vs Quake2 with Jake2 again not even 50% worse (actually
better in some cases).

"C++ is the dominant language in Google Code Jam" This only supports what I
already wrote - when writing a very small piece of code you have full control
over, like for a competition, it is much easier to achieve high performance
code in asm/C/C++ than in Java/C#/Scala etc. In a competitiona like that, even
a 20% overhead is not acceptable and I'd also use C or C++. But you can't
extrapolate that on large-scale programming, where benefits of using a high
level language matter much more.

"Another thing to consider is that in a native language you can drop to
assembler to optimize performance critical sections" I can do that in Java or
C# as well.

~~~
YZF
Sure, there's some transition region between "don't care" and "care" but I
find a lot of people tend to be strongly in one or the other (sometimes
wrongly).

Another benchmark:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf

Keep in mind though that the JVM is a moving target. It keeps getting better
(on one hand) and on some platforms it's worse (e.g. Android, though the
upcoming new version looks promising). It's possible the gap is smaller now
than what I remember seeing in the past.

At any rate, if x2 is a number that feels right for the stuff you're doing I
can't argue with that. You need to choose the language that works for you.
Maybe you choose Java because of the libraries. Maybe you have more experience
writing in Java and you're a lot more productive. Maybe there is better
tooling. Maybe it's just more in line with how you think.

A web server does a lot of file I/O and a lot of network I/O. The performance
of a web server is more about how efficiently you can juggle those given
highly concurrent loads and what mechanisms are used at the native layer to
interface to those systems. It's not so much about the "raw" power of the
language. In your game engine example a lot of the heavy lifting is done in
OpenGL which is native code. I'm also not intimately familiar with the details
there. At some point it's also about how much effort went into optimizing
things and whether or not something else was traded off.

To contrast that, Google Code Jam tasks are typically algorithmic and they
stress the "raw" power aspect of the language. That said it's certainly not
representative of real-world product development.

~~~
nsajko
If you are thinking about the Android runtime, Dalvik and now ART, those
aren't JVM (I think they compile to a different bytecode). There probably is a
port of JVM but I didn't get the impression you were talking about that.

~~~
YZF
Dalvik does indeed use a different VM/bytecode. Your Java Android app is
compiled to JVM bytecode, which is then translated into Dalvik bytecode.
The point is/was that
the program you write in Java may run faster or slower depending on your
target platform and that if your target platform is Android you are taking an
additional penalty vs. e.g. using the NDK...

(EDIT: Yeah, you're right, the name JVM should only be used to refer to the
specific type of VM that runs specific bytecode and not to other VMs. As
Android shows Java does not _have_ to run on the JVM. Thanks for the
correction.)

"In 2012, academic benchmarks confirmed the factor of 3 between HotSpot and
Dalvik on the same Android board, also noting that Dalvik code was not smaller
than Hotspot" (from Wikipedia)

------
hellodevnull
>because they both allow me to write stupid programs with performance
problems.

Which language doesn't?

~~~
michaelochurch
He's being sarcastic (I think).

~~~
bad_user
I'm pretty sure he's being half-sarcastic.

~~~
michaelochurch
I wonder what he considers to be good languages. Is he a C++ diehard, or a
Haskell purist?

~~~
javadocmd
Perhaps he's saying _all_ current languages suck in that respect, and that the
future of programming is languages that don't let you write unoptimizable
code.

~~~
fleitz
I guess programmers won't be allowed to write optimized code anymore, as
optimized code is unoptimizable :)

