
Debunking the Java Performance Myth - emforce
https://medium.com/@elliot_f/debunking-the-java-performance-myth-29b842955a24
======
flavor8
I'm not sure that there's a myth to be debunked, is there? Java's known to be
high performance as a web application server, if good architectural choices
are made. The frequently cited
[https://www.techempower.com/benchmarks/](https://www.techempower.com/benchmarks/)
reliably has some of the better Java frameworks ranking near the top along
with C and C++ servers.

~~~
mikeash
Yeah, the title confuses me greatly. I assumed this was going to be an article
about how Java is _slow_, debunking the myth that it’s really fast.

~~~
whois
Really? You've never heard people joke about Java being slow? Half the
programmers I've met have this cult like hate of java and cite performance as
the main reason.

~~~
timthelion
Java _is_ slow, but not in this case. Tight loop java is fast. Anything that
touches memory is slow. Anything that involves loading jar files is slow.
Anything that involves starting the VM is slow. But tight loop java is fast,
indeed, being a JIT, potentially faster than C (better branch prediction given
that the JIT can optimize at runtime).

So the claim that Java is slow is true. That's why so many people hate it. The
fact that Java is fast is also true.

~~~
mohaine
To be more fair, Java is mostly slow on startup, which is mostly due to:

1) Swing. I have no idea what this does on startup but it is bad. This is what
killed most Java GUI apps. A simple AWT or SWT app will pop open while a
simple Swing app will take a few seconds to get started.

2) Jars on systems with virus scanners. Jars are just zip files. Many virus
scanners will unzip and scan a zip on every open, which leads to each jar being
unzipped at least 2 times, and often more if the app has to load more things
from the jar at a later time.

3) Spring. Really spring is just a bad idea all the way through and most of
the reason people hate java is Spring. Huge XML configuration files instead of
code? Spring (mostly annotations now). Stupid names like
AbstractSingletonProxyFactoryBean? Spring. That aside, on startup Spring will
auto generate classes which really slows things down. This could have been
done during the last build but spring does it at each startup which is really
slow. Spring Boot has helped with startup times but it is still slower.

After startup Spring continues to kill performance. The underlying code is full
of maps to control the request lifecycle, so instead of just calling a method it
has to ask "who all is configured to do something at step
PreParseRequestEncoding?" then again at ParseRequestEncoding,
PostParseRequestEncoding, PreParseRequestBody, ParseRequestBody.... Not real
names, but the idea and the number of steps are. Most of the time the answer is
"nobody" but it has to ask for every step that might ever need something done.

Add on top of this JPA which is super easy but quickly breaks down to a lot of
DB calls. JPA is great for people who don't want to learn SQL and simple CRUD
apps but anything with any amount of data will quickly slow down to thousands
of DB round trips.

~~~
jacques_chester
I like Spring, I find it makes Java development pleasant, plus I have unfair
access to the Spring team members by working for the same company. So I'll
bite.

> _Huge XML configuration files instead of code? Spring (mostly annotations
> now)._

Spring 3, yes. Spring 4 and 5, no. XML has not been the blessed configuration
approach for ~4 years.

> _Stupid names like AbstractSingletonProxyFactoryBean? Spring._

Java is a language that has evolved enormously in expressiveness since Spring
began, meaning heavyweight GoF patterns were necessary early on to maintain a
flexible substrate with a uniform interface serving very many use cases?
Spring.

> _That aside, on startup Spring will auto generate classes which really slows
> things down._

I believed this urban legend too.

Dave Syer, who is understandably interested in Spring criticism, has done more
empirical investigation than anyone on this topic[0].

The tl;dr is that boot time is proportional to total classes loaded. That's
all, that's it.

In-memory reflection operations are hilariously, stupidly, insanely faster
than I/O. The JVM fetches and loads classes _individually_. So if you have a
lot of classes, it takes a longer time to load.

The other thing to bear in mind is that Spring relies on this much less than
it used to. As the language and JVM have evolved, so has Spring.

> _The underlying code is full of maps to control request lifecycle ... Not
> real names but the idea and number of steps is._

Are you talking about servlet filters? Because any web framework is going to
have a few of these. Spring MVC adds a handful and all of them can be removed,
replaced or added to. Spring Security adds a bunch and all of them can be
removed, replaced or added to. But the defaults are chosen because of
feedback, not just for the heck of it.

> _Add on top of this JPA which is super easy but quickly breaks down to a lot
> of DB calls. JPA is great for people who don't want to learn SQL and simple
> CRUD apps but anything with any amount of data will quickly slow down to
> thousands of DB round trips._

In general, yes, I agree that JPA is a PITA. For a small to medium system I
would try to avoid it where possible. For a sufficiently large codebase --
thousands of domain objects, dozens or hundreds of programmers -- it might be
a necessary evil.

Mind you, if you want to see a world where the database goes from abused to
outright mocked and ignored, pay a visit to Rails land. It drives me batty.

[0]
[https://github.com/dsyer/spring-boot-startup-bench](https://github.com/dsyer/spring-boot-startup-bench),
particularly
[https://github.com/dsyer/spring-boot-startup-bench/tree/master/static](https://github.com/dsyer/spring-boot-startup-bench/tree/master/static)

~~~
mohaine
I will admit that spring has gotten a lot better recently, but it is still a
mess (IMHO of course) and this "benchmark" seems to agree.

The biggest issue I have with spring isn't performance but is the huge
internal state that is the basis of Spring's DI.

After a few versions you end up with code that depends on a bean named "Abd" to
perform a task, but in the next version it is renamed to "Abc", which fixes the
typo but makes all the documentation wrong. And the 3 other classes that also
needed that task still look for it under the name "Abd". And then the next
major version replaces this entire module and the bean name is now "Xyz". To
me this IS Spring programming: Googling the DI names and trying them one by
one until you get the desired runtime functionality.

I've had apps in production with unused beans defined (It only had 1 function
that only threw an exception and was never called) that had to be defined or
something would break. Multiple team members wasted some free time trying to
locate the code that was requiring that bean but nobody could ever figure it
out. At some point everybody gave up, it was easier to just let it be.

Most developers have little to no idea what is actually going on inside their
Spring apps (or even what half their dependencies are). When a Struts-like
vulnerability comes along in the Spring world it is going to be a massive PITA
to fix. With how complex this all is I have little doubt that a vulnerability
exists somewhere in there.

~~~
jacques_chester
The Struts vulnerability was less about _Struts_ and more about the universal
difficulty of dependency management.

Which Spring Boot ameliorates by providing starter POMs. Curated, levelled,
updated collections of dependencies for common cases. No need to play
whack-a-dep with Maven or Gradle. No need to track 50 different dependencies
yourself.

The thing is: I don't care how Spring does the magic. I care that I don't
_have_ to care.

I came to Spring and Java-for-real development relatively late -- by fluke I
was on what is almost certainly the first Boot production app ever deployed,
back in early 2014.

Later I got a chance to see the primordial world of Spring 3. I understand the
residual hate.

------
traspler
Spring Boot really comes away awful in this article (and the one he links in
the article). I have mostly worked on projects using Dropwizard and Spring
Boot and have come to like some of the easy to use abstractions but I have
only scratched the surface of these frameworks, I assume, and most of it I
will probably never use. Most of the time it did not matter that much to give
the Server a GB more memory but recently we had some real issues with
CloudFoundry. To me it seems that it's not possible to somehow magically tweak
Spring Boot to behave better. So I came to wonder, what other framework would
strike a better balance? Maybe not offer the crazy specialised stuff Spring
offers or maybe needing some more work to get something to run but then offer
vastly better memory and speed characteristics?

I guess one way would be to implement the things I actually use myself with
fewer abstractions, but that honestly feels very daunting to me. The Java
ecosystem is so vast, and between handling the requests, DB access (with or
without ORM), persistence, caching, security, I really have no clue on where
to even start doing something like this myself. If someone more seasoned here
has some input, I would be highly interested!

~~~
RhodesianHunter
You mentioned Dropwizard early on, in my experience I have found it to be the
polar opposite of Spring.

Where Spring is an "everything but the kitchen sink" framework, Dropwizard is
just a collection of best-in-class libraries that are easily swapped out if
you prefer something else.

I don't get to do so often professionally, but if given my choice of tools I'll
generally opt for a combination of Dropwizard and Kotlin.

~~~
traspler
One big drawback of Dropwizard (at least for us) was that it was very
problematic to deploy it as a war in a Tomcat environment. There is
wizard-in-a-box but there were a lot of issues with that as well.

~~~
cs02rm0
If you're deploying a dropwizard app as a war in tomcat you're doing something
a bit odd.

~~~
traspler
True but sadly project specs sometimes change in odd ways.

------
barrkel
The primary reasons for choosing a particular language for your application
server usually come down to developer ergonomics, developer availability and
library availability / integration for your specific domain. Performance
rarely comes into it; performance while scaling into astronomical request
numbers is important for top web sites, but most app servers don't see that
kind of load.

Even if some requests need crunchy CPU power, it's better to offload them
into something asynchronous than try really hard to make them perform well
enough to be synchronous. Typically, hefty CPU jobs come in varying sizes,
they're rarely guaranteed to be runnable under the 200ms or so latency bar for
synchronous requests.

~~~
tomelders
I find that laughably generous. In my experience, Java gets a pass because
it's "enterprise", whatever the hell that means. I've never heard anyone make
a compelling case for Java on technical merit. It's always "we're a Java shop,
because we're enterprise".

~~~
ssijak
You say there is no technical merit to using the Java ecosystem over something else?

~~~
tomelders
Nope. That's not what I said.

------
JepZ
Well, I have not taken a look at the implementation yet, but Java is certainly
known for delivering decent performance even if the programmer is a little
careless regarding performance.

In Go, on the other hand, making simple 'mistakes' like declaring a variable
inside a loop can bring you to performance hell:

Declaration in a loop:

    
    
            var sum int
            for i := 0; i < n; i++ {
                    x := i + i
                    sum = x - i
            }
    

Declaration outside the loop:

    
    
            var sum int
            var x int
            for i := 0; i < n; i++ {
                    x = i + i
                    sum = x - i
            }
    

Benchmark results:

    
    
      BenchmarkInLoop-8       2000000000               0.40 ns/op
      BenchmarkOutLoop-8      2000000000               0.47 ns/op
    

And the same is certainly true for C/C++.

Src:
[https://nopaste.xyz/?68c6800acd9200d6#mDstI36uBPBU4Td8k//GNC...](https://nopaste.xyz/?68c6800acd9200d6#mDstI36uBPBU4Td8k//GNC+P7mmbB1ldMw3DGMuxeQo=)

~~~
helper
What go compiler are you using? On my machine those run in exactly the same
amount of time (go version go1.9 linux/amd64 ).

    
    
      $ ./tmp.test -test.bench .
      goos: linux
      goarch: amd64
      BenchmarkInLoop-4       2000000000               0.46 ns/op
      BenchmarkOutLoop-4      2000000000               0.46 ns/op
      PASS
    
    
      (pprof) disasm fast
      Total: 1.95s
      ROUTINE ======================== _/tmp.fast
           970ms      970ms (flat, cum) 49.74% of Total
               .          .     4ed7c0: MOVQ 0x8(SP), AX                             ;main_test.go:23
               .          .     4ed7c5: XORL CX, CX
               .          .     4ed7c7: MOVQ CX, DX
               .          .     4ed7ca: JMP 0x4ed7d6                             ;main_test.go:26
           660ms      660ms     4ed7cc: LEAQ 0x1(CX), BX                             ;_/tmp.fast main_test.go:26
           110ms      110ms     4ed7d0: MOVQ CX, DX
           170ms      170ms     4ed7d3: MOVQ BX, CX
            30ms       30ms     4ed7d6: CMPQ AX, CX
               .          .     4ed7d9: JL 0x4ed7cc                               ;main_test.go:26
               .          .     4ed7db: MOVQ DX, 0x10(SP)                           ;main_test.go:30
      (pprof) disasm slow
      Total: 1.95s
      ROUTINE ======================== _/tmp.slow
           960ms      960ms (flat, cum) 49.23% of Total
               .          .     4ed790: MOVQ 0x8(SP), AX                             ;main_test.go:14
               .          .     4ed795: XORL CX, CX
               .          .     4ed797: MOVQ CX, DX
               .          .     4ed79a: JMP 0x4ed7a6                             ;main_test.go:16
           690ms      690ms     4ed79c: LEAQ 0x1(CX), BX                             ;_/tmp.slow main_test.go:16
           100ms      100ms     4ed7a0: MOVQ CX, DX
           100ms      100ms     4ed7a3: MOVQ BX, CX
            70ms       70ms     4ed7a6: CMPQ AX, CX
               .          .     4ed7a9: JL 0x4ed79c                               ;main_test.go:16
               .          .     4ed7ab: MOVQ DX, 0x10(SP)                           ;main_test.go:20

~~~
JepZ
Well, the assembler looks the same to me too, but I can confirm different
runtimes on two different machines (fast is always faster here).

    
    
      $ ./main.test -test.bench . -test.cpuprofile=test.profile
      goos: linux
      goarch: amd64
      BenchmarkInLoop-8       2000000000               0.42 ns/op
      BenchmarkOutLoop-8      2000000000               0.47 ns/op
      PASS
    
      (pprof) disasm fast 
      Total: 1.86s
      ROUTINE ======================== command-line-arguments.fast
           990ms      990ms (flat, cum) 53.23% of Total
               .          .     4ed7b0: MOVQ 0x8(SP), AX                             ;mem_test.go:16
               .          .     4ed7b5: XORL CX, CX
               .          .     4ed7b7: MOVQ CX, DX
               .          .     4ed7ba: JMP 0x4ed7c6                             ;mem_test.go:19
           650ms      650ms     4ed7bc: LEAQ 0x1(CX), BX                             ;command-line-arguments.fast mem_test.go:19
               .          .     4ed7c0: MOVQ CX, DX                               ;mem_test.go:19
            10ms       10ms     4ed7c3: MOVQ BX, CX                               ;command-line-arguments.fast mem_test.go:19
           330ms      330ms     4ed7c6: CMPQ AX, CX
               .          .     4ed7c9: JL 0x4ed7bc                               ;mem_test.go:19
               .          .     4ed7cb: MOVQ DX, 0x10(SP)                           ;mem_test.go:23
      (pprof) disasm slow
      Total: 1.86s
      ROUTINE ======================== command-line-arguments.slow
           870ms      870ms (flat, cum) 46.77% of Total
               .          .     4ed780: MOVQ 0x8(SP), AX                             ;mem_test.go:7
               .          .     4ed785: XORL CX, CX
               .          .     4ed787: MOVQ CX, DX
               .          .     4ed78a: JMP 0x4ed796                             ;mem_test.go:9
           560ms      560ms     4ed78c: LEAQ 0x1(CX), BX                             ;command-line-arguments.slow mem_test.go:9
               .          .     4ed790: MOVQ CX, DX                               ;mem_test.go:9
            20ms       20ms     4ed793: MOVQ BX, CX                               ;command-line-arguments.slow mem_test.go:9
           290ms      290ms     4ed796: CMPQ AX, CX
               .          .     4ed799: JL 0x4ed78c                               ;mem_test.go:9
               .          .     4ed79b: MOVQ DX, 0x10(SP)                           ;mem_test.go:13

------
sempron64
I think an important element in comparing Go to Java for webserver apps is
language design. While Java 8 and Go's performance may be similar in small
single-threaded request handler benchmarks, Java's baseline design encourages
the creation of lots of expensive runtime abstractions, and makes parallelism
difficult. Go on the other hand encourages a minimum of abstractions and makes
parallelism trivial. That means that the marginal cost of business logic is
much higher in Java, in both response times (due to unnecessary serialization)
and overall computation. Some of this can be averted by careful engineering,
but Java definitely isn't a "pit of success" in this regard - the easiest
thing to do is usually the wrong thing.

~~~
twic
> Java's baseline design encourages the creation of lots of expensive runtime
> abstractions

Abstractions yes, expensive, perhaps not. One of the nice things about Java is
that the JIT can collapse a lot of abstraction at runtime. Not all of it, but
a lot of it. I am skeptical about the idea that Java naturally guides
programmers towards expensive abstractions.

> and makes parallelism difficult

Presumably, you mean something like "makes lightweight concurrency difficult,
because all the backend APIs for database access etc are blocking", which is
unfortunately true.

~~~
sidlls
I think perhaps it's more of a "pure OO" approach than the language itself.
I've seen plenty of Java _and_ C++ code where there is one interface class for
every concrete class, even when just one concrete class is ever implemented,
for example. I've also seen cases where some "factory" pattern is force-fit
into the architecture, and the objects it creates are kinda sorta related, but
ought not be in code. It gets exponentially worse as more interfaces and more
levels in a hierarchy are allowed.

The compiler isn't always going to be able to collapse or elide these.

------
falcolas
The JVM has indeed matured to the point where it's on par with other VM style
languages, and better than interpreted languages. This wasn't always the case:
early in Java's lifetime it was a slow, memory intensive hog. That stigma has
stuck.

That said, it's still pretty heavyweight (both the language and the runtime).

It also wouldn't be my first choice for compute intensive tasks (for example,
a non-toy ray tracer). It would do the job eventually, but at a cost. Of
course, I wouldn't pick Go or Python either.

~~~
agumonkey
Reminds me of people blaming Clojure's slow boot on the JVM when it was in fact
Clojure tooling that took the time.

Things aren't straightforward, we need to go deep before concluding.
Profilers, profilers, profilers.

------
ssijak
One more dumb test that doesn't even fully give the code and test environment.
Btw, these tests are useless. Add some real use case, a database, some number
crunching, job execution, and see Python fail even more, for example.

------
boggio
Flask is synchronous and it shouldn't be labeled "Python 2.7 or Python 3.6".
Comparing that to async frameworks is not quite fair. (Also is SpringBoot
async?)

~~~
ssijak
Spring Boot can be either reactive or ordinary thread-pool based to serve
requests, or anything from the stack really. It is a very, very powerful
framework and comparing something barebones to Boot is really dumb. Not to
mention that the author intentionally made Java look like a monster when he
could have used some simple tweaks to reduce JVM memory usage by a lot, and he
hid what the dependencies really imported. Also, if he did not like Boot he
could just fire up barebones Jetty and use a 20 MB heap, tops.

------
emforce
Hi All, this has received an incredible amount of views and comments since I
posted this and left the house!

I appreciate the feedback comments! I'm very much still playing around with
different technologies and writing about them as I go, this is more a learning
experience for me that I've documented and I would take what I'm saying with a
pinch of salt.

I thought I'd also clarify that I'm using the code/docker images that I
created in the previous article as the base from which I'm running my tests as
that seems to have gotten missed by a few peeps!

~~~
jacques_chester
I definitely missed it. Maybe adding a link to this one as well would be
helpful.

Edit: I see the code using the `com.sun.net.httpserver` package, but not the
Spring Boot version.

------
Timothycquinn
Would be interested in seeing Elixir/Phoenix compared here. I've seen some
amazing scalability figures from those folks:
[http://phoenixframework.org/blog/the-road-to-2-million-websocket-connections](http://phoenixframework.org/blog/the-road-to-2-million-websocket-connections)

For posterity's sake, throw in Ruby on Rails as well.

~~~
brightball
Elixir is more focused on concurrent performance than raw performance. It’s
still very fast but favors consistency above all else. I remember seeing a
great benchmark that compared Elixir to Python and Go where they charted the
variability of response times. Python and Go were very spikey with Go
producing the fastest overall time. Elixir was extremely steady. Makes it an
ideal choice for real time ops, holding millions of web sockets, ensuring a
heavy process doesn’t affect the experience of everything else that’s
executing.

You’ll also get natural clustering ability and a level of fault tolerance
that’s almost silly.

I’ve been trying to find the time to demonstrate this with the tech empower
benchmarks honestly. I want to see what will happen if you run all of them at
the same time for a particular platform.

~~~
rlander
It baffles me how people give Elixir the credit for this. Everything you
mentioned here has nothing to do with Elixir, but with how Erlang and its
preemptive scheduler were optimized for soft real-time apps.

~~~
brightball
BEAM gets the credit but the parent mentioned Elixir so that’s how I
responded. It’s just quicker than writing Elixir/Erlang/BEAM/OTP every time.

Wasn’t trying to misappropriate the credit.

------
chvid
Did you publish the source code for the test servers (in the various
languages)?

I am curious what a "normal" Java microservice looks like.

~~~
coldtea
Not very different than a "normal" Node microservice.

It might even take fewer lines.

------
pipio21
The myth that Java is slow is not a myth; it is true for medium to big programs
with lots of interconnections, because with the abstraction you lose control
of the program's behavior.

You gain in other things, like abstraction, but the penalty is enormous. We
use very high level languages in house, but when we do it performance is not
the priority.

Doing "Hello worlds" is not a valid debunking of anything because, guess
what? Printing a Hello world on a screen is a very simple operation.

We have millions of lines of code and we have done tests on just converting
20-30Ks of our C/C++ code to Java and the result has been a disaster: hundreds
or thousands of times slower. That's right.

In my opinion if other companies are deluded wanting to be lazy, much better
for us as competitors, but children should not be intoxicated with bad advice.

My advice is do not believe me or anyone else, when in doubt just test on your
own.

~~~
sk5t
Like with any language, the level of abstraction is up to you... one could
just as easily "lose control" with C++.

Hundreds or thousands of times slower Java? Have you considered that the
conversion was done with a weak mastery of the language or runtime?

The HFT folks working in Java would be taken aback to learn the language
doesn't work.

------
Retric
1,000 requests per second for a hello world micro service is horribly slow. I
don't know what architecture they are using, but ~100,000 / CPU for simple
requests is completely reasonable, making this test very dubious IMO.

~~~
tyingq
Some of the stacks he tested had significant failure rates at 1000/sec though.

Bumping it up would have made the comparison less interesting.

Perhaps it was just a small VPS?

------
ajnin
How does a "Hello world" request on localhost take an average of 50~100ms
using any framework? 1ms would already be too much. Something seems way off
with this benchmark.
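
A back-of-the-envelope check, assuming the roughly 1,000 requests per second
that other comments in this thread attribute to the test, and assuming a
steady-state load: Little's law relates the average number of requests in
flight L to the arrival rate λ and the average latency W.

```latex
L = \lambda W, \qquad
\lambda = 1000\ \text{req/s},\quad
W = 0.05\ \text{to}\ 0.1\ \text{s}
\;\Longrightarrow\;
L = 50\ \text{to}\ 100\ \text{requests in flight}
```

That many requests in flight for a trivial handler would suggest most of the
measured time is spent waiting (queueing, connection setup, or the load
generator itself) rather than in the handler, though the thread does not have
enough detail to say which.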

------
rsynnott
One issue here is that it’d really be more appropriate to run the test twice
and take the second result. This should get most JITing out of the way.

It’s hard to be sure what’s happening given the lack of source code, but 1000
requests per second where the request doesn’t do much is a fairly minor load
for a well-written java service on modest hardware. I would expect the
‘lightweight’ java example to do better, assuming it’s just jetty or
something.

------
msl09
Pretty good article but I wish we had the source code so we could compare code
complexity also.

------
luord
Damn, I will have to take a look at aiohttp. That's impressive, I'm glad that
python, my favorite language, can have that kind of performance.

