
JRE 8 needs more codecache than before - jlward4th
http://engineering.indeedblog.com/blog/2016/09/job-search-web-app-java-8-migration/
======
nn3
Typical problem of over-tuning. Not every tunable needs to be tweaked (or
worse, cut-and-pasted from somewhere on the internet). Your street cred as an
admin does not depend on how many settings you can change. If they had just
kept using the defaults, everything would have been fine.

That said, 250MB is really a lot of code. Their binaries must be gigantic.
This by itself likely already causes performance problems because all the
caches will be thrashing.

~~~
xxs
It's all about the tiered compilation. Tons of unused but eagerly compiled
code w/ C1 (the dumb and fast compiler) that never reached the proper C2.

The issue has been there since 6.0.23, when -XX:+TieredCompilation defaulted
to true. (hmm, or was it 6.0.25?)

I'd say nothing really new. If you run a server, just disable tiered
compilation altogether. It might cost a few seconds of slower startup but
that's it.

~~~
billsmithaustin
I believe TieredCompilation was disabled by default on Java 7u55:

    
    
        $ java -server -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version|grep TieredCompilation
             bool TieredCompilation                         = false           {pd product}        
        java version "1.7.0_55"
        Java(TM) SE Runtime Environment (build 1.7.0_55-b13)
        Java HotSpot(TM) 64-Bit Server VM (build 24.55-b03, mixed mode)
    

I would be interested in hearing from people with first-hand experience
disabling TieredCompilation on Java 8 on production servers.

------
otterley
To the author: It would be useful to put a little blurb paragraph at the top
to say that if you haven't overridden the default codecache size via JVM
options, no action is necessary. This information is buried way down at the
bottom of the page, which may lead hurried readers to assume that something
may already be wrong with their Java 8 JVMs.

~~~
billsmithaustin
Thanks, I'll see what I can do to clarify that point.

------
EdSharkey
Would a good approach to major JVM upgrades in production be to remove all but
critical JVM flags in the new version, and run it on a subset of nodes, then?

After getting a baseline measurement of performance with JVM defaults,
experiment with tuning a couple of settings, measure some more, make sure
nothing breaks under load. Repeat until new JVM version seems stable/performs
better/etc even under worst load and then upgrade all nodes to the latest and
greatest?

~~~
user5994461
Yes. The good approach when you're given a legacy Java application with tons
of flags is to remove ALL the optimization flags for the heap and the GC.

Most of the settings found in legacy projects or on the internet are either
obsolete or defaults in the latest version of the JVM.

The only mandatory heap flags are "-Xms???M -Xmx???M", which set the heap
size of the application to ??? megabytes.

~~~
benzewdu
Not even -Xms -Xmx; there are heuristics built in to appropriately size the
GC spaces...
[http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/sh...](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/tip/src/share/vm/memory/collectorPolicy.cpp)

~~~
user5994461
Right. Most applications don't need that.

It's only the server apps running 24/7 on fixed hardware that should have a
defined memory usage.

------
protomyth
I wonder how many projects have production issues because they are not able to
hit expected production volume in a QA environment? This is certainly the
cause of a lot of grief I've experienced.

~~~
lucozade
For us, it's not production volumes that are the main issue but production
dynamics i.e. how the volumes change over time. It's such a big issue that
standard practice is to run QA against live production data by default and
manage the consequences.

The downside is, when we do need to use test data e.g. when there's a large
change to the structure of the upstream or we want to do stress scenarios,
it's a pain in the backside.

~~~
cle
We struggle with this too. There are so many ways the dynamics can change--
daily/weekly/seasonal cycles, robots, client caching, etc. One best practice
we use for high-volume services is to minimize the variance in call mixtures--
if there are two calls with vastly different call patterns, it's probably
worth it to split them into separate services, so you can tune throttling, GC,
load balancing, etc. specifically for those calls, instead of having to tune
them to support both calls (which is often difficult or impossible to do). Of
course, it's hard to predict how your service will evolve over time, so making
the split is often painful for you and your clients. Some of our services
can't be handled by a single load balancer, so we use DNS round robins, which
have a whole other class of problems when you have mixed call patterns. Gotta
earn your pay...

Some other techniques we use are one-box deployments that receive a proportion
of production traffic and "bake" new changes before deploying to the whole
fleet, and shadow fleets which let you tune and test against live traffic.
We've found that simply replaying production traffic at higher volumes
sometimes isn't sufficient, because our calls don't necessarily scale that way
(some of them scale with upstream traffic, some of them scale by downstream
fleet sizes due to client caching).

------
Animats
How much Java source code does it take to need 256MB of codecache? The author
says they're using a service architecture where each transaction uses about 20
services. There's no indication of why their program is so huge.

~~~
billsmithaustin
Author here. We know 128MB codecache was not enough and 256MB was sufficient.
I think we could have gotten by with less, e.g. 200MB, but we stopped
experimenting once we found a value that worked.

We don't have a complete understanding of why this app uses so much more
codecache than other apps we've switched to Java 8. Now that we expose the
codecache size in Datadog, I may try plotting code size vs codecache size for
a variety of our apps.

~~~
remh
Datadog employee here, you don't need to run your own collector to collect
codecache usage. As it's exposed through JMX, the JMX collector can collect it
[http://docs.datadoghq.com/integrations/java/](http://docs.datadoghq.com/integrations/java/)
. It doesn't by default though, so you'll need to configure it to collect the
metrics from the "java.lang:name=Code Cache,type=MemoryPool" bean. We should
add that in our default configuration.
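For reference, the same pool can also be read in-process through the standard
platform MBean API; a minimal sketch (note: on Java 8 the pool is named "Code
Cache", while Java 9+ splits it into several "CodeHeap '...'" pools, hence
the loose name match below):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class CodeCacheUsage {
    // Sum the used bytes of the code cache pool(s); -1 if none is found.
    static long codeCacheUsed() {
        long used = -1;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            // Java 8 exposes a single "Code Cache" pool; Java 9+ exposes
            // several "CodeHeap '...'" pools instead.
            if (pool.getName().contains("Code")) {
                used = Math.max(used, 0) + pool.getUsage().getUsed();
            }
        }
        return used;
    }

    public static void main(String[] args) {
        System.out.println("code cache used: " + codeCacheUsed() + " bytes");
    }
}
```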

~~~
uxcn
If it's exposed through JMX, would it also be visible via something like
_jvisualvm_?

~~~
remh
Not sure about jvisualvm, but it will definitely be visible in jconsole

~~~
uxcn
_jvisualvm_ is largely a superset of _jconsole_ , with a plugin interface, so
it probably is.

I have no idea how many other people actually use it though.

------
osi
so the unspoken thing seems to be that they kept the same JVM arguments as
java 7 and that caused problems in java 8? and they had a JVM argument that
was setting a value to the same as the default?

~~~
billsmithaustin
We knew to adjust other JVM settings (e.g. PermGen replaced by Metaspace) but
we overlooked that (1) we used a non-default max codecache size and (2) the
default for that setting was 3x higher for Java 8.

An unfortunate thing about using non-default JVM settings is that you need to
scrutinize them every time you switch Java versions. If I ever go through this
again, I will know to pay more attention to every JVM setting we override.

~~~
osi
Ah, so you had tuned JRE 7 to use a 64MB code cache (more than the default),
which then became far less than the new default.

------
__-X-__
-XX:-TieredCompilation is the magic option to disable tiered compilation.

Beginning with Java 8, instead of having the VM magically choose between the
client JIT (C1: think V8) and the server JIT (C2: think gcc -O2), the default
configuration is to run in so-called "tiered mode": it first starts with the
interpreter (as usual), then C1 (which here also collects the profile info),
then C2. Because the code is compiled twice, you need a code cache twice as
big.

From my own experience, tiered compilation is nice when you run something
interactive like an IDE (IntelliJ IDEA) and useless when you run a server
app. That said, I've never had to have a 250MB code cache.

------
bobbyi_settv
If the JVM has one bytecode compiler that it uses only during startup and
another that it uses the rest of the time, shouldn't it dump the code
generated by the first compiler from the cache at the point when it switches
to the second compiler?

~~~
jdmichal
The article's wording is a bit misleading here. The first-tier and second-tier
compilers are always available throughout the entire lifetime of the program.
The first-tier compiler performs very few optimizations, but compiles very
quickly. It is a simple and literal compiler -- most direct translation of
byte code to machine code. It's meant to be a quick win to get out of
interpreted mode, which is vastly slower. The second-tier compiler is slower,
but also takes advantage of the data collected during runtime profiling to
make optimizations. Basically, there's a trade-off between the time it takes
to run the second-tier compiler and the time it will save. By comparison, the
first-tier compiler is almost always a win over interpreted code, even if the
code is only run a small number of times.

Now, sometimes optimizations can be "wrong", in the sense that something new
has happened that invalidates an assumption used during second-tier
compilation. Here's a real-world example: I have an interface and only one
loaded class that implements that interface. The second-tier compiler will use
this information to basically do this:

    
    
        void doStuff(MyInterface a) {
            if (a.getClass() != MyClass.class)
                deoptimize();
    
            // Do everything from here on assuming
            // that a is an instance of MyClass.
            // This includes inlining simple getter
            // methods to pure field accesses, etc.
        }
    

Now, when a second class that implements `MyInterface` is loaded and makes it
to that function, it will hit the `deoptimize` branch and go back to first-
tier or even interpreted mode. Eventually the function will be recompiled by
both tiers with the new assumption -- two implementing classes.

So, in the case of deoptimization, it can be a win to keep the first-tier
code to fall back to.
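A runnable sketch of that scenario (all class and method names here are made
up for illustration; run with -XX:+PrintCompilation to watch the compilation
and deoptimization events):

```java
interface MyInterface { int value(); }
class MyClass implements MyInterface { public int value() { return 1; } }
class OtherClass implements MyInterface { public int value() { return 2; } }

public class DeoptDemo {
    // Hot call site: starts out monomorphic (only MyClass ever seen), so
    // the JIT can devirtualize and inline MyClass.value() here.
    static long doStuff(MyInterface a) { return a.value(); }

    public static void main(String[] args) {
        MyInterface first = new MyClass();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {
            sum += doStuff(first);        // warm up with a single receiver type
        }
        sum += doStuff(new OtherClass()); // second implementor reaches the call
                                          // site, invalidating the monomorphic
                                          // assumption and forcing a deopt
        System.out.println(sum);          // prints 1000002
    }
}
```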

~~~
__-X-__
Yes, that's the idea! But your example is wrong because C1 never compiles a
branch that was not executed before.

------
manishsharan
Yikes! Can somebody with deep AWS knowledge comment on how this impacts AWS
Lambda (Java) and AWS Elastic Beanstalk applications? What configuration is
recommended?

~~~
billsmithaustin
To be clear, the Java 8 default is much higher than Java 7's. If you stick
with the default, you will probably be fine.

