
From Java code to Java heap (2012) - maastaar
http://www.ibm.com/developerworks/java/library/j-codetoheap/index.html
======
dr_rust
Can someone add 2012 to the title? Most of the OpenJDK collections have been
rewritten during the Java 8 timeframe, so the values are out of date :(

~~~
geodel
I think the object layout has not changed. I just ran the jol tool on HashMap
and the size looks similar to what is mentioned in the article.

~~~
jcdavis
String is one that has changed pretty significantly - the count and offset
fields have been removed; the only fields are hash and value now.

This means String.substring is no longer an O(1) method, but it saves 8 bytes
per string, which is huge, and it also avoids some of the weird corner-case
GC/memory issues that substring/StringBuilder caused by hanging on to the
parent's reference.
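
For illustration, a small sketch (the class and values here are invented) of what removing offset/count means in practice: since 7u6, substring copies the characters it needs, so the old defensive-copy idiom is no longer required to let a huge parent string be collected.

```java
public class SubstringDemo {
    public static void main(String[] args) {
        // A large source string, e.g. a whole file slurped into memory.
        String huge = new String(new char[1_000_000]).replace('\0', 'x');

        // Since Java 7u6 this copies the 10 chars into a fresh array,
        // so `small` no longer pins the 1M-char backing array.
        String small = huge.substring(0, 10);

        // Pre-7u6, substring shared the parent's char[] (via the removed
        // offset/count fields), and the common workaround was an explicit
        // copy to release the parent:
        String defensiveCopy = new String(huge.substring(0, 10));

        System.out.println(small.equals(defensiveCopy)); // prints "true"
    }
}
```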

~~~
geodel
Right. Java 9 will move from char[] to byte[] for String.value which will lead
to further savings.

------
dmichulke
Most useful info:

If you run the 64-bit JRE and have memory problems, use the flags

-Xcompressedrefs (on IBM J9)

-XX:+UseCompressedOops (on HotSpot)

for a decent 35-45% reduction in memory use.

The rest is well-known stuff (use primitives, use arrays with fixed size, ...)

Still, the numbers are quite interesting (and the overhead quite scary)
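
The per-object arithmetic behind those scary numbers can be sketched in a few lines. This is a rough estimator, assuming 64-bit HotSpot with compressed oops (12-byte header, 4-byte references, 8-byte alignment); exact figures vary by JVM:

```java
// Back-of-the-envelope shallow-size estimator for the layout rules the
// article describes. Assumes: 12-byte header, 4-byte compressed refs,
// fields padded up to an 8-byte boundary.
public class ShallowSize {
    static long estimate(int refFields, int intFields, int longFields) {
        long bytes = 12                  // object header
                + 4L * refFields         // compressed references
                + 4L * intFields
                + 8L * longFields;
        return (bytes + 7) / 8 * 8;      // round up to 8-byte alignment
    }

    public static void main(String[] args) {
        // A bare java.lang.Object: header only, padded to 16 bytes.
        System.out.println(estimate(0, 0, 0));  // prints "16"
        // An Integer wrapper: header + one int = 16 bytes, i.e. 4x the
        // 4 bytes the primitive itself needs.
        System.out.println(estimate(0, 1, 0));  // prints "16"
    }
}
```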

~~~
the8472
On HotSpot, ergonomics automatically turns those options on as long as your
max heap is < 32GiB.

    
    
        $ java -Xmx31G -XX:+PrintFlagsFinal 2>&1 | grep UseCompressed
             bool UseCompressedClassPointers               := true                                {lp64_product}
             bool UseCompressedOops                        := true                                {lp64_product}

~~~
nalllar
They also can't be enabled with >32GiB heaps.

~~~
bogomipz
I think it depends on the JVM. You can use compressed refs on JRockit with >
32G heaps. It's a bit of "bit twiddling." This article explains it pretty well:

[https://blogs.oracle.com/jrockit/entry/understanding_compres...](https://blogs.oracle.com/jrockit/entry/understanding_compressed_refer)

------
needusername
It has to be noted that on HotSpot the object header is only two words, not
three words like on J9. Basically, flags and locks fit into one word on
HotSpot, whereas they use two words on J9.

~~~
jcdavis
Only on 32-bit, IIRC; headers are 12 bytes on 64-bit with CompressedOops (the
default for heaps < 32GB) and 16 bytes without. The bit layout of the markOop
is here:

[http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee275...](http://hg.openjdk.java.net/jdk8/jdk8/hotspot/file/87ee5ee27509/src/share/vm/oops/markOop.hpp)

~~~
needusername
You are correct.

------
PDoyle
If you're interested in memory layouts in Java, I wrote a blog post recently
discussing a way to make super-tight data structures in Java:

[https://engineering.vena.io/2016/05/09/transpose-tree/](https://engineering.vena.io/2016/05/09/transpose-tree/)

------
whack
As someone who used to be a hardware engineer, I found Figure 1 in the first
section surprising. All modern OSes run processes in independent virtual
address spaces to ensure that processes don't collide with one another, or
with the OS itself. But if Figure 1 is to be believed, the kernel shares the
same address space as the process. Is this a mistake on the part of the
writer?

[1]
[http://www.cs.utexas.edu/users/witchel/372/lectures/15.Virtu...](http://www.cs.utexas.edu/users/witchel/372/lectures/15.VirtualMemory.pdf)

[2]
[https://en.wikipedia.org/wiki/Virtual_address_space](https://en.wikipedia.org/wiki/Virtual_address_space)

~~~
ww520
It's pretty common to map the kernel to the same fixed range in the virtual
address space of every process. Typically the kernel portion resides in the
higher range of the address space. With the 32-bit 2/2 split setup, 0GB-2GB is
reserved for user mode and 2GB-4GB for the kernel; with a 3/1 split, 0GB-3GB
is for user and 3GB-4GB for the kernel.

This makes it easy to work on memory shared between user-mode and kernel-mode
code, since it's all one address space. Passing a buffer from user mode to
kernel mode is just a matter of handing down the virtual memory pointer; there
is no need to copy. The kernel code accessing it simply reads the lower range
of the virtual address space.

Mapping the kernel code to the same fixed range in every process also makes it
easy to call kernel routines from user mode. A syscall just elevates privilege
to kernel mode and jumps to the kernel routine at the exact same address in
every process. You can think of the kernel as a special library that gets
"linked" into every process at the exact same location, along with all its
data.

Although user mode and kernel mode share the same address space, user-mode
code cannot access kernel-mode memory. Memory pages are protected with flags
like R/W (Read/Write) and U/S (User/Supervisor); an S-marked page cannot be
accessed by user-mode code, so protection between kernel and user mode is
still in place.

------
gmarx
In my work I rarely find this sort of thing relevant. It's much more important
to make sure you don't have object references lying around. I mostly do server
stuff now. Maybe this kind of optimization is still important for devices?

~~~
breischl
As with everything performance related, it depends. I do server-side work as
well, but I've seen cases where it matters.

E.g., somebody got overly "OO-happy" with a response object and managed to
take something that should've been one object with 8 fields and instead made a
graph of 8 objects with a total of 12 fields (4 of them repeated), wasting
~400 bytes each, IIRC. When you're creating one of those for each request and
handling 200k requests/sec, it adds up to a lot of memory. That means a lot of
time spent in the GC, which means a lot of GC pauses, not to mention the
effects on memory bandwidth, locality, and processor cache usage.

All else equal, using less memory is faster and more scalable than using more
memory. Java programmers seem to frequently forget that object references do
have a cost associated with them.

Tangentially, those complex object graphs also make (de-)serialization much
harder than it needs to be. Requests & responses should be as simple as
possible!
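
A hedged sketch of the pattern (the class names here are invented for illustration); the byte counts in the comments assume 64-bit HotSpot with compressed oops and 8-byte alignment, so treat them as estimates:

```java
// Nested, "OO-happy" design: three objects where one would do. Each object
// costs a header (~12 bytes with compressed oops) plus alignment padding,
// and each reference field costs ~4 bytes -- before any payload is stored.
class Price { long value; }

class Quote { Price bid = new Price(); Price ask = new Price(); }

class NestedResponse { Quote quote = new Quote(); }

// Flat equivalent: the same data behind a single header, no internal refs.
class FlatResponse { long bid; long ask; }

public class FootprintSketch {
    public static void main(String[] args) {
        // Rough per-instance estimates:
        //   NestedResponse: 16 (wrapper) + 24 (Quote) + 2*24 (Price) = 88 bytes
        //   FlatResponse:   12 (header) + 2*8 (longs) = 28 -> padded to 32 bytes
        NestedResponse n = new NestedResponse();
        n.quote.bid.value = 100;
        FlatResponse f = new FlatResponse();
        f.bid = 100;
        System.out.println(n.quote.bid.value == f.bid); // prints "true"
    }
}
```

Multiplied by 200k requests/sec, the difference between ~88 and ~32 bytes per response is the kind of allocation-rate gap the GC ends up paying for.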

~~~
gmarx
What did it save you server-wise, and how long did it take to identify and
make the change? Also, did you discover this as part of a performance-problem
investigation, or was it something you saw upfront and nipped in the bud
before it became a problem? I agree it sounds like poor design. I
instinctively simplify whatever info is going to be sent from the server.

~~~
x0x0
Generally when building servers on top of a JVM, it's a good idea to be able
to capture, e.g., 1M requests, replay them, and monitor your memory allocation
/ GC as part of quality tests. If you create nests of objects in Java, it
quickly gets expensive, both in terms of net memory and GC costs. One common
place this happens is parsing JSON/XML; use Thrift or protobufs instead.

If you are, e.g., building an ad server that is supposed to sustain some XX
thousand requests per second, you also want to monitor eden, survivor, and
tenured sizes, and promotions. Bump-pointer allocation and GC are very fast --
normally faster even than manual memory management -- as long as almost
nothing survives. The problem is that if some objects accidentally start
living past requests, things go boom fast.

------
stuff4ben
As I move into other languages like Go and Swift, I would love to see
breakdowns of how they store objects in memory, comparable to this one for
Java.

~~~
pjmlp
In any case better, thanks to value types support and layout control.

I wish they had focused on value types and reified generics for Java 9
instead of Jigsaw, rather than postponing them to Java 10+.

~~~
needusername
> I wish they had focused on value types and reified generics for Java 9
> instead of Jigsaw, rather than postponing them to Java 10+.

I hear you. Cost-benefit-wise, Jigsaw doesn't look attractive to me. To be
honest, I would have preferred Java 9 in the fall of 2016 without Jigsaw.

~~~
pjmlp
Java is getting lots of new competition from languages that have proper
support for value types, offer layout control, and have AOT compilation
available for free in their reference compilers.

On the desktop it has already lost to .NET, Qt, and HTML5.

On mobile it might reign on Android, but Android Java is not 100% Java, and I
don't see Google improving the language if Oracle stops doing it now that
they've had another setback.

They went silent on the future of Java EE, and IBM is now doing lots of Swift
and Go support while making J9 language-agnostic.

So I really don't care about Jigsaw.

~~~
Longhanks
Also, what advantages, other than existing code bases, does Java provide over
Kotlin, Go, Swift, or Rust? For almost any use case I can imagine a better
choice than Java.

Also, given the recent events (Google vs. Oracle), I think every new project
might want to triple-check whether they really want to use a product from
someone like Oracle.

~~~
pron
Kotlin is a JVM language, and doesn't really compete with Java (the two are
"friendly"). Rust is a low-level language which is much better suited than the
JVM to memory-restricted environments, but JVM languages (including Java) are
much easier (and therefore cheaper) to develop in, so they are a better choice
for server-side applications. Go and Java are indeed very similar, but I find
that Go is better for smaller programs and Java is better for larger ones (due
to amazing monitoring, polyglotism, dynamic code loading, etc.).

~~~
pjmlp
Yeah, Rust is not the real adversary.

The real adversaries are Objective-C, Go, Swift, .NET Native, C++14, OCaml,
Haskell and even node.js.

Monitoring tools similar to VisualVM will come.

Nowadays my Java code either targets Android or is maintenance of existing
enterprise code. Other than that, we aren't using it for greenfield projects.

~~~
pron
It seems to me that companies that choose non-JVM languages (with the
exception of .NET) for important server-side projects are those that have
always opted for the flavor-of-the-month. Most of those using
Go/Elixir/OCaml/Haskell today are those that used PHP, Ruby, Python or Node
yesterday, and are likely to pick something new tomorrow. All of these choices
have deep, fundamental and serious problems -- both technical and
environmental -- that are unlikely to ever be solved, but are all solved on
the JVM. Those problems are sometimes obscured by the novelty and promise of
the shiny alternative, and take a while to be felt among those who always opt
for the new thing, but they are invariably discovered later on, and by that
time something newer comes along, whose problems are yet unapparent. The
problems are there because unlike Java and the JVM (and Fortran, COBOL, C and
Rust), those platforms were not developed after careful consideration of the
industry's needs (Erlang/Elixir is a different story, and, indeed its
_fundamental_ problems are more environmental than technical).

In addition, in smaller/less-critical projects, Java was an accidental leader
in a short period of time where the competition disappeared _and_ Java itself
was novel, but it was never "supposed" to be there to begin with. VB and
Delphi were very popular choices early on. I'd estimate that even when Java
was the shiny new thing, VB, Delphi and other rapid-development languages had
a larger share of the market than Go, Node, Ruby, Python, Haskell and OCaml
combined today. And frankly, I don't see companies developing large-scale
applications en masse in C++14. Unless you have very specific needs, there's just
no reason to go there and pay a higher development cost, and if you do have
those needs, you're probably already there.

In large corporations (including trendy ones, like Google, Amazon, Netflix,
Twitter and many more, let alone more conservative ones) the JVM still reigns
supreme, with competition lagging far, far behind. I'm not saying it won't eat
into the JVM's market share -- although they're mostly cannibalizing one
another -- but I don't see any potential leader in the pack. It's possible
that there won't be one, with many sharing the cake.

So I think that (again, with the notable exception of .NET) the non-JVM market
is composed of novelty-seeking organizations and a share that wasn't Java's to
begin with, but accidentally was for a short while (the two segments
intersect).

~~~
fauigerzigerk
_> All of these choices have deep, fundamental and serious problems -- both
technical and environmental -- that are unlikely to ever be solved, but are
all solved on the JVM._

The problems that the JVM solves are solved by other platforms as well, and
they were in fact already "solved" by the time Java was invented. Java was the
prototypical flavor of the month language at one point in time.

Programming language history sometimes seems pretty random. If Java hadn't
hinted at the fantastic possibilities of the now completely defunct Applet
technology, Smalltalk VMs might have captured the server-side with
technologies very similar to what the JVM became many years later. They were
arguably on the verge of doing that.

It's ironic that Applets failed because of unsolved problems with the JVM that
remain largely unsolved to this day (such as insane memory usage and laggy
startup).

And Java failing on the client side has created another problem unsolved by
the JVM, which is code and skills sharing between client and server. That's
one of the main reasons why node.js exists.

And the one problem with the JVM that is probably unsolvable forever is the
influence of its authority-seeking users on software design. That's what makes
the JVM environment so utterly broken. Its culture of excessive complexity.

I wouldn't dare to predict which flavor-of-the-month will fade and which one
will dominate. It seems to be random and largely dependent on fantasies about
future platforms unrelated to technological merit.

We could have stuck with Lisp and Fortran and we wouldn't be any worse off
than we are today. There is very little progress in our industry when it comes
to programming environments.

~~~
pron
> The problems that the JVM solves are solved by other platforms as well, and
> they were in fact already "solved" by the time Java was invented. Java was
> the prototypical flavor of the month language at one point in time.

Perhaps, but they're not solved by any of the languages/platforms mentioned.

> Smalltalk VMs might have captured the server-side with technologies very
> similar to what the JVM became many years later. They were arguably on the
> verge of doing that.

Except that they were _the same_ VMs (HotSpot was a Smalltalk VM). Smalltalk
didn't lose; it just got repackaged as Java.

> And Java failing on the client side has created another problem unsolved by
> the JVM, which is code and skills sharing between client and server. That's
> one of the main reasons why node.js exists.

With that I agree, but JS has its own issues, and the other languages
mentioned don't have it any easier in that regard.

> And the one problem with the JVM that is probably unsolvable forever is the
> influence of its authority-seeking users on software design. That's what
> makes the JVM environment so utterly broken. Its culture of excessive
> complexity.

Except that the JVM is by far the leading platform not only in conservative
enterprises, but also among the thought-leaders and biggest technological
innovators. I don't know if the JVM environment is "broken" or not, but it
certainly isn't any more broken than any other platform for serious server-
side software. As I said, for less-serious/smaller applications, Java and the
JVM were not meant to be the first choices. We had VB and Delphi, then Python
and Ruby, and now Node and whatever. It's the same market share, and I don't
see evidence that the JVM is slipping.

~~~
fauigerzigerk
_> Perhaps, but they're not solved by any of the languages/platforms
mentioned._

Go and Elixir do support concurrency/parallelism mechanisms that are not
supported very well by the JVM. Go also solves the value-type issues that
cause the JVM's excessive memory usage. Code/skills sharing between client and
server isn't supported by the JVM either (for web apps, that is).

 _> Except that they were the same VMs (HotSpot was a Smalltalk VM)_

Yes, but HotSpot is from 1999, whereas Java came out in 1995, if I remember
correctly. The initial Java VM was a bit of a throwback.

 _> I don't know if the JVM environment is "broken" or not, but it certainly
isn't any more broken than any other platform for serious server-side
software._

How do you define "serious"?

~~~
pron
> Go and Elixir do support concurrency/parallelism mechanisms that are not
> supported very well by the JVM.

Nope. Tooting my own horn here, but the Quasar library on the JVM has both
models down to a T, performs on par with Go, and blows Erlang/Elixir out of
the water at its own game.

> Go also solves the value type issues that cause the JVMs excessive memory
> usage

True, but not only is that solvable, it is actually being solved on the JVM as
we speak, and will be released long before Go catches up. OTOH, Go has some
serious issues that are unlikely to ever be solved (its reliance on
source-code instrumentation and its lack of support for dynamic code
manipulation, which has been demonstrated to be of great value for many server
software systems).

> Code/skills sharing between client and server isn't supported by the JVM
> either (for web apps that is).

This isn't any worse than in any language that isn't JS. On the JVM you have
Kotlin, Clojure and Scala, which all also compile to JS, and those are just
the leading options.

> How do you define "serious"?

Critical software that is required to serve for a decade or more, and whose
development costs exceed, say, one man-decade. .NET is pretty much Java's only
serious competition there.

~~~
pjmlp
> True, but not only is that solvable, it is actually being solved on the JVM
> as we speak

I am looking forward to it, but with Brian Goetz speaking about Java 10+
(note the plus) and Oracle not having a programming-language culture, I am not
sure it will happen in the near future.

IBM also doesn't talk about Packed Objects any more, and I don't know if
anyone is really paying attention to Gil's efforts promoting object layout.

Even the Scala guys are now researching into Scala Native.

~~~
bitmapbrother
>IBM also doesn't talk any more about Packed Objects

Of course they do. You're just not looking in the right place.

[http://mail.openjdk.java.net/pipermail/panama-spec-experts/](http://mail.openjdk.java.net/pipermail/panama-spec-experts/)

~~~
pjmlp
I know that list.

One to two emails per month about something that might eventually come in Java
10 or later isn't talking about it.

------
brown9-2
It seems really silly to compare the "search/insert/delete performance" of
HashSet, HashMap, ArrayList, and LinkedList, since the fundamental purpose of
each class is different.

Not to mention that "search/insert/delete" are three separate operations with
sometimes three different performance characteristics.

For instance it is incorrect to state that the insert/delete performance for
LinkedList is O(n) - both are constant-time.

~~~
swsieber
I'm not sure how LinkedList insert/delete performance isn't O(n), where n is
the list length - care to shed some light?

Are you assuming you already have a reference to a node in the linked list?
That might be constant, but the standard is to remove by index, and that's
O(n).
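
To make the distinction concrete, a small example: `remove(int)` has to walk the nodes to reach the index, while an iterator that is already positioned at a node unlinks it in constant time:

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class LinkedListRemoval {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 6; i++) list.add(i);   // [0, 1, 2, 3, 4, 5]

        // remove(int index) must first traverse to the index: O(n) per
        // call, even though the unlink itself is O(1).
        list.remove(3);                            // walks 3 nodes, unlinks 3

        // Once positioned, Iterator.remove unlinks in O(1) -- so filtering
        // the whole list is O(n) total rather than O(n^2).
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            if (it.next() % 2 == 0) it.remove();
        }
        System.out.println(list);                  // prints "[1, 5]"
    }
}
```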

~~~
brown9-2
Ah, I foolishly assumed we were talking about removing or inserting only at
the head or the tail. All the more reason to be more detailed in a writeup
than simply "Performance: O(n)" :)

------
chii
very detailed and interesting read!

