

Vanilla Java: Using a memory mapped file for a huge matrix - pkl
http://vanillajava.blogspot.com/2011/12/using-memory-mapped-file-for-huge.html

======
chancho
I just about spat my coffee out when I saw that nio bytebuffers used 32 bit
ints for everything. (I'm not normally a java guy.) I thought "oh hey a direct
byte buffer will be a great way to keep all this data from blowing up the heap
AW WTF!!?? ints?!?"

Does anyone know the rationale for this? If they had used 64 bit long values,
like the underlying OS calls, his whole matrix could have been mapped into a
single buffer, making all this list-of-mappings stuff unnecessary. That extra
level of indirection normally wouldn't matter much but in this case he's
paying the cost 1e12 times over.
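
For reference, the list-of-mappings workaround looks roughly like the sketch below. This is not the article's code: `ChunkedDoubles` and `chunkBytes` are made-up names, and it assumes a flat file of doubles addressed by a `long` index. Each access pays for one division and one remainder to pick the right int-indexed buffer, which is the extra indirection in question.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper (not from the article): a file of doubles addressed by
// a long index, backed by several int-indexed mappings.
final class ChunkedDoubles {
    private final MappedByteBuffer[] chunks;
    private final long doublesPerChunk;

    // chunkBytes is an assumed tuning parameter: a positive multiple of 8.
    ChunkedDoubles(Path file, long count, int chunkBytes) throws IOException {
        if (count <= 0 || chunkBytes <= 0 || chunkBytes % Double.BYTES != 0) {
            throw new IllegalArgumentException("need count > 0 and chunkBytes a positive multiple of 8");
        }
        long totalBytes = count * Double.BYTES;
        this.doublesPerChunk = chunkBytes / Double.BYTES;
        int n = (int) ((totalBytes + chunkBytes - 1) / chunkBytes);
        this.chunks = new MappedByteBuffer[n];
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            for (int i = 0; i < n; i++) {
                long offset = (long) i * chunkBytes;
                long size = Math.min(chunkBytes, totalBytes - offset);
                // Mapping READ_WRITE past the end of the file extends the file.
                chunks[i] = ch.map(FileChannel.MapMode.READ_WRITE, offset, size);
            }
        } // the mappings stay valid after the channel is closed
    }

    double get(long index) {
        int chunk = (int) (index / doublesPerChunk);
        int byteOffset = (int) (index % doublesPerChunk) * Double.BYTES;
        return chunks[chunk].getDouble(byteOffset);
    }

    void set(long index, double value) {
        int chunk = (int) (index / doublesPerChunk);
        int byteOffset = (int) (index % doublesPerChunk) * Double.BYTES;
        chunks[chunk].putDouble(byteOffset, value);
    }
}
```

In real use chunkBytes would be something close to 1 GB; a power-of-two chunk size would let the division and remainder become a shift and a mask.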

~~~
beagle3
The 32-bit ints can be solved even at the user-level library. But it's much
worse than that.

Even if you only need to access 2GB (or you had fixed the Java memory mapping
code) you still have a .getDouble() or .putDouble() call for every access; and
that's actually a virtual call (and as far as I can tell, even though I only
ever used one kind of memory channel, the JVM wouldn't inline it -- although I
can't tell for sure, because the JVM also sucks at introspection).

I had real computational code in C that needed to be translated to Java.

First attempt (no memory mapping, converting C structs to Java objects) failed
miserably because my structs were 32 bytes each, and the object overhead was
24 or 32 (don't remember), which took me beyond physical memory (using virtual
memory caused a slowdown of ~1000x).

Second attempt: I switched to memory mapped arrays -- much better, only ~15 times
slower. But I also had to write my own sort, because Arrays.sort() or whatever
it was called was allocating 48 bytes for each 4 byte int to sort (wtf?),
blowing memory usage up again.

That's a cost people using Hadoop pay all the time, which makes its popularity
kind of surprising to me. You need 10 times less CPU if you do things right --
and at that scale, maintenance & hardware cost as much as salaries...
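
A hand-written sort of the kind described above might look something like the following. It is only a sketch, not the code from that project: `IntBufferSort` is a made-up name, and heapsort is chosen here simply because it works in place with no extra allocation. It uses absolute get/put on an IntBuffer, so it would also work on the view returned by asIntBuffer() on a mapped or direct ByteBuffer.

```java
import java.nio.IntBuffer;

// Hypothetical helper: in-place ascending heapsort over elements 0..limit-1
// of an IntBuffer, using only absolute get/put (position is left untouched).
final class IntBufferSort {
    static void sort(IntBuffer buf) {
        int n = buf.limit();
        // Build a max-heap, starting from the last parent node.
        for (int i = n / 2 - 1; i >= 0; i--) {
            siftDown(buf, i, n);
        }
        // Repeatedly move the current maximum to the end of the shrinking heap.
        for (int end = n - 1; end > 0; end--) {
            swap(buf, 0, end);
            siftDown(buf, 0, end);
        }
    }

    private static void siftDown(IntBuffer buf, int root, int size) {
        while (true) {
            int child = 2 * root + 1;
            if (child >= size) return;
            // Pick the larger of the two children.
            if (child + 1 < size && buf.get(child + 1) > buf.get(child)) child++;
            if (buf.get(root) >= buf.get(child)) return;
            swap(buf, root, child);
            root = child;
        }
    }

    private static void swap(IntBuffer buf, int i, int j) {
        int tmp = buf.get(i);
        buf.put(i, buf.get(j));
        buf.put(j, tmp);
    }
}
```

Heapsort jumps around the buffer a lot, so on data larger than RAM a quicksort or merge-based variant would probably page less; the point is only that nothing gets boxed or copied.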

~~~
Scaevolus
Arrays.sort() creates a full copy of the input data before sorting.

~~~
SeanLuke
I see nothing in the Java6 Arrays.java source code which would support this
claim.

~~~
Scaevolus
Oops, I got Arrays.sort confused with Collections.sort

------
Scaevolus
Java will never free the mappings created for memory mapped files. If you use
them in long-running programs for writing many different files, you'll end up
with a process taking dozens of gigabytes of virtual memory.

------
iqster
My Java must be really rusty ... I just learned about Direct Buffers a few
days ago (they seem to have been around since Java 1.4.x though). Anyone know
how commonly used these are? I can't imagine vanilla IT apps making use of
them.

~~~
gresrun
The most enterprisy usage I can think of is that DirectBuffers are used by
Ehcache/Hibernate's BigMemory product to cache off-heap.

DirectBuffers are also used in I/O libraries like Netty.
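
For anyone who hasn't used one, a minimal sketch of a direct buffer holding doubles off-heap is below. `DirectBufferDemo` and `filled` are made-up names for illustration; the only real API involved is java.nio.ByteBuffer.

```java
import java.nio.ByteBuffer;

// Hypothetical demo: store doubles in memory allocated outside the Java heap.
final class DirectBufferDemo {
    // Returns a direct buffer where slot i holds the value i * 0.5.
    static ByteBuffer filled(int doubles) {
        ByteBuffer buf = ByteBuffer.allocateDirect(doubles * Double.BYTES);
        for (int i = 0; i < doubles; i++) {
            // Absolute put: the index is a byte offset, not an element index.
            buf.putDouble(i * Double.BYTES, i * 0.5);
        }
        return buf;
    }
}
```

The data itself lives outside the garbage-collected heap; only the small ByteBuffer object does not.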

