

3 billion items in Java Map with 16 GB RAM - qwerta
http://kotek.net/blog/3G_map

======
_delirium
The MapDB mentioned in this article looks pretty interesting:
<https://github.com/jankotek/MapDB>. Anyone have experience with it?

~~~
spartango
I've been using the previous version of the library (JDBM3) for some time now,
although I haven't tried MapDB just yet. They have very similar APIs, however.

I've found JDBM3 a pleasure to use; it's fast and stable, with an excellent
API. It's really quite easy to start using (it exposes Java collection
interfaces), but you can configure it to do some powerful things under the
hood. I'm not using the library in a really high-performance setting (just
need a persistent key-value store), so I can't quite comment on the extreme
scaling qualities.

I'm excited to switch to MapDB as it matures.

------
clumsysmurf
From Github:

"What you should know: MapDB relies on mapped memory heavily. NIO
implementation in JDK6 seems to be failing randomly under heavy load. MapDB
works best with JDK7"

The original JDBM2 apparently worked fine under Android; I wonder how the new
implementation (JDBM4, renamed MapDB) fares.

I do worry about Java as implemented on Android diverging from the official
stuff in general. For example, Android does not have NIO.2, and Harmony NIO
bugs would be different than OpenJDK NIO bugs.

------
thrownaway2424
Can someone please explain how this is possible, when all the java collections
have "int size()" and the max value of a java int is 2,147,483,647?

~~~
_delirium
Collections are allowed to grow larger than size() can handle. The
documentation for size() says:

> If this collection contains more than Integer.MAX_VALUE elements, returns
> Integer.MAX_VALUE.
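
A minimal sketch of what that contract looks like in practice. `HugeRange` and
`longSize()` are hypothetical names made up for illustration, not part of
MapDB or the JDK; the collection just pretends to hold the numbers
0..count-1 so that it needs no real storage:

```java
import java.util.AbstractCollection;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical collection that can hold more elements than an int can count. */
class HugeRange extends AbstractCollection<Long> {
    private final long count; // true element count, tracked as a long

    HugeRange(long count) {
        if (count < 0) throw new IllegalArgumentException("negative count");
        this.count = count;
    }

    /** The real element count; an extra method, not part of Collection. */
    long longSize() {
        return count;
    }

    @Override
    public int size() {
        // Saturate at Integer.MAX_VALUE instead of overflowing,
        // as the Collection.size() documentation describes.
        return (int) Math.min(count, Integer.MAX_VALUE);
    }

    @Override
    public Iterator<Long> iterator() {
        return new Iterator<Long>() {
            private long next = 0;

            @Override
            public boolean hasNext() {
                return next < count;
            }

            @Override
            public Long next() {
                if (next >= count) throw new NoSuchElementException();
                return next++;
            }
        };
    }
}
```

With this, `new HugeRange(3_000_000_000L).size()` returns Integer.MAX_VALUE,
while `longSize()` still reports the full 3 billion.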

~~~
thrownaway2424
I see, thanks for pointing it out. I always considered it a fundamental
limitation of collections, because all or almost all of them are array-backed,
and array indices in java are int, therefore the collections are generally
limited to 2^31 items.

~~~
omaranto
Does the term "array-backed" include the possibility of using an array of
arrays? That would get you up to about 2^62...
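
A rough sketch of the array-of-arrays idea, assuming a long index is split
into a chunk number (high bits) and an offset within the chunk (low bits).
`ChunkedLongArray` and the 2^20 chunk size are arbitrary choices made for
illustration; this is not how MapDB or any JDK collection is implemented:

```java
/** Hypothetical long-indexed array of longs, stored as fixed-size chunks. */
class ChunkedLongArray {
    private static final int CHUNK_BITS = 20;               // 2^20 elements per chunk
    private static final int CHUNK_SIZE = 1 << CHUNK_BITS;
    private static final int CHUNK_MASK = CHUNK_SIZE - 1;

    private final long[][] chunks; // outer array of chunks, each an ordinary int-indexed array
    private final long length;

    ChunkedLongArray(long length) {
        if (length < 0) throw new IllegalArgumentException("negative length");
        this.length = length;
        // Round up to a whole number of chunks; fails loudly if that exceeds an int.
        int chunkCount = Math.toIntExact((length + CHUNK_SIZE - 1) >>> CHUNK_BITS);
        this.chunks = new long[chunkCount][]; // chunks are allocated lazily on first write
    }

    long length() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        long[] chunk = chunks[(int) (index >>> CHUNK_BITS)];
        // An unallocated chunk reads as zero, like a fresh long[].
        return chunk == null ? 0L : chunk[(int) (index & CHUNK_MASK)];
    }

    void set(long index, long value) {
        checkIndex(index);
        int c = (int) (index >>> CHUNK_BITS);
        if (chunks[c] == null) {
            chunks[c] = new long[CHUNK_SIZE];
        }
        chunks[c][(int) (index & CHUNK_MASK)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }
}
```

Both levels are still int-indexed, so the ceiling is roughly 2^31 chunks
times 2^31 elements per chunk; in practice available memory runs out long
before that.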

------
dahlia1
"Near the end insertion rate slowed down thanks to excesive GC activity"

Can someone explain why this happens?

~~~
kt9
The Java GC uses a mark and sweep algorithm that has to stop the world to
collect the garbage. As the number of items on the heap goes up, there are a
higher number of valid pointers (less garbage) and the GC takes a long time to
find even a little bit of garbage to clean up. So each GC run takes longer and
there are more of them as the heap fills up. Given that the world is stopped
during each GC run things take longer and hence the insertion rate is slower.

~~~
andrewvc
What? In Java there are multiple GCs available: ConcMarkSweep, G1GC,
ParallelGC, and I believe a couple more. All you have to do is set the right
CLI flag.

From the concurrent mark and sweep docs: "The concurrent mark sweep collector,
also known as the concurrent collector or CMS, is targeted at applications
that are sensitive to garbage collection pauses. It performs most garbage
collection activity concurrently, i.e., while the application threads are
running, to keep garbage collection-induced pauses short. The key performance
enhancements made to the CMS collector in JDK 6 are outlined below. See the
documents referenced below for more detailed information on these changes, the
CMS collector, and garbage collection in HotSpot."

It does have to stop the world _sometimes_ but those times should be quite
rare!

~~~
mbell
If the CMS collector can't 'keep up', meaning it doesn't think it can avoid
running out of heap space based on its duty cycle and the current memory
state, it will do a full on complete, stop the world GC. Based on the article
I'm guessing this is what is kicking in.

~~~
andrewvc
They never mentioned using CMS, and CMS isn't the default GC, so I assume they
weren't using it.

CMS actually isn't the highest throughput GC, it's there for low-pause systems
where there are extra cores available to perform GC in the background.

~~~
mbell
My point was that none of the GCs completely avoid full on stop everything,
sweep and compact all the things GC cycles. They will all do this if they are
in a position where they think the heap will be exhausted (specifically when a
concurrent mode failure occurs). There really isn't anything else a GC can do
in this situation other than throw an OOM exception. Thus any time you're
pushing the heap to its limits, regardless of what GC you use, you will see
complete GC sweeps of the old generation.

If you want stricter failure requirements use GCTimeLimit and GCHeapFreeLimit.
By default the JVM will throw an OOM error if it spends 98% of execution time
in the GC and frees only 2% of the memory; the GCTimeLimit and GCHeapFreeLimit
switches allow these values to be changed. Also only the 'stop the world'
portions of a collection apply to the execution time limit, concurrent phases
do not. So if you managed to keep the collector in concurrent mode (by
cranking up the duty cycle for example) then only the 2 mini-pauses will count
even if the concurrent portion were now pegging a CPU core at 100%.
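
One way to get a rough feel for how much of a run goes to GC is to read the
standard `java.lang.management` beans before and after the work. This is only
a sketch: `GcOverhead` and `gcFraction` are hypothetical names, the reported
times are approximate, what each collector counts towards
`getCollectionTime()` varies, and this is not the calculation the JVM itself
uses for the overhead limit:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that estimates the share of wall-clock time spent in GC. */
class GcOverhead {

    /** Sum of accumulated collection time (ms) over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 means the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Approximate fraction of elapsed time attributed to GC while the task ran. */
    static double gcFraction(Runnable task) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        task.run();
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        long gcMillis = totalGcMillis() - gcBefore;
        return elapsedMillis == 0 ? 0.0 : (double) gcMillis / elapsedMillis;
    }
}
```

Wrapping a batch of inserts in `GcOverhead.gcFraction(...)` and logging the
result per batch should show the fraction climbing as the heap fills up, if
GC really is what slows the insertion rate.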

------
greenyoda
Does anyone know which license MapDB is distributed under (GPL, BSD, etc.)? I
couldn't find this information on the MapDB site.

~~~
biobot
Apache Software License, Version 2.0

[https://oss.sonatype.org/content/repositories/snapshots/org/...](https://oss.sonatype.org/content/repositories/snapshots/org/mapdb/MapDB/0.9-SNAPSHOT/MapDB-0.9-20121108.082655-3.pom)

------
plant42
Interesting, though I'm curious about what kind of results you get when
something other than an empty String object is used?

~~~
qwerta
It uses more memory, and the resulting number of entries will be smaller. It
is a database engine; physical laws still apply.

~~~
plant42
I was more interested in knowing whether you have any real world examples of
MapDB?

~~~
qwerta
MapDB is still too fresh, so there is no documentation or examples yet. But
some people are already using it, mostly for parsing large text files.

Previous versions were called JDBM, which has been around since 2001, so it
has a solid user base. The problem is that nobody tells me until there is a
bug, which does not happen that often :-)

~~~
spartango
I'll say for JDBM3, the bugs must be few and far between; we haven't found any
yet. Many thanks for a rock-solid library!

------
lhaussknecht
Does anyone know of a .net implementation?

