
Byte Buffers and Non-Heap Memory - DanielRibeiro
http://www.kdgregory.com/index.php?page=java.byteBuffer
======
ShabbyDoo
Right now, I'm dealing with a problem where, from an external system, we
retrieve up to 1M rows of data -- about 30 columns each (short strings,
numbers, etc.). Because the external system can never guarantee that it will
provide the same answer to a query twice and because our users still expect to
be able to sift through a particular query result without changes to the
underlying data, we must store these query results "on our side."

Our app is Java. Even in "short pointer" (compressed reference) mode, 30M
references to the same string would eat up 4 bytes * 30M = 120 MB of memory!
And the garbage collector would be displeased because it would frequently have
to sift through old-generation regions. So, we are left with disk-based options (a traditional
RDBMS included in that mix) or some variation of structs in byte
arrays/buffers. [I should mention that we only need to provide access to these
query results for an hour or so.] While I have found some reasonable flyweight
pattern-based schemes for representing structured data in large byte arrays
(Javolution has a nice one), I have not found any good data structure
libraries which use byte arrays as backing stores. Because the data does not
change once queried and inserted into our storage scheme, it seems reasonable
to use some look-up schemes (hashes, etc.) which could be persisted in a
manner similar to the raw data itself. An additional level of abstraction
would be nice, but I'll settle for not writing my own hash implementations.
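
To make the "structs in byte buffers" idea concrete, here is a minimal sketch of fixed-width rows stored in one direct ByteBuffer and read back through accessor methods, so no per-row Java objects stay on the heap. The class name `RowStore` and the three-column layout (an int id, a double amount, a 16-byte name) are assumptions made up for illustration; this is not the Javolution API or anything from the linked article.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical layout per row: int id (4 bytes), double amount (8), UTF-8 name (16, zero-padded). */
final class RowStore {
    static final int ID_OFFSET = 0;
    static final int AMOUNT_OFFSET = 4;
    static final int NAME_OFFSET = 12;
    static final int NAME_BYTES = 16;
    static final int ROW_BYTES = NAME_OFFSET + NAME_BYTES;

    private final ByteBuffer buf;
    private final int capacity;
    private int rowCount;

    RowStore(int maxRows) {
        this.capacity = maxRows;
        // Direct buffer: the row data lives outside the garbage-collected heap.
        this.buf = ByteBuffer.allocateDirect(maxRows * ROW_BYTES);
    }

    /** Appends a row and returns its index. */
    int append(int id, double amount, String name) {
        if (rowCount == capacity) throw new IllegalStateException("store is full");
        byte[] encoded = name.getBytes(StandardCharsets.UTF_8);
        if (encoded.length > NAME_BYTES) throw new IllegalArgumentException("name too long");
        int base = rowCount * ROW_BYTES;
        buf.putInt(base + ID_OFFSET, id);
        buf.putDouble(base + AMOUNT_OFFSET, amount);
        for (int i = 0; i < NAME_BYTES; i++) {
            buf.put(base + NAME_OFFSET + i, i < encoded.length ? encoded[i] : (byte) 0);
        }
        return rowCount++;
    }

    int size() { return rowCount; }

    int id(int row) { return buf.getInt(base(row) + ID_OFFSET); }

    double amount(int row) { return buf.getDouble(base(row) + AMOUNT_OFFSET); }

    /** Decodes the name on demand; the String is garbage as soon as the caller drops it. */
    String name(int row) {
        int start = base(row) + NAME_OFFSET;
        byte[] tmp = new byte[NAME_BYTES];
        int len = 0;
        while (len < NAME_BYTES) {
            byte b = buf.get(start + len);
            if (b == 0) break; // zero padding marks the end of the name
            tmp[len++] = b;
        }
        return new String(tmp, 0, len, StandardCharsets.UTF_8);
    }

    private int base(int row) {
        if (row < 0 || row >= rowCount) throw new IndexOutOfBoundsException("row " + row);
        return row * ROW_BYTES;
    }
}
```

With a layout like this, a million rows cost one buffer object rather than millions of small ones; the trade-off is that every string read allocates a short-lived String.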

The route I'm taking now is to use the H2 embeddable, all-Java DB. It has
pluggable backing stores, so I can make it pretend that direct byte buffers
are disk-like. I don't really need write concurrency, transactionality, and
all the other stuff "real" RDBMS systems provide, but this seems like a good
trade-off compared to rolling my own in-memory sort-of DB. It may turn out to
be fast enough to just run H2 in a disk-based mode.

Does anyone have any thoughts on this?

~~~
dschoon
You might also look into Berkeley JE. It's an in-process KV store, which might
be a minor crimp for you, but it's more mature than god and transparently
handles paging to disk and shadowing. We used it as durable working memory for
a high volume stream-processing application, bundling rows together into
pages, compressing them, and then letting JE handle caching and GC. It does an
excellent job.
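
For anyone wondering what "bundling rows together into pages, compressing them" might look like, here is a rough sketch using only `java.util.zip` from the JDK. The class name `RowPages` and the string-per-row encoding are assumptions for illustration, not what we actually ran; the resulting `byte[]` is the sort of value you would then hand to a KV store under a page key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

/** Hypothetical helper: packs a batch of rows into one deflate-compressed page. */
final class RowPages {
    /** Writes a row count followed by each row, all through a deflate stream. */
    static byte[] pack(List<String> rows) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new DeflaterOutputStream(bytes))) {
            out.writeInt(rows.size());
            for (String row : rows) {
                out.writeUTF(row); // note: writeUTF caps each row at 65535 encoded bytes
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Reverses pack(): inflates the page and reads the rows back in order. */
    static List<String> unpack(byte[] page) {
        try (DataInputStream in = new DataInputStream(
                new InflaterInputStream(new ByteArrayInputStream(page)))) {
            int count = in.readInt();
            List<String> rows = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                rows.add(in.readUTF());
            }
            return rows;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Page size is the knob to tune: bigger pages compress better but cost more to inflate when you only want one row.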

[http://www.oracle.com/technetwork/database/berkeleydb/overvi...](http://www.oracle.com/technetwork/database/berkeleydb/overview/index-093405.html)

License for server usage is Sleepycat.
<http://en.wikipedia.org/wiki/Sleepycat_License>

------
ShabbyDoo
It seems that there are genres of problems for which much better performance
and/or memory efficiency may be achieved by avoiding Java's object model and
garbage collection. In my other comment in this thread, I mentioned my use
case for in-memory database-y stuff. What else? Terracotta sells its "Big
Memory" plug-in for EHCache, which is basically a ByteBuffer-based in-memory
cache designed to reduce the number of objects visible to the garbage
collector. Certainly manipulation of binary media formats qualifies (and the
nearly-abandoned Sun/Oracle-provided libraries for doing so use such
approaches, IIRC). What else?
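
As a toy illustration of the general idea (I have no knowledge of how Terracotta's product is actually implemented), a ByteBuffer-backed cache can be sketched as an append-only slab of direct memory plus a small on-heap index from key to location. The class name `OffHeapCache` and the packed offset/length encoding are assumptions of this sketch.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical append-only cache: values live in a direct buffer, only the index is on-heap. */
final class OffHeapCache {
    private final ByteBuffer slab;
    // key -> (offset << 32 | length), packed into a single long
    private final Map<String, Long> index = new HashMap<>();

    OffHeapCache(int capacityBytes) {
        this.slab = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Stores the value once; returns false if the key exists or the slab is out of room. */
    boolean put(String key, byte[] value) {
        if (index.containsKey(key) || value.length > slab.remaining()) {
            return false;
        }
        long offset = slab.position();
        slab.put(value); // advances the slab's write position
        index.put(key, (offset << 32) | value.length);
        return true;
    }

    /** Copies the value back onto the heap, or returns null if the key is unknown. */
    byte[] get(String key) {
        Long location = index.get(key);
        if (location == null) {
            return null;
        }
        int offset = (int) (location >>> 32);
        int length = (int) (location & 0xFFFFFFFFL);
        byte[] out = new byte[length];
        ByteBuffer view = slab.duplicate(); // separate position, same memory
        view.position(offset);
        view.get(out);
        return out;
    }
}
```

The index entries are still ordinary heap objects, so this only pays off when values are large relative to keys; there is also no eviction or update support here, which fits data that never changes after it is loaded.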

------
mahmud
Actually, the rest of his articles are also of top-notch technical quality. Me
like!

<http://www.kdgregory.com/index.php?page=programming>

------
ShabbyDoo
Thanks. We've also considered SQLite. One Java wrapper impl (all are 3rd
party) makes it easy to pass a direct byte buffer in as a persistence store.
So, one could keep a map of database blobs and apply the DB to them as
necessary. If H2 is fast enough, I'll take the all-Java solution so that we
don't have to worry about native code issues between environments.

I'll have to think about how much I need actual SQL support. It sure does make
a couple of my use cases easy (data aggregation), but it may come at a
performance/memory footprint cost.

