
Terabyte-sized Java Apps Now Possible - vkalladath
http://www.pcworld.com/businesscenter/article/201629/terabytesized_java_apps_now_possible.html
======
Groxx
Key point: terabyte-sized _memory pools_. Which is quite awesome.

I'd initially thought it referred to terabyte-sized _executables_. Followed by
an "oh great. Now someone's going to _make_ one, and some government's going
to _want_ one."

~~~
zandorg
Don't worry, Adobe's slowly getting there.

------
ihodes
Sometimes I feel as though my little, simple apps' source will end up being a
terabyte after packaging all their dependencies. It's a little ridiculous.

Regardless, this seems like it has some pretty powerful implications for big-
data processing. The potential of integrating this with Clojure somehow, and
parallelizing the computation across those 10 (at least) servers with 1TB of
memory each is pretty astonishing to think about. (Though you don't need Clojure to
make it parallel, yes.)

<http://www.terracotta.org/ehcache-2.2?src=/index.html> for the product
itself.
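
For a feel of the API, here's a minimal sketch against the Ehcache 2.x core
classes (CacheManager / Cache / Element). The cache name and sizes are made
up, and the terabyte-scale off-heap storage the article is about would be
configured on top of this (BigMemory / ehcache.xml), which isn't shown here:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class EhcacheSketch {
        public static void main(String[] args) {
            // Singleton manager; picks up ehcache.xml from the classpath if present.
            CacheManager manager = CacheManager.create();

            // Classic constructor: name, max in-heap elements, overflowToDisk,
            // eternal, TTL seconds, TTI seconds. Values here are arbitrary.
            Cache users = new Cache("users", 10000, false, false, 3600, 1800);
            manager.addCache(users);

            users.put(new Element("user:42", "Ada Lovelace"));

            Element hit = users.get("user:42");
            if (hit != null) {
                System.out.println(hit.getObjectValue());
            }

            manager.shutdown();
        }
    }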

------
strlen
_An organization could put their entire database into memory, which would
reduce the latency of the application by "a couple of orders of magnitude," he
said._

That works well until the power goes out (and it does) or the OS (or the JVM)
crashes. Keeping the hot portion of the data cached in memory (and maintaining
a smarter cache vs. simple LRU heuristics) _without_ sacrificing durability is
still a must for data you care about.
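
(For reference, the "simple LRU heuristics" baseline is roughly what Java's
LinkedHashMap gives you in access order; here's a minimal, non-thread-safe
sketch, with the capacity picked arbitrarily. Everything it holds is gone on a
crash, which is exactly the durability gap above.)

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Minimal in-process LRU cache: evicts the least recently accessed entry
    // once capacity is exceeded. No durability, no concurrency control.
    class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true gives LRU ordering
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }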

You can checkpoint your data to disk and assume you'll never have more data
than fits in memory, but that starts to become very expensive when you factor
in obsolete versions, replication (to make your system immune to machine
failures), and logs for recovery.

Ultimately there's a lot to be said about the redundancy of putting a cache in
front of a database. The right thing to do, however, is to build storage
systems (that may or may not resemble conventional databases) that integrate
caching. I highly suggest reading about LSM trees as used by BigTable (a way
to reduce write latency without significantly sacrificing durability) as well
as the BigTable paper (for the "keep the hot set in memory, maintain disk
persistence" model): Ehcache is a useful product, but it's simplistic to say
it can replace databases and file systems.
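
To make the LSM idea concrete: writes go into an append-only log (synced for
durability) and into an in-memory sorted "memtable", which is flushed to an
immutable sorted file on disk once it grows past a threshold. Here's a
stripped-down sketch of just that write path (no compaction, no reads back
from the flushed files; file names are made up):

    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.RandomAccessFile;
    import java.util.Map;
    import java.util.TreeMap;

    // Write path of an LSM tree, heavily simplified: append to the write-ahead
    // log, insert into a sorted memtable, flush to a sorted file ("SSTable")
    // when the memtable gets big enough.
    class TinyLsm {
        private final RandomAccessFile wal;
        private final TreeMap<String, String> memtable = new TreeMap<String, String>();
        private final int flushThreshold;
        private int flushed = 0;

        TinyLsm(File walFile, int flushThreshold) throws IOException {
            this.wal = new RandomAccessFile(walFile, "rwd"); // "rwd" syncs content on write
            this.flushThreshold = flushThreshold;
        }

        void put(String key, String value) throws IOException {
            // 1. Durable: append to the log before acknowledging the write.
            wal.writeUTF(key);
            wal.writeUTF(value);
            // 2. Fast: insert into the sorted in-memory table.
            memtable.put(key, value);
            if (memtable.size() >= flushThreshold) {
                flush();
            }
        }

        private void flush() throws IOException {
            // Dump the sorted memtable to an immutable on-disk file ("SSTable").
            BufferedWriter out = new BufferedWriter(
                new FileWriter("sstable-" + (flushed++) + ".txt"));
            for (Map.Entry<String, String> e : memtable.entrySet()) {
                out.write(e.getKey() + "\t" + e.getValue());
                out.newLine();
            }
            out.close();
            memtable.clear(); // the log could be truncated here as well
        }
    }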

------
hga
What Azul is doing gets you about three-quarters of the way there, with true
SMP. They've got their Vega custom hardware, a new software-only x86-64
version named Zing
(<http://www.azulsystems.com/products/zing>) and they're pushing an open
source version of the foundation (or more) of Zing through the Managed Runtime
Initiative (<http://news.ycombinator.com/item?id=1491653> and
<http://www.managedruntime.org/>).

And they're listening to people and continuing to work on the latter; e.g., 3
days ago they updated the Linux source code releases: a complete SRPM,
particularly for Fedora Core 12, and a kernel patch against the newer 2.6.34,
suitable for auditing and applying, containing the memory management half,
with the remaining scheduling part to follow:
[http://lists.managedruntime.org/pipermail/dev/2010-July/0000...](http://lists.managedruntime.org/pipermail/dev/2010-July/000004.html).

------
dnsworks
I'm guessing 10 servers for redundancy? A server with 192GB of memory can be
had for about $15k nowadays.

------
gojomo
And that Terabyte-sized app? "Hello, World!"

