
High performance libraries in Java - javinpaul
http://vanillajava.blogspot.com/2012/02/high-performance-libraries-in-java.html
======
SeanLuke
Colt has not been updated for eight years, yet the article refers to it in the
present tense. Why? Because the article's section on Colt is lifted straight
from Colt's website.

This appears to be blogspam.

------
kevinherron
I don't have a solid grasp on what exactly Java Chronicle is and does and what
the use cases might be. (<https://github.com/peter-lawrey/Java-Chronicle>)

Can anybody elaborate?

~~~
tlack
I'm definitely not a knowledgeable Java dude, but it seems to be just an
mmap()'d file wrapped in a class. I don't understand how this fits the
description the author provided. I also don't understand how he can call it an
"in memory database" in the first paragraph and then say it "can be much larger
than your physical memory size (only limited by the size of your disk)". Any
Java kids who have digested those couple of code files want to help me
understand?
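
If I'm reading it right, the core mechanism is Java NIO's memory-mapped files,
which would explain the apparent contradiction: the mapped bytes live in the OS
page cache, not the JVM heap, and the OS pages them in and out of physical
memory underneath you. A minimal sketch of that mechanism (file name and sizes
are made up):

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;

    public class MmapSketch {
        public static void main(String[] args) throws Exception {
            try (RandomAccessFile file = new RandomAccessFile("journal.dat", "rw");
                 FileChannel channel = file.getChannel()) {
                // Map 1 GiB of the file into the process address space.
                // The bytes live in the OS page cache, not the JVM heap,
                // so the data set can be far larger than -Xmx allows --
                // "in memory" access, but limited only by disk size.
                MappedByteBuffer buf =
                    channel.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30);

                buf.putLong(0, System.nanoTime()); // write a record
                System.out.println(buf.getLong(0)); // read it back
            }
        }
    }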

~~~
th0ma5
I haven't looked at the code, but I have dabbled in some performance coding.
If you can stream things in and out quickly, with operations that themselves
use as little memory as possible, then you can operate on very large files in
roughly linear time relative to the size of the file. So while the packet in
the stream might be in memory at the moment your code runs against it, the
claim seems to be that you can also stream files through this thing without
having to read the whole file into memory at once (which works great and fast
on very small files, but can be downright non-functional for large files).
Just a guess; a lot of the performance stuff I've been around is a hot-potato
situation, so maybe they're trying to describe that.

------
rywang
I use the Colt (Java) matrix libraries for applied math and graphics
applications, but Colt is still quite costly compared to optimized libraries
like Intel's MKL. A singular value decomposition takes about eight times as
long in Colt as in MKL.

~~~
tdj
I've used both Colt and Apache Commons Math in a machine-learning setting. It
was mostly the better handling of sparse matrix ops that made Apache a couple
of times faster than Colt (hash-based vs. sorted-list vectors). On dense
linear algebra they were pretty much the same. Intel MKL, Eigen, and uBLAS can
be better, but I haven't done the benchmarks to prove this either way.
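
For reference, the hash-based representation is what Commons Math calls
OpenMapRealVector. A tiny sketch (Commons Math 3 API; dimension and values
made up):

    import org.apache.commons.math3.linear.OpenMapRealVector;

    public class SparseSketch {
        public static void main(String[] args) {
            int dim = 1_000_000;
            // Hash-backed sparse vectors: storage and dot products cost
            // O(non-zeros), not O(dimension).
            OpenMapRealVector a = new OpenMapRealVector(dim);
            OpenMapRealVector b = new OpenMapRealVector(dim);
            a.setEntry(3, 1.5);
            a.setEntry(999_999, 2.0);
            b.setEntry(3, 4.0);
            System.out.println(a.dotProduct(b)); // 6.0
        }
    }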

~~~
rywang
While we're on the subject, I'm curious if you've played with matrix-toolkits-
java / netlib-java. I'm considering switching to it. I've had good experiences
with the Lawson-Hanson non-negative least squares that comes with netlib.

~~~
bedatadriven
The netlib-java project is very convenient because it allows you to use
optimized native netlib libs and transparently falls back to the f2j (JVM
bytecode) library when native libs are not available.

The Java version of netlib produced by f2j is unfortunately the unoptimized
"reference implementation", so it's not terribly performant for large
matrices.

------
elehack
I use the excellent fastutil for primitive collections - it has great
integration with java.util collections, and has a more standard design than
Trove in my opinion. Works great.
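
A quick taste for anyone who hasn't seen it (map contents made up):

    import it.unimi.dsi.fastutil.ints.Int2IntOpenHashMap;
    import java.util.Map;

    public class FastutilSketch {
        public static void main(String[] args) {
            // Primitive-keyed, primitive-valued map: no Integer boxing
            // on put/get, so far less GC pressure than a HashMap.
            Int2IntOpenHashMap counts = new Int2IntOpenHashMap();
            counts.put(42, 7);
            counts.addTo(42, 1); // fastutil convenience: increment in place
            System.out.println(counts.get(42)); // 8

            // The java.util integration: it's also a plain Map.
            Map<Integer, Integer> view = counts;
            System.out.println(view.containsKey(42)); // true
        }
    }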

------
pasbesoin
Another borked Blogger template. Cache URL as a "permalink", for those who
have Javascript disabled:

[http://webcache.googleusercontent.com/search?q=cache:http%3A...](http://webcache.googleusercontent.com/search?q=cache:http%3A%2F%2Fvanillajava.blogspot.com%2F2012%2F02%2Fhigh-performance-libraries-in-java.html)

------
chaostheory
I think akka (<http://akka.io>) is missing from this list.

It's a concurrency framework that allows you to:

1) abstract threads into actors and messages

2) distribute processing with remote actors residing on different machines.

IMO it's way easier to use than traditional Java concurrency. A minimal
example is below.
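
Something like this (a sketch against Akka 2.x's classic Java API; actor and
message names are made up):

    import akka.actor.ActorRef;
    import akka.actor.ActorSystem;
    import akka.actor.Props;
    import akka.actor.UntypedActor;

    public class AkkaSketch {
        // An actor processes one message at a time, so its state needs
        // no explicit locks and you never touch a Thread directly.
        public static class Greeter extends UntypedActor {
            @Override
            public void onReceive(Object message) {
                if (message instanceof String) {
                    System.out.println("Hello, " + message);
                } else {
                    unhandled(message);
                }
            }
        }

        public static void main(String[] args) {
            ActorSystem system = ActorSystem.create("demo");
            ActorRef greeter = system.actorOf(Props.create(Greeter.class), "greeter");
            greeter.tell("world", ActorRef.noSender()); // async fire-and-forget
            system.shutdown();
        }
    }

The same tell() call works against a remote actor on another machine; that's
the distribution story mentioned above.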

~~~
ww520
How does Akka compare to Hazelcast in terms of distributed processing?

~~~
chaostheory
I don't know much about Hazelcast, but it seems to just be a collection of
distributed versions of the constructs used for managing threads. That's the
key difference: Akka abstracts thread management for you with an actor model,
which IMO is much, much easier to use.

------
rkalla
The "serialization using ByteBuffers" should be clarified that it is
"serialization using _direct_ ByteBuffers" -- as in they exist in the native
OS memory space and not inside the JVM's heap.

Direct ByteBuffers are _excellent_ when you can make use of a long-lived,
fixed-size buffer that you are using to communicate with a native resource
(e.g. socket, file, etc.)

My own experience with using direct ByteBuffers is allocating read/write
buffers to a running Redis process that I use to write commends to the server
and read the results back. The difference in performance using a direct buffer
instead of raw byte[] (basically a standard ByteBuffer) were astounding.
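
To make that concrete, the shape of it was roughly this (host/port and buffer
sizes made up; a real impl needs a RESP reply parser and error handling):

    import java.net.InetSocketAddress;
    import java.nio.ByteBuffer;
    import java.nio.channels.SocketChannel;
    import java.nio.charset.StandardCharsets;

    public class DirectBufferRedisSketch {
        public static void main(String[] args) throws Exception {
            try (SocketChannel channel =
                     SocketChannel.open(new InetSocketAddress("localhost", 6379))) {
                // Long-lived, fixed-size direct buffers: the bytes sit in
                // native memory, so the kernel can move them to/from the
                // socket without an extra copy out of the JVM heap.
                ByteBuffer out = ByteBuffer.allocateDirect(64 * 1024);
                ByteBuffer in  = ByteBuffer.allocateDirect(64 * 1024);

                out.put("PING\r\n".getBytes(StandardCharsets.US_ASCII));
                out.flip();
                while (out.hasRemaining()) {
                    channel.write(out);
                }
                out.clear(); // reuse the same buffer for the next command

                channel.read(in); // single read for brevity
                in.flip();
                byte[] reply = new byte[in.remaining()];
                in.get(reply);
                System.out.print(new String(reply, StandardCharsets.US_ASCII)); // +PONG
            }
        }
    }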

I have seen people argue against the use of direct buffers pointing out that
at some point your calls and payload have to cross the JVM-native barrier and
using a direct ByteBuffer simply moves the point of entry/exit which won't
change the performance of the entire round-trip.

I can't argue with that, but I would point out that in my own work with Redis,
having the native process read and write data through a native OS buffer that
I can then pull into the JVM gave me at least an order of magnitude
improvement in speed over sticking with raw byte[] in and out over a socket.

ASIDE: I attribute my success here to the fact that I was able to queue up
out-bound commands as fast as possible inside the JVM, pushing them out into
the native buffer space which streamed them into Redis; reading back the
replies as quickly as possible in a separate thread. My understanding is that
by moving the "blood-brain-barrier" to this point, I am allowing Redis to
consume and produce as fast as possible as long as I keep the input buffer
full and output buffer relatively empty. In other words Redis wasn't being
blocked (for the most part) by waiting on me to push and pull data in and out
of my running JVM on every single read/write.

ADDENDUM: Just had a fun impl thought for anyone that read this and thought it
was interesting... a custom InputStream and OutputStream impl along the lines
of the JDK's Buffered streams, but the input and output streams are actually
backed by direct ByteBuffers.

The use-cases for the stream would need to be very specific and the underlying
approach clearly spelled out in the Javadoc, but it would provide a nice
bridge between standard JDK stream-based I/O and the NIO work without
burdening the caller with knowing about how to use the NIO APIs.
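
A rough sketch of the OutputStream half of what I mean (hypothetical class
name; fixed-size, no growth, not thread-safe):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.nio.ByteBuffer;

    // A java.io OutputStream whose backing store is a direct ByteBuffer,
    // bridging stream-based callers to NIO without exposing the NIO APIs.
    public class DirectByteBufferOutputStream extends OutputStream {
        private final ByteBuffer buffer;

        public DirectByteBufferOutputStream(int capacity) {
            // Off-heap allocation: bytes live outside the JVM heap.
            this.buffer = ByteBuffer.allocateDirect(capacity);
        }

        @Override
        public void write(int b) throws IOException {
            if (!buffer.hasRemaining()) {
                throw new IOException("buffer full (fixed-size sketch)");
            }
            buffer.put((byte) b);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            if (buffer.remaining() < len) {
                throw new IOException("buffer full (fixed-size sketch)");
            }
            buffer.put(b, off, len);
        }

        // Expose the written bytes, flipped for reading, e.g. to hand
        // straight to a SocketChannel.write() call.
        public ByteBuffer toReadableBuffer() {
            ByteBuffer dup = buffer.duplicate();
            dup.flip();
            return dup;
        }
    }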

For anyone interested, I'll likely add a first-pass impl of this to the
Universal Binary JSON Java libs[1] later today to complement the re-usable
ByteArray stream impls that are there already.

[1] <https://github.com/thebuzzmedia/universal-binary-json-java>

~~~
Scaevolus
Be careful. The JVM will never unmap the underlying regions it uses to support
direct ByteBuffers, so if you end up creating a lot of them (such as for
reading/writing files) or even just resizing them (which creates an entirely
new one), you can run out of virtual memory.

~~~
gresrun
The excellent I/O library Netty deals with this by allocating a slab of
direct memory, slicing it up, and pooling the use of the memory.

It's the Java equivalent of writing your own malloc().

~~~
pron
That's not exactly what Netty does. It does allocate large direct buffers and
slice off pieces, but the pieces are not pooled, because they are never
returned to Netty. For that reason Netty also doesn't need to implement malloc
in Java. All the slices reference the "parent" buffer, and once they are all
collected, the parent can be collected as well. When there is no more room in
the buffer, another one is allocated. (I just read that code last week because
I wanted to know exactly what was going on, since there was no way of
returning a sliced buffer to the pool.)
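
Roughly, the scheme looks like this (names are hypothetical; this is the idea,
not Netty's actual code):

    import java.nio.ByteBuffer;

    public class SlabAllocator {
        private static final int SLAB_SIZE = 1 << 20; // 1 MiB per slab
        private ByteBuffer slab = ByteBuffer.allocateDirect(SLAB_SIZE);

        public synchronized ByteBuffer allocate(int size) {
            if (slab.remaining() < size) {
                // No room left: allocate a fresh slab. The old slab stays
                // alive until every slice handed out from it is collected,
                // then the GC reclaims the whole thing at once.
                slab = ByteBuffer.allocateDirect(SLAB_SIZE);
            }
            slab.limit(slab.position() + size);
            ByteBuffer piece = slab.slice(); // shares the slab's native memory
            slab.position(slab.limit());
            slab.limit(slab.capacity());
            return piece;
        }
    }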

It is very easy (and efficient) to implement java.io input and output streams
backed by a buffer (as long as the buffer doesn't have to grow).

~~~
gresrun
Buffer pooling is still very much a WIP (the first commit was 3 days ago!):

<https://github.com/netty/netty/tree/bufferpooling>
<https://github.com/netty/netty/issues/62>
[https://github.com/netty/netty/commit/c9968d6cbfa958f73a9868...](https://github.com/netty/netty/commit/c9968d6cbfa958f73a98688b6d71571f32a3086d)

~~~
pron
Oh! Good to know!

