
Hazelcast Jet – In-Memory Streaming and Fast Batch Processing - martypitt
http://jet.hazelcast.org/
======
koolba
The best part about these types of projects is also the worst: _Anything that
is java.io.Serializable can be used for keys/values_

This means you can immediately save/load/stream your application's objects.
Just tack on "_implements Serializable_" to your class header (or
"_implements Externalizable_" if you want to be fancy and do it yourself) and
you're good to go. Plus, with the native Map<?,?> interface, writing code
against it feels natural.
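For readers who haven't used it, here is a minimal sketch of the plain JDK mechanism being described: a class that only adds `implements Serializable`, round-tripped through `ObjectOutputStream`/`ObjectInputStream`. The `Trade` class is hypothetical, and this shows only java.io serialization, not Hazelcast's own API or how Hazelcast picks a serializer.

```java
import java.io.*;

public class SerializableRoundTrip {

    // Hypothetical value class: "implements Serializable" is all the JDK needs.
    static class Trade implements Serializable {
        private static final long serialVersionUID = 1L;
        final String symbol;
        final long quantity;

        Trade(String symbol, long quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    // Serialize any Serializable object to a byte array.
    static byte[] toBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    // Read back an object written by toBytes.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Trade copy = (Trade) fromBytes(toBytes(new Trade("ACME", 100)));
        System.out.println(copy.symbol + " " + copy.quantity); // prints "ACME 100"
    }
}
```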

In practice this also means that you end up serializing arbitrary Java objects
and get stuck in serialization/deserialization hell. Your data ends up tied
to a specific format on a specific platform/language. It's somewhere
between impractical and impossible to get it into an agnostic format usable by
any other language, so you're stuck in JDK land forever.
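To make the lock-in point concrete, a small sketch (using a hypothetical `Customer` class) that peeks at the raw bytes. Every Java serialization stream starts with the magic number 0xACED followed by version 5, and it embeds Java class names, so a non-JVM reader has to reimplement the Java Object Serialization stream protocol to make sense of it.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class SerializationFormatPeek {

    // Hypothetical application class stored as a value.
    static class Customer implements Serializable {
        private static final long serialVersionUID = 1L;
        String name = "Ada";
    }

    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = serialize(new Customer());
        // STREAM_MAGIC (0xACED) then STREAM_VERSION (5): prints "header: AC ED 00 05"
        System.out.printf("header: %02X %02X %02X %02X%n", data[0], data[1], data[2], data[3]);
        // The JVM class name is written into the stream itself.
        String raw = new String(data, StandardCharsets.ISO_8859_1);
        System.out.println("contains class name: " + raw.contains("SerializationFormatPeek$Customer"));
    }
}
```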

Anything that involves getting data into some other system or language
requires you to also write a Java app to read (and possibly write back) your
data.

Do yourself a favor and stick to something language agnostic for your data
stores. You'll thank yourself many times over down the road.
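As a minimal sketch of the alternative, assuming a hypothetical `Trade` record (Java 16+) and hand-rolled JSON purely to stay dependency-free: the same data written in a format any language can parse. In real code you would more likely reach for a library such as Jackson, or a schema-based format like Protocol Buffers or Avro.

```java
public class PortableEncoding {

    // Hypothetical record; the point is the output format, not the type.
    record Trade(String symbol, long quantity) {}

    // Escape per the JSON spec: quotes, backslashes and control characters.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Encode a Trade as a JSON object, readable from Python, Go, JavaScript, etc.
    static String toJson(Trade t) {
        return "{\"symbol\":" + jsonString(t.symbol()) + ",\"quantity\":" + t.quantity() + "}";
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Trade("ACME", 100))); // prints {"symbol":"ACME","quantity":100}
    }
}
```

One caveat: some JSON parsers (JavaScript's, for example) read numbers as doubles, so `long` values beyond 2^53 can lose precision; sending such values as strings avoids that.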

~~~
jankotek
Hazelcast is not really a database. Data are 'persisted' in RAM.

~~~
koolba
> Hazelcast is not really a database. Data are 'persisted' in RAM.

I purposely didn't use the words "database" or "persisted".

My comment applies even more so when data is persisted to durable storage
(i.e. disk), but was meant to apply generally to any distributed data store.

------
rickette
We're quite spoiled on the JVM when it comes to stream processing frameworks.
Just a few off the top of my head:

- Apache Spark (especially Spark Streaming)
- Apache Flink
- Apache Storm
- Apache Apex
- Apache Samza
- Apache Ignite (also includes other things)

And now Hazelcast Jet.

~~~
vegabook
also, note _Apache_ prepended every time. Can't help thinking there is
concentration risk there.

~~~
jonstewart
I'm hard-pressed to think of a feasible concentration risk for Apache
projects. Can you be more specific?

~~~
vegabook
Flink, Spark, Samza, Kafka (Streams). All under the Apache umbrella. All doing
similar things. Competitors. There is a risk of spreading resources too thin
on the space, there is a risk that some of the projects will be guided towards
sub-optimal objectives so as not to compete too directly with other projects
in the portfolio, and there is clear branding dilution. This is made worse by
the fact that all the projects use the JVM, eroding differentiation even
further. And there is concentration risk for Apache making tons of bets in the
same space as opposed to diversifying across sectors. Just a lay observation
as I happen to have been perusing the space for a financial markets compute
graph I am building, and was surprised by the lack of options outside of
Apache, and in particular, away from the JVM.

~~~
lern_too_spel
Apache projects are independently managed, not by anybody paid by the ASF.
[https://community.apache.org/projectIndependence.html](https://community.apache.org/projectIndependence.html)

------
tyingq
I couldn't quite grok what this was supposed to be from the information on the
site. What did help was their one-line description that it was basically a
distributed implementation of java.util.stream.

I didn't know what java.util.stream was either, but this document made it
clear: [http://www.oracle.com/technetwork/articles/java/ma14-java-
se...](http://www.oracle.com/technetwork/articles/java/ma14-java-
se-8-streams-2177646.html)
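For anyone else in the same position: a single-JVM java.util.stream pipeline looks like the word count below. This is plain JDK code; per the one-line description, Jet's pitch is running this style of pipeline across a cluster, but the sketch makes no claims about Jet's actual API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

public class WordCount {

    // Count word occurrences with a single-JVM stream pipeline.
    static Map<String, Long> countWords(List<String> lines) {
        return lines.stream()
                .flatMap(line -> Arrays.stream(line.toLowerCase().split("\\W+"))) // line -> words
                .filter(word -> !word.isEmpty())                                  // drop empty tokens
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        System.out.println(countWords(lines));
        // prints {be=2, is=1, not=1, or=1, question=1, that=1, the=1, to=2}
    }
}
```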

~~~
Traubenfuchs
For those who know C#: It's the Java version of LINQ.

For those who know neither Java nor C#: all regular collections now offer a
.stream() (and .parallelStream()) method that returns a Java 8 Stream<T> over
the collection's elements. With the help of lambda expressions, the elements of
a stream can be converted/mapped, filtered, sorted, aggregated, reduced,
flatmapped, etc. by chaining methods on the stream. You can also create finite
or infinite streams without a collection (e.g. from number ranges, random
values or anything else you can think of).
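A minimal sketch of the operations listed above, using only the JDK (method names and sample data are made up for illustration):

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class StreamBasics {

    // Finite stream from a collection: filter, map, sort, collect.
    static List<String> shoutShortNames(List<String> names) {
        return names.stream()
                .filter(n -> n.length() <= 4)   // keep short names
                .map(String::toUpperCase)       // transform each element
                .sorted()                       // natural (alphabetical) order
                .collect(Collectors.toList());
    }

    // Reduction; parallelStream() splits the work across the common ForkJoinPool.
    static int sumOfSquares(List<Integer> xs) {
        return xs.parallelStream().map(x -> x * x).reduce(0, Integer::sum);
    }

    // flatMap: each inner list is spliced into one flat stream.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream().flatMap(List::stream).collect(Collectors.toList());
    }

    // Infinite stream made finite with limit(): the first n powers of two.
    static List<Integer> powersOfTwo(int n) {
        return Stream.iterate(1, x -> x * 2).limit(n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(shoutShortNames(List.of("Bob", "Alexandra", "Eve", "Dan"))); // prints [BOB, DAN, EVE]
        System.out.println(sumOfSquares(List.of(1, 2, 3, 4)));                        // prints 30
        System.out.println(flatten(List.of(List.of(1, 2), List.of(3))));               // prints [1, 2, 3]
        System.out.println(powersOfTwo(5));                                           // prints [1, 2, 4, 8, 16]
        System.out.println(IntStream.rangeClosed(1, 5).sum());                        // prints 15
    }
}
```

Roughly, filter/map/sorted/reduce/flatMap correspond to LINQ's Where/Select/OrderBy/Aggregate/SelectMany.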

I often end up writing complex stream method chains only to break them up in
the end, worried that they are too difficult for others to understand.
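One way to break such a chain up without giving up streams, sketched with a hypothetical `Order` record (Java 16+): give the predicates and mapping functions names so the pipeline reads like a sentence. Both versions return the same result; whether the named form is actually clearer is a style judgment.

```java
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class NamedStreamSteps {

    // Hypothetical order type used only for illustration.
    record Order(String customer, double amount, boolean cancelled) {}

    // One long chain: correct, but the intent has to be reverse-engineered.
    static List<String> bigSpendersChained(List<Order> orders) {
        return orders.stream()
                .filter(o -> !o.cancelled())
                .filter(o -> o.amount() > 100.0)
                .map(Order::customer)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    // The same steps, each given a name.
    static final Predicate<Order> IS_ACTIVE = o -> !o.cancelled();
    static final Predicate<Order> IS_LARGE = o -> o.amount() > 100.0;
    static final Function<Order, String> CUSTOMER = Order::customer;

    static List<String> bigSpendersNamed(List<Order> orders) {
        return orders.stream()
                .filter(IS_ACTIVE.and(IS_LARGE))
                .map(CUSTOMER)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("carol", 250.0, false),
                new Order("alice", 50.0, false),
                new Order("bob", 500.0, true),
                new Order("carol", 120.0, false),
                new Order("dave", 101.0, false));
        System.out.println(bigSpendersChained(orders)); // prints [carol, dave]
        System.out.println(bigSpendersNamed(orders));   // prints [carol, dave]
    }
}
```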

