
Building a Streaming Analytics Data Stack - henridf
https://medium.com/@henridf/building-a-streaming-analytics-data-stack-ea0641048661
======
rbranson
FWIW jut.io shut down the day after this was posted.
[https://twitter.com/PurpleQuark/status/661274501728964608](https://twitter.com/PurpleQuark/status/661274501728964608)

~~~
spo81rty
Weird. Their own website and Twitter account don't say anything about
shutting down.

~~~
demmer
Disclaimer: I’m the CTO at Jut and have been with the company since its
inception.

Actually Jut didn't shut down last week. We figured out a much better way to
package up our tech and ended up making some big changes to execute on this
strategy. Unfortunately @PurpleQuark and a number of other really great people
are no longer with us.

~~~
rbranson
Sorry for the misinformation then. Thanks for clearing this up.

------
mring33621
I'm working on something similar. So far I like Apache NiFi for ingestion and
Apache Flink for processing. Storage choice(s) are plenty and IMHO determined
by the use-case and available expertise.

~~~
henridf
Indeed storage choices are plenty. That said, if you want to be able to do the
kind of optimizations that are described in the article, then the set of
candidates gets a bit smaller.

------
teej
Let's get this out of the way - I love it when companies are open and
transparent about their architecture. Sharing intimate details like this is
fantastic.

Where I'm struggling is that there are a number of questionable choices here
with little justification. For example, why an HTTP front-end? This is fine for
webhooks but I'm not going to let my website's backend open an HTTP connection
for every event I want to send out. The decision to store the data in
Elasticsearch and Cassandra is equally dubious. In my experience Elasticsearch
has been a maintenance nightmare and has not been a performant and robust
reporting solution at scale.

------
spoonfoe
I saw these guys at Velocity in NY this year. Pretty impressive product. I
felt like the query language they built was easier to work with than setting
up queries and filters in Elasticsearch's API.

Really interesting to hear about the innards.

Thanks for the post.

~~~
vladsanchez
Query DSLs have always been the problem with these NoSQL / document-oriented
backends.

------
itaifrenkel
Do you support only transitive aggregation operations? If so, why not push the
entire aggregation to Elasticsearch/Cassandra?

How do you plan to scale CPU-wise? ES and streaming engines (don't know about
Cassandra) are CPU hogs (compared to MapReduce). I heard at DevOpsDays Tel
Aviv that BigPanda decided to provide different SLAs to paying and non-paying
customers to balance the costs.

~~~
henridf
The Juttle query language allows the expression of dataflow graphs and UDFs,
which cannot (in general) be done natively in ES.
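
A rough sketch of that distinction, in Python rather than Juttle, with every
name below hypothetical and nothing taken from the Juttle or Elasticsearch
APIs: an aggregation like a mean can be assembled from small per-shard partial
results, so the store can do most of the work, while an arbitrary user-defined
function over the stream generally needs the raw points pulled out first.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Partial:
    """Per-shard partial result for a mean: enough state to merge later."""
    total: float = 0.0
    count: int = 0


def shard_partial(values: Iterable[float]) -> Partial:
    # What a storage node could compute locally (hypothetical stand-in).
    vals = list(values)
    return Partial(total=sum(vals), count=len(vals))


def merge(partials: Iterable[Partial]) -> Partial:
    # Partials combine by simple addition, so only tiny summaries move.
    out = Partial()
    for p in partials:
        out.total += p.total
        out.count += p.count
    return out


def mean_pushed_down(shards: List[List[float]]) -> float:
    merged = merge(shard_partial(s) for s in shards)
    if merged.count == 0:
        raise ValueError("no data points")
    return merged.total / merged.count


def apply_udf(shards: List[List[float]],
              udf: Callable[[List[float]], float]) -> float:
    # An arbitrary UDF has no known merge step, so fetch the raw points.
    raw = [v for s in shards for v in s]
    return udf(raw)


def max_jump(points: List[float]) -> float:
    # Example UDF: largest difference between consecutive points.
    return max(b - a for a, b in zip(points, points[1:]))


shards = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
print(mean_pushed_down(shards))
print(apply_udf(shards, max_jump))
```

The mean only ships one `Partial` per shard, whereas `max_jump` depends on
neighbouring points that may sit on different shards, which is why it is
evaluated over the full stream here.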

As far as the scaling issue goes, this was designed to run on premise rather
than as a SaaS service (unlike bigpanda?).

Disclaimer on last paragraph: as per @demmer's comment below, Jut's plans have
changed, so it may no longer be valid or relevant.

