
Show HN: EventQL – Open-source distributed SQL analytics database in C++11 - paulasmuth
https://github.com/eventql/eventql
======
jimktrains2
Is this similar to Apache Drill[0]/Google Dremel[1] (BigQuery[2])? One
difference is that it seems to support mutating data, not just simple
appends.

[0] [https://drill.apache.org/](https://drill.apache.org/)
[1] [http://research.google.com/pubs/pub36632.html](http://research.google.com/pubs/pub36632.html)
[2] [https://cloud.google.com/bigquery/](https://cloud.google.com/bigquery/)

~~~
paulasmuth
Yes, it is similar to BigQuery with a couple of differences as you pointed
out. The big ones being that it's fully open source and self-hostable.

It's less similar to Apache Drill - Drill is "only" a query engine and doesn't
handle the actual data storage. EventQL combines a bigtable-like storage
engine (optimized for the analytics use case) with a dremel/bigquery-like
query engine.

~~~
ddispaltro
How does it compare to Clickhouse, recently released from Yandex [0]?

[0] [https://clickhouse.yandex/](https://clickhouse.yandex/)

------
babas
Looks great! I've just gotten a single instance up and running. Really simple
to set up. It seems almost exactly what we've been looking for! Some
background: We have been evaluating different time series databases for use
with sensor readings (we probably need to ingest something on the order of
100,000-200,000 samples/s per cluster). The data is from physical sensors like
temperature, level, pressure etc., where all the data is in the form of
tag(string)|datetime(ms)|value(decimal).

Have you done any comparisons to other similar software like influxdb,
Cassandra, etc? Especially ingestion rate and disk usage.

What kind of pricing can we expect on Managed Hosting?

We are currently leaning towards Influxdb but the cluster licensing stuff they
are doing really made us think twice.

~~~
teej
If you're looking for recommendations for time series data, this thread still
holds up -
[https://news.ycombinator.com/item?id=8368509](https://news.ycombinator.com/item?id=8368509)

I'd consider taking a look at KDB+

~~~
babas
We threw kdb+ out of consideration pretty early because it's extremely
expensive and we prefer open software.

------
dtheodor
Why EventQL over other distributed columnar-storage based databases like
Redshift, Vertica, Citus, or Hadoop-based ones like Impala and Presto? What
does EventQL do better?

~~~
carterehsmith
Good question!

May be useful to add to Show HN guidelines that it is recommended to add a
"Why X" section. Or "How X compares to others in this space".

Without that, it is confusing. Are the authors unaware of other solutions in
the space? If so, they surely did not build a competitive product.

Or maybe the authors are aware of other solutions, but don't want to get
compared?

Either way, not a good sign.

------
nwrk
looks good [0], thank you

can anyone comment on data disk usage?

[0] docs -
[http://eventql.io/documentation/](http://eventql.io/documentation/)

------
ddorian43
Why not build a bigtable-cassandra fusion row-store too? Since updates are
async and all nodes are the same (cassandra) while the data model is
sorted-by-primary-key (bigtable) and the schema fixed (low cost in storing
tuples) and sql available (easier for devs).

------
buremba
How can we import a CSV dataset? The HTTP API for insert returns this error
message: expected JSON_OBJECT_BEGIN, got: JSON_OBJECT_BEGIN. The GROUP BY
query on the homepage scans 1.8B rows and only takes 1.5 seconds, which is
great, but how many nodes were used in that setup?

~~~
paulasmuth
We do have a CSV import util (the API expects JSON) but it's not in the
current distribution/release build. I'll add it and update this comment once
it's live.

Queries are mostly limited by IO if running on regular hard disks. The number
of rows/second mainly depends on the number and types of columns that are
accessed. For example, if we scan 1.8B rows and only load a single integer
column from disk (and the integers are small), we'll only have to load about 1
byte per row from disk (using an idealized model excluding some overheads for
illustration purposes). If we want to complete the query in 1.5 seconds, that
would be a total IO load of about 1144 MiB/s (1.8 GB in 1.5 s). So (depending
on disk speed) around 15 machines would suffice.
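
The back-of-the-envelope estimate above can be reproduced with a few lines of Python. This is only a sketch of the idealized model from the comment (1 byte per row, no overheads). The per-machine disk throughput of 80 MiB/s is an assumption picked for illustration, not a figure from the thread, and the function names are hypothetical.

```python
import math

def required_io_mib_per_s(rows, bytes_per_row, seconds):
    """Aggregate read throughput (MiB/s) needed to scan `rows` in `seconds`."""
    total_bytes = rows * bytes_per_row
    return total_bytes / seconds / 2**20  # bytes/s -> MiB/s

def machines_needed(io_mib_per_s, disk_mib_per_s):
    """Number of machines if each contributes `disk_mib_per_s` of sequential read."""
    return math.ceil(io_mib_per_s / disk_mib_per_s)

# Figures from the comment: 1.8B rows, ~1 byte per row, 1.5 second target.
io = required_io_mib_per_s(1_800_000_000, 1, 1.5)
print(round(io))                # 1144

# ASSUMPTION: ~80 MiB/s sustained read per machine (regular hard disk).
print(machines_needed(io, 80))  # 15
```

Changing the assumed disk throughput or the number of bytes read per row shifts the machine count proportionally, which is why the answer depends so heavily on which columns a query touches.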

------
ktamura
As a maintainer of the project, it's cool to see the Fluentd plugin in the
repo: [https://github.com/eventql/fluent-plugin-eventql](https://github.com/eventql/fluent-plugin-eventql)

