

Cloudera announces real-time query engine for Hadoop - TallGuyShort
http://www.marketwatch.com/story/cloudera-announces-game-changing-real-time-query-on-hadoop-and-leads-a-new-era-of-data-management-2012-10-24

======
pella
"Cloudera Impala: Real-Time Queries in Apache Hadoop, For Real"
[http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...](http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/)

Impala source code: <https://github.com/cloudera/impala>

Impala documentation can be found here:
[https://ccp.cloudera.com/display/IMPALA10BETADOC/Cloudera+Im...](https://ccp.cloudera.com/display/IMPALA10BETADOC/Cloudera+Impala+1.0+Beta+Documentation)

------
benbjohnson
Metamarkets also open-sourced their data store today. It's a high-performance,
distributed query database.

<http://metamarkets.com/category/technology/druid/>

------
jgrahamc
This looks very interesting. I've got multiple PB of log files that I'd like
to be able to do real-time and batch processing on. Log files are being
generated at a small number of Gbps.

Anyone else got that much data and have a good solution?

~~~
untitledwiz
I don't think anyone has managed to get interactive speeds (seconds to a
couple of minutes at most) at the PB scale.

------
turingbook
I noticed that Marcel Kornacker, the architect of Impala, was the lead developer
for the query engine of Google’s F1 project. It's mentioned at the end of the
article:

[http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-t...](http://blog.cloudera.com/blog/2012/10/cloudera-impala-real-time-queries-in-apache-hadoop-for-real/)

------
hobbyist
So, what is the key innovation that changes Hadoop from batch processing to
real time? Is it similar to Google's Dremel?

~~~
TallGuyShort
This doesn't change Hadoop from batch processing to real time - it's a new
query engine that uses the same data sources/formats as Hadoop and the same
interfaces as Hive. So yeah, similar idea to Google Dremel / F1.

~~~
ryanpers
It isn't like F1 at all... F1 is a multi-datacenter, transactional datastore
with SQL support that Google uses to replace MySQL.

This ain't nothing like that.

------
nphase
Interesting. I wonder how this will be different from Apache Drill, championed
by Ted Dunning (MapR).

~~~
rmnoon
They're both Dremel clones, but this one seems to be a lot more complete (as
Drill is just getting started).

