

Show HN: Phoenix – an ultra-fast SQL layer on top of HBase - bra-ket
http://phoenix.incubator.apache.org/

======
capkutay
Hmm...Phoenix has been available for some time. Is there a reason it's on the
front page of HN now? (new release? significant new adopters?)

It's interesting how the NoSQL movement spawned a chorus of 'SQL/relational
DBs are dead' [0] in the tech media world. Now you're seeing both basic and
advanced SQL interfaces roll out for the leading NoSQL technologies, including
HBase (this thread), Couchbase [1], Cassandra (although limited) [2], and
there's even some effort in MongoDB to at least map SQL commands to MongoDB
operations [3]. In addition, open source data management frameworks
like Spark [4] and Hadoop [5] are putting significant effort into that area.
I hope I'm not stating the obvious, but all of this suggests that SQL will
be a key component in the future of data management.

0: [http://www.iheavy.com/2013/12/30/sql-databases-dead/](http://www.iheavy.com/2013/12/30/sql-databases-dead/)
(just one example, you'll find many more if you google it)

1: [http://www.couchbase.com/press-releases/unql-query-language](http://www.couchbase.com/press-releases/unql-query-language)

2: [http://cassandra.apache.org/doc/cql/CQL.html](http://cassandra.apache.org/doc/cql/CQL.html)

3: [http://docs.mongodb.org/manual/reference/sql-comparison/](http://docs.mongodb.org/manual/reference/sql-comparison/)

4: [http://shark.cs.berkeley.edu/](http://shark.cs.berkeley.edu/)

5: [http://hive.apache.org/](http://hive.apache.org/)

~~~
nemothekid
Another user put it better just a couple days ago, saying NoSQL should really
mean NoACID - databases which simply prefer having eventual consistency for
the sake of achieving high availability.

AFAIK, the recent trend in "Get me SQL" is really because the alternative
databases became really popular, and as they did, it simply became
unacceptable to have to teach someone Python in order to query your database
when chances are they already know SQL (think of all the junior analysts who
know SQL one way or another, but have never touched a REPL).

Note that this obviously doesn't include the real trend towards distributed
strong consistency like Google's F1.

~~~
capkutay
Eventual consistency and NoACID aren't synonymous. HBase has strong
consistency but it certainly doesn't have ACID transactions.

------
j-m-o
I actually just got a prototype working that loads up HBase data through
Phoenix into Spark for analytics.

So far the results look very promising. Across millions of rows, we're getting
queries back on the order of single-digit seconds.

I'm also able to get millions of rows back on specific, unindexed queries
(e.g. "SELECT * FROM EVENTS WHERE TYPE = 'some_type'"), and can run analytics
across that data in under a minute per run.

The previous comparison to CQL for Cassandra is apt, though I've found
Phoenix's built-in functions and grammar to be a lot more powerful.

------
bra-ket
Introductory talk from ApacheCon:
[https://www.youtube.com/watch?v=YHsHdQ08trg](https://www.youtube.com/watch?v=YHsHdQ08trg)

------
adenner
Phoenix is a JDBC connector to HBase that originated at Salesforce.com and
is currently an Apache Incubator project.

------
jhorey
For those who know a bit more, is this basically equivalent to CQL for
Cassandra, except for HBase?

~~~
bra-ket
Phoenix supports richer SQL syntax - joins, derived tables, subqueries,
indexes, views, functions.

------
nemothekid
Is anyone familiar with the project able to explain why it is so much faster?

~~~
linuxhansl
HBase committer involved with Phoenix here...

Phoenix is not faster than HBase per se. It automates the use of all the
performance knobs HBase has and is smart about which actions should be
performed at the client and which can be parallelized and/or pushed to the
HBase servers (as standard HBase filters or coprocessors).
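
To make the "pushed to the servers" point concrete, here is a toy sketch in
Python. It is not Phoenix or HBase code: the `REGIONS` list, the `EVENTS`-like
rows, and all function names are hypothetical stand-ins, and threads merely
play the role of region servers. It only illustrates why filtering and
counting next to the data, in parallel, ships far less to the client than
pulling every row back.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for HBase regions: each "region" holds a slice of an
# EVENTS-like table as (row_key, type) tuples. Nothing here talks to a cluster.
REGIONS = [
    [(f"r{region}-{i}", "click" if i % 4 == 0 else "view") for i in range(1000)]
    for region in range(4)
]


def count_client_side(regions, wanted_type):
    """Naive plan: pull every row to the client, then filter and count there."""
    rows_shipped = 0
    matches = 0
    for region in regions:
        for _key, row_type in region:  # every row crosses the "network"
            rows_shipped += 1
            if row_type == wanted_type:
                matches += 1
    return matches, rows_shipped


def region_partial_count(region, wanted_type):
    """Work done next to the data (the role a filter or coprocessor plays)."""
    return sum(1 for _key, row_type in region if row_type == wanted_type)


def count_pushed_down(regions, wanted_type):
    """Pushed-down plan: regions count in parallel, the client sums partials."""
    with ThreadPoolExecutor(max_workers=len(regions)) as pool:
        partials = list(
            pool.map(lambda r: region_partial_count(r, wanted_type), regions)
        )
    # Only one small number per region is shipped back to the client.
    return sum(partials), len(partials)


if __name__ == "__main__":
    print(count_client_side(REGIONS, "click"))  # (matches, rows shipped)
    print(count_pushed_down(REGIONS, "click"))  # (matches, values shipped)
```

Both plans return the same count; the difference is that the first moves
4000 rows to the client while the second moves 4 partial counts. A real
system has many more considerations (region boundaries, scan caching, key
ranges), so treat this purely as an illustration of the idea.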

------
aba_sababa
Is this an Impala competitor?

~~~
bra-ket
It's faster than Impala:
[http://phoenix.incubator.apache.org/performance.html](http://phoenix.incubator.apache.org/performance.html)

[http://phoenix-bin.github.io/client/performance/phoenix-2014...](http://phoenix-bin.github.io/client/performance/phoenix-20140324122633.htm)

~~~
yid
That link only shows results for a SELECT COUNT(1) query -- hardly evidence
that it's "faster than Impala". That's about as trivial a benchmark as one
could imagine.

