SQL interface for Elasticsearch (github.com/nlpchina)
76 points by taf2 on Nov 16, 2015 | 23 comments



If you use Spark SQL to query Elasticsearch as a data source, it already has better SQL support, with joins, and you can push predicates down into ES.


If you want real SQL on top of Elasticsearch have a look at https://crate.io/ (https://crate.io/blog/sql-for-elasticsearch/).

P.S.: I'm not affiliated with that company.


I don't get the point of this. Are people so reluctant to learn ES that they'd use something like this rather than understand how to build an ES query object? All this plugin does is convert SQL to an ES query...


I think it's the size of the ES query object, as well as its complexity... even the ES documentation provides SQL examples to explain the ES query object...

Here's a good example:

https://www.elastic.co/guide/en/elasticsearch/guide/current/...

SQL:

     SELECT document FROM products WHERE productID = "KDKE-B-9947-#kL5" OR (productID = "JODL-X-1937-#pV7" AND price=30)
ES Query:

    {
      "query" : {
        "filtered" : {
          "filter" : {
            "bool" : {
              "should" : [
                { "term" : { "productID" : "KDKE-B-9947-#kL5" } },
                { "bool" : {
                  "must" : [
                    { "term" : { "productID" : "JODL-X-1937-#pV7" } },
                    { "term" : { "price" : 30 } }
                  ]
                }}
              ]
            }
          }
        }
      }
    }
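The translation itself is mechanical, which is exactly why a plugin can do it. A toy sketch of the idea in Python (hypothetical illustration, not the plugin's actual code; the helper names `term`, `es_or`, `es_and` are made up):

```python
# Toy sketch: build the ES bool filter for
#   WHERE productID = A OR (productID = B AND price = 30)
# Illustrates the shape of the translation a SQL-to-ES layer performs.

def term(field, value):
    return {"term": {field: value}}

def es_or(*clauses):
    # OR maps to a bool "should" clause
    return {"bool": {"should": list(clauses)}}

def es_and(*clauses):
    # AND maps to a bool "must" clause
    return {"bool": {"must": list(clauses)}}

query = {
    "query": {
        "filtered": {
            "filter": es_or(
                term("productID", "KDKE-B-9947-#kL5"),
                es_and(
                    term("productID", "JODL-X-1937-#pV7"),
                    term("price", 30),
                ),
            )
        }
    }
}
```

Three small helpers recover the entire nested structure above, which is roughly the argument for putting SQL in front of it.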


Not having to learn a different query language for every product is exactly the point of SQL, and naturally also the point of pretty much every "SQL interface for X".


If only this was the case. Except that every database has its own flavour of SQL with plenty of proprietary extensions. Not to mention each database will support a different subset of SQL.


Don't know why anyone is downvoting this; it's absolutely true, and it's even worse because the shared subset might behave differently with the same keywords due to differences in the underlying optimizers.

This means code you wrote with the same syntax can run vastly differently on two different engines.

Only if you are writing basic queries is the ANSI SQL promise realized.


>> Except that every database has its own flavour of SQL

Sure, but simple queries, like the one the poster above provides as an example, have been standardized for decades and work the same across pretty much every credible RDBMS.

https://news.ycombinator.com/item?id=10577996


> Except that every database has its own flavour of SQL with plenty of proprietary extensions. Not to mention each database will support a different subset of SQL.

It's a lot easier to become familiar with a new SQL dialect than a whole new query language, especially since a lot of basic querying will be the same between different dialects.


Learning the basics is easy with any query language, e.g. MongoDB/ES JSON. It's when you start writing more advanced queries that you realise that SQL's "write once, run anywhere" premise is an illusion.


Sure, but it's relatively easy translating from one dialect of SQL to another, as opposed to learning a whole new query language.


No idea what motivated the authors, but from my perspective, basically: yes.

I've been working with ES for years now. The query language changes about twice per year, and things which used to be the only way to do something become deprecated. (see: https://www.elastic.co/guide/en/elasticsearch/reference/curr... )

Even if you do understand 100% of the idiosyncrasies of Elasticsearch, you still need to construct horrifically nested JSON to do meaningful queries, making it annoying to do in code and mind-wrenching to debug with curl.


The ES query language is enormously verbose (and JSON formatting adds another layer), so SQL is not great, but it is much more succinct than the ES alternative.


Absolutely agree. ES query is difficult to read.


Personally I find SQL much easier to read. In a product where you might allow users to write queries, it is a much more pleasant experience to write and read SQL than ES queries. It isn't a problem to understand how to build an ES query object, but an ES query object is pretty terrible to read, IMO.


It could certainly lower the barrier of entry into ES. I was almost scared off by the bafflingly huge query object I had to build for my first few (seemingly simple) queries.


No joins.

I wonder why you wouldn't use PrestoDB to connect to Elasticsearch. It provides you with an SQL engine, and you just need to write a connector that knows how to get data.

A similar thing has been done in Crate.io.



Interesting — looks like the join isn't pipelined. The entire right-hand side evaluates synchronously. So it has to wait for the entire right-hand result set before it can evaluate the join operator, instead of streaming it concurrently. I'm surprised anyone would do it this way in Java, which has good support for concurrency.

Edit: Actually the file you linked to was a test file. Hash join code is here [1], and it uses ES' scrolling feature to incrementally join, though it's not pipelined. Not sure scrolling is entirely appropriate for this; it will potentially hold an unpredictable amount of memory on the server end.

[1] https://github.com/NLPchina/elasticsearch-sql/blob/5cd6ab639...
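The build-then-probe shape described above can be sketched as follows (hypothetical Python, with the scroll batches faked as lists of dicts; the real code is Java and pulls pages from ES):

```python
# Sketch of a non-pipelined hash join over "scrolled" result pages.
# Each element of left_pages/right_pages stands in for one ES scroll batch.

def hash_join(left_pages, right_pages, key):
    # Build phase: drain the ENTIRE right side first. This is the
    # non-pipelined part -- nothing is emitted until the right-hand
    # result set is fully materialized in the hash table.
    table = {}
    for page in right_pages:
        for row in page:
            table.setdefault(row[key], []).append(row)
    # Probe phase: stream the left side against the hash table,
    # emitting merged rows as matches are found.
    for page in left_pages:
        for row in page:
            for match in table.get(row[key], []):
                yield {**row, **match}

left = [[{"id": 1, "name": "a"}], [{"id": 2, "name": "b"}]]
right = [[{"id": 1, "price": 30}], [{"id": 3, "price": 99}]]
joined = list(hash_join(left, right, "id"))
# Only id=1 appears on both sides, so one joined row comes out.
```

The memory concern follows directly: the hash table grows with the whole right-hand side, and on the server end each open scroll context pins resources until it is consumed or times out.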


Joins are absent from the documented list of features and examples on the initial README but there's "limited support" for joins as described here: https://github.com/NLPchina/elasticsearch-sql/wiki/Join


Don't have much experience with Presto, but I have used Hive to query Elasticsearch.

It works very well for full-table-scan analytics-type queries, and I expect PrestoDB would be similar. But when it comes to queries about smaller and smaller pieces of the full dataset, it becomes less and less likely that these types of connectors perform well. Predicate pushdown is rarely well-implemented in these types of "run SQL against any big data" systems (Hive, Presto, Impala, SparkSQL, etc). A simple "select * where id = 1234" will often do a full scan and filter within the query engine, rather than push the point lookup into ES.


Actually, Spark SQL's data source API has a very expressive predicate pushdown interface, and most data sources implement it. id = 1234 should not do a full scan.


As mentioned, Spark SQL has good predicate pushdown support. The Elasticsearch, Cassandra, HBase, and MongoDB adapters all support it.



