I don't get the point of this. Do people not want to learn ES so badly that they will use something like this without understanding how to build an ES query object? All this plugin does is convert SQL to an ES query...
I think it's the size of the ES query object as well as its complexity... even the ES documentation provides SQL examples to explain the ES query object...
Not having to learn a different query language for every product is exactly the point of SQL, and naturally also the point of pretty much every "SQL interface for X".
If only this were the case. Except that every database has its own flavour of SQL with plenty of proprietary extensions. Not to mention each database supports a different subset of SQL.
Don't know why anyone is downvoting this; it's absolutely true, and it's even worse because the same keywords in that subset might behave differently due to underlying optimizer differences.
This means code you wrote with identical syntax can run vastly differently on two different engines.
Only if you are writing basic queries is the ANSI SQL promise realized.
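To make that concrete, here is the same "first five rows" query in three real dialects (hypothetical table name, but the syntax gaps are real), collected as plain Python strings:

    # None of these three parses on the other two engines:
    first_five = {
        "postgresql": "SELECT * FROM users ORDER BY id LIMIT 5",
        "sql_server": "SELECT TOP 5 * FROM users ORDER BY id",
        "oracle_11g": "SELECT * FROM (SELECT * FROM users ORDER BY id) WHERE ROWNUM <= 5",
    }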
> Except that every database has its own flavour of SQL
Sure, but simple queries, like the poster above provides as an example, have been standardized for like decades, and work the same across pretty much every credible RDBMS.
> Except that every database has its own flavour of SQL with plenty of proprietary extensions. Not to mention each database supports a different subset of SQL.
It's a lot easier to become familiar with a new SQL dialect than with a new query language, especially since a lot of basic querying will be the same across different dialects.
Learning the basics is easy with any query language, e.g. MongoDB/ES JSON. It's when you start writing more advanced queries that you realise that SQL's "write once, run anywhere" premise is an illusion.
Even if you do understand 100% of the idiosyncrasies of Elasticsearch, you still need to construct horrifically nested JSON to do meaningful queries... making it annoying to do in code, but mind-wrenching to debug with curl.
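For example, here is a one-line SQL filter next to the ES query DSL it corresponds to, sketched as the Python dict the 7.x Python client takes (index and field names are made up):

    # One line of SQL:
    sql = "SELECT name FROM users WHERE age > 30 AND city = 'Oslo'"

    # The roughly equivalent ES query DSL -- the nesting in question:
    es_query = {
        "_source": ["name"],
        "query": {
            "bool": {
                "filter": [
                    {"range": {"age": {"gt": 30}}},
                    {"term": {"city": "Oslo"}},
                ]
            }
        },
    }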
Personally I find SQL much easier to read. In a product where you might allow users to write queries, it is a much more pleasant experience to write and read SQL than it would be for ES queries. It isn't a problem to understand how to build an ES query object, but an ES query object is pretty terrible to read IMO.
It could certainly lower the barrier to entry for ES. I was almost scared off by the bafflingly huge query object I had to build for my first few (seemingly simple) queries.
I wonder why you wouldn't use PrestoDB to connect to Elasticsearch. It provides you with an SQL engine, and you just need to write a connector that knows how to get the data.
Interesting: looks like the join isn't pipelined. The entire right-hand side is evaluated synchronously, so the join operator has to wait for the complete right-hand result set instead of streaming it concurrently. I'm surprised anyone would do it this way in Java, which has good support for concurrency.
Edit: Actually the file you linked to was a test file. Hash join code is here [1], and it uses ES' scrolling feature to incrementally join, though it's not pipelined. Not sure scrolling is entirely appropriate for this; it will potentially hold an unpredictable amount of memory on the server end.
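Roughly what a scroll-based hash join amounts to, sketched in Python with the elasticsearch-py 7.x client (cluster address, index, and key names are all hypothetical):

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def hash_join(left_rows, right_index, key):
        """Build a hash table from the buffered left side, then stream
        the right side page by page with the scroll API."""
        table = {}
        for row in left_rows:
            table.setdefault(row[key], []).append(row)

        # Each scroll keeps a search context open on the server for "2m",
        # which is the unpredictable server-side memory mentioned above.
        page = es.search(index=right_index, scroll="2m",
                         body={"query": {"match_all": {}}, "size": 1000})
        while page["hits"]["hits"]:
            for hit in page["hits"]["hits"]:
                right = hit["_source"]
                for left in table.get(right.get(key), []):
                    yield {**left, **right}
            page = es.scroll(scroll_id=page["_scroll_id"], scroll="2m")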
Don't have much experience with Presto, but I have used Hive to query Elasticsearch.
It works very well for full-table-scan, analytics-type queries, and I expect PrestoDB would be similar. But the smaller the slice of the full dataset a query touches, the less likely these types of connectors are to perform well. Predicate pushdown is rarely well-implemented in these "run SQL against any big data" systems (Hive, Presto, Impala, SparkSQL, etc.). A simple "select * where id = 1234" will often do a full scan and filter within the query engine, rather than push the point lookup into ES.
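What that difference looks like at the ES level, with elasticsearch-py and made-up index/field names:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    # Pushdown: the point lookup travels to ES, which answers from its index.
    pushed = es.search(index="events",
                       body={"query": {"term": {"id": 1234}}})

    # No pushdown: ask for everything and filter in the query engine
    # (a real full scan would page through with the scroll API).
    scan = es.search(index="events",
                     body={"query": {"match_all": {}}, "size": 10000})
    hits = [h for h in scan["hits"]["hits"] if h["_source"]["id"] == 1234]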
Actually, Spark SQL's data source API has a very expressive predicate pushdown interface, and most data sources implement it. "id = 1234" should not do a full scan.
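For instance, with the es-hadoop connector in PySpark (cluster address and index name invented), the filter below is handed to the connector through that interface rather than applied after a scan:

    from pyspark.sql import SparkSession

    # Assumes the elasticsearch-hadoop package is on the classpath.
    spark = SparkSession.builder.appName("es-pushdown-demo").getOrCreate()

    df = (spark.read.format("org.elasticsearch.spark.sql")
          .option("es.nodes", "localhost:9200")
          .load("events"))

    # Spark's data source API hands this predicate to the connector,
    # which can translate it into an ES term query instead of a scan.
    df.filter(df.id == 1234).show()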