

Apache Drill: Schema-free SQL Query Engine for Hadoop and NoSQL - dsr12
http://drill.apache.org/

======
akbar501
The Hadoop community has been making huge strides in integrating with existing
technologies and developers, thanks to the various SQL-like query engines.

Based on the home page, it appears that Drill currently supports queries on
Hadoop but has not yet started supporting non-Hadoop DBs.

I'm interested to see how well Drill integrates with existing BI tools vs.
Impala, and how well it queries non-Hadoop data stores vs. Presto.

------
tshiran
It depends on what you mean by layout. If you add/remove/change the fields in
the records, Drill can handle that. It can also handle a situation where you
move from one format (say, JSON) to another (say, Parquet) and end up with
mixed formats in a directory.
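To make "add/remove/change the fields" concrete, here is a toy model of schema-on-read in Python. It is a sketch under assumptions, not Drill's code: `project` is a hypothetical helper that parses newline-delimited JSON record by record and returns None for any requested column a record lacks, so older and newer records can be queried together without a schema declared up front.

```python
import json
from typing import Any, Iterable


def project(lines: Iterable[str], columns: list[str]) -> list[dict[str, Any]]:
    """Project `columns` out of newline-delimited JSON (hypothetical helper).

    No schema is declared up front: each record is parsed as it is read,
    and a column that a record lacks comes back as None instead of
    failing the whole query.
    """
    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        rows.append({col: record.get(col) for col in columns})
    return rows


# Records written before and after fields were added or dropped.
data = [
    '{"id": 1, "name": "alice"}',
    '{"id": 2, "name": "bob", "email": "bob@example.com"}',
    '{"id": 3, "email": "carol@example.com"}',
]
for row in project(data, ["id", "name", "email"]):
    print(row)
```

Each output row has the same three keys; fields missing from a given record simply read as None.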

Drill is described as schema-free, but the data doesn't have to lack schema.
Some data (Parquet, Avro) has schema, while some data (JSON, HBase, MongoDB)
doesn't (i.e., each record can have different fields). Drill is designed to
allow queries on any data. Also, Drill leverages the structure that's embedded
in the data without requiring IT to redundantly define schemas in a
centralized schema repository. In other words, if there's a schema embedded in
the Parquet file, why require the user to then go and maintain the same
schema, explicitly, in Hive metastore?
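A Parquet file carries its schema in the file's metadata, so a reader can simply use it. JSON has no such header, so its structure has to be discovered from the records themselves. The sketch below models only that JSON case: `infer_schema` is a hypothetical helper that scans the records and collects each field name with the types it takes on, showing that the structure is recoverable from the data without registering it separately in a metastore. It illustrates the idea and is not how Drill does it internally.

```python
import json
from collections import defaultdict
from typing import Iterable


def infer_schema(lines: Iterable[str]) -> dict[str, list[str]]:
    """Derive {field: sorted list of Python type names} from JSON lines.

    Hypothetical helper: the structure comes from the data itself,
    not from a separately maintained schema definition.
    """
    seen: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for field, value in json.loads(line).items():
            seen[field].add(type(value).__name__)
    # Sort the type names so the result is deterministic.
    return {field: sorted(types) for field, types in seen.items()}


data = [
    '{"id": 1, "name": "alice"}',
    '{"id": "2", "name": "bob", "tags": ["x"]}',
]
print(infer_schema(data))
# → {'id': ['int', 'str'], 'name': ['str'], 'tags': ['list']}
```

Note that `id` shows up with two types: per-record variation like this is exactly what a fixed, centrally defined schema would have to be updated to allow.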

------
arthursilva
What if you change the layout of your files? There's a very thin line
separating schema-based and schema-less databases.

~~~
jaltekruse
You are correct that schema-less systems can still require maintenance.
However, a schema-less system like Drill lets users of the data, such as
business analysts, write interactive queries as soon as they bring new files
into HDFS. Of course, if you were using Drill to query data programmatically,
you would want to make sure only the technical staff could modify the data
during development, and you would likely use an automated process to
incorporate data as it comes in. But this is no different from managing
appropriate permissions on a traditional database.

~~~
arthursilva
Skipping the schema configuration step only saves a few minutes, but if you
are sloppy, going schema-less may hurt you big time (hidden bugs, etc.).

~~~
tshiran
I think there's a tradeoff here. It's like MongoDB: hundreds of thousands of
downloads per month, because it enables people to build and deploy
applications faster. In many cases a relational database is still better,
though.

The nice attribute of Drill is that it works on schema-less data as well as
data with strong schemas. The user can make the tradeoff between agility and
'safety'.
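The "hidden bugs" concern and the agility/safety tradeoff can both be seen in a small Python sketch. This is an assumed toy model, not Drill's API: `select` is a hypothetical helper that projects columns from JSON lines and optionally checks them against a declared set of field names. Without a schema, a misspelled column silently reads as None; with one, the same mistake is rejected before any data is read.

```python
import json
from typing import Any, Iterable, Optional


def select(lines: Iterable[str], columns: list[str],
           schema: Optional[set[str]] = None) -> list[dict[str, Any]]:
    """Project columns from JSON lines, optionally validating them first.

    Hypothetical helper. schema=None means schema-less mode: unknown
    columns are not errors and simply read as None.
    """
    if schema is not None:
        unknown = [c for c in columns if c not in schema]
        if unknown:
            # Fail fast instead of returning a column full of None.
            raise ValueError(f"unknown column(s): {unknown}")
    rows = []
    for line in lines:
        if line.strip():
            record = json.loads(line)
            rows.append({c: record.get(c) for c in columns})
    return rows


data = ['{"user_id": 7, "amount": 12.5}', '{"user_id": 8, "amount": 3.0}']

# Schema-less: the typo "usr_id" is not an error, it just yields None.
print(select(data, ["usr_id", "amount"]))

# With a declared schema, the same typo fails fast.
try:
    select(data, ["usr_id", "amount"], schema={"user_id", "amount"})
except ValueError as err:
    print("rejected:", err)
```

The only difference between the two calls is whether the caller supplies a schema, which mirrors letting the user choose where to sit between agility and 'safety'.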

