
Tools to Transform and Query Data with Apache Drill and dplyr in R - bsg75
https://github.com/hrbrmstr/sergeant
======
bsg75
> Drill + sergeant is (IMO) a nice alternative to Spark + sparklyr if you
> don't need the ML components of Spark (i.e. just need to query "big data"
> sources, need to interface with parquet, need to combine disparate data
> source types: json, csv, parquet, rdbms, for aggregation, etc).

I am really surprised Apache Drill isn't gaining more attention and traction,
for reasons like those above.

> I find writing SQL queries to parquet files with Drill on a local 64GB Linux
> workstation to be more performant than doing the data ingestion work with R
> (for large or disparate data sets). I also work with many tiny JSON files on
> a daily basis and Drill makes it much easier to do so. YMMV.

Using Drill both on a local machine and on a 4-node cluster, I find it very
convenient and fast, and it exposes an API that many people and tools are
already familiar with: ANSI SQL.
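To illustrate the "just SQL" point, here is a minimal sketch of submitting a query to Drill over its REST interface (a JSON POST to `/query.json` with `queryType` and `query` fields, which is also how sergeant talks to Drill). It assumes a Drill instance with its web server on the default port 8047 at `localhost`; the file paths, column names, and the helper names `build_drill_request` / `run_drill_query` are hypothetical and only for illustration.

```python
import json
import urllib.request

# Assumption: Drill's web/REST server is running locally on the default port.
DRILL_URL = "http://localhost:8047/query.json"


def build_drill_request(sql: str, url: str = DRILL_URL) -> urllib.request.Request:
    """Build a POST request carrying a SQL query for Drill's REST endpoint."""
    payload = json.dumps({"queryType": "SQL", "query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_drill_query(sql: str, url: str = DRILL_URL) -> list:
    """Send the query and return the "rows" list from Drill's JSON response."""
    with urllib.request.urlopen(build_drill_request(sql, url)) as resp:
        body = json.load(resp)
    return body.get("rows", [])


# Hypothetical paths and columns: join a parquet file with a directory of
# small JSON files, each addressed through Drill's `dfs` storage plugin.
EXAMPLE_SQL = """
SELECT p.id, p.amount, j.category
FROM dfs.`/data/sales.parquet` p
JOIN dfs.`/data/events/` j ON p.id = j.id
LIMIT 10
"""
```

With a running Drill instance, `run_drill_query(EXAMPLE_SQL)` would return the result rows as a list of dicts; nothing here is specific to R, which is part of the appeal of a plain SQL interface.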

