
Google BigQuery hits the gym to beef up even more - vgt
https://shinesolutions.com/2016/08/19/google-bigquery-hits-the-gym-to-beef-up-even-more/
======
vgt
Here is the blog describing the Capacitor storage engine (successor to
ColumnIO, which was the inspiration for Parquet):

[https://cloud.google.com/blog/big-data/2016/04/inside-capaci...](https://cloud.google.com/blog/big-data/2016/04/inside-capacitor-bigquerys-next-generation-columnar-storage-format)

Here's the blog post describing Google's in-memory execution (some of what the
new Dremel version is doing, including pipelined execution, dynamic in-memory
shuffling, dynamic partition sizing, etc):

[https://cloud.google.com/blog/big-data/2016/08/in-memory-que...](https://cloud.google.com/blog/big-data/2016/08/in-memory-query-execution-in-google-bigquery)

These are some of the more significant improvements since the Dremel paper:

[http://static.googleusercontent.com/media/research.google.co...](http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36632.pdf)

------
paulasmuth
If you care about not having your data locked in to Google's ecosystem, or
think BigQuery is too expensive, please consider giving the open-source
alternative EventQL [0] a try.

[0] [https://eventql.io](https://eventql.io)

~~~
vgt
Really excited to see the open source community around big data grow.

Three minor points on "locked-in" and "too expensive":

\- BigQuery now has fully standard SQL and allows for easy exports of ALL your
data out of BigQuery

\- BigQuery has a Free Tier

\- BigQuery is a vast multi-tenant cluster, and you pay for it per-job. Folks
who are spending just a couple of dollars a month get to experience running
SQL on a HUGE cluster for a few seconds at a time.

Broadly, it's very exciting to witness the emergence of open source
distributed data processing technologies (Apache Druid and Flink ftw!).
BigQuery stores data and runs SQL well, yes, but it is also a serverless,
fully managed service with a pay-only-for-what-you-consume pricing structure,
seamless downtime-free upgrades and maintenance, encryption, high
availability, durability, and compliance, and it is proven at scale to XXX PB.

