
BigQuery: Required Reading List - isp
https://medium.com/@thetinot/bigquery-required-reading-list-71945444477b
======
filereaper
I'd really start with Google BigQuery Analytics by Jordan Tigani and Siddartha
Naidu. Jordan's one of the core developers and shows up at BigQuery- and
GCP-related conferences.

[1] [https://www.amazon.ca/Google-BigQuery-Analytics-Jordan-Tigani/dp/1118824822](https://www.amazon.ca/Google-BigQuery-Analytics-Jordan-Tigani/dp/1118824822)

~~~
vgt
Jordan has since become the engineering lead on BigQuery and Dremel!

The book is still very relevant, but it would be nice to get a refresh - many
things have changed. Perhaps the biggest one is the execution engine. Jordan
gave a nice talk on this in March:

[https://youtu.be/UueWySREWvk](https://youtu.be/UueWySREWvk)

------
isp
A collection of condensed, easy-to-read posts that I found helpful when
learning about BigQuery (BQ).

------
slap_shot
Great sources! Thanks.

I would love to know why BigQuery has a limit of 1,000 loads per day, per
project [0].

I'm a founder of an early-stage company that helps companies ETL their data
into various data warehouses, and I frequently meet companies that evaluated
BigQuery but ended up with Redshift or Snowflake because the limits on
inserts/updates/deletes are too low for their ETL process. Until these limits
are increased, BigQuery is not an alternative to Redshift or Snowflake for a
vast number of analytics teams that query thousands of tables that are updated
hourly.

[0]
[https://cloud.google.com/bigquery/quotas](https://cloud.google.com/bigquery/quotas)

~~~
vgt
I work at Google and authored some of those posts.

The limitation is per table [0] and applies to "batch loads" only. If you have
per-row inserts, you can always use BigQuery's amazing streaming API; you can
also do materialization and federated query loads.
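To make the streaming alternative concrete, here's a minimal stdlib-only sketch of the request body that the streaming endpoint (`tabledata.insertAll`) expects. The sample row fields are made up; in practice you'd use the official `google-cloud-bigquery` client (e.g. `Client.insert_rows_json`) rather than hand-building this.

```python
import json
import uuid

def build_insert_all_body(rows):
    """Build the JSON body for BigQuery's streaming insert endpoint
    (tabledata.insertAll). Each row carries an insertId, which BigQuery
    uses to de-duplicate retried requests on a best-effort basis."""
    return {
        "kind": "bigquery#tableDataInsertAllRequest",
        "rows": [
            {"insertId": str(uuid.uuid4()), "json": row}
            for row in rows
        ],
    }

# Hypothetical rows; there is no daily-load quota on this path.
body = build_insert_all_body([{"user": "alice", "clicks": 3}])
print(json.dumps(body, indent=2))
```

Because each request is just rows appended to the table, streaming sidesteps the per-table batch-load quota entirely.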

Batch limits are there to keep folks from creating too many small files, which
hurts performance (I believe Jordan's book explains this in greater detail).

Would love to know more about your use cases, since this type of thing
shouldn't be the deciding factor in choosing an analytics data warehouse and
should be easy to architect for. As someone pointed out, most quotas are
negotiable. If anything, BigQuery's ingest process is pretty amazing and
unique in the industry [1].

Apologies for brevity, I'm out of office on paternity leave.

[0][https://cloud.google.com/bigquery/quotas#import](https://cloud.google.com/bigquery/quotas#import)

[1][https://medium.com/google-cloud/paying-it-forward-how-bigquerys-data-ingest-breaks-tech-norms-8bfe2341f5eb](https://medium.com/google-cloud/paying-it-forward-how-bigquerys-data-ingest-breaks-tech-norms-8bfe2341f5eb)

~~~
slap_shot
All use cases were a firehose of raw data being pulled from a Kafka topic
every 15 seconds to 1 minute, written to storage, and inserted into 1,000+
tables.

~~~
vgt
Yeah, definitely the streaming API. This use case is MADE for BigQuery :)

Check out the blog posts on the required reading list on the topic of inserts
as well. There's also a pretty good Kafka-BigQuery connector.
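For the firehose pattern above, a consumer would typically micro-batch what it pulls from Kafka before streaming it in. A minimal sketch of the batching step (the ~500-row batch size follows the guidance commonly given in the streaming docs; the consumer loop and send call are hypothetical):

```python
def chunk_rows(rows, max_rows=500):
    """Split a firehose of rows into micro-batches for streaming inserts.
    BigQuery's streaming guidance suggests keeping insertAll requests to
    roughly 500 rows each, so a consumer polling Kafka every 15-60 seconds
    would send one request per chunk per destination table."""
    for i in range(0, len(rows), max_rows):
        yield rows[i:i + max_rows]

# Hypothetical usage: rows pulled from a Kafka poll, one insertAll per batch.
# for batch in chunk_rows(polled_rows):
#     client.insert_rows_json(table_id, batch)
```

With 1,000+ destination tables, each table just gets its own stream of small requests; there's no per-table daily quota to architect around on this path.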

It sounds like "ingest" was the big factor in your choice of a data warehouse.
Regardless, I think BigQuery has the best story in the industry here (but I'm
biased).

I'd help more hands-on, but sadly I'm not online for a while :)

------
hilti
Thank you for this great reading list! Currently I'm using Firebase for a
medium-sized BI web app, but I'm really interested in BigQuery for bigger
datasets.

