Hacker News
Ask HN: Who is using bigtable and what's your experience with it?
16 points by xstartup on March 30, 2018 | 6 comments

My company currently uses Bigtable in production, and it is really great, since throughput scales linearly with the number of nodes. That holds as long as you get the row key design and storage paradigms correct.

Lookups are super fast (they can be as quick as something like Redis, which stores data in memory). Row ranges by prefix are how you efficiently scan your data. Full table scans, while not necessarily frowned upon, are something you should most likely try to avoid, especially on high-write-throughput systems, since they will eat up your IOPS.
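To make the prefix-scan point concrete, here is a minimal sketch of how a prefix scan reduces to a range scan over lexicographically sorted keys. It uses a plain sorted Python list as a stand-in for a table, not the Bigtable client; the key layout (`sensor1#2018-03-30`) and the helper names `prefix_end` and `scan_prefix` are made up for illustration.

```python
from bisect import bisect_left

def prefix_end(prefix: bytes) -> bytes:
    """Smallest key strictly greater than every key that starts with prefix.

    Returns b"" when no such key exists (empty or all-0xff prefix),
    which the caller treats as "no upper bound".
    """
    trimmed = prefix.rstrip(b"\xff")
    if not trimmed:
        return b""
    return trimmed[:-1] + bytes([trimmed[-1] + 1])

def scan_prefix(sorted_keys, prefix: bytes):
    """Return all keys starting with prefix, touching only that key range."""
    start = bisect_left(sorted_keys, prefix)
    end_key = prefix_end(prefix)
    end = bisect_left(sorted_keys, end_key) if end_key else len(sorted_keys)
    return sorted_keys[start:end]

# Hypothetical row keys; a real table keeps them sorted for you.
keys = sorted([
    b"sensor1#2018-03-29",
    b"sensor1#2018-03-30",
    b"sensor10#2018-03-30",
    b"sensor2#2018-03-30",
])

# Including the "#" delimiter in the prefix keeps sensor10 out of the result.
rows = scan_prefix(keys, b"sensor1#")
```

The scan only visits the contiguous slice between the prefix and its successor key, which is why a well-chosen row key prefix is so much cheaper than a full table scan.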

Does it also support aggregation, or do you end up doing that on the client side? Are there operational issues, or is it pretty much hands-off? Auto-scaling, provisioning, etc.

Operational tasks are on you. There are scripts out there that help with auto-scaling the cluster, but scaling itself puts load on the cluster, because when scaling down it has to do a lot of work underneath to rearrange splits. I would only recommend scaling up.
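As a rough sketch of what a "scale up only" policy could look like in such a script: the function below only decides a target node count and never lowers it. The CPU threshold, step size, and node cap are assumed values for illustration, and `target_nodes` is a hypothetical helper, not part of any Google tooling. Actually resizing the cluster is left out.

```python
def target_nodes(current_nodes: int, cpu_utilization: float,
                 scale_up_threshold: float = 0.7,  # assumed threshold
                 step: int = 1,                    # assumed step size
                 max_nodes: int = 30) -> int:      # assumed cap
    """Return the node count to resize to; never below current_nodes."""
    if cpu_utilization > scale_up_threshold:
        return min(current_nodes + step, max_nodes)
    # Low load is deliberately ignored: scaling down is a manual decision.
    return current_nodes
```

Scaling down is then something you do by hand at a quiet time, rather than letting the script trigger the split rearrangement under load.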

All aggregations are done outside the DB, so yes, your client would most likely have to implement the rollup logic.
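A minimal sketch of that kind of client-side rollup, assuming the rows have already been read back as `(row_key, value)` pairs and that row keys look like `device#YYYY-MM-DD#HH` (both are assumptions made for this example, not something Bigtable prescribes):

```python
from collections import defaultdict

def rollup_by_day(rows):
    """Sum values per (device, day) from (row_key, value) pairs.

    Assumes hypothetical row keys of the form 'device#YYYY-MM-DD#HH'.
    """
    totals = defaultdict(float)
    for row_key, value in rows:
        device, day, _hour = row_key.split("#")
        totals[(device, day)] += value
    return dict(totals)

# Stand-in for the result of a prefix scan.
scanned = [
    ("sensor1#2018-03-30#00", 1.5),
    ("sensor1#2018-03-30#01", 2.5),
    ("sensor2#2018-03-30#00", 4.0),
]
daily = rollup_by_day(scanned)
```

Because the hourly rows for one device and day share a key prefix, the scan feeding the rollup stays a narrow range read.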

How do you think it compares to Google Cloud Datastore?

Former GCP support here. Bigtable and Cloud Datastore (or the newer, shinier Firestore) are very different, and meant for different purposes.

Bigtable is meant for wide-column data at high volume. If your data can be organized into simple rows and columns, and you plan on using massive amounts of it at high throughput (think IoT transactions, for example), then Bigtable is the right choice.

Datastore, on the other hand, is for semi-structured data, with parent/child hierarchies, key-value pairs, etc. It isn't run on a cluster of nodes like Bigtable, but is managed behind the scenes as part of App Engine. It is slower than Bigtable, but is more sophisticated, and offers client libraries for ORMs (ndb), SQL-like queries, and the like.

There's a brief comparison chart here: https://cloud.google.com/storage-options/

I also highly recommend the Google cloud data engineering course at Coursera: https://www.coursera.org/specializations/gcp-data-machine-le...

Or the instructor's book, "Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning"


Datastore is a slightly different beast: they handle all the underlying node work for you, so you are not scaling a cluster yourself. They also provide a SQL-like interface for querying your data, which in Bigtable would be entirely up to your own implementation. There are some logical boundaries: `namespaces` (essentially tables), and then the `kind` for further selection.

When you query Bigtable, you can provide column filters and various other filters to control what data is returned or truncated. The problem is that most of the filters at the Bigtable level are regex-based and can be resource-intensive. The better approach with Bigtable is to design a table (row key) for each type of query you want to ask the DB.
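A small sketch of the "one table per query" idea: the same event gets written under a different row key in each table, so each question becomes a prefix scan instead of a regex filter. The table names, field names, and key layouts here are all hypothetical, and the actual writes are omitted.

```python
def key_by_device(device_id: str, ts: str) -> str:
    """Row key for 'everything from one device, in time order'."""
    return f"{device_id}#{ts}"

def key_by_type(event_type: str, ts: str, device_id: str) -> str:
    """Row key for 'all events of one type, in time order'."""
    return f"{event_type}#{ts}#{device_id}"

def row_keys_for(event: dict) -> dict:
    """Map each (hypothetical) table name to the row key to write there."""
    return {
        "events_by_device": key_by_device(event["device_id"], event["ts"]),
        "events_by_type": key_by_type(event["type"], event["ts"], event["device_id"]),
    }

# ISO-8601 UTC timestamps sort lexicographically in time order.
event = {"device_id": "sensor1", "type": "overheat", "ts": "2018-03-30T12:00:00Z"}
keys = row_keys_for(event)
```

The trade-off is extra writes and storage per event in exchange for reads that never need a server-side regex filter.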
