
Joy and Pain of Using Google BigTable - jonomacd
https://syslog.ravelin.com/the-joy-and-pain-of-using-google-bigtable-4210604c75be
======
cortesi
This matches my experience with BigTable, down to the short-duration failure
spikes.

I feel that something should be said on the plus side of the ledger here. I'm
the solo founder of a company that indexes huge amounts of fine-grained
information. Bigtable is the key technology that let me start my company on my
own: it soaks up all the data we can throw at it, with almost zero
maintenance. Even within the stable of GCP technologies it stands out as being
particularly reliable.

My biggest "problem" with BigTable is the lack of public information on schema
design - which in this context is mostly the art of designing key structures
to solve specific problems. I've come up with sensible strategies, but much of
it was far from obvious. I can't help but feel that there should be a body of
prior art I could draw on.
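
To make this concrete, here is a minimal Go sketch of the two strategies I
mean (the names and key layouts are illustrative, not from any official
guide): promoting query fields into the row key so a prefix scan answers a
common question, and salting a monotonically increasing timestamp so
sequential writes don't all land on one tablet.

    package main

    import (
        "fmt"
        "hash/fnv"
    )

    // promotedKey embeds the query dimensions in the row key, so a single
    // prefix scan answers "all events of one type for one user".
    func promotedKey(userID, eventType string) string {
        return fmt.Sprintf("user#%s#event#%s", userID, eventType)
    }

    // saltedKey prefixes a small hash bucket so monotonically increasing
    // timestamps spread across tablets instead of piling onto the last one.
    func saltedKey(userID string, tsMillis int64, buckets uint32) string {
        h := fnv.New32a()
        h.Write([]byte(userID))
        return fmt.Sprintf("%02d#%s#%013d", h.Sum32()%buckets, userID, tsMillis)
    }

    func main() {
        fmt.Println(promotedKey("42", "login"))        // user#42#event#login
        fmt.Println(saltedKey("42", 1546300800000, 8)) // e.g. 05#42#1546300800000
    }

The trade-off with salting is that a time-range scan for one user now needs
one scan per bucket, which is exactly the kind of guidance I wish were
written down somewhere canonical.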

~~~
rrdharan
Disclosure: I work on Google Cloud Bigtable.

You might find this talk from a recent Google Cloud event useful in this
regard:

Visualizing Cloud Bigtable Access Patterns at Twitter for Optimizing Analytics
(Cloud Next '18)
https://www.youtube.com/watch?v=3QHGhnHx5HQ

------
yongjik
Random inside joke I overheard about ten years ago:

"It's called BigTable, not FastTable or AvailableTable!"

...It's probably a _bad_ idea to evaluate 2019's BigTable based on the joke,
but my puerile mind still finds it amusing. :)

------
fastest963
We are a user of BigTable, at 30k writes/sec and 300k reads/sec. Compared
to the other managed services (Pub/Sub, Memorystore, etc.), it has been the
most stable by far, but we have to scale up our node count at times when we
don't think we should have to (based on the performance described in the
docs), and we also see the latency/errors described in the article. They
also added storage caps based on node count last year, which increased our
costs dramatically.

The Key Visualizer has been a huge help, but there still aren't enough
metrics and tooling to understand when things do go wrong or what is
happening behind the scenes. Cost has prevented us from doing any sort of
replication, so luckily we have a cache sitting in front of Bigtable for
reads that lets us absorb most of the intermittent issues described.
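
The shape of that cache is roughly the following Go sketch (a deliberately
simplified in-process version; all names here are made up, and a real
deployment would likely use an external cache): serve fresh entries from
memory, fall through to Bigtable otherwise, and serve stale data when the
backend read errors, which is what absorbs the blips.

    package cache

    import (
        "sync"
        "time"
    )

    type entry struct {
        value   []byte
        expires time.Time
    }

    // ReadThrough caches successful reads and serves stale entries when
    // the backend read fails, absorbing short error spikes.
    type ReadThrough struct {
        mu     sync.Mutex
        ttl    time.Duration
        items  map[string]entry
        lookup func(key string) ([]byte, error) // e.g. wraps a Bigtable read
    }

    func New(ttl time.Duration, lookup func(string) ([]byte, error)) *ReadThrough {
        return &ReadThrough{ttl: ttl, items: map[string]entry{}, lookup: lookup}
    }

    func (c *ReadThrough) Get(key string) ([]byte, error) {
        c.mu.Lock()
        e, ok := c.items[key]
        c.mu.Unlock()
        if ok && time.Now().Before(e.expires) {
            return e.value, nil // fresh hit
        }
        v, err := c.lookup(key)
        if err != nil {
            if ok {
                return e.value, nil // serve stale on backend error
            }
            return nil, err
        }
        c.mu.Lock()
        c.items[key] = entry{value: v, expires: time.Now().Add(c.ttl)}
        c.mu.Unlock()
        return v, nil
    }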

~~~
jonomacd
Interestingly, for us scaling up doesn't really solve the short-term
unavailability. It seems only somewhat related to load: it does hit more
often at high-traffic times, but we have also seen it at low-traffic times.

Putting in that cache is a great move. Caching is challenging for us, as
our reads hit a very wide range of keys.

------
HenryBemis
Reading the article, the following quote caught my attention: "you should
always keep things simple even if your tools allow for more complex
patterns".

I follow the "it is perfect when there is nothing left to remove" rule in
most systems/processes/functions/tasks in life (not only IT systems). I am
happy to see that in this cluttered space called IT there are many
like-minded people who understand that too much is TOO much.

~~~
SEJeff
Obligatory, "Does this function bring you joy?"

~~~
codeisawesome
It's a beautiful thing that this comment exists :-D

------
radsftw694
As another (former) user of Cloud Bigtable (we migrated from Cassandra), we
saw almost identical results: great performance when it works, but regular
periods of unavailability (this was around 2-3 years ago at this point).
Interesting to hear that they still have the same problems. We had a
similar experience spending time with the Cloud Bigtable team, but they
never really got to the bottom of it.

~~~
jonomacd
Yeah, it was quite frustrating trying to figure out what was going on.
Until replication was released, that made it a real non-starter for a lot
of use cases. With replication you can combat the problem, and it does give
you great performance (when it isn't giving you random errors).

~~~
puzzle
When I was at Google, you were not supposed to serve end users straight out
of BigTable. You had to do extra work: request hedging against multiple
replicas (Jeff Dean has brought this up in public many times, with numbers
on the long-tail latency impact), some in-memory caching if appropriate,
etc. In other words, you had to protect end users from Bigtable: after all,
its original target was the web crawling and indexing pipelines. The
problem is that, as you both say, it works very well the vast majority of
the time, so people tended to get spoiled and/or make assumptions. Which is
why Spanner was created.
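
The hedging pattern itself is simple enough to sketch in Go (this is the
general technique, not Google-internal code; the replica lookup functions
are placeholders): fire the primary read immediately, fire a backup read
after a short delay, and take whichever succeeds first.

    package hedge

    import (
        "context"
        "time"
    )

    // Get fires the primary lookup immediately and a hedged duplicate after
    // hedgeDelay, returning the first successful result. Cancelling the
    // context abandons whichever request loses the race.
    func Get(ctx context.Context, key string, hedgeDelay time.Duration,
        replicas [2]func(context.Context, string) ([]byte, error)) ([]byte, error) {

        ctx, cancel := context.WithCancel(ctx)
        defer cancel()

        type result struct {
            v   []byte
            err error
        }
        ch := make(chan result, 2) // buffered so the loser never blocks

        go func() {
            v, err := replicas[0](ctx, key)
            ch <- result{v, err}
        }()
        go func() {
            select {
            case <-time.After(hedgeDelay):
            case <-ctx.Done():
                ch <- result{nil, ctx.Err()}
                return
            }
            v, err := replicas[1](ctx, key)
            ch <- result{v, err}
        }()

        var firstErr error
        for i := 0; i < 2; i++ {
            r := <-ch
            if r.err == nil {
                return r.v, nil
            }
            if firstErr == nil {
                firstErr = r.err
            }
        }
        return nil, firstErr
    }

Sending the hedge only after a delay, rather than duplicating every
request, keeps the extra load small, since most requests finish before the
hedge ever fires.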

As for those hiccups: unless they last for minutes or hours, in which case
you might have a case of data corruption (BT is paranoid and rereads data
right after any kind of compaction), most of them can be explained by one
of the following, in approximately increasing order of badness:

- an orderly tablet server restart, e.g. for a binary update or because a
Borg machine is undergoing a kernel update

- a tablet server crash, either software or hardware (this is bad, because
a timeout needs to expire before a new server can take over the shard; the
BT paper has details on the recovery protocol)

- heavy load on the master while either of the previous two is happening

- I don't think any of the various types of compactions would normally
block reads/writes, but with some abnormal traffic patterns you might be
able to make the tablet server suffer

- slowness at the lower layer, GFS/Colossus (although BT mitigates this a
bit by having two separate log files into which it can write)

- a Chubby outage

- a power outage affecting a good chunk of, or the entire, cluster

~~~
jonomacd
This is great feedback, thanks. We have heard the same thing, particularly
about using multiple clusters. They only started offering that last year
though!

------
abalone
Worth noting that his original reason for moving away from DynamoDB is
outdated: DynamoDB added an “adaptive capacity” feature to handle hot
partitions.[1]

[1] https://aws.amazon.com/blogs/database/how-amazon-dynamodb-adaptive-capacity-accommodates-uneven-data-access-patterns-or-why-what-you-know-about-dynamodb-might-be-outdated/

~~~
ses1984
It's still expensive though, right?

~~~
abalone
Compared to what? You'll have to be more specific.

Adaptive capacity definitely targets the primary reason people had to
overprovision DynamoDB. It changes the entire calculation and obsoletes any
advice you might have heard based on experience prior to late 2017.

------
hcnews
"Unfortunately, we do multiple operations on Bigtable in one request to our
api and we rely on strong consistency between those operations."

I feel like "strong consistency" is misused here. Strong consistency is
relevant only in a distributed environment, and it's usually achieved by
running Paxos/Raft between the replicas. Bigtable has only ever had
best-effort replication, so I am not sure why it's being mentioned here. I
think they are looking for the term "serial": their queries have to be
executed in a specific order for a particular user request.

~~~
jonomacd
Bigtable is a distributed database. It does best-effort replication across
regions, hence eventual consistency. But within a region it is still
distributed across nodes, and there it provides strong consistency.

------
draw_down
I really, really, really hate unexplained problems like the one described
here. Not just in storage, but in any facet of computing. It's true that
the systems we build and work on are complex, but they are also ultimately
deterministic: when something goes wrong the way TFA describes, there is a
reason for it. Ideally we would seek to understand our systems before
continuing to add features to them, but of course the real world often
doesn't work that way.

This would be a super frustrating situation for me, particularly when you're
not given the tools you need to diagnose in the first place, _and_ you loop in
support but they still can't help you identify what's wrong.

Years ago, I worked on a .NET system that sometimes would respond super slowly
and we didn't have a concrete explanation for why. As in TFA, we developed a
kind of religion about it. "Oh, it must be JITting", that sort of stuff.

