
Announcing Google Cloud Bigtable - suprgeek
http://googlecloudplatform.blogspot.com/2015/05/introducing-Google-Cloud-Bigtable.html
======
mataug
I've been quite frustrated with the Google Cloud Platform. Just take a look at
their APIs for AppEngine, CloudSQL, GCE and the like. They're pathetic compared
to those of their direct competitor, AWS.

Let's compare creating an RDS instance on AWS vs. creating one on
CloudSQL.

AWS:

1\. Get AccessToken and AccessSecret from IAM

2\. pip install boto

3\. conn = boto.rds.connect_to_region("us-west-2")

4\. db = conn.create_dbinstance("db-master-1", 10, 'db.m1.small', 'root',
'hunter2')

Done !

Google

1\. Get a Client Id and Client Secret

2\. pip install google-api-python-client

3\. Go through the OAuth Flow and run a server locally to capture the access
token

4\. Use the discovery api to generate a service object. Good Luck finding this
in the documentation

5\. use the uninspectable service object to create a cloudsql instance.

The reason I don't have code for steps 3, 4 and 5 is that I gave up after
wasting time trying to figure this out.

My point is that they've gotten into the habit of doing half-assed work, so I
have no hope that they've improved this time. There is practically no way to
automate this. The only way to use it would be from the horribly slow GUIs that
Google provides.

EDIT:

I ended up using google cloud sdk cli and running the automations with
subprocess.check_output(['gcloud', 'sql', 'instance' ... ])
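For anyone taking the same route, here is a minimal sketch of that kind of wrapper. It assumes the `gcloud` CLI is installed and already authenticated; the helper names `build_gcloud_command` and `run_gcloud` are made up for illustration, and the arguments in the usage comment are only an example, not the exact command from the edit above.

```python
import subprocess


def build_gcloud_command(*args):
    """Return the argv list for a gcloud invocation (hypothetical helper)."""
    return ["gcloud"] + [str(arg) for arg in args]


def run_gcloud(*args, runner=subprocess.check_output):
    """Run gcloud with the given arguments and return its stdout as text.

    `runner` defaults to subprocess.check_output, which raises
    CalledProcessError on a non-zero exit code. It is injectable so the
    wrapper can be exercised without gcloud being installed.
    """
    output = runner(build_gcloud_command(*args))
    return output.decode("utf-8") if isinstance(output, bytes) else output


# Example call (assumed arguments, requires a configured gcloud):
#   print(run_gcloud("sql", "instances", "list"))
```

Passing the command as a list rather than a single shell string avoids quoting problems when instance names or flags contain spaces.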

~~~
forgottenacc56
You missed the snakes and ladders game of following Google's out-of-date
documentation.

"You're doing it wrong, we deprecated that last year, I know we haven't yet
updated the 4 separate places in which the same thing is documented all in a
different context and different way. We really should fix that but we're busy
coding. Did you know we're all PhDs at Google?"

What?!?!? You're using the Google console version 3? Why? That feature isn't
implemented there. Stupid you. You're MEANT to be using our new Google console
version X. Why are you using our old one?

Also you missed the 16 hours you'll spend trying to work out why something
isn't working, only to find it's actually been changed or taken out of the
Google developer console, without any trace or notice left in the code to say
'we moved/removed the feature that you expected to be here.'

Really you should ask Google for help on StackOverflow, which is now the
official Google channel for ignoring support questions, and where your
question will within seconds be downvoted, derided and deleted by the
StackOverflow community, saving Google the effort of not reading and ignoring
your question about how to resolve the catastrophic failure of your software.

Seriously though, why not entrust your critical systems to such capable hands?

~~~
jjjjoe
(Google Cloud Support here)

The across-the-board guidance to go to StackOverflow is not working, and we
get that. It's not just that system administration questions get downvoted on
StackOverflow (which I think is the parent's point) but that StackExchange
isn't good for general discussion.

We (support) are trying to be much more clear that free support is not
"StackOverflow or nothing." We recently updated our "community" support page
at
[https://support.google.com/cloud/answer/3466163](https://support.google.com/cloud/answer/3466163)
to lay out all of the community support options in one place.

Bottom line: everything listed on that page has Googlers actively
participating. This includes groups, StackExchange, and issue trackers.
There's plenty of room for improvement, but the intent is not to ignore you.

~~~
hoodoof
Google should have a central support forum and try to focus all the questions
into one place.

StackOverflow should not be on your support list at all, for the reasons
raised. Having said that, Google should still be on SO answering questions
that end up there.

What Google needs to understand is that the long-standing public reputation
it has built is that of an organisation that actively tries to avoid providing
support. Ever since Google started, it has tried to avoid support. That is now
the reputation that Google carries into its efforts to woo the developer
community.

Google has to be extraordinarily good at developer support to dispel the
baseline assumption that developers have that Google really (genuinely) wants
to avoid dealing with support questions.

There's a level of cluelessness to Google's support strategy that is
concerning. Why, for goodness sake, would it EVER look like a good idea to
push support to StackOverflow? Who is doing the thinking behind that sort of
decision? It is self evident that Google's support interests and StackOverflow
are not the same thing. It's the sort of decision made without really
considering the detail, and that is the point about Google's support - it's an
afterthought. Kind of like washing the dishes after dinner is eaten - has to
be done but we're not enthused about it.

And in the end, lack of support is a showstopper for using a cloud computing
platform. If the support looks sketchy then it just isn't worth risking your
business by using that platform.

~~~
jjjjoe
You make great points. I'm new and can't really speak to your rhetorical
question of "what were they thinking?!?" but I do hope for my own job security
that support isn't an afterthought :-)

First, I should point out that what we are talking about here is Google's
"Bronze Support." This is similar to Amazon's "Basic Support." In both cases
it's not what you should be thinking about if you need a case response time
SLA, or the ability to wake up engineers at midnight. If your business depends
on _any_ platform provider I really hope you buy a support plan which gets you
the ability to talk to support and engineering whenever you need. Google
definitely offers these. They start at $150 per month.
([https://cloud.google.com/support/](https://cloud.google.com/support/)) End
plug.

On StackOverflow: let's stop calling it "support." It's a Q&A site with good
SEO. If you have a question which "belongs" there it's a fine place to ask.
We're moderating our "go to StackOverflow no matter what" messaging, but I
can't see tossing it completely.

Anyway, when it comes to free support, I partially agree with your point about
a single forum, in that it can be confusing. Again, keep an eye on
[https://support.google.com/cloud/answer/3466163?hl=en&ref_to...](https://support.google.com/cloud/answer/3466163?hl=en&ref_topic=3340599)

To your point about a single forum for everything, I respectfully disagree.
Free-form discussion is different from bug tracking. Highly structured Q&A has
a place too. It seems to me that every product should have

\- a discussion forum where users can discuss the product and raise issues

\- an issue tracker to collect bug reports and feature requests

\- a designated place for Q&A

On each of those three, support staff and (preferably) engineers should
participate daily. I think the situation with Compute Engine is closest to our
ideal right now: it has a lively Google Group, actively triaged issue tracker
and a sponsored tag on ServerFault (which I hope we can agree is a better
destination than StackOverflow)

~~~
nulltype
While StackOverflow has a number of flaws, it is pretty good for finding
things that I'm looking for. I've definitely found posts from the BigQuery
team on there answering my BigQuery questions.

As for Google's support plan thing, it might be a little odd that it costs
$150 per month, but if you're spending any significant money on GCP, it's well
worth it. The support response times even at the lowest level are pretty good
and they sometimes fix bugs.

Of course, I think if Google employees use GCP internally, it will improve at
a much faster rate.

------
justinsb
I think using the HBase API is a very clever move. This means that the HBase
API is now supported on AWS (EMR), GCE, VMWare (Serengeti), OpenStack
(Sahara), and everywhere (Hadoop, if you're willing to run it yourself).

In comparing against DynamoDB (for example), you'll have to weigh a
proprietary single-vendor API against an API with a good open-source
implementation (that will get even better with hydrabase), yet that is also
available in managed-form on all major clouds.

Edit: although - ouch - the $1500 per month entry price-point does not compare
well to DynamoDB's $5 per month minimum.

~~~
turingbook
Where does $1500 per month come from? I cannot find it on the pricing page.

~~~
dudus
Cost per node per hour - $0.65; Minimum number of nodes per cluster - 3

0.65 x 3 x 24 x 30 = $1404 / month

And that's before any storage costs.
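The same arithmetic as a small sketch, in case you want to plug in your own cluster size. The node price and three-node minimum are the figures above; the SSD price is the $0.17 per GB per month quoted elsewhere in this thread. The function name and the 30-day month are assumptions for illustration, and network charges are not modelled.

```python
from decimal import Decimal

# Launch prices quoted in this thread (USD).
NODE_PRICE_PER_HOUR = Decimal("0.65")
MIN_NODES = 3
SSD_PRICE_PER_GB_MONTH = Decimal("0.17")


def monthly_cost(nodes=MIN_NODES, ssd_gb=0, hours_per_day=24, days=30):
    """Estimate a monthly bill: node-hours plus SSD storage.

    Hypothetical helper; assumes a 30-day month by default and ignores
    network I/O charges.
    """
    if nodes < MIN_NODES:
        raise ValueError("a cluster needs at least %d nodes" % MIN_NODES)
    node_cost = NODE_PRICE_PER_HOUR * nodes * hours_per_day * days
    storage_cost = SSD_PRICE_PER_GB_MONTH * ssd_gb
    return node_cost + storage_cost


print(monthly_cost())             # minimum cluster, no storage
print(monthly_cost(ssd_gb=1000))  # minimum cluster plus 1000 GB of SSD
```

`Decimal` is used instead of floats so the result matches the hand calculation exactly rather than picking up binary rounding noise.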

~~~
turingbook
Thanks!

------
obulpathi
Pretty impressed with the performance metrics: Reads/Writes 6ms@99% compared
to Cassandra 300ms for read and 10 ms for write.

------
wiradikusuma
How is it different than Datastore?
[https://cloud.google.com/datastore/](https://cloud.google.com/datastore/)

~~~
Goranek
Datastore is a copy of Google's Megastore service. It has indexes, SQL-like
queries, transactions... and you don't need to run servers as you do with
BigTable (you pay only for documents and API calls).

~~~
SjuulJanssen
Why would I want to manage instances?

~~~
vgt
I don't believe you need to manage anything with BigTable. "Instances" is a
concept to describe iterations of scale only.

------
bbromhead
So their benchmark of Cassandra against BigTable doesn't even match their
previous benchmark of Cassandra.

[http://googlecloudplatform.blogspot.com/2014/03/cassandra-hi...](http://googlecloudplatform.blogspot.com/2014/03/cassandra-hits-one-million-writes-per-second-on-google-compute-engine.html)

How did the latency for Cassandra on their cloud platform increase by 200ms
from a year ago?

~~~
ivansmf
I wrote last year's benchmark. The clusters are completely different, and so
is the workload. Last year's cluster had 300 VMs, which was a much higher
price point, and the workload was write-only. This benchmark uses YCSB
workloads A and B, which we thought matched the usage we'll have on BigTable.
The cluster is much smaller as well. I shared my scripts from last year; it is
pretty easy (although a bit expensive) to repro the numbers. Let me check if
we can share this year's benchmark scripts as well.

~~~
bbromhead
I'm pretty surprised about the difference in latency, though; throughput, as
you say, will be different due to the number of nodes.

For any given replication factor in Cassandra, overhead remains pretty much
the same irrespective of whether you have 300 or 3 nodes. So should the
latency.

On top of that both BigTable and Cassandra use SSTables to store the data on
disk (with all the compactiony goodness that goes with them), so I'm even more
surprised that the difference in latency is so huge.

Would love to see the scripts for the benchmarks! I don't want to take away
from a great product launch and I'm sure BigTable kicks arse in certain areas
that Cassandra doesn't... I'm just surprised at the differences in latency.

------
StevePerkins
Any information on pricing? I doubt they'd have specific prices ready to
announce yet, but it would be good to at least know the DIMENSIONS by which it
will be priced (e.g. per read and write, storage, etc?). Will it be accessible
to "classic" App Engine front-end instances, or only meant for Compute Engine
VM's and "App Engine 2.0" Managed VM's?

The biggest pain point with the current Datastore is how difficult it can be
to predict your costs. Also, there are weird quirks in the pricing model (e.g.
"writes" used to cost more than "reads", it's more expensive to delete rows
than it is to flag them as tombstoned and continue storing them indefinitely,
etc). These quirks have left people with a lot of technical debt from having
designed around them.

If this is another database option (alongside the Datastore and CloudSQL) for
"classic" App Engine apps, which aren't likely to be re-written for Managed
VM's, then it might be interesting. However, if it's only for Compute Engine
or Managed VM contexts, where you're _not_ locked-in and are free to choose
any technologies you want, then at this point I would need to hear some pretty
amazing information on the pricing model before I could be bothered to even
test it out. Google lock-in is _painful_... once you've gone through the
trouble of breaking free from the App Engine jail, it's really difficult to
even consider adding new lock-in dependencies.

EDIT: Doh. You have to click through a couple of links from the original post
to find it, but they have indeed posted pricing specifics already.

[https://cloud.google.com/bigtable/#pricing](https://cloud.google.com/bigtable/#pricing)

Looks like it's priced by the number of VM nodes you want in your cluster,
storage, and network I/O if you're using it from outside Google's datacenters.
No metered pricing on "read ops" and "write ops". This model IS a significant
improvement over classic Datastore pricing. Unfortunately, it doesn't look
like you can use it as a Datastore-replacement on classic App Engine front-end
instances... and I'm not sure that I wouldn't just use Cassandra in other
contexts where I have complete control.

~~~
B-Scan
Cost per node per hour - $0.65; Minimum number of nodes per cluster - 3; SSD
storage (GB/mo) - $0.17; HDD storage (GB/mo) (coming soon) - $0.026; Source:
[https://cloud.google.com/bigtable/](https://cloud.google.com/bigtable/)

------
pdevr
"To help get you started quickly, we have assembled a service partner
ecosystem to enable a diverse and expanding set of Cloud Bigtable use cases
for our customers. "

Any idea how the service partners were chosen?

------
eva1984
Is this like a direct competitor to DynamoDB? How about open-source solutions,
like Cassandra/HBase?

~~~
Blackthorn
Bigtable is the original Cassandra/HBase.

------
sagivo
Funny that they call it "open source" just because it supports another
open-source API.

------
nivertech
It would be interesting if there were a Cloud Bigtable to BigQuery connector,
possibly using Cloud Dataflow.

~~~
michaelwsherman
Currently you can use the BigQuery Hadoop connector and write a MapReduce job
to scan Bigtable and write everything to BigQuery. Works quickly. I'm sure
dataflow support is in the works, since Google internally doesn't really use
MR and therefore likely has this on the back end already.

Source--I wrote one of the whitepapers on the BigTable homepage.

------
rplnt
Anyone willing to be dependent on this is honestly stupid when you take into
account Google's history in this area: unreliability, changes in offerings,
changes in pricing, discontinuations of services, hard lock-in, bad customer
service, ...

~~~
jpatokal
As the blog post says, Bigtable internally runs virtually all of Google's big
services. This means it's rock solid, and it's not about to get discontinued
anytime soon.

~~~
BinaryIdiot
> Bigtable internally runs virtually all of Google's big services

Are you sure? I'm not a Googler but have been told by other Googlers that
Bigtable has essentially been replaced internally (though heard it's still
similar). So I wasn't sure how much Bigtable is even used anymore inside of
Google.

~~~
jpatokal
The blog post says "the same database that drives nearly all of Google’s
largest applications", and I work at Google, so yes, I'm pretty sure ;)

Of course Google has a whole slew of other storage options optimized for
various use cases, but some of these are actually built on top of Bigtable.

~~~
BinaryIdiot
Haha okay fair enough. Thanks!

------
EugeneOZ
All is cool except one thing: it's vendor lock-in, and the vendor is known for
an absence of customer support, frequent API deprecations and product
shutdowns.

------
3lux
does anyone know of a Go client?

~~~
skj
[https://github.com/google/google-api-go-client](https://github.com/google/google-api-go-client)
is the code-gen Go client for Google APIs.

It probably does not (yet) have the generated client for cloud bigtable
checked in (but I'm sure it will), but you can always use it to generate a
client. You pass it the API to use on the command line, it will go fetch the
docs it needs to make your client, and put its source where you tell it to.

~~~
kevinschumacher
This uses the HBase API. You just connect to it like you would any other HBase
cluster, using an HBase client. It's not like e.g., BigQuery or Datastore
where you need the API client. You include a JAR and then connect to HBase
like normal.

~~~
saurik
The website claims that you must use their customized version of the Java
HBase client library: it does not claim it is network compatible, and seems to
state it is API compatible with the Java API (but then describes numerous
subtle differences).

> To access Cloud Bigtable, you use a customized version of the Apache HBase
> 1.0.1 Java client.

------
amelius
Nice, but you still can't use this for your privacy-aware customers.

~~~
ImJasonH
Care to elaborate on why not?

~~~
amelius
Storing data in a database that is managed by a third-party is something that
some customers explicitly forbid.

~~~
hoddez
Wouldn't that mean you can't use any cloud data services with any company? Or
even cloud hosting? What kind of customers forbid this?

~~~
amelius
> What kind of customers forbid this?

Government entities, for instance.

~~~
icebraining
[https://www.google.com/work/apps/government/](https://www.google.com/work/apps/government/)

Google has a dedicated cloud environment for governmental agencies:
[http://googleforwork.blogspot.pt/2009/09/google-apps-and-gov...](http://googleforwork.blogspot.pt/2009/09/google-apps-and-government.html)

~~~
amelius
> We look forward to working with governments across the country on these
> exciting initiatives in the months ahead.

So what about foreign governments?

------
forgottenacc56
No Python 3? Not interested. All that BigTable development, pointless without
drivers. Silly Google.

~~~
estefan
Do you actually know what hadoop & HDFS are?

~~~
saurik
I am going to read your comment as "you should be able to use the
off-the-shelf drivers for HBase for Python" (I have elided the "3" as no one
uses Python 3: that must have been a typo for "2" ;P). The "APIs" that Google
describes itself as being compatible with are for Java, not the network: "To
access Cloud Bigtable, you use a customized version of the Apache HBase 1.0.1
Java client." So, no: it seems like if you are not using Java you will need to
pull apart their customized Java SDK and build your own driver.

