

Announcing the High Replication Datastore for App Engine - peter123
http://googleappengine.blogspot.com/2011/01/announcing-high-replication-datastore.html

======
mmastrac
Glad to see this live! We've been testing this for the last couple of months
at <http://gri.pe> without any issues. It basically means you don't have to
worry about datastore maintenance, ever. There hasn't been a significant speed
hit for us at all.

I _highly_ recommend choosing this option when creating a new App Engine
project. It's more expensive than a standard datastore, but we pay virtually
nothing for storage anyway.

Thanks to Ikai/Kevin and the rest of the HR datastore folks who made this
happen.

~~~
ludwigvan
This is slightly off-topic, but here in Turkey, ghs.google.com is blocked for
some reason, and Google App Engine projects with custom domains like
<http://gri.pe> are blocked as well. (Blogspot blogs with custom domains are
blocked too.)

So be aware of that if you're wondering why you get no hits from Turkey.

~~~
ZoFreX
It's because they block things overzealously. When one video on YouTube
offended them, they blocked all of YouTube. Some site running on GAE offended
them, so the whole of ghs.google.com got blocked.

------
piotrSikora
Comparison between Master/Slave (current) and High Replication (just
announced) Datastores:

<http://code.google.com/appengine/docs/python/datastore/hr/>
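
The headline difference, per that doc: in High Replication, global
(non-ancestor) queries become eventually consistent, while gets and ancestor
queries stay strongly consistent. A rough sketch with the Python db API (the
model names are made up, and this only runs inside an App Engine handler):

    from google.appengine.ext import db

    class Greeting(db.Model):
        content = db.StringProperty()

    # Write into an entity group rooted at this (hypothetical) key.
    book = db.Key.from_path('Guestbook', 'main')
    Greeting(parent=book, content='hi').put()

    # HR: a global (non-ancestor) query is only eventually consistent,
    # so it may not see the write for a short while.
    maybe_stale = Greeting.all().fetch(10)

    # HR: gets and ancestor queries remain strongly consistent, at the
    # cost of the entity group's write-throughput limit.
    fresh = Greeting.all().ancestor(book).fetch(10)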

------
strlen
Does this perform Paxos on every single transaction (wouldn't that be
expensive?) or is Paxos used to elect a write coordinator replica and ensure a
consistent view of the cluster to guarantee serializable consistency?

If it's the former, can we get more details? This would be quite similar to
the work Daniel Abadi has been doing at Yale on parallelizing distributed
commit (e.g., Paxos) to maintain high throughput despite increased latency.

~~~
rbranson
I'm going to assume the latter. Why would they need to perform Paxos on every
single transaction when just using it to elect a master gets 99.9% of the
benefits? That would be consistent with the implementation of BigTable in
general.

EDIT: The higher write latency and 3x CPU cost are probably because it uses
synchronous replication to write to all of the slaves before acknowledging the
transaction.

~~~
powera
A reasonable thought, but wrong. The existing datastore already has basically
that 99.9% of the benefits you describe. Unfortunately, 99.9% availability
(especially when a single datastore request may touch 10 things and thus be at
99% availability) isn't good enough.

One of the major problems with having a single bigtable is partial outages.
These can be major events like the ones today, with 10% or more of the cell
having issues, or just one server in the bigtable having performance issues.
The key to the High Replication datastore's higher availability is that if
certain entities are slow or unavailable in one datacenter, we can rely on the
performance of other datacenters to limit the latency.

~~~
rbranson
Thanks for the clarification, but I'm not sure I follow what I was wrong
about. I wasn't saying that it should have 99.9% availability, I was just
stating that in most situations, using Paxos to elect a master provides very
nearly all the benefits (99.9% was just a vague figure to indicate this) of
doing a Paxos commit each time.

~~~
powera
Specifically, every write goes through the Paxos mechanism; writes aren't
serialized through a single master.
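
For anyone who hasn't seen what "Paxos on every write" means mechanically,
here's a toy single-decree round in Python -- the textbook two-phase protocol,
not anything resembling Megastore's actual implementation:

    class Acceptor(object):
        def __init__(self):
            self.promised = 0      # highest ballot number promised
            self.accepted = None   # (ballot, value) last accepted, if any

        def prepare(self, ballot):
            # Phase 1: promise to ignore anything below `ballot`.
            if ballot > self.promised:
                self.promised = ballot
                return ('promise', self.accepted)
            return ('reject', None)

        def accept(self, ballot, value):
            # Phase 2: accept unless a higher ballot was promised since.
            if ballot >= self.promised:
                self.promised = ballot
                self.accepted = (ballot, value)
                return 'accepted'
            return 'reject'

    def paxos_write(acceptors, ballot, value):
        majority = len(acceptors) // 2 + 1
        promises = [a.prepare(ballot) for a in acceptors]
        granted = [p for p in promises if p[0] == 'promise']
        if len(granted) < majority:
            return None  # lost phase 1; retry with a higher ballot
        # If any acceptor already accepted a value, adopt the one with
        # the highest ballot -- this is what makes Paxos safe.
        prior = [p[1] for p in granted if p[1] is not None]
        if prior:
            value = max(prior)[1]
        acks = [a.accept(ballot, value) for a in acceptors]
        if acks.count('accepted') >= majority:
            return value  # chosen
        return None

    acceptors = [Acceptor() for _ in range(5)]
    print paxos_write(acceptors, ballot=1, value='write-v1')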

~~~
strlen
Very interesting. Will more information on this be included in the CIDR paper
on Megastore?

------
babyshake
They say you can only activate this for new apps, but AFAIK there's no way to
do that while keeping the same app id.

There should definitely be a way to do a one-time upgrade from master/slave to
this new datastore without having to mess around with moving our data from one
app to another.

~~~
powera
Well, if you use Google Apps and your own domain, you can point the domain at
the new app after copying data over.
<http://code.google.com/appengine/docs/domain.html>

Being able to do a migration online and for the same appspot.com domain is
something we'd like to do, but I don't have any timeline for when that will be
available.
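
For the copying step, the SDK's bulk loader is the supported route, but a
rough idea of what it involves can be sketched with remote_api (the app ids
and the Greeting kind are placeholders; both apps need the remote_api handler
enabled, and this toy version ignores ancestor paths and auto-assigned ids):

    import getpass
    from google.appengine.ext import db
    from google.appengine.ext.remote_api import remote_api_stub

    class Greeting(db.Expando):
        pass

    def auth_func():
        return raw_input('Email: '), getpass.getpass('Password: ')

    # Pull a batch of entities out of the old master/slave app.
    remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                       'old-app.appspot.com')
    old = Greeting.all().fetch(100)

    # Re-create them in the new HR app. Keys embed the app id, so each
    # entity is rebuilt rather than put() directly.
    remote_api_stub.ConfigureRemoteApi(None, '/_ah/remote_api', auth_func,
                                       'new-app.appspot.com')
    for e in old:
        props = dict((p, getattr(e, p)) for p in e.dynamic_properties())
        Greeting(key_name=e.key().name(), **props).put()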

~~~
richardw
Unless you use HTTPS, which currently depends on the appspot domain.

In any case, thanks! Very welcome upgrade.

------
Maro
You can get scalable Paxos replication for yourself when ScalienDB comes out:

<http://github.com/scalien/scaliendb>

(My product.)

------
rbranson
How are they providing consistent get/put/delete in the event of a network
partition? Is it CP?

~~~
brown9-2
I would assume that if one of their datacenters became partitioned from the
others in the group, they would stop serving requests out of that DC entirely.
That would alleviate the concern about network partition tolerance, wouldn't
it?

~~~
rbranson
The problem is that C systems have masters. What happens when the master
becomes network isolated? The election and recovery process is tricky. If 100%
consistency is assumed, there MUST be an outage period until this is
recognized and corrected.

~~~
powera
If you require a read/write to talk to a majority of the replicas, you don't
need to have a specific replica as the master to maintain consistency.
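
The quorum-intersection argument is easy to see in a toy model: with N=5 and
both reads and writes requiring a majority of 3, any read quorum overlaps any
write quorum, so a read always sees the newest committed version. Illustrative
Python only; a real system also needs read repair, retries, and handling of
concurrent writers:

    N = 5
    MAJORITY = N // 2 + 1
    replicas = [{} for _ in range(N)]   # key -> (version, value)

    def quorum_write(key, version, value):
        acks = 0
        for r in replicas[:MAJORITY]:   # pretend only these are reachable
            r[key] = (version, value)
            acks += 1
        return acks >= MAJORITY

    def quorum_read(key):
        # Ask a (different) majority and take the highest-versioned
        # answer; the overlap guarantees one replica has the new write.
        answers = [r.get(key) for r in replicas[-MAJORITY:]]
        return max(a for a in answers if a is not None)

    quorum_write('x', 1, 'hello')
    print quorum_read('x')   # (1, 'hello') even though quorums differ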

------
marcc
That explains the 30-minute outage earlier today!

~~~
powera
While the HR datastore is designed so events like that won't happen (and it
was not affected by said outage), there's no correlation between this launch
and the datastore unavailability earlier today.

------
grandalf
Can this be selected only for some entity groups?

~~~
mmastrac
No, it's all or none, configured at the appid level.

