
Consistency Models of Cloud Storage Services - mariomario
https://cloudrail.com/compare-consistency-models-of-cloud-storage-services/
======
jamesblonde
I'm going to make a prediction. Cloud storage services will follow the same
roadmap as SQL -> NoSQL -> NewSQL. NoSQL systems grew from the need to scale
out relational databases. NewSQL systems bring both scalability and strong
consistency, and companies are moving back to strong consistency. Similarly,
object store systems grew from the need to scale out filesystems. A new
generation of hierarchical filesystems will appear that brings both
scalability and hierarchy, and the cloud will eventually move back to scalable
hierarchical filesystems. I'm basing this prediction on recent work on scaling
hierarchical filesystems by providing distributed metadata layers -
[https://goo.gl/NPXLQN](https://goo.gl/NPXLQN)

~~~
spullara
As it turns out, very few software engineers can build sophisticated
applications on top of KV stores.

------
jamesblonde
Big companies work around this problem by having their own metadata store
which is a (non-consistent) replica of S3/GCS' metadata.

Netflix built s3mper to stop a problem where Hadoop jobs insert files and
later jobs start but do not find the file(s) when they execute a listing for
them:

[http://techblog.netflix.com/2014/01/s3mper-consistency-in-
cl...](http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html)
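A minimal sketch of the general idea (this is not s3mper's actual code or API): record the keys you wrote in a side metadata store, then re-list until the object store's listing contains all of them. `consistent_list`, `list_fn`, and `expected_keys` are hypothetical names used only for illustration.

```python
import time


def consistent_list(list_fn, expected_keys, retries=5, delay=0.1, sleep=time.sleep):
    """Call list_fn() until its result contains every key in expected_keys.

    list_fn: hypothetical callable returning the object store's current listing.
    expected_keys: keys recorded in the side metadata store at write time.
    """
    expected = set(expected_keys)
    missing = expected
    for attempt in range(retries):
        listed = set(list_fn())
        missing = expected - listed
        if not missing:
            return sorted(listed)
        if attempt < retries - 1:
            sleep(delay)  # give the eventually consistent listing time to catch up
    raise RuntimeError(f"listing still missing {sorted(missing)} after {retries} tries")
```

Failing loudly after the retries run out is a design choice: a job that silently proceeds with a partial listing is what causes the data quality problems in the first place.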

Spotify are doing the same for GCS.

AWS have even done it for EMR -
[http://docs.aws.amazon.com/ElasticMapReduce/latest/Developer...](http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-fs.html)

~~~
cle
Even better than a replica is using a strongly consistent metadata store to
drive the S3 metadata: use the metadata store to acquire unique S3 keys, write
to S3, then transact the S3 manifest back to the metadata store. Then you do
not have to deal with the problem of replication, and you never depend on S3
listings or overwrites.
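A rough sketch of that flow, assuming in-memory stand-ins for both stores. `MetadataStore`, `ObjectStore`, `write_dataset`, and `read_dataset` are hypothetical names, not any real S3 or database API.

```python
import uuid


class MetadataStore:
    """Hypothetical strongly consistent store; a dict stands in for a real database."""

    def __init__(self):
        self.manifests = {}  # dataset name -> list of committed object keys

    def new_key(self, dataset):
        # Unique keys mean every write creates a new object, never an overwrite.
        return f"{dataset}/{uuid.uuid4().hex}"

    def commit(self, dataset, keys):
        # A real store would do this inside a single transaction.
        self.manifests[dataset] = list(keys)

    def manifest(self, dataset):
        return list(self.manifests.get(dataset, []))


class ObjectStore:
    """Hypothetical stand-in for S3: put/get by key, listing is never used."""

    def __init__(self):
        self.objects = {}

    def put(self, key, data):
        self.objects[key] = data

    def get(self, key):
        return self.objects[key]


def write_dataset(meta, store, dataset, parts):
    keys = []
    for data in parts:
        key = meta.new_key(dataset)  # 1. acquire a unique key
        store.put(key, data)         # 2. write the object
        keys.append(key)
    meta.commit(dataset, keys)       # 3. transact the manifest back
    return keys


def read_dataset(meta, store, dataset):
    # Readers resolve keys from the manifest instead of listing the bucket.
    return [store.get(key) for key in meta.manifest(dataset)]
```

Readers only see objects named in a committed manifest, so a half-finished write is invisible until step 3 completes.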

------
cle
This is extremely important to think about if you're building on top of cloud
services. Misunderstanding consistency models and limitations can make or
break your application and cause serious data quality problems.

The article seems somewhat misleading, at least for S3. S3 behaves much like
what the article says about Google Cloud: list operations are generally
eventually consistent, read-after-write (RAW) is strongly consistent, and
read-after-overwrite is eventually consistent.

I can only comment on S3 because I have a lot of experience with it.
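To make those three S3 cases concrete, here is a toy model. It is purely illustrative and says nothing about how S3 is implemented; `ToyStore` and its methods are made-up names, and `settle()` is a stand-in for "eventually".

```python
class ToyStore:
    """Toy model only: new keys are readable at once, overwrites and listings lag."""

    def __init__(self):
        self.latest = {}     # key -> newest written value
        self.visible = {}    # key -> value that reads currently return
        self.listed = set()  # keys that list_keys() currently returns

    def put(self, key, value):
        is_new = key not in self.latest
        self.latest[key] = value
        if is_new:
            # Read-after-write: a brand-new key is readable immediately.
            self.visible[key] = value
        # An overwrite leaves the old value visible until settle() runs.

    def get(self, key):
        return self.visible[key]

    def list_keys(self):
        return sorted(self.listed)

    def settle(self):
        # Propagate every pending overwrite and listing update.
        self.visible = dict(self.latest)
        self.listed = set(self.latest)


store = ToyStore()
store.put("report", "v1")
first_read = store.get("report")      # new key: sees "v1" right away
listing_before = store.list_keys()    # listing may not include it yet
store.put("report", "v2")
stale_read = store.get("report")      # overwrite: may still see "v1"
store.settle()
settled_read = store.get("report")    # after convergence: "v2"
```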

~~~
jamesblonde
"Read-after-overwrite is eventually consistent." To normal people,
"overwrite" means "update".

~~~
cle
Read-after-overwrite makes it clear that it's in contrast to and distinct from
read-after-write. Sometimes different words mean the same thing, and you pick
the word based on its context. And in this context, the readers aren't "normal
people" (whatever that means); they're engineers, for whom the specific
meaning of "update" and "write" can vary depending on an even more specific
context.

------
abofh
Am I the only one who balked when they didn't link to the 'special cases':
[http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction....](http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyModel)

The special case is 'new' vs. 'replace'.

S3's consistency has changed over time, but if you're going to pretend to be
the authority on any of them, at least be accurate on the biggest of them.

------
cammio
How does Swift behave? E.g. OVH object storage.

~~~
alpha_ori
Presumably this is covered by the "Rackspace" entry, which is the last entry
on the table.

