
Strong consistency in S3 (data lake) - albertlie
Hi all, I just started building a big data pipeline that requires strong consistency instead of eventual consistency. I'm looking for a data store or data lake that can provide strong consistency for OLTP and still scale to big data. Any good ideas?

Currently I'm using S3, but I found out that it only supports eventual consistency, even though it's durable and scalable.
======
mneil
You could look at tuning HDFS or Cassandra to be strongly consistent. By
default, though, they both use eventual consistency for replication. The
difference is that you can set the number of nodes that must accept a write
before it counts as a success. What I can't say for sure is whether this is a
good idea.
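
To make the "required number of nodes" idea concrete, here is a minimal sketch
of the usual rule of thumb for Dynamo-style stores such as Cassandra: with N
replicas, a read overlaps the latest acknowledged write when W + R > N. The
function names are hypothetical and only illustrate the arithmetic; this is
not any database's API.

```python
def read_overlaps_latest_write(n_replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True if every read replica set must intersect every write replica set.

    Assumption: a write succeeds after `write_acks` replicas acknowledge it and
    a read consults `read_acks` replicas, out of `n_replicas` in total.
    """
    for acks in (write_acks, read_acks):
        if not 1 <= acks <= n_replicas:
            raise ValueError("ack counts must be between 1 and n_replicas")
    # Overlap is guaranteed only when the two sets cannot both fit
    # into the replica set without sharing a member.
    return write_acks + read_acks > n_replicas


def quorum(n_replicas: int) -> int:
    """Smallest majority of the replica set."""
    return n_replicas // 2 + 1


if __name__ == "__main__":
    n = 3
    print(read_overlaps_latest_write(n, 1, 1))                  # one ack each way
    print(read_overlaps_latest_write(n, quorum(n), quorum(n)))  # quorum both ways
    print(read_overlaps_latest_write(n, n, 1))                  # write to all, read from one
```

With three replicas, one acknowledgement on each side does not satisfy the
rule, while quorum reads plus quorum writes (or write-all, read-one) do.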

~~~
albertlie
Hi mneil,

Thanks for the suggestion.

a. Cassandra: I don't think Cassandra is a good option here, since it is by
nature an AP database (designed for high availability).

b. HDFS: To keep strong consistency, I wonder whether I should use another
database on top of Hadoop, such as Bigtable/HBase, since they are designed as
CP databases. Do you know the data consistency model in HBase or Bigtable?

c. Maybe Google Cloud Spanner or CockroachDB? I just read about the
possibility of NoSQL databases being ACID compliant, but I'm not sure whether
this option is suitable for big data in the long term, since our data is
growing very fast. Do you know of it being used in a big data use case?

~~~
mneil
You can make Cassandra CP by requiring every write to be acknowledged by 100%
of the replicas. Obviously you lose availability.
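
The availability cost can be put in numbers. This is a rough sketch, assuming
a replication factor of 3 and that a write fails as soon as fewer than the
required replicas can acknowledge it; the function name is hypothetical and
the level names only mirror the ALL/QUORUM/ONE idea.

```python
def tolerated_replica_failures(replication_factor: int, required_acks: int) -> int:
    """How many replicas may be down while a write can still succeed."""
    if not 1 <= required_acks <= replication_factor:
        raise ValueError("required_acks must be between 1 and replication_factor")
    return replication_factor - required_acks


if __name__ == "__main__":
    rf = 3  # assumed replication factor
    levels = {"ALL": rf, "QUORUM": rf // 2 + 1, "ONE": 1}
    for name, acks in levels.items():
        print(name, tolerated_replica_failures(rf, acks))
```

Requiring all replicas leaves no room for a failed replica, which is the
availability you give up.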

Mongo, for example, can be ACID, but again you lose write throughput.
Postgres, I believe, has better throughput than Mongo for most workloads and
is ACID compliant. I do not have any experience with Spanner or CockroachDB,
so I can't really comment there. I don't believe HBase offers strong
consistency either.

Maybe Postgres is a good choice? I didn't mention it because the term "data
lake" made me immediately think of HDFS, and I wouldn't call Postgres a lake.

