
"We settled on postgresql, because, despite going into toast tables and having to implement sharding on top"

This is a common mistake. That choice doesn't actually guarantee consistency at all: PostgreSQL only guarantees consistency as long as you don't communicate with it over a network, which you obviously do once you have shards. To address this problem we have things like 2PC, but on its own that isn't very useful for an RDBMS without the whole infrastructure of a distributed database.

The networking isn't a problem if you only acknowledge a write once the underlying database confirms it. A write could succeed without the client ever seeing the acknowledgment (say, because you crash first), but you won't lose an acknowledged write.

No, but when your requirement is to "never lose a write regardless of which server fails" you have to make sure the mutation was acknowledged by more than one server.

So the least you have to do is wait for a second server to confirm the write. But now what do you do when the network link between the two servers goes down in the middle of a write? The write might have already been applied on the first server, or it might not have. The same goes for the second server, and you can't be sure either way.
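
The ambiguity can be sketched like this (a toy model, not real Postgres client code; the `Server` class and method names are made up for illustration):

```python
class Server:
    """Toy stand-in for a database node (hypothetical, for illustration)."""
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.log = []

    def apply(self, record):
        if not self.reachable:
            raise ConnectionError("network link down")
        self.log.append(record)

def replicate_write(primary, standby, record):
    """Acknowledge only after both servers confirm the write."""
    primary.apply(record)  # may already be durable even if the next step fails
    try:
        standby.apply(record)
    except ConnectionError:
        # In doubt: the record may be durable on the primary, but the
        # client never gets an acknowledgment, so nothing acknowledged
        # is lost -- we just can't report success or failure.
        return "in_doubt"
    return "acknowledged"
```

The "in_doubt" branch is exactly the case described above: you cannot tell from the client which servers, if any, have the write.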

For our thought experiment, let's assume that while the servers can't talk to each other anymore, some clients can still talk to each of the servers.

At this point, you either have to give up availability (i.e. don't accept any further writes until the fault is resolved) or consistency (i.e. potentially return unacknowledged/stale data from one of the nodes until the fault is resolved).

While returning unacknowledged/stale data is acceptable for some use cases, it does break ACID semantics. You can't get both ACID and "always-on" availability with a simple replication scheme like this.

If you want linearizability and HA, you need at least three servers and a more complex, quorum-based scheme. However, Postgres doesn't support that (AFAIK) - you have to use something like ZooKeeper or CockroachDB. I think this is what zzzcpan meant.
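
The quorum idea in miniature (a simplified sketch; real quorum systems like Raft or Paxos also need leader election and log reconciliation, and the `Node` class here is invented for illustration):

```python
class Node:
    """Toy database node (hypothetical, for illustration only)."""
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.log = []

    def apply(self, record):
        if not self.reachable:
            raise ConnectionError("node unreachable")
        self.log.append(record)

def _try_apply(node, record):
    try:
        node.apply(record)
        return True
    except ConnectionError:
        return False

def quorum_commit(nodes, record):
    """Acknowledge a write once a strict majority of nodes has applied it.
    With three nodes, any single failure still leaves a majority, so the
    system stays available and no acknowledged write is lost."""
    acks = sum(1 for n in nodes if _try_apply(n, record))
    return acks >= len(nodes) // 2 + 1
```

With three nodes and one failure, two acks still form a majority, so writes keep being accepted; with two failures, the write is correctly refused rather than silently lost.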

(I realize that you only spoke about never losing a write and didn't say anything about updates/queries. If you don't care about ACID, postgres with synchronous replication is a good highly available solution)

You're taking the wrong approach.

We don't have auto-healing HA. We have 32 master databases, each with a replica database underneath it using synchronous replication, meaning things must be synced to both before the COMMIT OK is received by the client.
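
For reference, this is roughly what synchronous replication looks like in postgresql.conf on the master (the standby name here is illustrative, not the poster's actual config):

```
# COMMIT does not return until the standby has flushed the WAL to disk
synchronous_commit = on
# the standby that must confirm each commit
synchronous_standby_names = 'replica1'
```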

Then you do the sharding logic in the application.
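
Application-side shard routing can be as small as a hash of the key (a sketch; the hash function, key format, and shard count handling are assumptions, not the poster's actual scheme):

```python
import hashlib

NUM_SHARDS = 32  # matches the 32 master databases described above

def shard_for(key: str) -> int:
    """Deterministically map a key (e.g. a user id) to one of the shards."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```

The application then opens a connection to the master for that shard and issues the write there; the same function run anywhere always routes a given key to the same shard.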

No write can be sent back as being "OK" unless it's on disk on 2 servers which represent a vertical slice of our entire database structure.

We assume power-loss scenarios mostly, which means if it's on disk and not just sitting in the VFS cache then we're fine, as power loss is more likely than complete RAID degradation or a server disappearing, although the replicas help with that too.

You don't need quorum at all in this scenario, and no matter which database or client fails you will not lose data that you've acknowledged, even on immediate power loss to 50% of your entire infra.

So what do you do when the master for one of your shards fails? Do you drop all incoming writes for the shard on the floor? Or do you fail over to the shard's slave and promote the slave to the new master?

Since you said you don't have "auto healing HA" I assume you don't fail over, but discard/deny incoming writes until the master comes back up.

This is a valid approach, but, I don't see how it contradicts what I said at all:

  - I said you can't get full ACID and HA failover at the same time with postgres
  - Your scheme does not provide HA failover

I explicitly said that if you can forgo either full ACID or HA in case of a failure, postgres is fine.
