The post touches upon it, but I didn't really understand the point. Why doesn't synchronous replication in Postgres work for this use case? With synchronous replication you have a primary and secondary. Your queries go to the primary and the secondary is guaranteed to be at least as up to date as the primary. That way if the primary goes down, you can query the secondary instead and not lose any data.
How is PostgreSQL (or MySQL) "considerably less scalable" exactly? etcd isn't particularly known for being scalable or performant. I'm sure it's fast enough for your use-case (since you've benchmarked it), but people have been scaling both PostgreSQL and MySQL far beyond what etcd can achieve (usually at the cost of availability of course).
[I work at Tailscale] I only mean scalable for our very specific and weird access patterns, which involve frequently read-iterating through a large section of the keyspace to calculate and distribute network+firewall updates.
Our database has very small amounts of data but a very, very large number of parallel readers. etcd explicitly disclaims any ability to scale to large data sizes, and probably rightly so :)
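Concretely, a read-iteration over one section of the keyspace with etcd's clientv3 package looks roughly like the sketch below; the "nodes/" prefix and the endpoint are invented for illustration, not our actual schema.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Connect to a local etcd endpoint (address is illustrative).
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Read-iterate over every key under a hypothetical "nodes/" prefix:
        // the kind of whole-section scan described above.
        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
        defer cancel()
        resp, err := cli.Get(ctx, "nodes/", clientv3.WithPrefix())
        if err != nil {
            panic(err)
        }
        for _, kv := range resp.Kvs {
            fmt.Printf("%s -> %d bytes (ModRevision %d)\n", kv.Key, len(kv.Value), kv.ModRevision)
        }
    }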
This is getting confusing. The tweets sound like you are concerned about write scalability, and here it sounds like you are concerned about read scalability?
> So we can get, say, 1000 updates, bundle them all, get it synced in say ~100ms, and then answer all 1000 requests at once, and still only take ~100ms.
I assume the same trick is applicable to RDBMS as well? So you batch the 1000 updates, and do one commit with a single fsync.
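Something like this is what I have in mind: what RDBMSes call group commit. Queue up concurrent writes, flush them with one fsync, then acknowledge them all at once. A minimal sketch of the pattern in Go, with invented types, sizes, and timings:

    package batch

    import (
        "os"
        "time"
    )

    // write is one pending update plus a channel to signal the caller
    // once its data has been durably flushed.
    type write struct {
        data []byte
        done chan error
    }

    // Batcher groups many concurrent writes into a single fsync:
    // the "1000 updates, one sync, answer all at once" trick above.
    type Batcher struct {
        f        *os.File
        incoming chan write
    }

    func NewBatcher(f *os.File) *Batcher {
        b := &Batcher{f: f, incoming: make(chan write, 1024)}
        go b.loop()
        return b
    }

    // Write queues data and blocks until the batch containing it is synced.
    func (b *Batcher) Write(data []byte) error {
        w := write{data: data, done: make(chan error, 1)}
        b.incoming <- w
        return <-w.done
    }

    func (b *Batcher) loop() {
        for w := range b.incoming {
            batch := []write{w}
            // Gather whatever else arrives within a short window, up to a cap.
            timeout := time.After(10 * time.Millisecond)
        gather:
            for len(batch) < 1000 {
                select {
                case next := <-b.incoming:
                    batch = append(batch, next)
                case <-timeout:
                    break gather
                }
            }
            // One write pass + one fsync for the whole batch...
            var err error
            for _, bw := range batch {
                if _, werr := b.f.Write(bw.data); werr != nil && err == nil {
                    err = werr
                }
            }
            if serr := b.f.Sync(); serr != nil && err == nil {
                err = serr
            }
            // ...then acknowledge every caller at once.
            for _, bw := range batch {
                bw.done <- err
            }
        }
    }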
> Virtually every other database I've used is quite naive about how they flush blocks to disk, which dramatically reduces their effective transactions/sec. It's rare to see one that made all the right choices here.
Can you elaborate on this? Anyway, any RDBMS worth its salt should be able to saturate the disk IOPS, i.e. the act of flushing itself wouldn't be the bottleneck.
> Our database has very small amounts of data but a very, very large number of parallel readers.
So the control plane is the sole writer of this database, and there are maybe 100s/1000s of other readers, who each has a watcher on etcd? Who are these readers? If they are different processes on different machines, how did it work when the database was in the json file?
Sorry for the barrage of questions, but I have to ask out of curiosity.
Not OP, but if I could ask further... how much inconsistency can you tolerate on your reads? From the use case you mention I imagine... quite a lot, but you could risk locking yourself out of networks/systems if you get it wrong?
I've always found it hard to reason about relaxing consistency, and I think people underestimate how much complexity they take on by moving away from serializable isolation towards something looser. (Fun fact! Many databases out of the box don't correctly handle the classic transaction example -- read an account balance, subtract an amount from the balance, and then add the amount to another account's balance.)
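To make that fun fact concrete: at a typical default isolation level like READ COMMITTED, two concurrent transfers written in the obvious read-then-write style can both read the same starting balance, and one update silently clobbers the other (a lost update). A rough sketch of the risky pattern using Go's database/sql; the table and column names are made up, and the usual fixes are SELECT ... FOR UPDATE or a serializable transaction.

    package ledger

    import (
        "database/sql"
        "fmt"
    )

    // transfer moves amount from one account to another. As written, this is
    // the classic example: under READ COMMITTED, two concurrent calls can read
    // the same balance and one update can overwrite the other's result.
    func transfer(db *sql.DB, from, to string, amount int64) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback()

        var balance int64
        // Plain SELECT: does NOT lock the row. SELECT ... FOR UPDATE (or a
        // serializable transaction) is what actually makes this safe.
        if err := tx.QueryRow(
            `SELECT balance FROM accounts WHERE id = $1`, from,
        ).Scan(&balance); err != nil {
            return err
        }
        if balance < amount {
            return fmt.Errorf("insufficient funds")
        }
        // This write uses a value computed from the possibly-stale read above.
        if _, err := tx.Exec(
            `UPDATE accounts SET balance = $1 WHERE id = $2`, balance-amount, from,
        ); err != nil {
            return err
        }
        if _, err := tx.Exec(
            `UPDATE accounts SET balance = balance + $1 WHERE id = $2`, amount, to,
        ); err != nil {
            return err
        }
        return tx.Commit()
    }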
Usually people design their app with the expectation of strict serializable isolation, relax it because of some production emergency, and then deal with the business consequences of the database doing the wrong thing until the company goes out of business (usually not due to database isolation levels, to be fair).
AFAICT linearizability is about the best we can expect in reality (at least for a distributed system), but as you point out: very few people actually check their assumptions... and even fewer actually think about DB transactions correctly in the first place. It's actually really, really hard, and people have rules of thumb in their heads that aren't actually correct.
Which gets me wondering whether we could formalize some of this stuff... (in relevant "code scopes", dgmw!)
EDIT: If there is one thing I am certain about, it's that a lot of consistency can be relaxed around human interaction. It's lossy anyway, and people will call you (eventually, depending on anxiety/shyness) if you haven't fulfilled an order. The browser is the first example of that: the page is already out of date the moment you show it, so... Anyway, that's just to say it's amusing how much people worry about consistency on the front end.
Can you go into more about what these problems are? I've always used databases (about 15 years on Oracle and about 5 years on Postgres) and I'm not sure if I know what problems you are referring to. Maybe I have experienced them, but have thought of them by a different name.
SQL - I'm not sure what the problems are with SQL. But it is like a second language to me so maybe I experienced these problems long ago and have forgotten about them.
ORM - I never use an ORM, so I have no idea what the problems might be.
testing latency - I don't know what this refers to.
SQL is fine. We use it for some things. But not writing SQL is easier than writing SQL. Our data is small enough to fit in memory. Having all the data in memory and just accessible is easier than doing SQL + network round trips to get anything.
ORMs: consider yourself lucky. They try to make SQL easy by auto-generating terrible SQL.
Testing latency: we want to run many unit tests very quickly without high start-up cost. Launching MySQL/PostgreSQL docker containers and running tests against Real Databases is slower than we'd like.
Dependencies: Docker and those MySQL or PostgreSQL servers in containers.
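To give a flavor of what "all the data in memory and just accessible" plus fast tests looks like: if the data layer is a Go map behind a small interface, tests can construct it directly with zero startup cost. The types and method names below are invented for this comment, not our actual code.

    package store

    import "sync"

    // Node is a stand-in for whatever record the control plane tracks.
    type Node struct {
        ID   string
        Addr string
    }

    // Store is the tiny data-access surface the rest of the code depends on.
    // A SQL-backed implementation could satisfy the same interface later.
    type Store interface {
        Get(id string) (Node, bool)
        Put(n Node)
        All() []Node
    }

    // MemStore keeps everything in memory: reads are map lookups, with no SQL
    // and no network round trip, and a test can create one instantly.
    type MemStore struct {
        mu    sync.RWMutex
        nodes map[string]Node
    }

    func NewMemStore() *MemStore {
        return &MemStore{nodes: make(map[string]Node)}
    }

    func (s *MemStore) Get(id string) (Node, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        n, ok := s.nodes[id]
        return n, ok
    }

    func (s *MemStore) Put(n Node) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.nodes[n.ID] = n
    }

    func (s *MemStore) All() []Node {
        s.mu.RLock()
        defer s.mu.RUnlock()
        out := make([]Node, 0, len(s.nodes))
        for _, n := range s.nodes {
            out = append(out, n)
        }
        return out
    }

A test just calls NewMemStore(): no container pull, no schema creation, no server startup.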
Can you put some numbers on how much time is too much? I've never seen anyone go this far to avoid using a database when the only "real" reason seems to be avoiding testing latency (a problem which has many other solutions), so I am really confused, but curious to understand!
Running all of our control server tests (including integration tests) right now takes 8 seconds, and we're not even incredibly happy with that. There's no reason it should even be half that.
So it's not really in our patience budget to add a mysqld or postgres startup (possible docker pull, creating its schema, etc.).
>right now takes 8 seconds, and we're not even incredibly happy with that
With the amount of explaining and skepticism you're having to deal with in most of the threads here (plenty of reasonable questions, though some seem to assume your approach is totally wrong), I feel compelled to say how nice such a fast feedback loop would be, just so you know you're not listing these benefits into an ether that doesn't appreciate them.
Not sure what their requirements are, but I'm using a "spin up an isolated postgres instance per test run" solution and end up with ~3s of overhead to do that (using https://pypi.org/project/testing.postgresql/).
Edit: 3s for global setup/teardown. Not per test function/suite.
Docker itself doesn't add much latency. It just makes getting MySQL and PostgreSQL easier. If anything, it helps with dependencies. The database server startup still isn't great, though.
If you don't use Docker, you can just leave the database server running in the background, which removes the startup latency (you can of course do this with Docker too, but Docker has a tendency to use quite a few resources when left running in the background, which a database server on its own won't).
I mean that's an `apt install postgres` or `brew install postgres` away. Takes about 5 minutes. I guess it could become a pain if you need to work with multiple different versions at once.
Being deep in the cloud world right now, with AWS and Terraform and kubernetes CLI tools, etc., not having to install third-party tools on my machine does sound pretty great, but also entirely unrealistic.
Managing local DBs once new versions are out and your server isn't upgraded yet is irritating, but when I'm using a Mac I'd still rather use a native DB than Docker because of the VM overhead, since I've not yet run into a bug caused by something like "my local postgres was a different version than the server was." (Closest I've gotten was imagemagick for mac doing something a bit differently than for linux, about 10 years ago at this point.)
> I've not yet run into a bug caused by something like "my local postgres was a different version than the server was."
Ran into that at a recent place - the code was doing "= NULL" in a bunch of places (before my time) and PG12 treated that differently than PG11 did, which broke a bunch of tests.
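For anyone who hasn't hit this: with default settings, a predicate like "deleted_at = NULL" compares against NULL, which yields NULL (not true), so it matches nothing; "IS NULL" is the form that actually tests for null. A tiny illustration with Go's database/sql (the table name, connection string, and choice of driver are made up):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/lib/pq" // Postgres driver; any driver would do
    )

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // "= NULL" yields NULL rather than true, so with default settings
        // this count is 0 regardless of the data.
        var broken int
        if err := db.QueryRow(
            `SELECT count(*) FROM widgets WHERE deleted_at = NULL`).Scan(&broken); err != nil {
            log.Fatal(err)
        }

        // "IS NULL" is the predicate that actually tests for null.
        var correct int
        if err := db.QueryRow(
            `SELECT count(*) FROM widgets WHERE deleted_at IS NULL`).Scan(&correct); err != nil {
            log.Fatal(err)
        }

        fmt.Println(broken, correct)
    }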
Were you doing a lot of logic in SQL itself? Sounds like not really, but then I'm surprised you'd have so many tests hitting the DB directly, vs most feature logic living above that layer in a way that doesn't need the DB running at all.
Why are your unit tests touching a database? I'm a real stickler about keeping unit tests isolated, because once I/O gets involved they invariably become much less reliable and, as you mention, too slow.
Sorry, I should've just said tests. Our tests overall are a mix of unit tests and integration tests and all sizes in between, depending on what they want to test.
I think the "database" label is tripping up the conversation here. What's being talked about here, really, is fast & HA coordination over a (relatively) small amount of shared state by multiple actors within a distributed system. This is literally Etcd's raison d'etre, it excels at this use case.
There are many operational differences between etcd and a traditional RDBMS, but the biggest ones are that broadcasting updates (so that actors may react) is a core operation, and that the MVCC log is "exposed" (via ModRevision) so that actors can resolve state disagreements (am I out of date, or are you?).
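If it helps, here is roughly what both of those look like with the clientv3 package: a watch streams every change under a prefix, and each event carries the key's ModRevision, which a reader can compare against what it has cached. The "config/" prefix and the endpoint are invented for illustration.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Broadcasting updates: every change under the hypothetical
        // "config/" prefix is pushed to this channel as it happens.
        ctx := context.Background()
        for wresp := range cli.Watch(ctx, "config/", clientv3.WithPrefix()) {
            for _, ev := range wresp.Events {
                // ModRevision exposes where in the MVCC log this write landed,
                // so a reader can decide whether its own view is out of date.
                fmt.Printf("%s %q at ModRevision %d\n",
                    ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
            }
        }
    }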