The post touches upon it, but I didn't really understand the point. Why doesn't synchronous replication in Postgres work for this use case? With synchronous replication you have a primary and secondary. Your queries go to the primary and the secondary is guaranteed to be at least as up to date as the primary. That way if the primary goes down, you can query the secondary instead and not lose any data.
How is PostgreSQL (or MySQL) "considerably less scalable" exactly? etcd isn't particularly known for being scalable or performant. I'm sure it's fast enough for your use-case (since you've benchmarked it), but people have been scaling both PostgreSQL and MySQL far beyond what etcd can achieve (usually at the cost of availability of course).
[I work at Tailscale] I only mean scalable for our very specific and weird access patterns, which involve frequently read-iterating through a large section of the keyspace to calculate and distribute network+firewall updates.
Our database has very small amounts of data but a very, very large number of parallel readers. etcd explicitly disclaims any ability to scale to large data sizes, and probably rightly so :)
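Concretely, a read-iteration over one section of the keyspace with etcd's clientv3 package looks roughly like the sketch below; the "nodes/" prefix and the endpoint are invented for illustration, not our actual schema.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        // Connect to a local etcd endpoint (address is illustrative).
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Read-iterate over every key under a hypothetical "nodes/" prefix:
        // the kind of whole-section scan described above.
        ctx, cancel := context.WithTimeout(context.Background(), 3*time.Second)
        defer cancel()
        resp, err := cli.Get(ctx, "nodes/", clientv3.WithPrefix())
        if err != nil {
            panic(err)
        }
        for _, kv := range resp.Kvs {
            fmt.Printf("%s -> %d bytes (ModRevision %d)\n", kv.Key, len(kv.Value), kv.ModRevision)
        }
    }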
This is getting confusing. The tweets sound like you are concerned about write scalability, and here it sounds like you are concerned about read scalability?
> So we can get, say, 1000 updates, bundle them all, get it synced in say ~100ms, and then answer all 1000 requests at once, and still only take ~100ms.
I assume the same trick is applicable to RDBMS as well? So you batch the 1000 updates, and do one commit with a single fsync.
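Something like this is what I have in mind: what RDBMSes call group commit. Queue up concurrent writes, flush them with one fsync, then acknowledge them all at once. A minimal sketch of the pattern in Go, with invented types, sizes, and timings:

    package batch

    import (
        "os"
        "time"
    )

    // write is one pending update plus a channel to signal the caller
    // once its data has been durably flushed.
    type write struct {
        data []byte
        done chan error
    }

    // Batcher groups many concurrent writes into a single fsync:
    // the "1000 updates, one sync, answer all at once" trick above.
    type Batcher struct {
        f        *os.File
        incoming chan write
    }

    func NewBatcher(f *os.File) *Batcher {
        b := &Batcher{f: f, incoming: make(chan write, 1024)}
        go b.loop()
        return b
    }

    // Write queues data and blocks until the batch containing it is synced.
    func (b *Batcher) Write(data []byte) error {
        w := write{data: data, done: make(chan error, 1)}
        b.incoming <- w
        return <-w.done
    }

    func (b *Batcher) loop() {
        for w := range b.incoming {
            batch := []write{w}
            // Gather whatever else arrives within a short window, up to a cap.
            timeout := time.After(10 * time.Millisecond)
        gather:
            for len(batch) < 1000 {
                select {
                case next := <-b.incoming:
                    batch = append(batch, next)
                case <-timeout:
                    break gather
                }
            }
            // One write pass + one fsync for the whole batch...
            var err error
            for _, bw := range batch {
                if _, werr := b.f.Write(bw.data); werr != nil && err == nil {
                    err = werr
                }
            }
            if serr := b.f.Sync(); serr != nil && err == nil {
                err = serr
            }
            // ...then acknowledge every caller at once.
            for _, bw := range batch {
                bw.done <- err
            }
        }
    }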
> Virtually every other database I've used is quite naive about how they flush blocks to disk, which dramatically reduces their effective transactions/sec. It's rare to see one that made all the right choices here.
Can you elaborate on this? Anyway, any RDBMS worth its salt should be able to saturate the disk IOPS, i.e. the act of flushing itself wouldn't be the bottleneck.
> Our database has very small amounts of data but a very, very large number of parallel readers.
So the control plane is the sole writer of this database, and there are maybe 100s/1000s of other readers, who each has a watcher on etcd? Who are these readers? If they are different processes on different machines, how did it work when the database was in the json file?
Sorry for the barrage of questions, but I have to ask out of curiosity.
Not OP, but if I could ask further... how much inconsistency can you tolerate on your reads? From the use case you mention I imagine... quite a lot, but you could risk locking yourself out of networks/systems if you get it wrong?
I've always found it hard to reason about relaxing consistency, and I think people underestimate how much complexity they take on by moving away from serializable isolation towards something looser. (Fun fact! Many databases out of the box don't correctly handle the classic transaction example -- read an account balance, subtract an amount from the balance, and then add the amount to another account's balance.)
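To make that fun fact concrete: at a typical default isolation level like READ COMMITTED, two concurrent transfers written in the obvious read-then-write style can both read the same starting balance, and one update silently clobbers the other (a lost update). A rough sketch of the risky pattern using Go's database/sql; the table and column names are made up, and the usual fixes are SELECT ... FOR UPDATE or a serializable transaction.

    package ledger

    import (
        "database/sql"
        "fmt"
    )

    // transfer moves amount from one account to another. As written, this is
    // the classic example: under READ COMMITTED, two concurrent calls can read
    // the same balance and one update can overwrite the other's result.
    func transfer(db *sql.DB, from, to string, amount int64) error {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        defer tx.Rollback()

        var balance int64
        // Plain SELECT: does NOT lock the row. SELECT ... FOR UPDATE (or a
        // serializable transaction) is what actually makes this safe.
        if err := tx.QueryRow(
            `SELECT balance FROM accounts WHERE id = $1`, from,
        ).Scan(&balance); err != nil {
            return err
        }
        if balance < amount {
            return fmt.Errorf("insufficient funds")
        }
        // This write uses a value computed from the possibly-stale read above.
        if _, err := tx.Exec(
            `UPDATE accounts SET balance = $1 WHERE id = $2`, balance-amount, from,
        ); err != nil {
            return err
        }
        if _, err := tx.Exec(
            `UPDATE accounts SET balance = balance + $1 WHERE id = $2`, amount, to,
        ); err != nil {
            return err
        }
        return tx.Commit()
    }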
Usually people design their app with the expectation of strict serializable isolation, relax it because of some production emergency, and then deal with the business consequences of the database doing the wrong thing until the company goes out of business (usually not due to database isolation levels, to be fair).
AFAICT linearizability is about the best we can expect in reality (at least for a distributed system), but as you point out: very few people actually check their assumptions... and even fewer actually think about DB transactions correctly in the first place. It's actually really, really hard, and people have rules of thumb in their heads that aren't actually correct.
Which gets me wondering whether we could formalize some of this stuff... (in relevant "code scopes", dgmw!)
EDIT: If there is one thing I am certain about, it's that a lot of consistency can be relaxed around human interaction. It's lossy anyway, and people will call you (eventually, depending on anxiety/shyness) if you haven't fulfilled an order. The browser is the first example of that: the page is already out of date the moment you show it, so... Anyway, that's just to say it's amusing how much people worry about consistency on the front end.
Can you go into more about what these problems are? I've always used databases (about 15 years on Oracle and about 5 years on Postgres) and I'm not sure if I know what problems you are referring to. Maybe I have experienced them, but have thought of them by a different name.
SQL - I'm not sure what the problems are with SQL. But it is like a second language to me so maybe I experienced these problems long ago and have forgotten about them.
ORM - I never use an ORM, so I have no idea what the problems might be.
testing latency - I don't know what this refers to.
SQL is fine. We use it for some things. But not writing SQL is easier than writing SQL. Our data is small enough to fit in memory. Having all the data in memory and just accessible is easier than doing SQL + network round trips to get anything.
ORMs: consider yourself lucky. They try to make SQL easy by auto-generating terrible SQL.
Testing latency: we want to run many unit tests very quickly without high start-up cost. Launching MySQL/PostgreSQL docker containers and running tests against Real Databases is slower than we'd like.
Dependencies: Docker and those MySQL or PostgreSQL servers in containers.
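To give a flavor of what "all the data in memory and just accessible" plus fast tests looks like: if the data layer is a Go map behind a small interface, tests can construct it directly with zero startup cost. The types and method names below are invented for this comment, not our actual code.

    package store

    import "sync"

    // Node is a stand-in for whatever record the control plane tracks.
    type Node struct {
        ID   string
        Addr string
    }

    // Store is the tiny data-access surface the rest of the code depends on.
    // A SQL-backed implementation could satisfy the same interface later.
    type Store interface {
        Get(id string) (Node, bool)
        Put(n Node)
        All() []Node
    }

    // MemStore keeps everything in memory: reads are map lookups, with no SQL
    // and no network round trip, and a test can create one instantly.
    type MemStore struct {
        mu    sync.RWMutex
        nodes map[string]Node
    }

    func NewMemStore() *MemStore {
        return &MemStore{nodes: make(map[string]Node)}
    }

    func (s *MemStore) Get(id string) (Node, bool) {
        s.mu.RLock()
        defer s.mu.RUnlock()
        n, ok := s.nodes[id]
        return n, ok
    }

    func (s *MemStore) Put(n Node) {
        s.mu.Lock()
        defer s.mu.Unlock()
        s.nodes[n.ID] = n
    }

    func (s *MemStore) All() []Node {
        s.mu.RLock()
        defer s.mu.RUnlock()
        out := make([]Node, 0, len(s.nodes))
        for _, n := range s.nodes {
            out = append(out, n)
        }
        return out
    }

A test just calls NewMemStore(): no container pull, no schema creation, no server startup.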
Can you put some numbers on how much time is too much? I've never seen anyone go this far to avoid using a database when the only "real" reason seems to be avoiding testing latency (a problem which has many other solutions), so I am really confused, but curious to understand!
Running all of our control server tests (including integration tests) right now takes 8 seconds, and we're not even incredibly happy with that. There's no reason it should even be half that.
So it's not really in our patience budget to add a mysqld or postgres startup (possible docker pull, creating its schema, etc.).
>right now takes 8 seconds, and we're not even incredibly happy with that
With the amount of explaining and skepticism you're having to deal with in most of the threads here (plenty of reasonable questions, though some seem to assume your approach is totally wrong), I feel compelled to say how nice such a fast feedback loop would be, just so you know you're not listing these benefits into an ether that doesn't appreciate them.
Not sure what their requirements are, but I'm using a "spin up an isolated postgres instance per test run" solution and end up with ~3s of overhead to do that (using https://pypi.org/project/testing.postgresql/).
Edit: 3s for global setup/teardown. Not per test function/suite.
Docker itself doesn't add much latency. It just makes getting MySQL and PostgreSQL easier. If anything, it helps with dependencies. The database server startup still isn't great, though.
If you don't use Docker, you can just leave the database server running in the background, which removes the startup latency (you can of course do this with Docker too, but Docker has a tendency to use quite a few resources when left running in the background, which a database server on its own won't).
I mean that's an `apt install postgres` or `brew install postgres` away. Takes about 5 minutes. I guess it could become a pain if you need to work with multiple different versions at once.
Being deep in the cloud world right now, with AWS and Terraform and kubernetes CLI tools, etc., not having to install third-party tools on my machine does sound pretty great, but also entirely unrealistic.
Managing local DBs once new versions are out and your server isn't upgraded yet is irritating, but when I'm using a Mac I'd still rather use a native DB than Docker because of the VM overhead, since I've not yet run into a bug caused by something like "my local postgres was a different version than the server was." (Closest I've gotten was imagemagick for mac doing something a bit differently than for linux, about 10 years ago at this point.)
> I've not yet run into a bug caused by something like "my local postgres was a different version than the server was."
Ran into that at a recent place - the code was doing "= NULL" in a bunch of places (before my time) and PG12 treated that differently than PG11 did, which broke a bunch of tests.
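For anyone who hasn't hit this: with default settings, a predicate like "deleted_at = NULL" compares against NULL, which yields NULL (not true), so it matches nothing; "IS NULL" is the form that actually tests for null. A tiny illustration with Go's database/sql (the table name, connection string, and choice of driver are made up):

    package main

    import (
        "database/sql"
        "fmt"
        "log"

        _ "github.com/lib/pq" // Postgres driver; any driver would do
    )

    func main() {
        db, err := sql.Open("postgres", "postgres://localhost/example?sslmode=disable")
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // "= NULL" yields NULL rather than true, so with default settings
        // this count is 0 regardless of the data.
        var broken int
        if err := db.QueryRow(
            `SELECT count(*) FROM widgets WHERE deleted_at = NULL`).Scan(&broken); err != nil {
            log.Fatal(err)
        }

        // "IS NULL" is the predicate that actually tests for null.
        var correct int
        if err := db.QueryRow(
            `SELECT count(*) FROM widgets WHERE deleted_at IS NULL`).Scan(&correct); err != nil {
            log.Fatal(err)
        }

        fmt.Println(broken, correct)
    }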
Were you doing a lot of logic in SQL itself? Sounds like not really, but then I'm surprised you'd have so many tests hitting the DB directly, vs most feature logic living above that layer in a way that doesn't need the DB running at all.
Why are your unit tests touching a database? I'm a real stickler about keeping unit tests isolated, because once I/O gets involved they invariably become much less reliable and, as you mention, too slow.
Sorry, I should've just said tests. Our tests overall are a mix of unit tests and integration tests and all sizes in between, depending on what they want to test.
I think the "database" label is tripping up the conversation here. What's being talked about here, really, is fast & HA coordination over a (relatively) small amount of shared state by multiple actors within a distributed system. This is literally Etcd's raison d'etre, it excels at this use case.
There are many operational differences between etcd and a traditional RDBMS, but the biggest ones are that broadcasting updates (so that actors may react) is a core operation, and that the MVCC log is "exposed" (via ModRevision) so that actors can resolve state disagreements (am I out of date, or are you?).
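If it helps, here is roughly what both of those look like with the clientv3 package: a watch streams every change under a prefix, and each event carries the key's ModRevision, which a reader can compare against what it has cached. The "config/" prefix and the endpoint are invented for illustration.

    package main

    import (
        "context"
        "fmt"
        "time"

        clientv3 "go.etcd.io/etcd/client/v3"
    )

    func main() {
        cli, err := clientv3.New(clientv3.Config{
            Endpoints:   []string{"localhost:2379"},
            DialTimeout: 5 * time.Second,
        })
        if err != nil {
            panic(err)
        }
        defer cli.Close()

        // Broadcasting updates: every change under the hypothetical
        // "config/" prefix is pushed to this channel as it happens.
        ctx := context.Background()
        for wresp := range cli.Watch(ctx, "config/", clientv3.WithPrefix()) {
            for _, ev := range wresp.Events {
                // ModRevision exposes where in the MVCC log this write landed,
                // so a reader can decide whether its own view is out of date.
                fmt.Printf("%s %q at ModRevision %d\n",
                    ev.Type, ev.Kv.Key, ev.Kv.ModRevision)
            }
        }
    }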