The first prototype implementation of dqlite was in Go, leveraging the hashicorp/raft implementation of the Raft algorithm. The project was later rewritten entirely in C because of performance problems with the way Go interoperates with C: Go treats a function call into C that lasts more than ~20 microseconds as a blocking system call, puts the goroutine making that call into a waiting queue, and resuming it effectively causes a context switch. Since a lot of these calls were happening, performance degraded. See also this issue in the Go bug tracker.
The added benefit of the rewrite in C is that it's now easy to embed dqlite into projects written in effectively any language, since all major languages have provisions for creating C bindings.
Does WebAssembly or any of its runtimes provide a way to do this?
Presumably, https://github.com/CraneStation/wasmtime would benefit from such an FFI API being specified.
My guess is their main goal was to use this in Go, where they already have experienced Go and C developers, and adding a third language would muddy things.
But it made me think that a killer app for Rust would be this entire concept done in one embedded DB with bindings for most of the popular languages.
B) I don't think I've seen a rust binary of more than 10 megs. Cargo, rust, and ripgrep are all about 6 MB on my disk; fd is 2.5 MB. These seem like reasonable stand-ins for a binary of significant size and complexity. sqlite3 itself is about 1.3 MB.
C) Dqlite doesn't seem particularly concerned with disk-constrained systems, though I may be interpreting their site incorrectly, and the low footprint should be equally achievable with the rust runtime—surely the database itself would be a much larger concern.
This just seems like an unusually good fit for the benefits of the language: reliable client glue you can import into many runtimes, where being able to prove data flow would be a strong defensive coding pattern. That said, I think that C is a good, conservative approach here; I'm certainly not knocking anyone's judgement. Overall the parent poster is absolutely correct: there's a strong correlation between use of rust's type system and size of output code.
Everything is an embedded device nowadays, so for reference, if you buy a WiFi AP today and open it up, you're likely to find an 8 or 16 MiB NOR flash inside, maybe a 128 MiB NAND flash (with realistically 64 MiB of space since it will be doing A/B updates).
I don't think the database size is a big concern. For me the focus in dqlite is very much on the 'd' - you store atomic configuration data in there, it's not about throughput.
Seems like, at least for embedded devices, you'd want something as small as possible so as to avoid consuming all available disk (not that any other language would necessarily balloon it significantly).
Found the answer on Reddit:
> rqlite is a full RDBMS application, but dqlite is a library you must link with other code. It's like the difference between MySQL and libsqlite3.so.
It looks like dqlite's documentation has changed -- for some reason frames are no longer mentioned anywhere. So maybe this isn't the case any more, but this was once the biggest differentiator for me.
According to https://github.com/canonical/dqlite/blob/master/doc/faq.md this is still the case.
Yes, exactly, Dqlite is a library, rqlite is a full application.
It's good they made it a separate project that can be used independently of LXD containers.
It includes an answer about the difference from rqlite.
Reading the docs, it seems to me that dqlite has been developed by the team behind LXD at Canonical: LXD is listed as the biggest user of the project, and the author's GitHub says he works at Canonical on LXD/LXC.
Interesting project, good luck to the author/authors if you read this!
Yes, good luck to the creators of this project, it looks very interesting and I've been watching it for a few years now.
That is the primary reason why I do not consider it for new projects. It's just too slow to iterate on.
1. create a new table with the schema you want,
2. copy all data,
3. drop the old table,
4. rename the new one.
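The four steps above can be sketched with Python's built-in sqlite3 module (the table and column names here are made up for illustration; the example drops a hypothetical `legacy_col`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, legacy_col TEXT)")
conn.execute("INSERT INTO users (name, legacy_col) VALUES ('alice', 'x')")
conn.commit()

# 1. Create a new table with the desired schema (without legacy_col).
conn.execute("CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT)")
# 2. Copy all data across.
conn.execute("INSERT INTO users_new (id, name) SELECT id, name FROM users")
# 3. Drop the old table.
conn.execute("DROP TABLE users")
# 4. Rename the new one into place.
conn.execute("ALTER TABLE users_new RENAME TO users")
conn.commit()

print(conn.execute("SELECT name FROM users").fetchall())  # → [('alice',)]
```

In a real migration you'd also recreate indexes, triggers, and views on the new table, and wrap the whole thing in one transaction.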
“Copy all data” can also be difficult if the table contains values the database generated that must stay the same because you use them elsewhere. That shouldn't be a problem with SQLite, as it doesn't allow rowid as a foreign key, but if you use it as a foreign key outside the database, or use the hash of a full row to detect changes, it may still bite you.
It may also mean being offline for a significant amount of time, but that is often (effectively) the case for databases that support deleting columns, too.
sqlite supports ADD COLUMN and RENAME COLUMN DDLs.
Dropping columns is not supported, nor are some of the more complex column additions; those do require going through a full table rewrite.
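For reference, the two in-place ALTER TABLE forms mentioned, demonstrated via Python's stdlib sqlite3 (RENAME COLUMN needs SQLite ≥ 3.25; the table name is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a INTEGER)")

# Supported in place: adding a column...
conn.execute("ALTER TABLE t ADD COLUMN b TEXT DEFAULT ''")
# ...and renaming one (SQLite >= 3.25).
conn.execute("ALTER TABLE t RENAME COLUMN a TO a_renamed")

cols = [row[1] for row in conn.execute("PRAGMA table_info(t)")]
print(cols)  # → ['a_renamed', 'b']
```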
See another reply in this thread: https://news.ycombinator.com/item?id=20841814
My resolution, window size, color settings, text zoom, font rendering, etc, are almost certainly different, too, but at least they've made the page more than twice as slow by forcing the correct font.
I added answers to your other questions to the FAQ:
The main differences from rqlite are:
- Embeddable in any language that can interoperate with C
- Full support for transactions
- No need for statements to be deterministic (e.g. you can use time())
- Frame-based replication instead of statement-based replication
More fundamentally, as mentioned above, Dqlite is a library, whereas rqlite is a RDBMS (albeit a pretty simple and lightweight one).
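To see why non-deterministic statements are a problem for statement-based replication, run the same SQL on two independent databases standing in for two replicas (a toy sketch, not how either project actually replicates):

```python
import sqlite3
import time

# Two independent databases stand in for two replica nodes.
node_a = sqlite3.connect(":memory:")
node_b = sqlite3.connect(":memory:")
for node in (node_a, node_b):
    node.execute("CREATE TABLE log (ts TEXT)")

# Statement-based replication re-executes the same SQL on each node...
stmt = "INSERT INTO log VALUES (strftime('%Y-%m-%d %H:%M:%f', 'now'))"
node_a.execute(stmt)
time.sleep(0.05)          # ...but replicas never apply it at exactly the same instant.
node_b.execute(stmt)

ts_a = node_a.execute("SELECT ts FROM log").fetchone()[0]
ts_b = node_b.execute("SELECT ts FROM log").fetchone()[0]
print(ts_a == ts_b)  # → False: the replicas have silently diverged
```

Replicating WAL frames sidesteps this, because the non-deterministic function is evaluated once on the leader and only the resulting pages are shipped.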
Does this store the entire log for all time? When you bring up a new node, does it replay the entire history? If not, how do you bring up a new node without data?
How does backup/restore work?
How do upgrades work? Is the shared WAL low-level enough that it's 100% stable/compatible between sqlite/dqlite versions? If not, what happens if half your cluster is on the old version while you're upgrading, and sees things it doesn't understand yet?
Is it possible to encrypt node/node traffic? Or can you easily send the node-node traffic over a proxy, like Envoy? How about over a unix domain socket or "@named" unix domain socket (which we use for Envoy here at Square)
Looks awesome, by the way!
I thought Linux didn't support real async disk IO. Is that not the case?
If Linux has no real async disk IO, how does Dqlite achieve fully async disk IO?
There is now a new async I/O API available in Linux (I'm not remembering the name right now, but it was developed by folks at Facebook). It looks promising so I'll check it at some point. (dqlite author here)
> A: There can’t be a conflict situation. Raft’s model is that only the leader can append new log entries, which, translated to dqlite, means that only the leader can write new WAL frames. So any attempt to perform a write transaction on a non-leader node will fail with an ErrNotLeader error (and in that case clients are supposed to retry against whoever is the new leader).
Correct me if I'm wrong, but isn't that essentially the same limitation WAL mode in normal sqlite has? With WAL you can have as many reads going on as you like, in parallel to a single write. That seems directly comparable to what the dqlite FAQ says, unless I'm missing something.
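As a rough illustration of that behavior with plain SQLite (not dqlite), WAL mode lets a reader keep seeing the last committed snapshot while a single write transaction is in flight:

```python
import sqlite3
import tempfile
import os

# WAL mode needs a real file, not :memory:.
path = os.path.join(tempfile.mkdtemp(), "demo.db")

writer = sqlite3.connect(path)
writer.execute("PRAGMA journal_mode=WAL")  # enable write-ahead logging
writer.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")
writer.execute("INSERT INTO kv VALUES ('a', '1')")
writer.commit()

# Open a write transaction but don't commit yet.
writer.execute("BEGIN IMMEDIATE")
writer.execute("UPDATE kv SET v = '2' WHERE k = 'a'")

# A second connection can still read the last committed snapshot.
reader = sqlite3.connect(path)
v_before = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()[0][0]

writer.commit()
v_after = reader.execute("SELECT v FROM kv WHERE k = 'a'").fetchall()[0][0]
print(v_before, v_after)  # → 1 2
```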
I believe they (the LXD team) are working on upstreaming the WAL changes but due to SQLite's very strong compatibility guarantees they want to be very certain the API and protocol are correct before carving it in stone. Not to mention they are the only major users of the feature, so more widespread use would also be nice before merging it upstream.
That's interesting. Didn't know about it.
An easy use-case that springs to mind is any sort of distributed IoT device that needs to track state, i.e. any industrial or consumer monitoring system with a centralised controller that would use this for data storage. Specifically, this enables the use of multiple nodes for high throughput: imagine many, many, many sensors and a central controller streaming real-time data.
All of that needs tooling.
I have a teeny, tiny cluster using MySQL+galera as a multi-master cluster, but it took a while to iterate to monitoring that tells me when one node is unhealthy and getting the correct repair and restart procedures.
FWIW, I built all the functionality into rqlite from the very start, for exactly those reasons. In the real world a database must be operated.
I like to use projects which lots of other projects depend on directly. That way if the main project goes unsupported, all the other projects using it will band together to support a fork. I believe open source that is not created for a company will last much longer. (I like that they rewrote it in C, though; it would probably survive well as a fork if enough people/projects use it)
Odd, I'd think x64 would be the most commonly used architecture. Is this a mistake?
If you want light-speed insert/delete, you could probably avoid using the disk at all: as long as a majority of your nodes don't die, you won't lose any data. You can also go somewhere in between and save to disk only at specific intervals.
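With plain single-node SQLite, that in-between approach looks roughly like keeping the working database in memory and periodically snapshotting it to disk (a sketch using the stdlib `Connection.backup()`, available since Python 3.7; the snapshot interval and paths are up to you):

```python
import sqlite3
import tempfile
import os

# The working database lives entirely in memory: no fsync on the hot path.
mem = sqlite3.connect(":memory:")
mem.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
mem.executemany("INSERT INTO events (payload) VALUES (?)",
                [("e%d" % i,) for i in range(1000)])
mem.commit()

# At whatever interval you choose, snapshot the whole in-memory DB to disk.
snapshot_path = os.path.join(tempfile.mkdtemp(), "snapshot.db")
disk = sqlite3.connect(snapshot_path)
mem.backup(disk)  # copies all pages from a consistent read snapshot
disk.close()

# Verify the snapshot is complete and readable on its own.
check = sqlite3.connect(snapshot_path)
print(check.execute("SELECT COUNT(*) FROM events").fetchone())  # → (1000,)
```

Anything written between snapshots is lost on a crash, which is exactly the trade-off: in a replicated setup the surviving majority covers that window.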
I know Zookeeper, for example, supports observer nodes: essentially a cheaper read-only cache. Chubby at Google had the same thing.