Testing consistency of rqlite (github.com/wildarch)
139 points by otoolep on April 19, 2022 | 37 comments



rqlite author here, happy to answer any questions. Many thanks to Nienke Eijsvogel, Ruben van Baarle and Daan de Graaf for executing this testing.

https://github.com/rqlite/rqlite


Not really related to TFA, but I had a look at the python driver documentation and I was very surprised by this statement [0]:

> Limitations

> Transactions are not supported

Is there something I am badly misreading? How come such a fundamental feature is not supported? Maybe it is just a limitation in the Python driver?

[0] https://github.com/rqlite/pyrqlite#limitations=


No, it's due to rqlite.

rqlite does support a form of transactions -- you can send a set of SQLite statements, and either all will be successful, or none will be. You can even send a BEGIN statement, and a SQLite transaction will be opened (and you can COMMIT later). However, due to the nature of Raft, restarting the cluster in the middle of an open transaction will not clear the transaction -- and most folks probably don't expect this. That's why I need to think more before advocating use of traditional transactions with rqlite -- distributed transaction functionality is easy to get wrong, so requires careful thought. You can use SQLite transactions, but you must be careful. That's why drivers don't tend to support it yet.

https://github.com/rqlite/rqlite/blob/master/DOC/DATA_API.md...
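
For readers who want to see what "send a set of statements" looks like in practice, here is a minimal sketch in Python. It only builds the HTTP request and does not send it. The assumptions: a node listening on `localhost:4001`, a hypothetical table `foo`, and the `transaction` query parameter on `/db/execute` as I understand it from the Data API doc linked above. Check that doc before relying on any of it.

```python
import json
import urllib.request


def build_transaction_request(statements, base_url="http://localhost:4001"):
    """Build (but do not send) a POST asking rqlite to apply all statements atomically.

    Assumptions: base_url points at an rqlite node, and the `transaction`
    query parameter behaves as described in the rqlite Data API doc.
    """
    body = json.dumps(statements).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/db/execute?transaction",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# `foo` is a hypothetical table used only for illustration.
req = build_transaction_request([
    "INSERT INTO foo(name) VALUES('fiona')",
    "INSERT INTO foo(name) VALUES('sinead')",
])

# To send it against a running node:
#     urllib.request.urlopen(req)
```

The whole batch travels in one request, so there is no open transaction left hanging on the cluster between calls, which is the case the comment above warns about.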


Taken from the readme:

> the behavior of a cluster if it fails while such a manually-controlled transaction is not yet defined

Do you envision manually-controlled transactions working seamlessly one day? If not, have you thought about what guarantees are feasible/you hope to provide?


Yes, I have a design in mind, but it's a fair amount of work. It will require a new API too -- an API that allows clients to explicitly create a new connection, create a transaction on it, and close that transaction when finished.


Is there a Go client other than gorqlite, which doesn't seem to be actively maintained?


Would it be a better compromise to support transactions as long as they were sent as a single block?


Not sure if I follow you, but that's exactly what rqlite does support.

https://github.com/rqlite/rqlite/blob/master/DOC/BULK.md


Sorry, I meant from a marketing perspective, saying "transactions are only supported in batch mode" rather than saying they aren't supported.


MySQL didn’t have transactions for years! A transaction layer can be implemented by adding a transaction table, a transaction column to each other table, and joining the two in your queries.

  where table.transaction_id = transaction.id and transaction.committed = true

You would begin a transaction by inserting a new row and then updating it to committed when done.

With some metaprogramming you may be able to do this all transparently.
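
A rough sketch of the idea above, using Python's built-in `sqlite3` module with an in-memory database. The table and helper names (`txn`, `account`, `begin`, `commit`, `visible_accounts`) are made up for illustration, and `txn` stands in for the `transaction` table because `TRANSACTION` is an SQL keyword. It only shows how uncommitted rows stay hidden from readers; it makes no attempt at write conflicts or cleaning up abandoned rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE txn (
        id INTEGER PRIMARY KEY,
        committed INTEGER NOT NULL DEFAULT 0
    );
    CREATE TABLE account (
        name TEXT,
        balance INTEGER,
        transaction_id INTEGER REFERENCES txn(id)
    );
""")


def begin(conn):
    """Start an application-level transaction by inserting an uncommitted row."""
    return conn.execute("INSERT INTO txn (committed) VALUES (0)").lastrowid


def commit(conn, txn_id):
    """Flip the flag so rows tagged with txn_id become visible to readers."""
    conn.execute("UPDATE txn SET committed = 1 WHERE id = ?", (txn_id,))


def visible_accounts(conn):
    """Readers join against txn and only see rows from committed transactions."""
    return conn.execute(
        "SELECT account.name, account.balance FROM account "
        "JOIN txn ON account.transaction_id = txn.id "
        "WHERE txn.committed = 1 ORDER BY account.name"
    ).fetchall()


t1 = begin(conn)
conn.execute("INSERT INTO account VALUES ('alice', 100, ?)", (t1,))
before = visible_accounts(conn)   # alice's row exists but is filtered out
commit(conn, t1)
after = visible_accounts(conn)    # alice's row is now visible

t2 = begin(conn)
conn.execute("INSERT INTO account VALUES ('bob', 50, ?)", (t2,))
final = visible_accounts(conn)    # t2 was never committed, so bob stays hidden
```

"Rolling back" in this scheme just means never setting `committed`, which leaves orphaned rows behind for some later cleanup pass.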


How would this provide read / write isolation? Or rollback in case of failure (power outage)?


As an exercise for the reader. :)

Both are possible and have been done before.


Very cool project putting together Raft and SQLite. Definitely seems like its biggest benefit would be super simple operation when you also really need the distributed behaviors. Is that accurate? What do you guys use it for and where do you think the sweet spot is for its application?


Yes, simplicity of operation is a key goal of rqlite. To quote from the FAQ:

rqlite is very simple to deploy, run, and manage -- in fact, simplicity-of-operation is a key design goal. It's also lightweight and easy to query. It's a single binary you can drop anywhere on a machine, and just start it, which makes it very convenient. It takes literally seconds to configure and form a cluster, which provides you with fault-tolerance and high-availability. With rqlite you have complete control over your database infrastructure, and the data it stores.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#why-...


Another commenter mentioned automatic sharding to which you replied "no". But short of that what do you think about use cases where the database is "naturally" very sharded, say where every user has a separate database? With such a design the operator might want to scale rqlite down to zero instances when a db is unused for a while and start it back up quickly on demand.

Have you thought about how rqlite could fit into this design space?


I have considered it, but it would introduce a very large amount of complexity -- and make rqlite much more complicated to operate. Adding this type of functionality would mean rqlite would no longer be a worthwhile system to use. It would do something (sharding) probably no better than other systems, but no longer be trivial to operate. rqlite does something somewhat narrow, but does that very well (though I would say that :-) )


I have to say, I like your product thinking. I agree with you.


That seems better solved outside of rqlite in the application layer and inetd or k8s.


Can you also auto-shard with rqlite, or is it only designed to distribute a single set of data among many SQLite instances?

EDIT: Looks like no: "rqlite is about replicating a set of data, which has been written to it using SQL. The data is replicated for fault tolerance because your data is so important that you want multiple copies distributed in different places, you want to be able to query your data even if some machines fail, or both."


Understandable. Sharding is a huge can of worms in itself. Balancing, changing shard keys, changing table attributes before and after sharding... it's a mess.


Yeah, it would distract from the key goal of rqlite -- which is a trivial to deploy, simple to operate, reliable distributed store for critical relational data.


Correct, the answer is no, it doesn't support that form of sharding.


One of the authors of the tests and blog post here, let me know if you have any questions about our testing process.


Thanks again for the great write-up.


What is the difference between rqlite and dqlite?


Quoting from the FAQ:

dqlite is a library, written in C, that you need to integrate with your own software. That requires programming. rqlite is a standalone application -- it's a full RDBMS (albeit a relatively simple one). rqlite has everything you need to read and write data, and backup, maintain, and monitor the database itself. rqlite and dqlite are completely separate projects, and rqlite does not use dqlite. In fact, rqlite was created before dqlite.

https://github.com/rqlite/rqlite/blob/master/DOC/FAQ.md#how-...


Thanks! Should have done my homework. So I guess dqlite fits my idea better (though I wish dqlite had local non-Raft reads).


How do people use SQLite outside a single app? How do you manage RBAC?


EDIT: The title has been slightly changed now. It was originally "Jepsen testing of rqlite, the distributed DB built on Raft and SQLite"

Hmm, perhaps a bit of confusion from the title. It sounds like they ran the Jepsen suite of tests against rqlite, which is great, but it was not done _by_ Jepsen / Kyle (https://jepsen.io). Others have done this themselves too and that's fine, but half of the problem is correctly implementing the tests, which has been done incorrectly by others in the past.


I don't see any need to rewrite the title here, so I've reverted it from "Jepsen-style testing of rqlite, the distributed DB built on Raft and SQLite".

From https://news.ycombinator.com/newsguidelines.html: "Please use the original title, unless it is misleading or linkbait; don't editorialize."


That's reasonable -- thanks.


OK, how do you suggest I change the title? I'm open to a better one. I'm used to the casual use of "Jepsen" for this type of testing. But yes, there is also the case where Kyle does the testing himself -- and this is not that.

Their implementation is up on GitHub, I'm sure they would be interested in any feedback on it.

https://github.com/wildarch/jepsen.rqlite

(To be clear, I wasn't involved with this testing, I just heard about it today and found it interesting)


I'd probably call it "Jepsen-testing". English is ambiguous, only so much you can do, so don't worry about it. People rag on titles too much on this site anyway.


My "ragging" is appropriate here because Jepsen is both the name of a reputable database testing group and the test suite. A lot of people do drive-by HN, without reading the specific submission too closely, and having an accurate title is important. The new title edited in is much better than the original was.


I called it "Jepsen-style" testing.


It's worth pointing out that Kyle goes by Aphyr, not Jepsen, which is the tool he wrote for this testing.


I meant to point it out in the sense of the organization, and called him out by name mostly just because he’s kind of the guy out front. I always forget his exact username.



