I read this as a damnation of POSIX filesystem semantics. File descriptors getting reused, insane locking semantics on close, allowing you to delete an open file and providing no means to prevent it, fork causing problems, unspecified behavior of sync() (is it barrier or "full" sync).
Though: Multiple copies of SQLite linked into the same application. Weird and rare scenario, but why not keep the global list of open sqlite files in a global shared memory segment?
Why is everything that doesn't work as someone expected called "damned"? These features have been there for decades, and we know what to do, what not to do, and how to exploit them.
I personally like how POSIX works, and how well documented its operation and failure modes are. I love systems which don't prevent foot guns, and go bang spectacularly when I foot gun myself.
It's also ironic how we don't like systems which limit us in severe ways, and then get angry at a system which doesn't limit us the same way. Fun times.
> Why is everything that doesn't work as someone expected called "damned"?
Because POSIX doesn't provide features that are necessary for reliability. Workarounds, if they exist, are highly platform-specific. That I'd say is damnation of the standard.
For example, how do you, within POSIX, ensure that a write to the storage media is "durable"? People argue about what combination of various "sync" calls and open flags you need to achieve this. (Disregard the case of hardware cheating -- nothing can be done about that.)
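For illustration, a minimal Python sketch of the recipe most people seem to settle on for Linux: fsync the file itself, then fsync the containing directory so the directory entry is durable too. Whether this is sufficient (or even necessary) on every platform and filesystem is exactly what gets argued about.

    import os

    def durable_write(path: str, data: bytes) -> None:
        # Write data to path and try to make it durable (common Linux recipe).
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            written = 0
            while written < len(data):
                written += os.write(fd, data[written:])
            os.fsync(fd)                 # flush file contents to stable storage
        finally:
            os.close(fd)
        # fsync the parent directory so the new directory entry is durable too
        dirfd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
        try:
            os.fsync(dirfd)
        finally:
            os.close(dirfd)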
How do you make an "owning" lock on a file: a kind that you and only you (the owner) can remove? Why is a lock applied per file descriptor and not per file object? What is the workaround here?
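To make the file-descriptor complaint concrete, here's a hedged Python sketch of the classic POSIX (fcntl) lock footgun that the SQLite docs also describe: if any other code in the same process opens and closes the same file, the lock is silently dropped. Linux's open file description locks (F_OFD_SETLK) and BSD flock are the usual, non-portable, workarounds.

    import fcntl, os

    path = "/tmp/lockdemo"
    fd1 = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    fcntl.lockf(fd1, fcntl.LOCK_EX)   # take a POSIX record lock via fd1

    # Somewhere else in the same process (a library, say) the same file
    # gets opened and closed again...
    fd2 = os.open(path, os.O_RDONLY)
    os.close(fd2)                     # ...and this close silently releases
                                      # the lock acquired through fd1

    # fd1 is still open, but the process no longer holds the lock.
    os.close(fd1)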
In general, POSIX + threads = damnation because there's too much per-process global state, which was kind of OK "before MT". It's clear that the standard was written before multithreading was even foreseen to be the norm, and it contains many SNAFUs wrt threads, not just files. (Signals and fork are the first to come to mind.)
Oh, yeah, don't mention Linux. It has so many extensions and additions to POSIX to alleviate POSIX mess-ups that it's not even funny.
In POSIX, you call "sync". Everything beyond that is workarounds for hardware- and software-level cheating. There are good reasons that modern filesystems aren't quite as simple as that, and some of those reasons are shameful, but it's not on POSIX.
Edit: I just reread the spec, and it is on POSIX: They recognized that "sync" was not actually writing, and documented the real usage. Unix 5 has this to say about sync:
> Sync causes all information in core memory that should be on disk to be written out. This includes modified super blocks, modified i-nodes, and delayed block I/O.
> It should be used by programs which examine a file system, for example check, df, etc. It is mandatory before a boot.
But by Version 6, Lions' book is already referring to the peculiarities of delayed writes.
There was also the entire fiasco where Postgres and Linux kernel developers discovered it was damn-near impossible to get the intended behavior with fsync: there were way too many pitfalls, too much underspecification, and too many filesystem-level incompatibilities to know what happened in the non-happy case.
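As I understand the takeaway from that episode, a failed fsync cannot simply be retried: the kernel may already have dropped the dirty pages, so a later fsync can report success without the data ever reaching disk. A hedged sketch of the defensive pattern:

    import os, sys

    def checkpoint(fd: int) -> None:
        try:
            os.fsync(fd)
        except OSError as exc:
            # After a failed fsync the page cache state is suspect; retrying
            # may "succeed" without the data being durable. Treat it as data
            # loss and recover from a journal/WAL instead of carrying on.
            sys.stderr.write(f"fsync failed: {exc}; treating as data loss\n")
            raise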
Seems to me, it's a lot tougher to make all of this stuff always work right in the face of buggy or hostile programs.
If you have a super-serious hardcore owner lock on a file that only the opening process can release, what happens when a buggy or hostile program locks a file and never unlocks it, even after exiting?
We know what to do, but no single developer is in a position to do it -- POSIX file locking requires global coordination between all developers of code running inside a single process.
This is fundamentally incompatible with the use of libraries.
Observe that there is no way SQLite could fix this bug; the SQLite developers instead have to shove off the responsibility to everyone using the SQLite library. Add a few more layers of libraries where this requirement isn't documented as clearly, and this is basically guaranteed to go wrong somewhere.
> I love systems which don't prevent foot guns, and go bang spectacularly when I foot gun myself.
It's good that you know what you like. Just make sure that none of your software is run by me, because when your software goes bang spectacularly and foot guns me instead of you, I emphatically don't like it.
That's because all my code is tested for both leaks and all scenarios. None of my code has ever gone bang in production. To be honest, no service I have written has ever restarted outside of system reboots or configuration changes.
Having systems with no guardrails doesn't equate to having bad code automatically.
Actually, there's no hubris. I'd rather put my money where my mouth is. Let's take an example which I worked on for almost a decade (and am still working on).
How can I make sure that I have no leaks?
1. I design software by hand. I design construction and destruction chains beforehand.
2. I implement the modules one by one, create some test suites, run them under Valgrind, and make sure there are no leaks.
3. Chain the modules together, re-run the tests.
4. For every component and chain which passes the tests, "Seal" the unit. Any change requires running the whole set of tests again.
For this set of components, since the components don't change, the test suites are also "sealed".
After every build, I have a CI/CD pipeline which runs a series of tests including unit, module, and end-to-end scenarios. Results are checked against ground truth to 32 significant digits. If something doesn't hold, flag the build and fail. I also keep timing values for certain states, and they shouldn't deviate much. Remember, we need speed.
We should have invalid inputs. However, these inputs should never reach the processing stage; they should just be marked invalid and thrown out. Apply the above pipeline. Implement, test and seal.
Since the stack is stable, and this is "old school" C++ code, I don't need to migrate libraries and other stuff around much. So, the test suite I've written doesn't need maintenance unless the seal is broken.
However, with the feature I'm implementing, I need to break a couple of these seals, but no biggie. Extend a function, write a couple more unit tests, run the test suite and hammer the code, seal it again. Usual tests will go on, of course.
End-to-end Valgrind tests are done periodically, not with every build. It takes around 12 hours to complete with full tracing and reporting.
I'd rather be methodical and give the code I've written a torturous shakedown, instead of saying "this looks good" and moving on. I trust myself with the code I write, but not so blindly as to go over the top and say "I'm the one". Instead I do my best, but I believe that I'm the worst coder around here, so I test to break my code, not to validate it.
I still doubt you cover "all cases". Even for a simple problem (e.g. given three integers interpreted as side lengths, decide if they result in an equilateral, isosceles, or scalene triangle), the number of cases will be daunting (in our case: 65, see Robert V. Binder, "Testing Object-Oriented Systems: Models, Patterns, and Tools").
Despite the effort you chose to invest into your framework, I still doubt you achieve anything close to this depth within your system and the underlying stack.
No, never. SQLite is one of the things I use most. Similarly, I respect the project and developers behind it a lot.
On the other hand, I don't accept calling something damned because it's old, or has quirks or both. This is the same API (and set of standards) I work with, and I had my fair share of problems with it too.
However, I accept that no API/Standard is perfect, and work my life around it. Also we have a huge ecosystem built around it, and while it's not perfect, it's working so far.
And yes, I refuse to give up my freedom to make errors, to crash and burn spectacularly, in the name of ease of use and abstractions. Because I need that performance, and I want to be able to reach the hardware without all these layers and safety nets.
We have abstracted safety nets above POSIX level, and anyone can use that if they want.
So how do you test it for such cases as discussed here? Like a power outage while writing to disk? Maybe you were lucky and didn't hit that in production, yet.
We operate with a cattle and pet model [0], and we use a lot of services on those cattle servers. Their power can go out, disks can go bad, hardware can fry. We've seen a lot of things. As an HPC center, we absolutely hammer what we have.
Even manufacturers are baffled at how we can fry our servers. We were eating memory controllers in one generation, and onboard NICs were being cooked in another, for example.
XFS and EXT4 can handle a lot of abuse, including power loss during heavy writes, without any problems. Regardless of the services we run, we haven't lost any data during a power outage, and for us, a power outage means "power outage during full load".
Making sure that your scientific computation can continue from the point you left it is a big business. Nobody wants to lose three weeks or a month just because a server gave out its magic smoke.
I also like it when systems go bang spectacularly on a failure, vs silently allowing it.
But, related to the POSIX thing about reusing fds, I just ran into this a couple of days ago and it took me 6 hours to figure out.
I made a change in HashBackup to interrupt saving a big file if the backup time limit had been reached. Tested it a while, seemed like it worked fine. I was using a 15-second timeout. For whatever reason I tried it with a 5s timeout. This went bang, but in a completely different area of the program responsible for copying backup files to a destination, where it raised an exception "hey, this file was X bytes but I only copied Y". Go look at the source file, it's fine. Look at the destination file, and indeed, it is only Y bytes. Go look at the copy code, it looks fine but obviously isn't, so I start putting all kinds of debug stuff there to check file sizes, do double reads at EOF, ... nothing helps. Then I add an lseek to report the current file position and indeed it is at X, even though only Y bytes have been copied. So I realize that something else has moved this file pointer on me.
To help track that down, I override Python's os.open and os.close and display the pathname and fd so I have a history of who is using what fd's. After going through that with a microscope, I see that the file being backed up is open on fd 6, and when the timeout occurs, it gets closed. Then that fd gets used by the copy function for the source file.
BUT, there is an asynchronous read process used during the backup. It has exception handling: if an fd is closed by higher levels and the read thread gets an EBADF error, it resets itself. But there is a race condition: if the fd gets reopened fast enough, the read thread never sees the EBADF error and thinks it is still reading from the file to back up, which it continues to do. Now there are 2 processes reading from the same file and mayhem ensues.
Of course it's ultimately my fault, but we do have at least 32 bits for a file descriptor number. It would be a lot nicer if POSIX kept incrementing it and only wrapped around, like Unix PIDs do, instead of reusing the low numbers. And yeah, I realize this would screw up select, dup, etc. because of the way they are designed.
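For anyone who hasn't hit this: POSIX hands out the lowest unused number, so a close followed by any open usually recycles the exact same fd. A small sketch (single-threaded, Linux paths assumed):

    import os

    fd = os.open("/etc/hostname", os.O_RDONLY)
    os.close(fd)

    # The very next open in the process (possibly in a library or another
    # thread) typically gets the same number back:
    fd2 = os.open("/etc/passwd", os.O_RDONLY)
    assert fd == fd2   # any stale copy of the old fd now reads the wrong file
    os.close(fd2)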
> To help track that down, I override Python's os.open and os.close and display the pathname and fd so I have a history of who is using what fd's.
Nice! I'd recommend trying strace to track system calls in the future though. Monkey patching open and close will only catch code using those functions, and not any C libraries or very smart people using ctypes.
Strace captures every syscall and dumps the parameters and return values.
I've used both depending on the situation - I am a huge fan of strace (especially 'strace -f') and have done some truly stupid shit with it.
Best example being dealing with a situation where a client's code could only have one connection to a particular external streaming API they were using and the developer in charge of connecting to it was a tad territorial and wouldn't provide me a way to access the data that actually worked for the task I'd been assigned.
Solution: strace his process with a -s argument that was larger than any read his process ever did, then backprocess it into the original bytes on the wire, then handle them myself.
Yes, this was a horrible hack, but it allowed me to prove the concept of the thing the client wanted building without causing massive political drama that would've been more trouble for everybody involved than it was worth.
Ideal for that sort of situation would (for me) likely be trapping the open and close functions and having them emit information to stderr while -also- having strace log to the same stderr so I could see how the two compare - with the caveat that whether it's viable to do that is highly variable depending on context.
But as a last comment on strace, I present an old entry from my quotefile:
<@mst> actually, I think my first thing to try would be to strace the code
<@mst> and try and match up the new value of $! with a failed syscall
<@mst> but I mean if strace was a person I'd totally be asking them out to dinner so maybe I'm biased
Newer Python versions come with what they call audit hooks, which can log events on, for example, file open. The open function already has such an audit event built in.
It can be seen as somewhere in between home made monkey patching and strace. Strace is great, because it shows absolutely everything. But showing everything can also be too much, a pre filtered view is often what you really want.
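A hedged sketch of that middle ground with sys.addaudithook (Python 3.8+): the hook sees the built-in "open" events without any monkey patching, though it won't catch C libraries bypassing Python either.

    import sys

    def audit(event, args):
        # "open" fires for builtins.open and os.open, among others
        if event == "open":
            path, mode, flags = args
            print(f"open: {path!r} mode={mode!r} flags={flags}", file=sys.stderr)

    sys.addaudithook(audit)

    with open("/etc/hostname") as f:   # logged by the hook above
        f.read()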
If you want a similar tracing mechanism that spans your whole system I think you could do it with eBPF.
I've seen a similarish problem - if I say "the initial bug report was 'why is my application logging to its pid file?'" I imagine you can work out the rest.
> Why is everything that doesn't work as someone expected called "damned"?
> ...
> It's also ironic how we don't like systems which limit us in severe ways, and then get angry at a system which doesn't limit us the same way. Fun times.
You are conflating surprising systems and flexible systems.
Something that is flexible doesn't have to be surprising. Nor does something surprising have to be flexible.
The Principle of Least Surprise says that surprising behavior in software is bad. Systems should strive not to do things that catch users off guard absent a good reason.
I personally don't get surprised by POSIX enough to consider it in violation of this "least surprise" principle.
Instead, Java, for example, has given me far more surprises, and at catastrophic levels. I was using Choco Solver [0] back in the day, and I created two instances of it, attached to different classes. Which is perfectly normal, right?
Somehow these two instances cross-linked, affected the results they computed, and created persistent memory leaks which needed system reboots to reclaim. Java should be immune to that, but no.
Preventing that required running only one instance of Choco, which limited my performance greatly. Luckily, the system had a queue/consumer structure, so running only one didn't need extensive changes.
I can't comment on Choco, but no garbage collector is immune to programmer's mistakes. For example, it's easy to create leaks with the observer pattern (short-lived observers not unsubscribing from a long-lived observable). IOW: know when to use weak references.
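A quick sketch (not Choco-specific, just the general shape of the leak): an observable that keeps strong references holds every subscriber alive forever; a WeakSet sidesteps that particular leak.

    import weakref

    class Observable:
        def __init__(self):
            # Strong references here would keep forgotten observers alive
            # for the lifetime of the observable; weak references let them
            # be garbage collected once nobody else holds them.
            self._observers = weakref.WeakSet()

        def subscribe(self, observer):
            self._observers.add(observer)

        def notify(self, event):
            for obs in set(self._observers):
                obs.on_event(event)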
Systems without guardrails are explicit, transparent and flexible. It's easy to understand how they behave. It might be a little harder to get things right, but when you get it right, there's nothing to second guess.
Systems without guardrails are low in overhead. It's possible to get raw performance.
Yes, I love these systems. I might spend a little more time and need a little more concentration when I develop stuff on these systems, but when it runs, it runs for good.
Trying to make systems foolproof by limiting them is not good. If that's good, we should all love iOS, right?
Your argument against footgun avoidance is only applicable in narrow cases. In aviation, for example, there are quite a few footguns you can kill yourself with - and, which is very relevant to the issue in general, other people as well - and the practice is to mitigate them wherever feasible. We have stall warnings and stick pushers, ground proximity warnings and TCAS, ABS and interlocks to prevent doing things like engaging reverse thrust when the wheels are off the ground. Now, with software-controlled fly-by-wire systems, the ability to mitigate pilot footguns has been greatly expanded, and is being taken advantage of. Furthermore, these mitigations appeared first in high-performance military aviation, before fly-by-wire appeared in commercial aircraft.
For a more specifically computer-systems example, perhaps we should consider security? - oh, wait...
The analogy is not very accurate, I'd say, because in aviation you prevent human footguns during every flight, every day. However, a developer needs to avoid a footgun only once, and when the developer verifies the scenario with testing, that part of the code can be considered "sealed", never to be touched again unless a bug is found or a serious need arises to enter that part of the code.
In my experience, most security problems arise from memory management problems. This has nothing to do with POSIX. The problem is universal there, and we may delightfully argue that a stricter programming language like Rust can alleviate most of these problems; however, tools like Valgrind can also find a lot of problems during a simple testing run.
If we return to aviation, flight safety and memory management, there are, AFAIK, some standards which deal with determinism and memory management on these systems (and smaller "single use" systems). These systems neither know about POSIX, nor have the capacity to do so (they spend all their power catching and tagging the object they're tasked with).
Again, security, reliability and other footgun-related concerns end at the program we're designing and running, not at POSIX itself.
> However, a developer needs to avoid a footgun only once, and when the developer verifies the scenario with testing, that part of the code can be considered "sealed", never to be touched again unless a bug is found or a serious need arises to enter that part of the code.
This just does not hold up at all, in practice. It is not at all uncommon for footguns in software to go undetected in testing and lie in wait, like so many landmines, for someone to set them off - or for some blackhat to find them.
> Most security problems arise from memory management problems from my experience.
That does nothing to somehow compensate for, or otherwise render harmless, those that are not.
More generally, by describing various ways you can seek to avoid footguns, you are simply providing evidence that they are a problem, not a feature.
This is a C vs Java argument. There are people who prefer each.
> when you get it right, there's nothing to second guess
Given the topic is "how to corrupt your db file," does that mean they didn't get it right yet? Or that the underlying platform prevents them from getting it right?
> This is a C vs Java argument. There are people who prefer each.
As a person who has developed sizeable projects with both, you hit the nail on the head (and yes, Java can do amazing things in the foot gun department, believe me). However, my favorite language duo is C/C++ at the end of the day.
> Given the topic is "how to corrupt your db file," does that mean they didn't get it right yet?
This means that you (as the developer) and the system have a misunderstanding. The article covers a very large swath of problems, from hard-to-understand implementations (e.g. locks), to hardware which doesn't listen to you, to bugs in SQLite itself.
So, in the article there are a lot of cross-platform bad practices which one is warned about. Shrinking this to "POSIX is bad!" is just wrong. This is what I'm trying to point out.
Ehh. I kinda like being able to update the browser binary while it’s in use and in memory, and then only get the new version when I restart the browser.
What I get is a crash when I open a new tab because the process launched to manage it is using the newly installed binary, which isn't compatible with the old binary that's running.
Similar problems occur if you do in-place upgrades of software using dynamic libraries.
Windows has the reverse problem, where you are blocked from deleting an in-use file, but it's buggy as hell and very opaque as to what is holding that file.
The Windows feature is not buggy; the process that created the file potentially is, as the creating process chooses the sharing mode. Except for executable images (EXE/DLL): they're always locked on the filesystem as long as any running program is referencing the image. That's by design: the designers didn't want "invisibly" used space on disk.
For what happens when you do that on Linux: the file entry gets deleted, but disk space doesn't get deallocated until the last program using the file has exited. If you replace the file, the program will still be referencing the old file (now unreferencable by any name/path, except possibly through /proc/../fd).
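That behavior is easy to observe from userspace; a small sketch (Linux assumed):

    import os

    path = "/tmp/unlink-demo.txt"
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    os.write(fd, b"still here\n")

    os.unlink(path)                   # the name disappears immediately...

    os.lseek(fd, 0, os.SEEK_SET)
    print(os.read(fd, 64))            # ...but the data stays readable until
                                      # the last fd is closed; only then is
                                      # the disk space actually freed
    os.close(fd)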
I've dealt with far too many cases of crashed programs retaining their locks on files to call it anything but buggy. And Windows offers no UI for finding out which process is holding the file handle.
I think your diagnosis is wrong. When the program exits, in whatever way, all resources/locks/etc. are cleaned up. (Hey, the OS is also nice enough to tell waiters about abandoned mutexes.) There's no built-in GUI, but there's Process Explorer and other Sysinternals utilities that you can use to find out what is keeping the file open.
I don't know how you could avoid this, and it has me wondering if you understand the issue. This is basically the file descriptor version of a use-after-free bug. I'm not one of those "rust fixes everything" people, so I don't answer this about just anything, but I think this issue would completely disappear in a memory safe language and especially one where the wrapper for file descriptors had good prevention of race conditions closing a file, so it hardly seems fair to blame the filesystem API.
I mean, in some cases you could blame the API for high frequency of application bugs, but using a closed file seems like a bug category that is unreasonable to do so for. The issue of use-after-free is larger than the filesystem API.
> I don't know how you could avoid this, and it has me wondering if you understand the issue.
I understand the issue. Someone else has already mentioned how you could avoid this: allocate fds randomly instead of sequentially. This would significantly reduce probability of accidental reuse. If you widen fds to 64 bits then you're pretty much good.
Yes, it'd break syscalls that rely on sequential allocation like select. It just shows how POSIX has painted itself into a historical corner, alongside many interactions with threads being a kludge.
Windows does not allocate handles sequentially, and I've seen this pop up as a race condition and bug. I don't think changing the way they're assigned is necessarily good enough.
Indeed it does not, but it's not "random" either, i.e., they do get reused, probably rather quickly, so it does not solve the problem.
Mark Russinovich or Raymond Chen (I think) had an article about why it's a bad idea to forcibly close files that are in use and they gave exactly that as a reason: the closed handle gets valid again, but for a different object, and, voila - corruption!
It sounded like the underlying issue was the OS reusing the same int value for the file descriptor (2), rather than a dangling FILE*, which AFAIK is not something Rust could prevent. The only way you could really prevent this is if the OS generated GUIDs for newly opened files.
The filesystem might not be able to block use-after-free bugs, but it could greatly mitigate them. In particular, if the fs avoided reusing file descriptors, it could return an error on use-after-free instead of silently corrupting a random data file.
My guess is the shm to detect multiple copies linked in would probably be a lot of complexity for the sole purpose of trying to compensate for a really bad practice.
Yeah, global shm state is a bitch: you have kind of an ownerless resource that must be managed cooperatively somehow. This adds global mutexes to the equation and quirky concepts around initialization and abandonment of the shared state.
Usually none of these issues are dealt with in the shared memory sdk you’ll be using, so you gotta model it with mental experiments just like you would a lock free implementation in a threaded process.
> allowing you to delete an open file and providing no means to prevent it
I honestly think this is one of the greatest advantages of POSIX systems over Windows. If I as a (super)user want to delete a file, I should always be able to do that, and applications will just have to deal with it.
You can delete the name/directory entry, but the file remains allocated and in use as long as any program holds it open. So how useful the POSIX behavior is, is debatable.
Like Windows file share mode. Mandatory, "hard" lock on a file. The OS doesn't let you delete it if another process holds it open. (You can forbid any of 3: reading, writing, deleting by another process. It is completely orthogonal to the ACL permissions system.)
You can update Linux without rebooting, except for the kernel (and even then there's options). Rebooting once a month is way more often than some of my Linux systems.
And you will never know if the updated libs cause segfaults (at best) or shenanigans (at worst).
You can replace open files on Windows (most of the time, see my other comment here), but the decision to reboot is not a technical one (and back in the day many invested in no-reboot upgrades); it's about having fewer headaches with corner cases.
Eh. What about updated libc.so or libssl.so? If daemons using them do not get restarted, they keep using the old version. If they do get restarted, well, then, it's just as if you rebooted the machine. Somewhat quicker, but in effect it's a reboot.
How would the multiple copies find the global memory segment? In theory each copy could create such a segment and check that all handles received by APIs live in it, but I can’t see how you could convince multiple versions to get a handle to the same shared segment (well, without a file handle).
> How would the multiple copies find the global memory segment?
They would all use the same reserved "name". It's possible to atomically create or open a SHM segment with a given "name". Then have a header at the start of the segment with critical metadata like signature, version, etc. Barf unconditionally if the metadata is found to be invalid (i.e., some other process deliberately used the name to sabotage sqlite [1]), or, in addition, have an option to not use SHM.
[1] The situation is not different from some process sabotaging another by deleting or corrupting well-known files.
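Something like that create-or-open dance can be sketched with Python's multiprocessing.shared_memory; the segment name, signature and header layout below are made up purely for illustration, and a real implementation would still need to synchronize header initialization.

    from multiprocessing import shared_memory

    SEG_NAME = "sqlite-demo-global-state"   # hypothetical reserved name
    MAGIC = b"SQLGLOB1"                     # hypothetical signature

    def attach_global_segment(size=4096):
        try:
            # Atomically create the segment if it doesn't exist yet.
            shm = shared_memory.SharedMemory(name=SEG_NAME, create=True, size=size)
            shm.buf[:len(MAGIC)] = MAGIC    # we created it: write the header
        except FileExistsError:
            shm = shared_memory.SharedMemory(name=SEG_NAME)
            if bytes(shm.buf[:len(MAGIC)]) != MAGIC:
                # someone else is using the name, or sabotage: barf
                raise RuntimeError("shared segment has unexpected contents")
        return shm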
I might be misunderstanding something, but doesn’t that mean that only one SQLite library could be loaded per system? That hardly seems like an acceptable compromise for an embedded software.
You misunderstand. It means that a single application should not use two or more copies of the SQLite library to open the same database file.
As a practical matter, you really have to work hard to get an application to use two different copies of SQLite at the same time - all the while avoiding symbol collisions on link. You can do it, but it takes some work. And then on top of that your application has to decide to open two or more connections to the same database file, using different copies of SQLite in each case.
Yeah, managing to do that is ... almost impressive.
The more common failure mode I've seen is a library bundling and inlining an old version of a database API in a way that means it overrides the shared object I was expecting my code to link to, and alarums and excursions resulting thereby.
(I don't imagine SQLite would -break- in that situation, but I doubt my application code would be any less unhappy than in the cases of that problem I've encountered in the wild)
I bet this situation is even relatively common amongst users of SQLite from Node, which is notorious for having multiple versions of the same dependency.
That one is easier than you think: see 2.2.1, "Multiple copies of SQLite linked into the same application".
I used two different C# libraries to access the same database. And under the hood, they both used a different SQLite library, with different native SQLite builds (of the same version though).
I don’t know it, but as far as I understand it has to be the exact same shared library, otherwise some in-memory caches don’t get updated on writes. This doesn’t happen, if it’s two separate processes, that are using different libraries. In my case it was just one process.
But it’s also possible, that SQLite was built by different compilers/toolchains and that did the trick.
What happened: my database files were always corrupted and needed repair. The application did only a few writes, and sometimes the changes just disappeared.
Edit: the SQLite website says: „But, if multiple copies of SQLite are linked into the same application, then there will be multiple instances of this global list.“
Really, the only way to prevent corruption of a SQL (or any other kind of) database is to have a server process manage all accesses to the database storage. And even then....
SQLite is great, but I would not use it in a multi-machine, multi-processing environment.
SQLite is among the most reliable software you can choose, and there's nothing about a multi-machine, multi-processing (?) environment which changes that.
Developers do need to understand that it's SQLite, not a poor man's Postgres. This requires correct engineering, and the SQLite docs are in a class of their own in patiently explaining in simple terms how to do this.
If you embrace what SQLite is, you can do a remarkable number of things with it, safely and reliably.
> SQLite is great, but I would not use it in a multi-machine, multi-processing environment.
Yeah, great, you shouldn't. SQLite is meant to be a replacement for the custom files apps use to store data (mobile apps, desktop apps, server apps). It isn't meant to scale to millions of users, but just to store data in a structured manner.
Doesn't point 1.2 (backing up the file in the middle of a transaction) contradict the part above about how SQLite is resistant to power failures and simply rolls back the incomplete transaction? You'd just end up with a db outdated by 1 transaction, which is fine for a backup anyway.
I'm not sure, but from the last sentence, it seems as though they're talking about what would happen if you backed it up but didn't also back up the journal or WAL. I think the bit about SQLite being resistant to power failures assumes that the journal or WAL are present. This is supported by the next section on "Deleting a hot journal" which says "SQLite must see the journal files in order to recover from a crash or power failure."
Besides the journal, live backups of the database file might not be atomic, in which case the backed-up copy could have content written after an fsync but be missing content written before it, which should not happen on power loss.
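Which is one reason the docs steer you toward the online backup API rather than copying the file; Python exposes it as Connection.backup, roughly like this:

    import sqlite3

    src = sqlite3.connect("live.db")
    dst = sqlite3.connect("backup.db")
    # The backup API copies the database page by page while respecting
    # SQLite's locking, so the result is a consistent snapshot even if
    # other connections keep writing to live.db.
    src.backup(dst)
    dst.close()
    src.close()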
Does Linux yet have a config option to randomize the file descriptor numbers it hands out? I've been bitten so many times by silent use-after-close of these FDs.
During some low-impact exploratory work on my local machine recently, I launched a Jupyter Lab notebook that connected to a SQLite db that was in a directory MS OneDrive had claimed. The sync processes had nuked the db within five minutes. I was able to recreate the db easily, from csv files. Important lesson learned!
No, I think what happened is that OneDrive overwrote the local db with a version from the cloud that was several seconds older, corrupting everything. Or maybe it just overwrote the lock file? I'm not completely sure. Whatever it was, it appears that OneDrive replaced a newer version of the db with an older or partial/corrupted one from a prior sync.
I should mention that I was able to repeat the corrupting problem reliably. The problem disappeared when I moved to a working directory that was not monitored by OneDrive.
I was setting up a very simple db from a few CSV files, for use in a data science exercise for a course. There was nothing interesting happening in my code at all-- maybe a simple table join sql query, at most.
Is it possible to turn off transactions altogether in SQLite? So they function like a MyISAM table in MySQL or an Aria table in MariaDB?
I have crunched many billions of queries over the last years in MySQL and then MariaDB. I like MariaDB because it offers tables that are way faster due to having no transaction overhead.
I know this goes against the popular opinion to use transactional tables for everything. But depending on the task at hand, the performance gain can be very well worth turning transactions off.
I consider trying SQLite. But if you cannot switch off transactions, performance will probably not be up to par.
What do you mean by "transactions"? I agree with others that you may have a different definition of transaction. If you meant "I don't care about corruption, just make it as fast as possible" (this is mainly about write performance though, since reads are already fast), then there are two PRAGMA commands related to this:
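(Presumably journal_mode and synchronous.) A hedged sketch of what that looks like through Python's sqlite3, with the obvious caveat that it trades away crash safety:

    import sqlite3

    conn = sqlite3.connect("fast.db")
    conn.execute("PRAGMA journal_mode = OFF")   # no rollback journal at all
    conn.execute("PRAGMA synchronous = OFF")    # don't wait for fsync
    # From here on, a crash or power loss can corrupt the database file.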
I'm no expert on SQLite, so someone please correct me if I'm wrong here, I'm effectively guessing.
I don't think there is a way of turning off transactions completely; you can either do them explicitly or implicitly.
However, if you do it explicitly (wrapping BEGIN/COMMIT manually), you can batch all your INSERT/SELECT statements inside one transaction instead of one transaction per query. This should make things a bit faster as the lock would only have to be acquired/released once instead of once per operation.
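A hedged sketch of the difference with Python's sqlite3 (the table and data are made up):

    import sqlite3

    conn = sqlite3.connect("demo.db")
    rows = [(i,) for i in range(100_000)]

    # One explicit transaction around all the inserts: a single commit (and
    # its fsync) at the end, instead of paying that cost per statement.
    with conn:
        conn.execute("CREATE TABLE IF NOT EXISTS t (x INTEGER)")
        conn.executemany("INSERT INTO t (x) VALUES (?)", rows)
    conn.close()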
May I ask whether these databases provide full ACID-compliance? I thought that some form of transactions is always needed internally in order to rollback non-atomic filesystem changes in case of an error. Wouldn't you get an inconsistent database otherwise?
There is a Minecraft mod that is really really fast and it corrupts your world very quickly. I am thinking of rewriting that mod and making it "slow" but correct again. So yes, for me it is very important that my software works. I don't need broken software.
> How often do you experience "errors" that corrupt your database files?
I don't know what you mean by this sentence. ACID has nothing to do with that. It goes way beyond that. For example, confirming that updates have been written to disk has nothing to do with database corruption. You can lose writes/data without database corruption.
Just out of curiosity what is the domain where this performance jump is desirable but ACID isn't? This is pushing it into like, low single digit millisecond or sub-millisecond level right? Is sql even the right choice for that need?
No, that makes sense. I don't think SQLite was ever built for creating the fastest possible setup, nor one that can have the most concurrent usage, because there are plenty of others projects aiming for those goals, while SQLite goals are different.
AFAIK there's PRAGMA journal_mode = OFF, which may help with performance, but if you want to go beyond that, you may [1] have to modify the code or write your own VFS driver.
To expand on this, journal_mode controls whether SQLite should journal the transaction or not, not whether there should be transactions at all, so you're effectively disabling logging, which indeed should also contribute a bit to better performance.
Unless you are referring to another meaning for the term “transactions” that I’m unaware of, I’ve found that running your SQLite queries inside transactions greatly increases throughput.
You obviously work with or design software that doesn't care about transactions/ACID. It's either scary that you're not aware (read something, assume it is still correct, another concurrent process does the same, free £$, etc.), or you are so ridiculously clever that it doesn't matter.
"Unfortunately, most consumer-grade mass storage devices lie about syncing. Disk drives will report that content is safely on persistent media as soon as it reaches the track buffer and before actually being written to oxide. This makes the disk drives seem to operate faster (which is vitally important to the manufacturer so that they can show good benchmark numbers in trade magazines)."
Also historically Sun StorEdge disk arrays would report success on a sync once the data was transferred to the appliance, because they had a write-back battery that would allow the array time to finish writing the data even on sudden power interruption.
Unfortunately for me, they also had a thing where if a write-back battery passed a certain number of hours in operation the array would no longer trust it and defaulted to then not using it at all - which is entirely reasonable right up until the point where your systems team won't buy a replacement.
I once offended the lead sysadmin at that job on a phone call so much that he had to pass me off to his junior because I'd been telling him for months that the battery needed replacing and basically my opinion was "yeah, have fun waiting an hour while that system fscks" interspersed with helpless laughter.
(in my defence, I wasn't paid to be on call and he called me at 2am during a good friend's 30th birthday party, so my capacity for diplomacy was even more limited than normal)
Probably. Although it’s possible they solve the problems with a builtin battery backup (capacitor?) that can flush buffered data to flash (at least that’s how I’d do it)
For non-enterprise drives you still need to pay attention: sometimes power loss protection means that data at rest is not corrupted, as opposed to an assurance that things in the write cache will make it to stable storage.
I recently saw SQLite used (and written to) in a Docker container, with the database file on a mounted volume.
That made me nervous!
I don't know enough about how Docker implements volumes under the hood to know how likely (or not) it is to break filesystem behavior that SQLite depends on for its transactional guarantees.
I've used mounted volumes in docker for high-traffic postgresql databases and they've held up nicely. Should test this with SQLite, but I imagine it would perform just fine.
Half of the universe nowadays run on containers with volume mounts. No need to be worried about it being a container per se.
What you should worry about though is the underlying volume driver and its storage type. Volumes can be simple bind mounts to your local disk, being more or less identical to no container. But volumes can also be attached straight to a network disk, where same dangers apply as if you would have mounted that network disk on host.
As long as you’re not using Docker for Windows you should be fine. Docker Desktop for Windows uses a network share to connect the Windows filesystem to the Linux Docker system.
Volumes are made exactly for holding databases. And also bind mounts should work without any issue in most configurations.
But if there is some kind of network/virtualization between the physical disk and the container, you should investigate it in more detail.
Do I understand correctly that “2.2.1. Multiple copies of SQLite linked into the same application” is only a problem if more than one copy accesses the same database file?
The title actually surprised me because of that. I would’ve assumed the SQLite team would pronounce it sequelite since es kew el ite is pretty awkward to say.
But the grammar would suggest they do say it by pronouncing each letter of the acronym separately.
I definitely prefer the pronunciation that sounds like a sugar-free beverage additive.
"This Java tastes like Rust, would you parse me sum see'kwel-ite?"