> When building databases, we care about durability, so database authors are usually well aware that you _have_ to use `F_FULLFSYNC` for safety. The fact that `F_FULLFSYNC` isn't safe means that you cannot write a transactional database on a Mac, which is also a surprise to me.
> Without that, there is no way to ensure durable writes, and you might get data loss or data corruption.
No, not without that. Even with that, you can't have durable writes; not on a Mac, or Linux, or anywhere else. If you are worried about fsync()/fcntl+F_FULLFSYNC, they do nothing to protect against hardware failure: the only thing that does is shipping the data someplace else (and, depending on the criticality of the data, possibly quite far).
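For reference, the macOS call being argued about is just an fcntl on the file descriptor. Here's a minimal sketch in Python; the fallback to plain os.fsync() on non-macOS platforms is my assumption for illustration, not something anyone in this thread specified:

```python
import fcntl
import os

def flush_to_media(fd):
    """Push buffered writes as far toward stable media as the OS allows."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        # macOS: fsync() only hands data to the drive; F_FULLFSYNC also
        # asks the drive to drain its own write cache.
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
    else:
        # Linux and others: fsync() is expected to issue the cache
        # flush itself (assuming the drive honors it).
        os.fsync(fd)

def durable_write(path, data):
    """Write data to path and flush it before returning."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        flush_to_media(fd)
    finally:
        os.close(fd)
```

Even this, per the argument above, only covers the path down to the device; it says nothing about the device itself surviving.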
As soon as you have two database servers, you're in much better shape, and many databases like to try and use fsync() as a barrier to that replication, but this is a waste of time because your chances of a single hardware failure remain the same -- the only thing that really matters is that 1/2 is smaller than 1/1: losing both of two copies is less likely than losing your only one.
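The "1/2 is smaller than 1/1" point is just independent-failure arithmetic. A back-of-the-envelope sketch, with made-up failure rates (nothing here is measured):

```python
def p_all_copies_lost(p_one, n):
    """Chance that all n independent copies fail in the same window."""
    return p_one ** n

# Made-up number: each server loses its disk 1% of the time per window.
p = 0.01
single = p_all_copies_lost(p, 1)    # 1% chance your only copy is gone
mirrored = p_all_copies_lost(p, 2)  # ~0.01% chance both copies are gone
assert mirrored < single
```

The independence assumption is doing all the work here, which is also why the comment suggests shipping the second copy far away: correlated failures (same rack, same power feed) break it.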
So okay, maybe you're not trying to protect against all hardware failure, or even just flash failure (it will fail when it fails! better to have two NVMe boards than one!), but maybe just some failure -- like a power failure. But guess what: we just need to put a big beefy capacitor on the board, or a battery someplace, to protect against that. We don't need to write the flash blocks and read them back before returning from fsync() to get reliability, because that's not the failure you're trying to protect against.
What does fsync() actually protect against? Well, sometimes that battery fails, or that capacitor blows. The hardware needed to write data to a spinning platter of metal and rust used to have a lot more failure points than today's solid state, and in those days maybe it made some sense to add a system call instead of adding more hardware. But modern systems aren't like that: it is almost always cheaper in the long run to just buy two than to try and squeeze a little more edge out of one. Maybe, if there's a case where fsync() helps today, it's a situation where that isn't true -- but even that is a long way from "you need fsync() to have durable writes and avoid data loss or corruption".
> No, not without that. Even with that, you can't have durable writes; not on a Mac, or Linux, or anywhere else. If you are worried about fsync()/fcntl+F_FULLFSYNC, they do nothing to protect against hardware failure: the only thing that does is shipping the data someplace else (and, depending on the criticality of the data, possibly quite far).
"The sun might explode so nothing guarantees integrity", come on, get real. This is pointless nitpicking.
Of course fsync ensures durable writes on systems like Linux with drives that honor FUA. The reliability of the device and stack in question is implied in this, and anybody who talks about data integrity understands that. This is how you can calculate and manage the error rates of your system.
> "The sun might explode so nothing guarantees integrity", come on, get real. This is pointless nitpicking.
I think most people understand that there is a huge difference between the sun exploding and a single hardware failure.
If you really don't understand that, I have no idea what to say.
> Of course fsync ensures durable writes on systems like Linux with drives that honor FUA
No it does not. The drive can still fail after you write() and nobody will care how often you called fsync(). The only thing that can help is writing it more than once.
What is the difference in the context of your comment? The likelihood of the risk, and nothing else. So what is the exact magic amount of risk that makes one thing durable and another not, and who made you the arbiter of this?
> No it does not. The drive can still fail after you write() and nobody will care how often you called fsync(). The only thing that can help is writing it more than once.
It does to anybody who actually understands these definitions. It is durable according to the design (i.e., UBER rates) of your system. That's what it means, that's always what it meant. If you really don't understand that, I have no idea what to say.
> The only thing that can help is writing it more than once.
This just shows a fundamental misunderstanding. You achieve a desired uncorrected error rate by looking at the risks and designing parts and redundancy and error correction appropriately. The reliability of one drive/system might be greater than two less reliable ones, so "writing it more than once" is not only not the only thing that can help, it doesn't necessarily achieve the required durability.
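To make the "two less reliable ones" point concrete: with made-up failure probabilities and an independence assumption, two cheap mirrors can come out worse than one well-engineered drive:

```python
def p_loss(p_drive, copies):
    # Data is lost only if every copy fails; failures assumed independent.
    return p_drive ** copies

P_GOOD = 0.001   # hypothetical well-engineered drive: 0.1% loss per window
P_CHEAP = 0.05   # hypothetical cheap drive: 5% loss per window

# Two cheap mirrors: 0.05 ** 2 = 0.25% loss -- worse than the single
# good drive's 0.1%. A third cheap mirror finally beats it.
assert p_loss(P_CHEAP, 2) > p_loss(P_GOOD, 1)
assert p_loss(P_CHEAP, 3) < p_loss(P_GOOD, 1)
```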
> What is the difference in the context of your comment? The likelihood of the risk, and nothing else. So what is the exact magic amount of risk that makes one thing durable and another not, and who made you the arbiter of this?
What's the difference between the sun exploding and a single machine failing?
I have no idea how to answer that. Maybe it's because many people have seen a single machine fail, but nobody has seen the sun explode? I guess I've never had a need to give it more thought than that.
> It does to anybody who actually understands these definitions. It is durable according to the design (i.e., UBER rates) of your system.
You are wrong about that: nobody cares if something is "designed to be durable according to the definition in the design". That's just more weasel words. They care about what the risks are, how you actually protect against them, and what it costs to do so. That's it.
I was asking about the context of the conversation. And I answered it for you. It's the likelihood of the risk. Two computers in two different locations can and do fail.
> You are wrong about that: Nobody cares if something is "designed to be durable according to the definition in the design".
No I'm not, that's what the word means and that's how it's used. That's how it's defined in operating systems, that's how it's defined by disk manufacturers, that's how it's used by people who write databases.
> That's just more weasel words.
No it's not, it's the only sane definition, because all hardware and software is different, and so is everybody's appetite for risk and cost. And you don't know what any of those things are in any given situation.
> They care about what the risks are, how you actually protect against them, and what it costs to do so. That's it.
You seem to be arguing against yourself here. Lots of people (e.g., personal users) store a lot of their data on a single device for significant periods of time, because that's reasonably durable for their use.
There is a point at which a redundant array of inexpensive and unreliable replicas is more durable than a single drive. Even N in-memory databases spread across the world is more durable than a single one with fsync.
Unfortunately few databases besides maybe blockchains have been engineered with that in mind.
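How many unreliable replicas it takes to beat one reliable drive falls out of the same independence assumption. All the rates below are made up for illustration:

```python
def replicas_needed(p_replica, p_target):
    """Smallest n such that the chance of losing all n independent
    replicas drops to at most p_target."""
    n, p = 1, p_replica
    while p > p_target:
        n += 1
        p *= p_replica
    return n

# Made-up rates: each in-memory node loses its data 10% of the time per
# window; suppose a single fsync()ing drive loses data 0.05% of the time.
print(replicas_needed(0.10, 0.0005))  # 4 -- four flaky nodes match the drive
```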
> There is a point at which a redundant array of inexpensive and unreliable replicas is more durable than a single drive. Even N in-memory databases spread across the world is more durable than a single one with fsync.
Unless the failure modes you are concerned about include being cut off from the internet, or your system isn't network-connected in the first place, in which case maybe not, eh?
Anyway, surely the point is clear. "Durable" doesn't mean "durable according to the whims of some anonymous denizen on the other side of the internet who is imagining a scenario which is completely irrelevant to what I'm actually doing with my data".
It means that the data is flushed to what your system considers to be durable storage.
Also hardware failures and software bugs can exist. You can talk about durable storage without being some kind of cosmic-ray-denier or anti-backup cultist.
Say you have mirrored devices. Or RAID-5, whatever. Say the devices don't lie about flushing caches. And you fsync(), and then power fails, and on the way back up you find data loss or worse, data corruption. The devices didn't fail. The OS did.
One need not even assume no device failure, since that's the point of RAID: to make up for some not-insignificant device failure rate. We need only assume that not too many devices fail at the same time. A pretty reasonable assumption. One relied upon all over the world, across many data centers.
"but guess what: We just need to put a big beefy capacitor on the board, or a battery someplace to protect against that. We don't need to write the flash blocks and read them back before returning from fsync() to get reliability"
I believe drives that do have capacitors are aware of it and return immediately from fsync() without writing to flash. That's the point of this API.
Since neither Macs nor any other laptops have SSDs with capacitors, this point is kind of moot.
I have at various points replaced or upgraded 15 NVMe SSDs in desktops and laptops, and I have not seen a single one -- could you please let me know where I can find a non-server SSD with capacitors large enough for it to flush data in case of a sudden power loss?
Laptop batteries are irrelevant: battery failure, freezing, or cutting power to the circuit board by holding the power button are the failure modes you have to protect against.