Hacker News | algernonramone's comments

I feel that "deterministic" is probably a better word here than "idempotent".


Not speaking to your comment specifically, but adding context to the thread: idempotence would mean having the same result, regardless of whether you run something once, twice, or 10 times, over the same input set. Idempotence requires but goes beyond determinism, as it also accounts for any externalities and side effects.

For example, let’s consider a function that accepts a string as an argument and then writes that string to disk. We can consider the disk state as a side effect of the function.

The function itself is perfectly deterministic (the output string is a predictable and consistent function of the input string), but depending on how its side effects are implemented, it may not be idempotent. If, for example, this function simply appended the output to a file “output.txt”, that file would grow with every invocation, which is not idempotent. If instead we overwrote the output file so that it holds only the output of the most recent run, the side effects would also be deterministic, and the function would be idempotent.
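A minimal Python sketch of the two variants described above. The function names `append_output`, `overwrite_output` and `read_file` are hypothetical; only the file name “output.txt” comes from the comment. Both writers return their input unchanged (deterministic), but only the overwriting one leaves the disk in the same state no matter how many times it runs.

```python
import os
import tempfile


def append_output(s: str, path: str) -> str:
    # Deterministic return value, but NOT idempotent: each call grows the file.
    with open(path, "a", encoding="utf-8") as f:
        f.write(s)
    return s


def overwrite_output(s: str, path: str) -> str:
    # Deterministic AND idempotent: the file holds only the latest run's output.
    with open(path, "w", encoding="utf-8") as f:
        f.write(s)
    return s


def read_file(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "output.txt")
        for _ in range(3):
            append_output("hello\n", path)
        print(repr(read_file(path)))  # 'hello\nhello\nhello\n'
        for _ in range(3):
            overwrite_output("hello\n", path)
        print(repr(read_file(path)))  # 'hello\n'
```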

At a pedantic level, you could widen your definition of deterministic to include not just outputs but also external state and side effects, but in practice the distinction above is generally how "deterministic" and "idempotent" are applied in computing. I cannot speak to the math-centric view, if it uses a different definition.


This captures the mathematical definition too, which is just that an element x is idempotent if x applied to itself gives you back x (x·x = x; for a function under composition, f∘f = f). I.e., what you said: applying the function many times produces no further change to the system.
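For plain functions, the f∘f = f reading can be spot-checked in Python. `looks_idempotent` is a hypothetical helper that only tests the sample inputs you give it, so a True result is evidence rather than proof.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def looks_idempotent(f: Callable[[T], T], samples: Iterable[T]) -> bool:
    # Spot-check f(f(x)) == f(x) on each sample; passing is evidence, not proof.
    return all(f(f(x)) == f(x) for x in samples)


if __name__ == "__main__":
    print(looks_idempotent(abs, [-3, 0, 7]))               # True
    print(looks_idempotent(str.strip, ["  a ", "b", ""]))  # True
    print(looks_idempotent(sorted, [[3, 1, 2], []]))       # True
    print(looks_idempotent(lambda n: n + 1, [0, 1, 2]))    # False
```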


I don't think the string-to-disk example can be idempotent no matter how you implement it, as it depends on the external state of the disk.


That may be true on a theoretical level, but if you’re talking practically with a data engineer, that’s the definition of idempotence they’re going to expect.
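In data-engineering terms this usually means a load job you can safely rerun. A rough sketch using Python's built-in `sqlite3`, assuming a hypothetical `events` table keyed by `id` and an unkeyed `events_log` table: a plain `INSERT` into the unkeyed table duplicates rows on every rerun, while `INSERT OR REPLACE` on the keyed table converges to the same contents.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: `events` is keyed by id, `events_log` has no key.
    conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.execute("CREATE TABLE events_log (id INTEGER, payload TEXT)")
    return conn


def append_load(conn: sqlite3.Connection, rows: list) -> None:
    # Not idempotent: every rerun of the same batch appends duplicate rows.
    with conn:  # commits on success, rolls back on error
        conn.executemany("INSERT INTO events_log (id, payload) VALUES (?, ?)", rows)


def upsert_load(conn: sqlite3.Connection, rows: list) -> None:
    # Idempotent: rows with an existing id are replaced, not duplicated.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO events (id, payload) VALUES (?, ?)", rows
        )


def row_count(conn: sqlite3.Connection, table: str) -> int:
    # `table` is trusted here (only our two names); never interpolate user input.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = make_db()
    batch = [(1, "a"), (2, "b")]
    for _ in range(3):  # simulate the job being rerun three times
        append_load(conn, batch)
        upsert_load(conn, batch)
    print(row_count(conn, "events_log"))  # 6
    print(row_count(conn, "events"))      # 2
```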


A practical consideration might be that the disk experiences a fault in one of the executions: it works fine a hundred times but fails on the 101st (e.g., it hits a disk-full error or a bad block).

But that simply means it's harder to implement true idempotency when it comes to disk usage.

This is why the problem is usually simplified to ignore unhappy paths.


I fear y'all (or I) may be dancing a bit too deeply into the hypothetical here...

The idempotent version of this function doesn't blindly write. Most of Ansible, for example, is Python code doing 'state machines' or whatever: checking whether changes are needed and applying them if so.
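A sketch of that check-then-apply idea in plain Python. `ensure_file_content` is a hypothetical helper, not Ansible's code, though returning whether anything changed is similar in spirit to the `changed` flag Ansible modules report.

```python
import os
import tempfile


def ensure_file_content(path: str, content: str) -> bool:
    # Make `path` contain exactly `content`; return True only if we changed it.
    try:
        with open(path, encoding="utf-8") as f:
            if f.read() == content:
                return False  # already in the desired state: skip the write
    except FileNotFoundError:
        pass  # file is missing, so fall through and create it
    with open(path, "w", encoding="utf-8") as f:
        f.write(content)
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "motd")
        print(ensure_file_content(path, "hello\n"))  # True  (created)
        print(ensure_file_content(path, "hello\n"))  # False (no change needed)
        print(ensure_file_content(path, "bye\n"))    # True  (updated)
```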

Where y'all assume one function, it's actually an entire script/library. Perhaps I'm fooled by party tricks?

Beyond all of this, the disk full example is nebulous. On Linux (at least)... if you open an existing file for writing, the space it is using is reserved. Writing the same data back out should always be possible... just kind of silly.

It does bring the risk of corruption in transit: faulty connectors or what-have-you. Physical failure like this isn't something we can expect software to handle/resolve, IMO. Wargames and gremlins combined!

To truly tie all this together, I think we have to consider atomicity.


> Writing the same data back out should always be possible...

Depends on the implementation: maybe you open a temp file and then mv it into place after you are done writing (for increased atomicity)?
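A sketch of the temp-file-then-rename approach in Python, assuming a POSIX system where `os.replace` is an atomic rename when both paths are on the same filesystem. `atomic_write` is a hypothetical name. Note that it briefly needs room for a second copy of the file, which is one way it could still fail with a disk-full error.

```python
import os
import tempfile


def atomic_write(path: str, content: str) -> None:
    # Write to a temp file in the same directory, then rename it over `path`,
    # so readers see the old file or the new one (POSIX, same filesystem).
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(content)
            f.flush()
            os.fsync(f.fileno())  # get the data onto disk before the rename
        os.replace(tmp_path, path)  # the "mv it into place" step
    except BaseException:
        os.unlink(tmp_path)  # don't leave a stray temp file behind
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "output.txt")
        atomic_write(path, "v1\n")
        atomic_write(path, "v1\n")  # rerunning yields the same end state
        with open(path, encoding="utf-8") as f:
            print(repr(f.read()))  # 'v1\n'
        print(os.listdir(d))       # ['output.txt']
```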

But as I already said, in practice we ignore these externalities because handling them makes the problem a lot harder for minor improvements, not because this isn't something software can "handle/resolve".


Maybe even hermetic builds? https://bazel.build/basics/hermeticity


To give a concrete example: some build systems embed a "build number" in the output that increments for each non-clean build (yeah this is stupid but I have seen it).

This is deterministic (doesn't change randomly), but not idempotent.
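A toy Python sketch of that build-number behavior; `build` and the counter file are hypothetical. Nothing here is random, yet every non-clean run changes both the output and the on-disk state, while deleting the counter file (a "clean" build) resets it.

```python
import os
import tempfile


def build(source: str, counter_path: str) -> str:
    # Hypothetical build step: stamps the output with an incrementing number
    # stored in `counter_path`, which a clean build would delete.
    number = 0
    if os.path.exists(counter_path):
        with open(counter_path, encoding="utf-8") as f:
            number = int(f.read())
    number += 1
    with open(counter_path, "w", encoding="utf-8") as f:
        f.write(str(number))
    return f"{source}\n// build {number}"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        counter = os.path.join(d, "build_number")
        print(build("main()", counter).splitlines()[-1])  # // build 1
        print(build("main()", counter).splitlines()[-1])  # // build 2
        os.remove(counter)  # a "clean" build
        print(build("main()", counter).splitlines()[-1])  # // build 1
```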


The main problem with a statement like that is that "interesting" is extremely subjective. Personally, I often find math and CS more interesting when they're further from reality. To each his own, I suppose.


This is great, and I will download it, but I might be missing something because I don't see the PDF? I guess I will stick to the epub.


I would love to see him write a book, or even just publish a nice compendium of his papers à la Donald Knuth's Collected Papers series (with light editing/updating, background info, commentary, etc.). I think that would truly be a pleasure to read.


I would too, though I should point out that I think all of Lamport's papers and books are free on his website: http://lamport.azurewebsites.net/pubs/pubs.html?from=https:/...

Some of them even have some basic background information if you click the link (e.g. Arbiter-Free Synchronization).


My thought here is that not all of the data in the database is being accessed at the same time, so the un-accessed data is "at rest". Is that correct, or am I barking up the wrong tree?


Assuming full-disk encryption is in use (LUKS, TrueCrypt/VeraCrypt, BitLocker, etc.), there is enough information held in RAM to decrypt the entire disk. If the attacker gains access to a privileged user, or at least to a user allowed to read the file system (such as the user running the database), they can exfiltrate the unencrypted contents of the disk, regardless of what the DB software is actively accessing.


Ah, OK, makes sense. Thanks for the clarification!

