1. Depends on your distributed storage. If you use a clustered file system, your argument doesn't hold.
2. While true, I fail to see how that is relevant. What part of my described flow would be broken?
3. While true, I again fail to see the relevance. Are you just listing characteristics of symlinks? A field in a database could just as well hold a filepath pointing to a non-existent file.
4. Sure. I was describing how to avoid the counter and magic number, not the database.
Thanks for your question. There are some consistency vulnerabilities in your solution:
- (Step 1) If the "delete service" call fails, does a timeout count as a failure? The decrement was probably applied, but your app never saw the response, so you can't know whether the decrement really happened or not. You cannot safely retry on this error (unless you use the "magic" number =))
- (Step 1) Doing "unmark deleted index" does not mean it is durably on disk. Your app, or even the server, can shut down/reboot before the disk write is ensured.
- (Step 2) If marking fails, you generally cannot roll back. You can try to roll back the decrement, but that operation can fail too. In that case, on the next delete iteration the counter in filedb will be corrupted.
- (Step 3) The indexes become the exclusive data owner. If you somehow lose the indexes for one mailbox, you cannot rely on the counters in filedb anymore, so you cannot do any further decrements until a full double-sided recheck.
- Any bug in any app in this chain can destroy filedb consistency.
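To make the timeout point concrete, here's a toy Python sketch (the `FileDB` class and its API are invented for illustration, not your actual filedb) of how a blind retry after a lost response double-applies a non-idempotent decrement:

```python
class FileDB:
    """Toy stand-in for the filedb refcount (illustrative, not a real API)."""
    def __init__(self, count):
        self.count = count
        self.drop_next_response = False

    def decrement(self):
        self.count -= 1                  # the decrement is applied server-side...
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost")  # ...but the client never learns it

db = FileDB(count=1)
db.drop_next_response = True  # simulate one lost response

try:
    db.decrement()   # applied, but looks like a failure to the caller
except TimeoutError:
    db.decrement()   # "retry" applies it a second time

assert db.count == -1  # double-decrement: the counter is now corrupted
```

This is exactly why the retry needs either idempotence or a way to ask "did my previous decrement land?".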
Your magic numbers don't seem to guarantee consistency either; after multiple faulty exchanges the file could be gone forever: maybe the random numbers happened to collide, or the sum of subtractions across faulty exchanges reached 0, or there was an overflow, etc.
Distributed algorithms are not easy, but for this case there are proven solutions: look up how Riak deletes records, for example, with "tombstones" and "reaping". Basically you have to think in terms of CRDTs and have a log of operations, a versioning scheme, synchronization, etc., all of which help you make sure that no operation is applied twice anywhere, and that in case of faulty communication the nodes can synchronize and apply the missing operations. This gives you consistency, eventually, and a reliable way to know when files can be physically deleted. Or, you know, just use Riak.
Or Dynamo, or Cassandra, or several others that all exhibit good behavior around these things. Also, all of the above (and Riak) emit event streams when tombstones go away, so you can hook into those and reap the file.
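A minimal single-process sketch of the tombstone-then-reap idea (the `Store` class and its names are made up here; real systems like Riak and Cassandra also version and replicate tombstones, and use a much longer grace period):

```python
import time

GRACE_PERIOD = 0.0  # seconds to keep a tombstone before reaping (toy value)

class Store:
    def __init__(self):
        self.records = {}      # key -> value
        self.tombstones = {}   # key -> time of deletion

    def delete(self, key):
        # Deleting writes a tombstone instead of removing the record outright,
        # so a replica that missed the delete can still learn about it.
        self.records.pop(key, None)
        self.tombstones[key] = time.time()

    def reap(self, replicas_acked):
        """Drop tombstones once every replica has acknowledged the delete and
        the grace period has passed; this is when the file on disk can go."""
        now = time.time()
        reaped = [k for k, t in self.tombstones.items()
                  if replicas_acked(k) and now - t >= GRACE_PERIOD]
        for k in reaped:
            del self.tombstones[k]
        return reaped  # hook point: physically delete the files for these keys

store = Store()
store.records["mail:42"] = "message body"
store.delete("mail:42")
print(store.reap(lambda k: True))  # -> ['mail:42']
```

The returned list of reaped keys is the equivalent of the event stream mentioned above: it's the safe moment to remove the backing file.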
There would be no rollback of the deleted marker, since we don't want to undo the delete; we want to keep retrying until it works.
> timeout is a fail?
No, a timeout would just trigger a retry until we got a response back (or, if there is a reader, it would keep connecting until it got through). There is no danger in deleting a file multiple times, since any additional unneeded delete would simply report as such.
> this does not mean it was fixed on disk forever
Then the system would keep seeing the marker (after restarting from the crash), repeat the delete call, and then try to delete the email for good. It can crash 100 times, but eventually both would get removed.
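Roughly what I mean, as a toy Python sketch (function names invented for illustration): the marker is cleared only after a successful delete, and the delete itself is idempotent, so replaying it after a crash is harmless:

```python
import os
import tempfile

def delete_file_idempotent(path):
    try:
        os.remove(path)
        return "deleted"
    except FileNotFoundError:
        return "already gone"   # an extra, unneeded delete just reports as such

def process_markers(markers, path):
    """Run on startup and on each marker; `markers` is a stand-in for the
    persisted "deleted" index entries."""
    if path in markers:
        result = delete_file_idempotent(path)
        # Clear the marker only after the delete succeeded; a crash before
        # this line leaves the marker in place, so we retry on restart.
        markers.discard(path)
        return result

tmp = tempfile.NamedTemporaryFile(delete=False)
tmp.close()
markers = {tmp.name}
print(process_markers(markers, tmp.name))   # "deleted"
markers.add(tmp.name)                       # simulate a replayed marker after a crash
print(process_markers(markers, tmp.name))   # "already gone" - safe to repeat
```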
> Indexes become exclusive data owner. If you lose indexes...
Then no matter how you build this system you're toast and have to restore from backups. No system can handle losing its indexes (and the error-correcting or checksum copies), whether they are btrees, file blocks, or memory addresses.