If you cp your data onto a Plan9 machine, what results is pretty much exactly th...

gpvos · on Sept 12, 2014

For my understanding: what happens if you open a file, change one byte and close it again? Since the SHAsum of the contents has changed, is the entire file now copied?

noselasd · on Sept 12, 2014

Only the block where the byte resides. A block is typically 512 to 4096 bytes. (So it's not that different from a normal drive, where you also have to rewrite an entire sector even if just one byte changed.)

Venti doesn't know about files, it only knows about blocks of data. It's a key/value store where the key is the SHA-1 hash and the value is a block (blob) of data.

The filesystem running on top of Venti will ask Venti to store the new block where you changed the byte and the filesystem will update the metadata that assembles all blocks to a file.
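A minimal sketch of the idea in Python, to make the "only one block is rewritten" point concrete. This is not Venti's actual protocol or on-disk format; `BlockStore`, `store_file`, and the 4096-byte block size are hypothetical names and values chosen for illustration. The list of keys returned by `store_file` stands in for the filesystem metadata that assembles blocks into a file.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed block size, for illustration only


class BlockStore:
    """Toy content-addressed store: key = SHA-1 of the block, value = the block."""

    def __init__(self):
        self.blocks = {}
        self.writes = 0  # number of blocks actually stored (duplicates are skipped)

    def put(self, data: bytes) -> str:
        key = hashlib.sha1(data).hexdigest()
        if key not in self.blocks:
            self.blocks[key] = data
            self.writes += 1
        return key

    def get(self, key: str) -> bytes:
        return self.blocks[key]


def store_file(store: BlockStore, content: bytes) -> list:
    """Split content into blocks, store each one, return the list of keys.

    The returned list plays the role of the file's metadata.
    """
    return [
        store.put(content[i:i + BLOCK_SIZE])
        for i in range(0, len(content), BLOCK_SIZE)
    ]


store = BlockStore()

# Four distinct blocks: each block is filled with a different byte value.
original = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
old_keys = store_file(store, original)

# Change a single byte that lives in the second block (index 1).
edited = bytearray(original)
edited[5000] ^= 0xFF
new_keys = store_file(store, bytes(edited))

# Indices of the blocks whose key changed between the two versions.
changed = [i for i in range(len(old_keys)) if old_keys[i] != new_keys[i]]
```

After the edit, only one new block has been added to the store; the other three keys are reused unchanged, and the old version of the file is still readable through `old_keys`.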

bakul · on Sept 12, 2014

1. Plan9 has no hard links, so if you copy a Unix directory tree to a Plan9 machine you'd lose all the hard link info. 2. Venti doesn't use fossil or inodes. Venti is just a content-addressable storage system, not a fileserver/filesystem. 3. Fossil is a fileserver.

derefr · on Sept 12, 2014

Ah, I was going off an explanation I was given myself at one point--but this makes more sense, thanks.

pedrocr · on Sept 12, 2014

> Honestly, I'm terribly confused why all filesystems haven't been broken into these two easily-separable layers. Is it just inertia?

The penalty for doing content addressed filesystems is of course the CPU usage. btrfs probably has most of the benefits without the CPU cost with its copy-on-write semantics.

Note that what you describe (and my initial process) is a different semantic than hard-links. What you get is shared storage but if you write to one of the files only that one gets changed. Whereas with hardlinks both files change.
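The difference in semantics can be observed from user space. The sketch below is only an illustration: it uses an ordinary copy as a stand-in for deduplicated or copy-on-write storage (the observable behaviour on write is the same, only the disk usage differs), and it assumes the temporary directory sits on a filesystem that supports `os.link`. The function name `demo` is made up for this example.

```python
import os
import shutil
import tempfile


def read(path):
    with open(path) as f:
        return f.read()


def demo():
    """Write to a file, then report what a hard link and a copy of it contain."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a")
        with open(a, "w") as f:
            f.write("old")

        link = os.path.join(d, "a_link")
        os.link(a, link)        # same inode: shared identity

        copy = os.path.join(d, "a_copy")
        shutil.copy(a, copy)    # separate file: shared content only

        # Rewrite the original in place (same inode, truncated and rewritten).
        with open(a, "w") as f:
            f.write("new")

        return read(link), read(copy)


via_link, via_copy = demo()
```

Here `via_link` reflects the new contents while `via_copy` still holds the old ones, which is the distinction between hard links and shared storage described above.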

derefr · on Sept 12, 2014

In effect, hard links (of mutable files) are a declaration that certain files have the same "identity." You can't get this with plain Venti-on-Fossil, but it's a problem with Fossil (objects are immutable), not with Venti.

Venti-on-Venti-on-Fossil would work, though, since Venti just creates imaginary files that inherit their IO semantics from their underlying store, and this should apply recursively:

1. create two nodes A and B in Venti[1] that refer to one node C in Venti[2], which refers to object[x] with key x in Fossil.

2. Append to A in Venti[1], causing a write to C in Venti[2], causing a write to object[x] in Fossil, creating object[y] with key y.

3. Fossil returns y to Venti[2]; Venti[2] updates C to point to object[y] and returns C to Venti[1]; Venti[1] sees that C is unchanged and does nothing.

Now A and B both effectively point to object[y].

(Note that you don't actually have to have two Venti servers for this! There's nothing stopping you from having Venti nodes that refer to other Venti nodes within the same projected filesystem--but since you're exposing these nodes to the user, you get the "dangers" of symbolic links, where e.g. moving them breaks the things that point to them. For IO operations they have the semantics of hard links, though, instead of needing to be special-cased by filesystem-operating syscalls.)

ori_b · on Sept 12, 2014

You seem to be confusing venti and fossil.

theworst · on Sept 12, 2014

Can you explain further? I am not a plan9 expert, by any means, but I'm stuck at where GP made the confusion. Thanks!

yungchin · on Sept 12, 2014

He just swapped the names I think - Venti is the block store, Fossil is the file system layer.

noselasd · on Sept 12, 2014

> Honestly, I'm terribly confused why all filesystems haven't been broken into these two easily-separable layers. Is it just inertia?

They are. Pretty much all operating systems provide a block layer, on top of which filesystems are normally layered.

Though the block layer isn't content addressed, it's just indexed (store block number 55, read block number 9, etc.)

khc · on Sept 12, 2014

Content addressable systems trade CPU and memory for disk space. If you expect duplication to be low, you are usually better off with a background scrubber.
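As a rough sketch of that alternative, assuming "background scrubber" means an offline deduplication pass: writes go to a plain indexed store with no hashing on the hot path, and a separate `scrub` pass hashes blocks later and merges duplicates. `IndexedStore` and `scrub` are hypothetical names, and the toy ignores garbage collection of overwritten blocks.

```python
import hashlib


class IndexedStore:
    """Toy block store addressed by block number; no hashing on write."""

    def __init__(self):
        self.blocks = {}   # physical id -> bytes
        self.table = {}    # logical block number -> physical id
        self.next_id = 0

    def write(self, number: int, data: bytes) -> None:
        # Fast path: always allocate a fresh physical block.
        self.blocks[self.next_id] = data
        self.table[number] = self.next_id
        self.next_id += 1

    def read(self, number: int) -> bytes:
        return self.blocks[self.table[number]]


def scrub(store: IndexedStore) -> int:
    """Offline pass: hash every physical block and merge identical ones.

    Returns the number of physical blocks freed.
    """
    canonical = {}  # digest -> physical id that is kept
    remap = {}      # duplicate physical id -> canonical physical id
    for pid in sorted(store.blocks):
        digest = hashlib.sha1(store.blocks[pid]).digest()
        keep = canonical.setdefault(digest, pid)
        if keep != pid:
            remap[pid] = keep

    # Point logical blocks at the canonical copy, then drop the duplicates.
    for number in list(store.table):
        pid = store.table[number]
        if pid in remap:
            store.table[number] = remap[pid]
    for pid in remap:
        del store.blocks[pid]
    return len(remap)


store = IndexedStore()
store.write(0, b"A" * 4096)
store.write(1, b"B" * 4096)
store.write(2, b"A" * 4096)   # duplicate of block 0

freed = scrub(store)
```

The hashing cost is paid only when the scrubber runs, rather than on every write, which is the trade-off when duplicates are expected to be rare.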

saalweachter · on Sept 12, 2014

ZFS sits on top of zpools.