
Ugarit: content-addressable storage and backup written in Chicken scheme - landakram
https://www.kitten-technologies.co.uk/project/ugarit/doc/trunk/docs/intro.wiki
======
ChuckMcM
Very nice. Content addressable storage has a number of wonderful properties.
At Blekko we would hash 'keys' (like a URI) which would identify a 'bucket'
where that URI was stored. This spread crawling the web evenly across multiple
servers.
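The bucketing scheme described above can be sketched like this (a minimal illustration; `bucket_for` is a hypothetical helper, and Blekko's actual hash function and bucket count aren't stated here):

```python
import hashlib

def bucket_for(key: str, num_buckets: int) -> int:
    """Map a key (e.g. a URI) to a bucket by hashing it.
    A uniform hash spreads keys evenly across the buckets,
    and every server computes the same mapping independently."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

# The same URI always lands on the same bucket/server.
print(bucket_for("https://example.com/page", 16))
```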

At Netapp I worked for a bit on a content addressable version of a filer where
each 4K block was hashed and the hash became the block address. Unlike Ugarit
the block hashes were in an SSD based metadata server rather than being hashed
into directories. The feature that fell out of this was that you got content
deduplication for 'free', since any block that hashed to a code you already
had stored didn't need to be stored again (and this exploited the
fixed-length defense against hash collisions).
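A toy version of that dedup-for-free property, assuming fixed 4K blocks and an in-memory dict standing in for the SSD metadata server:

```python
import hashlib

class BlockStore:
    """Toy content-addressed block store: the hash of a block *is* its
    address, so writing an identical block twice stores nothing new."""
    BLOCK_SIZE = 4096  # fixed-size blocks, as in the filer described above

    def __init__(self):
        self.blocks = {}  # hash -> bytes (stands in for the metadata server)

    def put(self, block: bytes) -> str:
        addr = hashlib.sha256(block).hexdigest()
        # Deduplication for free: if the address already exists,
        # the data is already stored and we skip the write.
        self.blocks.setdefault(addr, block)
        return addr

    def get(self, addr: str) -> bytes:
        return self.blocks[addr]
```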

~~~
kmicklas
The fact that the majority of infrastructure in tech is oblivious to something
as obvious and beneficial as content-addressable storage is one of the most
depressing things about the industry to me.

------
beagle3
Sounds very close to bup/git and borg. In many ways, git is content-
addressable storage with some metadata, remote-synchronization, and merging
features mixed in. bup uses git's internals to great effect for a backup
system, and borg drops git compatibility and elegant multi-client support,
but goes further by providing other backup functionality such as efficient
pruning and built-in encryption.
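Git's content addressing is easy to see directly: a blob's object id is the SHA-1 of a short header plus the file contents, so identical content always gets the same id.

```python
import hashlib

def git_blob_hash(content: bytes) -> str:
    """Compute a git blob id: SHA-1 over 'blob <size>\\0' + content.
    This is the same id `git hash-object` would print for the file."""
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_hash(b"hello\n"))
```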

------
zmix
The problem with this is that the last development seems to have happened in
2015, and the issue tracker is pretty full, with stuff going back to 2012.

~~~
alyrik
I aint'n't dead :-D

Largely, Ugarit is ticking along nicely in my production setup, doing backups
of my servers; when I get time I work on (a) performance, which is still weak
in some cases and (b) archive mode, which is a fun spare-time project rather
than something urgent I use. But the core case of doing backups Just
Works(tm).

~~~
zmix
Ok, good to hear. I am looking for a backup solution for my heterogeneous
home network, and thus evaluating the possibilities.

------
dragonshed
If the author or a contributor is able to share, I'm curious how this
compares to camlistore.

------
nix0n
How do you handle hash collisions?

~~~
kmicklas
Use a big enough hash length that this is less likely than cosmic rays
flipping your bits or something.
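The birthday-bound arithmetic behind that claim, as a rough sketch:

```python
def collision_probability(num_blocks, hash_bits):
    """Birthday-bound approximation for the chance of any collision
    among n uniformly hashed items: p ~= n^2 / 2^(bits+1)."""
    return num_blocks ** 2 / 2 ** (hash_bits + 1)

# Even a trillion stored blocks under a 256-bit hash leave collision
# odds astronomically below everyday hardware failure rates.
print(collision_probability(1e12, 256))
```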

------
pmoriarty
I wonder how performant, mature, and reliable this is.

~~~
alyrik
Performance: A bit meh at bulk bytes-per-second snapshotting, particularly
when the vault is accessed over ssh, as it's latency-sensitive (send a block,
wait for the response, send the next block, and so on, rather than streaming
the blocks). However, the mtime cache means that it's really
good at spotting the files that have changed, so it scans my couple of
terabytes fairly rapidly and uploads the tens of megabytes that have changed
each day.
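Back-of-the-envelope arithmetic for why that stop-and-wait pattern hurts (the 64 KiB block size and 20 ms RTT below are assumed illustrative numbers, not Ugarit's actual parameters):

```python
def stop_and_wait_throughput(block_size_bytes: int, rtt_seconds: float) -> float:
    """One block per round trip: throughput is capped by latency,
    no matter how much bandwidth the link has."""
    return block_size_bytes / rtt_seconds

# 64 KiB blocks over a 20 ms round trip cap out around 3.3 MB/s;
# streaming/pipelining the blocks removes this per-block latency cost.
print(stop_and_wait_throughput(64 * 1024, 0.020) / 1e6)
```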

Mature: Well, it's been running nightly for years for me, and has saved the
day on a few occasions when I've had to restore stuff from the vault.

Reliable: The only data I've lost was when a vault disk actually died, and I
hadn't replicated it because, well, I was happier losing some history
sometimes rather than buying more disk for that use case. However, when I get
archive mode in production, I'm going to build a better vault replication
solution than rsync!

