Given how cheap storage is now, I think there are a lot of applications for which it makes sense to record all previous states by default (e.g. for debugging) and make mutating history (by squashing/compacting it away) the special case (a bit like running 'logrotate' every week).
I've got a draft of a blog post here:
edited to add:
This work is a natural evolution of the "OXenstored" work presented at ICFP 2009 http://web.cecs.pdx.edu/~apt/icfp09_accepted_papers/83.html
This ensures that only the minimal state required is stored (as opposed to the entire process heap), and also that state can be reconstructed intelligently to preserve sharing and special resources. For instance, file descriptors (if running in Unix mode) could be reified to a filename/offset and reopened, and memory mapped areas (such as the shared ring structures that Xen uses) could be re-granted from the hypervisor.
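To make the file-descriptor case concrete, here's a hypothetical sketch of what reifying a descriptor to a filename/offset pair might look like. The `reified_fd` type and function names are purely illustrative (not from Irmin or any actual codebase):

```ocaml
(* Hypothetical sketch: capture the minimal state of a Unix file
   descriptor so it can be persisted and later reconstructed.
   Names are illustrative only. *)

type reified_fd = {
  path   : string;  (* file the descriptor refers to *)
  offset : int;     (* current seek position *)
}

(* The path is passed explicitly, since a raw fd does not portably
   carry its own filename. *)
let reify path fd : reified_fd =
  { path; offset = Unix.lseek fd 0 Unix.SEEK_CUR }

(* Reconstruct an equivalent descriptor from the stored state. *)
let reopen (r : reified_fd) : Unix.file_descr =
  let fd = Unix.openfile r.path [ Unix.O_RDONLY ] 0 in
  ignore (Unix.lseek fd r.offset Unix.SEEK_SET);
  fd
```

A memory-mapped region would get a similar treatment, except that "reopen" would mean asking the hypervisor to re-grant the mapping rather than seeking in a file.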
You could build a Camlistore-like service on top of Irmin, but this design/eval hasn't happened yet to my knowledge. It's on the TODO list for Irmin applications -- while I really like Camlistore, I do also want finer control over routing of blobs in a content-addressed network to manage the physical placement of my personal data.
Another interesting aspect of Irmin is that the backend blob stores are also just another functor. In addition to the existing memory/HTTP-REST/Git backends, we've sketched out the design for a convergent-encrypted backend store as well. Other backends such as existing k/v stores like LevelDB or Arakoon shouldn't be difficult, and patches are welcome.
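As a rough illustration of the shape such a pluggable interface takes (this is a sketch, not the actual Irmin backend signature):

```ocaml
(* Hypothetical sketch of a pluggable blob-store interface; the real
   Irmin backend signature differs. *)

module type BLOB_STORE = sig
  type t
  val create : unit -> t
  val read   : t -> key:string -> string option
  val write  : t -> key:string -> value:string -> unit
end

(* A trivial in-memory backend satisfying the signature. *)
module Memory_store : BLOB_STORE = struct
  type t = (string, string) Hashtbl.t
  let create () = Hashtbl.create 16
  let read t ~key = Hashtbl.find_opt t key
  let write t ~key ~value = Hashtbl.replace t key value
end
```

A LevelDB or convergent-encryption backend would simply be another module satisfying the same signature, which is why new backends shouldn't be difficult to add.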
In this scenario, this kind of verbose state logging starts to sound like a huge win, especially if tools exist to visualize how state mutation differs across the various legs, and to use that information to infer likely dependencies between execution branches.
More docs emerging in the coming weeks (particularly as we upstream the Xen toolchain integration, which has been a very helpful deployment to iron out the bugs in the betas). Do feel free to file questions on <https://github.com/mirage/irmin/issues> in the meanwhile.
I assume you're involved in the project. Do you have a way to fix the blog to include (at least part of) the post title in the page <title>?
Right now the title is just 'Blog', and people like me (I know, it's a bad habit) who open tabs to read posts later will have trouble relocating that thing in a far too large list of tabs.
In other words, I think I could implement what's being described as a very thin shim on top of git-annex. You'd just need a special git merge driver (the same as Irmin, which requires 3-way merge providers), but with the extra caveat that all three components of the merge have to be present in the local annex before a merge can take place.
This is just based on the article, and assumes I've understood correctly.
Is this accurate?
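For concreteness, here's a toy example of the kind of 3-way merge provider such a shim would need, written as a line-set merge (purely illustrative, not the driver git-annex or Irmin actually uses):

```ocaml
(* Toy 3-way merge over sets of strings: an element survives if both
   sides kept it, or if either side newly added it relative to the
   common ancestor. Illustrative only. *)

module S = Set.Make (String)

let merge3 ~ancestor a b =
  let kept  = S.inter ancestor (S.inter a b) in
  let add_a = S.diff a ancestor in
  let add_b = S.diff b ancestor in
  S.union kept (S.union add_a add_b)
```

In git terms this would be wired up as a custom merge driver receiving the ancestor and both branch versions; the caveat above is that all three versions must be locally present in the annex before the merge can run.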
There are some interesting tricks involved in converting from the native Irmin representation to Git (whose model is slightly less descriptive than Irmin's, so virtual nodes are constructed to represent the extra data in Git). I'll write that up in more detail in a future post, but for now:
(is the Git serializer, and you can see in the interface how you can spawn an on-disk and in-memory Git).
and the simpler in-memory backend:
where the implementation is mostly a no-op, since no mapping between representations needs to take place.
(Edited to note:) The reason for wanting an in-memory backend in the first place is that this is also very useful for IPC coordination. You could build a session layer where all the messages that go back-and-forth between two processes are recorded into an in-memory layer, and then when the whole process is done, the entire graph of communication can be dumped out to a Git tree as the log (for later analytics or debugging). If disk space is an issue, the Git tree can later be rebased to eliminate the intermediate communication commits. This is very, very useful for debugging.
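A minimal sketch of such a recording session layer (names are illustrative, not an Irmin API):

```ocaml
(* Sketch of a recording session layer: every message exchanged between
   two processes is appended to an in-memory log, which can be dumped
   once the session ends (e.g. to persist as a Git tree for later
   debugging). Illustrative only. *)

type direction = Sent | Received

type session = { mutable log : (direction * string) list }

let create () = { log = [] }

let send s msg =
  s.log <- (Sent, msg) :: s.log
  (* ... actual transmission of msg would happen here ... *)

let receive s msg = s.log <- (Received, msg) :: s.log

(* Dump the full exchange in chronological order. *)
let dump s = List.rev s.log
```

In the scenario described above, `dump` is what you'd call at the end of the session to write the communication graph out as a log; rebasing away the intermediate commits later recovers the disk space.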
Highly relevant :)
(talk coming soon)
I just read through your Data Package Manager post, and I have to say 1) bravo and 2) I need that - yesterday.
I hope Dropbox doesn't acqui-quash you before these things get out of hand :)
Perhaps the idea is more like a database, and some administration is manual. So, merges are like mini-migrations. Or, merges could be thrown back to a human user of the app to combine them (perhaps with some domain specific interface) - again, manual.
With Irmin, if your application needs a distributed queue to coordinate workflow tasks, then you grab an MQueue datastructure that explains how it deals with multiple readers and writers, and you use that. If you instead need a distributed set with no strong ordering guarantees, then you can implement it as a series of pull/push/merge operations.
One interesting exercise we're doing at the moment is to build the equivalent of Okasaki's purely functional datastructures in Irmin. Since the module signatures of these datastructures are quite similar to their non-distributed counterparts, it should be possible to swap distributed/local datastructures depending on the deployment scenario of the application (with the local ones being much more efficient due to the lack of remote synchronization).
If a datastructure really can't merge something, then it can raise a conflict that will ripple up to the user interface. The design aims to minimize this, though, by letting the application specify non-failing merge semantics where possible.
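To illustrate the two cases (a merge that can conflict vs. one that never does), here's a sketch in the result style described above; the types and names are illustrative, not Irmin's actual API:

```ocaml
(* Sketch: a datastructure supplies a 3-way merge that either succeeds
   or reports a conflict, which ripples up to the application.
   Illustrative only, not Irmin's API. *)

type 'a merge_result = Merged of 'a | Conflict of string

(* A single-value register: conflicts when both sides diverge. *)
let merge_register ~ancestor a b =
  if a = b then Merged a
  else if a = ancestor then Merged b (* only b changed *)
  else if b = ancestor then Merged a (* only a changed *)
  else Conflict "register: both branches changed the value"

(* A grow-only set never conflicts: merge is just set union. *)
module S = Set.Make (String)

let merge_set ~ancestor:_ a b = Merged (S.union a b)
```

The unordered-set case is the non-failing merge semantics mentioned above: by choosing a datastructure whose merge is total, the application never has to surface a conflict to the user.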
BTW, would be much appreciated if you could point to related work on the subject (papers, other projects, blogs, etc).
That'll certainly happen when we complete the research papers on the subject. It's a little out of scope for a blog post series that primarily focuses on trying to explain the stuff we're building.