If you're interested, we've described some important ways that dotmesh is _different_ to git here: https://docs.dotmesh.com/faq/#how-does-dotmesh-differ-from-g...
It always struck me that I should be able to "docker push" my data and share that with my team just as I do my apps. In fact, I had built a quick hack to do something similar called Dockershare (https://github.com/ahnick/dockershare). I realized through that effort that a custom Docker volume plugin would be needed, and that it was a much larger problem than I had time to tackle.
I imagine that dotmesh must have grown out of what was being done with dvol? (https://github.com/clusterhq/dvol) In any case, kudos for getting this built. I'm excited to try it out.
For the same reasons you mention in your post, we've been hard at work on Kubernetes support - Persistent Volumes, but with extra features!
For instance, if I have two runs of a test that produce different outputs, can I merge the data back together at the end?
If not, then this is only capturing one aspect of git -- the archiving of snapshots of the state of the data.
We are exploring it. We have some thoughts on higher level understanding of data that might make it possible.
But definitely starting with the basics, as you said.
On a very small app, I can see the utility of dotmesh.
This is where the Docker and Kubernetes integration comes into play -- if your app is captured entirely in Kubernetes manifests, the only thing left to capture (apart from the declarative Kube manifests, which should already be in version control) is the state that exists in Kubernetes Persistent Volumes. Dotmesh provides a Kubernetes Persistent Volume driver that supplies Dotmesh StorageClass PVs, along with a dynamic provisioner, meaning that you really can capture the entire state of your app with Dotmesh... as long as you're deploying it with Kubernetes.
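To make that concrete, here's a sketch of what claiming a Dotmesh-backed volume might look like. Note this is illustrative: the StorageClass name ("dotmesh") and the sizes are assumptions on my part, not copied from the dotmesh docs, so check the docs for the real manifest.

```yaml
# Hypothetical PVC against a Dotmesh StorageClass; the class name and
# sizing here are assumptions, not taken from the official docs.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  storageClassName: dotmesh
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
```

Because the provisioner is dynamic, pods referencing this claim get a Dotmesh-managed volume without any manual PV setup.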
Code and infrastructure are already under control thanks to version control and Terraform, Ansible, etc. -- this completes the picture.
Give it a go: https://dotmesh.com/try-dotmesh/ and please leave more feedback here or in our Slack! (linked in the footer of dotmesh.com)
Makes rolling back mistakes easy.
Some things which I personally find useful about Relaxo:
- Easy to move, merge, and fork data (it's just a git repository).
- Easy to roll back or inspect changes. If you make a mistake, just reset HEAD.
- Easy to backup (guaranteed consistency on disk).
- Better grouping of changes by transactions, which have a description, date, and information about who committed them (this can even be tied to the currently logged-in user of a web app, for example).
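To illustrate that last point, here's a minimal in-memory sketch of the "changes grouped by transactions" idea. The class and method names are mine, not Relaxo's actual API; it just shows the semantics of applying a batch of writes atomically while recording who, when, and why.

```ruby
# Minimal sketch of transaction-grouped writes; illustrative only,
# not Relaxo's real API.
Commit = Struct.new(:message, :author, :time, :paths)

class TinyStore
  attr_reader :history

  def initialize
    @documents = {}
    @history = []   # append-only list of commits, like a git log
  end

  # Apply a batch of writes atomically, recording metadata about them.
  def commit(message:, author:)
    changes = {}
    yield changes
    @documents.merge!(changes)
    @history << Commit.new(message, author, Time.now, changes.keys)
  end

  def read(path)
    @documents[path]
  end
end

store = TinyStore.new
store.commit(message: "Create invoice", author: "alice") do |tx|
  tx["invoices/2017/07/0001"] = { total: 100 }
end
puts store.read("invoices/2017/07/0001")[:total]  # => 100
puts store.history.first.author                   # => alice
```

If you make a mistake, rolling back is just a matter of discarding everything after a given commit in the history - which is exactly what resetting HEAD does in the real git-backed store.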
In theory Relaxo could scale up. Using libgit2 as the backend, it wouldn't be hard to use Redis as an object store for git. The git data structure on disk is really just a key-value store with some specific data structures.
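The key-value nature of git's object database is easy to demonstrate: every object lives under the SHA-1 of a small header plus its content, so any key-value backend (files on disk, Redis, etc.) can hold it. This sketch computes a blob's key the way git does; the `object_store` hash is just a stand-in for whatever backend you choose.

```ruby
require "digest"

# git names a blob by SHA-1 of "blob <size>\0<content>", so the object
# database is a content-addressed key-value store.
def git_blob_key(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

object_store = {}  # stand-in for the file system, Redis, etc.

content = "hello\n"
key = git_blob_key(content)
object_store[key] = content

# Matches what `echo hello | git hash-object --stdin` prints:
puts key  # => ce013625030ba8dba906f756967f9e9ca394464a
```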
The main issue with Relaxo is query performance and indexes. Simple queries, like fetching a single document, are fast. Complex queries involving subsets, aggregations, and joins require supporting indexes to work efficiently, and this is something that is hard to build into a pure document storage system. The naive solution is to load all the documents and filter them, which is actually fine until you get a large number of documents (e.g. 1,000+).
However, git does provide one useful guarantee - it will sort directory entries. With this in mind, it's possible to make radix-sorted indexes (e.g. /invoices/by_date/2017/07/). You can use this to do basic indexes, but it's still not as good as a traditional SQL database in this regard.
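Here's a pure-Ruby sketch of that radix-sorted index idea (the data and helper are mine, for illustration): because entries are kept in sorted order, a date-prefixed key like `invoices/by_date/2017/07/...` turns a range query into a cheap prefix scan instead of the naive full scan described above.

```ruby
# Sorted keys (as git sorts tree entries) make prefix scans cheap.
entries = [
  "invoices/by_date/2017/06/0042",
  "invoices/by_date/2017/07/0043",
  "invoices/by_date/2017/07/0044",
  "invoices/by_date/2017/08/0045",
].sort

def prefix_scan(sorted_keys, prefix)
  # Binary-search to the first key >= prefix, then collect while keys
  # still match -- O(log n + k) instead of scanning every document.
  start = sorted_keys.bsearch_index { |k| k >= prefix } || sorted_keys.size
  sorted_keys[start..].take_while { |k| k.start_with?(prefix) }
end

july = prefix_scan(entries, "invoices/by_date/2017/07/")
p july
# => ["invoices/by_date/2017/07/0043", "invoices/by_date/2017/07/0044"]
```

This buys you equality and range lookups on whatever you encode into the path, but as noted, anything like an ad-hoc join or aggregation still has no SQL-style planner behind it.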
I haven't used Relaxo itself, but personally, I like the fact that independent groups are thinking of version control semantics for data. Tells me it is heading in a positive direction.
Relaxo used to be a CouchDB query server (https://github.com/ioquatix/relaxo-query-server - not so useful any more) with a Ruby front end (https://github.com/ioquatix/relaxo-model - still useful). But I got frustrated with the direction of CouchDB 2.x, so I rewrote it to do everything in-process and use git as the document store. It organically grew from that.
Unless you are operating at scale, doing things in-process is vastly more convenient. Sending ruby code to the query server to perform map-reduce was a cumbersome process at best. It's easier just to write model code and have it work as expected.
Systems like Postgres are great when you have a single database and multiple front-end consumers, though. You'd need to put a front-end on top of Relaxo to gain the same benefits, but it would be pretty trivial to do so - it's just never been something that I've needed to implement. The API you'd actually want is one that interfaces directly with your Ruby model instances, rather than database tables and rows. I think there is room for improvement here - probably implementing a websocket API that exposes the raw git object model and then allowing consumers to work on top of that.
I was a happy CouchDB 1.x user, but moved away with 2.0. Nothing specific about it, just needs and timing.
The architecture is super simple, I'd suggest that the first place to look is the source code.
There are really only two ways of accessing the underlying data store - a read-only dataset and a read/write changeset which can be committed.
It's purely a key-value storage at the core - a key being a path and a value being whatever you want.
On top of that you can build more complex things, e.g. https://github.com/ioquatix/relaxo-model which provides relational object storage and basic indexes (e.g. has one, has many, etc)
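The two access modes are easy to sketch in plain Ruby. The class and method names below are illustrative rather than Relaxo's actual API: a read-only `Dataset` wraps an immutable snapshot, and a `Changeset` buffers writes until `commit` produces a new snapshot, leaving the old one untouched.

```ruby
# Illustrative sketch of read-only dataset vs. committable changeset;
# not Relaxo's real class names.
class Dataset
  def initialize(documents)
    @documents = documents.freeze  # read-only snapshot
  end

  def read(path)
    @documents[path]
  end
end

class Changeset < Dataset
  def initialize(documents)
    @pending = {}
    super
  end

  def write(path, value)
    @pending[path] = value
  end

  def read(path)
    @pending.fetch(path) { super }  # uncommitted writes shadow the snapshot
  end

  # Committing yields a new immutable snapshot; the old one is untouched.
  def commit
    Dataset.new(@documents.merge(@pending))
  end
end

v1 = Dataset.new("config/name" => "demo")
cs = Changeset.new("config/name" => "demo")
cs.write("config/name", "demo-2")
v2 = cs.commit
puts v1.read("config/name")  # => demo
puts v2.read("config/name")  # => demo-2
```

Keeping readers on immutable snapshots is what makes the "guaranteed consistency on disk" property cheap: a backup or reader never sees a half-applied changeset.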
1. The first thing I thought when I saw this is "How is this secure?". You're asking people to store the most sensitive information a business has - credentials and production DBs. I took a look around the site and Google and couldn't find anything about security. Client-side encryption would go a long way toward making people comfortable storing their data on dothub. I'm not sure there is any use case for dothub holding unencrypted data (at least not yet)?
2. "Application states" is quite a vague term, when I saw that I thought it was referring to capturing the state of a running process. "A git-like CLI for application states" is not a very compelling pitch. As others have noted, for all but the most masochistic of users, "a git-like CLI" is a negative point.
The benefit you're offering is "Snapshot production data to be able to replay in development" and "Snapshot failed CI builds to debug later". I'd recommend putting those up-front and in bold. A more compelling tagline (to me) is "Dotmesh - version control and snapshots for your production data".
1. We mention encryption in the docs FAQ https://docs.dotmesh.com/faq/#what-do-you-encrypt -- where did you look for it? Maybe we can make it easier to find. Noted about this being a priority.
2. Thanks for proposing the updated tagline! I'll run it past the team ;-) we'll certainly develop more messaging and use cases around production data as we develop the project beyond 0.1 :-)
I searched on https://dotmesh.com for "security" and "encryption", searched Google for "site:dotmesh.com security", and tried going to http://dotmesh.com/security, but got nothing for all three.
Ultimately the reason that ClusterHQ failed, I think, was that we believed we had product-market fit before we really did, and we started scaling too soon.
When we started, it wasn't possible to connect storage to containers at all, and so we had to put a lot of work into making that possible. And by the time we'd got Flocker working reliably across AWS, GCE, OpenStack & a dozen or so storage vendors, we'd been commoditized by Kubernetes.
Our premature scaling then made it harder to adapt as fast as we needed to. Many lessons learned!
We're focusing on a rigorous approach to finding product-market fit, my colleague Alice has written more about this here: https://dotmesh.com/blog/dotmesh-hypotheses/
Learn about architecture, use cases (tutorials) & lots more: https://docs.dotmesh.com
"$ dm cluster init
dm: command not found"
We have a hosted service if you don't want to run your own nodes (https://dothub.com), but the server and client are both open source. Disclaimer: I work on the project.
It seems like dot* would have to know about the application logic to show useful diffs, but maybe it can be done generically at the DB level.
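For what a generic, application-agnostic diff could look like: if a snapshot is just a map of path => document, you can diff two snapshots keywise without knowing anything about the application. This is purely illustrative - it's not how dotmesh computes diffs - but it shows the DB-level version of the idea.

```ruby
# Generic key-value diff between two snapshots; illustrative only,
# not dotmesh's implementation.
def snapshot_diff(before, after)
  {
    added:   after.keys - before.keys,
    removed: before.keys - after.keys,
    changed: (before.keys & after.keys).select { |k| before[k] != after[k] },
  }
end

before = { "users/1" => { name: "Ada" },  "users/2" => { name: "Bob" } }
after  = { "users/1" => { name: "Ada!" }, "users/3" => { name: "Cyd" } }

# Reports users/3 as added, users/2 as removed, users/1 as changed.
p snapshot_diff(before, after)
```

A diff at this level is only as meaningful as the keys are, which is the point of the comment above: showing a *useful* diff probably still needs some application-level hints.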
We have an issue for this here: https://github.com/dotmesh-io/dotmesh/issues/85
Does this help? https://docs.dotmesh.com/concepts/architecture/