
Show HN: Dotmesh – A git-like CLI for application states - mrmrcoleman
https://dotmesh.com/blog/introducing-dotmesh/
======
ntumlin
It might be better to just call it version control for application states,
rather than saying a "git-like CLI" for application states. When I hear
"git-like CLI" I interpret that as "hard to use" and "confusing".

~~~
lewq
Interesting feedback, thanks. I agree that the git CLI is confusing,
unfortunately it's the best thing we've got that a huge number of developers
are familiar with for exploring a state tree with branches and commits.

If you're interested, we've described some important ways that dotmesh is
_different_ to git here:
[https://docs.dotmesh.com/faq/#how-does-dotmesh-differ-from-g...](https://docs.dotmesh.com/faq/#how-does-dotmesh-differ-from-git)

------
ahnick
This is really exciting to see and something I feel has been missing for some
time. I think if this is grown correctly it would be a great acquisition for
Docker to make.

It always struck me that I should be able to "docker push" my data and share
that with my team just as I do my apps. In fact, I had built a quick hack to
do something similar called Dockershare
([https://github.com/ahnick/dockershare](https://github.com/ahnick/dockershare)).
I realized through that effort that a custom docker volume plugin would be
needed and that it was a much larger problem than what I had time to tackle.

I imagine that dotmesh must have grown out of what was being done with dvol?
([https://github.com/clusterhq/dvol](https://github.com/clusterhq/dvol)) In
any case, kudos for getting this built. I'm excited to try it out.

~~~
binocarlos
Thanks for the feedback - "docker push <mydata>" is nice :-)

For the same reasons you mention in your post, we've been hard at work on
Kubernetes support - Persistent Volumes but with extra features!

------
toomim
If this is like git, does that mean that you can _merge_ two different
branches of data?

For instance, if I have two runs of a test, that produce different outputs,
can I merge the data back together at the end?

If not, then this is only capturing one aspect of git -- the archiving of
snapshots of the state of the data.

~~~
lewq
How would you want merge to work?

~~~
hiccuphippo
Show the differing rows or documents side by side and let the user
choose which to put in the merged table. It would need to rewrite foreign keys
to match any ids that change. But dothub seems to work at the
filesystem level, so this seems impossible for it.
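
A rough sketch of that kind of row-level merge, assuming each table is available as a plain `{id: row}` mapping (which is exactly what a filesystem-level tool would not have). The names `merge_tables` and `choose` are made up for illustration, and foreign-key renumbering is left out:

```python
def merge_tables(ours, theirs, choose):
    """Merge two {id: row} tables.

    `choose(key, our_row, their_row)` is a hypothetical callback standing in
    for the user picking one side when the same id holds different rows.
    """
    merged = {}
    for key in sorted(ours.keys() | theirs.keys()):
        if key not in theirs:
            merged[key] = ours[key]          # only we have this row
        elif key not in ours:
            merged[key] = theirs[key]        # only they have this row
        elif ours[key] == theirs[key]:
            merged[key] = ours[key]          # identical on both sides
        else:
            merged[key] = choose(key, ours[key], theirs[key])  # conflict
    return merged


ours = {1: {"name": "alice"}, 2: {"name": "bob"}}
theirs = {2: {"name": "robert"}, 3: {"name": "carol"}}

# Here the "user" always picks the other side's version of a conflicting row.
merged = merge_tables(ours, theirs, lambda key, ours_row, theirs_row: theirs_row)
```

Only id 2 goes through `choose` here; ids 1 and 3 exist on one side only and are copied across.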

------
infinitone
It seems like a good idea in theory, but I'm not so sure it'd work in many
environments in practice. If I understand it correctly, you're storing all the
state such as files, but there is state that is tied to that specific machine
(i.e. machine FQDN, machine-specific file paths) and you wouldn't want to apply
that state on another machine. I guess you could do some data wrangling and
.stateignore that stuff, but it would require quite the effort on a large
application that spans many components and many teams.

On a very small app, I can see the utility of dotmesh.

~~~
lewq
Hey! Yes, it's hard to capture the state of VMs.

This is where the Docker and Kubernetes integration comes into play -- if your
app is captured entirely in Kubernetes manifests, the only thing left to
capture (apart from the declarative Kube manifests, which should already be in
version control) is the state that exists in Kubernetes Persistent Volumes.
Dotmesh provides a Kubernetes Persistent Volume driver which provides Dotmesh
StorageClass PVs and a Dynamic Provisioner, meaning that you really can
capture the entire state of your app with Dotmesh... as long as you're
deploying it with Kubernetes.

Code and infrastructure are already under control thanks to version control
and terraform, ansible etc -- this completes the picture.

Give it a go: [https://dotmesh.com/try-dotmesh/](https://dotmesh.com/try-dotmesh/)
and please leave more feedback here or in our Slack! (linked to in
the footer at the bottom of dotmesh.com)

------
ioquatix
I store all my application state directly in git:
[https://github.com/ioquatix/relaxo](https://github.com/ioquatix/relaxo)

Makes rolling back mistakes easy.

~~~
lewq
Nice! (Non snarky question) does that scale?

~~~
ioquatix
That's a good question. Relaxo is a database designed around immutable,
transactional structures where _convenience_ is more important than _scale_.
Think of things like comments on a blog, items for sale in a small shop -
[https://github.com/ioquatix/financier](https://github.com/ioquatix/financier)
is an example of an actual project which is in production.

Some things which I personally find useful about Relaxo:

- Easy to move data around, merge and fork data (it's just a git repository).

- Easy to roll back or inspect changes. If you make a mistake, just reset
HEAD.

- Easy to backup (guaranteed consistency on disk).

- Better grouping of changes by transactions, which have a description, date,
and information about who committed it (can even tie to currently logged in
user for a web app, for example).

In theory Relaxo could scale up. Using libgit2 as the backend, it wouldn't be
hard to use redis as an object store for git. The git data structure on disk
is really just a key-value store with some specific data structures.

The main issue with Relaxo is query performance and indexes. Simple queries
like fetching a document are fast. Complex queries including subsets,
aggregations, and joins require supporting indexes to work efficiently, and
this is something that is hard to build into a pure document storage system.
The naive solution is to load all the documents and filter them, which is
actually fine until you get a large number of documents (e.g. 1,000+).

However, git does provide one useful guarantee - it will sort directory
entries. With this in mind, it's possible to make radix-sorted indexes (e.g.
/invoices/by_date/2017/07/). You can use this to do basic indexes, but it's
still not as good as a traditional SQL database in this regard.
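
To make the sorted-path idea concrete, here is a minimal sketch. It is not Relaxo's API; `prefix_scan` and the invoice paths are invented for the example. The point is only that once keys are sorted, a prefix query becomes a binary search plus a short walk instead of loading every document:

```python
from bisect import bisect_left


def prefix_scan(sorted_paths, prefix):
    """Return every path that starts with `prefix`.

    Assumes `sorted_paths` is already in sorted order, the way directory
    entries would be if the store keeps them sorted.
    """
    start = bisect_left(sorted_paths, prefix)  # first path >= prefix
    matches = []
    for path in sorted_paths[start:]:
        if not path.startswith(prefix):
            break  # sorted order means no later path can match
        matches.append(path)
    return matches


# Hypothetical index entries laid out like the by_date example.
paths = sorted([
    "invoices/by_date/2017/06/30/inv-0041",
    "invoices/by_date/2017/07/02/inv-0042",
    "invoices/by_date/2017/07/15/inv-0043",
    "invoices/by_date/2017/08/01/inv-0044",
])

july = prefix_scan(paths, "invoices/by_date/2017/07/")
```

This covers range-style lookups on one key; anything like a join or an aggregation would still need more machinery.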

~~~
deitcher
I have seen a growth of such "vcs-like" databases, but I think the
preponderance remains SQL stores like MySQL/MSSQL/Postgres or NoSQL like
Mongo/Cassandra/Redis/Couch/etc. For those - or anything that has its own
model of storage or processing and, in the end, is backed by filesystem-type
storage, dotmesh provides a really nice solution.

I haven't used Relaxo itself, but personally, I like the fact that independent
groups are thinking of version control semantics for data. Tells me it is
heading in a positive direction.

~~~
ioquatix
Relaxo actually grew out of CouchDB.

Relaxo used to be a CouchDB query server
([https://github.com/ioquatix/relaxo-query-server](https://github.com/ioquatix/relaxo-query-server) - not so
useful any more) and Ruby front end
([https://github.com/ioquatix/relaxo-model](https://github.com/ioquatix/relaxo-model) - still useful). But I got
frustrated with the direction of CouchDB 2.x so I rewrote it to do everything
in-process and use git as the document store. It organically grew from that.

Unless you are operating at scale, doing things in-process is vastly more
convenient. Sending ruby code to the query server to perform map-reduce was a
cumbersome process at best. It's easier just to write model code and have it
work as expected.

Systems like Postgres are great when you have a single database and multiple
front-end consumers though. You'd need to put a front-end on top of Relaxo in
order to gain the same benefits, but it would be pretty trivial to do so -
just that it's never been something that I've needed to implement. The API
you'd actually want is one that interfaces directly with your Ruby model
instances, rather than database tables and rows. I think there is room for
improvement here - probably implementing a websocket API that exposes the raw
git object model and then allowing consumers to work on top of that.

~~~
deitcher
Pretty cool. Is there a write up on architecture and usage models? I’d like to
see it.

I was a happy couch 1.x user, but moved away with 2.0. Nothing specific about
it, just needs and timing.

~~~
ioquatix
Thanks for being so interested.

The architecture is super simple, I'd suggest that the first place to look is
the source code.

There are really only two ways of accessing the underlying data store - a
read-only dataset and a read/write changeset which can be committed.

It's purely a key-value storage at the core - a key being a path and a value
being whatever you want.

On top of that you can build more complex things, e.g.
[https://github.com/ioquatix/relaxo-model](https://github.com/ioquatix/relaxo-model),
which provides relational object storage and basic indexes (e.g. has
one, has many, etc.).
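
For anyone who wants the shape of it without reading the source, a toy version of the dataset/changeset split described above could look like the following. This is a sketch in Python rather than Ruby, the class and method names (`Dataset`, `Changeset`, `Store`, `commit`) are assumptions and not Relaxo's real interface, and an in-memory dict stands in for git:

```python
class Dataset:
    """Read-only view: a mapping from path (key) to value."""

    def __init__(self, data=None):
        self._data = dict(data or {})  # private copy, never shared

    def read(self, path):
        return self._data.get(path)


class Changeset(Dataset):
    """Read/write view; edits stay private until the store commits them."""

    def write(self, path, value):
        self._data[path] = value

    def delete(self, path):
        self._data.pop(path, None)


class Store:
    """Keeps a list of (message, dataset) pairs, newest last."""

    def __init__(self):
        self.history = [("initial", Dataset())]

    def current(self):
        return self.history[-1][1]

    def commit(self, message, change):
        changeset = Changeset(self.current()._data)
        change(changeset)  # if this raises, nothing is recorded
        self.history.append((message, Dataset(changeset._data)))


store = Store()
before = store.current()
store.commit("Add first invoice", lambda c: c.write("invoices/1", {"total": 10}))
after = store.current()
```

The older `before` dataset is unaffected by the commit, which is the property that makes rolling back cheap in this model.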

------
dantiberian
This looks really cool! A few thoughts:

1. The first thing I thought when I saw this is "How is this secure?". You're
wanting to store the most sensitive information a business has - credentials +
production DB. I took a look around the site + Google and couldn't find
anything about security. Client side encryption of data seems like it would be
good to make people comfortable with storing their data at dothub. I'm not
sure if there is any use case for dothub having unencrypted data (at least not
yet)?

2. "Application states" is quite a vague term; when I saw that I thought it
was referring to capturing the state of a running process. "A git-like CLI for
application states" is not a very compelling pitch. As others have noted, for
all but the most masochistic of users, "a git-like CLI" is a negative point.

The benefit you're offering is "Snapshot production data to be able to replay
in development" and "Snapshot failed CI builds to debug later". I'd recommend
putting those up-front and in bold. A more compelling tagline (to me) is
"Dotmesh - version control and snapshots for your production data".

~~~
lewq
Thank you for the great feedback!

1. We mention encryption in the docs FAQ
[https://docs.dotmesh.com/faq/#what-do-you-encrypt](https://docs.dotmesh.com/faq/#what-do-you-encrypt) -- where
did you look for it? Maybe we can make it easier to find. Noted about this
being a priority.

2. Thanks for proposing the updated tagline! I'll run it past the team ;-)
We'll certainly develop more messaging and use cases around production data as
we develop the project beyond 0.1 :-)

~~~
dantiberian
> where did you look for it? Maybe we can make it easier to find. Noted about
> this being a priority.

I searched on [https://dotmesh.com](https://dotmesh.com) for "security" and
"encryption", searched Google for "site:dotmesh.com security", and tried going
to [http://dotmesh.com/security](http://dotmesh.com/security), but got nothing
for all three.

------
ferrantim
Congrats Luke and team. I'm curious, what did you learn at ClusterHQ with
Flocker that made you want to start dotmesh?

~~~
lewq
ClusterHQ was a fantastic learning experience. I'm proud of what we achieved
and the many strong relationships that were built in the team.

Ultimately the reason that ClusterHQ failed, I think, was that we believed we
had product-market fit before we really did, and we started scaling too soon.

When we started, it wasn't possible to connect storage to containers at all,
and so we had to put a lot of work into making that possible. And by the time
we'd got Flocker working reliably across AWS, GCE, OpenStack & a dozen or so
storage vendors, we'd been commoditized by Kubernetes.

Our premature scaling then made it harder to adapt as fast as we needed to.
Many lessons learned!

We're focusing on a rigorous approach to finding product-market fit; my
colleague Alice has written more about this here:
[https://dotmesh.com/blog/dotmesh-hypotheses/](https://dotmesh.com/blog/dotmesh-hypotheses/)

------
lewq
Try dotmesh here: [https://dotmesh.com/try-dotmesh/](https://dotmesh.com/try-dotmesh/)

Learn about architecture, use cases (tutorials) & lots more:
[https://docs.dotmesh.com](https://docs.dotmesh.com)

------
grkvlt
Looks useful for QA testing of distributed systems. I can also see a use case
where I snapshot the state of one container from a node in a cluster then pull
it onto the next node as it starts up before joining. It could maybe make
things converge quicker in blockchain applications as well, where each new
node needs to get a copy of the entire chain before it can do useful work?

------
zdkaster
This sounds like a really cool way to manage the lifecycle of software. Will try it
out. Though my first experience after trying the live hosted tutorial at
[https://dotmesh.com/try-dotmesh/](https://dotmesh.com/try-dotmesh/) was:

"$ dm cluster init
dm: command not found"

~~~
lewq
Did you run the curl command that's the first item in the tutorial?

~~~
zdkaster
No, I didn't, sorry. After installing and setting up dm, it seems to work well.
Great work, thanks. I wasn't aware that the installation step is required in
Katacoda; I just followed the Deploying Dotmesh to Docker Step 1.

------
simplify
Would this concept allow users to share subsets of data with each other,
assuming they had their own nodes?

~~~
binocarlos
That's the idea, yes! You can run dotmesh on your own servers and install it
locally on each user's machine to then push and pull just like git remotes.
It's using copy-on-write so you are only pushing the difference. Another main
use case is for CI to consume volumes, run tests then snapshot the results.

We have a hosted service if you don't want to run your own nodes
([https://dothub.com](https://dothub.com)) but the server and client are both
open source. Disclaimer: I work on the project.

------
ivan_ah
This is very interesting. Are there any tools provided for "DB diffs"? e.g.
show exactly which rows are different between two snapshots?

It seems like dot* would have to know about the application logic to show
useful diffs, but maybe it can be done generically at the DB level.
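
To sketch what I mean by a generic DB-level diff: if each snapshot of a table can be dumped as a `{primary_key: row}` mapping (an assumption, and `diff_rows` is just a name I made up), then a row diff needs no application logic at all:

```python
def diff_rows(old, new):
    """Compare two table snapshots given as {primary_key: row_dict}.

    Returns (added, removed, changed), where `changed` maps each key to an
    (old_row, new_row) pair.
    """
    added = {k: new[k] for k in new.keys() - old.keys()}
    removed = {k: old[k] for k in old.keys() - new.keys()}
    changed = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return added, removed, changed


# Two hypothetical snapshots of a "users" table.
snapshot_a = {1: {"email": "a@example.com"}, 2: {"email": "b@example.com"}}
snapshot_b = {2: {"email": "b@example.org"}, 3: {"email": "c@example.com"}}

added, removed, changed = diff_rows(snapshot_a, snapshot_b)
```

The hard part is getting from a volume snapshot to those per-table mappings, which presumably means knowing how each database engine lays its files out on disk.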

~~~
lewq
I love your thinking, Ivan!

We have an issue for this here:
[https://github.com/dotmesh-io/dotmesh/issues/85](https://github.com/dotmesh-io/dotmesh/issues/85)

------
sjellis
Awesome. I literally started writing a small tool for managing state
yesterday, because we really do need smarter ways to move application datasets
around.

~~~
lewq
Interesting! We should compare notes! Join our Slack (in the website footer) and
we can chat? Or I'm @lmarsden on Twitter :-)

------
nojvek
I read through. Still have no idea why I would use it.

------
raoulj
Maybe it's just me, but while I find the graphics informative on the landing
page, I wonder if they could be made to be more easily understandable.

~~~
lewq
Hey, thanks for the feedback!

Does this help?
[https://docs.dotmesh.com/concepts/architecture/](https://docs.dotmesh.com/concepts/architecture/)

~~~
deitcher
Would you mind sharing more details about how they are confusing? Always happy
to take feedback. Feel free to comment here or on the community Slack,
although a GitHub issue may be the best place. Whatever works... and much
appreciated.

