
Show HN: Stellar – Git for PostreSQL and MySQL - obsession
https://github.com/fastmonkeys/stellar
======
robert_tweed
Generally the hardest thing with version control on a database (for an
evolving codebase) is separating unrelated changes - such as schema changes vs
content updates - and branching and merging those changes in sync with the
code dependencies. Another issue is non-destructively replaying development
changes into test/production environments.

So for example, you might have a feature branch that includes some schema
changes and some value modifications, and a content branch that includes a
bunch of inserts into a few content tables that happen to include foreign key
references to each other (so you need to maintain referential integrity when
replaying those updates/inserts).

I don't see anything in the description that indicates this tool address those
problems. For me, those are really the only problems that a DB version control
system ought to be focused on. Speed of snapshotting is not all that important
in a development environment as you typically work on a cut-down dataset
anyway. A minute or so to take a snapshot a few times a day isn't a huge deal,
whereas taking more frequent snapshots doesn't seem like something that adds
any value, if it doesn't address any of the other problems.

~~~
baudehlo
This sounds like something Sqitch would help with.

[http://sqitch.org](http://sqitch.org)

~~~
mbucc
FWIW, I found shmig [1] much simpler than sqitch.

Last I used sqitch, it returned a zero value on error, which made me leery of
using it for scripted deployments.

[1] [https://github.com/naquad/shmig](https://github.com/naquad/shmig)

~~~
justtheory
Please file a bug report.
[https://github.com/theory/sqitch/issues](https://github.com/theory/sqitch/issues).

------
mbrock
I wish projects like these would always include some basic info in their
README about: (1) how it works, and (2) how it might fail.

~~~
davesque
There are only like 700 lines of Python code. Figure it out and submit a pull
request to update the README.

~~~
mbca
So it's yet another toy project with grandiose, overblown claims ("git for
databases"? seriously?!)

An honest update to the README would be to take the whole thing down several
notches, if nothing else then to avoid the kind of confusion the post you're
replying to expresses.

~~~
davesque
I really don't think we have to get so worked up about that kind of thing. If
you don't like it, just ignore it.

------
amirmc
If anyone's interested in git-like storage systems then it's work checking out
Irmin [1]. Previous discussion is at [2].

Excerpt: _" Irmin is a library to persist and synchronize distributed data
structures both on-disk and in-memory. It enables a style of programming very
similar to the Git workflow, where distributed nodes fork, fetch, merge and
push data between each other. The general idea is that you want every active
node to get a local (partial) copy of a global database and always be very
explicit about how and when data is shared and migrated

Irmin is _not _, strictly speaking, a full database engine. It is, as are all
other components of Mirage OS, a collection of libraries designed to solve
different flavours of the challenges raised by the CAP theorem. Each
application can select the right combination of libraries to solve its
particular distributed problem. "_

[1] [http://openmirage.org/blog/introducing-
irmin](http://openmirage.org/blog/introducing-irmin)

[2]
[https://news.ycombinator.com/item?id=8053687](https://news.ycombinator.com/item?id=8053687)

~~~
sgrove
I've spent a day looking at Irmin, trying to figure out the JSON CRUD
interface, but there's just no documentation, and the irc channel is pretty
much dead. The code looks good, but it's my first introduction to OCaml,
coming from Clojure. Would love to see some more documentation on it.

~~~
amirmc
Irmin's still being developed, hence not many docs at the moment. Apologies
there was no-one responding in irc (I assume you mean #mirage - there are
usually 30+ folks there). The mailing list is quite active and is a good place
to pose questions.

[http://lists.xenproject.org/cgi-
bin/mailman/listinfo/mirageo...](http://lists.xenproject.org/cgi-
bin/mailman/listinfo/mirageos-devel)

------
falcolas
So, it appears to just copy tables around within the database. I wouldn't want
to use this on a DB over a few MB in size. Sure, restores are "fast" (a table
rename), but copies are not so much.

I can't imagine this would be kind to a production database (lots of cleanup
from copied & deleted tables), and would consume a lot more space than a
gripped logical backup of the tables in question.

~~~
obsession
I have regularly used this with database that's nearing 1000 megabytes. I
don't particularly mind slow snapshotting because my workflow is more about
restoring database back to baseline than taking copies.

Please don't use this for production. It is not stable enough and you only end
up with lost data.

~~~
falcolas
Why not use a binary backup method? Faster to backup and restore.

~~~
martin_
Agreed. I've been successfully using mylvmbackup on 10gb+ databases for a few
years now.

[http://www.lenzg.net/mylvmbackup/](http://www.lenzg.net/mylvmbackup/)

~~~
fud
We use our own lvm solution at work, works great, we do something like pgsnap
to swap to the snapshot or pgsnap master to work on the master. I'll check
this out.

------
m3h
Why does the author compare it to Git? The functions this software performs
are no where near those performed by Git. Nor it is a proper version control
system.

------
lucian1900
This sort of thing is useful, but already supported by Postgres through
transactional DDL. Migrations that fail will have their transaction reverted.

~~~
killing_time
The use case which this really excites me about is automated testing from the
GUI level. If the performance is good enough, this would be really useful for
restoring DB state in between tests.

~~~
j_s
ASP.NET developers using TransactionScope can use
[http://amnesia.codeplex.com/](http://amnesia.codeplex.com/) to manage
distributed transactions for automated UI testing.

------
Gigablah
From the code:

    
    
        INSERT INTO %s.%s SELECT * FROM %s.%s
    

Yeah, good luck with that.

~~~
kstrauser
What's wrong with that? _Assuming_ you escape the table names correctly, that
seems reasonable. And if you can't escape the table names, you're going to
have a hard time dynamically generating queries anyway. Parameterized queries
are a baseline requirement for _values_ , but are rarely supported by client
libraries for things like table or schema names.

~~~
Gigablah
Nothing wrong with the syntax, it's just that if you're going to create an
entire mirror of your database, you might as well use mysqldump. (Plus it'll
actually be portable).

------
bronson
Nice. I wrote a similar tool for Rails / ActiveRecord models:
[https://github.com/bronson/table_differ](https://github.com/bronson/table_differ)

It takes snapshots and computes diffs between snapshots or the live database.
It lets me drop and re-import some of my app's tables, then compute the
minimum set of changes between the previous import and the new import. I
wouldn't call it "git for ActiveRecord models" but it appears to be similar to
this project.

Comments welcome! The docs, as always, could use some help.

------
squigs25
The implications for this extend beyond backing up your database.

Imagine a world where daily time-series data can be stored efficiently: This
is a lesser known use case, but it works like this: I'm a financial company
and I want to store 1000 metrics about a potential customer. Maybe the number
of transactions in the past year, the number of defaults, the number of credit
cards, etc.

Normally I would have to duplicate this row in the database every
day/week/month/year for every potential customer. With some kind of git-like
storing of diffs between the row today and the row yesterday, I could easily
have access to time series information without duplicating unchanged
information. This would accomplish MASSIVE storage savings.

FWIW efficiently storing time series data is big problem at my company. No off
the shelf solution makes this easy for us right now, and we would rather throw
cheap hard disk at the problem rather than expensive engineers.

~~~
symfrog
Have you looked at Datomic? It seems to fit your problem description well.

~~~
AlwaysBCoding
+1 for Datomic, seems to be right in the wheelhouse for this problem.

------
crad
Maybe I'm missing something, but I didn't see anything with regard to indexes,
users, stored procedures, views or what not.

Seems like it's for table schema snapshotting in a database without any
external storage.

Browsing through the code, I see that it's highly table centric using
SQLAlchemy.

------
swehner
Line 53 of
[https://github.com/fastmonkeys/stellar/blob/master/stellar/o...](https://github.com/fastmonkeys/stellar/blob/master/stellar/operations.py)
is

    
    
                    CREATE TABLE %s.%s LIKE %s.%s
    

This made me think of a table called

    
    
                    create table `a; drop table users;`  (col int);
    

... which works in mysql.

I don't know if the stellar code will trip over something like this. But mysql
(SQL) shouldn't even allow names like that.

~~~
obsession
Yes, Stellar will probably trip over this. Similarly, if the attacker can edit
stellar.yaml, they probably can edit your .bashrc as well.

Im adding this to my TODO list.

------
codeoclock
Unfortunate name, excellent project :)

~~~
gexla
To elaborate on what this comment may be referring to. A Ripple
(cryptocurrency) fork was recently released with the name of Stellar.

~~~
uptown
It's also the phonetical-match to an iPhone app:

[https://steller.co/](https://steller.co/)

------
jdc0589
Shameless plug for mite: [https://github.com/jdc0589/mite-
node](https://github.com/jdc0589/mite-node) Simple migrations that take
advantage of everything you already know about git and sql, plus some other
cool stuff.

It's not too mature yet, the readme is mediocre at best, and it has some
issues that will popup when working with a team, but it's pretty damn useful.

------
jimktrains2
While not exactly the same thing, I've recently found and started using
[https://github.com/nkiraly/DBSteward](https://github.com/nkiraly/DBSteward)
to specify schema and then store the spec in my repo with the code. It also
supports diffing the current schema against a previous one, so that nice
upgrade sql scripts can be generated.

------
jamesmoss
Interestingly they don't show MySQL benchmarks in the readme; I suspect it
might be because the MySQL implementation is pretty basic

[https://github.com/fastmonkeys/stellar/blob/master/stellar/o...](https://github.com/fastmonkeys/stellar/blob/master/stellar/operations.py#L48-L71)

~~~
obsession
MySQL support was definitely an afterthought and could probably be improved
(maybe tracking the binary files directly?).

------
iurisilvio
I expected something related with Stellar coins.

Looks like a good project, I definitely want an easy way to manage development
databases.

------
level09
This is a nice project. I used to have my database dump tracked by git (in
binary mode). anytime my db changes I'll have to overwrite the file with the
new database dump and include it with the commit.

I'm just wondering if this project offers anything special/better than the
method I described.

------
iso8859-1
how does this compare to time travel queries?
[http://en.wikipedia.org/w/index.php?title=Temporal_database#...](http://en.wikipedia.org/w/index.php?title=Temporal_database#cite_ref-
SQL2011_1-2)

~~~
walterbell
Any databases that you would recommend for for temporal logic?

------
JohnDotAwesome
How does it work? Where does it breakdown? Why are these things not in the
README?

~~~
mixologic
Because its not doing anything magic whatsoever.

[https://github.com/fastmonkeys/stellar/blob/master/stellar/o...](https://github.com/fastmonkeys/stellar/blob/master/stellar/operations.py#L35-L73)

------
ZenoArrow
Just a small correction; it's not PostreSQL, it's PostgreSQL.

------
edem
Folks might confuse this with the Stellar currency (stellar.org). You might
give some emphasis in the title.

------
mosselman
Looks very nice, could you put up some practical examples?

------
josephcooney
Typo? Shouldn't it be PostgreSQL not PostreSQL?

