
Gopli – Database backup between remote or local hosts, written in Golang - timakin
https://github.com/timakin/gopli
======
caleblloyd
Does this sync / delete tables in the correct order so that there are no
foreign key constraint issues? Also, does it diff rows to tell what rows have
actually changed or does it attempt to update every row every time? Does it
handle cases where primary keys are not configured?

Percona's pt-table-sync has quite an extensive man page and explains how it
accomplishes these. It may be a good reference to update your README.md to
explain how the syncing works in gopli: [https://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html](https://www.percona.com/doc/percona-toolkit/2.2/pt-table-sync.html)
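
For the foreign-key concern, one common approach is to topologically sort tables by their FK dependencies and delete children before parents (inserting in the reverse order). A minimal sketch in Go; the dependency map and table names here are hypothetical, not taken from gopli:

```go
package main

import "fmt"

// deleteOrder takes a map from each table to the tables it references
// via foreign keys, and returns an order in which rows can be deleted
// without tripping FK constraints: referencing tables come first.
func deleteOrder(depsOf map[string][]string) []string {
	visited := map[string]bool{}
	var order []string
	var visit func(t string)
	visit = func(t string) {
		if visited[t] {
			return
		}
		visited[t] = true
		for _, parent := range depsOf[t] {
			visit(parent)
		}
		// Parents end up earlier in `order`; reversing below gives child-first.
		order = append(order, t)
	}
	for t := range depsOf {
		visit(t)
	}
	// Reverse so referencing tables come before the tables they reference.
	for i, j := 0, len(order)-1; i < j; i, j = i+1, j-1 {
		order[i], order[j] = order[j], order[i]
	}
	return order
}

func main() {
	// Hypothetical schema: order_items -> orders -> customers.
	deps := map[string][]string{
		"customers":   {},
		"orders":      {"customers"},
		"order_items": {"orders"},
	}
	fmt.Println(deleteOrder(deps)) // [order_items orders customers]
}
```

Inserts would run in the reversed order (parents first). This sketch assumes the FK graph is acyclic; self-referencing or circular FKs need `SET FOREIGN_KEY_CHECKS=0` or deferred constraints instead.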

~~~
timakin
This reference looks nice, thank you for your comment. Gopli's README is quite plain and only a rough overview, so I should write up how the syncing works and add notes about key constraints and the like.

------
tkyjonathan
This looks nice, but what is wrong with: `mysqldump -h 127.0.0.1 --database production | mysql -h yyy.yyy.yyy.yyy staging -f`?

~~~
tkyjonathan
Is this for replication or maybe updating developer environments with fresh
data? If the latter, maybe some more options like "WHERE date <= curdate() -
interval 30 day"
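
mysqldump does in fact support a `--where` option for exactly this kind of partial dump, so a wrapper like gopli could expose it. A hedged sketch of how the argument list might be assembled in Go; the `dumpArgs` function and its parameters are illustrative, not gopli's actual API:

```go
package main

import (
	"fmt"
	"strings"
)

// dumpArgs builds a mysqldump argument list. `where` is optional and,
// when non-empty, limits the dump via mysqldump's real --where flag.
func dumpArgs(host, db, where string) []string {
	args := []string{"-h", host, "--single-transaction", db}
	if where != "" {
		args = append(args, fmt.Sprintf("--where=%s", where))
	}
	return args
}

func main() {
	args := dumpArgs("127.0.0.1", "production",
		"created_at >= curdate() - interval 30 day")
	// This slice would be handed to exec.Command("mysqldump", args...).
	fmt.Println(strings.Join(args, " "))
}
```

One caveat: `--where` is applied to every table in the dump, so the filter column has to exist (or at least be valid SQL) against each table being dumped.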

------
alixaxel
MySQL only. Would be nice to know without delving into the code. ;)

~~~
karmakaze
Exactly, I just read through it myself to realize it's a wrapper for the
'mysql' command. `management_system` needs to be documented.

~~~
timakin
Hi, thank you for checking my code and README. I'll make an effort to get Gopli working on other data stores, so please stay tuned.

~~~
alexbanks
In the meantime it might be wise to simply update the readme.

------
hardwaresofton
This is actually pretty interesting -- makes me think one could move the
problem of sharding/replicating a database completely out of the database
layer itself, and instead just run the database through the add-on?

Has anyone done this, or know of any tools that take this approach? I mean,
instead of worrying about distributing, sharding, or replicating
postgres/mysql/any-other-sql-compliant-db, why not just write an
implementation-agnostic layer that sits on top and does the
coordination/sync bit? Sprinkle in a little etcd and call it a day?

~~~
caleblloyd
I've seen analytics services do this - periodic sync of multiple data sources
into their analysis service, which is usually BigQuery or some OLAP system.

I don't think it would be a good idea for replication, though. General syncing
is super intensive; you have to diff everything between the two databases.
Major DBMSs all have better replication processes that avoid this: in MySQL's
case, the slaves are synchronized using the master's binary log, so they
replay transactions that happen on the master. It's far faster and allows for
near real-time synchronization.

~~~
hardwaresofton
What I was suggesting was moving the binary log replaying OUT of the database
layer, which is where I imagined the efficiency losses might arise.

I'm still fleshing this out in my head, but the idea is: maintain a log of
SQL commands that are to be run on the database, at a layer right _before_
the database layer. Then, as writes come in, they go to the layer-before, get
replicated/distributed, and get copied to every node, and you could get easy
(?) multi-master support. At the very least, you could do master-slave
replication at the same cost that databases are doing it now, minus any
inefficiencies from not using the specialized binary log each DB uses today. I
guess the pre-database layer would do more gossiping than anything (in a
multi-master type setup).

The real benefit here is that you would then be able to distribute any
database that just supported SQL instantly (and even some that don't, with a
properly written adapter), by composing it with another external (rather than
internal) management process. There might even be some efficiencies to be
gained if the databases are sharded -- localized data might go to the wrong
node, but live there long enough to get reads and stuff like that.

I guess maybe instead of saying I'm thinking of building a layer on top, I'd
rather pull binary log replication out of the databases themselves, and find
some close-to-SQL log of commands that all DBs can share.
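
As a toy illustration of the "layer before the database" idea, here's a hedged Go sketch: writes append to an ordered statement log, and a coordinator fans each entry out to every node's replay stream. Everything here (the types, the in-memory "nodes") is hypothetical and deliberately ignores the hard parts like consensus, conflicting writes, and failure recovery:

```go
package main

import "fmt"

// entry is one logged write: a global position plus the SQL statement text.
type entry struct {
	seq  int
	stmt string
}

// node stands in for a SQL database; applied records replayed statements.
type node struct {
	name    string
	applied []string
}

// coordinator sits in front of the databases: it assigns a global order
// to writes and fans them out to every node in that order.
type coordinator struct {
	log   []entry
	nodes []*node
}

func (c *coordinator) write(stmt string) {
	e := entry{seq: len(c.log), stmt: stmt}
	c.log = append(c.log, e)
	for _, n := range c.nodes {
		// A real system would replicate asynchronously, with each node
		// tracking its own replay position in the shared log.
		n.applied = append(n.applied, e.stmt)
	}
}

func main() {
	a, b := &node{name: "node-a"}, &node{name: "node-b"}
	c := &coordinator{nodes: []*node{a, b}}
	c.write("INSERT INTO users VALUES (1, 'alice')")
	c.write("UPDATE users SET name = 'bob' WHERE id = 1")
	fmt.Println(len(a.applied), len(b.applied)) // 2 2
}
```

One reason this is harder than it looks: replaying raw SQL text is only deterministic if the statements are, and things like `NOW()` or `RAND()` aren't -- which is exactly why MySQL also offers a row-based binlog format rather than relying purely on statement replay.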

