
JSON Changelog with SQLite - iffycan
https://blog.budgetwithbuckets.com/2018/08/27/sqlite-changelog.html
======
derefr
I've seen this pattern a couple of times (e.g. for broadcasting changes using
Postgres's LISTEN/NOTIFY.)

Each time, I wonder why people are recreating the database's journal, within
the database, where this meta-journal will _itself_ be getting journaled.

Why not just consume the actual DB journal?

I'm not sure how that'd work for SQLite—it'd probably require a hack of some
sort, since SQLite's journal is an ephemeral part of the DB file. But for more
full-fledged RDBMSes, this would just be a matter of hijacking a streaming-
replication-based backup system to act as a store of record for events. E.g.,
in the case of Postgres, you'd just have an ETL pipeline pointed at your WAL-E
bucket, parsing through your WAL files either to load into a separate OLAP
store, or for stateless map-reduction whenever you need to answer a question.
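
(The LISTEN/NOTIFY version of the pattern I mean looks roughly like this on the
listening side; a minimal sketch assuming psycopg2 and some trigger that already
calls pg_notify('changes', <json>) on writes, with a made-up channel name and DSN:)

    import json
    import select

    import psycopg2
    import psycopg2.extensions

    # Hypothetical DSN; assumes a trigger somewhere does pg_notify('changes', <json>).
    conn = psycopg2.connect('dbname=app')
    conn.set_isolation_level(psycopg2.extensions.ISOLATION_LEVEL_AUTOCOMMIT)
    cur = conn.cursor()
    cur.execute('LISTEN changes;')

    while True:
        # Wait for the connection's socket to become readable, then drain
        # any queued notifications.
        if select.select([conn], [], [], 60) == ([], [], []):
            continue
        conn.poll()
        while conn.notifies:
            note = conn.notifies.pop(0)
            print(json.loads(note.payload))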

~~~
anarazel
> Why not just consume the actual DB journal?

It's decidedly non-trivial, because the journal usually doesn't contain all
the information you'd need. Journal writes are often a bottleneck, so DB
engines try to restrict their contents to just the necessary parts. Typically
the journal won't contain information about table schemas, for example.

You can do it (see e.g. PostgreSQL's logical decoding), but it's plenty of
additional work.
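
(A rough taste of what logical decoding gives you, assuming wal_level=logical,
the built-in test_decoding output plugin, and psycopg2; the slot and table
names here are made up:)

    import psycopg2

    conn = psycopg2.connect('dbname=app')  # hypothetical DSN
    conn.autocommit = True
    cur = conn.cursor()

    # Create a logical replication slot using the built-in test_decoding plugin.
    cur.execute("SELECT pg_create_logical_replication_slot('changelog', 'test_decoding')")

    cur.execute("INSERT INTO accounts (name) VALUES ('Savings')")  # hypothetical table

    # Read (and consume) the decoded changes queued on the slot.
    cur.execute("SELECT data FROM pg_logical_slot_get_changes('changelog', NULL, NULL)")
    for (data,) in cur.fetchall():
        print(data)
    # Prints something like:
    #   BEGIN 570
    #   table public.accounts: INSERT: id[integer]:1 name[text]:'Savings'
    #   COMMIT 570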

~~~
derefr
> Journal writes are often a bottleneck

The point I was trying to make is that the alternative (writing all your DB
activity _into_ the DB) means you're already, 100% guaranteed, creating _more_
overhead than the heaviest possible journal-safety properties would. You're
writing unabridged journal entries _into_ the DB, where they then get
journaled all over again.

There is no world where it makes _more_ sense to write a copy of everything
you do into an "everything I did" table, than it does to use an append-only /
immutable / log-centric DBMS (where "Postgres with an ETL pipeline pointed at
its WAL logs" is a rather hacky immutable DBMS.)

If you know, before you ever start coding, that you will need to know
everything that ever happened, writing everything twice—when your DBMS is
_already_ writing everything twice—is a rather silly way to _choose_ to go
about solving your problems. You're already in a world where you need
different guarantees than a traditional (ephemeral) RDBMS gives you; you just
haven't realized it yet.
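
(For concreteness, the "everything I did" table pattern being argued against
here looks roughly like this; a minimal sketch assuming a SQLite build with the
JSON1 extension and made-up table names:)

    import sqlite3

    db = sqlite3.connect(':memory:')
    db.executescript("""
        CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT, balance REAL);
        CREATE TABLE changelog (
            id INTEGER PRIMARY KEY,
            created TEXT DEFAULT (datetime('now')),
            action TEXT,
            tbl TEXT,
            row_id INTEGER,
            data TEXT
        );
        -- Every UPDATE writes the old values into the changelog as JSON,
        -- which the database then journals a second time.
        CREATE TRIGGER accounts_update AFTER UPDATE ON accounts BEGIN
            INSERT INTO changelog (action, tbl, row_id, data)
            VALUES ('update', 'accounts', OLD.id,
                    json_object('name', OLD.name, 'balance', OLD.balance));
        END;
    """)
    db.execute("INSERT INTO accounts (name, balance) VALUES ('Savings', 100.0)")
    db.execute("UPDATE accounts SET balance = 150.0 WHERE id = 1")
    print(db.execute("SELECT action, data FROM changelog").fetchall())
    # roughly: [('update', '{"name":"Savings","balance":100.0}')]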

~~~
cm2187
Perhaps it's a dev vs. ops problem. Only ops see the journal, understand it,
and have access to it. If devs want a journal, they end up building their own.

~~~
WorldMaker
It's also a public-versus-private-API issue. Most databases consider their
journal a private implementation detail and don't recommend programming
directly against it, since the implementation can change as the developers
revise internal details from one database version to the next, or even on a
runtime whim, as the needs of the query optimizer, write-buffer state, etc.
change over time in some databases.

For instance, Microsoft SQL Server has a built-in public change tracking API,
which you can tell is a leaky abstraction on top of the journal, but SQL
Server makes an API guarantee when you use that specific change tracking API,
whereas (at least the last time I looked) it makes no API guarantees about
anything lower-level that accesses its journal. The Change Tracking API
documentation even warns you that it locks the journal into certain
deoptimizations and that there are clear performance trade-offs, which also
hints that using the lower-level journal directly would likely hit
optimization problems and data-loss issues.

As a dev, it's rarely a good idea to program against APIs your vendor
intentionally deems to be private/implementation details that can change
between versions/may be unstable in runtime operation. If the database doesn't
offer a public journaling API, then yes, sadly, sometimes the best or at least
least-worst answer is to build a sub-par journal inside the database itself,
with of course its own tradeoffs in performance.
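
(For a sketch of what the public side of that looks like, assuming change
tracking is already enabled on the database and on a hypothetical dbo.Accounts
table, and using pyodbc for the connection:)

    import pyodbc

    conn = pyodbc.connect('DSN=appdb')  # hypothetical DSN
    cur = conn.cursor()

    # The version number you'd store and pass back in on the next sync.
    cur.execute("SELECT CHANGE_TRACKING_CURRENT_VERSION()")
    current_version = cur.fetchone()[0]

    # All changes to dbo.Accounts since version 0 (i.e. since tracking began);
    # normally you'd pass the version saved from the previous sync instead.
    # Assumes the table's primary key column is named "id".
    cur.execute("""
        SELECT CT.SYS_CHANGE_OPERATION, CT.SYS_CHANGE_VERSION, CT.id
        FROM CHANGETABLE(CHANGES dbo.Accounts, 0) AS CT
    """)
    for op, version, row_id in cur.fetchall():
        print(op, version, row_id)   # e.g. ('U', 12, 42)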

~~~
anarazel
> For instance, Microsoft SQL Server has a built-in public change tracking
> API, which you can tell is a leaky abstraction over top of the journal,

Postgres' logical decoding [1] provides something very similar.

[1]: [https://www.postgresql.org/docs/current/static/logicaldecoding.html](https://www.postgresql.org/docs/current/static/logicaldecoding.html)

------
cannadayr
I solved this problem by wrapping sqldiff (included w/ sqlite src) as a custom
git diff driver.

[https://github.com/cannadayr/git-sqlite](https://github.com/cannadayr/git-sqlite)
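
(Roughly how a driver like that gets wired up; this is a made-up Python
stand-in rather than the repo's actual scripts, and it assumes sqldiff is on
PATH:)

    #!/usr/bin/env python3
    # Git calls a configured diff driver with seven arguments:
    #   path old-file old-hex old-mode new-file new-hex new-mode
    # so we just hand the two file versions to sqldiff.
    #
    # Hypothetical setup:
    #   git config diff.sqlite.command /path/to/sqlite-diff.py
    #   echo 'budget.sqlite diff=sqlite' >> .gitattributes
    import subprocess
    import sys

    path, old_file, new_file = sys.argv[1], sys.argv[2], sys.argv[5]
    print('sqldiff for %s:' % path)
    sys.stdout.flush()
    subprocess.run(['sqldiff', old_file, new_file])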

------
radiospiel
The sqlite sessions extension seems to be a good foundation for getting
changelogs out of a sqlite database. It is sadly not too well known:
[https://www.sqlite.org/sessionintro.html](https://www.sqlite.org/sessionintro.html)

------
groue
Related article: "Automatic Undo/Redo Using SQLite"
[https://www.sqlite.org/undoredo.html](https://www.sqlite.org/undoredo.html)

~~~
iffycan
Thank you for posting this. An earlier version of my post included a reference
to it because it's a great read, too.

------
Scaevolus
Neat! If you wanted to extend this more (string diffs, non-json encodings), a
custom function would be the next logical step. In Python it might be:

    
    
    import json
    import sqlite3

    def delta(*args):
        # args arrive as (name, old, new) triples; keep the old value of
        # anything that changed
        ret = {}
        for name, old, new in zip(args[::3], args[1::3], args[2::3]):
            if old != new:
                ret[name] = old
        if ret:
            return json.dumps(ret)

    db = sqlite3.connect(':memory:')
    # note: sqlite3.create_function takes (name, num_params, func)
    db.create_function('delta1', 3, delta)
    db.create_function('delta2', 6, delta)
    db.create_function('delta3', 9, delta)

    print db.execute('select delta2("b", "20", "20", "c", 3.4, 3.6)').fetchone()
    >> (u'{"c": 3.4}',)

~~~
swuecho
In Python 3.6:

    
    
    import sqlite3
    import json

    def delta(*args):
        ret = {}
        for name, old, new in zip(args[::3], args[1::3], args[2::3]):
            if old != new:
                ret[name] = old
        if ret:
            return json.dumps(ret)

    db = sqlite3.connect(':memory:')

    db.create_function('delta1', 3, delta)
    db.create_function('delta2', 6, delta)
    db.create_function('delta3', 9, delta)
    print(db.execute('select delta2("b", 20, 20, "c", 3.4, 3.6)').fetchone())
    # ('{"c": 3.4}',)

------
janemanos
ArangoDB also has a first tool to listen to changes in the database and
execute actions or queries. It is limited to a single node. Question is how
scalable all of these tools are [https://www.arangodb.com/2017/03/arangochair-
tool-listening-...](https://www.arangodb.com/2017/03/arangochair-tool-
listening-changes-arangodb/)

