GitDB, a distributed embeddable database on top of Git

rmbyrro · on July 7, 2022

If the rest of the world were like the software industry, we would see news like this on Facebook:

"KayaCar, a distributed kayak with wheels to drive on the streets of San Francisco"

sophacles · on July 7, 2022

I would consider using Facebook if there were articles like that. Not sure why you consider this a problem... "Because I can" is a good enough reason to do anything that doesn't cause harm to others.

remram · on July 7, 2022

I think if someone made a KayakCar, we would still hear about it on HN, not Facebook.

rmbyrro · on July 7, 2022

I did not make any judgement, just noting that this mindset is quite peculiar to the software industry.

Might be because of my ignorance of other areas of human activity, though...

chrsig · on July 7, 2022

> "Because I can" is a good enough reason to do anything that doesn't cause harm to others.

It's hard to tell in advance how any action may impact others. The extreme example: jumping off a bridge doesn't physically harm others. But having to look at, examine, or otherwise interact with your corpse may inflict trauma on others.

elpakal · on July 7, 2022

Isn't there also an environmental cost for all things blockchain because of power consumption? That doesn't directly impact others but might eventually.

TickleSteve · on July 7, 2022

"proof of work" is the computationally expensive thing... a block-chain is simply a data structure with many uses outside of cryptocurrencies.

rmbyrro · on July 7, 2022

Are you advocating for suicide on a crematory?

swyx · on July 7, 2022

did you just compare releasing a new OSS project to suicide?

elpakal · on July 7, 2022

I'm not on Facebook but from what I hear this is an article you are likely to find on there.

cultofmetatron · on July 7, 2022

> a distributed kayak with wheels

ahem... https://www.youtube.com/watch?v=bHpCEUSWt7Q&ab_channel=BeTRI...

mwexler · on July 7, 2022

Using git for so many things (it's a database! it's a blockchain! it's a dessert topping! It's a floor wax! [1]) does start to smell of "I have a hammer, and luckily everything looks like a nail if I squint enough". But it is fun to see how folks have stretched git into so many directions, even if some of them are more useful than others.

[1] SNL, 1976: https://www.nbc.com/saturday-night-live/video/shimmer-floor-... (may not work in all regions, sorry)

(edited link)

danielvaughn · on July 7, 2022

I don't see it that way - version control in general turns out to be highly applicable in many areas far beyond source code. I think it's still under-utilized.

pknopf · on July 7, 2022

Medical industry, design history files, etc.

partdavid · on July 7, 2022

Yes, it's extremely common for applications that need a datastore to end up needing lots of version-control-style features later on, which then get implemented ad hoc on top of a general-purpose database.

qbasic_forever · on July 7, 2022

Git is an abstraction over file trees (or even more generally just chunks of data), and it turns out a ton of problems in software are easily modeled and dealt with as files in a tree.
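To make the "chunks of data" part concrete, here is a minimal sketch of how Git names a blob: the object id is the SHA-1 of a small header plus the raw content. It assumes a repository using the default SHA-1 object format; the function name `git_blob_id` is made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the object id Git gives a blob with this content.

    Assumes the default SHA-1 object format. The hashed bytes are
    "blob <size>" followed by a NUL byte, then the content itself.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


if __name__ == "__main__":
    # The empty blob has a well-known id:
    # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    print(git_blob_id(b""))
```

Identical content always maps to the same id, which is why a record stored as a file is deduplicated and cheap to compare across clones.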

bastawhiz · on July 7, 2022

Lots of problems in software are easily modeled as ordered lists, but that doesn't mean linked lists are the right tool for accomplishing that.

dgllghr · on July 7, 2022

Looks very cool! Reminds me of Irmin (https://irmin.org/) but with a much simpler and more intuitive interface

anentropic · on July 7, 2022

I had the same thought!

I dislike Go, but I bet gitdb is easier to use

uniqueuid · on July 7, 2022

Very cool. Now we need exactly this but for sqlite. Decentralized offline databases that can sync.

_wjtv · on July 7, 2022

It doesn't appear to be doing much more than replication though. https://litestream.io does that for sqlite already. (Edit for clarity) GitDB seems to have more "distributed" ideas in its roadmap for v3, maybe.

uniqueuid · on July 7, 2022

Litestream is awesome, but I don't see that it enables multi-way merges yet. Am I wrong?

benbjohnson · on July 7, 2022

Litestream author here. You're correct. Also, there are no plans for multi-master replication right now. I do think it'd be an interesting project to make an eventually consistent distributed database using the SQLite session extension[1] but I haven't put much thought into that.

[1]: https://www.sqlite.org/sessionintro.html

uniqueuid · on July 7, 2022

I keep hoping that you'll eventually write this :)

Let me know when you do so I can chip in with some money.

_wjtv · on July 7, 2022

GitDB also doesn't handle merge conflicts, so I'm not convinced it would handle multiple clients well. It's again marked as something they might do someday. I suspect this is pretty fragile. (Note: I was _not_ claiming litestream would do such a thing. Just replication.)

qbasic_forever · on July 7, 2022

Are you sure you want to use sqlite if your workload is mostly multiple, independent writers?

uniqueuid · on July 7, 2022

Yes because I want to do offline sync (i.e. latency between syncs anything between 1 second and 1 week), yet index and query the current local state (which includes previously synced items).

If this sounds like an append-only log, that's precisely correct; I just want it to work efficiently with fixed memory and ideally as an embedded library. Thus sqlite.
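A rough sketch of that shape, using only Python's built-in `sqlite3` module. The schema and the helper names (`open_replica`, `append`, `pull`) are assumptions for illustration, not anything GitDB or Litestream provides. Each replica appends under its own origin id, and a sync copies over whatever entries the other side is missing.

```python
import sqlite3


def open_replica() -> sqlite3.Connection:
    """Create an in-memory replica holding an append-only log table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE log ("
        " origin TEXT NOT NULL,"    # which replica wrote the entry
        " seq INTEGER NOT NULL,"    # per-origin sequence number
        " payload TEXT NOT NULL,"
        " PRIMARY KEY (origin, seq))"
    )
    return db


def append(db: sqlite3.Connection, origin: str, payload: str) -> None:
    """Append one entry under this replica's own origin id."""
    (next_seq,) = db.execute(
        "SELECT COALESCE(MAX(seq), 0) + 1 FROM log WHERE origin = ?",
        (origin,),
    ).fetchone()
    db.execute("INSERT INTO log VALUES (?, ?, ?)", (origin, next_seq, payload))


def pull(dst: sqlite3.Connection, src: sqlite3.Connection) -> int:
    """Copy entries that dst lacks from src; return how many were added."""
    before = dst.total_changes
    rows = src.execute("SELECT origin, seq, payload FROM log")
    # Rows are streamed from the cursor, and entries dst already has
    # are skipped by the primary key, so repeating a sync is harmless.
    dst.executemany("INSERT OR IGNORE INTO log VALUES (?, ?, ?)", rows)
    return dst.total_changes - before
```

Because entries are never updated and each origin only writes its own sequence, two replicas cannot conflict; the local table stays queryable and indexable between syncs.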

lifty · on July 7, 2022

Curious how this compares to Dolt or Noms.

hiccuphippo · on July 7, 2022

I guess this would allow git to have its own issue tracker and forum like fossil does?

qbasic_forever · on July 7, 2022

You can already store and sync issues in git repo metadata, check out git-bug: https://github.com/MichaelMure/git-bug

layer8 · on July 7, 2022

I believe one can say that it’s actual Git data, not just metadata.

tzahifadida · on July 7, 2022

Really don't get it and would like to understand. Let's say I don't need a full DB. Why use it instead of, let's say, sqlite? I know devops uses git itself for storing and pulling out stuff for their automations. Is it in that direction?

qbasic_forever · on July 7, 2022

Putting a sqlite db in a git repo doesn't work well: it's just an opaque blob to git, so you can't merge changes beyond "wipe out every other change and accept this as the new complete state of the world".

beagle3 · on July 7, 2022

Disconnected remote work.

Git pull to receive updates, Git push to send your updates. No idea how/if it handles merge conflicts.

Also - you could have db “branches”.

encryptluks2 · on July 7, 2022

I've been imagining a similar concept, but using linting rules to create an optimized file tree with a SQL API to manipulate and query the data programmatically. You could even use a traditional text editor to manage the data.

amelius · on July 7, 2022

How does merging work? Would it even be acceptable to allow automatic merging in all types of application?

uniqueuid · on July 7, 2022

To be honest, I would be completely fine with a use case that offloads conflict prevention to the application.

I think there's a lot of cases where you can guarantee that your data conforms to some sort of CRDT [1] and therefore massively reduce complexity in the DB.

[1] even if it's last-write-wins
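For the last-write-wins case, a minimal sketch of what the application-side merge could look like. The `Entry` layout (timestamp, replica id, value) and the `merge` function are assumptions for illustration; the replica id only serves as a tie-breaker when two timestamps are equal.

```python
from typing import Dict, Tuple

# key -> (timestamp, replica_id, value); this layout is hypothetical
Entry = Tuple[int, str, str]


def merge(a: Dict[str, Entry], b: Dict[str, Entry]) -> Dict[str, Entry]:
    """Merge two replicas' states, keeping the newest write per key."""
    merged = dict(a)
    for key, entry in b.items():
        # Compare (timestamp, replica_id) so ties resolve the same way
        # on every replica, regardless of merge order.
        if key not in merged or entry[:2] > merged[key][:2]:
            merged[key] = entry
    return merged
```

As long as each (timestamp, replica id) pair identifies a single write, this merge is commutative, associative, and idempotent, so replicas converge no matter how often or in what order they sync. The cost is that the losing write is silently dropped.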

ltbarcly3 · on July 7, 2022

How is this distributed in anything but the most trivial sense? In the documentation it literally says this is not distributed, and there is a centralized git server.

If the creator doesn't know how common database terms are used they probably don't know much about writing a database.