GitDB, a distributed embeddable database on top of Git (github.com/gogitdb)
98 points by fiatjaf on July 7, 2022 | 38 comments



If the rest of the world were like the software industry, we would see news like this on Facebook:

"KayaCar, a distributed kayak with wheels to drive on the streets of San Francisco"


I would consider using Facebook if there were articles like that. Not sure why you consider this a problem... "Because I can" is a good enough reason to do anything that doesn't cause harm to others.


I think if someone made a KayakCar, we would still hear about it on HN, not Facebook.


I did not make any judgement, just noting that this mindset is quite peculiar to the software industry.

Might be because of my ignorance of other areas of human activity, though...


> "Because I can" is a good enough reason to do anything that doesn't cause harm to others.

It's hard to tell in advance how any action may impact others. The extreme example: jumping off a bridge doesn't physically harm others. But having to look at, examine, or otherwise interact with your corpse may inflict trauma on others.


Isn't there also an environmental cost for all things blockchain because of power consumption? That doesn't directly impact others but might eventually.


"proof of work" is the computationally expensive thing... a block-chain is simply a data structure with many uses outside of cryptocurrencies.

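To make the "simply a data structure" point concrete, here is a minimal sketch of a hash-linked chain with no proof of work at all. The names make_block and verify, the all-zero genesis hash, and the JSON encoding are assumptions for illustration, not taken from any particular project.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first block


def make_block(prev_hash, payload):
    """Build a block whose hash covers both its payload and the previous hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    return {"prev": prev_hash, "payload": payload, "hash": digest}


def verify(chain):
    """Return True if every block links to its predecessor and its hash is intact."""
    prev = GENESIS
    for block in chain:
        expected = make_block(prev, block["payload"])
        if block["prev"] != prev or block["hash"] != expected["hash"]:
            return False
        prev = block["hash"]
    return True
```

Appending is cheap here; the expensive part in cryptocurrencies comes from the consensus rule layered on top, not from the linking itself.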

Are you advocating for suicide on a crematory?


did you just compare releasing a new OSS project to suicide?


I'm not on Facebook but from what I hear this is an article you are likely to find on there.



Using git for so many things (it's a database! it's a blockchain! it's a dessert topping! It's a floor wax! [1]) does start to smell of "I have a hammer, and luckily everything looks like a nail if I squint enough". But it is fun to see how folks have stretched git into so many directions, even if some of them are more useful than others.

[1] SNL, 1976: https://www.nbc.com/saturday-night-live/video/shimmer-floor-... (may not work in all regions, sorry)

(edited link)


I don't see it that way - version control in general turns out to be highly applicable in many areas far beyond source code. I think it's still under-utilized.


Medical industry, design history files, etc.


Yes, it's extremely common for applications that need a datastore to end up needing lots of version-control-type features later on, which then get implemented ad hoc on top of a general database.


Git is an abstraction over file trees (or even more generally just chunks of data), and it turns out a ton of problems in software are easily modeled and dealt with as files in a tree.
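For anyone curious what "chunks of data" means in practice, this is a rough sketch of how Git names a blob: the SHA-1 of a short header plus the content. The function name git_blob_id is made up here, and it assumes a repository using the default SHA-1 object format rather than SHA-256.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the object id Git would assign to a blob with this content.

    Assumes the SHA-1 object format: "blob <size>\\0" followed by the raw bytes.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()
```

Identical content always gets the same id, which is why a tree of such ids works as a cheap snapshot of a whole directory.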


Lots of problems in software are easily modeled as ordered lists, but that doesn't mean linked lists are the right tool for accomplishing that.


Looks very cool! Reminds me of Irmin (https://irmin.org/) but with a much simpler and more intuitive interface


I had the same thought!

I dislike Go, but I bet gitdb is easier to use


Very cool. Now we need exactly this but for sqlite. Decentral offline databases that can sync.


It doesn't appear to be doing much more than replication though. https://litestream.io does that for sqlite already. (Edit for clarity) GitDB seems to have more "distributed" ideas in its roadmap for v3, maybe.


Litestream is awesome, but I don't see that it enables multi-way merges yet. Am I wrong?


Litestream author here. You're correct. Also, there are no plans for multi-master replication right now. I do think it'd be an interesting project to make an eventually consistent distributed database using the SQLite session extension[1] but I haven't put much thought into that.

[1]: https://www.sqlite.org/sessionintro.html


I keep hoping that you'll eventually write this :)

Let me know when you do so I can chip in with some money.


GitDB also doesn't handle merge conflicts, so I'm not convinced it would handle multiple clients well. It's again marked as something they might do in the future someday. I suspect this is pretty fragile. (Note: I was _not_ claiming litestream would do such a thing. Just replication.)


Are you sure you want to use sqlite if your workload is mostly multiple, independent writers?


Yes because I want to do offline sync (i.e. latency between syncs anything between 1 second and 1 week), yet index and query the current local state (which includes previously synced items).

If this sounds like an append-only log that's precisely correct; I just want it to work efficiently with fixed memory and ideally as an embedded library. Thus sqlite.
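Roughly what that could look like, as a sketch only: an append-only log table in sqlite where each entry carries the id of the device that wrote it and that device's own sequence number, so entries received during a sync can be re-applied safely. The schema and the helper names open_log, append and current are assumptions, and "current" here just means the entry applied most recently on this machine.

```python
import sqlite3


def open_log(path=":memory:"):
    """Open (or create) the log database; the schema is a hypothetical example."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS log (
               seq        INTEGER PRIMARY KEY AUTOINCREMENT,  -- local arrival order
               origin     TEXT    NOT NULL,                   -- device that wrote the entry
               origin_seq INTEGER NOT NULL,                   -- that device's own counter
               key        TEXT    NOT NULL,
               value      TEXT    NOT NULL,
               UNIQUE (origin, origin_seq)
           )"""
    )
    db.execute("CREATE INDEX IF NOT EXISTS log_key ON log (key, seq)")
    return db


def append(db, origin, origin_seq, key, value):
    """Append an entry; re-applying an entry that was already synced is a no-op."""
    db.execute(
        "INSERT OR IGNORE INTO log (origin, origin_seq, key, value) VALUES (?, ?, ?, ?)",
        (origin, origin_seq, key, value),
    )
    db.commit()


def current(db, key):
    """Return the value from the most recently applied local entry for key, or None."""
    row = db.execute(
        "SELECT value FROM log WHERE key = ? ORDER BY seq DESC LIMIT 1", (key,)
    ).fetchone()
    return row[0] if row else None
```

Syncing then amounts to exchanging the rows the other side has not seen yet, which sqlite can answer from the (origin, origin_seq) uniqueness index without loading the whole log into memory.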


Curious how this compares to Dolt or Noms.


I guess this would allow git to have its own issue tracker and forum like fossil does?


You can already store and sync issues in git repo metadata, check out git-bug: https://github.com/MichaelMure/git-bug


I believe one can say that it’s actual Git data, not just metadata.


Really don't get it and would like to understand. Let's say I don't need a full DB. Why use it instead of, let's say, sqlite? I know devops uses git itself for storing and pulling out stuff for their automations. Is it in that direction?


Putting a sqlite db in a git repo doesn't work well, it's just an opaque blob to git so you can't merge changes beyond "wipe out every other change and accept this as the new complete state of the world".


Disconnected remote work.

Git pull to receive updates, Git push to send your updates. No idea how/if it handles merge conflicts.

Also - you could have db “branches”.


I've been imagining a similar concept, but using linting rules to create an optimized file tree with a SQL API to manipulate and query the data programmatically. You could even use a traditional text editor to manage the data.


How does merging work? Would it even be acceptable to allow automatic merging in all types of application?


To be honest, I would be completely fine with a use case that offloads conflict prevention to the application.

I think there are a lot of cases where you can guarantee that your data conforms to some sort of CRDT [1] and therefore massively reduce complexity in the DB.

[1] even if it's last-write-wins
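A last-write-wins merge is small enough to sketch. This assumes every write carries a timestamp and a writer id used as a tie-breaker; the names Versioned, merge and merge_maps are made up for the example, and it does not address clock skew between writers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Versioned:
    value: str
    timestamp: int  # assumed to come from the writer's clock
    writer: str     # tie-breaker so concurrent writes resolve the same way everywhere


def merge(a: Versioned, b: Versioned) -> Versioned:
    """Last-write-wins: keep the entry with the larger (timestamp, writer) pair."""
    return max(a, b, key=lambda v: (v.timestamp, v.writer))


def merge_maps(left: dict, right: dict) -> dict:
    """Merge two replicas key by key without modifying either input."""
    merged = dict(left)
    for key, incoming in right.items():
        merged[key] = merge(merged[key], incoming) if key in merged else incoming
    return merged
```

Because the merge is commutative, associative and idempotent, replicas can apply each other's updates in any order and still converge, which is what lets the database skip conflict handling.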


How is this distributed in anything but the most trivial sense? In the documentation it literally says this is not distributed, and there is a centralized git server.

If the creator doesn't know how common database terms are used they probably don't know much about writing a database.



