

Ask HN: What sort of database would be a fast backing store for git? - andrewstuart

In this thread, mhodgson explains why Postgres would not be a fast data store for git. https://news.ycombinator.com/item?id=9833548

mhodgson says: "Once you understand how git works under the hood it's actually fairly easy to predict that performance will be poor. A simple checkout involves accessing 100s if not 1000s of objects. Also, you can't fetch these all at once because the objects you need to fetch are determined based on a nested tree. So you have to query the tree all the way down, getting each nested tree or blob based on the previous tree's contents. So ultimately you're doing 100s-1000s of queries for any given git command. Each query is fast, but even at 1-2 ms per query it adds up quickly."

So the question is, what sort of data store would be extremely fast and scalable for storing git? A graph database? A key/value store? Seems a little odd that the plain old file system is the fastest option.
======
anarazel
> mhodgson says: "Once you understand how git works under the hood it's
> actually fairly easy to predict that performance will be poor. A simple
> checkout involves accessing 100s if not 1000s of objects. Also, you can't
> fetch these all at once because the objects you need to fetch are determined
> based on a nested tree. So you have to query the tree all the way down,
> getting each nested tree or blob based on the previous tree's contents. So
> ultimately you're doing 100s-1000s of queries for any given git command.
> Each query is fast, but even at 1-2 ms per query it adds up quickly."

You can run such queries on the server side if necessary. I doubt that's
immediately possible with the given API, but there's nothing stopping you
from doing that.
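
To make the round-trip argument concrete, here is a minimal sketch in Python. Everything in it is hypothetical: `CountingStore`, the object ids, and the dict-based object layout stand in for a real database and real git objects. It only shows that a client-driven walk costs one round trip per object, while a walk executed next to the data costs one round trip in total.

```python
class CountingStore:
    """Hypothetical object store; each public call models one network round trip."""

    def __init__(self, objects):
        self.objects = objects
        self.round_trips = 0

    def get(self, oid):
        # Client-side access: one round trip per object fetched.
        self.round_trips += 1
        return self.objects[oid]

    def walk_server_side(self, root):
        # Models a traversal run inside the database: one round trip total.
        self.round_trips += 1
        return dict(_walk(self.objects.__getitem__, root, ""))


def _walk(fetch, oid, prefix):
    # Depth-first walk: a tree's contents decide which objects to fetch next.
    obj = fetch(oid)
    if obj["type"] == "blob":
        yield prefix, obj["data"]
        return
    for name, child in obj["entries"].items():
        yield from _walk(fetch, child, f"{prefix}/{name}" if prefix else name)


def checkout_client_side(store, root):
    # Same walk, but every fetch goes through the counted round-trip path.
    return dict(_walk(store.get, root, ""))


# Toy repository: two trees and three blobs (ids are made up).
OBJECTS = {
    "t0": {"type": "tree", "entries": {"README": "b0", "src": "t1"}},
    "t1": {"type": "tree", "entries": {"main.c": "b1", "util.c": "b2"}},
    "b0": {"type": "blob", "data": "hello"},
    "b1": {"type": "blob", "data": "int main(){}"},
    "b2": {"type": "blob", "data": "void util(){}"},
}

client = CountingStore(OBJECTS)
files_client = checkout_client_side(client, "t0")

server = CountingStore(OBJECTS)
files_server = server.walk_server_side("t0")

print(client.round_trips, server.round_trips)
```

Both walks return the same files; only the number of round trips differs, and that count is what the 1-2 ms per query gets multiplied by.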

> So the question is, what sort of data store would be extremely fast and
> scalable for storing git? A graph database? A key/value store? Seems a
> little odd that the plain old file system is the fastest option.

But git is not a "plain old file system". It has its own indexing and builds
up search data structures in memory.

------
gull
Indexing speeds up a data store. If an extra layer that builds an index were
added to git, or were built on top of git, that would make git queries faster.

This changes the question. Could one build an index for git? Why doesn't git
have an index already?
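
One form such an extra layer could take, sketched in Python under loud assumptions: `build_path_index`, the object ids, and the dict layout are invented for illustration and are not git's own format. The idea is to flatten a tree once into a path-to-blob map, so a later lookup is a single access instead of a walk through nested trees.

```python
def build_path_index(objects, root):
    """Flatten a tree into {path: blob_id} with one pass over the tree objects."""
    index = {}
    stack = [("", root)]
    while stack:
        prefix, oid = stack.pop()
        for name, child in objects[oid]["entries"].items():
            path = f"{prefix}/{name}" if prefix else name
            if objects[child]["type"] == "tree":
                stack.append((path, child))  # descend into the subtree later
            else:
                index[path] = child          # blobs become index entries
    return index


# Toy repository (hypothetical ids and layout).
OBJECTS = {
    "t0": {"type": "tree", "entries": {"README": "b0", "src": "t1"}},
    "t1": {"type": "tree", "entries": {"main.c": "b1", "util.c": "b2"}},
    "b0": {"type": "blob", "data": "hello"},
    "b1": {"type": "blob", "data": "int main(){}"},
    "b2": {"type": "blob", "data": "void util(){}"},
}

path_index = build_path_index(OBJECTS, "t0")
blob_id = path_index["src/util.c"]  # one lookup, no tree walk
print(blob_id, OBJECTS[blob_id]["data"])
```

The trade-off is the usual one for an index: it has to be rebuilt or updated whenever the tree changes, in exchange for cheaper reads.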

