
Ask HN: Any C programming Postgres wizards willing to port 460 lines of code? - andrewstuart
I'd really like to use Postgres as a backing store for libgit2: https://libgit2.github.com/

I wonder if there are any kind C programming wizards who know Postgres and might consider doing the open source port? It's beyond my Python programming skills, and I dare not write crappy C code for fear of creating something nasty and insecure.

I can repay either by reciprocating with Python/web development/Linux/AWS knowledge, or, if I have nothing of value to offer, with thanks and praise.

The existing MySQL implementation is 460 lines of code:
https://github.com/libgit2/libgit2-backends/blob/master/mysql/mysql.c

There's an SQLite implementation too:
https://github.com/libgit2/libgit2-backends/blob/master/sqlite/sqlite.c

Some relevant links:
http://blog.deveo.com/your-git-repository-in-a-database-pluggable-backends-in-libgit2/
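For readers who haven't opened the linked post: a libgit2 backend is essentially a struct of function pointers (read, write, exists, free and so on) that libgit2 calls instead of touching the filesystem, which is why a MySQL, SQLite or Postgres store can be swapped in. The sketch below uses a hypothetical, simplified interface (`backend`, `mem_backend_new` and the callback signatures are made up for illustration; the real `git_odb_backend` has different fields and signatures), with a toy in-memory store standing in for the database.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified interface: NOT libgit2's git_odb_backend.
 * It only shows the idea of a storage backend as a struct of callbacks. */
typedef struct backend backend;
struct backend {
    int  (*write)(backend *b, const char *id, const void *data, size_t len);
    int  (*read)(backend *b, const char *id, void **out, size_t *out_len);
    int  (*exists)(backend *b, const char *id);
    void (*free)(backend *b);
};

/* Toy in-memory implementation standing in for MySQL/SQLite/Postgres. */
#define MEM_CAP 16
typedef struct {
    backend parent;            /* first member, so backend* casts are valid */
    char   *ids[MEM_CAP];
    void   *data[MEM_CAP];
    size_t  lens[MEM_CAP];
    size_t  count;
} mem_backend;

static int mem_find(mem_backend *m, const char *id) {
    for (size_t i = 0; i < m->count; i++)
        if (strcmp(m->ids[i], id) == 0) return (int)i;
    return -1;
}

static int mem_exists(backend *b, const char *id) {
    return mem_find((mem_backend *)b, id) >= 0;
}

static int mem_write(backend *b, const char *id, const void *data, size_t len) {
    mem_backend *m = (mem_backend *)b;
    if (mem_find(m, id) >= 0) return 0;      /* already stored: nothing to do */
    if (m->count == MEM_CAP) return -1;      /* toy store is full */
    char *id_copy = malloc(strlen(id) + 1);
    void *copy = malloc(len ? len : 1);
    if (!id_copy || !copy) { free(id_copy); free(copy); return -1; }
    strcpy(id_copy, id);
    memcpy(copy, data, len);
    m->ids[m->count] = id_copy;
    m->data[m->count] = copy;
    m->lens[m->count] = len;
    m->count++;
    return 0;
}

/* On success the caller owns *out and must free() it. */
static int mem_read(backend *b, const char *id, void **out, size_t *out_len) {
    mem_backend *m = (mem_backend *)b;
    int i = mem_find(m, id);
    if (i < 0) return -1;                    /* not found */
    *out = malloc(m->lens[i] ? m->lens[i] : 1);
    if (!*out) return -1;
    memcpy(*out, m->data[i], m->lens[i]);
    *out_len = m->lens[i];
    return 0;
}

static void mem_free(backend *b) {
    mem_backend *m = (mem_backend *)b;
    for (size_t i = 0; i < m->count; i++) { free(m->ids[i]); free(m->data[i]); }
    free(m);
}

backend *mem_backend_new(void) {
    mem_backend *m = calloc(1, sizeof *m);
    if (!m) return NULL;
    m->parent.write = mem_write;
    m->parent.read = mem_read;
    m->parent.exists = mem_exists;
    m->parent.free = mem_free;
    return &m->parent;
}
```

A Postgres port would fill the same kind of slots with libpq calls; the MySQL and SQLite files linked above show libgit2's real callback set.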
======
mhodgson
Lucky for you I actually did exactly this over a year ago. We're not using it
anymore so I'll just open source it for you:
[https://gist.github.com/mhodgson/d29bbd35e1a8db5e0800](https://gist.github.com/mhodgson/d29bbd35e1a8db5e0800)

Please note that I also don't know much C, but this implementation does work.
Also included is a Postgres version of the Ref DB backend (so nothing hits the
filesystem). There are a few bits that are not implemented since we had no use
for the reflog, and those parts are technically optional.

Would probably be good to get another set of eyes on this from someone much
more familiar with C.

Hope this helps!

~~~
andrewstuart
How awesome is that!

What is the license?

Any reason you didn't use it in the end? What was your use case for it?

Question for you (I'll read the source in the morning, but just quickly): does
it prevent storage of duplicate objects? One of the main things I'm interested
in is saving space when multiple git repos contain exactly the same object.

And THANKS again. Awesome. Can I do anything for you? Send you a bottle of
wine? Help with some Python or Linux? If you put your contact in your profile
I'll drop you an email.

~~~
mhodgson
Happy to help. The license is MIT (just added).

In terms of duplicating objects, I believe that if you do choose to store
objects from many repos in the same table, they will NOT be duplicated and you
will get your space savings. Don't take my word for it though.
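Why that should hold: Git names every object by a hash of its content (SHA-1 over a `<type> <size>` header, a NUL byte, then the bytes), so the same file committed in two repositories gets the same id. If the objects table uses that id as its primary key (an assumption; the gist's schema isn't reproduced here), a second insert of the same object adds no row. A minimal sketch, with 64-bit FNV-1a standing in for SHA-1 and a fixed-size array standing in for the table (`object_id` and `table_insert` are hypothetical names):

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* 64-bit FNV-1a, a stand-in for Git's SHA-1 to keep the sketch short. */
static uint64_t fnv1a(const void *data, size_t len, uint64_t h) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 1099511628211ULL; }
    return h;
}

/* Git hashes "<type> <size>\0<content>"; mimic that layout here. */
static uint64_t object_id(const char *type, const void *content, size_t len) {
    char header[64];
    int n = snprintf(header, sizeof header, "%s %zu", type, len);
    /* n + 1 bytes so the terminating NUL is hashed too, as in Git. */
    uint64_t h = fnv1a(header, (size_t)n + 1, 14695981039346656037ULL);
    return fnv1a(content, len, h);
}

/* Shared "objects" table for all repos, keyed by object id. */
#define TABLE_CAP 32
typedef struct { uint64_t ids[TABLE_CAP]; size_t rows; } object_table;

/* Returns 1 if a new row was stored, 0 if the id was already present. */
static int table_insert(object_table *t, uint64_t id) {
    for (size_t i = 0; i < t->rows; i++)
        if (t->ids[i] == id) return 0;   /* duplicate: nothing stored */
    assert(t->rows < TABLE_CAP);
    t->ids[t->rows++] = id;
    return 1;
}
```

In SQL terms the equivalent guard is a primary key on the id column, so a repeated write can be skipped instead of storing a second copy. Note the savings only apply if the repositories really share one table.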

We actually did use this code in production for a period of time. In the end
we realized that one of the main features of Git, immutability, didn't suit
our needs well and we designed a versioning system based closely on Git, but
built on Postgres directly. The main benefit of this is using primary keys as
the object ids, instead of hashes of the content. This means we can change the
content without changing the object's id (which in normal Git then means
changing the tree, the commit, and every descendant commit).
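To make the trade-off concrete: with content-derived ids, a tree embeds its entries' ids and a commit embeds its tree's and parent's ids, so editing one blob changes every id built on top of it. With a surrogate primary key, the row id stays put when the text changes. This sketch is illustrative only: it uses 64-bit FNV-1a instead of SHA-1 and simplified `tree`/`commit` strings rather than Git's real object formats, and `hash_history`, `blob_row` and `update_blob` are hypothetical names, not mhodgson's design.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over a C string, a stand-in for SHA-1. */
static uint64_t fnv1a(const char *s) {
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) { h ^= (unsigned char)*s; h *= 1099511628211ULL; }
    return h;
}

typedef struct { uint64_t blob, tree, commit1, commit2; } history_ids;

/* Content-addressed: each id hashes content that embeds the ids below it.
 * commit2 is a later commit whose parent is commit1. */
static history_ids hash_history(const char *text) {
    char buf[128];
    history_ids h;
    h.blob = fnv1a(text);
    snprintf(buf, sizeof buf, "tree blob=%016" PRIx64, h.blob);
    h.tree = fnv1a(buf);
    snprintf(buf, sizeof buf, "commit tree=%016" PRIx64 " msg=first", h.tree);
    h.commit1 = fnv1a(buf);
    snprintf(buf, sizeof buf,
             "commit tree=%016" PRIx64 " parent=%016" PRIx64 " msg=second",
             h.tree, h.commit1);
    h.commit2 = fnv1a(buf);
    return h;
}

/* Surrogate key: the id comes from a counter (like a Postgres SERIAL column)
 * and is independent of the content. */
typedef struct { long id; char text[64]; } blob_row;

static void update_blob(blob_row *row, const char *text) {
    snprintf(row->text, sizeof row->text, "%s", text);  /* id untouched */
}
```

Editing the text changes `blob`, `tree`, `commit1` and `commit2` on the hashed side, while `blob_row.id` is unchanged on the surrogate-key side.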

Good luck!

~~~
anulman
Did you consider leveraging the refdb to offer immutable primary keys?

I had been hacking together a Kyoto Tycoon-backed implementation for a project
(since dropped); our design exposed the ref id to the user (e.g. 'master',
'master/mhodgson', etc.) and branched/merged as necessary. This way, our primary
keys remained a constant refName that pointed to the HEAD of a commit chain,
each of which referenced immutable commits/trees/blobjects.

Although my days of libgit2 hacking are long past, I'm very curious if/how our
design could have been improved; immutable pkeys were important for us as
well.

Github:
[https://github.com/anulman/libgit2/tree/kyoto/src/backends/k...](https://github.com/anulman/libgit2/tree/kyoto/src/backends/kyotoTycoon)
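A rough sketch of the design described above, assuming nothing about the actual Kyoto Tycoon code: commits are append-only and point at their parent, while a ref keyed by a stable name (`master`) is the only thing that changes on update. Integer indexes stand in for object hashes, and `store`, `commit_to_ref` and `find_ref` are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define MAX_COMMITS 16
#define MAX_REFS 4

/* Immutable, append-only commits: each points at its parent (-1 = root). */
typedef struct { int parent; const char *content; } commit;

/* A ref is the stable key the user sees ("master"); only its target moves. */
typedef struct { const char *name; int head; } ref;

typedef struct {
    commit commits[MAX_COMMITS];
    int ncommits;
    ref refs[MAX_REFS];
    int nrefs;
} store;

static ref *find_ref(store *s, const char *name) {
    for (int i = 0; i < s->nrefs; i++)
        if (strcmp(s->refs[i].name, name) == 0) return &s->refs[i];
    return NULL;
}

/* "Update" = append a commit on top of the ref's head, then move the ref.
 * Old commits are never modified. Strings are stored by pointer, so the
 * caller must keep them alive (string literals are fine). */
static int commit_to_ref(store *s, const char *name, const char *content) {
    assert(s->ncommits < MAX_COMMITS);
    ref *r = find_ref(s, name);
    if (!r) {
        assert(s->nrefs < MAX_REFS);
        r = &s->refs[s->nrefs++];
        r->name = name;
        r->head = -1;
    }
    s->commits[s->ncommits] = (commit){ r->head, content };
    r->head = s->ncommits++;
    return r->head;
}
```

The user-facing key stays constant while the history behind it stays immutable; whether that covers mhodgson's "edit text without rewriting history" requirement is exactly the open question in the replies.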

~~~
mhodgson
I'm not sure I follow. Our use case required the ability to easily update
blobs (in this case formatted written text) without having to rewrite history
every time. I don't think immutable ref ids address that particular requirement...

~~~
anulman
Not sure they would either, though perhaps a use case for git_commit_amend
[1]?

Regardless, sounds fairly implementation-specific. Think I just followed you
on Twitter, happy to discuss further offline.

[1]
[https://libgit2.github.com/libgit2/#HEAD/group/commit/git_co...](https://libgit2.github.com/libgit2/#HEAD/group/commit/git_commit_amend)

------
csomar
I have no problem with such requests, and I think there should be more. But
there is one issue with that. What's your motive?

Are you looking for this to complete a $xxK client project? Or are you looking
for this to help students understand a C implementation?

It makes a world of difference. If you are getting paid for the result, then
you should share with the developer. If you are not, and doing this for the
greater good, then it's fine.

We should really draw the line here. There is enough exploitation in the dev
world and the last thing we want is developers exploiting other developers.

------
adwmayer
Have you tried
[https://github.com/davidbalbert/libgit2-postgresql](https://github.com/davidbalbert/libgit2-postgresql)?
I know nothing about libgit2, but came across this repo while looking it up.
If it's got some issues it's a great place to start from, since it also gives
you somebody who could code review and you don't need to start completely from
scratch.

~~~
lunixbochs
I did a quick review.

This project uses PQescapeLiteral() and asprintf() to build queries. I'd
recommend prepared statements instead (which is how the MySQL backend builds
queries).
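One concrete reason to prefer bound parameters, besides SQL injection: Git object data is binary (a tree entry, for example, is `<mode> <name>`, a NUL byte, then the raw id bytes), and anything that passes it through C strings, such as `%s` in `asprintf()`/`snprintf()`, stops at the first NUL. libpq's `PQexecPrepared()` (and `PQexecParams()`) instead take per-parameter lengths plus a format flag, where 1 means binary. This plain-C sketch only demonstrates the hazard; the `objects` table is hypothetical, `pq_param` and `bind_binary` are hypothetical helpers mirroring one slot of those arrays, and it makes no claim about how libgit2-postgresql actually handles object data.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Bytes shaped like a Git tree entry: "<mode> <name>", a NUL, then the raw
 * object id. Only 4 of the 20 id bytes are included, to keep it short. */
static const unsigned char tree_entry[] = {
    '1', '0', '0', '6', '4', '4', ' ', 'R', 'E', 'A', 'D', 'M', 'E',
    0x00,
    0x3b, 0x18, 0xe5, 0x12
};

/* String-building path (hypothetical "objects" table): "%s" reads only up
 * to the first NUL byte, so the id bytes silently vanish from the query. */
static int build_query_with_format(char *out, size_t cap, const void *data) {
    return snprintf(out, cap, "INSERT INTO objects (data) VALUES ('%s')",
                    (const char *)data);
}

/* Hypothetical mirror of one paramValues/paramLengths/paramFormats slot
 * passed to PQexecPrepared(); with format 1 (binary) libpq uses the length,
 * so embedded NUL bytes survive. */
typedef struct {
    const char *value;
    int length;
    int format;
} pq_param;

static pq_param bind_binary(const void *data, size_t len) {
    pq_param p = { (const char *)data, (int)len, 1 };
    return p;
}
```

With a prepared statement, the SQL text stays fixed (`... VALUES ($1)`) and the bytes travel separately, so no escaping step is involved at all.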

It doesn't implement anything but write() and free() yet, so it's not actually
functional.

Unfortunately I believe this would clobber the library license (GPLv3 on this
Postgres plugin vs GPLv2 with a linking exception for libgit2). With this in
mind I'd probably just recommend a rewrite. It's not that much code.

Update: it looks like the libgit2 backends aren't well maintained:
[https://github.com/libgit2/libgit2-backends/issues/13#issuec...](https://github.com/libgit2/libgit2-backends/issues/13#issuecomment-112319590)

~~~
andrewstuart
I did notice it's a different license. Do you think there is value in porting
the MySQL or SQLite version? They would share the libgit2 license.

------
mrkmcknz
I wish there were more Ask HN posts like this.

It's also a great chance for someone who perhaps is looking for a gig to put
some code up here that quite a few people will review.

------
lfowles
If it's so short, why not give it a shot and put it up for code review so
others can catch the nasty and insecure bits? Great learning experience :)

~~~
andrewstuart
It would be good if it was awesomely and expertly done rather than amateurish,
which my fumbling would result in.

Good idea though. It's time I started getting some C skills, but on a less
important bit of code.

~~~
angersock
Unless you're doing this for a business use case (which your insistence on
expert quality and security earlier kinda suggests), it shouldn't matter.

Don't fish for free work when you could turn this into a learning opportunity.

Besides, why would you want to use PG as a backing store unless you were doing
weird multitenant stuff?

~~~
andrewstuart
I'm a sole developer in pyjamas coding in the lounge room trying to come up
with an idea that will make an income when I'm not doing my day job to pay the
bills, in case you're asking if I am a rich company trying to get free
development services. I'm happy to reciprocate technical services if what I
know is of value. I write open source code too so I'm contributing although
not to the libgit2 project.

Same as many other developers here on HN I suspect. Coding in the hope of
building something people want to use.

~~~
angersock
So, you're looking for somebody else to do at least some of the technical work
for a passive income project, without some sort of profit-sharing arrangement
or transparency about that fact (until I brought this up)?

Truly rich companies are more likely just to pay for talented developers or
services than small folks are (see also acqui-hires).

Anyways, enough of the business/philosophy stuff...what are you looking to
gain by using PG as a backing store?

------
_cbdev
I decided to do it just for the fun of it.

[https://github.com/cbdevnet/libgit2-backends/blob/master/pos...](https://github.com/cbdevnet/libgit2-backends/blob/master/postgres/postgres.c)

Caveat Emptor: I did not test this since somehow the Debian package of
libgit2-dev seems not to include the GIT_* constants. It should probably work,
though.

------
userbinator
It probably isn't the most efficient way, but were an ODBC backend written for
it, libgit2 would be able to use any database that has an ODBC driver - which
AFAIK is almost all SQL-based ones.

------
richardwooding
Wow! My C knowledge is dated, but my PostgreSQL knowledge is good. Would be
fun to attempt if I had time.

------
kungfooman
Didn't know that libgit2 has a MySQL backend, thanks. :D

------
andrewstuart
Performance is an interesting question, I wonder how a Postgres backend would
compare to serving git from file system.

------
ed_blackburn
As an aside, is there any crossover or anticipated convergence between libgit2
and git?

~~~
ed_blackburn
I presume from the downvotes that's a contentious question? I never knew you
could add different back ends to libgit2; maybe this'd be a useful feature for
the command-line version? I'll answer my own question in case anyone else is
interested in the diversity of how git is or can be implemented:
[http://thread.gmane.org/gmane.comp.version-control.git/20421...](http://thread.gmane.org/gmane.comp.version-control.git/204210/focus=204270)

------
synthmeat
You know what?

I propose we use "Task HN:" and do these requests more frequently. We'll
surely learn _a lot_, come to ingenious solutions, and maybe, just maybe,
make our time on HN more productive. I, for one, want to at least see, if not
help with, the small but interesting byte-size hurdles others encounter, how
others solve them in different ways, and all the discussion around it.

Working on something together binds communities even tighter.

~~~
pmorici
Sounds like that would devolve into a situation where people are trying to
exploit others for free work. There are plenty of places on the Internet where
you can go to find people asking others to write software for them for little
or no money. Should this really be one of them?

~~~
philippnagel
Upvoting/downvoting should, at least in theory, prevent that from happening.

~~~
brudgers
The best form of moderation is not encouraging behavior that requires
moderation. Downvotes are a form of community feedback but not really a
moderation tool. For example, in cases where I am in doubt about the benefit
of something to HN, my threshold for downvoting is much larger than my
threshold for upvoting: I am more likely to upvote when "it might be useful"
than to downvote a "might not be useful" because there is a person's karma at
the other end of the arrow.

------
ugexe
It's sort of telling that you refuse to put any effort into showing anyone you
can at least put together a crappy implementation to demonstrate understanding
and contribute back to the same knowledge space (beginner knowledge is
knowledge) you wish to extract from.

The demand for the highest quality code for what is essentially begging is
also not becoming. Even contributing a terrible implementation would tickle
people's motivation button (so they can teach someone without building your
entire project, show they are measurably better than someone, etc.).

"Beggars can't be choosers"

~~~
andrewstuart
I don't see why I shouldn't ask if anyone is willing to help.

1: it's open source and presumably would be committed to the libgit2 project.

2: it's likely to be useful to others

3: it's central, critical code that isn't well suited to beginners. Of course
it should be quality code, it's responsible for storing data.

4: it would strongly benefit from the eye of a Postgres expert, which I am not

5: I do not have money to hire developers

6: I write open source code too so I contribute my time to the public

~~~
scrollaway
Don't let the haters fool you - they're the same guys who told Dropbox that
they didn't see the point of their service over sharing USB flash drives.

This is a good idea. Other people could abuse it but this is certainly not
abuse, any more than it would be to just file a feature request as an issue on
a project. Hope it comes through.

~~~
ugexe
This comparison is lost on me. What do the Dropbox naysayers have to do with
people submitting code requests?

~~~
grandalf
there are just some people who initially hate nearly every good idea

------
CyberDildonics
Is there a reason it couldn't be done with C++11/14 with an extern interface
to C?

Here is a lib to match Python's string functions in C++.

[https://code.google.com/p/pystring/](https://code.google.com/p/pystring/)

When working with strings and vectors, C++11 can actually be pretty
straightforward, productive, and clear, with no manual memory management. It
definitely doesn't have as many string functions out of the box as modern
scripting languages, though.

