
Git implemented in Rust - adamnemecek
https://github.com/chrisdickinson/git-rs
======
akdas
I love to see people reimplementing existing tools on their own, because I
find that to be a great way to learn more about those tools. I started on a
Git implementation in Rust as well, though I haven't worked on it in a while:
[https://github.com/avik-das/gitters](https://github.com/avik-das/gitters)

~~~
giancarlostoro
Are you basing it on anything in particular, beyond just going through Git's
own source code?

~~~
kreetx
Just reading source code is a whole different experience than actually
implementing something yourself. Implementing something pretty much forces you
to "be right" in your understanding, while reading can be anything from
"really studying" to skimming.

~~~
tracker1
Completely agree, though sometimes you _HAVE_ to go through existing source
when you do something wrong. I've implemented libraries based on specs and
white papers, and even then there are vague passages and holes that only show
up once you try to make things work in practice.

Love to see things like this all the same as they tend to solidify
protocols/specifications. This of course can have both good and bad results.

------
jonty
If you're interested in this, you may enjoy "Building Git" by James Coglan - a
comprehensive book that takes you through reimplementing git in Ruby.

[https://shop.jcoglan.com/building-git/](https://shop.jcoglan.com/building-git/)

~~~
abhijat
If you have read this book, would you care to comment about its content and
quality?

It is something I am interested in buying but I have not been able to find
reviews of it.

~~~
steveklabnik
I haven’t read it yet, but I’m lucky to call the author a friend. I read all
of his tweets about it while he was building it. I expect it to be of
extremely high quality. I’ve got a lot of respect for his abilities.

The only reason I haven’t read it yet is that I think it deserves a lot of
attention and I haven’t made time yet.

~~~
abhijat
Thanks, I bought the book.

Still undecided as to which language to work in. I wanted to use Rust, but the
one part I'm unsure about is where tree-like data structures will need to be
implemented.

I guess that's tough to do in Rust? I will give it a try.

~~~
Twisol
Graphs are harder if you try to use pointers to identify neighbors. In Rust, a
pointer is not merely an index into memory; it is a statement and guarantee
about how that memory will be accessed. These semantics are more than a graph
structure needs.

To "get around" this (but see the next paragraph), we usually associate each
node to a simple identifier such as an integer (usize), and store all nodes
into a Vec. Neighbors are indexed by this identifier rather than a pointer.
(Indirection? Maybe. But I'm pretty sure most processors support a base+offset
mode just as easily as a direct reference mode.)

Honestly, I think this is good practice even outside of Rust. If you're using
pointers, and you want to associate some extra data to a node that isn't part
of its graph structure -- a label, say, or some other structure that you're
modeling the relationships of -- there's no good way to add that data without
modifying the definition of a node. A pointer indexes a _single_ point in
memory. An abstract identifier may index any number of points in memory --
just create another table containing the new information you want to
associate.
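
A minimal sketch of the index-based approach described above, assuming a directed graph whose nodes are identified by their position in a `Vec`. The names `Graph`, `NodeId`, and `build_example` are made up for illustration; labels live in a separate table keyed by the same ids, so the graph definition itself never changes.

```rust
/// Identifier for a node: just an index into `Graph::neighbors`.
type NodeId = usize;

/// A directed graph that stores adjacency by index instead of by pointer.
#[derive(Default)]
struct Graph {
    /// neighbors[id] lists the ids that node `id` points to.
    neighbors: Vec<Vec<NodeId>>,
}

impl Graph {
    /// Adds a node with no edges and returns its id.
    fn add_node(&mut self) -> NodeId {
        self.neighbors.push(Vec::new());
        self.neighbors.len() - 1
    }

    /// Adds a directed edge `from -> to`. Panics if either id is unknown.
    fn add_edge(&mut self, from: NodeId, to: NodeId) {
        assert!(to < self.neighbors.len(), "unknown target node");
        self.neighbors[from].push(to);
    }

    /// Returns the ids that node `id` points to.
    fn neighbors_of(&self, id: NodeId) -> &[NodeId] {
        &self.neighbors[id]
    }
}

/// Builds a three-node graph plus a side table of labels.
/// The label table is indexed by the same ids as the graph (hypothetical data).
fn build_example() -> (Graph, Vec<&'static str>) {
    let mut graph = Graph::default();
    let root = graph.add_node();
    let left = graph.add_node();
    let right = graph.add_node();
    graph.add_edge(root, left);
    graph.add_edge(root, right);
    let labels = vec!["root", "left", "right"];
    (graph, labels)
}
```

Attaching another kind of data later (say, a visit count) would just mean another `Vec` of the same length.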

~~~
skybrian
There are benefits but they come with downsides: you can have node ids
pointing to deleted nodes, and collecting unused nodes is up to you. If you
want to avoid fragmentation, you need to recycle nodes and possibly node ids
as well. These are things garbage collectors handle.

It's not all that different from a database, though.

~~~
Twisol
> It's not all that different from a database, though.

Indeed! I'm tempted to coin a variant of Greenspun's Tenth Rule: any
sufficiently complicated program contains an ad-hoc, informally specified,
bug-ridden, incomplete implementation of a database engine.

> There are benefits but they come with downsides

Downsides relative to what? The downsides listed are all in common with
traditional pointer-based references, so I would argue that garbage collection
is rather orthogonal to the question of storing indices instead of pointers.
Any allocation comes out of a memory arena of some kind, be it an explicit
vector of slots or the implicitly-defined standard heap. The tools for solving
these problems are the same in all cases.

Certainly, Rust's references avoid all of the problems you listed. Rust
pointers essentially embed the semantics of a garbage collector at compile-
time [0], in the domain where ownership patterns can be strictly verified. But
nodes within a graph are already a poor fit for the ownership model -- the
entity that "owns" a node is really the graph itself, not its neighbors -- so
that's the level of granularity at which Rust's references are useful. You
need something else within the scope of the graph.

EDIT: On reflection, you might be referring to a language like Java which has
an ambient global garbage collector. Indeed, using indices instead of pointers
means you're on your own -- you've allocated the memory arena through the
standard means, but then you take on the responsibility of managing that
memory yourself. This is a fair criticism! Purely in my experience, data
modeled as a loose graph of directly-related objects is a lot more difficult
to understand and maintain than data modeled indirectly using some form of
identifier -- mostly because of the effects I mentioned in my earlier post on
associating new information to an entity.

[0] [https://words.steveklabnik.com/borrow-checking-escape-analys...](https://words.steveklabnik.com/borrow-checking-escape-analysis-and-the-generational-hypothesis)

~~~
skybrian
Yes, that's what I meant. Cleaning up unreferenced nodes (should you want to
do that) would require some kind of mini-garbage collection algorithm. And
indeed, git has a gc command to do this, even though reference counting would
work for a DAG. It's doable, but it's not what I'd call simple.
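
To illustrate the reference-counting remark only (this is not how git stores or collects objects), here is a rough sketch using `std::rc::Rc`. The `Commit` type and the `commit` helper are hypothetical. Because a DAG has no cycles, a node is freed as soon as the last `Rc` pointing at it goes away.

```rust
use std::rc::Rc;

/// A node in a commit-like DAG. Parents are shared, so they are held via `Rc`.
struct Commit {
    message: String,
    parents: Vec<Rc<Commit>>,
}

/// Creates a commit that keeps each of its parents alive.
fn commit(message: &str, parents: &[&Rc<Commit>]) -> Rc<Commit> {
    Rc::new(Commit {
        message: message.to_string(),
        // Cloning an `Rc` only bumps the reference count.
        parents: parents.iter().map(|p| Rc::clone(p)).collect(),
    })
}

/// Number of live references to `node` (handles plus children pointing at it).
fn live_references(node: &Rc<Commit>) -> usize {
    Rc::strong_count(node)
}
```

One caveat of this sketch: dropping a very long chain of commits recurses once per node, so a real implementation would likely need an iterative teardown.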

But if you know through some other means the exact time when a node should be
deleted, you can delete it at that time, and anyone following a soft reference
will find that it's no longer there, which may be a way of catching a bug.
This is how both databases and entity component systems work. But it does mean
that resolving a reference can fail, and you have to handle that somehow.
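
One common way to get this "soft reference" behaviour is a generational index: each slot carries a counter that is bumped on deletion, so a stale id stops resolving even after the slot is recycled. The sketch below is an assumption-laden illustration, with `Arena`, `Slot`, and `NodeRef` as invented names, not the API of any particular crate.

```rust
/// A soft reference: a slot index plus the generation it was issued for.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeRef {
    index: usize,
    generation: u32,
}

struct Slot<T> {
    generation: u32,
    value: Option<T>,
}

/// Stores values in reusable slots and hands out `NodeRef`s to them.
struct Arena<T> {
    slots: Vec<Slot<T>>,
    free: Vec<usize>,
}

impl<T> Arena<T> {
    fn new() -> Self {
        Arena { slots: Vec::new(), free: Vec::new() }
    }

    /// Inserts a value, recycling a free slot if one exists.
    fn insert(&mut self, value: T) -> NodeRef {
        if let Some(index) = self.free.pop() {
            let slot = &mut self.slots[index];
            slot.value = Some(value);
            NodeRef { index, generation: slot.generation }
        } else {
            self.slots.push(Slot { generation: 0, value: Some(value) });
            NodeRef { index: self.slots.len() - 1, generation: 0 }
        }
    }

    /// Deletes the value; outstanding references to it become stale.
    fn remove(&mut self, r: NodeRef) -> Option<T> {
        let slot = self.slots.get_mut(r.index)?;
        if slot.generation != r.generation {
            return None;
        }
        let value = slot.value.take()?;
        slot.generation += 1; // invalidates every NodeRef issued earlier
        self.free.push(r.index);
        Some(value)
    }

    /// Resolving can fail, so the caller must handle `None`.
    fn get(&self, r: NodeRef) -> Option<&T> {
        let slot = self.slots.get(r.index)?;
        if slot.generation == r.generation {
            slot.value.as_ref()
        } else {
            None
        }
    }
}
```

Here a dangling id shows up as `None` from `get` rather than as a read of whatever value later reused the slot.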

------
ernst_klim
Also a git implementation in pure OCaml, used for irmin git-based kv storage:

[https://github.com/mirage/ocaml-git](https://github.com/mirage/ocaml-git)

[https://github.com/mirage/irmin](https://github.com/mirage/irmin)

~~~
ioquatix
Here is an implementation of a Ruby-based kv store built on top of rugged,
which is built on top of libgit2:

[https://github.com/ioquatix/relaxo-model](https://github.com/ioquatix/relaxo-model)

It's so much fun, but not that practical for scalable websites.

~~~
ernst_klim
>It's so much fun, but not that practical for scalable websites.

Git-based kv stores have a somewhat different purpose than regular kv storage.
They are intended for communication between entities running in parallel, sort
of like transactional memory. They are not intended for storing users' data.

~~~
seanmcdirmid
That sounds more like a tuple space than a KV store?

~~~
ernst_klim
Not sure, but the idea is that you could not only read and write, but write in
parallel so the keys are merged according to the merge rule you've provided.
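
A rough sketch of what "merged according to the merge rule you've provided" could look like, written as a three-way merge over plain maps. This is not Irmin's or ocaml-git's API; `Store`, `merge_stores`, and `counter_rule` are hypothetical names, and deletions are deliberately not modelled (a key missing on one side is simply taken from the other).

```rust
use std::collections::{BTreeMap, BTreeSet};

type Store = BTreeMap<String, i64>;

/// Three-way merge of two stores that diverged from a common `base`.
/// `rule(base, ours, theirs)` is called only for keys both sides changed
/// to different values.
fn merge_stores(
    base: &Store,
    ours: &Store,
    theirs: &Store,
    rule: impl Fn(Option<i64>, i64, i64) -> i64,
) -> Store {
    let keys: BTreeSet<&String> = ours.keys().chain(theirs.keys()).collect();
    let mut merged = Store::new();
    for key in keys {
        let b = base.get(key).copied();
        let value = match (ours.get(key).copied(), theirs.get(key).copied()) {
            (Some(o), Some(t)) if o == t => o,       // both sides agree
            (Some(o), Some(t)) if b == Some(o) => t, // only theirs changed
            (Some(o), Some(t)) if b == Some(t) => o, // only ours changed
            (Some(o), Some(t)) => rule(b, o, t),     // real conflict
            (Some(o), None) => o,
            (None, Some(t)) => t,
            (None, None) => unreachable!("key came from one of the sides"),
        };
        merged.insert(key.clone(), value);
    }
    merged
}

/// Example rule for counters: apply both sides' increments to the base.
fn counter_rule(base: Option<i64>, ours: i64, theirs: i64) -> i64 {
    let base = base.unwrap_or(0);
    base + (ours - base) + (theirs - base)
}
```

With a base of 10, one writer at 13 and another at 15, `counter_rule` yields 18, so neither parallel write is lost.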

------
shafte
Not really related, but a quick plug for the work that Mercurial is doing to
port substantial portions of its main binary to Rust:
[https://www.mercurial-scm.org/wiki/OxidationPlan](https://www.mercurial-scm.org/wiki/OxidationPlan)

My understanding is that they want to get it fully ported before Python 2 EOL.

------
hannob
I don't see any info about a license.

Strongly recommend picking a standard FOSS license before plenty of people
add commits and clearing up the licensing situation later becomes a big mess.

~~~
littlestymaar
> Implementing git in rust for fun and education!

Also, not having a license file isn't a messy situation; it means “this
project is protected under Berne Convention copyright”: the author is the only
one holding every rights on the code and every use that is not explicitly
allowed is a copyright infringement (unless it's fair use).

~~~
Zarel
That sounds nice and all but it's wrong in two major ways.

First: GitHub has a Terms of Service which was somewhat-recently amended to
make this license grant explicit:

[https://help.github.com/en/articles/github-terms-of-service#...](https://help.github.com/en/articles/github-terms-of-service#5-license-grant-to-other-users)

"Any User-Generated Content you post publicly, including issues, comments, and
contributions to other Users' repositories, may be viewed by others. By
setting your repositories to be viewed publicly, you agree to allow others to
view and 'fork' your repositories (this means that others may make their own
copies of Content from your repositories in repositories they control)."

(Crucially, it doesn't require an open-source license, though.)

Second: even without that, there's such a thing as an implied license:

[https://en.wikipedia.org/wiki/Implied_license](https://en.wikipedia.org/wiki/Implied_license)

Like, if you write something down on a piece of paper, you can't then sue the
owner of the paper for copyright infringement.

Similarly, if you upload code to GitHub, and tell it to share your code, you
can't then sue them for sharing your code, ToS or no ToS.

~~~
masklinn
Pretty much all the TOS says is there's an implicit reproduction license
(other users can see & fork the work) and possibly broadcast (the fork itself
has the visibility of the original). Not adaptation, not use, not
exploitation, …

And that license grant is solely through github as a service, it's unclear
that a local clone is even permitted.

~~~
Zarel
That's... somewhat true. My main objection is to "the author is the only one
holding every rights on the code and every use that is not explicitly allowed
is a copyright infringement (unless it's fair use)".

The ToS doesn't say there's an implicit reproduction license, though; it says
there's an explicit reproduction license.

The other licenses can still be argued to be implicit. For instance, you have
a decent argument that local clones are an implicit license – GitHub provides
a "Clone or download" button directly on the repo page, and it's one of the
main use cases of GitHub. (Other arguments exist.)

~~~
littlestymaar
> The ToS doesn't say there's an implicit reproduction license, though; it
> says there's an explicit reproduction license.

Then it's totally excluded from my claim which aims «every use that is not
explicitly allowed». :)

------
_bxg1
Rust seems like a good language for git given its performance and memory-
safety, no?

~~~
aidenn0
Actually, any high-performance GC'd language would be fine too, because latency
is a non-issue for long-running git operations (you won't notice if your git
clone pauses for 100ms, whereas you will notice if your UI does). Throughput
of malloc() and GC'd languages tends to be similar when latency isn't a
concern.

~~~
_bxg1
I was speaking more to stability. Rust is designed to be an incredibly safe
language without sacrificing any performance; that seems like a good match for
a version-control system.

~~~
aidenn0
Most GC'd languages are also memory safe.

~~~
iknowstuff
FYI, Rust's safety goes beyond that. Its ownership model keeps you safe from
data races, unlike, to my knowledge, GC languages.
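
As a small illustration of that claim: in safe Rust, several threads cannot mutate the same value unless it is wrapped in something like `Arc<Mutex<_>>`; handing a plain `&mut` to more than one spawned thread is rejected at compile time. The function name `parallel_count` below is invented for this sketch.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the total.
fn parallel_count(threads: usize, increments: u64) -> u64 {
    // Arc gives shared ownership; Mutex serialises access to the value.
    let counter = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}
```

Note that this covers data races only; higher-level race conditions and deadlocks remain possible.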

~~~
aidenn0
IIRC Rust's safety is provided by affine types; all languages with affine or
linear types can provide the same guarantees. Clean and Mercury both come to
mind off the top of my head (IIRC Clean had "Concurrent" in its name at one
point), and I think there are both Haskell and F# variants with either affine
or linear types.

In addition there are many other solutions to safe parallelism and/or
concurrency, some of which don't require a type system at all; Erlang is
famous for safe concurrency and is untyped.

Lastly, there's good old fashioned multiprocessing which can be safe just by
not sharing memory.
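
A thread-based analogue of the share-nothing idea, sketched with the standard library's `mpsc` channel (real multiprocessing would use separate OS processes; this is only an approximation). Each worker owns its chunk outright and communicates solely by sending a result. `sum_in_workers` is a hypothetical helper.

```rust
use std::sync::mpsc;
use std::thread;

/// Sums each chunk in its own thread; workers share no mutable state.
fn sum_in_workers(chunks: Vec<Vec<u64>>) -> u64 {
    let (tx, rx) = mpsc::channel();

    for chunk in chunks {
        let tx = tx.clone();
        // `move` transfers ownership of `chunk` into the worker.
        thread::spawn(move || {
            let partial: u64 = chunk.iter().sum();
            tx.send(partial).expect("receiver should still be alive");
        });
    }

    // Drop the original sender so the receiving iterator ends
    // once every worker has finished and dropped its clone.
    drop(tx);
    rx.iter().sum()
}
```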

There is no one feature that is new in Rust, but it has a relatively unique
set of features in the non-GC language world; ATS is the only one coming to
mind, though I'm sure there are some other niche ones.

I love this combination in rust because latency sensitive operations in GCd
languages are notoriously hard to achieve. Lisp was able to be an operating
system because nobody needed to run quake at 100fps on a lisp machine. With GC
you can pick latency or throughput but can't reliably get both without coding
around the GC.

This does mean for me that when considering things that rust is particularly
good at, latency sensitive applications stand out; this is not to say it's bad
at non-latency sensitive applications, just that one has a lot more choices
when latency is a non-issue.

------
d33
I like the changelog. Looks like a zero-effort way to publish something other
people could make use of.

~~~
chx
No, currently they can't as there is no open source license.

~~~
cyborgx7
I used it anyway. Sue me.

------
mises
Can somebody tell me what is behind the recent rewrite-all-the-things-in-rust
craze? I get it can have some benefits in terms of security, but it seems
rewriting so many things in it just for the sake of it is a bit excessive.

I understand some of these are very likely for educational purposes (like this
one and others; it's good for getting more familiar with the language), but it
still seems to be a bit of a strange trend (especially since people who don't
need to learn are doing it, seemingly just because "yay rust").

~~~
jchw
I scrolled down for this comment.

Actually there's a lot of great reasons to rewrite everything in every
language. Git is an especially good piece of software to implement everywhere
because it's relatively stable and it's pretty useful.

As for _actual_ reasons, one good example is so you can keep your dependencies
in the language, using the language package manager. For Go nobody even
questions that this is worth it; it enables painless cross compiling and
completely static, libc-free binaries. For Rust that may not be a thing, but
you do at least get the benefits that you could integrate Git functionality
without having to hack around in porcelain.

This one here is a learning experience by its own description, but I would
suggest people stop complaining about "rewriting everything" in $LANGUAGE. The
opposite complaint is often cited as a reason _why to not_ use the language
(that, for example, basic programs haven't already been ported.) If we did
build an alternate world with feature parity, unit testing, optimizations, in
a memory safe language, I doubt many people would be complaining about the
strange trend of rewriting things anymore.

~~~
mises
That's a good point; you're right that it's helpful to be able to interface
with such a ubiquitous program natively in a language of choice. I did see a
mention on the page of deploying as a crate and using in another program; that
seems very convenient.

> stop complaining

Not a complaint; more a question as to why there's a specific move around
rust. I appreciate the reply; that's exactly the kind of response I was
looking for.

~~~
jchw
Fair enough. There's definitely a lot of folks who do get upset over this
stuff, maybe as pushback to the new language hype.

------
ryanolsonx
I started one in PHP (because why not). It's pretty fun!

------
chris_mc
I would like to see some sources like this that are language agnostic that
give you the tools needed to implement your own popular tool. For example,
where could I look to find a written description of the way git works from the
ground up? Kind of like a "guide to implementing X" type of thing, but without
code.

------
SilasX
What's with all those posts about "X implemented in Rust" today?

------
sneakernets
I'm still waiting on a C64 port.

------
qualsiasi
Am I the only one who thinks of this article after reading the whole thread?
[https://overreacted.io/name-it-and-they-will-come/](https://overreacted.io/name-it-and-they-will-come/)

HN discussion:
[https://news.ycombinator.com/item?id=19485609](https://news.ycombinator.com/item?id=19485609)

------
OpenBSD-supreme
>rust this

>rust that

Why re-invent the wheel? Ada is superior. :^)

------
k0t0n0
Awesome! I was really interested in how one could go about implementing a
git-like system. Nice work, OP.

