Announcing Building Git (jcoglan.com)
291 points by griffinmb 9 days ago | 52 comments

"There’s a huge section of the tech ecosystem that’s constantly told they’re not smart enough to be here and that their work doesn’t matter. I spent a decade hearing C was beyond mere mortals, that you must be a genius to go anywhere near low-level code, or algorithms, or distributed systems. The inventor of Git is notorious for pushing this narrative! But the truth is, anyone with enough brains and patience to learn how to do any kind of computing is “smart enough” to learn things like this."

As someone who spent the first decade or so of my career as an embedded software developer, I wanted to move into web-based systems because it's a "bigger world". I'm a couple of jobs into this bigger world now, and it just keeps getting bigger. I feel the need to keep my finger on the pulse of new tools and technologies, but I try to hold that in tension with the advice we get here to keep things simple, be pragmatic, and not adopt technologies or techniques just because Google does. I've always fancied myself a generalist, but this bigger world makes me feel a bit like Bilbo Baggins, "like butter spread over too much bread."

My point: I've been in both worlds, and I feel more at sea working on internet-based systems. I had more mastery over the embedded systems I worked on than over the web-based systems I work on now. So my experience supports the author's point that ...

"anyone with enough brains and patience to learn how to do any kind of computing is “smart enough” to learn things like this."

Moreover, I would suggest that the average full-stack web developer is probably over-qualified.

The thing with web-development is that you don’t actually have to follow the trends to be productive.

A few years ago I read an article on LinkedIn about how a local development company helped our neighbouring municipality switch from ASP Web-Forms to the first version of AngularJS and Web-API, because it was a more modern tech that would make them future proof, especially with regard to hiring.

I laughed at it at the time, and sure enough, they’re in trouble now because AngularJS didn’t actually stick around for long enough for anyone to really learn it. Whereas the newest versions of ASP .Net Core are actually being built in a format that is extremely similar to ASP web-forms.

I’d like to say that I don’t think you should actually use ASP web-forms if you can avoid it, because it is dated, but it still runs half the internet in my region of the world and it’ll probably continue to outlive a lot of trendy techs for the foreseeable future.

Our rule is that if something changes radically every few years, it’s not really worth adopting into our generalist stack, because we can’t afford to waste resources re-learning React or Angular or X every two years.

Since it seems that he used Ruby throughout the book, I wonder if that's the right choice or it will limit the book's audience.

I'm contemplating writing a long "from scratch" style book on another topic and I'm still not sure which language I should use. Using more than one (Java, Python,...) and generating multiple flavours of the book could be a good idea I think.

According to the author, readers of the book are already translating its Git implementation into C++, Clojure, Elixir, Haskell, Java, Node, Rust, and Swift.


> Using more than one (Java, Python,...) and generating multiple flavours of the book could be a good idea I think.

I used to think so too, but tbh I think this will result in either a lot more work than you would like, or your audience ending up with one language that is used idiomatically while the rest are not.

Yeah that's pretty much the case with Modern Compiler Implementation in {ML, Java, C} by Appel. The Java and C ones are basically transliterated ML and not everyone appreciates that.

On the one hand, people do buy the Java and C books. On the other hand, I think they eventually realize it's probably easier to learn an ML dialect and learn the "real" code.

This may or may not apply to the git book though, since doing it in Ruby is already not "idiomatic".

I don't really think the choice of language matters that much.

I don't know Ruby at all but I have read through part of the book and can follow the examples pretty easily. Plus, I think I'll get more out of the book if I just use his examples as guidelines and have to do some mental work to translate them into the language I'm using.

I've been working through this book over the past few days, and writing my own git implementation in Rust in parallel, and I've found Ruby to be much simpler from an implementation standpoint. Ruby's standard library provides a lot of useful functionality out of the box (SHA1 digests, the deflate algorithm) which I've had to pull in dependencies for in my Rust version.

Obviously the Rust version is significantly faster, but the Ruby version is higher level and better expresses the concepts the book is trying to demonstrate without getting too bogged down in syntax, types, etc.
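To make the standard-library point concrete, here is a minimal sketch of hashing and compressing a Git blob using only Ruby's bundled `digest` and `zlib` libraries. The helper names (`blob_object`, `blob_id`, `compressed_blob`) are made up for illustration and are not taken from the book; the `blob <size>\0<content>` layout is Git's standard blob object format.

```ruby
require "digest/sha1"
require "zlib"

# Hypothetical helper: the bytes Git hashes for a blob are a header
# ("blob", a space, the content size in bytes, a NUL) followed by the content.
def blob_object(content)
  bytes = content.b # work on raw bytes so multi-byte characters are sized correctly
  "blob #{bytes.bytesize}\0".b + bytes
end

# Hypothetical helper: the object ID is the SHA-1 hex digest of those bytes.
def blob_id(content)
  Digest::SHA1.hexdigest(blob_object(content))
end

# Hypothetical helper: objects are stored on disk deflate-compressed.
def compressed_blob(content)
  Zlib::Deflate.deflate(blob_object(content))
end

blob_id("") # => "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" (the well-known empty blob ID)
```

Both pieces come with a stock Ruby install, which is presumably the convenience the comment above is describing.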

The author discusses this in episode 13 (2019-03-24) of the podcast _The Yak Shave_ at 22:37: http://yakshave.fm/13

From what I understand, Ruby is highly accessible and has a standard library that covers what's used in the book.

As a rubyist, I wish more books were written in it. I suspect it's one of the easier languages to read as pseudo-code, even with less familiarity.

Just write it in pseudocode.

Real programming languages introduce too much accidental complexity (memory management, type declarations, etc.) that usually has nothing to do with the subject matter. (Unless you’re writing a book about that particular language, of course.)

It will also look bad if you happen to choose a language/framework that will be dead 10 years from now, even if the core concepts of your book would stay relevant.

> too much accidental complexity (memory management, type declarations, etc.)

I know those were just examples, but that might hint at why ruby is a fine choice here. It has neither explicit memory management nor type declarations. It's pretty close to pseudocode already.

By writing in pseudocode you pretty much guarantee there will be bugs in the implementation, because the code has never been executed.

If you can read Python you can almost read Ruby code

Ruby has blocks. Python does not. And Ruby blocks are very extensively used.

blocks are the same as passing a function or a lambda in python:

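    # Ruby: pass a block to map
    [1, 2, 3].map { |i| i + 1 }

    # Python: list comprehension
    [i + 1 for i in [1, 2, 3]]

    # Python: named function passed to map (map is lazy in Python 3)
    def plus(i):
      return i + 1

    list(map(plus, [1, 2, 3]))

    # Python: lambda passed to map
    list(map(lambda i: i + 1, [1, 2, 3]))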

Far from it. Ruby blocks aren't really lambdas or anonymous functions. For example, when you use return inside a block, it doesn't just return from the block; it returns from the whole enclosing method. And you don't really "call" a block but yield to it. It's not even an object that you can inspect. It's syntax. It's really a new kind of control structure. Ruby blocks have few equivalents in other languages.
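A small sketch of the `return` behaviour described above, with made-up method names (`first_even`, `lambda_return`): `return` inside a block exits the enclosing method, while `return` inside a lambda only exits the lambda.

```ruby
# `return` inside the block leaves first_even itself, not just the block.
def first_even(numbers)
  numbers.each do |n|
    return n if n.even?
  end
  nil # reached only when no even number was found
end

# `return` inside a lambda only leaves the lambda, so the method keeps going.
def lambda_return
  l = lambda { return :from_lambda }
  l.call
  :after_lambda
end

first_even([1, 3, 4, 5]) # => 4
lambda_return            # => :after_lambda
```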

   # Return the class of whatever block was passed in
   def f(&block)
     block.class
   end

   f { } # => Proc
   f(&-> { }) # => Proc
   f(&lambda { }) # => Proc
   f(&:something) # => Proc

   lambda { }.class # => Proc (I think you get the idea)

   method(:f).class # => Method

   lambda { |i| i + 1 }.call(10) # => 11

You don't typically use return in a block; the value of the last expression is returned implicitly. You would use "next" or "break" depending on what you want to achieve. As far as I know, that's the main difference between procs and methods.

(not that anybody would do what I do, most people just use blocks, some people use yield, few people use the &block argument, but that's exactly the same)
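For readers less familiar with Ruby, a minimal sketch of the points in this comment, using made-up method names: `yield` and an explicit `&block` parameter invoke the same block, `next` ends only the current block invocation, and `break` stops the iterating method altogether.

```ruby
# Two ways to invoke the block a caller passes in.
def with_yield
  yield 1
end

def with_block(&block)
  block.call(1)
end

# `next` finishes this invocation of the block and supplies its value.
def doubled_odds(numbers)
  numbers.map do |n|
    next 0 if n.even?
    n * 2
  end
end

# `break` ends the whole `each` call, which then evaluates to the given value.
def first_big(numbers)
  numbers.each do |n|
    break n if n > 10
  end
end

with_yield { |x| x + 1 }  # => 2
with_block { |x| x + 1 }  # => 2
doubled_odds([1, 2, 3])   # => [2, 0, 6]
first_big([1, 20, 30])    # => 20
```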

Sounds a lot like inlined lambdas in Kotlin. Lambdas can be passed with the same block-like syntax, and when they're marked as "inline" you can return from the enclosing function from within the lambda (because the lambda will have its body inlined by the compiler).

This will likely be just another book on the long list of books I own but never get through, but it's hard to ignore a 700-page book on a topic both esoteric and pragmatic (I use git daily but have very little knowledge of it beyond what commands to Google). And one in Ruby -- a language I almost never use today, or really see, but definitely grew up with.

Yeah I'm interested in this topic too, but there's very little chance that I will go through a 700 page e-book.

I'm probably old-fashioned, but it seems like too much "screen time".

I have about five 500- to 1000-page dead-tree compiler books, and many (most) of them have sat around unread for months or even a few years -- at first.

But over time I've managed to get through significant portions of them. That's probably because they are lying around near my couch. The sight of them reminds me why I bought them :) I find the paper books more ergonomic to skim.

In contrast, I feel like an e-book would just get lost on my iPad. This feedback might not be actionable but I thought I'd share...

The desire to buy more books than you can ever read is so universal, there's a Japanese word for it: Tsundoku

I wonder what the Japanese is for endless scrolling of a Kindle directory?


The free chapter "Reshaping history" looks like a reprint of the publicly available information about rebasing from the Git Book. It doesn't really showcase the main book premise of teaching fundamental data structures and engineering concepts used in Git.

if you go through the files in the repo they are very easy to read (I would recommend starting with the test/ folder): https://github.com/jcoglan/jit

obviously the lib folder requires a lot of git-specific knowledge, but I didn't see any use of features that don't have an exact or very similar equivalent in more popular programming languages.

I think the most "complex" might be things like Struct.new(*Inputs::ATTRS) where it's creating a new anonymous class from a list of attributes which becomes the class instance attributes (nothing crazy :) )
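For anyone who hasn't seen that idiom, here is a minimal sketch of building a class from a list of attribute names with `Struct.new`. The `Example` module, the `ATTRS` list and the `Entry` class are hypothetical stand-ins, not the actual definitions from the jit repository.

```ruby
module Example
  # Hypothetical attribute list; the splat passes each symbol as its own argument.
  ATTRS = [:name, :mode]

  # Struct.new generates a class with a constructor and accessors for each attribute.
  Entry = Struct.new(*ATTRS)
end

entry = Example::Entry.new("README", 0o100644)
entry.name              # => "README"
Example::Entry.members  # => [:name, :mode]
```

The rough equivalent in other languages would be a record or data class whose fields are generated from a list.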

There's a post on HN of Linus's first version of git (that I can't find). It's short.

There's another reason git is a great choice as a large project to build: you can get something that works very quickly, and then just add to it. Because that is how git was developed. It's like Gall's Law, but evolving only additively. And although Linus was inspired by BitKeeper, it was stripped down to its nakedness, until there was no longer anything to take away (like Exupery).

Also why git's UI design is so terrible.

St.-Exupéry’s quotation is about a plane that is perfect and streamlined, so precisely suited to its purpose that it feels like an extension of the pilot’s body.

That is not at all the same thing as a hacky prototype just starting out on its journey towards accreting random code detritus.

You're right that git isn't an extension of anyone's body - but few engineering projects, where this quote is often applied, are.

But you're wrong to describe his prototype code as "hacky" - especially when I'm referring to the design - which really is pared down to the essentials. Even "streamlined" and "precisely suited to its purpose". It's also not "just starting out", in the sense that the design was informed by BitKeeper, just much simpler.

When you get your data structures right, the code is obvious (Brooks).

You're right to say "accreting". I'm not sure about "random", but you're wrong to call the code "detritus". The engineering on git is really top-notch, extremely performant, but only where needed; and with the most helpful error messages I've ever seen. It has many great aside features, like git diff --color-words. I'm not a git expert; I expect there's much more.

PS you're also right it's Saint-Exupéry, not Exupery.

Maybe the post was Fabien Sanglard's Git code review:


Thanks for checking, but that's not it. It was posted on HN.

I found Git's initial commit (https://news.ycombinator.com/item?id=8650483) which is a repost, linking to https://github.com/git/git/tree/e83c5163316f89bfbde7d9ab23ca...

But it's more complex than I recall, and is dated Apr 8, 2005, whereas Linus thinks he did the very first version Apr 3. https://www.spinics.net/lists/git/msg24141.html

This book looks really awesome and I want to follow along with it in Java. I don't have the money to buy the book currently being a university student. In the style of r/RandomActsOfPizza, would it be possible for someone to do a RandomActsofBook and buy this book for me? I could pay the money back to them in a couple of months or pay it forward...whichever they would prefer.

I can get a copy for you. Let me know what email to send you the ebook. I only ask for 2 things. 1) try to complete as much of the book as you can (it’s 700 pages!), and 2) pay it forward when you have the means. Cheers!

Thank you so much. I am about to start working in July, so I will be sure to pay it forward once I am a little more financially stable. I have some time to kill before then, which I plan to use to dive into this book. My email is: f"rishi25m{chr(64)}gmail{chr(46)}com" (This is a Python 3 f-string)

Sent you the download link. Let me know if you have any problems with downloading.

Hey can I buy the book through you? My card gets rejected, but I know my card is fine, it's the payment service issue. Yesterday I sent an email and a tweet[0] to the author, but he didn't reply. Basically, I want to pay you £36, you send me the pdf and buy it again. If you want to help me out, then send me an email. My address is in the profile.

[0]: https://twitter.com/pau1riddle/status/1116205409780600833

Sent you an email to make sure I got the right address. Please reply when you get a chance.

The book's code is implemented in Ruby, and the source is here; according to the author, it is around 6000 lines (haven't checked myself).


If you'd really like to make a Java equivalent of any specific algorithm you can start even without a book.

I guess that anybody not versed in Ruby would first have to understand a lot of Ruby details. The author mentions other efforts relatively similar to his, in other languages, like http://gitlet.maryrosecook.com/docs/gitlet.html which is Node JavaScript.

I know the author has very graciously provided the source code. I would just like to read through the book to know what kind of reasoning exists behind each of the parts in the author's own words.

This looks extremely interesting. It's also one of those things I'd love to read as a physical book, but I can't seem to find that option.

Oh well, e-readers are okay too I suppose.

Hopefully this being on the front page will bump up that 320 books sold number a bit!

Looks very interesting.

Was it definitely 320 when you wrote this comment? It says 350 now. Hopefully because people bought it.

Yup, it's 380 now, so it's either an interactive counter or the author just keeps updating it.

the linked repository is amazing, I don't think I ever saw such nice code and such neat organization: https://github.com/jcoglan/jit (everything uses standard ruby libraries or default gems like minitest)

I'm trying it out and it doesn't work very well on macOS (git add has an issue with the lockfile, but all tests pass).

I might use it as it's a very easily customizable implementation of git.

Sounds pretty cool, a 708 page book detailing how to build Git from scratch in all its glory. I just wish I had time to go over the entire book.

This looks really nice, both because it covers Git in a new way, and that it approaches solving a large problem with an eye towards explaining how to do it. It really reminds me of Donald Knuth's work on literate programming, in both senses.

I have been working through this book in Rust, and I have to say, the book is a real joy. Extremely clear writing, very good pacing, and excellent explanations of technical topics.

If you're at all interested you should buy this.

There is an article showing how to write git using Python called "Write yourself a Git" [0] (HN post here: [1]).

This one shows the fundamentals of git in far less than 700 pages. In fact you can go through it and understand git in an hour or two. For me this seems much more worthwhile than seeing in depth how all the git porcelain commands are done.

[0] https://wyag.thb.lt/ [1] https://news.ycombinator.com/item?id=19386141

Git written in Ruby, it's a no from me.

My understanding is that GitHub implemented git in Ruby to make their Rails site work better.

github uses libgit2 https://github.com/libgit2/libgit2 (in C) but I'm pretty sure they have a lot of wrappers around it (maybe a ruby gem with native extensions, maybe using ffi)

edit: ruby gem: https://github.com/libgit2/rugged which has a native extension written in C to bind libgit2 functions to ruby methods.

They almost surely did not implement Git in Ruby, but merely wrote bindings to it. I remember reading through their Ruby source code for this (Edit: found it, it's called Rugged). But even then, you need a more efficient API layer for this binding to be practical for programmers, hence libgit2.
