Gitlet.js – Git implemented in 1k lines of JavaScript (2015)

baby · on April 13, 2021

So, I'm going to put that out there, but I really wish I could read security-sensitive code like this. TLS implementations, cryptographic implementations, etc. It would be really amazing if there was a way for GitHub to automatically display code like this by extracting the comments.

In general I think it's an amazing way of learning, I remember reading the gobyexample.com website and writing the same thing for go assembly: https://davidwong.fr/goasm/add

dale_glass · on April 13, 2021

Comments are good, this particular instance, not so much.

This description spends a lot of time stating the blindingly obvious, eg:

    if (addedFiles.length === 0) {
      // Abort if no files matched path.
      throw new Error(files.pathFromRepoRoot(path) + " did not match any files");
    }

Or:

    // If --bare was passed, write to the Git config indicating that the repository is bare.
    // If --bare was not passed, write to the Git config saying the repository is not bare.
    config: config.objToStr({ core: { "": { bare: opts.bare === true }}}),

This really adds no information if the names are clear and you can read JS. The important thing isn't what is happening in the code, because you can already see that in the code. It's WHY. For instance, it'd be far more useful to explain what a 'bare repository' is, what its implications are, and why we are keeping track of that data, than to say "If it's bare, we write in the file that it's bare".
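As a rough illustration of the difference, here is a sketch of what a "why" comment for that line might look like. The `initialConfig` helper is hypothetical and is not Gitlet's actual code; the comment text summarises how bare repositories are generally used in Git.

```javascript
// A bare repository has no working copy: it holds only what would normally
// live inside `.git/`. Servers typically use bare repositories as push
// targets, because pushing into a branch that someone has checked out would
// leave their working files out of step with the branch. Commands that touch
// the working copy (add, commit, checkout) need to know this so they can
// refuse to run, which is why the flag is recorded when the repo is created.
function initialConfig(opts) {
  // `opts.bare` may be undefined; compare with true so a boolean is stored.
  return { core: { "": { bare: opts.bare === true } } };
}

console.log(JSON.stringify(initialConfig({ bare: true })));
console.log(JSON.stringify(initialConfig({})));
```

The code is the same one-liner as before; only the comment changed, and it now tells a reader something the code cannot.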

I feel that explaining TLS and similar in this manner would be completely unhelpful. Check out this document, for instance, specifically the "Algorithm for crypt" part:

https://www.akkadia.org/drepper/SHA-crypt.txt

Very well explained. But try and tell me what the purpose is and why those specific steps are there, which look increasingly bizarre as you delve in, and whether deviating from any given step would be acceptable or not (ignoring matters of compatibility).

baby · on April 14, 2021

You don’t have to explain TLS in this manner, but you could comment the relevant specification paragraphs next to the code to facilitate audits.

pmiller2 · on April 13, 2021

As @chungy says in a sibling comment, this is what's known as a literate program. It never really took off, and I suspect it's partially because doing it well means you'll be writing a lot more text than you'd typically see in a program's comments.

See http://www.literateprogramming.com/ for more information and examples.

jacobolus · on April 13, 2021

The particular method of turning javascript comments into a "literate" source view, used in the link under discussion here, comes from Jeremy Ashkenas circa 2009, e.g.:

https://underscorejs.org/docs/underscore-esm.html

https://coffeescript.org/annotated-source/lexer.html

https://backbonejs.org/docs/backbone.html

bollu · on April 13, 2021

What is the tool that builds this literate source view from source code?

pmiller2 · on April 13, 2021

My guess is it's Docco: http://ashkenas.com/docco/
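For anyone curious how such a view can be produced, here is a minimal sketch of the general idea: pair each run of `//` comment lines with the code that follows it. This is only an illustration under that assumption, not Docco's actual implementation, and `toSections` is a made-up name.

```javascript
// Split JavaScript source into sections of { prose, code }.
// Each run of `//` comment lines becomes the prose for the code after it.
function toSections(source) {
  const sections = [];
  let current = { prose: [], code: [] };
  for (const line of source.split("\n")) {
    const match = line.match(/^\s*\/\/\s?(.*)$/);
    if (match) {
      // A comment that follows some code starts a new section.
      if (current.code.length > 0) {
        sections.push(current);
        current = { prose: [], code: [] };
      }
      current.prose.push(match[1]);
    } else {
      current.code.push(line);
    }
  }
  if (current.prose.length > 0 || current.code.length > 0) {
    sections.push(current);
  }
  return sections.map((s) => ({
    prose: s.prose.join(" "),
    code: s.code.join("\n"),
  }));
}

const example = [
  "// Abort if no files matched path.",
  "if (addedFiles.length === 0) {",
  "  throw new Error('no match');",
  "}",
  "// Otherwise, update the index.",
  "addedFiles.forEach(update);",
].join("\n");

console.log(toSections(example));
```

A renderer would then put each section's prose in the left column and its code in the right column. Block comments and trailing comments are ignored here to keep the sketch short.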

Varriount · on April 13, 2021

Huh. I actually write my code in this style. Perhaps not quite as detailed, but with the same idea in mind.

I wish code like this was more common. Over and over, I've heard fellow programmers reply with something along the lines of "well it should be obvious what it does! just look at the name and implementation - all you need is right there", when it was suggested that they add documentation to their implementations. What never seems to quite get across fully is that, yes, the code is all there, and it may even be clean and fairly readable, but the inherent complexity is such that only programmers experienced in subject X will ever have a chance of actually understanding it.

"So, what? If it's that complex, then maybe only programmers who are that experienced should attempt to read and/or contribute to it" is another sentiment I've seen expressed. Of course, this misses the logic that, in order for such programmers to exist, they have to learn from somewhere, and start from something.

And how will we ever have programmers experienced in that subject

krageon · on April 13, 2021

Yes, they have to start somewhere. That somewhere is reading and working with the code they want to read slowly.

kevin_thibedeau · on April 13, 2021

Try and read the source to Metafont. It's a scattered morass with blocks of code broken out from their point of usage for expository purposes.

colejohnson66 · on April 13, 2021

Or the OG TeX for that matter. Knuth writes his code in WEB (his own literate programming language based on Pascal)

baby · on April 13, 2021

I think that this is something else, it's a simple difference but having the code on the size changes everything for me.

platinumrad · on April 13, 2021

> I think that this is something else, it's a simple difference but having the code on the size changes everything for me.

I assume you mean "on the side" and I agree, it's a game-changer. Normally I find literate programs to be tedious to read but simply changing the layout makes a huge difference.

chungy · on April 13, 2021

It's a very old idea. Donald Knuth implemented it with both Pascal and C to promote an idea of "literate programming" -- you'd literally just write prose sprinkled with little bits of programming. On compilation, you can choose to compile either the documentation or program.

truthr · on April 13, 2021

The deep learning library fastai uses this. Their source code is also their documentation. They use something called "nbdev" to accomplish this. And I have also seen other projects both related and unrelated to deep learning adopt it. This idea stems from Knuth.

someoldguy · on April 13, 2021

For an example, the PBRT book is great.

WalterGR · on April 13, 2021

PBRT is Physically Based Rendering: From Theory To Implementation by Matt Pharr, Wenzel Jakob, Greg Humphreys.

https://www.pbrt.org

https://www.pbr-book.org

https://www.amazon.com/Physically-Based-Rendering-Theory-Imp...

jonas21 · on April 13, 2021

> This book has deservedly won an Academy Award. I believe it should also be nominated for a Pulitzer Prize — Donald Knuth

It's hard to beat that book review.

tlarkworthy · on April 13, 2021

Here is some JWT minting and parsing you can run in the browser https://observablehq.com/@tomlarkworthy/firebase-admin ported from the MIT-licensed firebase-admin GitHub repo.

OAuth 2.0 clients https://observablehq.com/@tomlarkworthy/oauth-examples

This is a WIP, but it's a full-on Identity Provider implemented in the browser in a literate programming env! https://observablehq.com/@endpointservices/auth

Mistri · on April 13, 2021

We have a project at UC Berkeley for CS 61B (undergrad data structures) called "Gitlet", where we need to make a git implementation in Java. Was super fun!

If you're curious: https://inst.eecs.berkeley.edu/~cs61b/sp20/materials/proj/pr...

gnulinux · on April 13, 2021

Ah, Hilfinger memories.

Mistri · on April 13, 2021

Haha yep, he's still teaching. My semester of instruction from Prof Hilfinger was cut in half thanks to the pandemic though :(

enricozb · on April 13, 2021

There is also Shit, a git implementation in an almost POSIX shell [0]. Not my software.

[0]: https://git.sr.ht/~sircmpwn/shit

baby · on April 13, 2021

Every time I look at one of these custom git repositories I'm confused as to how I can read the code. For inept people like me, it's here: https://git.sr.ht/~sircmpwn/shit/tree

colejohnson66 · on April 13, 2021

The term “tree” is pretty common in source control systems. So, for me, I just look for a link to the “tree”, which you managed to find.

Just putting it out there for future readers.

dang · on April 13, 2021

Discussed at the time:

Gitlet: Git implemented in JavaScript - https://news.ycombinator.com/item?id=8931984 - Jan 2015 (66 comments)

Also: Git implementation in 1k lines of Node.js - https://news.ycombinator.com/item?id=16453979 - Feb 2018 (2 comments)

primitivesuave · on April 13, 2021

This is so cool, and also my first time seeing this brilliant format for long-form explanations of code. Is this a thing out there, and does anyone have any pointers on libraries that do this?

NobodyNada · on April 13, 2021

It’s called literate programming, invented by Donald Knuth: https://en.m.wikipedia.org/wiki/Literate_programming

fs_tab · on April 13, 2021

Backbone.js comes to mind: https://backbonejs.org/docs/backbone.html.

Edit: The author of backbone.js has a tool that generates documentation in this style. http://ashkenas.com/docco/

momothereal · on April 13, 2021

In the 'real' Git, one of the functions I use frequently is `git checkout -- .`, which discards any unstaged changes. Looking at Gitlet's implementation, it wouldn't handle that well as `checkout` can only take a ref.

So I guess I'm wondering what the real `git checkout -- .` actually does behind the scenes?

ufo · on April 13, 2021

FWIW, since Git 2.23 I prefer using "git switch" and "git restore" instead of "git checkout".

40four · on April 13, 2021

This is correct, ‘git restore’ is the way to do this in the current version of git. ‘checkout --’ still works, but the ‘help’ text in ‘git status’ now recommends using ‘restore’.

Not only is it more semantic, it reduces the complexity of ‘git checkout’ which is a common complaint.

Really, it doesn’t make much sense to tack some random ‘--’ flag onto ‘checkout’ for that functionality. ‘restore’ is clear and describes what it does.

JimDabell · on April 13, 2021

> ‘git restore’ is the way to do this in the current version of git.

No, that’s incorrect. `git checkout` is the way to do this in the current version of Git.

The current version of Git allows you to use `git restore`, but the documentation says this about it:

> THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.

Until that warning goes away in a future version of Git, `git checkout` is the way to do this.

40four · on April 13, 2021

Fair point, I had honestly never read the docs page on restore.

But, I still have to disagree. They may not have landed on the final shape of the API, but they are clearly encouraging people to use restore for this purpose.

There is no ‘warning’ message in the CLI. In fact the opposite, the help text in every ‘git status’ message explicitly states to use ‘restore’ for the case of resetting the changes of a single file, where it used to say use ‘checkout --’.

Yeah I’m used to using checkout -- like the rest of us, and sure we can all still do it if we want. But I would never recommend ‘checkout -- filename’ to anyone in 2021. It doesn’t make any sense that that would be how you do that. It’s confusing trying to explain it to someone.

“Yeah I know you usually use this to completely change branches, or go to a particular commit hash. But if you use this weird ‘--’ flag & a file name, it does a mini ‘reset’ on that one file”

Why would you use some random flag on a command (checkout) that is typically used for a totally different use case? If anything, this should be a flag on ‘reset’.

I’m not a git ’complexity hater’, but it’s a perfect example of why a lot of people complain about git.

JimDabell · on April 13, 2021

> the help text in every ‘git status’ message explicitly states to use ‘restore’

Good point. This makes no sense. There should either be a warning in the documentation and `git status` should not recommend it, or there should not be a warning in the documentation and `git status` should recommend it. It’s either ready for use or not, it shouldn’t be inconsistent about it.

I agree it’s a positive change, but the current state is ambiguous when it shouldn’t be.

You don’t need the “weird -- flag” in most cases, by the way, it’s not Git to blame for that, and it’s the same for both checkout and restore. The `--` is the standard POSIX way to distinguish between options and filenames. If there’s no ambiguity (e.g. `git checkout .`) then you can skip it. You only need it if the pathspec could be interpreted as an option.
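To make the convention concrete, here is a small sketch of how a command-line parser can treat `--`. The `splitArgs` function is hypothetical and far simpler than Git's real option parsing; it only shows why a file that looks like an option needs the separator.

```javascript
// Split argv into options and operands. Everything after a bare `--`
// is treated as an operand (e.g. a filename), even if it starts with a dash.
function splitArgs(argv) {
  const options = [];
  const operands = [];
  let optionsEnded = false;
  for (const arg of argv) {
    if (!optionsEnded && arg === "--") {
      optionsEnded = true; // the separator itself is consumed
    } else if (!optionsEnded && arg.startsWith("-") && arg !== "-") {
      options.push(arg);
    } else {
      operands.push(arg);
    }
  }
  return { options, operands };
}

// A file literally named "-f" is read as an option unless `--` precedes it.
console.log(splitArgs(["-f"]));       // options: ["-f"], operands: []
console.log(splitArgs(["--", "-f"])); // options: [],     operands: ["-f"]
// "." cannot be mistaken for an option, so the separator is optional.
console.log(splitArgs(["."]));        // options: [],     operands: ["."]
```

In Git the same separator also keeps revision names and paths apart, so it is still useful when a file happens to share its name with a branch.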

40four · on April 13, 2021

Thanks for the explanation, I did not realize that about --, good to know!

masklinn · on April 13, 2021

> Not only is it more semantic, it reduces the complexity of ‘git checkout’ which is a common complaint.

It doesn’t since the feature can’t be removed from git checkout for BC reasons.

mlyle · on April 13, 2021

It does, because it sounded like he was concerned with the complexity of explaining the broadness of uses for git checkout.

Now we can treat git restore like it is a separate thing.

40four · on April 13, 2021

This is a better way to say it, and it's what I meant. Thanks.

rocketbop · on April 13, 2021

I don't see why it couldn't be removed from a future release after a period of deprecation.

masklinn · on April 13, 2021

Because there’s a million aliases out there which alias « git checkout -- . » to something sensible and there is no real justification for breaking them: checkout can trivially delegate to restore.

nyanpasu64 · on April 13, 2021

I have a Git alias for `rs = restore -SWs`. `git rs commit :` more-or-less mimics what `git checkout branch :` did. (BTW, : means "repository root, regardless of current .")

svnpenn · on April 13, 2021

My problem with restore is that it doesn't print anything. Checkout tells you how many files were restored.

40four · on April 13, 2021

Ok, I re-read your command, and realized you’re doing ‘checkout’ minus minus period (to be explicit).

Honestly it had never even occurred to me to use the period there, but it does work :) I always used it for single files.

As we said in other comments, that’s why they introduced ‘restore’.

But, if you’re looking to reset all changes, in every file back to the state of the HEAD, then I believe ‘git reset --hard’ would be the appropriate tool!

ptbrowne · on April 13, 2021

The ref would be HEAD (the last commit) here.

yewenjie · on April 13, 2021

There is also isomorphic-git.

https://github.com/isomorphic-git/isomorphic-git

theknarf · on April 14, 2021

I also recommend js-git (https://github.com/creationix/js-git) if you need more low level control.

nikital · on April 13, 2021

Great presentation! I created something similar — a step-by-step Git implementation that is designed for teaching Git internals:

https://www.leshenko.net/p/ugit/

leokennis · on April 13, 2021

Reading stuff like this makes me lose all hope of ever really understanding (and being able to tame) git:

> Under the covers, the pull command runs git merge FETCH_HEAD. This reads FETCH_HEAD, which shows that the master branch on the beta repository was the most recently fetched branch. It gets the commit object that alpha’s record of beta’s master is pointing at. This is the second commit. The master branch on alpha is pointing at the first commit, which is the ancestor of the second commit.
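For what it's worth, the quoted passage describes a fast-forward, and the idea fits in a few lines. The sketch below uses a toy commit graph with made-up hashes (`c1`, `c2`) and keeps `FETCH_HEAD` as a plain map entry, which is a simplification of what Git stores on disk.

```javascript
// Toy commit graph: each commit hash maps to its list of parent hashes.
const parents = {
  c1: [],     // first commit
  c2: ["c1"], // second commit, made on beta
};

// True if `ancestor` can be reached from `descendant` by following parents.
function isAncestor(ancestor, descendant) {
  const stack = [descendant];
  const seen = new Set();
  while (stack.length > 0) {
    const hash = stack.pop();
    if (hash === ancestor) return true;
    if (seen.has(hash)) continue;
    seen.add(hash);
    stack.push(...(parents[hash] || []));
  }
  return false;
}

// Merge the commit `giverHash` into the branch named `receiverBranch`.
function merge(refs, receiverBranch, giverHash) {
  const receiverHash = refs[receiverBranch];
  if (isAncestor(giverHash, receiverHash)) return "already up to date";
  if (isAncestor(receiverHash, giverHash)) {
    // The branch has nothing the giver lacks, so just move the pointer.
    refs[receiverBranch] = giverHash;
    return "fast-forward";
  }
  return "true merge needed";
}

// alpha's master is at the first commit; the fetch recorded the second.
const alphaRefs = { master: "c1", FETCH_HEAD: "c2" };
console.log(merge(alphaRefs, "master", alphaRefs.FETCH_HEAD)); // fast-forward
console.log(alphaRefs.master); // c2
```

Since the first commit is an ancestor of the second, no new commit is created: the `master` pointer on alpha simply moves forward to the second commit.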

meetups323 · on April 13, 2021

The format is nice, but I can't help but feel it'd be much easier for a human to parse as just normal TS with docstrings for flavor. For some of the more advanced functions, it's all but impossible to actually figure out what the input and return types are -- one must parse the English to find the references to "(see the module description for the format)", then try to figure out what that means, then go there, then repeat, until primitives are reached. Compare to TS where the return type is either right there in the code, or I can at least hover over the function to see a preview or Cmd+Click to get to the type definition itself.

Not to mention the difficulty of manually keeping the English type descriptions in concert with the actual code -- some things computers are much better at than humans, verifying types is one of them.

All that said, the flavor text is well written and I'm glad the author took the time to include it!

rognjen · on April 14, 2021

To people who're feeling like they are not able to understand or write something like this:

While the author is definitely skilled, please know that you're looking at a finished project that took what looks like ~10 months of work.

Do not compare what you know now with what the author has accomplished. They likely learnt a lot along the way and you would as well.

If you're really keen to understand it start with the first commits: https://github.com/maryrosecook/gitlet/commits/master?after=...

The first two (meaningful) commits simply create the .git directory and the tests for it. You can almost certainly understand that.
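To give a feel for how small that first step is, here is a sketch of an `init` that creates a `.git` skeleton using Node's `fs` module. It is written from scratch for illustration and is not the code from those commits; the exact set of files (`HEAD`, `objects`, `refs/heads`) is an assumption based on what a fresh Git repository usually contains.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Create a minimal `.git` skeleton inside `dir`.
// Returns false if `dir` already contains a repository.
function init(dir) {
  const gitDir = path.join(dir, ".git");
  if (fs.existsSync(gitDir)) return false;
  fs.mkdirSync(path.join(gitDir, "objects"), { recursive: true });
  fs.mkdirSync(path.join(gitDir, "refs", "heads"), { recursive: true });
  // HEAD names the branch that the next commit will advance.
  fs.writeFileSync(path.join(gitDir, "HEAD"), "ref: refs/heads/master\n");
  return true;
}

// Try it in a throwaway directory so nothing real is touched.
const repo = fs.mkdtempSync(path.join(os.tmpdir(), "toy-init-"));
console.log(init(repo)); // true
console.log(fs.readdirSync(path.join(repo, ".git")).sort());
```

Everything else (the index, objects, branches) builds on top of that directory, which is why reading the history from the start is a gentle way in.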

de_keyboard · on April 13, 2021

I'd strongly suggest looking at https://github.com/isomorphic-git/isomorphic-git

It's a newer Git implementation but it has loads of features and it's easier to read than most. An excellent project.

atum47 · on April 13, 2021

I once had to do some git stuff programmatically; back then I was working in Python. I found out about this library called porcelain (I think it wasn't being maintained anymore). I ended up having to go through the whole code so I could get some things done. Do not recommend. It is indeed a cool and helpful thing that you did. Congratulations.

billconan · on April 13, 2021

This is cool! But I'm curious to see a git blame implementation.

phist_mcgee · on April 13, 2021

[flagged]

dang · on April 13, 2021

Please don't take HN threads on flamewar tangents. That's what we're trying to avoid here.

https://news.ycombinator.com/newsguidelines.html

NobodyNada · on April 13, 2021

`blame` is one of Git’s most useful features for me, because if I’m confused about a piece of code I can instantly check to see when that code was added and why. Does it fix a bug; if so what’s the bug tracker ID and how was it resolved? Why did the programmer choose to do things this way instead of some other way that seems more obvious? How did this change fit into the rest of the codebase, and were any other files or functions changed to match?

(Of course, the utility of this depends on how well people write commit messages. There’s nothing more annoying than looking at a blame and seeing “update somefile.java.”)

If your organization is using blame to stir up conflict and point fingers, that sounds like a human problem rather than a technical problem, and a problem that you would still have even if `git blame` never existed.

SamBam · on April 13, 2021

Something tells me I wouldn't want to work in whatever team you work with.

git blame is extremely useful for knowing when something was added, and what the context was. It is not about "blaming" people for mistakes.

adkadskhj · on April 13, 2021

Yea, it strikes me as a poor name. Cheeky perhaps. I once had someone take me aside and ask me if it was rude to use "blame" in github to figure out who touched a specific line. They thought the name implied some sort of finger pointing and fault.

masklinn · on April 13, 2021

> Yea, it strikes me as a poor name.

Other VCS have aliases like « annotate » for the operation.

asxd · on April 13, 2021

I mean, it just tells you the commit that last changed a line...

billconan · on April 13, 2021

It's not that simple. Git stores snapshots, i.e. it saves the entire file. How do you recover the changed lines from that efficiently? And how do you do it for the entire history?

Basically, I want to understand this code:

https://github.com/git/git/blob/master/blame.c
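As a starting point, here is a naive sketch of the idea: walk the snapshots of one file from oldest to newest, diff consecutive versions line by line, and let unchanged lines keep the commit they were first seen in. This is a toy under stated assumptions (linear history, one file, made-up commit ids, a quadratic LCS diff); the real blame.c is far more involved, since it has to deal with merges, renames and performance.

```javascript
// Longest-common-subsequence table between two arrays of lines.
// t[i][j] is the LCS length of a.slice(i) and b.slice(j).
function lcsTable(a, b) {
  const t = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      t[i][j] =
        a[i] === b[j]
          ? t[i + 1][j + 1] + 1
          : Math.max(t[i + 1][j], t[i][j + 1]);
    }
  }
  return t;
}

// Carry blame from the old snapshot to the new one: lines matched by the
// LCS keep their old commit; every other new line is blamed on `commit`.
function carryBlame(oldLines, oldBlame, newLines, commit) {
  const t = lcsTable(oldLines, newLines);
  const newBlame = new Array(newLines.length).fill(commit);
  let i = 0;
  let j = 0;
  while (i < oldLines.length && j < newLines.length) {
    if (oldLines[i] === newLines[j]) {
      newBlame[j] = oldBlame[i];
      i++;
      j++;
    } else if (t[i + 1][j] >= t[i][j + 1]) {
      i++; // the old line was deleted
    } else {
      j++; // the new line was inserted by `commit`
    }
  }
  return newBlame;
}

// `history` is oldest-first: [{ commit, lines }, ...] for a single file.
function blame(history) {
  let lines = [];
  let owners = [];
  for (const snapshot of history) {
    owners = carryBlame(lines, owners, snapshot.lines, snapshot.commit);
    lines = snapshot.lines;
  }
  return lines.map((line, k) => `${owners[k]} ${line}`);
}

const history = [
  { commit: "a1", lines: ["one", "two", "three"] },
  { commit: "b2", lines: ["one", "2", "three", "four"] },
  { commit: "c3", lines: ["zero", "one", "2", "three", "four"] },
];
console.log(blame(history).join("\n"));
```

With that history, "one" and "three" stay attributed to `a1`, the edited and appended lines go to `b2`, and the new first line goes to `c3`. The efficiency question is exactly what this sketch ignores: it re-diffs every pair of snapshots in full.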

asxd · on April 13, 2021

I was replying to another comment that has since been deleted. It said something roughly along the lines of git blame being a contentious and passive aggressive command.

I totally agree the technical implementation would be cool to study :)

Stratoscope · on April 13, 2021

If you ever want to see "deleted" (flagged/dead) comments like that one, turn on "showdead" in your profile.

asxd · on April 13, 2021

Ah I see it now, thanks for the heads up!

adkadskhj · on April 13, 2021

I'm so confused about what you're saying. Are you saying it _isn't_ the commit that last touched the given line?

How would you explain this UI? https://github.com/git/git/blame/master/blame.c

Ie, what does blame mean to you?

quickthrower2 · on April 13, 2021

Nah, that award is for team time logging tools. Blame is normally used as a "who to ask" tool. Often the answer it gives you is "who last refactored this file" or "who originally moved this file from another location and lost the link to the prehistoric era when the interesting stuff happened"

sdesol · on April 13, 2021

My tool is designed to obtain insights from Git history and I've never understood how people can take such offense to git blame. I do agree calling it "blame" was probably not the best thing to do, but the fact is, it serves a purpose, which is, it's designed to help you understand how a piece of code came about.

For you to call this feature "passive aggressive" must have been borne out of some traumatizing experience and for that, you have my sympathy.

lostcolony · on April 13, 2021

Another word for blame is 'responsibility'.

In a political organization, it's good to take responsibility for something, but bad to blame. A generous interpretation of that is that the person/group who made the decision should raise their own hand, and that it's a faux pas for others to point them out.

In a healthy one, you can recognize it's the same thing, and the important thing is how you respond to it (i.e., it's important to understand the responsible party so you can understand the reasons/motivations for the action, and accept them or address them).

29athrowaway · on April 13, 2021

git blame is mostly used to answer the question "when did this change and how?"

Being able to answer that question is key for maintainers.

hu3 · on April 13, 2021

I'm sorry if you had bad experiences with git blame usage. People can be assholes sometimes.

It's not meant to be a finger pointing tool but can be used for that.

sodapopcan · on April 13, 2021

That's Linus for you, I guess?

I dunno, I always found `git blame` really funny yet at the same time, I empathize with those who are a little taken aback by it (including... what's the company's name who makes IDEs who calls it "annotate"... ?).

I largely love it because most of the time when I `blame`, that's exactly what I want to do. I don't need software to give me a nicer verb so I can pretend that's not what I'm actually doing.

And most of the time, the author is exactly who I thought it was going to be (which can result in jovial convos) or, more commonly, IT'S ME! Oh, the humility. I guess I can see it being a problem if you work on a toxic team (shit, sorry) and yes, I do understand the desire for a sheen of positivity in the language of the software we use, but pretty much every time I go to call `git blame`, I mean `git BLAME`. While it certainly does happen, I rarely find myself thinking, "Who wrote this amazing code!?? I MUST KNOW!!!" and in those circumstances, it actually soothes my ego to "blame" them for it :D

edited for grammar/spelling

cjohansson · on April 13, 2021

Git can also be an offensive word just like blame. It just depends on what attitude people have towards the context. I am not offended by git, blame, master and slave and I think it would be best if people who are offended just change their attitude towards the context

srathi · on April 13, 2021

Shameless similar plug: I recently implemented parts of git in Golang as a learning exercise.

https://github.com/ssrathi/gogit

melenaos · on April 13, 2021

Are these projects for Node or for browsers? I would love to be able to create a PWA that could save to git without any backend. Is this possible with this or with isomorphic-git?

dcwca · on April 13, 2021

This is such great work.