Gitlet.js – Git implemented in 1k lines of JavaScript (2015) (maryrosecook.com)
361 points by tambourine_man on April 13, 2021 | 75 comments



I'm just going to put this out there: I really wish I could read security-sensitive code like this (TLS implementations, cryptographic implementations, etc.). It would be really amazing if there were a way for GitHub to automatically display code like this by extracting the comments.

In general I think it's an amazing way of learning. I remember reading the gobyexample.com website and writing the same thing for Go assembly: https://davidwong.fr/goasm/add


Comments are good, this particular instance, not so much.

This description spends a lot of time stating the blindingly obvious, e.g.:

    if (addedFiles.length === 0) {
      // Abort if no files matched path.
      throw new Error(files.pathFromRepoRoot(path) + " did not match any files");

Or:

    // If --bare was passed, write to the Git config indicating that the repository is bare.
    // If --bare was not passed, write to the Git config saying the repository is not bare.
    config: config.objToStr({ core: { "": { bare: opts.bare === true }}}),

This really adds no information if the names are clear and you can read JS. The important thing isn't what is happening in the code, because you can already see that in the code. It's WHY. For instance, it'd be far more useful to explain what a 'bare repository' is, what its implications are, and why we are keeping track of that data, than to say "If it's bare, we write in the file that it's bare".
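
To make the contrast concrete, here is a sketch of what a "why" comment for that flag could look like. The function name `initialConfig` is made up for illustration and is not Gitlet's code; the comment text is one possible explanation, not the author's.

```javascript
// Hypothetical helper (not part of Gitlet): builds the initial config object.
function initialConfig(opts) {
  // A bare repository has no working copy: it holds only the object
  // database and the refs, which is the usual setup for a repository that
  // people push to. Commands that need a working copy (add, checkout, ...)
  // have to refuse to run there, so the flag is recorded once at init time
  // and consulted later.
  return { core: { "": { bare: opts.bare === true } } };
}

console.log(JSON.stringify(initialConfig({ bare: true })));
```

The code is the same one-liner as before; only the comment changed, from restating the condition to saying what the flag is for.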

I feel that explaining TLS and similar in this manner would be completely unhelpful. Check out this document, for instance, specifically the "Algorithm for crypt" part:

https://www.akkadia.org/drepper/SHA-crypt.txt

Very well explained. But try and tell me what the purpose is and why those specific steps, which look increasingly bizarre as you delve in, are there, and whether deviating from any given step would be acceptable or not (ignoring matters of compatibility).


You don’t have to explain TLS in this manner, but you could comment the relevant specification paragraphs next to the code to facilitate audits.


As @chungy says in a sibling comment, this is what's known as a literate program. It never really took off, and I suspect it's partially because doing it well means you'll be writing a lot more text than you'd typically see in a program's comments.

See http://www.literateprogramming.com/ for more information and examples.


The particular method of turning JavaScript comments into a "literate" source view, used in the link under discussion here, comes from Jeremy Ashkenas circa 2009, e.g.:

https://underscorejs.org/docs/underscore-esm.html

https://coffeescript.org/annotated-source/lexer.html

https://backbonejs.org/docs/backbone.html


What is the tool that builds this literate source view from source code?


My guess is it's Docco: http://ashkenas.com/docco/
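
For a feel of the core idea, here is a minimal sketch that splits a source file into comment/code pairs, which a renderer could then lay out side by side. It is an illustration only, not Docco's actual implementation; the function name `splitSections` and the rule that only `//` line comments count as documentation are assumptions.

```javascript
// Split source text into sections of { docs, code }.
// A line whose first non-space characters are "//" counts as documentation;
// anything else is code. A new section starts when a comment follows code.
function splitSections(source) {
  const sections = [];
  let docs = [];
  let code = [];
  const flush = () => {
    if (docs.length || code.length) {
      sections.push({ docs: docs.join("\n"), code: code.join("\n") });
    }
    docs = [];
    code = [];
  };
  for (const line of source.split("\n")) {
    const match = line.match(/^\s*\/\/\s?(.*)$/);
    if (match) {
      if (code.length) flush(); // comment after code: close the section
      docs.push(match[1]);
    } else {
      code.push(line);
    }
  }
  flush();
  return sections;
}

const example = [
  "// Abort if no files matched path.",
  "throw new Error('no match');",
  "// Otherwise, update the index.",
  "updateIndex();",
].join("\n");

console.log(splitSections(example));
```

Each section then becomes one row in the two-column view: prose on the left, the code it describes on the right.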


Huh. I actually write my code in this style. Perhaps not quite as detailed, but with the same idea in mind.

I wish code like this was more common. Over and over, I've heard fellow programmers reply with something along the lines of "well it should be obvious what it does! just look at the name and implementation - all you need is right there", when it was suggested that they add documentation to their implementations. What never seems to quite get across fully is that, yes, the code is all there, and it may even be clean and fairly readable, but the inherent complexity is such that only programmers experienced in subject X will ever have a chance of actually understanding it.

"So, what? If it's that complex, then maybe only programmers who are that experienced should attempt to read and/or contribute to it" is another sentiment I've seen expressed. Of course, this misses the logic that, in order for such programmers to exist, they have to learn from somewhere, and start from something.

And how will we ever have programmers experienced in that subject?


Yes, they have to start somewhere. That somewhere is slowly reading and working with the code they want to understand.


Try and read the source to Metafont. It's a scattered morass with blocks of code broken out from its point of usage for expository purposes.


Or the OG TeX for that matter. Knuth writes his code in WEB (his own literate programming language based on Pascal).


I think that this is something else, it's a simple difference but having the code on the size changes everything for me.


> I think that this is something else, it's a simple difference but having the code on the size changes everything for me.

I assume you mean "on the side" and I agree, it's a game-changer. Normally I find literate programs to be tedious to read but simply changing the layout makes a huge difference.


It's a very old idea. Donald Knuth implemented it with both Pascal and C to promote an idea of "literate programming" -- you'd literally just write prose sprinkled with little bits of programming. On compilation, you can choose to compile either the documentation or program.


The deep learning library fastai uses this. Their source code is also their documentation. They use something called "nbdev" to accomplish this. And I have also seen other projects both related and unrelated to deep learning adopt it. This idea stems from Knuth.


For an example, the PBRT book is great.


PBRT is Physically Based Rendering: From Theory To Implementation by Matt Pharr, Wenzel Jakob, Greg Humphreys.

https://www.pbrt.org

https://www.pbr-book.org

https://www.amazon.com/Physically-Based-Rendering-Theory-Imp...


> This book has deservedly won an Academy Award. I believe it should also be nominated for a Pulitzer Prize — Donald Knuth

It's hard to beat that book review.


Here is some JWT minting and parsing you can run in the browser https://observablehq.com/@tomlarkworthy/firebase-admin ported from the MIT-licensed firebase-admin GitHub repo.

OAuth 2.0 clients https://observablehq.com/@tomlarkworthy/oauth-examples

This is a WIP, but it's a full-on identity provider implemented in the browser in a literate programming env! https://observablehq.com/@endpointservices/auth


We have a project at UC Berkeley for CS 61B (undergrad data structures) called "Gitlet", where we need to make a git implementation in Java. Was super fun!

If you're curious: https://inst.eecs.berkeley.edu/~cs61b/sp20/materials/proj/pr...


Ah, Hilfinger memories.


Haha yep, he's still teaching. My semester of instruction from Prof Hilfinger was cut in half thanks to the pandemic though :(


There is also Shit, a git implementation in an almost POSIX shell [0]. Not my software.

[0]: https://git.sr.ht/~sircmpwn/shit


Every time I look at one of these custom git repository sites, I'm confused as to how I can read the code. For inept people like me, it's here: https://git.sr.ht/~sircmpwn/shit/tree


The term “tree” is pretty common in source control systems. So, for me, I just look for a link to the “tree”, which you managed to find.

Just putting it out there for future readers.


Discussed at the time:

Gitlet: Git implemented in JavaScript - https://news.ycombinator.com/item?id=8931984 - Jan 2015 (66 comments)

Also: Git implementation in 1k lines of Node.js - https://news.ycombinator.com/item?id=16453979 - Feb 2018 (2 comments)


This is so cool, and also my first time seeing this brilliant format for long-form explanations of code. Is this a thing out there, and does anyone have any pointers on libraries that do this?


It’s called literate programming, invented by Donald Knuth: https://en.m.wikipedia.org/wiki/Literate_programming


Backbone.js comes to mind: https://backbonejs.org/docs/backbone.html.

Edit: The author of backbone.js has a tool that generates documentation in this style. http://ashkenas.com/docco/


In the 'real' Git, one of the functions I use frequently is `git checkout -- .`, which discards any unstaged changes. Looking at Gitlet's implementation, it wouldn't handle that well as `checkout` can only take a ref.

So I guess I'm wondering what the real `git checkout -- .` actually does behind the scenes?
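
Going by the Git documentation, when `git checkout` is given paths but no tree-ish, it overwrites the matching working-tree files with the versions recorded in the index. A toy sketch of that idea follows, with plain objects standing in for the index, the object store and the working copy. All names here (`restoreFromIndex` and the data shapes) are invented for illustration and are neither Git's nor Gitlet's API.

```javascript
// Toy model: the index maps path -> blob hash, the object store maps
// blob hash -> content, and the working copy maps path -> content.
function restoreFromIndex(index, objects, workingCopy, pathPrefix) {
  const restored = [];
  for (const [path, hash] of Object.entries(index)) {
    const matches = pathPrefix === "." || path === pathPrefix ||
      path.startsWith(pathPrefix + "/");
    if (matches && workingCopy[path] !== objects[hash]) {
      workingCopy[path] = objects[hash]; // discard the unstaged change
      restored.push(path);
    }
  }
  return restored;
}

const objects = { h1: "one\n", h2: "two\n" };
const index = { "a.txt": "h1", "src/b.txt": "h2" };
const workingCopy = {
  "a.txt": "edited\n",       // tracked, modified
  "src/b.txt": "two\n",      // tracked, unchanged
  "new.txt": "untracked\n",  // not in the index, so left alone
};

console.log(restoreFromIndex(index, objects, workingCopy, "."));
```

Only paths that are in the index are touched, which is why untracked files survive the command.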


FWIW, since Git 2.23 I prefer using "git switch" and "git restore" instead of "git checkout".


This is correct, ‘git restore’ is the way to do this in the current version of git. ‘checkout --’ still works, but the ‘help’ text in ‘git status’ now recommends using ‘restore’.

Not only is it more semantic, it reduces the complexity of ‘git checkout’ which is a common complaint.

Really, it doesn’t make much sense to tack some random ‘--’ flag onto ‘checkout’ for that functionality. ‘restore’ is clear and describes what it does.


> ‘git restore’ is the way to do this in the current version of git.

No, that’s incorrect. `git checkout` is the way to do this in the current version of Git.

The current version of Git allows you to use `git restore`, but the documentation says this about it:

> THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.

Until that warning goes away in a future version of Git, `git checkout` is the way to do this.


Fair point, I had honestly never read the docs page on restore.

But, I still have to disagree. They may not have landed on the final shape of the API, but they are clearly encouraging people to use restore for this purpose.

There is no ‘warning’ message in the CLI. In fact the opposite, the help text in every ‘git status’ message explicitly states to use ‘restore’ for the case of resetting the changes of a single file, where it used to say use ‘checkout --’.

Yeah I’m used to using checkout -- like the rest of us, and sure we can all still do it if we want. But I would never recommend ‘checkout -- filename’ to anyone in 2021. It doesn’t make any sense that that would be how you do that. It’s confusing trying to explain it to someone.

“Yeah I know you usually use this to completely change branches, or go to a particular commit hash. But if you use this weird ‘--’ flag & a file name, it does a mini ‘reset’ on that one file”

Why would you use some random flag on a command (checkout) that is typically used for a totally different use case? If anything, this should be a flag on ‘reset’.

I’m not a git ‘complexity hater’, but it’s a perfect example of why a lot of people complain about git.


> the help text in every ‘git status’ message explicitly states to use ‘restore’

Good point. This makes no sense. There should either be a warning in the documentation and `git status` should not recommend it, or there should not be a warning in the documentation and `git status` should recommend it. It’s either ready for use or not, it shouldn’t be inconsistent about it.

I agree it’s a positive change, but the current state is ambiguous when it shouldn’t be.

You don’t need the “weird -- flag” in most cases, by the way. Git isn’t to blame for it, and it’s the same for both checkout and restore. The `--` is the standard POSIX way to distinguish between options and filenames. If there’s no ambiguity (e.g. `git checkout .`) then you can skip it. You only need it if the pathspec could be interpreted as an option.
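
The rule is small enough to sketch. The parser below is a hypothetical illustration of the POSIX convention: the function name `splitArgs` is made up, and it ignores options that take arguments. It is not how Git parses its command line, and Git also uses `--` to separate revisions from paths.

```javascript
// Split argv into options and operands the way POSIX-style tools do:
// everything after a bare "--" is an operand, even if it starts with "-".
function splitArgs(argv) {
  const options = [];
  const operands = [];
  let onlyOperands = false;
  for (const arg of argv) {
    if (onlyOperands) {
      operands.push(arg);
    } else if (arg === "--") {
      onlyOperands = true; // end of options; the marker itself is dropped
    } else if (arg.startsWith("-") && arg !== "-") {
      options.push(arg);
    } else {
      operands.push(arg);
    }
  }
  return { options, operands };
}

// With the separator, a file whose name starts with "-" is still a file.
console.log(splitArgs(["-q", "--", "-weird-file.txt"]));
// Without any ambiguity, the separator can be left out.
console.log(splitArgs(["-q", "."]));
```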


Thanks for the explanation, I did not realize that about --, good to know!


> Not only is it more semantic, it reduces the complexity of ‘git checkout’ which is a common complaint.

It doesn’t, since the feature can’t be removed from git checkout for backward-compatibility (BC) reasons.


It does, because it sounded like he was concerned with the complexity of explaining the broadness of uses for git checkout.

Now we can treat git restore like it is a separate thing.


This is a better way to say it, and it's what I meant. Thanks.


I don't see why it couldn't be removed from a future release after a period of deprecation.


Because there’s a million aliases out there which alias « git checkout -- . » to something sensible and there is no real justification for breaking them: checkout can trivially delegate to restore.


I have a Git alias for `rs = restore -SWs`. `git rs commit :` more-or-less mimics what `git checkout branch :` did. (BTW, : means "repository root, regardless of current .")


My problem with restore is that it doesn't print anything. Checkout tells you how many files were restored.


Ok, I re-read your command, and realized you’re doing ‘checkout’ minus minus period (to be explicit).

Honestly it had never even occurred to me to use the period there, but it does work :) I always used it for single files.

As we said in other comments, that’s why they introduced ‘restore’.

But, if you’re looking to reset all changes, in every file back to the state of the HEAD, then I believe ‘git reset --hard’ would be the appropriate tool!


The ref would be HEAD (the last commit) here.



I also recommend js-git (https://github.com/creationix/js-git) if you need more low-level control.


Great presentation! I created something similar — a step-by-step Git implementation that is designed for teaching Git internals:

https://www.leshenko.net/p/ugit/


Reading stuff like this makes me lose all hope of ever really understanding (and being able to tame) git:

> Under the covers, the pull command runs git merge FETCH_HEAD. This reads FETCH_HEAD, which shows that the master branch on the beta repository was the most recently fetched branch. It gets the commit object that alpha’s record of beta’s master is pointing at. This is the second commit. The master branch on alpha is pointing at the first commit, which is the ancestor of the second commit.
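
The fast-forward case in that passage boils down to one ancestry check. Here is a rough sketch with a toy commit graph; the names `isAncestor` and `canFastForward` and the data shapes are invented for illustration and are not taken from Gitlet's source.

```javascript
// Toy commit graph: commit hash -> array of parent hashes.
function isAncestor(commits, ancestor, descendant) {
  const seen = new Set();
  const stack = [descendant];
  while (stack.length > 0) {
    const hash = stack.pop();
    if (hash === ancestor) return true;
    if (seen.has(hash)) continue;
    seen.add(hash);
    stack.push(...(commits[hash] || []));
  }
  return false;
}

// A merge can fast-forward when the receiving branch's commit is an
// ancestor of the commit being merged in: just move the branch pointer.
function canFastForward(commits, receiver, giver) {
  return isAncestor(commits, receiver, giver);
}

const commits = { first: [], second: ["first"] };
const refs = { "alpha/master": "first", FETCH_HEAD: "second" };

if (canFastForward(commits, refs["alpha/master"], refs.FETCH_HEAD)) {
  refs["alpha/master"] = refs.FETCH_HEAD; // no merge commit needed
}
console.log(refs["alpha/master"]);
```

In the quoted scenario, alpha's master points at the first commit and FETCH_HEAD at the second, so the check succeeds and the pointer simply moves.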


The format is nice, but I can't help but feel it'd be much easier for a human to parse as just normal TS with docstrings for flavor. For some of the more advanced functions, it's all but impossible to actually figure out what the input and return types are -- one must parse the English to find the references to "(see the module description for the format)", then try to figure out what that means, then go there, then repeat, until primitives are reached. Compare to TS where the return type is either right there in the code, or I can at least hover over the function to see a preview or Cmd+Click to get to the type definition itself.

Not to mention the difficulty of manually keeping the English type descriptions in concert with the actual code -- computers are much better than humans at some things, and verifying types is one of them.

All that said, the flavor text is well written and I'm glad the author took the time to include it!


To people who're feeling like they are not able to understand or write something like this:

While the author is definitely skilled, please know that you're looking at a finished project that took what looks like ~10 months of work.

Do not compare what you know now with what the author has accomplished. They likely learnt a lot along the way and you would as well.

If you're really keen to understand it start with the first commits: https://github.com/maryrosecook/gitlet/commits/master?after=...

The first two (meaningful) commits simply create the .git directory and the tests for it. You can almost certainly understand that.


I'd strongly suggest looking at https://github.com/isomorphic-git/isomorphic-git

It's a newer Git implementation but it has loads of features and it's easier to read than most. An excellent project.


I once had to do some Git stuff programmatically; back then I was working in Python. I found out about this library called porcelain (I think it wasn't being maintained anymore). I ended up having to go through the whole code so I could get some things done. Do not recommend. It is indeed a cool and helpful thing that you did. Congratulations.


this is cool! but I'm curious to see a git blame implementation.


[flagged]


Please don't take HN threads on flamewar tangents. That's what we're trying to avoid here.

https://news.ycombinator.com/newsguidelines.html


`blame` is one of Git’s most useful features for me, because if I’m confused about a piece of code I can instantly check to see when that code was added and why. Does it fix a bug; if so what’s the bug tracker ID and how was it resolved? Why did the programmer choose to do things this way instead of some other way that seems more obvious? How did this change fit into the rest of the codebase, and were any other files or functions changed to match?

(Of course, the utility of this depends on how well people write commit messages. There’s nothing more annoying than looking at a blame and seeing “update somefile.java.”)

If your organization is using blame to stir up conflict and point fingers, that sounds like a human problem rather than a technical problem, and a problem that you would still have even if `git blame` never existed.


Something tells me I wouldn't want to work in whatever team you work with.

git blame is extremely useful for knowing when something was added, and what the context was. It is not about "blaming" people for mistakes.


Yea, it strikes me as a poor name. Cheeky perhaps. I once had someone take me aside and ask me if it was rude to use "blame" in GitHub to figure out who touched a specific line. They thought the name implied some sort of finger pointing and fault.


> Yea, it strikes me as a poor name.

Other VCS have aliases like « annotate » for the operation.


I mean, it just tells you the commit that last changed a line...


It's not that simple. Git stores snapshots, i.e. it saves the entire file. How do you recover the changed lines from that efficiently? And how do you do it for the entire history?

Basically, I want to understand this code

https://github.com/git/git/blob/master/blame.c
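
For intuition only, here is a naive sketch of the idea: diff each snapshot against the previous one with a longest-common-subsequence match, and carry the attribution of unchanged lines forward. The real blame.c is far more elaborate, so treat this as a toy; the function names and the `{ id, lines }` history shape are assumptions made for the example.

```javascript
// For each line of `next`, find the index of the matching line in `prev`
// (or -1 if the line is new), using a longest-common-subsequence table.
function matchLines(prev, next) {
  const rows = prev.length;
  const cols = next.length;
  const lcs = Array.from({ length: rows + 1 }, () => new Array(cols + 1).fill(0));
  for (let i = rows - 1; i >= 0; i--) {
    for (let j = cols - 1; j >= 0; j--) {
      lcs[i][j] = prev[i] === next[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  const origin = new Array(cols).fill(-1);
  let i = 0;
  let j = 0;
  while (i < rows && j < cols) {
    if (prev[i] === next[j]) {
      origin[j] = i; // line survived from the previous snapshot
      i++;
      j++;
    } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
      i++; // line was deleted from prev
    } else {
      j++; // line was added in next
    }
  }
  return origin;
}

// history: oldest-first list of { id, lines } snapshots of one file.
// Returns the id of the commit that introduced each line of the newest one.
function blame(history) {
  let blamed = history[0].lines.map(() => history[0].id);
  for (let k = 1; k < history.length; k++) {
    const origin = matchLines(history[k - 1].lines, history[k].lines);
    blamed = origin.map((from) => (from === -1 ? history[k].id : blamed[from]));
  }
  return blamed;
}

const history = [
  { id: "c1", lines: ["a", "b", "c"] },
  { id: "c2", lines: ["a", "x", "b", "c"] },
  { id: "c3", lines: ["a", "x", "b", "C"] },
];
console.log(blame(history));
```

This walks the whole history and builds a quadratic table per step, so it says nothing about the "efficiently" part of the question; it only shows that per-line attribution can be recovered from snapshots alone.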


I was replying to another comment that has since been deleted. It said something roughly along the lines of git blame being a contentious and passive aggressive command.

I totally agree the technical implementation would be cool to study :)


If you ever want to see "deleted" (flagged/dead) comments like that one, turn on "showdead" in your profile.


Ah I see it now, thanks for the heads up!


I'm so confused about what you're saying. Are you saying it _isn't_ the commit that last touched the given line?

How would you explain this UI? https://github.com/git/git/blame/master/blame.c

Ie, what does blame mean to you?


Nah, that award is for team time logging tools. Blame is normally used as a "who to ask" tool. Often the answer it gives you is "who last refactored this file" or "who originally moved this file from another location and lost the link to the prehistoric era when the interesting stuff happened"


My tool is designed to obtain insights from Git history and I've never understood how people can take such offense to git blame. I do agree calling it "blame" was probably not the best thing to do, but the fact is, it serves a purpose, which is, it's designed to help you understand how a piece of code came about.

For you to call this feature "passive aggressive" must have been borne out of some traumatizing experience and for that, you have my sympathy.


Another word for blame is 'responsibility'.

In a political organization, it's good to take responsibility for something, but bad to blame. A generous interpretation of that is that the person/group who made the decision should raise their own hand, and that it's a faux pas for others to point them out.

In a healthy one, you can recognize it's the same thing, and the important thing is how you respond to it (i.e., it's important to understand the responsible party so you can understand the reasons/motivations for the action, and accept them or address them).


git blame is mostly used to answer the question "when did this change and how?"

Being able to answer that question is key for maintainers.


I'm sorry if you had bad experiences with git blame usage. People can be assholes sometimes.

It's not meant to be a finger pointing tool but can be used for that.


That's Linus for you, I guess?

I dunno, I always found `git blame` really funny yet at the same time, I empathize with those who are a little taken aback by it (including... what's the company's name who makes IDEs who calls it "annotate"... ?).

I largely love it because most of the time when I `blame`, that's exactly what I want to do. I don't need software to give me a nicer verb so I can pretend that's not what I'm actually doing.

And most of the time, the author is exactly who I thought it was going to be (which can result in jovial convos) or, more commonly, IT'S ME! Oh, the humility. I guess I can see it being a problem if you work on a toxic team (shit, sorry) and yes, I do understand the desire for a sheen of positivity in the language of the software we use, but pretty much every time I go to call `git blame`, I mean `git BLAME`. While it certainly does happen, I rarely find myself thinking, "Who wrote this amazing code!?? I MUST KNOW!!!" and in those circumstances, it actually soothes my ego to "blame" them for it :D

edited for grammar/spelling


Git can also be an offensive word just like blame. It just depends on what attitude people have towards the context. I am not offended by git, blame, master and slave and I think it would be best if people who are offended just change their attitude towards the context


Shameless similar plug: I recently implemented parts of git in Golang as a learning exercise.

https://github.com/ssrathi/gogit


Are these projects for Node or for browsers? I would love to be able to create a PWA that could save to Git without any backend. Is this possible with this or with isomorphic-git?


This is such great work.



