Shit – An implementation of Git using POSIX shell (sr.ht)
814 points by kick 11 months ago | 227 comments

Hiya HN. I was ranting on Mastodon earlier today because I feel like people learn git the wrong way - from the outside in, instead of the inside out. I reasoned that git internals are pretty simple and easy to understand, and that the supposedly obtuse interface makes a lot more sense when you approach it with an understanding of the fundamentals in hand. I said that the internals were so simple that you could implement a workable version of git using only shell scripts inside of an afternoon. So I wrapped up what I was working on and set out to prove it.

Five hours later, it had turned into less of a simple explanation of "look how simple these primitives are, we can create them with only a dozen lines of shell scripting!" and more into "oh fuck, I didn't realize that the git index is a binary file format". Then it became a personal challenge to try and make it work anyway, despite POSIX shell scripts clearly being totally unsuitable for manipulating that kind of data.

Anyway, this is awful, don't use it for anything, don't read the code, don't look at it, just don't.

> instead of the inside out

Now the name makes even more sense. I first read it as sh/git, but reading it as something that starts inside and slowly works its way out is now my preferred explanation of the name.

I am simultaneously revolted and fascinated with the brilliance of your explanation.

The name also plays well with the porcelain & plumbing metaphores of git.

metaphors. Not being pedantic, it just puzzled me a while to see what was wrong with that word.

what do you get when you cross a metaphor and a semaphore? a metaphore

So it's a metaphor that only a limited amount of people can use at any given time?

I never sem a phore that I didn't like.



A semi-metaphor?

This comment looks like an offtopic pun to me and feels distracting.

If it's so hard to learn about this tool and so easy to learn about it the wrong way, that's a pretty obvious hint that there's something wrong with the tool.

Having to learn about the internals is a giveaway that the tool suffers from poor encapsulation.

To me git is definitely one of those tools where one should satisfice and not learn it deeply, because it's not worth the effort. One can successfully stick to a simple workflow and ignore anything git astronauts come up with, like git flow, if they want to keep their sanity and focus on what matters - creating quality software. And almost any team has some git fetishist who will be thrilled to help when things go south. And if they don't, it's probably for the better.

> If it's so hard to learn about this tool and so easy to learn about it the wrong way, that's a pretty obvious hint that there's something wrong with the tool.

Not necessarily. This may mean (and I think in this case, it does) that people are too afraid to learn about those "internals" - or should I say, the mental model behind the tool (and then some of those people write tutorials for others, perpetuating the problem). And with a "monkey see, monkey do" approach, people can fail at anything, up to and including tying their own shoelaces.

There is no such thing as a perfect encapsulation. Not in programming, and especially not in the physical world. "Internals" are ever-present and leak into view all the time. A good abstraction is just one that you can use day-to-day without constantly minding what's going on behind the scenes.

More importantly though, when you're just learning a bunch of git commands in isolation ("monkey see, monkey do"), you're not learning a tool/an abstraction - you're just learning its interface. That's sometimes OK, but in general, for effective use of an abstraction it's better to learn what moving pieces it is abstracting away. Which in case of Git is that it's a DAG. DAGs are kind of fundamental in programming, too; it's good to understand them.

It's part of a bigger issue - source control tools have wildly different fundamental models. Moving to git from anything else will be confusing, because users jump to conclusions about what commands and operations are doing. It's different from other tools in important ways.

I moved from SCCS to RCS to CVS to Subversion to Git.

Moving to Git was the more difficult step for me. I prefer it now. For a while I had to move back to Subversion and I hated it.

Can't use svn for the life of me, though I used it for 10 years before trying git for a real dvcs need (remote team for a while)...

Just the idea pains me. Missing git add -p, cherry-pick & rebase -i so much I immediately put git-svn on if I have to go back...

Also, it makes telecommuting easier, asynchronous team work so much simpler...

I think the key 'abstraction' that people don't understand is cherry-pick. I can't explain clearly in fine details /how/ it works, but it is the base of so much of git's power...

Monkey see, monkey do would be watching someone use git and imitating them. Reading the manual or an overview of the commands and using them is how learning new tools typically works.

It should be possible to learn the commands for creating a branch, uploading our changes or making a commit like it was/is possible for all version control tools and then move on with our professional lives, which likely revolve around writing software and not fumbling with git.

By the way, I love your conversation about Merkle trees below; it was one of the most surreal things I've read lately. :-)

> It should be possible to learn the commands for creating a branch, uploading our changes or making a commit like it was/is possible for all version control tools

The problem revolves around the fact that, despite same name being used, git!branch != svn!branch, git!commit != svn!commit, etc. They serve related purposes - but not the same, because the concepts behind them are different. Learning a tool means learning those concepts. So in the process of learning "git commit" and "git branch", you're supposed to pick up on the "pointer to a node in a DAG" thing - otherwise you haven't learned "git commit", you've learned something else that's vaguely similar. And then you'll have difficulties when its behavior goes against your expectations.

But they behave pretty darn close to SVN branches and commits and... they're also called branches and commits. It's clear why that happened - no one would have used a weird tool which turned the old concepts on their heads, so git was being taught based on comparisons with existing tools.

Now that git's very popular, the teachers have become arrogant and are claiming that our mental models for how VCS work are wrong and we should instead adapt our thinking to the git internals. In almost all other professions a confusing tool is scorned, but only developers are expected to learn how all sorts of weird contraptions work and then anyone who can't keep up is scorned instead.

git is almost 15 years old and here we have yet another attempt at clarifying how it works to the masses. Why are there so many git GUIs and tutorials and attempts to clarify how this tool works? It's a freaking VCS, not rocket science. git took something that used to be straightforward and doable by any developer and turned it into an over-complicated mess.

Now here's a question for you: why do you defend this anti-developer tool instead of siding with fellow developers?

> There is no such thing as a perfect encapsulation.

It's a logical fallacy to use this as an excuse for a tool having bad encapsulation/abstraction.

Yes it's impossible to have a perfect abstraction. No that doesn't mean we shouldn't try harder.

Completely agree.

> Which in case of Git is that it's a DAG

And one step beyond that, that it's a Merkle tree. It's key to understanding stuff like "if I change a commit, it changes all commits after that" or "if I move (cherry pick, rebase) this commit, I'm creating a new one, not really moving".
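A rough sketch of that effect, with no repository involved. It assumes the object layout described elsewhere in this thread (type, size, a null byte, then the commit text) and that `sha1sum` and `mktemp` are available. The helper name `commit_id`, the identity line and the timestamp are made up for illustration; the tree id is git's empty tree.

```shell
#!/bin/sh
# Toy sketch: compute the id a commit object would get, with no repository.
# commit_id, WHO and the timestamp are made up for illustration.
EMPTY_TREE=4b825dc642cb6eb9a060e54bf8d69288fbee4904  # id of git's empty tree
WHO='Example <example@example.com> 1581533224 +0200'

# commit_id PARENT MESSAGE   (pass "" as PARENT for a root commit)
commit_id() {
    tmp=$(mktemp) || return 1
    {
        printf 'tree %s\n' "$EMPTY_TREE"
        if [ -n "$1" ]; then printf 'parent %s\n' "$1"; fi
        printf 'author %s\ncommitter %s\n\n%s\n' "$WHO" "$WHO" "$2"
    } > "$tmp"
    size=$(wc -c < "$tmp" | tr -d ' ')
    # Same recipe as any git object: "<type> <size>\0<content>", then SHA-1.
    { printf 'commit %s\000' "$size"; cat "$tmp"; } | sha1sum | cut -d' ' -f1
    rm -f "$tmp"
}

a=$(commit_id "" "first")
b=$(commit_id "$a" "second")
a2=$(commit_id "" "first, amended")   # "change" the first commit
b2=$(commit_id "$a2" "second")        # same tree and message as b
echo "b:  $b"
echo "b2: $b2"   # differs from b only because its parent id changed
```

The second commit's tree and message are identical in both runs; only the parent line differs, and that is enough to give it a new id.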

> And one step beyond that, that it's a Merkle tree.

Not really, not every block chain is a Merkle tree. Since Git history is not linear, you can’t order the commits in any canonical way. You definitely can order them in some way (like "git log" does) and then construct a tree for that list of hashes, but this is not really a useful computation. Git repo integrity is verified simply by the HEAD commit hash because you normally clone the entire repository anyway.

Is git even a Merkle tree? Git history forms a DAG, not a tree. But I'm not 100% sure about how the hashes are computed - whether they work on the DAG, or on some local subtree.

No, it’s not. Commit hashes are exactly what it says on the tin: hash sums of commit objects, which are basically text files that include hash sums of tree objects (directory tree state), hash sums of parent commits, your commit message and other metadata. You can see it with "git cat-file":

    $ git cat-file -p d8defd0bb0062ed541de173a2aec834b64d6adbe
    tree cbdb56fe9bb1766d8fc2b2e53c9c934efbacbf1c
    parent a8a1049c06d100a3f926a82414e6addb9b9af5e8
    author ilammy <alexei@cossacklabs.com> 1581533224 +0200
    committer ilammy <alexei@cossacklabs.com> 1581533224 +0200
    fixup! Avoid unsigned overflow in length computations

And compute the commit hash manually to verify that:

    $ sha=d8defd0bb0062ed541de173a2aec834b64d6adbe
    $ cat <(echo -ne "$(git cat-file -t $sha) $(git cat-file -s $sha)\0") \
          <(git cat-file -p $sha) | sha1sum
    d8defd0bb0062ed541de173a2aec834b64d6adbe  -

Git prefixes the object content with the object type ("commit" in the case of commits) and its size in bytes (textual, decimal), terminated by a null byte. It then hashes all of that to get the commit hash.
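The same recipe should work for other object types too. A minimal sketch for blobs (file contents) that needs no repository at all, assuming `sha1sum` is available; `blob_id` is a made-up helper name, not a git command:

```shell
# blob_id FILE: print the id git would give FILE's contents as a blob.
# (blob_id is a made-up helper name, not a git command.)
blob_id() {
    size=$(wc -c < "$1" | tr -d ' ')
    # "<type> <size>\0<content>", hashed with SHA-1
    { printf 'blob %s\000' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running it on a file should print the same id as `git hash-object` does for that file.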

>that's a pretty obvious hint that there's something wrong with the tool.

Or that the problem space is inherently difficult.

Almost everybody I've ever worked with who complained about Git has a conversation with a coworker that goes something like this:

Coworker: "I really hate Git, it's so hard to understand what's going on internally."

Git guy: "Did you read the documentation?"

Coworker: "Nope."

Git guy: "Did you read Git - book?"

Coworker: "Nope."

Git guy: "Did you read Scott Chacon's 'Pro Git'?"

Coworker: "Nope."

Git guy: "..."

That conversation already went off the tracks at "internally", since it shouldn't matter at all how it works internally. We do not learn how most things work in detail, because we wouldn't have time to live our life if we did.

In this particular case, git is a version control tool and it supports various typical operations for such tools. One should be able to learn the commands and then successfully use the tool. If that's not possible, I continue to assert that there's a problem with the tool.

The likelihood for this conversation to occur, in a way, also depends on the tool itself, and tells us something about it.

As a matter of fact, git is hard to use.

The question is: keeping the same level of functionality, could it have been made easy to use?

It also speaks about the quality of users as a whole - we somehow forgot to RTFM at all or lead people to documentation, simultaneously accepting lack of documentation in a vicious cycle.

And the answer is: yes, absolutely. Mercurial is a whole level better. Unfortunately git won - because of GitHub, not on technical merit.

> the supposedly obtuse interface makes a lot more sense when you approach it with an understanding of the fundamentals in hand.

Agreed. I always said the best git tutorial is https://www.sbf5.com/~cduan/technical/git/

> The conclusion I draw from this is that you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.

I never really understood Git until I read this tutorial: https://github.com/susam/gitpr

Things began to click for me as soon as I read this in its intro section:

> Beginners to this workflow should always remember that a Git branch is not a container of commits, but rather a lightweight moving pointer that points to a commit in the commit history.

> When a new commit is made in a branch, its branch pointer simply moves to point to the last commit in the branch.

> A branch is merely a pointer to the tip of a series of commits. With this little thing in mind, seemingly complex operations like rebase and fast-forward merges become easy to understand and use.

This "moving pointer" model of Git branches led me to instant enlightenment. Now I can apply this model to other complicated operations too like conflict resolution during rebase, interactive rebase, force pushes, etc.

If I had to select a single most important concept in Git, I would say it is this: "A branch is merely a pointer to the tip of a series of commits."
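You can watch the pointer move in a throwaway repository. This is only a sketch: `show_branch_pointer` is a made-up name, the demo identity is invented, and reading `.git/refs/heads/<branch>` directly assumes git's default file-based ref storage (the function falls back to `git rev-parse` otherwise).

```shell
# Hypothetical demo: make a scratch repo, commit twice, and print what the
# branch points at after each commit.
show_branch_pointer() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        # Throwaway identity so the demo does not depend on global config.
        commit() {
            git -c user.name=demo -c user.email=demo@example.com \
                commit -q --allow-empty -m "$1"
        }
        # With file-based refs, a branch is a small file holding a commit id.
        ref() {
            cat ".git/refs/heads/$1" 2>/dev/null || git rev-parse "refs/heads/$1"
        }
        commit first
        branch=$(git symbolic-ref --short HEAD)
        echo "first  $(ref "$branch")"
        commit second
        echo "second $(ref "$branch")"          # the pointer moved
        echo "parent $(git rev-parse HEAD~1)"   # the old tip is now the parent
    )
    rm -rf "$repo"
}
```

The `first` and `parent` lines should show the same id: nothing was copied into the branch, the pointer just advanced to the new commit.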

And you can see this structure if you add to any "git log" command "--graph --oneline --decorate --color". IIRC some of those are unnecessary in recent versions of git, I just remember needing all of them at the point I started using it regularly.

I have a bash function for it (with a ton of other customizations, but it boils down to this):

  function pwlog() {
    git log "$@" --graph --oneline --decorate --color | less -SEXIER
  }
  pwlog --all -20

(...in that "less" command, "S" truncates instead of wraps lines, one "E" exits at EOF, "X" prevents screen-clearing, and "R" is to keep the color output. The second "E" does nothing special, it and "I" (case-insensitive search) are just to complete the word)

You can also set $GIT_PAGER/core.pager/$PAGER and create an alias to accomplish this:

  #export PAGER='less -SEXIER'
  #export GIT_PAGER='less -SEXIER'
  git config --global core.pager 'less -SEXIER'
  git config --global alias.l 'log --graph --oneline --decorate --color'
  # git diff ~/.gitconfig
  git l

core.pager: https://git-scm.com/docs/git-config#Documentation/git-config...

> The order of preference is the $GIT_PAGER environment variable, then core.pager configuration, then $PAGER, and then the default chosen at compile time (usually less).

>This "moving pointer" model of Git branches led me to instant enlightenment.

As opposed to any other VCS? Feels like that model is the only one that works with SVN too. I struggle to see how "branch is a container of commits" is a viable model to begin with.

It's a good-enough description of SVN, where branches exist in the same directory tree, commits are tied to the branch by way of the path, and the merge tools are "merge this batch of commits from branch A to trunk" (you don't have to take the whole branch at once).

One of the biggest hurdles my co-workers have had learning git after having used svn for years is the "bucket of commits" mental model they've built up for branches. A common question is how to merge a single commit.
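For what it's worth, the usual answer to that question is `git cherry-pick`. A hedged sketch in a scratch repository (the function name, file names and demo identity are all made up): it takes only the second of two commits from a `feature` branch.

```shell
# Hypothetical demo: pick one commit from a branch without merging the branch.
cherry_pick_demo() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        # Throwaway identity so the demo does not depend on global config.
        GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
        GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
        export GIT_AUTHOR_NAME GIT_AUTHOR_EMAIL GIT_COMMITTER_NAME GIT_COMMITTER_EMAIL

        git init -q .
        echo base > base.txt && git add base.txt && git commit -q -m base
        trunk=$(git symbolic-ref --short HEAD)

        git checkout -q -b feature
        echo one > one.txt && git add one.txt && git commit -q -m "add one"
        echo two > two.txt && git add two.txt && git commit -q -m "add two"
        wanted=$(git rev-parse HEAD)

        git checkout -q "$trunk"
        git cherry-pick "$wanted" >/dev/null   # replay just that one change
        echo "picked  $wanted"
        echo "created $(git rev-parse HEAD)"   # a new commit, not the old one
        ls                                     # two.txt arrives, one.txt does not
    )
    rm -rf "$repo"
}
```

The change from "add two" lands on the original branch as a brand-new commit with its own id; the commit on `feature` is untouched, which is exactly where the "bucket of commits" model stops matching.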

This is closer to how I would describe HG and SVN. Branches can be traced from start to merge, they are heavy. You know which commits came from what.

In git you can lose track of what came from what branch when you start merging multiple back and forth, this does happen with svn.

Mercurial branches are different from git branches; they're topological structures that emerge when a revision gets an alternate child. They're like growing and stopping lines of development. They exist on their own; Mercurial simply allows you to give them names. What git calls branches are called bookmarks in Mercurial.

> A branch is merely a pointer to the tip of a series of commits.

But this is not actually correct because a branch can often point to a commit that is not the tip.

It is the tip for that branch. Even if there exist other commits building on the commit the current branch points to, the pointer is still at the tip for that branch.

The point is that a branch is simply a pointer to a commit that automatically encapsulates all of the parent commits.

I think I see what it means though. The branch uses that commit as a new tip to then branch off of, not necessarily meaning a new branch starts at the existing 'tip'.

>The conclusion I draw from this is that you can only really use Git if you understand how Git works.

I say this as someone who uses git regularly, and who prefers it to all other version control systems I have tried:

A tool that breaks the principle of encapsulation by forcing you to grok its internals if you are to have any hope of understanding its arcane and inconsistent usage syntax is frankly not a very good tool.

By contrast, I don't understand how vim works beyond the base conceptual level (keypress goes in, character shows up on screen or command is executed) and yet I don't have any trouble using it. I don't need to know vim's internals to use it effectively. Vim is a good tool.

> I don't understand how vim works beyond the base conceptual level

How much time have you spent trying to figure out how to change the font size in Vim, rotate text 90° in Vim, recalculate a formula in Vim, or insert an image into a document you're editing in it? If the answer is “none”, you probably have a pretty deep understanding of the data model Vim manipulates, even if you aren't aware of it.

On the other hand I understand the data model of git and can't do the most basic shit without looking up which invocation I need via search engine/man pages. Like... deleting a branch `git branch -d` (-D for forced deletion). Deleting a remote? `git remote rm`. Knowing the model teaches me nothing about the ui.

This seems like a good opportunity to plug two aliases I wrote about a year ago that have been very helpful for cleaning up all the extraneous branches that would show up when I ran `git branch`.

I run `git listdead` after I merge and delete a branch and do the next fetch (`git pull --rebase`). That lists the branches that can now be deleted.

Then I run `git prunedead` and it actually removes them.

Previously if I ran `git branch` it would list every development branch I had ever created in nearly a decade of work. Now it lists maybe ten branches.

   listdead = "!sh -c \"git branch -vv | grep ': gone]' | awk '{print \\$1}'\""

   prunedead = "!sh -c \"git branch -vv | grep ': gone]' | awk '{print \\$1}' | xargs git branch -D\""
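To see what the grep/awk part picks out, here is the same filter as a standalone function you can feed text to. `gone_branches` is a made-up name, and the listing in the usage note is an invented sample of what `git branch -vv` output looks like, not real output.

```shell
# gone_branches: read `git branch -vv` style text on stdin and print the names
# of branches whose upstream is marked "gone" (same filter as the aliases).
gone_branches() {
    grep ': gone]' | awk '{print $1}'
}
```

Fed a listing like the one below, it should print only `feature/login` and `old-fix`. One caveat: if the currently checked-out branch is itself "gone", its line starts with `*`, so the first field printed would be `*` rather than the branch name.

    feature/login 1a2b3c4 [origin/feature/login: gone] Add login form
  * main          5d6e7f8 [origin/main] Merge something
    wip           9a8b7c6 [origin/wip: ahead 2] Work in progress
    old-fix       0f1e2d3 [origin/old-fix: gone] Fix typo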

Myself, I frequently refer to the man pages, as well as StackOverflow.

> rotate text 90° in Vim, recalculate a formula in Vim, or insert an image into a document you're editing in it

I'm not sure if I'm missing some features in Vim or you're actually pulling my leg by forcing me to notice that I know more about text than I care to admit :-)

(I'm not OP, BTW, just a random passer-by)

The latter! Sorry, didn't mean to cause you to question your sanity.

Using Vim requires you to understand how the data it operates on is structured. The same applies to Git. Plain text is just a lot simpler than a VCS repository.

But you can use your awesome vim skills to feed text into an OpenOffice document, and while the OOo internals are probably 100x more complicated than vim, the user interface for "text on my screen" stays the same, and the transition is smooth, even though the internals underneath are vastly different. If git requires 'everyone' to know the internals before they can use it, as opposed to rcs, cvs, svn, and perforce users who can more easily flip around between those for most basic usage, then it's on git for having a complicated shell around a complicated set of internals.

It would have been nice if there was a simpler shell around the complex machinery for those (us?) who don't want to do crazy stuff, who don't need to be able to do crazy stuff and who could settle for only the simple 90% of the tooling like we do with the alternatives, but are forced to use git for external reasons.

Plain text is internally ropes-something, indented with meters of vimscript, colored with a syntax model that is okay to change, hard to create from scratch, etc. it’s all hidden from a regular user who uses a subset of all features.

But if you ignore the non-ms movement, shortcuts and advanced transforms, it is still a text editor that everyone may use. You can’t put your text (text, not a current mode!) into a state that looks okay but requires a vim guru to continue or start over because something is broken in the model. That’s different from git issues where working copy looks okay, but the branch and merge are broken in subtle ways.

>Plain text is just a lot simpler than a VCS repository.

Than a Git repository, not a VCS one. Not saying that VCS = plain text, but much simpler models exist for merging teh codes.

Scott Chacon wrote a book on git internals that was published by peepcode some time ago. Searching for where to buy it turned up this HN thread: https://news.ycombinator.com/item?id=7999515

Looks like peepcode was acquired, but the book was open sourced: https://github.com/pluralsight/git-internals-pdf

I'm reading through it now. It starts with the fundamental git structure and works up from there.

>I reasoned that git internals are pretty simple and easy to understand, and that the supposedly obtuse interface makes a lot more sense when you approach it with an understanding of the fundamentals in hand.

Everybody's brain is different but I actually understand all of git's internals (the "plumbing") but it doesn't help me with the git commands (the "porcelain").

Yes, I know that git is a DAG (Directed Acyclic Graph), and that HEAD is a pointer, and the file format of BLOBs and SHAs, etc. If I were to implement a DVCS, I would inevitably end up reinventing many of the same technical architecture decisions that Linus came up with. But none of that insider knowledge really helps me remember git syntax if I haven't been using it in more than a month. Even though I grok git's mental model, I still can't answer the top-voted "git" questions on Stackoverflow without a cheat sheet: https://stackoverflow.com/questions/tagged/git?tab=Votes

The git UI and unintuitive syntax is just too hard for me to remember unless I use it every day.

In contrast... In vi or MS Word, I can effectively modify text without digging into underlying "rope data structure"[1]. In databases & SQL, I can "INSERT INTO x" without learning the "internals" of b-trees[2]. In Photoshop, I can stack layers without learning the math "plumbing" of alpha blending[3]. And yet for some reason, Git in particular needs people to learn it "inside out" more so than other tools. Not sure why Git needs this cognitive prerequisite.

[1] https://en.wikipedia.org/wiki/Rope_(data_structure)

[2] https://en.wikipedia.org/wiki/B-tree#B-tree_usage_in_databas...

[3] https://en.wikipedia.org/wiki/Alpha_compositing#Alpha_blendi...

Unfortunately, I think you have to learn this way because everyone who came before you also did. So not only is the porcelain oriented towards this understanding, but so are people's existing repositories and workflows.

People regularly use the limited subset of git that Github permits without learning how it works internally. If only the 'edit this file' button were powerful enough to do actual work... It isn't, and that's the other problem: internals knowledge actually helps you do day-to-day versioning tasks. The reality is that one day two developers will submit PRs that conflict, and you'll have to find a way to merge them both, and knowing how rebasing works inside absolutely helps you. The analogy is more MSWord style-stacking than manipulating ropes directly, because Git does manage to completely hide some of its guts. (Object storage, compression, transfer come to mind.)

Though git's interface is highly flawed, I do think a good VCS needs to expose more of its storage model to the user (or at least a model isomorphic to it) than most apps.

In vi and Word, you're not worried about state changes outside of saving the current state and, possibly, undoing some number of steps. In a VCS, you might need to check out, merge, or compare arbitrary states from the history, and doing this inherently requires a deeper understanding of how the history is stored. A good VCS should expose these internals in a clear way. In my experience, working with even 1 teammate immediately requires you to have some mental model of how your VCS deals with merging different histories.

That said, it's up to the VCS's interface to make these things clear. Git's mental model is simple enough, and the porcelain can do some of this stuff very well, but the CLI is arcane; I end up storing extremely common functions as shortcuts because I'd never remember them or want to type them even though I use them dozens of times a day.

> In databases & SQL, I can "INSERT INTO x" without learning the "internals" of b-trees[2].

True, but it is very helpful when designing your indexes.

> oh fuck, I didn't realize that the git index is a binary file format

I wonder if the project (or likely, another one) might be better served by implementing the index using plain text (or whatever else might be more natural for shell wrangling) to elucidate the conceptual structure rather than matching git literally.

PS: The name is very apropos. One doesn’t see too many such fitting opportunities — feels warm and fuzzy to see this one well used :-)

> I feel like people learn git the wrong way

Why do you think it's the wrong way? I sit somewhere in between and think that some people want to know the details and learning from inside is a good idea. But some other people want to simply be users and for the tool to get out of their way - and that's also good. So if the docs or the UX make either way hard or less effective, that's on the docs or the UX to improve.

You'll use git a hundred times a day, every day, for the rest of your career. It's easily worth the hour it'll take to learn it properly.

Sure, but I also use a mouse. I could learn how the optical mouse works. I have some guesses about it too, but never actually learned the details.

But I'm a user of it - it works even if I don't understand exactly how and nobody tells me that I learned using the mouse "the wrong way" because of it.

Yes, and you don't have to spend one hour learning how to push a button.

A version control system is tackling a non-trivial problem. Go learn it properly, otherwise you'll be one of the 'users' that, at best, will be stumped on trivial issues, losing productivity and running to others for help. At worst, you'll be making bad decisions and dragging down your team.

Would you also say that you don't need to learn anything and can just "guess" while working with a programming language?

Go learn it properly, otherwise you'll be one of the 'users' that, at best, will be stumped on trivial issues, losing productivity and running to others for help. At worst, you'll be making bad decisions and dragging down your team.

I have no problem with people on my team asking each other for help, and I definitely don't consider it bad for productivity when they do. If someone on my team suggested people asking them for help was bad I would bring it up in their next one to one because that's a really bad sign something is wrong.

If everyone on my team decided to learn the internals of git so they didn't need to ask one another when a problem arose I would be genuinely concerned about how the team is working.

I think the point was "trivial issues". What if a Java programmer kept asking their teammates if the statement terminator in Java is colon or semicolon?

Users can resolve trivial issues without knowing the internals though, so that wouldn't make sense.

A typical mouse has 2-3 buttons, a wheel, and an X/Y axis.

Git is 100x more complex, if not more.

Give me a break. I'm sick of people glamorizing the idea that you should have your hand held through each step of every tool you use and never expend any effort on becoming an expert in the tools of your trade. Git is an engineering tool, designed by and for professionals. Imagine this kind of obscene complacency in other fields.

I meant the tool-vs-internals idea, not the specific example here. If you want something comparable in complexity: we learn programming from `print "hello world"`, not from memory models and assembly. Some people even just start with `=sum(...)` in excel. Every programmer pretty much stops at the level that's useful and productive for them.

There's often sentiment that people should know more, but I don't think I've ever seen anyone saying starting programming from high level is "the wrong way".

Example from out of it: doctors learn both how to use USG and how it works. But in every case, I've seen it in that order: practice, then internals.

>we learn programming from `print "hello world"`, not from memory models and assembly

You're talking to the wrong crowd with me, you know. I disagree with this approach, too. Maybe we start with "hello world" to get a taste, but the first thing we should do is start breaking it down.

Thanks. I do understand where you're coming from - your original claim makes sense in that context :-)

You understand git. If someone said you need to know how sed works, or grep, or babel, or clang, or bash, or perl, or literally any other tool that you use regularly that you don't know the internals of, then you'd quite reasonably say they were wrong, because you're an expert in what matters to you and you can do your job without knowing how something else works. Perhaps you should try to respect that other people choose to become experts in things that aren't git.

If you use any of the tools listed, or any other tool, several times a day then it is reasonable to know the internals at least a bit. If you use grep (or git - the same applies) once a month then it's fine to just memorize some commands.

I don't think it's asking to have one's hand held to complain about git's poor interface. There's no reason other than lazy design to have a tool where to show all remotes it's

$ git remote -v

But to show all branches it's

$ git branch -a

It's like it's been purposefully designed to be obtuse.

It seems like you've made this example obtuse.

`git remote` and `git branch` list the remotes and branches respectively.

Adding -v makes both of these verbose. It will additionally show what each branch/remote is "pointing at".

Adding -a to `git branch` shows remote tracking branches in addition to local branches. This is not normally interesting, so the default is to list only local branches.

...yes, remote tracking branches are interesting as well, I don't know a situation where they wouldn't be.

There's plenty of weirdness in Git, but honestly my main complaint is that the interface is awful and the documentation makes Dostoyevski look modern and sleek.

You clearly have not ever worked on repos where nobody ever cleans up after themselves as far as feature branches go. I'm working with repos with remote branches numbering in the hundreds. `git branch -a` is pretty useless at this point unless paired with grep.

this is what I have my alias set to - `git branch -a --sort=-committerdate`

That's just by accident. There are plenty of obtuse git commands, strange command names, non-standardized parameters, etc.

We're all talking about learning the internals of git and how it works, not its poorly formed command lines. Pointing out how shitty the interface can be doesn't mean you shouldn't learn how your tools work.

You say potato, I say potato. If the interface is clunky, you will learn by heart some switches and pass them on as cargo cult.

>We're all talking about learning the internals of git and how it works, not its poorly formed command lines.

But the argument is being put forward that to understand the command lines you have to understand the internals.

Every other field makes damn sure that their tools are usable, comfortable and as safe as they can be. While programmers act like it's your fault if you hurt yourself while using a chainsaw-hammer.

I've watched children use a mouse for the first time, and there is definitely an internal model you needed to learn. No, you don't need to learn exactly how optics work, in the same way that you don't need to learn exactly how Git is reading from files, or how its hashing algorithm is implemented.

But you do need to understand that the mousepad doesn't correspond to points on the screen, and you have to learn to treat it more like a treadmill than anything else. Going back in time and thinking about it from a rollerball perspective can help with that -- new users have a tendency to use something like 90% more space because they don't grok that for long movements they have to pick up the mouse.

People are bringing up the mouse as simple because they're used to using mice. But hand anyone a mouse for the first time and you'll find out that they aren't simple. They're just doing comparatively less than Git, so the problem space is slightly easier to tackle. And that's even ignoring the hand-eye coordination problem we take for granted, and that can take weeks for someone new to computers to get over.

Talking about internal mechanics is broadly useful when teaching computer literacy -- everything from mice, to copy/cut-paste, to shift-selection of files, to the file browser itself benefits from trying to build a systemic, mental model of some kind of behind-the-scenes abstraction.

git is a data structure manipulation tool, mouse is a cursor manipulation tool. A much better analogy would be trying to use a mouse without having a good idea of what the cursor is for.

The only difference is that grasping the cursor will probably take you minutes because it's a simple concept, and grasping the data structure takes a bit more effort because it's just a more complicated topic.

You don't need to know the implementation details of git, but you need to know the data structure it operates on, cause otherwise you're just walking in the dark.

And the other IO device? People absolutely will tell you you're typing the wrong way.

I'm not certain that git will be the dominant VCS forever, as I'd used CVS, Perforce, Subversion, Mercurial, in various degrees when they were dominant (or at least relevant).

Who knows, maybe Linus will have another epiphany, while Microsoft somehow mismanages GitHub and squanders all the goodwill away. Then, a group of upstarts...

That said, wanting to learn git's internals for the sake of knowledge is fine as motivation.

Then there's Gitlab and a few others.

Also sourcehut / sr.ht.

Even if it stays as widely used as it is now for just two more years, it'd already be worth the hour of effort.

It's not a given you will use it for the rest of your career. As two examples, neither Google nor Facebook use git.

shrug I'm probably never going to work at either of those places. For pretty much everywhere else, Git works just fine.

Maybe if I play my cards right, I'll use git for the rest of my career. If not, maybe there will be something new eventually, but I imagine that the concepts learned in mastering Git would still be useful.

You should give Mercurial a try. When you enable the changeset editing extensions it does everything Git does and is much easier to use and understand. I’ve trained people on both systems and Mercurial is much less creaky. The only reason everyone uses Git is historical: most of the modules I’m referring to weren’t added to Mercurial until around ’09, when Git started seeing widespread use.

I used mercurial quite a bit back in 2010. It's nice, but I don't see the value in sinking a bunch of time into it these days.

Every one of the projects that I interact with regularly are in a Git repo on some kind of Git hosting service and the projects are run by people who understand/use Git regularly. For those projects, switching to Mercurial is a net loss, even just considering the time it takes to migrate the codebase + related processes (think CI, issue queue integration, even the repo hosting itself).

Sure, I could use hg-git, but that doesn't gain me much either: now I'm the guy with the weird setup. If something goes wrong with my setup, it's too weird for other people to help with. If something goes wrong with somebody else's setup, I'm not that helpful because I have a weird setup.

many places were fine with cvs and svn as well

Was anyone _really_ fine with cvs?

I'd add that folks should not be forcing the use of git, even within an organization that uses it.

Mercurial works great, has a sane CLI and can both manipulate and interoperate with git repos.

Many of the tools mentioned, at their core, are manipulating the same DAG data structures.

I found Mercurial to be just as capable and much easier to learn. I’m really only considering moving my team to Git as a least-common-denominator move since nobody really makes tools for Mercurial outside of Facebook. Otherwise it’s harder to learn Git and definitely it’s harder to train interns and juniors to use it.

> neither Google nor Facebook use git

What do Google and FB use as their VCS? Also, any source? I don't mean this to be an attack, I am just genuinely amazed by the statement.

Google uses a custom implementation of the Perforce interface called Piper. Google has looked at git and mercurial, but has concluded that they can't scale to the level it needs. Read more about it here:


Facebook uses Mercurial.

> Why do you think it's the wrong way?

Because the porcelain is a nonsensical pile of crap on its own, so you really cannot make sense of it from the top down; it actively resists that approach.

That’s not an assertion that it’s a good thing mind, just that it’s the only one: learning git from the bottom up is much easier than top-down, and people who dislike that approach are simply hosed.

I view not learning the basic concepts of how git works inside like trying to learn SQL without knowing what a table is.

You could, but it's easier if you understand the basic abstractions that are its fundamental building blocks.

I think it's more like learning enough SQL to build an application without learning about indexes and transactions: you can be sufficiently productive until you encounter corner cases or things go wrong.

> I view not learning the basic concepts of how git works inside like trying to learn SQL without knowing what a table is.

That’s complete nonsense. A table is not a low-level implementation detail of SQL; it’s a core feature.

And I don’t have to know how tables are represented on disk or what they store exactly to acquire a good intuition of how things work.

SQL is in and of itself an abstraction decoupled from the underlying concerns of implementation and execution. Something git’s porcelain definitely is not.

Theoretically, a person could learn SQL by knowing relational algebra and still not know what a table is. Just sayin’.

Git internals are fine. Now, if the utils gave access to human-oriented operations with those internals without the user googling every time they need something not-yet-memorized, that would be splendid. As it is, the utils are already pretty shitty without a reimplementation.

eh, common.sh is a lot more complicated than it could be. example:

    write_hex() {
     # fold splits $1 into two-digit hex bytes; each goes out as an octal escape
     echo "$1" | fold -w2 | while read -r hexbyte; do
      printf "\\$(printf %o "0x$hexbyte")"
     done
    }

you can imagine other shenanigans with xargs or something, but I think this strikes the best balance between performance and readability (as far as shell script goes).

read_int16 and read_int32 don't work on big-endian systems, or if int is 16 bits instead of 32. the latter issue can be easily fixed by explicitly specifying -td2/-td4, but the former issue is not so easy. I think it requires either figuring out the endianness beforehand, or better, something like this:

    od -An -tx1 -j"$offs" -N4 "$path" | while read a b c d; do
     echo $(((0x$a << 24) | (0x$b << 16) | (0x$c << 8) | 0x$d))
    done

oddly, this is used in ls-files already. and yes, I checked: 0x$a is POSIX, and the arithmetic evaluation size must be at least a signed long, which is at least 32 bits.
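For anyone who wants to try the byte-by-byte approach in isolation, here is a minimal sketch. The function name `read_be32` and its argument order are made up for illustration and are not taken from shit; the sample file only mimics the start of a git index (the `DIRC` signature followed by a 4-byte big-endian version number).

```shell
#!/bin/sh
# read_be32 FILE OFFSET: print the 4 bytes at OFFSET as a big-endian integer.
# Each byte is read as hex text, so the host's endianness never matters.
# (Hypothetical helper name, for illustration only.)
read_be32() {
  od -An -tx1 -j"$2" -N4 "$1" | {
    read -r a b c d
    echo $(( (0x$a << 24) | (0x$b << 16) | (0x$c << 8) | 0x$d ))
  }
}

# Sample data: the 4-byte signature "DIRC", then the number 2 in big-endian.
sample=$(mktemp)
printf 'DIRC\000\000\000\002' > "$sample"
read_be32 "$sample" 4   # prints 2
rm -f "$sample"
```

Values with the top bit set only come out right when the shell's arithmetic is wider than 32 bits, which is the case for the common shells on 64-bit systems but is not something POSIX promises.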

'for x in $y; do printf "$a%s$b" "$x"; done' is equivalent to 'printf "$a%s$b" $y' (assuming neither a nor b contain format specifiers). similarly, 'for i in {1..100}; do printf "$a"; done' is equivalent to 'printf "$a%.0s" {1..100}'. unfortunately, brace expansion is not POSIX, but these are both significantly more efficient (both in code size and execution time) than the loop methods.
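A small sketch of both equivalences, using made-up variable names and no brace expansion so that it stays within POSIX sh:

```shell
#!/bin/sh
# The variable name and its contents are illustrative only.
words="alpha beta gamma"

# Loop form: one printf call per word.
for w in $words; do printf '[%s]\n' "$w"; done

# Single call: printf reuses the format for each remaining argument.
# $words is deliberately unquoted so that it splits into three arguments.
printf '[%s]\n' $words

# "%.0s" consumes one argument but prints zero characters of it,
# so the literal text "ab" is repeated once per argument: ababab
printf 'ab%.0s' 1 2 3
echo
```

One difference to keep in mind: with an empty list the loop prints nothing, while printf still prints the format once.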

sha1sum is not POSIX. I think shell arithmetic provides you enough tools to implement https://en.wikipedia.org/wiki/SHA-1#SHA-1_pseudocode directly, although it may be slightly slower than a C implementation. awk is probably faster than shell.

Such a long text... only to end up arguing for a non-POSIX solution. But the initial goal of the project was to have POSIX code, and that can indeed be a valid goal. So the whole post is "a lot more complicated than it could be." TL;DR: you wouldn't produce POSIX code.

what the fuck? I specified POSIX alternatives to non-POSIX uses in the code. only one feature, brace expansion, is not POSIX, so I specifically did not recommend its use.

The specific part that I understood as your argument for non-POSIX solutions:

"'for x in $y; do printf "$a%s$b" "$x"; done' is equivalent to 'printf "$a%s$b" $y' (assuming neither a nor b contain format specifiers). similarly, 'for i in {1..100}; do printf "$a"; done' is equivalent to 'printf "$a%.0s" {1..100}'. unfortunately, brace expansion is not POSIX, but these are both significantly more efficient (both in code size and execution time) than the loop methods."

I didn't understand why you wrote that part at all, considering the goals of the program we are discussing (which is to demonstrate some git primitives in POSIX-compliant shell code).

'for x in $y; do printf "$a%s$b" "$x"; done' is used in the code already. I am proposing that it be changed to 'printf "$a%s$b" $y', which is also POSIX compliant, shorter (even including a comment), and faster. I included the part about brace expansion as a side note, not proposing that it be used.

> 'printf "$a%s$b" $y', which is also POSIX compliant, shorter

Yes, indeed, thanks for pointing that out, it's documented:

"The format operand shall be reused as often as necessary to satisfy the argument operands."


I’m sorry if I misunderstood you. I’m indeed interested in which of the things you wrote you would actually suggest changing, as I also looked at his code and also read here that it was done in a short time, so I also believe there is room for improvement. Specifically, reimplementing the SHA calculation itself should be a non-goal, in my opinion. The goal is just that the .sh code itself works on all POSIX shells, not that the whole system has to be POSIX-only: calculating SHA in shell is surely not the point of demonstrating how git works.

Would you say if the data format was say JSON, YAML or TOML or any more human and bash friendly format it would have been easy to implement with your current experience?

Not to vouch for having Git store things in a human-readable format. But I often think about how inefficient JSON APIs and YAML storage formats are (in parse time) just for the benefit of a user debugging them or discovering the API through a browser. But since most people use a JSON prettifier plugin or a tool like Postman anyway, what is the benefit of the line format being character strings? Wouldn't a binary package be just as easily translatable into human-readable JSON-formatted output as a compacted JSON string is?

> what is the benefit of the line format being character strings? Wouldn't a binary package be just as easily translatable into human-readable JSON-formatted output as a compacted JSON string is?

One benefit is that I can look at an arbitrary file/response and be able to tell with fairly high certainty whether it's JSON, YAML, or TOML, but there's no way that I can tell whether it's messagepack, bson, or protobufs.

Most of the time you know the format you expect to decode, so you don't have to guess it anyway.

But I think you should be able to detect the type of format for binary encodings just as well, as their specs are pretty specific. Maybe not at a glance as a human, but that is the point I'm making: should all line formats be made fully human-readable and parsable at a glance, just for the benefit of debugging and at the cost of performance, when with a simple lens tool you can look at the data in a completely different (human-friendly) view? Tools like this already exist in the form of Wireshark, only they mostly operate at a deeper level.

Even with the self-deprecating nature of this effort and project: you continue to produce a lot of open source contributions. I see your work and blog all over the place. You’re a machine!

So what’s your secret?

funny how these same words came to many https://codewords.recurse.com/issues/two/git-from-the-inside...

For anyone wanting to understand Git from inside out at a fundamental level, I can't recommend the 'Building Git' book enough.

The code is Ruby, but there's enough explanation for each snippet to be able to follow along in whatever language one prefers. I had no problems with translating to Go, for example.

1 - https://shop.jcoglan.com/building-git

Still more sane than JavaScript to me. Would use this over a 500MB Node.js implementation.

Is this a reference to something? I can't imagine it would take 500mb to reimplement git in JavaScript. This one's <75kb and full of explanations: https://github.com/maryrosecook/gitlet

It's a dig at the typical size of a `node_modules` folder. These often are very large, and often do contain several thousand files, largely due to transitive dependencies.

For anyone looking to use Git in JS, check out https://github.com/isomorphic-git/isomorphic-git. I've had great success with it and really like it as a library. Its API design is good and it's tree-shakeable so the size of the library is very reasonable, even if taking it as a whole.

Indeed, I've come to the same conclusions myself. I'm the resident git expert at my job but, quite honestly, the only thing that separates me is that I've learnt git's internals. But I learnt it because I'm lazy and it's easier, not because I want to be the best or anything.

So many people say they "know" git, but then I watch them work and it's "git commit -a" all the time and "git clone" when something doesn't look right. It's really amazing how people refuse to learn this essential tool.

Couldn't resist looking at it after your repeated warnings. It seems that shellcheck site has some suggestions to improve the code.

You know that Linus wrote the first version of git mostly in bash, right?

Why even have the index? Just skip that step.

I wanted to have the staging area, and I had decided upfront that I wouldn't make the repository state inconsistent between shit and git. So the index needed to be done.

Also, you need to generate a tree out of something. Could just hash the entire worktree every time, but that would be pretty lame.

What I never fully understood is why the staging area isn't simply a commit that gets amended repeatedly as files are staged into it. Maybe just a tree pointer, since the rest of the commit data isn't available until commit, but you could fill in some placeholders. ("(staging)" for the message, current times for the timestamps, etc.)

(Note that index doubles for other functions like merging/conflict resolution, but I never thought that was a good thing, and could be separated out.)

I've wanted this for a long time, and also a frequently-amended working-tree-as-a-commit.

Why? I prefer each branch to have its own staging area and working tree, which maps better to my mental model of "branch as an under-development feature".

Currently my workflow to achieve this involves a lot of stashing.

Do you know about git worktree? Because that sounds like exactly what you want. Each worktree has its own index (staging area) and working tree.

What's the difference between a stash and a "working tree as a commit"?

The 'staging' area is implemented through the index. And the index is used for more things than just deciding what gets into the next commit. A lot of git's speed comes from caching stat data of files so that it does not have to hash the complete working tree for each operation. That's not something you can just ignore.

Someone proposed splitting this up the other day, but even that would come at the cost of performance and an increase of complexity.

Wow, I just totally assumed the index was a tree object. It seems much less elegant that it isn't.

Maybe it isn't because it would necessitate creating a lot of blob objects as you staged and unstaged changes, which might not get garbage collected for some time. I can't see any other reason.

Love the name -- very fitting.

> I said that the internals were so simple that you could implement a workable version... inside of an afternoon. So I wrapped up what I was working on and set out to prove it.

Been there. Done that. With other things, not git.

I suspect many others here have too.

> Five hours later, it had turned into less of a simple explanation of "look how simple these primitives are, we can create them with only a dozen lines of shell scripting!" and more into "oh fuck, I didn't realize that ...". Then it became a personal challenge to try and make it work anyway...

Yep. Been there too. Done that too. Again, with other things, not git.

I suspect many others here have too.

This was done in the span of a few hours, and was committed to sr.ht using itself.

Drew even livestreamed the whole thing, but I don't think he's uploaded it to a PeerTube instance yet.

Where was it livestreamed?

He has a subdomain on his site that he uses for it:


You're not the right person to ask about this, but if Drew is around I would love to hear high-level details about how this setup works and what the average monthly cost is.

I can see that it's open source[0], and I'm very tempted to copy it. I'm already in the midst of migrating all of my video hosting to PeerTube, but I don't have a solution I'm confident in for livestreaming other than Twitch -- especially because in the rare instances where I do stream coding sessions they can go up to 5 or 6 hours, at which point archiving and storing that video starts to look a lot more costly.

[0]: https://git.sr.ht/~sircmpwn/live.drewdevault.com/tree

I don't use it very often. I just threw it up on a Linode with minimal effort so I could have a working live streaming setup. You'll note from the readme:

>This is the website for my self-hosted livestreaming platform (aka bag of hacks dumped into a server).

PeerTube is nice in theory but in practice it's been really really unreliable for me.

Why not simply stream to somewhere like Twitch/YouTube and have them serve the recording offline later on?

I try to avoid proprietary services.

You know, YouTube is pretty easy. I bet someone could implement it in an afternoon and a dozen lines of shell script...

Is this a challenge?

A weekend at most.

WORST—-and I mean WORST—case, could crank out a clone during a hack week.

Never tried yet for hosting content, however I've read good things about Lbry.


Ah yes, we needed another blockchain to solve this problem. Perfect.

The amusement here being that git itself is Merkel trees ;-)

(And yes, I've seen, and boosted, your Mastodon tirade, and am ... apprehensive in commenting here.)

Not many people know that these trees were invented by the German federal chancellor Angela Merkel.

(they were not, this is a shitpost, they're actually called Merkle trees)


Damned edit window...

What's wrong with blockchain? Honest question since I've never used them.

I'm definitely not the right person to ask, but:

Archival/storage of video shouldn't be that costly.

With 5400rpm drives (better for archival than more or less any other type of storage media, including faster hard drives), it looks like the going rate is about a United States cent per gigabyte. Two for 7200rpm drives from manufacturers that seem to produce the most reliable drives on the market, consumer-side.

A setup that could survive through a reasonable amount of drive failure, then, seems to be relatively inexpensive, so long as you're not trying to archive your video In The Cloud®.*

*Someone Else's Computer

Storage is cheap. It’s the delivery that’s costly.

Delivery being costly is a myth propagated by Big Cloud®. Any dollar store VPS that isn't DO will have more than enough for streaming video all day every day.

That's irrelevant, though, given the person's question was about storage and archival.

It all depends on the use-case/context. Hosting a single video with few concurrent views is cheaper to do on VPS. Hosting videos with short response time in any region, with high resiliency, etc. is likely cheaper to do on CDNs. It's not a myth. It's "general advice may not work for you".

> Hosting videos with short response time in any region

Clouds are indeed selling that, but I think that's false advertisement. At least from here (western Balkans) it looks this way.

Not sure what you mean by false advertisement. Australia-Netherlands (common European pop) connection is often >300ms from a home connection. Home in Australia to Sydney pop is likely <10ms. It makes a massive difference with many small resources, or restarted transfers. That's just physics at some point.

But does it really? I can see why a large company wants to squeeze milliseconds out of asset delivery, but as a watcher of a small independent creator I would have no problem waiting a second for the video to start playing.

Background: I run a live streaming start-up.

Latency directly impacts bandwidth, which impacts quality, since all current-gen user-facing live streaming protocols that matter (HLS, DASH) are layered on top of HTTP (on top of TCP), and that's already the best trade-off for end-user delivery today.

For VOD it's less of an issue since you can just maintain a larger buffer, but with live that's a trade-off with being closer to the live edge or choosing poorer quality. It works OK for some cases, it's bad for others (like sports, or when letters on the screen become illegible due to compression artifacts).

Building your own CDN off of el cheapo VPSs is theoretically viable, the beauty of HLS and DASH is they're 100% plain old HTTP, so just drop Varnish, add GeoDNS on route53 and off you go. Actually I'd love to have the time to try that :)

> Not sure what you mean by false advertisement.

Here, the roundtrip latency is ~14ms within the country (e.g. from here to capital city), and 40ms to the closest AWS or GCP datacenters (both are in Frankfurt).

I can get a 1gbps port from Online.net for $11/m with unmetered bandwidth.

Delivery is cheap. Big clouds just mark it up by criminally high 500%+.

That's already a lot given the expectations set by the name.

I once landed a job as a web developer in a marketing department where IT was gatekeeping production hard.

Although all the code was in git (the real git), deployment involved a magical shell script someone in IT had written ages ago. Only after a bunch of rocky, outage-causing deployments did we have the sense to start digging into this magic script.

It turned out it was just a bad, thousands of lines long re-implementation of git using perl and mysql that captured and stored diffs and rsync'd them to production. ...And this was well into the era of CI/CD and infrastructure automation tools.

Eventually, the company brought in new IT leadership that put an end to that kind of nonsense, which freed us in marketing to buy purpose-built PaaS for our needs.

Obviously do this kind of crazy stuff as you please on the side. But at work, build only what uniquely adds value to your company. For most, dev tools are probably pretty low on that list.

I really disagree, a small amount of customisation of tools can reap significant productivity gains.

I think your anecdote is really just an example of bad management, not bad tooling. Could easily be that some management doofus had prohibited git from being installed on the 'production' system.

I think the problem is the idea of 'tooling'.

A table saw by itself is mostly useless. With a fence and a miter gauge, it becomes useful. With a push block, stop block, subfence, outfeed table, infeed table, featherboard, crosscut sled, tenon jig, and dado set, it is the single most useful tool in a woodshop. Keep in mind it is still one tool and all those accessories are not "tooling", they are accessories to a single tool that increase what you can do with it. The tool always works the same way, and anyone can use it with any of those accessories in any woodshop in the world. In effect it becomes a new, larger solution, made up of many features that extend the utility of the tool.

That's not really what we have. Mostly what we have are jigs. A jig isn't an outfeed table or a crosscut sled. It's a hack for a particular job. If you need to make one specific cut 300 times, you nail together some scrap wood, dial in the miter gauge, angle the saw blade, and make your cuts. And the jig is scrap once again.

But in the heady new world of "DevOps engineering", the jig is now "tooling", and we pat ourselves on the back that we were able to nail some scrap wood together and claim it created business value. Of course, it's not a shitty jig like in the bad old days of shell scripts ("ha ha! remember when we were productive with this simple code that was portable and not gigantic or complicated? how foolish!"), because instead of making it out of scrap wood, we now make it out of scrap steel with a MIG welder. We're advanced now.

And I'll go further. The fact that most woodworkers make their own tablesaw extensions is illustrative of the problem: craftspeople like having fun with their toys. Is there value and experience and dollar savings you get out of making your own crosscut sled? Sure! But it'll also take you 1-2 days of buying parts, measuring, cutting, gluing, clamping, drying, aligning, and finishing. Any business with any sense should have paid $100 to just buy a complete crosscut sled made of aluminum with a good design that will last forever. But they are too dumb to notice they're spending an inordinate amount of time and money on craftspeople making jigs.

I wish the ghost of W. Edwards Deming would rise from the grave and call us what we are: bullshit artists.

If I could buy the software equivalent of spending "$100 to just buy a complete crosscut sled made of aluminum with a good design that will last forever", I would have done that.

That is arguably what P/S/SaaS is, instead of building bespoke or Chef'd instances. But I'm not one to reimplement something in F77 in Julia just to have something to blog about.

Regardless, sometimes a DSL is just what you need, and you'd better have someone who likes creating compilers do it. Otherwise it's like when builders do wood things without talking to a carpenter first.

I think you're saying that "creating home made tooling is a waste of time, and you should just use professional tools"?

In which case I agree, for almost all cases. A carpenter trying to build their own saw or chisel would be laughed out of the workshop, and rightly so.

However, the mark of a master craftsman is being able to identify when custom tooling is necessary and knowing something about how to build it.

In my opinion, this is simply an over-extended metaphor. Programming ultimately is not carpentry, and custom tooling is justifiable in many more cases.

My wife is a woodworker by trade and this metaphor is on-point.

I return to my original point: if your company is working on something where "production" is novel/unique/a differentiator, then you probably need to invest a little time in how you manage and deploy to production (e.g. you need something that's more than just a jig, and you can't go down to the store and buy it because it literally doesn't exist).

There is probably a certain point in scaling an engineering org (maybe 50+ devs) where you inevitably have to devote some engineering time to this anyway (e.g. you adopted/bought a tool that requires non-trivial maintenance and customization).

If, on the other hand, you're working on something where production and deployment are a well-understood--maybe even commoditized domain--then you should direct your precious engineering time elsewhere.

Marketing website infrastructure has its nuances, but it's well-understood. CRUD apps that talk to databases are a similarly well-understood area.

>Mostly what we have are jigs. [...] Of course, it's not a shitty jig like in the bad old days of shell scripts[.] Instead of making it out of scrap wood, we now make it out of scrap steel with a MIG welder. We're advanced now.

Early contender for Best Analogy 2020.

Sounded like you traded one incompetence for another. Why in the world would you gatekeep your deployment on an external service?

I wasn't aware of the plumbing / porcelain metaphor in git until now.

It's fantastic to what lengths some of us go for a good pun :-)

There are two hard problems in computer science:

1. Cache invalidation. 2. Naming things 3. Off by one errors.

This person has #2 nailed down, hard.

Amusing, but less than the title suggests (no porcelain, no reliability properties, no error checking, etc).

Still, fun stuff.

Interesting terminology, I'd never heard of porcelain in the context of git.

In case there are others like me, this will save some clicking:


Am I reading that correctly? A porcelain command is one that's not supposed to be used in scripts, but the --porcelain flag is for when you do want to use things in scripts?

And then there's random stuff like "git status --porcelain", an easily-parsed version of "git status"
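To illustrate what "easily-parsed" buys you, here is a rough sketch that counts untracked entries. The function name `count_untracked` is made up, and the input is a canned sample in the v1 porcelain layout (two status characters, a space, then the path) rather than output from a real repository.

```shell
#!/bin/sh
# count_untracked: read `git status --porcelain` (v1) lines on stdin and
# print how many entries are untracked ("??" in the two status columns).
# (Hypothetical helper name, for illustration only.)
count_untracked() {
  n=0
  while IFS= read -r line; do
    case $line in
      '?? '*) n=$((n + 1)) ;;
    esac
  done
  echo "$n"
}

# Canned sample input; in a real repository you would pipe in
# `git status --porcelain` instead.
printf '%s\n' ' M README.md' 'A  src/new.c' '?? notes.txt' '?? build/' | count_untracked
```

With the sample above this prints 2. Paths containing unusual characters get quoted in this format, so a more careful script would reach for the `-z` variant.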

Why would you expect reliability and error checking in a shell implementation of git? It's clearly a mad bit of fun.

I wouldn't necessarily expect it in a "mad bit of fun", and I'm implicitly noting that that's what this is.

I completely require it in shell scripts in general. Good checking of errors is really a fundamental requirement of professional programming.

I initially thought it was "SourceHut gIT" not "SHell gIT"

I don't mean any offense by that - sourcehut is fine - just what popped into my head.

This is great, but I already use "shit" as an alias for "fuck".

Your home life must be quite unusual.

If there's anything you don't want to confuse, it's your shit and your fuck. Keep them separate, people. Keep them separate.

The world is big enough for everyone's kinks; I'll keep mine off yours and you return the favor tx

You can have whatever kinks you want, but you can't escape the germ theory of disease.

There's a Reggie Watts song about this..

HN wisdom FTW

Do you type dvorak? I have the same exact alias for that reason.

No, I could never get the hang of Dvorak.

I have it because when I make a command line mistake, I more often say "shit" than "fuck". It feels more natural to type what I'm actually thinking. :)

I’m just here to support the four people that have a sense of humor on HN.

To be fair, humor (on its own) is often detrimental to HN discussions. At least as top level comments.

But for a toy project like this, it seemed appropriate.

It seems that the logical next step after the birth of this monstrosity would be to create the next best platform: ShitHub

https://wyag.thb.lt/ If you want to hop the same train, I saw this long ago...

So if this is the git internals, shit is what goes inside the porcelain plumbing?

are there any other implementations in other languages? (preferably c++).

I use libgit2, but the code is a bit difficult to read and hack. I know there is a ruby book implementing from scratch.

I want to understand the git internals.

"write yourself a git" is an online book that walks you through implementing git in python.


I know of two, one in pure OCaml and one in pure Go.



Let's rewrite everything in JavaScript:


What a strange way to spell Rust:


(to GP: lots of links related to reimplementing git in there)

There is JGit implemented in Java


The WYAG book many have referenced is pretty good. Haven’t gotten through it yet, but I’m looking to do a Go implementation. I also want to create my own SCM, if I have the brainpower for it, to test some new workflow ideas.

Please do. The one package I've used was missing features like "commit all files" last time I used it a year ago. https://github.com/src-d/go-git

Isn't this how Git was originally done?

It's slightly more complicated than that. The `git` command line itself was done in shell. Initially, `git foo` just ran `git-foo-script`, and then those were also written in shell. But the actual sha1 and packfile stuff was always in C from the beginning.

Source: https://git.kernel.org/pub/scm/git/git.git/commit/git?h=v0.9...

You might be thinking of a different DVCS called Arch, which was initially implemented as a set of shell scripts:


Impressive! Though very lacking on any kind of instructional use. "init" is easy enough to figure out, but there's no "add" step. Shame :)

Sorry, I wasn't expecting this to hit HN before I had written some porcelain commands. I pushed an updated README that explains how to write commits with this.

10/10 on wordplay

It's cute, but it negatively impacts my ability to effectively advocate for the tool in my workplace.

So... That's a very good choice of name then. You shouldn't advocate for it.

Part of the joke is that it's a terrible idea and you shouldn't use it

This is essentially adding a blob inside .git/objects by taking a sha1 hash of the data after prepending a header, so the hashed bytes are "blob <length>\0<data>". Then the first two characters of the hash become a directory name and the remaining 38 act as the file name, inside which the zlib compressed data (header included) is stored. Nice project for learning the git internals.
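
For anyone who wants to poke at that layout, here is a rough Python sketch of the same steps. It is not the project's shell code, just an illustration: `hash_blob` and `write_blob` are names made up for this example, and it assumes the loose-object format is exactly "blob <length>\0<data>", SHA-1 hashed and zlib compressed.

```python
import hashlib
import os
import tempfile
import zlib


def hash_blob(data):
    """Return (hex object id, bytes that get hashed) for a blob.

    Hypothetical helper for illustration; not part of git or shit.
    """
    # The header is "blob <length>" followed by a NUL byte, then the raw data.
    store = b"blob " + str(len(data)).encode("ascii") + b"\x00" + data
    return hashlib.sha1(store).hexdigest(), store


def write_blob(git_dir, data):
    """Write a loose blob under <git_dir>/objects and return its id."""
    oid, store = hash_blob(data)
    # First two hex characters name the directory, the other 38 the file.
    obj_dir = os.path.join(git_dir, "objects", oid[:2])
    os.makedirs(obj_dir, exist_ok=True)
    path = os.path.join(obj_dir, oid[2:])
    if not os.path.exists(path):
        with open(path, "wb") as f:
            # Header and data are compressed together.
            f.write(zlib.compress(store))
    return oid


if __name__ == "__main__":
    # Use a throwaway directory in place of a real .git directory.
    with tempfile.TemporaryDirectory() as tmp:
        oid = write_blob(tmp, b"hello world\n")
        print(oid, "->", os.path.join("objects", oid[:2], oid[2:]))
```

If the assumptions hold, the id printed should match what `git hash-object` reports for the same bytes, which is an easy way to check the sketch against a real git.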

Well, it also includes creating tree and commit objects, and reading and writing the git index.

Yes, of course. I was mainly stating the starting point.

Try and market this shit

"Shellgrit" which I've heard used to recover from starting to say "shit" in circumstances where that isn't appropriate.

Better name and contraction of "shell git" ? Or maybe this is not something in common usage. Dunno.

i like the name so that's why i read it

wasn't the original version of git mostly entirely shell scripts?

that means i can put shit on my busybox.

I feel like the name was conceived before the tool in this case. Also, hilarious.

I don't think anybody should be targeting plain POSIX shell anymore. There are much richer shells available almost everywhere.

Unfortunately "rt" TLD is not a thing. Otherwise it'd be worthwhile to register sh.rt as an alternate domain for sr.ht.

What's wrong with people nowadays? Couldn't they come up with a better name?

Git is already an insulting term where I come from - I remember one of our children, years ago when they were young, looking at my screen and saying "why are you typing git ??" in a shocked sort of tone.

They're much less concerned about their language now.

I don't mean to overstate the case - it's not a swearword or the sort of thing you'd really censor, just a playground term for a mean person. All the same it's an ugly word and (however irrationally) this is one of the reasons I prefer Mercurial to this day.

Mercurial is also an insult. Both projects were started in response to a mercurial git’s actions.

I think the name is genius (bearing in mind it is a personal project and not intended for production use).

I don't care about the negative/down votes, but this is literally shit. Imagine the communication:

* Here in our project we use shit, and it's very good.
* do "shit pull" / "shit pop"
* Sorry, I don't have shit in my machine.

But, again, English is a funny language.
