Note the commit hash (github.com)
86 points by Xlab 1323 days ago | 29 comments

Back in 2011 I submitted a patch, purely for discussion, that added a feature to Git to allow vanity commit hashes. It generated some interesting discussion: http://lists-archives.com/git/756392-choosing-the-sha1-prefi...

Unlike the linked commit, it didn't alter the commit date; it altered the Git code itself to add a new custom header to the commit object.

Git handles custom headers just fine since it ignores unknown headers for future compatibility. This commit still lives in the main Git repo at work without any issues:

    commit 313375d995e6f8b7773c6ed1ee165e5a9e15690b
    tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c
    parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a
    author Ævar Arnfjörð Bjarmason <avar@booking.com> 1319042708 +0200
    committer Ævar Arnfjörð Bjarmason <avar@booking.com> 1319042708 +0200
    lulz 697889

Testing it out:

  > git commit -F message
  Try 0/4000000 to get a 1337 commit = 3650e08c9e1ecbbeec83daf7a959e3edcf15bd4f
  Try 100000/4000000 to get a 1337 commit = 3952f7d5035f5e88f66aa5c70e5cc11fdd734852
  Try 200000/4000000 to get a 1337 commit = 51c910f5d535c515a04796eb7c7a70cbd2325599
  Try 300000/4000000 to get a 1337 commit = d70c3e64b1d963461a6ee2f518c613483b979d68
  commit id = 313378458f8c4fb53c808f4b0bae5bf71ba5e23b
  [master 3133784] 1337 Test Commit
   1 file changed, 60 insertions(+), 35 deletions(-)

Right, and now you can use "git show --pretty=raw" to see the commit header:

    $ git show --pretty=raw 313378458f8c4fb53c808f4b0bae5bf71ba5e23b | head -n 10
    commit 313378458f8c4fb53c808f4b0bae5bf71ba5e23b
    tree 7e93df01bfc9c187d58a0b96e756dd8ac0031c82
    parent e4eef26d985177e4bdd32bf58b6ae40e7ae67289
    author Spencer Creasey <screasey@monetate.com> 1396872901 -0400
    committer Spencer Creasey <screasey@monetate.com> 1396872901 -0400
    lulz 843475
        1337 Test Commit

There are replies in that thread where the naïve technique I was using was improved a lot.
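For anyone who wants to play with the custom-header idea, here is a minimal Python sketch of it. This is not the patch from 2011, just an illustration under some assumptions: the `lulz` header name and the 4,000,000 try limit are taken from the output above, while `commit_id`, `find_vanity_nonce`, and the example author identity are hypothetical names made up for this sketch.

```python
import hashlib
from typing import Optional, Tuple


def commit_id(body: bytes) -> str:
    """SHA-1 of a commit object: 'commit <len>', a NUL byte, then the body."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def find_vanity_nonce(headers: str, message: str, prefix: str,
                      max_tries: int = 4_000_000) -> Optional[Tuple[int, str]]:
    """Try 'lulz <n>' header values until the commit id starts with prefix."""
    for nonce in range(max_tries):
        body = f"{headers}\nlulz {nonce}\n\n{message}\n".encode("utf-8")
        digest = commit_id(body)
        if digest.startswith(prefix):
            return nonce, digest
    return None  # gave up without finding a match


# Illustrative headers; the author identity is a placeholder.
headers = "\n".join([
    "tree c9bebc99c05dfe61cccf02ebdf442945c8ff8b3c",
    "parent 0dce2d45a79d26a593f0e12301cdfeb7eb23c17a",
    "author A U Thor <author@example.com> 1319042708 +0200",
    "committer A U Thor <author@example.com> 1319042708 +0200",
])
result = find_vanity_nonce(headers, "1337 Test Commit", "133")
print(result)
```

A three-character prefix needs about 4,096 tries on average, so this finishes almost instantly; each extra hex character multiplies the work by 16.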

If you're wondering how he did it, Brad wrote a tool called 'gitbrute'[1] that (as the name suggests) brute-forces the prefix of the hash to whatever you like.

[1]: https://github.com/bradfitz/gitbrute

This is a fairly popular thing to do with bitcoin addresses as well; keep generating keys until you get one with a recognizable prefix in the address.

Tor hidden services also do this. (The address of a hidden service is a hash.)

Not just for vanity, either - it makes phishing a hidden service harder: if users know the .onion for Agora starts with 'agora', then a phisher has to invest weeks of compute time just to get a plausible .onion to start his phish with, rather than generate any old .onion in a millisecond.

Neat. I remember Stripe did something similar with their CTF challenge

Yep! There were a bunch of us who wrote GPU "miners" (mostly OpenCL). I was intending to open-source mine at one point but never got around to it.

At the end of the challenge, the network was hashing ten digits (i.e. 000000000 prefix) in just a few seconds. Here's one of the rounds: https://github.com/pushrax/round660/commits/master

I wrote this a little over a year ago, only (for the lulz) in Node rather than Go:


Mine allows using a custom word list in the commit message for the nonce.

Judging by this commit, I'm guessing gitbrute uses minuscule variations in the commit time instead.

EDIT: yep: https://github.com/bradfitz/gitbrute/blob/master/gitbrute.go
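A rough Python sketch of the timestamp-variation idea, for comparison. This is not gitbrute's actual code (see the link above for that); it only assumes that the author and committer timestamps are nudged forward by a few seconds until the hash has the wanted prefix. `brute_force_times`, `max_shift`, and the example identity are names invented for this sketch.

```python
import hashlib
from typing import Optional, Tuple


def commit_id(body: bytes) -> str:
    """SHA-1 of a commit object: 'commit <len>', a NUL byte, then the body."""
    return hashlib.sha1(b"commit %d\x00" % len(body) + body).hexdigest()


def brute_force_times(tree: str, parent: str, ident: str, base_time: int,
                      message: str, prefix: str,
                      max_shift: int = 5000) -> Optional[Tuple[int, int, str]]:
    """Shift both timestamps by whole seconds, smallest shifts first.

    The committer time is never earlier than the author time.
    """
    for committer_shift in range(max_shift + 1):
        for author_shift in range(committer_shift + 1):
            author_ts = base_time + author_shift
            committer_ts = base_time + committer_shift
            body = (
                f"tree {tree}\n"
                f"parent {parent}\n"
                f"author {ident} {author_ts} +0000\n"
                f"committer {ident} {committer_ts} +0000\n"
                f"\n{message}\n"
            ).encode("utf-8")
            digest = commit_id(body)
            if digest.startswith(prefix):
                return author_ts, committer_ts, digest
    return None


found = brute_force_times(
    tree="7e93df01bfc9c187d58a0b96e756dd8ac0031c82",
    parent="e4eef26d985177e4bdd32bf58b6ae40e7ae67289",
    ident="A U Thor <author@example.com>",  # placeholder identity
    base_time=1396872901,
    message="Note the commit hash",
    prefix="bad",
)
print(found)
```

The nice property of this approach is that the commit looks completely normal: no extra header and no odd commit message, just timestamps that are off by a minute or two.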

It is almost certain that there are a quarter-billion published Git commits by now. (GitHub says there were 150 million pushes in 2013 alone, and many of those have multiple commits.)

This means it is likely that, purely by coincidence, someone has at some point had their commit labeled as (badc0de).

GitHub doesn't seem to allow searching revision hashes, but presumably Google has most of GitHub crawled by now, and I didn't spot any commits with that as a label: https://encrypted.google.com/search?num=100&q=badc0de%20site... (more hits than I expected though).

Google doesn't appear to match partial hashes. I searched for a random GitHub commit hash, and it showed up, then removed one character, and it didn't.

If you switch to verbatim there are only about a dozen.

I think the odds of an accidental collision with badc0de are pretty low.

Each hex character represents 4 bits, so a 7-character string is 28 bits. That's about 268 million possibilities. On average, it would take around 268 million commits to get one that started with "badc0de".


    1 - (1 - 1/16^7)^250000000 ~= 61%
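The arithmetic above is easy to check in Python. A small sketch, assuming every commit hash is an independent uniform draw (the function name `p_at_least_one` is made up here):

```python
import math


def p_at_least_one(prefix_len: int, n_commits: int) -> float:
    """Chance that at least one of n_commits hashes starts with a given
    hex prefix of prefix_len characters."""
    p = 16.0 ** -prefix_len          # chance for a single commit
    # 1 - (1 - p)**n, written with log1p/expm1 to stay accurate for tiny p
    return -math.expm1(n_commits * math.log1p(-p))


print(16 ** 7)                                        # possible 7-char prefixes
print(round(p_at_least_one(7, 250_000_000), 3))       # roughly 0.61
```

Under the same assumption, the expected number of commits before the first hit is 16^7, about 268 million, so a quarter-billion commits puts the odds slightly better than even.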

I thought the commit hashes had to do with the actual code being checked in. There's a random component to them as well?

Not random.

A Git commit is one kind of "object" in git. Objects in Git are hashed like so:

SHA1("[objecttype] [objectlen]\0[objectdata]")

and a commit object looks like this (blatantly stolen from the Stripe V3 CTF):

    tree #{tree}
    parent #{parent}
    author CTF user <me@example.com> #{timestamp} +0000
    committer CTF user <me@example.com> #{timestamp} +0000

    Give me a Gitcoin

The "tree" in the commit is the hash of the tree object that actually points to your code.
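To make the hashing rule concrete, here is a short Python sketch of it. The helper names `git_object_hash` and `build_commit` are made up for illustration, the tree and parent ids are placeholders, and the commit layout simply follows the CTF template above.

```python
import hashlib


def git_object_hash(obj_type: str, data: bytes) -> str:
    """SHA1("<type> <len>" + NUL + data), as in the formula above."""
    header = f"{obj_type} {len(data)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + data).hexdigest()


def build_commit(tree: str, parent: str, timestamp: int, message: str) -> bytes:
    """Fill in the CTF-style commit template shown above."""
    ident = f"CTF user <me@example.com> {timestamp} +0000"
    text = (
        f"tree {tree}\n"
        f"parent {parent}\n"
        f"author {ident}\n"
        f"committer {ident}\n"
        f"\n"
        f"{message}\n"
    )
    return text.encode("utf-8")


# Placeholder ids, only there to give the template something to hash.
tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
parent = "0" * 40
first = git_object_hash("commit", build_commit(tree, parent, 1390000000, "Give me a Gitcoin"))
second = git_object_hash("commit", build_commit(tree, parent, 1390000001, "Give me a Gitcoin"))
print(first)
print(second)  # one second later, completely different id
```

The same function reproduces what `git hash-object` gives for blobs: the empty blob hashes to e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.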

You can probably cycle through a couple million milliseconds in the various timestamps involved to get a large selection of hashes to pick from without making your commit stand out.

Iterating through the author timestamp gives you a bit more than 16 bits for one day. Then you may wonder how much freedom you want to take with the committer timestamp; it should be later than the author timestamp, but if it's off by a few hours to a day it might still be fine, which gives you another ~14 bits, and you're already at 30 bits. Throw in flippable punctuation (e.g., in the commit message, do you write deadbeef as a single word or not? Do you add a full stop or not?) and perhaps some other variations of the commit message and file contents, and 32 bits of nonce is actually fairly easy to get.
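The bit counting works out as follows. A quick Python sketch, assuming one-second timestamp resolution; the two yes/no message tweaks at the end are just the examples from the comment above:

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60

# One day of author timestamps at one-second resolution.
author_bits = math.log2(SECONDS_PER_DAY)

# 14 bits of committer timestamp is 2**14 seconds of slack.
committer_bits = 14
committer_slack_hours = 2 ** committer_bits / 3600

# Each independent yes/no tweak to the message adds one more bit.
message_bits = 2  # e.g. "dead beef" vs "deadbeef", full stop or not

total_bits = author_bits + committer_bits + message_bits
print(f"author: {author_bits:.1f} bits")
print(f"committer: {committer_bits} bits (~{committer_slack_hours:.1f} hours)")
print(f"total: {total_bits:.1f} bits")
```

So 14 bits of committer slack is only about four and a half hours, and the total lands a little above 32 bits.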

It's still a very cool demonstration of why you really need to compare every single bit of your hashes.

No, but the hash also covers the commit message and metadata. Here, the commit and author dates were fiddled with to get the desired effect.

Talk about how it was done, and a pointer to the code that was used: https://plus.google.com/115863474911002159675/posts/RT2Tvb1w...


They're hashes of the /commit/, which is the code plus metadata, including the time and the previous commit(s).

Is this something like the Bitcoin vanity addresses? Partial sha1 collision?

Yes. You can alter the content of the commit freely, so it's pretty easy to just fuzz the data until you get a commit you like.

This was the premise of Stripe CTF V3's second challenge ("Level 1: Mine me a Gitcoin") this year.

Yes, it's just as funny as cafebabe... http://en.wikipedia.org/wiki/Hexspeak


> In the talk about git in 2007, Linus said something about git being secure — that if someone would attempt to alter git history (as in attack that happened at Linux repo before), it would be instantly noticed because of hashes.

> Does this mean that compromising someone's git repo without anyone noticing can actually be done?

No. Brute-forcing 4 bytes of the commit hash is very different from brute-forcing 20 bytes. Because, you know, exponentials.

By the way, that's pretty much what you're doing when you're mining bitcoin.

Edit: added the parent's comment because it was deleted, and I see no reason to delete it; others might ask the same question.
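To put numbers on "exponentials", here is a back-of-the-envelope Python sketch. It only covers pure brute force against one specific target hash, and the rate of a billion hashes per second is an assumption picked for illustration:

```python
def expected_tries(prefix_bytes: int) -> int:
    """Average number of hashes tried before one matches a specific
    prefix of prefix_bytes bytes (8 bits per byte)."""
    return 2 ** (8 * prefix_bytes)


HASHES_PER_SECOND = 1_000_000_000      # hypothetical hash rate
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

short_tries = expected_tries(4)        # 2**32
full_tries = expected_tries(20)        # 2**160

print(f"4 bytes:  {short_tries / HASHES_PER_SECOND:.1f} seconds")
print(f"20 bytes: {full_tries / HASHES_PER_SECOND / SECONDS_PER_YEAR:.1e} years")
print(f"ratio:    2**{(full_tries // short_tries).bit_length() - 1}")
```

The 16 extra bytes make the full hash 2^128 times harder to match than the 4-byte prefix, which is the whole point.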

On a similar topic, this is why some people urge GPG users to use long key IDs (8 bytes) rather than short key IDs (4 bytes); see http://www.asheesh.org/note/debian/short-key-ids-are-bad-new... for example.

Thanks. I guess I just got so accustomed to working with shortened commit hashes in source control that I almost forgot that they have full versions.

And thus began Gitcoin.
