When Pfiffer told me he would post that on HN, my reaction was something like "Fine by me, but I'm pretty sure I'm going to get schooled." Looks like I was right.
I got schooled, I'm learning, next time I'll be more clever about it, thank you!
Aw guys, don't go posting on their site (which I won't link) with devil names. Everyone knows communities turn to shit when they get too big, and 300 different users all posting with blank icons is going to kill the fun for them.
I'm not even a member of Merveilles but that makes me sad for them.
Well, for what is ostensibly a secretive, millennial "circle of artists and wizards", that is an awfully non-cryptographic, weak hash function. Apparently variants of it are commonly used in some JavaScript circles to mimic a popular implementation of Java's String.hashCode() method [1,2,3].
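For reference, that family of functions is just a polynomial rolling hash. A minimal Python sketch of the Java-style variant (assuming the usual multiplier of 31 and signed 32-bit wrap-around):

def java_hash(s):
    # Java's String.hashCode(): h = 31*h + c for each char, in 32-bit arithmetic
    h = 0
    for c in s:
        h = (31 * h + ord(c)) & 0xFFFFFFFF
    return h - 2**32 if h >= 2**31 else h  # reinterpret as signed 32-bit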
You expected something more esoteric? I think it's a matter of perspective. Sure, as a crypto challenge, it's weak. But if you think of it as our standard UI for configuring one's user icon...
I don't think it's going to be a problem. Before anyone can post anything to merveill.es, they have to figure out how the heck anyone posts anything to merveill.es, which will take some lateral thinking.
Also, this hash function is linear so it has the equivalent substrings property.
One can take advantage of this to generate preimages even faster. Take a preimage and find m equivalent strings for each of n substrings of it. Replacing those substrings with their equivalents gets you m^n (m to the n-th) preimages that hash to the same value.
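To make that concrete with a Java-style polynomial hash: any two equal-length chunks with the same hash value are interchangeable at any position without changing the full string's hash ("Aa" and "BB" are the classic colliding pair, since 65*31+97 == 66*31+66). A sketch of the blow-up, assuming that style of hash:

from itertools import product

def h32(s):
    # unsigned variant of the polynomial hash above
    r = 0
    for c in s:
        r = (31 * r + ord(c)) & 0xFFFFFFFF
    return r

# n = 3 substring positions, m = 2 interchangeable chunks at each position
chunks = [('Aa', 'BB')] * 3
for s in (''.join(p) for p in product(*chunks)):
    print s, h32(s)  # all 2**3 = 8 strings print the same hash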
You can brute force it insanely fast by iterating from the rightmost character and caching the entire hash prefix.
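One way to realize that caching, sketched in Python (assuming a polynomial-style hash, where the hash of prefix + c depends only on the prefix's hash and c):

def search(prefix_hash, prefix, depth, target, alphabet, hits):
    # extend the cached prefix hash by one character instead of rehashing the string
    for c in alphabet:
        h = (31 * prefix_hash + ord(c)) & 0xFFFFFFFF
        if h == target:
            hits.append(prefix + c)
        if depth > 1:
            search(h, prefix + c, depth - 1, target, alphabet, hits)

hits = []
search(0, '', 7, 666, 'abcdefghijklmnopqrstuvwxyz', hits)  # all strings up to length 7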
This hash doesn't scramble input very well, so you can fiddle with individual characters to converge on any desired hash value.
Record the "closest" hash value to your target generated by this loop, apply (only) that character change, repeat. If the hash value stops converging, add a character. This naive version pretty often does the job in 5 full iterations or so, which means (5 * len(s) * len(alphabet)) = maybe ~3000 total hashes to get a solution.
best = (abs(hash(s) - 666), s)  # hash() here means the Merveilles string hash, not Python's builtin
for i in xrange(len(s)):
    for c in alphabet:
        copy = s[:i] + c + s[i+1:]  # strings are immutable, so rebuild with one character swapped
        diff = abs(hash(copy) - 666)
        best = min(best, (diff, copy))  # keep the closest candidate and the change that produced it
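Wrapping that in the record/apply/repeat cycle described above might look like this (a sketch; as above, hash means the string hash under attack, not Python's builtin):

def crack(s, alphabet, target=666, max_rounds=50):
    for _ in xrange(max_rounds):
        prev = abs(hash(s) - target)
        best = (prev, s)
        for i in xrange(len(s)):
            for c in alphabet:
                copy = s[:i] + c + s[i+1:]
                best = min(best, (abs(hash(copy) - target), copy))
        if best[0] == 0:
            return best[1]             # exact preimage found
        if best[0] >= prev:
            s = best[1] + alphabet[0]  # stopped converging: add a character
        else:
            s = best[1]                # apply only the single best change
    return None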
> I'm half tempted to buy a few hours of highcpu AWS compute power and get it done nowish instead ... I set myself a $50 spending limit, which gave me about 24 hours of compute on an instance with 32 virtual cores
The price of a c3.8xlarge (32 cores, 60 GB of RAM) is currently $0.28 an hour in us-west-2. You could get about 178 hours of compute for your $50 budget.
This is for a spot instance, which could get stopped at any moment if the spot price increases; that would be pretty bad for this task, as I could lose computation results. The on-demand price is about $1.60 an hour. But wait! I'm not using the same dollar: I live in New Zealand, which makes that about NZ$1.90 an hour, or NZ$45 for 24 hours.
WOW, my C++ solution is horrible. It's as though I'd just ignored everything I've learned about Doing Things Right in post-2010 C++. Such is hacking, I guess.
on my 4-core MBP (2.6ghz ivy bridge) i can manage ~1.8 billion hashes per second.
i could parallelize with OpenCL, but i think this is enough. after a few minutes, i get ARbyhlf as a valid name (although i don't know if it's actually valid... but it definitely might be)
Most of the time is spent on generating the random string, which was because I very quickly realised looking through the entire possibility space in order would be unproductive. Granted, my entire approach was unproductive, as shown by linuxbochs, pedrox, and others below :)
I actually did some micro-benchmarking in the midst of all that and found that hashing a single, static string took something on the order of 20 ns. That's half a billion hashes per second, or 2 billion on four cores. Add a little overhead for string generation, and yeah, we get the same number.
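That measurement is easy to reproduce; a Python sketch (merveilles_hash and the input string are hypothetical stand-ins, and pure Python will clock far slower than the C++ figures quoted):

import timeit
setup = 'from __main__ import merveilles_hash; s = "devilsname"'
secs = min(timeit.repeat('merveilles_hash(s)', setup=setup, number=1000000, repeat=5))
print '%.1f ns per hash' % (secs / 1e6 * 1e9)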
> Most of the time is spent on generating the random string
when i took out string generation, i went from 1.2 GH/s to 1.8 GH/s (although, i'm assuming my string generation was much simpler)
> I very quickly realised looking through the entire possibility space in order would be unproductive
you say that, but it's not as unproductive as you might think. assuming the hash function is uniformly distributed (hah), there are ~4 billion (2^32) hash values a string can map to. if we can check 2 billion a second, we should see a match for a fixed target every couple of seconds. maybe my search-space pruning was particularly bad, because i see on average ~4-10 matches a minute, and they are usually quite clustered together. this suggests the hash is not uniformly distributed (although a cursory glance at the function should make that obvious to people with a maths background).
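the back-of-envelope, for anyone following along (treating the hash as uniform over 32 bits):

space = 2 ** 32  # possible hash values
rate = 2e9       # hashes checked per second
print space / rate, 'seconds per expected hit'  # ~2.1s, i.e. ~28/minute vs. the ~4-10 observed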
anyways, it was good fun to poke around :) cheers for sharing!
Example: http://bochs.info/img/mutation-20140606-024906.png
One could definitely optimize this to be less destructive and produce more pronounceable results. It's basically two pieces: an engine for suggesting mutations, and a simple algorithm to score and pick mutations. Changes to either half (vowel distribution, ngrams, etc) could result in better strings.
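A minimal sketch of that two-piece structure (the scoring here is hypothetical; a real vowel-distribution or ngram model would slot into score(), and hash() again stands for the string hash under attack):

VOWELS = set('aeiou')

def mutations(s, alphabet):
    # piece 1: suggest candidate mutations (here, every single-character substitution)
    for i in xrange(len(s)):
        for c in alphabet:
            yield s[:i] + c + s[i+1:]

def score(s, target=666):
    # piece 2: score candidates; hash distance dominates, with a crude
    # pronounceability bonus for vowel/consonant alternation
    alternating = sum(1 for a, b in zip(s, s[1:]) if (a in VOWELS) != (b in VOWELS))
    return abs(hash(s) - target) * 1000 - alternating

def step(s, alphabet):
    # greedily pick the best-scoring mutation for this round
    return min(mutations(s, alphabet), key=score)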
(fyi, this kind of attack is a big reason to use cryptographic hashes: http://en.wikipedia.org/wiki/Cryptographic_hash_function)