I'm new to hash functions though, so I don't yet know why SHA-256 has been designed in the way it has (e.g. why exact numbers were chosen in the bitwise rotations).
As far as I understand, the Sigma functions promote diffusion of bits to help prevent collisions, and Choice/Majority/Addition help to make it a one-way function, but I'm not entirely sure. I'd be interested in learning more about the design if anyone has experience in this field.
I'm more interested in how they came up with the parameters for the sigma functions. I'm sure it's described somewhere.
... taught everyone else how SHA-256 worked...
This is an awesome intro. And now I also want to know more about the things you mentioned wanted to learn more about.
An exercise I go through constantly is figuring out how to explain a thing I think I know to a curious person with no relevant training. Often in the process I discover I need to go do more research or actually test things because I did not understand them as well as I'd maybe thought.
Mostly this is just an exercise. But every so often I actually get to use this in anger. A non-technical friend who works in a Computer Science department asked me on Facebook to explain a joke she'd seen which involved localhost addressing, and I was very pleased to be able to provide a concise explanation using analogies that I know hold up to scrutiny. Obviously a joke isn't very funny if you need it explained, and I can't fix that, but I can avoid the discomfort of her not understanding a joke other people are laughing at in her place of work.
Given the content of knowledge sharing sessions that I sit through and the convoluted nature of some of them, I wish that people recognized that presenting information is not all that is required to teach. You can understand something perfectly, but teach it horribly.
Ideas take time to digest and it's not accurate to say that if someone leaves a room not understanding what you've just shown them, you don't understand it yourself.
Or even without replacement. Teaching someone doesn't imply a successful result.
2) I did have to take a sip of tea and think about my life when I realised I was watching a video by a Welshman about mining.
3) Although having said this, 'Welsh Bitcoin Miner' is going to fit seamlessly into my West Country themed cyberpunk adventure 'Cider Punk'.
2) It's certainly a combination I never anticipated.
3) I look forward to the book.
"When DEK taught Concrete Mathematics at Stanford for the first time, he explained the somewhat strange title by saying that it was his attempt to teach a math course that was hard instead of soft. He announced that, contrary to the expectations of some of his colleagues, he was not going to teach the Theory of Aggregates, nor Stone's Embedding Theorem, nor even the Stone-Cech compactification. (Several students from the civil engineering department got up and quietly left the room.)"
I had the big picture from the top of the page, and each section gave me a little more insight into what was going on. Great use of animation, and also very clearly written!
I also like the testimonials.
I feel like the Japanese people. I don’t understand what you’re doing here but I am convinced it is very clever.
I worked at an American company that made utility software for DOS/Windows. A major Japanese distributor liked our stuff, and made a distribution deal with us. They also helped with localization, and suggested feature changes to better fit what Japanese consumers wanted. Later this expanded to where they would also suggest new products they wanted for Japan and we'd develop them.
One product under development was very rarely causing a system lockup during installation. By "lockup" I mean it appears completely dead. No mouse motion. No keys worked. Hitting "CAPS LOCK" would not even toggle the light on the keyboard. Hard reset seemed to be the only way out.
It was rare enough that we always saw the install attempt after that hard reset work.
We just could not figure out what was causing this. The Japanese distributor decided we should go ahead and release, and their tech support people would deal with any complaints.
So we released...and their tech support got a lot of calls. People were hitting it a lot more than we had seen in testing.
...except they were not calling to complain about their system locking up. No, they were calling to complain about a slow install.
Apparently, when it became completely unresponsive all you had to do was wait 20-25 hours and it would complete the install.
I cannot imagine any American consumer whose PC has become completely unresponsive going 20-25 hours without giving up and resetting the thing. Not only were there such people in Japan, there were a lot of them. We didn't get one complaint about a lockup--all of the many many people who called called about it taking around a day to finish.
(Oh, and the knowledge that it was not a complete lockup was enough of a clue to let us figure it out. It suggested it was some sort of timeout, not a lockup. Our product needed to know what optical drives were available, and it turned out that the way it scanned for them did not get alone well with one particular brand of optical drive controller card, which could trigger about an hour timeout per drive checked for. We switched to a much more cautious drive scan, and the problem went away).
Our sales people traveled to a key meeting with execs at the Asian division of a multinational corporation (the potential customer). They did a big demo to show the software working in Chinese, hoping to impress the execs with how the software was ready to go.
The execs awkwardly said essentially, "Hey, that is probably great, but even though we handle the Chinese market, we don't speak Chinese; it's only our customers that do. So we couldn't tell what your software does, and we wouldn't be able to use this. Do you have an English version?"
That said, the purpose of this talk is about the mechanics of the function, and not its properties or how to use it safely. So don't let that detract from what is, really, an awesome presentation.
AFAIK the contrary is invalid, so that "not every output is the result of one and only one input".
Generally when people make this claim, what they're actually referring to is what's called Collision Resistance (CR) and/or Weak Collision Resistance (WCR), which instead make claims on difficulty of finding such collisions (of which infinitely many exist).
WCR, necessary for almost any cryptographic use, states that for any given input it should be difficult to find a different input which hashes to the same value. CR, generally desirable for cryptographic hash functions, states that it should be difficult to find two different inputs such that their hashes are equal. CR implies WCR, but WCR does not imply CR -- for example, SHA-256 (currently) exhibits CR but SHA-1 only exhibits WCR.
"So every piece of data in the world has its own unique hash digest."
This is false for the reasons apeescape describes: every piece of data in the world has its own hash digest, but these hash digests are not unique.
I'm saying that for a layman explanation, it's reasonable to say that hash outputs are unique. Because following that with "technically, it's more 'practically' unique, theoretically there are collisions but you won't encounter them with probability > 2^-256" (or whatever it is) just confuses the topic to them more than just summarizing. You have to admit that most people won't go on a 200h adventure to learn about the state space of 256+ bits and how to conceptualize tiny statistical probabilities, so there must be a point where you have to cut the explanation to an approximation of the truth. This is true in every field.
The air molecules in the room you're in can't all gather in one corner because that's not possible, it's forbidden by conservation rules.
But they won't gather in two opposite corners only because that's so tremendously unlikely, it would be allowed by conservation but statistically it's ludicrous.
The same is true at the opposite end of the spectrum. Almost all real numbers are normal (in all bases) but the nature of "Almost all" in mathematics is different in an important way from "All" and I want people to grasp this difference when I'm discussing properties of numbers. It definitely is not true that all real numbers are normal, you probably rarely think about any normal numbers at all.
I agree. I think this wording would be better than in my previous comment, what do you think?
it's reasonable to say that hash outputs are *almost surely* unique
You could have said exactly the same thing about MD5 right up until you couldn't. Then you could have said "oh yeah well MD5 is broken, but it's safe to assume you'll never find one for SHA-1", right up until we did. So if you say "oh yeah well SHA-1 is broken, but it's safe to assume you'll never find one for SHA-256", I disagree.
It would be one thing if collisions in hash functions were found by just repeatedly hashing things until you find a collision. If that were the case, then yes, I'd agree with you on those 1-in-2^256 odds, at least for a while. But by and large, that's not what happens. Over time, weaknesses are found in algorithms which allow you shrink the search space, which significantly changes your odds.
Had no idea about the prime roots and multiplication, that's pretty clever too.
I kind of hope there's some nuance to this story and it didn't quite happened in the way you described.
If they think it's "polite" to let a person talk for an hour without understanding a word of it (while nodding and smiling), and then have the nerve to call it "intelligent", that's an insult to both the presentor and the concept of intelligence.
Cultural differences notwithstanding, the idea of "completely wasting someone's time while pretending to pay attention" shouldn't be too difficult to empathise with.
This also has potential for landing in many movies where a hacker at work is involved.
I'm halfway through, but looks very well done, thanks!
If someone were to make a slower video explaining it with all the sections from this dissection, that’d be even more awesome.
Edit: Seems like the author created a video for this — https://www.youtube.com/watch?v=f9EbD6iY9zI (thanks to 1_player’s comment at https://news.ycombinator.com/item?id=23165906)
On a tangent, here’s an animation (by someone else) explaining AES encryption — https://m.youtube.com/watch?v=gP4PqVGudtg (thanks to harrigan’s comment at https://news.ycombinator.com/item?id=23165821)
Also, some people have been sharing a video explanation of AES/Rijndael. It is actually rendered from someone's Shockwave Flash movie. https://web.archive.org/web/20051124061027/http://www.cs.bc....
Also, I wanted to see how you updated the bit strings in place and was surprised not to find anything special. It looks like you use nothing but puts statements, which, as I understand it, simply add a carriage return at the end of a string but how then does the cursor move back up to the first line? Can some explain the technique being used here?
ruby sha256.rb sizzzzlerz enter
I didn't do anything special for the terminal animation. I just work out the current state of the hash function at each step, clear the terminal, and print the entire state back to the screen. I know it looks like I'm directly manipulating each individual bit in your terminal, but really I'm just redrawing your screen.
I'd like to do something more practical in future, but for this I just did what I needed to do to get an animation working. Still, every bit you are seeing is correct.
Hash functions use bit-wise operators. This is a world where "integers" are simply convenient fixed-sized blobs of bits and all bits are equally significant.
Addition is an arithmetic operation. Bits have significance in this world. The state of the input bits have more "influence" over the left hand bits of the result than the right hand bits.
Look at abcd+efgh=ijkl (Ignoring any overflow'd bits per SHA-256.)
To find the value of bit l, you only need to know bits d and h. You need to know all eight input bits to know the value of bit i.
I like this one better though.
He has opensourced his animation engine "manim" used in his videos: https://github.com/3b1b/manim
I hope you get lots of requests to include it in teaching materials.