
How Hash Algorithms Work (2007) - jjoachim3
http://www.metamorphosite.com/one-way-hash-encryption-sha1-data-software
======
koolba
The first section is wrong (emphasis mine):

> A hash function is simply an algorithm that takes a string of any length and
> reduces it to a _unique_ fixed length string.

Hash functions strive for uniqueness but unless it's precalculated to ensure
that it's true (by hashing every combination or deriving the parameters of the
hash function accordingly), it's not guaranteed. A _cryptographic_ hash
function gives a high probability of uniqueness but again it's not guaranteed.

> The word 'cat' will hash to something that no other word hashes too, but it
> will always hash to the same thing.

Say I have a (terrible) hash function H(X) => 1. Now "cat" will hash to the
same value as the string "_I don't understand hash functions_".
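A minimal sketch of that point in Python (a language chosen for illustration; nothing in the thread specifies one). `terrible_hash` is a hypothetical stand-in for H(X) => 1: it is perfectly deterministic, yet every input collides.

```python
import hashlib

def terrible_hash(x: str) -> int:
    """Hypothetical H(X) => 1 from the comment above: deterministic but useless."""
    return 1

# Deterministic: the same input always produces the same output...
assert terrible_hash("cat") == terrible_hash("cat")
# ...but nothing stops different inputs from producing the same output.
print(terrible_hash("cat") == terrible_hash("I don't understand hash functions"))  # True

# SHA-1 is deterministic too; it makes collisions hard to find, not impossible.
print(hashlib.sha1(b"cat").hexdigest() == hashlib.sha1(b"cat").hexdigest())  # True
```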

~~~
bquinlan
When I read that sentence, I assumed that the author meant that the computed
hash for an input must be deterministic.

But she doubles down on the uniqueness claim later:

_Each hash is unique but always repeatable. The word 'cat' will hash to
something that no other word hashes too, but it will always hash to the same
thing._

This is such a large misunderstanding of hashing (as well as being obviously
impossible) that it is hard to trust the rest of the article.

~~~
analognoise
This is why I love HN; I either discover a great resource, or know what to
avoid and why, and I almost always learn something in the process.

------
rkda
> The word 'cat' will hash to something that no other word hashes too, but it
> will always hash to the same thing.

Don't hashing functions have collisions?

~~~
hannob
> Don't hashing functions have collisions?

They do. The text is somewhat misleading and doesn't properly explain that.

All hash functions have collisions. But from a cryptographically secure hash
function we expect that nobody is able to find such a collision. They exist,
but the computational power to find one is not available to humans.
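To see that collisions really do exist, a sketch can shrink the output on purpose. The helper below (hypothetical, for illustration) keeps only the first few bytes of a SHA-256 digest and hashes the strings "0", "1", "2", ... until two share a truncated digest. With 24 bits this typically takes a few thousand tries; the same search against the full 256-bit output is what is computationally out of reach.

```python
import hashlib

def truncated_sha256(data: bytes, nbytes: int) -> bytes:
    """A deliberately weakened hash: only the first `nbytes` bytes of SHA-256."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 3):
    """Hash b'0', b'1', b'2', ... until two inputs share a truncated digest."""
    seen = {}  # truncated digest -> first input that produced it
    # 2**(8*nbytes) + 1 distinct inputs must contain a collision (pigeonhole),
    # so the loop always returns; in practice it stops much earlier.
    for i in range(2 ** (8 * nbytes) + 1):
        msg = str(i).encode()
        h = truncated_sha256(msg, nbytes)
        if h in seen:
            return seen[h], msg, i + 1
        seen[h] = msg
    raise AssertionError("unreachable")

a, b, tries = find_collision(3)
print(f"{a!r} and {b!r} share their first 24 bits (found after {tries} inputs)")
```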

~~~
winston1984
> They do.

> All hash functions have collisions.

This is wrong. There is something called a perfect hash function:

[https://en.wikipedia.org/wiki/Perfect_hash_function](https://en.wikipedia.org/wiki/Perfect_hash_function)

> a perfect hash function for a set S is a hash function that maps distinct
> elements in S to a set of integers, with no collisions. In mathematical terms,
> it is a total injective function.

They are very handy for hash tables with constant worst-case lookup time.
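A toy construction, assuming the full key set is known in advance (here a hypothetical list of six keywords): try seeds until a seeded hash sends every key to a different slot in a table exactly as large as the set. Real perfect-hash generators use far smarter constructions than this brute-force seed search, but the result has the same property: one hash and one comparison per lookup.

```python
import hashlib

def slot(seed: int, key: str, m: int) -> int:
    """Seeded hash: SHA-256 of 'seed:key', reduced to a slot in [0, m)."""
    digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
    return int.from_bytes(digest[:8], "big") % m

def find_perfect_seed(keys: list, max_seed: int = 1_000_000) -> int:
    """Brute-force a seed under which every key lands in a distinct slot."""
    m = len(keys)
    for seed in range(max_seed):
        if len({slot(seed, k, m) for k in keys}) == m:  # injective on this set
            return seed
    raise ValueError("no collision-free seed found; try a larger max_seed")

KEYWORDS = ["if", "else", "for", "while", "return", "def"]  # hypothetical key set
SEED = find_perfect_seed(KEYWORDS)
TABLE = [""] * len(KEYWORDS)
for kw in KEYWORDS:
    TABLE[slot(SEED, kw, len(TABLE))] = kw

def is_keyword(word: str) -> bool:
    # One hash and one comparison, whatever the input: constant worst-case time.
    return TABLE[slot(SEED, word, len(TABLE))] == word

print(SEED, TABLE)
print(is_keyword("while"), is_keyword("class"))  # True False
```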

~~~
halomru
While very useful, you can only construct a collision-free hash function if
you know all possible inputs. Otherwise, perfect hash functions can only give
guarantees about how often collisions occur.

In the more general case, for a hash function with n bits of output, the
pigeonhole principle guarantees at least one collision among any 2^n + 1
distinct inputs.
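Stated a little more precisely for an n-bit hash h. The second part is the standard birthday-bound estimate, which assumes the outputs behave like independent, uniformly random n-bit values; it is why collisions usually appear long before the pigeonhole guarantee kicks in.

```latex
% Pigeonhole: only 2^n outputs exist, so 2^n + 1 distinct inputs must share one
h : \{0,1\}^* \to \{0,1\}^n, \quad x_1, \dots, x_{2^n+1} \text{ distinct}
\;\Longrightarrow\; \exists\, i \neq j : h(x_i) = h(x_j)

% Birthday bound: expected number of random inputs before the first collision
\mathbb{E}[\text{inputs until first collision}] \approx \sqrt{\tfrac{\pi}{2} \cdot 2^n} \approx 1.25 \cdot 2^{n/2}
```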

~~~
chris_va
Though 2^n could be much larger than the number of items in the observable
universe fairly quickly.
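For a sense of scale (10^80 is only the commonly quoted rough estimate of the number of atoms in the observable universe):

```latex
2^{160} \approx 1.46 \times 10^{48}, \qquad
2^{256} \approx 1.16 \times 10^{77}, \qquad
2^{266} \approx 1.19 \times 10^{80}

\log_2\!\left(10^{80}\right) = 80 \log_2 10 \approx 265.75
```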

------
stygiansonic
This is really a walkthrough of the SHA-1 algorithm.

It's also worth noting that the statement that a hash takes a string and
reduces it to a fixed-length string is a little misleading. Hash functions
really work at the binary level, as the example shows: the input is converted
to binary assuming ASCII, and the output is hex-encoded.
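A short Python illustration of that point: the string has to be encoded to bytes before it can be hashed (ASCII here, as in the article), and the 40-character hex string is just a readable rendering of the 20-byte (160-bit) SHA-1 output.

```python
import hashlib

data = "test".encode("ascii")  # SHA-1 consumes bytes, not characters
h = hashlib.sha1(data)

raw = h.digest()        # the actual output: 20 bytes = 160 bits
hexed = h.hexdigest()   # the same bytes written as 40 hex characters

print(len(raw), len(hexed))  # 20 40
print(hexed)                 # a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
print(raw.hex() == hexed)    # True
```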

~~~
Ar-Curunir
You could easily define hash functions that work over A..Z if you wanted;
there's nothing special about that.

The misleading part is the reduction to a unique fixed-length string. That's
not possible unless the input domain is equal to (or smaller than) the output
domain, and even then uniqueness isn't automatic. Any function whose input
domain is larger than its output domain is guaranteed to have collisions.
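A toy illustration of both points, using a hypothetical polynomial hash over the alphabet A..Z (letters read as base-26 digits, reduced modulo m). For three-letter words there are 26^3 = 17,576 inputs: with m = 26^3 the hash is injective on that set, while with m = 1000 the pigeonhole principle forces collisions.

```python
from itertools import product
from string import ascii_uppercase

def az_hash(word: str, m: int) -> int:
    """Hypothetical hash over A..Z: read the word as a base-26 number, reduced mod m."""
    h = 0
    for ch in word:
        if ch not in ascii_uppercase:
            raise ValueError("only the letters A..Z are allowed")
        h = (h * 26 + (ord(ch) - ord("A"))) % m
    return h

words = ["".join(p) for p in product(ascii_uppercase, repeat=3)]  # AAA .. ZZZ

# Output space as large as the input space: every three-letter word gets its own value.
print(len({az_hash(w, 26 ** 3) for w in words}) == len(words))  # True

# Output space smaller than the input space: collisions are unavoidable.
print(len({az_hash(w, 1000) for w in words}) < len(words))  # True

# Injective on its domain is not injective everywhere: 'A' is digit 0,
# so 'A', 'AA' and 'AAA' all hash to 0 once word lengths may vary.
print(az_hash("A", 26 ** 3), az_hash("AA", 26 ** 3), az_hash("AAA", 26 ** 3))  # 0 0 0
```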

------
jdwyah
"A Comprehensive and fundamentally inaccurate guide"

Saying they're unique is just very very wrong.

~~~
jm0dotcodes
I can subscribe to this statement. I found that I don't get the same SHA-1
hash of 'test' as he did.

    $ echo 'test' | sha1
    4e1243bd22c66e76c2ba9eddc1f91394e57f9f83

:-P

~~~
Sholmesy

    echo -n 'test' | sha1

will give you the same output as the article, because -n drops the trailing
newline:

    a94a8fe5ccb19ba61c4c0873d391e987982fbbd3
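The difference between the two commands is a single trailing newline byte, which a Python sketch makes explicit. The two digests it computes are the ones quoted in this thread.

```python
import hashlib

# `echo 'test'` emits "test\n"; `echo -n 'test'` emits just "test".
with_newline = hashlib.sha1(b"test\n").hexdigest()
without_newline = hashlib.sha1(b"test").hexdigest()

print(with_newline)
print(without_newline)
print(with_newline == without_newline)  # False: one extra byte changes the whole digest
```

Since echo -n is not handled the same way by every shell, printf '%s' 'test' is a more portable way to feed a string without the newline.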

------
mbel
The title probably should be 'Cryptographic Hash Algorithms'. The definitions
from the post are approximately true for cryptographic hashes but not really
for hash functions in general.

~~~
bogomipz
Does anyone have a "more" comprehensive guide they can recommend? There were
parts of this I really liked but some of the "why" left me confused.

------
DanBC
It's a nice article, although they should probably say earlier on that
they're talking about cryptographic hashes, or at least mention that for some
hashes collisions are very easy to find.
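Java's String.hashCode is a well-known example of a fast, non-cryptographic hash with trivially findable collisions. The function below re-implements its documented formula (s[0]*31^(n-1) + ... + s[n-1], wrapped to a 32-bit signed int) in Python for illustration; it only matches Java for characters in the Basic Multilingual Plane, since Java hashes UTF-16 code units.

```python
def java_string_hash(s: str) -> int:
    """Python re-implementation of Java's String.hashCode formula (BMP characters only)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF  # wrap to 32 bits, like Java int overflow
    return h - (1 << 32) if h >= (1 << 31) else h  # reinterpret as a signed 32-bit int

print(java_string_hash("Aa"), java_string_hash("BB"))  # 2112 2112

# Colliding blocks of equal length can be concatenated to build many more collisions.
blocks = ["AaAa", "AaBB", "BBAa", "BBBB"]
print({java_string_hash(b) for b in blocks})  # a single value
```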

------
tannranger
So there's a lot of talk about how encryption algorithms relying on the
difficulty of factoring primes could be weakened by quantum computers in the
near future.

Are there any technological advances or scenarios where the security of hash
algorithms could be weakened (other than computers just getting faster via
~Moore's Law)?

~~~
lindig
Surely you mean factoring numbers _into primes_ as primes cannot be factored.

~~~
tannranger
Right :X

------
Curious42
As someone extremely new to this: can this procedure be worked backwards to
retrieve the original text? If not, why not?

~~~
seanwilson
You can't retrieve the original text because information is lost in the
process and many inputs hash to the same value. However, if the range of
inputs is relatively limited, you can try hashing inputs until you find the
right hash (see rainbow tables for discovering user passwords from the hash
value).
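A minimal sketch of that idea, assuming a deliberately tiny input space (4-digit PINs, a hypothetical example) and SHA-1 as in the article. This is plain exhaustive search; rainbow tables are a precomputed time-memory trade-off, but they rely on the same fact that the set of likely inputs is small.

```python
import hashlib
from typing import Optional

def sha1_hex(s: str) -> str:
    return hashlib.sha1(s.encode("ascii")).hexdigest()

def crack_pin(target_hex: str) -> Optional[str]:
    """Hash every PIN from 0000 to 9999 and return the one whose digest matches."""
    for n in range(10_000):
        guess = f"{n:04d}"
        if sha1_hex(guess) == target_hex:
            return guess
    return None  # the input was not a 4-digit PIN

leaked = sha1_hex("4821")  # pretend only this digest is known
print(crack_pin(leaked))   # 4821
```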

~~~
_coldfire
>many inputs hash to the same value

?

The chance of a SHA-256 collision is essentially zero, barring a vulnerability
being found in SHA-2. It's far more likely for a comet to wipe out Earth in
your lifetime.

~~~
xyzzyz
Chances of practically finding a collision are indeed really small, though as
you can easily calculate, there exists a sha256 hash value such that there are
at least 2^256 different 512-bit long bit strings that map to it (and actually
most of them should have this property).
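The calculation behind that claim, for SHA-256 restricted to 512-bit inputs: the preimage counts of all 2^256 outputs must add up to the 2^512 inputs, so the largest of them is at least the average.

```latex
\sum_{y \in \{0,1\}^{256}} \left|\{\, x \in \{0,1\}^{512} : \mathrm{SHA256}(x) = y \,\}\right| = 2^{512}
\;\Longrightarrow\;
\max_{y} \left|\{\, x \in \{0,1\}^{512} : \mathrm{SHA256}(x) = y \,\}\right| \ge \frac{2^{512}}{2^{256}} = 2^{256}
```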

------
bogomipz
I have a question about Step 5 in the post, it states:

"Step 5: Add '1' to the end"

Is this a delimiter for the beginning of the padding, or does it serve some
other purpose?
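A sketch of SHA-1's padding rule as given in FIPS 180 (and RFC 3174), written for byte-aligned input. One way to read the '1' bit: it marks exactly where the message stops and the padding begins. Together with the 64-bit length field appended at the end, that makes the padding unambiguous, so two different messages never pad to the same bit string.

```python
def sha1_pad(message: bytes) -> bytes:
    """Pad a byte-aligned message the way SHA-1 does before splitting it into 512-bit blocks."""
    bit_len = len(message) * 8
    padded = message + b"\x80"  # 0x80 = a single '1' bit followed by seven '0' bits
    # Add zero bytes until the length is 56 mod 64 bytes, i.e. 448 mod 512 bits.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    # Finish with the original message length in bits as a 64-bit big-endian integer.
    padded += bit_len.to_bytes(8, "big")
    return padded

block = sha1_pad(b"test")
print(len(block) * 8)  # 512
print(block.hex())
```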

------
natch
> Finally, if I were to give you only
> 'a94a8fe5ccb19ba61c4c0873d391e987982fbbd3' and tell you that it came from the
> SHA-1, you should have absolutely no way to figure out what was put into the
> function to create that.

Add a "rainbow tables" caveat to that.

------
tomlx
Title is misleading. From the title I expected something about how to design
a hash algorithm, but the article is just a walkthrough of the specific
operations SHA-1 performs, without further explanation.

Can anyone recommend resources about the actual design of (cryptographic) hash
algorithms?

------
Zash
A hash function takes variable length input and returns a fixed length output,
that's all. Then there are sub-categories optimized for things like use in
hash tables or building blocks in crypto, all with varying emphasis on
uniqueness, output size and speed.
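A quick check of that definition using two very different hash functions from Python's standard library: CRC-32 (zlib.crc32, a fast checksum aimed at error detection) and SHA-256 (a cryptographic hash). The input lengths below are arbitrary; each function's output size stays fixed.

```python
import hashlib
import zlib

for data in [b"", b"cat", b"x" * 1_000_000]:
    crc = zlib.crc32(data)                  # unsigned 32-bit integer in Python 3
    digest = hashlib.sha256(data).digest()  # always 32 bytes
    print(f"input {len(data):>7} bytes -> crc32 {crc:08x}, sha256 {len(digest) * 8} bits")
```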

------
libruary
Would someone be able to ELI5 step 6 for me? I don't understand the math
needed in order to determine that 399 zeros needed to be added.

~~~
nolepointer
49 + 399 = 448, and 448 mod 512 = 448, because 512 goes into 448 zero times
with a remainder of 448.
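The same arithmetic as a one-line formula, assuming (as the numbers above suggest) that the 49 already counts the appended '1' bit, i.e. a 48-bit message. Python's % always returns a non-negative result here, which covers messages whose padding has to spill into an extra block.

```python
def zeros_needed(message_bits: int) -> int:
    """'0' bits to add after the '1' bit so the total length is 448 mod 512."""
    return (448 - (message_bits + 1)) % 512

print(zeros_needed(48))   # 399: 48 + 1 + 399 = 448
print(zeros_needed(448))  # 511: 448 + 1 + 511 = 960 = 512 + 448

# After the zeros, 64 length bits bring the total to a multiple of 512.
print((48 + 1 + zeros_needed(48) + 64) % 512)  # 0
```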

~~~
libruary
Thanks! Notice the error in their second example?

------
glidek
What is the name of the hashing algorithm broken down in the article?

------
zebra1832
Title is very misleading. Only one hash function is presented. AFAICT the
presented hash function is not even named. Is it SHA-1? No motivation is
given, just (pseudo)code.

