
Show HN: Beamsplitter – a new possibly universal hash - ohvirginia
https://github.com/cris691/beamsplitter.git
======
owenmarshall
> The default S-box

> This was obtained from random.org by requesting 8,192 random bytes, as were
> all S-boxes tested so far.

[https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number](https://en.wikipedia.org/wiki/Nothing-up-my-sleeve_number)

~~~
barbegal
It would have been far better to select numbers generated by the NIST
Randomness Beacon [https://beacon.nist.gov/home](https://beacon.nist.gov/home)

And whilst you can sort of selectively choose which values to take from the
beacon, it should reduce the ability to add a backdoor.

~~~
owenmarshall
There are plenty of digits in pi.

If the hash is secure independent of s-box selection, I'd _much rather_ bet on
pi being normal than "the NIST beacon values aren't generated by AES in CTR
mode" ;-)

~~~
knodi123
> There are plenty of digits in pi.

Yes, but

"These fears can be allayed by using numbers created in a way that leaves
little room for adjustment. An example would be the use of initial digits from
the number π as the constants. Using digits of π millions of places after the
decimal point would not be considered trustworthy because the algorithm
designer might have selected that starting point because it created a secret
weakness the designer could later exploit."

~~~
skykooler
So why not just use the first 8192 bytes of Pi?
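To make "the first 8192 bytes of pi" concrete, here is a minimal sketch. It assumes the bytes are read from pi's binary (hexadecimal) expansion right after the "3.", and computes them with Machin's formula, pi = 16 arctan(1/5) - 4 arctan(1/239), in fixed-point integer arithmetic. The function names are hypothetical. This is not how Beamsplitter's default S-box was produced: the README says those bytes came from random.org.

```python
def arctan_inv(x: int, one: int) -> int:
    """arctan(1/x) as a fixed-point integer scaled by `one` (Taylor series)."""
    total = term = one // x
    x_squared = x * x
    k = 1
    sign = -1
    while term:
        term //= x_squared                      # one / x**(2k+1)
        total += sign * (term // (2 * k + 1))
        sign = -sign
        k += 1
    return total


def pi_fraction_bytes(n: int, guard_bits: int = 64) -> bytes:
    """First n bytes of the fractional part of pi in binary (pi = 3.243F6A88...)."""
    one = 1 << (8 * n + guard_bits)             # extra bits absorb truncation error
    pi = 16 * arctan_inv(5, one) - 4 * arctan_inv(239, one)  # Machin's formula
    frac = (pi - 3 * one) >> guard_bits         # drop the integer part and guard bits
    return frac.to_bytes(n, "big")


sbox_bytes = pi_fraction_bytes(8192)
print(sbox_bytes[:8].hex())  # 243f6a8885a308d3
```

Because the derivation is fixed in advance, anyone can regenerate and check the bytes. The objection raised in the reply still applies, though: the digits are public before the design is finalized.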

~~~
fdupress
Because they are known in advance and you could design to exploit their
structure.

~~~
ohvirginia
This was not meant to create some mystery around the included S-box, though I
get that it does.

The funny thing is that the very fears being raised about this are, in a way,
exactly the weaknesses that this parameterizable family of hash functions was
designed to guard against.

I mean, people are afraid that there's somehow a malevolent design flaw, but
that could be true of any hash function. With this one, you can use the
structure to create your own hash function by bringing your own S-box, which,
to me at least, greatly reduces the worry that some sort of exploit could
persist.

Anyway, that unintended mystery is not bad at all, in my opinion. It's fun to
watch people suspect bytes I got from random.org.

It's also flattering, because I think the skill required to create some sort
of crazy exploitable S-box is way above me, and way above the level of skill
required to create a very good hash function.

To the people thinking that was my plan, hear this: it does not sound like a
very smart plan to spend all that effort creating one amazing exploitable
S-box that looks random, and then at the same time tell people, and even
encourage them, to use their own S-box.

I don't feel the suspicion that the S-box is bad actually requires any
defense, because it just seems ridiculous to me, but I do think it's
interesting to point out that the suspected plan doesn't really make sense.

I'm not saying the people who have such suspicions are ridiculous at all. They
just haven't thought it through, I think, and I understand the instinct toward
paranoia, especially when directed at work in this space. I think it's a
fairly appropriate instinct. You just need to think things through.

The point was that by using an S-box, you can bring your own S-box to allay
(or, I guess, create) such fears about exploitable designs, and create your
own hash function.
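To illustrate the general "bring your own S-box" idea, here is a minimal sketch of a different, well-studied table-driven construction: simple tabulation hashing. The random lookup tables play the role of the S-boxes, so each choice of tables selects a different hash function from the family. This is not Beamsplitter's algorithm. `make_tables` and `tabulation_hash` are hypothetical names, and keys are assumed to be unsigned 64-bit integers.

```python
import random


def make_tables(rng):
    """Eight 256-entry tables of random 64-bit words, one per key byte: the 'S-boxes'."""
    return [[rng.getrandbits(64) for _ in range(256)] for _ in range(8)]


def tabulation_hash(tables, key):
    """XOR together one table lookup per byte of a 64-bit key."""
    h = 0
    for position, byte in enumerate(key.to_bytes(8, "little")):
        h ^= tables[position][byte]
    return h


# Two parties who each "bring their own S-boxes" get unrelated hash functions.
mine = make_tables(random.Random(2020))
yours = make_tables(random.Random(691))
print(hex(tabulation_hash(mine, 42)), hex(tabulation_hash(yours, 42)))
```

Simple tabulation is known to be 3-independent (stronger than universal) when the tables are truly random. That guarantee comes from the randomness of the tables, not from any particular table looking "random enough".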

I offer some thoughts about how to do that in the README. I'm not prescribing
rules: pick your own, pick whatever you like. The point is you can make your
own hash function that will probably be a good hash function. I definitely
think you should test it with SMHasher, or whatever, to make sure it doesn't
have any kind of flaws. I'm fairly convinced, after testing a few random
boxes, you'll be highly likely to make your own good hashes with this.

~~~
owenmarshall
> I'm fairly convinced, after testing a few random boxes, you'll be highly
> likely to make your own good hashes with this.

[https://www.schneier.com/crypto-gram/archives/1998/1015.html...](https://www.schneier.com/crypto-gram/archives/1998/1015.html#cipherdesign)

Read and internalize this.

------
remcob
I'm confused. SUPERCOP is a benchmark for cryptographic hash functions, but
SMHasher is a test suite for non-cryptographic hash functions. The use cases
list cryptography, but also universal hash functions, which are generally not
crypto-grade. It compares itself to the SHA hashes, but only has 64-bit
output.

Is Beamsplitter supposed to be cryptography grade or not?

~~~
hackcasual
It's not, at least not now, it seems. It's just seeing if you can use an S-box
design to create a "universal" hash.

------
api
I should warn any reader not to use this or any other novel cryptographic
algorithm in production. Don't use anything crypto in production until it has
been very heavily analyzed for years by professional cryptographers.

~~~
underdeserver
This.

If you want me to use your hash function, show me 2-3 analyses from
independent researchers.

------
lalaithion
This hash is only 64 bits, so if you have more than ~10^9 or ~2^30 elements
being hashed, collisions will become an issue.

Granted, most people don't have this problem, but it's not nobody.
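These numbers follow from the birthday bound. A quick sketch, assuming the hash behaves like a uniformly random 64-bit function, using the standard approximation P(collision) ~ 1 - exp(-n(n-1) / 2^(b+1)) for n items and b output bits. It gives roughly a 3% chance of at least one collision at 2^30 items and about 39% at 2^32 (about 4.3 billion) items. The function name is hypothetical.

```python
import math


def collision_probability(n: int, bits: int = 64) -> float:
    """Approximate chance that n random `bits`-bit hashes contain a collision."""
    pairs = n * (n - 1) / 2
    return -math.expm1(-pairs / 2.0**bits)  # 1 - exp(-pairs / 2**bits), computed stably


for n in (10**6, 2**30, 2**32):
    print(f"{n:>13,} items: {collision_probability(n):.4f}")
```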

------
cordite
The source is using `.cpp`, though it does not appear to be using any C++
features.

Would it be reasonable to move to `.c` so that it can be integrated into all
sorts of things?

As an aside: when something is Apache-licensed and someone wants to make, say,
an Erlang NIF with it, what effect does that embedding have on the NIF library
and users of the NIF library?

~~~
ohvirginia
What would be a better license to use to encourage people to use it?

Also, good point about `.cpp`; I will change that.

~~~
cordite
MIT and BSD licenses are very embeddable, but I am not familiar enough with
Apache licensing when it comes to embedding. This is why I ask.

~~~
ohvirginia
Switched to MIT now

------
randtrain34
Can someone ELI5 why this is useful over existing hash functions?

~~~
rclayton
This is how I feel when people start talking about cryptography. Definitely
feel my university underprepared me on this topic. :(

~~~
google234123
That's the wrong attitude. Universities are a place where you should be doing
much of the learning yourself. There is not enough time in a class for a
lecturer to recite every word or idea that is present in a large textbook, but
there is definitely enough time outside of class to read it.

~~~
zdragnar
> Universities are a place where you should be doing much of the learning
> yourself.

The professors in the first three years of my schooling definitely did
everything wrong, then. Passing and failing classes had next to nothing to do
with independent learning.

~~~
google234123
Passing classes is just a small part of what it means to attend a university.

------
_Microft
_I see that you ran out of particle names for your projects. May I introduce
you to super-symmetry, then?_

What's the issue with picking names that do not exist already? It has got the
upside that millions of webpages will _not_ appear in the results when people
are searching for your project's name.

~~~
ohvirginia
Is this a reference to something?

I sort of get the feeling you're using voice, but you're the one speaking.

I don't get the italics section.

Also are you suggesting I pick another name? It sort of seems like you're
replying to a comment, but this comment appears at top level.

If you can explain more, I'd appreciate it. Thanks

Also, please suggest a name if that's your thing. I'm thinking 'metahadron'
goes with the voice section.

~~~
_Microft
No, there are no references in there.

The italics section is just a thing I add to comments sometimes: a few
sentences that loosely relate to the thing I am talking about. It can be a
quote, an imagined dialog, a flippant comment, ... other examples are at
[0][1].

I was annoyed.

I'm a physicist, and computer projects have the annoying custom of picking
names from physics, engineering or whatever else. Other people also come up
with their own names, so why shouldn't computer tech people do this too?

Atom editor, Electron framework, Neutrino.js, Crankshaft, ...

[0]
[https://news.ycombinator.com/item?id=23103049](https://news.ycombinator.com/item?id=23103049)

[1]
[https://news.ycombinator.com/item?id=23052357](https://news.ycombinator.com/item?id=23052357)

~~~
ohvirginia
Thanks for the explanation. I think the italics are cool and fresh.

hmmm, really interesting how you feel about the names. It sounds like that is
super annoying.

I never thought about how naming would affect people invested in the names
like this.

I don't think I need to defend it, so I'm not trying to here, just sharing
that for me, "beamsplitter" sounds like such a cool word, as if a beam were a
physical thing like a rock that could be split. It's also something solid in
itself, and it connotes advanced, possibly war, tech: lasers. I was going for
that connotation. Hash functions are usually very pathetically named.

Also, there's more to this name in this project, because my initial design
imagined the "beam" of the input ricocheting around a network of S-boxes,
getting mixed. It seemed to me like the perfect hash: aesthetic, efficient,
and universal. But to my disappointment, I couldn't get a pure, S-box-only
design to work. I had to include some "traditional mixing function hacks" like
multiplication, rotation and XOR. But I wanted to keep the name because it was
aspirational.

I can imagine it must feel like all these annoying software people are taking
names that come not from their area but from yours, and not leaving anything
good for the rest. And when they have such a high profile already! Like nobody
will listen to the poor physicists, especially once all their names are taken,
and then it will be more lonely. A nameless space, with nothing left. Sounds
pretty sad.

The funny thing is, to me it seems physics stands above software, so using
such names is a way to increase perceived value. But from your view, software
has the higher profile.

Thanks for sharing.

~~~
_Microft
Let me apologize for my complaints. It didn't occur to me that these names
might be used out of admiration for a field.

I also have to admit that physics needs relatively few new names in general,
which makes picking one a lot easier. There are also naming patterns, e.g. for
superpartners (the names of new particles in supersymmetry are either prefixed
with s- or suffixed with -ino [0] in a predictable way).

[0]
[https://en.wikipedia.org/wiki/Superpartner](https://en.wikipedia.org/wiki/Superpartner)

~~~
ohvirginia
It was cool to read your reply. Actually, looking back over my code, I see my
achievement was better than I thought: I only used addition and rotation, no
multiplication or XOR.

Technically, though, rotation can be thought of as including multiplication
and XOR. But also not. So I don't know.
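For w-bit words, the "rotation includes multiplication and XOR" intuition can be made precise. Rotating left by r is multiplying by 2^r modulo 2^w, then combining the result with the r bits that fell off the top. Because those two parts never overlap, XOR, OR, and addition all give the same result. A small check, assuming 64-bit words (the helper names are hypothetical):

```python
import random

MASK64 = (1 << 64) - 1


def rotl64(x: int, r: int) -> int:
    """Rotate a 64-bit word left by r bits (0 < r < 64)."""
    return ((x << r) | (x >> (64 - r))) & MASK64


def rotl64_via_mul(x: int, r: int) -> int:
    """Same rotation written as a modular multiply plus the wrapped-around bits."""
    low = (x * (1 << r)) & MASK64   # x * 2**r mod 2**64; its low r bits are zero
    high = x >> (64 - r)            # the r bits shifted out of the top
    return low ^ high               # disjoint bits, so ^, | and + all agree


rng = random.Random(0)
for _ in range(1000):
    x, r = rng.getrandbits(64), rng.randrange(1, 64)
    assert rotl64(x, r) == rotl64_via_mul(x, r)
    assert rotl64(x, r) == ((x * (1 << r)) & MASK64) + (x >> (64 - r))
```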

------
snypher
I wish it had a more friendly name. I can't help but think of Room 641A.

[https://en.wikipedia.org/wiki/Room_641A](https://en.wikipedia.org/wiki/Room_641A)

------
asimpletune
Looks interesting! What is meant by a “universal family”?

~~~
schr0dinger
A universal set of hash functions is a set of hash functions such that
choosing a hash function at random from the set guarantees an upper bound on
the expected number of collisions, regardless of which keys from the universe
are input to it (the keys need not be random; only the choice of function is).

Basically it makes it more difficult for an adversary to exploit collisions
from your hash function.
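A classic concrete example of a universal family (the Carter-Wegman construction, not Beamsplitter) is h_{a,b}(x) = ((a*x + b) mod p) mod m, with p prime, 1 <= a < p and 0 <= b < p. For any two distinct keys x, y < p, at most a 1/m fraction of the functions in the family make them collide. The sketch below checks that bound exhaustively for a tiny prime; the parameter values are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product

P, M = 101, 10  # tiny prime and bucket count, chosen only for illustration


def h(a: int, b: int, x: int) -> int:
    """Member (a, b) of the Carter-Wegman family ((a*x + b) mod P) mod M."""
    return ((a * x + b) % P) % M


def collision_fraction(x: int, y: int) -> Fraction:
    """Exact fraction of the family under which distinct keys x and y collide."""
    family = list(product(range(1, P), range(P)))  # every (a, b) with a != 0
    hits = sum(h(a, b, x) == h(a, b, y) for a, b in family)
    return Fraction(hits, len(family))


for x, y in [(0, 1), (3, 13), (7, 94)]:
    frac = collision_fraction(x, y)
    print(f"keys {x}, {y}: collide under {frac} of the family; <= 1/M? {frac <= Fraction(1, M)}")
```

The guarantee is about the random choice of (a, b). An adversary who does not know which function was picked cannot pick keys that collide much more often than 1/m.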

~~~
seph-reed
I don't get how this could be used. I tried to imagine, and ended up with
something wrong. This is what I imagined:

You have a list of hash functions, and choose one at random, then hash a
password. Later a hacker gets these hashed passwords, and has an extra hard
time? But this wouldn't work for checking passwords because you wouldn't know
which hash was used.

What is a real use case?

~~~
aidenn0
I may be wrong, but after doing a bit of research, here's one example:

Alice is storing keys in a hash table. Since this is a hash table, the hash
(H) that Alice will choose must be fast. However, real-world hash tables will
use a relatively small number of bits from the output of H, because even if
you have a table sized to 4 billion, that's only 32 bits.

Let's say that Alice does this by taking the lowest N bits of the output of H
(this works in practice regardless of which bits Alice uses) where 2^N is the
size of the table. N may change as elements are added.

Eve wants to mess with Alice by sending a bunch of keys that all have the same
bottom M bits, where M is the largest expected value for N. Since the hash H
is very fast, this is very computationally cheap to brute-force, particularly
if you have access to very parallel hardware like a GPU.

Now consider that instead of using hash H, Alice uses hash family U. Whenever
a hash table is created (or rehashed), Alice selects a random hash from U. Eve
can no longer easily generate keys that will collide in the hash table.

From what I can tell, for password hashing, this is not appreciably better
than salting, if the size of the set of possible salts and the size of the set
U are the same.
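Alice's defence can be sketched as a chained hash table that draws a fresh function from a universal family every time it is created or rehashed. This sketch assumes non-negative integer keys smaller than the Mersenne prime 2^61 - 1 and uses the Carter-Wegman family with m = 2^N, so "taking the lowest N bits" is exactly the final mod m step. `RandomizedTable` is a hypothetical name, not an API from Beamsplitter or any library.

```python
import random

P = (1 << 61) - 1  # Mersenne prime; keys are assumed to be smaller than this


class RandomizedTable:
    """Chained hash table that re-draws h(x) = ((a*x + b) mod P) mod 2**n_bits
    whenever it is created or rehashed."""

    def __init__(self, rng=None, n_bits=4):
        self.rng = rng or random.SystemRandom()  # OS randomness: unpredictable to Eve
        self.size = 0
        self._reset(n_bits)

    def _reset(self, n_bits):
        self.n_bits = n_bits
        self.a = self.rng.randrange(1, P)  # fresh secret parameters
        self.b = self.rng.randrange(0, P)
        self.buckets = [[] for _ in range(1 << n_bits)]

    def _index(self, key):
        # "Take the lowest N bits" of the family's output: same as mod 2**n_bits.
        return ((self.a * key + self.b) % P) & ((1 << self.n_bits) - 1)

    def insert(self, key):
        bucket = self.buckets[self._index(key)]
        if key in bucket:
            return
        bucket.append(key)
        self.size += 1
        if self.size > len(self.buckets):  # load factor above 1: grow and re-draw
            old_keys = [k for b in self.buckets for k in b]
            self._reset(self.n_bits + 1)
            for k in old_keys:
                self.buckets[self._index(k)].append(k)

    def max_bucket_load(self):
        return max(len(b) for b in self.buckets)


# Eve's keys share their lowest 10 bits, so a fixed "hash" that just keeps the
# low bits of the key sends every one of them to the same bucket.
eve_keys = [i << 10 for i in range(1, 1001)]
print("fixed hash, distinct buckets:", len({k & 1023 for k in eve_keys}))

table = RandomizedTable()
for k in eve_keys:
    table.insert(k)
print("randomized table, worst bucket:", table.max_bucket_load())
```

Drawing `a` and `b` from unpredictable randomness is the important design choice. If Eve could predict them, she could again precompute colliding keys.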

------
mfbx9da4
What's an S-box (substitution-box)?

~~~
underdeserver
An arbitrary function from X bits to Y bits. You generally want to pick these
to be as non-linear as possible.
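One common way to make "as non-linear as possible" measurable is the nonlinearity of an n-bit to n-bit S-box S: NL(S) = 2^(n-1) - max|W(a, b)| / 2. Here W(a, b) = sum over x of (-1)^(b.S(x) XOR a.x), taken over all input masks a and non-zero output masks b. The brute-force sketch below assumes equal input and output widths. Its example is the 4-bit S-box from the PRESENT block cipher, chosen only as a familiar reference point and unrelated to Beamsplitter.

```python
def parity(v: int) -> int:
    """1 if v has an odd number of set bits, else 0."""
    return bin(v).count("1") & 1


def nonlinearity(sbox: list) -> int:
    """Brute-force nonlinearity of an n-bit -> n-bit S-box given as a lookup table."""
    size = len(sbox)                  # 2**n entries
    n = size.bit_length() - 1
    worst = 0
    for b in range(1, size):          # every non-zero output mask
        for a in range(size):         # every input mask
            walsh = sum((-1) ** (parity(b & sbox[x]) ^ parity(a & x))
                        for x in range(size))
            worst = max(worst, abs(walsh))
    return 2 ** (n - 1) - worst // 2


identity = list(range(16))            # a linear S-box: the worst case
present = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
           0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]  # PRESENT's 4-bit S-box
print(nonlinearity(identity), nonlinearity(present))  # 0 4
```

A nonlinearity of 4 is the maximum any 4-bit permutation can reach. The same brute force works for 8-bit tables, just slowly in pure Python; the AES S-box, for comparison, has nonlinearity 112.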

