Hacker News new | past | comments | ask | show | jobs | submit login
High Speed Hashing for Integers and Strings (2015) (arxiv.org)
28 points by PartiallyTyped 20 days ago | hide | past | favorite | 13 comments

Knuth, The Art Of Computer Programming V3, Sorting & Searching covered both multiply & shift and modulo prime integer hashing in some detail in its first edition written in the 1960s, published in 1973. While not "modern", Knuth has always been a bona fide classic, especially as regards hashing, and should really be cited by this survey paper, IMO. (EDIT: people now call this "Fibonacci Hashing" even though, in the source material itself, Knuth mentions many irrational numbers will do, not just his motivating choice of the golden ratio from Fibonacci numbers fame. There are blog posts about it being "the optimization that time forgot", etc.)

For not entirely clear reasons (maybe multiplier selection?), only Knuth's modulo prime suggestion "propagated" well in the 70s and 80s. This suggestion reception bias also aged poorly over ensuing decades with the CPU cost of modulo vs. multiply evolving from a 2x performance delta to more like 10x.

> last revised 9 May 2020 (this version, v9)]

Why is this a published paper? Someone should write a web service that renders stackoverflow posts in latex so that colleagues think I’m researching something complicated.

As the abstract says:

Some of the most practical hash functions have only appeared in theory papers, and some of them requires combining results from different theory papers. The goal here is to combine the information in lecture-style notes that can be used by theoreticians and practitioners alike, thus making these practical fruits of theory more widely accessible.

Reviews are pretty common and while they might seem trivial, writing a good one is an astonishing amount of work.

It is not a published paper, but it appears you did not pay much attention besides 'skimming' through (perhaps not even that).

The tl:dr; is that certain non cryptographic hash functions provide certain guarantees in different applications such as signatures, expected runtimes on hashmaps, distributed sampling and so on, it also shows some fast solutions with good guarantees and explains the mathematics behind said guarantees.

The professor mentions in the document that the document exists because modern textbooks do not go into detail on hashing.

The content in the document is sufficient for 1 or 2 lectures on introduction to hashing. We spent 1 lecture during a graduate course in Advanced Algorithms and Data-structures.

It's a survey paper.

As far as I can tell, it's not published anywhere?

I thought this was going to be about xxHash, which is awesome

xxHash is legacy, replaced by xx3. Still not the fastest block hash, https://github.com/rurban/smhasher

This is a good survey paper about the old, known, fast, simple one-line hashes. I've tested all of them in smhasher also. They are all *not* recommended for production use.

what? xx3 is xxHash. and your own page says "the fastest hash functions on x86_64 without quality problems are: xxh3low" Whether its the fastEST, it is fast as hell. And why not use in production? (yes, I know it is not cryptographically sound, that's not why I'm using it)

Two different things. Look again. xxh3 is the new one, xxHash the old one.

Wow, you really seem to have something against xxHash (of which xxh3 is the newest/best variant, but still part of the xxHash family http://cyan4973.github.io/xxHash). I'm starting to doubt whether your benchmarks are accurate, either due to malice or some type of incompetence.

Each variant is its own. xxHash is very different, and I have nothing against any. You obviously want to deflect from mixing them up, your own incompetence.

Applications are open for YC Summer 2021

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact