
Hyperscan: High-performance multiple regex matching library from Intel - pcr910303
https://www.hyperscan.io
======
sebazzz
Based on the documentation[1], and not surprisingly, this library has been
optimized for Intel processors. In particular, you can enable the AVX2 and
AVX512 instruction sets when compiling the pattern database. In addition, you
can tune the matching database for a specific Intel processor family (like
Haswell or Sandy Bridge).
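
A rough sketch of what that looks like through the C API - the tuning and
CPU-feature constants are the ones from hs.h, but the pattern and the rest
are just illustrative:

    #include <hs/hs.h>
    #include <stdio.h>

    int main(void) {
        /* Target Haswell-class tuning and force the AVX2 code paths,
           regardless of what the build machine happens to support. */
        hs_platform_info_t platform = {
            .tune = HS_TUNE_FAMILY_HSW,
            .cpu_features = HS_CPU_FEATURES_AVX2,
        };

        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile("foo[0-9]+bar", HS_FLAG_DOTALL, HS_MODE_BLOCK,
                       &platform, &db, &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }
        hs_free_database(db);
        return 0;
    }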

BSD-licensed, and they accept contributions. I wonder if they would accept
flags that allow these optimizations on AMD processors.

[1]: [http://intel.github.io/hyperscan/dev-reference/api_constants.html#cpu-feature-support-flags](http://intel.github.io/hyperscan/dev-reference/api_constants.html#cpu-feature-support-flags)

~~~
jchw
I wonder if they allow AVX2 optimizations on non-Intel processors; Intel MKL
refuses to. I think for that reason alone I’d never use an upstream Intel
library directly...

~~~
glangdale
I built Hyperscan with my team in Australia (we've since moved on, or been
moved on, from Intel). There are no shenanigans to prevent Hyperscan from
using AVX2 on non-IA processors, although we didn't have any high performance
examples of same when we were last at Intel (2017).

The mechanism is trivial to control IIRC and would not be difficult to patch.
It is, after all and unlike MKL, open source.

~~~
dmbaggett
Kudos for making one of the most impressive pieces of software ever. The level
of optimization you achieved is truly rare these days.

~~~
gnufx
I don't know anything about Hyperscan, but it surprises me that it has a truly
rare level of optimization. How does the optimization compare with typical
tuned linear algebra, for instance?

~~~
gnufx
Rather than just downvoting, could someone answer the genuine question about
how the optimization that's been done compares with the work on linear algebra
which underpins so much? As a specific example, there's libxsmm for small
matrices.

There are no aspersions cast on Hyperscan at all, just a query about what makes
it "truly rare" for the benefit of people who don't have time to study it. It
would also be interesting to know how it compares with hardware regex
implementations, of which I haven't heard anything recently in connexion with
bioinformatics where the interest was.

~~~
jonstewart
I might be the only other person to solve multipattern regex searching. It’s
easy conceptually, and fiendishly tricksy when you try to get the tests to
pass. You want to take advantage of commonality between patterns but the
matches must still be independent. The engine I wrote uses a classical
interpreter backend, optimized well enough to beat TRE and the old PCRE
handily. I’d read about the bit-parallel algorithms and played with them a
bit, but the pattern sets we’d throw at any given BP algorithm would break
it, as each BP algorithm comes with several caveats.

The genius of the Hyperscan team is that they threw damn near every algorithm
at their pattern sets, including some they invented, and divvied up the
patterns into subsets that fit the particular algorithms well. Usually getting
the tests to pass with one backend is the last act of any regex author—once it
ain’t broke, you’re too exhausted to fix it. Contemplating the complexity that
comes with a multi-algorithm backend makes me want to cry. But Hyperscan did
it.

So, to put it in perspective with linear algebra, imagine a library that first
spent time analyzing given matrices, and then split them into submatrices and
operated on them with different algorithms according to the particular
patterns detected therein. That’s kind of insane to contemplate in linear
algebra—it’s really not a domain that compares well at all—but it’s how
Hyperscan works... and that ignores all the crazy ISA tricks it used.

------
clarkbw
We use this at GitHub to power commit token scanning. It is very fast and
handles multiple regexes at once; we are looking for secrets from multiple
providers. We couldn’t find a better option for this type of usage.
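
For a sense of what "multiple regexes at once" looks like at the API level,
here's a minimal hs_compile_multi sketch (the token patterns are invented,
not what we actually use); the id reported at match time tells you which
provider's pattern fired:

    #include <hs/hs.h>
    #include <stdio.h>

    int main(void) {
        /* One pattern per secret format; the ids identify the provider. */
        const char *patterns[] = {"AKIA[0-9A-Z]{16}", "ghp_[0-9A-Za-z]{36}"};
        unsigned int flags[] = {HS_FLAG_SINGLEMATCH, HS_FLAG_SINGLEMATCH};
        unsigned int ids[] = {1, 2};

        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile_multi(patterns, flags, ids, 2, HS_MODE_BLOCK,
                             NULL, &db, &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }
        hs_free_database(db);
        return 0;
    }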

~~~
klaas-
Can you run this as a pre-receive hook to actually prevent a public commit if
it has suspicious patterns in it?

~~~
pr0tocol_7
You can use gitleaks to do this now. I added that feature in v3:
[https://github.com/zricethezav/gitleaks/releases/tag/v3.0.0](https://github.com/zricethezav/gitleaks/releases/tag/v3.0.0)

------
dnpp123
I had to read the Hyperscan source code for work.

Honestly, once you understand (some of...) the math/automata details, this is
by far one of the best big codebases ever written.

So clean, beautiful, powerful. I've learned a lot from this codebase.

Does anyone know other codebases as well written as Hyperscan?

~~~
glangdale
ex-Hyperscan guy here: thanks for the kind words. I'm sure my team (who are
primarily responsible for the cleanliness - as a developer, I make a great
"ideas person") would appreciate it.

Can you disclose how/why you were reading Hyperscan source for work? Just
curious, no agenda.

~~~
dnpp123
Wow, thanks for your work then!

It was for a (now failed) startup selling IPS appliances. What really helped
us in the end was that we could run the runtime in C (C++ was not possible);
however, the compiler/tooling we used was fairly old (...), so I had to dig
into the codebase details to make it work.

As a side note, if I had to say something bad about Hyperscan, it would be
the lack of high-level documentation. I don't know about now, but back then
only a couple of blog articles were available... I had always been curious:
was it intended to prevent copycats? Lack of time? Why not explain more of
the high-level math/automata details?

If the high-level documentation were improved, I'm pretty sure the number of
companies integrating Hyperscan would increase, and hence Intel's sales would
increase (since it has been bought by Intel ;)

~~~
glangdale
The lack of high-level docs up to the point of open sourcing was partly to
prevent copycats, but mainly due to lack of resources (chiefly time) for
writing. We were a small team, and time spent documenting stuff we all
understood pretty well was time spent not chasing customers or improving our
product.

Later: well, there is a Hyperscan paper and there may be more material coming
out later.

Also, not to be a jerk, but a lot of people claim that they will
read/understand/use this kind of documentation, and my experience was that
only a fraction actually _do_, and of that fraction, most don't behave in a
way that's actually _useful_ enough to justify having made all those docs.
One is more likely to wind up with people kibitzing and making inane tweaking
suggestions ("use more NFAs! no, use more DFAs"); less likely is meaningful
OSS contribution or people using the software when they might not have
before.

------
Yver
As per the documentation, the library supports a subset of the PCRE syntax.

[https://intel.github.io/hyperscan/dev-reference/compilation.html#pattern-support](https://intel.github.io/hyperscan/dev-reference/compilation.html#pattern-support)
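
One practical consequence: anything outside that subset, backreferences for
example, is rejected up front by the pattern compiler rather than failing at
scan time. A minimal sketch (assuming the usual hs.h install location):

    #include <hs/hs.h>
    #include <stdio.h>

    int main(void) {
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        /* A backreference (\1) is outside the supported PCRE subset, so
           this fails at compile time with a diagnostic message. */
        if (hs_compile("(\\w+)\\s+\\1", 0, HS_MODE_BLOCK, NULL, &db, &err)
                != HS_SUCCESS) {
            fprintf(stderr, "rejected: %s\n", err->message);
            hs_free_compile_error(err);
            return 0;
        }
        hs_free_database(db);
        return 0;
    }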

------
reza_n
I wrote something very similar many years ago. I described it as a reverse
search index: the queries/regexes get indexed into a search tree, and then
text is run through it. It supported hundreds of thousands of parallel regex
searches. I called it dClass:

[https://github.com/TheWeatherChannel/dClass](https://github.com/TheWeatherChannel/dClass)

------
monstrado
Related: ClickHouse uses this for its regex matching.

------
benibela
Unfortunately, no backreferences.

I use FLRE:
[https://github.com/BeRo1985/flre](https://github.com/BeRo1985/flre)

That is also an impressive project, with multiple subengines from which it
chooses the fastest one. All written by one guy. Basically no documentation,
though.

I wonder how the performance compares

------
antpls
Could matching algorithms be expressed as a graph of TensorFlow primitive ops
and run on Google's TPU?

~~~
glangdale
I suspect yes. This would be a wonderful project for someone. Do note that
many regexes can be expressed as a dataflow problem. We had an experimental
subsystem (called "Puggsley") that was a data-parallel take on the problem;
this wasn't published, but it was pretty much an independent invention of the
same ideas found in icgrep.

The major headache is the prospect of loops that are bigger than 1 character
position - these aren't as trivial to express (I have ideas, naturally).

I don't know how well this fits in with TensorFlow - it may be too
fine-grained? But of course, getting onto custom hardware can yield massive
speedups.

------
layoutIfNeeded
Time to raise the rule count limits in regexp-based adblockers then!

~~~
kristianp
Maybe, when webassembly supports SIMD.

~~~
saagarjha
I’m fairly sure the parent was talking about content-blocker-esque
architectures, where the regex matching is performed in native browser code.

------
mehrdadn
Are there command-line tools for it?

~~~
glangdale
There is some example code; that's about it. This would be a good project for
someone, but there's a _lot_ of code in a "grep" that isn't there in Hyperscan
(finding and displaying lines and line numbers, lots of command-line goop and
options, pretty colors for results, etc. etc.).

Also, the limitations of Hyperscan might be quite noticeable. It doesn't
support all PCRE facilities (e.g. capturing and arbitrary lookaround). It has
a considerable compile time - I wouldn't want to use Hyperscan to grep short
files! Justin Viiret (another ex-Hyperscan team guy) has a blog post
comparing our relatively heavyweight optimization strategy with RE2 (which
gets down to the business of scanning a lot quicker than we do). You can find
it here:
[https://01.org/hyperscan/blogs/jpviiret/2017/regex-set-scanning-hyperscan-and-re2set](https://01.org/hyperscan/blogs/jpviiret/2017/regex-set-scanning-hyperscan-and-re2set)

(sorry about the giant 01.org "dickbar", we couldn't control that)

The upshot is that most people looking for big collections of regexen in huge
amounts of data aren't really running 'grep' type tools. If someone wanted to
do that, like I said, it would be a good project.
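
To make the point concrete, the Hyperscan side of a grep-like tool really is
small: compile, allocate scratch, scan with a match callback. A minimal
block-mode sketch (the pattern and output are illustrative; all the grep
niceties are the missing part):

    #include <hs/hs.h>
    #include <stdio.h>
    #include <string.h>

    /* Called once per match. 'id' is the pattern id given at compile time;
       'to' is the end offset of the match. ('from' is only meaningful if
       the pattern was compiled with HS_FLAG_SOM_LEFTMOST.) */
    static int on_match(unsigned int id, unsigned long long from,
                        unsigned long long to, unsigned int flags,
                        void *ctx) {
        (void)from; (void)flags; (void)ctx;
        printf("pattern %u matched, ending at offset %llu\n", id, to);
        return 0; /* returning non-zero would halt the scan */
    }

    int main(void) {
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile("error|warn(ing)?", 0, HS_MODE_BLOCK, NULL, &db,
                       &err) != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }

        hs_scratch_t *scratch = NULL;
        if (hs_alloc_scratch(db, &scratch) != HS_SUCCESS) {
            hs_free_database(db);
            return 1;
        }

        const char *data = "a warning and an error";
        hs_scan(db, data, (unsigned int)strlen(data), 0, scratch, on_match,
                NULL);

        hs_free_scratch(scratch);
        hs_free_database(db);
        return 0;
    }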

~~~
fredsanford
Any guesstimates as to when it starts to become worth it to use Hyperscan vs
PCRE etc.?

30 regexes and 1 meg of data, for example.

I'm somewhat curious as I have a couple of scraping things I do where I can
compile the regex once and keep it hanging around (or save to/load from disk
if that's feasible) for 3 to 5 minutes at a time.

~~~
glangdale
Well, 30 regexes is not a great case for libpcre, as libpcre really doesn't
have a multiple-regex mode. Thanks to some excellent work in libpcre2 on
accelerated searching, it's no longer the "7-10x slower than Hyperscan on a
single regex" case of the past (though still frequently slower), but it
wouldn't surprise me if HS was 50-100x faster than libpcre once compile time
isn't a factor.

However, the compile time of HS for 30 regexes might not be entirely trivial
(maybe a second, probably less). With a megabyte you'd probably see benefits,
but I think they might not be drastic.

Probably the best way to find out would be to fire up hsbench (a tool that
comes with Hyperscan now), figure out the slightly weird corpus format (sorry)
and get some numbers for yourself on your own workload.
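
On the save-to/load-from-disk part of your question: yes, that's feasible.
Compiled databases can be serialized, so the compile cost is paid once. A
minimal sketch (pattern invented, file I/O elided):

    #include <hs/hs.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* Compile once... */
        hs_database_t *db = NULL;
        hs_compile_error_t *err = NULL;
        if (hs_compile("foo.*bar", 0, HS_MODE_BLOCK, NULL, &db, &err)
                != HS_SUCCESS) {
            fprintf(stderr, "compile failed: %s\n", err->message);
            hs_free_compile_error(err);
            return 1;
        }

        /* ...then serialize; write bytes/len to disk however you like. */
        char *bytes = NULL;
        size_t len = 0;
        if (hs_serialize_database(db, &bytes, &len) != HS_SUCCESS) {
            hs_free_database(db);
            return 1;
        }

        /* Later (or in another process): load the bytes back and go. */
        hs_database_t *db2 = NULL;
        if (hs_deserialize_database(bytes, len, &db2) == HS_SUCCESS) {
            hs_free_database(db2);
        }

        free(bytes); /* serialized buffer uses the default allocator */
        hs_free_database(db);
        return 0;
    }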

------
adorable_monkey
If you really need speed you should use an FPGA.

~~~
matt-noonan
This is a pro-level troll!

For context, see slides 5 and 6 about the history of Hyperscan and Sensory
Networks Inc.:
[https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf](https://openisf.files.wordpress.com/2015/11/oisf-keynote-2015-geoff-langdale.pdf)

~~~
adorable_monkey
History is not a benchmark. A universal software algorithm cannot be faster
than a custom algorithm on an FPGA for each job.

Here is some real history:
[https://twitter.com/joeerl/status/1115990630793207808](https://twitter.com/joeerl/status/1115990630793207808)

~~~
glangdale
You are correct that the failure of Sensory Networks from 2006-2009 to develop
an effective FPGA matching algorithm does not imply that no-one can do it.
However, the rest of what you said is an assertion for which there isn't
really that much proof. FPGA is wonderful when you have the sort of available
parallelism it needs; it's not great competing with s/w if the problem is
expressed in algorithms with serial dependencies.

I think FPGA could do a great job of achieving worst-case performance by
simulating NFAs in a very straightforward fashion (beating s/w). However,
no-one does this, as the headline numbers are terrible.

The thread you posted contains an assertion by someone who built an FPGA
parser with no performance numbers to back it up; the performance comparison
may well reflect relative skills with software and hardware as opposed to the
limits of software. Interestingly, one of the first responses refers to
another project of mine, simdjson. :-) Unfortunately, given the subsequent
posts in the thread, it does not look like the author can provide further
details on the system, which is sad.

