
It's entertaining that there is yet another area where "running large numbers of regular expressions, known in advance and not changing all that often" over lots of input is performance-critical.

This was a key driver behind writing Hyperscan (https://github.com/intel/hyperscan), which I worked on back when it was primarily used for network security (thousands or tens of thousands of regular expression rules for firewalls), but doing this for ad blocking seems pretty sensible too.
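
For anyone curious what that looks like in practice, here is a minimal sketch of Hyperscan's multi-pattern API: compile the whole rule set once, then scan each input against all patterns in a single pass. The pattern strings, the handler, and the header path are illustrative assumptions, and error handling is abbreviated.

  #include <stdio.h>
  #include <string.h>
  #include <hs/hs.h>

  /* Invoked by hs_scan() on every match; 'id' is the index given to the
     pattern at compile time. Returning non-zero would stop the scan. */
  static int on_match(unsigned int id, unsigned long long from,
                      unsigned long long to, unsigned int flags, void *ctx) {
      printf("pattern %u matched, ending at offset %llu\n", id, to);
      return 0;
  }

  int main(void) {
      /* Hypothetical rule set: compiled once, scanned many times. */
      const char *patterns[] = { "/ads/", "track(er|ing)", "doubleclick" };
      unsigned int flags[]   = { HS_FLAG_CASELESS, HS_FLAG_CASELESS,
                                 HS_FLAG_CASELESS };
      unsigned int ids[]     = { 0, 1, 2 };

      hs_database_t *db = NULL;
      hs_compile_error_t *err = NULL;
      if (hs_compile_multi(patterns, flags, ids, 3, HS_MODE_BLOCK,
                           NULL, &db, &err) != HS_SUCCESS) {
          fprintf(stderr, "compile failed: %s\n", err->message);
          hs_free_compile_error(err);
          return 1;
      }

      hs_scratch_t *scratch = NULL;
      hs_alloc_scratch(db, &scratch);

      const char *url = "http://example.com/ads/banner.js";
      hs_scan(db, url, (unsigned int)strlen(url), 0, scratch, on_match, NULL);

      hs_free_scratch(scratch);
      hs_free_database(db);
      return 0;
  }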

This is indeed a performance-critical problem, but it is already pretty much solved at this point. If you look at the most popular content blockers, their decision time is already a fraction of a millisecond, so it does not seem like performance is really an issue anymore.


Yeah, I don't doubt that it's solvable by other means (notably hashing). It's just amusing that something we started building in 2006 - and open sourced in 2015 - solves this problem directly (i.e. you don't have to rewrite your regexes).
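
To illustrate the hashing approach (a sketch, not any particular blocker's implementation): instead of testing every filter against every URL, index each filter under a representative token, then at request time probe only the buckets for tokens that actually occur in the URL. The FNV-1a hash, bucket count, and all names here are illustrative choices.

  #include <ctype.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define NBUCKETS 1024

  /* FNV-1a over one token. */
  static uint32_t fnv1a(const char *s, size_t n) {
      uint32_t h = 2166136261u;
      for (size_t i = 0; i < n; i++) { h ^= (uint8_t)s[i]; h *= 16777619u; }
      return h;
  }

  typedef struct Filter {
      const char *pattern;   /* full rule, evaluated only on a bucket hit */
      struct Filter *next;
  } Filter;

  static Filter *buckets[NBUCKETS];

  /* Index a filter under one representative token from its pattern. */
  static void add_filter(Filter *f, const char *token) {
      uint32_t b = fnv1a(token, strlen(token)) % NBUCKETS;
      f->next = buckets[b];
      buckets[b] = f;
  }

  /* Walk the URL's alphanumeric tokens; for each, probe only the bucket
     of filters sharing that token instead of testing the whole list. */
  static void match_url(const char *url) {
      for (const char *p = url; *p; ) {
          while (*p && !isalnum((unsigned char)*p)) p++;
          const char *start = p;
          while (*p && isalnum((unsigned char)*p)) p++;
          if (p == start) continue;
          uint32_t b = fnv1a(start, (size_t)(p - start)) % NBUCKETS;
          for (Filter *f = buckets[b]; f; f = f->next)
              if (strstr(url, f->pattern))  /* stand-in for full evaluation */
                  printf("candidate hit: %s\n", f->pattern);
      }
  }

  int main(void) {
      static Filter f1 = { "/ads/", NULL }, f2 = { "tracker", NULL };
      add_filter(&f1, "ads");
      add_filter(&f2, "tracker");
      match_url("http://example.com/ads/banner.js");  /* candidate hit: /ads/ */
      return 0;
  }

The point is the same as Hyperscan's: the cost per request scales with the handful of candidate filters sharing a token, not with the size of the whole list.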


To be fair, blocklists are not really lists of regexps. They do contain some regexps, but the syntax is mostly custom, and matching relies both on patterns found in URLs (this part could be partially expressed as regexps) and on a multitude of other constraints: the domain of the frame, the domain of the request, the type of network request, etc.
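
A toy model of what one such filter carries beyond the URL pattern; the field names and the mapping to rule syntax are my own simplification, loosely after EasyList-style $script/$domain options:

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  typedef enum { REQ_ANY, REQ_SCRIPT, REQ_IMAGE, REQ_XHR } RequestType;

  /* One rule = a URL pattern plus constraints a regexp alone can't express. */
  typedef struct {
      const char *url_substring;  /* the part a regexp could handle */
      const char *frame_domain;   /* only apply on this page domain (NULL = any) */
      RequestType type;           /* only apply to this request type */
  } Rule;

  static bool ends_with(const char *s, const char *suffix) {
      size_t ls = strlen(s), lf = strlen(suffix);
      return ls >= lf && memcmp(s + ls - lf, suffix, lf) == 0;
  }

  static bool rule_matches(const Rule *r, const char *url,
                           const char *frame_domain, RequestType type) {
      if (r->type != REQ_ANY && r->type != type) return false;
      if (r->frame_domain && !ends_with(frame_domain, r->frame_domain))
          return false;
      return strstr(url, r->url_substring) != NULL;
  }

  int main(void) {
      /* Roughly "||example.net/ads$script,domain=news.example" in EasyList terms. */
      Rule r = { "example.net/ads", "news.example", REQ_SCRIPT };
      printf("%d\n", rule_matches(&r, "http://example.net/ads/a.js",
                                  "news.example", REQ_SCRIPT));   /* 1 */
      printf("%d\n", rule_matches(&r, "http://example.net/ads/a.js",
                                  "other.example", REQ_SCRIPT));  /* 0 */
      return 0;
  }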


Hyperscan is still dynamic, and C only. Google or Mozilla could use perfect hashes and ship it (Google won't). You need a native extension for a fast, low-memory blocker.
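
For the perfect-hash idea, a hand-built toy over three keys; for a real list the function and table would be generated offline (e.g., with gperf) and shipped as static data, so there is nothing to compile at load time and each lookup is one probe:

  #include <stdio.h>
  #include <string.h>

  /* Toy perfect-hash table for three blocked tokens. */
  #define TABLE_SIZE 8
  static const char *table[TABLE_SIZE] = {
      [0] = "beacon", [1] = "track", [4] = "ads",
  };

  /* Collision-free over {"ads","track","beacon"} by construction:
     100 % 8 = 4, 121 % 8 = 1, 104 % 8 = 0. */
  static unsigned hash(const char *s) {
      return (unsigned)(strlen(s) + (unsigned char)s[0]) % TABLE_SIZE;
  }

  static int is_blocked_token(const char *s) {
      const char *k = table[hash(s)];
      return k && strcmp(k, s) == 0;
  }

  int main(void) {
      printf("%d %d\n", is_blocked_token("track"),
                        is_blocked_token("images"));  /* 1 0 */
      return 0;
  }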
