
It's entertaining that there is yet another area where "running large numbers of regular expressions, known in advance and not changing all that often" over lots of input is performance-critical.

This was a key driver behind writing Hyperscan (https://github.com/intel/hyperscan), which I worked on back when it was primarily used for network security (thousands or tens of thousands of regular expression rules for firewalls), but doing this for ad blocking seems pretty sensible too.
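
For anyone curious what that looks like in practice, here is a minimal sketch of Hyperscan's multi-pattern API: compile the whole rule set once, then scan each input against all patterns in a single pass. The pattern strings, the handler, and the header path are illustrative assumptions, and error handling is abbreviated.

  #include <stdio.h>
  #include <string.h>
  #include <hs/hs.h>

  /* Invoked by hs_scan() on every match; 'id' is the index given to the
     pattern at compile time. Returning non-zero would stop the scan. */
  static int on_match(unsigned int id, unsigned long long from,
                      unsigned long long to, unsigned int flags, void *ctx) {
      printf("pattern %u matched, ending at offset %llu\n", id, to);
      return 0;
  }

  int main(void) {
      /* Hypothetical rule set: compiled once, scanned many times. */
      const char *patterns[] = { "/ads/", "track(er|ing)", "doubleclick" };
      unsigned int flags[]   = { HS_FLAG_CASELESS, HS_FLAG_CASELESS,
                                 HS_FLAG_CASELESS };
      unsigned int ids[]     = { 0, 1, 2 };

      hs_database_t *db = NULL;
      hs_compile_error_t *err = NULL;
      if (hs_compile_multi(patterns, flags, ids, 3, HS_MODE_BLOCK,
                           NULL, &db, &err) != HS_SUCCESS) {
          fprintf(stderr, "compile failed: %s\n", err->message);
          hs_free_compile_error(err);
          return 1;
      }

      hs_scratch_t *scratch = NULL;
      hs_alloc_scratch(db, &scratch);

      const char *url = "http://example.com/ads/banner.js";
      hs_scan(db, url, (unsigned int)strlen(url), 0, scratch, on_match, NULL);

      hs_free_scratch(scratch);
      hs_free_database(db);
      return 0;
  }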

This is indeed a performance-critical problem, but it is already pretty much solved at this point. If you look at the most popular content blockers, their decision time is already a fraction of a millisecond, so it does not seem like performance is really an issue anymore.


Yeah, I don't doubt that it's solvable by other means (notably hashing). It's just amusing that something we started building in 2006 - and open sourced in 2015 - solves this problem directly (i.e. you don't have to rewrite your regexes).
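
To illustrate the hashing approach (a sketch, not any particular blocker's implementation): instead of testing every filter against every URL, index each filter under a representative token, then at request time probe only the buckets for tokens that actually occur in the URL. The FNV-1a hash, bucket count, and all names here are illustrative choices.

  #include <ctype.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define NBUCKETS 1024

  /* FNV-1a over one token. */
  static uint32_t fnv1a(const char *s, size_t n) {
      uint32_t h = 2166136261u;
      for (size_t i = 0; i < n; i++) { h ^= (uint8_t)s[i]; h *= 16777619u; }
      return h;
  }

  typedef struct Filter {
      const char *pattern;   /* full rule, evaluated only on a bucket hit */
      struct Filter *next;
  } Filter;

  static Filter *buckets[NBUCKETS];

  /* Index a filter under one representative token from its pattern. */
  static void add_filter(Filter *f, const char *token) {
      uint32_t b = fnv1a(token, strlen(token)) % NBUCKETS;
      f->next = buckets[b];
      buckets[b] = f;
  }

  /* Walk the URL's alphanumeric tokens; for each, probe only the bucket
     of filters sharing that token instead of testing the whole list. */
  static void match_url(const char *url) {
      for (const char *p = url; *p; ) {
          while (*p && !isalnum((unsigned char)*p)) p++;
          const char *start = p;
          while (*p && isalnum((unsigned char)*p)) p++;
          if (p == start) continue;
          uint32_t b = fnv1a(start, (size_t)(p - start)) % NBUCKETS;
          for (Filter *f = buckets[b]; f; f = f->next)
              if (strstr(url, f->pattern))  /* stand-in for full evaluation */
                  printf("candidate hit: %s\n", f->pattern);
      }
  }

  int main(void) {
      static Filter f1 = { "/ads/", NULL }, f2 = { "tracker", NULL };
      add_filter(&f1, "ads");
      add_filter(&f2, "tracker");
      match_url("http://example.com/ads/banner.js");  /* candidate hit: /ads/ */
      return 0;
  }

The point is the same as Hyperscan's: the cost per request scales with the handful of candidate filters sharing a token, not with the size of the whole list.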


To be fair, blocklists are not really lists of regexps. They do contain some regexps, but the syntax is mostly custom, and matching relies both on patterns found in URLs (this part could be partially expressed as regexps) and on a multitude of other constraints: the domain of the frame, the domain of the request, the type of network request, etc.
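
A toy model of what one such filter carries beyond the URL pattern; the field names and the mapping to rule syntax are my own simplification, loosely after EasyList-style $script/$domain options:

  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  typedef enum { REQ_ANY, REQ_SCRIPT, REQ_IMAGE, REQ_XHR } RequestType;

  /* One rule = a URL pattern plus constraints a regexp alone can't express. */
  typedef struct {
      const char *url_substring;  /* the part a regexp could handle */
      const char *frame_domain;   /* only apply on this page domain (NULL = any) */
      RequestType type;           /* only apply to this request type */
  } Rule;

  static bool ends_with(const char *s, const char *suffix) {
      size_t ls = strlen(s), lf = strlen(suffix);
      return ls >= lf && memcmp(s + ls - lf, suffix, lf) == 0;
  }

  static bool rule_matches(const Rule *r, const char *url,
                           const char *frame_domain, RequestType type) {
      if (r->type != REQ_ANY && r->type != type) return false;
      if (r->frame_domain && !ends_with(frame_domain, r->frame_domain))
          return false;
      return strstr(url, r->url_substring) != NULL;
  }

  int main(void) {
      /* Roughly "||example.net/ads$script,domain=news.example" in EasyList terms. */
      Rule r = { "example.net/ads", "news.example", REQ_SCRIPT };
      printf("%d\n", rule_matches(&r, "http://example.net/ads/a.js",
                                  "news.example", REQ_SCRIPT));   /* 1 */
      printf("%d\n", rule_matches(&r, "http://example.net/ads/a.js",
                                  "other.example", REQ_SCRIPT));  /* 0 */
      return 0;
  }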


Hyperscan is still dynamic, and C only. Google or Mozilla could use perfect hashes and ship it (Google won't). You need a native extension for a fast, low-memory blocker.
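
For the perfect-hash idea, a hand-built toy over three keys; for a real list the function and table would be generated offline (e.g., with gperf) and shipped as static data, so there is nothing to compile at load time and each lookup is one probe:

  #include <stdio.h>
  #include <string.h>

  /* Toy perfect-hash table for three blocked tokens. */
  #define TABLE_SIZE 8
  static const char *table[TABLE_SIZE] = {
      [0] = "beacon", [1] = "track", [4] = "ads",
  };

  /* Collision-free over {"ads","track","beacon"} by construction:
     100 % 8 = 4, 121 % 8 = 1, 104 % 8 = 0. */
  static unsigned hash(const char *s) {
      return (unsigned)(strlen(s) + (unsigned char)s[0]) % TABLE_SIZE;
  }

  static int is_blocked_token(const char *s) {
      const char *k = table[hash(s)];
      return k && strcmp(k, s) == 0;
  }

  int main(void) {
      printf("%d %d\n", is_blocked_token("track"),
                        is_blocked_token("images"));  /* 1 0 */
      return 0;
  }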
