Hacker News new | past | comments | ask | show | jobs | submit login

> boiled each pattern in the trie down to its most unique dword or qword, and only attempted a full match after an initial hit

Oh that's interesting - like a very fast cmp before bothering to check the rest of the branches.

When you say pattern are you talking about specific letters/morphemes or subsequence? And were you pulling the [dq]words from the initial pattern or generating them some otherway (through some kind of hash or encoding function?).




Sorry, should've given more context. In this case the trie was for Aho-Corasick, so we built it from the most unique subsequences. Since there was no need to add or remove patterns at runtime, the subsequences were precomputed and the trie built server-side, then sent serialized down to the application.

On a different occasion I tried a few kinds of entropy coding, but their value was pretty limited, and obviously not applicable if you need to operate on a stream with Aho. In that case I only needed memory-optimized set membership testing, though, so ended up going with a Golomb-compressed set from pgs. 7-8 of [0].

[0] http://algo2.iti.kit.edu/singler/publications/cacheefficient...




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: