But seriously though, I played around with compression a bit initially: https://github.com/tpn/tracer/blob/master/StringTable/String.... Referenced Hacker's Delight when I was working on it. Granted, this was just for compressing the USHORT lengths into BYTE-sized lengths.
All of the compress/expand routines mentioned in that Hacker's Delight section you're quoting seem to mention upwards of 160+ instructions. My negative match logic fast-path only requires 12 instructions.
True. A gperf alike hash table (with special key indices) works only ok with compile-time known strings. But a simple CMP-alike perfect hash table with double indirection would be worthwhile to test against.
I called it Hanov after http://stevehanov.ca/blog/index.php?id=119
Problem is there that you have to calc a hash for each string, which is only fast with __builtin_crc/_mm_crc32_u64
A hash alone isn't going to help much; he has both a 1-char string in the set and to top it off, most of the other strings begin with (a distinct from the 1-char string one) the same string. There are some tricks (overpopulate the hash table with short string) but hashing is tricky here.
Source: 11 years of string matching work. :-) We built a multiple hash table string matcher in the early days of Hyperscan (it's gone now).
There is an Intel PEXT instruction, but I don't know the cycle count and I think it's limited to 64 bits.
With PEXT the whole prefix match is four instructions.
PEXT is pretty cheap... but, I still don't think I understand how what you're suggesting could optimally apply to prefix matching against all 16 strings at once, or why it would offer a substantial speed-up to the approach I've used? Can you elaborate?
I can't see how this avoids needing to do a comparison against the string in the slot once you've actually found the index though. I think I need to see this in C or asm to grok what the benefits are.
But seriously though, I played around with compression a bit initially: https://github.com/tpn/tracer/blob/master/StringTable/String.... Referenced Hacker's Delight when I was working on it. Granted, this was just for compressing the USHORT lengths into BYTE-sized lengths.
Are you sure your compression suggestion would work for prefix matches? If you can get lower latency than the fastest assembly version, I'm all ears: http://trent.me/is-prefix-of-string-in-table/#IsPrefixOfStri...
All of the compress/expand routines mentioned in that Hacker's Delight section you're quoting seem to mention upwards of 160+ instructions. My negative match logic fast-path only requires 12 instructions.