

Fast wildcard searches with trigrams - matthewwarren
http://www.mattwarren.org/2015/08/19/the-stack-overflow-tag-engine-part-2/

======
arthursilva
I'm fairly sure this is the same technique employed by pg_trgm. There's a new
version (1.2) floating around the mailing list that's BLAZZZING fast!

------
buzzier
I'll just leave this here:
[http://ntz-develop.blogspot.com/2011/03/fuzzy-string-search.html](http://ntz-develop.blogspot.com/2011/03/fuzzy-string-search.html)

~~~
matthewwarren
Thanks for that link, there's some interesting stuff in there.

I also tried using a trie, but as the linked post states, tries are only good
for starts-with or ends-with searches. They can't be used for contains
searches, i.e. "\*java\*".
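
To make the contrast with a trie concrete, here is a minimal sketch of how a trigram index can answer a contains search. It is not the code from the article: the function names (`trigrams`, `build_index`, `contains_search`) and the sample tags are made up for illustration, and it assumes plain in-memory Python sets for the posting lists.

```python
from collections import defaultdict


def trigrams(s):
    """Return the set of all 3-character substrings of s (lower-cased)."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}


def build_index(tags):
    """Map each trigram to the set of positions of the tags containing it."""
    index = defaultdict(set)
    for tag_id, tag in enumerate(tags):
        for gram in trigrams(tag):
            index[gram].add(tag_id)
    return index


def contains_search(index, tags, needle):
    """Find tags containing needle, i.e. the wildcard query *needle*."""
    needle = needle.lower()
    grams = trigrams(needle)
    if not grams:
        # Needle is shorter than 3 characters, so the index cannot help:
        # fall back to checking every tag.
        candidates = range(len(tags))
    else:
        # A tag can only contain the needle if it contains every one of
        # the needle's trigrams, so intersect the posting lists.
        candidates = set.intersection(*(index.get(g, set()) for g in grams))
    # Trigram hits are only candidates; verify with a real substring test.
    return sorted(tags[i] for i in candidates if needle in tags[i].lower())


# Hypothetical sample data, not taken from the Stack Overflow dump.
tags = ["java", "javascript", "rxjava", "c#", "java-8", "python"]
index = build_index(tags)
matches = contains_search(index, tags, "java")
```

The final substring check matters because sharing all of the needle's trigrams does not guarantee they appear contiguously and in order. The index only narrows the set of tags that need that check.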

------
grandpa
I think this is the same technique as Russ Cox used to search Google Code:
[https://swtch.com/~rsc/regexp/regexp4.html](https://swtch.com/~rsc/regexp/regexp4.html)

~~~
rryan
This is mentioned in the article (which also links to rsc).

------
andrewvc
This is a cool explanation of the technique, but maybe the author can share
why they built their own tag database vs. using elasticsearch or postgres,
both of which natively support trigram indexes (and in ES's case arbitrary
ngrams).

~~~
matthewwarren
It was "just for fun". I read about the Stack Overflow tag engine and I was
intrigued by what it would take to build it. Plus they make a dataset
available, so I had ~8,000,000 questions to try it out on.

It's been a good chance to learn some optimisation techniques and new data
structures, such as trigrams and bitmap indexes (coming up in the next post).

Also, AFAIK SO still use their own custom solution for tag searches. They've
moved to Elasticsearch for regular searching, but not for tag searches. So
there must be some benefit to their custom solution.

