
Super Fuzzy Searching on PostgreSQL - dhruvbhatia
http://bartlettpublishing.com/site/bartpub/blog/3/entry/350
======
aw2611
For people interested in implementing full-text search with PostgreSQL this
could also help : "Postgres full-text search is Good Enough!" [0]

[0] [http://blog.lostpropertyhq.com/postgres-full-text-search-
is-...](http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-
enough/)

~~~
mark_l_watson
That is a good article. I referenced it last week when I was converting a
prototype Clojure app from using an embedded database and also Lucene to using
just Postgres. Postgres is like a Swiss army knife, lots of functionality :-)

------
radiowave
One thing that can help simplify the creation of the wordlist table is to
instead use a materialized view (with a gin or gist trigram index), and just
refresh it on some schedule.

Of course, this means that your fuzzy search capability will lag somewhat
behind live data in the case where a new distinct word has just been added to
the corpus, but in many cases this is acceptable: people looking for recently
created records will most likely have some exact reference for them, and will
only resort to using fuzzy search to find older records for which their
reference is rather more, well, fuzzy.

------
elchief
And indexes are faster in 9.4 .

See also the smlar extension (by one of the text search authors). Let's you do
TFIDF

I'd kill for BM25 and Solr style filters though...

~~~
radiowave
It appears the author was only looking at gist indexes (which AFAICT haven't
got faster in 9.4), rather than gin indexes (which have).

Gin indexes can certainly be used with the trigram extension, in exactly the
same way as the gist ones, and you would generally tend to expect gin to be
faster for queries, but slower for updates and inserts, compared to gist,
though I've never done a serious comparison on trigram data.

~~~
pgaddict
In my experience the wisdom "GIN is slower to update than GiST" is mostly
obsolete nowadays. At least that's what my benchmark results suggest
[http://blog.pgaddict.com/posts/performance-since-
postgresql-...](http://blog.pgaddict.com/posts/performance-since-
postgresql-7-4-to-9-4-fulltext)

It's also a bit shortsighted to look at the index maintenance in isolation -
when your workload is 99% reads, and queries with GIN indexes are way faster,
it's probably more efficient to use GIN indexes.

Moreover, there are often more expensive parts when inserting/updating data
(either inside the database, or in the application), so the index maintenance
is not really the main problem.

I'm sure there are workloads where it's still true, though.

~~~
radiowave
Right. I've had one case where index rebuild times kind of mattered, and gist
was definitely a lot faster at rebuilding, though not enough to make it worth
taking the hit on read performance. But this was pre-9.4, so it may be rather
different now.

The nature of data being indexed is probably the biggest determinant: gin is
really happy when you have a small number of distinct values across a large
dataset; I guess at the opposite end of the scale gist might come into it's
own.

