
Full-Text Search in Django with PostgreSQL - pauloxnet
http://www.paulox.net/2017/12/22/full-text-search-in-django-with-postgresql/
======
agconti
I can deeply vouch for the performance and joy of implementing Full-Text
Search with PostgreSQL and Django.

I just finished a a project where we chose Postgres's FTS over using Elastic
Search. At the beginning, I was worried about what performance we'd see since
we choose to not use ES. But after slight performance tweaking, we had even
our least performing queries under 50ms.

~~~
PhilipA
You need to look at other stuff than performance - relevancy is probably the
biggest thing when implementing search. Is it more relevant than what you
experienced with ES?

~~~
agconti
I'd argue that relevancy is more your application application's design then
the underlying system retrieving the results. For example, putting the same
dataset in Postgres or ES wouldn't make one deliver more relevant results
given equal configurations.

You could lean on the relevancy strategies built in to ES, but in my
experience you're better off understanding what relevancy means for your
dataset and implementing a strategy yourself. Your millage may vary though,
I'd never advocate reimplenting something that's already provided by your
chosen tool.

The options and tools for configuring and tweaking relevancy between ES and
PostgreSQL's FTS are surprisingly similar for many application use cases. If
you're interested you can check out Postgres' search rank and query weighting
configurations.

~~~
mickeyp
I disagree.

I've had to build complex queries against ElasticSearch and it is specifically
designed for things like this. We had custom weightings so when you searched
for certain natural keys associated with each item they would rank above
everything else, and that is easily do-able with ES. Simultaneously, we would
weigh results according to various metadata we had attached to each entry
(audio stream languages, subtitles, content owner name, genre, etc.). And
finally, if you searched for the name of the media (a movie or an episode in a
TV show) the user would see all the matches ranked accordingly, but again
weighed according to the content owner and various language features of that
media file.

You can probably hack that together with PostgreSQL, but is basically one big
query in ES. PG's FTS is still great; but its use-case is slightly different.

~~~
agconti
That’s cool and I hear you; that’s a complex relevancy definition.

( Maybe surprisingly? ) This is type of query is natively supported by
Postgres. That support is robust and mature, you don’t have to hack it
together.

ES is a great tool and it’s clear your a fan of it. If you’re interested, I’d
recommend you look into Postgres capabilities. It’s not a replacement for ES
by any means ( or even a competitor to in my opinion; Postgres isn’t even
distributed ). But for specific use cases, you might find that Postgres
capabilities surprise you!

~~~
brightball
I was about to give him the same answer but you beat me to it. It’s not jus
throwing stuff in a field in PG. You can weight multiple bits of data. You can
even define multiple different search vectors in separate columns if you want
to use different search styles in different situations.

~~~
mickeyp
Thanks! I replied to the OP; but weighing, although critical to our needs, was
a small part of a larger problem space we had to solve. I hate introducing new
technology unless it's strictly required, but we ran into limitations that
forced us to down the road of using ES.

------
yeukhon
Have worked with ES...

If you want the out-of-the-box distributed experience, then you can't go wrong
with Elasticsearch because of the built-in sharding mechanism. However,
figuring out how many shards you need for each index, where should each shard
sits, and how to balance out the cluster has always been a trial and error
exercise. Indices continue to grow like there is infinite space, and
performance will degrade. What I find useful in the end is to have multiple ES
clusters, which seems obvious right? One of the most frequently asked
questions is "how many nodes do I need", and the answer is always "it depends"
which is true, but gives me a chill. With Cassandra, from my experience,
adding more nodes (scale horizontally) so data spread more thinly, and revisit
some data structure change, would be an acceptable answer.

The other thing is a lot of folks I have worked with just dump stuff into ES
because ES feels like a dumping ground for "json"-like documents (reminds of
me MongoDB). Please compress, understand mappings and different fields to make
the document smaller...

I am still very convinced SQL databases are robust enough to take any lazy
dumping.

~~~
rpedela
FYI you can now split [1] and shrink [2] the number of shards.

1\.
[https://www.elastic.co/guide/en/elasticsearch/reference/6.1/...](https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-
split-index.html)

2\.
[https://www.elastic.co/guide/en/elasticsearch/reference/6.1/...](https://www.elastic.co/guide/en/elasticsearch/reference/6.1/indices-
shrink-index.html)

~~~
ddorian43
Its not while live indexing and has many caveats

------
nicolashahn
We use Flask with SQLAlchemy, this guide on sqlalchemy-searchable:

[https://sqlalchemy-
searchable.readthedocs.io/en/latest/insta...](https://sqlalchemy-
searchable.readthedocs.io/en/latest/installation.html#quickstart)

had us easily swap out Algolia for search with Postgres FTS.

~~~
tinyhouse
Not familiar with SQLAlchemy. May I ask where do you host the search index and
also why you decided to leave Algolia? Thanks.

~~~
nicolashahn
Search index is created by adding a automatically populated search vector
column to each searchable table in Postgres and appropriate SQLAlchemy models.
Left Algolia partly because our old implementation was poorly done, overly
complicated, and becoming difficult to maintain, but also because Postgres FTS
could give us equivalent search capabilities (we didn't need all the features
Algolia had to offer) for free.

~~~
pauloxnet
I think everyone that use PostgreSQL FTS had similar stories and motivation to
switch from external software/service.

------
sandGorgon
Does anyone know how to play with relevance and spell corrections in FTS ?

In elasticsearch, relevance is Tf-IDF or BM25. I don't see anything similar in
FTS.

Not sure how I can results around ranking as well as spell
correction/suggestions simultaneously.

~~~
pauloxnet
I used Elastic in past project as I reported in my article, but I never used
relevance, can you add a link to the elastic official documentation about it ?

~~~
sandGorgon
here -
[https://www.elastic.co/guide/en/elasticsearch/reference/curr...](https://www.elastic.co/guide/en/elasticsearch/reference/current/index-
modules-similarity.html)

similarity/relevance is the mathematical difference between search and
database query. Its an inexact match.

~~~
pauloxnet
Thanks. I'll read it.

~~~
burntsushi
If you want to reason about these things from first principles, then I
recommend the IR book, which is free online: [https://nlp.stanford.edu/IR-
book/](https://nlp.stanford.edu/IR-book/)

------
rpedela
Can you expand on the difficulty with synchronization? With ES I understand
because you have to build it from scratch. However Solr has the
DataImportHandler [1] which is made to sync data from a relational database.

1\. [https://lucene.apache.org/solr/guide/7_2/uploading-
structure...](https://lucene.apache.org/solr/guide/7_2/uploading-structured-
data-store-data-with-the-data-import-handler.html#dataimporthandler-commands)

------
steve-chavez
Not to discredit Django but if you're already using PostgreSQL you could use
[http://postgrest.com](http://postgrest.com) which is a more tightly knit
component for fts.

With PostgREST you can avoid the abstractions overhead and just use plain SQL
and REST calls, basically to do a fts using `to_tsquery` you would do:

CREATE TABLE blog_entry( headline text, body_text tsvector );

GET localhost:3000/blog_entry?body_text=fts.Cheese&select=headline

~~~
pauloxnet
Thanks for your link. I read about Postgrest months ago and I think is a good
software. I think Django is good for some problem and postgrest for other.

------
penetrarthur
Why hasn't anyone brought up Django Haystack?

[http://django-
haystack.readthedocs.io/en/master/tutorial.htm...](http://django-
haystack.readthedocs.io/en/master/tutorial.html)

~~~
pauloxnet
I used haystack in different project to connect to Elastisearch as I wrote in
my article, but in this article I wrote about Django Full-Text Search with
PostgreSQL and you don't need haystack to use it, because there's a 'contrib'
module in Django code
[https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/s...](https://docs.djangoproject.com/en/2.0/ref/contrib/postgres/search/)

~~~
penetrarthur
Out of curiosity, which one of them is faster for doing complex search?

~~~
pauloxnet
I'm sorry but "complex search" is too generic. I can say that in my project
the PG FTS is quite fast but a lot less complex than ES

------
pauloxnet
BTW I migrated the djangoproject.com from ES to PG FTS and it works great
[https://github.com/django/djangoproject.com/commit/1be9a483f...](https://github.com/django/djangoproject.com/commit/1be9a483fe876d36de35c5ece1000f16eadb0f13)

I'll write another article about it and, if they vote it, I'll have a talk
about it in the next PyCon Nove conference
[http://www.paulox.net/talks/#pycon-nove](http://www.paulox.net/talks/#pycon-
nove)

------
hitekker
Does the Django postgres module handle partial word lookups?

------
kuschku
Considering your nickname, @OP, I assume you’re the author of the website: The
site is currently broken on screens with a width of 1024px or less.

This screenshot was taken on my secondary monitor, which is 1280x1024 in
portrait mode:
[https://i.imgur.com/n3LGCKM.png](https://i.imgur.com/n3LGCKM.png)

~~~
StavrosK
It does something differently funky for me:
[https://i.imgur.com/gnXgBK5.png](https://i.imgur.com/gnXgBK5.png)

Also I'm not a fan of all the visual noise, such as the orange paragraph
markers next to every title.

EDIT: Replaced "fucking" with the intended "funky", I have no idea how that
typo happened.

~~~
TJSomething
I'm seeing the same thing in Firefox 57.0.3 on Debian, if my viewport is more
than or equal to 1320 pixels wide. It goes away for the text on screen if I
force it to repaint, but comes back if I scroll or resize the viewport.

~~~
pauloxnet
Thanks for the report.

