
Implementing faceted search with Django and PostgreSQL - simonw
https://simonwillison.net/2017/Oct/5/django-postgresql-faceted-search/
======
simonw
I've been working mostly with Elasticsearch for the past couple of years, but
I have to admit I'm deeply impressed with the results I got out of PostgreSQL
after only a few hours of hacking. Elasticsearch is far more flexible and
powerful but for smaller, simpler search projects it's hard to beat the
convenience of having everything built into the database.

~~~
aldoushuxley001
What do you find are the tradeoffs vs. working with Elasticsearch or Solr?

But great job, I've bookmarked your post to come back to and I'll give it a
shot for sure. Definitely very enticing.

~~~
simonw
Elasticsearch is massively more powerful and flexible.

The way I'm doing aggregation counts on top of PostgreSQL is very unlikely to
scale to millions of documents. Elasticsearch pretty much acts as a real-time,
horizontally scalable map/reduce engine so expensive aggregations get executed
on multiple shards and the results are gathered back together.

You may find my talk from DjangoCon this year interesting:
[https://speakerdeck.com/simon/the-denormalized-query-
engine-...](https://speakerdeck.com/simon/the-denormalized-query-engine-
design-pattern) /
[https://www.youtube.com/watch?v=NzcvewgqYog](https://www.youtube.com/watch?v=NzcvewgqYog)

~~~
StavrosK
What type of data is ES good for? It's more than documents and full-text
search, if I'm not mistaken, correct?

~~~
othersideofcoin
It also excels at storing logs for later aggregation and analysis.

~~~
StavrosK
Thank you, I was specifically wondering how good it is at data ingestion for
warehousing. Good to know that's one of its strengths.

------
fabian2k
I've implemented a facet search in Postgres for an application where the
number of documents is reasonably limited, and for that purpose it worked very
well.

Calculating the counts for each facet was by far the most expensive part of
the query. I found the GROUPING SETS feature very useful to calculate a lot of
aggregate counts efficiently. Though this wasn't parallelized by Postgres,
I'll have to check if that is possible in Postgres 10 now.

I wondered if there are better ways to calculate facet counts, but I simply
can't think of a way that doesn't require at least one full table scan.

~~~
simonw
I think facet counts at scale are probably best handled by Elasticsearch or
Solr, but I'd be delighted if a PostgreSQL expert would step in and convince
me otherwise based on some of the amazing work that went into PostgreSQL 10.

------
htfy96
It seems that PostgreSQL is becoming an ideal solution for early startups:
full-text search, well-designed plugin architecture, replica, partitioning,
Large Object storage

------
obartunov
Have you seen Alexander Korotkov post "Faceted Search in the Single PostgreSQL
Query" [http://akorotkov.github.io/blog/2016/06/17/faceted-
search/](http://akorotkov.github.io/blog/2016/06/17/faceted-search/)

------
coleifer
I'd imagine you could accomplish very similar results using SQLite and the
full-text-search extension (FTS3, 4 or 5). I believe the SQLite FTS modules
support a porter-stemming tokenizer, though you may need to remove stop-words
yourself. Before FTS5 you also need to write a user-defined function to rank
results, but there are numerous libraries which provide good implementations
(BM25, e.g.).

