Hacker News new | comments | show | ask | jobs | submit login
Implementing faceted search with Django and PostgreSQL (simonwillison.net)
124 points by simonw 11 months ago | hide | past | web | favorite | 19 comments

I've been working mostly with Elasticsearch for the past couple of years, but I have to admit I'm deeply impressed with the results I got out of PostgreSQL after only a few hours of hacking. Elasticsearch is far more flexible and powerful but for smaller, simpler search projects it's hard to beat the convenience of having everything built into the database.

Have you heard of ZomboDB? It aims to provide the best of both worlds; transactions and elasticsearch. I've been using it on my own pet project and am very happy that I only have to care about dealing with Postgres (the ES part goes automatically).

What do you find are the tradeoffs vs. working with Elasticsearch or Solr?

But great job, I've bookmarked your post to come back to and I'll give it a shot for sure. Definitely very enticing.

Elasticsearch is massively more powerful and flexible.

The way I'm doing aggregation counts on top of PostgreSQL is very unlikely to scale to millions of documents. Elasticsearch pretty much acts as a real-time, horizontally scalable map/reduce engine so expensive aggregations get executed on multiple shards and the results are gathered back together.

You may find my talk from DjangoCon this year interesting: https://speakerdeck.com/simon/the-denormalized-query-engine-... / https://www.youtube.com/watch?v=NzcvewgqYog

What type of data is ES good for? It's more than documents and full-text search, if I'm not mistaken, correct?

I have used it as a general data store, where I can accept eventual consistency. It is a very good distributed db if you are ok with the trade offs it imposes, mostly non-atomic commits to your cluster.

It also excels at storing logs for later aggregation and analysis.

Thank you, I was specifically wondering how good it is at data ingestion for warehousing. Good to know that's one of its strengths.

Generally speaking, you wouldn't want to run queries like this against the system of record (your database) in large applications where there are a lot of user queries, as that will impact performance.

Hence why solutions like elasticsearch and apache solr exist. They scale very well horizontally, allowing you to spread the load between multiple nodes.

Again, this comes down to what your use case is. If you have a tiny app with a relatively small number of users, using postgres for this functionality is probably ok.

I think this looks like a great tutorial but why is all the text in a small column oriented to the left?

Because I’m still using the site design I came up with in ~2007!

I’m hoping to find time soon to modernize it for 2017 era screen size at some point soon. I did at least manage to get some media queries in there so it’s readable on mobile phones.

Adding #wrapper {margin: auto} centres it and makes it look better IMO. Great article!

Bumped everything up to be a bit wider - thanks for the feedback.

If you just center it it would be perfect actually because it would be very readable.

I've implemented a facet search in Postgres for an application where the number of documents is reasonably limited, and for that purpose it worked very well.

Calculating the counts for each facet was by far the most expensive part of the query. I found the GROUPING SETS feature very useful to calculate a lot of aggregate counts efficiently. Though this wasn't parallelized by Postgres, I'll have to check if that is possible in Postgres 10 now.

I wondered if there are better ways to calculate facet counts, but I simply can't think of a way that doesn't require at least one full table scan.

I think facet counts at scale are probably best handled by Elasticsearch or Solr, but I'd be delighted if a PostgreSQL expert would step in and convince me otherwise based on some of the amazing work that went into PostgreSQL 10.

I'd imagine you could accomplish very similar results using SQLite and the full-text-search extension (FTS3, 4 or 5). I believe the SQLite FTS modules support a porter-stemming tokenizer, though you may need to remove stop-words yourself. Before FTS5 you also need to write a user-defined function to rank results, but there are numerous libraries which provide good implementations (BM25, e.g.).

Have you seen Alexander Korotkov post "Faceted Search in the Single PostgreSQL Query" http://akorotkov.github.io/blog/2016/06/17/faceted-search/

It seems that PostgreSQL is becoming an ideal solution for early startups: full-text search, well-designed plugin architecture, replica, partitioning, Large Object storage

Applications are open for YC Winter 2019

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact