
I really wish I had some benchmarks here. Postgres can be crazy fast, but can it compete with Elasticsearch? How long would a typical query across 1,000,000 documents take?

I haven't made benchmarks specifically against other products, but we use PostgreSQL across a database of 7.7m blog-post length documents and have found the search to be extremely fast.

The only time search slows down is when you search for common words that are not stop words and also order (for example, rank) that whole body of results before returning your window. That is to say, the search itself is still fast, but sorting a large resultset can be slow.

The only gotchas I would call out to anyone thinking of using PostgreSQL full text search are:

1) Beware of row-level functions like ts_headline() for extracting a fragment of the matched text. Be sure that you are only calling it for the n rows that you are returning under a LIMIT and not for the entire resultset (most of which you will discard).

2) Beware of accuracy and think about the text you are indexing. If you receive raw markdown and transform it to HTML, then you shouldn't index either: raw markdown may contain XSS attacks that can survive ts_headline(), and the transformed HTML will contain markup that gets indexed. Instead, build a block of raw text that is safe both to index and to show fragments of via ts_headline(). That likely means running a striptags() over the HTML, putting hyperlinks into the text (if you want those to be searchable), and indexing the result.
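
To make the second gotcha concrete, here is a minimal sketch of the "striptags plus hyperlinks" idea using only Python's standard library. The class and function names are made up for illustration, and it assumes your pipeline already has the rendered HTML in hand. It drops markup and script/style bodies, and appends each link's href after its anchor text so the URL ends up in the indexed text.

```python
from html.parser import HTMLParser


class IndexableTextExtractor(HTMLParser):
    """Hypothetical helper: collects visible text and keeps link targets searchable."""

    SKIPPED = {"script", "style"}  # element bodies that should never be indexed

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0
        self._hrefs = []  # stack of href values for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "a":
            self._hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if href:
                # Put the URL into the text right after the anchor text.
                self._parts.append(f" ({href}) ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self):
        # Collapse runs of whitespace left behind by removed tags.
        return " ".join("".join(self._parts).split())


def html_to_indexable_text(html):
    parser = IndexableTextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

The output is plain text, not safe HTML: entities are decoded, so a literal "&lt;b&gt;" in the source comes out as "<b>". You would still want to escape the fragment when rendering it, taking care to preserve whatever highlight markers your headline call inserts.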

We had no issues with performance, control over search ordering, or anything else. The only gotchas we encountered related to sorting and displaying results, and we resolved both easily enough, though too few of the high-level docs like the linked article pointed out the potential issues there.
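
For the first gotcha, the usual shape of the fix is to rank and LIMIT in an inner query and only call PostgreSQL's fragment function, ts_headline(), on the rows that survive. A rough sketch follows, written as a Python constant plus a parameter helper. The table and column names (documents, body, body_tsv) are hypothetical, and the %(name)s placeholders assume a driver that uses that parameter style, such as psycopg.

```python
# Hypothetical schema: documents(id, body text, body_tsv tsvector).
SEARCH_SQL = """
SELECT id, ts_headline('english', body, q) AS fragment, rank
FROM (
    SELECT id, body, q, ts_rank(body_tsv, q) AS rank
    FROM documents, plainto_tsquery('english', %(terms)s) AS q
    WHERE body_tsv @@ q
    ORDER BY rank DESC
    LIMIT %(limit)s OFFSET %(offset)s
) AS hits
ORDER BY rank DESC;
"""


def search_params(terms, page=1, page_size=20):
    """Build the parameter dict for SEARCH_SQL for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be positive")
    return {
        "terms": terms,
        "limit": page_size,
        "offset": (page - 1) * page_size,
    }
```

With a driver like psycopg this would be run as cursor.execute(SEARCH_SQL, search_params("postgres search", page=2)). Note that this only avoids the fragment cost: the inner query still has to rank every matching row before it can sort, so searches on very common words can remain slow.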

That's going to depend on the size of the documents, number of terms, etc. The other big question is ranking -- from what I can tell, PostgreSQL's ranking functions (ts_rank and ts_rank_cd) can be performance problems, but I haven't seen any good head-to-head benchmarks on a given corpus.
