
Postgres full-text search is good enough - percept
http://blog.lostpropertyhq.com/postgres-full-text-search-is-good-enough/
======
mdemare
no_future (dead) says:

I've actually tried both Postgres FTS and Elasticsearch(in conjunction with a
Postgres DB) and found Elasticsearch easier for search. It's much more robust
- for my specific case I needed a bunch of Japanese text to be searchable
alongside english text; Postgres has very poor support for this - and its API
is excellent and its very quick and painless to set up(documentation is not
stellar though). You don't have to worry about things like vectors or indices
with ES, you can just put the data in and it will be easily searchable, and
it's miles more efficient when searching across large data sets. It Java-based
so it is a pretty big resource hog though, so I can see the advantage of
someone who just needed very lightweight search for their project using
tsearch.

~~~
olefoo
It points to a pathology of HN that someone discussing their experience with a
specific technology is getting downvoted and flagged for describing their own
experiences. This is not reddit, the downvote is not meant for enforcing
groupthink.

I'm in favor of Postgres, I make my living in large part from my expertise
with it; but it rubs me the wrong way when people downvote an informative
comment covering someone's experience where it didn't happen to be the right
tool.

no_future I apologise to you for the poor reception your comment got from
immature people who don't understand how to discuss things and who cannot
handle new information. This did not used to be representative of the
community here.

~~~
raquo
I think it's more likely that no_future was shadow-banned for their _previous_
comment.

~~~
mercurial
What I don't like about shadow-banning is that it's a death sentence without
warning appeal for a single misstep. Drank a bit too much and typed a comment
you're going to regret the next morning? Shadowban. At least, that's what it
looks like from the history of the few shadowbanned people I looked at.

~~~
philh
Counterexample: no_future has a lot of downvoted comments, and vis total karma
is (slightly) negative. (I would guess that it was slightly positive until
yesterday, and then ve received a lot of downvotes on a single comment and got
autobanned.)

This wasn't a single misstep, it was just the first one that attracted much
attention.

~~~
jessaustin
Thanks, I hadn't seen those pronouns before.

------
NoCowLevel
Link to previous discussion here:
[https://news.ycombinator.com/item?id=8381748](https://news.ycombinator.com/item?id=8381748)

------
justin_vanw
As much as I would like it to be 'good enough', Postgres' full text search
lacks the feature set to be a reasonable solution to most search use cases.

Background: I wrote a product search engine for a company called iTrackr using
Postgres full text search, and later wrote the product search for milo.com
using Solr. I also designed a simple (but very effective) algorithm to boost
search rankings based on historical user click data, by generalizing click
data onto the feature space of the document set (using mutual information).
[http://www.google.com/patents/US20140067786](http://www.google.com/patents/US20140067786)

So, why would I recommend against using Postgres' full text search:

\- It is not configurable. A search engine like Solr or ElasticSearch allows
you to configure the tokenization pipeline, what is indexed, how it is
indexed, etc. Postgres hard-codes it's tokenization pipeline. You can turn
stemming on and off, but you can't, for example, lowercase text, strip out
html characters, handle special cases for tokenization (camel cased or
delimited text, etc). You can write your own tokenizer, but it has to be as a
C extension. Could you get it to do what you want? Mostly, but only with an
order of magnitude more work than with Solr or ElasticSearch.

\- Postgres has _no_ facility to make it possible to do faceting (histograms).
This is a pretty big deal breaker for most kinds of eCommerce. I've seen a few
attempts at this, but they are incredibly complex and slow compared to other
search engines.

\- Postgres does not give you much control over ranking. It has some good
generic ranking methods, but those will only get you so far. (Query Amazon for
'ipad -dkfjdkf' to see how well it's generic ranking function is. No generic
ranking function can do very well compared to something tuned to the domain,
and Postgres just isn't flexible enough to extend in any reasonable way.

\- It is not scalable. Solr and ElasticSearch are both very very easy to scale
up (and out). Postgres' built in replication is very heavy, as it has to
stream all database changes to all slaves (yes there are things like SLONY but
they are incredibly hard to deal with). In any real environment, you have to
protect your database from load. It's the the hardest thing to scale, and
consulting it for all traffic is a surefire way to run into issues that are
very hard to get back out of.

Now, with all that said, I would recommend it for a few cases:

\- You just want to do SQL queries with 'LIKE' queries, but indexed.

\- You need to do arbitrary relational constraints on top of text queries.
Doing a query like "give me all the tickets closed in November by Hugh where
the project was ProjectX and the ticket name has the word "nested loop" in
it's title OR it's description OR it's title is less than 15 characters long"
is going to be more or less impossible in any other environment. If you were
building something like Jira, or really anything that needs to support
relational logic, Postgres' full text search would be a great way to go.

\- This is probably the most common case I would recommend it: You already are
using Postgres, and you just want to add on some basic search functionality to
data that is already in there. In that case, being able to integrate with your
existing queries, and not having to support a seperate indexing step and
corresponding index updating processes is a huge win.

~~~
grey-area
_This is probably the most common case I would recommend it: You already are
using Postgres, and you just want to add on some basic search functionality to
data that is already in there._

Perhaps this is true for more sites than you imagine - you started your
comment saying that it is not a reasonable solution to most search use cases,
but the vast majority of site search on the web is subpar and way below psql
FTS. So for those sites FTS would be good enough, and a big improvement on
what is there with minimal time investment (far less than installing,
maintaining and learning to use something like solr).

I've replaced ILIKE type searches with psql FTS on a few sites and been very
happy with it - it certainly won't be suitable if your entire value
proposition is centred around searches, if you need configurable tokenisation,
many languages etc, but it is more than adequate for simple searches across
text columns where you don't need to use the psql ranking for example but just
want a list of hits to order as you see fit. If you don't want to have a JVM
dependency on your server, and don't have high traffic or specialised search
requirements, FTS is a good choice. There is a huge long tail of sites with
terrible search who could improve it by using FTS in postgres instead, and
that's who this article is aimed at - not people who know what stemming is and
why they should use it, or what faceted searches are (I didn't till I read
your comment, thanks!). Imagine for example a startup who want to quickly
enable a decent search of their blog posts - FTS is a good option which
doesn't require external dependencies and is very quick to start with.

Thanks for the insight though on what the limitations of psql FTS are and
where it starts to break down, it's interesting to hear this from someone
who's tried better options and written a search service which was heavily
used.

~~~
justin_vanw
I'm not convinced that you can magically improve search on those 'tail' sites.
Postgres' full text search is nontrivial to set up and use, even if you have
some SQL experience. I've done substantial work on text search in the past,
and I still have trouble following the docs for FTS. The interface is
cumbersome at best.

Finally, if search on a site is especially bad, probably the people that made
it can't do better, or don't care. I would suspect that most of them are in
the "can't do better" camp. One of the not-so-surprising things I have learned
about these sorts of programmers is, they don't spend any time reading up on
programming related topics, so they'll never see blog posts like this!

~~~
dayone
Elasticsearch is much more cumbersome to setup. I think Postgres' wins in
relative ease of setting up.

~~~
threeseed
No it isn't. DEB/RPM package install and you're pretty much done.

------
Axsuul
It may be good enough but I find ElasticSearch just easier to work with. For
example, ElasticSearch gives you query string syntax [1] for free and it's a
HUGE plus. It's both user friendly and extremely powerful. You can see it in
action over at GitHub's Code Search [2].

[1]
[http://www.elasticsearch.org/guide/en/elasticsearch/referenc...](http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-
dsl-query-string-query.html#query-string-syntax)

[2] [https://help.github.com/articles/search-
syntax/](https://help.github.com/articles/search-syntax/)

------
rachbelaid
As the author, it may be good that I clarify the goals behind this post.

First I'm glad that this post made HN first page for a second time.

I encourage to read this comment because it summarizes well few of the missing
features in Postgres about search.
[https://news.ycombinator.com/item?id=8715624](https://news.ycombinator.com/item?id=8715624)

It's not the first time that I get similar feebacks and I understand that
"Enough" is subjective notion and maybe I should've pick an other title like
"Postgres full-text search for basic search need".

Just to clarify, this post is not against ES or SOLR. This tools are great and
really powerfull but obviously this tools will better than Postgres Search. ES
/ SOLR are tools build to be anly around document store and search and on the
otherside the search in Postgres is only one feature in many.

I think there is multiple reason when you would want to give Postgres search a
go:

\- You have no experience with ElasticSearch or SOLR

\- The project is still early stage and you don't want to invest time/money in
a secondary datastore (eg: ES cluster)

\- You want to give quickly a search feature (maybe not perfect) in the hand
of your user

\- Team knowledge, you may want to be careful before introducing a new tech to
your team and ensure that people understand it. There is nothing worst than
introducing a tech and being the only one who can maintain it.

The tech world is big and doesn't only apply to Startup usecase, I worked in
very different environment: Financial institution, Open Source company,
Startup and Service company. A thing that I learn is that you don't always
have flexibility. In some company having new servers is long process and even
if I hate you have to do with this constraint then you have 2 choices front of
you: You do nothing and wait or you move yourself to offer to your users a
workaround solution without any extra dependency ... That's what Postgres
search can be in some usecases.

From my experience, if you are already using Postgres and you find the right
documentation (this post try to describe a start to end example) then I think
you could get a search working in 1 day of work without any extra servers or
resources.

I don't have any experience with SOLR so I will have to pick ES, but if you
never used it then you are going to have loads of "fun":

\- Sync. ElasticSearch IMHO cannot be used as a cannonical datasource so you
will need to figure out a sync strategy. Some people may argue on that but I
don't think that ES started with this goal in mind. They may change direction
now but still I wouldn't store my data only in ES.
[http://aphyr.com/posts/317-call-me-maybe-
elasticsearch](http://aphyr.com/posts/317-call-me-maybe-elasticsearch)

\- Hosting. In the case that you are not using a hosted ES service then you
will need to learn to host and build a cluster and also monitor it.

\- Learning to use it. After reading ~50pages of doc then you start to figure
out how to use type mapping, custom analyser and the 42 ways to do a search
query.

I can only speak from my experience but it took me ~2weeks to get my head
around all this point without any previous experience. The project for which I
used PG search and wrote this blog post recently moved to ES. More or less, 1y
later and I'm glad that the PG solution was good enought until now. In our
case we needed to support more languages: Japanese, .. and I also wanted to
get rid of a bunch of triggers.

The results are good and very fast but for english the results are quite
similar to Postgres solution. I think that's mainly because I'm using much
only ES basic feature, but we have loads of room for improvement which is a
good position to be.

Now there is few other usecases where you may want to consider PG search:

    
    
      - you are using MySQL and you are using the builtin then moving to Postgres can give your search a bit more power
    
      - Side project using Postgres and you want to start a search feature.
    
      - Search on machine output, case where analyser doesn't really matter and you don't need stemming. Lucene features for tokenization and stemming are great.
    

To summarize, I don't think that PG search is a silver bullet but it may be
"Good Enough" for your need to get you started. Spend 20min and read the post
and then maybe it's also good enough for you. On the other hand if you know
already ES/SOLR and you got the resources then it become less relevant to use
Postgres Search. ES/SOLR (tools base on Lucene) are amazing and really feature
rich, if you have important search need and long term plan about search then
you should look into them.

If you are interrested about the PG search and you want to know more about it
than I suggest you to have a look to this blog posts. I think that the author
did an amazing job.

[http://shisaa.jp/postset/postgresql-full-text-search-
part-1....](http://shisaa.jp/postset/postgresql-full-text-search-part-1.html)

[http://shisaa.jp/postset/postgresql-full-text-search-
part-2....](http://shisaa.jp/postset/postgresql-full-text-search-part-2.html)

[http://shisaa.jp/postset/postgresql-full-text-search-
part-3....](http://shisaa.jp/postset/postgresql-full-text-search-part-3.html)

I don't know if people will be interrested but I thought few times about
writing a following blog post "When Postgres search is not Enough" to describe
how to integrate Postgres with ElasticSearch with JDBC River to pull content
from the DB.. Let me know if it's something that you will be interrested
about.

 _Sorry for the numerous grammatical mistakes but I 'm not a native english
speaker and I don't want to spend too much time reviewing my comment grammar_

Thanks again to upvote this post on the 1st page of HN

~~~
jaddison
Speaking of syncing strategies, I use Django with PostgreSQL as the main
datastore and Elasticsearch as a secondary data store. PostgreSQL receives all
admin and staff related updates and Elasticsearch is updated automatically
using signals (and celery tasks if necessary).

I created a Django app to help with this syncing - check it out at
[https://github.com/jaddison/django-simple-
elasticsearch](https://github.com/jaddison/django-simple-elasticsearch). I
love feedback and pull requests!

~~~
rachbelaid
It's also a Django project, I like what you have done. It's a nice lib and
it's intuitive. How do you handle a ES document which is a composition of
multiple Django Model?

It's just personal opinion but I try to avoid having my application
responsible of the data integrity so I went into the way of ES River plugin to
pull data. I used the JDBC one, [https://github.com/jprante/elasticsearch-
river-jdbc](https://github.com/jprante/elasticsearch-river-jdbc) ... I met
some problems but at the end, it works quite well and don't need logic in the
application keep the data synced.

Two other reason that I used River was to not make the app slower by saving
data in ES and also of being able to run the application during development
without the need of ES and being installed able to substitute the search with
a stub.

~~~
jaddison
Good question - django-simple-elasticsearch is definitely focused on
generating a document from a single instance of a specific model. Of course,
you can add supplementary data from associated models via M2M or FK models as
you see fit (nested objects, etc.)

I've thought of adding in support for pushing bulk index request data to redis
(for example) so that an Elasticsearch river could pull from it; this would
decouple the app somewhat - but not completely as you've noted. It would
likely help with throughput however, and still provide you with the ability to
do pre-processing on the data as needed within your app's/project's context.

The Elasticsearch JDBC river isn't as flexible for processing data if I'm not
mistaken, as it doesn't have context for the data? Please correct me if I'm
wrong, but it's somewhat limited?

~~~
rachbelaid
I think that you nailed it, Elasticsearch JDBC is not flexible and you need to
be good in SQL if you have a complex model with many relation. Because JOIN
create a cartesian product you can end with duplicate and you have to find a
way to avoid that because duplicate affect the ranking. In my case I ended
using UNION queries.

Also you cannot also have multiple queries but not to update and existing
document(eg: adding extra fields) The SQL query for me ended being ~100 lines
but the model is quite complex with multiple languages support.

------
lucianmarin
I’m using it on [http://sublevel.net/](http://sublevel.net/) — there’s no
problem whatsoever. It does the job so well, especially if you have consistent
fields on tables. I like the fact that it can search multiple fields at once
without degrading the performance. You can try it on Sublevel without creating
an account.

------
midas
The use of a Materialized View here is clever, but it means that every time
you add a new row you have to refresh the view. For an app that accepts user-
generated content, this would be very frequent since you wouldn't want users
to create content and then test the search and not find it.

~~~
vidarh
I don't know how optimized "refresh materialized view" is, but making this
efficient is easy if you do materialized views the "old school" way, with
triggers (create a table instead of a materialized view; create triggers to
update based on individual rows, and a function to refresh the whole view).

------
haney
I've used tsearch2 on several projects and it's always been super handy. It's
much easier than adding another dependency just to support search.

~~~
ddebernardy
In case anyone ends up confused by the parent post, tsearch2 is not an
additional Postgres contrib module; rather, it's an old contrib module that
the built-in full text search features were built on top of. It has been
obsolete forever.

~~~
haney
Huh, one of the senior engineers at a previous employer turned me on to it. I
completely missed that it was built in now, I feel kind of silly now.

------
thorin
The Oracle docs have a nice overview of the Oracle equivalent
[http://docs.oracle.com/cd/B28359_01/text.111/b28303/toc.htm](http://docs.oracle.com/cd/B28359_01/text.111/b28303/toc.htm)
I need to give both of these a try sometime!

------
elehack
I've been teaching databases this semester, and PostgreSQL's full-text search
capabilities have been wonderful for allowing my students to build search into
their projects without needing to learn additional tech (we're using
PostgreSQL as our database anyway).

------
dayone
this is a awesome post. for most web applications, installing
solr/elasticsearch for full-text search capability in addition to a sql
database is highly time and maintainence intensive. Postgres for both storing
and searching is really useful here.

------
tschellenbach
I think Algolia is also a nice alternative if you're looking for a quick
setup.

