Anyone interested in this piece may also be interested in my Demystifying PostgreSQL slides from some time ago; most if not all of the material is still relevant today.
I've occasionally been tempted to struggle with Postgres's full-text options... and always regretted it. It's idiosyncratic and quickly hits a ceiling, in terms of ranking quality or query features, that is way below what's possible with specialized solutions.
Even if you despise Java, Solr or Elasticsearch aren't that hard to get up and running as a service. Each hour invested in getting those working is likely to return more value than an hour with PG's full-text search.
You can. CREATE TEXT SEARCH PARSER does essentially that. (Yes, it's called a parser, not a lexer, but in this case the difference essentially doesn't exist.)
> Let me define a pipeline of filters like Solr.
Hm. Essentially that should be possible using dictionaries.
> Let me use BM-25 text similarity like SQLite.
Hm. It won't be possible to write bm25 directly like in SQLite (matching against the current row without specifying it, IIUC), but generally it's easy to define additional functions and use them for sorting and whatnot.
I think the Postgres text search functionality has unfortunately lacked somebody with an interest in developing it for a couple of years now. It can be useful, but it could be much better. The default text search parser is pretty much useless in my opinion :(
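For reference, here is a minimal sketch of what such a user-defined ranking function would have to compute. It is plain Python, not Postgres code, and it assumes pre-tokenized documents, the common defaults k1=1.2 and b=0.75, and the "+1 inside the log" IDF variant; all names here are made up for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document.

    query: list of tokens; docs: list of token lists (hypothetical inputs).
    """
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n  # average document length

    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if tf[term] == 0:
                continue  # term absent from this document
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [
    ["postgres", "full", "text", "search"],
    ["sqlite", "fts"],
    ["postgres", "postgres", "index"],
]
scores = bm25_scores(["postgres"], docs)
```

Note that the score needs corpus-wide statistics (document count, document frequencies, average length), which is the part a per-row function does not get for free.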
You can create the text search index over the result of an arbitrary function. I have done it for a language that lacks stemming/ispell support, in PL/Perl.
Could you give a small example to illustrate how/what you mean? Would you make a function that massages tables/views in "StrangeLanguage" and spits out "StrangeLanguage" that's been stemmed/normalized etc., and then feed that to PG's normal full-text index system?
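Not the original PL/Perl code, but a minimal sketch of the kind of normalizing function being described, written in Python. The suffix list and function names are hypothetical stand-ins for real "StrangeLanguage" rules. The assumption is that the returned string would then go through something like to_tsvector with the 'simple' configuration, and that the wrapping database function is declared IMMUTABLE so it can be used in an index expression.

```python
import re
import unicodedata

# Hypothetical suffixes for an imaginary "StrangeLanguage";
# a real language needs its own rules.
SUFFIXES = ("ing", "s")


def stem(word):
    """Strip the longest matching suffix, keeping at least three characters."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(text):
    """Lowercase, strip accents, tokenize and stem; return space-joined tokens."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = re.findall(r"[^\W_]+", no_accents)  # runs of letters/digits
    return " ".join(stem(tok) for tok in tokens)


normalized = normalize("Indexing the Cafés!")
```

The same function has to be applied to the query text as well, otherwise the stemmed index terms and the raw query terms won't match.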
Full text search starts on slide 32: http://www.slideshare.net/noloh/demystifying-postgresql