

Full-Text Search in CouchDB Using...CouchDB - twampss
http://lethain.com/entry/2008/dec/08/full-text-search-in-couchdb-using-couchdb/

======
ROFISH
As an aside, I personally use the Sphinx full-text search engine. It's not
perfect (I would LOVE for delta merging to cost less IO), but there's
practically no cost to searching. One of my biggest pet peeves with message
board software was the search timer, which existed because all the search
algorithms sucked. They either did something like this or just plain used
MySQL's FULLTEXT feature, which is horrible for 1GB+ of pure text data. So
when I wrote my own forum software, I used Sphinx for zero-cost searching.

<http://forum.fangamer.com/forum/search/>

------
hedgehog
If you want full-text search in the data store Google gives you with App
Engine, you end up doing about the same thing. One thing, though: when
indexing text you definitely want to stem the words and remove the
duplicates. You can implement a Porter stemmer (probably the right thing to
do) or use a quick and dirty one, something like this:

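    # Suffixes to strip, checked in order; the first match wins.
    tails = 'ally ion ive ing ies ed ly al es e s n y'.split()

    def stem(s):
        slen = len(s)

        # Strip at most one suffix.
        for t in tails:
            if s.endswith(t):
                s = s[:-len(t)]
                break

        # If stripping left a doubled final letter ("runn"), drop one.
        if 2 < len(s) < slen and s[-1] == s[-2]:
            s = s[:-1]

        return s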
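
For the "remove the duplicates" half, here is a minimal sketch of how the stemmer might feed an index. The `index_terms` helper and its regex tokenizer are assumptions for illustration, not part of the original snippet, and the stemmer is repeated so the block runs on its own.

```python
import re

# Suffixes to strip, checked in order; the first match wins.
TAILS = 'ally ion ive ing ies ed ly al es e s n y'.split()

def stem(s):
    slen = len(s)
    # Strip at most one suffix.
    for t in TAILS:
        if s.endswith(t):
            s = s[:-len(t)]
            break
    # If stripping left a doubled final letter ("runn"), drop one.
    if 2 < len(s) < slen and s[-1] == s[-2]:
        s = s[:-1]
    return s

def index_terms(text):
    """Hypothetical helper: lowercase, tokenize, stem, and de-duplicate."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # A set drops duplicates; empty stems (e.g. from "e" or "s") are skipped.
    return {t for t in (stem(w) for w in words) if t}

print(index_terms("Indexing indexed indexes, index!"))
```

Being quick and dirty, it is not consistent on short words: "running" stems to "run", but "run" itself loses its final "n". A real Porter stemmer avoids that kind of mismatch.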

