

Full Text Search with Whistlepig - zrail
http://bugsplat.info/2013-01-09-full-text-search-with-whistlepig.html

======
zrail
To the joker who searched for "(dasdsa#@!#@!K#...KJ#!@LKJ#@!LKJ+,.3,21/321":
thanks for the debugging help :)

~~~
quadhome
Post the refined version of your code?

~~~
zrail
Oh, I just wrapped the query parser in a begin/rescue block. You can see it
here:
[https://github.com/peterkeen/bugsplat.rb/blob/master/page.rb...](https://github.com/peterkeen/bugsplat.rb/blob/master/page.rb#L56)
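
For anyone who doesn't want to click through, the idea is roughly the following. This is only a sketch: `ParseError`, `parse_query` and `safe_search` are hypothetical stand-ins made up for illustration, not Whistlepig's API and not the code at the link above.

```ruby
# Hypothetical error type; the real library defines its own.
class ParseError < StandardError; end

# Toy parser (an assumption for illustration): accepts only words,
# whitespace and double quotes, and raises on anything else.
def parse_query(text)
  raise ParseError, "cannot parse #{text.inspect}" unless text.match?(/\A[\w\s"]+\z/)
  text.downcase.split
end

# Wrap the parser so that a malformed query produces an empty result
# instead of an unhandled exception.
def safe_search(text)
  begin
    terms = parse_query(text)
  rescue ParseError
    return []
  end
  terms
end
```

With this in place a garbage query simply returns nothing, while a well-formed one goes through to the search as before.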

------
rogerbinns
Another alternative is to do the search on the client. This is the approach
taken by the Sphinx documentation tool which generates static pages. It is
popular with (but not limited to) Python. See <http://sphinx-doc.org/>. In the
bottom right of that page you can see a search box. "Exception" is an example
search to try.

The way it works is that at doc build time it generates an index file that
contains various terms, performs stemming, stop word elimination etc. When you
do a search, the client JavaScript downloads the index, performs stemming on
the search terms and does the lookup in the index. This is what the index
looks like for that site in JSONP format:
<http://sphinx-doc.org/searchindex.js>
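
A minimal sketch of the same build-then-lookup idea, written in Ruby to match the article rather than in Sphinx's own Python and JavaScript. The stop word list, the crude suffix-stripping `stem` and the index layout are all assumptions for illustration; Sphinx's real stemmer and index format are different.

```ruby
require "set"

# Tiny illustrative stop word list (an assumption, not Sphinx's list).
STOP_WORDS = Set.new(%w[a an and is of the to]).freeze

# Crude suffix stripping, standing in for a real stemmer.
def stem(word)
  word.length > 4 ? word.sub(/(ing|ed|s)\z/, "") : word
end

# Lowercase, tokenize, drop stop words, then stem what is left.
def terms(text)
  text.downcase.scan(/[a-z0-9]+/).reject { |w| STOP_WORDS.include?(w) }.map { |w| stem(w) }
end

# Build step: map each stemmed term to the pages that contain it.
def build_index(pages)
  index = Hash.new { |h, k| h[k] = [] }
  pages.each do |name, text|
    terms(text).uniq.each { |t| index[t] << name }
  end
  index
end

# Query step: process the query the same way, then intersect the page lists.
def lookup(index, query)
  lists = terms(query).map { |t| index.fetch(t, []) }
  lists.empty? ? [] : lists.reduce(:&)
end
```

The important part is that the query goes through the same stemming as the documents did at build time, so "handled" and "handling" end up looking for the same index entry.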

~~~
bambax
This is excellent; I developed such a solution some time ago, after asking on
StackOverflow if such a thing already existed and being told no.

It's actually quite simple to do from scratch anyway. My solution involved
being able to search non-consecutive terms on a page and return paragraphs
containing those terms. AFAIK, in-browser search cannot do this, which is a
shame.

~~~
rogerbinns
Sphinx actually does show the paragraphs in the browser. Try a search to see.

The source files for Sphinx are plain text in rst format (like markdown but
with more functionality). It generates HTML from that, but (by default) the
original plain text files are also included in the output. The search index
tells it which pages match and the javascript grabs the plain text and shows
paragraph matches.
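
The paragraph-matching step can be sketched in a few lines. This is a guess at the general technique, not Sphinx's code: `matching_paragraphs` is a hypothetical helper, paragraphs are assumed to be separated by blank lines, and no stemming is applied.

```ruby
# Return the paragraphs of a plain-text page that contain every query term,
# in any order and not necessarily next to each other.
def matching_paragraphs(text, query)
  wanted = query.downcase.split
  return [] if wanted.empty?

  text.split(/\n\s*\n/).select do |para|
    words = para.downcase.scan(/[a-z0-9]+/)
    wanted.all? { |term| words.include?(term) }
  end
end
```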

------
middayc
Very nice, but I see scoring as the important feature of a full text search,
which this one doesn't offer. It still makes sense to use it if you have a big
enough number of documents or if the query language suits your use case, and
you implement scoring on top of it, but it would be nice to have in the engine
itself.
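
To illustrate what "scoring on top of it" could mean, here is a rough sketch that ranks documents the engine has already matched by normalized term frequency. `rank` and its input shape (a hash of document name to text) are assumptions for illustration, and a real scorer would more likely use something like TF-IDF or BM25.

```ruby
# Rank already-matched documents by how often the query terms occur,
# divided by document length so long documents do not win automatically.
def rank(matches, query)
  wanted = query.downcase.split
  scored = matches.map do |name, text|
    words = text.downcase.scan(/[a-z0-9]+/)
    score = wanted.sum { |term| words.count(term) }.to_f / [words.size, 1].max
    [name, score]
  end
  # Highest score first; ties are broken by name for a stable order.
  scored.sort_by { |name, score| [-score, name] }
end
```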

<http://fallabs.com/tokyodystopia/> is similar to this btw.

