

Whoosh: a fast pure-Python search engine - bdfh42
http://whoosh.ca/

======
delano
_... a typical single-term search takes 0.003 seconds in Xappy, while Whoosh
takes 0.01 seconds. The Whoosh index was less than 4 MB, while the
Xappy/Xapian index was 30 MB on disk._

I should hope it's fast with an index of 4MB.

That aside, it's more of a lookup engine than a search engine. Search
refers to calculating multi-dimensional relevance in real time. Here's a
quick, two-dimensional example:

A search for "fender" could be configured to have related keywords: "music"
and "gibson", or "automobile" and "ford". The strength of each relation should
also be configurable, as it's used to score the weight of each keyword.
Then expand each keyword: fender, fenders, music, musical, etc. These
expansions are given slightly lower scores than the keywords. Then we consider
which fields in the data set to search for these values. Each field has a score.
We determine which records to include by making a two-dimensional calculation
between the keywords and fields. Now we sort the result set based on the same
or other additional criteria and return.
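
A minimal Python sketch of that two-dimensional calculation. The relation
strengths, the expansion penalty, the field names and the plural-only
expansion are all made-up assumptions for illustration; none of this is
Whoosh's or Xapian's API.

```python
# Hypothetical configuration: every number and name here is illustrative.
RELATED = {
    "fender": {
        "fender": 1.0,      # the query term itself
        "music": 0.6,
        "gibson": 0.5,
        "automobile": 0.4,
        "ford": 0.3,
    }
}
EXPANSION_PENALTY = 0.8                       # expansions score slightly lower
FIELD_WEIGHTS = {"title": 2.0, "body": 1.0}   # each field has its own score


def expand(keyword):
    """Naive expansion: the keyword itself plus a plural form."""
    return {keyword: 1.0, keyword + "s": EXPANSION_PENALTY}


def term_weights(query):
    """Dimension one: related keywords and their expansions, each weighted."""
    weights = {}
    for keyword, strength in RELATED.get(query, {query: 1.0}).items():
        for term, factor in expand(keyword).items():
            weights[term] = max(weights.get(term, 0.0), strength * factor)
    return weights


def score(record, weights):
    """Dimension two: multiply each term weight by the weight of its field."""
    total = 0.0
    for field, field_weight in FIELD_WEIGHTS.items():
        for token in record.get(field, "").lower().split():
            total += weights.get(token, 0.0) * field_weight
    return total


def search(query, records):
    """Keep records with a non-zero score, highest score first."""
    weights = term_weights(query)
    scored = [(score(record, weights), record) for record in records]
    hits = [pair for pair in scored if pair[0] > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return hits
```

Each extra dimension (misspellings, synonyms, user history) would add another
set of weights to multiply in, which is where the cost starts to climb.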

There are _many_ possible dimensions, including misspellings, synonyms,
wildcards, user history, and brands. Basically anything. And the data set gets
much larger. That's why search is so difficult.

I'm not criticizing Whoosh specifically because other open source "search
engine" libraries have the same weaknesses. But it is important to understand
the difference between search and lookup.

~~~
alnayyir
This is why I don't believe in writing libraries in the abstracted language
they're intended for. I'd rather see them coded in C, or better, optimized C,
or better, pure asm written by someone who knows what they're doing.

You don't need Python to be good at searching; you need to use it to leverage
fast libraries.

This is notably a critical failing of other abstracted programming languages.
(I won't name names, I don't want to be piked here Vlad Tepes style.)

~~~
MaysonL
Languages are not implementations.

Algorithm choice matters more for speed than language choice.

It is highly likely that in five years, Python code will run competitively as
fast as C, assuming that either language is still in use.

~~~
alnayyir
"Python code will run competitively as fast as C" <--- it's highly unlikely
that will happen in 50 years, let alone five years.

Have you been keeping track of programming history at all? Lisp isn't as fast
as Fortran or C, and it's been around for 50 years.

You're delusional, I'm sorry.

~~~
MaysonL
Well... we'll have to agree to disagree - and you really should check out the
state of the art in Lisp compilers these days.

PS - I've been programming since before Lisp was a teenager.

------
thomaspaine
Looks very cool...it says it's as fast or faster than PyLucene and Xappy, but
it'd be nice to see some concrete benchmarks.

------
nihilocrat
There's definitely some clever coding afoot when a pure-Python library can
outpace a C wrapper library.

~~~
adamtj
Well-written Python basically _is_ a C wrapper.

~~~
teej
So is Ruby - that doesn't make it fast.

