
Whoosh – Fast, full-text indexing and searching library in Python - albertzeyer
https://bitbucket.org/mchaput/whoosh/wiki/Home
======
caioariede
If you are using it with Django, take a look at Haystack [1]. It supports
Whoosh as well as Solr, Elasticsearch, etc.

For example, you could use Whoosh in your development environment and
Elasticsearch in production if you need something more robust.

[1] [http://haystacksearch.org/](http://haystacksearch.org/)
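
A minimal sketch of the dev/production split described above, as it might appear in a Django settings module. The helper name `haystack_connections`, the index path, the Elasticsearch URL and the index name are all hypothetical; the engine paths follow the Haystack 2.x documentation and may differ in other versions.

```python
import os


def haystack_connections(use_whoosh, base_dir="."):
    """Return a HAYSTACK_CONNECTIONS dict for the chosen backend.

    Hypothetical helper: Haystack itself only needs the resulting dict.
    """
    if use_whoosh:
        # File-based Whoosh index, convenient for local development.
        return {
            "default": {
                "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
                "PATH": os.path.join(base_dir, "whoosh_index"),
            }
        }
    # Assumed Elasticsearch endpoint and index name for production.
    return {
        "default": {
            "ENGINE": "haystack.backends.elasticsearch_backend."
                      "ElasticsearchSearchEngine",
            "URL": "http://127.0.0.1:9200/",
            "INDEX_NAME": "haystack",
        }
    }


# In settings.py this would typically be driven by DEBUG or an env variable.
HAYSTACK_CONNECTIONS = haystack_connections(use_whoosh=True)
```

Because the application code talks to Haystack rather than to the backend, switching engines should only require changing this setting and rebuilding the index.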

~~~
j_s
[https://github.com/toastdriven/django-haystack](https://github.com/toastdriven/django-haystack)

------
mattdeboard
Solr is so easy to get up and running, and so powerful with so many options
for scaling, that I don't see the benefit of "pure Python" here. (To be fair,
despite living and breathing Python most days, I've never thought "pure
Python" is a selling point.)

~~~
randlet
Pure Python is definitely a selling point for those of us who regularly deploy
to multiple platforms. Knowing you can 'pip install pure-python-lib' on any
platform and have it work every single time, without regard to which C
compilers and C libraries you have available, is a major boon.

~~~
zzzeek
Solr and its backend Lucene are written in Java, so they run on any platform,
arguably even more easily than Python.

~~~
danudey
If you're assuming a known-good Python stack, then a pure-python solution will
just work. Using Solr/Lucene means having a known-good Python stack _and_ a
known-good JVM. Python is installed with most (all?) Linux distributions by
default, but Java is almost never part of the base install.

Also, a pure-python solution allows for an in-process index/search, rather
than adding another external process dependency which needs to be monitored
and maintained.

------
piqufoh
Whoosh is great in that it fits the Django philosophy of building sites fast,
but I wouldn't use it for anything harder than a small site search. Once my
site grew larger than 50 MB of text, queries started slowing things down.
Indexing (Python, single-threaded) took a while, and the larger the index, the
slower the queries were returned.

It's probably the best first-iteration search app that I've come across, and
you can always slip in Solr or something with more oomph when you need it.

~~~
WoodenChair
Can you clarify why you would not use it on anything harder than a small site
search? Without context, the comment is not especially helpful.

~~~
mattdeboard
> Once my site grew larger than 50 MB of text, queries started slowing things
> down. Indexing (Python, single-threaded) took a while, and the larger the
> index, the slower the queries were returned.

I think this makes it pretty clear. Not sure how much more explanation you
need.

~~~
WoodenChair
That wasn't in the original comment... yet another example of why HN should
not have an edit feature.

------
davb
Whoosh is fantastic. I'm using it on an ecommerce web project just now for
indexing CMS pages, products, datasheets and product hierarchy (categories,
product families, etc).

FWIW I'm using Flask, Flask-SQLAlchemy, Whoosh and Flask-WhooshAlchemy [0].
It's quick to get going with, and the mix-and-match approach lets me easily
rip pieces out as the project grows.

[0] [http://pythonhosted.org/Flask-WhooshAlchemy/](http://pythonhosted.org/Flask-WhooshAlchemy/)

------
daemonk
Sounds cool. Does it scale well? Has anyone used it for a large-ish amount of
data (gigs)?

~~~
davb
It works well enough for me. I can't say how large my index set is (because I
honestly don't have the figures to hand), but on the current project an
uncached search across around 50,000 products, 10,000 product families and a
whole lot of associated data (product attributes, datasheets, etc.) takes
around 65 ms.

To put it in context, the psycopg2 calls to PostgreSQL take about 100ms to
retrieve the associated data once I've found my search results with Whoosh.

(Most of my response time sits with SQLAlchemy ORM, building up matrices of
data, which is why in production I'm caching the more complex queries with
memcached).

Overall, for a project of this size (I can't imagine having to index more than
a few hundred thousand objects) I'm very happy with Whoosh. If I get to the
point of indexing millions of objects, I'll optimize it then.
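
A rough sketch of the "cache the more complex queries" idea mentioned above, not the setup actually used on that project. `TTLCache` is a hypothetical in-memory stand-in for a memcached client, and `run_search` stands for whatever function performs the Whoosh search plus the database lookups.

```python
import hashlib
import json
import time


class TTLCache:
    """Tiny in-memory stand-in for a memcached client (get/set with expiry)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # expired entry, drop it
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)


def cache_key(query, filters):
    """Derive a stable, short key from the query text and its filters."""
    payload = json.dumps({"q": query, "f": filters}, sort_keys=True)
    return "search:" + hashlib.sha1(payload.encode("utf-8")).hexdigest()


def cached_search(cache, run_search, query, filters=None, ttl=300):
    """Cache-aside lookup: return a cached result or compute and store it."""
    filters = filters or {}
    key = cache_key(query, filters)
    hit = cache.get(key)
    if hit is not None:
        return hit
    result = run_search(query, filters)
    cache.set(key, result, ttl)
    return result
```

With a real memcached client the same pattern applies; only the `get`/`set` calls and the serialization of results would change.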

------
cardamomo
I have found Whoosh to be easy to use, though I haven't put it to use with a
large number of documents, so I don't know how well it holds up.

------
shavenwarthog2
I recommend taking a look at Whoosh -- I worked with it extensively a few
years back, adding a key-value backend for it.

It seems to be designed for indexing manpages. That is, a medium-sized,
semi-static database with a few different dimensions per document.

