I played with IndexTank a bit this past weekend, and it seems like they're really hurting from the lack of a good gem that fits into ActiveRecord. Compared to something like ThinkingSphinx, it feels very un-Rails-ish.
I noticed this project has cropped up to try to solve it, which is awesome:
https://github.com/flaptor/thinkingtank
However, it's not Rails 3 compatible right now.
If I were IndexTank, getting a gem like that up to speed would be my top priority for getting real traction on Heroku or among Rails folks.
When I first saw the contest I was only thinking about indexing all the existing data from some website, but clearly it's better to just start indexing the new data, and get a website up and running. Later I could write another program that fetches and indexes the old data in the background. Thanks for writing this, it gave me some new ideas.
My app uses 25k docs. What I described in the article is very basic: the app you can try at http://plixitank.heroku.com keeps a window of the 25k most recent items from Plixi and erases older stuff. That's enough for several hours' worth of search history.
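For anyone wondering what "keep a window of the most recent items" could look like, here is a rough Python sketch. It is not the code behind plixitank: `InMemoryIndex` is a hypothetical stand-in for whatever search index client you use, `RecentWindow` is a name made up for this example, and it assumes each document id is added only once.

```python
from collections import deque


class InMemoryIndex:
    """Hypothetical stand-in for a hosted search index client."""

    def __init__(self):
        self.docs = {}

    def add_document(self, doc_id, fields):
        self.docs[doc_id] = fields

    def delete_document(self, doc_id):
        self.docs.pop(doc_id, None)


class RecentWindow:
    """Keeps at most max_docs documents indexed, evicting the oldest first."""

    def __init__(self, index, max_docs=25000):
        self.index = index
        self.max_docs = max_docs
        self.order = deque()  # doc ids in arrival order, oldest on the left

    def add(self, doc_id, fields):
        # Assumes doc_id has not been added before.
        self.index.add_document(doc_id, fields)
        self.order.append(doc_id)
        # Erase older documents once the window is full.
        while len(self.order) > self.max_docs:
            self.index.delete_document(self.order.popleft())
```

With `max_docs=3`, adding five items leaves only the last three in the index. The deque only tracks arrival order, so the window survives restarts only if you persist or rebuild that order yourself.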
Trendistic is a special-purpose app with several million documents. However, the contest accounts are limited to 1M documents, so don't worry much about the size. You could try an interesting approach to indexing tweets, and you should be more than fine choosing up to 1M tweets by some criterion, be it recency, popularity of the author, or something else. If you have other specific questions, you can contact us directly at support [at] indextank or through the chat box on our site.
C, I probably still have that source code around. It was very rudimentary: an inverted index with no relevance, only AND queries (intersection of word vectors). I had the index mmap'ed because it was pretty small.
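To give an idea of how little that involves, here is a minimal sketch of the same idea: an inverted index with no relevance ranking and AND-only queries. It is in Python for brevity rather than C, uses in-memory sets instead of an mmap'ed file, and the `InvertedIndex` class and its whitespace tokenizer are assumptions made for illustration, not the original code.

```python
from collections import defaultdict


class InvertedIndex:
    """Maps each word to the set of document ids that contain it."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        # Naive tokenizer: lowercase and split on whitespace.
        for word in text.lower().split():
            self.postings[word].add(doc_id)

    def search(self, query):
        """AND query: return ids of documents containing every query word."""
        words = query.lower().split()
        if not words:
            return set()
        # .get avoids creating empty entries for unknown words.
        sets = [self.postings.get(word, set()) for word in words]
        # Intersect starting from the smallest set to keep the work small.
        sets.sort(key=len)
        result = set(sets[0])
        for other in sets[1:]:
            result &= other
        return result
```

Results come back as an unordered set, which matches the "no relevance" part: a document either contains all the words or it doesn't.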