

Full Text Search in the Cloud - MarcusL
https://www.blurpr.com/blog/index.php/2010/06/13/full-text-search-in-the-cloud-as-a-service/

======
fizx
Hey! I'm one of the websolr guys. Obviously, we're working on this problem,
and it's a hard (but fun and rewarding) one for a number of reasons!

Search is fundamentally hard to put into the cloud, because it requires so
many IO operations. In addition, delivering really high-quality results
requires machine learning and linguistics.

We have a few tricks up our sleeves to handle these issues, and I'm excited
not only to shed the beta-ish feel, but to roll out some truly exciting
features :) There's a "Review my startup" post coming pretty soon.

~~~
kordless
Hey! I'm one of the Loggly guys. Wanna grab a beer and chat about how we're
doing it?

------
jwr
I am a co-founder in a startup (Fablo, <http://fablo.pl/>) that does exactly
that. In order not to get wiped out by Google the day they decide to give that
functionality away for free, we decided to specialize in e-commerce search,
particularly in inflected languages, which is a much harder task than
searching English.

------
nl
I have Solr mostly working on AppEngine.

Obviously (for those of you who know the Solr codebase), there are some pretty
extreme hacks to get around the lack of file system access, but nothing that
couldn't be cleaned up.

I was a little surprised about the lack of interest in it when I emailed the
solr-dev list.

~~~
MarcusL
The Compass GAE walkthrough
(<http://www.kimchy.org/searchable-google-appengine-with-compass/>) was also
able to get a rudimentary Lucene indexer up and running. However, many people
eventually ran into problems with App Engine's 30-second request processing
limit. For your Solr instance, did you use the Task Queue or have to do
anything special to work around the 30-second limitation?
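
To make the question concrete, here is a rough sketch of the kind of workaround I have in mind: index until a time budget is nearly spent, then hand the unprocessed documents to a follow-up request. This is plain Python, not the App Engine Task Queue API, and `index_in_batches`, `run_until_done`, `index_doc` and the 25-second budget are all hypothetical names and values chosen for illustration.

```python
import time
from collections import deque


def index_in_batches(docs, index_doc, budget_seconds=25.0, clock=time.monotonic):
    """Index documents until the time budget is spent.

    Returns the documents that were not processed, so the caller can
    schedule them in a follow-up request. At least one document is
    indexed per call, so repeated calls always make progress.
    """
    pending = deque(docs)
    deadline = clock() + budget_seconds
    while pending:
        index_doc(pending.popleft())
        if clock() >= deadline:
            break
    return list(pending)


def run_until_done(docs, index_doc, budget_seconds=25.0, clock=time.monotonic):
    """Stand-in for a task queue: each loop iteration plays the role of
    one queued request. Returns how many requests were needed."""
    requests = 0
    remaining = list(docs)
    while remaining:
        remaining = index_in_batches(remaining, index_doc, budget_seconds, clock)
        requests += 1
    return requests
```

On a real platform the loop in `run_until_done` would be replaced by enqueueing a new task carrying the remaining document IDs, and the budget would need to leave headroom below the hard request limit.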

~~~
nl
I'm not doing anything with Task Queue at the moment. I'm using the Lucene
implementation from Compass (and I've used it elsewhere too), so I am familiar
with it.

From memory, I think Compass had a unique problem with the 30 second limit
because it would try and re-sync the non-Lucene data with the Lucene indexes
(I can't remember what the trigger was for this).

I had quite a lot of issues with Compass-GAE - my impression was that it
wasn't really production ready. However, I did notice that Google is using it
for their ThoughtSite example app, so maybe it has improved.

------
csmeder
I feel dumb, but I don't totally understand what is being wanted. Basically a
google search with "site:example.com query" but you have full control of the
results?

~~~
MarcusL
While Google/Bing/etc. are great for searching public sites (when you can
tolerate the latency between when content is posted and when Google indexes
it), they don't work at all for sites which are
private/intranetted/password-protected or otherwise inaccessible to web
crawlers.

~~~
csmeder
Makes sense, thank you.

------
dacort
<http://www.websolr.com> ?

