

Ask HN: Distributed lucene implementation - ameyamk

Any good pointers or analysis on how to implement distributed search using Lucene? Information on how Lucene works in general, and on its scoring algorithm in particular, would also be useful. Any advice on good programming/algorithm blogs would be awesome as well.
======
riffraff
I don't understand your problem, but you may be interested in investigating
the existing solutions:

\- cluster of solr servers (but it basically means different indexes composed
at query time, so scoring will be affected depending on how you shard, at
least last time I checked)

\- katta, a platform for serving large lucene indexes

\- lucandra, a reimplementation of the Index(Writer|Reader) classes of lucene
over cassandra (and of solr)

\- lucene-on-cassandra, implementing only the Directory interface over
Cassandra

\- lucene over distributed memory (terracotta) or grids (gigaspaces,
elasticsearch)

but once again, I have no idea of what your problem is so I don't know what
you may want.
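To make the scoring caveat about sharded Solr concrete, here is a minimal sketch. It assumes Lucene's classic TF-IDF idf formula, `1 + ln(numDocs / (docFreq + 1))`, as used by `DefaultSimilarity` (later renamed `ClassicSimilarity`; newer Lucene versions default to BM25, which also depends on per-index statistics). The shard names and document counts are hypothetical. When a term is unevenly spread across shards, each shard computes a different idf than one combined index would:

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    # Lucene's classic TF-IDF idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical split of one corpus over two shards:
# shard name -> (number of docs, number of docs containing the term)
shards = {"shard_a": (1000, 10), "shard_b": (1000, 300)}

# idf as each shard sees it, using only its local statistics
local = {name: classic_idf(df, n) for name, (n, df) in shards.items()}

# idf a single, unsharded index over the same documents would compute
total_docs = sum(n for n, _ in shards.values())
total_df = sum(df for _, df in shards.values())
global_idf = classic_idf(total_df, total_docs)

print(local, global_idf)
```

A match on `shard_a` is boosted well above what the global statistics justify and a match on `shard_b` is penalized, so merging raw per-shard scores ranks documents partly by which shard they happen to live on.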

Basically I'd say the problem boils down to either implementing the index in a
distributed way, if you want scoring consistent with normal lucene, or just
sharding/querying different indexes and reconciling the results, if you can
live without distributed tf/idf.
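There is also a middle ground between those two options: shard the index, but run an extra round trip that gathers term statistics from every shard before scoring. The sketch below shows that two-phase scatter/gather idea with in-memory lists standing in for shards. It is an illustration under stated assumptions, not how Solr or katta actually implement it: the shard layout, the function names, and the simplified score (sum of `sqrt(tf) * idf`, not Lucene's full formula with norms and boosts) are all hypothetical. The idea is similar to what Elasticsearch calls `dfs_query_then_fetch`.

```python
import heapq
import math
from collections import Counter


def idf(doc_freq, num_docs):
    # Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def shard_stats(shard, terms):
    """Phase 1: a shard reports its doc count and per-term document frequency."""
    df = Counter()
    for _, tokens in shard:
        present = set(tokens)
        for t in terms:
            if t in present:
                df[t] += 1
    return len(shard), df


def shard_top_k(shard, terms, idfs, k):
    """Phase 2: score local docs using the *global* idf values, return local top-k."""
    scored = []
    for doc_id, tokens in shard:
        tf = Counter(tokens)
        # Simplified TF-IDF (hypothetical): sum of sqrt(tf) * idf over matching terms
        score = sum(math.sqrt(tf[t]) * idfs[t] for t in terms if tf[t])
        if score > 0:
            scored.append((score, doc_id))
    return heapq.nlargest(k, scored)


def distributed_search(shards, query, k=10):
    terms = query.lower().split()
    # Phase 1: gather statistics from every shard and sum them
    total_docs, total_df = 0, Counter()
    for shard in shards:
        n, df = shard_stats(shard, terms)
        total_docs += n
        total_df.update(df)  # Counter.update adds counts
    idfs = {t: idf(total_df[t], total_docs) for t in terms}
    # Phase 2: scatter the query, then merge every shard's top-k into a global top-k
    partial = [hit for shard in shards for hit in shard_top_k(shard, terms, idfs, k)]
    return heapq.nlargest(k, partial)


# Hypothetical shards: each is a list of (doc_id, tokens)
shards = [
    [("a1", "distributed lucene search".split()), ("a2", "lucene scoring".split())],
    [("b1", "cassandra storage".split()), ("b2", "lucene lucene index".split())],
]
result = distributed_search(shards, "lucene scoring", k=3)
print(result)
```

Because every shard scores with the same global statistics, a document's score no longer depends on where it is stored, so the union of each shard's local top-k is guaranteed to contain the global top-k. The cost is an extra network round trip per query; skipping phase 1 gives the cheaper "reconcile the results" approach with the idf skew described above.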

~~~
ameyamk
No specific problem actually; I was just trying to learn how someone who wants
to build a large-scale search system using lucene should go about it. All
the options above look interesting, but I am also interested in what sort of
thinking/analysis should be done before you make a decision. In a nutshell, I
was looking for case studies/blogs about implementing lucene in a distributed
fashion. But thanks for your pointers.

