

Ask HN: It's 2011, Sphinx or Solr? - velly

Our existing platform at rgbdaily.com is solid LAMP. As we approach our first round of PR, we noticed that the MySQL fulltext search we had implemented on RDS was simply not cutting it with increased volumes. Sphinx looks like a winner, but Solr seems to have won a lot of the hearts and minds of the SaaS crowd. Any advice from people who have had to make the choice? We really don't have time to evaluate both. Sphinx looks like a matter of hours to implement, and we have already started.
======
thefreshteapot
Having now been exposed to both...

If you ever want to use it as a caching system ... Solr is the way to go.

I personally think very highly of Sphinx, but be aware that Sphinx returns
document ids, not the "content" itself.
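
To illustrate what "ids, not content" means in practice, here is a minimal
sketch of the usual second step: take the ids the search returned and fetch
the rows from the database, keeping the search ranking. Everything here is an
assumption for illustration: the `articles` table, its columns, and the
`sphinx_ids` list are hypothetical, and an in-memory SQLite database stands in
for MySQL so the snippet runs on its own.

```python
import sqlite3


def fetch_rows_in_rank_order(conn, ids):
    """Fetch full rows for ids returned by a search engine, keeping rank order."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title, body FROM articles WHERE id IN ({placeholders})",
        list(ids),
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # The database returns rows in no particular order, so re-sort them by
    # search rank. Ids missing from the table are skipped, since the search
    # index can lag behind deletes.
    return [by_id[i] for i in ids if i in by_id]


# Demo data: in-memory SQLite standing in for the MySQL content table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?, ?)",
    [(1, "Red", "..."), (2, "Green", "..."), (3, "Blue", "...")],
)

# Hypothetical ids as a search might return them, best match first.
# Id 99 simulates a stale index entry with no matching row.
sphinx_ids = [3, 1, 99]
results = fetch_rows_in_rank_order(conn, sphinx_ids)
print([row[1] for row in results])  # prints ['Blue', 'Red']
```

The extra round trip is the cost of the ids-only design; the upside is that
the content lives in one place only.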

When HandlerSocket was announced [1], the gem for me, other than the obvious,
was this nugget of information:

"No duplicate cache: When you use memcached to cache MySQL/InnoDB records,
records are cached in both memcached and InnoDB buffer pool. They are
duplicate so less efficient (Memory is still expensive!). Since HandlerSocket
plugin accesses to InnoDB storage engine, records are cached inside InnoDB
buffer pool, which can be reused by other SQL statements."

Knowing that with the latest version of Sphinx you can use its query format
(SphinxQL) from code or from the mysql console... it has me more sold on
Sphinx than Solr.

We use Solr where I work; it is nice to use, and upcoming features will make
it better...

The most annoying aspect of Solr for me is that you can't build an index
elsewhere and easily merge it...

Either is miles better than MySQL fulltext search.

Perhaps list a few of your needs and requirements; then we might be able to
see if one stands out ahead of the other.

[1] http://yoshinorimatsunobu.blogspot.com/2010/10/using-mysql-as-nosql-story-for.html

~~~
velly
Thanks fresh,

I'd love to get rid of MyISAM. During the course of alpha testing it was
corrupted once, and that never happened with our InnoDB tables. I'm also
looking at Tokutek as the long-term solution for all tables. The jump from
InnoDB to Tokutek is straightforward if fulltext is taken out of the mix. I
really wish Tokutek biz dev would talk to Amazon about offering the storage
engine in RDS. Anyhow, can you speak to the maintenance profiles of the two?
It seems like Sphinx is pretty hands-off. Also, having many years of Java
experience, I feel like Sphinx might be a better bet if I start off running
in a memory-constrained environment like an EC2 small or medium.

