

How I Wrote a Search Engine in 6 Weeks - Readmore
http://www.embought.com/blog/show/5?t=How-I-Wrote-a-Search-Engine-in-6-Weeks

======
okeumeni
I'd suggest that you read the article again and revise your judgement. Here is
why.

I founded a search engine (web, images, video and business search), now in
Alpha, with limited resources as well; trust me, I know how hard it is.

Simple questions: how good is your crawler? (You shouldn't implement a scheme
for each web site, though I understand that for your business it kinda makes
sense.)

How much information do you have in your repository? (You should be thinking
of data in TB; if not, you are still far from any trouble.) Ferret is a great
indexing tool, but how much data can it index? How scalable is it? Looking at
the tech behind Ferret, how many resources does it use? How good is your
relevancy model? (This question is tightly linked to your indexing.)

I loved your idea. Just one thing: work on the relevancy again. I searched for
'ruby on rails' and got results related only to 'ruby' first, with the
relevant ones after. Also, I'd suggest you cache images to improve the user
experience. Please don't take my review personally.
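
To illustrate the relevancy point, here is a minimal sketch in Python. It is
not the site's actual ranking and not Ferret's API; `score`, `rank`, and
`PHRASE_BONUS` are hypothetical names, and the bonus value is an arbitrary
assumption. It only shows one common idea: give a result extra weight when
the query terms appear together as a phrase, so 'ruby on rails' titles sort
ahead of titles that merely contain 'ruby'.

```python
# Hypothetical weight for an exact phrase match; the value is an assumption.
PHRASE_BONUS = 1.0


def score(query: str, title: str) -> float:
    """Term overlap plus a bonus when the query appears as an exact phrase."""
    q_terms = query.lower().split()
    t_terms = title.lower().split()
    if not q_terms:
        return 0.0
    # Fraction of query terms that occur anywhere in the title.
    overlap = sum(1 for term in q_terms if term in t_terms) / len(q_terms)
    # True if the query terms appear contiguously and in order in the title.
    n = len(q_terms)
    has_phrase = any(
        t_terms[i:i + n] == q_terms for i in range(len(t_terms) - n + 1)
    )
    return overlap + (PHRASE_BONUS if has_phrase else 0.0)


def rank(query: str, titles: list[str]) -> list[str]:
    """Return titles sorted from highest to lowest score."""
    return sorted(titles, key=lambda title: score(query, title), reverse=True)


titles = [
    "Programming Ruby",
    "Agile Web Development with Ruby on Rails",
    "Ruby Cookbook",
]
ranked = rank("ruby on rails", titles)
```

With these sample titles, the 'Ruby on Rails' book sorts first because it
matches every query term and also earns the phrase bonus. A real engine would
combine this with its index's own scoring rather than replace it.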

~~~
Readmore
That's good advice. I'll admit the scaling aspects of Ferret, and Rails at
this point, scare me quite a bit. My goal right now is to continue to expand
my index and learn what's working and what's not.

As was mentioned in another comment, I'm looking into other, longer-term
indexing solutions (possibly SOLR), so hopefully that will help in the
relevancy area. I'm also working on an in-house ranking algo to better sort
results.

I appreciate your feedback.

------
tyohn
6 weeks? Why so long? (Just kidding.) I am building a search engine as my
current project. It took all of a weekend to get it up and running. I didn't
do it alone; my friend helped, so I guess maybe that doesn't count ;) We've
been crawling for a little under 3 weeks, and I keep making interface and
search-results tweaks, but otherwise it works "ok". I am in the process of
switching it over to S3 (maybe EC2); after that change I think I'll open it up
to the public.

~~~
Readmore
Sounds cool. Let me know when you launch; I'd like to take a look.

------
kradic
Why do they use a cartoon of Ann Coulter as their logo?

~~~
Readmore
Haha. I hadn't ever looked at her like that.... now I may have to change it.

------
gojomo
Thanks for sharing your experience.

For doing product search at the few-hundred-thousand-item scale, I would
suggest SOLR rather than Nutch from the Lucene family.

You'd need to do your own crawling/scraping, but the indexing is solid,
simple, and flexible. (SOLR's pedigree is from CNET's own product search.)

~~~
Readmore
Thanks for the info. I have looked at Solr and it looks great. I'll give it a
try and write up my thoughts. [Edited to correct iPhone typing mistakes]

------
fizx

      a. You wrote a crawler, not a search engine.  
      b. Ferret will bite you in the ass.  
      c. For a really good off-the-shelf crawler, look at Heritrix.

~~~
Readmore
Since you can go to www.embought.com and search for products, I would have to
say you're wrong. If I had only written a crawler, I would have a nice
collection of webpages on my hard drive and nothing more. Why are you so down
on Ferret? What problems have you had with it? Just making an off-the-cuff
statement without facts to back it up doesn't make you look very credible.

~~~
petercooper
Just on the Ferret side of things, it does have a pretty bad reputation in
some circles. Google for 'ferret "corrupted index"'; there are 59 results for
that limited query alone.

I don't use Ferret myself (I tried it once; it seemed pretty good) but I'm
very well read in the Ruby community (I run Ruby Inside and RubyFlow) and I've
seen more than enough people saying bad things about Ferret, how it corrupted
their indexes, concurrency issues, and what not, to personally avoid it. Solr
and Sphinx seem to get a far better rap.

..

I should add that I had a play with your site after reading your article, and
it's pretty good. There's a lot of trash out there in this field and you've
pulled together a good site. Kudos.

~~~
Readmore
Thanks! I really like Ruby Inside as well; you have a lot of great info there.

