
Low Hanging Fruit for Most Online Stores: Better Search Features - danielnicollet
http://blog.exorbyte.com/2010/08/low-hanging-fruit-for-most-online-stores-better-search-features/
======
jseliger
This is low-hanging fruit not just for online stores, but for many kinds of
sites, especially those that don't specialize in search. For example, I often
have to use the site Grants.gov, which you'll recognize if you've ever had to
submit or deal with research grants. And the search engine is _terrible_ and
frequently doesn't work at all.

The last time this happened, it inspired me to write a blog post about how to
use Google for site-specific search:
[http://blog.seliger.com/2010/10/24/google-faster-than-grants...](http://blog.seliger.com/2010/10/24/google-faster-than-grants-gov-finding-the-capital-fund-education-and-training-community-facilities-program-and-the-fy-2011-recovery-implementation-fund). Most of our readers aren't
technical and probably don't know how to do this, but it can make search much,
much better.
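Site-specific search with Google usually relies on the `site:` operator: `site:grants.gov` plus your keywords limits results to that one domain. Below is a minimal Python sketch that builds such a query URL. The helper name `site_search_url` and the example keywords are illustrative only. It also assumes Google's ordinary `https://www.google.com/search?q=...` URL form, and it never contacts Google itself.

```python
from urllib.parse import urlencode


def site_search_url(domain: str, terms: str) -> str:
    """Build a Google search URL restricted to one domain via the site: operator.

    site_search_url is a hypothetical helper for illustration; it only
    assembles the URL string.
    """
    query = f"site:{domain} {terms}"
    # urlencode percent-escapes ':' as %3A and turns spaces into '+'
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    print(site_search_url("grants.gov", "capital fund education and training"))
```

Pasting the printed URL into a browser gives the same results as typing `site:grants.gov capital fund education and training` into Google's search box.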

But that's for Grants.gov, and Grants.gov dispenses money, which highly
motivates its users. If you're trying to get users to pay you, on the other
hand, you'll need to pay more attention to their experience.

~~~
colinsidoti
I had the same first instinct. Search is almost always an afterthought unless
it's the primary focus. If anyone wants a startup idea: forum search is
something people constantly suggest using, but it generally provides awful
results. Furthermore, a lot of forum owners restrict search
usage for guests because it eats memory. If you make a replacement for
vBulletin, phpBB, and Invision I wouldn't be surprised if you could sell it
for $50+ to more active communities.

~~~
byoung2
_If you make a replacement for vBulletin, phpBB, and Invision I wouldn't be
surprised if you could sell it for $50+ to more active communities._

The problem with search for forums is that there is no getting around the fact
that you have to either search through the text of the posts (standard search)
or have a massive index that gets out of date quickly (Sphinx). You can't get
more creative than that and still keep your software compatible with most
hosting (including the $5/mo shared hosting most people have). When I
worked at Internet Brands (makers of vBulletin), we used Sphinx on all of our
forums, which worked, but was far from perfect. It's just the nature of the
beast for forums: the interesting stuff is in the text fields (hard to
index), compared to an online store where you can sort by category, tags,
price, title, etc.
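The "standard search" described above amounts to a substring scan over every post body. Here is a minimal sketch using Python's built-in `sqlite3` with an in-memory database. The `posts` table and its columns are hypothetical, and real forum software typically runs on MySQL, but the shape of the query is the same. A `LIKE` pattern with a leading `%` wildcard can't use an ordinary B-tree index, so the database has to examine every row.

```python
import sqlite3


def naive_search(conn: sqlite3.Connection, phrase: str) -> list:
    """Return ids of posts whose body contains phrase, via a full table scan."""
    # The leading '%' means no ordinary index helps: every body is checked.
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    # Note: '%' or '_' inside phrase would act as wildcards; not escaped here.
    rows = conn.execute(
        "SELECT id FROM posts WHERE body LIKE ? ORDER BY id",
        (f"%{phrase}%",),
    )
    return [row[0] for row in rows]


# Hypothetical sample data standing in for a forum's posts table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, body TEXT)")
conn.executemany(
    "INSERT INTO posts (id, body) VALUES (?, ?)",
    [
        (1, "You just proved my point about shared hosting."),
        (2, "Sphinx worked, but was far from perfect."),
        (3, "That proved my point, again."),
    ],
)
print(naive_search(conn, "proved my point"))  # [1, 3]
```

This works on any host, which is the appeal. The catch is that the cost grows linearly with the number of posts.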

~~~
shafqat
Why is the text inside the document hard to index? Isn't that what Lucene/Solr
is for? Not sure what you mean...

~~~
byoung2
_Why is the text inside the document hard to index? Isn't that what
Lucene/Solr is for? Not sure what you mean..._

You just proved my point...if I have to install a standalone Java full-text
engine running on Tomcat, I've already gone beyond what most forum owners can
do on shared hosting.

Imagine searching through all posts on HN for the phrase "proved my point"
anywhere in the post. There is no way other than iterating through all of the
millions of posts.

For commonly searched phrases such as "P vs NP", "scalability", or "rate my
startup", you can create an index for faster searching, but it is impossible to
anticipate every possible search. In particular, long-tail searches, which
can be the most useful, will be the least likely to be indexed.

~~~
nkurz
_Imagine searching through all posts on HN for the phrase "proved my point"
anywhere in the post. There is no way other than iterating through all of the
millions of posts._

This is not true. In addition to the document ID, an inverted index can
store the position of each term occurrence, which lets it very quickly find
any phrase of your choosing. While this may not always be the default
behaviour, all modern general-purpose search engines support it.
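The idea can be made concrete with a toy positional index. Each posting records not just which post a term occurs in but at which positions, and a phrase matches when its terms appear at consecutive positions. Below is a minimal in-memory Python sketch. It assumes a naive lowercase, alphanumeric-only tokenizer, and the `PositionalIndex` class is hypothetical. Production engines add compression and on-disk storage, none of which is modeled here.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list:
    """Lowercase and split on runs of non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PositionalIndex:
    """Inverted index mapping term -> {doc_id: [positions in that doc]}."""

    def __init__(self) -> None:
        self.postings = defaultdict(dict)

    def add(self, doc_id: int, text: str) -> None:
        for pos, term in enumerate(tokenize(text)):
            self.postings[term].setdefault(doc_id, []).append(pos)

    def phrase_search(self, phrase: str) -> list:
        terms = tokenize(phrase)
        if not terms:
            return []
        # Only documents containing every term can contain the phrase.
        # .get avoids inserting empty entries into the defaultdict.
        candidates = set.intersection(
            *(set(self.postings.get(t, {})) for t in terms)
        )
        hits = []
        for doc_id in sorted(candidates):
            # Keep start positions where term i occurs at start + i.
            starts = set(self.postings[terms[0]][doc_id])
            for offset, term in enumerate(terms[1:], start=1):
                positions = set(self.postings[term][doc_id])
                starts = {s for s in starts if s + offset in positions}
                if not starts:
                    break
            if starts:
                hits.append(doc_id)
        return hits


index = PositionalIndex()
index.add(1, "You just proved my point about shared hosting.")
index.add(2, "My point is that you proved nothing.")
index.add(3, "That proved my point, again.")
print(index.phrase_search("proved my point"))  # [1, 3]
```

Post 2 contains all three words but not in sequence, so the position check rejects it. The query first intersects the candidate sets, so positions are only compared for posts that contain every word. That step keeps phrase queries from degenerating into a scan of every post.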

~~~
danielnicollet
I concur with nkurz here. Inverted indexes are the basis for most full-text
search engines, and they can handle this quite well.

Another issue with searching a large forum like HN, though, can be keeping the
index up to date: new posts have to be crawled constantly so that the terms
they contain get added to the inverted index. However, you can usually write
a simple script that adds new post URLs to a list of priority crawl targets.
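The "priority crawl targets" idea can be sketched as a small priority queue where URLs of freshly posted threads are crawled before routine recrawls of older pages. Everything here is hypothetical: the `CrawlQueue` class, the two priority levels, and the `forum.example.com` URLs. How new URLs actually get pushed depends on the forum software. One option is a hook that fires on each new post. Another is polling a "latest posts" page.

```python
import heapq
import itertools
from typing import Optional

# Hypothetical priority levels: a lower number is crawled sooner.
NEW_POST = 0
RECRAWL = 1


class CrawlQueue:
    """Priority queue of URLs in which new posts jump ahead of recrawls."""

    def __init__(self) -> None:
        self._heap = []
        # Monotonic counter breaks ties so equal priorities stay FIFO.
        self._counter = itertools.count()

    def push(self, url: str, priority: int) -> None:
        heapq.heappush(self._heap, (priority, next(self._counter), url))

    def pop(self) -> Optional[str]:
        """Return the next URL to crawl, or None if the queue is empty."""
        if not self._heap:
            return None
        _, _, url = heapq.heappop(self._heap)
        return url


queue = CrawlQueue()
queue.push("https://forum.example.com/thread/10", RECRAWL)
queue.push("https://forum.example.com/thread/11", RECRAWL)
# e.g. pushed by a hypothetical "new post" hook in the forum software
queue.push("https://forum.example.com/thread/99", NEW_POST)
order = [queue.pop() for _ in range(3)]
print(order)
```

Thread 99 comes out first even though it was queued last. The two recrawls follow in the order they were added.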

The free HTDig (<http://www.htdig.org/>), for instance, could do the task
pretty well and run on the budget hosting plan described above. It can easily
index hundreds of thousands of documents; large sites may require more disk
space, but not much more.

There are many other inverted-index search engines available, and some can
now index content directly from XML feeds published by your CMS or forum
software, without any crawling.
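Indexing from a feed instead of crawling can be as simple as reading each item's link and text straight from the XML. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the CMS or forum publishes a plain RSS 2.0 feed. The sample feed, its URLs, and the `documents_from_rss` function are hypothetical. Each returned record would then be handed to whatever indexer you use.

```python
import xml.etree.ElementTree as ET

# A hypothetical RSS 2.0 feed of the kind a CMS or forum might publish.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Forum: Latest Posts</title>
    <item>
      <title>Search for forums</title>
      <link>https://forum.example.com/thread/99</link>
      <description>Has anyone tried Sphinx on shared hosting?</description>
    </item>
    <item>
      <title>Rate my startup</title>
      <link>https://forum.example.com/thread/100</link>
      <description>Feedback welcome.</description>
    </item>
  </channel>
</rss>"""


def documents_from_rss(feed_xml: str) -> list:
    """Extract {url, text} records from an RSS 2.0 feed; no crawling needed."""
    root = ET.fromstring(feed_xml)
    docs = []
    for item in root.iter("item"):
        # findtext returns the default when a child element is missing.
        url = item.findtext("link", default="").strip()
        text = " ".join(
            item.findtext(tag, default="").strip()
            for tag in ("title", "description")
        ).strip()
        docs.append({"url": url, "text": text})
    return docs


for doc in documents_from_rss(SAMPLE_FEED):
    print(doc["url"], "->", doc["text"])
```

Because the feed already lists what changed, the indexer only touches new items rather than revisiting every page on the site.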

~~~
byoung2
See my original post... I acknowledge that an inverted index can do it, but
when you have to make it compatible with a wide range of hosting and technical
ability, as in the case of vBulletin, you are more limited. The typical forum
owner will not install a standalone search engine, especially if his $5
hosting doesn't support it.

~~~
colinsidoti
Yes, but perhaps larger forums with dedicated servers can handle it? If so,
selling a search replacement for forum software could really be a viable
business.

------
danielnicollet
Thanks for your input, and sorry the site was down for part of the evening. I
have no access to the web server right now, but it will be back up tomorrow.
Dan

