

Is there a need for an alternative e-commerce/web search and indexing service? - zachfier

I'm one of the engineers here at Lightcrest, and I have a question for the Y Combinator community. We feel that the current search options for developers, publishers, and store-owners are limited and lacking features. Google eCommerce search is expensive ($25k+/yr), and Google web search is limited in scope, dumps ads, and doesn't provide nice features like typeahead. Google also doesn't index aggressively: even their new indexing service can take up to 24 hours to index new content. Other services like Websolr and IndexTank have some nice options, but they don't let you go from 0-60 very quickly: you need to code up solutions using client libraries before you can get started, and there are no advanced crawling capabilities. E-commerce search solutions should let you rank and promote items on the fly, and you should have access to that kind of functionality for less than $25k/yr.

Would you use an initially free, eventually pay-as-you-go service that is substantially cheaper than Google's to address the aforementioned needs?
======
nzadrozny
Hey there, Nick from Websolr here.

Of course I'm a bit biased here, but where search is concerned, Solr is a
really great option for you. Certainly I don't need to rehash all its features
and benefits here, when you could just as easily figure that out with a bit of
light research :)

It's a very fair point that getting started with Solr has a bit of a learning
curve to it. With Websolr we (fizx and I) think we at least have taken the
hosting and configuration side of that off the table. But there are still
client libraries to learn, and schemas to configure. I think learning some of
those basics is a great investment that will pay off, but the up front cost is
certainly there.

A word about crawling: it's definitely doable with Solr. The advanced approach
is to use Nutch[1] to fetch and feed content to Solr. The simpler way is to
use wget. In fact, I've got a basic shell script[2] to show off the latter :)
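
To make the wget approach concrete, here is a minimal sketch of that flow: mirror a site, then hand each fetched page to Solr's extracting request handler. The SOLR_URL default and the decision to print the curl commands (rather than execute them) are my assumptions for illustration, not taken from the gist.

```shell
#!/bin/sh
# Sketch: crawl with wget, then feed fetched pages to Solr.
# SOLR_URL is an assumed local default, not from the gist.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr/update}"

crawl_site() {
  # --mirror follows links recursively; --no-parent keeps the crawl
  # inside the starting path; --adjust-extension names pages *.html
  wget --mirror --no-parent --adjust-extension --quiet "$1"
}

index_dir() {
  # Print (rather than execute) the curl command for each fetched
  # page, so the sketch is inspectable without a live Solr instance.
  find "$1" -name '*.html' | while read -r f; do
    printf 'curl "%s/extract?literal.id=%s&commit=true" -F "file=@%s"\n' \
      "$SOLR_URL" "$f" "$f"
  done
}
```

Replacing the printf with a real curl invocation turns this into the "simpler way" described above.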

I'm always game to talk about how Solr can see more general usage,
particularly when it comes to lowering the cost of getting started. So feel
free to keep the questions/feedback/brainstorming coming.

[1] <http://nutch.apache.org/>

[2] <https://gist.github.com/774946>

------
mahmoudimus
I used to work at a comparison search engine that basically didn't have the
resources to devote to search. We kept pushing for more resources to be
allocated to it, but we just didn't have the bandwidth and/or capable
engineers who were really experts at information retrieval.

Another engineer and I attempted to get search working right, but the
difficulty of search is overwhelming; it is just not an easy engineering
feat. Advanced search features that are integral to the user experience, such
as understanding user intent, query disambiguation, search term boosting,
document metadata, typeahead, etc., were very difficult to get right, hard to
really test, and required lots of knowledge about information retrieval --
which is a domain that not many engineers are familiar with.

I think there is always a need for a service that provides an easy-to-use API
and lets companies easily and effortlessly address some of the concerns
above.

------
mgkimsal
Not sure. Part of the benefit of using client libraries is that I can make
sure private data stays private. 'Crawlable' content would force me to format
it in a way that makes sense to the crawler, and to deal with making sure
it's not otherwise accessible.

Do you have a way around that? Or am I missing something?

~~~
zachfier
We were actually considering both an initial crawling mechanism so you can
"just go", and a very simple REST API for adding, updating, and deleting
existing documents in the index. So you would be able to have both a 0-60
approach for getting your already-public-facing content into a sub-second
search index, and also the ability to granularly submit document batches.
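
As a sketch of what that REST API might look like (the host, paths, and document fields below are purely illustrative inventions, since no such service exists yet):

```shell
#!/bin/sh
# Hypothetical add/update/delete calls against the proposed API;
# endpoint and fields are invented for illustration only.
API="https://api.example-search.com/v1"
DOC='{"id":"sku-42","title":"Blue Widget","price":9.99}'

# Add or update a document (echoed so the sketch runs without a server)
echo curl -X POST "$API/documents" -H "Content-Type: application/json" -d "$DOC"

# Remove it from the index again
echo curl -X DELETE "$API/documents/sku-42"
```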

As far as security is concerned, one assumption is that if your content is
already crawlable, it's not private. That said, it would be trivial to
implement a mechanism for keeping private data private, and even for keeping
public data from being indexed (e.g., with a meta tag).
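
The meta-tag mechanism is already a web convention; a crawler that honors robots directives would skip pages marked like this:

```html
<!-- Standard robots directive: ask crawlers not to index this page -->
<meta name="robots" content="noindex">
```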

