That's because cuil was building a search engine from scratch rather than relying on bing's/yahoo's index to do the underlying scoring. Maybe they shouldn't have started from scratch, but they were certainly tackling a much harder problem (though maybe not the right one).
I've heard through the grapevine that they were able to index and serve 100 billion documents on 100 machines, which is a pretty impressive technical accomplishment if true. I'm surprised they weren't acquired for that. It's unfortunate that their search quality wasn't up to snuff yet.
How many queries per second can they handle on those nodes, and with what latency? What kind of relevancy calculations were they able to do at query-time in their system with 1B documents per node? Were they able to support query-time aggregation of structured fields in their documents? Was the index stale or did they support continuous feeding and indexing of new documents? If the latter, how well did they meet their SLA QPS and latency when indexing new documents?
I can set up a single search node and fill it up with God knows how many documents any day, but the difference between supporting 10 QPS at ~500ms latency and 3000 QPS with the 99th percentile below 40ms is really more interesting than exactly how many documents I have per node.
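To make the "99th percentile below 40ms" claim concrete, here is a minimal sketch of how such an SLA figure is typically computed from measured latencies, using the nearest-rank method. The sample values and names are invented for illustration; real load tests would use far more samples.

```python
def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample value that is
    greater than or equal to p percent of all samples."""
    ordered = sorted(samples)
    # nearest-rank index: ceil(p/100 * n) - 1, via floor-division trick
    k = max(0, -(-p * len(ordered) // 100) - 1)
    return ordered[k]

# Hypothetical per-query latencies in milliseconds from a load test.
latencies_ms = [12, 15, 18, 22, 25, 30, 35, 38, 41, 450]
print(percentile(latencies_ms, 99))  # → 450
```

Note how a single slow outlier dominates the p99 figure even though the median is 25ms; that is exactly why tail latency under load, not document count, is the interesting number.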
The harder problem IMHO is customer acquisition. Many companies have built (and still have) link graphs of reasonable quality, but they all struggle to gain real customers.
I started out before those APIs existed, and did all my own crawling & indexing. When they came out, I decided to focus on my value-adds because I thought that was a quicker path to customer acquisition.
Furthermore, I don't use Yahoo/Bing straight up, e.g. I re-rank, omit, etc. I also mix them with my own index/negative-spam index from my own crawling efforts.
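That blending step might look something like the following sketch. The function names, the plain-URL result format, and the host-based spam filter are all hypothetical simplifications, not DDG's actual pipeline.

```python
from urllib.parse import urlparse

def blend(backend_results, own_results, spam_hosts):
    """Hypothetical sketch: filter backend (Yahoo/Bing) results against
    a negative spam index of hostnames, then merge in hits from one's
    own crawl that the backend missed."""
    # Drop any backend result whose host is in the spam blacklist.
    kept = [u for u in backend_results
            if urlparse(u).hostname not in spam_hosts]
    # Add own-index hits not already present, ahead of the backend's.
    extra = [u for u in own_results if u not in kept]
    return extra + kept
```

A real implementation would score and interleave rather than simply prepend, but the shape is the same: the third-party index supplies recall, while the blacklist and private index supply the value-add.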
"I don't use Yahoo/Bing straight up, e.g. I re-rank, omit, etc."
All the re-ranking you can do with these services is very limited because they are black boxes: you don't see what factors went into ranking a page the way it was ranked, you can't tweak the weights of different factors, and you can't add new factors. All you can have is hardcoded rules like "if there is a Wikipedia page in the first 20 results, bring it to the top", which don't really add much value, because if that Wikipedia page is any good it would be on top already. It's similar with spam results: you can provide impressive customer service by blacklisting spam on user request, but the major search engines are already so good at down-ranking spam that you don't see it whenever there are any other meaningful results.
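The kind of hardcoded rule described above is easy to sketch, which is part of the point: without access to the underlying ranking factors, this is roughly the ceiling of what a re-ranker can do. The result format (a list of URL strings) is a hypothetical simplification.

```python
def promote_wikipedia(results, window=20):
    """Hardcoded re-rank rule: if a Wikipedia page appears within the
    first `window` results, move it to the top, keeping the remaining
    results in their original order."""
    for i, url in enumerate(results[:window]):
        if "wikipedia.org" in url:
            return [results[i]] + results[:i] + results[i + 1:]
    return results
```

Note that the rule can only shuffle what the black-box backend already returned; it cannot surface a page the backend ranked outside the window, which is why such rules add little over the backend's own ranking.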
Your marketing here on HN has been brilliant, you have some very interesting UI decisions and possibilities that Google doesn't have, but your added value is definitely not in improved ranking of results.
Thx. I can't reveal too much here, but I do a lot more in the reranking area such that the top 20 will look very different for many queries when compared. I think these improvements go a long way to improving search UX, but ranking is subtle and so doesn't get noticed much (except when failing miserably).
That's almost certainly the right strategy for you. The people who founded cuil had a solution in search of a problem, in that they came from the search infrastructure teams at Google and wanted to use those skills. It turns out that running a search engine on very few machines isn't much of a competitive advantage, and neither is having a very large index.
They also got a lot of marketing out of their "ex-Googlers take on Google" narrative, which probably wouldn't have worked as well if they were using something like your strategy.
"That's because cuil was building a search engine from scratch rather than relying on bing's/yahoo's index to do the underlying scoring."
I've heard this in other discussions of DuckDuckGo here, and I don't understand why bing/yahoo allow a potential competitor free access to data that is so important to their search businesses. What's in it for Yahoo/Microsoft? Or is DDG paying for the privilege?
At the moment DDG is effectively a customer, not a competitor. If DDG ever became large enough to show up on Bing's radar (Bing currently has 600x as much traffic), you can bet that the terms would change.
"We are exploring a potential fee-based structure as well as ad-revenue models that will enable BOSS developers to monetize their offerings. When we roll out these changes, BOSS will no longer be a free service to developers."
Ok, so the problem they were solving was more difficult... but they are going after similar markets (or at least segments of the same market).
Technical achievements are great, but Gabriel is much better placed: he is self-funded, he is building on existing tools (always good advice), he leveraged us, the hacker crowd, who can be very loyal, and he clearly listens to his customers.
Cuil, on the other hand, produced some very confusing (if technically interesting) things and then ranted about those who criticised them. They had a lot of big bucks VC money (always a warning sign) and didn't appear to be leveraging loyalty from any user base.
Even if the problems these two startups are facing are different, there is a lesson here: one is how not to build a product, and the other is :)