(Apologies for the off-topic comment; new ideas are exciting!)
I've heard through the grapevine that they were able to index and serve 100 billion documents on 100 machines, which is a pretty impressive technical accomplishment if true. I'm surprised they weren't acquired for that. It's unfortunate that their search quality wasn't up to snuff yet.
How many queries per second can they handle on those nodes, and with what latency? What kind of relevancy calculations were they able to do at query time in their system with 1B documents per node? Were they able to support query-time aggregation of structured fields in their documents? Was the index stale, or did they support continuous feeding and indexing of new documents? If the latter, how well did they meet their QPS and latency SLAs while indexing new documents?
I can set up a single search node and fill it up with God knows how many documents any day, but the difference between supporting 10 QPS with ~500ms latency and 3000 QPS with the 99th percentile below 40ms is really more interesting than exactly how many documents I have per node.
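The gap between average and tail latency that this comment points at can be made concrete with a small sketch. The sample counts and latency values below are made-up numbers for illustration, not measurements from any real system:

```python
import random

# Hypothetical latency samples (ms): mostly fast, with a slow 2% tail.
random.seed(0)
samples = [random.gauss(20, 5) for _ in range(980)] + \
          [random.uniform(200, 500) for _ in range(20)]

def percentile(values, p):
    """Nearest-rank percentile: the value below which ~p% of samples fall."""
    ordered = sorted(values)
    k = max(0, min(len(ordered) - 1, int(round(p / 100 * len(ordered))) - 1))
    return ordered[k]

mean = sum(samples) / len(samples)
p99 = percentile(samples, 99)

# The mean looks healthy because the slow requests are diluted;
# the p99 exposes the tail that users actually feel.
print(f"mean={mean:.1f}ms  p99={p99:.1f}ms")
```

This is why quoting a mean latency alone says little about whether a node can meet an SLA: a node can average 25ms and still serve 1 in 50 queries in hundreds of milliseconds.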
I started out before those APIs existed, and did all my own crawling & indexing. When they came out, I decided to focus on my value-adds because I thought that was a quicker path to customer acquisition.
Furthermore, I don't use Yahoo/Bing straight up, e.g. I re-rank, omit, etc. I also mix them with my own index and negative spam index from my own crawling efforts.
Any re-ranking you can do with these services is very limited, because they are black boxes: you can't see which factors went into ranking a page the way it was, you can't tweak the weight of different factors, and you can't add new ones. All you can have is hardcoded rules like "if there is a Wikipedia page in the first 20 results, bring it to the top," which don't really add much value, because if that Wikipedia page is any good it would be on top already. Spam results are similar: you can provide impressive customer service by blacklisting spam on user request, but the major search engines are already so good at down-ranking spam that you don't see it when there are any other meaningful results.
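A minimal sketch of the kind of hardcoded rule described above, applied to results returned by a black-box search API. The result format (a list of dicts with a "url" key) is an assumption for illustration, not any actual API's schema:

```python
def promote_wikipedia(results, window=20):
    """If a Wikipedia page appears in the first `window` results,
    move it to the top, leaving the rest of the order untouched."""
    for i, hit in enumerate(results[:window]):
        if "wikipedia.org" in hit["url"]:
            return [hit] + results[:i] + results[i + 1:]
    return results

results = [
    {"url": "http://example.com/a"},
    {"url": "http://spam.example/b"},
    {"url": "http://en.wikipedia.org/wiki/Search_engine"},
]
# The rule only reshuffles what the upstream engine already ranked;
# it has no access to the underlying ranking factors or weights.
print([r["url"] for r in promote_wikipedia(results)])
```

Note that the function can only permute the black box's output; it cannot surface a page the upstream engine buried past the window, which is exactly the limitation the comment describes.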
Your marketing here on HN has been brilliant, you have some very interesting UI decisions and possibilities that Google doesn't have, but your added value is definitely not in improved ranking of results.
They also got a lot of marketing out of their "ex-Googlers take on Google" narrative, which probably wouldn't have worked out as well if they were using something like your strategy.
I've heard this in other discussions of DuckDuckGo here, and I don't understand why bing/yahoo allow a potential competitor free access to data that is so important to their search businesses. What's in it for Yahoo/Microsoft? Or is DDG paying for the privilege?
"We are exploring a potential fee-based structure as well as ad-revenue models that will enable BOSS developers to monetize their offerings. When we roll out these changes, BOSS will no longer be a free service to developers."
I don't think the ordering of the results is Google's competitive advantage anymore; it's branding and habit.
I think Cuil should sell their index as a service, on top of which businesses could implement PageRank and similar algorithms (http://en.wikipedia.org/wiki/Pagerank#See_also).
Technical achievements are great, but Gabriel is much better placed: he is self-funded, he is building on existing tools (always good advice), he leveraged us, the hacker crowd, who can be very loyal, he clearly listens to his customers, etc.
Cuil, on the other hand, produced some very confusing (if technically interesting) things and then ranted about those who criticised them. They had a lot of big bucks VC money (always a warning sign) and didn't appear to be leveraging loyalty from any user base.
Even if the problems these two startups are facing are different, there is a lesson here. One is how not to build a product, and one is :)
To me, Cuil looked like a prime example of design by committee whereas DDG is clearly opinionated but thrives because of it.
By being small, DDG can address issues that others will not even think about, e.g. enhanced privacy controls, Tor utilization, etc.
I believe that Cuil was a dream that went south. Unfortunately, that dream had a hefty bill ($33M).
Strangely enough, I visited that site once before and could not put a name to the site until I saw a screenshot of it.
I once contacted Cuil about some worthless search results, and got a standard reply asking me to be patient since they were a small company. But there wasn't any hint in the email that they would actually address the issue, so I drastically reduced my use of them and never bothered to contact them again. Listening to users would probably have helped, if my experience indicates a pattern.
But for those saying "no one can take on Google" and "search isn't an interesting space any more", remember Google got told the same things about the incumbents and "the portals" when it started.
There is always a better way out there. Someone just hasn't found it yet. Chances are low that Google will still dominate search in 10 years, or that most people will search the way they do now.
At the same time, I happened to be on eBay today and noticed that I've been an eBay member for just over 10 years. Why is it that we haven't found a better way to run auctions online? Most people still list and participate in online auctions almost exactly the same way they did 10 years ago.
I do agree that the pace of innovation is increasing, so maybe 10 years starting from 2010 is the same as, say, 20 years starting from the year 2000.
But I think chances are high that Google will dominate search in 10 years. If someone discovered a better way (and there are many, they just haven't caught on yet) they would just buy 'em out, right? ;) Or maybe Bing will win. But then it's Microsoft winning, and Microsoft is older than me.
The main thing it was going to do differently was that instead of having a fixed ending time for an auction, it would be conducted like a regular auction: in real time, via AJAX, with the auction ending once no one else bids.
Seemed like that realtime element, along with no sniping, fairer bidding, a cool UI, etc., would be a good reason for people to switch.
But the network effect is massive. To be successful as an ebay competitor, I think you'd need a pretty large marketing budget, or several people working on promotion full time.
It's ripe for the picking though. And massive profits.
I expect that if there is a different way to slice search, Google will explore it. Fundamentally, if you're going to beat them, you have to index more, better, and faster, somehow produce better results, and then provide a better experience. I'm not saying people shouldn't challenge them, but they're good, and I think it's an insanely difficult challenge to beat them.
In 10 years, we'll still know of and probably use Google; I don't know if that's true for Bing or DDG, or any of the others. Unless we stop searching.
All things aside Trade Me has been a great web success story for the local industry.
It's not pretty, but that's about the least important element of web-design.
It's very easy for people to understand how things are organized. Cities are organized together, categories are organized together. All of the pages are text-based so pages load fast. Various functions on the page are explained with helpful text explaining what's going on when you try and post or respond to a post. Etc.
Their UI can be improved with a little more padding in places and a little more focus on quality typography, but it's remarkable that such a large site has maintained such a quality design in the face of enormous pressure to change.
Hunch + Quora + Facebook questions could well be the biggest threat to Google's quasi-monopoly.
It feels like they were one of those X Factor or Big Brother winners who have a lot of media attention to start with, then become complacent and stop playing the game of keeping the public's attention. Marketing is not just five minutes of fame.
I think also that what the above teaches is that there is a lot of hype in social media. Sure, there are real, serious, and enlightening conversations, but much of it is hype and trends. Maybe it is best to retreat from mass communications like the internet and focus on your projects only. Better to be an actor in my own show than a spectator in others'.
Would love to learn from Cuil's cuil code :)
Cuil's problem was with their ranking, which may be related to what they were indexing, but not to their ability to scale out.
Their index was very large, but their traffic was not. I think I've run websites that handled as much traffic as Cuil did on a good day.
"Cuil is dead".
I don't think this comes as a surprise to anybody.
In a way they still are, but they also do a lot of eHow-type SEO.
With Google's privacy snafus and other mishaps, I could see them getting upset within 5 years.
I'm not sure if a company will ever be able to take on Google in search directly.
A far more likely scenario for a major change in this sector would be for social search to eclipse the crawl it and rank it model that is currently dominant.
Would you rather make $1 from a $0.10 investment or $10k from a $2k investment (assuming the former isn't scalable because there are no more users to advertise to)? If there is a non-trivial time cost in setting up management, billing, etc., you might not even bother with the former.
Also, if total return is small and capped, it may not even be worth the time to calculate ROI.
Problems there would be a) beating Microsoft and Google at that game, and b) that it would turn search into a low-margin commodity. Both make it hard to turn a profit.
Besides, how do you know what the investors were willing to accept? AFAIK, investors are normally willing to invest one hundred now to get one thousand several years later. In other words, investing $33M to get $330M 5 or 10 years later.
Not quite. VC plans on 9 out of 10 deals not working out. So that remaining 1 in 10 has to make enough to pay for the rest. That means that a 10-fold return just breaks even. But it gets worse. For an investment with a 10 year horizon they need to beat alternate investments. If you peg those at 10%/year (compounding annually), then you now need a 25-fold potential return on investment for the fund to have a chance to meet its goals.
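The arithmetic above can be checked in a few lines; the 10% success rate, 10-year horizon, and 10%/year alternative return are the figures from the comment itself:

```python
# Back-of-the-envelope version of the argument: with 1 in 10 deals
# succeeding, a 10x return on the winner only breaks even, and beating
# a 10%/year alternative compounded over 10 years pushes the required
# multiple on the winner to roughly 26x.

success_rate = 0.10
horizon_years = 10
alt_annual_return = 0.10

break_even_multiple = 1 / success_rate                 # 10x just covers the 9 losers
hurdle = (1 + alt_annual_return) ** horizon_years      # ~2.59x from compounding
required_multiple = break_even_multiple * hurdle       # ~25.9x

print(f"break-even: {break_even_multiple:.0f}x, "
      f"required to beat alternatives: {required_multiple:.1f}x")
```

So the "25-fold" figure in the comment follows directly: 10 × 1.1^10 ≈ 25.9.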
Is the reasoning wrong, or is there some troll going on that I don't know about? It seems quite realistic to me.
The investors paid for the full probabilistic model, where in 10% of the cases Cuil would have succeeded in beating Google and would have made tens of billions of dollars. That alone is worth 10% * tens of billions = billions. 33M for that doesn't seem much in that perspective.
Your statement about "I am certainly sure" and "10 times more" ignores the probability distribution of your potential success, and how big your success is in each of those cases.
I am sure not everyone would agree to subscribe to this rule, but I do.
By all means, it should not take you $33M to recognize that you are heading in the wrong direction; the first $2-3M should do the job, unless you choose not to see what is happening in front of your eyes.
This goes for the founders and investors alike.
Cuil was a joke, an expensive joke, period.