It's easy to gather the necessary data, but it's hard to know which parts of that data are the most relevant for finding good content and avoiding bad content. Is it more relevant if keywords show up in links or titles than in the body of the text? If so, SEO spam sites will stuff keywords into their links and titles. Is it more relevant if keywords show up in the first 200 visible words of the page? If so, spammers will make tons of pages with relevant keywords at the top.
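To make the keyword-placement point concrete, here is a toy scoring function. Everything in it is hypothetical: the field weights, the `score` function, and the example pages are made up for illustration and are not taken from any real search engine. It only shows that any fixed weighting that spammers can guess rewards the page that stuffs keywords into the weighted fields.

```python
import re

# Hypothetical weights (an assumption, not any real engine's values):
# a title hit counts 3x, a link hit 2x, a hit in the first 200 words
# earns a 1.5x bonus on top of the plain 1x body count.
FIELD_WEIGHTS = {"title": 3.0, "links": 2.0, "top": 1.5, "body": 1.0}
TOP_WORDS = 200  # the "first 200 visible words" signal


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def score(page, query):
    """Sum weighted counts of query-term hits across a page's fields."""
    terms = set(tokenize(query))
    body_tokens = tokenize(page.get("body", ""))
    fields = {
        "title": tokenize(page.get("title", "")),
        "links": tokenize(" ".join(page.get("links", []))),
        "top": body_tokens[:TOP_WORDS],
        "body": body_tokens,
    }
    total = 0.0
    for name, tokens in fields.items():
        hits = sum(1 for t in tokens if t in terms)
        total += FIELD_WEIGHTS[name] * hits
    return total


QUERY = "sourdough bread"

# A genuinely useful page that mentions the topic naturally.
honest = {
    "title": "Sourdough bread guide",
    "links": ["starter tips"],
    "body": "This guide covers sourdough bread from starter to bake.",
}

# A spam page that repeats the keywords in every weighted field.
spam = {
    "title": "sourdough bread sourdough bread",
    "links": ["sourdough bread", "best sourdough bread"],
    "body": "sourdough bread " * 50,
}

print(score(honest, QUERY), score(spam, QUERY))
```

Once spammers work out which fields carry weight, the stuffed page outscores the useful one by a wide margin, which is why the weights (and the signals themselves) have to keep changing.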
The hard part about building a search engine isn't indexing the internet; it's adapting to spam. Spammers are continually adapting to changes in the algorithm, so the algorithm needs to adapt as well. And the more popular your search engine is, the more money you make and the better able you are to adapt to spam (and the more spammers focus on your engine).
So the problem isn't that Google has a better index (though I'm sure it does); it's that nobody else is willing to spend the money necessary to tune the search algorithm to stay on top of spammers. When Google started, the other big players (Yahoo, MSN, etc.) didn't care as much about improving their indexes and instead focused on building their other content. Google saw the value of search and got a lead on everyone else in curating results, and now it has the momentum to stay in front and has shifted to building content to improve monetization. Nobody else has a monetization network for search like Google's, so competitors will keep running into the same problem those earlier companies had (Microsoft wants to point you to its other services, DuckDuckGo is limited by its commitment to privacy, etc.).
In short, Google wins because:
- it was better when it mattered
- it makes money directly from search
- its other services improve its ability to understand what users want, which improves search quality and ad relevance
You can't make a better algorithm by being clever; you make a better algorithm by having better data, and that's hard to come by these days. The only way I can see a competitor stepping in is if it targets an underserved demographic and focuses data collection and monetization there, and DuckDuckGo comes close by targeting privacy-conscious power users.
The irony there is that DuckDuckGo can't collect much of that data precisely because of their privacy focus.
You didn't just hit the nail on the head; you drove it all the way in with a single blow. Bravo.