Yet another unconscious programmer assumption shattered.
I can imagine that if I’m searching for “ball” Google already has the PageRank-sorted results somewhere on its servers. That seems easy enough. But what about arbitrary search terms? Searching for “tango durlast 1978” (a soccer ball) gives me results just as fast. Does Google have all of that pre-computed somewhere?
If they did that (to pick an arbitrary number) for search terms up to 100 characters (say an alphabet of the 26 letters, ten digits, and space), that would mean they would need something like 37^100 = 6.6e+156 pre-computed lists of results somewhere on their servers.
That can’t be what they are doing. If they were to use only one byte per possible search term (to pick another arbitrary and completely ridiculous number) they would still need to store 6.6e+144 terabytes. Which seems kinda impossible.
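The arithmetic above checks out; here's the back-of-the-envelope calculation spelled out (the 37-symbol alphabet and 100-character limit are the arbitrary assumptions from the text):

```python
from decimal import Decimal

# Back-of-the-envelope check of the estimate above: 26 letters + 10 digits +
# space is a 37-symbol alphabet, and length-100 strings alone give 37**100
# possible queries (shorter strings barely add to the total).
alphabet = 26 + 10 + 1                     # letters, digits, space
combos = alphabet ** 100                   # exact big integer

print(f"{Decimal(combos):.1e} possible queries")   # 6.6e+156

# At one byte per query, divide by 1e12 bytes per terabyte:
tb = Decimal(combos) / Decimal(10) ** 12
print(f"{tb:.1e} TB")                              # 6.6e+144
```

(`Decimal` is used only because an integer this large can't be converted to a float for formatting.)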
2) You have to update your copy of the internet regularly.
3) You don't want to search all of that whenever somebody types a few words. So write programs to regularly prepare indexes in advance (just like the index in the back of a book), and then when a query comes in you can "just" look the words up in those.
The steps above sound simple, but you can imagine that the devil is in the details, that there's a lot of know-how needed and a lot of money required to do that really right. Just ask Microsoft or even some failed search company.
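The "prepare indexes in advance" step is essentially an inverted index: a map from each term to the documents containing it, built once at index time, so a query becomes a lookup plus a set intersection instead of a scan. A minimal sketch (the documents and queries are made up):

```python
from collections import defaultdict

# Toy corpus standing in for the crawled copy of the internet.
docs = {
    1: "the history of the soccer ball",
    2: "tango durlast official match ball 1978",
    3: "history of the tango dance",
}

# Build the index once, ahead of query time ("step 3" above).
index = defaultdict(set)               # term -> set of doc ids
for doc_id, text in docs.items():
    for term in text.split():
        index[term].add(doc_id)

def search(query):
    """Intersect the posting sets of all query terms."""
    postings = [index[t] for t in query.split()]
    return set.intersection(*postings) if postings else set()

print(search("tango durlast 1978"))    # {2}
print(search("history"))               # {1, 3}
```

At query time no document text is touched at all; that's the whole point of doing the work in advance.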
First think of the web as a big matrix with each row being a document and each column being a term. The first step is to decide how you want to shard this matrix across multiple machines.
You can shard by row, which lets you do really complicated scoring, but you have to hit every machine for every query.
You can shard by column, which is much cheaper, but limits how sophisticated your scoring can be.
Within each machine the problems are all probably fairly familiar.
Of course, storing millions of the "common queries" isn't a joke either, and that still doesn't answer the question of how they search the internet within a few seconds anyhow; I'm not trying to give a complete answer, just to point out one aspect. Google Instant, by design, serves the Google Suggest queries the vast bulk of the time, and I guarantee that those are either largely or possibly even entirely pre-cached, so you're not doing 10 web searches in two seconds, you're doing 10 hash lookups in two seconds. Which is still damned impressive; I can hardly keep my web site loading in less than two seconds for one hit where I work.
Even if 30% of queries are truly unique (a number people have quoted from years ago; it may no longer be true), caching the other 70% of queries is a big win. Also, instant search relies heavily on Google Suggest, which by its very nature serves up queries that have already been performed, so they are trivially cacheable.
It's a great example of using the UI to encourage a desired behavior.
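The cacheability argument above amounts to putting a plain query-results cache in front of the search backend, so only cache misses cost a real search. A minimal sketch, where the hypothetical `cached_search` stands in for the real pipeline:

```python
from functools import lru_cache

backend_calls = 0

@lru_cache(maxsize=100_000)          # holds the head of the query distribution
def cached_search(query):
    """Stand-in for a real backend search; only cache misses reach it."""
    global backend_calls
    backend_calls += 1
    return f"results for {query!r}"  # stand-in for the real result page

# Popular (Suggest-style) queries hit the backend once, then come from cache.
for q in ["ball", "ball", "tango durlast 1978", "ball"]:
    cached_search(q)

print(backend_calls)                 # 2 backend searches served 4 queries
```

The real system is of course distributed and has to worry about invalidation, but the win is the same shape: repeated queries cost a hash lookup, not a search.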
Google can't cache nearly as much as you would expect, not because the queries change, but because the results change.
The tough part for Bing is that, with fewer searches, their predictive engine isn't quite as good. But this is where the Yahoo deal will pay off.
But all they need to do is fire off predictive searches and bring those back -- and of course, each predictive search only needs to bring back ten or fewer results.
Bing already does the predictive text suggestion. They just need to send back queries along with the actual text suggestion. At worst they'd need to get some more servers. But I fully expect they'd be able to roll this out in months if they wanted to.
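Mechanically, that would look something like this: the suggestion endpoint already maps a typed prefix to completions, so "instant" results just mean shipping the pre-cached top results for each completion alongside it. A sketch with made-up data:

```python
# Made-up suggestion data: prefix -> completions, completion -> cached results.
suggestions = {
    "bal": ["ball", "balance"],
    "ball": ["ball", "ballet"],
}
cached_results = {
    "ball": ["result 1", "result 2"],   # ten or fewer results per suggestion
    "ballet": ["result A"],
    "balance": ["result B"],
}

def instant(prefix):
    """Return (suggestion, results) pairs for a typed prefix: no live search."""
    return [(s, cached_results.get(s, [])) for s in suggestions.get(prefix, [])]

print(instant("bal"))
# [('ball', ['result 1', 'result 2']), ('balance', ['result B'])]
```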
Although personally I'd still focus on relevance. There's so much stuff both engines suck at, I'd love for them to fix some of those holes first.
I assume some of the ajaxy Google logos have been leading up to this - especially the one that coloured in the letters as you typed.
It's rather annoying.
This seems like such a tiny thing while I'm writing it down, now, but it's frustrated me steadily today; I've context-switched out of whatever I was thinking about more than a dozen times to figure out why I was getting a popup warning. Finally I just turned it off in search settings. The barely-noticeable increase in speed just isn't worth it.
It's more normal at Google to have fewer people and 30- or 60-minute meetings.
If this is the case, then what's the point of showing the search results as the user types? It seems that just having predictions is sufficient.
For example, we tried a prototype where we waited for someone to stop typing before showing results, which did not work. We realized the experience needed to be fast to work well.
Maybe most people can't figure out they can press cursor down + return.
Or maybe this is a flashy way to distinguish their product from the competition.
Or maybe this is a way to encourage users to use shorter queries (as they can see how effective they are from the search results), thus forcing advertisers to pay more for more general terms.
This is a pretty big deal to people like me who go to great lengths to avoid using the mouse. I've not tried it much because I don't usually search from the google homepage, but maybe I'll try it now.