This article misses one of the primary reasons for AV's demise -- we didn't update our primary index for several months just as Google was gaining mindshare. A ridiculously high percentage of our front page links were 404s, while Google was always fresh.
This was particularly bad because one of our earlier strong points was fresh indexes. Our ability to refresh the supplementary index on the fly was awesome. When you lose one of your primary strengths, it's noticeable.
I don't mean to minimize the downside of losing focus. That's one of the biggest lessons I learned while working there. I'd say that our failure to maintain a high quality index was directly caused by our loss of focus, in fact. But it's important to remember that both UI and the underlying index quality matter.
"What about PageRank?"
Eh, not actually unique to Google. Remember that Jon Kleinberg was developing HITS in parallel -- AV was well aware of the concept of measuring page importance using incoming links, and we had our own implementation. It may not have been as good. It's hard to tell when your underlying data source is stale.
Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)
From the article: "[Alta Vista] broadened the use of boolean operators in search. Like some competing search engines, it supported AND, OR, and NOT."
My recollection is that Alta Vista supported boolean operators, but defaulted to OR while Google defaulted to AND. So searching Alta Vista for something like "$CommonWord $UncommonWord" would return results with high-ranking pages for $CommonWord that drown out all the low-scoring pages for $UncommonWord, whereas Google would return results that match the intersection (which would actually be relevant to the user's query). I'm convinced this default might have made a bigger impact on Google's success than any PageRank magic.
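To make the default-operator difference concrete, here is a minimal sketch of boolean retrieval over a toy inverted index. The corpus, `build_index`, and `search` are all hypothetical names invented for illustration; this is not how either engine was actually implemented.

```python
from collections import defaultdict

# Hypothetical toy corpus: doc id -> text (not real search engine data).
DOCS = {
    1: "free music downloads and music news",
    2: "music theory basics",
    3: "theremin music for beginners",
    4: "building a theremin from spare parts",
}

def build_index(docs):
    """Map each term to the set of doc ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def search(index, query, mode="AND"):
    """Return doc ids matching all terms (AND) or any term (OR)."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    if mode == "AND":
        return set.intersection(*postings)
    return set.union(*postings)

INDEX = build_index(DOCS)
# "music" is the common word, "theremin" the uncommon one.
and_hits = search(INDEX, "music theremin", mode="AND")
or_hits = search(INDEX, "music theremin", mode="OR")
```

With AND, only the document containing both words survives; with OR, every document mentioning the common word is a candidate, and it is then entirely up to ranking to keep those from drowning out the rest.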
Although it's very much anecdotal, I, too, distinctly remember that Google defaulted to AND (unlike today, a very hard AND at the time), and that this made a noticeable difference in searching habits.
My theory on why this might matter even for people who knew how to use the operators is this: With OR as default, you would first try your query without operators, get page upon page of irrelevant results, and then start to narrow your query down. With AND as the default, you would type in the query, and if you only got a few irrelevant results, or often no results at all, you would try alternative terms instead.
It seems that progressing from no result to desired results by choosing alternative terms just makes more sense than having to wade through irrelevant stuff, and the default encourages one methodology over the other.
Today, it's very hard to make Google return no results at all. Not just because the amount of content grew to an unimaginable scale in the meantime, but also because Google has become way, way fuzzier in the way it interprets search terms, likely to better suit a larger and different audience. A lot of times today, I have to switch to "verbatim" mode first, at least for technical stuff.
The problem with AND as a default is that 'normal' people (i.e. people who have no idea what boolean means) operate search engines something like this: 1. type some words in, and get no or incorrect results; 2. add some extra words, and repeat the search, with the idea they are making the search more specific; 3. be confused as to why they still don't get the results they want. a default of OR, on the other hand, means adding search terms ends up being useful...
As someone old enough to have used search engines extensively myself and watching over others using it during the AV - Google transition, I can definitely say that defaulting to AND was one of the most important reasons Google appeared to give better results than AV for both advanced and basic folk.
Defaulting to AND means that you aren't just searching for the most common term in your query that drowns out the rest. Also, adding terms narrows the query down rather than making it more general. This behavior strikes me as far more natural for "normal" people.
sure, i agree, except that they also apply this 'narrowing down' logic when insufficient or no results are returned, thinking the query needs to be more specific in order to work. i have observed the following sort of behaviour:
1. entering 'invisible marmalade teapot' and getting no results
2. changing to 'invisible marmalade teapot with tartan cosy', again nothing
3. so 'invisible marmalade teapot with tartan cosy in outer space', ditto
you get the idea...? it's just like in the real world, when you might go to a bookshop and say
You: do you have that new crime novel in stock?
Bookseller: er, i don't know which one you mean?
Y: the new crime novel by john grisham, pelican something or other?
B: oh, right, yes, here it is!
and everyone is happy.
default OR in a search engine would mean my first example eventually starting to return results about transparent space coffee pots with tartan cosies, ignoring the first few terms but the rest match, which is often helpful, particularly if you're doing an exploratory search for something where you aren't sure of the exact details.
Hm. I think perhaps a better way of putting it is that the hard AND issue is when people search using a natural language type query (I know about stop words, assume these are always filtered out) and include some extraneous term, so 'What is that new crime novel by John Grisham about a Penguin I think?' will return nothing, and no amount of extra terms added at the end will help, until you delete 'Penguin'... Of course it's anecdotal, but I still suspect it's one of the reasons for the hard AND to OR switch...
As of the past 2 weeks Amazon's search feels like "we're not even going to try to get close anymore and aren't even going to show you matches that contain the words you're looking for, here is some random stuff plus some things you looked at recently".
Yes, easily discoverable from the links under the search box as Tools -> All Results -> Verbatim. Ignore the sign about the leopard, I think that's just left over from some other project.
No, it refers to verbatim mode, which ensures every word in the search is on (almost) every page found. (You would think google works like this by default, but in practise, very far from it.) Using quotes around the search terms ensures they appear in the quoted order.
To switch on Verbatim, I click "Verbatim" on the left side of the google page, it appears just under the alternative, "All results". I think for other systems it can be well hidden in menus. I use it almost every time I google anything. Otherwise you get a load of irrelevant crap.
I've long believed that default OR in keyword searches is always a mistake, and the default should always be AND.
The reason being you can just search twice if you want either word, but in the vast majority of cases you want both words when you enter them in the searchbox. Most companies kind of split the difference and put the AND results first then fill in with OR results, but that mostly just leads to "if your answer isn't on the first page, it's not going to be on the second or the hundredth".
I had an ISP at the time, and remember teaching users to always search on AV using "keyword1 AND keyword2", to get the results they expected. When Google came around, this became unnecessary.
IMO this was one of the biggest contributors to the perceived quality advantage of Google vs AV.
It’s not always a mistake. The OR or AND just provides the initial filter; then you rank the pages and take the top N. If your ranking algorithm puts a lot of weight on all the words being present, you get the same result as AND, with the benefit that if no page matches everything you can still suggest something. It also depends on how you present these results.
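A rough sketch of that idea, assuming a made-up scoring rule: filter with OR, then give a large bonus to documents that contain every query term. The corpus, the function name `rank_or_candidates`, and the `all_terms_bonus` value are hypothetical choices for illustration, not anyone's real ranking function.

```python
# Hypothetical toy corpus: doc id -> text.
DOCS = {
    1: "free music downloads and music news",
    2: "music theory basics",
    3: "theremin music for beginners",
    4: "building a theremin from spare parts",
}

def rank_or_candidates(docs, query, all_terms_bonus=10.0):
    """OR-filter the corpus, then rank so full matches come first."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        matched = sum(1 for term in terms if term in words)
        if matched == 0:
            continue  # fails even the OR filter
        # Assumed scoring: one point per matched term, plus a large
        # bonus when every term is present (the "AND" case).
        bonus = all_terms_bonus if matched == len(terms) else 0.0
        scored.append((matched + bonus, doc_id))
    # Highest score first; ties broken by doc id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]

full_match_first = rank_or_candidates(DOCS, "music theremin")
fallback_only = rank_or_candidates(DOCS, "theremin kazoo")
```

When some page matches every term, it lands on top exactly as AND would put it; when none does (the "kazoo" query), the partial matches are still available as suggestions instead of an empty page.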
Alta Vista defaulted to AND. Crappy search engines like Excite defaulted to OR because it's easier to serve up a lot of low quality results in an OR search. Moreover Alta Vista had "NEAR" which made it unique among search engines.
It always took 2 to 5 searches using Altavista to get the results you were looking for. This was a huge improvement on Excite and Lycos which might produce infinite results with absolutely nothing relevant. There was a lot of noise like source code archives. With Google search the first page usually had a useful result.
Probably the biggest thing that destroyed Alta Vista was the horrible flashing banner Ads at the top of the screen.
I don't remember stale indexes being a problem with AltaVista; it's interesting to hear about that. When I switched to Google, it was because I was reliably finding good results within the first few entries. I'd forgotten about it until now, but it used to be a normal thing to page through several screens of search results - performing some kind of human relevance/ranking task on results that were simply too noisy.
From the article:
> This move away from AltaVista’s streamlined search experience made AltaVista more similar to its competitors. Users gradually began to switch to a newcomer, Google, for the simple search they missed.
I don't think this was the case at all. People switched because they got better results.
This was my experience too. Simply better results on the first or (rarely) second page. Also the clean ui. Just an image and a search box. Fantastic on a slow connection.
>Eh, not actually unique to Google. Remember that Jon Kleinberg was developing HITS in parallel -- AV was well aware of the concept of measuring page importance using incoming links, and we had our own implementation.
And the guy who started Baidu (the Google of China), Robin Li, had created his own PageRank-like algorithm, and even patented it in the US, before Google (he filed for the patent in 1997; Google was founded in 1998).
However, Kaiser Kuo points out that Robin Li, the co-founder of Baidu, obtained a patent for hypertext link analysis before Larry Page obtained his “Page Rank” version.
Internal issues I'm not comfortable talking about in depth. It was a combination of technical problems and political problems; I expect any specific person's opinion of the percentage breakdown of those factors depends a lot on which group they were in at the time.
> I expect any specific person's opinion of the percentage breakdown of those factors depends a lot on which group they were in at the time.
I feel like that tells me where your percentage would lie...
EDIT: I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.
> I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.
I see; I incorrectly read it as accusatory. Thanks for the very civil reply!
That's okay, I have a very dry sense of humor and straight-faced delivery in person as well. Which is unfortunate because it means I can't blame the lack of tone online when jokes don't land, they often don't in person either.
Not an insider, but I doubt it; this was the early commercial web, and business plans just weren't that sophisticated. People were building multi-million dollar companies on things that you could write in a long lunch hour today.
Honestly, I doubt that was even Google's plan for the first years of its existence. It was more "Hey, we made this neat search thing, let's see if we can figure out a way to make money from it".
At least some people in the TWIT podcast family have repeated many times the idea that Brin or Page had at one point said early on that "advertising ruins things" and this wasn't their initial goal.
I don't know if the information is true, but I know I've heard it more than once.
PageRank is using the stationary distribution of a random walk; that’s very different than just incoming links (which AV did have)
In a way, it ranks a link by its own incoming links (which are ranked by their incoming links etc). It was possible to game AV by setting up 100 sites that point to your own.
In pagerank, you essentially had to convince already popular pages to link to yours.
For a while, google effectively had no spam, and AV had lots. Eventually, spammers learned to game pagerank; the arms race is still on.
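For anyone who wants to see the difference between counting incoming links and the random-walk score, here is a small sketch using plain power iteration. The link graph is invented (a three-page link farm pointing at "spam", versus "good" being linked from pages that are themselves linked), and the damping factor of 0.85 is the commonly cited default, assumed here rather than taken from this thread.

```python
# Hypothetical link graph: page -> pages it links to.
LINKS = {
    "portal": ["news", "good"],
    "news": ["portal", "good"],
    "good": ["portal"],
    "farm1": ["spam"],
    "farm2": ["spam"],
    "farm3": ["spam"],
    "spam": [],
}

def in_degree(links):
    """Naive importance: count incoming links."""
    counts = {page: 0 for page in links}
    for outlinks in links.values():
        for target in outlinks:
            counts[target] += 1
    return counts

def pagerank(links, damping=0.85, iterations=100):
    """Approximate the stationary distribution by power iteration."""
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Pages with no outlinks spread their rank evenly over all pages.
        dangling = sum(rank[p] for p in pages if not links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {page: base for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                new_rank[target] += damping * rank[page] / len(outlinks)
        rank = new_rank
    return rank

DEGREE = in_degree(LINKS)
RANK = pagerank(LINKS)
```

By raw link count the farmed page wins (3 incoming links against 2), but under the random-walk score the farm pages have almost no rank of their own to pass along, so "good" comes out ahead of "spam" in this toy graph.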
The person you're replying to is clearly aware of search ranking algorithms. You should try looking into the HITS algorithm they mentioned for some additional context.
To be fair, I'm an ops guy, not a search engineer. ;) It's valid to say AV might not have implemented the basic concepts as well and I don't want to devalue Google's innovation. It just annoys me when people assume PageRank was a unicorn and nobody else was doing anything similar.
As much as I value PageRank, it annoys me when people assume it was a novel idea.
It had been applied decades before through scientific paper references, as a measure to improve on the "number of references" metric, which is more easily gamed. References are more rarely circular (only papers in preparation at the same time can form cycles, unlike web pages). I was sitting in a class about stationary processes in 1996 when the lecturer mentioned this (already old and well known at the time) use case as motivation.
Whatever AV implemented at the time, it was not on par.
Well, HITS is applied after you’ve already selected a subset, at response time; so, if you didn’t select a good subset (and AV often didn’t) then picking the most promising out of that subset is not as helpful.
Pagerank is essentially a “universal authority score” (in the HITS terminology), and it worked well because at the time you didn’t have pages that were an authority for one subject and spam for another. You do now - which is why pagerank is now one signal out of 200, even though it was sufficient on its own 20 years ago.
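For context on the hub/authority terminology, here is a minimal sketch of the HITS iteration run on a small, already-selected subgraph. The graph and the function name `hits` are hypothetical, and real use would first build the subset from the query results, which is exactly the step described above as the weak point.

```python
import math

# Hypothetical query-specific subgraph: page -> pages it links to.
SUBGRAPH = {
    "a": ["x", "y"],
    "b": ["x"],
    "c": ["x", "y"],
    "x": [],
    "y": [],
}

def hits(links, iterations=50):
    """Return (hub, authority) scores after repeated mutual updates."""
    pages = list(links)
    hub = {page: 1.0 for page in pages}
    auth = {page: 1.0 for page in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = {page: 0.0 for page in pages}
        for page, outlinks in links.items():
            for target in outlinks:
                auth[target] += hub[page]
        # Hub: sum of authority scores of the pages linked to.
        hub = {page: sum(auth[t] for t in links[page]) for page in pages}
        # Normalize both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for page in scores:
                scores[page] /= norm
    return hub, auth

HUB, AUTH = hits(SUBGRAPH)
```

Here "x" ends up the top authority and "a" and "c" the top hubs, but note the scores only mean anything relative to the subset you fed in; a bad subset gives a confident ranking of bad pages.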
I do seem to recall AV getting flooded with spam --- porn spam to be exact. I remember myself and all the nerd kids in my high school computer lab would joke that you could search for something completely innocuous like "yarn" and always get back at least a couple of porn links.
This of course was also at a time when "whitehouse.com" was a porn site.
Good point - pagerank is a query-independent signal which is harder to game, but the (very large) part of ranking that is query dependent was still very much gameable.
We had a really slow internet connection at school. So when one of my colleagues introduced us to Google with its clean interface, we all moved over from a handful of other search engines, and never went back. I can't remember the index being a problem back then.
My recollection was that as soon as anyone - but especially techies - tried Google even once they never went back.
The lack of fresh index surely was a factor but not sure whether it was primary.
Obvious question - why did you not update the index? Was it that it was obvious Google was going to win and made people give up? (edit - never mind - you address this in other comments)
> My recollection was that as soon as anyone - but especially techies - tried Google even once they never went back.
> The lack of fresh index surely was a factor but not sure whether it was primary.
I wasn't aware of the index issue back then, but I did see broken links. At the time I thought that was normal, since it takes time to scan the entire Internet. With Google I generally got working websites.
I did like the boolean search in AV; it helped with obscure searches, especially when the name was similar to a typo of a popular word.
>one of the primary reasons for AV's demise -- we didn't update our primary index for several months just as Google [...] I'd say that our failure to maintain a high quality index was directly caused by our loss of focus,
Do you believe Google's strategic decision to use commodity computers and hard drives gave them any competitive advantage (cheaper cost, scaling, etc) compared to DEC Alpha servers?
As an outsider, it seems like Google could iterate its data centers faster and cheaper and therefore, their web crawlers were cheaper to run (also run more frequently), also cheaper to store terabytes of data, and also cheaper to service search queries.
Nah. We weren't using that many servers. Today, that absolutely would make a difference. Back in the day we could run a top ten web site on well under 500 servers, and it's not like we were paying list price for Alphas anyhow.
Years ago, I was told by a drunk* ex-Digital engineer at a lisp meetup that one of the big reasons that Alpha died was that y'all were getting yields of something like 6 wafers/chip, vs. Intel's 97% for the Pentium. Given that, those 500 alphas still must have cost a pretty penny to produce.
* I was also not exactly sober at the time, so these numbers may be a bit off. The number of wafers per chip being greater than 1, though, I am absolutely certain about.
Not the OP, but I remember the early 2000s. Just spitballing here but IMO that made no difference whatsoever from a consumer's standpoint -- but it presumably did from an operational standpoint, given how Google introduced an actual business model to search. The only things that mattered to you as a consumer then were how good the results were, and how convenient it was to get them. Google had a clear edge by the end of 2000 insofar as I can recollect.
>but IMO that made no difference whatsoever from a consumer's standpoint.
With cheaper techniques, the idea is that the "more capital efficient" way of indexing the ever-expanding web would in turn provide better results for an improved consumer experience. It's the old adage of "do more with less".
For example, see the old Danny Sullivan graphs[0] showing how Google's index was growing faster than AltaVista. Having a bigger index lets one return more relevant search hits.
AltaVista wasn't just falling behind in "staleness" of old indexes; the aggregate size of the index was smaller than Google's as well.
I'm not so sure it applied back then. Before Google, the core issue was to get a good result in the random garbage you were returned in search results. You'd use quotes and plus/minus or AND/OR operators, maybe strip out words like xxx and porn and warez, and hope for the best. Staleness was, frankly, of little concern if you got a few relevant results. That the AV index was stale was news to me before I read this thread, and I'm not sure I'd buy into the idea that it made much difference. Search engine toolbars made getting results more convenient. But the core of the problem then was getting any relevant results to begin with. For that, Google just rocked.
It did apply, to a point. Before Google, I had switched to AllTheWeb as my search engine of choice since a lot of sites just wouldn't show up in AltaVista no matter what you searched for, and ATW had a bigger index (I guess staleness could have had the same result).
But of course eventually I switched to Google for the better search results.
I wouldn't say it was just the index issue that allowed Google to take prominence. It was the browser toolbar that really killed AltaVista.
I didn't know anything about Google until mid 2000, and when I used it, I just thought it was an AltaVista clone.
Fast forward to fall 2000, and once after getting bad results on AltaVista, I tried Google again. At the time, I never bookmarked either of the sites or set them as my homepage. I remember, when I used Google, I thought to myself, "I'm going to switch to whoever comes out with a browser toolbar. Search should just be part of the browser."
About a week later, I had some bad results on AltaVista, and typed in google.com. Immediately I saw the "try our toolbar" banner.
At that point, I switched to Google, and I NEVER went back to AltaVista. (I think sometime in 2001 I tried AltaVista again, out of loyalty, to see if they finally had a toolbar, but the results were so bad I was in shock.)
I was expecting you to say Google won because it didn't put out one of those adware toolbars crowding your browser. I would never install one of those. I just put Google as my home page.
I feel that Google brought a shift in thinking to internet companies. In the pre-Google era, companies operated like a government office: once they gained a market foothold, they virtually stopped enhancing their products. Google and other successful startups have taught us that a tech company is prone to fall if it doesn't innovate.
Very true, cousin worked there in early 80's, probably not best source, but was first that jumped out and common knowledge amongst those old enough and into tech at the time to remember.
Was the aspect how they engaged with government bureaucracy and managed to in effect define a work culture that is so far removed from governmental bureaucracy, that you wonder what the early days was like.
I disagree. For example, AV had image search before Google; it was not very good, but not for lack of effort. If anything, the lack of focus on core web search, because attention was elsewhere improving AV in other ways, was its undoing.
> we didn't update our primary index for several months just as Google was gaining mindshare
I'm glad this has been mentioned. I noticed this at the time but never really knew if it was actually the case! Altavista had become somewhat annoying to use at the time and Google's relatively clean front page (although it was more than just a search box in the earliest days, but it was cleaner than AV's) meant it loaded quicker and sold me on both speed and that the results actually loaded.
> Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)
Agreed. My previous manager had worked with Elon Musk at Zip2 (maybe even as the CTO). He used to talk about how peculiar Elon is. Does it ring a bell who I might be talking about? :)
I also have a friend who was on the leadership team @ Zip2 and the stories I heard about Elon were hilarious. Unfortunately they remain unpublished, which is a shame, because they are really good.
I suspect it was management; after all, search engines were unprofitable and borderline charity work. It was not until the vision of capitalising through advertising that the whole market changed, and by then Google had outlasted and out-innovated the other `charity` offerings of the time.
Nah mate. Used AltaVista. The switchover had nothing to do with 404s or 'freshness'. Google was just easier in that day. You'd assume it was something technical that you could have altered but from a user perspective I highly doubt it had much relevance.
Two indexes: the supplemental index (refreshed frequently) and the main index (refreshed less often). The latter is the one that wasn't refreshed for an unusually long period of time.
All over. I didn't stick in search; I wound up working in online games for a long time. David Henke, who ran engineering and operations as a whole for a while, went on to Yahoo and then LinkedIn. Barry Rubinson went to Transmeta -- remember them? David Bills is at AWS now. Mike Burrows, who was and is an incredible engineer, is being brilliant at Google. Etc.
>> Zip2 allowed for two-way communication between users and advertisers. Users could message advertisers and have that message forwarded to their fax machine. Likewise, advertisers could fax users and users could view that fax using specific URLs
Ha, never heard about that - sounds contorted (nowadays) but somehow funny - so somebody in some company was sitting next to a fax waiting for something to come out of it, then when that happened the employee wrote the reply (scribbled on the same or another piece of paper) and faxed back the reply?
I wonder if I would like or hate doing something like that today - waiting for & finally seeing a piece of paper containing an unknown message coming out of a device sounds somehow fascinating... :)
> I wonder if I would like or hate doing something like that today - waiting for & finally seeing a piece of paper containing an unknown message coming out of a device sounds somehow fascinating... :)
Just get a job at any restaurant in Japan. Fax is the way nearly all to-go orders are placed. Fax machines are still massively popular there.
I suppose that could be one of the reasons, but I believe the main reason was google won over CS students in high school and college in the late 90s and early 00s. I was introduced to google in high school because it was the best search engine for looking up programming code for my CS classes. Better than altavista, excite, etc. And naturally the word spread, not to mention the people maintaining computers, labs, etc all set google as the default search engine.
Sadly, now google is a terrible mess of moderated and curated nonsense.
I think Google got its big break when yahoo used them and allowed Google to put a 'powered by Google' button. I clicked it and never stopped using google
From an IT desktop support perspective: IT was responsible for setting the default page to Google on thousands of machines through all our system images. This was because Yahoo's front page was an utterly hideous mess, and Google was the minimalistic portal to the internet that worked way better for the users in the companies I worked at then.
Yahoo advertising for Google on their front page is almost like that other rare, extremely lucky break, when IBM advertised for Microsoft in all its glory. Behind every billion dollar company there is an event that had a one-in-a-billion chance of happening.
Absolutely that helped in gaining a wider audience. But I suspect the reason yahoo chose google was because all their tech people preferred google because they all used it to search for code. Maybe it's my high school or my college, but google won over the CS crowd very quickly. Their clean interface and their ability to get the code we were searching for to help us with our homework really gave them a leg up.
Then their gmail ( especially the initial invite and storage ) and chrome made google the "cool" tech company and pretty much cemented their place in the tech world. Sadly, they've turned out to be monsters rather than saints and we are all the worse off for it.
I'm not happy about everything Google does, but I don't actually feel worse off. I use their search dozens of times a day, I rely on GMail and Google Maps, Google Docs is fantastic. It's a hard sell to convince me that I am worse off due to the existence and my use of Google products.