This was particularly bad because one of our earlier strong points was fresh indexes. Our ability to refresh the supplementary index on the fly was awesome. When you lose one of your primary strengths, it's noticeable.
I don't mean to minimize the downside of losing focus. That's one of the biggest lessons I learned while working there. I'd say that our failure to maintain a high quality index was directly caused by our loss of focus, in fact. But it's important to remember that both UI and the underlying index quality matter.
"What about PageRank?"
Eh, not actually unique to Google. Remember that Jon Kleinberg was developing HITS in parallel -- AV was well aware of the concept of measuring page importance using incoming links, and we had our own implementation. It may not have been as good. It's hard to tell when your underlying data source is stale.
Also, any AV article which doesn't note that we bought Elon Musk's first company is inherently flawed. ;)
My recollection is that Alta Vista supported boolean operators but defaulted to OR, while Google defaulted to AND. So searching Alta Vista for something like "$CommonWord $UncommonWord" would return high-ranking pages for $CommonWord that drowned out all the low-scoring pages for $UncommonWord, whereas Google would return the intersection (which would actually be relevant to the user's query). I suspect this default made a bigger impact on Google's success than any PageRank magic.
My theory on why this might matter even for people who knew how to use the operators is this: With OR as default, you would first try your query without operators, get page upon page of irrelevant results, and then start to narrow your query down. With AND as the default, you would type in the query, and if you only got a few irrelevant results, or often no results at all, you would try alternative terms instead.
It seems that progressing from no result to desired results by choosing alternative terms just makes more sense than having to wade through irrelevant stuff, and the default encourages one methodology over the other.
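The effect of the two defaults can be sketched with a toy in-memory index (the documents and terms below are invented for illustration, not anyone's real implementation):

```python
# Toy sketch of how default OR vs default AND changes a two-term
# query like "$CommonWord $UncommonWord". Documents are invented.
docs = {
    1: {"python", "tutorial"},          # common term only
    2: {"python", "snake", "care"},     # common term only
    3: {"python", "metaclass"},         # both terms
}

def search(query_terms, mode="AND"):
    """Return sorted doc ids whose term sets match the query under the given mode."""
    terms = set(query_terms)
    if mode == "AND":
        return sorted(d for d, t in docs.items() if terms <= t)   # all terms present
    return sorted(d for d, t in docs.items() if terms & t)        # any term present

# AltaVista-style default OR: every page mentioning the common word
# "python" drowns out the one page about "metaclass".
assert search(["python", "metaclass"], mode="OR") == [1, 2, 3]
# Google-style default AND: only the intersection survives.
assert search(["python", "metaclass"], mode="AND") == [3]
```

With AND as the default, an empty result list tells you immediately to try different terms, which is the workflow described above.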
Today, it's very hard to make Google return no results at all. Not just because the amount of content grew to an unimaginable scale in the meantime, but also because Google has become way, way fuzzier in the way it interprets search terms, likely to better suit a larger and different audience. A lot of times today, I have to switch to "verbatim" mode first, at least for technical stuff.
1. entering 'invisible marmalade teapot' and getting no results
2. changing to 'invisible marmalade teapot with tartan cosy', again nothing
3. so 'invisible marmalade teapot with tartan cosy in outer space', ditto
You: do you have that new crime novel in stock?
Bookseller: er, i don't know which one you mean?
Y: the new crime novel by john grisham, pelican something or other?
B: oh, right, yes, here it is!
Default OR in a search engine would mean my first example eventually starts returning results about transparent space coffee pots with tartan cosies: the first few terms are ignored, but the rest match. That's often helpful, particularly if you're doing an exploratory search for something where you aren't sure of the exact details.
I don't find your examples compelling.
If you want to do an additional search that does not depend on your first terms, you simply bring up a new search window.
In your 'real world' example, the equivalent search queries would go something like
search: new crime novel
result: way way too much stuff
search: new crime novel john grisham pelican
result: exactly the right book because every one of those terms applies
9130 webpages agree with you.
To switch on Verbatim, I click "Verbatim" on the left side of the google page, it appears just under the alternative, "All results". I think for other systems it can be well hidden in menus. I use it almost every time I google anything. Otherwise you get a load of irrelevant crap.
The reason being you can just search twice if you want either word, but in the vast majority of cases you want both words when you enter them in the searchbox. Most companies kind of split the difference and put the AND results first then fill in with OR results, but that mostly just leads to "if your answer isn't on the first page, it's not going to be on the second or the hundredth".
IMO this was one of the biggest contributors to the perceived quality advantage of Google vs AV.
Relevant xkcd about the “second page of Google results” effect:
It always took 2 to 5 searches using Altavista to get the results you were looking for. This was a huge improvement on Excite and Lycos which might produce infinite results with absolutely nothing relevant. There was a lot of noise like source code archives. With Google search the first page usually had a useful result.
Probably the biggest thing that destroyed Alta Vista was the horrible flashing banner ads at the top of the screen.
From the article:
> This move away from AltaVista’s streamlined search experience made AltaVista more similar to its competitors. Users gradually began to switch to a newcomer, Google, for the simple search they missed.
I don't think this was the case at all. People switched because they got better results.
And the guy who started Baidu (the Google of China), Robin Li, had created his own PageRank-like algorithm, and even patented it in the US, before Google (he filed for the patent in 1997; Google was founded in 1998).
However, Kaiser Kuo points out that Robin Li, the co-founder of Baidu, obtained a patent for hypertext link analysis before Larry Page obtained his “Page Rank” version.
I feel like that tells me where your percentage would lie...
EDIT: I apologize, it was intended as a joke to lighten the mood. It's never a good thing to lose one's job or have a company fail, even that long ago, so my response is to attempt humor.
> I feel like that tells me where your percentage would lie...
Are you responding to BryantD's admission of (widespread) bias by accusing BryantD of bias? This seems to add nothing to the conversation.
I see; I incorrectly read it as accusatory. Thanks for the very civil reply!
That's okay, I have a very dry sense of humor and straight-faced delivery in person as well. Which is unfortunate, because it means I can't blame the lack of tone online when jokes don't land; they often don't in person either.
Honestly, I doubt that was even Google's plan for the first years of its existence. It was more "Hey, we made this neat search thing, let's see if we can figure out a way to make money from it".
I don't know if the information is true, but I know I've heard it more than once.
In a way, it ranks a page by its own incoming links (which are ranked by their incoming links, etc.). It was possible to game AV by setting up 100 sites that point to your own.
In PageRank, you essentially had to convince already popular pages to link to yours.
For a while, Google effectively had no spam, and AV had lots. Eventually, spammers learned to game PageRank; the arms race is still on.
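The intuition above can be sketched with a standard power-iteration PageRank over a toy graph (the graph and all page names are invented for illustration; this is not Google's actual implementation). A link farm of pages that nothing else links to has little rank of its own to pass on, while one link from a genuinely popular hub carries real weight:

```python
# Minimal PageRank power iteration over a toy, invented link graph.
def pagerank(links, d=0.85, iters=50):
    """links: {page: [pages it links to]}. Returns {page: score}."""
    pages = set(links) | {q for outs in links.values() for q in outs}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - d) / n for p in pages}       # random-jump base
        for p in pages:
            outs = links.get(p, [])
            if outs:
                share = d * rank[p] / len(outs)      # split rank over out-links
                for q in outs:
                    new[q] += share
            else:
                for q in pages:                      # dangling page: spread evenly
                    new[q] += d * rank[p] / n
        rank = new
    return rank

links = {
    # Five real pages link to a hub, and the hub links to "popular".
    "fan1": ["hub"], "fan2": ["hub"], "fan3": ["hub"],
    "fan4": ["hub"], "fan5": ["hub"],
    "hub": ["popular"],
    # A link farm: three spam pages, linked by nothing, point at "target".
    "spam1": ["target"], "spam2": ["target"], "spam3": ["target"],
}
r = pagerank(links)
# One endorsement from a high-rank hub outweighs several links from
# no-rank spam pages, because the spam pages have little rank to give.
assert r["popular"] > r["target"]
```

Naive in-link counting would score "target" (3 links) on par with a page endorsed by one popular hub; PageRank does not, which is why the 100-site trick that worked on AV stopped working — until spammers moved on to buying links from already-ranked pages.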
It had been applied decades before to scientific paper references, as a measure to improve on the "number of references" metric, which is more easily gamed. References are rarely circular (only papers in preparation at the same time can form cycles, unlike web pages). I was sitting in a class about stationary processes in 1996 when the lecturer mentioned this (already old and well known at the time) use case as motivation.
Whatever AV implemented at the time, it was not on par.
PageRank is essentially a “universal authority score” (in the HITS terminology), and it worked well because at the time you didn’t have pages that were an authority for one subject and spam for another. You do now, which is why PageRank is now one signal out of 200, even though it was sufficient on its own 20 years ago.
This of course was also at a time when "whitehouse.com" was a porn site.
The lack of fresh index surely was a factor but not sure whether it was primary.
Obvious question - why did you not update the index? Was it that it was obvious Google was going to win and made people give up? (edit - never mind - you address this in other comments)
> The lack of fresh index surely was a factor but not sure whether it was primary.
I wasn't aware of index staleness back then, but I saw broken links; at the time I thought that was normal, that it takes time to crawl the entire Internet. With Google I generally got working websites.
I did like the boolean search in AV; it helped with obscure searches, especially when a name was similar to a typo of a popular word.
Do you believe Google's strategic decision to use commodity computers and hard drives gave them any competitive advantage (cheaper cost, scaling, etc) compared to DEC Alpha servers?
As an outsider, it seems like Google could iterate its data centers faster and cheaper and therefore, their web crawlers were cheaper to run (also run more frequently), also cheaper to store terabytes of data, and also cheaper to service search queries.
* I was also not exactly sober at the time, so these numbers may be a bit off. The number of wafers per chip being greater than 1, though, I am absolutely certain about.
With cheaper techniques, the idea is that the "more capital efficient" way of indexing the ever-expanding web would in turn provide better results for an improved consumer experience. It's the old adage of "do more with less".
For example, see the old Danny Sullivan graphs showing how Google's index was growing faster than AltaVista. Having a bigger index lets one return more relevant search hits.
AltaVista wasn't just falling behind in "staleness" of old indexes; the aggregate size of the index was smaller than Google as well.
It did apply, to a point. Before Google, I had switched to AllTheWeb as my search engine of choice since a lot of sites just wouldn't show up in AltaVista no matter what you searched for, and ATW had a bigger index (I guess staleness could have had the same result).
But of course eventually I switched to Google for the better search results.
I didn't know anything about Google until mid 2000, and when I used it, I just thought it was an AltaVista clone.
Fast forward to fall 2000, and once after getting bad results on AltaVista, I tried Google again. At the time, I never bookmarked either of the sites or set them as my homepage. I remember, when I used Google, I thought to myself, "I'm going to switch to whoever comes out with a browser toolbar. Search should just be part of the browser."
About a week later, I had some bad results on AltaVista, and typed in google.com. Immediately I saw the "try our toolbar" banner.
At that point, I switched to Google, and I NEVER went back to AltaVista. (I think sometime in 2001 I tried AltaVista again, out of loyalty, to see if they finally had a toolbar, but the results were so bad I was in shock.)
Surprised that it still exists! https://www.google.com/intl/nl/toolbar/ie/index.html - and the screenshot even seems to show a "share to google+" :)
>In the mid 1990s, . . . from a place that would come to be known as Silicon Valley
In the mid 1990s the name Silicon Valley had been used in mainstream media for at least a dozen years.
What interested me was how they engaged with government bureaucracy and yet managed, in effect, to define a work culture so far removed from governmental bureaucracy that you wonder what the early days were like.
I'm glad this has been mentioned. I noticed this at the time but never really knew if it was actually the case! Altavista had become somewhat annoying to use, and Google's relatively clean front page (it was more than just a search box in the earliest days, but still cleaner than AV's) meant it loaded quicker and sold me on both the speed and the fact that the results actually loaded.
Agreed. My previous manager had worked with Elon Musk at Zip2 (maybe even as the CTO). He used to talk about how peculiar Elon is. Does it ring a bell who I might be talking about? :)
I have always wondered what Zip2 actually was. Can you shed some lights?
Was it like a primitive version of Yelp?
Did the map work like Google Maps?
giving away a free demo of <company> prowess is cool, but it's a big reason for its death I guess
Ha, never heard about that - sounds contorted (nowadays) but somehow funny. So somebody in some company was sitting next to a fax waiting for something to come out of it, and when that happened, the employee scribbled a reply (on the same or another piece of paper) and faxed it back?
I wonder if I would like or hate doing something like that today - waiting for & finally seeing a piece of paper containing an unknown message coming out of a device sounds somehow fascinating... :)
Just get a job at any restaurant in Japan. Fax is the way nearly all to-go orders are placed. Fax machines are still massively popular there.
Sadly, now google is a terrible mess of moderated and curated nonsense.
Then their gmail ( especially the initial invite and storage ) and chrome made google the "cool" tech company and pretty much cemented their place in the tech world. Sadly, they've turned out to be monsters rather than saints and we are all the worse off for it.
+noir +film -"pinot noir"
(Edit: scale and diversity of input data but also audiences)
Google regularly fails to include all words I searched (even if it's only three or four), often retrieving completely useless results. I doubt it's due to incompetence; I take it as a signal that they're now struggling to match the volume and characteristics of the data they have to ingest to an adequate user experience.
For a large number of people, Google's ability to answer the underlying question, rather than explicitly identify pages where all search terms appear, means it works better. If you think of Google as a way to get answers, this is good.
If you think of Google as a search engine, and particularly if you have historical experience with (and expectations of) search engines, this is very frustrating. And the workarounds of clicking the "must contain" link (or surrounding all of your search terms with quotation marks) are a seemingly unnecessary inconvenience.
As a personal anecdote, I was an early adopter of smartphones (particularly relative to a non-technical audience). So I was excited when I could speak to my phone, then disappointed when I discovered that I had to structure my queries and instructions very carefully.
A few years ago I was on a road trip with a very non-technical friend. We decided to stop for Chipotle. Had it been up to me, I would probably have pulled out my phone, opened Google Assistant (or perhaps Maps directly), and told my phone (speaking as clearly as possible) "navigate to the closest Chipotle" or something similar.
But I was driving, so she just pulled out her iPhone and half-shouted "I want a burrito!" at it. And that worked just fine.
Point being, I had expectations for how things should work based on interactions with earlier iterations of an interface. She didn't.
Google really needs to develop a "pro mode" search engine that works for this use case. I get the need for an "answers engine" for less savvy users and more casual use cases, but it's a massive company. It can afford to execute two products in its core competency (rather than umpteen messaging apps that it will kill, along with a lot of other useless and/or doomed stuff).
It keeps the power users on the site so they won't have to look for an alternative. If power users find something better, they might influence non-power users to move to the other site.
Google returned exactly ZERO results about TPUG. All of the results were about dogs.
That's either very quick turnaround to fix, or a deeper mystery!
Two people sitting at machines next to each other can perform the same search and get different results. It's what Google's spent billions of dollars on.
Plus, they're inadequate.
Search engines being 'about' a concept isn't a new thing. Belew's book from 2008 is called "Finding Out About". The dream is that the search engine can work out what a document is about, and what I'm thinking about based on the content of the document / query, and match them up.
The new thing in your example is that Google has gone beyond documents into burrito restaurants, but it's not such a huge leap.
Maybe new advances have brought new algorithms that are somehow better at finding and modelling those abstractions, so the search engine is no longer a recognisable vector space model with predictable proxies. Even if that _is_ the case, they should be able to answer a query I have made, on my own terms.
That is awfully generous of you ...
Google shows us what it shows us to maximize advertising revenue. They need you to keep clicking and generate hits on adwords-encumbered websites. Showing you zero (or one or a handful of) results for your search query is counter to this goal.
Showing you a batch of results with no adwords-encumbered sites in it is also counter to this goal.
I don't think they're struggling with anything at all - they are optimizing for paid clicks and precise search results is, at best, a very distant second priority...
I figured it was due to the user base of 2019 being very different than that of 2010, and Google adapting to the fact that most of their users aren't technology literate and cannot formulate clear search queries, so they just try to guess what might be of interest to them.
That's fine, it's their business, but it makes the virtual monopoly even more painful.
The wealth of user traffic is also what no other search engine can replicate, due to Google's market share in web searches.
When Google receives a search query, it first broadens the search phrase. The user's clickstream and search refinements are helpful both in training the model that does the broadening and in weighting the search contexts, for narrowing down what should actually be displayed on the front pages.
Exactly! Search engine performance can be assessed by measuring precision and recall. Full text search engines have really high precision. Additionally, when users have been socialized with full text searches, they've built a model of how the search engine works ("it will find documents which contain my search phrase"), so false negatives are perceived to be less severe, as they can be readily explained by the model. "Ah, this document about helicopters contains 'Apache', no wonder it's in the results. I'll add 'webserver' to narrow it down." (And experienced users will already start off with all necessary key terms.)
While full text search engines have high precision, they also have bad recall. Recall can be improved, but there is a tradeoff when tuning the algorithm: to increase recall, the search context is broadened. That necessarily decreases precision as well, because there is no way the search engine is always correct when adding context. Where at first every document on the front page at least contained the search term, now there is not even a good explanation for why some documents were retrieved. And the more precise the query itself (something we learned by using full text searches), the higher the probability of misclassification, and the worse the effects of broadening. The relevant results are somewhere in the list, but now every second result on the front page is from the wrong bucket. And with no explanation, those false positives weigh heavily for us users from the old days.
 Precision is the probability that a random document in the result set is relevant. Recall is the probability that a random relevant document is in the result set.
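As a quick illustration of those two definitions (the result set and relevance judgments below are invented numbers):

```python
# Compute precision and recall for a retrieved result set against a
# known set of relevant documents (both sets are invented examples).
def precision_recall(results, relevant):
    results, relevant = set(results), set(relevant)
    true_positives = len(results & relevant)
    precision = true_positives / len(results) if results else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# 4 retrieved docs, 3 of them relevant; 6 relevant docs exist overall.
p, r = precision_recall(results={1, 2, 3, 4}, relevant={2, 3, 4, 5, 6, 7})
assert p == 0.75   # high precision: most retrieved docs are relevant
assert r == 0.5    # lower recall: half the relevant docs were missed
```

This is the full-text-engine profile described above: what you get is mostly right (high precision), but much of what exists is missed (low recall), and broadening the query trades one for the other.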
"noir" "film" -"pinot noir"
I wonder whether they'll revert back now that google+ is dead.
This really confused me, because I was trying to find something specific, and it found a few emails, so I read them, and then was confused that they didn't actually talk about the specific thing I was trying to remember. Then I realized that they only included one of the words from my string.
In this case, a close match was completely useless, and ended up wasting my time reading irrelevant results. A message saying "we didn't find that, but here's a few close matches" would have been more helpful and avoided wasting my time.
The reasons in this thread about non precise search results are half the reason I stopped using Google Search about a year ago. I get why they've done it, but I don't like it.
I either use duckduckgo or google in another browser that I don't normally use.
Give it a try - I run CookieAutoDelete so it is 'theoretically' a clean search each time if that makes any difference.
I remember trying to google something about Sufism and its relationship to mysticism and the occult...and google brought up a bunch of results from right wing conspiracy websites claiming that Islam was related to the "New World Order".
It was my first exposure to how much work could be saved by working efficiently with unstructured data! Of course, Google took this to a whole other level, realizing that for common users queries themselves should be treated as unstructured data! Learning as much as you can from how people already express themselves is one way to write the future, it turns out...
I can't. The plus operator meant that the word you applied it to must be in the search result. Quoting the word does not do this for me.
AFAIK putting a term in double quotes does the same thing, but I am unsure if the implementation is really the same; the effect seems to be.
Paul had been in Italy at a trade show; when he returned he talked with Brian Reid, who headed the Lab, about the need to find a demonstration project that showed off the top-of-the-line Alpha computers.
Paul and Andy and I used to have lunch together frequently. At one point, just after Paul returned from Italy, we were talking about Internet search at lunch. I'd been using the Magellan search engine and had some comments about how useful a better search engine would be.
Andy began talk about the problem and sketched out how a better search engine might work and how a web crawler could gather the information needed to do an index.
Paul listened and then went back to his office and enlisted Brian Reid's help in resourcing a search engine project. Brian got Louis Monier and Mike Burrows involved. The three of them did the hard work of reducing the concept to a real program running on an Alpha computer.
Alta Vista was an instant success, with Alpha computers overflowing Paul's office and cluttering the hall nearby as the team worked to satisfy market demand for search.
Google changed the world. What used to be buried on the 3rd or 4th page of Altavista results, if it appeared at all, was suddenly front and center. (Yahoo’s directory was rarely useful at all.)
I’m not a fan of Google today, but for years I would tell everyone I knew about it.
(Edit: fixed the year.)
I say this because I’ve been wondering if we now have a Google snapshot of the web instead of AOL's homepage. Don’t get me wrong, search is much better than a more or less static homepage of topics.
Are we all in a Google search bubble?
We're definitely in a Google bubble. It becomes very clear that they have intense control over what shows up on the first page of searches, especially in their Featured Snippets and Carousels at the top, and their native-looking ad results.
Control over search results is incredibly powerful in terms of anything from influencing the zeitgeist, to controlling marketing efforts at a grand scale, through to straight propaganda.
We really run an incredible risk as a society by putting too many eggs into the Google basket. Using their browser to use their service to consume their results means a complete monoculture; and while they're not really visibly abusing it now, it's clear that they can subtly manipulate things for a long time before they get caught, and they have the platform to be able to do far more should they (or any government actor forcing their hand) decide they want to.
It just sounded so much like the old AOL Keyword feature, which a lot of people forgot about, but was literally often advertised on TV or radio as how to get to a given site or web feature.
The only thing I think is really wrong with search (and all major search engines right now are guilty of this) is making paid ads look very similar to real results, which makes it possible to pay to hijack a result.
I can't remember that happening in years with Google. Now, it's unusual if the thing I'm searching for even appears in the first page.
Back in those days, I was a fan of the Yahoo! directory approach. The web was still small and explore-able, so a directory actually was useful if you didn't actually know what you were specifically looking for.
Meanwhile, if I did know what I was looking for, AltaVista gave me endless pages of random links that were only vaguely related to my search query. I don't think I ever found myself liking it much.
Gradually they became invasive and overwhelming, to the point that I have a slight anxiety regarding anything Google in the news.
babelfish.altavista.com, anyone remember that?
The Babelfish being Douglas Adams' fictional fish that you stuck in your ear to use as a universal translator.
there was also a fake domain called alta-vista.com that was very much of the goatse variety.
The name always amused me, partly as an homage to the Terminator franchise, and partly to Altavista.
I was delighted to discover that the translations weren't always symmetrical, with the best example being going from English to German then back to English with:
"I'm going to kick your ass"
"I will step on your donkey"
There was a way to embed maps in a web page and provide a bunch of points of interest to overlay on the map.
Metricom was using it to provide coverage maps of their pole top box locations, for their spread spectrum wireless mesh radio network (it was rolled out in the Bay Area around 1994-1996 or so).
I remember being impressed by how cool and powerful (and generous) it was for one web site like Xerox PARC's map viewer to provide dynamic map rendering services for other web sites like Ricochet's network coverage map!
Then a decade later, along came Google Maps in 2005.
A particularly innovative use of the map service is the U.S. Gazetteer WWW service created by Brandon Plewe [Plew1]. It integrates an existing Geographic Name Server with the PARC Map Viewer. A user simply enters a search query (e.g. the name of a city, county, lake, state or zip code) and a list of matching places is returned as a formatted HTML document. Selecting from the list generates another HTML document consisting of two maps (small and large scale) with the location highlighted (using the Map Viewer's mark option). The server in New York does not generate or retrieve the map images, since they are direct references to the HTTP server at Xerox PARC. The user's WWW browser retrieves the map images from the server in California and displays the complete document to the user.
Place a mark on the map. ",mark_type" (1..7) and ",mark_size" (in pixels) are optional. Multiple marks can be separated by ";" (see example below).
Specifies marks for Palo Alto, California and Pearl Harbor, Hawaii.
It was just 56k. I seem to recall back in the day people appended the ".6" for no apparent reason other than it sorta seemed logical after we had 14.4k, 28.8k, 33.6k, and then.. all of a sudden.. 56k.
(and, if I recall correctly, for technical reasons it was really only 52k, and even then, only if you were lucky, usually it was less.
I seem to recall that the equipment was theoretically capable of 56k but in actual implementation, even a perfect POTS system wouldn't do over 52k, and in the real world, it would usually negotiate lower due to distance from CO, quality and number of connections and equipment in between, etc).
Neither did I! And I'm a bit humored to see other people pronounced it like that too :) I wonder what causes it, regional dialect or something? I'm part-Mexican, grew up bilingual and it just sort of happened without thinking about it, I discovered 'warez', immediately pronounced it "Juarez" (without the Spanish jota inflection) and it took 30 years (today) to learn it's 'wares'.
That said, being a nerd of a certain age range, I saw "digital.com" and I got very excited about something I didn't get when I actually clicked on the link ...
Not sure without a bit more digging whether they honor explicit rules for their crawler.
Edit: (but a little looking at comments indicates that they don't, and notes that ia_archiver is Alexa, not the Internet Archive)
* There were a bunch of early "full-text"-ish search experiments, but Lycos best proved its immense value and potential first.
* Then, AltaVista arrived with breadth & speed beyond what had previously been possible. Still, it required a bit of expertise to craft your queries.
* Then, Excite burst ahead with a quality breakthrough. Something about their use of HTML-styling & in-link text meant that even with fewer sites, the results were much better.
* Then, Hotbot (powered by Inktomi) had an era of best mix of fresh-content, deep-content, and quality ranking.
* Then, all those pioneers dropped the ball, in one way or another, letting Google out-rank, out-crawl, and out-business-model them. (And much of Google's business-model was pioneered by "Goto", later "Overture".) Sad, really, especially how many coulda-shoulda competitors eventually died inside Yahoo. (Including Overture.)
I loved the seeming technical excesses of Digital so much. Nothing they did seemed as calm and staid as Cray, Sun or even SGI.
Think about that name - Alpha Server 8400 Turbo Laser
I honestly didn't expect Google to have survived the dot-com bust back then. This was before they figured out online adword auctions.
So, funny thing: Back then, if one search engine didn't have what you were looking for, you would try another. Now, if Google doesn't have what you are looking for, where do you go? Does it mean the answer does not exist on the Internet? Do you try Bing or Duck?
There are effectively only two competitive search engines in the U.S. and European markets today. In those markets, Bing is the only market pressure keeping Google honest.
It was probably subjective, but IMO Google wasn't better initially; it did end up being the best, though. Even though I use Bing or DDG, Google still has the best search engine.
That's why I moved from Altavista to Google. Google had a minimalist homepage and results page which were focussed and loaded very quickly compared to the bloated pages (for the time) of Altavista and Yahoo.
Between AV and InfoSeek (which I used the Netscape plugin to get a browser search bar for) I was seeing better results than Google until sometime in the mid-00's.
Sure, if you wanted to skim the surface of what was available on a given topic or just wanted the most popular links, Google was fine in the 90's and early 00's; but if you were deep diving or wanted something more obscure, you needed other search engines.
So I searched for ‘Amiga games’. It came back with a screen full of junk.
I looked at my friend and said ‘none of this has anything to do with Amiga games’.
He said no you need to ‘+computer’ ‘-Spanish’.
I said to him, ‘whose crappy idea was this?’
Needless to say, we all got better at boolean searches, and Altavista was light years ahead of Yahoo.
Google was a welcome change, made the internet a cool place for about 10 years.
If other IPFS users wants to pin that to distribute the load, that'd be great.