A Relevant Tale: How Google Killed Inktomi (diegobasch.com)
350 points by nachopg on May 3, 2012 | 110 comments

Great article. One note: Google's dynamic abstracts were not only very useful, they also improved perceived relevance because they let users see why the pages were selected.

When I was at Altavista, we were also blocked from doing dynamic abstracts by cost.

Google's main advantages were:

- managed by the founders with a total focus on search and measurable results

- google's hiring process produced a very strong team early on

- strong focus on controlling costs from the beginning (Altavista's use of the DEC Alpha was a huge handicap).

Close. Lots of other companies were also hiring pretty high tier talent as well, and had intense focus. Google's success came down to effectively executing across typically disparate disciplines. You have hard core research level CS eggheads, you have top tier software engineers, and you have state of the art data center operations. In a typical organization these groups have competing interests, they fight amongst themselves and in the end some sort of compromise is reached that allows everyone to grudgingly get along.

At google these three groups worked hand in hand and complemented each other's work. The eggheads came up with page rank, the coders figured out how to make pagerank scale through massive parallelism via sharding and mapreduce, and the data center folks figured out how to make sharding cheap and fast through commodity pc based servers and massive amounts of automation for management. In the end everyone was working at the top of their game to help everyone else. The result was that google was able to deliver better results (pagerank) faster (mapreduce) and cheaper (automated commodity hardware datacenters) than the competition.

There were lots of other fine details that led to google's success, but in the end those core factors are what allowed them to deliver a better search experience to users (better/faster) and to be more competitive in the marketplace (lower cost per search means more profit even with lower per search ad revenue).

No one else in search was pushing on all the right pressure points the way google was, and the rest is history.

Excellent point.

From the article: "In short, Google had realized that a search engine wasn't about finding ten links for you to click on. It was about satisfying a need for information. For us engineers who spent our day thinking about search, this was obvious. Unfortunately, we were unable to sell this to our executives. Doug built a clutter-free UI for internal use, but our execs didn't want to build a destination search engine to compete with our customers. I still have an email in which I outlined a proposal to build a snippets and caching cluster, which was nixed because of costs."

The engineers here had more than an inkling of what needed to be done. The problem was this didn't go through the entire company.

I agree, but I'd ascribe that to Google being run by technical founders rather than MBAs. The main benefit of technical company leadership is the ability to "see" across and coordinate the disparate areas.

Infighting and begrudging compromises only happen when the leadership is blind to the details.

Having technical founders is a necessary but not sufficient condition, I think. Lots of companies have had technical founders who haven't managed a level of success as impressive (regardless of scale) as google.

The word "egghead" is rather demeaning, perhaps we can refrain from using high-school insults here?

If black people can call themselves niggers and not be insulted, can geeks call themselves eggheads and not be insulted? Not that I condone black people calling themselves niggers, but if anyone can, they can. So why not geeks and the word egghead? Heck, even geek was a high school derogatory word.

When Altavista launched, it was an impressive showcase of the DEC Alpha's power. Intel only became usable for serious servers (with the exception of exotic stuff like Sequents), as did Linux, years later. Google had the good fortune to be in the right place at the right time, when Lintel became a commodity in the datacentre. 5 years earlier, they'd have been on Sun probably.

Altavista launched in 1995 and Google began as a research project in 1996. At my own startup, in 1996, we used Intel because with Sun servers you paid an extreme markup for unnecessary reliability.

I was VP of Engineering at Altavista in 2000, and I started the project to move to Linux. It wasn't easy because search engineering was populated by Alpha fans who were unswayed by the 10x cost advantage.

As late as 2001, I sat in multiple focus groups where all the enterprise customers said Linux was not yet ready for the datacenter. IBM's penguin campaigns were just beginning at that time.

Google's large scale use of Linux was groundbreaking when they launched in 1998.

Initial versions of Google at Stanford ran on a mix of both Linux and Solaris.


I'll bet you a dollar that Larry and Sergey never actually bought a Sun server to run the search engine, but rather that they may have used some free resource available to them at Stanford.

> When Altavista launched, it was an impressive showcase of the DEC Alpha's power.

Altavista was started by Paul Flaherty. One of his jobs was to find some way to showcase Alphas.

Do you think that would have affected their prospects for success?

It would have affected their cost base certainly, and probably their entire datacentre strategy. With SPARC kit, you wouldn't build assuming that machines will often fail and simply be swapped out, for example, something that Google is famous for.

> With SPARC kit, you wouldn't build assuming that machines will often fail

In the late 90s SPARCs did fail. Yes, they were more reliable than commodity x86 boxes, but they failed often enough that it was an issue if you had 100 or so, and search engines hit that level very quickly.

Right, but look at what Google do, their boxes are basically disposable. Why invest in dual-redundant-hotswappable-everything boxes when you just throw the entire thing away if any bit of it breaks, 'cos it's cheaper to replace it than to even try to repair it in-place?

> Right, but look at what Google do, their boxes are basically disposable

We're talking matter of degree.

The claim was that building a search engine out of 90s sparcs meant that you didn't have to worry about things dying.

That claim is not true - reasonable search engines of that era required enough machines that the failure rate of 90s sparcs, while better than x86 of the time, was enough to require folks to handle frequent failures.

It's reasonable to argue that the cost/benefit tradeoff of sparc's extra reliability vs x86 wasn't worth it for those companies, but that's a different argument.
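A rough back-of-the-envelope sketch of why fleet size matters here. It assumes independent failures and a purely hypothetical per-machine annual failure rate; the 2% and 10% figures are illustrative placeholders, not measured numbers for 90s SPARC or x86 hardware.

```python
def expected_failures(machines, annual_failure_rate):
    """Expected number of machine failures per year in the fleet."""
    return machines * annual_failure_rate


def prob_at_least_one_failure(machines, annual_failure_rate):
    """Chance that at least one machine fails in a year (independent failures)."""
    return 1.0 - (1.0 - annual_failure_rate) ** machines


FLEET_SIZE = 100           # "100 or so" machines, as mentioned upthread
RELIABLE_RATE = 0.02       # hypothetical rate for the more reliable hardware
COMMODITY_RATE = 0.10      # hypothetical rate for the cheaper hardware

reliable_expected = expected_failures(FLEET_SIZE, RELIABLE_RATE)
reliable_any = prob_at_least_one_failure(FLEET_SIZE, RELIABLE_RATE)
commodity_expected = expected_failures(FLEET_SIZE, COMMODITY_RATE)
commodity_any = prob_at_least_one_failure(FLEET_SIZE, COMMODITY_RATE)
```

Under these made-up rates, even the more reliable fleet should expect a couple of failures a year, so the software has to cope with dead machines either way; the cheaper fleet just sees them more often.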

The better abstracts are the reason I use DuckDuckGo at home.

If I just want to know when the next episode of Big Bang Theory is out or what the weather is today I rarely need to even click on a result. For more obscure technical searches at work, Google still finds more answers.

But remember - the barrier to change for a search engine's customers is very very low

With the new DuckDuckHack project as well it does make it a lot easier for very quick 'cheap' results. More complex queries I do seem to find myself !g'ing them. Getting better though, improved over the last 3 months I've been using it.

What search query tells you about the next scheduled episode of a TV show?

Well, color me amazed. I always just went to Wikipedia's "List of <name> episodes" page, but this will save me a lot of time. Thanks!

Weird - I was totally blind to the top box - I ignored it because it was too banner like and kept scanning the results.

Note to DDG - put the cool useful bit Under the place we all expect ads to be.

I very often miss the top box too. I have learned the relevancy of this top box, but I still miss it very often. My eyes usually go right to the link with the "Official site" tag. I've heard the effect is called banner blindness.

However I use adblock on every browser. Therefore I have less training than others to ignore ads. When I see one, it just hits me harder (it's a side effect of adblock).

Pretty depressing. At least on Google there is a top-10 result for the actual big bang theory, instead of irrelevant pop culture garbage.

I think most people refer to the beginning of the universe simply as the "big bang", which produces the results you'd expect.

(Whatever you do, don't search for The Postal Service.)

Inktomi killed Inktomi long before Google helped put the nail in the coffin.

What the article doesn't say is Inktomi had a dual sided business. One side was in Caching Proxies the other was licensing a search API.

Inktomi decided to focus on the caching proxy business and de-emphasized their search product, only to watch the proxy business evaporate as internet bandwidth became cheaper/better.

The focus on a shrinking market (proxies) and the lack of focus on growing market (search) killed them. Had search been a priority from the beginning things may have ended very differently with Inktomi creating their own front end.

Thanks for that. I too remember Inktomi much more as the "cache kings" with licensed search a secondary focus.

Indeed. Inktomi also tried to position themselves as an arms supplier to the CDN business. It didn't help that the CDN business basically disappeared from 2001-2004, and that CDNs, to this day, rarely buy software.

I was going to mention this. It seemed like the management at Inktomi let it fall once the engineers started using Google search engines. Their response is a likely bellwether of the attitude of the time.

Another lesson from this article: If your engineers refer to those who make decisions as "the execs" instead of "we": Quit.

> It was clear that Yahoo.com was the definitive result for the query "yahoo" so it would score a 10. Other Yahoo pages would be ok (perhaps a 5 or 6). Irrelevant pages stuffed with Yahoo-related keywords would be spam.

As someone who worked on search quality at Google for some time, this bit jumped out at me as a terrible mistake. The correct way to judge results for the query [yahoo] is:

(a) Where is yahoo.com? At the top?

(b) There is no (b).

It seems like a slight difference, but it leads to the wrong priorities. For the query [yahoo], it does not matter if spam or non-spam is in spot #5. The only thing that matters is where you put yahoo.com.
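One way to make that rule concrete is a metric that depends only on where the navigational target lands and ignores everything else on the page. The sketch below uses reciprocal rank; the function name and the sample result lists are made up for illustration and are not anything Google or Inktomi is known to have used.

```python
def reciprocal_rank(results, target):
    """Return 1/position of target in results (1-based), or 0.0 if absent."""
    for position, url in enumerate(results, start=1):
        if url == target:
            return 1.0 / position
    return 0.0


# Hypothetical result lists for the query [yahoo].
# All-relevant pages, but the target is not on top:
clean_but_wrong_order = ["finance.yahoo.com", "yahoo.com", "mail.yahoo.com"]
# Target on top, spam underneath (placeholder domains):
spammy_but_right_top = ["yahoo.com", "spam-one.example", "spam-two.example"]

score_clean = reciprocal_rank(clean_but_wrong_order, "yahoo.com")
score_spammy = reciprocal_rank(spammy_but_right_top, "yahoo.com")
```

Under this metric the second list scores higher despite the spam in spots #2 and #3, which is the priority being argued for; a per-result 0-10 grading scheme averaged over the page could rank them the other way around.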

Can you elaborate a bit more on this? I think I understand your point but a few more details would be great. Thanks!

Most people don't look farther than the first 3 results, so what comes after that isn't very high priority. Might be the reason some more specific/obscure queries on google return terrible results :)

More specifically, if people find what they're looking for in the first result, they ignore the rest.

I just realized that Google won on search the way Apple has won on smartphones. They control the full stack -- frontend, relevance, indexing, advertising -- and tightly couple these pieces. Inktomi couldn't control the user interface the way Google can't really control the interface on Android.

You are right, but IMO you miss the real point. Both Google's search and Apple's iPhone were about delivering a wholly satisfying product/service, and controlling the whole stack was needed to do that for those cases. This is not always true (though it often is).

Yeah, it's pretty hard to know when it is and isn't. Vertical integration is as old as business.

Of course, Google's search was about that, while the iPhone still is. And the iPhone has been improving a lot faster than Google's search.

Relevant: http://youtu.be/E91oEn1bnXM

It's a recording of a very very good talk by Inktomi co-founder Eric Brewer called "Inktomi's Wild Ride - A Personal View of the Internet Bubble"

Yes, that talk is excellent. Deserves its own submission.

Thanks, very informative to watch.

Most interesting to me is that Inktomi had all the power to beat, acquire or replicate Google but didn't have the right mindset. They were operating under a few bad assumptions:

- search is a commodity for licensing (making them resistant to launching a "cleaner" engine that would alienate their clients)

- what worked for a smaller internet (100 million pages) could scale appropriately with the growing internet (100 billion pages) without rethinking everything

- "Page rank" only helped relevance (it was also about spam)

I think Google is stuck in a rut of their own right now. Here's some faulty assumptions I think Google is making:

- users always want faster, more direct answers (rather than controlling the filtering/categorization of their searches)

- users want Google to predict what they mean rather than clarify what they mean

- algorithms > human decisions

> - users always want faster, more direct answers (rather than controlling the filtering/categorization of their searches)

That's a very power-user centric attitude, don't you think? As a power user I preferred to type long, complicated Sabre queries to find exactly which airplane flight I wanted. It was much faster, and I had memorized all of the complicated mnemonics. But that's not what a casual user would want to use.

Asking users to specify categories for what they want means requiring a certain orientation in their thinking which is shared by computer scientists and trained librarians. But to an average user, that's extra work. And think about how this might work if you're talking to an actual human librarian: if you start asking about TV shows, and then mention "The Big Bang Theory", do you think the librarian will ask you, "Did you mean the scientific theory, or the TV show?" That's only something a stupid computer would do. A smart librarian would take the context of the previous queries that you've made of him or her, and provide the right answer quickly and efficiently. Wouldn't you want the same thing from a search engine?

To be fair, faster answers + the ability to undo a do-what-I-mean guess lets me correct Google's assumptions pretty fast. The tools at the left allow for some quick refining as well; that's pretty useful when I need particularly fresh results, or a time window from when some news was breaking. And the fast completion is useful to refine a query before it returns results (though occasionally annoying when it erases quotes and the like).

After I had switched to Google, I never understood why all of the competition just disappeared over night. You would think they would have given it a fight, but that never seemed to happen. At least this article gives a little insight to that. I still wonder what happened to Altavista.

AltaVista tried to jump on the 'portal' bandwagon. I remember at the time how stupid I thought they were, trying to beat Excite and Yahoo at something that was already old hat, a concept that the Internet had outgrown. Then they screwed up pretty much the same way Inktomi did.

AltaVista still exists. It's awful, and it's powered by the Yahoo search engine. Which is pretty much the same thing, I suppose.

Yahoo! bought Overture (which owned AltaVista) way back in 2003. They only replaced the search results with Yahoo!'s a year ago after announcing the site would be shut down. Seemed like they were using it as an occasional test bed in the meantime.

Yup, when I worked at Yahoo back in 2006 I remember altavista being described as a useful site for running experiments.

And isn't Yahoo search now powered by Bing? Tough game, search, these days when the no. 2 search engine is highly unprofitable.

None of their competitors realized how much money Google was making from search until it was too late.

I had no idea that Diego worked on Inktomi. Although it makes sense why IndexTank worked fantastically.

If I had to pick one reason why Google triumphed (and you can only pick one), I think it would be their Page Rank algorithm. It added that extra bit of awesome-sauce to an already tasty stew.

I'd say: The Google CEO's have the point of view from an end user.

The Inktomi managers had the point of view of a generic MBA.

Everything followed from that difference of view.

I would modify that to read (since this article even stated that the engineers there found this to be obvious):

>The Google CEO's have the point of view of an engineer.

And, more generally:

>The Google CEO's have the point of view of the people doing the actual work.

The lesson to take away from this is that one shouldn't try to manage what one can't do themself. The disconnect between the manager and the problem domain becomes too great and they end up making ridiculous decisions since they are acting on the wrong information.

I'd disagree on a couple of points.

First, engineers very often build crappy products when left to their own devices. The ones that they do the best at are ones where they are also the users and are more or less representative of the target audience. Google's a great example, and so is Firefox. So I think Google execs' perspective as a user was much more important. Consider as contrast Google Plus, which definitely was not built because someone at Google was an avid but dissatisfied Facebook and Twitter user.

I think your second point is almost right. You should never try to control things you don't understand. It's ok to manage things you don't understand, because good management in that case is not directing the people who understand, but supporting them in achieving common goals.

As a non-tech example, few hospital administrators can perform brain surgery. But that's fine as long as they ask the brain surgeons what they need rather than telling them how to work.

Did Google actually have revenue at that point in time? They bought Applied Semantics (for adsense / adwords) in what, 2003? (edit: mlinksva points out that they introduced adwords at the end of 2000)

Inktomi management would probably have had to raise capital on a risky pivot whilst at the same time dropping all of their existing revenue streams in order to compete head to head with Google who at that point didn't even have a way to monetise their technology.

That's a hard thing for any company to do: In this case it would have been the right choice, but it's far easier to say that with the benefit of perfect hindsight.

AdWords launched in 2000. Google bought Applied Semantics in 2003 and launched AdSense.

According to a friend of mine who worked on the search team, Inktomi shifted its (management and CapEx) focus away from search and onto other projects. He thought at the time that even with the constraints of not competing with their own customers, there were things they could have done to better compete if management had chosen to do so.

Edit: fixed typo

Yes, that's true. They didn't think search would be a huge business. Back then the model was to sell search to portals charging by query volume, and it was a race to the bottom. Our Solaris servers were more expensive than Google's Linux boxes.

What other projects did Inktomi have? I thought "white label" search was their main focus. I guess that might explain their decline.

After reading the "Tale" it appears that Inktomi killed itself. It is a good example of what happens if top dog companies fail to innovate in the face of sudden superior competition. RIM is another example, but it would be wrong to say that Apple is killing them. Apple is just making and selling superior products.

> Apple is just making and selling superior products.

Agreed. If I had a nickel for every time my Blackberry crashed since I upgraded the OS a year ago, I'd buy some Apple stock.

Not competing with your customers is a not-invalid reason, tho' there are real-world examples that go either way.

True. In this case, I forgot to mention that not having a world-facing UI deprived us of vital signals that Google used to improve their experience. We had query logs, but we had no idea of what our customers' users were doing within a session, no user history, etc.

We had a world facing UI, and all the insight that produced, and we still weren't able to sway management that there might possibly be a profitable search business.

In the age of napster people were using our service-- and falling in love with it-- finding music on the internet.

Nope, no money there. Better to sell to ISPs in Canada and hope they integrate our results with Inktomi's.

Think I would qualify his key takeaway somewhat:

"Are there any lessons to be learned from this? For one, if you work at a company where everyone wants to use a competitor's product instead of its own, be very worried."

This is because companies sometimes (maybe often?) ban the use of competitor products to their detriment.

That was a good read. I remember I used Yahoo for searching the web. Due to the relevancy factor I moved to Altavista (but it didn't improve anything until the day I found out about Google, which I still use). I didn't know that Inktomi was powering search at that time. If Yahoo was so dependent on Inktomi or Google for its search, I wonder why they didn't work on search themselves. After all, they were an information organization tool. Why did they ignore such a huge market? VCs were going crazy funding search engines, and a number of search engine companies were either getting funded or going public. Based on these signals and the traffic they had during the dot-com era, they could have easily built a substantially good search engine; yet they ignored it. Can anyone shed light on it?

Yahoo owned Overture so their results were pure pay-for-placement. If anyone wanted to pay $100 per click to have the #1 spot on "beef" go to a site about chicken, that was a-OK with Yahoo management. When Yahoo realized their auction system was stupid, they had "project Panama" which was also a joke and by that time Google had the market to themselves anyway.

Inktomi engineers using Google reminds me of TI engineers using HP calculators. TI banned them, which apparently just moved the calculators under the desk

When I started at Yahoo Search (in 2005; it was Inktomi), I quickly learned that "just Google it" was frowned upon, and modified my vocabulary to say "just use web search". To this day, I still use this phrase.

Most engineers that I knew in YST did not use Google at all. We preferred to eat our own dogfood, and filed query triages against bad results (and only used Google to compare).

Ok, so the timing on this is really amazing. Techcrunch, reporting on Facebook's S1, mentioned that Yahoo! has suggested another 12 patents they may try to throw against Facebook and their Open Compute project. The Yahoo! letter is here: http://www.scribd.com/doc/92280387/TechCrunch-Letter-From-Ya... and the source of many of these patents? Inktomi!

Is that crazy or what?

Why I switched to google.

A) it was fast, it loaded fast. B) it was not filled with ads and pop ups.

Only one of those is still true.

I suspect quality results also played an outsized factor in most people's switching, even if they don't admit it.

Easy test: are you using Bing now? (with their new clean results pages)

Oh yes. I remember back in early 1998 I was using Copernic to aggregate and filter results from a whole bunch of search engines, because individually their results were so poor and irrelevant. Then I discovered http://google.stanford.edu and never looked back.

Same here except I used dogpile.com

It's not that Google's page is completely cluttered now. I remember using web portals at that time to search the web and then Google. The difference was day and night. Portals were that bad.

I was amazed -- even at the time -- that anyone thought this (http://web.archive.org/web/20000229120749/http://www.altavis...) was a good idea.

I'm really glad UX thinking has progressed.

> In the next year and a half the stock went down by 99.9%. In the end, Inktomi was acquired by Yahoo for 250M.

This is unrelated to the main point, but does anyone know if Yahoo re-used a significant amount of Inktomi tech acquired for that $250M? Or was it spoiled?

I've been told (by people from Inktomi who then worked at Yahoo), that yes, the Inktomi technology became the main Yahoo search engine. (How much it changed or not in the process, I don't know)

Apache Traffic Server, nee Inktomi Traffic Server, is still alive and kicking.

Maybe they used Inktomi patents.

Anyone else notice the really weird link supposedly of the Ipad Screen Resolution that contained the initial query of Domino Pizza Phone Number?

x's so it won't become a link!


If the stock went from 25 G$ to 250 M$, that's a decrease of 99% not 99.9%.

It peaked at 241 1/2, bottomed out at 24 cents in 2001 before going up 10x again.
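The two figures seem to be measuring different things, and the arithmetic for each can be checked directly. A quick sketch using only the numbers quoted in this thread (market value from 25 G$ to 250 M$; share price from 241 1/2 to 24 cents); the helper name is just for illustration.

```python
def percent_decrease(start, end):
    """Percentage drop from start to end, relative to start."""
    return (start - end) / start * 100.0


# Market value: 25 G$ at the peak vs. the 250 M$ acquisition price.
market_value_drop = percent_decrease(25e9, 250e6)

# Share price: 241 1/2 at the peak vs. 24 cents at the bottom.
share_price_drop = percent_decrease(241.5, 0.24)

print(f"market value: {market_value_drop:.1f}% down")
print(f"share price:  {share_price_drop:.1f}% down")
```

So 99% fits the peak-to-acquisition comparison, while 99.9% fits the peak-to-trough share price.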


Thanks for capturing the train of thoughts that seem to run through my head almost daily. Lots of lessons to learn from that experience. Although it is easier to connect the dots in the rearview mirror than it was looking forward at the time, there were some clear lessons about not forgetting the actual end user (which is not always your customer), using a single metric as a proxy for user experience, obsessing about a competitor and trying to get big instead of great.

As an Australian I used to use 'Anzwers' as it seemed to give great relevance for Australian specific information. I think it used Inktomi?

I'm not sure why I switched to Google. Not to discredit the Google UX, but I think I switched because the name 'Google' was so catchy and eased its way into my university's vernacular. "Just Google it" rolls off the tongue nicely.

Wow, not only is there still a search page under hotbot.com, it's also still run by Lycos! It's like time travel...

I'd like to know: during that period, did staff at Google also use Inktomi to run some experiments?

From the article, it's clear that more than Google, inktomi killed itself.

I remember why I switched to Google: 1. speed, 2. simplicity. Relevance, meh. Never noticed much difference there. This led me to believe that PageRank was more about marketing (brilliant marketing) than technical edge.

Are you kidding? I was using Altavista, IIRC, and would routinely have to page down to the second or third page to find any kind of relevant link. When I switched to Google, I almost immediately stopped looking at more than the first page of results.

There was nothing wrong with Altavista's algorithm for "finding stuff" - it was just too vulnerable to SEO pollution. I remember the quality of it declining almost overnight once spammers (yes that's what SEOs are) figured it out.

I always found the best results on InfoSeek; then they were bought out and went away, and there was no choice but to use Google.

It wasn't even results.

Back in the day you could be on the second page of Google search results before Excite or Altavista had even loaded.

Yahoo wasn't much faster.

I think Google's high quality search results were less due to PageRank than the fact they AND'd search terms instead of OR'ing them.
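To illustrate the AND vs. OR distinction, here is a toy sketch over a tiny in-memory inverted index. The documents and helper names are invented for the example; it says nothing about how Google or AltaVista actually implemented query matching.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search_or(index, query):
    """Documents containing ANY of the query terms."""
    hits = set()
    for term in query.lower().split():
        hits |= index.get(term, set())
    return hits


def search_and(index, query):
    """Documents containing ALL of the query terms."""
    postings = [index.get(term, set()) for term in query.lower().split()]
    if not postings:
        return set()
    return set.intersection(*postings)


# Hypothetical three-document corpus.
docs = {
    1: "DEC Alpha server benchmarks",
    2: "alpha release notes for a text editor",
    3: "history of DEC minicomputers",
}
index = build_index(docs)

or_hits = search_or(index, "dec alpha")    # any term matches
and_hits = search_and(index, "dec alpha")  # every term must match
```

With OR semantics all three documents come back for the query "dec alpha", and ranking has to push the one good match to the top; with AND semantics only document 1 qualifies in the first place.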

PageRank was of paramount importance back then. At the time, AltaVista/Inktomi were easy to spam, and Google wasn't.

The google spam that actually works (or at least, worked a few years ago, before panda) requires setting up lots of sites, lots of independent IP blocks. That was much harder when PageRank appeared - hosting was damn expensive, VPSes were nowhere to be found.

PageRank was a huge thing. I used "and" queries in altavista at the time. It was no match.

Initially the no. 1 reason for me was peer pressure. It was simply unacceptable to use something other than Google in tech circles at some point. It was like having an @aol.com email address.

If your users are that passionate, you're doing something right.

After I gave it a go for a while, the other reasons you list made me stay.

> It was simply unacceptable to use something other than Google in tech circles at some point.

From what I recall, it was also the most programmer-friendly search engine at the time.





This is purely subjective, but Google is the only one that links to the Netlib 'real' LAPACK, straight to the file/documentation in question. The others have mixtures of Java packages and other examples...

Offtopic, but do you consider these results any good?


Basically, but you're getting the opinion of a random person on the internet :). The third result is the 'real' one (the top two are unit tests for the named function).

That's fine, it's who I am targeting!

I was more curious to see if it was even in the ballpark. The relevance I'm not too worried about if the expected response is in the top 3.


Sounds much more like Inktomi killed Inktomi.

This was a very interesting read. I was working for a vertical search engine in this very same period. I, and the other engineers, also attempted to get our management to recognize what google was doing right. Unfortunately, we were delayed greatly by bad technology choices forced on us by venture capitalists (e.g. "build your search engine on top of Oracle! they say they have full text search, it will save you time!"[1]) and management short-sightedness ("our future is selling audio video search results to ISPs and portals, not being our own portal." -- this despite google not being a portal.) They actually got worried when the search box on our homepage started getting more queries than some of our larger customers.

Now, FWIW, I'm building another search engine. Instead of 20 engineers we have just me. Instead of 4 years, we're going to do it in one. While I have no interest in going up against google (different plans entirely) the radical change in leverage you get with open source and PAAS or IAAS, combined with Google's having taken their eye off the ball and run off to chase Facebook down a blind alley, means that something like DuckDuckGo actually could take real share away... maybe. (1% of google's volume would be "Real share" right?)

[1] Oracle did have full text search but did not have the performance or per-machine-efficiency we needed, so it cost us a lot, and it was a constant fight to get it to do the kind of queries our relevancy algos required. We had a constant stream of consultants in from Oracle HQ, and in the end dumped it and wrote our own db from scratch in about 4 months.

As you haven't mentioned what exactly your search engine will focus on, I'll just throw this idea out there. I would love a search engine that returned results equivalent to Google, yet offered me the privacy level that I desired. With Google, currently I am the product. I know this. I understand this. Thing is, I don't want to be the product any more. I want to pay for the product on say a yearly fee which guarantees me I won't be sold ads or have my search habit information sold.

It seems like a nice idea. But I use other Google services like GMail, Google Docs, Google Maps, etc. If I switched to duckduckgo but still used the other Google products I'd feel wrong about it. There ain't no such thing as a free lunch.

Now if their search results got so bad that other alternatives were clearly superior, that's different. They have an obligation to provide a good service to us the customers too, but we're not there yet. For me.

After the recent penguin update, their search results have got REALLY bad.

Don't you guys agree?

That's an opportunity, right there.

I imagine the opportunity for Google to offer this sort of "private search" (perhaps w/ complementing premium features) via mass licenses to (nervous) big corps would be one worth pursuing. I imagine also it might be worth those big corps' money.

Have you tried http://www.startpage.com?

