> I mean, when Google started the internet was actually small enough that it could be indexed relatively cheaply. These days, the internet is so enormous, and user expectations around hyper local content and being constantly up to date are so high, I don't see how one could begin to compete without billions.
But you're dealing with insanely long-tailed distributions and the "meat" of the search engine business is in the fat heads of those distributions.
(1) A small proportion of queries makes up a huge proportion of query events that you'll see through the day.
(2) For any given query, a small proportion of users will make up a huge proportion of the opportunity to monetize (researching a planned purchase, looking for a job etc).
(3) For any given query, an infinitesimally tiny proportion of the documents on the web is where the value to the user actually is.
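To make point (1) concrete, here is a minimal sketch that assumes query popularity follows a Zipf law with exponent 1. That is a common modelling assumption, not something measured from real query logs. The function name `head_share` and the vocabulary size of one million distinct queries are made up for illustration.

```python
def head_share(num_queries: int, head_fraction: float, exponent: float = 1.0) -> float:
    """Share of all query events produced by the most popular
    `head_fraction` of distinct queries.

    Assumption: the query at popularity rank r is issued with weight
    1 / r**exponent (a Zipf law). Real traffic may look different.
    """
    # Weights are generated in rank order, so the list is already sorted
    # from most popular to least popular.
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_queries + 1)]
    head_size = max(1, int(num_queries * head_fraction))
    return sum(weights[:head_size]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical vocabulary of one million distinct queries.
    for fraction in (0.001, 0.01, 0.1):
        share = head_share(1_000_000, fraction)
        print(f"top {fraction:.1%} of distinct queries -> {share:.1%} of query events")
```

Under these assumptions a very small head of distinct queries accounts for well over half of all query events, which is the sense in which the "meat" sits in the fat head. A steeper exponent concentrates traffic even further; a flatter one spreads it out.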
I think there are potentially many ways of making selections on each of those three axes and ending up with a viable business based on a manageably small search index. E.g. Indeed as a job search engine or Amazon as a product search engine have manageably small document collections, great value to users, a stable user base, and great opportunities to monetize.
...from that point of view I find it surprising that there aren't a lot more search engine businesses.
Case in point: I really like wiby.me, the 90s nostalgia search engine. I don't think they are a profitable business, but I certainly think they could & should be.
And even Google is making very deliberate choices along those three dimensions, rather than naively indexing the web and naively executing keyword search against that index.
Dimension 1: Given a query, reinterpret as follows. Catch eyeballs by delivering entertainment value ("pizza" -> "entertaining videos related to pizza"). Monetize those eyeballs by reinterpreting as local queries ("pizza" -> "restaurants near me wanting to sell me pizza") or product queries ("pizza" -> "online shops trying to sell me pizza ovens or other pizza-related items").
Dimension 2: Given a query and document, always make relevance decisions on behalf of the kind of audience with disposable income. E.g. "farming" shows a lot of stuff you'd want to read if you're the kind of person paying £10 for a potato in Borough Market and nothing that you'd want to read if you're a subsistence farmer in Namibia.
Dimension 3: When comparing two documents for inclusion in the pool of documents that stand any chance of coming up on page #1, prefer recency over authoritativeness. E.g. for "programming languages", apparently "The 9 Best Programming Languages to Learn in 2021" is considered relevant while "Go To Statement Considered Harmful" or the Mozilla Developer Network are considered irrelevant.
There are huge audiences who don't agree with these choices that Google is making. Some are just waiting to switch if someone comes along making better choices; others would use another product alongside Google if someone comes along simply making different choices that are useful in some way.
So, again, far from the sentiment that any venture in the web search space is doomed given the size of the web and the existence of Google, I'm positively baffled by the fact that there isn't a lot more activity there.