Yahoo had a search engine from 1995 to 2009. Yahoo is now a Bing reseller. There was a period around 2007 when Yahoo search was better than Google search. They pioneered integrated vertical search: special cases for weather, celebrities, and such. But Google copied that.
Blekko (2010-2015) had a scheme with "slashtags" which attracted a small following but never caught on. They were trying to crowdsource part of the problem. Eventually, Blekko was acquired by IBM's Watson unit, and ceased offering public search.
Bing, Microsoft's entry, remains active. Microsoft seems to have given up on trying to raise Bing's market share. Bing no longer has a CEO of its own; it's just a miscellaneous online service Microsoft provides. It's still #2 in search, but only has 7% market share.
There remain a few little search engines. Ask, formerly Ask Jeeves, continues to operate, but has only 0.17% market share. Ask is from IAC, in Oakland, a spinoff of Barry Diller's Home Shopping Network. Excite, formerly Excite@Home, with 0.02% market share, continues to operate. Excite, in its day, was a hot startup powered by too much venture capital.
Outside the US, there's Baidu (China) and Yandex (Russia). Neither has much traction outside their home countries.
It's possible to do a better search engine than Google from the user perspective. It's not clear how to get it to profitability. There are two things Google does badly - business legitimacy and provenance. Google doesn't background-check businesses online. (I do that with Sitetruth; it's not only possible, it could be done better with a tie-in to costly business background services such as Dun and Bradstreet.) This allows bogus and marginal businesses to reach the top of search via the usual SEO techniques. Google is also bad at provenance - figuring out that site A is using text derived from site B, and thus B should be ranked higher. This is what allows scraper sites to rank highly in Google.
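The provenance half, at least, has well-understood machinery. Here's a minimal sketch of detecting derived text with k-word shingles and Jaccard similarity - a real engine would use minhash sketches to do this at web scale, and the texts here are toy examples:

```python
def shingles(text, k=5):
    """Return the set of k-word shingles (overlapping word windows)."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two shingle sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

original = "the quick brown fox jumps over the lazy dog near the river bank today"
scraped = "the quick brown fox jumps over the lazy dog near the old mill today"
unrelated = "completely different words about cooking pasta with fresh basil leaves"

sim_scraped = jaccard(shingles(original), shingles(scraped))
sim_unrelated = jaccard(shingles(original), shingles(unrelated))
```

If B's crawl timestamp predates A's and the similarity is high, A is probably the scraper, and B should rank higher.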
Fix those two problems, and a new search engine could be better than Google. Whether anyone would notice is questionable. Profitability would be tough. The reward for success is high. Search ads are more relevant and more profitable than any other form of advertising. When someone sees a search ad, they're actively looking for the item of interest and may be ready to buy. Almost all other ads are interruptions or annoyances. That's the basic reason for Google's success.
What's needed is a search engine with functional queries (as opposed to Google, which now only operates in "the user is drunk" mode), that doesn't give a damn about your robots.txt, and that can capture content in a way that is more akin to archive.org than Google's shoddy and increasingly absent cache.
Another issue is spam/false matches. Why does Google return illegitimate results? Because, let me tell you, any search for "some nifty computer book pdf" returns pages upon pages of bogus links leading to ad link mazes. A crawler should be able to trivially crawl such a page, determine that no PDF is linked, and blacklist the result, but this doesn't happen.
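That "does the page actually link to a PDF" check really is nearly trivial. A sketch using only the stdlib HTML parser - the fetch itself, JS-rendered links, and the actual blacklist are out of scope, and the HTML snippets are invented:

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect every href from <a> tags on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.append(dict(attrs).get("href", ""))

def links_to_pdf(html):
    """True if the page contains at least one link pointing at a .pdf."""
    p = LinkCollector()
    p.feed(html)
    return any(href.lower().split("?")[0].endswith(".pdf") for href in p.links)

bogus = '<a href="/download?id=3">DOWNLOAD NOW</a><a href="/ads">more</a>'
legit = '<a href="/files/nifty-book.pdf">PDF</a>'
```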
Google is slow and preoccupied. Their business is ripe for disruption.
Disruption happens by doing something completely different that happens to cannibalize the incumbent's business in a way that they can't change. All these little features mentioned are something Google can easily do so these are not really disruptive.
Some even say Bing has better search quality than Google. But that's not what matters. No matter how good the product is, if you can't get distribution, it doesn't matter. And even if you do have distribution, it may not be in the right context, so it won't be effective.
If anyone is trying to really build a search engine to compete directly against Google, I can only say good luck, and prepare to be the most patient person on earth, because it won't happen fast, and probably won't even happen. Instead, while you're slowly making progress day by day with small traffic increase, you'll see some random new thing come out that had no intention of becoming a "search engine" (just like Youtube) and "disrupt" Google.
I have seen a lot of people talk about "disrupting something", and also have seen a lot of people who actually have disrupted their industry. The former are just wannabes, and the latter never said anything about "disrupting" something. They just saw an opportunity and went for it. And it ended up disrupting. Hope this makes sense.
Google doesn't operate on the original pagerank anymore, there are tons of other things going on underneath. Also they now own many innovative AND popular "search UI" both through acquisitions and their R&D. Google Maps owns a lot of the map space, Youtube owns the video search space, and so on. Look deeper and you'll find even more that you've been taking for granted.
Baidu and Yandex are serious but are comfortable in their respective positions. That said, I wouldn't be surprised to see a Google challenger spring up from the other side of the world, where Google's position may not be seen as so unassailable.
Dude where did I say Google is immortal? I'm just saying it's a wrong approach to try to kill google this way. There are many good ways to do it and coming from "How do i disrupt Google" is the worst possible way.
> Obviously Microsoft has no chance, they're worse off than Google re: ossification and Bing is a hedge at best.
Nothing is obvious when we talk about disruption. Why does Microsoft have no chance? I am pretty sure they keep Bing alive because they are looking at the future where they finally gain control over the platform via VR/AR/or whatever, and then use Bing to power the experience on those platforms, which actually may be the case Google can't compete against. But this is Microsoft. If you're a small startup, you shouldn't be doing this.
> DDG is one step above a lifestyle business and only innovates at the UI level, even their efforts there are forgettable. Their major differentiator (data privacy) is meaningless to almost all users, and ironically torpedoes many interesting monetization strategies that are not ad-focused, as well as research initiatives.
Privacy is a great feature, especially in 2017, and I think DDG has the highest chance of getting any market share with this "let's make a search engine to disrupt google" approach (but again, this likely won't happen and is a foolish approach. It will stay down there as an "alternative" until the founder runs out of steam, OR if there's a huge shift in how people experience the Internet and somehow takes off completely independent from the company's effort)
> Baidu and Yandex are serious but are comfortable in their respective positions.
Baidu and Yandex will never work outside of their countries because of the cultural barrier. In fact it's not just those; every country with a significant Internet penetration rate has its own dominant search engine. In those countries Google is nothing. Which means this cuts both ways. So no, Baidu and Yandex are NOT serious contenders, just like Google is not a serious contender in Russia and China.
Privacy is probably slowly growing into a big deal in the minds of regular people, but when the incumbent works, and works better (in my experience Google does just return better results than DDG - and I really did give it a try, in earnest, for a solid week a few months back...), privacy alone isn't going to be enough at this point.
For me it is too, yeah. They even own duck.co, which would be a better brand. It'd be a pain in the arse to explain at first ("duck dot com?" "no, dot co"), but nowhere near as silly-sounding as duck duck go.
I've had the opposite experience lately: DDG has been almost identical to the Google results, to the point where they seem to be calibrating their own searches against Google's. Unfortunately that means it also suffers from the same problems.
I've understood it to be a reference to https://en.wikipedia.org/wiki/Duck,_duck,_goose; the wikipedia article on duckduckgo confirms.
Only large companies can afford to make browsers. They are insanely complicated beasts now. Google because of ads and Android, Microsoft because of Windows and Bing, Apple because of iOS. Firefox is struggling, and Opera gave up and switched to WebKit.
Disrupting Google search is going to be very hard. Google invests more than anyone in AI. If someone were to disrupt Google, they have to have very strong models of the world, natural language parsing, image recognition, knowledge reasoning, troves of data and an ocean of servers and fibre.
Google hasn't been sitting idle, they've been disrupting themselves and pushing towards those fronts.
Maybe not. The Cuil experience is significant. The PR for the launch was good, there was a huge initial traffic spike, and then users discovered the product sucked and traffic dropped. The system at launch was terrible. Six months later it wasn't so bad, but nobody cared by then. They launched too early. But they did, briefly, have mindshare, lost through technical ineptitude.
When I'm seeking specific matches, I'm seeking really fucking specific matches. Google annoys me to no end.
Here it was documented in 2012:
There are (or at least were) others with related effects.
It used to force the engine to search for the word exactly as written.
I've done some large-scale searching where the most relevant detail is how many results are returned, most particularly for a specific domain. (For which, incidentally, there's no handy mechanism, so it's <array of terms> * <array of domains> - a multiplicative explosion of searches, plus about a 45s delay per query to avoid triggering Google's bot defenses.)
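That term-by-domain expansion can be sketched as a simple cross product with the site: operator; build_queries and the 45s pause below are illustrative, not any official API:

```python
import itertools

def build_queries(terms, domains):
    """Cross every term with every domain using the site: operator."""
    return [f'"{t}" site:{d}' for t, d in itertools.product(terms, domains)]

terms = ["term one", "term two", "term three"]
domains = ["example.org", "example.com"]
queries = build_queries(terms, domains)

# In practice each query would be issued with ~45s between requests:
# for q in queries:
#     run_search(q)   # hypothetical fetch-and-parse helper
#     time.sleep(45)
```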
Such as this:
The problem is interesting, but I think you make it seem easier than it is. What if there's a CAPTCHA, as there often is? Should the engine still ignore it? That could lead to a whole lot of missed content.
Use Tools > Verbatim, it's much faster than quoting every search term, and it works better.
My experience with Google is the opposite of this. When people are searching google they are looking for information. When people go to Amazon they have their credit card in their hand.
Most of my Google-ad clicks (85% at last check) are clearly not interested in buying what they're looking for. They're only on my page for seconds. If I try to qualify my ads better I lose 'quality score,' which is a measure that is entirely, at least to a first-order approximation, about whether or not people will click the ad. That is, if my ad says 'This site is X' I get a good quality score but shitty leads. If I say 'This site is X for $49.99' I get a shitty quality score, but the people who come to the site (the clicks I have to pay for) are ready to buy.
The profitability isn't because their initial hypothesis of Search Marketing (that people are searching what they want to purchase and will therefore purchase) was correct, but because they are far-and-away the search winners and if you want to be found at all you have to play it.
At least, in my experience and IMHO.
And – as just another anecdote – I have had continuing success over the last 8 to 10 years with search ads, while not having a single conversion with anything else (Facebook, youtube, twitter etc.) as far as I can tell.
Also, it seems to me that both weaknesses described, legitimacy and provenance, are actually good for Google's core ad business, because if legitimate advertisers could reliably rank highly without paying, they'd have no incentive to pay to outrank the competition - the competition who are buoyed by the nebulous practices Google pretends to frown upon but, for the sake of a delicate balance, cannot justify completely stamping out.
The perhaps unpalatable and unutterable truth is that ad-supported search is a delicate business: be good enough to attract eyeballs, but not so good that advertisers who could pay you have no need to pay.
ActionCookBook accurately recreates my experience. https://twitter.com/actioncookbook/status/834439563032555521
ME: [views product online] Hmm. Nah.
PRODUCT: we meet again
ME: Sorry, no [changes again]
PRODUCT: bitch this ain't over
ME: FINE. FINE. I will buy this rug. Just leave me at peace.
REST OF INTERNET: this dude loves rugs, let's get him, boys
> Google is also bad at provenance - figuring out that site A is using text derived from site B, and thus B should be ranked higher.
The part I dispute is the last part. What matters is the user perspective. So of course a site that does nothing but scrape another one should rank lower, but many scrapers add value, if none other than UI-wise. So it's not obvious that the site of origin should rank higher.
(Also, how about DuckDuckGo? I've got the (admittedly gut) feeling that it should at least outperform Ask and Excite.)
And Bing, Yandex, and Yahoo have active crawlers.
Yeah, I don't think many people will care about the difference between good and perfect. You might be able to find a niche in search that Google is ignoring, but you would have a hard time expanding from there into general search.
In that sense Google is a bet against technology - you would invest in Google if you believe the Web's going to stay the same for a long time and nothing will replace it.
Without needing humans to tell them what the results mean.
So no, this will not be the next direction - unless there is a universal model of machines that understand things. But there isn't, and there won't be for at least another 10 years.
IMHO, the other piece of the picture, and what I see happening, is that search engines (or a service) will index entities and relationships (in the ontology sense) and return results enriched with that data through something like an extended microdata vocabulary. There will be an API where machine consumers can query entities and relationships, and see those attributed to their sources in the webpage. This prediction is nothing new and has been consistently foretold by AI and information-retrieval augurs for decades. The difference is that I see this moving beyond the realm of expert systems in large corporations and into the realm of an API generally accessible to anyone.
I think it's very possible that, as you say, a current incumbent search engine may not consider the provision of such a service its job. Which makes things very interesting for potential new entrants in the semantic search market.
Other business types could surely supply these entities and relationships, but who better to be involved in the supply chain than search engines? They hold in one hand a hose containing nearly all the world's information, and in the other a market hungry for all the world's information.
But this type of service could be niche because most humans are not going to care about getting ontology data in their search results. So even though parsing out entities and relationships could improve the usefulness and accessibility of information there might not be enough universal demand for someone like Google to really care about it.
With these caveats, I think the trend from information, to knowledge, is a very natural and already apparent progression for search engines. The type of API described here is possible today given the right incentives.
Edit: there should be a differentiator for a new search engine; there's more room (problems to solve) in Q&A search and discovery.
Google's clean interface (basically just a search box) was so revolutionary that it's hard to comprehend today - I still remember it took me a few days to even take it seriously, it was so thin and lightweight. PageRank was a gigantic step forward in SERP relevancy, and even though it was eventually gamed by link farms and paid backlinks, that took a lot of time and effort, and the first few years saw few spammers.
Well, yes, it was simply that. It was the difference between a search engine that works and one that doesn't.
For example, most sites back then had tons of text in the same color as the background, with words that would get you a higher rank on the previous generation of search engines.
Also, the search engines had very heavy pages, mostly with news, and they were known as "portals". Google was a blast in comparison.
Before Google, the dominant search engine was AltaVista, which was awful. The UI was so crowded that you had to search for the search bar, and it was so full of graphics that if you were on dialup, loading took forever unless you turned images off.
And, yeah, PageRank helped. It wasn't uncommon for you to find what you were looking for on page 3 or 4. Google had what you were looking for on page 1, always. In fact, pre-Google search was so bad that if you were looking for anything obscure, you had to use multiple search engines. When I'd get serious about finding something, I'd fire up AltaVista, Infoseek (Disney/ABC/Go.com), Hotbot, WebCrawler (AOL), etc. in separate browser windows and search on all of them. And for the lazy, there was MetaCrawler, which automated this; I didn't use it often, mostly because I kept forgetting about it, but when I remembered to use it, it was a godsend.
At the time, their main competitors were already expanding into other markets, and Google was refreshing for doing one thing and doing it well, compared to its competition. It didn't clutter, it didn't try to distract with other services that they offered. It simply gave you the results you queried as a plain list of links, and got out of your way so you could browse those links at your leisure.
At that point Google had a smaller index, which hurt their results, but I thought that was better than getting porn results while at work.
If so, why is Yahoo Slurp still showing up in my apache logs?
All the successful companies that came out of nowhere and disrupted an already stable industry never started out thinking "How do I build another X?" or "How do I disrupt X?" They all built something they thought the world needed, and it went on to somehow "disrupt X".
So if you're starting out thinking "I want to build a search engine, if there's room for another," that will never work, because you don't even know what you're solving; you'll be frantically searching for the question throughout your "startup" life.
Ask better questions, and you'll get better answers.
So, the question shouldn't be 'Is there room for another search engine', but perhaps 'Would a better search engine ...', 'what comes after search engines'.
I think even looking at the 'flaws' of Google isn't really going to give you a game changer. You'll find a hole that Google can easily fill.
It's like magic. See it in action here: https://stackshare.io/match
A river was just a mapping from a database to ES. For example, you could search CouchDB with ES with the proper river set up.
ES actually started with Rivers being part of Elastic but since deprecated it. You will still find people talking about "rivers" though as a description for however they are updating the index.
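Post-deprecation, "updating the index" usually means your own process reading rows from the source database and feeding Elasticsearch's bulk endpoint. A sketch of just the NDJSON payload construction - the actual POST to /_bulk and the polling loop are omitted, and the index/field names are made up:

```python
import json

def bulk_body(index, rows, id_field="id"):
    """Build an Elasticsearch _bulk request body from database rows.
    Each document is two NDJSON lines: an action line, then the source."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": row[id_field]}}))
        lines.append(json.dumps(row))
    return "\n".join(lines) + "\n"   # _bulk requires a trailing newline

rows = [{"id": 1, "title": "hello"}, {"id": 2, "title": "world"}]
body = bulk_body("articles", rows)
```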
Algolia provides things that Elasticsearch does not, such as being hosted, being extremely simple to set up, and having a default web interface for anyone to use.
algolia does that out of the box.
I wish Yahoo would have open-sourced some of its internal search tools. :( They weren't always simple to use, but they were stable.
Basically google can't trust backlinks anymore because people game them and competitors try to destroy each other's sites by buying scummy links to their stuff.
So they mainly attempt to measure quality in a vacuum. This is using their machine learning stuff to look at the quality, confidence, and reading level of the writing style.
They do the same quality checks for the site. Checking for EV certs, clean markup, real email volume through Gmail, reputable DNS provider, physical address in G maps. A lot of their hundreds of quality metrics don't measure the site itself, but use Google's pervasive data trove from their other services. Most scammers don't bother doing any of this right.
The problem becomes people like me. I setup sites with all measures of quality for legitimate businesses. Have articles written by good writers with knowledge in the subject. Sounds great right?
The problem is that these articles are still done for money and quite biased sometimes. Google is slowly running into a need for a strong AI because all measures of quality can be emulated if enough money is on the line. It doesn't matter if something seems truthful in every way except the fact that it isn't.
This is the same reason "fake news" is invading google and Facebook. Smart spammers have upped their game to the point that it's impossible to know what's real anymore.
Need a wikipedia article changed? Good reviews on Yelp? A nice piece on a popular tech website? All of this can be openly bought with zero consequences.
I would believe that, although there have to be several alternate ways to measure a site's popularity, Google Analytics is a huge part of a site's ranking. Do users stay long on your website? Do they come back to the search results after that? Do they click through to "Pricing", then back to "Features"? If so, that must be the right answer.
Alternate example: I would bet that shops where Google Maps geolocates a lot of customers have a higher-ranked website than similar shops whose physical venues are empty.
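If those behavioral signals exist, turning them into a per-site score is straightforward in principle. A toy sketch combining dwell time with "pogo-sticking" (bouncing straight back to the results page); the event format, threshold, and formula are all invented:

```python
def quality_score(events, short_dwell=10):
    """events: (site, dwell_seconds, returned_to_serp) tuples.
    Score = average dwell time, discounted by the pogo-stick rate."""
    stats = {}
    for site, dwell, returned in events:
        s = stats.setdefault(site, {"visits": 0, "pogo": 0, "dwell": 0.0})
        s["visits"] += 1
        s["dwell"] += dwell
        if returned and dwell < short_dwell:
            s["pogo"] += 1   # bounced straight back to the results page
    return {
        site: (s["dwell"] / s["visits"]) * (1 - s["pogo"] / s["visits"])
        for site, s in stats.items()
    }

events = [
    ("good.example", 120, False), ("good.example", 90, False),
    ("bad.example", 3, True), ("bad.example", 5, True), ("bad.example", 60, False),
]
scores = quality_score(events)
```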
Google is focused on getting you to a relevant result quickly, but having a search engine that helps you discover new things is really useful. If you focus on a niche, you can also make use of a lot of metadata Google doesn't retain.
I'm exploring this on a small scale with https://www.findlectures.com. Having the date a video was made gives it a 'street view for history' feel, and lets me rank historical content differently from conferences (where recency is more important).
Building a graph of talks, conferences / speakers / books / publishers could be the building blocks for a pagerank implementation, or to build a different type of book search. Alternately, I think it would be interesting if search engines let you do LSA style queries, like "Brian Goetz" - "Java" + "Python", to help discover speakers.
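That "Brian Goetz" - "Java" + "Python" style query is just vector arithmetic over entity embeddings plus a nearest-neighbor lookup. A toy sketch with hand-made 3-d vectors - in reality the embeddings would be learned from talk metadata, and every number and name here is made up for illustration:

```python
import math

vectors = {
    "Brian Goetz":       [0.9, 0.1, 0.8],   # Java-heavy, deep technical talks
    "Java":              [1.0, 0.0, 0.2],
    "Python":            [0.0, 1.0, 0.2],
    "Raymond Hettinger": [0.1, 0.9, 0.7],   # Python-heavy, internals talks
    "David Beazley":     [0.0, 1.0, 0.9],
}

def combine(a, b, c):
    """Vector for the query a - b + c."""
    return [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v)))

query = combine("Brian Goetz", "Java", "Python")
candidates = ["Raymond Hettinger", "David Beazley", "Java"]
best = max(candidates, key=lambda name: cosine(query, vectors[name]))
```

The nearest neighbor lands on one of the Python speakers rather than on "Java", which is the behavior the query is after.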
a.) storing more data about the sites (and doing something interesting with said data)
b.) improve the UI/UX for power users. The best part is that I can imagine quite a few people would pay actual money to use a better search engine. Note that the Bloomberg terminal is, among other things, a search engine. For example, you could make the link graph explicit, so you would immediately see what sites link to what sites.
E.g. symbol search really leaves something to be desired on Google. I also wish I could use regular expressions. I get it, they're expensive, but even a little "expressiveness" goes far.
c.) I would pay A LOT for a good search engine for code.
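Making the link graph explicit (point b) mostly means storing the crawl as page -> outlinks and inverting it, so "who links here?" becomes a single lookup. A minimal sketch with made-up page names:

```python
from collections import defaultdict

# Crawl data stored as page -> list of pages it links to (illustrative).
graph = {
    "blog.example":  ["docs.example", "news.example"],
    "forum.example": ["docs.example"],
    "docs.example":  ["blog.example"],
}

def inbound_links(graph):
    """Invert the outlink graph into target -> [sources]."""
    inbound = defaultdict(list)
    for src, targets in graph.items():
        for t in targets:
            inbound[t].append(src)
    return inbound

who_links_here = inbound_links(graph)
```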
Charge me $15-25 per year.
Let me decide what demographic information I wish to share- make it easy for me to control and help me protect my information. Because you are charging me money you can afford it and I trust you.
Give me two search options: one, I'm only seeking information. two, I'm looking to buy. Do this for me as an advertiser: help me qualify the clicks I'm paying for
Perhaps allow me to pay per 1000 impressions (CPM) instead of per click.
By the way, I would also subscribe to a facebook that did this.
However, I am in a thread now with matt4077, who says Google is his best online lead generator, so there are obviously things to learn.
The upfront capital investment, in terms of the data center capacity necessary to make a modern scraping and search infrastructure, is immense. And since the ad-word business model does not scale linearly with market share – e.g. the market leader collects a disproportionate share of the available profit – you will be losing additional money for a long time.
Since the market leader is good enough that it isn't possible to disrupt the market purely through result quality (as Google did), you will need to rely on bigger and more effective marketing spend. Not only will you have to outspend and outperform Google, but also Microsoft/Bing, who have tried to do the same thing for years, with only limited success.
Even if you have the funding necessary to do all of this, then you would be better off either buying shares in an existing search engine company, or starting a business in a different market, one with lower upfront costs and less dominant incumbents.
Why is this so?
If you have space for a single banner ad, you're going to use the network with the highest payout, which is the one with the most competitive bidders.
Google's real monopoly at this point is ad-side.
Except on mobile. Banner ads on mobile are growing.
The fact that a lot of us DuckDuckGo, and I hope they are profitable, is evidence that there is room for other search engines.
I would like to find a good substitute for Facebook, but so many people I know use it that I always need to check Facebook two or three times a week to not miss out on stuff, since many friends and family don't use email anymore.
Attending the Decentralized Web Conference last year got me excited about using smaller, decentralized services. GNU Social is pretty good, but it requires work to find interesting people to follow.
However, if we expand the concept of "search" to something beyond text on webpages and "engine" to something beyond a linear algebra pagerank problem that weighs url links, there's room for many more competitors.
Let's say we want to search for "best restaurant":
Method #1 might be searching millions of web pages, twitter posts, newspaper archives, etc where ngram such as "best restaurant" is mentioned. That's what Google/Bing engines already do.
Method #2 might rank restaurants by collecting crowd-sourced opinions. That's what Yelp & TripAdvisor do. (Although Google also piggybacks on their data and lists Yelp pages in the SERP.)
Method #3 might be a company like Visa/Mastercard analyzing their billions of transactions and based on actual spending amounts & frequency of a billion cardholders, they can also provide their own calculation of a "best restaurant". (I know that Visa/MC already offer limited marketing data to some entities but they don't surface that data to every day web surfers.)
The idea is that there's plenty of room for more imaginative scenarios of #2 & #3. The common theme is that Google doesn't have the data (e.g. credit-card transactions) and therefore, the new "search engines" can give fresh answers that Google algorithms can't provide. To try and boil it down to a simple question: "What interesting answers can a new engine provide that _can't_ be extracted from the text of webpages?"
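A toy sketch of method #3: ranking restaurants from raw card transactions by blending total spend with repeat-visit frequency. The data, the repeat bonus, and the weighting are all invented for illustration:

```python
from collections import defaultdict

def rank_restaurants(transactions):
    """transactions: (restaurant, cardholder, amount) tuples.
    Rank by total spend plus a bonus per returning cardholder."""
    spend = defaultdict(float)
    repeat = defaultdict(int)
    seen = set()
    for rest, holder, amount in transactions:
        spend[rest] += amount
        if (rest, holder) in seen:
            repeat[rest] += 1        # a returning cardholder = loyalty signal
        seen.add((rest, holder))
    # arbitrary blend: $1 of spend = 1 point, each repeat visit = 50 points
    return sorted(spend, key=lambda r: spend[r] + 50.0 * repeat[r], reverse=True)

txns = [
    ("Luigi's", "A", 40), ("Luigi's", "A", 35), ("Luigi's", "B", 50),
    ("TouristTrap", "C", 120),   # high spend, but nobody ever comes back
    ("Luigi's", "B", 45),
]
ranking = rank_restaurants(txns)
```

The repeat-visit term is what the card networks have and Google doesn't: it separates places locals return to from one-off tourist traps with big checks.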
Btw, I ran across some posts from a Microsoft employee (but not a Bing team member) stating his opinions on building competing search engines. https://news.ycombinator.com/item?id=7011472
The idea is that what people actually pay for is a different set of vector inputs compared to what people submit reviews for (Yelp/TA) or what people link to (blog with links to favorite city restaurants.) Google's search index is over 100 petabytes but even that gigantic database is missing lots of data that other entities can collect and convert into uncontested search results.
There's too many clones on the market right now. Some with good purpose, like DuckDuckGo which can be simplified to "Google but without privacy invasion."
Others like Bing could be just "Google but clunkier." (my opinion)
If you've got an idea on your hands that can't be described as "Google but..." then there's definitely room for another.
1) a good sitewide search engine. Google's offer is laughable, and Algolia is too developer-centric (requires pushing the data through API). What I'd want is a single input field where I can put my site's main page URL — and get a working search in a few minutes.
2) subscriptions / monitoring. I want to monitor some event or topic, and I want the updates to be delivered to e.g. my WhatsApp/Telegram/Slack/whatever, with smart filtering, refining etc (in lieu of frantically Googling / redditing / refreshing Twitter feed)
3) context-preserving interactive search, that can ask me questions/ refine results.
4) Timeline search interface for news / events / company history etc. I want to be able to put the name of a person, or company, or TV series, and get a comprehensive timeline view of all things happened there.
I have a lot more ideas, and zero free time :(
2) Mention (http://mention.net)
3) Jelly (didn't work. Maybe there's a reason?)
4) Google / Wikipedia.
Unless you can build something 10x better than what exists,
I mean, the modern idea of the internet is pretty much useless without a search engine, and we've been spoiled by the power we get through Google — the phrases on this very page get indexed within literally seconds; I just tried a literal search for a sentence from a 3-minute-old comment here — but it's really not a good idea for a single company to have so much authority.
This really isn't something to keep relying on a small handful of companies for, especially once we have interplanetary internet. :)
I would look forward to search engines that are topic specific. However, the blocker is having the information available in the first place, so I doubt if this will ever happen.
Google is beginning to show signs of accidental self-sabotage. Their AMP approach was so aggravating for me on mobile that I literally switched search engines to avoid it. And their insistence on scraping and summarizing things and trying to prevent you from even visiting other sites is slowly ruining even desktop searches. They are in danger of disruption.
I don't know what will replace it. Chat bots could be one. Much better understanding of context. Or providing answers based on the knowledge that is spread on many separate web pages. or actually taking action (if you are searching, it's to do something, not to read a page).
But "finding a page" will sound really silly 20 years from now.
Relevance can be relative to language/origin; number of links away from a Wikipedia article; coolness (does it have a link from a known Twitter account or news aggregator?); age group; a link from the HN front page or Slashdot would make it 'nerdy'; was it referenced by a news source; etc.
I think a differentiator would be a non-intrusive, intuitive UI for selecting an available relevance model (instead of trying to profile the user based on their search history / browser history).
On the one hand the user profile is of great value for advertising, but on the other hand the explicit choice of relevance model can be used to match relevant ads.
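That selectable-relevance-model idea boils down to letting the user pick a weight vector over per-result signals. A minimal sketch - the signals, models, weights, and URLs are all illustrative:

```python
# Each model is just a named set of weights over per-result signals.
MODELS = {
    "nerdy": {"wiki_distance": -2.0, "hn_or_slashdot": 3.0, "news": 0.5},
    "news":  {"wiki_distance": -0.5, "hn_or_slashdot": 0.5, "news": 3.0},
}

def score(result, model):
    weights = MODELS[model]
    return sum(weights.get(sig, 0.0) * val for sig, val in result["signals"].items())

def rank(results, model):
    return sorted(results, key=lambda r: score(r, model), reverse=True)

results = [
    {"url": "blog.example/dev",
     "signals": {"wiki_distance": 1, "hn_or_slashdot": 1, "news": 0}},
    {"url": "paper.example/story",
     "signals": {"wiki_distance": 2, "hn_or_slashdot": 0, "news": 1}},
]
```

Switching models reorders the same results, which is the whole point: the user states what "relevant" means instead of being profiled.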
Exposing options is rarely a good idea, as it only reaches a single-digit percentage of users.
And they say profiles are oh-so-relevant, but as far as I can tell, Google's main product (search ads) is still tied almost exclusively to keyword, region, and language.
I don't know: try doing the same Google search from different accounts. I think the results will be quite different...
Are there areas where Google can't go?
That's why Google dropped the + operator. Lots of people using it hasn't always made it better.
EDIT: I mean, Matt Cutts has a blog post saying that Google dropped the + operator because most people didn't use it, and when they did use it they used it wrong.
Probably not great for consumers that the #2 offering is almost 1/3rd as popular as the leader.
Which says two things...
- Yes, there's room for a better number 2.
- But, if the best Microsoft can do is 1/3rd as popular, how well would a new entrant fare?
It seems like you would need some new feature that makes you significantly better than Google to stand a chance.
Also, the barrier to entry here is enormous. The spend to be at least as large an index, and as fresh an index, and as relevant results as...Google, is big.
That's a much harder problem to solve today than when Google trounced AltaVista in 2000. Now search engines are tightly integrated into browsers.
One hint: I switched to Google when they released a browser toolbar. I even remember deciding to switch to whoever released a browser toolbar. What's today's equivalent of a browser toolbar?
I'm trying to build a curation platform with https://curlz.org - it's just at the planning stage at the moment, but it has raised interest from some dmoz admins and editors.
I've been using Startpage for the last 5 years and I'm not looking back. I wouldn't have any problem using any other search engine; nowadays any search engine works. The 3 or 4 times that I googled these years I found it pretty weird.
tl;dr: There's plenty of room, just not enough gray matter. :)
But my point is my searches usually end up in:
wikipedia, blogpost, *overflow, twitter, reddit, papers ...
So making a search engine for only those sites could be a usable search engine indeed, and I would use it if I had to. :)
DuckDuckGo is a fine search engine and I believe they'd benefit a lot from services like ads, email, blogs, docs, shops, apps, social, etc.
And doesn't Google return better search results for Reddit already?
This may be the 21st century, but...
I often find that I search for "vegan pancake recipe" and end up at a page with lots of images and it is very hard for me to find the ingredients list. Google does a poor job here. They should give preference to simpler sites where it is easier for me to find the information I'm looking for. Instead, they seem to actually give preference to complex sites. If their job is to help me search for the information, then they shouldn't give links to haystacks. They have tried to improve upon this with their answer feature where they quote websites. This is, IMO, the wrong way to do things.
Instead, the search engine should be a desktop application, which is more pervasive than a website can be. It needs to run natively, not cloud-based, for privacy, performance, and the ability to integrate well with the system.
When I search for "vegan pancake recipes", if the search engine is going to give me a result with 3-5 pages of text and images before the ingredients list, it should automatically scroll the web browser window down to the actual recipe.
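Part of this is already possible without any cooperation from the page: Chromium-based browsers support URL text fragments (`#:~:text=`), which scroll to the first occurrence of a phrase. A sketch of a search engine rewriting its result links this way, assuming the target page actually contains the phrase:

```python
from urllib.parse import quote

def jump_link(url, phrase="Ingredients"):
    """Append a URL text fragment so supporting browsers scroll to the phrase."""
    return f"{url}#:~:text={quote(phrase)}"

# Hypothetical result URL, for illustration only.
print(jump_link("https://example.com/vegan-pancakes"))
# -> https://example.com/vegan-pancakes#:~:text=Ingredients
```

Browsers without text-fragment support simply ignore the fragment, so the link degrades gracefully.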
This desktop application should also build a context profile based on what I am doing on my computer. This context profile shouldn't be uploaded to the internet, but it is still useful. For example, I should be able to select a string in my terminal and press the search icon in the system tray. This should bring up a Stack Exchange question containing the exact text of the string I selected.
I should also be able to select a set of websites which I want to use as my search "domain". I might give my search domain as "the documentation for Python 3, the Docker API reference, and Stack Exchange". This would make those "feeling lucky" links work much better.
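The "search domain" part is just a whitelist filter over result URLs. A minimal sketch, with a hypothetical site list standing in for the user's chosen domain:

```python
from urllib.parse import urlparse

# Hypothetical user-chosen "search domain": only these sites count.
SEARCH_DOMAIN = {"docs.python.org", "docs.docker.com", "stackexchange.com"}

def in_search_domain(url, allowed=SEARCH_DOMAIN):
    """True if the URL's host is one of the allowed sites or a subdomain."""
    host = urlparse(url).hostname or ""
    return any(host == site or host.endswith("." + site) for site in allowed)

def feeling_lucky(results):
    """Return the first result inside the search domain, if any."""
    for url in results:
        if in_search_domain(url):
            return url
    return None

hits = ["https://spam.example.com/ad", "https://docs.python.org/3/library/re.html"]
print(feeling_lucky(hits))  # -> https://docs.python.org/3/library/re.html
```

The subdomain check means `unix.stackexchange.com` counts when `stackexchange.com` is in the domain, which is probably what a user selecting "Stack Exchange" intends.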
The search engine should also present image results which are NOT WATERMARKED before ones which are!
I should be able to write a markup for things I need to search for, and then enter a "search engine research wizard". The markup would look like so:
"We had a great time at [Park on that hill in prague???] park. It was so sunny! The temperature was [Prague temperature on 27th of march ???] which is [Average Prague temperatures in March ???] for this time of year."
The search engine, when shown this text, would allow you to right-click on the bracketed areas, search for the text in them, and then, by selecting parts of Wikipedia articles, fill in the blanks.
The search engine should use accessibility APIs to record the text of the windows that I have open. I should then be able to use the search engine as a kind of memory store which I can search. If I want to know what that awesome new tiling window manager written in Rust was called, I should be able to search the full text of my browsing history and open up the HN page where the tiling window manager was presented.
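The "searchable memory" half of this is already feasible locally. A sketch using SQLite's FTS5 full-text index (assuming a sqlite3 build with FTS5 compiled in; the window-capture side would need platform accessibility APIs and is faked here with canned rows):

```python
import sqlite3

# In-memory for the sketch; a real memory store would live in a file on disk.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")

# Pretend these were captured from open windows / browsing history.
db.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("HN: Show HN", "a tiling window manager written in Rust"),
    ("Recipe blog", "vegan pancake ingredients list"),
])

def recall(query):
    """Full-text search over everything previously captured."""
    rows = db.execute(
        "SELECT title FROM pages WHERE pages MATCH ? ORDER BY rank", (query,))
    return [title for (title,) in rows]

print(recall("tiling rust"))  # -> ['HN: Show HN']
```

FTS5's default tokenizer is case-insensitive, so "rust" finds "Rust", and multiple terms are an implicit AND, which matches how people remember fragments of a page.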
Based on what I've heard from leaks and various news articles, the leadership of the search division was once adamantly against using neural networks for search.
Things took a sudden turn maybe 5 years ago. Something caused Google to do a complete 180 and vastly increase their investment in AI research. They saw something that scared them.
I think it was their unexpected success using AI for machine translation. There was much PR about it at the time, and I think it really got the gears turning at Google HQ. You see, the same language processing needed for machine translation has obvious parallels to search.
The more curious employees began applying the word vectors used for translation to search. After all, most of it had been trained on index data from multilingual websites anyway. They found that, horrifically, rather simple neural networks sometimes outperformed the search algorithms Google had spent billions on.
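For readers unfamiliar with word vectors: the core trick is ranking documents by cosine similarity between averaged word vectors of the query and of each document. A toy illustration with made-up 3-d vectors (real systems learn hundreds of dimensions from corpora):

```python
import math

# Made-up 3-d word vectors, purely for illustration.
VEC = {
    "pancake": (0.9, 0.1, 0.0), "recipe": (0.8, 0.2, 0.1),
    "vegan":   (0.7, 0.3, 0.0), "stock":  (0.0, 0.9, 0.2),
    "market":  (0.1, 0.8, 0.3),
}

def embed(words):
    """Average the word vectors, ignoring unknown words."""
    vecs = [VEC[w] for w in words if w in VEC]
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(c) / len(vecs) for c in zip(*vecs))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def rank(query, docs):
    """Order documents by similarity to the query in vector space."""
    q = embed(query.split())
    return sorted(docs, key=lambda d: cosine(q, embed(d.split())), reverse=True)

docs = ["vegan pancake recipe", "stock market recipe"]
print(rank("pancake", docs)[0])  # -> 'vegan pancake recipe'
```

Note that "pancake" never appears in a keyword match against "vegan pancake recipe" vs "stock market recipe" any differently than exact matching would here, but with real embeddings a query like "flapjack" would also land on the pancake page, which is exactly what keyword-based ranking cannot do.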
When this reached upper management, it set off a quiet panic. Google, once seen as invincible, could have been beaten by a start-up using effective ML techniques. Compute-power calculations showed that this could have been done since a few years after the debut of CUDA, a window of vulnerability of maybe 7 years.
The timeframe of around 2005-2010 coincided with Google spinning off a bunch of moonshot projects and doubling down on their core businesses. Coincidence? Maybe, but I don't think so. I wish a Xoogler or two would come out of the woodwork and tell me if I'm crazy or not.
Anyways, Google usually has a 5-7 year lag time when they release details of their tech to the public. This dates TensorFlow and their heavy AI work to around 2010. The window where somebody could have beaten them easily with ML was probably 2005-2010.
I would address the money issue first before thinking about how to make a better search for some market.
Google existed and grew (rapidly) for quite a while before they launched AdWords in 2000. Their growth in '99 was pretty meteoric (prompting a $25M investment from KP and Sequoia) before they rolled out PPC monetization. The AdWords model wasn't new at all -- GoTo.com was the first search engine to bet on that model.
Google won because it was a massively better search engine... Not just 10% better-- it was "holy crap" better on a mess of fronts (notably: serving up what you were looking for).
With a better search experience users came and advertising became a major money maker.