Ask HN: Is there room for another search engine?
149 points by _6cj7 on Mar 26, 2017 | 193 comments

That's a good question, and something I've spent much time on. Cuil (2008-2010) tried. I knew some of those people. It cost them about $30 million to launch a full scale search engine. They had no revenue model. In retrospect, they were hoping to be acquired by somebody. It was some ex-Google people, trying to replicate older Google technology. They had a great launch, but the system wasn't very good and traffic rapidly fell off. Their technology wasn't that great. Their big selling point was that they could do the job on less hardware than Google used.

Yahoo had a search engine from 1995 to 2009. Yahoo is now a Bing reseller. There was a period around 2007 when Yahoo search was better than Google search. They pioneered integrated vertical search: special cases for weather, celebrities, and such. But Google copied that.

Blekko (2010-2015) had a scheme with "slashtags" which attracted a small following but never caught on. They were trying to crowdsource part of the problem. Eventually, Blekko was acquired by IBM's Watson unit, and ceased offering public search.

Bing, Microsoft's entry, remains active. Microsoft seems to have given up on trying to raise Bing's market share. Bing no longer has a CEO of its own; it's just a miscellaneous online service Microsoft provides. It's still #2 in search, but only has 7% market share.

There remain a few little search engines. Ask, formerly Ask Jeeves, continues to operate, but has only 0.17% market share. Ask is from IAC, in Oakland, a spinoff of Barry Diller's Home Shopping Network. Excite, formerly Excite@Home, with 0.02% market share, continues to operate. Excite, in its day, was a hot startup powered by too much venture capital.

Outside the US, there's Baidu (China) and Yandex (Russia). Neither has much traction outside their home countries.

It's possible to do a better search engine than Google from the user perspective. It's not clear how to get it to profitability. There are two things Google does badly - business legitimacy and provenance. Google doesn't background-check businesses online. (I do that with Sitetruth; it's not only possible, it could be done better with a tie-in to costly business background services such as Dun and Bradstreet.) This allows bogus and marginal businesses to reach the top of search via the usual SEO techniques. Google is also bad at provenance - figuring out that site A is using text derived from site B, and thus B should be ranked higher. This is what allows scraper sites to rank highly in Google.
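The provenance problem described above is essentially near-duplicate detection. One common approach (a sketch only, not how Google or Sitetruth actually do it) is shingle-based overlap: break each page's text into overlapping word n-grams and compare the sets.

```python
# Sketch of shingle-based text-overlap detection, a standard way to spot
# "site A derives its text from site B". Not any particular engine's method.

def shingles(text, k=5):
    """Set of k-word shingles (overlapping word n-grams) of the text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def overlap(a, b, k=5):
    """Jaccard similarity of the two texts' shingle sets, in [0, 1]."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# A high overlap score, combined with an earlier crawl date for site B,
# is evidence that A scraped B - so B, not A, should rank higher.
```

In practice this is done at scale with MinHash or simhash rather than exact set intersection, but the signal is the same.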

Fix those two problems, and a new search engine could be better than Google. Whether anyone would notice is questionable. Profitability would be tough. The reward for success is high. Search ads are more relevant and more profitable than any other form of advertising. When someone sees a search ad, they're actively looking for the item of interest and may be ready to buy. Almost all other ads are interruptions or annoyances. That's the basic reason for Google's success.

There are way more than two things that Google does wrong. Remapping my search terms into oblivion so it can pretend it's fast is the worst one. Especially when this happens to a query I've modified to quote "every" "single" "flipping" "term." I think Google is cheating, and that their usable index is much shallower than they'd have you believe.

What's needed is a search engine with functional queries (as opposed to Google, which now only operates in "the user is drunk" mode), that doesn't give a damn about your robots.txt, and that can capture content in a way that is more akin to archive.org than Google's shoddy and increasingly absent cache.

Another issue is spam/false matches. Why does Google return illegitimate results? Because, let me tell you, any search for "some nifty computer book pdf" returns pages upon pages of bogus links leading to ad link mazes. A crawler should be able to trivially crawl such a page, determine that no PDF is linked, and blacklist the result, but this doesn't happen.
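The "no PDF is actually linked" check is cheap to sketch. Here's a minimal version (my own illustration, assuming the crawler already has the page's HTML; fetching, JavaScript, and CAPTCHAs are out of scope):

```python
# Sketch: given a page's HTML, check whether it actually links to any PDF.
# A crawler could demote "book pdf"-style results whose pages contain no
# such link. This only inspects static HTML, as a first-pass filter.
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def links_to_pdf(html):
    """True if any <a href> on the page points at a .pdf file."""
    p = LinkCollector()
    p.feed(html)
    return any(href.lower().split("?")[0].endswith(".pdf") for href in p.links)
```

A page in an ad-link maze would fail this check and could be flagged for the query class "<title> pdf".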

Google is slow and preoccupied. Their business is ripe for disruption.

Those are all things Google could already do but decides not to. Which means if any startup attempts them and finds moderate success, Google will of course decide to do the same. Your startup will be nothing more than an R&D department Google uses for free.

Disruption happens by doing something completely different that happens to cannibalize the incumbent's business in a way that they can't change. All these little features mentioned are something Google can easily do so these are not really disruptive.

Keep in mind that it's been 20 years since Google's differentiator (PageRank) was invented. It's been about 10 years since Google's last meaningful changes to its search UI (autocomplete and "instant" search). So, while I respect Google's omniscience and omnipotence, I disagree strongly that Google/Alphabet are in a position to respond to a serious upstart competitor. Their primary response, as you hinted, would be to acquire/acquihire, which is unimaginative and not exactly how you remain on the cutting edge. That strategy falls on its face as soon as it meets that one damn company that doesn't want to be purchased.

There are already many "serious upstarts" that are doing just fine but will never reach the status of Google, like DuckDuckGo, Bing, etc.

Some even say Bing has better search quality than Google. But that's not what matters. No matter how good the product is, if you can't get distribution it doesn't matter. And even if you do have distribution, it may not be in the right context, so it won't be effective.

If anyone is trying to really build a search engine to compete directly against Google, I can only say good luck, and prepare to be the most patient person on earth, because it won't happen fast, and probably won't even happen. Instead, while you're slowly making progress day by day with small traffic increase, you'll see some random new thing come out that had no intention of becoming a "search engine" (just like Youtube) and "disrupt" Google.

I have seen a lot of people talk about "disrupting something", and also have seen a lot of people who actually have disrupted their industry. The former are just wannabes, and the latter never said anything about "disrupting" something. They just saw an opportunity and went for it. And it ended up disrupting. Hope this makes sense.

p.s. Google doesn't operate on the original pagerank anymore, there are tons of other things going on underneath. Also they now own many innovative AND popular "search UI" both through acquisitions and their R&D. Google Maps owns a lot of the map space, Youtube owns the video search space, and so on. Look deeper and you'll find even more that you've been taking for granted.

We can agree to disagree about Google's immortality, but I take issue with you treating DDG and Bing as serious challengers to Google. Obviously Microsoft has no chance, they're worse off than Google re: ossification and Bing is a hedge at best. DDG is one step above a lifestyle business and only innovates at the UI level, even their efforts there are forgettable. Their major differentiator (data privacy) is meaningless to almost all users, and ironically torpedoes many interesting monetization strategies that are not ad-focused, as well as research initiatives.

Baidu and Yandex are serious but are comfortable in their respective positions. That said, I wouldn't be surprised to see a Google challenger spring up from the other side of the world, where Google's position may not be seen as so unassailable.

While Google has a much stronger brand than Bing, they give roughly equivalent results except for a few categories, such as "paste the error message I just got into the search box" queries. Google's much better for those longer queries. HN readers are probably more likely than average to make them, too.

Hm, yeah. I use DDG, except for error messages, for which I use Google.

> We can agree to disagree about Google's immortality

Dude, where did I say Google is immortal? I'm just saying it's the wrong approach to try to kill Google this way. There are many good ways to do it, and starting from "How do I disrupt Google?" is the worst possible one.

> Obviously Microsoft has no chance, they're worse off than Google re: ossification and Bing is a hedge at best.

Nothing is obvious when we talk about disruption. Why does Microsoft have no chance? I am pretty sure they keep Bing alive because they are looking at the future where they finally gain control over the platform via VR/AR/or whatever, and then use Bing to power the experience on those platforms, which actually may be the case Google can't compete against. But this is Microsoft. If you're a small startup, you shouldn't be doing this.

> DDG is one step above a lifestyle business and only innovates at the UI level, even their efforts there are forgettable. Their major differentiator (data privacy) is meaningless to almost all users, and ironically torpedoes many interesting monetization strategies that are not ad-focused, as well as research initiatives.

Privacy is a great feature, especially in 2017, and I think DDG has the highest chance of getting any market share with this "let's make a search engine to disrupt Google" approach. (But again, this likely won't happen, and it's a foolish approach. It will stay down there as an "alternative" until the founder runs out of steam, OR until there's a huge shift in how people experience the Internet and it somehow takes off completely independent of the company's effort.)

> Baidu and Yandex are serious but are comfortable in their respective positions.

Baidu and Yandex will never work outside of their countries because of the cultural barrier. In fact it's not just those; every country with a significant Internet penetration rate has its own dominant search engine. In those countries Google is nothing, which means this cuts both ways. So no, Baidu and Yandex are NOT serious contenders, just as Google is not a serious contender in Russia and China.

I wonder, is DuckDuckGo's name a problem for adoption? I have no idea what it's supposed to mean, but it sounds very silly and I don't much enjoy recommending it to people in real life, at least in part for that reason. ("Google" was a silly name too, but at least it was just one syllable of silliness.)

Privacy is probably slowly growing into a big deal in the minds of regular people, but when the incumbent works, and works better (in my experience Google does just return better results than DDG - and I really did give it a try, in earnest, for a solid week a few months back...), privacy alone isn't going to be enough at this point.

Yeah, like you pointed out, if your product stands out so much that nothing else matters, then you can name it whatever you want. But DuckDuckGo is just a glorified meta search engine, and they are not doing anything completely unique. In this case branding matters, and I think their name is really bad for branding. There are many examples of really bad naming, like CockroachDB and Swagger. Swagger ended up changing its name to OpenAPI (thank god), but I don't know what the CockroachDB guys are thinking. I don't want to use a DB named after a bug (pun intended).

> I wonder, is DuckDuckGo's name a problem for adoption

For me it is too, yeah. They even own duck.co, which would be a better brand. It'd be a pain in the arse to tell others to use at first ("duck dot com?" "no, dot co") but nowhere near as silly-sounding as Duck Duck Go.

> Privacy is probably slowly growing into a big deal in the minds of regular people, but when the incumbent works, and works better (in my experience Google does just return better results than DDG - and I really did give it a try, in earnest, for a solid week a few months back...), privacy alone isn't going to be enough at this point.

I've had the opposite experience lately: DDG has been almost identical to the Google results, to the point where they seem to be calibrating their own searches against Google. Unfortunately it means it also suffers the same problems.

>I have no idea what it's supposed to mean

I've understood it to be a reference to https://en.wikipedia.org/wiki/Duck,_duck,_goose; the Wikipedia article on DuckDuckGo confirms.

Is this an American thing? I have never heard of it.

I played Duck Duck Goose when I was a kid in Australia.

I played Duck Duck Goose when I was a kid in the United Kingdom.

I'm convinced that the only reason Microsoft is keeping Bing alive is to keep Windows 10 users from funneling money to Google by default.

Windows usage is pretty large and Bing is profitable. Edge is funded by Bing profits, I believe.

Only large companies can afford to make browsers. They are insanely complicated beasts now. Google because of ads and Android, Microsoft because of Windows and Bing, Apple because of iOS. Firefox is facing struggles, and Opera gave up and switched to WebKit.

Disrupting Google search is going to be very hard. Google invests more than anyone in AI. Whoever wants to disrupt Google would need very strong models of the world, natural-language parsing, image recognition, knowledge reasoning, troves of data, and an ocean of servers and fibre.

Google hasn't been sitting idle, they've been disrupting themselves and pushing towards those fronts.

> prepare to be the most patient person on earth, because it won't happen fast, and probably won't even happen

Maybe not. The Cuil experience is significant. The PR for the launch was good, there was a huge initial traffic spike, and then users discovered the product sucked and traffic dropped. The system at launch was terrible. Six months later it wasn't so bad, but nobody cared by then. They launched too early. But they did, briefly, have mindshare, lost through technical ineptitude.

I think the bigger problem with Cuil is they over-hyped their offering before launch which could lead to nothing but disappointment.

I'd argue that showing answers on the search result page is a bigger innovation than "instant" was. It also requires much more new technology.

They could erase competitors from results.

To disable remapping of terms in Google, use verbatim search: On the search results page, choose "Search tools -> All results -> Verbatim"

There's no syntax of which I'm aware which tells Google to not "correct" or expand the results.

When I'm seeking specific matches, I'm seeking really fucking specific matches. Google annoys me to no end.

I believe it was possible in the past using URL parameters (e.g. complete=0 to disable auto-suggest) but it does not seem to work anymore.

Here it was documented in 2012:


There are (or at least were) others with related effects.
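For what it's worth, the Verbatim mode mentioned elsewhere in the thread does still map to a URL parameter. The sketch below builds such a URL; note that `tbs=li:1` is what the "Search tools -> Verbatim" UI sets as of this writing, but these parameters are undocumented and can change or vanish without notice, as `complete=0` apparently did.

```python
# Building a Verbatim-mode search URL. The tbs=li:1 parameter is
# undocumented and subject to change; treat this as an observation of
# current behavior, not a stable API.
from urllib.parse import urlencode

def verbatim_url(query):
    return "https://www.google.com/search?" + urlencode(
        {"q": query, "tbs": "li:1"}
    )
```

This is handy for bookmarklets or browser keyword searches that should always run verbatim.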

Boolean search would be wonderful


It used to force the word to be searched exactly as written.

"Used to" is the operative phrase. That was among the negative consequences of Google+, as +<term> notation apparently was going to be reserved or repurposed for that somehow, but never was.

I've done some large-scale searching where the most relevant detail is how many results are returned, most particularly for a specific domain. (For which, incidentally, there's no handy mechanism, so it's <array of terms> * <array of domains>, a multiplicative explosion of searches, plus about a 45s timeout per query to avoid triggering bot defenses by Google.)
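That term-by-domain survey loop might be sketched like this (the term and domain lists are hypothetical placeholders, and the fetch/parse step is deliberately left abstract):

```python
# Rough sketch of a term x domain result-count survey. Every (term, domain)
# pair becomes one query using the site: restriction; the long pause between
# queries is the throttling described above.
from urllib.parse import urlencode

terms = ["example term one", "example term two"]   # hypothetical placeholders
domains = ["example.com", "example.org"]           # hypothetical placeholders

def query_urls(terms, domains):
    """One search URL per (term, domain) pair, using a site: restriction."""
    return [
        "https://www.google.com/search?" + urlencode({"q": f"{t} site:{d}"})
        for t in terms
        for d in domains
    ]

# for url in query_urls(terms, domains):
#     fetch(url)        # hypothetical: parse out the "About N results" count
#     time.sleep(45)    # throttle to avoid triggering Google's bot defenses
```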

Such as this:


The plus/+ notation has been removed. You must now surround the term in quotes: "word".

I wish there was a way to use both Verbatim and Date Range at the same time: you end up having to choose between recent but irrelevant results, or relevant but obsolete results.

> Another issue is spam/false matches. Why does Google return illegitimate results? Because, let me tell you, any search for "some nifty computer book pdf" returns pages upon pages of bogus links leading to ad link mazes. A crawler should be able to trivially crawl such a page, determine that no PDF is linked, and blacklist the result, but this doesn't happen.

The problem is interesting, but I think you make it seem easier than it is. What if there's a CAPTCHA, as there often is? Should the engine still ignore it? That could lead to a whole lot of missed content.

I'd argue that paywalled sites, CAPTCHA-guarded pages, and things like that are second-class content and should be treated as such. If somebody wants to dig deeper and see such results, that should be an option, but as you said, this is a tough problem to solve satisfactorily and the truth lies somewhere in between. An obstacle to this happening right now is that Google doesn't want to shake the advertising tree too hard. Their business model is spammy, so they tolerate spamminess. That's the hardest problem of all, and one I don't think anybody's close to solving yet: How do you monetize web search without automatically wrapping results in ads? But some bright person will come up with an answer and hopefully defeat this huge conflict of interest.

> a query I've modified to quote "every" "single" "flipping" "term."

Use Tools > Verbatim, it's much faster than quoting every search term, and it works better.

Any search crawler that ignores robots.txt is going to be blocked by site operators in a hurry.

If Google today started ignoring robots.txt, not many people would start blocking Google's crawlers, assuming they continued to do their job efficiently. robots.txt is security by obscurity at best.

People wouldn't block Google because allowing Google scraping offers a return in the form of more traffic that offsets the cost. $BRAND_NEW_ENGINE wouldn't have that advantage.

Google's crawlers don't exactly perform 2FA before they crawl a website. If impersonating Google's crawlers doesn't suit your fancy, there are all manner of ways to anonymize a crawler. So I think blocking is one of the least interesting challenges. The other side of the coin is robots.txt is not inherently adversarial, and a crawler could waste quite some time and energy crawling truly meaningless content. That, in my mind, is the interesting challenge: Obsoleting robots.txt with an intelligent (and gentle) crawler.

Google doesn't feel like being impersonated so they offer an easy way to authenticate Googlebot:


They do ignore robots.txt sometimes; the pages just don't end up in SERPs.

And more importantly: It's illegal :D

> When someone sees a search ad, they're actively looking for the item of interest and may be ready to buy.

My experience with Google is the opposite of this. When people are searching google they are looking for information. When people go to Amazon they have their credit card in their hand.

Most of my Google-ad clicks (85% at last check) are clearly from people not interested in buying what they are looking for. They are only on my page for seconds. If I try to qualify my ads better I lose 'quality score,' which is a measure that is entirely, at least to a first-order approximation, about whether or not people will click the ad. That is, if my ad says 'This site is X' I get a good quality score but shitty leads. If I say 'This site is X for $49.99' I get a shitty quality score, but the people who come to the site (the clicks I have to pay for) are ready to buy.

The profitability isn't because their initial hypothesis of search marketing (that people search for what they want to purchase and will therefore purchase) was correct, but because they are far-and-away the search winners, and if you want to be found at all you have to play the game.

At least, in my experience and IMHO.

I'm not sure if it works that way, but if you change to Pay-Per-Conversion, it would be in Google's interest to make ad quality depend on conversions-per-view.

And – as just another anecdote – I have had continuing success over the last 8 to 10 years with search ads, while not having a single conversion with anything else (Facebook, youtube, twitter etc.) as far as I can tell.

What do you sell?

One could argue that a monopoly has no interest in improving service.

Also, it seems to me that both weaknesses described, legitimacy and provenance, are actually good for Google's core ad business: if legitimate advertisers could reliably rank highly without paying, they'd have no reason to pay to outrank the competition - competition buoyed by the nebulous practices Google pretends to frown upon but, for the sake of a delicate balance, cannot justify completely stamping out.

The perhaps unpalatable and unutterable truth is that ad-supported search is a delicate business: be good enough to attract eyeballs, but not so good that advertisers who could pay you have no need to pay.

> Search ads are more relevant and more profitable than any other form of advertising. When someone sees a search ad, they're actively looking for the item of interest and may be ready to buy.

ActionCookBook accurately recreates my experience. https://twitter.com/actioncookbook/status/834439563032555521

   ME: [views product online] Hmm. Nah.
   [changes website]
   PRODUCT: we meet again
   ME: Sorry, no [changes again]
   PRODUCT: bitch this ain't over
   ME: FINE. FINE. I will buy this rug. Just leave me at peace.
   REST OF INTERNET: this dude loves rugs, let's get him, boys

Thanks for this excellent comment. I wonder if this is true though:

> Google is also bad at provenance - figuring out that site A is using text derived from site B, and thus B should be ranked higher.

The part I dispute is the last bit. What matters is the user perspective. Of course a site that does nothing but scrape another one should rank lower, but many scrapers add value, even if only UI-wise. So it's not obvious that the site of origin should rank higher.

7% for Bing? That's huge!

(Also, how about DuckDuckGo? I've got the (admittedly gut) feeling that it should at least outperform Ask and Excite.)

DuckDuckGo didn't make the world list.[1] A US survey has them at 0.41%.[2]

[1] https://www.netmarketshare.com/search-engine-market-share.as... [2] https://www.searchenginejournal.com/august-2016-search-marke...

They are a privacy-minded search engine; it's quite possible such a survey would underestimate their share.

DuckDuckGo doesn't crawl/index by itself - it partners with other companies to use their search indexes. Bing is actually one of their primary sources of indexing, although Wikipedia tells me that DDG's index is a compilation of "about 50 sources".


DuckDuckGo does have its own crawler: https://duckduckgo.com/duckduckbot, but they are dependent on the indexes of others.

Interesting, I had no idea. I wonder if they have, or ever had, plans to index themselves? The wiki link mentions they do have their own crawler.

Yeah, but neither do some of the other search engines mentioned. Apparently DDG is even smaller than I thought.

Ahrefs actually crawls more than any search engine other than Google: https://www.incapsula.com/blog/most-active-good-bots.html

And Bing, Yandex, and Yahoo have active crawlers.

Ask and Excite and all of the other small search engines are showing you Bing and Google results.

"Fix those two problems, and a new search engine could be better than Google. Whether anyone would notice is questionable."

Yeah, I don't think many people will care about the difference between good and perfect. You might be able to find a niche in search that Google is ignoring, but you would have a hard time expanding from there into general search.

In that sense Google is a bet against technology - you would invest in Google if you believe the Web's going to stay the same for a long time and nothing will replace it.

The next direction in search seems to be direct question-answering. Echo (Amazon) is interesting in that, being voice only, it can't punt to a screen of search results. It has to answer the question.

The next direction in search is search that machines can use.

Without needing humans to tell them what the results mean.

This isn't the task of the search engine; it's the task of the machine that uses it. If the machine doesn't understand meaning, there is no way the search engine can teach it that meaning.

So no, this will not be the next direction - unless there is a universal model of machines that understand things. But there isn't, and there won't be for at least another 10 years.

That perspective assumes a version of events where result consumers have strong AI. That's not what I think will happen, and it's not the only thing that can happen, because I think it misses some pieces of the picture.

IMHO, the other pieces of the picture, and what I see happening, are search engines, or a service, that will index entities and relationships (in the ontology sense) and return results enriched with that data through something like an extended microdata vocabulary. There will be an API where machine consumers can query entities and relationships, and see those attributed to their sources in the webpage. This prediction is nothing new and has been consistently foretold by AI and information-retrieval augurs for decades. The difference is that I see this moving beyond the realm of expert systems in large corporations and into the realm of an API generally accessible to anyone.

I think it's very possible that, as you say, a current incumbent search engine may not consider the provision of such a service its job. Which makes things very interesting for potential new entrants in the semantic search market.

Other business types could surely supply these entities and relationships, but who better to be involved in the supply chain than search engines? In one hand they hold a hose containing nearly all the world's information, and in the other a market hungry for all the world's information.

But this type of service could be niche because most humans are not going to care about getting ontology data in their search results. So even though parsing out entities and relationships could improve the usefulness and accessibility of information there might not be enough universal demand for someone like Google to really care about it.

With these caveats, I think the trend from information, to knowledge, is a very natural and already apparent progression for search engines. The type of API described here is possible today given the right incentives.

The question could be "find the page that has the following words".

Thanks for the summary. My 2 cents: I think Amazon search ads are more profitable than Google's. When a user searches for a product on Amazon, that user is only one click away from buying.

Edit: There should be a differentiator for a new search engine; there is more room (problems to solve) in Q&A search and discovery.

I was very young when Google became popular. Why did it become so popular? Was it just the technological advance of Google-Matrix (Page Rank) and Map-Reduce?

To understand that, you'd need to know just how bad search was before Google. Search wasn't even a "thing" on its own; it was usually merely a feature of large portals (AltaVista, AOL), which were large, slow, cumbersome, and gave pretty irrelevant search results. At that time, search relied mostly on the meta tags of the pages themselves - which was easy to game and led to a lot of spam in the search results even back then.

Google's clean interface (basically just a search box) was so revolutionary that it's hard to comprehend today - I still remember it took me a few days to even take it seriously, it was so thin and lightweight. PageRank was a gigantic step forward in SERP relevancy, and even though it was eventually gamed by link farms and paid backlinks, that took a lot of time and effort, and the first few years saw few spammers.

> Was it just the technological advance of Google-Matrix (Page Rank) and Map-Reduce?

Well, yes, it was simply that. It was the difference between a search engine that works and one that doesn't. For example, most sites back then had tons of text in the same color as the background, with words that would get you a higher rank on the previous generation of search engines. Also, those search engines had very heavy pages, mostly with news, and they were known as "portals". Google was a blast in comparison.

The dirt-simple UI and fast loading times helped.

Before Google, the dominant search engine was AltaVista, which was awful. The UI was so crowded that you had to search for the search bar, and it was so full of graphics that if you were on dialup, loading took forever unless you turned images off.

And, yeah, PageRank helped. It wasn't uncommon for you to find what you were looking for on page 3 or 4. Google had what you were looking for on page 1, always. In fact, pre-Google search was so bad that if you were looking for anything obscure, you had to use multiple search engines. When I'd get serious about finding something, I'd fire up AltaVista, Infoseek (Disney/ABC/Go.com), Hotbot, WebCrawler (AOL), etc. in separate browser windows and search on all of them. And for the lazy, there was MetaCrawler, which automated this; I didn't use it often, mostly because I kept forgetting about it, but when I remembered to use it, it was a godsend.

My mother introduced me to Google back in the day. She and several of her friends agreed that it was amazing because it was fast, simple, and got out of your way. And back then, that's exactly what it was:


At the time, their main competitors were already expanding into other markets, and Google was refreshing for doing one thing and doing it well, compared to its competition. It didn't clutter, it didn't try to distract with other services that they offered. It simply gave you the results you queried as a plain list of links, and got out of your way so you could browse those links at your leisure.

The day that I abandoned Altavista for Google was the day that several technical queries brought up porn -- AltaVista had been pwned by porn-pushing keyword stuffers.

At that point Google had a smaller index, which hurt their results, but I thought that was better than getting porn results while at work.

> Yahoo had a search engine from 1995 to 2009. Yahoo is now a Bing reseller.

If so, why is Yahoo Slurp still showing up in my apache logs?


It's true that Yahoo uses Bing, and I don't think anyone has said in public what Yahoo is up to with Slurp.

Cuil had a really aggressive spider but never delivered anything useful ... I ended up blocking them to clean up logs.

I miss DuckDuckGo in your line-up.

The short answer is YES, but the long answer is, if you're thinking about building a startup, this should NEVER be the question you ask.

All the successful companies that came out of nowhere and disrupted an already stable industry never started out thinking "How do I build another X?" or "How do I disrupt X?" They all built something they thought the world needed, and it went on to somehow "disrupt X".

So if you're starting out thinking "I want to build a search engine if there's room for another," that will never work, because you don't even know what problem you're solving; you will be frantically searching for the question throughout your "startup" life.

Thank you so much for writing this, @cocktailpeanuts. I'm surprised there is so much discussion around the yes/no rather than around how to make the decision.

Ask better questions, and you'll get better answers.

So the question shouldn't be "Is there room for another search engine?" but perhaps "Would a better search engine...?" or "What comes after search engines?"

I think even looking at the 'flaws' of Google isn't really going to give you a game changer. You'll find a hole that Google can easily fill.

good for you, go do it then!

Companies like Algolia, which provide a site-specific search engine, have been doing really well, especially on speed and relevancy, areas where Google is currently not concentrating.


Algolia is a game-changer. They made it so incredibly simple to add search to your website. I'm not talking about their widgets, I'm talking about their server-side integrations and their javascript client-side lib.

It's like magic. See it in action here: https://stackshare.io/match

It's bizarre that algolia is a thing. Elasticsearch is good and free.

Yes, but Elasticsearch requires a dedicated server, you may have to use a river, you may have to shard, etc. Some people (most people?) don't want to think about shards & rivers and just want something to work. Algolia seems to do that.

I know what a shard is, but what's a river? I've never heard anyone use that term in a technical context before.

I used ES a long time ago and rivers have since been deprecated: https://www.elastic.co/blog/deprecating-rivers

A river was just a mapping from a database to ES. For example, you could search CouchDB with ES with the proper river set up.

In Elasticsearch, a "river" was the mechanism for pushing data from your primary store (most likely a database) into the Elasticsearch index.

Rivers actually started out as part of Elasticsearch but have since been deprecated. You will still find people talking about "rivers", though, as a description for however they are updating the index.
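For historical flavor, a river was registered by storing a configuration document in the special `_river` index, and the corresponding river plugin would then poll the source. Below is a rough reconstruction of a CouchDB river config; this API was removed from Elasticsearch years ago, and the exact field names here are from memory, so treat it as illustrative only:

```json
{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "products"
  },
  "index": {
    "index": "products",
    "type": "products"
  }
}
```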

There are lots of things like this. It's bizarre that I pay for email when there are good and free email services (and my privacy/security needs are pretty damn simple). People don't mind paying for things they like, or for something that provides just a little detail that others do not.

Algolia provides things that Elasticsearch does not, such as being hosted, being extremely simple to set up, and having a default web interface for anyone to use.

There are plenty of companies providing hosted ElasticSearch these days, starting with Elastic.co, the corporate sponsors of the open-source project. Amazon hosted ES is also a lot better than it was just a few months ago; they were stuck on 1.5 for the longest time, but now they've already got support for 5.1.1.

The database is just a building block for the search. You still have to implement a lot of logic and smartness on top.

Algolia does that out of the box.

It's not that hard, though. I'm a junior dev and I managed to implement ES (AWS-hosted, of course) for our site search on my own. If I can do it, pretty sure the majority of people can do it.

Nothing like this is hard. Being junior or not has nothing to do with it. It's a question of investing time/money.

I'll give you free, but I'm not sure about good. For the www site I run, we have run a one-node instance on each machine; they regularly decide to go off into the weeds and not return results, and in the meantime use ridiculous amounts of RAM.

I wish Yahoo would have open sourced some of its internal search tools. :( They weren't always simple to use, but they were stable.

Hi! Algolia employee here, thanks for the mention. Did you encounter any issue using our widget-based libraries, or were you only willing to use the raw JavaScript client?


It's good in results, but too aggressive in updating live results -- the effect is laggy on systems I use, e.g., when searching HN.

That's the fault of the client-side implementation. You shouldn't fire off an AJAX request on every keystroke, but leave a certain timeout after keys are pressed to detect "yeah, this guy has finished typing, let's search now".
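The timeout-after-keystrokes idea is the classic "debounce" pattern. Here is a minimal, language-agnostic sketch in Python using `threading.Timer`; the 50 ms delay and the `search` function are made up for illustration:

```python
import threading

def debounce(wait_seconds):
    """Decorator: delay calls until `wait_seconds` of quiet have passed,
    so only the last call in a burst of keystrokes actually fires."""
    def decorator(fn):
        timer = None
        lock = threading.Lock()
        def debounced(*args, **kwargs):
            nonlocal timer
            with lock:
                if timer is not None:
                    timer.cancel()       # a new keystroke resets the clock
                timer = threading.Timer(wait_seconds, fn, args, kwargs)
                timer.start()
        return debounced
    return decorator

# Record which queries actually trigger a "search request".
sent = []

@debounce(0.05)  # wait 50 ms of quiet before searching
def search(query):
    sent.append(query)

# Simulate a user typing "hn" one keystroke at a time: only the
# final, complete query should reach the backend.
for partial in ["h", "hn"]:
    search(partial)
```

The same idea translates directly to JavaScript with `setTimeout`/`clearTimeout`.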

I just got this hooked up to firebase last week, amazing!

It's a crowded market, even though Google has basically withdrawn from it.

This is more a feature than a different search engine, but I so so so wish I could de-prioritize blogspam. 300 - 1000 word text-heavy writeups of a couple of facts where a few bullet points, an image, a graph, a map, or a data table would be much much better. Google has been SEO'd to death because of its block-of-text lowest common denominator favoring.

Never heard the term "blogspam", but that is a very good description of it. When I click a link and then notice it is just a short blog post from some company trying to "content market", I leave the page.

I write blogspam. It's definitely a problem but I don't know what Google is going to do about it.

Basically google can't trust backlinks anymore because people game them and competitors try to destroy each other's sites by buying scummy links to their stuff.

So they mainly attempt to measure quality in a vacuum. This is using their machine learning stuff to look at the quality, confidence, and reading level of the writing style.

They do the same quality checks for the site. Checking for EV certs, clean markup, real email volume through Gmail, reputable DNS provider, physical address in G maps. A lot of their hundreds of quality metrics don't measure the site itself, but use Google's pervasive data trove from their other services. Most scammers don't bother doing any of this right.

The problem becomes people like me. I set up sites with all measures of quality for legitimate businesses. I have articles written by good writers with knowledge of the subject. Sounds great, right?

The problem is that these articles are still done for money and quite biased sometimes. Google is slowly running into a need for a strong AI because all measures of quality can be emulated if enough money is on the line. It doesn't matter if something seems truthful in every way except the fact that it isn't.

This is the same reason "fake news" is invading google and Facebook. Smart spammers have upped their game to the point that it's impossible to know what's real anymore.

Need a wikipedia article changed? Good reviews on Yelp? A nice piece on a popular tech website? All of this can be openly bought with zero consequences.

> pervasive data trove from other services

I would believe that. Although there have to be several alternate ways to measure a site's popularity, Google Analytics is a huge part of a site's ranking. Do users stay long on your website? Do they come back to the search results after that? Do they click through to "Pricing", then back to "Features"? If so, that must be the right answer.

Alternate example: I would bet that shops where Google Maps geolocalises a lot of customers have a higher-ranked website than similar websites where their physical venues are empty.

Even creepier... If Google knows all your physical locations they know how many employees you have at work on a given day

Yes. In today's search engines, I cannot give you a blacklist and say "filter out these results". If I am looking for tutorials, I cannot say "no video results". If I am looking for market research, I cannot filter news websites out of the links. For personalization, I cannot give Google any suggestions on what I absolutely do not want included, etc.
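The kind of filtering being asked for is trivial once the engine exposes the knobs. Here is a toy sketch of post-filtering results against a user-supplied blacklist; all domain names and result fields are hypothetical:

```python
from urllib.parse import urlparse

# Hypothetical user preferences: domains and content types to exclude.
BLACKLIST_DOMAINS = {"example-news.com"}
EXCLUDED_TYPES = {"video"}

def filter_results(results):
    """Drop results the user has said they never want to see."""
    kept = []
    for r in results:
        domain = urlparse(r["url"]).netloc
        if domain in BLACKLIST_DOMAINS:
            continue                      # blacklisted site
        if r.get("type") in EXCLUDED_TYPES:
            continue                      # unwanted content type
        kept.append(r)
    return kept

results = [
    {"url": "https://example-news.com/markets", "type": "article"},
    {"url": "https://docs.python.org/3/tutorial/", "type": "article"},
    {"url": "https://videohost.example/tut", "type": "video"},
]
filtered = filter_results(results)
```

The hard part, of course, is the UI that makes such preferences discoverable, not the filtering itself.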

blekko had these features, and almost no one used them. The google guy who teaches advanced Google searching says that almost no one uses Google's advanced search, either. So if this is a viable niche, you'll have to figure out how to find these users...

Who cares if a small subset of people use them? It doesn't make them worthless.

Oh, I think it has value. I was just reporting that I tried it at blekko, and think I completely failed. Maybe you'll be able to do better.

Very interesting! Is the main problem really finding the users or could it be developing an interface that makes it easy enough for users of varying technical abilities to occasionally use?

Why do you feel you failed? Just economics?

Since Greg was CXO at blekko, I would say that for whoever is making a search engine, it's important!

I think there is definitely space for niche search engines - there are tons of them already, if you include things like the DPLA, octopart, iconfinder.com, Spotify or class-central.com.

Google is focused on getting you to a relevant result quickly, but having a search engine that helps you discover new things is really useful. If you focus on a niche, you can also make use of a lot of metadata Google doesn't retain.

I'm exploring this on a small scale with https://www.findlectures.com. Having the date a video was made gives it a 'street view for history' feel, and lets me rank historical content differently from conferences (where recency is more important).

Building a graph of talks, conferences / speakers / books / publishers could be the building blocks for a pagerank implementation, or to build a different type of book search. Alternately, I think it would be interesting if search engines let you do LSA style queries, like "Brian Goetz" - "Java" + "Python", to help discover speakers.
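The LSA-style query above is just vector arithmetic plus cosine similarity. Here is a toy sketch with hand-made 3-dimensional embeddings; in reality these vectors would be learned from a corpus (word2vec, LSA, etc.), and the candidate names are only illustrative:

```python
import math

# Toy embedding vectors, dimensions roughly: [is-speaker, Java-ness, Python-ness].
vectors = {
    "Brian Goetz":       [0.9, 0.8, 0.1],
    "Java":              [0.0, 1.0, 0.0],
    "Python":            [0.0, 0.0, 1.0],
    "Raymond Hettinger": [0.9, 0.1, 0.8],
    "James Gosling":     [0.8, 0.9, 0.05],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def analogy(positive, negative, exclude):
    """Score candidates against (sum of positives - sum of negatives)."""
    query = [0.0, 0.0, 0.0]
    for name in positive:
        query = [q + v for q, v in zip(query, vectors[name])]
    for name in negative:
        query = [q - v for q, v in zip(query, vectors[name])]
    candidates = [n for n in vectors if n not in exclude]
    return max(candidates, key=lambda n: cosine(vectors[n], query))

# "Brian Goetz" - "Java" + "Python": find a comparable speaker, Python edition.
best = analogy(["Brian Goetz", "Python"], ["Java"],
               exclude={"Brian Goetz", "Java", "Python"})
```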

Definitely, I think so. Google lacks in quite a few areas. I think that:

a.) storing more data about the sites (and doing something interesting with said data)

b.) improving the UI/UX for power users would help. The best part is that I can imagine quite a few people would pay actual money to use a better search engine. Note that the Bloomberg terminal is, among other things, a search engine. For example, you could make the link graph explicit: you would immediately see what sites link to what sites.

E.g. symbol search really leaves something to be desired on Google. I also wish I could use regular expressions. I get it, they are expensive, but even a little "expressiveness" goes far.

c.) I would pay A LOT for a good search engine for code.

What kind of code search are you thinking of? In my experience, code search could be useful when one works with a big and unfamiliar code base, but even then good architecture documentation and a good IDE would help more. And when one really needs string search, `git grep` is usually fast enough (for me on a 5GB code base).

It's not just current code search but also "auto complete". What if my IDE could recognize that I'm writing a bad implementation of binary search and could suggest a better one from the internet? What if I had an SQL-like language that I could use to query and transform code? E.g. find all instances where a method starting with "set" takes an Int as an argument, and do something with those. I've been noticing that quite a bit of writing software would be a lot simpler if I could do this.
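The "find methods starting with `set` that take an Int" query is already expressible over a syntax tree. Here is a sketch using Python's `ast` module as a stand-in for the SQL-like code query language imagined above; the sample source is invented:

```python
import ast

# A small invented code sample to query against.
source = """
class Widget:
    def resize(self):
        self.set_width(300)
        self.set_label("hello")
        self.refresh()
"""

def find_set_calls_with_int(tree):
    """Find calls to methods named set* whose first argument is an int literal."""
    hits = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr.startswith("set")
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, int)):
            hits.append((node.func.attr, node.args[0].value))
    return hits

hits = find_set_calls_with_int(ast.parse(source))
```

Scaling that from one file to "the internet" is, of course, where the search-engine part comes in.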

http://symbolhound.com/ is great for searching symbols or code.

Regarding c), any reason searchcode.com is not meeting your needs? I would be happy to add it in. You can download and run your own version as well.

Ben Boyter runs searchcode.com, you should check it out - he's on hn as boyter too

Here's what I want in a search engine:

Charge me $15-25 per year.

Let me decide what demographic information I wish to share- make it easy for me to control and help me protect my information. Because you are charging me money you can afford it and I trust you.

Give me two search options: one, I'm only seeking information; two, I'm looking to buy. Do this for me as an advertiser: help me qualify the clicks I'm paying for.

Perhaps allow me to pay per 1000 impressions (CPM) instead of per click.

By the way, I would also subscribe to a facebook that did this.

Google is making $50 per user from ads, potentially an order of magnitude more from a US user, so I am quite certain that your offer of $25 is a lowball :)

Yea but Google also makes more money than they know what to do with. Surely you could be sustainable at a lower price?

This is kind of their problem. I am not worth $500 as an advertising prospect on Google. If you come in with a way for advertisers to pre-qualify their prospects, they'll want to do business with you. We know we are throwing our money away with Google but they are the only game in town.

However, I am in a thread now with matt4077, who says Google is his best online lead generator, so there are obviously things to learn.

For a general search engine, no, there isn't.

The upfront capital investment, in terms of the data center capacity necessary to make a modern scraping and search infrastructure, is immense. And since the ad-word business model does not scale linearly with market share – e.g. the market leader collects a disproportionate share of the available profit – you will be losing additional money for a long time.

Since the market leader is good enough that it isn't possible to disrupt the market purely through result quality (as Google did), you will need to rely on bigger and more effective marketing spend. Not only will you have to outspend and outperform Google, but also Microsoft/Bing, who have tried to do the same thing for years, with only limited success.

Even if you have the funding necessary to do all of this, then you would be better off either buying shares in an existing search engine company, or starting a business in a different market, one with lower upfront costs and less dominant incumbents.

> the ad-word business model does not scale linearly with market share – e.g. the market leader collects a disproportionate share of the available profit

Why is this so?

Because the market leader has all the publishers.

If you have space for a single banner ad, you're going to use the network with the highest payout, which is the one with the most competitive bidders.

Essentially: network effects, efficiencies of scale, and costs of managing multiple small relationships (from the advertising buyer's PoV).

Google's real monopoly at this point is ad-side.

Along with Facebook, yes. About 65% of online ad spend is to Google and Facebook.[1] The remainder is to a bunch of little guys, mostly selling to bottom-feeder sites.

Except on mobile. Banner ads on mobile are growing.

[1] http://fortune.com/2017/01/04/google-facebook-ad-industry/

Because it is a waste of time optimizing/setting up campaigns for such a minuscule return.

Although I also use Google search and Microsoft Bing, probably more than 80% of my search is done with DuckDuckGo.

The fact that a lot of us use DuckDuckGo (and I hope they are profitable) is evidence that there is room for other search engines.

I would like to find a good substitute for Facebook, but so many people I know use it that I always need to check Facebook two or three times a week to not miss out on stuff, since many friends and family don't use email anymore.

Attending the Decentralized Web Conference last year got me excited about using smaller and decentralized services. GNU Social is pretty good, but requires work to find interesting people to follow.

Instead of Facebook, just talk to your friends and family semi-often. If one of them has a baby or goes to Europe a lot of them will know and someone will mention it. On occasion you will hear about something three years after the event but that's still okay. It worked perfectly well for thousands of years and it still works today.

If by "search engine", you mean something similar to Google/Bing then probably not.

However, if we expand the concept of "search" to something beyond text on webpages and "engine" to something beyond a linear algebra pagerank problem that weighs url links, there's room for many more competitors.

Let's say we want to search for "best restaurant":

Method #1 might be searching millions of web pages, twitter posts, newspaper archives, etc where ngram such as "best restaurant" is mentioned. That's what Google/Bing engines already do.

Method #2 might rank restaurants by collecting crowd-sourced opinions. That's what Yelp & Tripadvisor do. (Although Google also piggybacks on their data and lists Yelp pages in the SRP.)

Method #3 might be a company like Visa/Mastercard analyzing their billions of transactions[1] and based on actual spending amounts & frequency of a billion cardholders, they can also provide their own calculation of a "best restaurant". (I know that Visa/MC already offer limited marketing data to some entities but they don't surface that data to every day web surfers.)

The idea is that there's plenty of room for more imaginative scenarios of #2 & #3. The common theme is that Google doesn't have the data (e.g. credit-card transactions) and therefore, the new "search engines" can give fresh answers that Google algorithms can't provide. To try and boil it down to a simple question: "What interesting answers can a new engine provide that _can't_ be extracted from the text of webpages?"

Btw, I ran across some posts from a Microsoft employee (but not a Bing team member) stating his opinions on building competing search engines. https://news.ycombinator.com/item?id=7011472

[1] http://marketrealist.com/2016/10/why-visas-processing-and-in...

Method #3 would be for calculating popular restaurants, not best restaurants. For example, I bet McDonald's would rate pretty highly with that approach, but it's very nearly as far away as you can get from the idea of a "best restaurant".

Popularity is but one input for "best" -- depending on one's definition of best. There's enough metadata (price, location, cuisine, etc) to correlate with actual payment data to filter out fast food like McDonalds. It doesn't have to be a totally naive statistical approach that gives "dumb" answers.

The idea is that what people actually pay for is a different set of vector inputs compared to what people submit reviews for (Yelp/TA) or what people link to (blog with links to favorite city restaurants.) Google's search index is over 100 petabytes but even that gigantic database is missing lots of data that other entities can collect and convert into uncontested search results.

BuzzSumo.com is a good example as well.

I definitely think there is room for search improvement. I believe the next area of search is contextual search (https://en.wikipedia.org/wiki/Contextual_searching). If you can combine what the user is looking for to actual website content then I think you might be onto something. The trick is finding that link function. Traditionally Google has relied on keywords and ranking by links. There could be other ways to find that user/content relationship.

I'd think there would have to be something fundamentally different. It would have to be hardly recognizable as a "search engine."

There are too many clones on the market right now. Some have a good purpose, like DuckDuckGo, which can be simplified to "Google but without privacy invasion." Others, like Bing, could be just "Google but clunkier." (my opinion)

If you've got an idea on your hands that can't be described as "Google but..." then there's definitely room for another.

Take a look at how DuckDuckGo built up their business around privacy first, and leveraging Google when appropriate.

Some ideas:

1) a good sitewide search engine. Google's offering is laughable, and Algolia is too developer-centric (it requires pushing the data through an API). What I'd want is a single input field where I can put my site's main page URL, and get a working search in a few minutes.

2) subscriptions / monitoring. I want to monitor some event or topic, and I want the updates to be delivered to e.g. my WhatsApp/Telegram/Slack/whatever, with smart filtering, refining etc (in lieu of frantically Googling / redditing / refreshing Twitter feed)

3) context-preserving interactive search, that can ask me questions/ refine results.

4) Timeline search interface for news / events / company history etc. I want to be able to put the name of a person, or company, or TV series, and get a comprehensive timeline view of all things happened there.

I have a lot more ideas, and zero free time :(
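Idea 1) above boils down to crawl, index, query. Here is a minimal sketch of the index-and-query half in Python, with the crawl stubbed out as a dict of already-fetched pages (all URLs and text are made up):

```python
import re
from collections import defaultdict

# In the real product you'd crawl outward from the site's main URL;
# here fetched-and-extracted pages are stood in by a plain dict.
pages = {
    "https://example.com/":        "Acme widgets home page",
    "https://example.com/pricing": "Pricing for Acme widgets and gadgets",
    "https://example.com/blog":    "Blog post about building search",
}

def build_index(pages):
    """Build a simple inverted index: term -> set of URLs containing it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            index[term].add(url)
    return index

def search(index, query):
    """AND-match all query terms; the most naive ranking possible (alphabetical)."""
    sets = [index.get(t, set()) for t in query.lower().split()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))

index = build_index(pages)
```

Everything that makes sitewide search good (HTML extraction, stemming, ranking, snippets) layers on top of this skeleton.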

1) Swiftype (or any of Algolia's integrations - WordPress, ZenDesk, Shopify, Magento, ...) - here you talk about Algolia's developer focus, but the rest of your arguments are about the consumer experience. All search engines are built by developers/engineers, and Algolia delivers end-user experience on Twitch, Periscope, Medium, and even HackerNews (hn.algolia.com), which are exactly what you're looking for. You can actually use Algolia to create all the search engine experience ideas you have, and it takes less time (which you don't have)

2) Mention (http://mention.net)

3) Jelly (didn't work. Maybe there's a reason?)

4) Google / Wikipedia.

Unless you can build something 10x better than what exists, don't bother.

I think search engine functionality will sooner or later need to be incorporated into the core specifications for the internet, like DNS.

I mean, the modern idea of the internet is pretty much useless without a search engine, and we've been spoiled by the power we get through Google — the phrases on this very page get indexed within literally seconds; I just tried a literal search for a sentence from a 3-minute-old comment here — but it's really not a good idea for a single company to have so much authority.

This really isn't something to keep relying on a small handful of companies for, especially once we have interplanetary internet. :)

Yes. Google is too generic and that is great for the internet.

I would look forward to search engines that are topic specific. However, the blocker is having the information available in the first place, so I doubt if this will ever happen.

Absolutely - Giphy is a great example. There will be plenty of search engines that grow to prominence around either a niche content type (gifs => Giphy) or a niche feature (privacy => DuckDuckGo).

I feel like Amazon is my second search engine. So yes, there is room for category-specific search engines. Reminds me of when people started making specific apps from Craigslist sub-categories.

It is possible to compete with Google by offering what they used to have: simplicity and speed, and not screwing with results.

Google is beginning to show signs of accidental self-sabotage. Their AMP approach was so aggravating for me on mobile that I literally switched search engines to avoid it. And their insistence on scraping and summarizing things and trying to prevent you from even visiting other sites is slowly ruining even desktop searches. They are in danger of disruption.

Absolutely. I'm fairly confident that 20 years from now, we'll laugh at the notion that all Google could do was find pages.

I don't know what will replace it. Chat bots could be one option. Much better understanding of context. Or providing answers based on knowledge that is spread across many separate web pages. Or actually taking action (if you are searching, it's to do something, not to read a page).

But "finding a page" will sound really silly 20 years from now.

Finding pages is becoming more and more irrelevant as people create and host their own websites less and less, the way they used to when the web was young. Web page creation has mostly been centralized into things like Tumblr, Twitter, and major news organizations.

They're already working on that with the Assistant, though.

Yes and I predict Reddit will be the first challenger to finding web content. And looking beyond, I don't see Google as the way I find amazing AR/VR experiences.

DuckDuckGo is doing pretty well, and it has a lot of room to grow (so proves it can be done).

There is the problem of relevance: you must order your search results by relevance. Now, you can have one global model of 'relevance', or you can have several models (I think the web is so big that one model is not good enough).

Relevance can be relative to language/origin; number of links away from a Wikipedia article; coolness (does it have a link from a known Twitter account/news aggregator?); age group; a link from the HN front page or Slashdot would make it 'nerdy'; news (was it referenced by a news source?); etc.

I think a differentiator would be a non-intrusive and intuitive UI for selecting an available relevance model (instead of trying to profile the user based on their search history / browser history).

On the one hand, the user profile is of great value for advertising, but on the other hand, the explicit choice of relevance model can be used to match relevant ads.

Google obviously already operates such profiles, at a rate of one profile per user - with AI in the background.
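The explicit relevance-model selector described above might amount to a set of pluggable scoring functions, with the user picking one per query. A toy sketch follows; the three models and all document fields are invented for illustration:

```python
# Hypothetical pluggable relevance models: each scores a document for a query.
def score_links(doc, query):          # classic link-based relevance
    return doc["inbound_links"]

def score_freshness(doc, query):      # newer is more relevant
    return doc["year"]

def score_nerdy(doc, query):          # boost pages that aggregators linked to
    return 10 if doc["on_hn_frontpage"] else 0

MODELS = {"links": score_links, "fresh": score_freshness, "nerdy": score_nerdy}

def rank(docs, query, model="links"):
    """Rank with whichever relevance model the user explicitly selected."""
    scorer = MODELS[model]
    return sorted(docs, key=lambda d: scorer(d, query), reverse=True)

docs = [
    {"url": "a", "inbound_links": 900, "year": 2005, "on_hn_frontpage": False},
    {"url": "b", "inbound_links": 40,  "year": 2017, "on_hn_frontpage": True},
]
```

The same two documents rank differently depending on which model the user picked, which is exactly the point: the choice itself reveals intent without profiling.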

Exposing options is rarely a good idea, as it only reaches a single-digit percentage of users.

And they say profiles are oh-so-relevant, but as far as I can tell, Google's main product (search ads) is still tied almost exclusively to keyword, region, and language.

> but as far as I can tell, Google's main product (search ads) is still tied almost exclusively to keyword, region, and language

Don't know: try to do the same Google search from different accounts. I think the results will be quite different...

Google has the best search results because it has the most people using its service. Its models learn every time you click on a result. There's no way to take that on directly. What you need is to find an angle that Google can't easily follow, as with DuckDuckGo and privacy.

Are there areas where Google can't go?

Partly using human curation is an area that Google doesn't want to go into... that was blekko's sustained competitive advantage. Wasn't big enough for us, but maybe someone else could make a go of it.

Apparently Google's search results have plenty to improve:


> cause it has the most people using its service. Its models learn every time you click on a result.

That's why Google dropped the + operator. Lots of people using it hasn't always made it better.

EDIT: I mean, Matt Cutts has a blog post saying that Google dropped the + operator because most people didn't use it, and when they did use it they used it wrong.

I observe occasional referrals to our website from Qwant [1], a French search engine with a focus on privacy. Though I don't know if they're cash-positive or still burning investors' money.

[1] https://www.qwant.com/

Qwant is good. Fast and relevant. Don't know about their financials.

Build a local search engine for a very focused niche.

I think there is room for a horizontal search engine, by making it mobile-first. Even with Siri style conversation agents, mobile search still sucks real bad. If you design bottom-up for a mobile form factor you could have a winner.

Google's market share is ~64%, while Bing is around ~22%.

Probably not great for consumers that the #2 offering is almost 1/3rd as popular as the leader.

Which says two things...

- Yes, there's room for a better number 2.

- But, if the best Microsoft can do is 1/3rd as popular, how well would a new entrant fare?

It seems like you would need some new feature that makes you significantly better than Google to stand a chance.

Also, the barrier to entry here is enormous. The spend required to maintain an index at least as large, at least as fresh, and with results as relevant as... Google's, is big.

If you keep the search paradigm of entering text and returning 10 links, it's probably not likely to succeed. But come up with a new paradigm and you can definitely shake things up.

This ^. I would also posit that coming up with a completely new paradigm for the mobile form factor (cf. the mini-me approach of current mobile search) has lots of potential.

A small additional 'Yes' in the pile. I am currently searching for a new search engine, as Google mangles every query. I am white hot with anger at Google after every third search.

Yes, you just have to figure out how to get a large enough percentage of the market to switch to you so you can make a profit.

That's a much harder problem to solve today than when Google trounced AltaVista in 2000. Now search engines are tightly integrated into browsers.

One hint: I switched to Google when they released a browser toolbar. I even remember deciding to switch to whoever released a browser toolbar. What's today's equivalent of a browser toolbar?

How about some kind of machine learning algorithm that is regularly trained on user feedback and ratings of search results? The system would have one mode that starts out feeding you approximate matches for your search criteria, ordered pseudo-randomly, and a secondary mode of ordering results based solely on user feedback. Yeah, it probably wouldn't work, but it would be fun to see what it produced.
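The two modes described above are essentially explore vs. exploit. A toy sketch follows; the feedback store and result IDs are invented, and a real system would learn from click logs rather than a hand-filled dict:

```python
import random

# Hypothetical feedback store: result ID -> accumulated user ratings.
feedback = {"r1": 14, "r2": 3, "r3": 9}

def order_results(results, mode, seed=None):
    """Two modes: pseudo-random exploration (to gather fresh feedback),
    or ordering purely by the feedback accumulated so far."""
    if mode == "explore":
        rng = random.Random(seed)        # seeded so runs are reproducible
        shuffled = list(results)
        rng.shuffle(shuffled)
        return shuffled
    if mode == "feedback":
        return sorted(results, key=lambda r: feedback.get(r, 0), reverse=True)
    raise ValueError(f"unknown mode: {mode}")

results = ["r1", "r2", "r3"]
```

Bandit algorithms (e.g. epsilon-greedy) would interleave the two modes instead of making the user choose.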

Very specific and concentrated search and discovery sites, e.g. Product Hunt, Zomato, Yelp, and Quora (if they execute really well), will outdo Google, I think, leaving only discovery of these sites and their content (for almost every Google search I end up going to SO or Quora anyway) and contextual information as problems for Google to address.

I think a wiki model would be awesome. Of course, they'd need a smart way to weed out SEO and other results gaming. But somehow Wikipedia does it.

Remember wikia search?

You can always test the theory and see who likes your search engine with a browser extension that overrides Google as the search provider: https://developer.chrome.com/extensions/settings_override#se...
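For reference, the override lives in the extension's manifest. Here is a minimal sketch based on the `chrome_settings_overrides` documentation; the URLs are placeholders, and `{searchTerms}` is the token Chrome substitutes with the user's query:

```json
{
  "name": "My Search Experiment",
  "version": "0.1",
  "manifest_version": 2,
  "chrome_settings_overrides": {
    "search_provider": {
      "name": "MySearch",
      "keyword": "mysearch",
      "search_url": "https://example.com/search?q={searchTerms}",
      "favicon_url": "https://example.com/favicon.ico",
      "encoding": "UTF-8",
      "is_default": true
    }
  }
}
```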

I think there is and it's glaringly obvious. "If you don't pay for a service, you are the merchandise." With Google that is obviously true. If you can build an engine where you can demonstrate that you are not sniffing on the user, you will be able to charge a user fee, and a lot of people will gladly pay for it.

I dream about a real time search engine. Also, there's room for improvement in personalizing and curating content.

I think so, too. A week ago, the biggest project for curating web content, dmoz, closed its doors. What was a good idea in the 90s (a title and a short description) is obviously not sufficient for the modern web. And it's far from real time.

I'm trying to build a curation platform with https://curlz.org; it's just at the planning stage at the moment, but it did raise interest among some dmoz admins and editors.

I don't see any problem here.

I've been using Startpage for the last 5 years and I'm not looking back. I wouldn't have any problem using any other search engine; nowadays any search engine works. The 3 or 4 times I've used Google in those years, I found it pretty weird.

tl;dr: There's plenty of room, just not enough gray matter. :)

But...Startpage uses other search engines, primarily Google.

Truuuue. :)

But my point is my searches usually end up in:

wikipedia, blogpost, *overflow, twitter, reddit, papers ...

So making a search engine for only those sites could indeed produce a usable search engine, and I would use it if I had to. :)

There is room for vertical search engines. E.g., searching for research papers with Google is not very impressive.

Absolutely, but search alone won't cut it no matter how smart it is. For profitability you need a whole ecosystem where ads is a major revenue source.

DuckDuckGo is a fine search engine and I believe they'd benefit a lot from services like ads, email, blogs, docs, shops, apps, social, etc.

If anyone is looking for an alternative search engine then please go for https://www.ecosia.org/ who plant trees with their ad revenue.

No. It also doesn't help that the whole "independent websites" scene is disappearing: there's less web surfing on the phones, and also Facebook and other platforms swallowed a lot of small sites.

Build a search interface on top of reddit. Make it way better than reddit's search engine. Produce an index on each comment that takes credibility into account on top of number of upvotes.
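The credibility-weighted comment index suggested above might score a comment by damping raw upvotes and then weighting by the author's standing. A toy sketch follows; the credibility table and all numbers are entirely made up:

```python
import math

# Hypothetical credibility store: author -> weight in [0, 1].
credibility = {"expert_dev": 0.95, "new_account": 0.2, "regular": 0.6}

def comment_score(upvotes, author):
    """Dampen raw upvotes (log scale), then weight by author credibility,
    so a pile of upvotes on a throwaway can't swamp an expert's answer."""
    return math.log1p(max(upvotes, 0)) * credibility.get(author, 0.5)

comments = [
    ("c1", 500, "new_account"),
    ("c2", 60,  "expert_dev"),
    ("c3", 80,  "regular"),
]
ranked = sorted(comments, key=lambda c: comment_score(c[1], c[2]), reverse=True)
```

How to compute the credibility weights themselves (account age, history, domain expertise) is where the actual research would go.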

What's so important about what Reddit provides?

And doesn't Google return better search results for Reddit already?

I think most of the social media content created recently is stored inside mobile apps and, for the most part, is not available to be indexed by search engines. Apple and Google are in a unique position with access to all of that data. Just like Facebook pushes its users to make more and more content public, Apple and Google could put in similar efforts, and if the right balance is found, there could be a new search engine for all content created and stored inside mobile apps.

If you can build something significantly better, or at least differentiated enough, and it won't be copied, I think you could find your niche or even large-scale success.

I already use Bing full time. Inline searches are getting better but all basic Google functionality is basically there. Fuck Google.

Hell yeah, I exclusively use DDG and it kicks ass

Build me a search engine that only returns results which use https.
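A trivial sketch of that filter, assuming the engine already has a candidate result list:

```python
from urllib.parse import urlparse

results = [
    "https://example.org/article",
    "http://legacy.example.com/page",
    "https://docs.example.net/guide",
]

# Keep only results served over HTTPS.
https_only = [u for u in results if urlparse(u).scheme == "https"]
print(https_only)
```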

Nope, but search is 20thC stuff... when do we live again?

80% of Google's ad revenue is search advertising, and ads overall are almost all of Google's revenue.

This may be the 21st century, but...

I get your point - but to me it's a bit like asking "is there room for another voice operator" in 1983... we could artificially create them, then watch as they consolidate, and then note that the real action is in something else which it would have been good to think about first.

I believe that there is a place in the world for a new search engine. People say that Google is good enough, but I find it hard to use Google to find the things I search for. Here is how I would improve upon Google:

I often find that I search for "vegan pancake recipe" and end up at a page with lots of images and it is very hard for me to find the ingredients list. Google does a poor job here. They should give preference to simpler sites where it is easier for me to find the information I'm looking for. Instead, they seem to actually give preference to complex sites. If their job is to help me search for the information, then they shouldn't give links to haystacks. They have tried to improve upon this with their answer feature where they quote websites. This is, IMO, the wrong way to do things.

Instead, the search engine should be a desktop application, which can be more pervasive than a website. It needs to run natively, not in the cloud, for privacy, for performance, and for the ability to integrate well with the system. When I search for "vegan pancake recipes", if the search engine is going to give me a result which contains 3-5 pages of text and images before the ingredients list, it should automatically scroll the web browser window down to the actual recipe.

This desktop application should also build a context profile based on what I am doing on my computer. This context profile shouldn't be uploaded to the internet, but it is still useful. For example, I should be able to select a string in my terminal and press the search icon in the system tray. This should bring up a Stack Exchange question containing the exact text of the string I selected.

I should also be able to select a set of websites to use as my search "domain". I might give my search domain as "the documentation for Python 3, the Docker API reference, and Stack Exchange". This would make those "feeling lucky" links work much better.
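One way to approximate this today is to rewrite queries with `site:` operators, which the major engines support; the domain list below is illustrative:

```python
# A user-chosen "search domain": only these sites should be searched.
SEARCH_DOMAIN = ["docs.python.org", "docs.docker.com", "stackexchange.com"]

def domain_query(terms: str) -> str:
    """Rewrite a query so results are restricted to the search domain."""
    sites = " OR ".join(f"site:{s}" for s in SEARCH_DOMAIN)
    return f"{terms} ({sites})"

print(domain_query("async context manager"))
# async context manager (site:docs.python.org OR site:docs.docker.com OR site:stackexchange.com)
```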

The search engine should also present image results which are NOT WATERMARKED before ones which are!

I should be able to write a markup for things I need to search for, and then enter a "search engine research wizard". The markup would look like this:

"We had a great time at [Park on that hill in prague???] park. It was so sunny! The temperature was [Prague temperature on 27th of march ???] which is [Average Prague temperatures in March ???] for this time of year."

The search engine, when shown this text, would allow you to right-click on the bracketed areas, search for the text in them, and then fill in the blanks by selecting parts of Wikipedia articles.

The search engine should use accessibility APIs to record the text of the windows that I have open. I should then be able to use the search engine as a kind of memory store which I can search. If I want to know what that awesome new tiling window manager written in Rust was called, I should be able to search the full text of my browsing history and open the previous HN page where the tiling window manager was presented.
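A rough sketch of that searchable memory store, using SQLite's FTS5 full-text index; the captured titles and text are made up for illustration, and a real app would persist to disk rather than memory:

```python
import sqlite3

# In-memory DB for the sketch; a real app would persist to disk.
db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body)")

# Text captured from open windows / browsing history gets inserted here.
db.executemany(
    "INSERT INTO pages (title, body) VALUES (?, ?)",
    [
        ("HN: Show HN", "a tiling window manager written in Rust"),
        ("Python docs", "the asyncio event loop reference"),
    ],
)

# Later, a fuzzy memory like "that Rust tiling thing" becomes a query.
rows = db.execute(
    "SELECT title FROM pages WHERE pages MATCH ?", ("tiling rust",)
).fetchall()
print(rows)  # the HN page comes back
```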

I have a theory that Google was terrified of this very thing happening a few years back. Kinda out there but hear me out.

Based on what I've heard from leaks and various news articles, the leadership of the search division was once adamantly against using neural networks for search.

Things took a sudden turn maybe 5 years ago. Something caused Google to do a complete 180 and vastly increase its investment in AI research. They saw something that scared them.

I think it was their unexpected success using AI for machine translation. There was much PR about it at the time, and I think it really got the gears turning at Google HQ. You see, the same language processing needed for machine translation has obvious parallels to search.

The more curious employees began applying the word vectors used for translation to search. After all, most of them had been trained on index data from multilingual websites anyway. They found that, horrifyingly, rather simple neural networks sometimes outperformed the search algorithms Google had spent billions on.
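The retrieval idea described here, ranking documents by embedding similarity rather than hand-tuned signals, can be sketched with toy word vectors and cosine similarity (the vectors are invented for illustration; real systems learn them from data):

```python
import math

# Toy 3-dimensional word vectors; real systems learn these from corpora.
vectors = {
    "pancake": [0.9, 0.1, 0.0],
    "recipe":  [0.8, 0.2, 0.1],
    "vegan":   [0.7, 0.0, 0.3],
    "stock":   [0.0, 0.9, 0.1],
    "market":  [0.1, 0.8, 0.2],
}

def embed(text):
    """Average the vectors of known words (a crude document embedding)."""
    words = [w for w in text.lower().split() if w in vectors]
    return [sum(vectors[w][i] for w in words) / len(words) for i in range(3)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Rank documents by similarity to the query embedding.
query = embed("vegan pancake")
docs = ["pancake recipe", "stock market"]
best = max(docs, key=lambda d: cosine(query, embed(d)))
print(best)  # "pancake recipe"
```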

When this reached upper management, it set off a quiet panic. Google, once seen as invincible, could have been beaten by a start-up using effective ML techniques. Compute-power calculations showed that this had been possible since a few years after the debut of CUDA, a window of vulnerability of maybe 7 years.

The timeframe of around 2005-2010 coincided with Google spinning off a bunch of moonshot projects and doubling down on their core businesses. Coincidence? Maybe, but I don't think so. I wish a Xoogler or two would come out of the woodwork and tell me if I'm crazy or not.

Anyway, Google usually has a 5-7 year lag before it releases details of its tech to the public. This dates TensorFlow and their heavy AI work to around 2010. The window where somebody could have beaten them easily with ML was probably 2005-2010.

Google beat the competition not so much because it has a better search engine, but because it has a much better ad platform and brings in much more revenue per search than Yahoo, Bing, etc.

I would address the money issue first before thinking about how to make a better search for some market.

This isn't true.

Google existed and grew rapidly for quite a while before it launched AdWords, which it introduced in 2000. Its growth in '99 was pretty meteoric (prompting a $25M investment from KP and Sequoia) before it rolled out PPC monetization. The AdWords model wasn't new at all -- Goto.com was the first search engine to bet on that model.

Google won because it was a massively better search engine... Not just 10% better-- it was "holy crap" better on a mess of fronts (notably: serving up what you were looking for).

I think that's incorrect. Google won mostly because it was a better search engine (analyzing the web's link graph to find major nodes was a brilliant idea), and because it had a clean design.

I disagree with you. Google won because it had and still has a better search and less clutter on the search page.

With a better search experience users came and advertising became a major money maker.

Google beat the other search engines in 1998 because of PageRank and a clean UI. Yes, their monetization platform has allowed them to improve their product and dominate the market, but that isn't how they became the search leader initially.
