
One company's plan to build a search engine Google can't beat - prostoalex
https://www.protocol.com/neeva-search
======
marmshallow
Lately I find myself inserting "site:reddit.com" at the end of my Google
searches. Most of the stuff that shows up by default is pure garbage that is
riddled with ads and doesn't answer my question quickly.

~~~
spideymans
Anecdote 1: I was Googling a lot of information on Angular, back when I was
first introduced to the framework a year or two ago. I'd say that well over
half of the content surfaced by Google was SEO spam: marketers masquerading as
tutorials, with the sole intent of upselling me on some shitty Angular plug-
in. A lot of them are clever about it too; they don't upsell you until you've
invested a decent amount of time reading their "tutorials". The SEO spam only
went away as I began entering more granular search queries, as my familiarity
with the framework improved.

Anecdote 2: I have a close friend that is a high school teacher. She's not all
that tech savvy, so I help her out sometimes. Google is damn near useless when
it comes to helping her develop her courses. Not exaggerating at all... maybe
60, 70... 80 percent of the results for educational queries are SEO spam.
Every result is essentially, "want to learn how to write a monologue? Click
here to pay $20/month for the privilege".

It's gotten so bad that I jokingly tell her that it would be more efficient to
just walk down to the library and take out a book on whatever topic she's
querying. But, you know, they do say that every joke has an element of truth
to it...

~~~
II2II
Searching for teaching materials or activities for children is so bad that
developing the materials yourself is usually far less work!

I doubt the declining quality of search results is a product of Google's own
advertising business, as the article would have us believe, and is mostly a
product of third-party SEO spam. This outcome was also wholly predictable. Part of
the appeal of Google to early adopters was the lack of the SEO spam that
destroyed the utility of earlier search engines. I recall conversations from
back when Google was so new that they eschewed graphical ads, and those
conversations foresaw this outcome, though our predictions may have been a bit
on the pessimistic side (i.e. Google held up against SEO longer than we thought
it would).

~~~
fock
Given that we have AGI around the corner _cough_ , building a model which
discriminates SEO-spam from original content might be viable, don't you think
so? Or for every new domain in the index you check whether it's legit or not
by human intervention, then repeat each year and build a blacklist, similar to
how blacklisting IPs works for Email (how do these sites even get listed?
There should be networks apparent!). One could also put a button: "report
abuse" and make a process to unblacklist legit sites (the same process for
email works very well with most providers... except Google.)

My theory, why this is not happening: I guess that most of these SEO-spam
sites are actually including Google-funneled ads, so this means there's
something wrong there too...

~~~
yetihehe
Then you have hundreds of SEO companies deploying their own AGI to beat
Google's.

~~~
TheRealSteel
I mean, can we really call it an intelligence if it wants to work in
marketing?

~~~
b4ke
Obviously, because it has figured out the highest return for the least
effort.... Get that "agi" a contract, please.

------
qqii
> Rather than try to build a search infrastructure from scratch, Neeva instead
> opted to use Bing's search API for its basic results.

So it's another UI over Bing, like DuckDuckGo? I'm not too optimistic; at the
moment there are fundamental issues with how search engines interpret text and
rank results.

~~~
mhaberl
DDG is _not_ a UI over Bing

[https://help.duckduckgo.com/results/sources/?redir=1](https://help.duckduckgo.com/results/sources/?redir=1)

~~~
kinkrtyavimoodh
Unless they are maintaining their whole global index, it IS a UI over Bing.
Don't be fooled by their 400+ sources, that stuff just affects things like the
answer box

~~~
leereeves
Comparing the results, in a search for "Neeva", the only results they have in
common on the first page are exact name matches and one article from the NY
Times:

DDG:

neeva.co, neeva.co/blog, moneycontrol.com, indiatimes.com, nytimes.com,
neeva.tech, neevagroup.com, babycenter.com

[https://duckduckgo.com/?t=ffab&q=neeva&atb=v63-1&ia=web](https://duckduckgo.com/?t=ffab&q=neeva&atb=v63-1&ia=web)

Bing:

neeva.co, nytimes.com, androidauthority.com, medium.com, oflox.com,
gomoguides.com, neeva.tech, neevagroup.com

[https://www.bing.com/search?q=neeva](https://www.bing.com/search?q=neeva)

(I don't use Bing, and DDG isn't supposed to track me, so neither should be
personalized results.)
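
One way to put a number on the discrepancy is to compute the overlap between the two first pages. This is only a rough sketch using the domains listed above; the `overlap` helper is made up for illustration and compares bare strings, not normalized URLs.

```python
# First-page results as listed in this comment.
ddg = ["neeva.co", "neeva.co/blog", "moneycontrol.com", "indiatimes.com",
       "nytimes.com", "neeva.tech", "neevagroup.com", "babycenter.com"]
bing = ["neeva.co", "nytimes.com", "androidauthority.com", "medium.com",
        "oflox.com", "gomoguides.com", "neeva.tech", "neevagroup.com"]


def overlap(a, b):
    """Return the entries of a that also appear in b, plus the Jaccard similarity."""
    b_set = set(b)
    shared = [x for x in a if x in b_set]
    union = set(a) | b_set
    return shared, len(shared) / len(union)


shared, jaccard = overlap(ddg, bing)
print(shared)             # ['neeva.co', 'nytimes.com', 'neeva.tech', 'neevagroup.com']
print(round(jaccard, 2))  # 0.33
```

Four shared entries out of twelve distinct ones is a fairly low overlap for two engines that are supposed to share an index.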

~~~
norswap
Not sure why this is getting downvoted. Even if DDG is built mostly on top of
Bing, the discrepancy would be interesting to explain.

~~~
woko
In my opinion, it is due to location. Bing returns 8 results on the first
page, and DDG returns 10 results. At the top of the Bing page, I can see that
Bing already takes my location into account (and it is precise, it is my
city), so there is no box to tick. Once I tick the "location" box on DDG (it
is only the country), I get 7 identical results out of 8 on the first page!
The order is the same for results 1-4, the 5th result is different, the 6th
result is identical, and results 7-8 are identical but their positions are
swapped.

~~~
leereeves
That's interesting. By location, do you mean the box that says "All Regions"?
I get the same results on DDG (different from Bing) whether I set that to "All
Regions" or "US (English)".

~~~
woko
For me, it is a setting to tick, which mentions the country:

[https://i.imgur.com/fML3x0l.png](https://i.imgur.com/fML3x0l.png) (in French)

Once it is set to my country (France), then I get almost identical results.

I don't know how to deactivate this feature on Bing, so I could not compare
Bing and DDG in the case where the "location" setting would be ticked off on
both sites.

------
moralestapia
How to build a good search engine:

1\. Actually return the results matching the words that people typed in your
search box. The more they match, the more they go up.

More and more, it has become extremely disappointing what you get back from
Google (and others as well). Verbatim search (where you surround exact terms
in quotes), seems to have vanished in the last year or so. More often than not
I have clicked on a result, only to find out that my particular query is
nowhere to be found on the site, but some "related stuff" does.
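
To make point 1 concrete, here is a minimal sketch of "the more they match, the more they go up". It is not how any real engine ranks; `tokenize` and `rank_by_term_match` are hypothetical names, and the scoring is just a count of distinct query words found in each page.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank_by_term_match(query, docs):
    """Order doc ids by how many distinct query words they contain.

    Docs that contain none of the query words are dropped entirely.
    Ties are broken alphabetically by doc id.
    """
    terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        score = len(terms & set(tokenize(text)))
        if score:
            scored.append((score, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored]


docs = {
    "a": "Angular mat-table tutorial",
    "b": "Table mats for dining",
    "c": "Angular table basics",
}
print(rank_by_term_match("angular mat table", docs))  # ['a', 'c', 'b']
```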

~~~
wutbrodo
I share your frustration, but the obvious rejoinder is that most people don't
agree with you. You can certainly target a relatively niche group of power-
searchers, but most Google users probably think that the search engine
guessing their intent is part of what makes it "good".

~~~
HaloZero
Ha. My parents actually go to google.com just to type in something like
facebook.com, which runs a search, and that's how they find Facebook.

I doubt getting them to switch would be very easy unless you did some deal
with browsers. But they, like most others, use Chrome, and I doubt Google would
do that.

~~~
Stratoscope
It's not just your parents.

I was sitting down with a very experienced C++ engineer a few months ago to
work on a problem. There was something we needed to do a web search on.

He opened Chrome, clicked in the address bar (which already had the keyboard
focus, but never mind that), and typed "google". This did a Google search for
Google, and the first search result was www.google.com. Then he clicked
www.google.com which took him to the Google home page, and there he typed in
the search terms.

Yes, he Googled Google to do a Google search.

~~~
as1mov
Perhaps it was muscle memory? I could see that happening to me if I had
recently installed the browser and it won't auto-complete the searches based
on past history. Also the fact that my IQ drops by 50 points when someone is
watching my screen and I look like an idiot.

~~~
wutbrodo
> Also the fact that my IQ drops by 50 points when someone is watching my
> screen and I look like an idiot.

I've heard this a lot, and have basically trained myself by rote not to hover
over people's computers when we're looking at something together, but I still
can't say I understand it. I don't know if it's an inarticulable phenomenon,
but do you have some sense of what drives this and/or what it feels like?

~~~
jasonv
Surely we're all seeing this on Zoom calls these days?

People sharing their desktop/app/window, then breaking out of that sharing
selection to bring something else up, while they're talking.. and searching,
and things aren't working exactly as expected, so they go into rabbit-hole
mode, etc...

Everyone's got a million things going on in their brains, and an audience
changes things, and "presenting to an audience" is different than sharing with
an audience.

I switch between my Mac and a Windows machine, between Safari and Chrome,
between ctrl- and Command-, etc etc. Half the time things I'm connected to are
broken (VPNs, endpoints, services...) so I find myself half-stabbing my way
through little time windows during the day. And if I'm talking to someone,
while trying to do something I've done 1000x before, I'm probably covering my
bases by stabbing at the keyboard even more.

I do presentations a lot too, it's a totally different mode.

------
amanzi
I want to share a recent search I did through Google -- at work I wanted to
look up how to implement conditionals in Microsoft Forms (i.e. the next
question in the form is based on the previous answer). I searched for
"microsoft forms conditional" (without quotes) and the first search result on
Google was this page on how to "use branching in Microsoft Forms" \-
[https://support.microsoft.com/en-us/office/use-branching-
in-...](https://support.microsoft.com/en-us/office/use-branching-in-microsoft-
forms-16634fda-eddb-44da-856d-6a8213f0d8bb)

That page doesn't contain the word "conditional" at all - the word that
Microsoft uses is "branching" but Google deduced that it was the best result,
which was perfect. The same search in either Bing or DDG produces results that
all have the words "microsoft", "forms", and "conditional" in them and none of
them link to the page I mentioned above, which I consider to be the best
result.

Moral of the story? Search is hard, but Google does it better than everyone
else.

Also, I learned via this thread that DDG is just a wrapper around Bing, which
explains why the search results between DDG and Bing were near identical -
they even have the matching video suggestions.

~~~
popinman322
Just a comment:

Sounds like the results of a neural network: roughly approximating your intent
and searching around that intent, in continuous space, to find other viable
search terms and phrases. (This is one possible approach, given in broad
strokes.)

That's a massive barrier to entry. You need enough data and compute to train a
massive language model, more compute to run the model against all incoming
queries, and then even more compute to handle the extra search load
precipitated by use of the language model.

Not to mention the years of R&D that go into these models and their associated
tooling.

~~~
eitland
> That's a massive barrier to entry. You need enough data and compute to train
> a massive language model, more compute to run the model against all incoming
> queries, and then even more compute to handle the extra search load
> precipitated by use of the language model.

Luckily, most of the time you could improve my user experience by removing
that cr*p and give me my 2007-2009 Google back.

From there you would only need to allow users to make personal blacklists,
share personal blacklists (this was about the time when auto-generated content
started to become popular) and maybe also aggregate some popular blacklists
for a default blacklist and it would be better than anything we have seen
since.

(I remember having a txt-file with -spammydomain.com -anotherspammer.com etc
etc that I pasted in at the end of certain searches to take care of sites that
had either had

\- auto-generated content

\- or stuffed their pages with black/black or white on white keywords )
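
A rough sketch of that txt-file workflow, assuming the engine honors a leading `-` as an exclusion the way my pasted terms did. Both function names are made up for illustration.

```python
def with_exclusions(query, blocked_domains):
    """Append a -domain term for every blocked domain to the query."""
    exclusions = " ".join(f"-{domain}" for domain in blocked_domains)
    return f"{query} {exclusions}".strip()


def merge_blacklists(*blacklists):
    """Combine several personal blacklists, keeping first-seen order and dropping duplicates."""
    merged = []
    seen = set()
    for blacklist in blacklists:
        for domain in blacklist:
            normalized = domain.strip().lower()
            if normalized and normalized not in seen:
                seen.add(normalized)
                merged.append(normalized)
    return merged


mine = ["spammydomain.com", "anotherspammer.com"]
shared = merge_blacklists(mine, ["AnotherSpammer.com", "thirdspammer.com"])
print(with_exclusions("angular table", mine))
# angular table -spammydomain.com -anotherspammer.com
```

Sharing and aggregating lists is then just a matter of merging them before building the query.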

~~~
bananaface
Giving you 2007 Google might not work because people are using 2020 strategies
to game it. But I'm definitely skeptical that that's all there is to it.

~~~
eitland
I hear this a lot.

But in all honesty it is not the SEO scammers' fault that Google serves me
pages that don't contain the words I searched for after I have chosen the
verbatim option.

It also isn't the SEO scammers' fault that when I search for Angular mat-table[0] I
get a number of pictures of tables with mats on. That is probably the result
of someone playing with some cool AI tools while others are busy trying to
make more efficient ways to ignore customer feedback ;-)

We must manage to keep those two thoughts in our head simultaneously:

\- Black hat SEO has changed

\- Google has adapted to another audience and has ditched us power users
hoping we wouldn't notice.

[0]: screenshots of that and some other clear examples of Google and Amazon
testing out AI in production here:
[https://erik.itland.no/tag:aifails](https://erik.itland.no/tag:aifails)

~~~
bananaface
Do you know many black hat techniques? Around 2011-2013 was when Google
shifted from being extremely easy to game to very difficult. 2014 was really
the end of it. Have a look at some niche site blogs from the time - revenue
from new niche sites tanked from like $1.5k/month each to barely close to $100
(with a lot more work up front).

Anyway my point is if you rewound the clock to 2008, you'd have a _way_ bigger
problem than you might think.

~~~
eitland
Fine. But we must still be able to separate backend from frontend:
it should be possible to upgrade the anti-spam machinery without breaking

\- doublequotes

\- + (ok, they broke that deliberately around Google+)

\- the verbatim operator

All those should be able to work even if the crawler and processing techniques
are updated, right?

Also a heads up: I added some more details to my post above, I didn't think you
would answer so fast :-)

Edit: I only know the black hat methods that were well known 10 years ago, like:

\- backlink farming from comment fields (we protected against it by applying
nofollow to all links in comments)

\- Google bombing (coordinated efforts to link to certain pages with
particular words in the link, trying to get Google to return a specific result
for an unrelated query). I think the canonical example was something like a
bunch of people making links with the text "a massive failure" that all
pointed to the White House website.

\- Link-for-link schemes

\- etc
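
For the comment-field defense mentioned in the list above, a minimal sketch of forcing `rel="nofollow"` onto every link in user-submitted HTML might look like this. It is regex-based and only illustrative; a production system would more likely use a real HTML sanitizer, and `add_nofollow` is a hypothetical name.

```python
import re

# Matches the opening tag of an anchor; \b keeps it from matching <abbr> etc.
A_TAG = re.compile(r"<a\b([^>]*)>", re.IGNORECASE)
# Matches an existing rel attribute (double-quoted, single-quoted, or bare).
REL_ATTR = re.compile(r"\s+rel\s*=\s*(\"[^\"]*\"|'[^']*'|[^\s>]+)", re.IGNORECASE)


def add_nofollow(comment_html):
    """Replace any rel attribute on <a> tags with rel="nofollow"."""
    def fix(match):
        attrs = REL_ATTR.sub("", match.group(1))
        return f'<a{attrs} rel="nofollow">'
    return A_TAG.sub(fix, comment_html)


print(add_nofollow('<a href="http://x.example">x</a>'))
# <a href="http://x.example" rel="nofollow">x</a>
```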

------
nine_k
So, finally a paid search engine that does not have to rely on ads for
revenue.

So far it's based on Bing, which does. This makes it a bit of a hard sell,
compared to an intelligent ad blocker.

The most important problem of search engines is SEO spam. Google themselves
sort of have a moral hazard to not be too stringent on SEO spam, because it
shows ads by Google, increasing Google's revenue.

OTOH I wonder if the subscription revenue is going to be sufficient to have
access to a reasonably good search index and enough processing power to
efficiently combat SEO spam while returning relevant results. This takes your
own data centers run frugally, because fees of something like AWS or Azure
will just be exorbitant for a global search engine and a global search index.

I wonder if companies aiming to provide alternative search engines will
cooperate on maintaining a common index, to distribute the massive costs of
doing that. They could even publicly sell access to it, at a point where
running a competing search engine won't be practical; e.g. researchers would
buy it.

~~~
ma2rten
I am actually wondering if search engines really need such a large index. The
vast majority of sites in Google's index are crap. If you were able to better
select which sites to index it might help search quality and improve
efficiency.

~~~
nine_k
How can you tell what is crap while crawling, is there enough time? How can
you tell what is crap for the next person, and what is not crap?

It's not an easy question.

~~~
nwienert
People! Let the users help. All the best content sources on the internet are
either experts or communities with voting / submissions.

~~~
sah2ed
> _People! Let the users help. All the best content sources on the internet
> are either experts or communities with voting / submissions._

There are over 1.7 billion websites [0], so the task of ranking content, the
way algorithmic search engines do it in a matter of milliseconds, is not as
easy as it sounds when you add humans into the mix. It would only end up the
way Mahalo did [1].

[0] [https://www.statista.com/chart/19058/how-many-websites-
are-t...](https://www.statista.com/chart/19058/how-many-websites-are-there/)

[1]
[https://en.wikipedia.org/wiki/Mahalo.com](https://en.wikipedia.org/wiki/Mahalo.com)

~~~
nwienert
False dichotomy, you can use algorithms in tandem. And cherry picking... I
remember Mahalo well, and one example of one version of it doesn’t prove
anything. Mahalo is far from how I’d structure it.

You can still automatically index but have users vote on the results. There
are 1.7 billion websites and 3 billion+ users, and you don’t need that many to
be active voters to help assist algorithms. Plus how many are at the top
anyway? I’d love to downvote a ton of google results even if it only used it
as a trainer for my own.

Plus, there are so many “super curation” sites like here and Reddit that
provide a big dataset curated by people automatically. Lean on them more.
Everyone knows “site: reddit.com” or “site:stackoverflow.com” _already_ give
you better results.

A simple upvote downvote on their results would let me downvote all the spam
SEO sites. It wouldn’t take many votes for them to start tuning it.

Stats are a good way to blind yourself. That algorithms scale doesn’t mean
people don’t improve them. Google’s problem is they are too cocky about
algorithms, but their algorithms fail compared to the curated communities all
over the web already.
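
A hedged sketch of what "votes assist the algorithm" could mean in practice: keep the algorithmic score and nudge it with a damped vote signal. The `rerank` function, the `weight`, and the damping constant of 10 are all assumptions picked for illustration, not anything a real engine is known to use.

```python
def rerank(results, votes, weight=0.1, damping=10):
    """Re-order (url, algo_score) pairs using user votes.

    votes maps url -> (upvotes, downvotes). The vote signal lies in (-1, 1)
    and is damped so that a handful of votes barely moves a result.
    """
    def adjusted(item):
        url, score = item
        ups, downs = votes.get(url, (0, 0))
        signal = (ups - downs) / (ups + downs + damping)
        return score + weight * signal

    return sorted(results, key=adjusted, reverse=True)


results = [("spam.example", 0.80), ("good.example", 0.78)]
votes = {"spam.example": (1, 99), "good.example": (40, 0)}
print([url for url, _ in rerank(results, votes)])
# ['good.example', 'spam.example']
```

With no votes at all, the order falls back to the purely algorithmic one.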

------
protomyth
Using Bing's API isn't really going to help with some of the problems.

One thing that would be really helpful, stop counting the words appearing on
links on the sidebar or other non-content part of the page as important for my
search. It's amazing how many searches go astray because someone has some
words on a sidebar that don't have anything to do with the content of the
page. You would think with all this ML someone would teach a search algorithm
to ignore it.
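
Even without ML, a crude version is possible when pages use semantic markup. The sketch below drops text inside `nav`, `aside`, `header` and `footer` before indexing. It assumes the sidebar is marked up with those tags, which many real pages do not do, and `MainTextExtractor` is a made-up name.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as non-content (an assumption, not a standard).
SKIP_TAGS = {"nav", "aside", "header", "footer", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects text that is not nested inside any of SKIP_TAGS."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def main_text(html):
    """Return the page text with sidebar-like regions removed."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = ("<html><body><aside><a href='/x'>cheap flights</a></aside>"
        "<article><p>Parsing JSON in Python</p></article></body></html>")
print(main_text(page))  # Parsing JSON in Python
```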

~~~
Twisell
Was filled with hope at the start of the article and it faded away pretty
quickly while reaching the part about Bing. Thereafter hope suddenly fell from
a cliff...

>Neeva's most unusual feature is its ability to also search users' personal
files. In a demo, Ramaswamy searched for tax documents and photos, all
surfaced within his search results or available in a Personal tab in the Neeva
interface.

Privacy focused my a$$!

------
tomxor
> it felt like a product being made worse by its business model.

Something even worse than this has forced me to ddg out of necessity:
fucking CAPTCHA.

Which is essentially discrimination against anyone with a mobile internet
connection who blocks google's tracking.

I just can't use google search anymore, it's a special kind of torture, after
years of getting used to using google for everything and now getting this
thrown in my face every single fucking time - way to permanently train users
away from your search engine. It's time we had more competitors, google search
is a monopoly and it's only going to abuse its users more.

------
scoutt
The more I think about it, the more I think the search engine problem has no
(currently known) solution. The options are:

1) Word-based crawling/indexing: quickly abused by spammers.

2) AI/ML-based: I think this is the current model (?), but after a while the
machine got "clever" and it makes Google think it knows better than me
what I am looking for, and returns results for "most people's" tastes. The
problem is when you are not "most people", and/or you are looking for some
niche topic/work related/tech stuff/etc. Simply trying to discover new things
like an interesting blog or a small shop is impossible.

3) Paid-based: as in the article, and might be a good idea. But I think it has
to run on a custom indexer. Why would I pay for Bing results?

4) Aggregators: a search engine that returns results from a bunch of other
search engines, like DDG and others.

5) A mix of the above?

So unless new ideas come to the rescue, I think it's always going to get
worse.

~~~
eitland
> 1) Word-based crawling/indexing: quickly abused by spammers.

I guess that ends quickly once spam means you get blacklisted no matter how
many Google ads you serve ;-)

A large piece of this article hints at how they have some interesting options
that Google doesn't have.

------
sawaruna
Maybe the pendulum could swing back the other way slightly and we could have
room for an older Yahoo-style indexed search engine. This wouldn't be
something that would be browsed instead of search, but if you had a general
categories list indexed, say 'finance' or 'graphic design', users could enter
those sections and search sites categorised accordingly. Perhaps some ranking
could be done based on search terms used within a category and what site a
user ends up visiting (e.g. many users end up visiting a certain stackexchange
link when searching for 'parsing json in python' within the 'computer
programming' category, and so its rank increases). Heck instead of ads, maybe
each category page could have a list of 'popular / trending sites on this
topic' section at the top that pages could pay to be placed in.
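
As a very rough sketch of the click-based ranking inside a category, assuming clicks are simply counted per (category, query) pair. `CategoryIndex` and its methods are hypothetical, and nothing here handles spam clicks.

```python
from collections import defaultdict


class CategoryIndex:
    """Counts which site users end up visiting for a query within a category."""

    def __init__(self):
        # (category, query) -> url -> number of visits
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, category, query, url):
        self.clicks[(category, query.lower())][url] += 1

    def ranked(self, category, query):
        """Return urls for this category and query, most visited first."""
        counts = self.clicks.get((category, query.lower()), {})
        return sorted(counts, key=lambda url: (-counts[url], url))


index = CategoryIndex()
for _ in range(3):
    index.record_click("computer programming", "parsing json in python",
                       "stackexchange.example/q/1")
index.record_click("computer programming", "parsing json in python",
                   "blog.example/json")
print(index.ranked("computer programming", "Parsing JSON in Python"))
# ['stackexchange.example/q/1', 'blog.example/json']
```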

Not sure if this solves any user problem to be honest, and the idea is only
appealing to me because I think some amount of domain expertise and human
curation could go into categorising pages. While this sounds (and no doubt
would be) labour intensive, if we consider the number of domains that users
actually visit when conducting a search (i.e. ignoring anything past page 1 of
google) then perhaps it's not so extreme.

~~~
eric4smith
Back in the good old days of Dmoz, this was the way, but then it became
extremely limiting, and I remember wanting to rank in the directory for
something and I "knew" one of the people in charge of a section. There was
even some accusations of "pay for play" too. Eventually Dmoz died when Google
came on the scene. But I get what you mean.

------
Ozzie_osman
I hope this works out. I'm a big believer that companies eventually take the
shape of their business models, and if you're free but serve ads, you're an
ads company, not a search company. So it'll be interesting to see how a
company without the ads tension ends up evolving.

That said, search is a really hard problem to solve (even if you can take
shortcuts like using Bing's API).

------
8organicbits
> putting search results back at the top of search results

Watching Google slowly fill more and more of the search results with ads, this
is an obvious and very welcome idea.

~~~
imustbeevil
Why watch? uBlock origin [1], the most recommended browser extension in the
history of the internet, blocks the ads on Google. You can also right click
and block any element you don't want in your search results, like "Top
Stories" and "Videos". You're still getting the search results, if there's
content in your browser you don't want to see, _you_ have more control over
that than Google does.

I haven't seen an advertisement in 10+ years. I don't really understand why
anyone chooses to see them when they don't have to.

And sorry if this comes off as confrontational, I just see so many people
talking about advertisements and it's difficult to have to tell each
individual that adblocking extensions have existed for close to 15 years [2].
I wish there was some better way to spread this information so no one would
have to see ads or comment about their existence ever again. The internet is
so much better without them.

[1]
[https://en.wikipedia.org/wiki/UBlock_Origin](https://en.wikipedia.org/wiki/UBlock_Origin)

[2]
[https://en.wikipedia.org/wiki/Adblock_Plus](https://en.wikipedia.org/wiki/Adblock_Plus)

~~~
chickenthirty
uBlock Origin is great but doesn't work on Safari, which a lot of people use
since Chrome is an incredible battery hog and Firefox is significantly slower
on Mac.

None of the choices of content blockers in Safari successfully block the
majority of ads on the internet.

Furthermore, it's not just about the surfaced ads. Even if you use
uBlockOrigin, search engines like Google optimize for ad clicking, which will
affect the search result ranking even if you have ads blocked. As a result,
search quality has been steadily decreasing over the past decade (there have
been hundreds of highly ranked HN discussions on this in the past).

Finally, uBlockOrigin is an amazing tool developed by 1 person. There is
always the chance that, in the future, there are developments in browsers or
ad-serving technologies that render it obsolete (e.g if Google decides to make
a breaking change to the Chrome Extension API, like Safari did). In that case,
it would be worthwhile to have alternatives.

~~~
mi100hael
This is changing in the next release of Safari. They will support the standard
WebExtensions API so Firefox/Chrome extensions will be easily portable.

[https://techcrunch.com/2020/06/25/apple-will-let-you-port-
go...](https://techcrunch.com/2020/06/25/apple-will-let-you-port-google-
chrome-extensions-to-safari/)

~~~
thombles
According to uBlock Origin's developer it is not enough
[https://www.reddit.com/r/uBlockOrigin/comments/hdz0bo/will_u...](https://www.reddit.com/r/uBlockOrigin/comments/hdz0bo/will_ublock_origin_back_to_macos_big_sur/fvoc7wk/)

------
toohotatopic
Google can beat them by also offering a paid-for search engine. People trust
Google with their mail and phones, they will also trust google with privacy-
respecting search.

On the other hand, how come Cliqz has been shut down instead of being sold?[1]
Are there no companies with deep pockets who are interested in containing
Google's revenue besides MS? E.g. since Cliqz was so privacy focussed,
wouldn't that have been a great start for Apple to have a privacy respecting
search engine?

[1]
[https://news.ycombinator.com/item?id=23909484](https://news.ycombinator.com/item?id=23909484)

~~~
sterlind
I think Apple Maps probably left a sour taste in their mouth about running
such a service. Search engines are never done; it's a wide-open problem domain
with diminishing returns and investment in many directions. I think Apple also
realizes they're not a services company.. they sell hardware and support an OS
and app store; everything else is a value add to bring people into the fold.

~~~
the_other
Apple is absolutely building core business around services, and has been for
several years.

Siri has search and its top hits are surprisingly helpful. Maps is getting
slowly better, and has both native and web clients. Apple TV+ and iCloud Drive
have paid tiers. Shortcuts and Messages look like apps, but they’re really UI
around services, as is Siri.

------
m-i-l
Not mentioned in the article or in the comments so far, but Neeva has raised
$37.5m in funding[0]. I'm curious how that money will be spent, if they're not
actually spending it on building a new search engine. Is it going to be mostly
spent on buying the results in from Bing, and/or on advertising their new ad-
free search, and/or something else?

[0]
[https://tech.economictimes.indiatimes.com/news/startups/7649...](https://tech.economictimes.indiatimes.com/news/startups/76494545?utm_source=RSS&utm_medium=ETRSS)

------
cutler
Asking users to pay for something they've been getting free for 22 years is a
poor business model. Look at news websites.

~~~
puranjay
But people are still willing to pay for news when it's done right - see
NYTimes as an example

------
wackget
This website (protocol.com) hounded me with a newsletter subscribe popup as
soon as I landed on the page.

I hadn't even had a chance to read the first line of the article and the
site's already asking me to sign up for their marketing trash.

Why in the world would anyone think that's acceptable? I have never seen the
site before and have no idea of the quality of its content so what on earth
makes them think they're going to get a subscriber out of me?

~~~
XCSme
I agree, I usually insta-close a website when I get an unexpected pop-up. Pop-
ups should only be used as a result of a direct user action (eg. click on
delete account, receive a confirmation pop-up).

------
rcardo11
Search engines are such a big thing they should be open sourced and
distributed over the community. It's like the most basic infrastructure the
internet needs to work and we are outsourcing it.

~~~
jeffbee
Please describe in detail how you will distribute crawl, index, and ranking as
basic infrastructure.

~~~
WrtCdEvrydy
The same way you decentralize anything else.

You can do crawling by using an extension that allows you to create a new tab,
crawl data on your current url and send it up to the mothership.

You can actually do even better because you don't get SEO-hacks like disabling
certain javascript when Google is on the page to improve speed.

~~~
stelonix
I was thinking exactly this when I stumbled upon your comment, except I
figured it should work for any private tab and it'd also need a browser that
makes tabs private (and contained) by default.

It's a problem more easily solved by VC companies or government laws, because
we're not seeing Google doing that in this lifetime, while FOSS solutions
simply won't get the needed traction.

~~~
kevin_thibedeau
What happens when these self-hosted crawlers access illegal content in one's
country?

~~~
stelonix
The same thing that happens when a peer accesses an illegal torrent in his
country? How is this relevant? It is a decentralized system, it shouldn't make
a difference.

~~~
kevin_thibedeau
"Honestly officer. I didn't click on that link to CA imagery. It was my
webcrawler."

------
WarOnPrivacy
Want to build a superior search engine?

OBEY YOUR FREAKING BOOLEAN OPERATORS (search syntax, whatever) - something
Google, DDG, Bing, et al, no longer have any interest in doing.

~~~
vogre
That will bring you like 17 loyal users.

~~~
WarOnPrivacy
Why would you assume only 16 other people have ever had to search for
something precise?

I think that number is a lot closer to everyone.

------
yalogin
I agree with the sentiment that Google search results are not what they used
to be. And it's Google's fault. Not as much because of their search algorithms
as because of their ad business. Talking purely about tech articles on the internet,
Google's ads have corrupted the previously benign and even helpful blogs and
other sites and turned them into ad peddlers. I mean who wouldn't want a
recurring revenue stream?

That kind of reached its zenith (or nadir) when companies started sucking in
articles and employing people to write tutorials just to get ad dollars. Every
one of the tech articles. I am sure the other fields are the same way. So
unless you know a site you want to search in, the results are all ads not real
content.

~~~
bhartzer
The problem is ads and ad revenue, but Google is also increasingly scraping content
from websites in order to put that content on their own site, to keep users on
Google properties as long as they can.

------
kwhitefoot
What I want is a search engine UI that will guarantee that my search terms
appear in /all/ of the hits. And allow me to specify terms that must not
appear and guarantee that the hits do not contain those.

I'm semi-seriously considering writing my own.
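
A minimal sketch of that kind of filter, assuming the candidate hits are
already available as plain-text strings. The names `tokenize` and
`strict_filter` and the sample documents are made up for illustration; this is
not any real search engine's API.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def strict_filter(docs, must_have, must_not_have=()):
    """Return ids of documents that contain every required term
    and none of the excluded terms.

    docs: mapping of (hypothetical) document id -> document text.
    """
    required = {term.lower() for term in must_have}
    excluded = {term.lower() for term in must_not_have}
    hits = []
    for doc_id, text in docs.items():
        words = tokenize(text)
        # Every required term must be present; no excluded term may be.
        if required <= words and not (excluded & words):
            hits.append(doc_id)
    return hits


# Toy corpus, purely illustrative.
docs = {
    "a": "Angular tutorial for beginners",
    "b": "Buy our Angular plugin today",
    "c": "React tutorial",
}
print(strict_filter(docs, ["angular", "tutorial"], ["plugin"]))  # ['a']
```

The hard part at web scale is not this check but doing it without scanning
every document, which is why real engines work from an index.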

~~~
moritonal
Not to seem rude, but do you know about
[https://www.google.com/advanced_search](https://www.google.com/advanced_search)?

~~~
kwhitefoot
How do you find it if you aren't aware that it exists? There is no longer a
link to it on the Google search page as far as I can see. And, as soon as you
hit the search button you are back in the ordinary search so refining the
search is tedious.

The UI stinks.

However, I am grateful for the reminder.

------
lettergram
Having actually built a different kind of search engine (see
[https://hnprofile.com](https://hnprofile.com))

I’m kinda miffed that all these posts use bing. I’m sorry, that’s not a new
search engine, it’s a new UI.

------
titzer
Hey, this is great. Sridhar was my VP for 3 years of my tenure at Google. Hell
of a nice man. Great that he got out of the ad business and I absolutely think
this is a great thing to be working on!

~~~
person_of_color
Are you still working on ads?

~~~
titzer
No, I worked 6.5 years more, on V8 and WebAssembly and I quit Google last
August.

~~~
person_of_color
And went where?

------
mark_l_watson
If it weren't for Duck Duck Go, I would absolutely consider paying a small
monthly fee to Neeva for ad-free (and hopefully tracking-free) search service.
I might anyway.

Interesting that both Neeva and DDG build on Microsoft's Bing service. I also
use Bing search for small personal projects. Search APIs are useful for many
automated information extraction use cases.

There are paid for Google services that I really like (GCP, Play Music, Play
Books, Play Movies), but I now avoid search and gmail.

------
ralmidani
I would gladly pay $5 a month for a totally ad-free and tracking-free search
engine that lets me easily block sites and/or downvote results that aren’t
relevant to me. I already pay for Apple hardware and Protonmail, and use FF
and DDG whenever I can, in no small part to avoid Google and MS. I would love
to see DDG or someone else offer a no-nonsense, “premium” search engine.

------
freediver
I have to admit this has a great title.

I am familiar with Neeva and listened to the podcast with its founder which I
recommend to people interested in the field.

[https://podcasts.apple.com/us/podcast/sridhar-ramaswamy-
your...](https://podcasts.apple.com/us/podcast/sridhar-ramaswamy-your-search-
history-is-incredibly/id1011668648?i=1000480087638)

I am a bit skeptical about the whole thing for two main reasons:

\- Sridhar spent 15 years building the behemoth that is Google's ad business.
It has to be hard to shrug off 'don't be evil until you do' culture and
mindset that easily.

\- I think they can do a good job at specialized searches
(like the one illustrated in the article). But the problem is that the web
search is general, not specialized. They cannot expect the users to pay
$30/mo and still go to Google for general searches. Bing API results are
subpar to Google results.

I do welcome a paid model for a well executed search engine. I would
personally pay $100/mo for one that meets my needs.

------
ur-whale
A few things you need to build a search engine:

    
    
        1. something that *understands* the question
        2. something that *understands* content (web pages)
        3. something that can match the output of 1 and 2 very fast and at scale
        4. a way to generate revenue out of the whole thing without bothering the user or perverting the output of 1,2 and 3
    

In the case of Google, they have a fairly good implementation of #1,#2 and #3
and they _used_ to have a decent compromise for #4.

In the last 5 years though, they have completely eff'd up #3 because of the
pressure that comes standard with large US-based corps: the next quarter EPS
is the only thing that matters, and long-term growth is not something wall
street is interested in.

Burn a patch of forest, grow something on the ashes-fertilized soil for a few
years and move on to the next patch when the soil is permanently dead.

------
sixQuarks
These paid services never seem to get off the ground. It's asking too much for
someone to pay for something right off the bat.

Not saying paid services will never work, but I feel something like this could
only work if they make it free from the beginning, actually offer better
search, build a rabid fanbase, then start charging for it.

~~~
harikb
I would think it is better to start with a smaller amount, not zero, just to
make sure you are getting users who would be willing to pay the real (higher)
fee later. Subsidizing to zero only works for services that either have a
network effect or are ad-supported.

~~~
sixQuarks
Studies show it's at least 20 times harder to get a paid conversion than a free
conversion regardless of price (example being free app vs 99 cent app
downloads).

I think it will be easier to convert more than 1 out of 20 from free to paid
once they see the value of the search.
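
A rough sketch of that break-even arithmetic. Only the 20x ratio comes from
the comment above; the visitor count, the free signup rate, and the 10%
free-to-paid rate are made-up numbers, and `paying_users` is a hypothetical
helper.

```python
def paying_users(visitors, free_signup_rate, paid_vs_free_ratio, free_to_paid_rate):
    """Compare two launch strategies for the same number of visitors.

    Returns (paid_up_front, paid_after_free):
      paid_up_front   - paying users if the product charges from day one
      paid_after_free - paying users if it starts free and converts later
    """
    free_signups = visitors * free_signup_rate
    # Charging immediately: only a fraction of would-be free signups pay.
    paid_up_front = free_signups * paid_vs_free_ratio
    # Free first: some fraction of the free users later agree to pay.
    paid_after_free = free_signups * free_to_paid_rate
    return paid_up_front, paid_after_free


# Assumed inputs: 10,000 visitors, 10% would sign up for free,
# paid signup is 20x harder (ratio 1/20), 10% of free users later pay.
up_front, after_free = paying_users(10_000, 0.10, 1 / 20, 0.10)
print(up_front, after_free)
```

Under these assumptions the free-first route wins exactly when the
free-to-paid rate exceeds 1 in 20, which is the threshold the comment names.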

------
135792468
As an SEO, I’m embarrassed how accurate the comments in this thread are. I’ve
had this conversation with peers, but the search results are utter trash
and it makes me feel bad to be part of it. Finding good information is so
hard.

Until Algos get smart enough to understand text at a deeper level we will not
get anywhere.

We really need more competition. I know Apple gets billions from Google for
being the default search, but if they were to buy DDG, whose results are mostly
good enough, they’d be a viable competitor. Until something big like that
happens, Google will continue to be below average.

------
simonkafan
Why another general purpose search engine? How about a niche, for example a
search engine that can be extensively controlled by using Boolean algebra,
Regex, first-order logic?

Google's results got less and less reasonable over the last years.

~~~
throwaway_pdp09
Grep through your HD. How fast? Now imagine grepping through the internetz.

As for FOLogic, IIRC RDF was related to that (or was that, can't remember) and
it takes each site's investment to get that working on their site, i.e. you
can't expect formal relationships between objects to appear magically; someone
has to do it. And FOL, if it existed, may be computationally expensive.

And once you've got all that, how easy would it be to use? Probably most would
go back to keyword searches.

------
c3534l
If you could return search results that actually contain the keywords I
entered and not waste 5 minutes of my time returning irrelevant results
because you only found 3 relevant results (you know, the ones I was actually
looking for) to pad out the results, that would be a massive improvement.
Search has gotten so bad that I feel like the only reason no one has toppled
Google at it is because search engines are almost passe. No one cares about
them much any longer because people go to a handful of sites for specific types
of content rather than find data scattered and unorganized across the web.

~~~
non-entity
This! So much. I'd rather have a search be truthful and return no results than
return shit I don't care about. This applies to many things outside Google as
well.

------
sushshshsh
How would somebody write their own search engine?

I'm imagining some access patterns like this:

If I know what site I want to get an answer from (YouTube), I can just
download lots of text from YouTube and search those text files for the string
I'm looking for, making a map of files that match that string (or YouTube could
expose an API for me to use, of course).

If I don't know what site I want an answer from, well, this becomes harder, and
presumably if I don't have the space to store a text-only copy of the internet
for me to grep through, then a 3rd party web copy (such as Wikipedia) is
probably a good starting point???
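
A minimal sketch of the "map of files that match" idea, assuming the text has
already been downloaded into memory. The file names and contents are invented,
and `build_index` and `search` are illustrative helpers, not a real crawler or
API. Instead of grepping every file per query, it builds the map once as an
inverted index (word -> files containing it).

```python
import re
from collections import defaultdict


def build_index(pages):
    """Build an inverted index: each word maps to the set of files containing it.

    pages: mapping of (hypothetical) file name -> downloaded text.
    """
    index = defaultdict(set)
    for name, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted list of files that contain every word in the query."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    # Start from the files matching the first term, then intersect the rest.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Toy "downloaded" corpus, purely illustrative.
pages = {
    "video1.txt": "How to bake sourdough bread",
    "video2.txt": "Sourdough starter tips",
    "video3.txt": "Bake a cake",
}
index = build_index(pages)
print(search(index, "bake sourdough"))  # ['video1.txt']
```

This only covers matching; crawling, storage, and ranking are the parts that
make a real engine hard.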

------
habosa
I just can't get past the narrative here. Sridhar basically builds Google's
massive ad business from the ground up and then decides "ads are bad" and
takes his ad money over to profit on the other side by selling a new search
engine without ads.

Reminds me of "The Prodigal Techbro":
[https://conversationalist.org/2020/03/05/the-prodigal-
techbr...](https://conversationalist.org/2020/03/05/the-prodigal-techbro/)

------
LockAndLol
They could be using YaCy, set up their own infra and provide premium access to
fast and more reliable YaCy results. That would at least make them a
competitor and not just a Bing UI.

I mean, sure, it's cheaper, but it doesn't really set them apart from any
other Bing UI. Actually running something they own, especially something
opensource and federated, would distinguish them and warrant whatever fee they
deem necessary.

------
Aeolun
They’re not building a search engine at all. They’re another duckduckgo, but
charging a subscription fee.

They’ll get my money when they’re actually building a search engine.

------
shafyy
From their website:

> Neeva finds exactly what matters to you, whether it’s on the web, buried in
> your emails, or deep in an impossible-to-find document.

Why not stick to just a good search engine? I don't want to hook up my email
accounts and cloud storage accounts to this.

Usually people know if they need to search on the world wide web or in their
own emails or docs, don't they?

------
narrator
There's already a paid, curated search engine Google can't beat in certain
verticals: Westlaw and LexisNexis, but they are based on vast curated and
annotated data sets of legal case law that deliver enormous value to end users
of those services.

------
est
Search engines are dead because public websites no longer carry useful
information. Tons of user creations are on mobile now, in walled gardens.

We need hostable websites on mobile phones. Or data will be centralized to a few
big corporations.

------
ganafagol
> See, Google's business model won't allow it to compete with Neeva. It can't
> get rid of ads because ads are its whole business.

That may sound great, but it's just not true. Everybody working at a company
that pays for gsuite knows this. Gmail in gsuite is not part of the ad-
financed services. Instead, you pay. Google could any day offer the same thing
to consumers with search. Pay some fee every month, opt out of ads. Facebook
could do the same thing, btw. Boom, there goes the "novelty" of Neeva's
business model.

Now comes the question why don't Google, FB etc have such offerings yet. Many
people would choose them. My guess is that they would need to have quite steep
pricing, if they would want to 100% compensate for lack of ad income for each
user who pays instead. Nobody would want to pay FB 50 USD every month just to
see their uncle's Trump propaganda. That would be quite the eye opener about
how much each user is really worth to FB or Google in money terms.

Maybe services like Neeva create a market though and push FB and Google to at
least consider it.

~~~
dragonwriter
> Now comes the question why don't Google, FB etc have such offerings yet.

They don't have them because the people that would pay anything at all for
them are largely also the people that are most valuable to advertisers, and
they wouldn't pay enough.

~~~
SomewhatLikely
It's a very intuitive explanation, yet the costs to remove ads from Hulu or
YouTube are quite reasonable. It's possible search is a different beast I
suppose. I know a lot of the search ads I click on are actually defensively
purchased. The number one organic result bought the top ad slot just to defend
their position. That feels far more like rent seeking than the ads on the
video platforms.

------
hliyan
If only the early Internet had a standard search protocol RFC alongside the
others, so that each website could implement it, and there would one day be
search aggregators with content quality filtering.

------
m3kw9
Once I saw subscription-based search, I could not bet on it, just because when I
search I can filter the ads myself from Google pretty easily, and it is quite
hard to beat Google's search accuracy.

------
phenkdo
I think "search" needs a paradigm shift, in fact keywords -> links paradigm
needs radical change. Something akin to a GPT-3 model but with interactivity
and self-customization.

------
sid-
I think we should have subscription products that respect privacy and data for
all successful products; Maps, Youtube, Search, FB, etc.

------
jhoechtl
I signed up but I fear that it is going to be shopping-centric. So results are
bought instead of intelligently mined.

------
amelius
I'd like to see a curated list of papers for building a search engine.

------
niftylettuce
Yet another startup with a product that doesn't fit the market.

------
IndexPointer
So, another Bing API wrapper to add to the list?

------
magedqwani
The subscription model is OK for a start, but what search algorithms are
employed? They should produce open source technology if they want vast adoption.

~~~
omarchowdhury
They use Bing API.

------
ve55
Even if they make a superior product, how will they get users to use it
anywhere near the scale they use Google?

~~~
devmunchies
They don’t need google scale, they just need profitability.

A premium search engine can have a place with professionals. Maybe they could
sell enterprise licenses.

~~~
omarchowdhury
> They don’t need google scale, they just need profitability.

Oh? They did take money from Greylock and Sequoia. Don't those guys aim as
high as possible?

~~~
devmunchies
They’d get acquired by google before they get google scale.

------
alexhaber
Related: [https://hammer.lol/](https://hammer.lol/)

~~~
ZephyrBlu
Is this a joke?

~~~
BbzzbB
Looks like it's by George Hotz/geohot, so half-joke half-serious perhaps?

------
ridaj
These two guys are super smart. But, how are they planning to make money?!

~~~
okaleniuk
I'll pay them some.

