You're referring to Fallback Mixing, which is off by default. You have to enable it in https://search.brave.com/settings. When enabled, this feature will (at times) pull in results from Google via an anonymous query, routed through the browser. Read more about it here: https://search.brave.com/help/google-fallback

> Note that choosing this option has no effect on your privacy. If you happen to have a Google account, Google will not be able to associate your query with this account.

I'm confused about "routed through the browser" -- is the browser talking to Google directly, but without sending the login cookies, and then hoping Google doesn't associate searches from your IP with your identity?

Correct, a query is issued from your browser but without any cookies. While it's true your IP address tags along for the ride, the IP address isn't typically how users are tracked on Google-scale properties. Due to NAT and more, your IP address is not exclusively yours. It can represent many people at once, and over time. That said, if you are not comfortable with the idea of Fallback Mixing, you do not need to enable the feature.

At the very least I suggest modifying the text on the page as it is misleading.

  > Note that choosing this option has no effect on your privacy.
An IP address is definitely considered private information, even by the courts.


This has not been my experience. Comparing results with Google, Startpage, and a Searx instance with only Google enabled reveals that the results are almost always from Google. Sometimes they merge multiple results that share a domain.

I decided to add them to the "Semi-Independent" category of my collection of indexing search engines: https://seirdy.one/2021/03/10/search-engines-with-own-indexe...

Mixing with Google results can only happen after opt-in, and only in the Brave browser. You can see whether a single query has been mixed by clicking `Info`, or check the independence metrics on the `Settings` tab.

The fact that you see results similar to Google for popular queries is a by-product of the fact that our ranking is trained using anonymous query logs. There are plenty of references to the methodology (https://0x65.dev/).

The fact that we are similar to Google on certain types of queries is good (at least from the perspective of human assessment). It's easy to find other types of queries for which we are not similar to Google. It would be rather stupid if we were to "use Google" on easy-to-solve queries but not on the complicated ones, don't you think? In any case, very nice article apart from a couple of misconceptions (like this one); will bookmark.

Disclaimer: I work at Brave Search and used to work at Cliqz.

That makes a bit more sense; I just read the blog posts. I'm concerned about the effects of optimizing against Google (namely, the extremely similar results); I don't think I understand the point of an alternative if it tries to replicate a competitor to this degree. The whole idea I was going for in that article was a diversity of information sources: if one engine isn't giving the results you want, try another.

Right now, users who want Google results and privacy can use a Searx instance or Startpage.

I updated the article to fix the inaccuracy. Diff: https://git.sr.ht/~seirdy/seirdy.one/commit/ddeeb36248ce5318...

Any other fact-checks are welcome.

You bring up a very good point on the diversity of information sources, which is something we plan to attack in the near future with open ranking [0].

In my opinion having similar results to Google will facilitate adoption. After all, Google is pretty good for many types of queries (not all), and people in general have strong habits.

The fact that we are similar with our own index is great. It means that we have the power of deviating from it when needed, as we mature/evolve.

Allow me to repurpose your statement on why not use Startpage if you want Google-like results: if tomorrow Google disappears (or for some reason becomes unusable), Brave Search will continue to operate as normal (similar to old Google). What will happen to searx or Startpage? What will happen to ddg or swisscows if the provider turning bad is Microsoft? IMHO, no matter how much reranking or how many nice features you put on top, unless you control the search results themselves, diversity can only be superficial.

Sorry for the "rant". Thanks a lot for the inputs and for updating the doc, appreciate it.

[0] https://brave.com/wp-content/uploads/2021/03/goggles.pdf

Brave Search doesn't fall back to Google unless you have enabled Fallback Mixing in https://search.brave.com/settings/. Brave Search has its own index; the results may resemble those of other engines at times, but they aren't pulled from those engines (again, noting the exception of Fallback Mixing, an optional feature offered to the user via Settings).

I'm testing on Firefox and the Tor browser right now, JS disabled. I also disabled cookies in Firefox. Searches for "Seirdy", "Neovim", "gccgo", and others return results identical to Google, Startpage, and Searx instances with only Google enabled. None of the 25 other English-language independently-indexing engines I compared in the article has had this happen; identical pages on all the other engines are nearly impossible to find for advanced/uncommon queries.

90% of queries being identical to Google but different from the 25 other independent engines is one hell of a coincidence.

Archived example:

Brave results for "gccgo": https://web.archive.org/web/20210622172743/https://search.br...

Google results for "gccgo" (proxied through Startpage): https://web.archive.org/web/20210622172939/https://startpage...

If this is a bug, it's very serious and needs to be publicly disclosed.

Edit: more examples:

Brave results for "oppenheimer": https://web.archive.org/web/20210622173647/https://search.br...

Google results for "Oppenheimer" (proxied through Startpage): https://web.archive.org/web/20210622173658/https://startpage...
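For anyone who wants to put a number on comparisons like these instead of eyeballing two result pages, here is a minimal sketch. It assumes you have already copied the top result URLs from each engine into lists by hand; the function names, the normalization rules, and the example.org URLs are all made up for illustration and are not real results from any engine. It computes the Jaccard overlap of the top-k URLs, which ignores ordering.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Reduce a URL to host + path so scheme, 'www.' and trailing slashes don't count as differences."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def overlap_at_k(results_a, results_b, k=10):
    """Jaccard similarity between the top-k normalized URLs of two result lists."""
    top_a = {normalize(u) for u in results_a[:k]}
    top_b = {normalize(u) for u in results_b[:k]}
    if not top_a and not top_b:
        return 1.0  # two empty result pages are trivially identical
    return len(top_a & top_b) / len(top_a | top_b)

# Placeholder data only; paste in the URLs you actually observed.
engine_a = ["https://example.org/a", "https://www.example.org/b/", "https://example.com/c"]
engine_b = ["http://example.org/a", "https://example.org/b", "https://example.net/d"]
print(overlap_at_k(engine_a, engine_b))
```

A score near 1.0 across many uncommon queries would support the "same source" reading; a high score only on popular queries would be weaker evidence, since the replies below argue those may simply have an obvious top set.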

As a counterexample, I searched for something very obscure (only three pages on startpage) expecting to see them pulling in results from startpage to cover the long tail. I was surprised to see different results, suggesting their index is much larger than I assumed.

The query was "retail snap incentive program"

Edit: All your queries are for relatively popular terms. I wouldn't be surprised if there's just a clearly right top set of pages.

> I wouldn't be surprised if there's just a clearly right top set of pages.

I would be astounded! Why would DDG, Bing, etc. not use it? Different search indices and engines should practically always have differences in results, as ranking results is very fuzzy and dependent on the available data.

Interesting. I couldn't reproduce those results. Certain queries did produce _very_ identical results, but others did not. In some of those cases Google and Startpage did better.

Even semi-independent seems generous. I probably would have just lumped them in with Google or Bing.

Some queries do actually return independent results, but the vast majority (in my experience) do not.

I don't see a fallback mixing option on that page. Is it called 'Fallback Mixing' on that settings page? Also, it seems these results are pulled from Google and Bing for every query I do. It seems like maybe some reranking is happening. And the query completions are from Bing. So you are sending everybody's queries to third parties. Not very private.

It does not appear that they are exposing all possible settings configs on mobile as fallback mixing is not shown as an option for me there. This seems like an oversight to me.

Fallback Mixing is only available to Brave on desktop and Android at this time. Apologies for any confusion.

Why is it only available on Brave? Doesn't make any sense.

Because you cannot issue a cross-site request to Google from the client due to CORS policies. This feature required work in the Brave browser itself, so that the application would serve as a pipeline for the request on behalf of the search page itself.
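To spell out the CORS point a little: strictly speaking the browser can send a simple cross-origin request, but script on the page is not allowed to read the response unless the other server opts in via response headers. The sketch below is a simplified model of that check, not the browser's actual implementation; the function name is made up, and the header values in the usage lines are illustrative assumptions, not captured from any real server.

```python
def page_can_read_response(page_origin, response_headers, with_credentials=False):
    """Simplified model of the CORS check a browser applies to a cross-origin fetch."""
    headers = {name.lower(): value for name, value in response_headers.items()}
    allow_origin = headers.get("access-control-allow-origin")
    if allow_origin is None:
        # The server did not opt in, so page script cannot read the response.
        return False
    if with_credentials:
        # With cookies attached, the wildcard is not accepted: the server must
        # name the exact origin and explicitly allow credentials.
        return (allow_origin == page_origin
                and headers.get("access-control-allow-credentials") == "true")
    return allow_origin in ("*", page_origin)

# Illustrative headers only (assumed, not observed).
print(page_can_read_response("https://search.brave.com", {"Content-Type": "text/html"}))
print(page_can_read_response("https://search.brave.com", {"Access-Control-Allow-Origin": "*"}))
```

A response without the opt-in header fails this check for any web page, which is why the request has to be made by the browser application itself rather than by the search page's script.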

What incentive do Google and Bing have to share free SERP data with Brave over an anonymous channel?

They aren't sharing it with Brave directly, but rather with users. The query is issued via the participating user's Brave instance. This data then supplements what Brave Search has found, and assists Brave Search in presenting better results to that user, and others, in the future.

This sounds like a dishonest way of bypassing payment for Google search API by impersonating a request from a user.

It's still a request from the user; the user consents to issuing these requests on behalf of Brave Search when they opt-in to Fallback Mixing. Anybody can issue calls to Google's search engine.

Doesn't this put the user directly in violation of Google's TOS, which prohibits automated queries against it?

What do you think the Google Custom Search API is supposed to be used for if not serving searches that originated with some user?
