End to End Encrypted, Private Search (private.sh)
130 points by rasengan on March 3, 2021 | 101 comments



This appears to be a zero-knowledge proxy for web search. It's not clear what search engine they're proxying to, but I wasn't super impressed with the result quality off the bat. I searched for "Brave search engine acquisition" and got a link to the TechCrunch home page with a relevant snippet, but the actual link didn't take me to the relevant article.

It's hard to see what value this adds over something like Duck Duck Go. Yes, it means the search provider can't see your IP... but there are other ways to do that, and you're trusting private.sh with your query and IP anyway.


> and you're trusting private.sh with your query and IP anyway.

If you take a look at the javascript, the query is absolutely encrypted before private.sh can see it. In other words, private.sh cannot see it.

Instead, what private.sh provides is a decoupling of your identity from your search query.

Private.sh can see your IP, but cannot see your search query.

The search engine can see your search query, but cannot see your IP.

Hope this helps clarify and thanks for the input! The search provider is constantly improving search results -- we'll be seeing significantly improved results over time!


I think there's a lot of confusion over this topic because the general explanation is confusing. "XXX is encrypted in the browser" implies that XXX is encrypted because of TLS.

Perhaps start with a statement like: "Private.sh sends all of your searches to a 3rd party search engine. These searches are private because we prevent the 3rd party search engine from knowing your IP or fingerprinting your browser. Private.sh can't see your searches because they are encrypted in your browser and can only be decrypted at the 3rd party search engine."

"Thus, the 3rd party search engine only sees your search, not your IP or browser finger print."


You're absolutely right. This is a much better explanation, and we'll improve our language with this advice in mind.

Thank you so much for your help and input!


> If you take a look at the javascript, the query is absolutely encrypted before private.sh can see it. In other words, private.sh cannot see it.

And if private.sh can unilaterally modify the js, that doesn't mean much.

Also at some point the request has to be decrypted to send it on to the real search engine. You're also trusting private.sh to do that without deanonymizing you. [Edit: apparently this step is done by gigablast, which is a separate company. That's at least a good thing. You're still trusting that both parties are acting honestly and competently]

In other words, if you trust private.sh not to be evil, then sure, it works. But if we're just blindly assuming providers are good, we might as well just use google.


You can use the private.sh extension [1][2] to ensure that private.sh isn't doing anything fishy. The public key of Gigablast is hardcoded in and the assets are served by a third party.

[1] https://chrome.google.com/webstore/detail/privatesh-private-...

[2] https://addons.mozilla.org/en-US/firefox/addon/private-sh-pr...


Not that you need to, but you could also protect your client-side JavaScript with some tight CSP rules as well as a nonce that you publish and keep verified as changes are made. This won't affect normal users, but technical people who care can check.


That's a good suggestion. Thank you!


I still have to trust that Private.sh isn't working with Gigablast to de-anonymize the request.


That's true. This is the same as Tor in the sense that if multiple parties collude, you can be de-anonymized.


It's different from TOR in that only two parties need to collude, though.


You assume that the proxy and the search provider follow these rules and do not cooperate in a different way (e.g. the proxy can send a query with an IP).

But can you actually prove it? The search provider isn't disclosed, we don't even know if it exists at all (as a separate entity).


I don't think it's an assumption in this case because they are the ones running the proxy.


> If you take a look at the javascript

How do I automatically validate that the JS hasn't changed between visits on my mobile?

(This is the problem with zero-knowledge-via-JS-delivery platforms that I've been unable to solve to date)


This is also the weak link in client web apps for security products like 1Password, where the weakest link is the integrity of the static files loaded by the browser. Break into their static file server or CDN/edge cache and you've effectively pwned any browser sessions started after that point.

(This also applies to builds of client software, but for most operating systems those builds have to be signed with a private key, which makes them more difficult to compromise. On Linux, SHA256 checksums are usually provided alongside binaries - which are especially important to use against security-related client software.)


Browsers support Subresource Integrity [0] checks. You provide an integrity attribute (or several) for a script or link resource and the browser will check the hash(es) before using the resource.

[0] https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

[1] https://w3c-test.org/subresource-integrity/subresource-integ...
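
For anyone who wants to see what goes into an integrity attribute, here is a minimal Python sketch that computes one for a script's bytes. The script body and the `sri_value` helper name are made up for illustration; the `sha384-<base64 digest>` shape is the one described in the SRI docs linked above.

```python
import base64
import hashlib

def sri_value(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI string such as 'sha384-<base64 digest>' for the given bytes."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError("SRI only allows sha256, sha384 or sha512")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"

# Hypothetical script body; a real check would hash the file actually served.
script = b"console.log('hello');\n"
integrity = sri_value(script)

# The value goes into the tag that loads the script.
print(f'<script src="app.js" integrity="{integrity}" crossorigin="anonymous"></script>')
```

Changing even one byte of the script changes the value, which is what lets the browser refuse a modified file.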


Good point, thanks for the added detail.

Note that one still must fully trust the root resource (the HTML response that bootstraps the web app). Whatever generates or stores that response up until it is wrapped in TLS is just as vulnerable.


Yes, but there is the problem of bootstrapping that chain of trust: how do you know that the SRI attribute value has not been tampered with? TLS protects transit, but not server side breaches.


If you're doing an SPA where the HTML is just loading your JavaScript for the meat of your app it's straightforward to verify that file. Besides literally comparing its hash to a known good value there's the Digest header [0] which is a way to provide a resource hash to a client. From the browser side there's the Want-Digest header [1]. These replaced the Content-MD5 header.

While browsers support these headers I don't think any currently reject a page or resource if a digest doesn't match. But at least the mechanism is there.

There's also the draft [2] for signed HTTP messages. So a response can be signed and verified against a public key.

[0] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Di...

[1] https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Wa...

[2] https://tools.ietf.org/id/draft-cavage-http-signatures-10.ht...
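
A rough Python sketch of the "compare its hash to a known good value" idea, using the `sha-256=<base64>` form that the Digest header carries. The helper names and the sample body are assumptions for illustration, not anything private.sh ships.

```python
import base64
import hashlib
import hmac

def digest_header(body: bytes) -> str:
    """Build a Digest-style value: 'sha-256=' plus the base64 SHA-256 of the body."""
    return "sha-256=" + base64.b64encode(hashlib.sha256(body).digest()).decode("ascii")

def body_matches(body: bytes, expected: str) -> bool:
    """Check a received body against an expected Digest-style value."""
    return hmac.compare_digest(digest_header(body), expected)

# Hypothetical bundle; 'known_good' stands for a value obtained out of band
# (for example from a published release note), not from the same response.
bundle = b"function encryptQuery(q) { /* ... */ }\n"
known_good = digest_header(bundle)

ok = body_matches(bundle, known_good)
tampered = body_matches(bundle + b"sendCopyElsewhere();", known_good)
```

The check is only as trustworthy as the place the expected value comes from.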


If I understand correctly the digest is only to make sure it didn't change while in transit.

If the file itself that's being served was compromised, the web server will serve it with a new digest that matches.


That's where the Signature header comes in. The host can sign the resource and the client can verify the digest and signature. If you're trusting the TLS connection you'd have the same amount of trust in the resource signature.


Right. Subresource integrity checks eliminate the need to trust the source of the subresource, but not the main resource (the HTML response that bootstraps the app).

But I'm willing to bet 1Password and the like never trusted third parties to host <link> and <script> assets anyway.


TIL, thanks!

Looks like it works on (mobile) Safari as well, which is fantastic.


I believe the current direction is to piggy-back off of subresource integrity:

https://w3c.github.io/webappsec-subresource-integrity/

As written, it's more about a page validating the asset from a CDN, but it wouldn't be hard to also have a browser extension which pops up an SSL-like badge when all script resources on a page are marked as trusted by some validating third party.


> Private.sh can see your IP, but cannot see your search query.

> The search engine can see your search query, but cannot see your IP.

Should add: this holds true only until both are acquired by the same party, or one sells its part of the logs to a data broker. A party that acquires both can correlate the data afterwards, and if one party sells its half of the data, the other can acquire it...

(supplement with: we promise not to log data... until ownership changes).
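
To make the correlation worry concrete, here is a small Python sketch of how someone holding both logs might join them on timing alone. Everything here is hypothetical: the log fields, the timestamps, the 500 ms window and the documentation-range IPs. Nothing is known about what either party actually logs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ProxyLogEntry:
    timestamp_ms: int   # when the proxy forwarded the encrypted request
    client_ip: str      # the proxy sees the IP but not the query

@dataclass(frozen=True)
class SearchLogEntry:
    timestamp_ms: int   # when the search engine received the request
    query: str          # the engine sees the query but not the IP

def correlate(proxy_log, search_log, max_delay_ms=500):
    """Pair each query with the single proxy entry forwarded shortly before it."""
    matches = []
    for s in search_log:
        candidates = [
            p for p in proxy_log
            if 0 <= s.timestamp_ms - p.timestamp_ms <= max_delay_ms
        ]
        if len(candidates) == 1:  # skip ambiguous or unmatched entries
            matches.append((candidates[0].client_ip, s.query))
    return matches

proxy_log = [ProxyLogEntry(1000, "203.0.113.5"), ProxyLogEntry(5000, "198.51.100.7")]
search_log = [
    SearchLogEntry(1120, "brave search engine acquisition"),
    SearchLogEntry(5090, "opengl python glclipplane"),
]
joined = correlate(proxy_log, search_log)
```

Each half of the data is harmless on its own; the join is what links an IP to a query.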


Gigablast is the search engine and that's something I coded. The bad results you are seeing here are because of index size. I don't have the hardware to keep the index as fresh and in-depth as it needs to be. At some point I hope to get the resources I need. But, keep in mind that ddg is just re-serving Bing results.


The difference seems that this threat model requires an active attack on the proxy to de-anonymize search queries/users.

With a normal proxy, a passive attack on the proxy is enough (since it will have both the request IP and search term in memory together at some point).

Whether this is indeed a meaningfully different threat model for most users is an interesting question.


Using the term "end to end" feels a little off to me in this solution since one of the "end"s is the search engine service.

If encryption is just through TLS how is this different than just sending your normal https based searches through a VPN or TOR?


Thanks for taking a look, and that's a great question. The search is both end to end encrypted and private.

In the sense of end to end, if you rely on TLS alone and send your query directly to the search engine, your IP will be attached to said query and thus your search and identity are coupled.

In private.sh, it's both end to end and private:

1. We utilize client-side javascript encryption prior to routing a search thru a proxy.

2. The proxy cannot read the query, and simply forwards it to the search engine.

3. The search engine decrypts, performs the search, and encrypts the results which are sent back thru the proxy (who, again, cannot even read the results).

4. The results are decrypted client-side.

Thus, your search query and identity are decoupled.
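
The four steps above can be sketched in Python as a data-flow toy. To stay standard-library only, it swaps NaCl for a deliberately insecure stand-in (finite-field Diffie-Hellman over a small prime plus a hash-based stream cipher with an HMAC tag), so treat it as an illustration of who sees what, not as the real scheme. All function names, the request layout and the placement of the client's public key are assumptions.

```python
import hashlib
import hmac
import json
import secrets

# TOY stand-in for NaCl box: NOT secure, only here to show the data flow.
P = 2**127 - 1   # small Mersenne prime used as the toy group modulus
G = 3

def keypair():
    secret = secrets.randbelow(P - 2) + 1
    return secret, pow(G, secret, P)

def shared_key(my_secret, their_public):
    shared = pow(their_public, my_secret, P)
    return hashlib.sha256(shared.to_bytes(16, "big")).digest()

def _keystream(key, nonce, length):
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def seal(key, plaintext):
    nonce = secrets.token_bytes(16)
    body = bytes(a ^ b for a, b in zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body

def open_sealed(key, blob):
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))

# Search engine: long-term key pair; only the public half reaches the browser.
engine_secret, ENGINE_PUBLIC = keypair()

def client_build_request(query):
    # Step 1: encrypt in the client, before anything reaches the proxy.
    client_secret, client_public = keypair()
    key = shared_key(client_secret, ENGINE_PUBLIC)
    return key, {"client_public": client_public, "box": seal(key, query.encode()).hex()}

def engine_handle(request):
    # Step 3: decrypt, search, and encrypt the results for the client.
    key = shared_key(engine_secret, request["client_public"])
    query = open_sealed(key, bytes.fromhex(request["box"])).decode()
    results = json.dumps([f"result for {query}"]).encode()
    return seal(key, results).hex()

def proxy_forward(request, client_ip):
    # Step 2: the proxy knows client_ip but passes on only the opaque request.
    return engine_handle(request)

key, request = client_build_request("opengl python glclipplane")
response = proxy_forward(request, client_ip="203.0.113.5")
results = json.loads(open_sealed(key, bytes.fromhex(response)))  # Step 4
```

Note that `engine_handle` never receives `client_ip`, and `proxy_forward` never holds a key that opens `request["box"]`; that split is the whole point of the design.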


That's very similar to what Tor does. And since both parts are under control of the same actor, it's not very different from, well, not doing it.

But too often ideas are shot down because they're not "helpful enough" or something. I shouldn't be too critical! It's great to see someone working on privacy in the search space beyond tech that is already there. Having tor-like anonymity without needing to use a special browser and slow network is definitely an advantage. Perhaps we can iterate upon the design later (somehow introduce user-run proxies so that the search server really cannot know who sent the request? Somehow make the javascript reload only every once in a while, so you can't silently change it for everyone at once as an attacker? People monitoring the subresource integrity tags for unexpected changes? I don't know). I'll be curious to see where this takes us!


> And since both parts are under control of the same actor, it's not very different from, well, not doing it.

Both parts are not in control of the same entity.

That said, thanks for the input and comments!

Edit: If you have any more input on how it can be improved, I would love to hear it! :-) My email is in my profile!


> Both parts are not in control of the same entity.

How does that work? Do you or a subcontractor of yours not host the javascript responsible for the encryption?

Edit: okay so this piqued my interest and the site has this to say: "Public IP is stripped away by the Private.sh proxy" (so you run the proxy) and "Search Provider decrypts the query". So the point is that Private.sh != Search Provider I guess? That begs the question whether one is a subcontractor or side project of the other, or hosted in the same physical space, or something else that makes this effectively the same as not proxying it. Also because you need to coordinate on what JS is being delivered to some extent: if Private.sh doesn't use the right pubkey, the SP can't decrypt it.

It feels a bit like Telegram who says "all our chats are encrypted, really! We can't read your messages!" but is still somehow able to deliver the plaintext messages to your client at a very fast rate. The trick being that "the key is stored in a different datacenter from the message" but of course Telegram owns both and that's how they combine it for intercepting, ahem, for delivering messages to your phone. This isn't the same, but without more background info it seems similar. Then again, it can't be worse than the status quo, so still good that you're working on this one way or another!


> Then again, it can't be worse than the status quo, so still good that you're working on this one way or another!

Thank you. To clarify, private.sh != search provider (gigablast). Thus, it would take collusion between both to couple the identity of the searcher to the search query.

private.sh does not have access to the private key associated with the public key that's used to encrypt the search query!


Thanks for clarifying, that's a setup I definitely haven't seen before. Interesting!


> Both parts are not in control of the same entity.

That seems doubtful. Which party is in control of the end where decryption takes place if not private.sh?


The search (where decryption takes place) is controlled by GigaBlast.com.


Fair enough, I guess that's something.


> Having tor-like anonymity without needing to use a special browser and slow network is definitely an advantage

This is potentially huge. Maybe for a resource-light webpage (in terms of content downloaded by the user) the overhead of a Tor onion architecture is negligible and combined with being able to use your regular old browser with no special configuration, that could mean just having specialized onion architectures for certain websites could make privacy a lot easier to realize for the end user.

Slow load times for every webpage and the need for end user configuration are pretty big barriers.


I mean, there's a reason we're not commonly doing this: a setup where the relaying and the back-end component are from the same organisation makes it fairly useless. You need to find an established partner that wants to work with you to have some claim to not be colluding. And even if so, the front-end can always deliver another piece of javascript that carries off a copy of the plaintext to a server of their own. This setup has some clear disadvantages over Tor. But to repeat what I said before: I shouldn't be too critical when all I'm really saying is "it might not help enough to be worth the effort". If there are no downsides either (well, the effort, but that's up to the dev and consequently the user who pays the dev indirectly) then it's better to do it.


This is not tor-like anonymity. At best this is like vpn-level anonymity.


If you're going to rely on any client-side javascript please use subresource integrity for all your assets.


If TLS is compromised, you cannot trust the subresource integrity checksums. It's better than nothing but there is no good solution.


Our solution here is our extensions [1][2] which are hosted and served by google and mozilla (third parties).

[1] https://chrome.google.com/webstore/detail/privatesh-private-...

[2] https://addons.mozilla.org/en-US/firefox/addon/private-sh-pr...


Which is why this whole exercise is a bunch of snake oil. You can't have better than TLS security when TLS is the weakest link.


This is just plain false. This search engine is advertising anonymity. You don't get that when searching Google, which is encrypted with TLS...


Indeed, if TLS is compromised many security guarantees of the web break down.


How do I verify that my query is being encrypted in the browser?

And is the proxy a service under your control?


The javascript source should show you that it's being encrypted -- and the proxy is under our control, the search engine is not.


Then every time I want to make a query I must re-audit the JavaScript source?

And, do you have any ideas about setting up the proxy so that I don’t have to trust that it is being operated as advertised?


Given that no external domain is contacted to pull the public key to encrypt with, how do we know the proxy won't send its own public key to the browser then forward the query to the search engine via a different encrypted payload?


In this case, the only way to be sure of this is to use the private.sh extension[1] which has the public key hardcoded in and is being served by google and mozilla.

[1] https://private.sh/extension.html
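
As a rough illustration of why a hardcoded key helps against a key-swapping proxy, here is a Python sketch of a client that refuses any key other than a pinned one. The key bytes and function names are hypothetical; the real extension simply ships with the key built in rather than fetching one.

```python
import hashlib
import hmac

# Hypothetical pinned key: in an extension this would be baked into the package
# distributed by the browser vendor's store, not fetched from the proxy.
PINNED_KEY = bytes.fromhex("aa" * 32)
PINNED_FINGERPRINT = hashlib.sha256(PINNED_KEY).hexdigest()

def key_is_pinned(served_key: bytes) -> bool:
    """Accept a served public key only if it matches the pinned fingerprint."""
    fingerprint = hashlib.sha256(served_key).hexdigest()
    return hmac.compare_digest(fingerprint, PINNED_FINGERPRINT)

genuine = PINNED_KEY
swapped = bytes.fromhex("bb" * 32)  # a key substituted by a misbehaving proxy

accept_genuine = key_is_pinned(genuine)
accept_swapped = key_is_pinned(swapped)
```

A page served by the proxy can't offer this guarantee by itself, because the proxy could change both the key and the check.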


Under "How Does It Work?": Every single search request from our website and extension is encrypted locally on the client and proxied through our service, stripping away your public IP address ensuring that only the search provider is able to decrypt and see your query without any knowledge of who you are.

This is vague. It seems like it just means: "we have a TLS certificate and will be a proxy between you and Google"


Great question!

It's a bit more than that -- the search query is encrypted client-side. This means that the private.sh search proxy cannot see what the query is. Then, the query is delivered to the search engine. The search engine does not know the IP origin of the search query.

This means that the search query and the identity of the searcher is decoupled, providing significantly more privacy.

Hope this is clear!


Thanks for the response.

Is this client side encryption the same regular TLS encryption that I would normally (no proxy involved) send to the search engine or is there some kind of buy-in from the search engine to handle it a different way?


Great question!

It requires the search provider to utilize NaCl to decrypt (requires buy-in from the engine).


Is there a difference between private.sh and using a VPN?


That is what the authors' statement means in practice, so your assessment is correct. This seems trivial, like I can just use an onion router for this.


It's a bit different, since if you only use a TLS certificate, you'd still be able to see the query.

That said, I agree, you could absolutely do the same with an onion router. This simply provides the benefit without having to do so.


I was being a bit harsh offhandedly. If I was making a full assessment of the product I would say that it has a market. You can't really do what I'm talking about with the tor onion router, because google blocks tons of tor traffic from accessing its services and won't even let you do a captcha cloudflare-style a lot of the time.

As someone who uses tor everyday, you basically can't get any higher quality general purpose search than duckduckgo or searx to actually load over onion routing. If you position yourself in such a way that your results engine returns queries on par with those two (or preferably better) you will have a rock solid product in your hands.


Thank you so much for the comments! I totally agree with you!


I don’t understand either. It says private.sh can’t see your query but the “search provider” can, so that’s the other end.

What the heck is a search provider though?

EDIT: looks like it’s “gigablast”. I don’t understand why I just wouldn’t use that search engine directly?


It encrypts the query with the public key of the search provider (Gigablast) then sends it over to https://search.private.sh/v2/search which then supposedly goes through a few proxies before finally hitting Gigablast, wherein the search query is decrypted by Gigablast's private key.

The point is that Gigablast doesn't know details about the user (though I'm not sure why private.sh's servers can't just forward the IP), and private.sh doesn't know what search query was sent (which, if you're using the extension and disable auto-updates, you can verify that this is actually the case. The code is short and readable.)


The whole idea of the product is it strips your IP address.


This seems like an odd use of the term "end to end encrypted". This seems to be primarily some sort of proxy scheme. If this is considered "end to end encryption" then why do we not consider any use of a proxy "end to end encryption"?

Normally that term is used in messaging between two people to identify a system that puts the identity management and the control of the cryptographic keys under the exclusive control of those users. That is certainly not true in this case.


This is absolutely end to end encrypted. A proxy can only strip the IP. The NaCl client side encryption deployed on private.sh is what makes this end to end.

The client encrypts the search query with the public key of gigablast. The proxy proxies the encrypted payload to gigablast. Gigablast decrypts the payload, performs the search and encrypts the results with the client's public key which is included in the encrypted payload. Then, Gigablast passes the encrypted results payload back through the proxy to the end user who is able to decrypt the payload and display the results.

In conclusion, the proxy (private.sh) knows your IP, but not your search query. The search provider (gigablast) knows your search query but not your IP.

Hope this is clear!


End-to-end encrypted means that Gigablast wouldn't see the search query or the results; only the service and the searcher would see them. For example, Signal is end-to-end encrypted. Signal servers do not see your messages.

Edit: this was addressed in the comments. The end-to-end is between the client and Gigablast. Private.sh is the Signal server and it cannot see the request or response.


It depends on how you look at things. In this case you and the search service provider are both Signal clients, and the proxy is the Signal server that relays the messages between the clients and provides privacy on the network level (hides the IP). Since the proxy cannot decrypt the query or the response it can’t see anything.

Whether actual privacy is always preserved (that is, whether it is possible to correlate queries and attribute a search to a given user) is hard to tell without looking at the implementation in depth, but it provides some additional privacy, at least to the level of a VPN.


End to end encrypted means this here:

[Client] ---> [Private.sh] ---> [Gigablast]

In the above diagram, [Private.sh] cannot see the search query. Thus, it's end to end encrypted. Private.sh is the "signal server" in this case. It works because the actual client, in this case, is the javascript running client-side on the web browser.

Your browser encrypts the query, clientside, before sending it to private.sh. All private.sh does is strip the IP away before sending it to gigablast.

Hope this is more clear!


Wow, search engines are having a busy day. First Brave, and now this one is on the front page.

I said it about Brave's purchase of the defunct Cliqz, and I will say it here:

Privacy is a feature, not a platform.

--

Building a business on this feature is not enough. It actually has to be better.

Competing with Google's 20+ year and 500 billion dollar head-start on broad search seems foolish unless you are willing to buy up the talent and resources to go head-to-head.

I'm not here to poo-poo effort, but I do think there is a better way.

One based on niches and human curation.

We can get around the problem of head-to-head competition with general search by focusing on better facets and data sets curated by subject matter experts.

At least, that is one hypothesis.


I did a lot of work on the community at Quora and my low-effort hot take is that subject matter experts don't get paid enough to do that kind of work, so you get grad students or hobbyists who put in effort while they are unemployed, but fizzle off when not, and they boil away leaving only the obsessive folks with an axe to grind.

That's a problem to solve as well, but I think it could be solved if those subject matter experts were part of your business model and they were paid in some meaningful way.


Sam - check your spam folder. I self host my email lol.


if the doj, etc. breaks google's stranglehold on search ads up into 2 or more independent search ad companies, then other non-big-tech search engines might have a chance at bidding for some of the 'default' provider lists for cell phones, browsers, etc. but as it is right now, google does not allow their search ads to be displayed on other search engines' search results. sucks!


You may be on to something here. In the past, most search engines started from DMOZ (now defunct) which was a simple, curated, directory.


I'm very serious about this space. Please reach out if anyone wants to connect and chat. Links in Profile!


Who is the search provider?

I tried looking on the site, didn't find any mention of it, and the language used in the comments here seems somewhat evasive.

Edit: okay this is an initiative by PrivateInternetAccess and they are using https://www.gigablast.com/ as the search provider


It's an initiative by Imperial Family Companies [1] which uses GigaBlast as the provider.

[1] https://imperialfamily.com/


Searched on Coinbase in USA news results. Didn't get anything directly about the upcoming IPO, and many non-english results were included. I like the concept, but the results are not useable.


This is good feedback and a good test. That should have absolutely returned something of relevance, so I agree with you here.

The search provider is constantly working to improve its search results, and I believe that feedback like this is of great importance to catalyze this.

Thanks again for the feedback, it's well noted.


This requires non-free JavaScript[0] to even post a search query, doesn't it?

The FF add-on[1] specifies "Custom License" without specifying which one.

[0] https://www.gnu.org/philosophy/javascript-trap.html

[1] https://addons.mozilla.org/en-US/firefox/addon/private-sh-pr...


Am I missing something about duckduckgo? I was under the impression it already serves the function of providing privacy-focused search. Is it not as private as I thought it was?


when you do a query on ddg they have access to your IP address AND your query. so you have to trust ddg quite a bit, and trust the people that work there. But if you use private.sh, no single entity has access to BOTH your query and your IP address. so it just provides more privacy than ddg.


That's marketing bullshit. Private.sh has no more guarantees than ddg.


DDG is a great search engine. I trust they do not log and maintain privacy. One entity needs to be trusted.

That being said, with the private.sh offering, it requires two entities to collude to identify users.

Thus, it does provide more guarantees than DDG.


Private.sh can identify users without any help because it controls the encryption code.

You only need 2 entities to collude if private.sh is acting honestly. If they are colluding you have already assumed they are not acting honestly.


If you use the Firefox or Chrome extension [1] you do not need to trust that the encryption code served by private.sh hasn't been tampered with.

[1] https://private.sh/extension.html


As https://blog.cloudflare.com/the-trouble-with-tor/ describes, anonymity is always a double-edged sword and this will get taken advantage of by abusers. So we'll have to see if this can survive when abusers start taking advantage of it. This is a difficult problem...


It's a read-only search box. What exactly do you think the trouble will be?


Any reason they don't just use Google as the search provider? Then suddenly they've got a business


Startpage is similar to what you are describing. (Supposedly private, and backed by Google search.)


I found the "How it works" section somewhat confusing. I don't know what is meant by "your search term is encrypted using the Search Provider’s public key". Does this mean that the search provider has some simple encryption interface other than TLS?


Great question.

If you review the javascript, NaCl is used to encrypt the search query before giving it to private.sh. Thus, private.sh is unable to decrypt your search. The search provider uses NaCl to decrypt the search query and then encrypts the results before passing them back. The results are then only able to be decrypted by the browser, client side.


So instead of:

I trust DDG not to track me.

it's:

I trust Private.sh to not give Gigablast my IP to track me.


Yes. A comparison would be single-key versus multi-key authentication.


I tried a query I searched for yesterday at work: "opengl python glclipplane". Google gives me the documentation page, two example pages (which are near duplicates), and then some relevant Stackoverflow questions. Private.sh gives me the website homepage for OpenGL, some totally irrelevant links into the PyOpenGL documentation (other modules, etc., not the one I searched for), and the Wikipedia page for OpenGL.

It's cool that it's private, but I could pipe my searches to /dev/null and that'd be pretty private too. If I can't get a useable result, who cares?


gigablast coder here. i noticed if you search for 'pyopengl glclipplane' it gets it. ('python' does not occur anywhere on the page!) i think this is a synonym issue. 'opengl python' should be synonymous with 'pyopengl'. i think google gets this because they have billions of queries from which to derive synonyms. i'll be adding more synonyms to gigablast over time and hope to close this synonym gap.

that being said, i think 90% of the bad search results come from the index being too small (straight up lacking the best web page), or not having good enough synonyms.


While that's a reasonable explanation, the top result from private.sh (the opengl.org homepage) also does not include "python" anywhere on the page. I would expect the top result to at least be something relevant to glclipplane. I think this is both an issue of retrieval but also of ranking - the opengl homepage is a very generic result which is unlikely to have much salience to a query with other specific search terms.


How do I know Gigablast (the search provider) and Private.sh (the proxy provider) aren't colluding to de-anonymize the request?


It's not a trustless solution, but one that requires multiple parties to collude to break trust, as opposed to just one, as is the case with most search engines.

In other words, if "Search Engine Company" wants to deanonymize you and link your identity with your query, they can.

If Private.sh or Gigablast want to, the two separate entities will have to collude. This isn't zero-trust, but it's absolutely 'lesser trust' than the current offerings outside of Private.sh.


> This isn't zero-trust, but it's absolutely 'lesser trust' than the current offerings outside of Private.sh.

In what way? How do I know the two owners of the companies aren't the same person? Heck, how do I know that Private.sh isn't just a Google spinoff?

There's absolutely nothing that guarantees that Private.sh and Gigablast aren't the same person or the same company.

I'd say it's the same level of trust as DDG or Google.


Thought this was some new kind of homomorphic encryption and got pretty excited for a bit.


private.sh is not saying if the client's key pair is rotated, and if so, how often. If it is not rotated, search provider can correlate requests coming from the same user and learn a lot about them.
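
A small Python sketch of the concern: if the key pair attached to requests never changes, the search provider can group queries by it even without an IP. The key labels and queries are invented, and whether private.sh rotates keys is exactly the open question here.

```python
from collections import defaultdict

def group_by_client_key(requests):
    """Group decrypted queries by the client public key attached to each request."""
    profiles = defaultdict(list)
    for client_public_key, query in requests:
        profiles[client_public_key].append(query)
    return dict(profiles)

# Hypothetical traffic as the search provider might see it after decryption.
static_key_traffic = [
    ("key-A", "knee pain"),
    ("key-A", "clinics near springfield"),
    ("key-B", "opengl python glclipplane"),
]
rotated_key_traffic = [
    ("key-1", "knee pain"),
    ("key-2", "clinics near springfield"),
    ("key-3", "opengl python glclipplane"),
]

linked = group_by_client_key(static_key_traffic)     # key-A accumulates a profile
unlinked = group_by_client_key(rotated_key_traffic)  # one query per key
```

With a fresh key pair per search, each query stands alone; with a long-lived one, the key itself becomes a pseudonymous identifier.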



