For Safe Browsing protection, here's how it works (in progress): https://chromium.googlesource.com/chromium/src/+/refs/change...
[Disclosure: I'm the Software Engineer on Chrome who wrote parts of this Safe Browsing code, and that incomplete documentation linked above.]
If you're thinking of Google Safe Browsing (used by both Chrome and Firefox), you're wrong.
It works the other way around: Google sends you the list of undesired domains, and your client prevents you from visiting domains found on that list.
Nothing needs to be shared with a third party for that functionality.
A hash prefix list gets downloaded locally; Chrome checks locally against the prefix list. If a URL hits, Chrome will send the hash prefix (not the full hash and not the URL) to the server, the server will send back all full hashes that match that prefix, and then the client will complete the check locally.
In theory, if the server had a small number of matching full hashes, it could guess about what URL a client might be hitting, but in practice the system is designed as much as possible to avoid ever leaking data about what you're visiting to Google servers.
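To make the flow described above concrete, here is a minimal sketch of the prefix check, not Chrome's actual code. The names `FakeServer`, `full_hash`, and `is_blocked` are hypothetical, SHA-256 and the 4-byte prefix length are assumptions for illustration, and a real client also canonicalizes URLs and checks several host/path variants, which is omitted here.

```python
import hashlib

PREFIX_LEN = 4  # bytes (assumed): a 32-bit hash prefix


def full_hash(url_expression: str) -> bytes:
    """Full hash of a URL expression (SHA-256 is an assumption here)."""
    return hashlib.sha256(url_expression.encode("utf-8")).digest()


class FakeServer:
    """Hypothetical stand-in for the remote service."""

    def __init__(self, bad_urls):
        # Group every full hash under its prefix.
        self._by_prefix = {}
        for url in bad_urls:
            h = full_hash(url)
            self._by_prefix.setdefault(h[:PREFIX_LEN], set()).add(h)
        self.seen_requests = []  # everything the server ever learns

    def local_prefix_list(self):
        """The prefix list a client downloads ahead of time."""
        return set(self._by_prefix)

    def full_hashes_for(self, prefix: bytes):
        """Return all full hashes that start with the given prefix."""
        self.seen_requests.append(prefix)
        return set(self._by_prefix.get(prefix, set()))


def is_blocked(url: str, local_prefixes, server: FakeServer) -> bool:
    h = full_hash(url)
    prefix = h[:PREFIX_LEN]
    if prefix not in local_prefixes:
        return False  # common case: nothing leaves the machine
    # Prefix hit: send only the prefix, then finish the comparison locally.
    return h in server.full_hashes_for(prefix)
```

In this sketch the server only ever sees the 4-byte prefix, and only when the local list already produced a hit; the URL and the full hash stay on the client.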
Clients download a database of partial hashes of malware URLs. If they get a hit on one of those partial hashes, they make a request for the full list of hashes with that prefix.
Google knows when a client makes one of those requests, but the exact URLs (or full hashes) being looked up are never revealed. The partial hash is 32 bits long, so there are enough collisions that making a request isn't especially revealing.
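A rough back-of-the-envelope sketch of that collision argument, assuming hashes are spread uniformly over the prefix space. The function name and the URL count used below are made up for illustration, not measured figures.

```python
PREFIX_BITS = 32  # prefix length mentioned in the comment above


def expected_urls_per_prefix(total_urls: int, prefix_bits: int = PREFIX_BITS) -> float:
    """Average number of distinct URLs sharing one prefix, assuming a uniform hash."""
    return total_urls / (2 ** prefix_bits)


# Hypothetical figure: one trillion distinct URLs on the web.
print(expected_urls_per_prefix(10**12))
```

Under that assumed figure, a single 32-bit prefix would be shared by a couple of hundred URLs on average, which is the sense in which a prefix request alone does not pin down a page.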
I might be wrong, but this is one of the reasons I don't use Chrome, so if anyone has links that prove otherwise, I'm interested.
Because Chrome keeps a log of all your activity, your DDG searches are easy to find here: https://myactivity.google.com/myactivity
So if you're web based (like me) then activities such as sending an email, checking out YT, reading HN, watching Twitch, and jerking off, all end up as entries in that log file.
* Enabling Chrome Sync, which is opt-in
* Syncing history, which is on by default if you enable sync
* Not using a custom passphrase for sync data (not using one is the default)
* Having "My Activity" save "Web & App activity", which is opt-out
* Having synced Chrome history data sent to "Web & App activity", which is opt-out
For the last two bullets, the opt-outs are at https://myaccount.google.com/activitycontrols .
Edit: And presumably you're using Incognito for... some of those activities, which wouldn't be captured regardless.
They would be captured by the ISP.
Did you know that Google happens to partner up with a lot of ISPs? Hmm, I wonder what for. What could they possibly have that Google needs?
Ever heard of https?
Do you really think Google would have trained an AI to determine that last activity? How would they have trained the AI?
How graphic do you want me to be? But isn't the real question: is there utility for an ad network in knowing your preferences in porn? If there is such utility, you'd best believe Google implemented a way to get them.
If I was an ad network I would love to hear about your porn habits. I would absolutely love it.
If you know, you are obviously one of the people who has that data. If you don't know, you aren't.
Such networks don't have to know whether, or at what exact moment, a given user jerks it. Though it would actually be better for them if you were actively browsing around and not jerking it at that moment. I guess that's why discovery and AI are so bad on porn sites. It's actually better for their ad revenue!
I think the causal order is flipped around here.
Sloppy. How do they know I'm not preparing for actual intercourse? How do they know I'm not downloading porn for later? I could also be watching gun videos, since they've been hosted there.
Granted, it's probably 95% accurate.
And anyway, it's “to improve your experience”, so you can't opt out. It's for your own good. Remember, Big Brother loves you.
Everybody working for this company should be ashamed.
Yes the data. The whole request and the whole response. They need it in order to properly train their ad-network.
Given that they scoop up all this data, I'd appreciate it if their ad network actually improved. Just the other day the dating site scams were back.
"We'll try not to show it again," they say. Well, for a company that vacuums the market for the best and brightest, they either don't try very hard or they are very dysfunctional, because as a group they fail.
Would you think of Google as trustworthy because they only gave their backend two pieces of data? I myself would not, because I'm pretty sure the actual request and response messages are looked up by client ID (in their Google Analytics data store).
Chrome (and Chromium) opens SSL connections to Google at startup and keeps them alive. It is not easy to sniff what is being sent. Even if you MITM it, as enterprise transparent proxies do, Chrome will throw an error because of cert pinning. Google domains should be whitelisted: "we recommend that you avoid the use of transparent proxies." https://support.google.com/chrome/a/answer/3504942?hl=en
But I doubt that a Chrome extension doing that patching could stay in the Chrome Web Store.
EDIT: Added italicized text for clarity.