
Edge sends full URLs of pages visited to Microsoft - semiquaver
https://mobile.twitter.com/scriptjunkie1/status/1152280517972299777
======
wolf550e
Eric Lawrence (PM on Edge, previously on Chrome, and author of Fiddler)
showing screenshots of documentation that explains all this, and saying it's
been documented this way for 14 years:

[https://twitter.com/ericlaw/status/1152933704198758401](https://twitter.com/ericlaw/status/1152933704198758401)

~~~
jammygit
...and for 14 years no ordinary user has understood what those default
settings do, instead assuming “their software provider wouldn’t do that...”

~~~
arijun
I think you're making a big assumption that ordinary users care if Microsoft
knows what URLs they visit--how many of them use blockers or another
technology to opt out of e.g. Facebook tracking?

~~~
badrabbit
Ordinary users don't know what a URL even is. But they do assume every site
they visit isn't being logged by Microsoft, much like they assume their car's
manufacturer isn't recording the places they drive to using GPS.

"Users don't care" is such an intellectually dishonest argument when users
aren't technical enough to understand the subject.

~~~
dgzl
Maybe we should differentiate "users understand and don't care" from "users
don't understand and don't care".

On one hand, a TOS usually describes privacy implications. On the other hand,
there's no accountability that a user has understood or even read the TOS. Is
it unreasonable to assume people will read the fine print?

~~~
elliekelly
> Is it unreasonable to assume people will read the fine print?

Unequivocally, yes. Even if someone hypothetically had enough time to read
every TOS they've agreed to I would venture to guess the average user wouldn't
be able to comprehend most of the terms[1] anyway. Disclosure documents are
written by lawyers (corporate CYA) for lawyers (regulators, plaintiff's
counsel, judges).

The law, however, disagrees with me entirely.

[1] Edit to clarify I mean "terms" as in the terms _of the agreement_

~~~
dgzl
Delegating privacy rights feels like delegating monetary rights, as in
negotiating contracts for payments. This is coming from the perspective of a
simple person trying to understand this complexity. Anytime I sign complex and
impactful contracts, it's usually done in person, with physical paper and a
general understanding between both parties. An online agreement feels like a
scam artist's work, but only after we've realized the value of our personal
data. Maybe we're giving too much away on the strength of some typed words?
We're essentially giving away a resource because we didn't realize its
importance.

------
userbinator
Is there even a "pure" browser anymore? By that, I mean one which by default
will do this: if I enter a URL in the address bar, it will fetch that page and
its associated resources. If I click a link on a page, it will fetch its
destination (this is obviously a simplified view, ignoring things like JS on
the page etc.) No other network activity, no "recommendations", no other
attempts to be "smart" or "helping" by doing anything beyond "pure" browsing.

I see a lot of this extra functionality being justified in the name of
"security", but I don't think the gradual erosion of personal responsibility
and agency that results is something which should continue.

"The road to hell is paved with good intentions."

~~~
dontbenebby
Firefox might not meet your definition but with a few extensions (Containers,
NoScript, uBlock origin) it's quite good at protecting your privacy.

~~~
gottam
The only thing is they integrate things like Pocket, and they're putting some
other things in soon.

~~~
ubercow13
Pocket is part of Mozilla

~~~
Aissen
And it still is not open-sourced, two years and five months after the
acquisition: [https://github.com/Pocket/extension-save-to-pocket/issues/75](https://github.com/Pocket/extension-save-to-pocket/issues/75)

For comparison, it took Red Hat less than two years to open source Ansible
Tower.

------
skarz
No one should be surprised by this. Microsoft is Google, just with a price tag
attached. (Google services are usually free of charge because they mine your
data and target you with ads; Microsoft has discovered the concept of double-
dipping and has now joined the party, while still charging for Windows.)

Getting you to sign into your computer with a Microsoft account is all about
tracking and monitoring everything you do all of the time and selling a record
of it to the highest bidder.

------
acollins1331
Edge is the number one browser in the world for installing Chrome.

~~~
eitland
Doesn't Chrome send your entire browsing history to Google as well?

Edit: I thought this was a well-known fact, and if it isn't I might have been
too harsh about Google and Chrome.

Edit 2: Thinking about it and searching a bit, I conclude that (IIRC) Google
at least used to have access to your browsing history as part of syncing it
unencrypted.

~~~
liability
Everything you type into the address bar gets considered for possible
completion, right? And part of that quite possibly entails sending it off to
Google servers which take a stab at finding completions for it. Your claim
seems plausible at least.

~~~
kps
Yes, there are some options under chrome://settings/privacy that suggest they
send full or partial URLs:

- _Use a web service to help resolve navigation errors_

- _Use a prediction service to help complete searches and URLs typed in the
address bar_

and then there's

- _Use a web service to help resolve spelling errors_

whose description suggests it sends more than URLs.

~~~
Karunamon
And, if I recall right, you're asked if you want search suggestions in the
omnibar the first time you use it for that purpose.

~~~
liability
Does that prompt effectively explain its impact to nontechnical users?

------
josho
I assumed every browser did this as a part of their safe browsing features. Or
is this different somehow?

~~~
antoineMoPa
The following tweets explain that other browsers send a hashed URL instead
(though I guess it's easy to associate the hash with actual URLs).

~~~
shakna
Other browsers use a _truncated_ hash. Easy enough to match to a known list of
bad sites, but exceptionally difficult to reverse engineer a list of sites the
user has visited.

~~~
ChrisLomont
Those hashes have to be essentially unique to prevent blocking good sites
erroneously, and as such are easily associated with actual sites when you're
at the scale of MS or Google.

~~~
simias
Not if there's another layer of verification locally. Then you can allow a
few false positives with the hash on the remote side, and they'll get
discarded when the full hash is compared locally. That being said, 32 bits of
information seems like quite a lot and can probably be used to match the
original URL fairly accurately.

~~~
liability
You also need to consider the greater context. Suppose my history shows I sent
a hash that could be one of 100 different known URLs. One of them is about
model airplanes, but there is no way to know I picked that one. Chance I like
model airplanes? All else being equal, call it 1%.

But what if, over the next hour, I continue to send hashes, each with its own
random set of subject matters that happens to include model airplanes? The
intersection of the possible subject matters shrinks with each additional
hash, and quite possibly ends up containing only model airplanes.

I visited 100 URLs in one hour. For each URL there are 100 others with the
same hash but independent topics. Let's say there are 10,000 known topics, but
every hash I sent has model airplanes associated with it. Now what are the
chances I like model planes?

It seems clear to me that this scheme, with logging, reveals a lot in theory.
But _maybe_ solving this problem in practice would cost more than the data is
worth. For now.
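The intersection idea above can be sketched with a toy simulation (all
numbers are made up for illustration: 10,000 topics, 100 candidate topics per
observed hash):

```python
import random

random.seed(0)

TOPICS = range(10_000)   # hypothetical universe of known topics
USER_TOPIC = 42          # the topic the user actually likes

# Each observed hash prefix is consistent with 100 candidate topics:
# 99 random decoys plus the user's real topic.
def candidates() -> set:
    decoys = random.sample([t for t in TOPICS if t != USER_TOPIC], 99)
    return set(decoys) | {USER_TOPIC}

# Intersect the candidate sets across ten observations in "one hour".
possible = candidates()
for _ in range(9):
    possible &= candidates()

print(possible)  # the random decoys die off; only the real topic survives
```

Each decoy only has about a 1% chance of reappearing in any later candidate
set, so after a handful of intersections the real topic is essentially the
only one left.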

~~~
tialaramex
You skipped a factor: in the Safe Browsing Update API (the mode actual web
browsers like Firefox use), the step where you send a 4-byte prefix only
happens when the prefix already matched a local hotlist.

So if you visit 100 URLs in an hour, and the local hotlist allows your
browser to discard all but 1 of those URLs as definitely not bad, that's only
one URL prefix checked against Google, not 100, so there's no triangulation.

And the actual match rate isn't 1-in-100; that's why they picked 4-byte
prefixes. I haven't actually checked, but by eyeball from playing with this
data for work I would guess 1-in-a-million.

So, you visit 100 URLs about model planes, one of them happens to be a 1-in-a-
million match to a possible badware site, your browser sends the 4-byte hash
prefix to Google, it gives back the 4-byte prefix of the badware site it was
worried about which is different, your browser goes "Phew, good" and nothing
happens.

There's just not really an opportunity for tracking here.

~~~
tialaramex
Ugh, editing goof - that last bit of my explanation is wrong, Google sends the
whole SHA256 hash of the bad site's URL and your browser compares -that- to
the full hash it calculates for the model plane site. The four byte prefix
will be the same of course, if it wasn't we wouldn't get this far.

------
noobermin
Can this be changed to the normal twitter URL, not the mobile version?

~~~
marsrover
When I removed the "mobile" it looked exactly the same. I guess there is no
non-mobile version of Twitter anymore.

~~~
cannabis_sam
There's a super-fast, no-bullshit, mobile-friendly, absolutely wonderful
version of Twitter you can get if you disable JavaScript... (I've only used
it through Tor, though.)

~~~
PhantomGremlin
I'm not sure about "super-fast" since, every time I visit twitter w/o
JavaScript, I'm forced to click thru the message:

 _We've detected that JavaScript is disabled in your browser. Would you like
to proceed to legacy Twitter?_

And this annoyance persists. Click on a twitter link within the twitter
website and you get the message again. And again. It's definitely a dark
pattern meant to annoy you into enabling JavaScript.

------
seaish
Doesn't it send full URLs anyway when you're signed into Edge or Chrome to
sync your history?

------
octosphere
Which is why I always harden Edge after setting up Windows by disabling
SmartScreen

~~~
Insanity
You could just use another browser like Firefox. :)

~~~
ahoka
AFAIK Mozilla does the same.

~~~
jchw
Mozilla does not send full URLs to a third party. As mentioned in the thread,
they use “4-byte URL hash prefixes.”

Documentation: [https://developers.google.com/safe-browsing/v4/urls-hashing](https://developers.google.com/safe-browsing/v4/urls-hashing)

(Disclosure: I work for Google but have nothing to do with the safe browsing
API.)
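The "4-byte URL hash prefix" is just the first four bytes of a SHA-256 hash.
A minimal sketch of the client-side computation, assuming a single
hostname-only expression (real clients derive several host/path expressions
per URL after the canonicalization rules in the linked docs):

```python
import hashlib

def hash_prefix(host: str, length: int = 4) -> bytes:
    """Return the first `length` bytes of SHA-256(host).

    Simplified: Safe Browsing hashes several canonicalized host/path
    "expressions" per URL; this sketch hashes one host string only.
    """
    return hashlib.sha256(host.encode("utf-8")).digest()[:length]

prefix = hash_prefix("fakebank.example")
print(prefix.hex())  # the only value a client would send upstream
```

The point of the prefix is that many distinct URLs share it, so the server
cannot tell which of them the client actually visited.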

~~~
7373737373
Wait, this is not happening locally, against bloom filters or something?

~~~
est31
There is a first step which uses bloom filters, and if the filter gives a
result, the URL is hashed and the hash is sent to the Google service to
double-check.

That being said, a four-byte / 32-bit hash is enough to almost uniquely
identify a website: the number of 32-bit values and the number of websites
are roughly the same order of magnitude. It's a problem without a good
solution, because if you create many collisions, you also generate plenty of
sites falsely reported as phishing, and which admin would want _that_ to
happen on their site? If you avoid creating collisions, you have this
identifiability problem.

There is this CRLite proposal [1] using layered bloom filters to remove the
reliance on web services. Maybe it can be adapted for phishing sites as well.

[1]:
[https://ieeexplore.ieee.org/document/7958597](https://ieeexplore.ieee.org/document/7958597)

~~~
gruez
>That being said, a four-byte / 32-bit hash is enough to almost uniquely
identify a website: the number of 32-bit values and the number of websites
are roughly the same order of magnitude.

Why not send less bits (eg. 24 bit) of the hash?

~~~
cyphar
The utility of the hash is the same for a user and for a potential attacker.
Yes, sending 8 fewer bits means an attacker has 256 times more possible sites
you might have visited. But it also means you get 256 times as many false
positives of websites being labeled as phishing. This is what GP meant by it
being a problem without a good solution.

One possible option is to do what haveibeenpwned does, where you send fewer
bits and then check locally. That would be a good improvement to the system's
privacy, but you probably want to avoid downloading the hashes of every
malicious website that starts with the given 3 bytes (I'd assume the list is
quite large) for every page load.
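The 256x trade-off above is straightforward to quantify. A back-of-envelope
calculation, assuming a hypothetical blocklist of 1 million hashes and
uniformly distributed prefixes:

```python
# Rough false-positive math: the chance a random benign URL's hash
# prefix collides with some entry in a blocklist of this size.
blocklist_size = 1_000_000  # hypothetical number of known bad sites

for bits in (32, 24):
    buckets = 2 ** bits
    fp_rate = blocklist_size / buckets
    print(f"{bits}-bit prefix: ~{fp_rate:.4%} of lookups are false positives")
```

With 32-bit prefixes roughly 1 in 4,000 benign page loads triggers a
follow-up lookup; dropping to 24 bits pushes that to roughly 1 in 17, which
is why shorter prefixes get expensive fast.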

~~~
tialaramex
SafeBrowsing already does the thing you're describing as a "possible option".

Say I go to
[https://fakebank.example/security/login](https://fakebank.example/security/login)
and Google has decided all of fakebank.example is a phishing site.

My browser computes [among other things] SHA256('fakebank.example') and then
it snips off the first four bytes and compares that to a large dataset it got
from Google. It fetches updates to this dataset every few hours. Sure enough
the four byte prefix is present in the dataset.

So, we've got an alarm - it calls Google, but it doesn't tell them it's
thinking about
[https://fakebank.example/security/login](https://fakebank.example/security/login)
at all; it just tells them the 4-byte prefix. Google responds with a list of
full SHA256 hashes beginning with that prefix that it considers _right now_
to be phishing. The list might be empty (maybe fakebank.example was actually
a Greek yoghurt company subject to a PHP 4.x attack, and they upgraded PHP
and removed the phishing site, so now it's fine), but if it contains the
entire SHA256 hash we calculated then I get an alert telling me that my
browser thinks this is a phishing site and I might want to not visit.
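The exchange described above can be sketched end to end (a simplified model
with hypothetical data and an in-process stand-in for the server, not the
real protocol):

```python
import hashlib

def sha256(s: str) -> bytes:
    return hashlib.sha256(s.encode("utf-8")).digest()

# Local dataset of 4-byte prefixes, refreshed every few hours in the real API.
local_prefixes = {sha256("fakebank.example")[:4]}

# Stand-in for the remote service: returns the full hashes it currently
# considers bad for a given prefix.
def server_full_hashes(prefix: bytes) -> list:
    full = sha256("fakebank.example")
    return [full] if full[:4] == prefix else []

def is_flagged(host: str) -> bool:
    full = sha256(host)
    prefix = full[:4]
    if prefix not in local_prefixes:
        return False  # the vast majority of visits stop here, offline
    # Only the 4-byte prefix leaves the machine; the comparison against
    # the returned full hashes happens locally.
    return full in server_full_hashes(prefix)

print(is_flagged("fakebank.example"))  # True
print(is_flagged("example.com"))       # False: no local prefix match
```

Note how a benign site whose prefix happens to collide would still come back
clean, because its full hash would not match any returned hash.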

------
kryogen1c
Not to defend Microsoft, but SIDs are not unique. I think they're only
guaranteed to be unique per AD forest, with the addition of the RID
subauthority. Lots of domains are domain.local or other such commonalities
and so may have duplicates. Microsoft would know this, so the intent does not
appear to be surveillance or data collection.

~~~
gruez
According to wikipedia, a SID looks like this

    
    
        S-1-5-21-3623811015-3361044348-30300820-1013
    

The 3 long numbers in the middle are called the "Domain or local computer
identifier". Further down it says that each of those numbers is encoded as a
32-bit integer. Assuming they're randomly generated, that's 96 bits of
entropy, which is more than enough to uniquely identify every computer on the
planet[1].

[1] There might be a _few_ duplicates due to the birthday paradox.
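The birthday-paradox caveat can be quantified with the standard birthday
bound, assuming (hypothetically) on the order of 2 billion Windows machines
with randomly generated identifiers:

```python
# Birthday bound: the expected number of colliding pairs among n random
# b-bit identifiers is approximately n * (n - 1) / (2 * 2**b).
bits = 96                  # entropy of the 3 random 32-bit SID components
n = 2_000_000_000          # hypothetical count of Windows machines

p_collision = n * (n - 1) / (2 * 2 ** bits)
print(f"expected colliding pairs: ~{p_collision:.2e}")
```

At 96 bits the expected number of collisions is on the order of 10^-11, so
even across every computer on the planet, duplicates by chance are
vanishingly unlikely.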

------
gumby
This is one of the reasons I don't use Chrome. Google tracks which search
links I click in any browser, but I assume they simply track everything else
done with Chrome.

------
auslander
It's not so easy to do this MITM trick with Chrome; it has Google's certs
pinned. I think Chrome sends no less info to Google, though.

------
kerng
The poster might be using preview versions, where Microsoft does these things
(not just in Edge), if I recall correctly.

------
thebeefytaco
Anyone know if "full URL" here would include the hash (fragment) part of the
URL?

------
reaperducer
Every time I see people on HN campaigning for Microsoft as some kind of
reformed tech company that has seen the errors of its past and turned into a
force for good, something like this comes up.

Same old Microsoft. Second verse same as the first.

~~~
alkonaut
The good part of Microsoft is the division responsible for languages,
compilers and cloud. Meanwhile, the old Microsoft still makes Windows,
Office, etc.

While the latter has improved somewhat, only the former has truly reformed.

~~~
NullPrefix
>The good part of Microsoft is the division responsible for languages,
compilers and cloud.

Putting their telemetry in your compiled binaries is ... real good? Marketing
is the only thing that changed.

~~~
alkonaut
Yes. I have absolutely no problem with the way that works. It’s clearly
spelled out how it works and how it’s configured. Seems few who actually use
these tools have problems with this...

