Chrome deploys deep-linking in latest build despite privacy concerns (theregister.co.uk)
242 points by mikro2nd 44 days ago | 147 comments



Amusingly, I built a (private prototype of an) extension for functionality almost exactly like this a while ago. My goal was to be able to bookmark arbitrary long articles (or single-page books) “in the middle”—without the author providing an anchor permalink—and then come back to them where I left off, precisely by embedding my scroll location on the page into the fragment of the bookmarked URL. It worked pretty well, and mostly obviated my need for any service like Instapaper/Pocket.

I always wondered why browser bookmarks don’t just work like this, even if doing so required storing additional metadata outside the URL itself.

One thing that’s interesting about this functionality in Chrome specifically is that, due to the nature of Chrome’s PDF support, these URLs should allow you to deep-link into a PDF from outside it, which hasn’t really been possible before.

Still quite a way from Xanadu transclusions, but it’s a start.


For those confused by the "Xanadu transclusions" part, it's basically quoting documents through hyperlinks (somewhat like OLE embedding on Windows).

See https://en.wikipedia.org/wiki/Transclusion


these URLs should allow you to deep-link into a PDF from outside it, which hasn’t really been possible before

Chrome and other browsers with built-in PDF viewers have long supported the #page=XXX URL syntax, to go directly to a specific page of a PDF document.

This sometimes works only when the document is first loaded; if it’s already open in the window or tab, adding a #page fragment in the address bar won’t always turn the page.
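
For example (a minimal sketch; the document URL is made up):

    // Hypothetical PDF URL: the built-in viewer jumps straight to page 12.
    const link = 'https://example.com/whitepaper.pdf#page=12';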

See this Adobe document for other parameters, which aren’t as widely supported:

https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdf...


The quote about DNS seems wrong to me

""Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested.""

I thought DNS requests just get the domain, not the hash and not even the page requested.

But let's play along. How would sending a link tell you anything apart from whether a user clicked said link? How would links to existing anchors like example.com#h3-subtitle be any different?


Kind of an edge case privacy issue, IMO.

Imagine the page has 10 parts. Part 1 has an image reference hosted on part1.imagehost.xyz, but loaded lazily. Each part has something analogous: an image hosted on a different domain, also loaded lazily (so, no requests before that part of the page is visible).

If you open the page and read from the top, the browser will open with part 1, JavaScript will fire and tell the browser to load part1.imagehost.xyz/image1.jpg, causing a DNS lookup, HTTP(S) connection, and a request. And so on for each part (interestingly, this also lets you do timing analysis to see how fast the user scrolls past each part).

But if the browser directly loads part 8 (the part with cancer in the example), and the Javascript only asks to load part8.imagehost.xyz/image8.jpg, the fact that the browser only asks the DNS for this hostname will reveal to the DNS admin that the reader followed a link straight there.
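
To make that concrete, here's a minimal sketch of the kind of lazy loading described above (the hostnames, markup, and use of IntersectionObserver are all illustrative assumptions):

    // Hypothetical markup: <section data-part="1">…</section> up to data-part="10".
    // Each part fetches its image only once it scrolls into view, so the first
    // DNS lookup for partN.imagehost.xyz reveals which part became visible.
    document.querySelectorAll('section[data-part]').forEach(section => {
      const observer = new IntersectionObserver((entries, obs) => {
        entries.forEach(entry => {
          if (!entry.isIntersecting) return;
          const n = section.dataset.part;
          const img = new Image();
          img.src = `https://part${n}.imagehost.xyz/image${n}.jpg`;
          section.appendChild(img);
          obs.unobserve(section);
        });
      });
      observer.observe(section);
    });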

But if it's some health portal under the evil company's control, it seems like it'd be easier to write Javascript to fire an event to do server-side logging/send an email alert to HR if someone spends a lot of time reading the cancer section...


The privacy risk makes some logical sense, but I don't see why it doesn't also apply to regular old anchors that have been around for forever. That is, the same sort of information leak has already been possible for decades on sites that allow the URL to just have #cancer.

If this is really an issue worth solving, doesn't it need to be solved for all cases, both traditional #foo links and also these new pattern-style links?

I can understand that this new mechanism will probably increase how often it can happen, but blocking the new mechanism would only solve part of the problem. To solve the whole problem, you'd have to do something like turn off lazy loading.

Anyway, if the reason the person is visiting the page is to read the section about cancer, aren't they going to manually scroll to that section a lot of the time anyway? So the resource will still be loaded, just a little bit later. The scroll bar is moving to that position either way. The only information leaked is who moved it: the referrer or the end user.


(Most insightful comment so far.)

I've really been missing this kind of functionality. Modern sites are built by people who apparently don't know or care about HTML name anchors. There is still a need to link to specific parts of pages, particularly in online discussions.

The privacy aspect seems insanely overblown to me, for the reasons you outlined.

I do understand why Brendan is using this moment to criticize this. It's on-brand (his Chromium fork has a privacy angle), and it is going to get him exposure because the supposed bad guy is Google. Don't get me wrong, they (Google) are not "good guys" by default any longer. This thing though? Not so sure. I think it will do more good than harm, on average.


Yeah, I agree that even assuming someone is making the right decision, it doesn't necessarily make you feel good about the fact that it's theirs to make.


One difference is that today's page anchors are only put there by the page author, so, since all authors are cracker-jack security experts, they would not have made an anchor available in such a sensitive part of the document, since it opens their readers up to this risk.

As you can tell, I think that the difference is a real, technically true difference, but the implication is a bit dumb, since authors do not have this kind of thing in mind when deciding whether to anchor. You might as well be mad about lazy image loading too. If a browser is smart enough to only load images near an anchor, then this same risk would have been opened up when that was introduced. (The author wrote the anchor before lazy loading, so they correctly perceived no risk, then lazy loading turned it into a risk.)


> One difference is that today's page anchors are only put there by the page author,

AFAIK we can link to any id in the page, not only to anchors. Strictly speaking you're still right, because the author creates those ids, but some of them are automatically generated by frontend or backend frameworks.


That's true, though I often forget it. But surely those cracker-jack security expert HTML writers don't forget! Or use such privacy insensitive tools as the ones you've mentioned.


(Too late to edit, so replying to myself instead.)

I'm going to backpedal a bit on my point about manual scrolling being the same thing as automatically scrolling directly there via anchor.

The scrollbar can be at various positions, and if an anchor takes you there automatically, it jumps to position N, never passing through 0 .. N-1. Whereas if you manually scroll, you pass through all of 0 .. N. So whatever resources are above the target position would not necessarily be loaded with an anchor but probably would with manually scrolling. So the externally observable wire behavior (DNS, IP addresses you connect to, etc.) may still be different.

So if you manually scroll, the adversary can infer that there's something significant about scrollbar position N. It might just be that that's where you lost interest, or it could be that that's what you found interesting, but it's not giving away as direct a clue as jumping straight to position N.


I read it as the opposite, where the evil party doesn't control the browser but they can sniff traffic on the network and now they can see which part of a page you requested instead of merely which page.

I am not a web dev so I'd love to hear someone with more knowledge spell out some other privacy concerns.


That's also how I understand it. It requires a very specific and uncommon set of circumstances, but it could leak info to someone observing.

But also I think the main issue "pes10k" has is that it is adding a new attack vector in a backwards-incompatible way. So a sensitive website which is doing these JS lazy-loading things was "safe" last week, but now can potentially leak information in a way it couldn't have with earlier chrome versions.

In practice though it's hard (for me) to think of a website working this way. I understand the idea of a network admin sending the CEO a link with the text-fragment "cancer" in it. And if that fires off related requests unusually, I can assume it auto-scrolled to "cancer", conclude the CEO has cancer, and sell all my shares. But what sort of website will dynamically contain the word "cancer" for patients that have cancer and is also doing fancy lazy-loading stuff? I'm not a security expert though, and I know people are much cleverer than me when it comes to exploits.


You're very right; what is the chance that this set of circumstances occurs anywhere on the web? It seems like the person describing the scenario was just pulling a theory out of thin air in hopes of convincing someone how terrible this is...


The GitHub issue has more actual examples. The important part is "looking for lower-on-the-page resources being requested."

You can't necessarily tell anything from that lone DNS request. But after loading the page, the user's browser will go on to send requests for elements on the page, and anything JS wants to grab based on position, which can be a channel for information.

Example: 99% of your employees' DNS requests to the company health portal are followed by DNS requests to the image CDN a few milliseconds later. However 1 specific request to FakeCompanyHealth.com is immediately followed by a DNS request to MayoClinic.com, because it turns out the web page dynamically loads an iframe to contact the Mayo Clinic once you scroll to the Cancer subsection. You could then assume based on timing that this user clicked a cancer-specific anchor link.

A more realistic example IMO is the Twitter friend leak provided in the GitHub thread. Send an anchor like "twitter.com#:~:text=@handle" and see whether their page load matches the standard Twitter homepage load, or whether their browser scrolled them halfway down the page and loaded additional stuff. If so, you can assume these two users are friends.

https://github.com/WICG/ScrollToTextFragment/issues/76


You don't send URLs or query parameters to a DNS server, only a hostname. Also, it's usually cached after the first lookup. Sniffing the network will only show requests to that IP. Hostnames, URLs, params and responses are all fully encrypted if they're on HTTPS.


Hostnames are not encrypted in HTTPS requests. They are sent in plain text via SNI (Server Name Indication). There is a new way of handling this called ESNI that does encrypt it, but I'm not sure how widely that is supported.

Edit: Cloudflare has a page to check if ESNI is supported by your browser. I tested it in the latest version of Firefox, Chrome, and Safari and it failed the test in all three.

https://www.cloudflare.com/ssl/encrypted-sni/


Ooooh, you're right. I didn't think of SNI.


Does this require you to intercept their traffic on the network? If not, how do you know what their page load looks like?


> A more realistic example IMO is the Twitter friend leak provided in the GitHub thread. Send an anchor like "twitter.com#:~:text=@handle" and see whether their page load matches the standard Twitter homepage load, or whether their browser scrolled them halfway down the page and loaded additional stuff. If so, you can assume these two users are friends.

It's still not realistic at all unless you can already snoop on them in some way.

The starting assumption here is "I assume I already violated someone's privacy in a massive way." Like, ok, sure you can violate it a little more in some situations maybe. But does this introduce any new risks in situations where user privacy hasn't already been so massively hijacked?

And since Chrome, along with others, is moving towards encrypted DNS, getting at these DNS lookups will become even less feasible than it is today. You need to actually be able to snoop on all the target's network traffic.


A sibling/top-level comment has a link to the security doc: https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj...

I think what that is saying is: say evil.com starts embedding some sensitive site. It can't normally see the contents of that site, because it would be cross-origin (or any other context where I can load a resource but not see the content). If I want to know whether some content is or is not in that sensitive site, I use the scroll-to-fragment to scroll to that text: either it will find the text and scroll to it, or it won't. evil.com won't know the result of that directly, but it might be able to measure timing/CPU work done and, from that, infer the result: whether that text was or was not in the page. From there, repeat the attack to start brute-forcing the content of the page. W.r.t. DNS, if the scrolling or lack of scrolling initiates two separate sets of DNS requests that I can observe, then that's one possible way to leak the information on the page.

At least, that's how I'm reading that security doc; this is fairly different from how other responses to your remark are interpreting it, but I wonder if they were only looking at TFA, which really did not explain anything very clearly.

The security doc also mentions "Restrict feature to user-gesture initiated navigations", which would completely stop the iframe attack I've outlined above.
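
For concreteness, here is a rough sketch of the probe outlined above (the site name and the timing signal are assumptions, and the user-gesture restriction just mentioned would defeat exactly this):

    // evil.com cannot read the framed page (cross-origin), but it may be able
    // to observe side effects (load timing, CPU work, or requests visible on
    // the network) and infer whether the text fragment matched.
    function probe(text) {
      return new Promise(resolve => {
        const frame = document.createElement('iframe');
        frame.style.display = 'none';
        frame.src = 'https://sensitive.example/statement#:~:text=' +
                    encodeURIComponent(text);
        const start = performance.now();
        frame.onload = () => {
          resolve(performance.now() - start);  // crude timing signal
          frame.remove();
        };
        document.body.appendChild(frame);
      });
    }
    // probe('Account 1'), probe('Account 12'), ... to brute-force content.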


The idea of that quote is that using the anchor would only scroll to that part of the page if the word "cancer" is present, and if the page uses lazy-loading then it would fire off more requests than it would if the word wasn't present. It's a bit of a stretch though.


also #fragments aren't sent to the server at all, unless this changes that (a major major change if so)


The Google document being referenced talks about crossing an origin boundary.

So a website with a bad script injected into it is loaded by the user, and that script is able to make requests to a logged-in banking website and mount time-based/scrolling attacks.


Does this matter when the server (well, multiple servers once you throw in ads) is already running custom JavaScript on the client, which has access to not only the viewport coordinates but the URL itself?


does this add an additional vector to that case?


Right on. This article is bull


And aren't most browsers (and for sure Chrome) loading the entire page anyway? I mean, maybe not if you have a slow connection, but for most people with a fast connection it should load everything, so nothing is missing when you scroll down.

I really don't see the issue; to me it's a useless argument.


A lot of sites do lazy loading (but even then it's good practice to preconnect to third-party domains), and Chrome even implemented native lazy loading for images and iframes recently.
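
For what it's worth, a preconnect hint is just a line of markup or script; a small sketch (the third-party host is made up):

    // A preconnect hint resolves DNS and opens a connection up front, so the
    // later lazy image load does not produce a tell-tale, scroll-timed lookup.
    const hint = document.createElement('link');
    hint.rel = 'preconnect';
    hint.href = 'https://images.thirdparty.example';
    document.head.appendChild(hint);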


It seems more plausible that I could send you a link to portal.site.example/you-have-cancer-what-now and if you don't have cancer it redirects to portal.site.example/home, if we're willing to consider the infiniverse of possible health care portal designs.


It just seems to me like not that big a concern.

E.g. on a company network, it would be common to install an SSL cert on all devices on the private network for things like web proxies, CASBs, or other security policy enforcement measures.


I think a lot of the comments in this thread are trying to evaluate the privacy concerns on merit, which makes sense, but IMHO it's also instructive to look at the fact that other W3C members don't want this included and Google is able to do it anyway. Perhaps that should be the bigger cause for alarm than any one feature.


In Google's defense, and basically because I like this particular feature, I have to point out the obvious ... most new browser features happen before and not after standardization.

The standardization process is meant for reaching agreement among browser makers and to provide clear specs for those features. But it isn't meant for exploration.

That has always been the case. Even Mozilla does it. And if not, then you can't really get the browser vendors to agree on much of anything a priori, without market validation.

Standardization works as a refinement of already existing work. E.g. Google came up with NaCl/PNaCl, which was basically their own ActiveX, Mozilla came up with asm.js in response, and the end result was WebAssembly, which is objectively better than both and wouldn't have happened without that prior art.

The W3C doesn't guard what gets implemented in browsers. It never did. And if browser makers willingly participate in the standardization process, we should thank them, because they can always stop doing it. Because really, retreating from the W3C would mean absolutely nothing for computer illiterate people.


To elaborate, the process for standardizing browser features today is WHATWG-managed, not W3C. And it's basically "write a proposal, solicit feedback, implement the feature in one browser, and if it's ever implemented in 3 browsers it's part of the standard".


WHATWG is mostly HTML and DOM/browser APIs, ECMAScript is handled by TC39, CSS is still in the W3C, and URI schemes (which this is about) are in the W3C, as seen in this draft being hosted by WICG, which as far as I can tell is a W3C group.


True, but since the other parties don't really care much, what could we do, really? (The other big corps like Apple and Microsoft - and let's say FB, because of their WebView browser - are all also knee-deep in their own walled gardens and not likely to take standards and consumer protection seriously. E.g. Apple will just laugh and brand itself as the real privacy option - for those who can afford it. And geeks can ultimately just use a fork of Chrome/Chromium if they want.)


How would W3C stop this?


Why do you think the W3C is relevant?


As the article says, ScrollToTextFragment is a W3C spec.


It's not.

But that's beside the point. The W3C haven't been in the driver's seat for a very long time now when it comes to web standards.


Well, not spec, but proposed spec. [1]

Driven by Google, but under the auspices of W3C.

> The W3C haven't been in the driver's seat for a very long time now when it comes to web standards.

That was exactly ghostpepper's point.

[1] https://github.com/WICG/ScrollToTextFragment


The W3C doesn't even try to drive features after one browser has implemented them and other browsers are considering them - that's the job of WHATWG.


WHATWG is mostly HTML and DOM/browser APIs, ECMAScript is handled by TC39, CSS is still in the W3C, and URI schemes (which this is about) are in the W3C, as seen in this draft being hosted by WICG, which as far as I can tell is a W3C group.

At least that has been my understanding of the situation.


We are in violent agreement.


With the risk of falling into a loop:

https://news.ycombinator.com/item?id=22386665


Wow, what a dumb justification for the privacy concerns. URL hashes aren't even sent to the server, let alone have anything to do with DNS. We already put sensitive stuff in URL hashes, like OAuth tokens.


Read again. Their point isn't that the fragment is sent to the server or DNS.

Their point is that you can infer the existence of certain text on a page the user views (even via HTTPS) by sending them a link like this and observing the browser loading resources that would be at the bottom of the page. The only reason the browser would do that is if it scrolled to the bottom, confirming that the text you put in the link is on the page.


Yeah -- it's frustrating that the privacy concerns brought up in the article are just blatantly wrong. Especially because there are legitimately interesting issues in the security write-up [1].

On the other hand, I suspect they probably won't get the mitigations for those real issues right the first time. It's a complicated system they're building, and I think someone will find a clever way to break it.

[1] https://docs.google.com/document/u/0/d/1YHcl1-vE_ZnZ0kL2alme...


This sentence from the doc referenced sums it up well I think:

> A naive implementation of this feature may allow an attacker to determine the existence of textual content across an origin boundary.

Security & privacy implications of that seem obvious, e.g. extracting an account number (or other sensitive information) from a page by searching for successive prefixes.


Just to clarify, I do think there are real privacy concerns with a naive implementation of this feature. Allowing sites to search across origin boundaries would be a huge issue!

I just don't think that the DNS query issue mentioned in the article is the thing to be worried about. It's just so much less severe than the fact that DNS isn't encrypted, and requires essentially the same fix.


Careful about calling something a dumb justification. It's often the simplistic-seeming features that expose unforeseen privacy issues, in large part because today's internet is such a complex place. In the end, it may turn out that the privacy concerns are unjustified. But dismissing them out of hand isn't wise.


Quoted from the article:

"Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested."


Yeah, looks like I read it too quickly.

But wow, what an absurd amount of work to almost find something out when you already have access to the entire network, and apparently the WiFi is not secured at all or your targets are all plugged in and on your switch. This is like complaining about a weak combination on a padlock used to secure your screen door.


It sounds like the feature's enabled across all websites - so it could break security & privacy expectations a user has about existing web pages.

It requires that the attacker has DNS request visibility and expects that a user will visit a vulnerable page - not necessarily huge barriers to entry.

This could be exploited by targeting a user with an advert that appears in the footer of a webpage, for example, and then obtaining DNS logs for that user after-the-fact.

It's a useful feature certainly; the concern is mainly around the fact that this has essentially been self-certified by the development team and rolled out.

With more ISPs looking to monetize DNS logs, and the future of DNS infrastructure looking a little uncertain at the moment, there does seem to be risk here given that it could become widely deployed.


But couldn't you now just buy an ad, send them to ihavecancer.example.com and find that in the DNS logs later? You don't even have to own the domain. You could use that exact one and then just find the failed resolution in the logs.


The article says the folks at Google have this short doc to address the concerns:

Scroll-to-text Fragment Navigation - Security Issues

https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj...


The doc addresses some concerns, but leaves what I think is the biggest issue (disclosure of information on a page, e.g. "send user to URL that searches gmail for from:hiring@snap.com; look for 'No emails are found'") completely unaddressed.


I don't have access to this document, is there a mirror?



What? A DNS lookup doesn't include anything but the hostname and anything following a hash is never sent with a request from the browser.


I think the idea there is that there would be lazy-loaded content from another site that would only load when someone scrolled far enough to see the highlighted word, which will automatically happen with this feature.

The title of the Forbes article is too hyperbolic for my tastes, and while this could be a security concern in very specific situations, it's being overblown.


I think that would need to be a designed attack, not something that would apply to 99% of websites. The only legitimate resource (that could be used as a canary) being loaded far down a page would be an image (and that kind of requires it to be lazy-loaded as well).


Latest Chrome defers out-of-viewport images on its own, or at least tries to.


I just give a unique key to each user. That user does a lookup of key-<scroll-position>.tracking.example.com when they get to that part of the page. As they scroll, I gather data via DNS.

Source: was my job to track people and what they do.
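
(A minimal sketch of that kind of scroll beacon; the key, bucket size, and tracking domain here are made up:)

    // Every unique hostname forces a fresh DNS lookup, so scroll depth shows
    // up in DNS logs even if the HTTP request itself is never inspected.
    const userKey = 'u-abc123';                         // unique per visitor
    let lastBucket = -1;
    window.addEventListener('scroll', () => {
      const bucket = Math.floor(window.scrollY / 500);  // 500px scroll buckets
      if (bucket === lastBucket) return;
      lastBucket = bucket;
      new Image().src = `https://${userKey}-${bucket}.tracking.example.com/p.gif`;
    }, { passive: true });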


This Google Doc (posted earlier in this thread) seems to describe possible attack vectors far more clearly than either the Forbes or Register articles (the second being better than the first as well):

https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj...


It seems that “privacy researcher at brave” is a job requiring no domain knowledge, only plentiful contact with credulous journalists.


I don't want to derail this thread, but that's kind of Brave's whole MO. They pretty much constantly put out or source anti-Google content; there's often a grain of truth somewhere in many of them, but not always, and it's rarely reported honestly.


As you broaden your domains of expertise, it becomes increasingly clear that most news media is astroturf.


I avoid Chrome due to privacy concerns, but I actually like this feature ¯\_(ツ)_/¯

Many times I'm looking for an existing anchor to link to a certain section of a page, because the author did not bother to create a table of contents, and many times that anchor ID is missing.

Also — I was under the impression that anything that comes after # is not sent to the server, being a fragment meant to be processed entirely by the client.

What am I missing?


I think this paragraph sort-of fills it in:

> "Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested."

The impression I got was that you'd need a combined hack with javascript reading the page position when you load the page.


Let me give you an example. It will be contrived for simplicity, but there are circumstances that will seem more real-worldish. Let's consider the following scenario:

- you visit a page at example.com that contains private data about you (e.g. bank account number, medical conditions, what have you)

- this page loads external resources only when they are scrolled into view

- it also hotlinks some image below the fold that a potentially bad actor can see requests to (maybe the image is on their webserver, maybe you're on a corp network and they can see dns resolutions, whatever)

So now I link you to: https://example.com/privatepage#:~:text=Account%201

Did the image get requested? If yes, I know your account number starts with 1; if not, I know it starts with something other than 1. Rinse and repeat.
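
Purely to illustrate the shape of that "rinse and repeat" loop (the page URL and account text are the made-up values from above):

    // The attacker never reads the page; they only watch whether the
    // below-the-fold image's hostname shows up in traffic they can observe.
    const page = 'https://example.com/privatepage';
    function candidateLinks(knownDigits) {
      // One link per possible next digit; send them to the victim one at a
      // time and note which one triggers the tell-tale resource load.
      return '0123456789'.split('').map(digit =>
        `${page}#:~:text=${encodeURIComponent('Account ' + knownDigits + digit)}`);
    }
    console.log(candidateLinks('1'));  // guesses for the digit after "1"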


How are you determining whether or not the resource was requested? If you can hijack my network traffic you've already owned me pretty badly and don't need this silly game in the first place. If you haven't, this doesn't help you get that.

Encrypted DNS is a thing, so snooping on those is non-trivial and not at all something you can be assumed to have access to.


> Rinse and repeat.

How exactly would you as an attacker perform this rinse and repeat action?


and why would a bank site hotlink to an asset owned by someone else?

All the security concerns I've seen for this seem quite contrived to me. And I say that as someone who assiduously avoids google products*

(*alas, except at work, because I can't really choose that)


I think this is the perfect storm.

1) Privacy-conscious browsers are trying to get exposure, so they are stretching an extremely narrow privacy risk into something extreme.

2) (I also believe) media companies are worried this will rob them of ad exposures, so they are incentivized to cover this as something scary.

3) The "bad guy" is Google. This means the amplitude of the story is immediately 10X larger.


> Also — I was under the impression that anything that comes after # is not sent to the server, being a fragment meant to be processed entirely by the client.

You aren't missing anything. The article is wrong.


The article isn't wrong: by forcing a scroll to a certain location and embedding, for example, images that only load when scrolled into view, the web server can know that someone was linked to a particular section of the website with the new deep-linking feature.

This is especially the case if the browser scrolls and doesn't load all prior resources automatically because they were never "in view".


You can already measure where the visitor's view is and notice the sections at which the visitor stares the most.

The scenario you've given is also very similar to a scenario in which the user opens the web page and then manually searches for "cancer". Or scrolls down rapidly until they notice the word "cancer" and then stops. I do that all the time for example.

So I don't see how excluding such a feature protects the user in any way compared with the status quo.


Seems like a useful feature to me. You could just train yourself not to click on hyperlinks with the aforementioned tag. And like another comment says, it shouldn't be a privacy issue unless you are running over HTTP, since the resource URLs are encrypted.


I agree that it's a useful feature and not a privacy issue, but I will say that it's nigh-impossible to avoid clicking links like these. I already try my best to strip out tracking parameters in query strings, and avoid shortened URLs. But they're simply everywhere, and sites/email try their best to hide them.


It's in the URL fragment (the stuff after the hash), so it's never even sent to the server.


IE6 is back, baby! All that’s left is deep integration with some proprietary Google “standard” (I’m thinking AMP) and it’ll be the early 2000s all over again.


We were in WAY worse shape back then. IE6 was closed source, under-resourced (after they gained market dominance), and MS had every incentive to suppress the progress of web technologies once they ran the table.

I’m not saying we shouldn’t all be rooting for the other players today and keeping a close eye on Google, just saying there is no way for Google to do nearly as much to hold back the web as MS did back then, even if they wanted to.


I'm actually looking forward to WebAssembly but I'm just imagining wasm applications tying into ChromeOS and we have ActiveX all over again.


It'd be like the early 2000s if switching to another browser had most likely just gotten you a reskinned IE6.


Microsoft Edge, which is built on Chromium, would fit this parallel nicely.


And Brave, and Vivaldi.


And Opera.


There's only one competing browser engine left, Firefox's, with like 5% market share. Except if you have an Apple device, every other option is skinned Chrome.


No it isn't. It's skinned Safari which is WebKit. Chromium forked from WebKit years ago and they have diverged. And Apple certainly doesn't include all of the phone-home stuff that's in Chromium. It's not even remotely the same.


The example that the security researcher gave seems moot: the same thing would happen if the employee simply scrolled down on the page manually, no? And we already have the ability to link to anchors on a page, and that's not considered to be a privacy issue. Can someone explain how this is actually a meaningful privacy issue?


Say you have a long page that lists “Pre-existing conditions” at the bottom, and near that section is also a unique image or other external asset. If you click on the link and cancer is in your list, the page will scroll and load the related assets instantly. Without cancer in your list, you’d only load those assets through human scroll, which would most likely look different timing-wise. Thus you can determine with high probability whether your target has cancer listed (if you have access to DNS records, as mentioned in the example, and the target is using a browser that delays offscreen asset loading - like this same latest Chrome).

Whereas anchors tend to be generic (#preexisting-conditions), this new scroll behavior can be used to create an existence check for any user-specific text on the page (in carefully crafted scenarios). There are probably other variations that could be devised on this concept, since it allows indirect page interaction that can bypass authorization walls (since the browser would transmit cookies normally and such).


This is absurdly difficult to pull off with very little payoff. You'd need to be sniffing the traffic of a network. Then craft a URL that contains a unique image near your text fragment query. Then somehow send that URL to the victims on your network. _Then_ check how long it takes for them to load that image upon clicking the URL.

I'd like to call myself a privacy advocate, but this is just absurd. The pros obviously outweigh the con of a very precise and targeted attack that leaks a predetermined bit of information.

If you've got their DNS records, you've already violated plenty of their privacy to get the information you want. No need for this text fragment "attack."


>I'd like to call myself a privacy advocate, but this is just absurd.

Yeah, my read of this is that it has nothing to do with privacy. People who want to block change for some reason have just learned that "I have privacy concerns about Google" is a catchword that will get you some press coverage, and they are essentially hijacking the actually valid and important privacy concerns to push forward their unrelated opinions (and promote their browser product).


This is my take on this, too.

As a FF user, I'd love to see FF implement this too!


FF is considering implementing something in this space, yes. Note that there are at least two proposals for how this could work: one that is already deployed via a polyfill on various sites and the Google one. They have various functionality tradeoffs, and unfortunately Google decided to make up a wholly new thing instead of improving the existing thing...


For what it’s worth, I wasn’t trying to say that this is an easy attack, just explaining the supposed outline.


Ah, and thank you for that! I had read the article a few times, but I didn't fully understand the attack until I saw your comment.


If it's https, that's all fully encrypted. The network only sees IPs. The hosts and paths are all encrypted headers.


But if it had the anchor 'cancer', couldn't you infer the same from the url alone? These examples seem highly contrived.


Anything after a hash in an URL is never sent to a server (it's only within the context of the browser).


Example from above: twitter.com/#:~:text=@user

This would behave differently if the browser's user is following @user than otherwise.

This convinced me not to dismiss the issue. I'm not sure how that leaks over a side channel. But I am sure there are governments who take active measures to shape the Internet to their liking - and Twitter has in the past played a significant role inside countries governed by them in opposing those governments.


I'm not convinced this is a problem. In terms of ways we leak privacy on the web, this seems very low on the priority list.


Seems like a great excuse to create click-bait headlines – not much else.


Confused. As I understand it, anything after the # in a uri isn't sent over the wire.

So the only way someone could see that you're navigating to a specific fragment is some sort of deep Chrome logging, or a Chrome plugin, etc. And if that's the case, the cat's already out of the bag for everything you do already.


I was confused as well. I think (from some other comments) that the issue is that you can scroll to text that is specific to you and others might not have on their page (e.g. 'cancer' on some medical page) and then somehow gather from the requests that you scrolled there. It seems pretty hypothetical, but I can see the issue with forcing a scroll depending on what text is on the page, I guess.


There are plenty of reasons one would want to switch away from Chrome, but potential abuse of ScrollToTextFragment must be the most contrived one.


Just like everything else. Millions of users won't understand what this is talking about, or care for that matter.


Hmm. The use case of linking to a specific location in a document really appears to be more of an issue with websites not actually using ids that could be anchored to. If every paragraph had an id, you could get 95% of the way to the desired functionality by just making it easy to copy a link to e.g. www.example.com/foo#paragraph4.

At this point, Wikipedia is one of the few websites I use regularly that actually works in this way.
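
A tiny sketch of the idea, assuming a site is willing to ship a few lines of script (the selector and id scheme here are made up):

    // Give every paragraph a stable, predictable id so plain fragment links
    // like https://www.example.com/foo#paragraph4 work with no new URL syntax.
    document.querySelectorAll('article p').forEach((p, i) => {
      if (!p.id) p.id = `paragraph${i + 1}`;
    });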


No doubt because most large web corporations have completely switched to JavaScript for their anchors (i.e., #) and no longer use HTML-spec anchors that actually work. I'm looking at you, Microsoft GitHub.


> I'm looking at you, Microsoft GitHub.

Where don't anchors work when javascript is disabled? (With admittedly very brief testing just now, I couldn't find any such cases.)


On every single repo index page that has "anchors" that I've tried over the last year. The Markdown is now interpreted differently, so anchors are class="anchor" and not real anchors. Maybe you didn't fully disable JS? Make sure JS is disabled before you load the page and that you've cleared your cache (Ctrl-F5 in FF-alikes).

I just went to the most recent github tab in my browser session and found one instantly: https://github.com/quiet/quiet#dependencies


Can't say I've noticed it, but then I'm going around with Javascript enabled, so it's not like I would.

That said, the github one is a sort of interesting case, insofar as they're doing it to avoid user-generated content clashing with their page-chrome ids... while still preserving a readable URI that matches what the content creator expects. Which doesn't seem like an unreasonable case, to me.

You'll note that there is a real anchor on that link -- https://github.com/quiet/quiet#user-content-dependencies -- which works without Javascript. It's just rather inaccessible.


You're right. I was wrong. Sorry!

I had tested linking to specific comments in issue threads, which does work with JS disabled (unless I've completely messed up...), and assumed that linking to parts of documents, as less complex/application-y would obviously work, but as you've pointed out it doesn't.

That's highly disappointing.


Can somebody explain to me how this is more of a privacy concern than the rest of the URL, which often contains the title slug, date, id, tags and/or filter parameters of the page?


Is it possible to agree with the assessment of the severity of this privacy issue without being regarded as supportive of the way Google uses its monopoly to force certain web standards on the web?


I don't get it. This is the least privacy-reducing feature there is. It's a great feature.

Wouldn't it be nice to have people worry about this while hundreds of other actual privacy-reducing features go unnoticed?


Was it wrong of me to be amused when I got to the end of this post on internet privacy concerns and Google, only to see a link at the bottom encouraging me click in order to follow the author on Facebook?


And I couldn't read the article because of my adblocker


Could someone explain this a bit better? I've read two articles on this this morning and I still don't understand what the privacy concerns are with this feature. Thanks!


The feature in question is the ability to use fragment (# in a URL) to link to matching text rather than just an ID. Here is the key line with the so-called privacy concern.

"Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with [the anchor] #:~:text=cancer. On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested.”

So they could send someone a link to a page with a matching-text fragment, trick them into clicking it, and then watch for lazy-loaded resources triggering DNS requests to learn whether the matching text was on the page.

It's convoluted nonsense.


Last week I switched to Firefox nightly on desktop and Firefox preview on mobile.

It took some configuration, but I'm not regretting it at all.


For those concerned, I believe you can just disable this behaviour via this flag: chrome://flags/#enable-text-fragment-anchor


They should have used XPointer instead of reinventing the wheel. It has a much more effective syntax.


I don't think you can arrive at any conclusion based on analyzing traffic/DNS requests alone.

How can you tell the user didn't jitter-scroll and cause more content to show up?

What if it was a 404 page?

How is the intruder able to see the query param anyway? If it's non-secure, all bets are off.

The same argument being made about text can be made about words in a domain name.


What was wrong with just `example.com/#section` instead of `ScrollToTextFragment`?

That seems to complicate things for no good reason.


If I'm understanding this correctly, Google actually changes the HTML to have these anchor tags?


No, it's a selector, but one that can include the actual text in the page. So instead of having to select based on a pre-existing anchor, like `http://site.com/page#anchor`, you can link to `http://site.com/page#understanding+his+correctly` (though that's not their format). No need to alter the markup.
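
For reference, the shipped syntax puts a ":~:" directive after the hash; a small sketch of building such a link (the page URL is made up):

    const url = new URL('https://site.example/page');
    url.hash = ':~:text=' + encodeURIComponent('understanding this correctly');
    console.log(url.href);
    // -> https://site.example/page#:~:text=understanding%20this%20correctly
    // The full form also allows context terms:
    //   #:~:text=[prefix-,]textStart[,textEnd][,-suffix]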

Though, I'm curious how this works when text is broken apart over spans or divs.


All this fuss should have been about lazy loading of resources. The leak is there.


Why is this feature so interesting it got implementation priority? Does this enable some commercial activity by Google?


Uhhhhh, you do realize the anchor (#...) is not sent to the server, right?


The attack depends on measuring the timing of resource-loading that in turn depends on the free-text search the attacker can get the victim browser to do on his behalf.


As a developer, how do I lend my support to stopping google from trying to steal the web?

Something actionable, something specific to my semi-unique position as a developer?


As a first step, at least: Use and contribute to Firefox (or other non-Blink-based browsers, but there aren't many to choose from...), and encourage those you support/influence to do the same.


Only use web technologies supported by the majority of the browsers, not just the majority browser. Basically, use vanilla html5. You can make a perfectly decent website with it. You can play with the shiny (chrome) toys but they should remain curiosities.


Use Firefox. And not only use Firefox, but also promote Firefox. Make sure that all your browser screenshots, presentations, etc show Firefox.

As a developer you, like many of us, might be the family tech support. Switch them to Firefox.


how exactly is linking to a custom location on a document "stealing the web"?


Use Firefox. Try to convince others to use it too. Never use features only available in Chrome/Chromium browsers.


Use Firefox, use Safari. Test in Firefox, test in Safari.


Imagine telling someone in 2003 or so that the most popular web browser fifteen years later would be made by DoubleClick and imagine what their reaction might be.


Something like "wow, software development will improve so much! You will make a web browser in a double-click."


The year is 20XX. You return to your slave cube to find what you assume is a PC running "Plan 9 from Microsoft". You are shocked at its convenient features:

The world's information is merely one click away.

The world's browsers are simply two clicks away.

Dispensing candy is an easy ctrl-triple-click.

Bootstrapping a compiler is just the konami code backwards.

Printing the screen is really as simple as hitting Print Screen and loading your printer with $300 cyan ink cartridges.

However, turning the computer off is a week-long process that involves arguing with the built-in HAL 9000.


I will never use it. Read what Google did with JS websites and its recommendation to implement hashbangs and escaped fragments to provide a prerendered version.

You won't. Because it didn't work and killed SEO. Then, after years, they silently removed the documentation about it, not providing any way to transition to other approaches without damaging SEO even more; moreover, it was impossible to do anything about it because the server does not receive data after the hash.

And now Google tries to force a new standard that is technically broken. Don't use it. It is a trap. It will hurt your SEO sooner or later!


EDITED an hour later: A couple of people have pointed out it's back; they've made some changes and marked it as public now.

There was another article on this same topic this morning with this:

"Google's engineers have not ignored worries about the security and privacy risks. To their credit, they've gathered them together into a single document and they've clearly been engaged in understanding what people are worried about. It's just that they've concluded the concerns aren't that big a deal or can be dealt with to their satisfaction."

That single document is here, and I went and had a read this morning, but now I get permission denied. I guess it was getting too much attention?

https://docs.google.com/document/d/1YHcl1-vE_ZnZ0kL2almeikAj...

I wish I would've saved a copy now. It had some decent details on the various vulnerabilities and how they are handling (or ignoring) them.


It's available and listed as (PUBLIC) in the title now.


It was actively being destroyed when I checked it.



