Hacker News
Where Am I? NYTimes or Google? (theinternetbytes.com)
1130 points by rwoll 10 months ago | hide | past | favorite | 355 comments



Yes, this has been a big issue for a very long time now. Google wants to push a release where it will display the hostname of the AMP site even if the content is being served from google.com[1].

Mozilla (and Apple) are strictly against it and thank god for Mozilla. If Google had a bigger market share this would already be something we would have been living with. I'm sure there are better sources for this, but here is the first result:

https://9to5google.com/2019/04/18/apple-mozilla-google-amp-s...


Being completely against AMP for obvious reasons, I'm personally not against signed exchanges themselves. This feature could spawn a whole new class of decentralised and harder-to-censor web hosting, which sounds like a great addition.


It's also going to spawn a whole new class of semi-persistent malicious pages (say, created via XSS) that, once signed and captured, can be continuously replayed to clients until expiration.


What? The signing allows the content to be mirrored in other locations with guarantees about consistency. It doesn't imply anything more about the content than SSL does.


If Google AMP acts like a cache of content, then cache poisoning attacks are a concern. How those cached items expire will determine how long an attacker who poisons the cache can serve malicious content.


At that point, wouldn't the approach be to defend from the client side? Namely, we can instruct the client not to trust any content signed by such-and-such keys. This can be done by pushing out a certificate revocation, etc.


This would be pretty cool (remotely revoking signed exchanges), however it's not part of Google's proposal. Unless every previous security consideration about caches is accounted for in SX's, it's probably not safe to start faking the URL bar.


Certificate revocations do apply to signed exchanges.


Why does Archive.org get a pass on this one? Signed responses mean that there's a very clear way to leverage the browser's domain blacklisting technology to stop the spread of malware, which isn't presently possible for any content mirrors on the web.


Archive.org makes it clear you are on archive.org. The URL shows archive.org. The page content shows archive.org at the top. [1]

Google AMP doesn't show Google on the page. Google is pushing for the URL to show the origin site's URL instead of Google[2].

If an attacker poisons a nytimes.com article served by Google AMP, how does a browser's domain blacklisting help? Block google? Block nytimes.com? Neither makes sense.

1. https://web.archive.org/web/20050401090916/http://www.google...

2. https://9to5google.com/2019/04/18/apple-mozilla-google-amp-s...


I believe you might be misunderstanding the idea behind signed exchanges. To be clear, Signed Exchanges are how AMP should have worked all along.

example.com generates a content bundle and signs it. Google.com downloads the bundle and decides to mirror it from their domain. Your browser downloads the bundle from google.com, and verifies that the signature comes from example.com. Your browser is now confident that the content did originate from example.com, and so can freely say that the "canonical URL" for the content is example.com.
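The flow described there can be sketched as a toy model. To be clear, this is not the real SXG wire format: real signed exchanges use certificate-based public-key signatures, so the browser never holds a shared secret; HMAC stands in here purely to keep the sketch short.

```python
import hashlib
import hmac

# Toy model of the signed-exchange flow. HMAC with a shared key stands
# in for the publisher's certificate-based signature; everything here
# is illustrative, not the actual SXG format.

PUBLISHER_KEY = b"example.com-signing-key"  # hypothetical key material

def publish(content: bytes) -> dict:
    """example.com signs its content bundle."""
    sig = hmac.new(PUBLISHER_KEY, content, hashlib.sha256).hexdigest()
    return {"origin": "example.com", "content": content, "signature": sig}

def verify(bundle: dict) -> bool:
    """The browser checks the signature, regardless of which host
    (google.com, a CDN, ...) actually delivered the bytes."""
    expected = hmac.new(PUBLISHER_KEY, bundle["content"], hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, bundle["signature"])

bundle = publish(b"<html>hello from example.com</html>")
# google.com can mirror `bundle` byte-for-byte...
assert verify(bundle)
# ...but any modification by the mirror breaks verification.
tampered = dict(bundle, content=b"<html>injected</html>")
assert not verify(tampered)
```

The key property is the last line: the mirror can relay but not rewrite.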

Malicious.org does the same thing, and the browser spots that malicious.org is blocked. At this point it doesn't matter if the content came from google.com, because the browser knows that the content is signed by malicious.org and so it originated from there.

Hope this helps clarify. Obviously blacklisting isn't a great security mechanism; my point is just that signed exchanges don't really open any NEW vectors for attack.


I think the concern was more that if I can XSS example.com, Google is now serving that for some period of time after example.com's administrators notice + fix this. (In the absence of a mechanism to force AMP to immediately decache the affected page(s), that is.)


Yes, at what point does example.com lose control of their content and for how long?


I'm following; let me spell out the concern.

Imagine that example.com builds the bundle by pulling data from a database. If an attacker can find a way to store malicious content in that database (stored XSS) and that content ends up in a signed bundle that Google AMP serves (similar to cache poisoning), then users will see malicious content. When the stored XSS is removed from the database, Google AMP may continue to serve the malicious signed bundle. So an extra step may be needed to clear the malicious content from Google AMP.

How exactly the attacker influences the bundle is going to be implementation dependent, so some sites may be safe while others are exploitable.
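A minimal sketch of that timeline; the 7-day cap comes from the signed-exchange draft, and everything else (class and field names) is illustrative:

```python
# Toy timeline of the cache-poisoning concern: once a bundle containing
# stored XSS is signed, a cache can keep replaying it until the
# signature expires, even after the origin removes the payload.

MAX_LIFETIME = 7 * 24 * 3600  # draft caps signature lifetime at 7 days

class SignedBundle:
    def __init__(self, content, signed_at, lifetime):
        assert lifetime <= MAX_LIFETIME
        self.content = content
        self.expires = signed_at + lifetime

    def valid_at(self, t):
        return t < self.expires

t0 = 0
poisoned = SignedBundle("<script>/* stored XSS */</script>", t0, 24 * 3600)

# One hour later the origin fixes the XSS, but the old signed bundle is
# still valid for another 23 hours, and any cache holding it can keep
# serving it to clients.
t_fix = t0 + 3600
assert poisoned.valid_at(t_fix)
assert not poisoned.valid_at(t0 + 24 * 3600 + 1)
```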


Signed exchanges can only serve static content, so it's not clear what you could do maliciously.


I think most of the comments in this thread mean "malicious" in the sense of injecting malware (say, a BTC miner) or a phishing attach or something into the signed-exchange content. However, you also have to consider that the content (text, images) itself could be "malicious", in the sense of misinformation.

If, purely as a hypothetical, Russian operatives got a credible propaganda story posted on the NYT website 24 hours before the November elections, and an AMP-hosted version of it stayed live long after the actual post got removed from nyt.com, I'd certainly call that "malicious". Of course, just like archive.org, I suspect that in a case as high-profile as that, you'd see a human from the NYT on the phone with a human at Google to get the cached copy yanked ASAP, but maybe on a slightly smaller scale the delay could be hours-to-days, which is bad enough.


XSS?


Along with signing, we need explicit content cache busting and an explicit allowed-mirrors list (which can be revoked instantly). Then it would be on par with TLS plus the current cache busting mechanisms on top of TLS.


As long as javascript on the page has some way to inspect the signatures and where it was delivered from, you can implement cache busting, allowed mirrors, and invalidation yourself however you please.


Not continuously. The signed content includes an expiration date, which the publisher controls.

This expiration can also never be set more than 7 days in the future.
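A sketch of the freshness checks this implies, loosely following the signed-exchange draft; the function and parameter names are illustrative, not the actual wire format:

```python
from datetime import datetime, timedelta, timezone

# Two checks a client would perform, per the draft: reject a signature
# that has expired, and reject one whose total validity window exceeds
# 7 days (so a publisher cannot sign content for too long).

MAX_VALIDITY = timedelta(days=7)

def signature_acceptable(date: datetime, expires: datetime,
                         now: datetime) -> bool:
    if expires - date > MAX_VALIDITY:
        return False          # validity window longer than the cap
    return date <= now < expires  # not yet valid / already expired

now = datetime(2020, 6, 1, tzinfo=timezone.utc)
ok = signature_acceptable(now - timedelta(days=1),
                          now + timedelta(days=1), now)
too_long = signature_acceptable(now, now + timedelta(days=8), now)
assert ok and not too_long
```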


I don't see how this tech would further help attackers make malicious pages via XSS. Any thoughts? It sounds like it's the same issue with or without signed exchanges.


The point wasn't that this technology would uniquely enable XSS attacks, but rather that it could allow malicious actors to persist particular attacks for the duration of the validity of the signed content. Any brief vulnerability in a website now becomes serializable. They considered this already. Look in the draft "6.3. Downgrades":

"Signing a bad response can affect more users than simply serving a bad response, since a served response will only affect users who make a request while the bad version is live, while an attacker can forward a signed response until its signature expires. Publishers should consider shorter signature expiration times than they use for cache expiration times."


I see. I don't think they're taking the right approach here, though: there should be an automatic way to upgrade signed content or check for updates. Short signature lifetimes just destroy the benefits of the feature.


It's the only way to do it. TLS has shown that OCSP and the like don't add significant security, and that short certificate expiration is the only way to go.

The serving nodes are not necessarily under control of a well intended party that complies with upgrade requests.


And I don't see the issue with short expiry. The point of a cache is to reduce load, not to eliminate it entirely. Even with a 5m expiry, it's still more than four orders of magnitude better than having 100+ QPS hit your server.


Parent considers that the feature could be used to turn a temporary problem into a long-term problem. Sort of like how certificate pinning could be twisted to ransom an entire domain.


What you describe sounds a lot like URL spoofing, which is already used almost entirely to trick and scam unsuspecting users into clicking malicious links. Signed exchanges would just be an even harder-to-detect version of this.


Signed exchanges make it super easy to detect fraud. You can verify the signature...


No, with signed exchange your browser verifies that the original site really did produce the content you are viewing.


No... no, it would not. It would centralize web hosting and make it less censorship resistant.

Sure, you could move it somewhere else and have it show up in the address bar the same, but the actual URL has changed and you need to somehow get the new URL into people's hands. And ultimately you've centralized a lot of websites under a smaller number of service providers which, before, would have been on their own domains.


I'm not sure what you're describing there, but it sounds much more complicated than it is.

Aren't signed exchanges basically CDNs without having to set up DNS? In theory it's no different than using CloudFlare to serve your content, except any CDN can serve it without being given access to your domain.


Or you could store signed exchanges on platforms like Dat or IPFS and get real decentralization.


Right, the point is that anyone can serve the content through any platform, which is why it allows decentralization. But I just don't understand why people are hanging so tightly to the idea that a URL is a direct path to a server, because that just isn't true.


> this feature could spawn a whole new class of decentralised and harder to censor web hosting

How so?


https://docs.ipfs.io/concepts/dnslink/ this is how it should be


There is an issue opened around supporting HTTP exchanges for IPFS in the browser as well.

https://github.com/ipfs/in-web-browsers/issues/121


I don’t really think Google’s plan is that weird. And it would be amazing for decentralized networks, archiving, and offline web apps. Google can’t just serve nyt.com — they can serve a specific bundle of resources published and signed by nyt.com verified by your browser to be authentic and unmodified.


How does centralizing content on Google from multiple sources improve decentralization? The web is already decentralized. That's why it is a web.

AMP is a scourge. It's a bad idea being pushed by bad actors.


The current implementation of the AMP cache servers obviously doesn't help the decentralization.

I think what Spivak is saying, though, is right. If we could move from location addressing (DNS+IP) to content addressing, but not via the AMP cache servers, then in general anyone could serve any content on the web. Add in signing of the content addressing, and now you can also verify that content is coming from NYTimes, for example.

Also, I'd say that the internet (transports, piping, glue) is decentralized. The web is not. Nothing seems to work with each other and most web properties are fighting against each other, not together. Not at all like the internet is built. The web is basically ~10 big silos right now, that would probably kill their API endpoints if they could.


I think this would require an entirely new user interface to make it abundantly clear that publisher and distributor are separate roles and can be separate entities.

I don't think this should be shoehorned into the URL bar or into some meta info that no one ever reads hidden behind some obscure icon.


Isn't it already the case though with CloudFlare and other CDNs serving most of the content? Very few people really get their content from the actual source server anymore.


That's a good point. I just feel that there is an important distinction to be made between purely technical distribution infrastructure like Cloudflare's and the sort of recontextualisation that happens when you publish a video on Youtube. I'm not quite sure where in between these two extremes AMP is positioned.


Thank you for this explanation. AMP has put a really bad taste in my mouth but what you describe here does have some interesting implications. Something to consider for sure.


Please fact check me on this, but the ostensible initial justification for AMP wasn't decentralization, but speed. Businesses had started bloating up their websites with garbage trackers and other pointless marketing code that slowed down performance to unbrowsable levels. Some websites would cause your browser to come close to freezing because of bloat.

So Google tried to formalize a small subset of technologies for publishers to use to allow for lightning-fast reading, in other words, saving them from themselves. AMP might be best viewed as a technical attempt to solve a cultural problem: you could already achieve fast websites by being disciplined in the site you build; Google was just able to use its clout to force publishers to do it.

As for what it's morphed into, I'm not really a fan, because Google is trying to capitalize on it and publishers are trying various tricks to introduce bloat back into AMP anyway. The right answer might be just for Google to drop it and rank page speed for normal websites far higher than it already does.


> How does centralizing content on Google from multiple sources improve decentralization?

It actually makes perfect sense in Doublespeak. /s


They’re suggesting a web technology which would allow any website to host content for any other website, under the original site’s URL, as long as the bundle is signed by the original site. That could be quite interesting for a site like archive.org, as the URL bar could show the original URL.

But AMP is a much narrower technology, I’d imagine only Google would be able to impersonate other websites, essentially centralised as you say. The generic idea would just be a distraction to push AMP.

Everything would be so much better if the original websites were not so overloaded with trackers, ads and banners, then there would be no need for these “accelerated” versions.


I see where you are going, but what if my website is updated? Is the archive at address _myurl_ invalidated, or is there a new address where it can be found? I am thinking of reproducible URLs for academic references or qualified procedures, for example, which might or might not matter in the intended use case.

Could there be net-neutrality-like questions in all this as well?


I think this is possible already, but should not override the displayed URL for the content.

Create a new “original URL” field or something.


Google is not a single server. Think of Google as a CDN.


So it's decentralized because Google has multiple servers? And here I was, thinking that Google runs everything from a single IBM mainframe.

What you're saying would be described as distributed... Not decentralized.


Seems to me like it's easy to forget there's a difference between those two.


+1. The way I think about it is that signed exchanges are basically a way of getting the benefits of a CDN without turning over the keys to your entire kingdom to a third party. Instead you just allow distribution of a single resource (perhaps a bundle), in a cryptographically verifiable way.

Stated another way, with a typical CDN setup the user has to trust their browser, the CDN, and the source. With signed exchanges we're back to the minimal requirement of trusting the browser and the source; the distributor isn't able to make modifications.


It seems like there is a risk that an old version of a bundle will get served instead of a new one by an arbitrary host? Maybe the bundle should have a list of trusted mirrors?


There is a publisher selected expiration date as part of the signed exchange which the client inspects. The expiration also cannot be set to more than 7 days in the future on creation. This minimizes, but of course does not eliminate, this risk.


It also makes signed exchanges completely unusable for delivering packages offline. (E.g. the USB stick scenario)

What a bummer.


Browsers could have a setting to optionally display the content anyway, along with a warning to the effect of "site X is trying to show an archive of site Y", similar to how we currently handle expired or self-signed SSL certificates.


Alternatively super short expiry times. It doesn't seem like it would be that concerning to have another site serving a bundle that was 5 minutes out of date. It doesn't seem like it should be too much load to be caching content every 5 minutes.


I could see some sort of alternative URL bar ("https://nyt.com/somearticle/ | served by https://somecdn.example.org/blah"), but complete replacement is far too dangerous and confusing in that it is completely hidden.


The New York Times surely already serves their pages through a CDN, silently, and with the CDN having the full technical capability to modify the pages arbitrarily. Signed exchange allows anyone to serve pages, without the ability to modify them in any way.

(Disclosure: I work for Google, speaking only for myself)


My objection is that it's no longer clear if you're dealing with content addressing or server addressing. If I see example.com in the URL bar, is it a server pointed from the DNS record example.com (a CDN that server tells me to visit), or am I seeing content from example.com? If I click a link and it doesn't load, is it because example.com is suddenly down, or has it been down this whole time? Is the example.com server slow, or is the cache slow? Am I seeing the most recent version of this content from example.com, or did the cache miss an update?


What if there was a `publisher://...` or `content-from://...` or `content://...` protocol, somehow? (visible in the address bar, maybe a different icon too, so one would know wasn't normal https:)

And by hovering, or one-clicking, a popup could show both the distributor's address (say, CloudFlare), and the content's/publisher's address (say, NyT)?


> a way of getting the benefits of a CDN without turning over the keys to your entire kingdom to a third party.

https://blog.cloudflare.com/keyless-ssl-the-nitty-gritty-tec... is a thing now.


The session key, which is given carte blanche by the TLS cert to sign whatever it wants under the domain, is still controlled by Cloudflare.

To put it simply, Cloudflare still controls the content. The proposal here would avoid that, by allowing Cloudflare to transmit only pre-signed content.


Your browser would have a secure tunnel to CloudFlare which is encrypted with their key. But that tunnel would then deliver a bundle of resources, verified separately by your browser, that CF doesn't have the signing key for.


The plan is bad because Google currently tracks all of your activity inside AMP-hosted pages, as described in their support article.

Google controls the AMP project and the AMP library. They can start rewriting all links in AMP containers to Google’s AMP cache and track you across the entire internet, even when you are 50 clicks away from google.com.


While that's theoretically possible, the library can be inspected and does not do these things.


Could Google give specific persons different versions, or is that technically impossible?


Technically yes, but not very practically. The domain is cookieless, so it would be difficult to even identify a specific user, other than by IP. Also, the JavaScript resource is delivered from the cache with a 1 year expiry, which means most times it's loaded it will be served from browser cache rather than the web.


How is google.com cookieless?


The AMP javascript is served on the cdn.ampproject.org domain, not google.com.


It's very possible indeed.


They have the log files.


> the library can be inspected

Really? Could you publish how you are inspecting an unknown program to determine if it exhibits a specific behavior? There are a lot of computer scientists interested in your solution to the halting problem.

Joking aside, we already know from the halting problem[1] that you cannot determine in general whether a program will exhibit even the simplest behavior: halting. Inspecting a program for more complex behaviors is almost always undecidable[2].

In this particular situation where Google is serving an unknown Javascript program, a look at the company's history and business model suggests that the probability they are using that Javascript to track user behavior is very high.

[1] https://en.wikipedia.org/wiki/Halting_problem

[2] https://en.wikipedia.org/wiki/Undecidable_problem


By reading the source code?

    def divisors(n):
        for d in range(1, n):
            if n % d == 0:
                yield d

    n = 1
    while True:
        if n == sum(divisors(n)):
            break
        n += 2
    print(n)
I don’t know if this program halts. But I’m pretty sure it won’t steal my data and send it to third parties. Why? Because at no point does it read my data or communicate with third parties in any way: it would have to have those things programmed into it for that to be a possibility. At no point I had to solve the halting problem to know this.

Also, if I execute a program and it does exhibit that behaviour, that’s a proof right there.

The same kind of analysis can be applied to Google’s scripts: look what data it collects and where it pushes data to the outside world. If there are any undecidable problems along the way, then Google has no plausible deniability that some nefarious behaviour is possible. Now, whether that is a practical thing to do is another matter; but the halting problem is just a distraction.
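As a crude illustration of "look where it pushes data to the outside world", one could grep a script for common browser network sinks. This is a heuristic only, and trivially defeated by minification, string-splitting, or eval, as noted upthread; absence of matches proves nothing.

```python
import re

# Crude heuristic: scan JavaScript source for well-known browser APIs
# that can send data to a remote server. Illustrative only; real
# analysis of minified/obfuscated code is far harder.

NETWORK_SINKS = [
    r"\bfetch\s*\(",
    r"\bXMLHttpRequest\b",
    r"\bnavigator\.sendBeacon\b",
    r"\bnew\s+WebSocket\b",
    r"\bnew\s+Image\s*\(",   # classic tracking-pixel trick
]

def find_network_sinks(js_source: str) -> list:
    """Return the sink patterns that appear in the source."""
    return [p for p in NETWORK_SINKS if re.search(p, js_source)]

benign = "const x = 1; console.log(x);"
tracker = "navigator.sendBeacon('/collect', data);"
assert find_network_sinks(benign) == []
assert find_network_sinks(tracker) == [r"\bnavigator\.sendBeacon\b"]
```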


> at no point does it read my data

Tracking doesn't require reading any of your data. All that is necessary is to trigger some kind of signal back to Google's servers on whatever user behavior they are interested in tracking.

> or communicate with third parties

Third parties like Google? Which is kind of the point?

> [example source code]

Of course you can generate examples that are trivial to inspect. Real world problems are far harder to understand. Source is minified/uglified/obfuscated, and "bad" behaviors might intermingle with legitimate actions.

Instead of speculating, here is Google's JS for AMP pages:

https://cdn.ampproject.org/v0.js

How much tracking does that library implement? What data does it exfiltrate from the user's browser back to Google? It obviously communicates with Google's servers; can you characterize if these communications are "good" or "bad"?

Even if you spent the time and effort to manually answer these questions, the javascript might change at any time. Unless you're willing to stop using all AMP pages every time Google changes their JS and you perform another manual inspection, you are going to need some sort of automated process that can inspect and characterize unknown programs. Which is where you will run into the halting problem.


Funny how people can literally "forget" that Google is a third party. Probably people at Google believe they are not third parties. Not even asking for trust, just assuming it. No other alternatives. Trust relationship by default.


> I don’t know if this program halts.

Be cool if you did ;)


If you didn't catch the joke: It is currently unknown whether there are any odd perfect numbers (and the program halts on encountering the first).

https://en.wikipedia.org/wiki/Perfect_number

https://oeis.org/A000396


> Could you publish how you are inspecting an unknown program to determine if it exhibits a specific behavior? There are a lot of computer scientists interested in your solution to the halting problem.

This has nothing to do with the halting problem, because that is about deciding halting for all possible programs, not for some particular program.

We obviously know if some programs halt.

    while True: pass
Is an infinite loop.

    X = 1
    Y = X + 2
Halts.

More complex behaviours can be just as easy to check: neither of my programs there makes network calls.


Publishers who use AMP were already allowing Google to track everything through either Analytics or Ads.

Likewise, AMP pages are mostly accessed from Google search that's already tracked.


As a user I can choose to block GA, either through URL blocking or through legally mandated cookie choices in some regions (e.g. France). When served from Google I have no choice in the matter.


If you can block GA at the client, you can block google.com at the client, no?


Not if I want AMP pages. (I mean, I don’t, but there are presumably people who do.)


The AMP spec REQUIRES you include a Google controlled JavaScript URL with the AMP runtime. So technically the whole signing bit is a little moot, given that the JS could do whatever it wanted.


The same could be said of any CDN hosted javascript library. For example: jquery. There is an open intent to implement support for publishers self-hosting the AMP library as well.


For most JS served by CDN, you can (and should) use Subresource Integrity to verify the content. At least the last time I was involved in an AMP project, Google considered AMP to be an "evergreen" project and did not allow publishers to lock in to a specific version.
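For reference, an SRI value is just a base64-encoded cryptographic digest of the exact bytes served, prefixed with the hash name. If the CDN ever changes the file, the browser refuses to run it, which is exactly what an "evergreen" script URL cannot offer:

```python
import base64
import hashlib

# Compute a Subresource Integrity value: base64 of the SHA-384 digest
# of the exact bytes the CDN serves, in the "sha384-..." format the
# integrity attribute expects.

def sri_hash(content: bytes, algo: str = "sha384") -> str:
    digest = hashlib.new(algo, content).digest()
    return f"{algo}-{base64.b64encode(digest).decode()}"

js = b"console.log('hello');"
print(f'<script src="lib.js" integrity="{sri_hash(js)}" '
      'crossorigin="anonymous"></script>')
```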


Long term versions are now supported, so publishers can lock in a specific version.

Publisher-hosted copies are in the pipeline, as I referenced in the parent comment. My choice of verbiage was a bit confusing, it appears.


I don't think it's your wording that's confusing. You are contradicting the AMP documentation.

AMP's documentation seems to indicate that the LTS is stable only for one month (new features released via the same URL each month), and so is not compatible with SRI (see https://github.com/ampproject/amphtml/blob/master/contributi...)

You can specify a version (ie, https://cdn.ampproject.org/rtv/somenum/v0.js), but the AMP validator complains about that.


> The same could be said of any CDN hosted javascript library

Yes, and? What’s your point? It’s actually a security weakness to include third party JS. The whole thing runs on trust.


What's an open intent? Where is this documented?


AMP spec: https://amp.dev/documentation/guides-and-tutorials/learn/spe...

"AMP HTML documents MUST..."

"The AMP runtime is loaded via the mandatory <script src="https://cdn.ampproject.org/v0.js"></script> tag in the AMP document <head>."

Do a whois on ampproject.org:

"Registrant Organization: Google LLC Registrant State/Province: CA Registrant Country: US Admin Organization: Google LLC"

Note that jQuery, as mentioned in a GP comment, has no such requirement. Google AMP is quite unique in this regard. This is NOT some general CDN-type issue. Also, agreed: WTF is "open intent"?



Note "open", i.e., unresolved. Perhaps in a less positive light: "how to enable signed exchanges/AMP without controlling it".


Correct. Open as in not resolved yet, but intended to be resolved in the future.


You missed the required part.


That's not why Google (the corporation) wants this to happen. This is not about technical capabilities but about power.

They cannot be allowed to become the gatekeeper for the web.


They already are. The question is not how we're going to stop that from happening but how we are going to roll it back.


I agree, if we finally got a way to have working bundles on the web, that would be extremely useful. (And would also restore some of the capabilities of browsers to work without internet connection).

It seems to me, a lot of the security concerns come from the requirements to make pages served live and pages served from bundles indistinguishable to a user - a requirement that really only makes sense if you're Google and want to make people trust your AMP cache more.

I'd be excited about an alternative proposal for bundles that explicitly distinguishes bundle use in the URL (and also uses a unique origin for all files of the bundle).


I believe the issue with this is that users already largely don't understand decorations in the URL. For example, the difference between a lock and an extended validation certificate bubble. Educating a user on what a bundle URL means technically may be exceedingly challenging.


In what ways is this different/similar from "content centric networking"?

https://m.youtube.com/watch?v=gqGEMQveoqg

(Google Tech Talk from Van Jacobsen on CCN many years ago)


AMP is happening and CCN is not.


Do you mean to say that is the only difference?


No, there are many differences but they don't matter. Since CCN is not economically feasible, none of the technical details matter.


Why is CCN not economically feasible?


The problem is ownership. Google is “stealing” or caching content for what they consider a better web.

I don’t support ads but I also don’t support Google serving a version of the page that steals money from content creators. So, therein lies the problem: choice.

I can imagine a future where amp is ubiquitous and Google begins serving ads on amp content. Luckily, companies have to make money and amp is not in most people’s or company’s best interests.

If amp was opt-in only, this would be much more ethically sound.


Signed exchanges guarantee that the content cannot be modified by the cache, such as ad injection.

Google has never injected ads into any cache served AMP document (technically if the publisher uses AdSense, this is false, but that's not the point you are making).

It's difficult to follow what definition of theft is being suggested. The cache does not modify the document rendering, it's essentially a proxy. In a semantic sense, this is no different than your ISP delivering the page or your WiFi router.


It's completely moving away from the client/server model to something else.

Perhaps that's a great thing to do, but it's not something to do quietly.


Just hearing about this from the thread, I'm getting an IPFS vibe from this. It would be interesting to see that tech get more native integration with the browser from this idea.


How is it not weird that I see a domain name in the URL bar that has nothing to do with the domain I actually requested content from?


Why do they need a special extension though? What's wrong with DNS?


Signed exchanges are an extension to digital certificates, such as used for TLS. This is independent of DNS.


Why would it be amazing for decentralized networks and offline web apps?


If I publish mycoolthing.com/thing, it could be mirrored over a P2P network as peer1.com/rehosted/mycoolthing.com/thing, peer2.com/rehosted/mycoolthing.com/thing, etc., in a way that would make it evident to end-users not familiar with the protocol that the content is from mycoolthing.com.


AMP is of course not P2P.


I think the point is that signed exchanges ( https://developers.google.com/web/updates/2018/11/signed-exc...) could potentially be useful, if separated from AMP and made an actually secure thing. For example, the spec doesn't require specific Google-controlled JS URLs to be in the content.


Signed exchanges are actually a separate spec from AMP. The browser implements it independently. There is no requirement for AMP pages to use signed exchanges, nor for signed exchanges to be AMP.


Remember when Google was telling us that third-party cookies are there to protect us, and Safari/Firefox/Edge are just reckless and pose a risk to users by blocking them?


Please provide a link, I could only find this, which suggests Google has reversed course:

https://www.techradar.com/uk/news/google-is-phasing-out-thir...


> By undermining the business model of many ad-supported websites, blunt approaches to cookies encourage the use of opaque techniques such as fingerprinting (an invasive workaround to replace cookies), which can actually reduce user privacy and control.

https://blog.chromium.org/2020/01/building-more-private-web-...

I'm going to copy paste my older comment on this:

I find their "removing 3rd party cookies will incentivise businesses to rely on fingerprinting" discourse dangerous.

It implies that other browser vendors (Mozilla, Safari/WebKit, new Edge) are in fact making the Web a more dangerous place.

I believe it's dangerous because it creates a harmful, unproductive PR narrative—people might just assume this is a true statement, without learning about both sides of the problem. I'm not trying to strip anyone of agency, I just don't think most of my friends would have time to research this topic and might decide to follow the main opinion instead.

The answer I'd like to hear: Yes, it does push some actors towards fingerprinting, but preventing fingerprinting should be dealt with regardless. Changes should happen both on legislative and browser-vendor level.


Sounds a bit like: "By locking your door, you only encourage thieves to break your window, which can actually increase the damage they cause you."


Precisely.


Thank you for the additional background.


Thanks for asking, we need more comments like yours


Have a look at this: https://blog.cloudflare.com/announcing-amp-real-url/

Cloudflare allows publishers to serve AMP from their own domain. In this case, content is served from Cloudflare's CDN.


Note that Cloudflare-hosted AMP pages still mandate AMP requirements, like including a Google-controlled JS URI in your content. Signing is moot if you allow Google to run arbitrary JS on your content. They haven't abused it yet, but it's allowed by the spec. Subresource integrity isn't mandated, explained, or recommended.


It's called Signed Exchange and it's the same thing the comment you replied to is about.


I'm still waiting for general support of addons for the next version of Firefox on mobile just so that I can have the Redirect AMP to HTML[1] addon.

[1]: https://addons.mozilla.org/firefox/addon/amp2html/


I'm on Firefox on a mobile device. That addon has a separate listing [1] specifically for Firefox's mobile version.

As I didn't know about this addon, thanks for sharing it.

[1] https://addons.mozilla.org/en-US/android/addon/amp2html/


Here's a user script to get rid of AMP (I use it with adguard): https://userscripts.adtidy.org/release/disable-amp/1.0/disab...


This has been discussed over and over again, and there has been no engagement from the AMP team to make it any better. I was surprised to realise how much my life changed when I started using Firefox + DuckDuckGo. Full time, at work and home, on macOS and Android.


Aren't we essentially reinventing http proxies with this?


Just having this idea already tells how important it is to actively resist Google.

One should never forget that at a certain point, Google will likely invoke the loser's argument ("protect you from terrorists and pedophiles") to require proof of identity prior to granting access to any resource or service it controls.

Anything that helps them advance in that direction must be fought fiercely.


Isn't this basically like a CDN or a PoP cache?


Not exactly. For a CDN to work, the DNS is repointed toward the CDN's servers. In this case, Google is trying to cover up the fact that Google, not NYTimes, is serving the page.


Is NYTimes's use of Fastly also a cover-up?


Does NYTimes' use of Fastly subvert the meaning of the URL by literally covering it up in the address bar? Nope? Not the same thing, then.

Personally I don't think there's anything wrong with the fundamental concept of signed exchanges. The only problem is that it's just that: a signed exchange of content, which should have nothing to do with the domain name authority in the URL. By all means, display "Content from: a.com" in a box next to the URL, but don't change b.com to a.com in the URL as though it doesn't already have a well defined meaning.


> the meaning of the URL

The issue is that the technical meaning of the URL is very far from what most users think of.

Is the URL an address for NYT's server? Not really because you are actually hitting Fastly's server. So when NYT sets up a magical DNS config, it suddenly is fine, but using crypto to sign the package and serve it on a CDN that way, then it's suddenly "subverting the meaning of the URL"?

We can have a real discussion of what the meaning of a URL is, but I think your interpretation is unfair. I think it's entirely fair to argue that it makes sense for URLs to be an address for specific content.


> The issue is that the technical meaning of the URL is very far from what most user think of.

My argument is not really concerned with what most users think of, but humor me, what do they think of?

> So when NYT sets up a magical DNS config, it suddenly is fine, but using crypto to sign the package and serve it on a CDN that way, then it's suddenly "subverting the meaning of the URL"?

Yes, because HTTP/S scheme URLs have a definition that implies a meaning, which is subverted when you create exceptions to that meaning. NYT setting up a "magical" DNS config that resolves to some third party server is perfectly fine by that definition, and resolving one FQDN while displaying another is not. It's not sudden, this standard has existed in one form or another since 1994.

> We can have a real discussion of what the meaning of a URL is

Yeah, let's do that instead of harping on about what's fair and unfair. It's not a matter of fairness, it's a matter of standardized definitions. By all means, create a new "amp:" URI scheme where the naming authority refers to whoever signed the data and resolves to your favorite AMP cache, but don't call it http or https.


I think the subtle shift of view here is that the URL shows the address where the content is located, more so than where the content was actually fetched from.

An example of where this occurs today is caching. You could be hitting a cache anywhere along the way. Hell you could be seeing an "offline" version, but the website would still show you the "address" of the content.

This is no different, you're hitting a different cache, but the "URL" you see is the canonical address of the content you are looking at, not where it was actually fetched from.


> I think the subtle shift of view here is that the URL shows the address where the content is located, more so than where the content was actually fetched from.

The only sense in which content is located anywhere is as data on a memory device somewhere. With the traditional URI in which the host part of the authority is an address of or a domain name pointing towards an actual host, you have a better indication of where the content is located than you do if this is misrepresented as being some other domain name which in fact does not at all refer to the location of the content.

The shift, if any, is that people may be less interested in where the content is located and more interested in its publishing origin.

> An example of where this occurs today is caching. You could be hitting a cache anywhere along the way. Hell you could be seeing an "offline" version, but the website would still show you the "address" of the content.

Yes, because that's how domain names work.

> This is no different, you're hitting a different cache, but the "URL" you see is the canonical address of the content you are looking at, not where it was actually fetched from.

It's different in the sense that a host name as displayed by the browser then has multiple, conflicting meanings that have no standardized precedent.


Google by definition wants to cover up the domain name that serves AMP web pages. Over half of the article discusses that.

For your question about Fastly, I already answered that in the comment you replied to. The Fastly CDN requires that DNS is configured to point at Fastly's servers. Take a look at https://docs.fastly.com/en/guides/sign-up-and-create-your-fi... under "Start serving traffic through Fastly":

  Once you’re ready, all you need to do to complete your service
  setup and start serving traffic through Fastly is set your domain's
  CNAME DNS record to point to Fastly. For more information, see
  the instructions in our Adding CNAME records guide.
A CNAME record is a DNS mechanism that aliases an alternate domain to a canonical domain.
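In zone-file notation, such a record might look like the following (the hostnames are illustrative, not Fastly's actual CDN names):

```
; www.example.com is an alias: resolvers chase the CNAME to the CDN's
; canonical name, so traffic lands on the CDN while users see example.com.
www.example.com.   3600   IN   CNAME   example-com.global.cdn.example.net.
```

The publisher controls this record, which is why CDN use counts as explicit delegation rather than a cover-up.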


Users don't see DNS records. In the old world they click on a nytimes.com link and get something served from a Fastly server, but in the future AMP world they click on nytimes.com and get it served from a Google server. It isn't different.


You bring up an interesting point - if AMP hosting is the same as a CDN, then why do companies use both solutions? "Because they appear the same" doesn't mean they are the same.

AMP requires that you consume other Google products, which requires that additional JS is loaded. When your mobile site doesn't use AMP, Google limits SEO rankings your mobile site can have. Google AMP requires your pages meet Google's Content Policies or they won't host them.

AMP and CDN delivered pages are architected differently and Google imposes restrictions and requirements that don't exist in a CDN.


I agree with you, if I understand the signed exchange proposal correctly the trust model is effectively similar (NYT explicitly opts to let Fastly pretend to be them through their DNS config in the same way that a signed exchange would let them explicitly opt into letting Google pretend to be them).

I'm still opposed to the change, I see this centralization of the web through CDNs as a bad thing, I don't want to make it easier.


The trust model is pretty different: in the traditional model NYT has to trust their CDN to serve the content unmodified. In the signed exchange model, any modification will cause the content not to validate, and the browser will reject it.


It's very different. And the difference lies in the URL bar. When you use a CDN, your visitors will still see your domain. With Amp, they see google.com.


That's exactly what Google wants to change, and what Mozilla is opposing.


Not if NYT hasn't authorized Google to act on their behalf. "Yes, we will serve your stuff on your behalf at your request, now that you have stuck your sign on our door via a DNS record" vs. "We're putting your sign on our door because as the authors of a browser we can do that, whether you like it or not."


NYT in fact has to go through non-trivial technical effort to authorize Google to act on their behalf.


Holy mackerel


Has the mainstream web jumped the shark?


Well, it's the going opinion of HN for years that the main problem with AMP is it shows the actual origin instead of the proxied origin. Lying about the URL is something hundreds of HN comments have angrily demanded.


Why do you say that? I don’t think people want it to show the “proxied origin”, they want AMP to get out of the way and google to link to the real website.


No. The complaint is that Google is redirecting the content to servers that they control.


This is not correct. Anyone can host AMP. See, for example, amp.cnn.com. Google hosts AMP content for its customers who elect to use that service. It’s not a nefarious plot.


People have been railing against Google's Amp on HN for years, and I think I finally figured out what it's for.

It's Google way of combatting phone apps.

If all of the world's information — especially current news and similar information — moves from the open web into apps, then Google can no longer crawl, index, or scrape that information for its own use. The rise of the mobile phone app is a threat to Google on so many levels from ad revenue to data for training its AIs.

So Google comes up with Amp to convince publishers to keep their content on the open web, where it can be collated, indexed, and otherwise used by Google for Google's services like search and those search result cards that keep people from visiting the content creators.

Google's explicit carrot in all this is the user benefit of page loading speed. Google's implicit carrot in all of this is page rank. But Google's real motivation is to have all of that information available to itself.

Can you imagine what would happen if content from even one of the big providers was no longer visible to Google? New York Times, WaPo, or even Medium? It would create a huge hole in a number of Google products and services, make its search results look even weaker than they already are, and cause people to look for search alternatives.

That's my theory, anyway.


Amp was a reaction to Apple News and Facebook News: using those applications to read the news was a much better experience than using the web. Why? Mainly for two reasons:

1/ Apple and Facebook were hosting all the content.

2/ The content did not come with megabytes of JS and other unnecessary crap.

Amp is an attempt at saving the web, and Google is interested in that for the reason that you gave: they make their money from the web.


> Amp is an attempt at saving the web, and Google is interested in that for the reason that you gave: they make their money from the web.

Yes; attempting to save the web in much the same way that the parasitic wasp is trying to oviposit in your thorax and take over your behavior, in order to save you from being eaten by the spider.

No thank you, sawfly.


This has already happened in China, where Baidu (The Chinese equivalent of Google) can’t crawl any articles from WeChat (The Chinese equivalent of Medium), as a result, the usefulness of its search result has deteriorated significantly. Recently, Baidu has been trying to start its own publishing platform with little success.


> WeChat (The Chinese equivalent of Medium)

TIL


Well, it’s more like WhatsApp, Medium, Venmo, and Facebook all combined into one giant app.


Also food ordering, travel reservations, health care appointments, banking, government services, and a whole lot of other things that would take too long to list. It's not an exaggeration to say the entire Chinese consumer experience runs through WeChat.


I think this is a fairly cynical take, as having news on the web is also pretty great for users.

Imagine if instead of having all news stories a quick search away you instead had to install apps from X different news sources (and inevitably grant them permission to access your location, contacts list, name of first born child etc.). It'd create lots of little silos of news with very little ability to go outside those silos.

Put another way, the web is a great platform for news. It does benefit Google, but it also benefits the billions of people who can freely access a huge range of sources.


Interesting theory. One hole is that companies want to be on Google's results. It hurts WaPo not to be in the top N results, so they have an incentive to make it at least possible.


Who is really using dedicated apps for each news site? The web is just way more practical: for translation, for copy-paste, for sharing.

Besides, you don't need an app on your mobile.


I bet non-techie people _already_ read their daily dose of news from 1 to 2 news websites at most. Installing a dedicated app is not much different than surfing the same 2 websites everyday.

Also, for techie people, do you consider RSS as part of the "web" ? To me, an RSS aggregator app is superior to browsing 20 different news websites, all with different formats.

"web is just way more practical" isn't obvious. It depends about what you put in the "web" bag, and the use cases. Most apps use "web" protocols, so they are technically part of the web.


RSS should make a comeback


That's long been Google's stated reason for Chrome and much else, that pushing the web forward as a platform aligns with their interests as well.


Apple and Facebook really don't care if the web dies, as long as their platforms take the lion's share. But for Google, search as a product can exist only if the web itself remains relevant. This is why Google is trying to keep display ads alive even though they don't really bring in much money compared to search ads, and come with all the privacy complications of third-party trackers.


Interesting, though the barrier for users to install a new app seems to be very high these days. Most people only install a few necessary apps and that's it. In addition, we are talking about publishers here. There are thousands of news sites, and no user has more than a couple of news apps. That's why they have to keep up their websites anyway, with or without AMP.


That’s Google’s motivation for almost anything. Especially Chrome.


I think the main issue is the limited number of AMP Cache providers and the inability for publishers to choose their own AMP Cache provider, which is being exploited by the two search engines.

The AMP project by itself is open source and explicitly states 'Other companies may build their own AMP cache as well.'[1] There are only two AMP Cache providers: Google and Bing. Further, 'As a publisher, you don't choose an AMP Cache, it's actually the platform that links to your content that chooses the AMP Cache (if any) to use.'[2]

Say, if Cloudflare provided an AMP Cache and site publishers could choose their own cache provider, this could be resolved effectively, as AMP by design makes it easy for a layman to create high-performance websites; of course, there is no excuse for hiding URLs.

[1]https://amp.dev/support/faq/overview/

[2]https://amp.dev/documentation/guides-and-tutorials/learn/amp...


Can we please stop trying to pretend AMP is some sort of community-driven open source project? AMP was created by Google, for the benefit of Google. We are not obligated to play along every time a company says “open source.”


>We are not obligated to play along every time a company says “open source.”

I agree. IMO, Google has been using 'open source' for weaponized marketing, the same way Apple has been using 'privacy'. But either of them could be much worse without those.


Yet Google's competitor, Bing, is clearly also using it. Isn't that part of the point of open source? That anyone can see and use your work?


> We are not obligated to play along every time a company says “open source.”

This is the point.

People easily confuse "open source" with "free software" and "community driven".

A lot of corporate-driven open source greenwashed the dark patterns of closed source: centralized development, user lock-in, walled gardens, poor backward compatibility, forced software and hardware upgrades.


>"community driven"

This concern has been raised time again with every major Google open-source project e.g. Android, Chromium, Golang etc. and that concerns have helped improve certain aspects of the project.

But I wonder whether a huge corporation like Google can build such large-scale projects without such criticism. If a project is to be successful, they need to gain from it; after all, they are investing their employees and other resources in it. And them being invested in it is a major reason for adoption by other parties, resulting in a successful open-source project.

Moreover, such large projects have helped the overall software ecosystem and even startups economically. I for one would say that without such large open-source projects, I wouldn't have been able to build products from a village in India and compete with products from the Valley.

All I'm saying is, them being open source at least lets us raise concerns and make them take action; being a complete walled garden and just asking us to 'trust us' is much worse.


> But, I wonder whether a huge corporate like Google can build such large scale projects without such criticism

Yes: they could at least develop large projects in a foundation with many other companies.

> And them being invested in it, is a major reason for adoption by other parties and resulting in a successful open-source project.

...and the main source of pain when the projects are "pivoted" or just dropped due to a single company business needs, as it happened many times.

> such large projects have helped overall SW ecosystem and even startups economically.

They hugely harmed competing projects and competing companies including Mozilla, many phone OSes, many grassroots programming languages.

It's well known that google developed various projects to kill competitors or buy startups cheaply and drop the project afterwards.

There isn't an infinite pool of open source developers - far from it!

Any large corporation that drains the pool to create a competitor to already existing FLOSS projects is actively harming the ecosystem.

> being a complete walled garden and just asking to 'trust us' is much worse.

Closed source can be less harmful than fake-open source. A lot of people actively avoid closed source and fall for the latter.


>They hugely harmed competing projects and competing companies including Mozilla, many phone OSes, many grassroots programming languages.

IMO, we're the reason it failed. We as consumers didn't buy FirefoxOS phones over Android or iOS. We haven't adopted the Firefox browser enough for it to gain major market share. The same argument can be levelled against any proprietary product vs. open-source product.

That proves my point: being a 'completely community-driven' open-source project isn't the only criterion for the success of a project.


Funny how I get bunches of downvotes on this account but never on other accounts. Time to switch.


Did Cloudflare end their Amp Cache? They hosted one previously.


I didn't know about it; the AMP site lists only Google and Bing. But I know for a fact that Cloudflare has no issue caching AMP sites like any other site, though.


It looks like they killed it last year. The codename was Ampersand.

Creation: https://www.cloudflare.com/press-releases/2017/cloudflare-an...

Death: https://blog.cloudflare.com/announcing-amp-real-url/

I agree with you that users should be able to choose their Amp Cache.


The link aggregator, not the publisher, must control the AMP Cache in order to prerender pages from it safely.


The whole AMP thing seems anti-competitive and hostile to the open web.

It's a really bad look on Google's part to be pushing this.


There has been no regulatory action since Microsoft (which happened as Google was being born), so the tech giants have forgotten fear and no longer self-regulate out of simple self-interest.


Another way of looking at it is: they absolutely are self-regulating.

And it appears to be a problem.

Another problem is, there's effectively no distinction between regulator and regulatee.


> Another way of looking at it is: they absolutely are self-regulating.

If they do that, it's not really visible, I don't see any regulation with how Google is behaving regarding search & web, if anything it looks like anti-competitive monopoly behaviours.


Y'all misunderstood.

Self-regulating in the same way an alcoholic meth addict self-regulates.


Most of us understand, it's just that your "word play" is not helpful.


I am conflicted

Yes, AMP is an anti-competitive move by Google

At the same time AMP is "faster" because it gets rid of all the nagware and JS crap that the original page has.

So yeah, I don't like what Google is doing but I don't like what NYT is doing neither


AMP is faster? I’ve never been on an AMP page where I didn’t eventually need to go to the actual site to get the full content. So it’s really just an annoying step between me and the content I searched for.

It’s been one of the primary things that’s driven me away from google and into DDG. I don’t really care about privacy enough to leave google, but I end up leaving more and more of their services because their competition is just less annoying.


> At the same time AMP is "faster" because it gets rid of all the nagware and JS crap that the original page has

Google gives preference to AMP content whether the source page is lightning fast or not. I get the frustration with crappy web pages, but a big part of the reason web pages are getting increasingly crappy is that Google and Facebook (and to a much lesser extent Amazon, in a weird way) have a stranglehold on the web advertising market, and publishers are getting smaller and smaller slices of advertising revenue. AMP increases Google's lock on the market. Since AMP pages can only really be monetized by the publisher, this puts even more power in Google's hands.


> AMP is "faster" because it gets rid of all the nagware and JS crap that the original page has.

AMP is faster only for poorly-optimized JS-heavy pages but the design is fundamentally flawed to require all of its own large amount of JavaScript to run before anything displays, whereas most of the traditional bloat doesn’t block rendering. That means any optimized page - Washington Post, NYT, etc. – loads noticeably faster even before you factor in how often you need to wait for AMP to load, realize that some part of the content is missing, and then wait for the real page to load anyway.

That design forces it to be less reliable, too: before I stopped using Google on mobile to avoid AMP, I would see on a near-daily basis failed page loads due to the AMP JS failing in some way and when it wasn’t failing it was still notably slow (5+ seconds or worse on LTE). Since all of that JavaScript is forced into the critical path, anything less than unrealistically high cache rates means the experience is worse than a normal web page.

WPT examples:

https://www.webpagetest.org/result/200704_GR_62165b7f695e300...

https://www.webpagetest.org/result/200704_5F_f5c36a7c41cf4c2...


Those tests show you don't understand why AMP works. It works because it's prerendered, which is going to be faster than anything you can do.


If that were true, AMP would be consistently faster. Since anyone who’s used it knows that it’s not, you would find it educational to learn about the issues with detecting user intent, reliably prefetching dependencies, and the relatively small / frequently purged caches on mobile browsers.

AMP’s design is very fragile: if you are using Google search results, they correctly guess what you’re going to tap on before you do and your browser fully preloads it, it _might_ be faster to run all of that JavaScript before anything is allowed to load and render. If any part of that chain fails, it will almost certainly be slower or, because it disables standard browser behavior, prevent you from seeing content at all.


> If that were true, AMP would be consistently faster.

It is. AMP results load instantly for me.

> you would find it educational to learn about the issues with detecting user intent, reliably prefetching dependencies, and the relatively small / frequently purged caches on mobile browsers.

And you might find it educational to learn why AMP doesn't rely on these things. There are no dependencies that need to be fetched for the initial render.

This idea isn't surprising. Multiple other systems use the same ideas, including Apple News, many RSS readers, and Facebook Instant Articles. AMP just does it in a way that isn't anti-competitive (like the former) and allows for multiple monetization schemes and rich formatting (unlike RSS).

> if you are using Google search results, they correctly guess what you’re going to tap on before you do and your browser fully preloads it, it _might_ be faster to run all of that JavaScript before anything is allowed to load and render

AMP doesn't rely on fully prerendering the page, only the portion above the fold, which it can calculate because the link aggregator page knows the display size, and the elements allowed in AMP are required to report their dimensions. This allows multiple pages to be prerendered.
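As an illustration of that dimension requirement, an AMP media element must declare its size statically (this hypothetical fragment follows the AMP HTML spec's `amp-img` element; the URL is made up):

```html
<!-- amp-img must declare width/height so the above-the-fold layout can
     be computed before any image bytes arrive; layout="responsive"
     scales the box while preserving the declared 16:9 aspect ratio. -->
<amp-img src="https://example.com/hero.jpg"
         width="1200" height="675"
         layout="responsive"></amp-img>
```

Because every element reports its box up front, a cache can lay out the visible viewport without fetching the heavy resources.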

> because it disables standard browser behavior,

What standard browser behavior does it disable?


It is ironic considering that 7 out of top 10 most used third party connections on websites are owned by Google.

So you can see why there must be some kind of internal struggle at Google. They understand the value of a faster web but they also cannot go after the main cause of the slow web. And this is how technology such as AMP gets invented and makes things worse.

https://markosaric.com/google-amp/


But why allow a third party (Google in this case) to collect data on your reading behavior on NYT?


If you loaded the megabytes of JS served by the actual nytimes.com, they’ll certainly be sending your data to Google as well for advertising purposes.

(Albeit, that’s far more blockable)


In what way is it anti-competitive? Google's competitors also consume AMP pages and prerender them using AMP caches. Anti-competitive would be requiring the publishers to integrate directly with Google like Apple News, not asking the publishers to publish pages that all link aggregators can consume.


Google Search uses its monopoly to push their own AMP cache. I can't search in Google and load the content through Bing's AMP cache.


> their own AMP cache

I'm confused, you make it sound like a free CDN is somehow a bad thing. You do realize people actually pay money to have their content on a CDN. I don't think Bing makes money on their AMP cache, and doubt they would want or even allow Google to link to content on their AMP cache...

The point of AMP cache is for Google (and Bing) to waste money making content faster for their users, in the hope that the user will then spend more time on search so they see more ads. The cache itself has nothing to do with the monopoly, and the fact that Bing can use AMP at all (since its open source) to get the same benefits actually shows the exact opposite.


> Google Search uses its monopoly to push their own AMP cache. I can't search in Google and load the content through Bing's AMP cache.

That's nonsensical. That would reveal what the person searched for to a third party (Microsoft) even if they don't click on any results. The AMP Cache has to be controlled by the link aggregator in order to support safe prerendering, so Bing's AMP cache is used to prerender Bing results, and Google's AMP cache is used to prerender Google results. Compare to directly integrating with Google, in which case, Bing wouldn't get to take advantage of prerendering. The latter (the Apple News setup) is anti-competitive. AMP is not.


IMO the core point of the article is false.

> To be blunt, this is a really dangerous pattern: Google serves NYTimes’ controlled content on a Google domain.

No, "Google serves NYTimes' controlled content" is an oxymoron. Google controls the content that is served, and that's all your browser is verifying. Google could very well make the NYTimes content on there display something else and your browser wouldn't show a warning. NYTimes could do nothing about that.

I disagree that this pattern is dangerous. While Google taking over serving the world's content is hardly a thing to celebrate, at least we're seeing that it's doing so here.


The pattern is dangerous because it trains the user to dissociate URL and legitimate content, and the best tool at our disposal against phishing is still the ability to use the URL to ascertain the legitimacy of a content.


URLs haven't been associated with legitimate content for a long time now, since most of the things come from giant CDN companies like CloudFlare anyway. What you're seeing in URL bar has very little to do with where the JS code executed on your computer is coming from.


Does it matter if it comes from a CDN rented by the NYT or a computer owned by DigitalOcean but rented by the NYT?

What matters is that the domain points to where the NYT considers is the correct source of their content.


With signed exchanges, the NYT is cryptographically opting into allowing Google (or other cache providers) to represent specific articles as being the NYT. It doesn't seem much different.


Yes, but users at least know the JS code was loaded on behalf of the webpage the URL points to.


Do you really think most smartphone users look at the URL anymore? Or even know what a URL is?

From the non-technical people I've talked to, the answer is no, they don't know what a URL is, and that was happening before AMP came around.


You might want to back that up with research: people don’t look at full URLs but that’s exactly why it’s so important that the highly-prominent domain name display is accurate.


Are you sh!tting me?!!?

Had enough of HN ...

This place is bullsh!t.

Ban me.


It's the current amp status quo that trains users that legitimate content is sometimes on other domains.

This change would restore the idea that the URL indicates the provenance of the content.


With Signed HTTP Exchanges, for Google to modify the content that is served, Google would need access to a private key for a certificate for nytimes.com, no? Either nytimes.com has handed over that key or Google would have to create a key/certificate for nytimes.com. Believing Google would maliciously issue certificates seems a stretch to me.

I don't like AMP nor much of how Google has behaved with it (http://exple.tive.org/blarg/2020/05/05/the-shape-of-the-mach... largely matches my thoughts), but let's stick to what's actually happening with SXG.


I don't get this. Clearly the content is served by Google, and so they can do whatever they like with it. How is an end user going to know whether the message was signed before it was passed on or not?


> How is an end user going to know whether the message was signed before it was passed on or not?

Your web browser will show a scary warning and refuse to display the bundle if it's not correctly signed. Google is not going to fake signatures for other sites, as certificate mis-issuance would open up Google to legal consequences.
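Conceptually it works like any detached signature: a cache can replay a (content, signature) pair, but can't forge a valid signature for content it has modified. A toy sketch in Python — note this uses stdlib HMAC with a made-up key as a stand-in for the real asymmetric certificate signature, so unlike a real browser (which only needs the public key from the certificate), the verifier here shares the key:

```python
import hashlib
import hmac

# Toy model of a signed exchange: the publisher signs its content once,
# and any cache (e.g. Google) can redistribute the (content, signature)
# pair, but cannot produce a valid signature for a modified body.
# Real SXG uses an asymmetric certificate; HMAC with a shared secret is
# only a stdlib stand-in for "something only the publisher can compute".

PUBLISHER_KEY = b"hypothetical-nytimes-signing-key"  # illustrative only

def publisher_sign(content: bytes) -> bytes:
    return hmac.new(PUBLISHER_KEY, content, hashlib.sha256).digest()

def browser_verify(content: bytes, signature: bytes) -> bool:
    expected = hmac.new(PUBLISHER_KEY, content, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)

article = b"<html>original article</html>"
sig = publisher_sign(article)

print(browser_verify(article, sig))                   # True: untouched replay verifies
print(browser_verify(b"<html>tampered</html>", sig))  # False: modified body is rejected
```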


How does the end user know now that a file from cdn.cloudflare.com is actually coming from NYTimes when it loads and runs code on their browser?


[flagged]


Please review and stick to the site guidelines when posting here (https://news.ycombinator.com/newsguidelines.html). They explicitly ask you not to lead with things like "That's nonsense. Take a moment to breathe." The rest of your first paragraph is just fine.

Re the second paragraph: people are wrong on the internet. If you want a site that doesn't have this problem, you're going to have to look for something considerably smaller, where countering entropy is an option. I think you might be running into the notice-dislike bias, though: https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

Edit: I'm dismayed to see that your last 7 comments have all broken the site guidelines. Trashing this place because you don't like some of what other people post, or because you feel superior to the community, is not ok. I'm sure you wouldn't litter in a city park, so please stop doing the equivalent here.


> Is there some site where people actually have a clue? On any topic that I have any passing knowledge about, HN is just completely spewing nonsense and the worst part about it is that they think they have a clue.

Given that HTTP Signed Exchanges are nowhere near a web standard at this point, I think you should tone down your vitriol considerably.

Currently, what the parent commenter is saying is completely valid and true; if you're serving things on your domain and have a cert for it, you can serve https://yourdomain.com/<anything>, where <anything> could be www.nytimes.com, www.google.com, or whatever. The HTTP Signed Exchanges proposal is a breaking change to this, and therefore non-intuitive for the vast majority of users.


> but let's stick to what's actually happening with SXG

No, Signed HTTP exchanges are something that Google dreamed up so people don't have to see their hegemony over the modern web (or as the article you linked calls it, a shakedown). It's not a browser standard so far, because of Apple and Mozilla's resistance.

There are legitimate ways for NYTimes to allow Google to serve content on behalf of them, like so many other CDNs around the world (it usually involves the CDN generating the certificate for your site as well). Why should people create new standards for HTTPS and URLs simply for Google's benefit?

I don't deny that there's a way to make "nytimes.com" work where everything is served by "google.com". What I'm questioning is why we need a completely new web standard for doing so that affects the URL, something that has been standard for decades.


> Why should people create new standards for HTTPS and URLs simply for Google's benefit?

Because of the exact reasons that people are complaining about in this very thread: they want NYT to control the content and display the domain name appropriately, but they want to serve it from Google servers and allow for eager prefetching without leaking private details.

Today it would be easily possible if NYT just gave Google their private cert, but then Google would be able to serve any content they want as NYT. With the proposed solution they can display the content NYT wants without being able to serve arbitrary other content.


That's correct. Only with the private key can one sign a Signed Exchange for the publisher. Like TLS, if you have the key you can already do quite a lot.


The shenanigans Google has been pulling with the URL bar are super hostile.

Trying to copy the domain of a URL without the protocol just infuriates me.


Disable this behavior in Chrome by going to chrome://flags and switching #omnibox-context-menu-show-full-urls to Enabled. Then right-click the URL bar and select "Always show full URLs".


I think not using Chrome at all is a better response than trying to use workarounds.


I think the flag (but not the checkbox) will be enabled by default at some point... this option is the best thing to happen to the URL bar since they broke it by removing http:// and WontFix'ing the resulting complaints a decade ago.

Really, I couldn't care less about stuff getting pruned from the URL bar, as long as there's an easy and permanent way to show everything.


Turned this on a few weeks ago - so much better.


Isn't it just an extra click? Click one, it highlights the whole thing, click a second time and you see the full URL with the protocol.


Hm. There's room for a new good browser to pick up the beans and run with it now…


Try sending someone a link of a PDF you found using google.


Not defending that change, but when do you ever need to copy just the domain instead of the full URL?


Always. The main reason I copy it is to paste the hostname into a terminal so it can be an argument to whois or dig or traceroute or whatever; in no case have I ever been glad of the scheme prefix.
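For what it's worth, stripping the scheme and path back down to a bare hostname is a two-liner in plain POSIX shell (the URL here is just an example):

```shell
# Reduce a pasted URL to a bare hostname suitable for dig/whois/traceroute,
# using only parameter expansion (no sed/awk needed).
url="https://www.nytimes.com/2020/some-article"
host="${url#*://}"   # drop the scheme prefix
host="${host%%/*}"   # drop everything after the hostname
echo "$host"         # prints: www.nytimes.com
```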


I've lost count of how many times I've done:

    $ ping http://whatever.com [furious line editing ensues]
But thankfully it's fixed now, with the "Always show full URLs" option.


Isn't it more common to paste into a utility that uses the prefix, like curl or wget? Or into a chat? Besides, all of those tools could just strip out the prefix, while there's no way to add the protocol back to a bare domain name.

More information is strictly superior.


More information is better, so the URL shown in the browser should include the protocol. Consistent behavior is better, so copy/paste should only include text that is actually highlighted.


It's a solution in search of a problem.


The case that bit me the most is when I need to copy an IP address.


One of the main reasons sites use AMP (being listed in the top carousel in search results) will no longer require AMP soon.

https://www.theverge.com/2020/5/28/21272543/google-search-re...


It's wrong to trust the URL bar. For example, this search [1] has as top link an ad that boasts "google.com", and it really is! And if you click on it, you'll end up on a google.com site, which nominally helps with printers, but in reality it's a tech support scam.

So much of the distrust here is that google wants to be everything: to host their content and publisher content and user content; to broker ads and recommend links; to run their software on your computer and phone, to store your data on their servers. They serve too many masters.

1: https://i.imgur.com/HalErpIr.png


"It's wrong to trust the URL bar" is true but only because the companies operating services like... Google... don't bother trying to protect their URLs. It's not hard to have a separate 'user content' domain for your user content, we've done it at places I used to work for. But for some reason people think it's enough to use a subdomain or get cute and use the same domain with a different TLD (looking at you, github.io)

So it is kind of frustrating to see someone offering to fix a problem they helped create in the first place through neglect or carelessness.


Agreed, that's problematic. But Google didn't even have to not host content, they would just have to use a different domain. They have such weird blind spots.


As an advertiser, you can write whatever you want into the url displayed there. This does not need to match the real target.


But the real target is google.com.

I just made https://sites.google.com/view/whalefacts, took me literally ten seconds, confirmed it was accessible from multiple IPs and multiple browsers.

Google wants to be a content host and an ad broker and a search engine. Each of these is reasonable in isolation. Yet you can search on google, and Google will serve you an ad linking to a google.com site, and that site scams you out of money. This isn't theoretical, I know because my family was hit.

Screenshot if it gets taken down: https://i.imgur.com/T6hVHr5.png


Super boring answer, and this is not an admonition to you but in general: shouldn't this lead to lawsuits? It needs to be tried in court.


I'm confused, that's sites.google.com, the URL says that, and it's right, what's the problem?


All this does is rapidly devalue the google.com domain. Not a bad thing per se.


New York Times and all the other publishers don't have to participate in this crap. It's shameful that they cede authority over their content so easily in exchange for a vague promise of more visibility. There are so many better ways.


It’s not a vague promise, it’s an extremely explicit one. Search results for news contain a “top carousel”, a horizontally scrolling box that shows cards for different articles. On most phones it takes up most of the screen. If you want to be in the carousel (i.e. if you want your site to be visible near the top of search results) you must use AMP. No ifs and buts about it.

If NYTimes and every other news organisation refused to participate then yes, Google would be in trouble. But they can rely on good old divide and conquer: these news organisations all compete with each other. All it would take is for one to start producing AMP content again and they'd vacuum up all the search traffic, and all the other sites would follow immediately.


In an ideal world, where they didn't rely on ad revenue and page views but were supported by their readers, that assumption would be correct.

But right now we are not living in that ideal world, and because all the other publications are doing it, they have to follow if they don't want to risk losing visibility to the competition.

So of course they don't "have to" but they also kinda do.


> don't have to participate in this crap

It's a tempting Ponzi scheme.

