CPP: A Standardized Alternative to AMP (timkadlec.com)
306 points by Yennick on Oct 25, 2016 | 93 comments



This was first published last February, and the top comment was from a Googler who's involved with the AMP project:

> Love this. Not sure by coincidence, but the AMP team has been playing around with the same thing under literally the same name. We should meet up some time and discuss details. Not sure it would be an alternative, but rather a complementary thing.

I wonder what ever came of that.


The spec has been moving along: http://wicg.github.io/ContentPerformancePolicy/


The spec has been split into smaller pieces. The one that has made the most progress so far is https://wicg.github.io/feature-policy/


Correct me if I'm wrong, but doesn't the content creator get to decide whether they actually want all the AMP benefits or not? If I remember correctly, you can choose to disable the whole Google AMP cache link and only use the AMP optimizations.

Second, he spends the first few paragraphs on how anyone can do this, but that's not really true, is it? The whole point of the AMP redirect is that it's on Google's cache servers, and unless you have a lot of money, that ain't gonna be cheap.

So at the end of the day, the last part is optional, and basically you pay for the cache by allowing them to use their domain (with all ads and traffic stats still sent to you).


Omitting the Google AMP cache link basically makes it not AMP though - you can still follow the rules, but at that point it's just called "writing clean and minimal HTML".


I can't take AMP seriously as long as it forces web sites to include resources from Google.

That basically turns all AMP web pages into something that Google can track. Google can track enough of the web already, thankyouverymuch.

If your "standard" proposal starts with telling me I have to include content from a specific URL, you already lost me as a potential proponent.

I also happen to disagree with "all CSS needs to be inline".

Unless I missed something, CPP appears to not require me to include some 3rd party content, so I'm on board already :-)


I don't think it forces you to include resources from Google. I think the two primary reasons Google asks you to include a specific JavaScript script [1] are:

1 - Lazy & Prioritized asset loading logic

2 - Loading assets from cache

Regarding the lazy & prioritized asset loading, the script [2] shouldn't rely on anything Google-specific. Since the project is open source, anyone can take a look and find out.

Regarding loading from Google's cache: I haven't dug that much into it, but it's supposedly possible to write your own cache service (instead of relying on the currently free Google CDN). Google has published the API and the URL format, so it should be doable [3]. My assumption is that you can then build your own copy of the script, with the cache server URL in a simple config file [4] changed from Google's to your own.

The question then is: by doing so (and effectively cutting all ties with Google's servers), will your page still be recognized as AMP by Google's search engine? It will still reap the actual performance benefits (assuming your CDN does not suck), but if Google's crawler really wants [1] to be present (and it should not), you wouldn't get the SEO benefits.

[1]

    <script async src="https://cdn.ampproject.org/v0.js"></script>
[2] https://github.com/ampproject/amphtml

[3] https://developers.google.com/amp/cache/

[4] https://github.com/ampproject/amphtml/blob/c44a48fbb1dbd0de0...


Aren't AMP pages delivered to users from Google's servers anyway? So Google can track them regardless?

I'm just a casual observer and that's just how I understood the idea.


AMP pages are served from the publisher's server as usual, but Google can easily direct their users to their cache instead of the official site. Every AMP page also includes JavaScript hosted by Google, so Google gets pinged every time someone visits an AMP page, even if it wasn't through their cache.


https://developers.google.com/amp/cache/

"Google products, including Google Search, serve valid AMP documents and their resources from the cache to provide a fast user experience across the mobile web."

So it is Google that serves pages from the cache. Moreover, the same page encourages others to do the same.


Yes, Google can direct its own users to the cache, and others can also link to the cache. But if you or someone else link directly to the page, it is not served from Google's cache. Google would not be able to track those visits except for the fact that the page also loads content (at the very least, the AMP JavaScript) from Google's servers. AFAIK, it is not considered "valid" if you serve the JavaScript yourself.


> That basically turns all AMP web pages into something that Google can track. Google can track enough of the web already, thankyouverymuch.

The point is to be able to track requests with javascript (or just Google Analytics, really) blocked.


Isn't there a better alternative to an abbreviation that's already widely used in the software engineering context?


PPC is pay per click, which is more likely to create confusion in a web dev context. PCP is less widely used in a software engineering and web developer context (one hopes anyway), but probably not something you want to be searching for from a work computer.

If we could get away from policies though, Performant Content Contracts doesn't overlap with much that is CS related.


PCP is also Performance Co-Pilot, a relatively widely-used systems performance tool. :)


The CPP abbreviation is used to be consistent with CSP.


What has Communicating Sequential Processes got to do with it??


Content Security Policy in this instance


Just correct them if you don't think it's a joke; why the downvote?


Best part is AMP is a C++ library.


Search engines already deal just fine with context. Just type in “cpp language” or “cpp html,” respectively.

I think everyone keeps making name collisions out to be a much bigger deal than they are in reality.


If it was a minor collision, sure. But CPP is one of the most used programming languages on the planet. I suppose you think we should create standards for C++ and call them JS or RB or py, for short?


If you can always find what you’re looking for with any non-trivial search (two or more keywords), how is it not a minor collision? “CPP” is not even the name of the language; it’s the file extension.

EDIT: Mostly I’m just annoyed that those useless off-topics about naming keep crowding out actual discussion.


But CPP is the name of the language!

https://en.wikipedia.org/wiki/C_preprocessor


You are contradicted by literally the first sentence in that article.


cpp html: http://imgur.com/gV3dMI9

It's going to take significant momentum to displace "CPP" in the appropriate computing contexts.


Yeah but if you search for “cpp amp” it’s already mid-page, despite there being a C++ project with the same name. The page has hardly any backlinks yet.

Anyway if discoverability is an issue they can always give it a catchy marketing name (Project Swift Gazelle) later. CPP is the appropriate name for the proposal, given the relevant context.


Alternative name: CWP for Content Weight Policy :)


This is very interesting and I don't want to detract from that.

However, I'd point out that for a (large?) portion of us, CPP means "C Plus Plus", so I was confused for a few seconds.


Welcome to the world of "$lang-lang" queries. Go, Rust, please make room.


You would then need to disambiguate between "clang" the compiler and "c-lang" the language.


Or C Pre-Processor.


Or Socialist Republic of Romania :P


Or Canada Pension Plan


AMP is "asynchronous message passing" to me.


…and Apache + MySQL + PHP for me.


From back when LAMP (Linux being the L) was the buzz.


Oh, LAMP... and if you were enterprise, it was SOA (Service-Oriented Architecture), and today it's microservices.

So buzzy.


LAMP and SOA don't really have much to do with each other.


We really are damn good at reinventing the wheel over and over and over, no?


The name makes perfect sense in context:

> CPP could borrow from the concept and approach of the already existing Content Security Policies (CSP). This means that there would likely be a reporting-only mode that would allow sites to see the impact the policy would have on their pages before applying it live.


I'm much happier to see this. My concern with AMP has always been that it removes incentives for browsers to get faster, because if most of the content is using AMP there's no point in optimizing important things. Having a multi-vendor solution gives us non-Chrome browser vendors a voice at the table.

(To give a concrete example, why bother with optimizing layout-affecting animations to run off the main thread if AMP just forbids them? Such animations are useful, precisely because they affect layout; we aren't doing Web authors any favors by forbidding them instead of just making them fast.)


AMP is for news pages, content that's one click away from Search. Browsers still work to optimize the overall experience for web apps and full sites.


1. Animations have their place on news sites.

2. I'm not just talking about animations: things like parallel layout are also useful for static sites.



Just use pure HTML/CSS and the web sites will fly by comparison.


For a single page load, sure. For subsequent page loads you're loading a lot more than necessary (a JS app can fetch just the content that's changed, and without a blank page in between), so a pure HTML and CSS solution is a great deal slower.

Plus, if your users have unreliable internet connections, then a JS app can use a service worker to cache the entire app to work offline, and only load in new content when possible. An HTML page doesn't work at all in those circumstances.
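One way to get the "only load new content when possible" behaviour is a network-first strategy: try the network, refresh a cache on success, and fall back to the cached copy when the request fails. This is a minimal sketch, assuming the cache and the fetch function are passed in so it can run outside a browser; `networkFirst` is a hypothetical name. In a real service worker you would pass a `Cache` from `caches.open()` and the global `fetch`, as in the commented wiring.

```javascript
// Network-first caching, sketched as a plain function so it can be tested
// outside a browser. `cache` needs match()/put() like the Cache API, and
// `fetchFn` behaves like fetch(). Both are injected (an assumption of this sketch).
async function networkFirst(request, cache, fetchFn) {
  try {
    const response = await fetchFn(request);
    // Only store successful responses; fetch() resolves for HTTP errors too.
    if (response.ok) {
      // A response body can be read only once, so cache a clone.
      await cache.put(request, response.clone());
    }
    return response;
  } catch (networkError) {
    // Network failed (offline, flaky link): serve the last good copy if any.
    const cached = await cache.match(request);
    if (cached) return cached;
    throw networkError; // offline and nothing cached yet
  }
}

// Wiring in a real service worker script (browser-only, not executed here):
// self.addEventListener('fetch', (event) => {
//   event.respondWith(
//     caches.open('content-v1').then((cache) => networkFirst(event.request, cache, fetch))
//   );
// });
```

A design note: this never tries to decide whether the device is "really online"; it just attempts the request and falls back when it fails. A link that hangs instead of failing would need a timeout on top, which this sketch leaves out.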

Sometimes JS does actually make a site better. It's not always unnecessary bloat.


There is a thing in HTML5 called Subresource Integrity (https://developer.mozilla.org/en-US/docs/Web/Security/Subres...).

It looks like this:

  <script src="https://example.com/example-framework.js"
  integrity="sha384-oqVuAfXRKap...."
  crossorigin="anonymous"></script>
I wonder if browsers could keep a cache keyed by those hashes, so that whenever an integrity hash matches, the browser can take the JS from the cache. That would save huge amounts of bandwidth, and pages would load much faster.

Right now we're probably fetching the same version of jQuery hundreds of times a day from 20 different domains.


Currently, SRI is not enough for browsers to implement content-addressable storage as you describe here, because it is subject to cache poisoning. See https://news.ycombinator.com/item?id=10311020 - basically, the browser can't know if a script can actually be loaded from the claimed domain without requesting it. This can be used to violate CSP.

Though it would be nice for the browser to cache it for domains that have delivered the script previously. It wouldn't be that different from a normal cache except the timestamp doesn't matter.
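A minimal sketch of that compromise, assuming Node's `crypto` module for hashing: the integrity value is computed the way SRI does for sha384 (`sha384-` plus the base64 digest), and a cached body is reused only when the requesting URL's origin has served those exact bytes before, with no expiry. `OriginScopedIntegrityCache` is a hypothetical name, not a browser API, and real `integrity` attributes can list several hashes, which this ignores.

```javascript
const crypto = require('crypto');

// SRI-style integrity value: "sha384-" + base64(SHA-384(content)).
function integrityOf(content) {
  return 'sha384-' + crypto.createHash('sha384').update(content).digest('base64');
}

// Hypothetical browser-side store: reuse a script by hash, but only for an
// origin that has already delivered exactly those bytes. Skipping cross-origin
// reuse avoids the cache-poisoning problem of the naive hash-keyed cache.
class OriginScopedIntegrityCache {
  constructor() {
    this.bodies = new Map();  // integrity -> script body
    this.origins = new Map(); // integrity -> Set of origins that served it
  }

  // Record a script that was actually fetched from `url`.
  record(url, content) {
    const integrity = integrityOf(content);
    const origin = new URL(url).origin;
    this.bodies.set(integrity, content);
    if (!this.origins.has(integrity)) this.origins.set(integrity, new Set());
    this.origins.get(integrity).add(origin);
    return integrity;
  }

  // Cached bytes for <script src=url integrity=...>, or undefined when this
  // origin never delivered them (the browser must then really fetch).
  lookup(url, integrity) {
    const seen = this.origins.get(integrity);
    const origin = new URL(url).origin;
    return seen && seen.has(origin) ? this.bodies.get(integrity) : undefined;
  }
}
```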


Thank you for the info! I figured there must be some technical issue, since having content hashes would make caching so easy. Anyway, using the hash at least for the same domain would save some requests, since browsers make requests to the server to check whether the file hash matches, to avoid sending the whole thing again.
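The revalidation described here is HTTP's conditional GET: the server sends an `ETag`, the browser echoes it back in `If-None-Match`, and an unchanged file gets an empty `304 Not Modified`. The sketch below uses hypothetical, framework-free helpers (`etagFor`, `handleGet`) so it can run under Node, and it derives the ETag from a content hash, which is one common choice rather than a requirement of HTTP.

```javascript
const crypto = require('crypto');

// Strong ETag built from a hash of the bytes (ETags only need to be opaque
// validators; hashing is just a convenient way to make one).
function etagFor(body) {
  return '"' + crypto.createHash('sha256').update(body).digest('hex').slice(0, 32) + '"';
}

// Answer a GET, honouring If-None-Match.
function handleGet(body, ifNoneMatch) {
  const etag = etagFor(body);
  // If-None-Match uses weak comparison, so ignore any W/ prefix.
  const candidates = (ifNoneMatch || '')
    .split(',')
    .map((tag) => tag.trim().replace(/^W\//, ''));
  if (candidates.includes('*') || candidates.includes(etag)) {
    // The client's copy is current: headers only, no body.
    return { status: 304, headers: { ETag: etag }, body: '' };
  }
  return { status: 200, headers: { ETag: etag }, body };
}
```

Note that this still costs a round trip per file; it only saves the transfer of the body.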


It's not really bandwidth that causes the issue. Javascript is just really slow, both in parsing and execution. According to Chrome dev tools, parsing jquery takes 20ms on my 4.4 GHz desktop CPU. Now imagine how long that takes on a mid-range smartphone. Then add in a dozen other javascript libraries and shims and polyfills and the site is barely usable.


jQuery is a large library. If it were modular and developers used only the code they need it would be faster to parse.

But I doubt the bottleneck is JS code. The problem is that websites are not optimized (some frontend developers think that writing a CSS stylesheet for narrow screens is enough) and they include a lot of resources (including trackers, advertisements, and spying social network buttons I never click). Some of the widgets create an iframe (which is like a separate tab in your browser) and load a separate copy of jQuery there, make AJAX requests, etc. Even worse, some advertising networks can create nested iframes 2 or 3 levels deep (for example, when a network doesn't have its own ads, it can insert a Google AdWords block). So when you load a page with 10 iframes, it loads the CPU like 10 separate tabs.

Decoding images isn't free either, especially a heavily compressed, thousand-pixel-wide JPEG or PNG.

The real optimization would be cutting away (or making lazy-loadable on user request) everything except the content. As website developers are not going to do it, it is better to do the optimization on the client side. I wish standard mobile browsers allowed disabling JS and web fonts (which are just a waste of bandwidth) and loading images on request. Mobile networks usually have high latency, so reducing the number of requests needed to display a page could help a lot.


You can download a custom bundle of jquery from the source. But then it won't be shared with other websites. The solution is for browsers to cache common libraries and not try to load these resources per page.


> There is a thing in HTML5 called Subresource Integrity

It piqued my interest, but I was disappointed to discover that it's only supported by Gecko & Blink[1] - not supported by Safari or IE/Edge. Javascript is currently unavoidable for offline apps.

1. http://caniuse.com/#search=integrity


It’s a progressive enhancement. Browsers that don’t understand the integrity attribute will just load the JS regardless, but at least Firefox and Chrome will get a safer experience.


JS + offline apps? thanks for making me feel old.


The HTTP standard has defined ways to cache responses since its early versions. No fancy HTML5 features are required for that.

(And if you meant using hashes to reuse cached resources across different domains, there will probably be many misses, because every website may use a different library version, or compress or bundle libraries differently, etc.)


At least on HTTP/1 sites, most people are bundling their libraries together, so subresource integrity can't save you there.

I want to say there's a security concern with re-using these libraries, but I guess the possibility of a hash collision would be extremely small.


Unfortunately they won't implement that caching scheme, because it leaks the sites you have visited to attackers.


> if your users have unreliable internet connections then a JS app can use a service worker to cache the entire app to work offline,

It looks like an over-engineered system, and you have to preload the content while on WiFi. And by the way, do you know a reliable way to detect whether a device is really online, or whether there is a link but no packets are getting through?

And every website is supposed to write its own code for service worker.

I think it would be easier to implement a feature in a browser where user can explicitly save some pages for offline reading. Or allow user to view pages from cache.

> Sometimes JS does actually make a site better.

For most sites it just adds unnecessary widgets (like spying share buttons) and advertisements. Especially on newspapers' sites - most of them work better without JS.


All that stuff sounds really great if you have a lot of engineering resources and you are writing Gmail, but for 99% of the sites out there, the JS hacks that load just the deltas and whatnot get confused by packet loss, and I end up having to reload the entire page including the gigantic JS hairball, or even worse, the thing gets so confused that I have to clear my cache and cookies to make it work ever again.

Simplicity has so many things in its favor.


Actually, for the first part there is a protocol called SDCH (https://en.wikipedia.org/wiki/SDCH) that lets a site owner define a site-global compression dictionary; each resource is then compressed against that dictionary. It's hard to deploy, but it works: LinkedIn saw an average of 24% additional compression.
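SDCH itself encodes responses as VCDIFF deltas against the downloaded dictionary. As a rough stand-in for the same idea, this sketch uses DEFLATE's preset-dictionary feature through the `dictionary` option of Node's `zlib`: bytes the client already holds in the dictionary cost almost nothing to send. The page and dictionary contents are made up for illustration, and no SDCH-specific API is implied.

```javascript
const zlib = require('zlib');

// Made-up boilerplate that every page on a hypothetical site starts with.
const shared =
  '<!DOCTYPE html><html><head><meta charset="utf-8">' +
  '<link rel="stylesheet" href="/assets/site.css">' +
  '<script async src="/assets/app.js"></script></head>' +
  '<body><header class="site-header"><nav><a href="/">Home</a></nav></header>';

// Client and server must both hold this exact dictionary.
const dictionary = Buffer.from(shared);

// One page: the shared boilerplate plus its own content.
const page = Buffer.from(shared + '<main><p>Article text goes here.</p></main></body></html>');

const plain = zlib.deflateSync(page);                    // no dictionary
const withDict = zlib.deflateSync(page, { dictionary }); // preset dictionary
const roundTrip = zlib.inflateSync(withDict, { dictionary });

console.log('raw:', page.length, 'deflate:', plain.length, 'deflate+dict:', withDict.length);
```

The deployment difficulty mentioned above shows up even here: decoding fails unless the client has exactly the same dictionary the server used.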

For the second part, I wonder if browsers could display stale data with a warning saying so; that would solve many problems that happen all the time (refreshing a page after the website went down, ...)


Given my web development scars, I have become an advocate that for anything other than dynamic documents, the way to go is native.


pjax (whether using the old familiar jquery-pjax or some more up-to-date implementation) is great for decorating simple HTML pages with, to replace full page loads with just a main-content load. And since the fall-back is just a full page load, it degrades really gracefully.


You're lucky if you've got a good enough site so that you can get people to stick around for the second page load! Subsequent CSS calls will be cached for the majority.


Client-side XSLT and HTTP caching address both those issues. Yes, JS is another way to solve those issues, but not the only one.


HTML/CSS are slow. The true way is ".txt"

Edit: I mean this as a sort of lazy way of making a reductio. I don't think .txt is better than HTML/CSS for pages (and I hope that's obvious). I also don't think having no JS is a good idea.

I believe in progressive enhancement, and to a first approximation, I think that all websites have at least one feature that they could implement in JS that would be "a good thing".


I agree 90% with this; .txt files get most of the job done! ;-) The other 10%, where i don't fully agree, consists of hyperlinks; i need me some clickable hyperlinks. :-)


Browser option to make hyperlinks clickable when rendering text files? Or client-side browser rendering of text-based markdown.


Having browsers intelligently render `text/markdown` sounds like a great idea. And while we're waiting on the browser implementation, maybe we can find some sort of temporary workaround to send a markdown parser to the client?


Maybe we could just send the raw markdown anyway - it might not be pretty on all clients, but it should be _legible_ on all clients.

Or maybe we could send markdown if the user agent included text/markdown in the request's Accept header, and pipe it through a markdown->HTML filter otherwise.
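A sketch of that negotiation, assuming a framework-free handler: `acceptsMarkdown` and `negotiate` are hypothetical helpers, and `renderMarkdown` stands in for whatever Markdown-to-HTML converter you would plug in (none is implied). `text/markdown` is a registered media type (RFC 7763), and the `Vary: Accept` header tells shared caches that the body depends on the request's `Accept` header.

```javascript
// True if the Accept header lists text/markdown with a non-zero q-value.
// Simplified: wildcards such as */* are deliberately not treated as markdown.
function acceptsMarkdown(accept) {
  return (accept || '').split(',').some((part) => {
    const [type, ...params] = part.split(';').map((s) => s.trim().toLowerCase());
    const q = params.find((p) => p.startsWith('q='));
    return type === 'text/markdown' && (q === undefined || parseFloat(q.slice(2)) > 0);
  });
}

// Serve raw markdown to clients that ask for it, HTML to everyone else.
// `renderMarkdown` is a placeholder for any markdown->HTML converter.
function negotiate(accept, markdownSource, renderMarkdown) {
  // Vary tells shared caches that the response depends on Accept.
  const headers = { Vary: 'Accept' };
  if (acceptsMarkdown(accept)) {
    headers['Content-Type'] = 'text/markdown; charset=utf-8';
    return { headers, body: markdownSource };
  }
  headers['Content-Type'] = 'text/html; charset=utf-8';
  return { headers, body: renderMarkdown(markdownSource) };
}
```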

I would love to see some kind of native markdown support on the web.


A web search turned up this Firefox plugin, https://addons.mozilla.org/en-US/firefox/addon/markdown-view..., are there other good ones?


As long as I can make text blink, I'm happy.


No, hyperlinks failed us. It's all about single-page apps. Put all of your content and all of the content you would have linked to in your .txt. We need someone to build Reactxt.


Whoa whoa whoa, that's some fat cat solution right there! Just use an HTTP header for your content and you're set.


The URL should contain all the information so we don't have to make an HTTP request.


I think most news sites implemented this years ago.


Someone posted this in a reply the other day and I feel it should go here as well:

https://vimeo.com/147806338

It's a pretty interesting talk. Just thinking about this kind of stuff can lead to far skinnier web pages than AMP. I mean really, if we designed pages for 56k modems, the web would be much, much faster on mobile.


I would just about give my left arm for a consistent 56k mobile connection.


Mildly interesting: CPP AMP is already a thing: https://msdn.microsoft.com/en-us/library/hh265137.aspx


I think the real concern about AMP is not that it's nonstandard, but that Google's caching mechanism reduces publishers' control over how their content is presented and keeps users on Google's domain. This is a problem for publishers, but I assume it increases performance (perhaps because Google prefetches the page while the user is looking at search results?).


I agree with the premise that the AMP framework (or any such framework) shouldn't force you to work with only a specific set of tools, etc. However, it seems to me that what the author is proposing is really just more/better adherence to HTML/web specs...no? Or perhaps a few performance-related tweaks to existing HTML/web specs... I mean, if we all (that is, web producers, website managers, content authors, etc.) simply produced sites that adhere more closely to already established web standards (à la HTML5, XHTML, etc.), AND browser makers were stricter in their interpretation of those specs, then we'd be almost all the way there...no? I'm by no means saying this is easy, just that the author might be reinventing a wheel that could simply use some optimization.


Right now there are a lot of things that specs let you do that will make your page slow, like running a lot of js in the scroll event handler, or just including too much js overall. If you read the current proposal [1] the idea is that the site could make promises not to do various kinds of slow things, and the browser could enforce that.

[1] http://wicg.github.io/ContentPerformancePolicy/
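To make the "promise plus enforcement" idea concrete, here is a hypothetical checker modelled loosely on CSP's enforce vs. report-only split, using a rule like "No synchronous external scripts nor blocking external stylesheets". `applyPolicy` and the element objects are illustrative only; this is not the API or behaviour defined in the WICG draft.

```javascript
// Hypothetical performance-policy check. Elements are plain objects standing
// in for parsed DOM nodes, e.g. { tag: 'script', src: '/a.js', async: true }.
// mode 'enforce' drops violating elements; any other mode only reports them.
function applyPolicy(elements, mode) {
  const kept = [];
  const violations = [];
  for (const el of elements) {
    // An external script without async/defer blocks the parser.
    const syncScript = el.tag === 'script' && Boolean(el.src) && !el.async && !el.defer;
    // Simplification: treat every external stylesheet as render-blocking.
    const blockingSheet = el.tag === 'link' && el.rel === 'stylesheet' && Boolean(el.href);
    if (syncScript || blockingSheet) {
      violations.push((syncScript ? 'sync-script: ' : 'blocking-stylesheet: ') + (el.src || el.href));
      if (mode === 'enforce') continue; // enforcing: leave it out of the page
    }
    kept.push(el);
  }
  return { kept, violations };
}
```

Running the same policy in report mode first gives the "see the impact before applying it live" step: the violation list is identical, but nothing is removed.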


> However, it seems to me that what the author is proposing is really just more/better adherence to html/web specs...no?

Pretty much, but buzzwords/PR work: "CPP Compliant" vs. "We build stuff properly (for a given value of properly)".


No, there are terrible antipatterns that browsers enable so as not to break compatibility. A strict, performance-sensitive mode for the web platform (HTML/CSS/JS) is long overdue. AMP is a pretty great solution, and the only issue with it has been the proprietary feeling. The browser support is the cherry on top.

The argument that people will just use best practices out of sheer goodwill or even just basic competence has been thoroughly debunked. No one optimizes for performance if they are not penalized.


COP? Content Optimization Policy?

That's the quickest alternative I can think of to CPP, though I'd be fine with CPP in any case.


No alternative can take AMP's place; it has now become a standard, and there are hundreds of scripts and plugins available for it. I am using WordPress plugins https://talktopoint.com/wordpress-amp-and-instant-articles/ and they are working just fine. What is the need for this new alternative, and why should I use it?


We already have a standardized alternative to AMP; it's called HTML.

Just cut the crap.


Shouldn't this be a white-list? If we restrict the features allowed we won't run into all the same security and performance issues we have with the web.

(also, we could call it HTML Light, or htmll)


> No synchronous external scripts nor blocking external stylesheets

I know we have:

    <script type="text/javascript" src="script.js" async></script>

but for stylesheets, why can't we have something similar? Instead of a JS workaround, can't we have:

    <link rel="stylesheet" href="sheet.css" async/>

I hate hate HATE the idea of having CSS depend on some JS (even if enabled) that might or might not run depending on whether it feels like working today.


In the future (right now only Chrome supports this), you will be able to do this:

    <link rel="preload" href="/assets/stylesheet.css" as="style" onload="this.onload=null;this.rel='stylesheet';">
which will more or less be async CSS. You can see that it will download in the background and morph into a stylesheet when it's ready, while the document continues to be parsed below it.

Then it will just be a matter of including this for people with JS switched off:

    <noscript>
      <link rel="stylesheet" href="/assets/stylesheet.css">    
    </noscript>
And then for browsers which have JS enabled but don't support the resource hint 'preload', you could do something like this as a fallback:

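    window.addEventListener( 'load', function sweepUnloadedPreloads() {

      window.removeEventListener( 'load', sweepUnloadedPreloads, false );

      // Browsers that support preload have already run the inline onload handlers.
      var probe = document.createElement( 'link' );
      if ( probe.relList && probe.relList.supports && probe.relList.supports( 'preload' ) ) {
        return;
      }

      // Only promote preloads meant to become stylesheets
      // (rel=preload is also used for fonts, scripts, etc.).
      [].slice.call( document.querySelectorAll( 'link[rel=preload][as=style]' ) )
        .forEach( function( item ) {
          // Switching rel makes the browser fetch and apply the sheet; also
          // appending a second <link> would load the same stylesheet twice.
          item.rel = 'stylesheet';
        });

    }, false );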
The sketchy hypothetical fallback technique above, or any JavaScript CSS loader, could be augmented by using prefetch to attempt to get the tyres warm and start a low-priority download of the stylesheets in question.

    <link rel="prefetch" href="/assets/stylesheet.css">
And obviously there's Service Worker, which is also slim on support, but promises to turn your website into a near-native experience by providing the mother of all caches for resources and offline pages/resources.

Preload spec: https://www.w3.org/TR/preload/

Preload support: http://caniuse.com/#feat=link-rel-preload

You could also just put the link element(s) specifying your stylesheet(s) in the body to 'async' it — Stripe does this on stripe.com. It's not valid HTML but very few browsers seem to give a damn.


> It’s also the only JavaScript allowed: author-written scripts, as well as third party scripts, are not valid.

Minor nitpick: they're allowable in iframes. I.e. sandboxed.


http://xkcd.com/927 comes to mind



