Show HN: Make your site’s pages instant in one minute (instant.page)
1214 points by dieulot on Feb 9, 2019 | 337 comments

Live demo on my website @ https://sysadmincasts.com/

Temporarily added it inline for testing. I was already at the sub-100ms level, but this just puts it over the top! Also updated all admin add/edit/delete/toggle/logout links with "data-no-instant". Pretty easy. Open your browser's developer tools and watch the network tab. Pretty neat to watch it prefetch! Thanks for creating this!

ps. Working on adding the license comment. I strip comments at the template parse level (working on that now).

pps. I was using https://developers.google.com/speed/pagespeed/insights/ to debug page speed before, then working down the list of its suggestions. Scoring 98/100 on mobile and 100/100 on desktop. I ended up inlining some CSS, converting most images to base64 and inlining them (no extra HTTP calls), heavily caching DB results on the backend, writing the CMS in Go, and using a CDN (with content cache), all to get to sub-100ms page loads. Pretty hilarious when you think about it, but it works pretty well.

If I mouse-over the same link 10 times, it looks in my network tab like it downloads the link 10 times.

I'd expect this preload script to remember the pages it's already fetched and not duplicate work unnecessarily. :/

Perhaps the author could add a script parameter, or support an optional 'preload-cache-for' attribute, so you'd write <a preload-cache-for="300s" ...>
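A TTL-based dedupe like that attribute suggests could be sketched as follows. This is Python for illustration only; instant.page itself is client-side JavaScript, and every name here (`PrefetchCache`, `prefetch`) is hypothetical:

```python
import time

class PrefetchCache:
    """Remembers already-fetched URLs for a TTL so repeated hovers
    don't trigger duplicate downloads (hypothetical sketch)."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._fetched = {}   # url -> timestamp of last fetch
        self.fetch_count = 0

    def prefetch(self, url):
        now = time.monotonic()
        last = self._fetched.get(url)
        if last is not None and now - last < self.ttl:
            return False     # still fresh; skip the duplicate request
        self._fetched[url] = now
        self.fetch_count += 1  # stand-in for the actual network fetch
        return True

cache = PrefetchCache(ttl_seconds=300)
for _ in range(10):            # hovering the same link 10 times...
    cache.prefetch("/episodes?page=4")
print(cache.fetch_count)       # ...fetches only once
```

In practice (as replies below note) the browser's HTTP cache, driven by Cache-Control headers, is the more robust place to solve this.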

If you really care about speed anyway, you should already have set up your site to max out caching opportunities (ETag, Last-Modified, and replying 304 Not Modified to If-Modified-Since queries). I'd suggest the author ensure the script supports caching to the broadest extent possible, hitting your site only whenever appropriate.

Cache-Control headers already do a better job of solving that problem https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Ca...

Set your http cache headers correctly to instruct the client to save and re-use pages it downloads.

Yeah, this is likely something I need to look into. Since the site changes quite a bit between logged-out and logged-in states, I'm not really doing much HTML caching right now. I'll check that out, though. Even something like a 30-second cache TTL might work. I'd have to see if that's even possible and test it, though. Maybe force an update on login? I'm not an HTML cache pro, so I'd need to see what the options are and then do some testing. For now, it was just fast enough / not worth the time to do anything other than a fresh page for each request. But this is a good suggestion. Thanks.

ps. I do tons of caching for images and videos, though. I know those never change, so I have weeks of caching enabled.

But how do you know it's still fresh?

HTTP has multiple cache control headers. It's a fairly complex topic, but TL;DR: do the config and it works in any browser since ~2000 (yes, even IE6).
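The revalidation dance those headers enable can be sketched like this. A minimal sketch, not tied to any framework; `make_etag` and `handle_get` are made-up names:

```python
import hashlib

def make_etag(body):
    """Derive a strong ETag from the response body."""
    return '"%s"' % hashlib.sha256(body).hexdigest()[:16]

def handle_get(body, if_none_match):
    """Return (status, headers, payload), honoring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag, "Cache-Control": "max-age=30, must-revalidate"}
    if if_none_match == etag:
        return 304, headers, b""   # client copy is still fresh; send no body
    return 200, headers, body

# First request: full 200 with an ETag the client can cache.
status, headers, _ = handle_get(b"<html>...</html>", None)
# Revalidation: client echoes the ETag back; server answers 304.
status2, _, payload2 = handle_get(b"<html>...</html>", headers["ETag"])
print(status, status2, len(payload2))  # 200 304 0
```

The 304 response carries no body, which is why conditional requests stay cheap even for dynamic pages.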

Very impressive! Wonder if it'd be worth rerouting some of the paginated URLs like the episodes `?page=4` to `/page-4` or something like that. :shrug: - either way, looks like you had some fun optimizing it!

/page/4 would probably be more prudent

I don't like this as much since a savvy user might then try to navigate to /pages/ - which has what on it?

Good idea. I'll check that out! Thanks.

I just released version 1.1.0 which allows whitelisting specific links with query strings like those. https://instant.page/blacklist

Awesome, thank you!

This is what the OP should have posted. Very impressive. Will use in my work.

I like that there's an extra optimization where it cancels the preload if your cursor leaves and the prefetch hasn't finished yet. Would help on really slow networks and pages that time out.

nothing on my network tab in ff 65.0

ditto. It works on Chrome though (sigh)

Why the sigh? It says right on the site that this gracefully degrades on browsers that don’t support it. Why is it a problem making a site faster in a browser designed for speed, if it does not degrade the experience at all in all other browsers?

That is OK, if you are talking about older browsers or browsers on limited / obscure platforms. Firefox doesn't fit the bill, and optimizing the site just for ~~IE~~ Chrome hurts the FF users and makes Chrome win even more - leading to self-fulfilling claim that Chrome is faster. It's not, but it will be if people optimize for it. This is one of the reasons I make a point to always develop in Firefox and later just check in Chrome (note that I still do check, because majority of people use it) - apart from simply better experience of course. :)

Not working for me either, Chrome works great though.

Embedding base64 images isn't really more efficient on HTTP/2 servers. Base64 adds overhead, and multiplexing mitigates the cost of additional network requests.

But unless you use server push, it still is an additional roundtrip.

Yes, but since it's multiplexed into the same TCP stream, it doesn't suffer from slow start; the TCP window is already large, so it's not as bad as it would be on HTTP/1.
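The base64 overhead mentioned above is easy to quantify: 4 output bytes per 3 input bytes, roughly 33% larger on the wire (before any compression, which can claw some of that back):

```python
import base64
import os

raw = os.urandom(30_000)          # a ~30 KB "image"
encoded = base64.b64encode(raw)

# base64 emits 4 output bytes per 3 input bytes: ~33% larger.
print(len(raw), len(encoded), len(encoded) / len(raw))  # 30000 40000 1.333...
```

Whether that overhead beats the cost of a separate request depends on the protocol: under HTTP/1.1 the saved round trip often wins, under HTTP/2 multiplexing it usually doesn't.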

UPDATE - Feb 11. I removed the include for now. I was seeing a weird caching issue where, when people log in or out, the navbar would not update correctly (for the first few requests after). I'm still digging into this. I likely need to invalidate the browser cache somehow. Doing some research to see what the options are.

Have you signed up for a page speed monitoring service, like https://speedmonitor.io/ ?

I'd be very curious to know what your performance looks like over time, especially as it relates to various improvements that you try out.

Or maybe Pingdom, NewRelic, or gtmetrix?

I wonder about breaking the page into two parts, above and below the fold. Above the fold would have the base64 images and everything inlined; below the fold would get the analytics script loaded.

I've been testing it for the past 30 minutes or so and found that it doesn't cause the same problems that InstantClick did (namely, JavaScript errors that would randomly occur). I'll limit it to a small subset of users to see if any errors are reported, but there is a good chance this could go live for all logged-in users. Maybe even all website visitors if all goes well.

Seems to have no impact on any javascript, including ads. Pages do load faster, and I can see the prefetch working.

Just make sure you apply the data-no-instant tag to your logout link, otherwise it'll logout on mouseover.

> Just make sure you apply the data-no-instant tag to your logout link, otherwise it'll logout on mouseover.

Logout links should never be GETs in the first place - they change states and should be POSTs.

POSTs are not links. And a logout endpoint is idempotent, even if you consider that it changes the state of the system.

Lots of people in this thread confusing “idempotent” with “safe” as specified in the HTTP RFC: https://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html

FWIW RFC 2616 was obsoleted by the newer HTTP/1.1 RFCs: https://tools.ietf.org/html/rfc7231#section-4.2

Which still doesn't change GP's point though:

> In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe".

(there's an exception listed too, but doesn't apply to logout)

EDIT: I know of someone who made a backup of their wiki by simply doing a crawl - only to find out later that "delete this page" was implemented as links, and that the confirmation dialog only triggered if you had JS enabled. It was fun restoring the system.

I don't know why you think I'm contradicting them. I was just pointing out that there are newer RFCs. They also happen to have a stronger and more complete definition of safe methods.

Ok, so make it a form/button styled to look like a link.

Idempotency is not the issue, the issue is that a user might hover over the logout link, not click it, then move on to the rest of the site and find they are logged out for no reason.

Right, which is why the included library includes an HTML attribute to disable prefetch on a given link.

OP’s point was that logout should not be implemented with a link/GET but instead with a button/POST for exactly this reason.

A logout action is idempotent, though. You can't get logged out twice. In my opinion, that's the use case for a GET request.

I just checked NewRelic, Twilio, Stripe and GitHub. The first 3 logged out with a GET request and GitHub used a POST.

Idempotency has nothing to do with it. Deleting a resource is idempotent as well. You wouldn't do that via GET /delete

A GET request should never, ever change state. No buts.

Just because a bunch of well known sites use GET /logout to logout does not make it correct.

Doing anything else, as demonstrated in this and other cases, breaks web conventions; the right thing to do is:

GET /logout returns a page with a form button to log out.
POST /logout logs you out.
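The GET-renders-a-form / POST-mutates-state split can be sketched as a tiny dispatcher. A hedged sketch, not any real framework; `handle` and the session store are illustrative:

```python
sessions = {"abc123": "alice"}   # toy session store

def handle(method, path, session_id):
    """Logout only mutates state on POST; GET just renders the form."""
    if path == "/logout" and method == "GET":
        # Safe: prefetchers and crawlers hitting this change nothing.
        return 200, ('<form method="POST" action="/logout">'
                     "<button>Log out</button></form>")
    if path == "/logout" and method == "POST":
        sessions.pop(session_id, None)   # the actual state change
        return 303, "See Other: /"       # redirect after POST
    return 404, "Not Found"

status, _ = handle("GET", "/logout", "abc123")    # prefetch-safe
assert "abc123" in sessions                        # still logged in
status, _ = handle("POST", "/logout", "abc123")   # real logout
print(status, "abc123" in sessions)  # 303 False
```

A real implementation would also attach a CSRF token to the form, which is exactly what makes the POST immune to the link-trolling described elsewhere in this thread.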

Depends on your definition of “state.” A GET to a dynamic resource can build that resource (by e.g. scraping some website or something—you can think of this as effectively what a reverse-proxy like Varnish is doing), and then cache that built resource. That cache is “state” that you’re mutating. You might also mutate, say, request metrics tables, or server logs. So it’s fine for a GET to cause things to happen—to change internal state.

The requirement on GETs is that it must result in no changes to the observed representational state transferred to any user: for any pair of GET requests a user might make, there must be no change to the representation transferred by one GET as a side-effect of submitting the other GET first.

If you are building dynamic pages, for example, then you must maintain the illusion that the resource representation “always was” what the GET that built the resource retrieved. A GET to a resource shouldn’t leak, in the transferred representation, any of the internal state mutated by the GET (e.g. access metrics.)

So, by this measure, the old-school “hit counter” images that incremented on every GET were incorrect: the GET causes a side-effect observable upon another GET (of the same resource), such that the ordering of your GETs matters.

But it wouldn’t be wrong to have a hit-counter-image resource at /hits?asof=[timestamp] (where [timestamp] is e.g. provided by client-side JS) that builds a dynamic representation based upon the historical value of a hit counter at quantized time N, and also increments the “current” bucket’s value upon access.

The difference between the two is that the resource /hits?asof=N would never be retrieved until N, so its transferred representation can be defined to have "always been" the current value of the hit counter at time N, and then cached. Ordering of such requests doesn't matter a bit; each one has a "natural value" for its transferred representation, such that out-of-order GETs are fine (as long as you're building the response from historical metrics).

Don't be a wise ass; by that definition, state changes all the time in memory and registers even when no requests are made.

> So, by this measure, the old-school “hit counter” images that incremented on every GET were incorrect

Yes they are incorrect. No Buts.

Two requests hitting that resource at the same exact timestamp would increase the counter once if a cache was in front of it.

That brings me back to the year 2001, when my boss's browser history introduced Alexa to our admin page and they spidered a bunch of [delete] links. cough cough Good thing it was only links from the main page to data, and not the actual data. I spent the next few days fixing several of the problems that conspired to make that happen...

As in, anybody with a link to /delete could delete things? No identification/authentication/authorization needed?

> I spent the next few days fixing several of problems that conspired to make that happen...

Yes, I was a total n00b in 2001. But then, so was e-commerce.

and fwiw, I knew exactly how bad our security was... I kept my boss informed, but he had different priorities until Alexa "hacked" our mainpage :p

If you're not allowed to change state on GET requests, how do you implement timed session expiration in your API? You can't track user activity in any way on GET requests, yet you still have to remember when the user was last active.

Idempotence is for PUT requests. GET requests must not have side effects.

I've heard this "get requests shouldn't have side effects" argument before, but I don't think it works. At least, not for me, or I'm doing something wrong.

For example: Let's implement authentication, where a user logs in to your api and receives a session id to send along with every api call for authentication. The session should automatically be invalidated after x hours of inactivity.

How would you track that inactivity time, if you're not allowed to change state on get requests?

I think this is the argument for PUT instead of POST, not GET instead of POST.

You're confusing idempotency and side effects. A GET should not have any side effects, even if they are idempotent.

It's not about idempotency but about side effects. The standards say that if a request causes side effects, use POST. Logging out does cause a side effect (you lose your login) and hence should be a POST.

In the old days it might have been acceptable to get away with a GET request but these days thanks to prefetching (like this very topic) it's frowned upon.


GET is also supposed to be “safe” in that it doesn’t change the resource which a logout would seem to violate.

The whole reason this is supposed to be the case is in order to enable such functionality as this instant thing.

Also: sometimes a site is misbehaving (for myself, or maybe for a user we're helping) and it's helpful to directly navigate to /logout just to know everyone is rowing in the same direction.

Using a POST, especially if you're building through a framework that automatically applies CSRF to your forms, forecloses this possibility (unless you maintain a separate secret GET-supporting logout endpoint, I guess).

When I originally started my community site I used GET for logout. However, users started trolling each other by posting links to log people out. It wasn't easy to control, because a user could post a link to a completely different site, which would then redirect to the logout link. So, I switched to POST with CSRF and never had another issue.

That's exactly the problem with idempotency.

actually, no. Idempotency means that you can safely do the same operation multiple times with the same effect as doing it once. That's a different issue than the no-side-effects rule which GET is supposed to follow.

Thanks, how did you find out about data-no-instant tag?

Each time you hover over a link, it does a GET request bypassing the cache (cache-control: max-age), even if you hover over the same link multiple times. Also, this will throw off all your analytics... Except that, indeed, this can greatly improve the user's sensation of speed.

The analytics should only be triggered if the page is rendered, assuming it's done client-side. I believe Google does this for the top 3 results, if I'm not mistaken.

This helped me quite a bit; my mental model of how this worked was off. Prefetching only downloads the resources but does not actually execute any JS code. So sites with lots of tag managers or other JS-loading-JS likely wouldn't benefit, but standard GA, etc. should be fine.

Added bonus: these extra preloads on hover will tell you if someone nearly clicked a link. Your web server logs become a poor man's eye tracking. Could help determine whether important parts of your pages (warning messages and so forth) gather enough attention.

Seems like an invasion of privacy. I'd be surprised if someone hadn't used this sort of script for this already. Perhaps browsers or ad blockers will block this feature in the future.

There's already other libraries out there like Fullstory, which tracks all the user mouse movements and interactions with the page and allows you to watch a user interact with your site in nearly realtime.

There are full tracking tools that do a lot more than this already - tracking all mouse movements and key presses. They are often used with the knowledge of users when testing UI changes/experiments, I've even heard of more general versions running as a browser plug-in for medical monitoring (watching for changes in coordination of people with degenerative conditions, without it feeling like an active test accidentally biasing results), though I would also be surprised not to find the idea already used more widely (and perhaps more nefariously) without users knowing.

Good point.

What would be a good way to avoid prefetch requests in your analytics if you only derive analytics from server access logs?

Most browsers pass a "Purpose: prefetch" or similar HTTP header in the prefetch request that you can use to differentiate.
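With such a header, server-side log analytics can drop (or separately count) prefetches. Browsers have used different header names over time, so treat the marker list in this sketch as an assumption to verify against your own access logs:

```python
# Header name/value pairs browsers have used to mark prefetch requests.
# This list is a best-effort assumption, not an exhaustive spec.
PREFETCH_MARKERS = {
    ("purpose", "prefetch"),
    ("x-purpose", "prefetch"),
    ("x-moz", "prefetch"),
    ("sec-purpose", "prefetch"),
}

def is_prefetch(headers):
    """True if the request headers mark this as a speculative prefetch."""
    norm = {k.lower(): v.lower() for k, v in headers.items()}
    return any(norm.get(name) == value for name, value in PREFETCH_MARKERS)

hits = [
    {"Purpose": "prefetch"},       # hover-triggered preload
    {},                            # real navigation
    {"Sec-Purpose": "prefetch"},   # newer marker
]
real = [h for h in hits if not is_prefetch(h)]
print(len(real))  # 1
```

The same predicate, inverted, gives you the "nearly clicked" hover counts mentioned above.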

Then how do you know when they actually go to the page? Do you need client side analytics at that point since the browser already has the page in memory?

You could obviously do it with JavaScript by pinging the server to tell it that log entry X is to be promoted to a true "hit".

For a non-JS solution, I guess a tiny iframe at the bottom of the HTML page that accesses a special server page with a unique stamp could cause the same "hit". The iframe loading would mean that the rest of the page had mostly loaded before it was closed.

> For a non-JS solution

How do you use this library with JS disabled?

The iframe would have to be part of the HTML document from the very start. Maybe a server-side pre-processor that appends it as it's served.

If a GET request is all it takes for your analytics to be messed up, then your analytics are not reliable in this era of bots pinging everything.

You need to be emitting your analytics events from a rendered/executed page. Preferably with JavaScript; a fallback <noscript> resource link to a tracking URL can work here.

What's more important - analytics, or the user experience. Damaging stats sucks, but this seems to outweigh that.

Clearly you're not in middle management.

In a lot of legit cases, analytics. Especially versus a UX improvement that is clearly minor. The word "analytics" has a bad rep, but it's massively important to know whether you're building the right product for your users; without some sort of analytics you're shooting in the dark.

Without analytics, I don't justify my project soooo

Might be worth adding something like a Pre-Fetch: True header to pre-fetched requests. But then the problem is, if the user pre-fetches and then actually views it, how does your analytics know unless the client then sends another request?

So would setting erroneous prefetch headers on all requests help to spoil analytics insights?

It seems everybody is missing this, but this could actually slow down your experience, and I'd guess it will in some scenarios (i.e., not only in theory).

Consider a user hovering over a bunch of links and then clicking the last one, all within a second. Let's assume your site takes 3 seconds to load (full round trip) and your server handles only one request at a time (I'm not sure how often this is the case, but I wouldn't be surprised if it's true within sessions for a significant number of setups). Then the link the user clicked would actually be loaded last, after all the others; this probably drastically increases loading time.

The weak spot in this reasoning is the assumption that your server won't handle these requests in parallel. Unfortunately, I'm not experienced enough to know whether that happens, but if so, you should be careful and not assume that the additional server load is the only downside (which likely is a negligible one).

It actually cancels the previous request when you hover over another link

Client side canceled doesn’t necessarily translate to server side canceled.

I used to use a preload-on-hover trick like this but decided to remove it once we started getting a lot of traffic. I was afraid I’d overload the server.

I'd also hesitate to waste resources in such a way.

About your first statement, though: which server software do you use that still sends data after the client has closed the connection? Doesn't it use heartbeats based on ACKs?

It doesn’t send the response to the client, but it still does all the work of generating the response.

I use nginx to proxy_pass to django+gunicorn via unix socket. I sometimes see 499 code responses in my nginx logs which I believe means that nginx received a response from the backend, but can’t send it to the client because the client canceled the request.

I admit I haven’t actually tested it directly, but I’ve always assumed the django request/response cycle doesn’t get aborted mid request.

The server is still doing all of the work in its request handlers regardless of whether client closed the connection.

Not if the server is setup correctly.

That doesn't make sense. You can't just "config a server" to do this. Even if a web framework tried to do this for you, it would add overhead to short queries, so it wouldn't be some universal drop-in "correct" answer.

Closing a connection to Postgres from the client doesn't even stop execution.

> You can't just "config a server" to do this.

Unless you are focusing on the word "server" and assuming it has nothing to do with the framework/code/etc., I can assure you it can be done. I've done it multiple times for reasons similar to this situation. I profiled extensively, so I definitely know what work was done after client disconnect.

Many frameworks provide hooks for "client disconnect". If you set up your environment (a more appropriate term than server, admittedly) fully and properly, which isn't something most do, you can definitely cancel a majority (if not all, depending on timing) of the "work" being done on a request.

> Closing a connection to Postgres from the client doesn't even stop execution.

There are multiple ways to do this. If your DB library exposes no methods to do it, there is always:

pg_cancel_backend() [0]

If you are using Java and JDBI, there is:


Which does cancel the running query.

If you are using Psycopg2 in Python, you’d call cancel() on the connection object (assuming you were in an async or threaded setting).

So yes, with a bunch of extra overhead in handler code, you could most definitely cancel DB queries in progress when a client disconnects.
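The shape of that extra handler overhead is cooperative cancellation, the same pattern behind Go's Context and C#'s CancellationToken. A generic sketch below, with no real DB driver involved (psycopg2's `cancel()` would be the actual mechanism); the chunked loop and timings are illustrative:

```python
import threading
import time

def run_query(cancel, chunks=100, chunk_time=0.01):
    """Simulate long-running work that checks a cancellation flag between
    chunks, the way a handler would call conn.cancel() on disconnect."""
    done = 0
    for _ in range(chunks):
        if cancel.is_set():
            break              # client went away; stop doing work
        time.sleep(chunk_time)  # one "chunk" of query work
        done += 1
    return done

cancel = threading.Event()
# Simulate the client disconnecting shortly after the query starts.
threading.Timer(0.05, cancel.set).start()
done = run_query(cancel)
print(done < 100)  # True: most of the work was skipped
```

The cost is exactly what the parent describes: every unit of work has to check the flag, which is overhead on the short, happy-path requests that never get canceled.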

[0] http://www.postgresql.org/docs/8.2/static/functions-admin.ht...

I don't think it cancels database queries.

Depending on the framework it can. That's the purpose of the golang Context and C# CancellationToken.

I believe PHPFPM behaves in this way. When the client disconnects from the web server, their request stays in the queue for a worker to pick up, I don’t believe there is a way to cancel it.

Per my cursory reading of the source code it looks like it might only prefetch one link at a time: https://github.com/instantpage/instant.page/blob/master/inst...

If that's not how it works, it could easily be modified to add a throttle on how many links it will prefetch simultaneously.

This library should only be handling cache logistics, moving a CDN cache into the browser. Otherwise it is ill-advised, for the reasons specified.

Every site that can use a last mile performance optimization like this should already be serving everything from some form of cache, either from varnish or a cdn. So in theory, availability of the content should not be the problem.

It also has a minimum time of hovering, so if you're going by it quickly it won't fetch anything.

Many people browse the web from an employer who has rules about what types of pages may be accessed. For example, a person applying for a job with my team may include a link to a web page about their job-related background -- portfolio.html or whatever. HR tells us to be sure we don't follow links to any other page that may be more personal in nature, such as a page that reveals the applicant's marital status (which can't legally be considered in hiring decisions here). HR doesn't want to deal with complications from cases where we reject an applicant but there's a log entry showing a visit to, say, family.html from our company's IP address block. We'd prefer that prefetching isn't a default.

There's also log analysis to identify the set of web pages visited by each employee during work hours, and an attempt to programmatically estimate the amount of non-work-related web browsing. This feeds into decisions about promotions/termination/etc. Prefetching won't get anyone automatically fired, but we'd still prefer it isn't a default.

If you need people to design their websites around your company's HR metrics collection for them to work, then your HR department's metrics are the problem. The easiest way to improve productivity at your company might be to drastically cut HR funding; maybe HR should collect some metrics on how much time HR spends building datasets that aren't accurate.

I don't think it's a matter of HR necessarily. HR has guidelines because a lot of their job has to do with compliance. Big companies absolutely need large HR departments because a lot more regulations apply.

Which regulation states that promotions should be based on browser history?

While this is an approach you could take, it's not a very user friendly one. I agree with you that those HR policies sound draconian, but OP isn't necessarily in a position to challenge or change them.

In addition to being less user friendly, having the mindset that users/visitors to your website must live and work in ideal settings means that whatever you create will tend to be fragile and brittle because you don't try to take into account situations that you haven't seen before.

Jesus. I hope they pay you well for that.

I've heard a lot of stories of ridiculous rule-by-HR culture, but that's so extreme it sounds made up.

I don't think it's made up, because I experienced the same thing in a pretty well-known European research center...

Of course, had I known about these practices in advance, I would have declined the job offer. But I didn't. I ended up quitting a few weeks later anyway.

IT would monitor all connections from all employees and send a report to upper management with summary statistics, on a monthly basis.

I was told this was the case by a fellow worker during my second day there, so I tunneled my traffic through my home server via SSH. When IT asked me why I had zero HTTP requests, I reminded them that monitoring employees traffic was illegal under our current legislation. Doing this in a university-like non-profit research center is hard to justify.

So they asked why you were surfing the web over an insecure protocol that could compromise internal data?

Couldn't you just say, "I just don't use HTTP anymore because this company's data is very valuable to me"?

I don't see why it's hard to justify. They are providing facilities for you to perform the work they request, not for your personal benefit.

Invert the scenario: if they told you that you had to do work-based research on your own personal Internet connection, would that be OK? Any overage charges are yours to pay, no compensation.

The part about viewing family.html seems kind of understandable. If you assume no bad actors, then it's crazy... But we're all developers here, we know that you have to assume the existence of bad actors, and assume that they are going to target you (which is why we you always validate data client data server-side). I could see how viewing family.html could turn into a real headache for HR/Legal, especially if the law says that you can't discriminate based on family information.

The other part about log analysis seems crazy, though, I agree with you on that.

We're talking about evidence in a discrimination court case that points to an IP address associated with a company that visited /family.htm around the time someone applied for a job they didn't get. Like, that person went through their blog's access.log when they got home, defeatedly looking up IP addresses, and going "aha, jackpot!"? And everyone in the company hovers links to be sure they don't go to /as-a-black-man.htm during the hiring process? And the fear seriously is that prefetching might be what spurs this chain of events?

That sounds batshit insane.

Yes, I can't see how anyone could think this is sane in any sense. Brb putting my kids in my GitHub profile picture.

Just go right for the kill and put your marital status, ethnicity, and sexual preferences right below your name on your resume. That way they're trapped the instant they open it!

A while back I saw a chrome extension that hid profile pictures on GitHub specifically for this reason.

One of my biggest career goals is never to work in a company like that. A company that micromanages me to the point they search my fucking internet history to decide my termination doesn't deserve my technical skills.

Double True!

The second hn item today that made me agree that developers need a union.

I always let my team browse Facebook, if they wanted. One of my top people browsed it the most out of everyone. If you block a page, then they will just use their phone.

If you are going to measure, then measure outputs. Measuring inputs will make you and your team equally unhappy.

Measuring output! Especially true for those whose job requires more thinking than actual labour. It doesn't matter how much time someone spent on it, as long as it gets done. And often, you actually need to play a game, relax, or do something different before coming back and solving the problem in seconds, compared to staring at it for hours and not getting anywhere.

Sounds terrible: An automated analysis of an employee's behaviour linked to promotions and even terminations? Should be outright illegal.

The web browsing of other people is their private matter. If you think someone is surfing the web too much (or taking too many coffee breaks, or leaning on the shovel for a minute, or any other normal little break people take from work), it's on leadership to tell the person to get back to work, or, more generally, to create a work environment where work flows more naturally.

Logs are for investigations in case of crimes etc.

I read so many things here on HN that are illegal in Germany. We have laws and powerful worker representation that prevent dehumanizing stuff like that, but, often, things started in the US find their way over here....

> There's also log analysis to identify the set of web pages visited by each employee during work hours, and an attempt to programmatically estimate the amount of non-work-related web browsing.

If your company is doing that, then do not browse on company time and/or using company equipment at all. Ever. They obviously don't (or for regulatory reasons, can't) trust you, so you should treat them as an adversary for your own good.

Remember: HR exists to protect the company, not the employees.

Any companies with such onerous policies could block sites like instant.page at the firewall.

Why would they block being handed extra leverage over their employees?

I had seen that kind of firewall before, when a salesman presented what his firewall was capable of at a company I worked with. I'm not sure how I felt back then.

I mean that firewall was used to track every website you browse, and other evil stuff.

>This feeds into decisions about promotions/termination/etc.

Where do you work that makes tolerating that level of idiotic behavior worth it? Is the job super interesting, or is the pay above market rate? If not, there are much greener pastures, my friend.

Companies that treat their employees like morons eventually push out everyone who is not one.

That sounds like hell.

That story about 'personal' pages sounds like an excuse for tracking what you do on your computer. I'd look for a different job if I were you to be honest. It sounds like a toxic environment to work in.

> Many people browse the web from an employer who has rules about what types of pages may be accessed.

I think only very few people browse the web from an employer who has rules about what types of pages can be fetched. As a web developer, if I can make a faster experience for 99% of my population at the cost of potentially annoying the HR department of some tiny fraction of them, I'm going to do it. And I won't feel bad about making it slightly more difficult for my site's visitors' management to effectively surveil their browsing habits -- not my problem!

Could you say what company this is, so that I can make sure I never work there?

You can surf with JavaScript turned off, or use your private mobile phone.

You should find a different company to work for.

Just wondering, which continent are you located in?

Does your company tell its employees about their Orwellian policies upfront when hiring, or is it a public secret?

How do you deal with encrypted traffic, e.g. https?

Some companies simply filter web traffic via corporate white/blacklists, maybe you have some insights why an illusion of freedom has been chosen in your company?

Sounds like prefetching is good for you, since greater prefetching provides greater plausible deniability.

Tbh I would have been fired the next day.

P.S: Using HN at my office. Was looking at career pages of other software companies a while ago.

I hope I never end up in your dystopian part of the world.

Is there a novel about this company?

Is this a re-rebranding? I remember using something similar 4-5 years ago (instant.js/instantclick).

But quite an interesting little thing, especially useful for older websites to bring some life into them. The effect is very noticeable.

Kind of. It’s different than InstantClick in that it uses <link rel="prefetch"> and can thus be embedded in one minute, while InstantClick transforms a site into an SPA and requires additional work.

It’s a different product. The initially planned name for it was “InstantClick Lite”.
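For the curious, the mechanism can be sketched roughly like this (a simplified illustration, not the actual instant.page source — the real script is linked elsewhere in the thread; the 65 ms delay and the data-no-instant opt-out are real details of the project, the rest is assumption):

```javascript
// Pure part: decide whether a link is worth preloading.
function isPreloadable(href, currentUrl) {
  let url;
  try { url = new URL(href, currentUrl); } catch { return false; }
  const current = new URL(currentUrl);
  if (url.origin !== current.origin) return false; // same-origin pages only
  if (url.href === current.href) return false;     // already on this page
  return true;
}

// DOM part (browser only): inject <link rel="prefetch"> after a short
// hover delay, cancelled if the mouse leaves the link first.
if (typeof document !== 'undefined') {
  document.addEventListener('mouseover', (event) => {
    const a = event.target.closest('a[href]');
    if (!a || 'noInstant' in a.dataset) return;    // data-no-instant opt-out
    if (!isPreloadable(a.href, location.href)) return;
    const timer = setTimeout(() => {
      const link = document.createElement('link');
      link.rel = 'prefetch';
      link.href = a.href;
      document.head.appendChild(link);
    }, 65);                                        // ~65 ms to confirm intent
    a.addEventListener('mouseout', () => clearTimeout(timer), { once: true });
  });
}
```

The split matters: the eligibility check is pure and easy to test, while the browser glue stays tiny.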

Oh interesting, I'll give this a go.

I've seen this kind of feature several times in libraries. And sometimes I can't un-think it while my mouse is hovering over links in normal life. Is it prefetching or not?

Hmm, I wonder whether, if the user isn't asked to authorise the action, it technically breaches the Computer Misuse Act (UK).

If you send a requested page, that's obviously fine. Normal use of websites is expected, but unilaterally instructing your site visitor's browser to download further unrequested content that's not part of a requested resource ...?

What about downloading tracking js you didn't ask for then?

Since nothing is actually executed until you click on the link, the only issue would be those on metered connections.

This is a feature available on most web frameworks today (for example Link's prefetch on Next.js), but it could still be very useful for smaller websites and other static pages not using such frameworks.

I'd be a little wary of using a script from an unknown person without being able to look at the code - I'd rather see this open source before using. Especially being free and MIT licensed, I don't see why it wouldn't be open.

In the technical details, he has a link to the open source on github. Here's the js that's actually doing the preloading: https://github.com/instantpage/instant.page/blob/master/inst...

I stand corrected then, thanks for sharing :) I missed that part!

Just go directly to the script url: https://instant.page/1.0.0

The code is not obfuscated or minified, very easy to read.

It probably should be minified if the whole point is to improve page load times.

Compression and caching make any minification of small scripts more than negligible.

Perhaps you meant less than negligible? Or simply negligible? But not more.

Minification is free and is done only once. In our case, the script is .9kb compressed (2.9kb uncompressed). When minified, it goes down to .6kb compressed (1.1kb uncompressed). It’s a small improvement, but there’s no reason to ignore it.

Saves over 50% of bandwidth ... "it's a small improvement". ^_^

They mean the 300 bytes shaved from gzip to minified+gzip.

Both are under the MTU of a TCP packet.

It appears that the source code [1] is linked from their Technical Details page.

[1] https://github.com/instantpage/instant.page

Isn't everything client-side on the web inherently open-source?

Is there something similar available on Django?

This should be completely backend agnostic. I was never a Django person, but you’d just put that script tag in your main template so it loads on every page.

I use Django and have done similar things. Yes, this should be backend agnostic.

Perhaps this is something that browsers should be doing, instead of websites themselves?

Enabling it by default Internet-wide seems like it could be a bad idea for many reasons. If I go out of my way to enable it on my site, I am taking responsibility for the bandwidth and any side-effects of prefetching a link and understand what I am doing. But if it is simply enabled Internet-wide, isn't that bordering on a DDoS? What about poorly-coded websites/apps where GETS are not idempotent or have side-effects? What about server-side analytics/logs tracking HTML downloads?

> What about poorly-coded websites/apps where GETS are not idempotent or have side-effects?

They're already broken, exposing that is a good thing.

Breaking the web is not a good thing. Regardless of how you think things should be done.

Indeed, breaking the web by misusing GET is not a good thing. By extension, keeping the web broken by not exposing this breakage is not a good thing either.

As mentioned in another comment, if somebody used a GET http link to logout from a webpage, you would end up with a ton of surprised users. People who read articles by highlighting the text with the mouse would also probably hover over all of the links and would end up wasting bandwidth for no reason.

> if somebody used a GET http link to logout from a webpage,

If you violate the standards, your website doesn't work. Who knew?

> People who read articles by highlighting the text with the mouse would also probably hover over all of the links and would end up wasting bandwidth for no reason.

"For no reason" is obviously wrong, making the web snappier is a reason.

Maybe browsers should only prefetch links on bloated websites since their owners clearly don't mind wasting bandwidth.

This is just an ignorant response. The history of the internet is littered with pragmatic solutions to standard vs. non-standard approaches for exactly these reasons. See: <image>, Referer header, the HTML standard as a whole.

By the way, if your standard contradicts a popular methodology, it's probably a bad standard.

> By the way, if your standard contradicts a popular methodology, it's probably a bad standard.

You can't assume a methodology is good just because it's popular. That's how you get cargo cults.

But if a methodology violates the standards, it's almost certainly bad.

Vulnerable isn't broken

The heuristics to exclude logout links and the like would be very disruptive. Those decisions need to be in the website author's hands.

However, I think if browsers had this, but off by default until seeing tags to enable it along with any exclusions, that would be great.

I think it would only prefetch GET links, which never have side effects.

There's nothing stopping GET requests from having side effects.

It's like pointing to a list of best practices and saying "everyone surely follows these."

For example, someone changed their signature to `[img]/logout.php[/img]` on a forum I posted on as a kid and caused chaos. The mods couldn't remove it because, on the page that lets you modify a user's signature, it shows a signature preview. Good times.

I think it was a joke as GET requests are not supposed to change anything, but often they do (probably because many devs don't know about, understand or respect the RESTful concept).

EDIT: For completeness, I have to add, that I am also part of the group of people who have violated that concept. Maybe neither frequently nor recently, but I did it too :-/
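The failure mode is easy to demonstrate with a hypothetical route handler (my own sketch, not from any real framework): when logout is reachable via GET, merely fetching the URL — from a prefetcher, a crawler, or an <img> tag — fires the side effect, while moving it behind POST is safe because prefetchers only issue GETs.

```javascript
// Hypothetical route handler illustrating side-effecting GET vs. safe POST.
function handleRequest(method, path, session) {
  // Broken: a mere GET (e.g. from a prefetcher) logs the user out.
  if (method === 'GET' && path === '/logout-broken') {
    session.loggedIn = false;
    return 302;
  }
  // Safer: the state change only happens behind POST.
  if (method === 'POST' && path === '/logout') {
    session.loggedIn = false;
    return 302;
  }
  if (method === 'GET' && path === '/logout') {
    return 405; // Method Not Allowed: GET stays side-effect free
  }
  return 200;
}
```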

> understand or respect the RESTful concept

It's nothing to do with REST. It's part of the HTTP spec and has always been, that "GET and HEAD methods should never have the significance of taking an action other than retrieval".

Well, if I am not mistaken, REST is just the articulated concept on which HTTP was built. So yes, the HTTP spec (probably) existed before REST became a term itself, but in the end, there is no reason to argue if REST defines it or HTTP.

> There's nothing stopping GET requests from having side effects.

> It's like pointing to a list of best practices and saying "everyone surely follows these."

It’s not a ‘best practice’ it’s literally the spec for the web.

What percent of developers do you think have even read the RFC?

Browsers take a more practical approach than "well, it's in the spec, they should know better" which is apparently what you're suggesting.

It's the same reason browsers will do their best to render completely ridiculous (much less spec-compliant) HTML.

To prove your point: If I remember correctly HN votes are sent as GET.

You're typing this comment on a site that has a GET link to logout.

This phrase "GET link" I keep seeing makes sense, but strikes me as odd. Is that to differentiate from an "a" tag that triggers JS that makes a fetch/xhr with another method? The only non-JS non-GET request I'm aware of is a form action (POST by default, GET if specified) which can hardly be called a link, unless I'm wrong to equate link with "a" tag.

Form actions are actually GET by default (think search forms). You need to explicitly use <form method="post"> for a POST form.

Ah, yep.

It could be a way for browsers to encourage GET to be used more correctly.

Seems like you'd be punishing users instead of website operators since the cause/effect relationship is so unobvious.

User happens to brush over the logout button while using the site. On their next click, they're logged out. Weird. Guess I'll just log in again. Doesn't happen again for some time, but then it does. Weird, didn't that happen the other week? What's wrong with my browser? Oh cool, switching browsers fixed it. You're having that issue, too? Don't worry, I figured it out. Just switch browsers.

It doesn't have to be. Could start by allowing website authors to opt in via a tag in the <head> or something, then opt out on a per-link basis with an attribute (eg preload=false)

I remember using some web accelerator 15-20 years ago. Prefetching all links on webpages. It was really helpful on modem/ISDN connection.

Standalone program on Windows XP (or maybe Windows 98?). It had its own window where you could see which pages it was loading.

Does anyone know the name?

It's funny how much more sense this made on old-fashioned dial-up connections. Back then, as far as I remember, there was no data limit as such. The only thing that counted was connection time. Rather than sitting there reading something while generating ticks, you could better download much of the site and disconnect. An old form of rush to idle.

Doing it by default seems a bit invasive, but I'd be interested in this as a configurable option or plugin!

Embedding via // without explicit SSL should probably be considered harmful or malicious as there is no reason to make such scripts available without SSL. Even if the end website is not using SSL users can still fetch your script securely.

The example snippet uses SRI [1], so there's no security issue with plain HTTP.

[1] https://developer.mozilla.org/en-US/docs/Web/Security/Subres...

It's not supported by IE.

IE doesn't support script type "module" anyway, so it'll ignore the script tag: https://developer.mozilla.org/en-US/docs/Web/HTML/Element/sc...

There’s no security gain from going to HTTPS if the site is served over HTTP, but there’s a small speed hit.

The communication between the user and example.com downloading the page referring to your script is secured by their SSL if they have it.

Separately to that, the communication between the user and your server when downloading your script is secured by your SSL. This can be secure even if example.com is not, so it should always be served securely.

If the first html load isn't on SSL, and someone is able to intercept your traffic, they can change the embedded https url to be a non-https url anyway, so I can't even imagine the attack that is prevented by using https into something loaded over http.

Absolutely correct. But this is the website owner's problem and their consequences for not using SSL. You can't help or prevent this because it's not your server, it's not your fault they enabled insecure communication that can be exploited.

When you forgo SSL on your own server someone can also intercept your script in exactly the same way, they don't need to hack the website embedding your script. Now they are your consequences, your fault there's no SSL, and your problem may be affecting everyone who embedded your script insecurely.

No site should be served over plain HTTP in 2019. Browsers and search engines are actively discouraging/downranking websites that don't use TLS at this point.

None should be, but several are. Just the way that it is.

By the way, since .page is HSTS-preloaded, you may as well include https:// in the code snippet that includes the library. It'll avoid the http-to-https link rewriting internal redirect from happening when included from a non-secure site. It's a tiny performance improvement, but across millions of page views, it might add up.

If the browser sends a Referer header, the page the user is currently on will be sent over plaintext.

For exactly this reason browsers don’t send a Referer header when an HTTP request is made from an HTTPS page. (Nor for any kind of request made from a local file.)


You should NEVER load javascript over https on a page that was served over http.

It gives a false sense of security that doesn't exist.

Because the source page was served over HTTP, it can be modified by an attacker. Making the script load over SSL makes you think it's protected, but it's not, since an attacker could just modify the script tag in transit to remove the https bit, then modify the script that's now being loaded over HTTP.

In short, forcing the script to use https gives you no gain when the page that includes it is served over http, and it tricks you into thinking that javascript asset is secure.

Who does this affect? The attacker can always modify the page however they want anyways and the https for the js source would mean the js itself cannot be tampered with

Does it help much? No. Should you use it? Yes.

Who is being tricked? I don't suspect the user is checking the script tags to see if they are secure. At the very least, this might stop basic ISP tampering used to inject bandwidth warnings, etc.

The programmer that put the script tag in there.

Or just stop using HTTP like a normal person. Who still has pages served over anything but HTTPS?

     Who still has pages served over anything but HTTPS?
Someone who wants to explore our emerging financial liability as we grow increasingly compelled by law to actively protect data in transit (that's SSL), at rest, and in distribution.

     Penalties for violations can be huge, as much as 20 
     million euros or 4% of the company's annual turnover.

No browsers present that to users as a secure context, so nobody is being tricked.

More HTTPS == Better. If you can load this over HTTPS you should, no matter what circumstance. The browser's iconography will handle notifying people when the context is secure or not.

FYI, the entire .page TLD is HSTS-preloaded. So in HSTS-preload compliant browsers, this will be rewritten as an https URL prior to fetching it even if included on an http page. In Chrome, you can use web inspector to see the link being rewritten using a 307 'Internal Redirect' prior to sending the request to instant.page.

Not sure how I feel about this. I often hover over a link to see where it is linking to, see if it has a title, etc. But that's probably not typical of most users. And I don't do it on sites I use often and am familiar with.

I feel this fails a user expectation that simply hovering over a link doesn't inform the server of anything.

It's curious you mention checking where links link to, because I think that's also another user expectation failure. The url that appears in the status bar below (or what was once a status bar) is not necessarily the link's true destination. You can go to any google search results page, hover over the links in the results and compare with the href attributes in the <a> tags. They're different. It looks like you'd be going directly to the page that's on the URL, but you're actually first going to google and google redirects you to the URL you saw.

It used to be that checking the url in the status bar allowed you to make sure the link really would take you where the text made you think it would, but that's no longer the case. It seems one can easily make a link that looks like it would take you to your bank and then takes you to a phishing page.

> I feel this fails a user expectation that simply hovering over a link doesn't inform the server of anything.

I would bet 99%+ of web users do not have a sufficiently detailed mental model of web pages that this is something they've decided one way or the other.

Agreed, my expectations are the opposite.

Google Analytics et al. allow custom events which are used to record mouse-overs, clicks, et cetera on a majority of websites. I always just assume everything I do, down to page scrolls and mouse movements, is recorded.

Yes and people block those for privacy reason.

>I feel this fails a user expectation that simply hovering over a link doesn't inform the server of anything.

The validity of that expectation died with the advent of web analytics probably two decades ago now. The wholly general solution is to disable javascript, possibly re-enabling it on websites you trust. Sites that break without js are oftentimes not worth browsing anyway.

There's usually no penalty, though. You hover, and the page preload begins. Unless you're trying to keep your data usage to a minimum, there's no disadvantage to you.

And if it only preloads the HTML and not related files, it's still going to be minimal

Or prevent law enforcement, or other "overseers" from believing you visited a page.

I can see children getting punkd by drive-by prefetch and reporting to teaching staff that X visited a neo-nazi site or, Y downloaded porn during class, etc..

"Prefetch did it" is probably not going to be apparent to most, and is going to sound like a weaksauce excuse.

On the other hand if you're visiting pages that link to neo-nazi content or pornography just one link away from the page you're currently on, chances are the page you're currently on would violate whatever acceptable use policy you're supposed to be following.

Or you opened a random blog, Reddit, or did an innocuous search, or ...

Unless whoever is patrolling this filtering is completely insane, just show them the page that you were on and how hovering over the link triggers the filter.

Every instance of web filtering I've been subject to in my life just blocks the bad page and the admins expect people to have a few bad requests just by accident or whatever. You'd have to be constantly hitting the filter for it to actually become a real issue.

I'm curious how this is better than Google quicklink (https://github.com/GoogleChromeLabs/quicklink) which is something I have active on my site currently. Can someone with more technical knowledge point out which of these two "instant pages" solutions is better?

Same preloading technique, but quicklink preloads more aggressively.

Why use this script as an include from the instant.page domain? I think if I'm going to use this I'm just going to serve this script up myself from my own servers.

Good call, I just switched my site to hosting the script itself.

Nice idea for HTTP/1.x; however, isn't this what HTTP/2.0 [1] is meant to achieve by pushing components to the user?

1: https://en.wikipedia.org/wiki/HTTP/2_Server_Push

The main difference being that instant.page respects users' data allowances by prefetching only resources that it thinks the user intends to load. You could combine it with H2 push and/or prefetch response headers to improve the load times even more :)

It probably respects their data allowances even less, considering it completely re-fetches the page every time you hover over the link.
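A small TTL cache in front of the prefetch call would address the refetch-on-every-hover complaint (sketch only; the five-minute TTL is an arbitrary assumption and this is not part of instant.page):

```javascript
// Dedupe prefetches: only refetch a URL after its TTL has expired.
const TTL_MS = 5 * 60 * 1000;        // arbitrary five-minute window
const lastFetched = new Map();       // url -> timestamp of last prefetch

function shouldRefetch(url, now = Date.now()) {
  const prev = lastFetched.get(url);
  if (prev !== undefined && now - prev < TTL_MS) return false;
  lastFetched.set(url, now);
  return true;
}
```

In the browser you would call shouldRefetch(anchor.href) before injecting the prefetch link — though the browser's own HTTP cache (Cache-Control headers, as noted above) can already make the repeat requests cheap.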

The difference between the HTTP/2.0 and instant.page is that the preload initiative is on the client, and not the server. I guess you could use both. HTTP/2.0 for linked resources and instant.page to preload based on the user's behavior.

Would you push everything? Or just hovered links?

In all honesty, personally I'd push all adjacent page content. HTML compresses very well; it's minor compared to JS and page images. If the user proceeds to a pushed page, they'd just be waiting for the browser to do the render and collect images. It'd be a compromise, since most images are probably going to be standard page furniture, so likely cached already.

I also staged it on my blog and it's working awesome http://staging.ahmet.im/blog/index.html . I wonder if CMS tools or static site generators (Hugo, Jekyll, Pelican etc) should have an option for rel="preload". But I guess it still requires some JS for preload on hover, so is this library going to take off now?

Great stuff, I started doing this in 2006 but manually. I made an unofficial google toolbar for Opera[1] that (in the unreleased final version lol) also loaded the images from the search pages when one hovered over the toolbar icons.

It took a lot of tweaking to give it the right feel. imho it shouldn't start fetching too fast in case the mouse is only moved over the link. Loading too many assets at the same time is also bad. Some should preload with a delay, and hovering over a different link should discontinue preloading the previous assets.

Perhaps there is room for a paid version that crawls the linked pages a few times and preloads static assets. Who knows, perhaps you could load css, js and json as well as images and icons.

Or (to make it truly magical) make the amount of preloading depend on how slowly a round trip resolves. If loading the favicon.ico (from the user's location) takes 2+ seconds, the html probably won't arrive any time soon.

Fun stuff, keep up the good work.

[1]- http://web.archive.org/web/20130329183223/http://widgets.ope...

Will this cause problems with "links" that are entirely rendered on the client side? i.e. using something like react-router... In that case, could it result in the react app setting invalid state because it thinks the user is on a page when it's not?

My guess is what would happen when "pre-fetching" a react-router link, is that it would prefetch the JS bundle all over again for no gain.

I'm just surprised at how slow my hover-to-click time was (never got it below 100ms). Thought it would be <50ms for sure when trying hard.

I had the same reaction. On a trackpad, I take a casual 300ms to click the damn link.

https://reactjs.org/ does it pretty well too.

Yeah, it's built with Gatsby which has this sort of behavior baked in https://www.gatsbyjs.org/

This is cool, and the license as shown at https://instant.page/license is the well-known MIT license, already known to be an open source software license https://opensource.org/licenses/MIT

A problem with the loading instructions is that it reveals, to an unrelated site, every single time any user loads the site that is doing the preloading. That is terrible for privacy. Yes, that's also true for Google Analytics and the way many people load fonts, but it's also unnecessary. I'd copy this into my server site, to better provide privacy for my users. Thankfully, since this is open source software, that is easily done. Bravo!

If you like to hyper optimize your site like me, and since it doesn't do any good on mobile (Edit: apparently it works on mobile, ignore this), you can have it selectively grab the script on desktop and save a few bytes like this:

    <script type="text/javascript">
      if (screen.width > 768) {
        let script = document.createElement('script');
        script.src = '//instant.page/1.0.0';
        script.type = 'module';
        script.integrity = 'sha384-6w2SekMzCkuMQ9sEbq0cLviD/yR2HfA/+ekmKiBnFlsoSvb/VmQFSi/umVShadQI';
        document.body.appendChild(script);
      }
    </script>

The site claims it works on mobile

> On mobile, a user starts touching their display before releasing it, leaving on average 90 ms to preload the page.

It does mean we're trusting your service - we're executing your JS on our sites. But I do like that you're putting in the SHA hash so that we know you're not fudging it.

Just found that you have the source available too :) So overall, this is pretty cool.

I looked (quickly) through most of the comments below and couldn't answer these questions:

1) What, if anything, is the downside here?

2) Is (Google) analytics affected by the prefetch? That is, does that get counted as a page visit if the link that triggers this prefetch is not actually clicked?


The downside is that your pages’ HTML is loaded twice as much; this makes for additional load on your server.

Client-side analytics like GA aren’t affected.

> The downside is that your pages’ HTML is loaded twice as much

How? It would load the HTML just as often and not download it a second time as that would invalidate the usage

> this makes for additional load on your server.

True for users who hover over links they decide not to visit

Or am I misunderstanding something here?

> How? It would load the HTML just as often and not download it a second time as that would invalidate the usage

I'm guessing you didn't read the linked article? It preloads after 65ms on hover, at which point it estimates a 50% chance that the user will click. Hence "loaded twice as much".

> True for users who hover over links they decide not to visit

Yes, that's the point.

If your html is gzipped, that's in the double digits of kilobytes for most cases. That's nothing compared to images and other content.

Is this kind of like turbolinks which Basecamp uses?

Similar in effect, but not in method. Turbolinks fetches pages after a click like normal, but swaps the body tag from the new page into the current page, cutting local render times.
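A crude sketch of the body-swap idea (illustration only — Turbolinks itself uses a proper HTML parser and also handles history, caching, and script evaluation):

```javascript
// Pull the <body> markup out of a fetched HTML string.
// A regex is enough for a sketch; real code should parse the HTML.
function extractBody(html) {
  const match = html.match(/<body[^>]*>([\s\S]*)<\/body>/i);
  return match ? match[1] : html;
}

// Browser usage would look roughly like:
//   const html = await (await fetch('/next-page')).text();
//   document.body.innerHTML = extractBody(html);
```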

Would it make sense to combine them? Instant turbo links.

Yes, it would make lots of sense. If Turbolinks adds a 'prefetch' mechanism, it will get even faster.

Interesting. So this could be used internally in a web app I guess.

This is a very nice idea, but isn't the problem initial load time most of the time? How could we solve that?

Does this also work for all the outgoing links as well? I don't want to improve other sites rendering time at the expense of my own.

Very cool regardless.

>but isn't the problem initial load time most of the time? How could we solve that?

Off the top of my head, a good way seems to be to write better sites that don't include 10mb of javascript libraries.

It doesn’t work for outgoing links because the gain wouldn’t be as large: the CSS and scripts of the external site need to be loaded in addition to the HTML (only the HTML can be preloaded).

Also, there’s usually no incentive to improve other sites’ page loads.

Works very well for me.

dieulot, is there a small bug with the allowQueryString check?

    const allowQueryString = 'instantAllowQueryString' in document.body.dataset
I think should be:

    const allowQueryString = 'instantallowquerystring' in document.body.dataset

If I have:

    <body  data-instantAllowQueryString="foo">

    'instantAllowQueryString' in document.body.dataset === false

    'instantallowquerystring' in document.body.dataset === true
because html data attributes get converted to lowercase by the browser (I think).

Uppercase is converted to added dashes. `document.body.dataset.instantAllowQueryString` corresponds to `data-instant-allow-query-string`.

Cool thanks! Didn't know that.
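The mapping can be checked outside the browser too — a sketch of the spec's name conversion (my own helper functions, not a browser API):

```javascript
// HTML data-* attribute <-> dataset property name mapping:
// data-instant-allow-query-string <-> dataset.instantAllowQueryString

function datasetKeyToAttribute(key) {
  return 'data-' + key.replace(/[A-Z]/g, (c) => '-' + c.toLowerCase());
}

function attributeToDatasetKey(attr) {
  return attr.slice('data-'.length).replace(/-([a-z])/g, (_, c) => c.toUpperCase());
}

console.log(datasetKeyToAttribute('instantAllowQueryString'));
// → data-instant-allow-query-string
```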
