google.com, the most popular website in the world and one known for incredible speed, took 197 ms to load for me, with 139 ms spent waiting.
A search for "hacker news" took 186 ms, with 111 ms spent waiting.
64.ms itself takes 3 ms for DNS lookup, 39 ms for the initial connection, and 29 ms for the SSL handshake: a total of 71 ms before any content download happens.
I'm on 802.11ac Wi-Fi, seated about 5 feet from my router.
Apparently, users are quite willing to wait longer than 64 milliseconds.
I'm no web developer, so I don't claim to have expertise, but it really does seem like an unreasonable expectation. As soon as you have any sort of complex functionality the idea falls apart:
"Wait for the credit card payment to process"
"Download a high resolution image" (that counts as a page of content right?)
I think your examples hinted at an important part of network-driven applications: loading. Loading statuses, done correctly, and with the right attention to detail, can result in users waiting more patiently than many would ever believe. Sometimes that’s as small as conveying “something is happening” and other times it requires conveying details that communicate why the user must wait so that they can simply think, “that seems reasonable. I understand why that could take some time”.
Instant.page and Quicklink attempt to speed up the perceived loading of pages by pre-loading pages before a user clicks.
The "Test your click speed" button is fun to play with. Times of 50-100 ms on mobile and 150-300 ms on desktop are reasonably long windows for performing some pre-loading.
> To be honest, I have no way of knowing or testing, because ~180 ms is already faster than I can see or react to.
As an avid video game player, I can say 180ms is roughly the visual reaction time of good players. Reaction to sound is maybe 150ms or faster, however.
However, trained players can consistently "time" and perceive fast events. The throw-break window in Guilty Gear is 3 frames, or 50 milliseconds. Players still hit throw-breaks.
There's also the 1-frame jump, which improves jumping speed by 1 frame (17 milliseconds), but requires 1-frame precision.
--------
Street Fighter players have a variety of combos requiring "1-frame links", a window of only 17 milliseconds where the combo would work.
Humans may not "react" to 17 millisecond events, but they absolutely perceive them and can work with them. I've run calculations on musicians (ex: Flight of the Bumblebee), and "proper timing" of songs is also to a precision of less than 100ms.
EDIT: Case in point: a 16th note at 144 bpm (Flight of the Bumblebee speeds) is active for ~104 milliseconds, and probably needs to be "precisely" played at a quarter of that time period. (if you are 25 milliseconds off, a good musician probably will notice)
Humans are way more perceptive of sound than vision. So maybe it's not "fair" to study music or musician perception.
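For anyone who wants to check the note-length arithmetic above, here is a minimal sketch. It assumes the 144 bpm figure counts quarter-note beats with four 16th notes per beat; the function name is only for illustration.

```python
def note_duration_ms(bpm: float, notes_per_beat: int) -> float:
    """Length of one note in milliseconds, given beats per minute."""
    ms_per_beat = 60_000 / bpm
    return ms_per_beat / notes_per_beat


# Assumption: 144 bpm counts quarter notes, so there are 4 sixteenth notes per beat.
sixteenth_ms = note_duration_ms(144, 4)
# "A quarter of that time period", as in the comment above.
tolerance_ms = sixteenth_ms / 4

print(f"16th note: {sixteenth_ms:.1f} ms, quarter of it: {tolerance_ms:.1f} ms")
```

That comes out to roughly 104 ms per note and about 26 ms for a quarter of it, which matches the 25 ms ballpark.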
You can interact with it immediately. You can hit the X to cancel the page load, you can close the tab, you can quit the browser or use another tab. Browser devs already adhere notoriously and fiercely to the ideas in the manifesto.
Whether the app/site you requested can be interacted with depends on whether the app/site devs care about interaction latency.
Talking about network response times is irrelevant to the point here. Clients can, and do, make changes on-screen before the network response.
For me, neither Google nor 64.ms complete the SSL handshake by 64 ms, so I'm not sure which content I'm supposed to be interacting with, assuming my body is physically capable of doing so by that time.
And really, even offline interactive apps have no chance of reacting this fast.
Try opening TextEdit or Mail - they don't even draw the window frame in 64 ms.
Performance is important but normal users only care about it when it's getting in the way of functionality. If we chose our preferences by performance alone, Windows Phone and webOS phones would still be here and Android would be gone, Microsoft Office would be the least popular office suite, and Steam would be the least popular PC game store.
> And really, even offline interactive apps have no chance of reacting this fast.
What do you mean? Lots of games run at 60Hz, and lots of games are considered unplayable if they render frames slower than 64ms. You might consider your terminal broken if it took more than 64ms to respond by displaying your input. https://danluu.com/term-latency/
> Try opening TextEdit or Mail - they don't even draw the window frame in 64 ms.
The OS shows you the app is loading. This seems like you're intentionally trying to prove some point and not listen, rather than being open to understanding. Arguing that an app doesn't respond quickly when the app isn't running seems obtuse.
> If we chose our preferences by performance alone,
Nobody said that, so this is another straw man. The post is suggesting that UI & frontend devs use generally accepted interaction principles, and nothing more.
Note that it says "interactive software application" not "website", though it's possible on a website using a single page application and preloaded resources. Any time you're fetching a resource from some external source, it's unreasonable to expect it to load in under 64ms.
I assume the intention isn't speed alone, but part of the user experience. When you interact with an application, you should be able to tell immediately that it's recognized your input and something is happening.
Will you wait > 100ms for a checkbox to change state? The point is that actions that should be instantaneous need to appear as such within the limits of human perception. Complex operations can take longer.
Sure, I definitely expect a checkbox to do something in < 100ms. Show that its state changed and at the very least show me some kind of loading icon so I know something is happening.
But a full screen of content in 64 ms? Unreasonable, and really a completely undefined unit of measure (what is/how big is a "screen" and how much content can fit within? How do you measure audio or video content? What happens when I'm working with vector graphics that can't possibly render in 64 ms?)
This argument (accidentally) completely aligns with mine - this manifesto makes no distinction between complex and simple applications.
There were plenty of applications in the 80s that could not render a full screen in that timeframe. That did not stop those applications from being useful.
(And I honestly think a DOS directory listing can take longer than 64ms to fill the screen)
It just shows you how silly and arbitrary the manifesto is. Any "interactive software application" should show "at least a full screen of content."
My web browser is an interactive software application and it doesn't even start up and put an about:blank page on my screen in 64 ms. Repeat this statement for literally any GUI application: the containing window doesn't even appear in 64 ms.
And anyway, how big is a "screen"? What is the measurement of a screen of content? Do I have to fill up a 320x480 screen or a 3840x2160 screen? How many characters is a "screen"? Pictures? Video frames?
If you make some kind of transform in Illustrator or Photoshop, why would it be reasonable to give up the quality of the end result just to hit that arbitrary 64 ms goal? Can Illustrator even load/render a simple design like a company logo in 64 ms? A lot of interactive software combines batch processes with live interaction, too, so why would it be reasonable to be unwilling to wait more than 64ms for calculation, rendering, or transformation procedures? Of what use would the unfinished content be if it were shown to me within 64ms?
Essentially no application follows this guideline and never will.
This website is more of an exercise in snark and elitism than anything else.
> My web browser is an interactive software application and it doesn't even start up and put an about:blank page on my screen in 64 ms. Repeat this statement for literally any GUI application: the containing window doesn't even appear in 64 ms.
This is a straw-man. You didn't ask your web browser to open, you asked your OS. Your OS shows you the response to your request to load an app or web browser, and it shows that response immediately. The launch icon blinks, your cursor changes, and a blank frame appears all within the span of a few frames.
> And anyway, how big is a "screen?"
You get to decide. I don't think the post was meant to be taken as so extremely literal as that.
> Essentially no application follows this guideline and never will
Browsers and games already do follow this guideline. I think you're mis-reading what's here, and what's here is already generally accepted and very much used broadly. The audience for this post may be devs unfamiliar with interaction best practices, new web developers who don't think to show changes on screen before a network response, or don't want to do extra work when it seems like a .then() is completely sufficient.
> This website is more of an exercise in snark and elitism than anything else.
Why do you feel it's snarky or elitist? Is it possible you're mis-interpreting this post, or just in a bad mood? As someone who's worked in games and interactive apps, I interpret the post to mean that for user interactions, please keep performance-as-a-feature in mind. There have been many threads on HN on this over the years, and that's just a drop in the bucket compared to all the talk of how to keep interaction from being frustrating to the user.
Is this an example of the kind of reactive, interactive design you're advocating? I don't see how this is helpful: it's dogmatic and doesn't address the actual tradeoffs that might be at play. Why 64 ms? And 32 and 16? What studies are you relying on to suggest that 65 ms is too slow or that 30 ms is too slow for initial feedback? How does it affect user behavior? What is advised when these rules, especially the "full screen of content" cannot be met because of things like external dependencies out of the site's control?
Yeah it's simplistic and doesn't explain the rationale, but having studied human perception a bit, I know without further explanation that this is targeting the time frames of our human ability to see change. The numbers here are similar to the reasons most movies are 24fps.
> What is advised when these rules, especially the "full screen of content" cannot be met because of things like external dependencies out of the site's control?
This is missing the point, and many people here seem to be attempting to pick the same nit. You never have to wait for external dependencies to show changes on screen. This is how browsers and well designed high traffic websites like YouTube already operate. When you take action, you are shown UI changes on-screen that indicate the action is in progress. The manifesto is saying to acknowledge the user action so the user knows the application received the request. It is not saying everything in the world needs to be done in 64ms. Applications can always acknowledge the request immediately, and yet some still don't.
> It is not saying everything in the world needs to be done in 64ms.
Well, most actually could be. And a snappy "acknowledgement" only impresses me the first time. The second time I stare at the loading indicators in disbelief, while waiting for the not-so-snappy action to execute.
What are you suggesting as alternatives that will impress you? Would you prefer to have no indication on screen that something is loading, and no indication that acknowledges you even clicked on something? Do you see a better way to show that megabytes of data are on the way but not here yet?
Loading indicators are simply one single example of how to respond to user actions on-screen. The idea is more broad than this, the idea is for clients to respond immediately to any inputs. This is true for opening menus, clicking buttons, typing, pushing on the D-pad, waving the Wiimote, pressing on your MIDI controller, waving your arms for the camera... it applies to any and all input devices for all applications.
This idea is not in the least bit controversial, I'm surprised by the amount of push-back here in this thread. Devs of games and interactive apps have always done this. If the machine doesn't acknowledge user input immediately, the user doesn't know if the input was captured, and very quickly decides the machine is slow or buggy.
> What are you suggesting as alternatives that will impress you?
In contrast to "not-so-snappy actions"? Snappy actions.
As I tried to say before: Instantaneous feedback is great (even necessary) for first time users. But once I get used to an app, waiting for the termination of a loading indicator is just as annoying as waiting for {page|script}-{load|parse|eval}.
Edit:
> This idea is not in the least bit controversial.
I agree that content generation in one-digit milliseconds is not controversial.
The post's entire point is to say make all actions snappy. That doesn't mean you can finish loading a video instantaneously, it means you can acknowledge the user's request to start loading a video instantaneously.
Waiting for termination of a loading indicator on a web page is irrelevant to this post, it's a tangential topic completely.
"content" in the post is not referring to web page content, it's referring to UI content. We're not talking about content generation in the web page sense.
64ms is still not something you're going to be "waiting" for.
Typical human reaction time is 200ms or so. This effectively means that for 200-264ms you're not priming to repeat the action or wondering whether it "took", because there's an immediate response and the action itself does something useful quickly.
How can I show one full screen of content when it takes longer than 64ms for the content to get from there to here? An acknowledgment that the content is coming is not a screenful of content.
Instead of assuming that "content" must mean "network response", imagine some other possibilities. A full-screen gray-out box with a centered loading spinner over a YouTube thumbnail, along with the thin loading progress meter on the video's timeline is an example of a "screenful of content" that doesn't equate to the network response.
It seems like you're still having a hard time imagining what "content" might mean aside from the network response.
In the context of a UI element or screen layout, "content" is the appearance of the UI elements on screen. The post, and my comment, isn't referring to web content or video content, it's referring to UI content.
Twisting my words to comment on YouTube's quality is funny, but irrelevant to interaction latency.
So it's really 1, 2, 4. Maybe we could have 1, 2, 3, but I'm guessing the "48ms Manifesto" wouldn't sound too great: they probably preferred to have a nice power of two.
And if I may, I think the manifesto is basically right. Something I have to wait several frames for before it reacts feels sluggish. When I type a character, I want it now, not 5 frames later. When I move the mouse, I want it in sync, not lagging behind my hand. Touch screens are even more exacting. They only feel perfect when latency falls under one millisecond. But that last one isn't achievable in practice on current stock hardware…
I don't agree with the page either; it seems better to state the targets in orders of magnitude, maybe (10s of ms, 100s of ms). In their favor, though, the exact times might not be hard limits but maxima in terms of latency, with the hope that you can do better. A point against me is that specific numbers (ones that are essentially memorized by every programmer as powers of two) are easy to remember and good for messaging purposes.
It's impossible. Lightspeed doesn't allow for that on websites if you're far enough from the servers.
If you add intermediate network gear, it's hard enough to get across half the globe in under 180ms (I know; I live in Argentina and that's my ping time to almost anywhere non-local).
You could fake it somehow in some scenarios or with an awful lot of money.
This manifesto is made just for you! Waiting for the response before updating the screen is the wrong answer. It’s not impossible, you’re assuming incorrectly that the manifesto is saying the final result needs to be on screen. It didn’t say that, it said the app needs to respond to action visually, not that the response or interaction sequence must be completed within that time frame.
The right answer for web and networked applications is to update the screen with UI that acknowledges the user action and shows the user that the result is pending. Ideally, progress is visible, but that’s tangential to the point of the manifesto.
A client can, in fact, almost always respond to actions within these time constraints. The point is to do something, rather than wait for the network response.
Speculatively make and cache likely requests. Predict server responses client-side, and update with the server response if it differs. Multiplayer games have been hiding latency since QuakeWorld.
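To make "predict client-side, then update with the server response if it differs" concrete, here is a minimal sketch. The `OptimisticCounter` class and its method names are hypothetical and not taken from any game or framework; it assumes the server acknowledges actions in the order they were sent.

```python
from dataclasses import dataclass, field


@dataclass
class OptimisticCounter:
    """Hypothetical client-side model that shows predicted state immediately."""
    confirmed: int = 0                                  # last value the server confirmed
    pending: list[int] = field(default_factory=list)    # deltas not yet acknowledged

    @property
    def displayed(self) -> int:
        # What the UI draws right now: confirmed state plus local predictions.
        return self.confirmed + sum(self.pending)

    def user_action(self, delta: int) -> None:
        # Record the prediction at once; nothing here waits on the network.
        self.pending.append(delta)

    def server_reply(self, authoritative_value: int) -> None:
        # Assumption: replies arrive in order, one per action sent.
        # Drop the oldest prediction and adopt the server's value, which
        # silently corrects the display if the prediction was wrong.
        if self.pending:
            self.pending.pop(0)
        self.confirmed = authoritative_value


counter = OptimisticCounter()
counter.user_action(+1)      # UI can already show 1
counter.user_action(+1)      # UI can already show 2
counter.server_reply(1)      # server agrees with the first prediction
print(counter.displayed)
```

The display never waits on the round trip; the server reply only matters when it disagrees with what was predicted.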
Yes, and it doesn't work well. But 300ms ping is unusually high. I'm 141ms from HN's server, and that's on a different continent from me. Google is only 10ms. 141ms is playable with good client-side prediction.
It's not that high if you live in a rural area (And I'm not even talking about a third world country, I live in France). Most of the time, the main culprit is Bufferbloat[1].
I'm currently playing World of Warcraft with 250ms of ping, and even for solo level up it's annoying. League of Legends with 350ms isn't fun at all…
Buffer bloat is an example of what this manifesto is motivated by. The hardware's capable of being much faster, and has been set up wrong.
A game dev can't fix it, but it should not be taken as a ground truth of the world. It needs to be fixed. Combine that with a smaller amount of prediction, and things come out pretty well.
Getting a CDN to put your content much closer to users does not require an awful lot of money. The vast majority of your users shouldn't be across half the globe; if you're a tiny local shop, then you're targeting local users; and if you're targeting multiple continents then it's not that hard to host stuff on multiple continents cheaply even if you're just a couple people, there are services available for that.
> Lightspeed doesn't allow for that on websites if you're far enough from the servers.
This assumes that we need remote servers to do our computing for us. :~) P2P software doesn't require remote servers, plus it means your apps still work offline.
In most situations you can have more than one server. If you focus on the 64ms goal, and you budget 20 for ethernet/fiber, that gets you 2000km each way to reach the closest one.
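A rough check of that 2000km figure, as a sketch. It assumes light in fiber travels at about 200,000 km/s (roughly two thirds of c) and that the 20ms budget covers the full round trip with no router or queuing delay; the names are only for illustration.

```python
FIBER_KM_PER_MS = 200.0  # assumed: ~200,000 km/s for light in glass fiber


def one_way_reach_km(round_trip_budget_ms: float) -> float:
    """Distance reachable each way if the whole budget is spent in fiber."""
    one_way_ms = round_trip_budget_ms / 2
    return one_way_ms * FIBER_KM_PER_MS


reach_km = one_way_reach_km(20)
print(f"{reach_km:.0f} km each way")
```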
> It's impossible. Lightspeed doesn't allow for that on websites if you're far enough from the servers.
Your second sentence explains why the first is wrong. Yes, you can’t have the entire world using a single server but it’s never been easier to get servers around the world and the new edge compute services are further improving that.
Cloudflare's got POPs almost everywhere (though densities vary with population of heavy internet users), and has a very good free plan which I use. Have a look: https://www.cloudflare.com/network/
Cloudfront, Google, Fastly, etc. all have pretty good coverage, though you're right they're spottier when it comes to S. America, Africa, and a few other places. Their POPs, from what I understand, have higher capacity but are less dense. Good for cost savings, though probably worse for latency.
There's also the consideration that most web sites have most of their users in the developed world (though that equation is shifting quickly for some).
Out of curiosity, what is the significance of "<<" and ">>"? I'm assuming they're not bitwise shift left and right. I've seen them more often as of late, and they seem to come mostly from non-Americans. I ask because they're quite difficult to run an internet search on without knowing what they are called.
This is a really good example. Did you look at Africa? For the record, 1.2 billion people live there nowadays ;)
> Out of curiosity, what is the significance of "<<" and ">>"
Oh, you mean '«' and '»'? They are French quotation marks[1], I try to use the English ones '“' and '”' when I'm writing English, but sometimes I forget. Same for the space before the question and exclamation mark (there is a space in French, but not in English).
Yeah, I looked at it. There are datacenters within 2000km of everywhere, and that's including the Sahara. About 1500km outside of the Sahara.
1500km as the crow flies, so let's say 2400km of fiber. At the speed of light in fiber, that's 11.5 milliseconds. Double it, add time for ten routers, that's 27 milliseconds to go from any ISP line to a Cloudflare data center.
Sounds good to me. If there are infrastructure problems that's a separate issue from whether Cloudflare has enough POPs.
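The estimate above can be reproduced roughly as follows. The refractive index of 1.44 and the 0.4 ms per router are my assumptions, picked because they land near the 11.5 ms and 27 ms figures quoted; real paths will vary.

```python
C_KM_PER_MS = 299_792.458 / 1000   # speed of light in vacuum, km per millisecond
FIBER_INDEX = 1.44                 # assumed refractive index of the fiber
ROUTER_DELAY_MS = 0.4              # assumed delay per router, back-solved from 27 ms


def one_way_fiber_ms(distance_km: float) -> float:
    """Propagation time through fiber for a given cable length."""
    return distance_km / (C_KM_PER_MS / FIBER_INDEX)


def round_trip_ms(distance_km: float, routers: int) -> float:
    """Out-and-back propagation time plus a fixed delay per router."""
    return 2 * one_way_fiber_ms(distance_km) + routers * ROUTER_DELAY_MS


one_way = one_way_fiber_ms(2400)
total = round_trip_ms(2400, routers=10)
print(f"one way: {one_way:.1f} ms, round trip with routers: {total:.1f} ms")
```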
> Out of curiosity, what is the significance of "<<" and ">>"? I'm assuming they're not bitwise shift left and right. I've seen them more often as of late, and they seem to come mostly from non-Americans.
They're quotation marks in many European languages and I believe (though I'm not sure) that the correlation isn't non-Americanness, it's non-English-keyboardness. People from English-speaking countries will still use the quotation marks you're used to.
You know, when you have to aggressively chop quotes to distort the meaning, it's a good time to pause and reflect on whether you are making a mistake. In particular, ask yourself whether “it’s never been easier to get servers around the world” meant “it's perfect for everyone no matter where they are” or what the actual words mean.
Beyond the obvious point that the growth of commercial cloud options makes it much easier to get servers around the world without having to research providers and negotiate contracts in each country, you might in particular want to spend a couple of minutes looking at current CDN coverage, especially if your experience is either limited or old. Coverage is a lot better globally than it used to be; gone is the time when “Africa” either meant servers in Europe/Israel or, at best, one POP in South Africa. Next, think about services like Cloudflare Workers, Lambda@Edge, etc. and ask what percentage of people on the planet are likely to be close to one of the POPs where you can run code:
People can often respond to things within a few frames (1/60 second = 16ms). For some interactions, it's useful to respond within one frame. For others, more latency is acceptable. For yet others, sub-frame latency is required (e.g. VR/AR). Light can move ~3,000 miles in that time, which is a hard limit on latency.
Controversial? Really? What's described there is basically what's required for interaction to feel instantaneous. Why should we settle for anything but immediacy?
Sure, with all those laggy applications we have right now, we tend to become desensitised. That's not an excuse, though.
Also note that in some cases, we're not even close to acceptable latency. Finger tracking on touch screens, for instance, requires a one-millisecond response time to feel perfect. Otherwise the objects we drag will lag behind our fingers.
If I'm going to have a full screen of content after 64ms, why do I care about having something incomplete but 'actionable' after 32ms? And the policy of showing some feedback before then is useful in some circumstances but not very important in others.
The 64ms goal by itself is clearer and more useful than this combo of three goals.
You also see the background flash as it "renders". The whole website concept is just rotten and dead if you want to even get close to as smooth as Doom in 1995 on computers with less computing power than a shader processor in my mobile phone. Cue the discussion on "speed of light" in this thread while people applaud getting your website to display in under a second.
It's loading favicon.ico, which redirects to favicon.ico/ which returns some kind of html page (larger than the / page itself!), so that seems misconfigured.
It takes 250ms to curl this website. Ping time to the site is less than 64ms, though, and it is cached by Cloudflare, yet it still can't seem to be served that fast. Maybe HTTPS can't get under 250ms or so? The only way for me to get under 250ms is to use HTTP...
Depending on how your environment is configured...
Curl will do one round trip to the resolver configured in your resolv.conf.
Then you have one round trip for TCP.
Then another for the TLS hello (presuming TLS 1.3 or false start).
Then one last round trip for HTTP.
Assuming cached DNS, you need a ping to the server of about 21ms to get content via HTTPS in 64ms. And that's assuming you can get enough content for a screen's worth in the initial congestion window (10ish packets, so let's say 15 KB). Good luck!
Oh, and my DSL connection adds at least 20ms to the path, and that's not unusual.
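The round-trip budget above works out like this. It's only a sketch: it assumes TCP, TLS, and HTTP each cost exactly one round trip, and that server processing and transfer time are zero.

```python
def max_ping_ms(budget_ms: float, round_trips: int) -> float:
    """Largest round-trip time that still fits the budget."""
    return budget_ms / round_trips


# Round trips listed in the comment above; DNS only counts when it is not cached.
WITH_CACHED_DNS = ["TCP", "TLS", "HTTP"]
WITH_DNS_LOOKUP = ["DNS"] + WITH_CACHED_DNS

cached = max_ping_ms(64, len(WITH_CACHED_DNS))
uncached = max_ping_ms(64, len(WITH_DNS_LOOKUP))
print(f"cached DNS: {cached:.1f} ms, with DNS lookup: {uncached:.1f} ms")
```

With cached DNS that is about 21 ms per round trip; add the DNS lookup and it drops to 16 ms, before counting any last-mile delay like the DSL hop.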
By the numbers it sounds like this is built around an assumption of 60fps. The rounding is wrong, but let's roll with it. That's a noticeable change by the next frame, actionable information two frames thereafter, and a full screen worth of content four frames thereafter.
My intuition says that's steep. Noticeable change 1 frame later might be doable with a web app, if the app is structured well and focuses on fast response times. However, I don't think actionable information two frames later, or a full screen of content four frames later is feasible for a network-based application, especially given the fact that just crossing local layer 3 network segments can add 10s of ms to your latency, depending on the router's load.
Really, these numbers strike me as only achievable by an application where all data and processing is done locally on the computer that controls the display. Even then, these target numbers still feel steep.
In a networked app where your latency is too high for these figures, or if you need to do some large number crunching etc, you should instead focus on giving the user feedback that progress is occurring. A progress bar, a spinning ball, etc are far better than nothing. Just as long as you acknowledge that the user has requested an action, and give the user confidence the action is proceeding.
That, I don't disagree with at all. There's been more than one time I've clicked on something, and then waited for the computer to do something that lets me know it's received my command and acted on it. It can be damn frustrating.
I kind of want web browsers to give a budget of layout changes. Once a page does enough things to exceed the budget, too bad, your layout is stuck there.
If the user does something interactive to warrant further layout changes, then you're allowed to start changing things again.
The idea is to make it cost something to change the layout. Then people will have to actually try not to do it excessively.
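As a sketch of how such a budget might behave (purely hypothetical; no browser exposes an API like this, and `LayoutBudget` is a name I made up):

```python
class LayoutBudget:
    """Hypothetical per-page allowance of layout changes."""

    def __init__(self, limit: int) -> None:
        self.limit = limit   # layout changes allowed between user interactions
        self.used = 0

    def request_layout_change(self) -> bool:
        # Once the budget is spent, further changes are refused
        # and the layout stays where it is.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True

    def user_interacted(self) -> None:
        # A real user action earns the page a fresh allowance.
        self.used = 0


budget = LayoutBudget(limit=2)
results = [budget.request_layout_change() for _ in range(3)]
print(results)                      # third request is refused
budget.user_interacted()
print(budget.request_layout_change())
```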
On many sites the layout isn't even relevant, and all people actually want is to read the plaintext and locate any relevant images amongst the chaos. I'm wondering about a browser extension or something that extracts this information directly and re-renders it in simple (like year 2000 era) HTML. I would kill for this on my phone.
> In an interactive software application, any user action SHOULD result in a noticeable change within 16ms
In many cases the input lag (especially on touch screens) is itself more than 16ms. That is excluding the processing from CPU to GPU rendering and your display lag.
Apple IIe managed 30ms total[0]. There's no reason why modern computers, which are thousands of times faster, shouldn't manage 16ms. The slow response of modern systems is the result of a deliberate choice to prioritize ease of development over user experience. We can do better.
The mechanism that gave the Apple IIe such low latency has little to do with processing power - the keyboard had an exotic setup that you likely won't get over something as generic as USB.
Exotic how? Just being 533Hz isn't; you can get very cheap chips that are capable of polling the keys at 1000Hz or higher.
If you really want to minimize latency on a modern system, and can choose whatever commodity hardware you want, you can get the input+rendering+scanout delay below 10ms. At that point it's a matter of how often your screen accepts new frames. And with a high-end monitor, frame wait plus pixel response time can also be under 10ms.
20ms for local end-to-end latency is a completely achievable goal.
Exotic in the sense that pretty much all gear that is polling 1000Hz nowadays is gaming gear. I never implied it was unattainable - I have a 240Hz monitor, G903, and still use the ps/2 port and the games. However I wouldn't doubt the fact that my 240Hz monitor is exotic.
As for touch screen technology, the Apple Pencil, I believe, is only 20ms and is state of the art. While I do believe that developer UX has contributed to lazy resource management, it's also clear that sub-60hz latency has regularly required exotic hardware.
You speak as if ease-of-development doesn't enhance UX. Of course it does. If a team can ship features and fix bugs faster, the UX does improve. Most would say that's worth it over shaving 15ms off the input lag.
It seems as though relying on something a bit more verbose, such as the guidance from Nielsen Norman Group [1], might be more effective in helping people build software.
In particular:
> 0.1 second is about the limit for having the user feel that the system is reacting instantaneously, meaning that no special feedback is necessary except to display the result.
I love this. Software developers too often focus on the final end state and this distracts from the state changes in the middle of any process.
By focusing on UI responsiveness you vastly improve discoverability and interoperability because it naturally decouples the problem into a model/view/controller design. Best of all you can't mis-design: the responsiveness is your constraint, and if the design doesn't give you responsiveness you KNOW it is a poor design.
The fastest I can view and respond to information is maybe 2 times per second if I'm not really viewing anything. So I'd say 512ms makes more sense or maybe 256ms if you want it to "feel" snappier (assuming we are sticking to powers of 2 for some reason). Any higher is looking more for animation speed than usability speed.
I just want at least a filler response within ~150ms when I tap a button on my phone. Occasionally I will miss a touch target. If I haven't seen a response, I will assume I missed the touch target.
16.666...ms (aka 16ms) is 1 complete cycle at 60Hz, the frequency most computer displays operate at. 32ms and 64ms are approximately 2 and 4 cycles respectively.
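A quick sketch of how the three targets line up with frames, assuming a 60Hz display:

```python
FRAME_MS = 1000 / 60  # one refresh cycle at 60Hz, about 16.67 ms

# Express each of the manifesto's targets as a number of 60Hz frames.
frames = {target_ms: target_ms / FRAME_MS for target_ms in (16, 32, 64)}

for target_ms, n in frames.items():
    print(f"{target_ms} ms = {n:.2f} frames")
```

So the powers of two come in slightly under 1, 2, and 4 whole frames (0.96, 1.92, and 3.84).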