A byte is approximately a single letter.
A 4 KB page of memory is approximately a single-sided piece of paper.
A 32 KB L1 cache is 8 pages spread on the desk, though you may prefer to think of it as 512 64-byte post-it notes. You could probably find and read a note in ~2 s.
A 256 KB L2 cache is a 64-page binder on the desk. Plausible to find a page in ~7 s.
An 8 MB L3 cache is a filing cabinet or small bookshelf with 32 binders. It might take a minute to walk over, identify the right binder, pull it out, find a page, and find a sentence.
8 GB of RAM is a sizable library, though far from the largest ever: rows and rows of bookshelves, 512 large shelves in total with 64 books each. If you knew right where you were going, it might take a few minutes to fetch the right book.
A 1 TB SSD is 4 million books, among the largest libraries today, or perhaps a warehouse.
If that 1 TB is instead an HDD, you can't actually browse the stacks. Instead, there's a staircase outside the building, and you can reach in through a window on any floor; the entire building rotates, at the blistering pace of one full revolution every 8 months.
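The arithmetic behind this space analogy can be sanity-checked in a few lines (assuming, as above, one 4 KB page per sheet and 64-page books/binders):

```python
PAGE = 4 * 1024          # bytes per single-sided sheet of paper
BOOK = 64 * PAGE         # a 64-page book/binder holds 256 KB

print((32 * 1024) // PAGE)      # L1 cache: 8 sheets on the desk
print((8 * 1024**2) // BOOK)    # L3 cache: 32 binders
print((8 * 1024**3) // BOOK)    # 8 GB RAM: 32768 books = 512 shelves x 64
print((1024**4) // BOOK)        # 1 TB: 4194304 books, ~4 million
```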
I feel like this would give a better representation of "time to feedback". Saying a 65 ms ping is like 5 years doesn't make a ton of sense, since that's a relatively decent ping time. Saying it's 4 frames correlates directly with its perceived delay, and that's what matters to users.
System event                     Actual latency   Scaled latency
------------                     --------------   --------------
One CPU cycle                    0.4 ns           1 frame (1/60 s)
Level 1 cache access             0.9 ns           2 frames
Level 2 cache access             2.8 ns           7 frames
Level 3 cache access             28 ns            1 s
Main memory access (DDR)         ~100 ns          4 s
Intel Optane memory access       <10 μs           7 min
NVMe SSD I/O                     ~25 μs           17 min
SSD I/O                          50–150 μs        0.5–2 hrs
Rotational disk I/O              1–10 ms          0.5–5 days
Internet call: SF to NYC         65 ms            1 month
Internet call: SF to Hong Kong   141 ms           2 months
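The scaled column is just the actual latency multiplied by a fixed factor, chosen so that one 0.4 ns CPU cycle maps to one 60 Hz frame. A quick back-of-the-envelope sketch (my own, not from the post):

```python
FRAME = 1 / 60                # seconds per frame at 60 fps
SCALE = FRAME / 0.4e-9        # ~4.2e7: one 0.4 ns CPU cycle -> one frame

def scaled(seconds):
    """Map an actual latency to its human-scale equivalent, in seconds."""
    return seconds * SCALE

print(round(scaled(100e-9), 1))        # DDR access: ~4.2 s
print(round(scaled(65e-3) / 86400))    # SF->NYC ping: ~31 days, about a month
print(round(0.065 / FRAME, 1))         # unscaled: a 65 ms ping is ~3.9 frames
```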
For example, in predictive typing anything above ~100 ms (I don't remember the exact number, but the Gmail Smart Compose paper gives a precise measurement) is noticeable and very annoying to a user.
Or video chat: I'd be curious to see how much video latency it takes for a user to notice lag between audio and video.
I know in game dev I was always taught that 60 fps is fine, but then there are gamers swearing up and down when they drop below 140 fps. I'd be curious to see what the maximum allowed latency is here as well.
Another common one is microexpression latency, frequently quoted as 1/20 of a second (50 ms): the maximum latency before an emotion detector starts to miss the microexpression.
I feel like these are the real goalposts in tech; especially in ML, we've got the accuracy/quality, and now we need the speed. Every research paper likes to brag about how many ms one step takes; I'd like to see that compared against the maximum latency tolerable in practical HCI.
The 60 fps mark isn't really latency related. Instead, at 60 fps animation without motion blur feels smooth. At 30 fps you need motion blur or it feels like it stutters (and in 24 fps cinema the camera produces natural motion blur due to shutter time).
The latency aspect is more complicated, as there's a long feedback loop: the frame is shown on the screen, is perceived by the user's eye, the user's brain computes a reaction, the brain sends a signal to the hands, the hands execute the reaction, the input device perceives this and sends the result to the computer, the next frame is computed, and finally it's sent to the screen and displayed. Each of these steps takes on the order of milliseconds (often tens of milliseconds). Faster refresh rates and more fps improve some of them, and in theory any improvement in fps and refresh rate makes the feedback loop faster and thus improves performance. There are diminishing returns because you don't control the entire pipeline (and the parts that happen inside a human are particularly slow), but going from 60 to 140 fps is a big enough improvement that it matters.
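As a toy model of that loop, only the machine-side terms shrink as fps rises; the stage durations here are my own rough guesses, not measurements:

```python
def motion_to_reaction_ms(fps):
    # Stages the computer controls (shrink with fps / refresh rate):
    render = 1000 / fps           # time to compute one frame
    scanout = 1000 / fps          # worst-case wait for the next refresh
    # Stages inside the human (hypothetical averages; do not scale with fps):
    perceive = 40                 # eye + early visual processing
    decide = 80                   # brain computes a reaction
    move = 60                     # signal to hands + hand movement
    input_poll = 1                # input device polling, roughly fixed
    return render + scanout + perceive + decide + move + input_poll

print(round(motion_to_reaction_ms(60)))   # ~214 ms total at 60 fps
print(round(motion_to_reaction_ms(140)))  # ~195 ms at 140 fps: better, but
                                          # the human terms dominate
```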
Another factor is the stability of the latency: humans can compensate for quite high latencies by predicting the correct inputs. But this only works if the latency is consistent. Just imagine trying to hit anything with bow and arrow if you don't know how fast the arrow will fly. That's why frame drops hurt so much: every dropped frame is a giant latency fluctuation.
After using 120/144 Hz monitors for a while, 60 fps without motion blur doesn't feel smooth at all.
The 60 fps mark is really just because that's all LCDs could do for the longest time. It has nothing to do with human vision; it was just the long-standing technological limitation of displays.
Really, this is related to the speed of motion. If everything is moving slowly, 30 fps is fine; if something is moving quickly enough, even 60 fps looks choppy. Try scrolling a long page really fast on your smartphone, for example.
The video/audio sync threshold is well studied. I forget the number, and this is just a gut guess, but I think it's around 20 ms?
Keep in mind frame drops are different from average FPS. Every frame taking 16 ms will feel far smoother than most frames taking 8 ms with spikes of 80 ms (which is far more common).
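A quick illustration of why the average misleads here, using made-up frame times:

```python
steady = [16] * 10          # every frame takes 16 ms
spiky = [8] * 9 + [80]      # mostly fast frames, with one 80 ms spike

def avg_fps(frame_times_ms):
    # Frames rendered per second of wall-clock time.
    return 1000 * len(frame_times_ms) / sum(frame_times_ms)

print(avg_fps(steady))              # 62.5 FPS
print(round(avg_fps(spiky), 1))     # 65.8 FPS: higher average...
print(max(spiky))                   # ...but the worst frame is 5x longer
```

The spiky pattern wins on average FPS yet contains a frame five times longer than anything in the steady one, which is exactly the latency fluctuation the parent comment describes.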
My 144 Hz monitor is a literal game changer. I can move my field of view very quickly and the background environment stays crystal clear throughout, making it easier to spot enemies.
My video professor told us 20 years ago that for a fly in the cinema, it's like watching a slide show.
There is also a space scaling that puts things into another perspective: https://blog.codinghorror.com/the-infinite-space-between-wor...
Also: "if...your application uses microservices...you are essentially turning program function calls into network calls..."
Great post overall, full of nice insights.