Web Performance Profiling: Google.com (requestmetrics.com)
116 points by toddgardner 4 days ago | 83 comments





Am I the only person here who wasn’t thinking about the frontend, but rather all the things that need to happen in the backend to render the search results?

To me it feels like an oversight when answering the question “how the hell is Google so fast?” and not digging into how Google is able to return the results to your actual search query in a matter of milliseconds. That, to me, is the real miracle.


I find this to be the more fascinating part of Google's response time as well. Sending an optimized html file to a client in a matter of milliseconds is cool, but static pages should all load very quickly, so I don't see much surprise here - they've just optimized their front-end and done a good job of it.

Them being able to take your query, discover the data that answers your query, then optimize that data down to a little html snippet that fast is significantly more impressive to me.


And set up ad bidding, customized to you...

There is a lecture from Jeff Dean about it: https://www.youtube.com/watch?v=modXC5IWTJI

It's from 2010, but the fundamentals probably haven't changed.


Sure, but that's a) impossible to dig into from the outside and b) less likely to yield useful information for the average developer who's trying to speed up a website.

It is handy, though, that search results are free to be very "eventually consistent" and easily distributed.

Not discounting the other magic, but exposing read-only data, close to the end user, where freshness isn't a huge concern does simplify things.


Something interesting I noticed is that the further you go down the pages, the slower it gets.

By their own metric, pages 10+ take several times longer to arrive. And most search terms only have a few hundred results at best, even when the header claims billions; you only see the real number when you hit the last page. For example, for me, the word "the" only has 445 actual results (instead of 25 billion), and page 45 takes 2 seconds to complete, compared to 0.7 seconds for page 1.


It is an oversight, but equally for people like me it is helpful - I know how to optimise C++ (maybe not at Google scale on my own!), I've bullied compilers and measured PMCs, but I haven't got a clue how to optimise frontends (beyond my usual solution of giving up and going to do something pleasurable).

yeah me too

When I think of "fast", I always think of a time I was building a product that required subscribing to Twitter's API to receive updates from specific users.

In my testing, I'd set a breakpoint on my server code to see when I'd get the "push" from Twitter's API and would use the Twitter App on my phone to create test tweets. Every single time, I'd create a tweet and my server breakpoint would be hit immediately. Not soon, but immediately. I'd see my breakpoint triggered well before the UI in the app even refreshed after submitting the tweet.


I think every dev should try once in their life to write "fast" code. Learn Rust or C++ or whatever and write an echo server and try every trick you can find to make it fast. Run it a billion times for fun and just bask in it.
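For anyone who wants a starting point, here's a minimal sketch in TypeScript/Node (the parent suggests Rust or C++, but the skeleton looks much the same in any language; the port number is arbitrary):

    // Minimal TCP echo server: every byte that arrives is written straight back.
    // A baseline to measure and then optimize, not a tuned implementation.
    import * as net from "net";

    const server = net.createServer((socket) => {
      socket.pipe(socket); // echo each chunk back to the sender
    });

    server.listen(7777, () => {
      console.log("echo server listening on port 7777");
    });

From there you can benchmark it with whatever load generator you like and start shaving latency off.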

What's really wild is that the architects who envision and oversee implementation of systems that make these things possible earn a tiny fraction of what executives make.

They make 5x+ the next highest IC, sure. But if they worked half as much and BS'ed 100x as much they'd make 500x.

Personally, that makes me wonder a lot more than a clever combination of memcache and map-reduce.


Technology can be understood and reasoned from first principles. Can you really say the same about human relationships, especially the complex kinds that can result in such a logical mismatch? There is a lot of irrationality in the real world.

I have a bot in Java that re-transmits messages between Slack/XMPP/Telegram.

Sending a message in Slack goes like this:

- it is sent to Slack's servers

- the bot is looked up and the data is sent to it

- the bot (on a server in DO) figures out what to do with the message (working with an MQ server running locally :))) )

- it sends the message back to a server (Slack/XMPP/etc.)

- that message gets processed and pushed to the corresponding client

I could never properly measure the time between the original message and the translated message. It was always way way subsecond.

Everything we have now - networks, servers, code - is very fast.

[1] Badly written bot here: https://github.com/dmitriid/tetrad


> I could never properly measure the time between the original message and the translated message. It was always way way subsecond. Everything we have now: networks, servers, code is very fast.

All of that said, if the time is measured in more than triple digit nanoseconds, relative to the hardware capabilities that we have today, it’s slow.

Please do not take that as a reflection on you personally or your work, but rather a reflection on the layers and layers of abstraction we’ve collectively added, and keep adding. While we’ve made it easier for people (especially developers) to write code, we’ve made everything slower, and just continue to mask that with hardware improvements.


I totally agree with you on that conclusion.

One factor that is missing from this post is the processing time on the backend, which is also insanely fast. This post only considers front-end optimizations.

On the author's benchmark, a round trip seems to take 30ms on average, and the time to first byte for the main content is around 140ms, which means that in less than 110ms Google is able to parse the search query and build the HTTP response.

I'm sure they are heavily relying on caches and other optimizations, and for tail-end requests the result might not be as impressive. But compared to many other websites in 2020, this is unfortunately still not the norm.


> for tail-end requests the result might not be as impressive

Yep, it is possible to craft search queries that take multiple seconds to process. Example [1]:

    the OR google OR a OR "supercalifragilisticexpialidocious" -the -google -a
[1] https://news.ycombinator.com/item?id=20605589

I have a related question. How is Instacart so slow? I am usually pretty unbothered by slow load times, but searching for and selecting groceries is a full 10 times slower than any other experience on the internet. Is this a deliberate push to get me to use the mobile app? Some dark pattern thing?

It is remarkably, painfully slow. Every letter entered into their search box (eg when searching a particular grocery store for a product) appears to be doing a full reload of results. You can feeeeel the horrible lag as you try to type and the site can't keep up with either your typed text or the results.

They should be doing a timed release on that: if you stop entering text for N ms, then go get results. Otherwise they need to have cached results for a huge number of common combinations of letters, very frequently updated, for every store. That's a resource-intensive thing to do well at their scale, for such a seemingly simple feature. If they are in fact caching all of those drop-down search results properly, something is very wrong with serving up the cached content.

And for the full results pages for a given grocery store, the only answer I can guess is again mediocre caching; there usually isn't any other culprit for such simple pages. My guess is that, in an effort to match current inventory, their cache entries are constantly invalidated or rarely populated, so they end up doing something that isn't very performant (I'd be astounded if they weren't doing some amount of caching on the results).


This makes sense technically, but it still doesn't answer the question of why it isn't fixed. Is it really that hard to experiment with caching policies? Is there secretly only one developer at instacart?

> They should be doing a timed release on that. If you stop entering text for N ms, then go for a result.

I believe that is called debouncing and it's something any non-junior frontend developer should have in their toolkit.
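For reference, a minimal debounce sketch in TypeScript; the 300ms delay and the element/endpoint names are arbitrary placeholders:

    // Debounce: only call `fn` once `delayMs` have passed with no new invocation.
    function debounce<T extends unknown[]>(fn: (...args: T) => void, delayMs: number) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      return (...args: T) => {
        if (timer !== undefined) clearTimeout(timer);
        timer = setTimeout(() => fn(...args), delayMs);
      };
    }

    // Hypothetical usage: only hit the search endpoint after typing pauses for 300ms.
    const searchInput = document.querySelector("#search") as HTMLInputElement;
    const search = debounce((term: string) => {
      fetch(`/search?q=${encodeURIComponent(term)}`); // endpoint path is made up
    }, 300);
    searchInput.addEventListener("input", () => search(searchInput.value));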


Early on they built a Rails app backed by Postgres [0] to store what is, these days, 500 million items. Then in this post [1] they say they're rearchitecting the database, and mention Snowflake. It sounds to me like they need a document store like Lucene in order to get fast search performance--but they may be optimizing for other use cases like tracking orders, which is probably easier in an RDBMS.

It also looks like they might first do all the work to retrieve all the results for a query. The first request is to an endpoint `search_v3/{term}?cache_key={key}`, then subsequent requests are to `asyncresultset_{n}?cache_key={key}`, so it seems like they have the result set cached from the first query.

The response to the autocomplete endpoint, which is fast, contains an `elastic_id` so perhaps they do partially use a document store.

[0] - https://stackshare.io/posts/the-tech-behind-instacarts-groce... [1] - https://tech.instacart.com/the-story-behind-an-instacart-ord...


It feels fast because most other sites are insanely slow.

Just build a normal HTML+CSS+JS site with server-side rendering, route the assets through a CDN, and voila! Your site will be just as "fast" as Google.


Edit: I was too quick to judge your post. The article is indeed just about serving content, not about why their results are fast.

This reminds me of that joke video about replacing MongoDB with /dev/null. Look at the write speeds! /dev/null is web scale!

https://www.youtube.com/watch?v=b2F-DItXtZs


Lots of servers and wires and things stored in RAM long before you ever ask for them basically

By doing all the calculation in memory. Disk is too slow

There's also flash, which is pretty fast and fits nicely between magnetic disk and RAM.

Here's a handy rule of thumb about relative costs: the cost of a byte on magnetic disk : flash : RAM is about 1:10:100, i.e. the cost increases roughly 10x each time you go up a level.


The article is not about how they come up with the results. Only about how they deliver the page.

The article compares the speed to other websites, like nike.com, which have nothing to do with search engines.


I think the point here is about how to make a page of about the same size faster.

You can make Google searches plenty slow if you Google something uncached.

I just Googled: the OR google OR a OR badger -the -google -a

it took: 5.66 seconds.

This is probably cached now so don't try it yourself, replace badger with some other weird word. :)


The slowness of this query is more related to the negations than the fact that the query is uncached. Most uncached queries will be handled much more quickly. I'm surprised this one takes so long, because I would have expected the negations to eliminate the corresponding branches of the OR during query simplification, but maybe there is some expansion happening before we get to that point that makes it hard to detect this logical conflict.

The sheer frequency of the terms in the OR probably doesn't help either. Normally "the" would be treated as a stopword (https://en.wikipedia.org/wiki/Stop_word), but I'm not sure if that logic applies when it appears alone in a sequence of terms (as when it's the child of an OR).


5.3 seconds. I think you may have found a pathological case. Or we're just all on different PoPs (because I got 0.38 seconds when I did it again, so obviously it was cached the second time).

About 9,33,000 results (5.97 seconds)

Perhaps cache evicted. :D Or probably my PoP is different.


If you inline everything then you can't take advantage of caching those resources. I wonder if there is a fancy way to inline the resources and then somehow use JS to cache the data, set a cookie, and then on the second page load it's even faster because you don't have to resend the inlined stuff.

I had this idea too. I proposed it as an alternative to web bundles.

Include a cache attribute on inline resources to mark them as cacheable. Whenever an empty element with the same attribute is encountered, use the cache. You could use a header with a serialized Bloom filter to convey to the server what has been cached.
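A rough sketch of what that could look like on the client, in TypeScript, assuming a hypothetical data-cache-key attribute and using a cookie (rather than the Bloom filter header) to tell the server what's already cached; none of this is a real standard:

    // First visit: the server inlines the CSS and tags it with data-cache-key.
    // Later visits: the server sends an empty <style data-cache-key="..."> placeholder
    // (because the cookie told it the client has a copy) and we refill it locally.
    async function handleInlineStyles() {
      const cache = await caches.open("inline-resources");
      for (const el of document.querySelectorAll<HTMLStyleElement>("style[data-cache-key]")) {
        const key = el.dataset.cacheKey!;
        if (el.textContent) {
          await cache.put(`/inline/${key}`, new Response(el.textContent, {
            headers: { "Content-Type": "text/css" },
          }));
          document.cookie = `cached-${key}=1; max-age=31536000; path=/`;
        } else {
          const hit = await cache.match(`/inline/${key}`);
          if (hit) el.textContent = await hit.text();
        }
      }
    }

The tricky parts (as with web bundles) are invalidation and making sure the server and client agree on what the cookie actually guarantees.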


Doesn't come as a surprise to me, honestly. Besides the performance reasons, inlining everything also minimizes your dependencies, and now you can be sure that what you send out is what the user will be able to see.

I can also see how this makes integration testing an order of magnitude easier/more effective.


What's the deal with the CSS thing? I've noticed that custom fonts cause considerable slowness, but CSS?

Nothing special about it, it's just another round-trip that you'd have to make.

There's also the statistical impact of many round-trips: imagine a network where 1% of connections take 1 second, while 99% take 1 millisecond. If you issue more than 100 requests, then most page loads will be slow (with 100 independent requests, the chance that every one of them is fast is 0.99^100, about 37%, so roughly two thirds of page loads hit at least one slow connection).


It doesn't need to be an extra round-trip in HTTP/2 or HTTP/3 thanks to server push (which is basically link rel=preload, but without the round-trip part).

It's real sad that the title of this was changed from "How the hell is Google so Fast?"

Conversely, why is Windows search so miserably slow?

Just use voidtools' "Everything". It allows you to search all files, even with regex patterns, and it returns results in real time.

If you want to search file contents, use "Agent Ransack".


Outlook search is even worse; I doubt I would employ anyone with the Outlook Search team on their CV.

Google search can be fast because thousands of computers complete your request at once, whereas the index of junk on your Windows box only has the one little computer to serve it. Same reason why operations in Gmail are so much faster than a local client like macOS Mail: the instantaneous compute power brought to bear on your request while it is running is thousands of times larger than one computer.

While that's true, Windows 10 search is visibly worse than even Windows 7 search, and much worse than MacOS search. And their fuzzy matching algo for the start menu is deranged.

I would guess that aggressive caching and indexing are at least as important as having thousands of computers complete the request at once (and I'd be surprised if the number of computers on the request path is nearly that high--if nothing else, thousands of computers mean you're almost guaranteed to hit p999 latency every time).

> guaranteed to hit p999 latency

That's why people invented the backup request.
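The backup (a.k.a. hedged) request idea is simple enough to sketch; here's a rough TypeScript version, where the URL and the 50ms threshold are just illustrative:

    // Hedged request: if the first attempt hasn't answered within `hedgeAfterMs`,
    // fire an identical second request and return whichever finishes first.
    async function hedgedFetch(url: string, hedgeAfterMs = 50): Promise<Response> {
      const first = fetch(url);
      const backup = new Promise<Response>((resolve, reject) => {
        const timer = setTimeout(() => fetch(url).then(resolve, reject), hedgeAfterMs);
        // If the first attempt settles before the timer fires, skip the backup
        // (the backup promise simply stays pending, which is harmless here).
        first.finally(() => clearTimeout(timer)).catch(() => {});
      });
      return Promise.race([first, backup]);
    }

The trade-off is extra load on the backend for the small fraction of requests that are slow, which is usually cheap compared to blowing the latency budget.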


> How to Be Fast, Like Google

>> Make Less Requests

From 130 requests, that's an odd takeaway.


I don't think you read the article. Devtools shows 130 requests because it counts inline data URIs as requests--in reality there are just lots of inlined images, so the page looks mostly complete after only the original document request.

I did read the article; that's where I picked the two sentences from. I did not pick up from the article that Chrome counts inline data as requests, and I did not know that earlier.

I find it even more odd that inline data is counted as requests.

Edit: I stand corrected, thanks for pointing it out.


> Important: It’s worth noting that each one of these Base64 encoded images is counted as a “request” in the Network tab of Chrome developer tools. This explains why there are so many images “requested” but the page is so fast. The browser never goes over the network to get them!

I paused at that as well; maybe it makes sense to think of it as a "request for additional work", where that work happens to be Base64-decoding the data URI instead of pulling the image from a remote server.

Clearly the article means "make fewer render blocking requests."
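For context, the inlining the article describes is essentially a build-time swap of image references for Base64 data URIs. A rough sketch of that idea in TypeScript/Node, with made-up file names:

    // Replace references to an image file with an inline Base64 data URI so the
    // browser never has to make a network request for it.
    import { readFileSync } from "fs";

    function inlineImage(html: string, fileName: string, mimeType: string): string {
      const base64 = readFileSync(fileName).toString("base64");
      return html.split(fileName).join(`data:${mimeType};base64,${base64}`);
    }

    // Hypothetical usage: `inlined` would then be served in place of index.html.
    const page = readFileSync("index.html", "utf8");
    const inlined = inlineImage(page, "logo.png", "image/png");

The cost is that those bytes are re-sent on every page view, which is the trade-off discussed elsewhere in this thread.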

Climb to the top of a mountain in the USA (or live in the countryside of a developing country), where you can only get 1 bar of 3g service.

Then try to load https://cr.yp.to

Ok now try to do a google search for dog food.

Tell me which one executes in less than a second, and which one hangs forever


Are you comparing search with a static page load?

Just compare the Google page load if you want to compare with that?


I'm just asking you to do the test, because it was forced on me for 6 months of my life :)

Firstly, cr.yp.to is most likely hosted on some really basic consumer-grade hardware, and it has to make an additional hop through Tonga of all places for DNS resolution.

Of course, the page size of cr.yp.to is very small and does not involve any other communication with other servers to deliver a request.

But Google has x million machines, x million miles of fiber, x million sticks of ram, and the page size on google also isn't terribly huge. And it's serving a cached result usually, the robot has already scraped it.

But still, because of the number of network pings that serving a single google search takes, it is extremely common on cell phones to lose service for a millisecond and completely destroy the bidirectional connection between you and Google, and your mobile browser will just sit there hanging forever until you force restart it.


> But Google has x million machines

That's because Google has that amount of traffic. I am sure an HN hug would take the cr.yp.to site down.


Does using the .to TLD really imply that the request has to make a hop through Tonga? I'm not a DNS expert, but that doesn't sound right.

(I ask as the owner of a .so site...)


Short answer, No... Long answer, 'sort of'.

DJB uses a custom nameserver for DNS, so only one of the lookups would have to go through Tonga. The web request wouldn't leave the US (assuming that's where you are).

It goes sort of like this:

1. You query 'cr.yp.to' which your DNS server will either have cached, or make a recursive query to '.' (root DNS), then to '.to.' The '.to.' server will have nameserver information for 'yp.to.' which gets queried next.

2. The nameserver for 'yp.to' is looked up. In this case, it's 'uz5jmyqz3gz2bhnuzg0rr0cml9u8pntyhn2jhtqn04yt3sm5h235c1.yp.to.'

3. The A records for that NS record are '131.193.32.109' and '131.193.32.108', both of which belong to an IPv4 block owned by the University of Illinois at Chicago.

4. Your computer queries the A record(s) for the subdomain, 'cr.yp.to.' from the nameserver for 'yp.to.' which in this case points to the same two IPs as the nameserver.

5. Depending on your OS/Browser, you will make an HTTP or HTTPS connection with that server and begin to download site content.

So, no, you don't "make a hop" through Tonga, but the '.to.' nameserver will receive a request (if it's not cached by your resolver) to resolve the 'yp.to.' nameserver.
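If you want to check the chain above yourself, Node's built-in resolver is enough; a quick sketch in TypeScript (the output of course depends on your resolver and on DJB's current records):

    // Look up the NS records for yp.to and the A records for cr.yp.to,
    // the same records walked through in the steps above.
    import { promises as dns } from "dns";

    async function check() {
      console.log("NS for yp.to:", await dns.resolveNs("yp.to"));
      console.log("A for cr.yp.to:", await dns.resolve4("cr.yp.to"));
    }

    check().catch(console.error);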


I also am not an expert, but Dan J Bernstein (the owner of cr.yp.to) states that his website --- er actually hold on, I think I misremembered the story from his site. Let me paste it here:

2000-11-16: The .to administrators destroy yp.to for reasons that have never been explained.

See more here: https://cr.yp.to/serverinfo.html


That's quite an interesting page.

They had a recent outage after 3 years

> 2017.07.24 roughly 21:50 to 22:10: Unexplained network outage at UIC.

> 2020.04.20 roughly noon GMT: Power outage at UIC.


Hehe, UIC is the bane of cr.yp.to's existence, between network and power instability... Makes you appreciate what the big cloud players abstract away from us!

A cold-cache Google page load here pushes 725kB just to render a logo and a search box. To avoid fingerprinting, my cache is always cold.

Google search is a huge dog


Considering the functionality, what would be your limit to keep google from being considered a "huge dog"?

42 links, a text box, vector logo, 20kb and that's generous

Autocomplete function, dropdown menu JS, 10kb max

Considering this page is viewed by millions, and doesn't even contain the logic to render search results


I was linked https://lite.duckduckgo.com/lite, and even without autocomplete or dropdown functionality, it comes in at 8.5 kB, so I'd say that the 10 kB goal is pretty aggressive.

Reverting to their own search site from 10 years ago?

You might be pleased to notice Google Scholar is pretty much still the old design; such a breeze to use.


That seems unfair, considering there is no lite Google equivalent. The full DuckDuckGo site still comes in lighter (3/4 of the size) and includes a lot more media, so it's still better than Google.

Honestly, Google is one of the modern marvels. Even when I read the article and understand _why_ Google is so fast, I still cannot comprehend it.

To people from 30 years ago, this would be a literal example of magic.


> How is Google so fast?

It is not. At least not in the way the article is talking about. So while yes, requests are fast, it doesn't really feel that way when I can't click anything for a good second after hitting search.

My browser takes roughly half a second crunching JavaScript, doing layout, and drawing the result page AFTER doing a whole boatload of HTTP requests. Annoyingly half that JavaScript runs AFTER the page is rendered (150ms-200ms of JS functions when I can already see results, why?), making the page appear to lag if you immediately try to click/tap something.

If they served the page as static HTML and CSS instead, load times could easily be below 50ms, with another 20ms for my browser to present me with something that is interactive right away.

The impressive part is how fast they generate results.


I mean... is it? It feels a _lot_ slower than it used to, especially on slower devices.

Possibly that 700kB of random crap is implicated :)


Classic HN. Snarky, condescending and nothing of substance. Reminds me why I visit HN less and less lately.

Please don't sneer, including at the rest of the community.

https://news.ycombinator.com/newsguidelines.html


I mean, it's fast _for an SPA that weighs 700kB compressed_. It is, however, substantially slower than it used to be, for no clear benefit.

You want to claim it's substantially slower than before. Do you have data to back that up? If it's based on your personal anecdote, my personal experience is that the speed has stayed roughly similar. It has become more "cluttered" with lots of extra stuff - images, knowledge cards, lists of items, maps, etc. But the page load speed has been mostly similar - fast enough, though not quite instant.

When they first launched search-as-you-type, it did feel instant, but I don't search on google.com anymore; I use the Chrome URL bar.


He makes a valid point; why does a page with a text field and links to other pages need to weigh anywhere close to 700kB?

knowledge cards are a lot more than links


My Chrome devtools says DDG took 900ms to load the search result on that page (searching for "foo bar"), vs Google search taking ~200ms for the initial result.

And subjectively, comparing side by side, Google is faster to display the result.

So your claim of 100x is not only inaccurate in magnitude, it's in the opposite direction as well.



