
Google Homepage Size Over Time - ISL
http://measuredmass.wordpress.com/2012/10/06/google-homepage-size/
======
ashray
I find this interesting as well as unimportant at the same time. Interesting
because it shows that Google has responded to increasing connection speeds by
adding bells and whistles to their homepage. That's really great.

Unimportant because with Google Chrome, Firefox, Safari, etc., going to the
Google homepage is almost entirely optional. Google is working so hard to bake
itself into almost every facet of web functionality that its homepage is
becoming less and less important over time.

The only reason I check the google homepage is to look at the doodles.

~~~
jackfoxy
Strangely, Google is killing off the one thing that keeps me going to their
_sorta homepage_ , iGoogle. When they finally do kill it about a year from
now, I'll have to come up with some substitute which may not be so Google-
centric.

~~~
mithras
Yeah, I don't understand why they're doing that; iGoogle is pretty good and I
use it a lot.

~~~
wilfra
I don't even know what iGoogle is or does.

User adoption is what drives features like this to remain, not how useful they
are.

~~~
MikeCampo
That is true. This feature has been around for many years and used to be
advertised on their navbar before Google+ came about, but it never stood out
as a prominent component of the Google experience. I figure they are pushing
people to adopt Google+ as their "portal" but I'm likely wrong as the two
products are quite different. It could also be that it doesn't generate any
revenue and the experiment is nearing the end of its life cycle.

------
27182818284
I always enjoyed this interesting story:

 _One vigilante sent Google an anonymous email every so often just listing a
number, like 37 or 43. Eventually Mayer and her colleagues figured out it
referred to the number of words on the Google homepage—the implication being
that someone was keeping track, so don't screw up the design_

– The Google Story

~~~
user24
A friend of mine used to keep a post-it note on her monitor with 2,073,418,204
written on it. She was keeping track of the number of pages google reported it
had indexed[1]. We got very excited when it changed to 3,083,324,652.

[1]
[http://web.archive.org/web/20030205061559/http://www.google....](http://web.archive.org/web/20030205061559/http://www.google.com/)

~~~
tgb
Does Google report this anywhere nowadays? I searched for it a while back and
couldn't find it.

~~~
sovok
I found only estimates: <http://www.worldwidewebsize.com/> 40-50 billion
pages. From the site:

"The size of the index of a search engine is estimated on the basis of a
method that combines word frequencies obtained in a large offline text
collection (corpus), and search counts returned by the engines. Each day 50
words are sent to all four search engines. The number of webpages found for
these words are recorded; with their relative frequencies in the background
corpus, multiple extrapolated estimations are made of the size of the engine's
index, which are subsequently averaged. The 50 words have been selected evenly
across logarithmic frequency intervals (see Zipf's Law). The background corpus
contains more than 1 million webpages from DMOZ, and can be considered a
representative sample of the World Wide Web."
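The extrapolation described above can be sketched in a few lines: if a word
appears in a known fraction of a representative corpus, and an engine reports N
hits for it, then N divided by that fraction estimates the index size. (A toy
sketch with made-up numbers; the real method uses 50 words spread across
logarithmic frequency bands and a DMOZ-derived corpus.)

```python
# Toy sketch of the index-size estimation quoted above.
# Assumption: a word's relative document frequency in a representative
# offline corpus matches its frequency in the engine's index.

# made-up relative document frequencies from an offline corpus
corpus_doc_freq = {"the": 0.60, "zipf": 0.0005, "walrus": 0.002}

# made-up hit counts reported by a search engine for the same words
engine_hits = {"the": 30e9, "zipf": 20e6, "walrus": 95e6}

# each word yields an independent estimate: hits / relative frequency
estimates = [engine_hits[w] / corpus_doc_freq[w] for w in corpus_doc_freq]

# average the per-word estimates to get one figure for the index size
index_size = sum(estimates) / len(estimates)
print(f"estimated index size: {index_size:.3e} pages")
```

Averaging over many words smooths out the (large) error in any single word's
estimate, which is why the site sends 50 words a day rather than one.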

------
gklitt
I see the general point being made here, but it's important to put it in
context. The current homepage design is arguably more minimalistic and useful
than any design from 2000 onwards. Links to other Google services are
conveniently located in a bar at the top where they don't get in the way (as
opposed to crowding the area around the search box previously). I can click a
microphone and speak my query. When I start typing, results start loading
instantly. All improvements in my book, and definitely worth it for the
broadband world, which is probably a lot of Google's business at this point. I
think it's important to separate minimalist and usable design from minimalist
HTML -- I'm sure Google is keeping an eye on their page size and making
conscious cost-benefit decisions here.

Now don't get me started on the results page though...

~~~
ISL
I'm not sure I had a point (though minimalism is appealing); when putting the
numbers together, I just wanted to see what the plot looked like.

------
mbell
TFA: "...included only the size of the ‘.html’ file, no images or fanciness."

The '.html' file for google.com has most of the 'fanciness' embedded in it in
the form of a ton of JS and all the CSS.

I think this architecture was put in place when instant search/preview went
live which is the cause of the first large spike. I'm guessing the second
spike is integration with google+.

------
qxcv
This looks incorrect.
[http://web.archive.org/web/20110713000446id_/http://www.goog...](http://web.archive.org/web/20110713000446id_/http://www.google.com/)
is only 28K according to ls -lh.

Perhaps OP forgot to put id_ after the capture timestamp in the archive URLs?
The id_ makes sure that the Wayback Machine only returns the page as it was
when it was indexed.

~~~
ISL
I definitely used id_. I saved the source that I used; I'll recheck things.

~~~
ISL
Should be fixed now. The last point is still ~100k. Running Chromium, if I'm
logged in, google.com's HTML saves as 104k. Logged out, it's 94k, as measured
by ls -lh.

~~~
qxcv
Looks good. I was using cURL, so that explains the ~100k vs. ~28k discrepancy.
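The ~100k vs. ~28k gap comes down to the User-Agent: Google serves a much
heavier page to clients that look like desktop browsers than to plain HTTP
clients. A quick way to check this yourself (a sketch; the exact byte counts
vary over time, by region, and by login state):

```python
# Compare the homepage size served to a bare client vs. one claiming to
# be a desktop browser. The UA strings and URL are just illustrative.
import urllib.request

BROWSER_UA = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/107.0 Safari/537.36")

def page_size(url, user_agent):
    """Fetch url with the given User-Agent and return the body size in bytes."""
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req) as resp:
        return len(resp.read())

if __name__ == "__main__":
    url = "http://www.google.com/"
    print("plain client   :", page_size(url, "curl/7.25.0"))
    print("desktop browser:", page_size(url, BROWSER_UA))
```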

------
kevs
I'd be curious to see a similar plot of average internet speed vs time.

~~~
roryokane
And, correspondingly, a plot of Google homepage HTML file size divided by
average internet speed vs time, so we wouldn’t have to compare the two curves
in our heads.
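The suggested ratio is easy to sketch: divide the page size by a typical
connection speed for each year to get a nominal transfer time. (All the
numbers below are illustrative guesses, not data from the article.)

```python
# Sketch of the "page size over bandwidth" curve suggested above.
# Both tables are made-up ballpark figures for illustration only.

page_kb = {1998: 10, 2002: 14, 2006: 25, 2010: 60, 2012: 100}   # homepage HTML size
speed_kbps = {1998: 33.6, 2002: 256, 2006: 1500, 2010: 3000, 2012: 5000}  # typical link

for year in sorted(page_kb):
    # kB * 8 bits/byte / (kbit/s) = seconds
    seconds = page_kb[year] * 8 / speed_kbps[year]
    print(f"{year}: {seconds * 1000:.0f} ms nominal transfer time")
```

Even with the size growing, the nominal transfer time can fall if bandwidth
grows faster, which is exactly the comparison the two separate plots make hard
to do in one's head.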

------
thingummywut
It's 100 kB now.

If the plot had a similar curve from 0-50 kB, would everyone still be
commenting? What about 0-5 kB?

Is this about the absolute values or the curve?

Unless everyone is expecting exponential page size growth to continue
indefinitely (and ignoring bandwidth increases), I don't see the point.

~~~
oscilloscope
1-5 kB would be even more interesting. That would make Google's flagship page
several orders of magnitude smaller than most websites.

------
jnazario
Would be interesting to compare this to worldwide typical and average
bandwidth availability.

Google used to prize page load times for their search landing page. I wonder
if the increase is coming about because they recognize they can indeed get
away with it and maintain their performance goals, or if they re-shifted
priorities and/or acceptable values for load times.

~~~
paulsowden
Perhaps also the page is tee-ing up for even faster results page access, by
way of instant search.

And I haven't checked, but perhaps most of the kB weight comes in after the
page has rendered, in which case it wouldn't negatively impact perceived load
time.

The page is also more featureful than it used to be, at least in the logged-in
state.

------
chrisbroadfoot
This is ignoring the fact that a whole bunch of assets (CSS/JS) are embedded
into the page to reduce the number of HTTP requests.

~~~
biturd
Wouldn't it be better to have two asset files, one for JS and one for CSS
(unless there's a way to merge those two)?

I don't know how often Google changes that code, but even if it were as often
as weekly, users would gain the benefit of caching those files and loading
them locally, making the page itself even smaller to download.

Or are those two extra HTTP hits really that expensive?

The only reason I can think it's a bad idea is that caching could hurt them.
If the browser doesn't handle caching correctly and they do change the source,
frequently or infrequently, users may see broken pages and/or broken
functionality. Not all users know about Shift-reload, none should need to, and
I find it doesn't work reliably myself.

What is the best practice here? 2 http requests, embedded code on the page,
arbitrary http requests that are cached, no caching, etc.?

~~~
lazyjones
> If the browser doesn't handle caching correctly, and they do change the
> source, frequently or infrequently, users may see broken pages and/or broken
> functionality.

Typically you just use a new URL to get around this ... Frequently changed
cacheable files (css, js) are usually timestamped or contain version
information in the filename.
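The usual pattern (a sketch of the general technique, not Google's actual
pipeline): embed a content hash in the asset filename and serve it with
far-future cache headers. Any change to the file changes its URL, so a stale
cache can never serve the wrong version.

```python
# Sketch of content-hash cache busting: the asset URL embeds a digest of
# the file contents, so changing the file automatically changes the URL.
import hashlib

def versioned_url(path, contents):
    """Return a path like 'static/app.3f2a9c1b.js' for the given contents."""
    digest = hashlib.md5(contents).hexdigest()[:8]
    stem, _, ext = path.rpartition(".")
    return f"{stem}.{digest}.{ext}"

old = versioned_url("static/app.js", b"alert('v1');")
new = versioned_url("static/app.js", b"alert('v2');")
print(old)
print(new)
# different contents -> different URL, so browsers holding the old file
# in cache will fetch the new one without any Shift-reload
```

The HTML then references the hashed URL, which is why "timestamped or version
information in the filename" sidesteps the broken-cache scenario entirely.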

------
ahoge
On average, the size of a website triples every ~5 years.

Nowadays there are lots of really fat websites with tons of resources.
Personally, I think that's kinda convenient. This way it's a lot easier to
make sites which are much faster than those of the competition.

More and more people start to merge all of their CSS and all of their JS
files, then they minify them, and finally the whole thing is gzipped. This
surely helps a lot. However, their sites still continue to grow, because they
continue to add more and more crap.

~~~
rwmj
If only my internet connection tripled every 5 years too.

In reality, mine was 28.8 kbps up to 1999, and it has remained in the 2-3 Mbps
range ever since.

------
lazyjones
I wonder how much of this has nothing at all to do with presenting search
results to the user, i.e. fulfilling the user's expectations... (guess: > 95%)

Does anyone have a good CLI for Google and DuckDuckGo that I could use from
within PuTTY, with clickable links? (Similar to surfraw, but just dumping text
+ URLs from the results to stdout instead of launching a text-mode browser.)

------
zzzwat
?

$ curl -sL http://google.com | wc -c

11328

~~~
ISL
Plot updated to answer your question. There's a big difference between what's
served to curl and what's served to a browser.

Thank you!

------
MikeCapone
Maybe it's psychological, but I feel that since they shifted to SPDY the
encrypted version is more responsive despite the bigger size.

------
dmix
The important thing is iGoogle was killed off.

~~~
tnuc
Not yet, another year. <http://www.google.com/ig>

------
mitchi
It gets cached so it's okay.

------
philip1209
The label of the ordinate contains an error: kilobyte = byte * 1024, as
opposed to byte / 1024.

~~~
drcube
No,

#bytes * (1 kilobyte)/(1024 bytes) = #kilobytes

or bytes/1024=kilobytes, for short.

Basically, it's a conversion factor, not a definition.
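The conversion-factor point in code form (using the 1024-byte convention under
discussion; the example size is just illustrative):

```python
# Converting a byte count to kilobytes: divide by the conversion factor.
BYTES_PER_KB = 1024

def to_kilobytes(n_bytes):
    return n_bytes / BYTES_PER_KB

print(to_kilobytes(104 * 1024))  # a ~104 kB homepage -> 104.0
```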

~~~
abrahamsen
Might want to consider using the unambiguous term kibibyte instead.

<http://en.wikipedia.org/wiki/Kibibyte>

It is a tradeoff, it avoids the need to explain which definition of kilo is
used, at the cost of using a far less common term.

~~~
RobAtticus
Kibibyte doesn't make things clearer in this case. They aren't talking about
whether the factor should be 1000 or 1024. The first poster said that
kilobytes should be bytes * 1024. That is incorrect. The number of kilobytes
is equal to the number of bytes / 1024, not bytes * 1024.

------
spullara
Seems like it goes downhill roughly when Marissa moved on.

~~~
cpeterso
I'm sure someone is measuring yahoo.com since Mayer joined. :)

