
Google homepage doesn't close html tags, on purpose - thorax
http://blog.bug.gd/2009/06/27/the-highest-traffic-site-in-the-world-doesnt-close-its-html-tags/
======
tdavis
I demand to see some evidence supporting this assumption that closing tags
"take up time". It seems to me that logic dictates it would take longer to
parse a broken schema than a valid one. Perhaps leaving out those tags saves
Google $1MM per year in bandwidth costs; it wouldn't terribly surprise me. But
what _would_ surprise me is evidence that browsers are _quicker_ at processing
an invalid document.

~~~
joeyo
I would guess that the rationale for omitting tags was to make their pages
smaller so that the data transfer is quicker and cheaper.

~~~
tdavis
Right, and I'm saying that it's presumptuous to just figure "Oh, fewer bytes
== faster page loads!" The data transfer may be quicker (and their costs
certainly cheaper), but does it do anything to speed up the _experience_ of
the site? Not if the saved milliseconds are lost parsing the monstrously
invalid DOM.

~~~
jerf
Go read the HTML specs. Very, very carefully, and without preconceived
notions. This is not "monstrously invalid" for HTML. Many, many end tags are,
in fact, entirely optional, as defined in the specifications, and many tags
such as "body" are also surprisingly optional.

HTML is not XML.

It might be invalid, and it is "monstrously invalid XHTML", but it is not
"monstrously" invalid HTML.

~~~
sh1mmer
The question is not if it's valid or invalid (which is another debate) but
rather is it faster or slower.

I'd like to see Google provide facts and figures. When Steve Souders worked at
Yahoo! a large part of recommendations were backed up by experimental data
collected by Tenni Theurer (my manager's wife, coincidentally).

I'd like to see Google provide the same kind of experimental data for their
guidelines. If saving 14 bytes is a network saving, where are the page render
tests in browsers, particularly "A-grade" browsers?
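
For the network half of that question, here is a rough sketch of how the saving could be measured. It assumes the 14 bytes are the literal string `</body></html>`; the function name `size_report` and the sample markup are made up for illustration. It says nothing about parse or render time, which is the part that still needs browser tests.

```python
import gzip


def size_report(markup: str, suffix: str = "</body></html>") -> dict:
    """Compare transfer size with and without the optional end tags.

    `markup` is the page without the trailing end tags; `suffix` is the
    text that would be appended if they were written out explicitly.
    """
    without_tags = markup.encode("utf-8")
    with_tags = (markup + suffix).encode("utf-8")
    return {
        # Difference in bytes on the wire if the response is not compressed.
        "raw_saved": len(with_tags) - len(without_tags),
        # Difference after gzip, which is what most browsers actually receive.
        "gzip_saved": len(gzip.compress(with_tags)) - len(gzip.compress(without_tags)),
    }


# Hypothetical page body, not Google's real markup.
sample = "<html><head><title>Search</title><body><form><input name=q></form>"
report = size_report(sample)
print(report["raw_saved"], report["gzip_saved"])
```

The uncompressed difference is fixed by the length of the suffix; the compressed difference depends on the rest of the page, so it has to be measured per page rather than assumed.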

~~~
CalmQuiet
See facts and figures? Absolutely.

In light of the whipping that the "facts" in their PHP optimization
recommendations have received here
( <http://news.ycombinator.com/item?id=676856> ), it's best not to take
pronouncements from a Google post as coming from on high.

------
oliverkofoed
Google is also one of a very few companies where this type of micro-micro-
optimizing even begins to make sense.

~~~
Confusion
No it doesn't: this makes sense for everyone. Scale would be involved if they
were out to save a few bytes per request, to save on bandwidth. However, they
are out to lower the _per user_ load/rendering time, which is completely
independent of the scale of the company delivering the page (assuming the
delivery scales well enough that extra traffic does not lower page delivery
time, but that is usually the case).

~~~
pmjordan
I think he's right. Google's home HTML is only 5kB, which is pretty low these
days, and their images and CSS aren't much more. If your page size is less
than 10kB, those couple of bytes start making a difference. Google have
probably literally optimised everything else they can optimise: an image atlas
to reduce the images to one file, a CDN, a custom web server, a custom OS
kernel, etc., some of which help far more than some closing tag.

~~~
Confusion
Yes, all _those_ things _are_ determined by scale, because there is a clear
cost involved, that requires a certain scale to pay for itself. However, in
the case of closing tags the cost is negligible, so it's something that can be
used by everyone. Google claims (though I still can't find the link that
shows it) that slower loading/rendering of their search result pages makes
people perform fewer searches. That's why they do not allow you to have more than 10
results per page; not even via your personal settings. The same trade off will
hold for every service whose profitability is directly dependent on the number
of page views. So it's not scale, just the sort of business you are in that
determines whether not closing tags makes sense.

------
inimino
The author doesn't seem to know the difference between a tag and an element.
An unclosed _tag_ would of course be something like "<html". It wouldn't even
be accurate to say that Google doesn't close elements; for example the body
and html elements are closed at EOF whether the closing tags are included
explicitly or implied.

A more accurate headline would have been "Google leaves out some optional
tags, as understood by all browsers and permitted by every HTML specification
since the beginning of time".

~~~
thorax
I'm the author-- I feel it's common enough to say "close tags" when you mean
closing or ending a markup element.

I'm surprised to see such a literal nitpick upvoted so much, but I can't
disagree that you're correct technically. I'm just trying to share an
odd/interesting position of Google's and not trying to write a deep analysis or
anything.

~~~
Confusion
It's common to say it that way, but that doesn't make it proper. As George
Orwell argued in his 1946 essay 'Politics and the English language'
(<http://www.orwell.ru/library/essays/politics/english/e_polit>): clarity of
language is closely bound to clarity of thought. Everyone should be encouraged
to write as clearly as possible, to avoid confusion and muddling of thoughts.

Of course, the grandparent overstates his case and is unconstructive as a
result. His fallacy is that he argues that someone must be an idiot, because
they make a trivial error. Which overlooks the fact that trivial errors are
made by experts all the time, exactly _because_ it's a trivial point that
doesn't have their attention. You wouldn't believe the silly errors PhDs fix
in the papers of fellow PhDs.

------
tremendo
This seems disingenuous once I view source on search results and find multiple
inline script elements (some of them not small), inline style/CSS in the head,
and many scattered span tags that also carry inline style declarations. I see
some element attributes not quoted, but at least the href attributes on search
results are indeed quoted.

From what I can see in their source, there is _a lot more_ they could do to
optimize their bandwidth and page load speeds than eating a couple of closing
tags.

~~~
nex3
Requesting an entirely new file for CSS and Javascript is probably
significantly more expensive (time-wise if not bandwidth-wise) than sending
down more bytes for this one page.

~~~
tremendo
Only once, the first time. After that it would come from the browser cache with
a bandwidth cost of zero. If every user makes a few queries daily the savings
would surely be bigger than not closing a few tags here and there. And I still
see lots of onclicks, and spans with multiple CSS class names. Their layout is
not that complicated, surely there's room to optimize that too.
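
A back-of-the-envelope sketch of the bandwidth side of this argument. Every number here is hypothetical (page size, asset size, the ~50 bytes for a `<link>` tag), and it assumes the external file stays fresh in the cache with no revalidation requests. It counts bytes only; it ignores the latency of the extra request on the first view, which is the cost nex3 is pointing at.

```python
def total_bytes_inline(page_bytes: int, asset_bytes: int, views: int) -> int:
    """CSS/JS is embedded in the page, so it is re-sent on every view."""
    return views * (page_bytes + asset_bytes)


def total_bytes_external(page_bytes: int, asset_bytes: int, views: int,
                         link_tag_bytes: int = 50) -> int:
    """CSS/JS lives in a separate file fetched once, then served from cache.

    Each page view still pays for the tag that references the file.
    """
    return views * (page_bytes + link_tag_bytes) + asset_bytes


# Hypothetical sizes: 5000-byte page, 2000 bytes of CSS/JS.
for views in (1, 3, 10):
    inline = total_bytes_inline(5000, 2000, views)
    external = total_bytes_external(5000, 2000, views)
    print(views, inline, external)
```

Under these assumptions the external version costs slightly more on the very first view and less on every view after it, so the answer depends on how many repeat views arrive while the cached copy is still fresh.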

------
ashleyw
Is there any real reason for the body and html close tags (and head for that
matter), other than to fit in with the rest of the syntax? Like something
you'd want to put after the body, but still within the html tag?

    
    
      <html>
        <head>
          Here's our head
        <body>
          Here's our body

~~~
psadauskas
There isn't even a need for the <head> and <body> tags at all. This is valid
HTML 4.01 Strict:

    
    
      <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
      <html lang="en">
        <meta http-equiv="content-type" content="text/html; charset=utf-8">
        <title>Here's our title</title>
        <link rel="stylesheet" href="stylesheet">
        <p>Here's our body
    

And this is valid HTML 5:

    
    
      <!DOCTYPE html>
      <html lang="en">
        <meta charset="utf-8">
        <title>Here's our title</title>
        <link rel="stylesheet" href="stylesheet">
        <p>Here's our body
    

(From <http://meiert.com/en/blog/20080429/best-html-template/> )
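
To see which elements in that HTML 5 example are left for the parser to close, here is a small sketch using Python's standard `html.parser`. The class name `OpenElementTracker`, the helper `unclosed_at_eof`, and the short `VOID` set are made up for illustration. This is not a conforming HTML parser: it does not create the implied `head` and `body` elements, it only tracks start tags that never get a matching end tag in the source.

```python
from html.parser import HTMLParser

# Elements that never take an end tag (partial list, enough for the example).
VOID = {"meta", "link", "img", "br", "hr", "input"}


class OpenElementTracker(HTMLParser):
    """Keep a stack of elements whose end tag has not been seen yet."""

    def __init__(self):
        super().__init__()
        self.open_elements = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.open_elements.append(tag)

    def handle_endtag(self, tag):
        if tag in self.open_elements:
            # Pop everything up to and including the matching element.
            while self.open_elements.pop() != tag:
                pass


def unclosed_at_eof(markup: str) -> list:
    """Return the elements still open when the input ends."""
    tracker = OpenElementTracker()
    tracker.feed(markup)
    tracker.close()
    return tracker.open_elements


html5_example = """<!DOCTYPE html>
<html lang="en">
  <meta charset="utf-8">
  <title>Here's our title</title>
  <link rel="stylesheet" href="stylesheet">
  <p>Here's our body
"""
print(unclosed_at_eof(html5_example))
```

The elements left on the stack at end of input are the ones a browser closes implicitly.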

------
thorax
Note that Google almost never employs any technique on the main search site
unless it _positively_ affects their statistics.

So the fact that these elements aren't closed probably improves (by a
measurable percentage) user engagement/success on their site.

------
audionerd
I remember Anne van Kesteren's site once had something like "it's valid,
sure." in the source code comments. I had looked through it by chance and
couldn't believe someone from Opera would omit _so many useful tags_. But
then, sure enough, it validates just fine.

<http://annevankesteren.nl/>

------
prodigal_erik
There's something deeply wrong with the way web developers learn how this
stuff works. I keep seeing surprised discoveries of facts that haven't changed
since HTML 2.0 came out thirteen years ago.

What is it? Are the books they read (instead of the actual specs) really that
terrible?

~~~
Goladus
Knowing where to start always looks a lot easier in retrospect.

------
gtani
yes, but you want to close frame, img, li, and p tags

[http://msdn.microsoft.com/en-us/library/ms533020(VS.85).aspx...](http://msdn.microsoft.com/en-us/library/ms533020\(VS.85\).aspx#Close_Your_Tags)

~~~
gojomo
Theoretically, perhaps. Practically, I doubt the close-tags proposed by this
MSDN article would make any measurable difference.

In fact, leaving tags out consistently in certain tag-heavy auto-generated
layouts, by reducing network IO and fitting more content into memory buffers
at a time, could be faster.

~~~
dunham
Using the close-tags proposed by the MSDN article would yield an invalid HTML
document. Per the DTD and the SGML standard, the closing tag of IMG, an
element with EMPTY content, MUST be omitted. The element is already closed
before the parser gets to the closing tag - so, in essence, you would be
trying to close it twice.

------
known
Back to Basics <http://www.joelonsoftware.com/articles/fog0000000319.html>

------
thomasswift
The only thing about this I really ever heard was the fact that they strip out
all whitespace, but checking
[http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.co...](http://validator.w3.org/check?uri=http%3A%2F%2Fwww.google.com)
it seems as if they don't use quotes on tag attributes wherever they are not
necessary, along with many other things I was taught not to do in order to
produce valid code.

~~~
dchest
Not using quotes on tag attributes is okay for HTML provided that... refer to
the standards for the complete info.

~~~
martey
From HTML 4 (<http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.2>):

 _The attribute value may only contain letters (a-z and A-Z), digits (0-9),
hyphens (ASCII decimal 45), periods (ASCII decimal 46), underscores (ASCII
decimal 95), and colons (ASCII decimal 58). We recommend using quotation marks
even when it is possible to eliminate them._

From HTML 5 (<http://www.w3.org/TR/html5/syntax.html#attributes>):

 _...the attribute value, which, in addition to the requirements given above
for attribute values, must not contain any literal space characters, any
U+0022 QUOTATION MARK (") characters, U+0027 APOSTROPHE (') characters, U+003D
EQUALS SIGN (=) characters, or U+003E GREATER-THAN SIGN (>) characters, and
must not be the empty string._
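
The two quoted rules can be turned into quick checks. This is a sketch of the text as quoted above, not of any later revision of either spec; the function names are made up, and treating "space characters" as space, tab, LF, FF and CR is an assumption on my part.

```python
import re

# HTML 4 wording: only letters, digits, hyphens, periods, underscores, colons.
_HTML4_UNQUOTED = re.compile(r"[A-Za-z0-9\-._:]+")

# HTML 5 wording as quoted: no space characters, ", ', = or >.
# Assumption: "space characters" means space, tab, LF, FF and CR.
_HTML5_FORBIDDEN = set(" \t\n\f\r\"'=>")


def can_omit_quotes_html4(value: str) -> bool:
    """True if the value uses only the characters HTML 4 allows unquoted."""
    # An empty value cannot be written without quotes, so it fails as well.
    return _HTML4_UNQUOTED.fullmatch(value) is not None


def can_omit_quotes_html5(value: str) -> bool:
    """True if the value is non-empty and has none of the forbidden characters."""
    return value != "" and not any(ch in _HTML5_FORBIDDEN for ch in value)


for value in ("stylesheet", "utf-8", "text/html", "a b", ""):
    print(repr(value), can_omit_quotes_html4(value), can_omit_quotes_html5(value))
```

A value like `text/html` shows the difference: the slash is outside the HTML 4 character list, but nothing in the quoted HTML 5 text forbids it.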

------
quizbiz
Google homepage is fine, they need to spend more time on maps.google.com. I
wonder what the true return is on investing in the analysis, the number
crunching, etc. just to save a few bits of data.

------
TweedHeads
There must be a formula for queries a year, returning users, versioning and
caching resources.

I bet inline styles and scripts are heavier in the long run than cached
external resources over millions of requests, especially with multiple
requests a day per user.

Now, I don't know how cache works, but if you have to make a resource request
just to get a 'has not changed' response, then the problem is in the protocol.

How about sending all 'modified-since' tags in the header for every resource,
so the client then requests a second bulk with all the resources required?

Can somebody explain how caching works in the browser?
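
For reference, a minimal sketch of the 'has not changed' exchange described above, from the server's side. The function name `conditional_get` and the sample stylesheet are hypothetical; the mechanism itself is HTTP's `If-Modified-Since` request header and `304 Not Modified` response. Note that when a response carries a freshness lifetime (`Expires` or `Cache-Control: max-age`), the browser can reuse its copy without making any request at all until that lifetime runs out, so the revalidation round trip below only happens for stale entries.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional, Tuple


def conditional_get(body: bytes, last_modified: datetime,
                    if_modified_since: Optional[str]) -> Tuple[int, bytes]:
    """Hypothetical handler: return (status code, payload sent to the client)."""
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None:
            if since.tzinfo is None:
                since = since.replace(tzinfo=timezone.utc)
            # HTTP dates have one-second resolution.
            if last_modified.replace(microsecond=0) <= since:
                return 304, b""  # headers only, no body
    return 200, body


changed = datetime(2009, 6, 27, 12, 0, 0, tzinfo=timezone.utc)
css = b"body{margin:0}"

# First visit: nothing cached, so the full body is sent.
first = conditional_get(css, changed, None)

# Later visit: the browser sends back the date it was given earlier.
header = format_datetime(changed, usegmt=True)
second = conditional_get(css, changed, header)

print(first[0], len(first[1]), second[0], len(second[1]))
```

The 304 response still costs a round trip, which is the overhead the comment objects to; setting a long freshness lifetime on static resources is the usual way to avoid it.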

