

Mathematically optimal markup with HTML5 - markchristian
http://shinyplasticbag.posterous.com/mathematically-optimal-html

======
moxiemk1
I hope that in determining which HTML is optimal _bandwidth_-wise, there is
also some consideration of what HTML is optimal _rendering_-wise (the obvious
cost savings of optimizing hard for bandwidth aside).

Knowing how the parser treats certain constructs and deviations from the spec
probably gives us some insight into how long things will take to parse. Or is
client-side parsing _that_ much faster than waiting on normal network latency?

~~~
yason
The parsing time is probably negligible. I would guess that HTML5 parsing
runs at a few, if not dozens of, megabytes per second on modern hardware.
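
A rough way to put a number on that guess is to time a parser over a known
amount of markup. The sketch below uses Python's standard-library
`html.parser`, which is a simple tokenizer and not a browser's HTML5 parser,
so the figure it prints says nothing definite about browsers. The names
`CountingParser`, `measure_throughput`, and the sample markup are made up for
illustration.

```python
import time
from html.parser import HTMLParser


class CountingParser(HTMLParser):
    """Counts start tags so the parse does some observable work."""

    def __init__(self):
        super().__init__()
        self.start_tags = 0

    def handle_starttag(self, tag, attrs):
        self.start_tags += 1


def measure_throughput(markup):
    """Return (megabytes per second, start tags seen) for one parse."""
    parser = CountingParser()
    started = time.perf_counter()
    parser.feed(markup)
    parser.close()
    elapsed = time.perf_counter() - started
    megabytes = len(markup.encode("utf-8")) / 1_000_000
    return megabytes / elapsed, parser.start_tags


# Hypothetical sample: repeated markup that leaves out optional end tags.
sample = "<p>Some paragraph text<ul><li>one<li>two</ul>\n" * 20_000
rate, tags = measure_throughput(sample)
print(f"{rate:.1f} MB/s over {tags} start tags")
```

The result depends heavily on the machine and on the markup, so treat it as
an order-of-magnitude check only.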

~~~
SpikeGronim
Parsing speed depends on the markup. One example is text layout, which is O(N)
in the size of the text area. So 1 MB of text might be a lot slower to parse
than 1 MB of no-op tags.

~~~
Someone
I would not say text layout is done during parsing; it is done when the parse
tree that parsing produced is used to build a render tree ('execution
time'?).

------
Hixie
I love how the author uses omitted-</p> as their example, when that's one of
the things that was actually already defined in HTML4 and isn't a syntax
error.

~~~
markchristian
Yeah, I know that. Another random example is <li>, which doesn't require a
closing tag. It was just included as an example. Finding a genuine example
would require actually grokking the HTML5 parsing algorithm and starting to
try to find optimizations.
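
To get a feel for how many bytes are at stake, here is a minimal sketch that
adds up the size of a few end tags the spec lets you omit in at least some
contexts. It is deliberately naive: the spec only allows each omission in
particular situations (for example, `</p>` depends on what follows it), so
the result is an upper bound and not a safe minification. The tag subset,
the function name, and the sample page are assumptions for illustration.

```python
import re

# A small, illustrative subset of end tags that may be omitted in some
# contexts. This is not the full list from the spec.
OPTIONAL_END_TAGS = ("p", "li", "body", "html")

_END_TAG = re.compile(
    r"</(%s)\s*>" % "|".join(OPTIONAL_END_TAGS), re.IGNORECASE
)


def optional_end_tag_bytes(markup):
    """Upper bound on bytes spent on the end tags listed above."""
    return sum(
        len(match.group(0).encode("utf-8"))
        for match in _END_TAG.finditer(markup)
    )


# Hypothetical sample page.
page = "<html><body><p>Hello</p><ul><li>a</li><li>b</li></ul></body></html>"
saved = optional_end_tag_bytes(page)
print(f"{saved} of {len(page)} bytes are optional end tags")
```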

------
nitrogen
I suppose some day I'll have to start writing my HTML to be read by machines,
rather than humans. Right now I treat it like code: proper indentation,
helpful comments, and the occasional hidden message for the reader.

~~~
pjscott
If you _write_ it, then surely you should be able to read it. Then you have a
minifier convert it into non-human-readable form, if you like.

~~~
wlievens
Except he doesn't write it. It's output from a program. Compilers don't emit
readable output either.

I'm not saying I'm against clean HTML. But the argument you give is probably
invalid.

------
thristian
Interestingly, the HTML serialiser in html5lib[1] actually defaults to
omitting omittable tags; I wanted a small script like htmltidy that used the
HTML5 parsing algorithm, so I hooked up the html5lib parser and serialiser,
and wound up with output even more chopped and cryptic than what I put in.

Luckily, setting the include-optional-tags option gave me much more readable
output.

[1]: <http://code.google.com/p/html5lib/>
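
For anyone who wants to try it, the round trip can be sketched roughly like
this, assuming html5lib is installed (`pip install html5lib`). In the Python
API the serialiser option is spelled `omit_optional_tags`; the sample markup
and the `reserialize` helper are made up for illustration, and the exact
output may vary between html5lib versions.

```python
import html5lib
from html5lib.serializer import HTMLSerializer

# Hypothetical input with every tag written out.
source = (
    "<!DOCTYPE html><html><head><title>t</title></head>"
    "<body><ul><li>one</li><li>two</li></ul><p>para</p></body></html>"
)


def reserialize(markup, omit_optional_tags):
    """Parse with the HTML5 algorithm, then serialise the tree again."""
    tree = html5lib.parse(markup, treebuilder="etree")
    walker = html5lib.getTreeWalker("etree")
    serializer = HTMLSerializer(omit_optional_tags=omit_optional_tags)
    return serializer.render(walker(tree))


terse = reserialize(source, omit_optional_tags=True)
verbose = reserialize(source, omit_optional_tags=False)

print(terse)
print(verbose)
```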

------
adamdecaf
<sarcasm> Sure, let's just start to omit ending tags from HTML. Who cares
about older browsers that will fail to properly render our sites? I'm _sure_
that every business and user keeps their browsers up to date. </sarcasm>

~~~
markchristian
Actually, the HTML5 parsing algorithm was derived by reverse engineering the
existing browser parsers. It's pretty compatible.

------
akozak
Does anyone have more details or examples of Google's "cringe-worthy" HTML for
efficiency?

~~~
albertsun
Go view the source of Google.com

No </body> or </html> tags for one.

~~~
jasonmoo
A friend of mine worked for Google when that was introduced and said it was
purely a rendering speed optimization. Someone did some tests and found the
page pops a little faster if you leave out those closing tags. I can't seem to
find any articles on it but it seems more reasonable than trying to save a
couple bytes, given the other markup that's been mentioned here.

~~~
paulirish
Friend of mine did tests on this and found there is no significant performance
difference in either direction. So dropping them is certainly worth it.

------
jpr
I have a hard time understanding why browsers' parsing algorithms should be
standardized or why it's a good idea to do so. Shouldn't a language be defined
declaratively using a grammar?

