How Browsers Work: Behind the Scenes of Modern Web Browsers (html5rocks.com)
282 points by Garbage on Feb 14, 2012 | 41 comments



> So a lot of the parser code is fixing the HTML author mistakes.

This is probably my biggest problem with most of the ideology behind HTML5. You learn from past mistakes, but HTML has no way to teach you anything, because there is no error-correction-test feedback cycle.

How do you learn proper HTML? You write something and you load it in a browser. Does it show properly? It was right. Does it show in a funny way? There must be something wrong. What? Who knows. Nobody knows, because you cannot, by definition, create _wrong_ HTML5. It is just that you did not write what you think you wrote.

There are validators you can use to check your page. But most of the time they will tell you that your page is wrong while it actually looks exactly as you wish in the browser. The browser is not complaining and the page looks good... these validators must be too picky, a waste of time.

The problem that killed XHTML was the draconian error handling in most browsers. Only Opera had a good way to handle XHTML errors: a banner that told you that the page had an error, and which error. Under that banner, the page was rendered as well as possible. That was a good way to learn (and to tell people that the person who wrote the site was not a pro).


I'd argue that forgiving HTML parsing is one of the main reasons the web got as big and broad as it did.

http://quandyfactory.com/blog/39/the_virtue_of_forgiving_htm...


I disagree. It's also what allowed the "IE-only web" to persist for about five years.

It might have been a good thing until about 1997, but by that point there was no shortage of people creating new web content, and raising the barrier to entry would have done no harm. And a lot of innovation in browser features might have happened sooner (due to increased competition between browsers).


There was great competition between browsers through the 1990s. Unfortunately, they were competing through 'value add' proprietary extensions and browser lock-in, which is orthogonal to the issue of whether HTML parsing should be more permissive or more draconian.


Somewhat. Microsoft reverse-engineered a lot of Netscape's rendering quirks, as well as adding their own. Being quite liberal in what it accepted certainly didn't hurt IE adoption. (And a lot of these things are now in black and white in the HTML5 spec.)


Related: http://diveintohtml5.info/past.html

"[W]hy do we have an <img> element? Why not an <icon> element? Or an <include> element? Why not a hyperlink with an include attribute, or some combination of rel values? Why an <img> element? Quite simply, because Marc Andreessen shipped one, and shipping code wins."


> The problem that killed XHTML was the draconian error handling in most browser.

I'd argue just the opposite. Browsers treated XHTML doctypes as "tag-soup" HTML4. Firefox would only validate the document if you used the XHTML MIME type, in which case you lost progressive rendering and your site would seem slow to the user.
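
To be concrete, the parser switch was the HTTP header, not the doctype:

    Content-Type: text/html              -> parsed as forgiving tag-soup
                                            HTML, XHTML doctype or not
    Content-Type: application/xhtml+xml  -> parsed as real XML; a single
                                            well-formedness error blanks
                                            the whole page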

The key point here is that XML wasn't doing any favors to the browser vendors - their rendering model just didn't work that way internally.

The net result is a gazillion 'XHTML' documents which aren't actually XML. Now you can't start throwing up end-user warnings, or half the web would appear to be broken. So: admit it was a dubious idea to begin with, and start over.


> What? Who knows. Nobody knows because you cannot, by definition create _wrong_ HTML 5.

HTML5 defines pretty strict conformance requirements for authors. That's a separate thing from defining error-recovery mechanisms for UAs.

You can easily learn what is wrong with your code using the W3C Validator.

http://validator.w3.org/check?uri=http%3A%2F%2Fpornel.net%2F...

which is a big improvement over the old DTD-based one, which couldn't verify the contents of attributes or structures more than one level deep:

http://validator.w3.org/check?uri=http%3A%2F%2Fpornel.net%2F...


> HTML5 defines pretty strict conformance requirements for authors.

What you are referring to is wrong as in _non-valid_; what I was referring to was wrong as in _non-working_.

Invalid HTML5 _works_, and so does invalid HTML. At no point will your browser stop and say "Come on, that is not HTML, that is garbage". If there is no such point, then there is no _wrong_ HTML5.

Take this code; it is valid HTML5 (may the XML gods forgive me):

    <!DOCTYPE html>
    <title>My feelings</title>
    I love HTML
    </html>
It will be shown without any problem by a browser. The title will be "My feelings" and the body will be "I love HTML".

The following is invalid HTML5

    <title>My feelings</title>
    I love HTML
Yet, it will be shown "correctly" by browsers without any problem, just like the previous one.

Once such a lax error-recovery mechanism is in place, _without any additional warning in the UI_, how is one able to define what is wrong and what is correct?


> how is one able to define what is wrong and what is correct?

There are many arbitrary lines there. There were huge bikeshedding debates in the HTML WG about just how much must be quoted, escaped, declared and closed.

Generally, the correct/valid subset is chosen to be as free from gotchas as possible (only things that behave as expected are allowed).

It's a compromise between best practices and the not-so-pretty but very common code out there.

It's counter-productive to declare 99% of working pages "invalid". With fewer nitpicking errors, validators have a better signal-to-noise ratio and can flag the errors that are more likely to cause trouble, and authors are more likely to take those seriously rather than assume the validator is impossible to please.

e.g. misnested tags are disallowed, because it's hard to understand how they are interpreted.

DOCTYPE is required, because it disables emulation of IE5 bugs (Quirks Mode).

OTOH unquoted attributes and some unescaped ampersands are allowed, because most often they're parsed unambiguously in a way that authors expect.
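
For instance (made-up one-liners, not examples from the spec):

    <p class=intro>Valid: the unquoted value parses unambiguously.</p>
    <b><i>Invalid: misnested, so the parser has to guess the intended tree.</b></i>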


    <!DOCTYPE html>
    <title>My feelings</title>
    I love HTML
it is valid HTML5


You are missing the point: I know that you could add `<!DOCTYPE html>` to make that document valid, and you know it as well. But whoever writes the second snippet does not know, because we are not there to point it out. And if you point it out, they will look at you puzzled: "You are saying that it is not valid, but it renders, and in exactly the same way! Why are you making all this fuss about this 'validity' thing?"


HTML5 does not mean that any markup is valid. The specification simply defines what a user agent should do when it encounters invalid markup. Previously user agents had to guess what others (IE) did when they encountered a piece of invalid markup.


I don't see much of a problem with loosely typed markup, because markup is not supposed to be written only by highly skilled engineers.

An average Joe is supposed to feel great about writing something that renders the way he/she wants, without having to go into deeper stuff like semantics, validity or even cross-browser issues.

This is something that should be left to people who need to know it in their strata, isn't it?


> An average Joe is supposed to feel great about writing something that renders

Indeed HTML is great for that, but the problem is that you never "level up". Once your content renders, you are done. A lot of Joes may be interested in how things work behind the scenes, or in making things "correct" rather than "just working". It would be great, from a pedagogical point of view, to have the browser render Joe's content (for instant gratification) but also point out: "Hey, on line 32 you closed </p> before </i>. It should be the other way around, because of a rule called nesting; have a look at it." I think we are wasting a lot of man-years around the globe for the lack of such warnings.
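
For instance, the kind of thing such a warning would flag (a made-up snippet):

    <p>Some <i>italic text</p></i>    <!-- wrong: </p> closed before </i> -->
    <p>Some <i>italic text</i></p>    <!-- right: the inner tag closes first -->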

In the education of many people, compiler errors and warnings had exactly this function: they let you do whatever you wanted (as long as it was decent), but they would also point out the basic mistakes ("Hey, on line 14 you print the variable prg_name, but that variable has not been initialized; beware").


The downside is that the average Joe thinks this stuff is easy enough that he should do it professionally. And then:

- The market floods with professionals who antagonize proper web developers, since, to unknowledgeable clients, the result appears to be the same

- The web floods with sites that behave in unpredictable ways in different browsers

- Proper developers carry the burden of dealing with the idiosyncrasies of various browsers

- Browser developers carry the burden of trying to guess their way through amazingly creative atrocities

Although, to be honest, things don't look as grim as they used to regarding the middle two. I just wish some people would stick to HTML and stop brute-forcing JavaScript into working.


Except all of those things started happening in about 1994. And that's why, 15+ years later, we have a spec which defines error conditions rather than just the "proper" way.

(If you weren't online/sentient back then, sites commonly had these 'Best viewed in Netscape' badges on them.)


I think it started with Mosaic in 1993.


This is great content, but it is so hard to read. The light blue against the light gray is rough on the eyes, and the font size/style isn't helping either.


Here's the article at the author's website: http://taligarsiel.com/Projects/howbrowserswork1.htm


"Tali published her research on her site, but we knew it deserved a larger audience, so we've cleaned it up and republished it here."


That's much better; thank you!


I was just about to post something to that effect. The site feels like Microsoft MSDN. In the case of documentation or technical information, you'll never go wrong by using a large, easy-to-read font with a strong contrast between background and text color.


Could not agree more. Very hard to read; the text seems somehow opaque when you try to read it.

Update: so, I removed the dot texture on the page and also changed the Google web font to "Gudea" instead of "Open Sans", and it seems to be a lot more readable. I think if the authors of the site changed the font to something more readable, all would be good. The font I chose was just picked at random.

See my edits: http://dl.dropbox.com/u/103326/after_Gudea.png


readable (http://readable.tastefulwords.com/) does a reasonable job of it, but the page is clearly not built to be read.


You might want to try readability: http://www.readability.com/


Readability is a great option, but this site is brand new. It's not like it has outdated typography, etc. You would expect the site to be readable without any tools :\


I actually used Readability to prettify the article and send it to my Kindle. Sadly, though, it has stayed on my Kindle for a long time; hopefully I will read it soon.


It really is.

javascript:void((function(){jQuery("body").css("font-family","helvetica,arial,sans-serif");})());

Changing it to Helvetica (or Arial) makes it a little bit easier to read (putting the above code into the URL bar does this).


The best way to read it is to select all the text. The white text on the red "background" is not bad.


Started reading the document and tripped over a statement at the beginning:

"It is important to note that Chrome, unlike most browsers, holds multiple instances of the rendering engine - one for each tab. Each tab is a separate process."

This is not true, and has never been true to my knowledge, without qualifying the statement. Chrome will open links that point to the same domain in the same content process even when they are opened in a new tab/window, and it will start mixing different domains in the same content processes once a certain threshold is reached (whether this is a hard-coded upper limit on how many content processes may exist simultaneously, or something calculated dynamically, I do not know).
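
For what it's worth, you can poke at the model yourself. If I remember right, Chrome has command-line switches that override the default grouping (these may change between versions):

    chrome --process-per-site   # group all instances of a site in one process
    chrome --process-per-tab    # closest to the article's one-process-per-tab claim
    chrome --single-process     # everything in one process (debugging only)

The built-in task manager (Shift+Esc) shows which tabs ended up sharing a process.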


Old discussion: http://news.ycombinator.com/item?id=2894708

Personally, I'm happy for the repost - I was looking for this, but couldn't find it, just yesterday.


Agree; reposts help me when searching in Google with site:http://news.ycombinator.com/

Also, it gives a chance to add to the discussion of what's been happening in the domain in the past 181 days!


Sometimes I wish reposting of old content at the same URL was allowed.


A few interesting quotes about performance optimization from the article:

"Firefox blocks all scripts when there is a style sheet that is still being loaded and parsed. Webkit blocks scripts only when they try to access certain style properties that may be affected by unloaded style sheets."

"WebCore simply throws a global switch when any sibling selector is encountered and disables style sharing [optimization] for the entire document when they are present. This includes the + selector and selectors like :first-child and :last-child."

"After parsing the style sheet, the rules are added to one of several hash maps, according to the selector. There are maps by id, by class name, by tag name and a general map for anything that doesn't fit into those categories." Every element then looks in these maps to find rules which might apply to it.


I desperately need this in ebook format so I can read it in chunks, leave it, come back to it, etc. [edit] Found something -- dotEPUB works great.


*sigh* Is it too much to ask HTML5 advocates to make sure their pages render properly on an iPad?


And also, you guys, being at that level, should insist that the Apple guys respect web standards for file input in iPad Safari.

This is gonna get corny very soon.


Great work! The guy who does the Photoshop job for you is simply amazing.

From a high-level view, I hope we see super-intense competition among tablet browsers soon, because looking at -webkit-on-iPad, I feel IE6 has just been re-invented.


What's so bad about webkit on the iPad? It's my favorite way to browse the web.


Yeah, it's good when it comes to rendering neatly done content websites - i.e. consumption.

But when it comes to creating content, such as composing an email with browser-based rich text editing, or attaching files via file input, it falls flat on its face.

Example of a discussion: http://answers.37signals.com/basecamp/2659-ipad-file-uploadi...



