Microsoft, Google, Mozilla, and Apple Object to W3C Fork of DOM Spec (github.com)
549 points by tptacek 8 months ago | 374 comments




Thanks. The TL;DR seems to be that, instead of sticking to documenting what is, the W3C is (either deliberately or through incompetence) trying to push their own "vision" for DOM 4.1 without browser buy-in.


That does not mean it is a good thing to put all the bells and whistles from different browser vendors under the same umbrella.

Let's say you a) own and b) manage a project that three competing teams are working on in parallel.

If you do not curate the project, you will have one team adding <marquee> and another adding <blink>...

In the real world you would invite a dedicated architect, or a team of architects, to define the spec that all three teams will implement.

All this WHATWG vs. W3C flame is about bazaar vs. cathedral management styles, I think.

Origins of the mess: the W3C itself has no architects on board - they just try to moderate the votes of others, where each vote has an obvious weight (weight(Google) > weight(JohnButSmart)).

By contrast, the WHATWG has a professional architect at the top of the construct - Ian Hickson. In fact, the WHATWG was created by him. But Ian is associated with Google, and that makes the WHATWG's legitimacy a bit questionable.


This information about how the WHATWG works is somewhat outdated; please see https://whatwg.org/faq for more.


I'm pretty sure Ian Hickson is no longer involved with the WHATWG. I think he's working on Flutter now (https://news.ycombinator.com/threads?id=Hixie).

In fact, I would say that it's not quite like a cathedral and bazaar here.

The W3C is like a legislative committee. Lots of things done by committee vote. Formal process that must be gone through to advance beyond committee. Petty politics that screw with actually getting things done. The committee chair having the ability to block work by procedural means.

The WHATWG is more like an open forum. The work happens out in the open. Anyone can participate in the discussion (the W3C committees frequently have private meetings, private mailing lists, and so on). There are a few people who have commit rights to the repo, and so actually control what goes in and what doesn't, but they are generally willing to let in changes that have broad support and are implementable, rather than letting such features get held up by political processes.

Besides the difference in structure, there's just a difference in attitudes. The W3C tends to strongly favor certain principles, like accessibility and modularity, to the exclusion of compromise for technical or real-world reasons. They also seem to have a tendency to get very attached to particular ways of doing things. I think the biggest example of this was longdesc; it was never implemented properly in pretty much any browser, and very few people actually followed the spec and had it point to a URL (many just copied the alt text, or put a longer description in the attribute instead of a URL), so even if browsers or screen readers had implemented it, the content wouldn't have been useful. But people in the W3C made a big stink about removing it, and spent a lot of time and effort fighting and litigating over that, rather than trying to work on a different feature that could gain wider adoption.
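To make the longdesc situation concrete, here is a sketch (file names made up): the spec intended the attribute to hold a URL, while the common mistakes described above put text there instead.

```html
<!-- Intended usage: longdesc holds a URL pointing to a full description -->
<img src="sales-chart.png" alt="Q3 sales chart"
     longdesc="sales-chart-description.html">

<!-- Common mistakes: copying the alt text, or inlining prose instead of a URL -->
<img src="sales-chart.png" alt="Q3 sales chart" longdesc="Q3 sales chart">
<img src="sales-chart.png" alt="Q3 sales chart"
     longdesc="A bar chart showing quarterly sales rising">
```

A screen reader following the spec would try to fetch the longdesc value as a URL, so the last two variants produce nothing useful.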

The WHATWG tends to take a more pragmatic approach: pave the cowpaths; if there are differences between implementations, do it in the most sensible way that preserves compatibility.

Now, what the WHATWG has produced hasn't always been the best; there are times when it's made mistakes in its approach. The drag and drop spec, which had been reverse-engineered from IE's drag and drop support, was pretty bad; not sure if it's gotten better since.

But overall, the WHATWG has been a lot more productive in getting standards done that are actually used on the web, because they involve the implementers, and don't override them with tedious, drawn out, political battles over obscure features that no one uses.


the formal complaints of all browser people seem to support the WHATWG


The browser people are the WHATWG. There are no other stakeholders in that.


Not sure exactly what your comment is saying, but at least one reading is that "browsers are the only stakeholders in the WHATWG", which is not accurate. The WHATWG is a community organization open to participation by all; see https://whatwg.org/faq#process for more. We receive a lot of participation from users, web developers, and other companies.

There is a formal group, the Steering Group, which represents the browsers that implement WHATWG standards, and serves as the point of final appeal if the community doesn't come to a consensus on its own. But this is similar to the W3C appeals track where if all the paying member companies don't come to consensus, they appeal to "The Director" as the ultimate decider. ("The Director" is nominally Tim Berners-Lee, but recently all Director decisions have been made by W3C management "on behalf of" The Director.)


Are there any non-browser members as part of the editors or steering committee in the whatwg?

To me, this is giving browser makers even more power over HTML than they have otherwise.


The MIME Sniffing, Streams, Console, and Quirks Mode Standards are all edited by people who are not working for the browser-engine-developer companies. (Streams is co-edited by Googlers as well.) That's 4 out of the 15 standards currently developed at the WHATWG; not so bad, given how few companies are willing to pay people to work full time on web standards.

Of course, we have lots of work to do, and if you or anyone else are able to devote a good chunk of time to standards work, we'd love to have more editors---no matter what company they work for.

Browser engine developers comprise the steering group, since ultimately we want the specs to reflect what will be implemented in web browsers, so the best way to resolve disputes about what should be in the spec is by asking the people who will be spending their resources to implement it what they think. I think this is probably the right way to go, instead of "The Director" having the final voice.


> That's 4 out of the 15 standards currently developed at the WHATWG; not so bad, given how few companies are willing to pay people to work full time on web standards.

Thats actually nice to know, but specifically about HTML (which is one of the most important specs the whatwg works on) it's all browser makers. (That too, majority of them from one particular browser maker).

>I think this is probably the right way to go, instead of "The Director" having the final voice.

I think having a better mix of people in the decision making process than just browser makers would be the right way to go. I understand that browser makers have to implement the specs, but the web community as a whole has to use the specs to build the web of the future - and as such, people from non-browser companies should have a greater say in the final voice.


Oh, you're right, I forgot HTML! That is co-edited by someone from Bocoup too. So make it 5 out of 15.


Not only are there non-browser-maker WHATWG stakeholders, there are non-browser-maker formal objectors to the W3C DOM move to CR endorsing the same view as the four browser makers (Bloomberg and Disruptive Innovations; I'd never heard of the latter before, AFAIK).


I was curious so I googled a little bit:

Bloomberg is known for its Terminal. It seems they have a program where you can extend the terminal with web applications. Presumably Bloomberg didn't build a renderer themselves, but there are custom APIs. So they are something between a browser maker and a UI framework like Electron.

Disruptive Innovations is a firm run by Daniel Glazman, formerly of Netscape and of CSSWG fame. His main product seems to be a continuation of the editor part of the original Mozilla Suite, based on the Gecko layout engine. There was an editor called Nvu, and more recently one called BlueGriffon, which also does ePub and maybe his custom WebBook format.


Tldr is that corporations don't want a standards body with any public input so they created their own competing body, strangled w3c, and are now saying w3c is limited to being a rubber stamp for the standards they create.


This is not remotely the case. The W3C used to be the reference, up until they obstinately championed XHTML 2.0, which nobody wanted to write. Meanwhile, browser vendors wanted to implement all the fun stuff that web apps need but were being blocked by the W3C's glacial speed. So they founded the WHATWG for collaboration and standardization, and since then the W3C has been ripping off the WHATWG's standards, but always with inexplicable alterations.


This isn't accurate; notably, the WHATWG is the standards body that is actually open to public input (see https://whatwg.org/faq#process). Whereas to give input to the W3C, you have to pay membership fees (https://www.w3.org/Consortium/fees; between $2250 and $77K depending on company size for the US). This latter model is commonly referred to as "pay-to-play" standardization.


You can give input to the W3C without that, but yeah, it's the input from the paying members that counts the most.

Additionally, the W3C has some private mailing lists, does private working group meetings, and so on. So yeah, the WHATWG is a lot more open when it comes to input.


> Additionally, the W3C has some private mailing lists, does private working group meetings, and so on.

For all the web platform stuff (I don't know about the semweb and digital publishing side of things), you can basically ignore the existence of the private mailing lists (they get basically no traffic, and people like me start screaming whenever anyone tries to have a technical discussion there). I'd rather they didn't exist, but realistically they're not a barrier to participation.

The F2F meetings and telecons are more problematic (because both are inherently exclusionary, either in time or in travel), but those are at least publicly minuted and any resolution from them can be overturned.


Unfortunately that's an over-simplification of the problem.

For 10-15 years we were told that using <table>s for layout was terribly wrong - without any reliable alternative mechanism.

Here is my proposal to the W3C CSS WG to add a flow property and flex units: https://www.terrainformatica.com/w3/flex-layout/flex-layout.... It covers both flexbox and grid features under the same mechanism, and it establishes a robust framework for other layout methods.

Note it is dated April 5, 2009. It took us almost 10 years to get something matching that.

So neither the W3C process nor the WHATWG process is perfect - there is no reliable feedback from customers (web designers) to browser vendors.

What if browsers provided us just an abstract DOM with a minimal HTML/CSS implementation and some extensibility/plug-in mechanism, with something like Java?

In that case we, the community, could provide better implementations:

   main {
     flow: HolyGrailLayout(params) url(/layouts/HolyGrail.class);
   }

   <main>
     <header></header>
     <footer></footer>
     <aside></aside>
   </main>
So instead of waiting 10 years for the weather to change, we could do something literally tomorrow, when we need it.

In this case the W3C would perfectly fit its role - high-level management of all this.

No one can manage reality in full, down to the need for toilet paper. The USSR tried, and where did that go?


> What if browser would provide us just abstract DOM with minimal HTML/CSS implementation and some extensibility/plug-ins

That's what Houdini layout extensions are about, if that work ever comes to fruition that will indeed be a pretty major advance for web design.


It's interesting that a formal objection is done by creating an issue on github.


A sign of the times perhaps? It makes sense given the repository, but of course, it does make it challenging to verify the authenticity of the request when, for all we know by looking at the messages, they could be random users.


Any group discussion could be "random users" to an outsider. To people at W3C, or even to people who just follow web standards development, the names are pretty instantly recognizable, and presumably since these accounts have been added to the W3C's GitHub org, W3C feels confident that they are who they claim to be.


It does say that they are members of the W3C organization.


So could I.

I mean, I see what you mean, but to a casual observer that's not very clear.


Your objection is funny. Other methods of communication could be faked. Here you have Github as an authority to check membership and the organisation's repositories as an official place of discussion (perhaps equivalent to the website).

You do have to trust that the org is legitimate, but you could also fake a website or a whole organisation.


GP is referring to the "Member" badge next to the commenter's name, not to any claims made in the comments themselves.

You could claim to be a member of the W3C, but unless you actually are that badge won't show up next to your name in the issue tracker.


The GitHub org member list is not an authoritative list of all delegates from all member organisations, and not all of them have GH accounts. One of our W3C participants conveyed our objection, and he isn't in the GH w3c org member list.


We know that Apple, Google, Microsoft, and Mozilla consider the WHATWG to be the canonical version. We don't yet know whether the other 450+ W3C member organisations that represent the wider web platform agree with this position or not, though.

Honestly, do the other 450+ W3C member organizations matter? (https://www.w3.org/Consortium/Member/List)


> Honestly, do the other 450+ W3C member organizations matter?

Well, the four objectors are responsible for browser engines that cover somewhere between 95 and 99+% of browser usage, depending on which set of stats you use and whether you count other browsers that have Chromium or Firefox upstream, including the system browsers of every major mobile and desktop OS.

So, no, in practice if those four agree on something, it is the way the web works.


Remember that Apple just implemented the Canvas tag. If you implement something good perhaps almost none of the W3C member organisations matter in pushing something.


Can you explain what you mean? According to caniuse[1] it has been supported for a while.

[1]https://caniuse.com/#feat=canvas

*EDIT I misread the parent. I didn't put the context for that paragraph together and read 'just' as in 'right now' instead of 'just decided to'.


Apple was the originator and original implementer of the canvas tag, in support of a feature they added to their Desktop operating system in 2005.

Prior to their implementation, canvas was not a tag that existed, nor was it supported in any version of HTML at the time, but it was later incorporated into a new version of HTML. Apple was a minor browser developer, as well as a minor OS developer and system designer, at the time.


I was confused as well, as Apple literally invented <canvas>.

I think what he meant to say was "Remember that Apple just went ahead and implemented the Canvas tag. [without waiting on any standards organisation]"


Yeah, I agree w/ your reading of the paragraph.


(sorry)


The canvas tag (and canvas APIs) were first developed/shipped by Apple in Tiger (so more than a decade ago), so yes, it has existed for quite a while :)


And note that the parsing of the canvas element was changed in a backwards incompatible way compared with how Apple originally shipped it. (It was originally a void element, with no closing tag, like img. It was changed to not be, which made the rest of the page vanish into the canvas element.) Standardisation isn't always plain sailing.
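A sketch of the difference described above (ids made up): under the modern container parsing, markup inside canvas is fallback content, so a page written for the original void-element parsing (no closing tag) has the rest of the document swallowed as fallback.

```html
<!-- Modern parsing: canvas is a container; its contents are fallback -->
<canvas id="chart" width="300" height="150">
  Your browser does not support canvas.
</canvas>
<p>This paragraph renders normally.</p>

<!-- Written for the original void-element parsing (no closing tag):
     a modern parser treats everything after <canvas> as fallback
     content, so the paragraph below vanishes into the canvas element. -->
<canvas id="chart2" width="300" height="150">
<p>This paragraph disappears in browsers with the newer parsing.</p>
```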


Hurray for the oligarchy.


> Honestly, do the other 450+ W3C member organizations matter? (https://www.w3.org/Consortium/Member/List)

Given that the W3C charter requires actual implementations in order for a standard to move forward, I'd have to say that no - entities other than those who might produce a significant implementation probably don't matter in this case.


The W3C Process considers all implementations equivalent: if you were to implement DOM 4.1 in Python, that would be as significant as a browser implementing it. That said, each group has to define what they'll consider "sufficient implementation experience" for each spec when they publish a Candidate Recommendation: the DOM 4.1 spec does not do this, and this forms part of Apple, Google and Mozilla's objections to the spec advancing to CR.

The DOM 4 implementation report, http://w3c.github.io/test-results/dom/details.html, is based on the tests in web-platform-tests: however, the web-platform-tests policy is that we test what browsers implement, and hence the DOM tests there are based on the WHATWG spec: there's no evidence provided that anyone has implemented what the W3C spec says in any case where it differs.


No.

They could matter, if they built a browser with competing market share. It's not a static equilibrium.


Did you look at the list of members? Who on the list is going to be building their own browser and what would the business case be? Out of the big four, only two of them even thought it made sense to build a browser from scratch. Apple and Google started off with KHTML.

EDIT:

To clarify. Who is going to be building their own rendering engine instead of taking an existing one - 3 of the 4 are open source - and building a browser on top?


The lineage of the rendering engine doesn't matter. What gives Apple and Google and Microsoft command of the standard is the fact that they mediate access to web pages; they own the actual customers. That's what matters.


Firefox doesn't own the customer.

But creating a rendering engine from scratch is hard, and there is no business case for anyone doing it from scratch. Apple didn't (they tried with Cyberdog ages ago); they used KHTML to create WebKit. Google didn't either; they started with WebKit. Opera gave up on their own rendering engine years ago.


Why is "creating a rendering engine from scratch" the bar here? A new player could fork Blink or WebKit.


If you fork an existing rendering engine, you're rather implicitly using a WHATWG DOM. In order to use W3C DOM 4.1, you'd either have to modify a WHATWG DOM renderer into a W3C DOM 4.1 renderer -- which is a bit like starting with the emacs source to build a vim clone -- or build your own. The inertia of forking an existing project pushes you towards the WHATWG implementation, not the W3C's. That's the point of forking: you get to preserve what already works.

Further, the W3C's argument historically has been to make the DOM easier to implement from the ground up (XHTML Strict), compared to the overall rat's nest of HTML5, which, as far as I'm aware, is still not fully supported anywhere [1] and is very loose about document structure errors etc. So, if you're planning to implement the W3C's DOM, it makes sense that you agree at least somewhat with the W3C's historical philosophy about what the web should look like and how it should behave, and so you're more likely to be concerned about the implementation difficulty of HTML5.

1: https://html5test.com/results/desktop.html


I think you have the cart before the horse here. The WHATWG spec is just a codification of what is. If we didn't have the WHATWG, you wouldn't be freed from the burden of supporting current webpages; you just wouldn't know what that burden entails. At any rate, refactoring Blink to meet the W3C's spec has to be easier than writing a renderer from scratch, even if you don't care about supporting noncompliant webpages, or as they're usually known, webpages.


If that's the case:

1a) Why would it matter that Microsoft, Google, Mozilla, and/or Apple object to W3C DOM 4.1 if they don't implement it?

1b) Why would Microsoft, Google, Mozilla, and/or Apple care enough to object to W3C DOM 4.1 if they aren't implementing it? Why would they even give any effort to a competing specification and just allow it to die from inactivity?

2) Why does what is in W3C DOM 4.1 matter if the high 90s percentage of users are served by a browser in the WHATWG DOM camp? This could probably be condensed down to "Why do W3C's specifications matter at all" really.


> Why would it matter that Microsoft, Google, Mozilla, and/or Apple object to W3C DOM 4.1 if they don't implement it?

It matters to the utility of the W3C DOM spec that it doesn't represent either what browsers have implemented or what they will implement.

> Why would Microsoft, Google, Mozilla, and/or Apple care enough to object to W3C DOM 4.1 if they aren't implementing it?

They care enough because they want the W3C, if it is going to write purported web standards, to do something that won't confuse developers and lead to browser vendors fielding complaints from developers who mistake useless W3C documents for something meaningful.

> Why would they even give any effort to a competing specification and just allow it to die from inactivity?

They don't want to have a competing specification, though they do not seem opposed to a specification with a different focus that is consistent with the WHATWG's to the degree dictated by that purpose.

> Why does what is in W3C DOM 4.1 matter if the high 90s percentage of users are served by a browser in the WHATWG DOM camp?

The idea is not to have opposing camps, though if the W3C insists on making it an opposing-camps situation, that becomes a real issue.


Exactly. They don't want any competing products so preventing the development of a standard they don't control is obvious.


If this is about preventing competition in the browser space, then why doesn't one of the browser makers with lower market share defect from Google's position?


No, you start with a WHATWG DOM. What you do after that is up to you.


Creating a browser isn't the only part of mediating access to web pages. In different senses, Digicert, Comcast, Akamai, and Cisco do that as well.


Fair point, but I think it is somewhat orthogonal to the discussion. I don't imagine Cisco cares which DOM spec is used in rendering the application layer bytes of the packets it routes.


The browser is the only part of the web-access stack where the user has any choice. I suppose they can choose the ISP as well but that effectively only changes access speed, possibly.


Nonsense. Web standards are supposed to be de jure, not de facto.

Once upon a time Microsoft had 90% of the browser market. We created web standards in order to prevent monopolies, such as the former Internet Explorer, from holding the market hostage. That's the whole reason behind web standards.

And yes, they matter even with an Internet Explorer that has 90% market share, because governments can and do enforce adherence. That's also the reason why Microsoft came up with OOXML, ODF being a threat even with a tiny market share.


It doesn't matter what is supposed to happen. In reality, if none of the popular browsers support the standard, it doesn't matter.

Governments are not going to force every major browser manufacturer to support a standard.

That's why W3C lost relevance.


> Web standards are supposed to be de jure, not de facto.

HTML5, in large part, was created to do exactly the opposite -- formally set down in writing all the de facto quirks of HTML as actually used, parsed and rendered in the real world, instead of continuing to prescribe behaviors which didn't match observed reality.


You and I lived through a different history, then, because if what you're saying were true, ActiveX should have been standardized.

We've got no ActiveX, so your claim is false. Mozilla actually could have implemented ActiveX; they refused to do so.

Also, let's not forget that Internet Explorer 6 had incompatibilities with the standard, including with XMLHttpRequest, even though Microsoft invented it.


For the most part it was documenting the common subset of how the browsers actually worked. Only one browser implemented ActiveX or ever wanted to, so it isn't in the spec.


Nothing in your comment actually refutes anything I said.


"Who is going to be building their own rendering engine instead of taking an existing one"

Just for the record: I did - https://sciter.com

It was not meant to render all possible pages from the Wild World Web, but it renders HTML5/CSS3 (some subsets, but still).


Impressive! So what are your thoughts on the technical merits of the W3C's DOM approach? Do you agree with Google/Apple, or is this just a case of them using their power to lock down the market?


Awesome! How would you compare your product to Electron?



TL;DR: It's small and performant.

But the thread is an interesting read.


Holy crap, that is bloody impressive!


> if they built a browser with competing market share.

As if building a competitive browser isn't hard enough, you then have to convince people to use it. Considering the walled gardens 3 of the big 4 are erecting around the platforms they control, that seems nigh impossible.


Users should have a voice too.


We definitely believe users should have a voice in the WHATWG, and thus in guiding what browsers implement. We strive to maintain an open and welcoming community; this has brought a lot of good ideas to the table.

A few years ago I gave a talk on this: https://www.youtube.com/watch?v=hneN6aW-d9w . I hope it's not too embarrassingly outdated now :)

In particular, unlike the W3C, we do not require membership fees (https://www.w3.org/Consortium/fees?countryCode=US&quarter=04...) for participation.


Right, that's my point :) But thanks for making it clear for those reading.


They do: they pick which browser to run, and thus give power to.


That's not a free, expressive choice. One shouldn't expect the spectrum of browser makers' choices to align with users' preferences. Also, people might use a browser because a website requires it, or because it was built into their phone, not because it aligns with their preferences.


Hmmm... do you remember the IE6 days? It was the best browser at the time - at least in terms of user base.

Yes, they made a lot of innovations there that we all use now. Most notably, the whole AJAX idea was born there.
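The IE lineage is still visible in the feature detection pages had to do for years: IE5/6 exposed the original Microsoft invention as an ActiveX control, while other browsers (and later IE7+) provided the standard XMLHttpRequest. A sketch of that historical pattern:

```javascript
// Historical AJAX feature detection, roughly as pages did it circa 2005.
// Returns an XHR-like object, or null if the environment has neither API.
function createXHR() {
  if (typeof XMLHttpRequest !== "undefined") {
    // Standard path: Mozilla, Safari, Opera, IE7+
    return new XMLHttpRequest();
  }
  if (typeof ActiveXObject !== "undefined") {
    // IE5/6 path: the original Microsoft implementation
    return new ActiveXObject("Microsoft.XMLHTTP");
  }
  return null; // no AJAX support at all
}
```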


I was about to say no, but then I clicked the link and the first entry I saw is an organisation I happen to know (Access-for-All, Swiss Foundation), a foundation dealing with accessible technologies. They're doing a lot of good work with educating developers here in Switzerland.


There are lots of smart people doing important work that are also on the W3C, but that doesn't make them truly influential in de facto web standards.

From what I've read (which is not that much), the relationship between accessibility advocates and web standards has been particularly fraught, with advocates pushing for standards features that are received poorly in the marketplace. The argument here being: not every good idea about HTML is best expressed as a fundamental part of the HTML standard.


I came here to ask the same thing! If all the major browser vendors think one way, then it’s irrelevant how many people disagree with them.


Serious question: why are the W3C still publishing or trying to publish standards for DOM and HTML and probably a few others, when no one that matters cares about them? Why not rather throw in the towel on those particular standards and acknowledge that the WHATWG has won on them?


> Serious question: why are the W3C still publishing or trying to publish standards for DOM and HTML and probably a few others, when no one that matters cares about them?

There is a potentially legitimate role for the two-track approach: if the WHATWG represents a moving target of what browser vendors have agreed to implement, and is essentially the vehicle for documenting the agreed future common web platform, then the W3C could present a versioned publication of the stable, widely implemented, currently usable state at a particular point in time. The W3C version would then be the target for conservative app developers who need something that works everywhere today; the WHATWG standard would be what people making browsers and other user agents target, and what more ambitious developers willing to deal with "can I use...?" pitfalls would be guided by.


Why can't people just use an older copy of the WHATWG standard as the new "stable" documentation? This is basically how caniuse.com and browserslist work today, allowing developers to precisely describe their compatibility targets and even automate their builds.

It seems unnecessary to have an entirely separate organization to just copy/paste/publish new "versions" of existing archives.
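For reference, the browserslist mechanism mentioned above is driven by a small config file that build tools (e.g. Babel, Autoprefixer) read to decide which features need transpiling or prefixing. The queries below are standard browserslist syntax; the exact targets are just an example:

```
# .browserslistrc - each line is a query; the results are combined
defaults
not dead
last 2 versions
> 0.5%
```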


> Why cant people just use an older copy of the WHATWG standard as the new "stable" documentation?

Because the order of incorporation into the standard and the order of implementation and stabilization aren't the same, and some features may be implemented incompletely in some browsers, what is stable and usable is a subset of features (and sometimes a subset of functionality within a particular feature) that doesn't correspond to any particular version of the LS. So you'd need manual curation.

> It seems unnecessary to have an entirely separate organization

Perhaps, though the audience and thus interested parties for the implementor-focussed spec and the developer-focussed spec are different.


The WHATWG is actually the only organization I know of that publishes a developer-focused specification; see https://html.spec.whatwg.org/dev/. (We only do it for HTML currently.)

Anyway, I agree with the grandparent poster that caniuse.com is a much better approach to documenting the interoperable subset than copying and pasting someone else's spec, and trying to delete the parts that are not interoperable by some threshold. We actually have caniuse.com boxes in the margin of the HTML Standard: see for example https://html.spec.whatwg.org/multipage/scripting.html#attr-s...

Finally, it's worth noting that we only incorporate features into WHATWG Living Standards if they have multiple implementer interest; see https://whatwg.org/working-mode#additions


I don't really understand how this would make the two-track approach legit. Couldn't they just version the specification under WHATWG if that's what they're after?

I just can't see a reason for the W3C to be handling any of this anymore except for money reasons.


In fact, we already do publish commit snapshots for every change we make: https://dom.spec.whatwg.org/commit-snapshots/


Then what is the purpose of the W3C in this case?

Many years ago I tried to get a membership to the W3C as I wanted to provide a voice for a company I worked for (and for myself, honestly) but found out that the lowest level of membership was many thousands of dollars. How can anyone who isn't already very well established ever be properly represented there?

Then you check out WHATWG and, as far as I can tell, there are never fees associated with being a member and participating.


Yeah, we try to make the WHATWG a welcoming place for all, with no pay-to-play structure. Please feel free to provide your voice there! We've gotten a lot of good community contributions and ideas.


Thanks. I plan to, though :)


W3C process dictates that two implementations exist, not that all major browsers implement the whole standard. Thus, a W3C fork of the standard is of no practical use to “conservative app developers”.

If you need to target specific (probably legacy) browsers, you check caniuse.com. By the way, the WHATWG HTML standard integrates little boxes with caniuse.com data. That is indeed useful to developers.


It's especially amazing that there's a comment asking what the WHATWG will give up in return if W3C gives this up... as if this is some kind of battle truce.

These people are living in wonderland and need to wake up to the reality that there's already a winner and the war is long over.


There definitely needs to be some curation, at least.

As an example, CSS: Google came up with Flexbox, Microsoft came up with Grid.

Now we have two competing layout methods doing pretty much the same thing. Yet they conflict in the sense that they define the same flexibility entity by two different means: flexbox uses a CSS property (which by itself conflicts with the CSS 2.1 box model) and grid uses fr units to define the same flexibility concept - the portion of free space left in the container after the other, fixed elements in it.

Problem is that all browsers from now on shall follow this mess.


Flexbox and Grid coming from different companies isn't really relevant (and, to note, Flexbox comes from Mozilla originally, if I'm not mistaken, it is ultimately based on parts of XUL).

They're also not competing: flexbox makes many 1D layouts easier than grid does. They're complementary, not conflicting.


Yes, flexbox was an old Mozilla XUL feature (<vbox>/<hbox>) where flexes were defined by attributes. We all agreed at the time that having presentation attributes in markup was not that good an idea, and that CSS flex was a no-brainer port of that thing, replacing DOM attributes with a bunch of CSS properties.

Problem is that flexbox ruins the CSS box model, which mandates that the width CSS property is what defines the width of the inner box of the element. Now they have flex-basis which, if defined in a galaxy far, far away, overrides that width with something else.

The above is already recognized as a mistake: https://wiki.csswg.org/ideas/mistakes

As for that 1D point ...

grid-auto-flow: row | column;

makes flexbox obsolete to a great extent.


> Problem is that flexbox ruins CSS box model that mandates that width CSS property is what defines the width of inner box of the element. Now they have flex-basis that if defined in galaxy far, far away overrides that width by something else.

I'm not quite following; could you please explain this? Thanks.


Having this CSS:

    .flex-container {
       border: 1px solid #555;
       display : flex;
     }

     .flex-container > span {
       display:block;
       border: 1px solid #900;
       flex:1;
       width:100px;
     }
and this markup:

    <div class="flex-container">
      <span>Foo</span>
      <span>Bar</span>
    </div>
what would be the width of each <span> there?


Well, flexbox was much simpler and easier to standardize on and implement sooner. It has been available in all major browsers since 2015.

Grid is great, and now works in most places, but it was good to have flexbox while grid was still in process; it only became available in Edge, in the final form, late last year, making it now available on all major browsers.

You're always going to have cases like this; where there's something that's simple that works now, and something better that comes later. If you always wait for the better one, it will take forever for things to get done. Not to mention that flexbox is probably simpler and easier to understand and use for some of the simple use cases, so people will still probably continue to use it despite the fact that grid is available.


Problem is that flexbox is a subset of grid. Or, to be precise: flexbox and grid are just two forms in a set of layout methods that we already have and will have in the future.

In a normal architectural process we would first establish a common infrastructure for all layout methods.

I proposed something like that 9 years ago: https://www.terrainformatica.com/w3/flex-layout/flex-layout.... but it didn't go through (Who am I and who are the browser vendors?)

So we would have a single property that defines the layout method:

   display:block;
   flow: horizontal; // flexbox now
   flow: vertical; // ditto
   flow: grid(
           rows: ...,  
           columns: ...
         );          // grid 
   flow: multi-column( columns: ... ); // current multi-col
   flow: stack;
   flow: row(label,input); // variant of grid 
So flexbox and grid are just parts of a larger entity - the set of current and future layout methods. Yet flexes have to be units, as they a) were from the very beginning ( see "proportional" units here: http://www.w3.org/TR/html401/struct/tables.html#h-11.2.4.4 ) and b) can be used in other layouts and properties (why not margin-left:1fr ?)

Note that each layout method has its own parameters in their own namespaces.

Currently the set of CSS properties is about 400, in one flat namespace. Any junior architect will tell you that this is close to an unmanageable state. But we keep pushing new stuff onto that x'mas tree. It will fall down under its own weight at some moment, but who cares ...


I use both flexbox and grid layout together. They are not competing, merely complementing each other. Grid can do things Flexbox can't do. Flexbox can do things Grid can't do.


"Flexbox can do things Grid can't do"

For example?


flex-wrap. flexbox is one-dimensional, so it can "wrap" items that don't fit to the next line. The wrapped items don't have to line up with vertical grid lines, like they would in a grid, and they can be stretched or centred to fit across the full width of the parent.


How is that different from a sequence of display:inline-block's (horizontal wrap) and multi-col layout (vertical wrap)?


I assume you're talking about CSS columns? Again, they're not designed for layout - they're designed specifically for newspaper-style columns of text where it doesn't matter which piece of text flows into which column. To use them for layout requires hacks or luck to adhere closely to designs. Pre-grid and -flex I spent many hours of my life dealing with oddities around CSS columns - my life is in a much better place now, not having to resort to it!


Like I said, stretching and centering. The justify-content property works on each row of a flexbox, so you can have a flexbox full of items with different initial sizes, automatically wrap the items that don't fit to multiple rows, and distribute the space around each item evenly. You can't do this with flow layout.


inline-block isn't designed for layout - so you get side effects like the preservation of whitespace which you have to use hacks to resolve. Sure it works - but flex-wrap is a syntactically correct solution that doesn't require hacks.


It really is all about not losing face.

W3C/TBL had “owned” HTML and DOM for way too long to just acknowledge that they botched it and that their work on that has no practical relevance any more.

“We are the organization that provides infrastructure for the standardization of stylesheets, and also some awesome-in-theory semantic web standards that are too complicated for actual implementations, and also some XML standards that are actually relevant for DTP” doesn’t seem to be the mission statement of choice for the creator of the web.


Politics and pride, mostly. It's really rare and difficult for an organization to voluntarily admit that it has no purpose anymore and dissolve itself. That's what the W3C should do, but the realities of human psychology mean that's unlikely to happen.


Interesting discussions from a year ago: https://www.reddit.com/r/javascript/comments/5swe9b/what_is_...

The W3C is very well funded and they don't wanna see the money gone.


It's sad to see the relationship between WHATWG and W3C has deteriorated to this point. Trying to wrangle a standard from a "living" (i.e. constantly changing) specification was always going to be tough but I'd have hoped both WHATWG and W3C would be able to maintain a working relationship.


Is there an article with the background on this? Why do we have both the W3C and the WHATWG, and why do the W3C just copy and paste work from WHATWG, if that is indeed what happens?


I don't know of an article, sorry. A brief history from memory would be that during XHTML days the W3C essentially let the HTML spec languish and people weren't moving to XHTML (at best they were moving to XHTML-like HTML).

So the WHATWG came along (mainly organised by the major browser vendors) and started the HTML spec moving again. This became part of what's known as HTML5.

However WHATWG doesn't exactly make a "standard" it makes a "living standard", which is a constantly shifting document which aims to describe where browsers currently are and what they hope to implement. The W3C decided to keep publishing its own HTML specifications and, as the WHATWG does describe what browsers are trying to do, the W3C's spec has to build at least partly on that work. There are differences though. For example, the W3C requires at least two implementations of a feature for it to be included in their spec.

The WHATWG has always opposed the W3C's spec. They see it as confusing to have two "official" specifications.


To put a slightly different spin on the same story as perspective always colours the telling:

W3C decided to deprecate HTML in favour of XHTML. Most of the web quickly moved to XHTML. One individual (an employee at Opera, then Mozilla, finally and currently Google) wrote an oddly influential opinion piece saying that the move to XHTML had been somehow harmful, and pushed for the major browser vendors to form a rival, non-democratic standards body (the WHATWG) to the W3C, which forked and completely redefined HTML.

The W3C, which unlike the WHATWG has many voting members from many backgrounds, not all related to browser making, quite understandably was never fully on board with the new WHATWG HTML spec efforts. However, with the level of adoption and support it received (mainly from being the creation of the powerful browser vendors) W3C were eventually pressured into conceding to advocate for HTML. Which they've done by maintaining a copy, rather than blindly directing people to the work by what for all intents and purposes effectively amounts to a rival organisation, and an extremely undemocratic one at that.

As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.


> Most of the web quickly moved to XHTML.

This simply is not true. The web moved to an XHTML-like dialect of HTML which was still served as text/html and browsers interpreted it as "HTML soup" because actually serving pages as application/xhtml+xml would have broken the majority of the web because browsers would actually validate them and refuse to display a page at all if there was even a single missing close tag.


> This simply is not true. The web moved to an XHTML-like dialect of HTML

You're thinking of XHTML 1.1 or XHTML 2. That "XHTML-dialect" that everyone switched to was called "XHTML 1.0", which allowed serving as either content type.

If you're choosing to nitpick about the fact that most sites published would not have worked if served as application/xhtml+xml, I'd invite you to do a survey of sites currently being served as valid HTML5. It's not even that easy to verify as the Nu validator version in use varies so much depending on where it's hosted (or if it's a local jar), and which iteration of the living standard it conforms to is always ambiguous. Have you tried reading the WHATWG spec diffs?

For devs who might like to adhere to any kind of strict automated verification of spec conformance, that's now out of the question. With XHTML, even if you were serving non-well-formed XML with a text/html content-type, at least your markup could be trivially checked for conformance by almost any XML parser to see why it's not well-formed. It was actually conceivably viable to put that check into build steps or CI.

Serving application/xhtml+xml was a nice to have, but anyone believing that serving XHTML as text/html had no value completely missed the point. At least now, years later, the mess we're stuck with should make it a little easier to see though.


OK, so by lucideer's quirky definition of "XHTML", the vast majority of the web moved to XHTML. Based on the expansiveness of lucideer's definition, this appears to have encompassed web developers who probably weren't even aware they were writing "XHTML".

By the definition that most of us are using, which is that XHTML is compliant XHTML that could be rendered without error in browsers' XHTML modes, to a first approximation nobody ever did it. Even today XHTML-levels of precision in HTML requires an awful lot of API support and very careful usage; doing it ten years ago was above almost everybody's skill level.


> by lucideer's quirky definition

Which also happens to be the definition in the w3c xhtml 1.0 spec. You can choose to think that's quirky, please don't attribute it to me.

> definition that most of us are using, which is that XHTML is complaint XHTML that could be rendered without error in browser's XHTML modes

Which, again, is the definition used in the later w3c xhtml 1.1 & 2 specs, the former of which wasn't widely used, the latter of which was abandoned without being published at all.

If your issue with XHTML was that W3C were moving towards a direction you disagreed with, then you don't have an issue with the version of XHTML that was in popular use.

> XHTML-levels of precision in HTML requires an awful lot of API support and very careful usage

I'm not really sure where this view comes from. HTML validation is a lot more complex and difficult to achieve than XML well formedness, and HTML4/XHTML1 validation were both far simpler than modern HTML5 validation (the Nu validator is inordinately complex in comparison to the older DTD one). Furthermore, dev tools for ensuring XML well-formedness are far more readily available and integrated into most things even today, while HTML5 validation is such an obscure concept today I'm sure many devs don't even know it's a thing.


Which also happens to be the definition the w3c xhtml 1.0 spec. You can choose to think that's quirky, please don't attribute it to me.

Except that the number of people who actually implemented valid, well-formed, properly-served XHTML Strict -- of any version -- in compliance with all the relevant specifications is at best vanishingly tiny. XHTML Transitional was tag soup.

Your retort further up about many sites serving invalid HTML5 actually works against you, since HTML5 explicitly has a forgiving parsing model, while XHTML is explicitly "every error is a fatal error". If browsers had enforced the XHTML approach on every document using an XHTML DOCTYPE, we would have seen the death of XHTML much earlier.

This is why people say XHTML was never really adopted -- many people certainly put a "/>" to close their empty elements, and stuck an XML prolog and an XHTML DOCTYPE up at the top, but surveys like the infamous "XHTML 100" showed that next to nobody actually adopted XHTML in a manner compliant with the relevant standards.

And I say this as someone who, way back in the early 00's, was serving valid, well-formed XHTML as application/xhtml+xml. XHTML was a terrible approach, and the W3C process was dragging farther and farther from practicality at every revision (remember XHTML 2.0?).


You're talking about the ease of validation. Everyone else is talking about the ease of writing.


Oh, you mean like AMP, which now every major site supports, and which is even stricter than XHTML?


Turns out, when there's financial incentive to use strict syntax, people will... guess that's all XHTML lacked...


I mean... yeah? "there must be an actual benefit to do something that costs me development time = money". XHTML did not offer this.


> If you're choosing to nitpick about the fact that most sites published would not have worked if served as application/xhtml+xml, I'd invite you to do a survey of sites currently being served as valid HTML5.

Completely different thing. XML processing and all reasoning based on the premise of XML processing are fiction when XHTML is served as text/html. The HTML parsing algorithm and the rest of the processing requirements are not fiction when HTML is invalid.

(Why are we still talking about this in 2018. Sigh.)


Because someone asked and it explains some of the history quite nicely.

FWIW, maybe I'm the 1% but I wrote valid XHTML 1.0 for a while, but also soon gave up :P


I did server-side browser sniffing to give IE the version it understood (IIRC it couldn't handle well-formed XHTML served with the proper MIME type, not sure, it's been a while :D) while everything else got proper, fully compliant XHTML. I'm pretty sure I used a code snippet from Anne van Kesteren who is also posting here ;)


And the versions of IE in use didn’t support application/xhtml+xml anyway so you would have to switch to text/html based on the user agent string.

It was never clear what the technical benefit of this was supposed to be. I only ever saw one site whose pages served double duty as an API and UI by serving styled XML. It seemed like a challenging approach to pull off well.


I wrote an XSL stylesheet that turned an HTML page into a pretty-printed and syntax highlighted display of its source code.


The shoe web site skechers.com used to do this. With the removal of XSL support from browsers, though, it looks like it's now using some form of JS templating.


The Gentoo website / handbook does this.

Or rather, did this a few years ago when I was last messing around with Gentoo. It seems to be HTML now.


The Handbook was interesting in that it was one of the few sites that actually went with the XML + XSLT = XHTML route. Of course, nobody knows XSLT, and everyone hates XML, so it was dumped in favor of MediaWiki, which everyone still hates, but at least now mostly understands how to use. (although the same people that insisted we use XML+XSLT also insisted we use SMW, which is even worse... I gave up then, but I hear they're trying to undo SMW now.)


Interesting!

What's SMW? I'm not familiar with the term and searching for it isn't being particularly helpful.


Semantic MediaWiki


Oh I remember this period of time and damn this is true. I think about 80% of the pages on the web during a certain time period had that tramp-stamp of XHTML Validated button somewhere on the page.


Only the cool sites :)


And you couldn't use target="_blank" in anchors...


I remember the fierce battle in my mind trying to judge whether I wanted a proper strict xhtml page, or I wanted external links to open in a separate window... This was literally what drove me away from strict xhtml. Everything else I was on board for at the time.


I'm not certain that it's true to say most of the web quickly moved to XHTML. Sure, a number of sites advertised themselves as XHTML, but they were not strictly XHTML compliant. This could be due to third-party widgets or other included code, or it could be due to a mistake in template construction. Whatever the issue, fully compliant XHTML wasn't used much in practice outside of hand-crafted pages.

Also Internet Explorer, for example, never implemented XHTML which would have been a deal breaker for many sites.


> they were not strictly XHTML compliant

The vast majority were not strictly XHTML compliant, but whatever the figure was, I'd imagine it wasn't too different to all the many "strictly HTML compliant" sites now (compliant according to which commit?).

The point was they used XHTML, which means they could trivially choose to validate and test their XML well-formedness with built-in tools everyone had ready access to. Exposing your end-users to those conformance checks (i.e. the in-browser strict XML-parser) wasn't the only "value" offered.
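To make that concrete, here's a minimal sketch of the kind of well-formedness check meant above, using only the Python standard library (the function name is illustrative):

```python
# Well-formedness check suitable for a build step or CI:
# any XML parser can tell you whether XHTML markup is well-formed,
# regardless of what Content-Type you actually serve it with.
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False

print(is_well_formed('<p>Hello <b>world</b></p>'))  # True
print(is_well_formed('<p>Hello <b>world</p>'))      # False: unclosed <b>
```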


The point was they used XHTML, which means they could trivially choose to validate and test their XML well-formedness with built-in tools everyone had ready access to.

I'm honestly trying to figure out whether this is satire or not.


That seems to ignore the conventional wisdom that according-to-Hoyle XHTML was a DOA standard because it mandated error handling in ways that no browser implemented and most of the authoring community didn't want. Authors don't write well-formed XML, even today.


> it mandated error handling

XHTML 1.0 didn't mandate so-called "draconian error-handling", it just offered it as an optional feature.

XHTML 1.1 (which was released but no one used) and XHTML 2 (which was never finished nor released) did mandate it. I wasn't a big fan of that decision, I don't think it would've worked, but XHTML 1.1 was still very usable while ignoring that one requirement; throwing out the baby with the bath water was a massive overreaction on the WHATWG's part.


XHTML, other than Transitional, which nobody should count as "implementing XHTML", is an XML application. It inherits XML's parsing model. Every error is a fatal error.


>because it mandated error handling in ways that no browser implemented

IIRC Opera implemented XHTML error handling.


IIRC several major browsers implemented XHTML error handling, but only for documents with a Content-Type: application/xhtml+xml header, which was basically nothing because that would then trip up other browsers


Opera had the "draconical" approach, where upon an error you just got that, an error. Firefox, iirc, had a softer approach where you still got the page rendered, but you'd get the error reported too. Anyway, it all depended on the proper MIME type for the XHTML (as it should). However the whole MIME type and everything associated with it (some elements and APIs are treated differently) is a whole barrel of worms, so XHTML in any of its incarnations was never a good idea.


> "draconical"

That's xml error handling, following rules as written.


"Draconian" error handling is a term of art in HTML.


> not all related to browser making, quite understandably was never fully on board with the new WHATWG HTML spec efforts.

That's the other thing that pisses me off about the WHATWG, is how much they shit all over XML and other interoperability technologies. E.g. their URL standard (because, why not fuck the IETF as well) basically ignores anything non-HTTP for specious reasons.


The only reason there is a URL standard in the WHATWG is that the IETF URL RFC didn't define error handling and this led to interop problems. So there was a need for a URL standard that _would_ define error handling. The IETF refused to produce one (basically said "fuck you, we don't care about your use cases or interop problems" with slightly more polite wording), so the WHATWG ended up doing it...

I'm not saying this is a great situation. I'm not saying the WHATWG couldn't try to do better at considering non-HTTP or non-browser use cases here. But the representatives of those use cases in the IETF told browsers to take a hike. And then browsers did.


The first line of the WHATWG URL spec says that it deprecates and replaces all IETF URL standards.


For its target audience (browsers and web pages) it does.


A number of folks involved in WHATWG work bought into the XML vision initially, but reality has a strong text/html bias and we've been able to adjust views as experience has accumulated.

See https://annevankesteren.nl/2011/02/xml-tired

(Personally, the first time I managed to get funding to work on a Web engine was to make Gecko's XHTML-as-XML support better. At the time, I thought it was so important that I sought funding to get it done...)


What do you mean by non-HTTP? It handles URLs whose scheme is not http(s): just fine...


I forget the specifics, but there are several incompatibilities between the IETF URI and IRI spec and the WHATWG URL spec (see [1], EDIT: as I'm sure you're well aware, given your username). The WHATWG spec amounts to "what four popular web browsers do", explicitly without considering compatibility with the hundreds (thousands?) of other non-web-browser tools that make use of URIs.

What you've defined are effectively not URLs. Very similar, but different. If you wanted to call them "WHATWGRLs" or something I wouldn't care. But they're not URLs, and the WHATWG is choosing to muddy the waters rather than, say, specify an optional legacy compatibility layer on top of the IETF spec. It's one thing to say "in addition to IETF URIs, browsers should also accept these malformed URIs, but should not accept these valid but problematic URIs"… it's quite another to say "URIs aren't that anymore, now they're this".

[1] https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/


I'm not sure I get the distinction. As for curl, it doesn't follow any standard which seems worse, but does at least helpfully demonstrate that the RFCs cannot be implemented by major clients.


> Most of the web quickly moved to XHTML

No, it didn't.

At best a large share of new, greenfield development moved to XHTML, but I'm not convinced it was a majority of even that.


> Most of the web quickly moved to XHTML

Ridiculous and absurd. A small proportion of the web moved to invalid XHTML that rendered as tag soup because it was sent as text/html. Virtually no websites actually served XHTML as XHTML, because: 1. there were and still are no compelling technical benefits of XHTML, 2. it broke Internet Explorer, 3. most webmasters were and still are incompetent and have no clue what Content-Type is.


>W3C decided to deprecate HTML in favour of XHTML. Most of the web quickly moved to XHTML.

In some parallel universes, yes.

Even if so, there's also the fact that XHTML wasn't updated itself with features people needed.


What features?

If you're referring to "features" in the HTML5 spec., like canvas, webgl, geolocation, DOM etc. they were separate specs, which WHATWG lumped into one monolith (though they're mainly JavaScript APIs, and aren't directly related to HTML). They were being worked on separately to XHTML, and still work fine with XHTML to this day.


>they were separate specs, which WHATWG lumped into one monolith

For which I could not care less. Whether there's a big spec for HTML5+JS APIs, or 20 different specs, is a bureaucratic concern, not a concern to the developers or the end users.

W3C might have had them "neatly" separated, but it also hasn't moved them a notch towards completion and release for more than a decade.

I've used and worked for the web before the W3C, in its heyday, in its long decline days when we were waiting a decade+ for some progress, and after it became irrelevant. Now it's a way better situation.


> Most of the web quickly moved to XHTML.

Most of the web didn't move to XHTML.

A lot of people who were interested in being standards compliant moved to XHTML 1.0 Transitional, which was the HTML compatibility subset, but they only ever served it and validated it as HTML, not XHTML, because if you served it as XHTML, one single stray < that someone had forgotten to quote somewhere would break the parsing of the whole page.
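A small sketch of that parsing difference, using only the Python standard library (names are illustrative): an unquoted stray < is a fatal error to an XML parser, while an HTML parser just recovers and keeps going.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# One forgotten-to-quote "<" in otherwise fine markup:
page = '<html><body><p>2 < 3 is true</p></body></html>'

# XML mode: the stray "<" aborts parsing of the whole document.
try:
    ET.fromstring(page)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# Tag-soup mode: the parser treats the stray "<" as text and carries on.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

collector = TagCollector()
collector.feed(page)

print(xml_ok)                  # False: every error is a fatal error
print('p' in collector.tags)   # True: HTML parsing recovered the page
```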

The piece written by Hixie was influential because it was a wake up call that the direction the standards bodies were going in was pretty much fruitless, and that there could be a much better way to do it which wouldn't involve breaking compatibility with all of the existing content and would give web developers and users features that they actually wanted.

> As web developers, we should follow the WHATWG and ignore the W3C, because the W3C have lost the political battle for HTML and we need to get our stuff working on browsers, all of whom follow WHATWG. But that's an unfortunately pragmatic approach that shouldn't amount to acceptance.

I fail to see how there is anything unfortunate about this. How would rewriting everything in XHTML 2.0 (https://www.w3.org/TR/2010/NOTE-xhtml2-20101216/), and having to be extremely conscious of any possible stray < that could sneak into a page without being quoted, have been preferable to:

1. Consistent parsing support for existing content, and content that might have slight problems like stray <, in all browsers

2. Standardization of things that people actually use to build web apps, like XMLHttpRequest and Canvas

3. Consistent handling of encodings between browsers, including encoding sniffing

4. Consistent handling of quirks mode vs. standards mode between browsers

5. Actually having browsers support compatibility with vendor-prefixed versions of features, because some browsers widely used introduced prefixed features that web developers actually started relying upon

And also, have you ever tried getting involved with the WHATWG process? I have, and I find that they are very receptive to intelligent discussion of issues.

What doesn't work well is to insist that you have a problem and that this particular solution must be used to address the problem; because a lot of times, it's easy to come up with some proposed solution but it then turns out that it's either a lot more complex in practice, your proposal doesn't fit in with the rest of the ecosystem well, or the problem can actually be solved just in tooling on top of HTML without having to change the spec at all and then wait for multiple browser vendors to all independently implement it.


> and having to be extremely conscious of any possible stray < that could sneak in to a page without being quoted, would have been preferable to:

Any system that publishes content that would let this kind of thing pass is incredibly insecure, and shouldn't be on the internet. Today it's a stray <. Tomorrow it's a stray <script>

It's no wonder software is where it is today with attitudes like these.


Not if that < had slipped in because it was in a piece of static text in a string somewhere in the source code.

You can apply mandatory quoting to untrusted input all you want, but there are going to be times when you have trusted strings that can still contain stray characters that will make the resulting markup invalid. And in many cases you don't want to have mandatory quoting for all of that, because these strings may have markup you want to include.

And yeah, you can argue that instead of generating content by appending strings, you should be building up a proper type-safe DOM structure that can be serialized. I'll wait while you go boil the ocean of converting every single web application framework that exists now outside of a couple of obscure type-safe functional programming frameworks, and in the meantime I'll be able to browse the real web without every other page giving me validation errors.


To be fair, I only use obscure type-safe functional programming frameworks. That's what I'm employed to do, and this obviously impacts my feelings on the matter. Personally, I think it's irresponsible to use anything that could be this unsafe. This doesn't mean everyone needs to use FP, just that frameworks and libraries should be chosen so as to guarantee safety. There are easy-to-use libraries for all these things in every language.

In no other world of engineering is this attitude okay. If you were a civil engineer and had to hold a license to practice due to the danger your designs could present to society, this attitude would eventually cause you to lose your ability to practice. It's becoming more and more clear that software can have similar levels of impact, and software engineers should practice as such.


I agree with you that we do need to do better about writing more robust software, and type safe languages are a good way to do that.

But what you're saying is as if to suggest that, since the metric system is more consistent and more widely used than the English one, I as a bolt distributor should start selling my bolts in metric sizes, despite the fact that the nuts that everyone has are in English sizes.

The browser vendors, at least, are working on implementing their browsers in more type-safe languages (https://github.com/servo/servo), but even still they have to work with the content that is produced by thousands of different languages, frameworks, and tools, and millions of hand-written HTML files, templates, and the like. Just turning on strict XML parsing doesn't make that go away, it just makes your browser fail on most websites.


A good first step in enforcing web standards would be if browsers would detect these rule violations, and -- instead of failing -- put a giant banner on the top of the page warning end users that the site may be compromised and could compromise data.

Soon, every business will be clamoring to fix their buggy software, and users will still be able to access the unsafe websites they so desire.


You can be unsafe even with typesafe builders. See

fn build(text_to_show: &str) -> HTML { HTML(Body(H1(text_to_show))) }

What if text_to_show wasn't sanitized? You've got yourself an XSS. And if you do sanitize it (and keep it in a StrSanitized type), what are the chances of an accidental XSS?

Really, what should have been done is a "user supplied tag", which automatically displays everything as plain text, like <user-supplied id="ahdjdh37736xhdhd"> Content </user-supplied id="ahdjdh37736xhdhd">
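To make the risk concrete, here's a minimal TypeScript sketch (the function names are mine, purely illustrative): naive string interpolation turns whatever is in the input into live markup, while escaping the HTML metacharacters neutralizes it.

```typescript
// Naive interpolation: whatever is in `text` becomes live markup.
function buildNaive(text: string): string {
  return `<html><body><h1>${text}</h1></body></html>`;
}

// Escape the characters that can change parsing context in HTML.
// (& must be replaced first, or the later entities would be double-escaped.)
function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

function buildEscaped(text: string): string {
  return `<html><body><h1>${escapeHtml(text)}</h1></body></html>`;
}

// buildNaive('<script>...') emits an executable script tag; buildEscaped does not.
```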


You would generally want the general purpose string type in your language to always be escaped when serializing, and only allow avoiding that if you opt-in explicitly.

So, for instance you'd have an H1::new(contents: TextNode) constructor, and you'd have to build a TextNode; if you build TextNode::new(text: &str), then it would escape it. If you wanted to explicitly pass in raw HTML, then you'd need something like HTMLFragment::from_str(&str), and it would parse and return the fully parsed and appropriately typed fragment object that could then be used to build larger fragments.

There might be some way to unsafely opt out, like HTMLFragment::from_str_raw(&str), that would just give a node that when traversed would just be dumped raw into the output, but that would be warned against and only used if you wanted to avoid the cost of parsing and re-serializing some large, known-safe fragment; it wouldn't be what you would normally use.
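A rough TypeScript sketch of that escape-by-default design (the class names mirror the hypothetical API above, not any real library): untrusted text can only enter the tree as a TextNode, which escapes itself on serialization, while the raw opt-out is a separate, loudly named type.

```typescript
// The default, safe path: text escapes itself on serialization.
class TextNode {
  constructor(private text: string) {}
  serialize(): string {
    return this.text
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;");
  }
}

// The explicit, loudly named opt-out for known-safe markup.
class RawFragment {
  constructor(private html: string) {}
  serialize(): string {
    return this.html; // emitted verbatim: the caller vouches for safety
  }
}

type HtmlNode = TextNode | RawFragment | Element;

class Element {
  constructor(private tag: string, private children: HtmlNode[]) {}
  serialize(): string {
    const inner = this.children.map(c => c.serialize()).join("");
    return `<${this.tag}>${inner}</${this.tag}>`;
  }
}

// Untrusted input can only enter the tree as a TextNode, so it is always escaped.
const h1 = new Element("h1", [new TextNode("<script>x</script>")]);
```

The type system then does the enforcement: an Element's children must be HtmlNodes, so there is simply no constructor that splices an unescaped string into the output by accident.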


Your builder isn't really using types to guarantee safety. You can write untyped programs in a strongly typed language, by just coercing everything to strings, but this isn't what I mean when I say 'type-safety'.


> The WHATWG has always opposed the W3C's spec. They see it as confusing to have two "official" specifications.

The WHATWG has not always opposed the W3C's spec. The WHATWG explicitly agreed to work with the W3C to form an edited, snapshot spec based on the WHATWG spec. That's what HTML5 was supposed to be.

However, the W3C process then hijacked this, by dropping things from the WHATWG spec, adding things back that had been removed because they had never been implemented properly and implementing them wouldn't have been very useful, and so on. The WHATWG objected to this useless divergence.


> W3C process then hijacked this

While I agree the W3C's insistence on maintaining a parallel spec. is a silly idea they should absolutely abandon, I fail to see how any but the most biased perspective could conclude that they are "hijacking" a process of their own. W3C haven't dropped anything from the WHATWG spec.; that's separate and out of their control. They can drop what they like from their copy: it's their copy. Unless you're proposing that the WHATWG should be running the W3C, I'm not sure what you're getting at with the term "hijack". Surely you can't hijack your own thing?


Because there's no point in reconciling the specs if you don't actually reconcile them.

If the W3C spec is a snapshot, possibly of a subset, possibly with some editorial but not functional changes, then reconciling the specs is useful; it gives you what the W3C wants to provide: versioned, frozen specifications that can be used as the basis for other specs, that people can claim "full conformance" with, and so on.

Or, if the W3C process identifies real issues, then it should work with the WHATWG community to resolve those issues; since the WHATWG spec is being used as the upstream, evolving spec that these snapshots are being made from, it makes the most sense to get the changes into the upstream first, so you don't have to resolve the issues every time or maintain divergence forever.

However, the W3C instead just insisted on writing the spec the way it wanted, without regard to whether it would actually be implemented.

It makes no sense to publish a spec that will never be implemented by any of the projects that actually have real-world implementations, and differs from the spec that the implementers actually use. That just causes confusion.

So yes, they hijacked the process in the sense that the WHATWG agreed to work together with the W3C, but the W3C never really worked in good faith to resolve differences or provide technical arguments for their changes.


> there's no point in reconciling the specs if you don't actually reconcile them

completely agree

> yes, they hijacked the process in the sense that the WHATWG agreed to work together with the W3C, but the W3C never really worked in good faith to resolve differences or provide technical arguments for their changes.

I can't see either party working in good faith. In what way did the WHATWG's "agreement to work together" bear out as a positive contribution to the W3C's parallel spec.? We can both agree that spec. isn't a great idea, but I'm failing to see how the WHATWG is a positive actor here in any way; they've forced W3C into an impossible position through political bullying, and somehow W3C are vilified for hijacking something?


Ian Hickson, the editor of the WHATWG spec at the time, acted as editor of the HTML5 standard at the W3C for a while after the two groups agreed to work together. However, the committee had chairs who could override the editor.

Despite a couple of years of effort working together, the W3C process allowed a lot of people to raise objections that re-litigated things that had already been decided in the WHATWG process, or that just didn't have implementer support, or whatnot. This left the HTML5 draft specification stale, as these objections held up migrating the editor's draft (which was the WHATWG specification) to the TR on the W3C site.

So lots of people who still saw the W3C as the "official" source of HTML were directed to an outdated copy of the standard, because publishing more interim drafts was held up with all kinds of bureaucracy; and the W3C objected to linking to the WHATWG copy as a more up-to-date version with bug fixes, so there was a fight over this too.

The combination of the W3C's heavyweight process making it easy for lots of people to raise objections to slow things down, and the ability for those objections to be escalated above the editor, eventually made Hixie give up on editing HTML5 and just go back to editing the living specification.

The thing is, a specification only makes sense if it's actually implemented. Lots of non-implementers raising blocking issues on wishlist features, and then having to take the time to formally resolve all of those issues, does not make for a productive environment; and when the resolutions of those issues are escalated to chairs of the group or higher up in the W3C against the support of the implementers, it really hampers the process of coming up with a productive spec.

By the way, I haven't followed this drama in a few years, but taking a look at what's happening now, it looks like the W3C is essentially just plagiarizing the work of the WHATWG.

Features are generally discussed in the WHATWG, or implemented by browsers and then proposed, and the spec writing goes on there. After the spec is reasonably well worked out, the W3C is copying and editing some of the text into their standard.

Now, the WHATWG spec is under a Creative Commons attribution license, and the W3C does provide a small attribution in the acknowledgements section, so they are not violating copyright.

However, what they are doing essentially amounts to plagiarism, as they are presenting themselves as the source of the standard. The introduction doesn't indicate that the actual work is going on in another group; they invite people to make comments on the W3C's GitHub. This is confusing, it gives people an out-of-date view of the standard, and it looks like a move to make the W3C appear to still be the relevant authority, when it's basically just cloning the standard from the WHATWG with enough wording and formatting differences that the two could conflict, and it's hard to tell where they would.

Alternatively, they could fork the standard, but do so more in the way that Linux distributions package upstream software: take what's upstream, and keep a separate set of patches applied on top that make the differences clear. For instance, those patches might apply their layout, their disclaimers and the like, possibly disable some things that they think are underdeveloped or contentious and likely to change, and otherwise mostly just freeze the text. Any patches for meaningful fixes would be pushed to the upstream project. They could properly attribute the WHATWG spec as the original source at the very top of the document, and list the editors of the WHATWG spec as the primary editors and the people doing the W3C release as maintainers of that particular fork.

But instead, they are listing as editors people who are basically just doing light paraphrases of the WHATWG spec.


> By the way, I haven't followed this drama in a few years, but taking a look at what's happening now, it looks like the W3C is essentially just plagiarizing the work of the WHATWG.

Ditto, and it's why if anyone asks about HTML and spec. conformance, I don't even mention the W3C, except to dissuade them from paying any attention to them. Their current HTML work is irrelevant and misguided.

My issue here is more with the historical negationism around the relationship between the organisations. The W3C's current HTML is, frankly, wrong-headed. But the context around their current situation is the fact that they've been bullied, cajoled and even somewhat ridiculed reputationally into these quite irrational actions by the WHATWG's very existence. That fact is lost when they're accused of acting negatively toward the WHATWG (e.g. hi-jacking apparent agreements and processes), when the actual background was WHATWG originally hi-jacking the specification of the web's central language.

Your post here is supporting the idea that the W3C's current direction on HTML is irrational. That's fine, I agree. But what they're doing is no worse than what WHATWG did originally with HTML5; the only differentiator is that WHATWG was extremely powerful (being primary implementors) and could use that power to win hearts and minds of pragmatic developers. The W3C have no such power and as such their wrong-headed actions are fruitless. But the equivalence is still worth pointing out.


For anyone confused about how the WHATWG came to write a new HTML spec entirely from scratch, after the W3C blocked the work happening at the W3C, this is a good place to start: http://diveintohtml5.info/past.html#webapps-cdf

I'm going to assume your point that the WHATWG is "extremely powerful" compared to the W3C was meant to be satirical.


> The WHATWG has always opposed the W3C's spec. They see it as confusing to have two "official" specifications.

As the old joke goes... if it hurts, they should stop doing that!

The WHATWG spec is worse than useless to me as a developer. It's impossible to tell what is usable and what is just Google's wishlist (which is about half of it). The MDN has entirely replaced it for me, since they at least do a good job of documenting reality.

WHATWG should quit trying to bully the W3C out of the field, and instead clearly mark the WHATWG "spec" as what it is: a public notepad for browser developers. Leave the business of documenting what browsers actually conform to to the W3C.


> The WHATWG spec is worse than useless to me as a developer. It's impossible to tell what is usable and what is just Google's wishlist (which is about half of it). The MDN has entirely replaced it for me, since they at least do a good job of documenting reality.

The WHATWG living standard is largely where browser vendors (and other interested parties) work out what the web will be. W3C (with their implementation requirement), and, as you note, MDN serve to describe what the Web is. The latter is more useful to developers, but, as you suggest, MDN is doing a better job of it.

OTOH, to get to a place where things have interoperable implementations, a forum for implementors to collaborate on forward-looking specifications is necessary, and that’s what WHATWG does well, and W3C does not (which is why WHATWG exists.)


I agree with everything you said. What rubs me the wrong way about the WHATWG is that they give the perception (and it may be just that) that they are trying not just to serve as that forum for browser makers, but also as the standard reference for web developers (which is the role the W3C HTML specs, save XHTML 2, have historically served), and doing a poor job of the latter.


I don't think the WHATWG is trying to serve as the reference for web developers (though their HTML spec has notable and laudable features for that use); I think they are mostly fine with W3C trying to do that as long as they do it correctly (which requires alignment with what browsers do, otherwise developers will target a non-existent platform.)

I don't know if they (or developers, MDN is probably a more widely used reference than W3C) see a standards body as essential in that role, though, and I don't think it seems W3C really wants to accept being relegated to that role rather than driving the web platform, even though they haven't driven the platform for a long time.


Exactly what is the purpose of a standard reference for web developers that fails to track the documented behavior of browsers?


I'm not sure the intent of your comment; failure to track what browsers actually do is exactly the problem with the WHATWG "living standard" – it's very much a forward-looking spec at best, and too often a wishlist.


The ideal situation is for the WHATWG document to be a roadmap of what vendors have discussed and tentatively agreed on, and the W3C document to be a periodic snapshot of what's actually been implemented.

That wouldn't make either one of them "bad". The issue here seems to be W3C wanting to push forward things that the vendors haven't agreed on or implemented yet.


I don't think the W3C DOM document has anything the developers haven't agreed to. The problem is that it's an incomplete, intrinsically out of date, and often buggy subset of the WHATWG living standard.

I agree that the W3C value proposition COULD be to publish a snapshot that describes what's actually implemented. That might be a way forward here, but it requires a lot of work to define what "actually implemented" means in a useful way, and to check the test results and update the document (or build an automated way to harvest resources such as https://wpt.fyi/dom ).


Great, so if I build a website based purely on the WHATWG specs, it will work in all browsers, correctly?

No, it won't.

I can take the A4 paper spec and build a printer that takes that paper, and I know paper will comply with it. And the other way around.

You can't build a website just from WHATWG specs, and you can't, excluding the parts about backwards compatible parsing, easily build a new browser from scratch either.

A standard is a document, written a priori, that describes the entire API surface, so that people on both sides can develop against the standard without having to verify against actual implementations.

The WHATWG documents are useless for this purpose.


But how are the W3C specs any better than the WHATWG specs?

As far as I know the W3C take the WHATWG specs, and modify them with some of their own ideas so they're different from the what the browsers implement or are planning to implement.

What on earth is the point of that? Why design your own spec that nobody is implementing or planning to implement? What a waste of time!

And back to your point - why is it better than the WHATWG specs?


> Leave the business of documenting what browsers actually conform to to the W3C.

Documenting the prevailing conditions is very much not the purpose of a standard.


That's what the W3C has historically done, with HTML 2.0, 3.2, and 4.0. "Document, clean up, and nudge" is maybe a better description. The WHATWG today seems to take more of a "document, don't clean up, and add our wishlist" approach. (The "don't clean up" mentality is embodied in their "don't break the web" ethos; the "add our wishlist" mentality is a consequence of the "living standard" ethos… the "standard" never becomes reality because it is constantly changing.)


I'm not sure where you got this impression, but it's wrong. https://whatwg.org/working-mode stipulates the requirements on additions. That's quite a bit different from a set of wishes.

And there's a lot of cleanup of legacy APIs happening too. E.g., removal of the isindex tag and deprecation of AppCache.


That is the governing philosophy of the WHATWG, sadly.


That's a bit of a stretch. This is only relevant to legacy APIs and only when all implementations are in agreement, which is quite the rarity.


I think that's the rub - if w3c wants to "document" what a browser conforms to - I imagine it will be a copy-paste from what the browser vendors are doing in their separate meetings of the minds.


Urm, I thought MDN was based on WHATWG? (Not W3C)


MDN includes clear documentation about what is actually implemented in all major browsers (the compatibility tables), so I (as someone who wants my code to work everywhere now, not next year) can tell at a glance what pie-in-the-sky ideas I should ignore.

That's great that it's based on the WHATWG's work – it should be, since it should document what's in Firefox, and Firefox presumably is following their own work with the WHATWG. But the WHATWG shouldn't pretend that they're useful to me in any other way than a preview of what's coming down the pipeline. For that, I need clear documentation of what is, not what will be. W3C HTML specs prior to HTML 5 (with the exception of the abortion that was XHTML 2) have historically served that purpose well. It was easy to make the judgement that, once my target market primarily supported HTML 4, I could use anything in that spec. The WHATWG "spec" throws that idea out the window.

Ideally, with a "living standard", periodically there are snapshots of some form that document what all or most major browsers supported as of some point in time. So I as a developer can say, "well I know most of my target market have updated their browsers since date X, so I can just use anything in this standard snapshot". The W3C I think is trying to do this. They might not be doing a very good job (indeed that is the crux of the WHATWG's objections); like I said, I personally rely on MDN to fill this same role for me. But the WHATWG living standard itself cannot fill this role, short of including MDN-style compatibility tables, or making their own snapshots that are somehow "better" than what the W3C puts out.


FWIW, the HTML Standard (not the DOM Standard) does include CanIUse information in a sidebar, to help with this. I'd like to include this into other WHATWG standards, but it hasn't really happened yet. I'd expect most web developers to use MDN and StackOverflow though, as you say.


I appreciate the attempt to include compatibility tables, but they're nowhere near detailed enough for serious usage. Take the canvas element as an example. The WHATWG spec has one "CanIUse" sidebar for basically each section, if that. But compatibility issues exist at the level of individual methods. E.g. .filter and .resetTransform() both have very low cross-platform support ([1] and [2]), which I can tell at a glance from MDN, both in the sidebar listing them, and the compatibility tables on each page. Whereas the WHATWG spec doesn't even mention that these are experimental ([3] and [4]), and the CanIUse sidebar is totally absent for them.

StackOverflow is not a reference, and the answers for even popular queries are sometimes a decade out of date.

[1] https://developer.mozilla.org/en-US/docs/Web/API/CanvasRende...

[2] https://developer.mozilla.org/en-US/docs/Web/API/CanvasRende...

[3] https://html.spec.whatwg.org/dev/canvas.html#dom-context-2d-...

[4] https://html.spec.whatwg.org/dev/canvas.html#transformations



> Differences between the W3C HTML 5.2 spec and the WHATWG Living Spec: https://www.w3.org/wiki/HTML/W3C-WHATWG-Differences

FWIW, I'm pretty certain that is incomplete. It may well be the case that that is the set of deliberate changes from the WHATWG spec (at some revision), but we've had cases before where changes from the WHATWG spec have been copied only partially, such that the W3C spec, as published as a Recommendation (i.e., with two interoperable implementations), has been impossible to implement as written.


http://diveinto.html5doctor.com/past.html

A very entertaining read, IMO.


The W3C was the original standardization organization for the web.

They wanted to create standards that allow easy implementation by others, and were willing to make some tradeoffs with backwards compatibility for that (see XHTML).

The browser vendors obviously oppose this, and want standards that just formalize what they already implement. As result, the browser vendors created their own standards committee, which standardizes whatever the browsers already do (if existing). This is the WHATWG.

As result, the web standards situation has gone to insanity. The WHATWG URL spec contains 4 pages of pseudocode and algorithm definitions for how Chrome parses URLs, and how you should as well, and the W3C, still being relied on by the other actors on the web that aren’t the 4 largest browsers, has to copy the WHATWG spec as base for their own specs, because browsers will ignore whatever the W3C says anyway.

But remember that the WHATWG proposed to the W3C that the W3C should copy the WHATWG specs as base for their own specs: https://en.wikipedia.org/wiki/WHATWG#cite_note-9
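For a concrete taste of what those pages of pseudocode pin down: Node's global URL class implements the WHATWG URL parsing algorithm, so the normalization steps the spec mandates (lowercasing the scheme and host, eliding the scheme's default port, removing dot-segments from the path) are directly observable.

```typescript
// Node's global URL class implements the WHATWG URL parsing algorithm.
const u = new URL("HTTP://ExAmPlE.com:80/a/../b?q=1");

// The spec's pseudocode mandates lowercasing the scheme and host, dropping
// the default port for the scheme, and removing dot-segments from the path.
console.log(u.href); // → "http://example.com/b?q=1"
```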


To offer another perspective.

I'd say the W3C wasted a huge amount of time pursuing quests of purity (XHTML) over actually making the web better for users. I see the value in what they were trying to do, but it wasn't letting people do the things they wanted to do on the web.

As browsers started just implementing features outside of standards in completely disparate ways because everyone was desperate for them (leading to plugin hell, apps rather than websites, etc...), WHATWG was created to try and ensure that the web remained a single thing and not a mess of things that would only work in one browser.

The web platform sprang forward massively as a result of this, with browsers implementing much more consistently and with new features tending to be implemented in compatible ways, with real progress being made.

This led to the W3C specs becoming totally redundant, and the only way for W3C to keep up was to lamely copy from WHATWG into a "spec" at random intervals and claim it was something people could work towards, when in reality it offers no real advantages over working to the living standard, because no browser offers better coverage of that spec than of any other random point of the living spec.

People want features. We saw what happens with a very slow moving, rigid standard: plugins. Flash was popular because at the time you simply couldn't do good video, animation, games on the web platform. Likewise, mobile phones shifted to apps because websites couldn't do notifications or use location information. You can bemoan a living spec, but you get one anyway, because people will work around the web if they can't do what they want. If you want to use a subset of that living spec, use it, but at least keeping it together and agreeing on roughly how to do these things is better than plugins or abandoning the web entirely.


> This led to the W3C specs becoming totally redundant and the only way for W3C to keep up was to lamely try to copy from WHATWG into a "spec" at random intervals and claim it was something people could work towards

So why don't they just disband the W3C? It sounds like it's not needed any more if WHATWG are doing the work?


The W3C actually does do some good work in other working groups; the CSS working groups seem to be working smoothly.

The W3C also oversees a lot of other standardization processes that aren't directly related to web browsers, like RDF. There are people who find this useful.

I think a lot of it is a power play. The W3C wants to be relevant, and the most relevant things in the web world are HTML, DOM, and CSS (there's also ECMAScript, but that already has a different standards body that owns it).

There are a lot of other standards that use HTML, CSS, and the DOM, such as ePub. The W3C wants to be the normative reference for these core web standards; many times, one standard will have to refer to the other, so the W3C wants to be the one that defines the "official" HTML standard.

But the W3C's process and policies are just terrible. They let standards be taken over by people who have no intention of working with those most impacted by them: the people who develop the browsers that billions of people use daily to access tons of diverse content. So instead of just providing a lightly edited snapshot of what the WHATWG produces, possibly with some WIP features removed, they go in and meddle, making changes with insufficient justification, and you end up with two forked standards and a lot of confusion for everyone.


> The W3C actually does do some good work in other working groups; the CSS working groups seem to be working smoothly.

The W3C is also doing work in HTML at least too. If browser makers would actually participate as editors (like they do in CSS) then it would work just as well as the CSS working groups and others.


> If browser makers would actually participate as editors

Microsoft tried that, investing in easier to use GitHub tooling to allow a wide range of people to submit pull requests to update/fix bugs in the W3C HTML standard. "If you build the field of dreams, they will come...." Nope. "They" had all gone to WHATWG ballpark, and all the W3C editors do is cherrypick (that's the actual word in the HTML 5.2 Recommendation) WHATWG's specs. It made a LOT more sense to just join WHATWG for HTML (and DOM).

> > The W3C actually does do some good work in other working groups

Right, W3C as a whole does a lot of good work. CSS is a good example, Web Payments, Web Authentication, Web Assembly come to mind as groups where a broad group really does come together and build consensus on how to solve hard problems. The HTML and DOM communities, however, have moved to WHATWG for reasons that happened long ago and apparently can't be un-done, even if a company with Microsoft's resources tries.


>Microsoft tried that, investing in easier to use GitHub tooling to allow a wide range of people to submit pull requests to update/fix bugs in the W3C HTML standard. "If you build the field of dreams, they will come...." Nope. "They" had all gone to WHATWG ballpark, and all the W3C editors do is cherrypick (that's the actual word in the HTML 5.2 Recommendation) WHATWG's specs. It made a LOT more sense to just join WHATWG for HTML (and DOM).

It won't work if only one browser maker participates. If only Microsoft had participated and implemented things in the CSS WG, then nothing would really get done over there either.

If all the browser makers had editors on the W3C HTML spec (like they do for many other W3C specs) and agreed to implement stuff there, then that would also work.


How would one convince the others to re-invest in W3C HTML and DOM? Microsoft's rationale a few years ago was that WHATWG wasn't a real standards organization with a patent policy, dispute resolution system, etc., and that created various legal and business concerns.

It turned out to be much easier to add a legal framework to WHATWG than to convince the HTML and DOM standards community to move back to W3C. Basically, people work on specs (and code) together in the places where there is a critical mass of expertise and energy being productively engaged. The key variable is the people, not the organization.

I don't understand the dynamics of how these critical masses of expertise coalesce, break up, and move around. I have learned that it's much more efficient to go with the flow than try to redirect it.


What good work is it doing in HTML?

In HTML, as far as I can tell, it appears to be copying features from the WHATWG standard, paraphrasing them, and including them in their standard, with only a small notice on the acknowledgements page that the HTML standard contains parts derived from the WHATWG standard.

The browser makers did participate. They participated in the W3C working groups up until they were shot down when trying to propose to work on features that users actually wanted and would be backwards compatible rather than backwards-incompatible XHTML 2.0.

The browser makers then proceeded to do their work on rich web applications, with features like canvas and XMLHttpRequest, as well as actually putting together a spec for how to consistently parse HTML that would be compatible with real content, in the WHATWG.

When it was clear that the WHATWG standard was the one that actually mattered because it was what was actually implemented, the W3C invited them back in to start working on the standard together. That's what HTML5 was; the W3C agreed that they would start from the WHATWG standard, that they could have the same editor (Ian Hickson), and they wound down the XHTML 2.0 group.

However, various people involved in the W3C process proceeded to use bureaucratic moves to raise formal objections to things that had been changed, and escalated the issues above the editor. Eventually, he got fed up and left the process, and most of the browser vendors proceeded to continue working through the WHATWG. Microsoft was the last holdout, but eventually they too left the W3C process and moved over to the WHATWG as well.

So, the browser vendors have tried to work directly with the W3C on the HTML spec twice, once before the WHATWG split off and once as part of the attempted reconciliation. Both times, they were stymied by other people involved in the process who were more interested in purity and process than actually providing a forum for working out a good specification for real world implementation.


>What good work is it doing in HTML?

From my experience, it has generally done a better job of explaining things developers would want to know, especially in terms of accessibility and internationalisation.

The XHTML stuff was a long time back, and at that time it warranted having a WHATWG. But W3C is no longer insisting on XHTML (and hasn't for many years).

>When it was clear that the WHATWG standard was the one that actually mattered because it was what was actually implemented, the W3C invited them back in to start working on the standard together. That's what HTML5 was; the W3C agreed that they would start from the WHATWG standard, that they could have the same editor (Ian Hickson), and they wound down the XHTML 2.0 group.

This is the crux of it. The WHATWG is really useful for browser vendors because they can do essentially whatever they want in it without anyone having the power to formally object (unlike at the W3C). The WHATWG editors (and thus browser makers) can say that they will listen to community feedback, but that's pretty much a benign dictatorship over the most important spec of the web.


I mean, reading that GitHub thread, it seems to me that everyone involved is saying that's exactly what should happen, and the people invested in the W3C are trying to force through new work just to justify the organisation's continued existence (and presumably, if I were being cynical, their paycheck).

The bone of "you could set specs based on snapshots of the living standard" was thrown to them, but the reality is no one cares enough about it to actually do that well, so it's just being done in a bad way that will hurt everyone.

That thread reads, to me, as "we tried being nice, but now you are causing problems, just stop please".


W3C do a huge amount more than HTML and DOM. All of CSS is coordinated through there for example.


> I'd say the W3C wasted a huge amount of time pursuing quests of purity (XHTML) over actually making the web better for users. I see the value in what they were trying to do, but it wasn't letting people do the things they wanted to do on the web.

Okay, let’s make a deal:

We both write a crawler that can fully reliably parse websites.

I implement XHTML1.1, you implement HTML5. We both get 1 month time.

What do you think is going to happen?

XHTML was a worthy goal – with it, we wouldn’t have a need to run headless Chrome for tests. We could parse the web, and actually use the data. OpenGraph tags would never have been necessary. We wouldn’t need to throw DNNs at rendered output of a browser just to parse data.

HTML5 is amazing for existing browser vendors, developers, and in the short-term, users. But everyone else loses. Horribly.


And if we were all using XHTML1.1 now instead, yes, your parser would be easier, except all the richer content would be in flash, and all the web apps would be desktop applications, and you wouldn't be able to parse that at all, even with a full browser.

You are acting like everyone would just stop and wait for you to make your dream implementation that's ideal - that's not how the world works.

WHATWG was an admission that we can't stop it, so we might as well embrace it. Embracing it has resulted in browsers being far more consistent, and new features being a shared part of the web platform, and not siloed off in plugins and other platforms. That's why W3C is irrelevant now.


You’re assuming XHTML1.1 would never have evolved further, never have gotten more content.

And Flash, despite its flaws, would have been a much better starting point for rich content than the ecosystem we have today.

Many of the features Flash provided are only available in browsers today through babel.js transpilation. As a result, we’re stuck with a language without a stdlib and with broken syntax.

We’re stuck with a document model that’s impossible to work with or parse, and with impossible layout management.

If you want web applications, it’d make much more sense to port the Android layout XML format to the web than to attempt to use HTML5 for it, because HTML5 is insanity for building applications.

> and all the web apps would be desktop applications

I don’t see that as anything bad.

The web is for documents, and lightly interactive content. All the rich applications on the web are opaque to any crawler I could write anyway, as I just get "You need JS to view this React app". Desktop applications would be just as parseable, except they would also be less of a resource hog.


The idea that standards groups exist to push for top-down reconcepting of how the world should work is a common one, and is also a good reason why standards groups fail. Your idea of what the best outcome is won't be the same as every other stakeholders, and no one stakeholder will have exactly the same idea of the best outcome as the market will.

Ultimately, the market will win, no matter what your standards group says.

The point of standards is interoperability, not rationalization. When standards groups try to rationalize technologies real people work with, they cease to provide value, and instead become obstacles that real engineers end up laboriously working around.


When I was a child in elementary school, a standards committee decided to change my native language.

They replaced the spelling of most words, and many grammatical rules.

We were forced to obey these changes, any use of the old rules was counted as mistake in school.

Back then, many older books were still using the old rules.

By the time I left high school, almost no books with the old rules were left. All had been reprinted. All newspapers had switched. Autocorrect programs had been updated with the new rules as well.

In a matter of 8 years, an entire language had changed its orthography and parts of its grammar, top-down, and it worked out fine.

I’m sorry, if an entire human language with 120 million speakers can be updated top-down like that, a web spec can as well.


Apples and oranges - when you introduce a government mandate, you remove the market. @tptacek's argument is clearly about market forces in publicly defined standards, not government enforced ones.

I'm pretty sure the last thing anyone really wants is what we'd wind up with if web standards were left to government dictates...


There’s no need for a government to enforce standards on the web – there’s already an oligopoly that can do it on their own.

In fact, there’s a single company that can just outright dictate web standards, because they hold almost 70% of the browser market: Google.


What language is this? This sounds fascinating.


My example was the implementation of the German orthography and language reforms between 1996 and 2006 (I went to elementary school in 2002, when implementing it was still in progress, and most stuff was still using the old spelling, I left high school in 2014).

But French has their Academy, which has even more power over language, and afaik, Spain supposedly has similar governing bodies.


Meh- the French Academy has power over prescriptive grammar sure, but it has almost no power over what descriptive linguistics finds. Lots of Arabic and Verlan has made its way into everyday parlance.

I'm not really sure that the French Academy is really that much more effective than Strunk & White is for English speakers. It primarily seems to be ceremonial / an expression of French pride.


If you want a more tech-related example, Jobs said "no flash on iPhones", and in a few years... poof! no flash.


The counter to that is pretty obvious, because if you remember Jobs also said "no native apps on iPhones", and then in a few months... poof! an app store.

Flash was pretty much dead anyway, and the web platform had advanced enough to mostly replace it at that point. That wasn't true for native apps.

If you want to make a standard, it has to let people do the things they want to do. Otherwise, people will just use a different (or no) standard.


I'm not really getting into the standards thing here--just throwing some ammo to the underdog.

My only point is that there are only a handful of companies with the cash, the talent, and the inclination to tackle these things, and most of them are near if not total monopolies, so as long as what they put out there isn't a blatant kick-in-the-nuts, most of us will just accept it.

iPhone was a compelling product, didn't have flash, everyone migrated to Javascript ASAP. Google is practically a monopoly, and when webmaster tools tells people to jump, watch everyone piss away a weekend to add microformats and shave 5% off of a few 40k images.

Serfs. We are all serfs.


That sounds horrible.


It was amazing. The new orthography is much simpler, and has far fewer insane rules or exceptions. And most people who went to school during the transition, or were born after it, agree.

I know it can work on this scale, I’ve seen it IRL. Many languages do stuff like this, German has the council of German language, and French has their Academy.

You can do the same on the web. You just need to have all vendors working together to actually do it.


The idea that a bunch of standards group officials can decide for the world that web pages are simply lightweight content publishing mechanisms and that real applications should be build exclusively in Flash and that that worldview can be ratified and mandated by browser vendors does not seem amazing to me.

At any rate: the Internet is a market system, not a top-down autocracy.


The alternative (and current reality) is that the same things are decided by about four companies in an entirely opaque manner.

At least the W3C had processes and a wide array of members.


Isn't that just theater? None of them can tell Apple and Google what to put in their browsers; in fact, if they can't convince just one of the big 4 browser vendors to do something, their standards have no meaning at all.


It's even more work than that-- check out caniuse for SVG fonts:

https://caniuse.com/#feat=svg-fonts

They had support in both Safari and Chrome, but never in FF or IE (nor Edge). Chrome eventually dropped the support.

So I'd say if you can't get all four to implement the feature then you might as well call that part of your spec a "living standard." Those features are going to get way fewer eyeballs, fewer bugfixes, fewer reviews, fewer pieces of documentation, etc.


Uh, WHATWG is an open process - they have a similar level of control over things that W3C had.

If you want to try and claim W3C ever had the power to enforce people following their specs, IE6 would like to have a word.


The Internet is a network. The web is an oligopoly. Google, Google-by-proxy, and Apple fill the dog bowl, and the rest of us eat from it because it is there.

Tomte 8 months ago [flagged]

If you‘re talking about German, it was not amazing, but a cultural catastrophe, and an extra-legal totalitarian nightmare.


I assume you’re older than 22? There’s pretty much a strict split at around that age. People older seem to consistently hate it, people younger seem to consistently like it, because the new rules are much simpler.

Previously, Gruß and Kuß had no info about how long to pronounce the u – Gruß and Kuss do. And until 2017, capitalizing them into GRUSS and KUSS lost this information, now GRUẞ and KUSS keep it.

Previously, for many words, the rules when to split the word, when to write them together, when to use – was insanity. Now it’s all in a few easy rules.

And you have to remember, this wasn’t the first time German went through such changes – ever since the advent of the printing press, when a written German language was basically "invented" from the many dialects that existed, until today, there have been proponents of a prescriptive language evolution, and they’ve had lots of influence over time.

When you use Tarnen, Verfasser, or Absender, Abstand, Bücherei, Augenblick, Leidenschaft, Entwurf or Briefwechsel, Rechtschreibung or Tagebuch, Grundlage, Altertum, Erdgeschoss, tatsächlich or Hochschule, all these words were defined top-down. (All these words are just from Philipp von Zesen, Christian Wolff, and Joachim Heinrich Campe)

A massive amount of what we consider "German" today was defined and changed top-down, and without these changes, German wouldn’t be recognizable.


You‘re misinformed.

Yes, the German language has had several big changes, but until the reform we‘re talking about it was linguistically „proper“ in that the existing language was described and codified. It was bottom up.

In this reform, some non-elected people (who just a few years earlier had said themselves that their job wasn‘t to invent German, but to describe existing use and trends) invented a whole new orthography from scratch. The new rules have never been in use anywhere throughout the German-speaking lands.

They were and are pure fiction.

In linguistics that‘s how you tell a layperson: they think linguistics is prescriptive. Now it seems to be... :-(

And of course people under 22 don‘t care. They have never learned proper German.


You mean, just like in many other languages? According to Wikipedia, French, Icelandic, Spanish, Swedish, and a few more have had varying degrees of prescriptive language standardization.

> Yes, the German language has had several big changes, but until the reform we‘re talking about it was linguistically „proper“ in that the existing language was described and codified. It was bottom up.

I just explained why that wasn’t the case. Many linguists in the past have intentionally invented words (see the ones I mentioned) to make the language simpler, and stricter.

And the same continued until today – the drug store chain Rossmann has been a constant supporter of linguistic prescriptivism, has sponsored groups supporting it, and has been using these concepts in all their published material as well. Many other companies engaged in this as well.

The language has never been defined by the people speaking it, but always by the journalists writing it, the linguists describing it, and the companies influencing it.

And German as a whole was created, as pure fiction, by people trying to publish books across the whole of Germany at a time when everyone spoke local dialects.

At no time has German ever been a bottom-up language – and if we already let our language be influenced and shaped by companies, by media – why not at least use similar influence to make it simpler?

Having a language be simple to use is more important than some fake emotional value of being "natural".

Tomte 8 months ago [flagged]

You simply don't understand what I have written. I think we can leave it here.

I don't care about your opinion that it's "fake" and "emotional".

Language is a core part of my being, and a fascist power-grab killing my mother tongue is simply a crime against humanity. It's no different from how the Turks have been treating the Kurdish language.

I have only weak hope, but still hope, that we can someday reverse this. Violently or non-violently.


Do you believe languages are meant to live forever?


But it worked.


I am not sure if this is still the case today, but I remember that not too long after the new orthography / grammar rules were passed, two major news publishers announced that they would return to the old rules.

Also, my sister is a linguist, and I can trigger her going on a long rant just by mentioning the Rechtschreibreform. ;-)

(Personally, I think some of the new orthography rules are much simpler and consistent, so I use them. The rest I basically ignore, unless a spellchecker nags me about it.)


The HTML5 person's crawler will parse some significant fraction of real websites, and the XHTML one won't, because people write HTML5 and not XHTML, even if you as a tool vendor would greatly prefer otherwise.


And yet, forcing people to implement opengraph tags, forcing people to drop flash, forcing people to use HTTPS, forcing people to drop Symantec certs, forcing people to drop SHA1 certs – so often the actors behind the WHATWG have managed to get website authors to change what they use.

Hell, Google has AMP, which is far more intrusive than XHTML ever was, and yet, they’ve managed to get every major website to implement it. https://www.ampproject.org/docs/troubleshooting/validation_e...

And yet, somehow, implementing some stricter spec is supposedly impossible?


There's a big difference between "every major website" and "the web".

"every major website" means 100 companies with skilled developer who can and will react to changes in browsers quickly.

"The web" consists of millions of websites maintained by individuals and small organisations who have no resources to update the way their web pages are coded every year. It contains HTML generated 10, 20, soon 30 years ago. It contains that one app in your intranet with the table layout that you can't replace, and that IoT thing you connected to your home Wi-Fi 7 years ago that has no way of upgrading its web interface.

A browser that loses access to "the web" is worse than useless.


There's also a federal procurement picture: big governments making big purchases aren't fans of incompatibilities and favour standardised solutions.

For a company like MS losing access to "the web" could keep a lot of people from becoming VPs...


IMHO XHTML is pretty painful. Even if you say: okay, there is a server-side auto-generated markup tree and we can formally verify what happens. There is now a solution to that and it's called (server-side) React which is basically a (useful) alternative to XSLT. Except that it outputs HTML5.

You could also argue about spec size: just compare the sizes of books about XML and XSLT vs. HTML5, CSS, JS, and React. When I actually tried to do some useful work with XSLT (which needs to be mentioned here IMHO), I realized that the less painful 2.0 version is hardly implemented by anyone.

Regarding the parsing: X(HT)ML lexing is ridiculously easy; for HTML5 it's slightly more difficult, but not tough at all. You just need to keep a list of closing vs. self-closing tags. I'm not talking about building in fault tolerance; that would be tough, yes, and even tougher for XML!
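A rough sketch of what I mean, in Python (toy code: regex-based, so it ignores edge cases like attribute values containing ">"):

```python
import re

# HTML5 "void elements" never take a closing tag; an XML lexer needs no
# such list because XML requires an explicit <br/> style self-close.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img",
                 "input", "link", "meta", "param", "source", "track", "wbr"}

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>")

def lex_tags(html):
    """Yield (name, kind) events, where kind is 'open', 'close', or 'void'."""
    for closing, name, selfclose in TAG_RE.findall(html):
        name = name.lower()
        if closing:
            yield name, "close"
        elif selfclose or name in VOID_ELEMENTS:
            yield name, "void"
        else:
            yield name, "open"

events = list(lex_tags('<div><br><img src="x"><p>hi</p></div>'))
# <br> and <img> come out as void even without an explicit trailing slash
```

That one table of void elements is most of the extra work over an XML lexer; the real pain in HTML5 is the error-recovery tree construction, not the lexing.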

> XHTML was a worthy goal – with it, we wouldn’t have a need to run headless Chrome for tests. We could parse the web, and actually use the data. OpenGraph tags would never have been necessary. We wouldn’t need to throw DNNs at rendered output of a browser just to parse data.

Yes and no. If you use CSS for styling, the answer is no. If you require JS to render the initial page, the answer is no as well. But yeah, if you use the whole XML machinery with XSLT and possibly even XPath, then you would be kind of right. I mean, as long as we properly handle the schemas and DTDs - which almost no parser does AFAIK. So it's true, one can do pretty bad-ass stuff with all the X*. But tooling and library support is not good and has never been. XSLT 1.x is insanely difficult to use and XSLT 2.x hard to fully implement I guess.


> OpenGraph tags would never have been necessary.

Small correction: XHTML2 had the Metainformation Attributes Module [1]. That then became RDFa in (X)HTML, with practically the same syntax and processing model.

Facebook's Open Graph stuff is claimed to be RDFa. When I tested it, their parser did not really do RDFa processing: other CURIE prefixes for the same URI weren't recognized, if I remember correctly.

But in effect, Open Graph meta information would have looked the same in XHTML2 as in today's WHATTF HTML.

For your other argument I agree. WHATWG (and the dumb modern style of development) reduced the democratising aspect of the web. But of course the people of the WHATWG work for billion-dollar companies, which want to have a moat to centralize behind.

[1] https://www.w3.org/TR/xhtml2/mod-metaAttributes.html


The HTML5 crawler would be able to crawl the web, the XHTML crawler would be able to crawl 0.0001% of the web.

XHTML would only improve the web if all existing HTML went away or was changed to XHTML. Since this is never going to happen, XHTML does not simplify anything.


HTML5 is amazing for existing browser vendors, developers, and in the short-term, users. But everyone else loses. Horribly.

I don't understand who the 'everybody else' is in this case and what and when their horrible losing will be.


People trying to build new tools that parse the web.

Try building a crawler without reusing an existing browser engine.

Try running unit tests against your own web projects with Selenium without running a headless browser.

PhantomJS gave up because they couldn’t keep up with the complexity, and headless Chrome "just works".

Opera gave up on their own browser engine because of the complexity of parsing HTML5 accurately, when the spec is just "whatever Chrome does".

We’ve thrown away an entire ecosystem, just for more flashy graphics.


One of the most valuable aspects of HTML5 is that it defines a parsing model for "broken" HTML.

This means that, for the first time, it's possible to build a brand new HTML parser that has a high chance of working against all existing HTML without needing to first reverse-engineer existing browsers.

Remember, when HTML5 was first designed Internet Explorer was by far the most widely used browser. And IE was closed source. If you wanted to build a parser you needed to first reverse engineer IE and figure out how it handles invalid HTML.

The HTML5 spec fixed that. The thing you are complaining about here (HTML5 making it harder to build a new browser from scratch) is one of the things HTML5 actually solved!
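To illustrate the difference in failure modes with Python's stdlib (html.parser is not the HTML5 tree-construction algorithm, but the contrast is the point):

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

broken = "<p><b>mis-nested</p></b>"   # close tags in the wrong order

# A strict XML parser refuses the document outright.
try:
    ET.fromstring(broken)
    xml_ok = True
except ET.ParseError:
    xml_ok = False

# A lenient HTML tokenizer just reports what it sees and moves on; the
# HTML5 spec goes further and defines exactly what tree to build from this.
class TagCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = []
    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

c = TagCollector()
c.feed(broken)   # no exception; start tags are still recoverable
```

Before HTML5, each browser improvised its own recovery for input like this; the spec made that recovery identical everywhere, which is exactly what a from-scratch parser needs.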


I very much doubt Opera gave up because of "parsing HTML5". Parsing HTML5 is well-defined, certainly better than what was there before (unless you are willing to say "if you don't write perfect XML you don't get to be on the web", but good luck with that – no browser ever was and never will be in the position to do that)

The vast majority of the difficulty of the web platform is in the layers above, which don't care if the DOM they are looking at came from HTML5 or XHTML: layouting/rendering, interactions of JS and DOM, ...


Indeed, having worked for Opera during the time where we implemented HTML5-compliant parsing, the net result was that we fixed a bunch of site compatibility issues. Implementing it made competing with other browsers easier. (And as you say, parsing HTML is complex, but the rest of the web platform is vastly more so.)


That's a pretty tiny 'everybody else' compared to users, web and browser developers. They seem to be doing mostly ok and I still don't follow your argument that their concerns should somehow reign supreme over those of, you know, the actual everybody else.


It’s a tiny "everybody else" because it never was given a chance to develop.

Maybe you’d also say that the number of people that want to do their online banking with a desktop program that isn’t provided by their bank is a tiny "everybody else".

Yet in places where OpenHBCI exists, many people use it – and there is an ecosystem around it, e.g. KMyMoney can integrate with it, and there’s a small widget for KDE to show your current account balance in your tray.

If we had a machine parseable web, a similar ecosystem would have developed. When embedding links today, we use OpenGraph tags. But why? If the web was directly machine readable, we could’ve directly embedded that content.

Google search shows you preview excerpts of tables on a page; with an easily machine-readable web, similar stuff could’ve happened here.

Maybe instead of only embedding YouTube videos, I would have been able to easily embed any part of any web page into any other. Maybe I would have been able to easily embed any part of any web page in a desktop application. Maybe I would have been able to build addons that easily search over content of web pages, in a structured way.
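For illustration, since OG tags are plain <meta> elements, scraping them takes only a shallow parse; here's a sketch with Python's stdlib against a made-up page:

```python
from html.parser import HTMLParser

class OGExtractor(HTMLParser):
    """Collect Open Graph <meta property="og:*" content="..."> pairs."""
    def __init__(self):
        super().__init__()
        self.og = {}
    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        d = dict(attrs)
        prop = d.get("property", "")
        if prop.startswith("og:") and "content" in d:
            self.og[prop] = d["content"]

# Hypothetical page for the example.
page = '''<html><head>
<meta property="og:title" content="Example Article">
<meta property="og:image" content="https://example.com/a.png">
</head><body></body></html>'''

p = OGExtractor()
p.feed(page)
```

The point being: this only works because sites bolt metadata on for us; a machine-readable web wouldn't need the bolt-on layer at all.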


This would be a more compelling counterfactual if the approach that won wasn't in many senses the most successful technology in the history of the computer industry.


It seems like the web is 10 times more parseable. I feel like almost all content is now delivered via JSON services, and the HTML around it is constructed on the fly and is totally pointless to look at. At least that's the direction I've seen. I don't know how "everyone else" comes to mean people that are writing parsers. Seems like that would be a tiny percentage of the population, not "everyone else".


> I implement XHTML1.1, you implement HTML5. We both get 1 month time.

Do you get to use libraries or not?

If you get to use libraries, there are plenty of libraries for both of these tasks, so it will be fairly equivalent. In fact, if you want to really support XHTML well, it will probably be more complex, because you have to take into account namespaces; a tag or attribute is not just a simple string, you have to consider the namespace it's in, so you would have to deal with that.

If you have to write it from scratch, I'd actually also bet on it being quicker for HTML5. The exact algorithm is specified in the standard; you just have to translate that from pseudocode into whatever language you're working in.

In XHTML, you have to look through several standards (XHTML 1.0, XHTML Modularization, XML 1.0 which includes the DTD, XML Namespaces, XML Schema), and then translate from the specification in a declarative style into an algorithm that can actually be used to parse the document. XML parsing is actually quite complicated.
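A quick illustration of the namespace overhead, using Python's stdlib ElementTree:

```python
import xml.etree.ElementTree as ET

xhtml = ('<html xmlns="http://www.w3.org/1999/xhtml">'
         '<body><p>hello</p></body></html>')

root = ET.fromstring(xhtml)

# The tag is not simply "p": ElementTree expands it into Clark notation,
# so every lookup has to carry the namespace URI along with it.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"
para = root.find(f".//{XHTML_NS}p")

# A bare "p" without the namespace is a different name entirely,
# so root.find(".//p") finds nothing.
```

In HTML5 a tag name is just a string; in namespaced XML it's a (URI, local name) pair, and every comparison in the parser and every query on the tree has to account for that.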

> XHTML was a worthy goal – with it, we wouldn’t have a need to run headless Chrome for tests. We could parse the web, and actually use the data.

What? XHTML wouldn't have replaced the use of JavaScript to add features and load content. It wouldn't have replaced people's inappropriate use of tags. It wouldn't have replaced the fact that web pages are written for human consumption, and so don't generally try to include appropriate annotations on data for processing by other tools.

> HTML5 is amazing for existing browser vendors, developers, and in the short-term, users. But everyone else loses. Horribly.

Imagining that some magical standard is going to make everything better for users is a pipe dream. Providing good data formats and APIs for exporting data is hard, and is a different problem that providing user interfaces and interactivity. There's not some way in which XHTML could have been extended to serve both purposes; they are just too different.

This is why, when people care about providing programmatic access to data, they generally provide two endpoints; one serving HTML, CSS, and JavaScript for humans to interact with, and one for providing JSON for machines to parse. There have been endless attempts to try and make one standard that would work for both purposes, and they've failed because that's just not a good approach for solving the problem; instead, it's better to just have an API that both the UI code (whether on the client or server side) can use, and other developers can use if you want to expose it.


In the long run, we're all dead. We need short term solutions.


No, it wasn't simply a question of backwards compatibility. The W3C wanted (wants?) to pursue its own quixotic vision of a semantic, machine-readable web. One in which the needs of human users, browser user agents and use cases like web apps was of incidental importance at best.

The HTML5 spec effort that gave birth to WHATWG wasn't simply about documenting backwards compatible parsing, it was about vendors like Mozilla and Opera wanting to evolve HTML in a way that added actually useful new features, something that W3C had zero interest in at the time.

Nowadays, the W3C's behaviour seems to be driven entirely by an institutional desire to justify its own existence, and protect its revenue stream and its self-assumed position as the one-true source of web standards, by engaging in bad-faith practices like taking standards produced through the hard work of others, removing all citations, making breaking changes, and publishing it as a competing "standard".


Yes, and today we’re left with a web that you can only parse if you’ve got a few thousand developers and billions of dollars to throw at the issue.

As I wrote below, I offer $100 to anyone who can write a tool that can fully parse and render HTML5, the entire spec, and can render a real-life React app, within 4 weeks, without using any existing library or code for the parsing or rendering.

Doing the same for XHTML and a strict scripting language is easily possible in that time.


That was the situation prior to the writing of the HTML5 spec, not a situation it created. The majority of the web was not, and never would have been parsable as strict XHTML, whatever the desire of the W3C or anybody else. And even for new content, the idea that every hand-authored file would be well-formed, or that every half-baked and buggy CMS would always produce correct markup was a pipe-dream.

If strict parsing had been enforced, we'd have a web where a large proportion of the sites you visited each day would be broken. Or rather, we wouldn't, because complaints from users would have long ago forced browser vendors to implement graceful error handling, and so you'd end up with something akin to HTML5 anyway. Indeed, it was encountering precisely this problem that made vendors like Mozilla and Opera (who, incidentally, never had billions of dollars) to lose faith in XHTML in the first place, not some masochistic desire to keep their browsers' code as complicated as possible.


It would be awesome if the layers of the browser would allow for a more parseable, document-based web. That would be an effort towards standardization.

But are you really advocating for removing dynamic media from the browser? That seems like an incredible step backwards in most regards. In the absence of a desktop toolkit to rule them all, browser standardization is what we are left with right?


> In the absence of a desktop toolkit to rule them all

Qt. Runs everywhere, works everywhere, just fine.


If only some standards group somewhere could mandate that everyone use Qt.


Ah, this gets into gpl vs bsd though doesn't it? I'm not sure what to think about dual licensing. I have a fondness for Qt, but I also like tcl/tk so... its hard not for me to think of that xkcd about standards. https://xkcd.com/927/


React probably uses HTML5 APIs but could probably be re-written to do the same thing without them... I'm not sure anyone here is interested in $100 as they probably wasted at least that much "company" money reading this thread this happy Friday.


The $100 is for someone writing a fully-working parser that handles real-world HTML5 pages, without reusing any existing implementations.

Building an entire new parser is (obviously) much easier for strict languages (e.g. JSON) than for lenient languages (e.g. HTML5).

Which is the problem I have with HTML5, JS, and many similar technologies – they’re so lenient that almost everything is broken instead. We might as well write websites in english prose, it wouldn’t be much harder to parse.


I think you're downplaying the shit show that was XHTML.

It was the epitome of standards people chasing a perfect platonic ideal of a standard at the cost of actual usability. E.g., a single parse error in an XHTML document and it doesn't render at all. That all by itself was a deal breaker for many people.


That seems reasonable to me. Would you want your programming language to compile code that has a syntax error?


Yes, it seems reasonable, and the analogy with programming languages seems to make sense. But there's a big difference. With programming languages, you write a program, and when it's correct, you check it in and it's immutable until you check in a new version. Web pages are composited on-the-fly by programs that combine static files, database content, content from third parties, etc. The only way you guarantee a valid output document is if your compositor is bug free and defensively validates any third-party content you might be pulling in. Any bug in the compositor and your website is totally unavailable for users that hit the bug.

This is a great illustration of the concept: https://web.archive.org/web/20060613193727/http://diveintoma...
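The failure mode in that story is easy to reproduce with any strict XML parser; for example, with Python's stdlib:

```python
import xml.etree.ElementTree as ET

# One unescaped ampersand -- say, from a URL pasted into a blog comment --
# and the whole page is rejected under XML's draconian error handling.
page = '<div><a href="http://example.com/?a=1&b=2">link</a></div>'

try:
    ET.fromstring(page)
    well_formed = True
except ET.ParseError:
    well_formed = False

# Only the escaped version is well-formed XML.
fixed = page.replace("&", "&amp;")
ET.fromstring(fixed)   # parses without error
```

Serve that unfixed page as application/xhtml+xml and a conforming browser shows the reader an error screen instead of the content, which is exactly what happened in the linked story.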


Thanks for the link. I can certainly see how making the transition now is pretty much untenable, but I'm not 100% sold that it wouldn't be a good idea if everyone had adopted the policy from the start. I can certainly see how it would still be important for browsers to have a mode where they do their best to render the page, but it's less clear that it should be the default. Even if it was the default, then I would expect variations in different browsers ability to recover from errors would lead developers to being much more careful about allowing errors to creep in.


What makes that story (in my link) so compelling to me is that it happened to people who were XHTML advocates. They were the people arguing that browsers should be strict. They were the people writing blogging software The Way It Should Be Done, to ensure maximum XHTML compliance. It was a blog entry specifically arguing for strict parsing that became invalid XHTML due to a bug.

If true believers can make this kind of mistake, how often will it happen to people who are just trying to get work done? People will have bugs in their code sometimes, but what should the failure mode be?


Yes, but if browsers were strict by default (or at least by default in dev mode), the bug in his compositing software likely would have been found and fixed much earlier, and it wouldn't have persisted into production.

So again, I understand that it's nearly impossible to make the transition now, but that doesn't mean it wouldn't be preferable. And there's no reason the transition couldn't still be made, but much more slowly, and perhaps without ever making the big browsers reject by default.

I don't know if there are other disadvantages to XHTML, but if the strictness issue is the only one, then it seems like there would still be value in slowly transitioning over.


Web pages are information held inside containers, not code.

The analogy is a classic example of the "Everything must act like a compiler" fallacy.

There are many situations in which compilation is hopelessly inappropriate as a user model.

Not only is the web one of them, but the hypothetical semantic web is also one of them.

You can't force semantics into compilable tokens. The suggestion that you can - and should - is nonsensical.


I’ll remind you that the AMP spec has an extremely strict validation requirement: https://www.ampproject.org/docs/troubleshooting/validation_e...


Would you want any and all systems to have identical failure modes?


That’s great if you’re either an existing browser vendor, using a browser, or developing a broken website.

But if you actually try to write a new browser from scratch, or a tool to scrape websites, you’ll learn to love XHTML, and hate HTML5.

The same thing that applies to human languages applies here as well. Writing a vocaloid for Japanese is a high school programming class project. Writing a TTS for English takes thousands of Google developers years. Writing an XHTML1.1 parser and renderer takes a month. Writing an HTML5 browser takes thousands of developers years.

The WHATWG specs prevent the web from ever evolving further – we’re stuck with opaque websites and no way to build new technology on top of it. Building a crawler is a task of years, and semantic data tags are impossible to parse, because no one uses them right.

The only reason we can automate any parsing of websites is because either browser vendors spent billions on crawlers for their search engines, or if we run a full browser, or if we use the opengraph tags that Facebook forced on websites.

With XHTML1.1, Chrome and Firefox headless would never have been necessary. Imagine how much time and computational cost you could save.


"That’s great if you’re either an existing browser vendor, using a browser, or developing a broken website." - and that's the whole point, in the marketplace the convenience of these people matters much, much, much more than the convenience of those people who want to "try to write a new browser from scratch, or a tool to scrape websites."

If you want to scrape websites or show them in a browser, then you have to follow the needs of makers of these sites, because you need them and they don't need you. If you want to go to the right and they want to go to the left, you either follow them or go alone and become unable to scrape or browse their content. If there's a feature that they want to use that makes your parsing more complicated, tough luck, that feature is going in as long as somebody (e.g. major browser vendors) will agree to make it work.


The huge underlying assumption here is that people would use XHTML as a standard instead of using HTML4.01 with browser-specific extensions, which is what actually happened. XHTML didn't add any value to page authors. It added hugely indirect value to readers. The only people XHTML helped were tooling authors and browser vendors. It's very hard to market that as a value-add.


But the web already contains billions of pages which do not conform to XHTML. Any browser or scraper would still have to be able to parse those to be of any use. Adding XHTML would just be an additional parser frontend; it would not simplify anything.


That’s not true – the WHATWG has already deprecated some HTML specs, and older HTML pages already break today.

XHTML would have worked the same way – after a few years, you can deprecate the old parsers.


As far as I know, WHATWG have deprecated some elements not widely used (like "isindex", "font"), but the documents using these elements will still be readable even if not exactly as the author intended. Moving to XHTML, on the other hand, would make billions of pages totally inaccessible.

There are people who are now dead who have pages on the internet. These pages will never be updated.


font still works. isindex does not. It's extremely rare for a cross-browser HTML element to get removed but isindex is one of those.


blink also doesn’t work anymore, and neither does marquee. Several of the frame attributes are broken as well. noscript doesn’t always work reliably depending on browser.


Marquee works for me in Gecko and Blink. Didn't test the other two engines.


You keep repeating over and over that the big advantage to XHTML is "not having to run headless browsers", which I don't understand. I use Selenium all the time for my job, it's not great but it's... not horrible? It's fine, it's not some massive inconvenience, and it's definitely not worth getting rid of HTML5 to abandon it, when, as others have pointed out and you keep ignoring, the practical effect would be that the web would be useless for actual humans without plug-ins.

I guess I don't get your huge bone to pick with Selenium.


> I use Selenium all the time for my job, it's not great but it's... not horrible? It's fine, it's not some massive inconvenience,

Try running hundreds of tests at the same time, to actually get fast results when running a full test suite.

Right now, running a small test suite here takes over 2 hours; 99% of the time is spent in the browser processes.

And that’s not nearly close to 100% test coverage.

> not worth getting rid of HTML5 to abandon it

You don’t have to – XHTML isn’t the only strict spec out there, AMP is another, React also enforces strict syntax in JSX templates. AMP fails to render anything if there’s even a single mistake, React fails at build time.


Parsing isn't the difficult part, and above the parser, both XHTML and HTML involve the same complexity.

You can get an HTML parser off the shelf. A browser vendor (Mozilla) even funded one for non-browser purposes before adopting it for Firefox.


> Writing a vocaloid for japanese is a high school programming class project. Writing a TTS for english takes thousands of Google developers years.

It didn't take "thousands of Google developers years" to teach a computer English spelling rules. Indeed, even a programming class assignment could do that: you can cover most cases by just looking the pronunciations up in a dictionary. (The number of quirks and inconsistencies makes English spelling quite hard for humans to memorize, but computers are rather good at lookup tables.)

Even if you consider the actual Vocaloid software, which was developed by a team at a large corporation, there are two factors differentiating it from English TTS that make the latter much harder:

1. Japanese has much simpler phonetics than English, with a smaller set of phonemes and (somewhat oversimplifying) only using open syllables. So it's easier to consume and produce, for both computers and humans, but at the cost of being a less efficient encoding: Japanese tends to require a lot more syllables than English to express the same concept, and there are a lot of homophones.

2. Vocaloid sounds robotic. It's gotten a bit less so over time, but it still doesn't come close to passing as human. If you're okay with robotic, English TTS software has existed for a long time, starting many decades before Google was founded. The hard part, the part that requires neural networks and massive computational power and Google and still has yet to be perfected, is making it sound human.

By the way, although vocaloid software would be given phonetic input, normal Japanese writing uses kanji (i.e. Chinese characters), most of which have multiple unrelated possible pronunciations. Determining which pronunciation applies to each character in a given piece of text is nontrivial, and sometimes even depends on context or meaning.
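The lookup-table approach described above can be sketched in a few lines. Everything here is illustrative (a toy hand-made dictionary; real systems use something like the CMU Pronouncing Dictionary with roughly 130k entries plus fallback rules for unknown words). The "-ough" words show exactly the inconsistency that is hard for humans but trivial for a table:

```python
# Toy pronunciation table; phoneme symbols loosely follow ARPABET.
PRONUNCIATIONS = {
    "though":  "DH OW",
    "through": "TH R UW",
    "tough":   "T AH F",
    "cough":   "K AO F",
}

def to_phonemes(text):
    """Map each word to phonemes by table lookup; '?' marks out-of-vocabulary words."""
    return " | ".join(PRONUNCIATIONS.get(w, "?") for w in text.lower().split())

print(to_phonemes("tough though"))  # -> T AH F | DH OW
```

The hard parts of real TTS (prosody, naturalness, out-of-vocabulary words) all live outside this table, which is the point being made above.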


XHTML lost in the marketplace. This was due in part to lack of developer adoption and the fact that Internet Explorer ignored it. You could argue that the web community should have forced this through, but that simply wasn't working at the time.


I'm pretty sure javascript heavy pages would exist even in a strict parsing world.


This summary glosses over several events that alienated browser vendors and web developers from the W3C and makes the W3C sound like the good guys.

One that springs to mind is the complete debacle around XHTML 2. The link above ( http://diveinto.html5doctor.com/past.html ) is worth reading to understand things from the WHATWG's perspective.


> the W3C, still being relied on by the other actors on the web that aren’t the 4 largest browsers

Why do other actors need to rely on the W3C? Why can't they use the WHATWG specs as well? You say they're more complicated, but if that's what the browsers do then that's what they do.


Specification vs Documentation...


Because the XHTML spec would have made a lot of things easier.

I can write an XHTML1.1 parser and renderer in about one month.

Writing an HTML5 parser takes thousands of developers 5 years.


I can make a self-driving train far more easily than I can make a self-driving car.

You are comparing two things that aren't equivalent. The living standard provides a lot more value than the old specs, and that's why they won.


The living standards winning is something we should see as tragedy, not as something good.

We’ve made parsing the web something that only massive corporations can do. We’ve made building a browser so complicated that even Opera gave up after the complexity of HTML5, and we’re now left with only 4 major browser engines, of which 2 share a significant amount of code.

Do you see an innovative ecosystem there?

We’re left with a web that can’t be parsed, we’re left with a web where, to run Selenium tests, we have to run Chrome headless, because nothing else can even attempt to render websites anymore.


> Do you see an innovative ecosystem there?

Yes! Everyone says that the specification approach is what was severely limiting innovation, and when they took the living specification approach instead that's when web innovation took off again.


Oh? How many tools do you know that can parse a current HTML5 react app without importing any existing browser?

Every tool we have to use on the web relies on 4 existing tools that only major companies can build.

I’ll pay you $100 if you manage to write, in 4 weeks, a browser, from scratch, that renders a real life React app accurately, including all content, without importing a single bit of code or libraries from existing browsers or web tools.


> Oh? How many tools do you know that can parse a current HTML5 react app without importing any existing browser?

I don't recall XHTML deprecating JavaScript- pray tell, how would you render the XHTML version of a react app without a browser, or a JS engine as a bare minimum? (X)HTML is orthogonal to Js/react.

I could develop a self-driving car for $50 000 (instead of millions) if human drivers and pedestrians started behaving in well-defined patterns, following strict rules and stopped doing stupid, unexpected things. I'd really love that, but it's not going to happen even though there already are "standards" in the law books.


You asked about innovation, not how easy it is for new people to enter the market. Innovation in browsers has definitely gone up and as a user I feel like I'm winning from this with better websites and more powerful web apps.


How many new browsers do you see? Is that innovation in browsers? Browser competition and innovation is at its lowest ever. Chrome holds 2/3rds usage globally.

I don’t want more powerful ways to display ads, I want to get more content, better connected, without any of the fluff around it.


> Browser competition and innovation is at its lowest ever.

I don't care directly about competition, because that's just a means to an end, and the end I care about is innovation, and I don't agree that innovation is low - I think it's high. New HTML features are coming out in a fast continuous stream, unlike how it used to be.


What does it even mean to parse a React app? Crawlers can't resolve the halting problem, either.


> Crawlers can't resolve the halting problem, either.

And yet, that’s where we’re at today. Blogspot posts require JS. So many other pieces of content are similarly built.

And with PhantomJS gone, we’re now running entire headless browsers just to test if websites are rendered correctly. It’s insanity.


Yes, and before that there were maybe 1.5 viable ways to render flash content.

Then you need to take that `.exe` and try to pull out its content. Game, app, whatever.

We have a web that can be parsed, with difficulty, instead of a web without half of the content we want to parse.

The web was being replaced - this is what saved it, like it or not.


> Writing an HTML5 parser takes thousands of developers 5 years.

This is demonstrably untrue. Servo's HTML5 parser, html5ever, was largely written by a single developer within a year. (Yes, it's not a month, but it's also not 5 years.)


And wouldn’t it have been easier to write it without any of the leniency it has to expose? Wouldn’t it have been easier if all tags were either ending in /> or followed by a closing tag? Wouldn’t it have been easier if the syntax was formally defined as ABNF, and could be translated into code in a matter of days?

I believe it would have been. And I believe that making the web more easily machine-readable, and making it easier for people to develop tools working with the web, would be a valuable goal.


> Wouldn’t it have been easier if all tags were either ending in /> or followed by a closing tag?

Not really.

> Wouldn’t it have been easier if the syntax was formally defined as ABNF, and could be translated into code in a matter of days?

No, working from a spec written as an algorithm is easier than working from ABNF plus some inconvenient prose constraints.

(With the exception of template element support, I wrote the HTML parser used in Firefox and Validator.nu.)


> No, working from a spec written as an algorithm is easier than working from ABNF plus some inconvenient prose constraints.

That sounds very unlikely.

I’ve implemented my own parsers for countless specs – plaintext or binary – and the specs that are written as imperative algorithms are insanely complicated to turn into functional implementations.

I end up with horrifying code, while the specs written as ABNF are much easier to translate into pattern matching code.

The specs written as algorithm only work fine for a single type of implementation IME, while the ABNF specs work equally well for all types.
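To illustrate the kind of mechanical translation being claimed here (with an invented grammar, not taken from any real spec): an ABNF-style rule for a simplified self-closing tag maps almost directly onto pattern-matching code, and the strict version simply raises on invalid input instead of recovering.

```python
import re

# Invented ABNF for a simplified self-closing tag:
#   tag  = "<" name *(SP attr) [SP] "/>"
#   name = 1*ALPHA
#   attr = name "=" DQUOTE *qchar DQUOTE
NAME = r"[A-Za-z]+"
ATTR = rf'{NAME}="[^"]*"'
TAG = re.compile(rf"<({NAME})((?: {ATTR})*) ?/>")

def parse_tag(s):
    m = TAG.fullmatch(s)
    if m is None:
        raise ValueError("not a valid tag")  # strict: no error recovery
    name, attr_text = m.group(1), m.group(2)
    return name, dict(re.findall(rf'({NAME})="([^"]*)"', attr_text))

print(parse_tag('<img src="a.png" alt="x" />'))
```

The HTML5 parsing algorithm, by contrast, is specified as a tree-construction state machine precisely so that every input, valid or not, has one defined result; that's the trade-off both sides of this thread are arguing about.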


Way back in the day, HTML was implemented as an application of SGML. SGML was a quite complex markup format, that had lots of features that were kind of complex to implement, and so web browsers didn't actually implement all of SGML, just that which was necessary for HTML and the HTML found in the wild. However, the HTML found in the wild was frequently invalid, so browsers had to implement some clever rules to do something reasonable with invalid markup.

Eventually, people thought it would be nice to have a simpler, easier to parse markup format, with a proper specification, and developed XML. Of course, once you have a shiny new thing, people will come and start bolting things on to it, and so they added features like namespaces, and they included some of the worst features from SGML, like the DTD (it turns out you don't generally want a document to reference its own schema and entities: an application should know what schemas it can accept; having a URL for a schema in a centralized location means lots of poor implementers would actually download it; and entity expansion is just a nightmare).

However, XML wasn't compatible with HTML, and it certainly wasn't compatible with HTML in the wild. XML parsers are required to reject any invalid markup. The W3C developed XHTML as a way to have a subset that could work in either HTML mode or in XML mode; the idea was that people could start moving to XHTML in HTML mode, and then once everything was cleaned up they could switch to XML mode with strict parsing.

The problem was, strict parsing was never a benefit for either publishers or users. One little bug somewhere in a template substitution which allowed an unquoted < sign could cause the whole page not to parse, and users would just be left with an error message. Without completely changing how the majority of web apps worked, there was no way to ensure that all of your content would be strictly compliant XML without the chance of breaking.

In the browser world, XHTML was implemented, with strict parsing in XML mode, but almost no one used it.

At this time, the browser landscape was pretty bleak. Netscape had died. Mozilla was in the process of building their new browser on a completely new engine, but early versions were fairly bloated and slow; it required a rogue group of developers within Mozilla to fork just the browser portion without a lot of the other functionality to produce what is now Firefox. Opera existed but had a tiny sliver of the market share; it didn't help that the free version came with ads, or you could pay for it without ads, while all of the other browsers were just free. IE was dominant, and eventually captured a huge percentage of market share, and then Microsoft just rested on their laurels and pretty much stopped development.

Soon enough, Apple came around and forked KHTML to build WebKit and Safari. In doing so, they did a huge amount of reverse-engineering effort to make their browser compatible with all of the parsing and layout quirks in IE and Gecko; the standards are not at all sufficient for compatibility. This gave them a browser that could actually be used on the majority of web content. With the introduction of Firefox instead of the bloated Mozilla browser, there was actually competition in the browser landscape again.

Meanwhile, the W3C decided that the failure of XHTML to be adopted because it met no one's needs was not sufficient; they went on to start working on XHTML 2.0, a backwards-incompatible change with maybe a few nice features for document publishing but which didn't address any of the needs that web developers wanted, like rich interactive web apps.

So the WebKit, Mozilla, and Opera developers decided to get together and actually start providing a spec for what would actually work in the real world on real web content. They called this group the WHATWG. The W3C was not interested in working on this; the W3C group insisted on continuing work on the backwards-incompatible XHTML 2.0. Inspired by the clever algorithm the WebKit developers had come up with for parsing invalid content, they actually wrote up something a lot like that into a spec (with some improvements), and this spec eventually was implemented by all of the major browsers, providing much more robustness and consistency between browsers in how content was handled.

Other features that people wanted, like the ability to draw on a canvas, were prototyped in browsers and specced out in the WHATWG group. Some features were adapted from browsers that had already implemented them; for instance, Microsoft had implemented XMLHttpRequest, which turned out to be hugely useful for interactive web apps, so the WHATWG wrote up a spec for this and other browsers implemented it.

Google eventually wound up forking WebKit for Chrome, then merging back in, and eventually forking again into Blink, and joined this group as well.

Finally, the W3C realized that the work it was doing was irrelevant. No one was interested in implementing XHTML 2.0. What everyone wanted were the new features in the WHATWG HTML specification, new features like the canvas, and so on. So they agreed to take the current WHATWG spec and edit it into HTML5.

However, it didn't take long for this to break down again. There were people in the W3C who objected to some of the changes made in the spec for the purposes of matching up with the real world. For instance, there had been some accessibility features like the "longdesc" attribute which were specified as containing a URL pointing to a page with a longer, more detailed description of the item in question for accessibility purposes (something that could, say, contain markup, when "alt" wouldn't be sufficient for a description). However, no browsers had ever actually implemented any reasonable way to get to this, and a survey of web content found that even if you did try to implement it according to the spec, very little content actually used it, and a large amount of the content that used it used it incorrectly, pointing to broken URLs or just including plain text like the "alt" attribute. So the WHATWG spec dropped this, and recommended other ways to link to descriptions which would show up even without special accessibility tools. One of the problems with specialized accessibility attributes is that people who aren't using screen readers can't easily test them out, so it's easy to bit-rot, but providing a normal link and annotating it so that screen readers can link it to the image makes it possible to see in a normal browser.

Anyhow, some people in the W3C objected to this, and so rather than just providing an edited version of the WHATWG spec, they started tinkering with the spec themselves, adding things back in, removing some things that had been in the WHATWG spec. The W3C structure is very bureaucratic, and it has all kinds of members who are only peripherally involved with any actual tooling for web development, so it makes it very easy for various people with big egos but no real skin in the game to get involved in the process, while the actual browser developers who would be implementing the features can be shut out of the process.

So, eventually the cooperation between the WHATWG and W3C died down again, with the W3C publishing what it wanted, and the actual browser vendors continuing to work on their living standard document that is a much closer representation of the real world.

And this seems to be yet another cycle of attempted reconciliation and breakdown. The W3C agreed to start with the WHATWG DOM specification, then decided to make some incompatible changes without very much justification, and is now trying to publish a new version.

I think that in a lot of ways, there's an ego thing going on here. The W3C was originally started precisely to specify HTML, CSS, and things like that. While their CSS working groups have managed to stay pretty reasonable and are willing to work with those doing the actual implementation, their HTML and DOM groups keep on being hijacked by people with particular agendas, people who won't work in good faith to try to resolve differences reasonably, and people who think that because they're the W3C, they "own" the spec and so think that the WHATWG is just a rogue group, as opposed to a group of the people who actually have the most skin in the game because they have to actually implement the browsers that billions of people use without breaking a hugely complex and diverse amount of content out there.


> Way back in the day, HTML was implemented as an application of SGML.

[citation needed]

I'm unaware of basically any implementation of HTML treating it as an application of SGML; the only notable case I'm aware of is the old HTML validator.

Tim's original implementation of HTML didn't treat it as SGML.


> I'm unaware of basically any implementation of HTML treating it as an application of SGML; the only notable case I'm aware of is the old HTML validator.

Plugging my XML Prague 2017 paper on a SGML DTD for W3C HTML 5.1 here [1].

[1]: http://sgmljs.net/blog/blog1701.html


I was looking for that earlier, it's a fascinating paper, thanks! :)


Sorry, maybe I should have said "inspired by" or something.

You're right, I'm not aware of any actual implementation, outside of validators, that treated it as such.

I did actually say a little later that no web browsers actually implemented it as SGML, but the sentence you quote could cause confusion; but it's too late for me to edit now.


Why don't the browser companies just entirely ignore the W3C from now on?


In part, I think it's because there is still good work that goes on under the W3C umbrella in other areas; the CSS working group has managed to stay reasonable, learn from the mistakes of some over-engineered past standards, and continues to work with implementers.

Also, there are reasons to want some of what the W3C provides that the WHATWG doesn't. The W3C has many more member companies, and it can get them to sign off on patent rights so there's less likelihood of some one of Adobe's patents on page layout in InDesign suddenly being infringed by web browsers due to something in the spec.

And finally, I think that the W3C wants to stay relevant, and so it keeps trying to work from the WHATWG spec as a basis, and promising it will be good this time, and then it goes off and pulls this stuff again.


> The W3C has many more member companies, and it can get them to sign off on patent rights so there's less likelihood of some one of Adobe's patents on page layout in InDesign suddenly being infringed by web browsers due to something in the spec.

The patent policy only has commitments from members of the WG who developed the spec, so you only have coverage from Adobe patents if Adobe is a member of the WG. (As it happens, Adobe still has one representative within the CSS WG, who happens to be one of the chairs.)


> Inspired by the clever algorithm the WebKit developers had come up with for parsing invalid content,

What was the clever algorithm?



It's both true and false. W3C does copy/paste some stuff, but it also seems to add some stuff (especially related to accessibility and internationalisation) that the WHATWG hasn't historically cared for.


If I remember correctly, that relationship was always weak. WHATWG exists because the W3C failed, after all - it was created in opposition to a situation that some players saw as hopelessly broken.

Given the substantial overlap of players in both groups these days, I actually find it surprising that there is divergence. All major vendors are opposed, so who is actually trying to push for changes, and why? I guess I'm lacking some background here; I really don't follow "standard wars" anymore...
