W3C and the WHATWG sign agreement to collaborate on single version of HTML, DOM (w3.org)
493 points by klez on May 28, 2019 | 272 comments



To clarify why WHATWG exists and why W3C lost power over the HTML spec:

Around 2004 the W3C abandoned organizational effort on HTML in favor of things like XHTML 2, XML Events, the semantic web, etc. The WHATWG was formed in reaction to that, rewriting HTML completely from the W3C's HTML 4 version, for example to make it better suited to web applications and to specify things in more detail.

Since the formation of WHATWG, W3C HTML specs became copy-pastes (often not even correct copy-pastes) in an effort to satisfy paying member companies (https://github.com/w3c/charter-html/issues/14#issuecomment-1...).

You can see everything WHATWG maintains here: https://platform.html5.org/ (they are the green question mark)

The W3C still maintains a good chunk of web standards, though, such as CSS, Wasm, web security, etc.


The biggest problem is that in addition to going all in on xhtml, the w3c actively dismissed the need to correctly specify the actual behavior of the web, rather than some idealized model.

Essentially the w3c said “incorrect” html isn’t valid, so we don’t need to specify it, even though all browsers support that “incorrect” content. Instead they said “everyone should just use xhtml, as that makes sure the syntax is correct”.

Unfortunately they again failed to address the real world:

* xhtml is necessarily slower to display because you cannot do anything until you’ve got a full document (a truncated document is by definition not well-formed XML)

* because IE didn’t display xhtml, it was usually published with the html (text/html) mime type, so browsers that did support xhtml still had to parse it as html

* as a byproduct of the last one, invalid xml got added to documents unnoticed, which would then cause the browsers that did try to treat it as xml to appear broken

* xml is also incredibly hard to actually get right: take RSS, which was ostensibly XML from the outset. Even that often has to be parsed leniently, like html, because of the amount of broken xml.

By going all in on XML the w3c essentially went all in on a technology that people didn’t actually use or want.

But browsers did actually need an accurate specification that matched the real world, and that’s what became HTML5 through the hard work of people from Apple, Mozilla, (eventually) Google, Microsoft, and Opera; the w3c was not really involved. The end result is that in the modern DOM you are far less likely to need per-browser hacks than you once did.


That explains the HTML spec but not the DOM spec.


The DOM is specified as part of the HTML spec.

The whatwg HTML spec defines exactly how html is parsed, and exactly how every element interacts with the scripting environment. Just defining the grammar is not sufficient.

Historically for instance something like

<bold>foo<italic>bar</bold>baz</italic>

Produced a different DOM tree in different browsers. WHATWG specified what should actually happen. IIRC IE managed to produce a DOM graph rather than a tree in the above example.
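To make the nesting problem concrete, here is a toy TypeScript checker (a hypothetical helper, not any browser's or spec's code) that tracks bare open and close tags on a stack. A strict XML-style parser has to reject the snippet above at the first mismatched end tag; the HTML parser instead has defined recovery rules. With the real formatting elements b and i, those rules (the "adoption agency algorithm") turn <b>foo<i>bar</b>baz</i> into <b>foo<i>bar</i></b><i>baz</i>.

```typescript
type NestingResult =
  | { wellFormed: true }
  | { wellFormed: false; error: string };

// Toy checker: understands only bare <name> and </name> tags (no attributes,
// comments, or self-closing tags). It is NOT an XML or HTML parser.
function checkNesting(markup: string): NestingResult {
  const open: string[] = [];
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)>/g;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(markup)) !== null) {
    const isClose = match[1] === "/";
    const name = match[2];
    if (!isClose) {
      open.push(name);
      continue;
    }
    const expected = open.pop();
    if (expected !== name) {
      // A draconian (XML) parser stops here; an HTML parser recovers instead.
      return {
        wellFormed: false,
        error: `expected </${expected ?? "none"}> but found </${name}>`,
      };
    }
  }
  if (open.length > 0) {
    return { wellFormed: false, error: `unclosed <${open[open.length - 1]}>` };
  }
  return { wellFormed: true };
}
```

Running it on the snippet from the comment reports the mismatch at </bold>, while the tree the HTML parser builds for the b/i version nests cleanly.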


The DOM is a completely separate unrelated specification. https://dom.spec.whatwg.org


Fair point. I was conflating the whatwg spec and html5. (Actually, it's possible they were at one point the same spec: there was some work over the last 5 or so years to stop putting literally everything in a single spec document. Unfortunately, after a decade of web engine work, everything turns into a single amorphous blob.)


They were never the same. The closest thing is that the DOM spec and W3C’s XML Schema spec were joint publications for several years.


What are you talking about? I am not talking about the W3C's nonsense, I was fairly clear that I was talking about the actual real spec, which is html5, via the WHATWG [1]

What you claimed is absolutely wrong.

Please note that it defines the DOM interfaces for all of the core elements, and more or less every DOM API, including all elements, as well as most programmatic types - even things like the XHR objects. They used to all be in a single giant "HTML living standard" document, and have in the relatively recent past been split into separate spec docs (many of which reference the original "living standard").

[1] https://html.spec.whatwg.org


The WHATWG DOM spec was spun out of W3C DOM Level 3. Neither document was ever associated with or aligned to any version of HTML, though the WHATWG did try something like that and backed off when all the browser vendors refused to give them the time of day. Now the WHATWG document is largely the document of record and the W3C's DOM Level 4 is largely a snapshot of the WHATWG document.

None of this is either confusing or a mystery. It’s all out there in the open and the people who maintain these documents respond to email. I typically avoid talking about the DOM online because many developers aren’t aware of what it is and are less aware of its history and sometimes people get sensitive about it.


I still maintain that the "living standard" is an oxymoron. It's a collaborative browser dev document. Don't get me wrong, that's great. However for everyone else an unversioned document, any part of which can change at any moment, is not what's usually thought of as a standard.


This obsession with "living" and constant change seems to be mostly confined to the web; instead of settling on a spec and then leaving it alone and "doing what you can with what you have", those working on this stuff seem more inclined to keep making browsers change.

I suspect at least part of the reason is to build a high barrier to entry and preserve the monopoly, keeping out competitors, given who the people in these groups work for.

My personal opinion on this is to stop feeding the monopoly and refuse to use anything other than basic HTML for static content sites.


> My personal opinion on this is to stop feeding the monopoly and refuse to use anything other than basic HTML for static content sites.

That's not going to work for one simple reason: despite HN's obsession with plain HTML/CSS, these new standards _are_ actually useful. They're being created with the express purpose of solving practical problems that developers, site operators, and users are experiencing in the real world. Those stakeholders aren't going to just ignore a practical solution to their problem over some esoteric concerns about "feeding the monopoly".

I share your concerns about the proliferation of web standards making it difficult for browser vendors to compete, but if the only alternative you can offer is stagnation I think it's completely unsurprising that your concerns go unheeded.


This times a million.

And for anyone who needs some bona fide examples of useful new standards and updates from the WHATWG specifications, well, how about:

New semantic elements like header, footer, section, article, nav, aside, main, etc. These are far better for making logically structured pages than a ton of divs with class names would be.

The various new input types and attributes. Now you can have input fields which validate email addresses (and, with the pattern attribute, formats like phone numbers), present the right keyboard for the type of input you want, do various other validation checks or even provide things like a nifty date picker without JavaScript.
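For the email case, the HTML spec defines a "valid e-mail address" and gives an equivalent regular expression, which browsers apply natively for input type=email. The TypeScript sketch below just reproduces that pattern, for example to run the same check on a server; the function name is hypothetical, and the sketch ignores the multiple attribute (comma-separated lists).

```typescript
// Pattern published in the HTML spec for input type=email
// (a deliberately simpler rule than full RFC 5322).
const VALID_EMAIL =
  /^[a-zA-Z0-9.!#$%&'*+\/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

// Hypothetical helper mirroring the browser's built-in check.
function isValidEmail(value: string): boolean {
  return VALID_EMAIL.test(value);
}
```

Note that the pattern accepts a dotless domain such as a@b, and rejects labels that start or end with a hyphen.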

Picture and srcset too. No more having to load giant images on mobile devices or have blurry ones on devices with retina support, you can choose which image displays on which type of device.
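With density descriptors, srcset just lists candidates such as "small.jpg 1x, large.jpg 2x" and leaves the final choice to the browser. The sketch below is a simplified, hypothetical picker, not the spec's parsing algorithm: it handles only x descriptors (no w descriptors or sizes, no commas inside URLs) and applies one plausible heuristic, taking the smallest density that covers the device pixel ratio.

```typescript
interface Candidate {
  url: string;
  density: number;
}

// Parses only the simple "url Nx" form; a missing descriptor means 1x.
function parseDensitySrcset(srcset: string): Candidate[] {
  return srcset
    .split(",")
    .map((entry) => entry.trim())
    .filter((entry) => entry.length > 0)
    .map((entry) => {
      const [url, descriptor = "1x"] = entry.split(/\s+/);
      if (!descriptor.endsWith("x")) {
        throw new Error(`unsupported descriptor: ${descriptor}`);
      }
      return { url, density: Number.parseFloat(descriptor) };
    });
}

// Heuristic (assumption): smallest density >= pixelRatio, else the densest.
// Real browsers may also weigh caching, bandwidth, or user settings.
function pickCandidate(candidates: Candidate[], pixelRatio: number): string {
  if (candidates.length === 0) throw new Error("empty srcset");
  const byDensity = [...candidates].sort((a, b) => a.density - b.density);
  const covering = byDensity.find((c) => c.density >= pixelRatio);
  return (covering ?? byDensity[byDensity.length - 1]).url;
}
```

On a 2x "retina" screen this picks large.jpg; on a 1x screen it picks small.jpg, which is the bandwidth saving the comment describes.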

The preload attribute and what not. Being able to load content the user will likely need in the background is helpful.

The built-in video and audio elements, obviously. Again, object was a terrible solution for this: it was unsemantic, awkward to use, and struggled in certain browsers. These elements don't.

The canvas tag and things you can do with it.

And this will be a controversial one, but... most of the old elements that were made official by 'paving the cowpaths' were nice to see added too. The whole fiasco involving embed/object in the old days, and how supporting multiple browsers meant invalid HTML, was a bit of a farce really, especially as neither was particularly 'semantic' to begin with. Seeing them both made official and better alternatives provided just puts a long-standing issue to rest.


if only all of those things were fully fleshed out in the various browsers... for instance, it's disheartening to realize that it's been 15 years already and useful form elements like date(-time), phone, and email still don't have reliable and complete cross-browser behaviors, validation (i know it's hard, but still), and styling/event hooks.


True, support still isn't perfect. The calendar/date picker stuff is especially annoying here, since it really should be easier to customise and more similar cross browser than it currently is.

But most of the things mentioned do work in more browsers than not, and you can use them without worrying the site will break in almost all cases. The new semantic elements will never have issues (unless you're trying to target IE8 or below), the validation works fine in pretty much all modern browsers (as do simple styles for it) and support for the picture element and srcset are pretty good too.

The likelihood of anyone using the Blackberry browser or IE mobile is small enough to ignore.


> these new standards _are_ actually useful

I don't necessarily disagree, but it would help your case if you included some examples in your comment (if only to build a discussion upon).


The Web Authentication standard [0] seems super useful and something we really need in the web.

[0] https://developer.mozilla.org/en-US/docs/Web/API/Web_Authent...


Yeah, this is pretty great. It also comes from the W3C, not the WHATWG.

(it might be a bit lost in the hierarchy of the thread by now, but the original comment was about the WHATWG taking over and monopolising the normal considered and democratic standardisation process of the W3C with their HTML5 "living" spec.)


W3C was a good fit for WebAuthn because the W3C is a body for corporations and by its nature WebAuthn is built and primarily implemented by corporations.

Not a criticism, by the way, sometimes that's just the right fit.


Huh? The WHATWG steering committee is Apple, Mozilla, Google, and Microsoft. WHATWG's legitimacy (such as it is) doesn't come from being non-corporate, it comes from being entirely controlled by the top corporate browser vendors and thus reflecting the de facto state of affairs regarding how and when and why features are defined and implemented. If the WHATWG disappeared nothing would change because the top vendors are going to pick & choose W3C standards, anyhow.


Sorry, I was thinking out loud, I was contrasting the other obvious place to standardise this - the distinctly non-corporate IETF. WHATWG makes no sense for something like WebAuthn


You have to take the mindset that "giant bundles of JavaScript" aren't how we want to build the web, and that we should instead begin to ship more and more of the web baked into browsers. And that even "static HTML" requires accessibility improvements and adaptations to new device types: for example, split-screen phones, where you have a mobile device that starts at one size but can then open up to a larger size. Tiny watch apps, new camera APIs, responsive @media queries, better support for print media, less onscroll jank through the use of IntersectionObserver, new authentication APIs to support Windows Hello, advancements to link tags for prefetching resources and DNS, CORS security enhancements, discussion about proprietary browser implementations of features so they're not proprietary, using origin trials (Chrome), the Develop menu (Safari), or preferences (Firefox) to encourage web developers to try new features on their sites and report back on how they work (origin trials make this trivial), non-vendors such as Intel and Facebook building new features for the web, and... well, it's hard to summarize how many advancements there have been in web standards since HTML5 became a thing. I'd also point to a useful parallel in how ECMAScript versions its spec: it's also in a constantly evolving state as new proposals move between stages. To me, this reflects how standards bodies can now use git much more easily and effectively as a versioning mechanism, and how they've learned to use the best of open source instead of locking up versions behind paywalls and private interest groups. Literally anyone can contribute to these specifications, see https://www.youtube.com/watch?v=y1TEPQz02iU


> You have to take the mindset that "giant bundles of JavaScript" aren't how we want to build the web but to begin to ship more and more of the web baked into browsers instead.

"giant bundles of JavaScript" weren't how we shipped the web before the HTML5 process began. The solution to "we're loading too much overengineered bloated redundant logic over HTTP" isn't "let's bundle all of that logic", it's "stop overengineering simple webpages".

> even "static HTML" requires accessibility improvements and adaptations to new device types

Accessibility standards predate and are still external to HTML5, and have actual fixed specs. They're relatively simple to implement from a parser perspective, and individual tools implementing functionality on top of that are ancillary to mainstream browsers. The only exception here is support for things like MSAA, etc. and that's also completely separate to HTML5 et al.

You go on to list a bunch of CSS stuff (again, some pre-dating HTML5, none coming from WHATWG) and proprietary browser settings (not a part of any spec, living or static).

> better support for print media

Now this would be great to see. However this is something we definitely have not gotten since the advent of HTML5. What are you referring to here?

CORS pre-dates HTML5 (was later subsumed into HTML5), but the newer auth APIs you mention are nice; I'll give you that.

Things like IntersectionObserver however are one perfect example of extra feature-creep in HTML5, building hacks on top of an ecosystem overrun with overengineering. You should not need IntersectionObserver to get basic, usable performant scrolling. If you feel you do, you've overengineered your webapp: try actually addressing your performance bottlenecks.
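For reference, the number IntersectionObserver hands to its callback, intersectionRatio, is essentially the fraction of the target's bounding box that falls inside the root. The arithmetic is trivial; what the API adds is that the browser computes it and delivers notifications asynchronously, instead of scripts forcing layout with getBoundingClientRect() in every scroll handler. A minimal sketch of just the ratio, with a hypothetical Rect type and function name; zero-area targets are simplified to 0 here, whereas the real API special-cases them.

```typescript
interface Rect {
  x: number;
  y: number;
  width: number;
  height: number;
}

// Fraction of `target` visible inside `root` (both in the same coordinates).
function intersectionRatio(target: Rect, root: Rect): number {
  const left = Math.max(target.x, root.x);
  const top = Math.max(target.y, root.y);
  const right = Math.min(target.x + target.width, root.x + root.width);
  const bottom = Math.min(target.y + target.height, root.y + root.height);
  // Negative overlap means no intersection on that axis.
  const visibleArea = Math.max(0, right - left) * Math.max(0, bottom - top);
  const targetArea = target.width * target.height;
  // Simplification: the real API reports 1 or 0 for zero-area targets.
  return targetArea === 0 ? 0 : visibleArea / targetArea;
}
```

For a 1000x800 viewport and a 100x200 box starting at y = 700, half the box is visible, so the ratio is 0.5.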


I was speaking more generally about live standards, but if you want to focus specifically on the HTML spec and notable changes for newer devices and accessibility:

https://github.com/whatwg/html/commits

Let’s see, right off the top we’ve got the inputmode attribute, which helps bring up the right virtual keyboard for plain text inputs, and enterkeyhint, which lets you pick from a list of labels for the enter key on virtual keyboards; both improve the usability and accessibility of the keyboard on newer touch-screen devices. Then there are form-associated custom elements, which let you create your own HTML elements that participate in forms (the goal of custom elements is to help you build your own custom LEGO pieces instead of relying only on the basic blocks the HTML spec includes; one could think of this as an eventual replacement of JSX components with HTML-based ones), autocomplete=one-time-code (thanks, iOS!), more granular control over file downloads, updating the spec to match reality (what browsers actually implemented for compatibility vs. what was written in advance), srcset for retina (which reminds me: lazy loading images as a simple img tag attribute), WeakMap/WeakSet to help reduce memory leaks from stale DOM node references, requestAnimationFrame and other enhancements to the page lifecycle, CSP headers and related HTML attribute changes, and, well, the list doesn’t end. ;-) For accessibility there’s the inert attribute, which prevents focus on child content; that’s great if you need more flexibility than the dialog element provides by default but don’t want to set tabindex manually or control all focus with JS, etc.
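On the WeakMap point (an ECMAScript feature rather than part of HTML): the idea is to hang per-node data off a DOM node without the map keeping that node alive after it is removed from the page. A minimal TypeScript sketch, with plain objects standing in for DOM nodes so it runs outside a browser; the NodeLike type and helper names are hypothetical.

```typescript
// Plain objects stand in for DOM nodes in this sketch.
type NodeLike = { id: string };

// Keys are held weakly: once nothing else references a node,
// its entry can be garbage-collected along with it.
const tooltips = new WeakMap<NodeLike, string>();

function attachTooltip(node: NodeLike, text: string): void {
  tooltips.set(node, text);
}

function tooltipFor(node: NodeLike): string | undefined {
  return tooltips.get(node);
}
```

Lookups are by object identity, not by contents, and a WeakMap cannot be iterated or sized; that restriction is what allows entries to disappear together with their keys.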


I really really don't mean to be dismissive here, but... you know the WHATWG largely emerged as a backlash against the W3C's efforts to create an extensible standard for allowing authors to create custom elements (what's more, they would be namespaced with actual schema, a la react prop-types, or, later, TypeScript interfaces), with extensible forms (xforms). This is a process started in the late 90s! Two decades later, and there are people on the internet claiming the WHATWG are doing something novel reinventing this concept now.

WeakMap is ECMA. CSP was W3C and implemented by Firefox 4 years before HTML5 was released as a spec.

There's good stuff in the HTML5 (there would want to be in a spec. that size!), but it's remarkable how many of the cited examples are nothing to do with it.


It's important to note, nobody's being forced to use custom components or new syntax by specification, though as deprecations occur it's possible based on usage statistics that services like Google or new browser security or performance improvements could affect your site. And speaking to XML for a sec, while XHTML's backwards compatibility problems could have been avoided by specifying some kind of graceful fallback instead of Firefox's red text on yellow (yikes!), the real point is that when specs are created now, there's more of an attempt to "see what sticks" than there used to be. It's funny because E4X was also a failure, but JSX is so popular today, go figure.

Re. specific examples mentioned:

CSP is still evolving today and HTML has to keep up -- https://github.com/whatwg/html/search?o=desc&q=CSP&s=author-...

With WeakRef, things still need to stay up-to-date: https://github.com/whatwg/html/pull/4571

I would say it's remarkable how many standards on the web apparently don't have anything to do with HTML.


Tiny off-topic side-note on E4X: it was extremely popular in its time... for extension authors, GM scripters, XULers, anyone who had the freedom to use it without the concern of cross-browser compat. I think its failure was either one of standardisation bureaucracy, or of odd cross-client resistance to implementation, rather than it not being tech people wanted.

> I would say it's remarkable how many standards on the web apparently don't have anything to do with HTML.

I took the thrust of the original comment I responded to above to be giving WHATWG and general living/rolling-spec. process credit for increasing the pace of useful/practical/needed web innovations. I was pointing out that many of the cited examples of useful innovations were created either before WHATWG existed, or at least outside of WHATWG process, and that the majority (admittedly not all) of what the WHATWG has actually contributed has been superfluous cruft. That's the intent of my separating "this is part of HTML5, this isn't". Obviously many things are/can be subsumed into HTML5 as that's where they belong taxonomically, but I'm focusing on inception and what benefit living-standard process brings.


More probably, it's just that there is a lot of money to make in taking total control of the user experience. Up till now, we had to deal with this pesky OS, those package dependencies, long cycles, and complicated installation processes. But with the Web, you can inject your code almost in real time on the user's machine, they don't have to understand anything, they don't have to wait, and you don't have to care about what's under the hood. In fact, the user doesn't even know, let alone consent to the fact, that they're running software.

Now the browser is pretty limited, sandbox and all that, so companies making bank with it want to widen what they can do with it, again, and again, until the browser becomes an OS under their control, and the user machine just a terminal to access it.


> I suspect at least part of the reason is to build a high barrier to entry and preserve the monopoly, keeping out competitors, given who the people in these groups work for.

A lot of what happens is that someone like Chrome implements something not yet standardized, then it gets standardized in a different way, or Mozilla implements something and Chrome implements it differently from the standard, perhaps to try and reach the same goal without a care for the yet-to-be-finalized standard.

Of course Embrace, Extend, Extinguish is something I think of mostly in regards to Google, but I'd rather not think they're completely and utterly sinister in their goals. Some parts of Google are bad, just like some parts of Microsoft (sadly) are bad.

Go is a good example of the good parts of Google. Sure the core team works for Google, but they make all the decisions, not Google.

VS Code is another good example in regards to Microsoft, and soon enough GitHub (which I suspect is implementing a lot of ideas they couldn't afford to work on prior to the acquisition).


> Sure the core team works for Google, but they make all the decisions, not Google.

If the core team works for Google then Google is making all the decisions.

A company is the people working there making the decisions. A company can't make decisions without the people there doing so.


Maybe they are saying that the senior leadership at Google is evil, but the developers in the trenches are more benign in their intentions.


AMP, WebUSB, WebBluetooth all indicate otherwise.


What are they doing to harm WebUSB and WebBluetooth?


They're not doing anything to "harm" them, they created both of them, and both specs basically include a caveat that says "this could be mis-used, but YOLO".


So you're against the standards entirely? What is the alternative that you favor?

If you are making a web app (like my company does) that needs to print to a label printer how should we do that besides switching to a native app that all of our clients need to install and maintain? WebUSB gives us an option of doing it in a seamless and maintainable way.


> If you are making a web app (like my company does) that needs to print to a label printer how should we do that besides switching to a native app that all of our clients need to install and maintain?

So rather than just doing what works and shipping a native app, you instead rely on a feature that's currently in draft status, and works in one browser... unless Google disable it again, like they did last year?

Sounds like a winning strategy my man.

Edit: to answer your original question > So you're against the standards entirely?

Yes. Literally the only reason either exists is that without them Google's theory of "everything can be in a browser" falls flat on its face, and when things are in a browser Google has a good chance of controlling the conversation (the same way Microsoft controlled the conversation in the 90s and early 2000s with Windows).

Both specs have glaring security issues that they themselves point out, but then provide no actual solution or work around for. They may as well start each one with "this could lead to compromise of your device.. life's a lottery, be lucky!"


We already have a native app but we are transitioning the majority of our core app to the web. Printing labels is one of the tricky parts that will be hard to transition. We are currently using an electron shell which makes maintenance easier but it still requires that end users install something which they might not have administrative rights to do. With WebUSB we expect that our entire app will be on the web and fully usable both on and offline without much maintenance fuss for the thousands of clients we have.


[flagged]


We are not a label printing business. I know I haven't really told you what we do but it's just a tiny, minor thing that goes along with the huge suite of software we make. Building on the web platform has huge advantages over native apps and we do know what we are doing.

WebUSB is just one piece of tech that helps wrap up some of the edge cases.


Yes, it's about the people! The point is that Google isn't a monolith, decision-making happens at all levels, and people working there have different, often conflicting opinions and are not necessarily thinking strategically like a CEO. I mean, sometimes they are, but without reading internal design docs that we don't have access to, you can't get good insight into their motivations.

This means that decision-making is less consistent or deterministic than many people pretend. Things happen because someone advocated for it and other people gave in. The outcomes of internal political decision-making are not necessarily predictable. Speaking of "Google" making a decision is usually misleading.


> A lot of what happens in someone like Chrome implements something not yet standardized, then it gets standardized in a different way

This. The Chrome team is hopelessly optimistic when making assumptions about how standards evolve. See Web Bluetooth, which was heavily advertised to developers before other browsers eventually chimed in and said, "yeah, we're not doing that."

Still turned on by default in modern Chrome without a flag, probably because the Chrome team is still assuming that eventually the API will be standardized to Chrome's specification.


All the browser engines and most browsers are open source, so making them harder to implement does not "feed the monopoly".


The fact that the source is open doesn't say anything about difficulty of implementation; in fact, being open-source probably increases the monopolising effect, since now people will be more inclined to think "that implementation is open-source, I'll just use/contribute to it instead of starting another independent one."


It might create a monopoly of browser engines, but not a monopoly of browser vendors, which is the harmful part (of course we might be heading to a browser vendor monopoly anyway, but that's due to Google abusing their market position, not the complexity of HTML).


That's a bit of an oversimplification.

The Electronic Frontier Foundation resigned from the W3C for a reason.


What reason?


Probably the introduction of DRM so Hollywood would start using video tags instead of flash/silverlight/other plugins. It’s a pragmatic approach, but right now it’s incredibly limited for anyone who wants to make their own browser that can load Netflix.com, for example. The only folks DRM punishes are those trying to do the right thing, everybody else can find workarounds if they need to.


>I suspect at least part of the reason is to build a high barrier to entry and preserve the monopoly

That isn't very logical. New specs like Grid are introduced because they're insanely useful; not because browser vendors just want to make the spec more complex and hard to implement.


> I suspect at least part of the reason is to build a high barrier to entry and preserve the monopoly

That is far too grand a motive. The reality is that the average web developer has a shelf life of about 5 years or is primarily focused elsewhere, such as Java or C#. That said, consider the people who do this work with 100% focus. These people are typically not the same developers who are performing graduate level statistical analysis, building artificial intelligence, or resolving quantum computing. The hiring expectations for front-end developers in the corporate world tend to skew much lower, and the historical average salaries for these positions reflect that. Taking this into account, I suspect the real reason for the constant churn is that many of these developers want things to be easier, everything else be damned.


You are already downvoted into oblivion and rightfully so, but I just wanted to add my perspective as a backend developer (and the one who did post-graduate level mathematical stuff at that) who now manages a team of back and front end developers and plays with front end development for toy projects. The sheer amount of complexity and required knowledge for front end development is simply baffling to me. These folks have to know such disparate technologies as CSS, HTML, Javascript as a bare minimum and it never stops there. Typescript, templating engines, CSS preprocessors, huge and continuously evolving javascript ecosystem, evented concurrency (please take me back to my threads and semaphores), asynchronous state management in complex UIs with non-trivial interdependencies... Compared to this stochastic calculus is a breeze.


Totally agree. Also: I'm strongly against calling people "Front-End Developer" or anything like that at all. Where does front-end start? Where does it end? What is full-stack, does it imply you can do embedded? I've seen many listings that count PHP as one of the front-end requirements. This is just a term that should make the HR's job easier, and it does that badly.


We web devs actually have pretty solid definitions for "front-end," "back-end," and "full-stack." The back end deals with work on the server done after an incoming HTTP request comes in and involves preparing the HTTP response, including preparing the outgoing data for the templating layer if applicable (eg, when doing web page responses rather than JSON, XML, or binary responses). The front end deals with building templates such that that data becomes a standard web page, using CSS to style that page, and using JavaScript to implement client-side interactivity. A full-stack developer will be constantly switching between work from both sides of this divide rather than primarily focusing on one or the other.

If a job listing for a front-end developer is expecting applicants to know PHP beyond a "writing templates" level, it's either a poorly-written job listing or its creator has unrealistic expectations of its applicants - sadly, neither case is very uncommon in this industry.

At any rate, someone calling themselves a full-stack developer isn't implying they can do embedded, as that sort of stuff is pretty far afield of web development.


>The sheer amount of complexity and required knowledge for front end development is simply baffling to me.

And none of that complexity is essential to the task, which by itself is a testament of the quality of the “web platform”.


>The sheer amount of complexity and required knowledge for front end development is simply baffling to me

99% of it is self-inflicted though. You didn't need react/redux/sagas/uber popular framework 7.5 to do your job, but you and your coworkers thought "This is the cool new toy that facebook made!!!! Let's use it!!!" When 90% of web sites are reinventing a wheel made in 1994, there's no reason to actually do most of this junk


Bullshit. More and more front-end developers are building applications. These projects aren't in any way equivalent to a 1994 website.


I agree that you don’t need any of that madness. It is self imposed pain by developers wanting easy over simple and work arounds over original code.

See the framework fallacy https://news.ycombinator.com/item?id=20014888


[flagged]


Having a low opinion of yourself doesn't make you right.


Why are people so hyper sensitive about this? What mortal wound does this subject expose?


This is the most arrogant and condescending comment I think I've ever read on HN, you spend so many words dancing around it, just call web developers stupid and lazy instead of wasting so much space.

It's not good enough for me to be an arrogant techno-mage looking down on those stupid commoners from my technical tower, I must now also shit on all the other lower developers I have judged to be not as smart as me, some other class of developer.

Fuck off.


Also, I will say, some of the worst engineering I have ever seen has been by these allegedly superior engineers who work in AI and statistical analysis.

Maybe self reflection or hubris is the issue?


And yet, I can build world scale cloud systems but can’t build a web front end anymore to save my life.

I’m blown away by the sheer complexity that has evolved in frontend code. Developers there have such a broad toolset and arcane knowledge about how you really need to combine two CSS traits on surrounding containers to get the positioning you want.

Unfortunately, in our industry, most engineers think the layer above theirs is a trivial composition of parts.


And Rube Goldberg machines are engineering marvels, but we shouldn't laud the engineer who builds one to transport a part from here to there when a "simple" conveyor belt does the job just fine.

How much of that complexity was necessary? How much of it was chosen to boost someone's CV?


I mean a lot of the complexity in the front end has arisen from the need of the business to provide more feature full experiences, not because some developer thought he could pad his CV. Developers aren't developing complicated front ends to pad CVs, they are developing complicated front ends because the business needs it to solve certain problems.


I agree that the frontend landscape is too complex, but much of that is self imposed. Most developers on the client side refuse to string two lines of code together without a framework or large abstraction library. Offers of reducing complexity such as removing unnecessary decoration in the code or returning to a standards driven approach is often met with hostility. Much of this complexity is the result of striving for comfort and easiness. Simplicity isn’t easy.

As an example consider this recent comment that was upvoted 11 times and some of the responses to it: https://news.ycombinator.com/item?id=20021708


While jaded, I don't think your reasoning is valid. If the question is, "Why suspect an organization of coordinating with another to establish a dominant position in manufacturing or standardization in order to create a barrier to entry/monopoly for potential competitors?" I'm pretty sure the answer has nothing to do with web developers, even if that's the subject at hand in the context of this question.


My reasoning is primarily a consideration of where pressure and redirection of web standards originates. For example consider how classes landed in ES6. The maintainers of the standard didn’t want classes, but hands down they were the single most requested feature of that specification.


"Standard" = consensus definition of desired behaviour. "Living" = open to change.

I don't see what is so oxymoronic about this? Just because a consensus has been reached doesn't mean that consensus must be immutable.

Just because the process of standardisation in some technical domains has tended to use versioned documents doesn't mean that such an approach is fundamental to standardisation.


The point is that HTML is almost 30 years old, and based on SGML which is much older (even though ISO 8879 is officially "only" from 1986), where SGML is just a formalization of typesetting practices established in the 1960's and 1970's. Given the depth of usage of HTML in everyday life (laws, contracts in ecommerce, medical records, personal communication, education, etc., etc.), I think HTML deserves better than being tinkered with all the time for no good reasons other than job security and/or achieving Webkit dominance, or other nebulous reasons at this point. At the very least, a standard should serve the purpose of defining a wellformedness criterion for a deliverable that you outsource to some HTML author, and that doesn't change all the time.

These are the first sentences of Yuri Rubinski's foreword to The SGML Handbook outlining the purpose of markup languages (from 1990):

> The next five years will see a revolution in computing. Users will no longer have to work at every computer task as if they had no need to share data with all their other computer tasks, they will not need to act as if the computer is simply a replacement for paper, nor will they have to appease computers or software programs that seem to be at war with one another.

Held up to that goal, HTML has utterly failed. And there's just no justification for attempting to complicate HTML, leading only further away from that goal, and by intent never coming to an end for a task that isn't terribly complicated, and increasing the scope without any sense of mental discipline, and without considering the scarce resources available, thereby creating a browser monoculture.


The markup parts of HTML (e.g. the parsing) are pretty frozen and have been for a while. This is the part that can sort of argue it used to be based on SGML (though now it's not, for various reasons).

The "HTML spec" includes a lot of APIs and processing model details that need tweaking as new constraints come up. A good example is that a lot of APIs that involve cross-window access need changes to their specifications in a process-per-origin world; the old spec text assumes everything can always be done synchronously, but that's not actually possible in that world as far as I can tell.

This, and fixing security bugs in the spec that get found from time to time are far from "no good reasons"!

None of the recent HTML spec changes would have affected "wellformedness" of an existing document in the markup sense. They're mostly about fixing spec bugs (in that the spec doesn't match what web sites expect!) in complicated areas like security, navigation, etc.


> it used to be based on SGML (though now it's not, for various reasons)

HTML doesn't cease to be based on SGML by mere declaration, or even by making it so brittle that it can't be parsed by any known formal standard. That's more a political stance, as in American isn't English or some such.

WHATWG's formulation of HTML has deliberately distanced itself from SGML out of ignorance and a desire not to be accountable or formally verifiable (aka move fast and break things) against established, rich theoretical foundations of markup languages. And it shows: already in a paper I published two years ago [1], I show flaws in HTML as described by WHATWG, some of which have since been fixed ([3]). Not only is the concept of "transparent markup" flawed and underspecified, it has also since been used in the definition of the content model for the dl element, and as an unintended consequence also the div element (cf [2]), flaws that could be easily avoided by just using SGML for the grammar WHATWG is attempting to describe when SGML has been around for ages.

It's also not entirely true that WHATWG HTML can parse all legacy docs. For example, the keygen element has been removed, and while not a terrible loss as such ;), since keygen is a void element (an element with declared content EMPTY in SGML parlance), its presence in a legacy document (eg without an end-element tag) will make HTML5 parsers fail hard ([4]). It's also completely unclear which version of HTML is being validated by eg. W3C's nu or another validator. Heck, even the spec text for WHATWG HTML itself reads

> This file is NOT HTML, it's a proprietary language that is then post-processed into HTML

when a large portion of the spec text portrays HTML in the role of an authoring language.

So tell me why, as an author, I should follow WHATWG's vision for HTML? As you say yourself, WHATWG hasn't advanced HTML the markup language at all, and has rather prevented the evolution of declarative UI features, to the effect of making JavaScript essential for all but the most basic documents.

[1]: http://archive.xmlprague.cz/2017/files/xmlprague-2017-procee...

[2]: https://github.com/w3c/html/issues/1116

[3]: https://github.com/whatwg/html/commit/6e305c457e42276bf275b8...

[4]: Edit: this affects differences introduced in HTML 5.2 vs HTML 5.1, not some distant archaic HTML 4 version


> WHATWG's formulation of HTML has deliberately distanced itself from SGML out of ignorance

I don't think that's a fair characterization at all. Ian Hickson was pretty intimately familiar with the SGML formulation of HTML.

What led to that being dropped was that no browsers implemented it in practice and that trying to implement it as written in the HTML 4 standard actively broke web pages (which had been written against browsers, not against the spec). Firefox tried going down that route for a while; I should know, because I implemented some parts of that and reviewed the code for the implementations of other parts. We eventually had to remove them, because of compat problems with websites. Comment parsing was a perennial favorite there.

> For example, the keygen element has been removed

That is a good point, yes. Arguably it should have been left in as a void element with no behavior to avoid parsing issues of the sort you describe...

> So tell me why, as an author, I should follow WHATWG's vision for HTML?

Honestly, because that's the thing browsers will implement. That's the only reason.


The narrative that "HTML5 looked at what browsers do, rather than following ivory tower SGML" is simply a myth and not backed up by facts. Ian Hickson introduced sectioning elements, the whole flawed outlining algorithm idea, and the aside element (presumably to make it easier to tell ads from content for Google's crawler), with lots of controversies at the time. The HTML spec is also chock full of inconsistencies of that time and mindset (for example, the allowed characters in ID attributes or general lexical rules for elements not matching CSS selectors' idea of an ID or element name) not actually supported by any browser.

Please don't tell me SGML comments were the problem - SGML commenting syntax is straightforward, eg. anything in double-hyphens within a markup declaration is treated as a comment, and there can be multiple comments in a markup declaration (unlike XML). The only problem I see is that there is an interaction with an ancient form of JavaScript comment taking the form of double-hyphens, presumably to make JavaScript commenting uniform with HTML syntax. Now the rules for terminating the content of a script element are dangerously bogus in ancient HTML, but HTML5 has done nothing to fix the situation.

In any case, WHATWG has driven almost all web browsers out of existence already; appealing to what "browser vendors" (Google) actually do is not the solution, but part of the problem, obviously.


I'm not saying HTML5 didn't have its share of architecture astronauting, attempts to add features that didn't pan out, etc. Trust me, I know it did.

I'm not saying it didn't spec various things that didn't match browsers (some just because, some because it tried to reverse-engineer browsers and failed). Well do I know it did; we're still sorting some of those things out. The only defense there is that unlike previous web specs it actually tried to specify this stuff (like navigation!) instead of just saying "yeah, do whatever".

The element name thing was actually needed for compat with how browsers parsed HTML in practice. The ID thing largely affects well-formedness, but was also informed by common practices, and the mismatch with CSS in large part was probably somewhat unavoidable due to differences in reserved chars. For example, there's really no reason, within the context of HTML, to not allow "foo.bar" as an ID, and people were definitely using IDs of this form all over, whereas in CSS you'd need to jump through the "#foo\.bar" hoop to use it in a selector.
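To make the selector-escaping point concrete, here is a minimal Python sketch, assuming a simplified version of the escaping rules behind CSSOM's `CSS.escape()`. The helper names `css_escape_ident` and `css_id_selector` are hypothetical, and edge cases such as a lone "-" or "-" followed by a digit are not handled:

```python
import string

# Characters that may appear unescaped in this simplified sketch.
_SAFE_ASCII = set(string.ascii_letters + string.digits + "-_")


def css_escape_ident(value: str) -> str:
    """Escape a string for use as a CSS identifier.

    Simplified sketch loosely following CSSOM's CSS.escape(): handles
    punctuation, control characters and a leading digit only.
    """
    out = []
    for i, ch in enumerate(value):
        code = ord(ch)
        if ch == "\0":
            out.append("\ufffd")  # NUL is replaced, not escaped
        elif code < 0x20 or code == 0x7F or (i == 0 and ch in string.digits):
            # Control characters and a leading digit become a hex escape
            # followed by a space, e.g. "1" becomes backslash + "31 ".
            out.append(f"\\{code:x} ")
        elif ch in _SAFE_ASCII or code >= 0x80:
            out.append(ch)  # ASCII alnum, "-", "_" and non-ASCII pass through
        else:
            out.append("\\" + ch)  # other ASCII punctuation gets a backslash
    return "".join(out)


def css_id_selector(element_id: str) -> str:
    """Build an ID selector for an HTML id attribute value."""
    return "#" + css_escape_ident(element_id)


print(css_id_selector("foo.bar"))  # #foo\.bar
```

In a browser you would normally just call `CSS.escape()` instead of hand-rolling this; the sketch only shows why an ID that is perfectly fine in HTML needs extra escaping once it appears in a selector.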

SGML comments were definitely _a_ big problem (I'm not sure why you decided to describe them as "the" problem). This sort of markup:

  <!-- This is comment -- This is just in a markup decl -->
    This is still comment, because the '>' is inside the comment the third double-dashes started, yes?

was fairly common: people like to use "--" as a replacement for em-dash and it often ends up in the middle of comments. Browsers that attempted to implement SGML comment parsing would end up with the "This is still comment" text commented out; other browsers did not.
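A rough Python model of the two behaviours, under simplifying assumptions: the helpers are hypothetical, the SGML side just toggles comment mode on every "--" inside "<!...>", and the browser side ignores edge cases such as "<!-->" and "--!>":

```python
def sgml_decl_end(text: str, start: int = 0) -> int:
    """Return the index just past the '>' closing an SGML markup declaration
    that starts at `start`, or -1 if it never closes.

    Simplified model: each "--" toggles comment mode, and '>' only ends
    the declaration when we are not inside a comment.
    """
    if not text.startswith("<!", start):
        raise ValueError("expected '<!' at start")
    i, in_comment = start + 2, False
    while i < len(text):
        if text.startswith("--", i):
            in_comment = not in_comment
            i += 2
        elif text[i] == ">" and not in_comment:
            return i + 1
        else:
            i += 1
    return -1


def html_comment_end(text: str, start: int = 0) -> int:
    """Return the index just past the first '-->' after '<!--' (what browsers
    did in practice), or len(text) if the comment is unterminated."""
    if not text.startswith("<!--", start):
        raise ValueError("expected '<!--' at start")
    end = text.find("-->", start + 4)
    return len(text) if end == -1 else end + 3


doc = "<!-- a -- b -->visible<p>x</p>"
print(html_comment_end(doc))  # 15: "visible" is rendered
print(sgml_decl_end(doc))     # -1: the rest of the page is swallowed
```

The third "--" reopens a comment under the SGML rules, so the following ">" is ignored; a browser that just looks for "-->" ends the comment there.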

> presumably to make JavaScript commenting uniform with HTML syntax

No, that was there to enable hiding of <script> tag contents from browsers that didn't know about <script> at all. So you would write:

  <script>
  <!--
    // Your script here
   -->
  </script>

and in a non-script-aware browser you wouldn't have a blob of script text showing up... It's actually a pretty sane approach for the problem of initially introducing the <script> element in a world where it didn't use to exist.

> In any case, WHATWG has driven almost all web browsers out of existence already

I'm not sure the problem here is "WHATWG" per se. I'm pretty sure that if WHATWG had never existed the results would have been pretty similar...


The HTML parsing algorithm looked at what browsers do. WHATWG HTML also included other innovations, some of which didn’t entirely work out. Nowadays, new additions are not added so loosely and there is a better defined working mode and governance policy.


"Governance policy" yeah right. Chrome implements stuff, and Moz has to follow suit; then it gets prescribed in WHATWG's spec. OTOH, stuff introduced by Moz that Chrome doesn't implement gets removed from the spec, such as much-needed new elements for basic declarative UIs (menu, menuitem) to fight over-reliance on JavaScript and CSS hacks, introduced by FF but boycotted by Chrome. These were part of the WHATWG snapshot on which W3C HTML 5.1 was based, and were removed in W3C HTML 5.2. There's no evidence Hixie analyzed "what browsers were doing". There is, however, evidence that Hixie just made up new elements as he saw fit [1].

[1]: https://www.webdesignerdepot.com/2013/the-harsh-truth-about-...


> That is a good point, yes. Arguably it should have been left in as a void element with no behavior to avoid parsing issues of the sort you describe...

That's what we did, actually :). Parsing behavior was unchanged. Ctrl+F "keygen" in https://html.spec.whatwg.org/multipage/parsing.html.
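Why keeping `keygen` void matters for parsing can be illustrated with a toy tree builder on top of Python's `html.parser`. This is an assumption-laden sketch, not the WHATWG tree-construction algorithm: `ToyTreeBuilder`, `shape` and the `BASE_VOID` subset are hypothetical, and real parsers also handle implied end tags, misnesting, and much more.

```python
from html.parser import HTMLParser

# Hypothetical subset of void elements (no end tag, never have children).
BASE_VOID = {"br", "hr", "img", "input", "meta", "link"}


class ToyTreeBuilder(HTMLParser):
    """Nests every non-void start tag, so the void set decides whether
    following content ends up inside an element."""

    def __init__(self, void_elements):
        super().__init__()
        self.void = set(void_elements)
        self.root = ("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = (tag, [])
        self.stack[-1][1].append(node)  # attach to the current open element
        if tag not in self.void:
            self.stack.append(node)     # only non-void elements stay open

    def handle_endtag(self, tag):
        # Close the nearest open element with this name (and anything inside it).
        for depth in range(len(self.stack) - 1, 0, -1):
            if self.stack[depth][0] == tag:
                del self.stack[depth:]
                break


def shape(markup, void_elements):
    """Parse `markup` and return nested (tag, [children]) tuples."""
    builder = ToyTreeBuilder(void_elements)
    builder.feed(markup)
    builder.close()
    return builder.root


legacy = "<form><keygen name=k><input name=a></form>"
print(shape(legacy, BASE_VOID | {"keygen"}))  # keygen and input are siblings
print(shape(legacy, BASE_VOID))               # input nests inside keygen
```

With `keygen` treated as void, the following `input` stays a sibling inside the `form`; without it, everything after `<keygen>` nests inside it, which is the kind of structural change the earlier comment was worried about.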


You do realize that spontaneously editing the spec to drop, then re-introduce an element buried in a git commit is exactly the reason why WHATWG is an unreliable source for a definitive HTML reference, don't you? Especially when it goes unnoticed even by experts such as GP. At the same time, you want to claim authoritative control over HTML, yet show no sign of respecting other established standards and standard bodies such as ISO/IEC, IETF (eg avk's URL "standard"), and W3C?


I'm not sure what you mean by drop and then re-introduce, buried in a Git commit. We made a single commit to remove keygen, after displaying a public deprecation-will-be-removed notice in the spec for some years. Dropping keygen was part of a highly-public pull request, which gathered discussion from all browser vendors, as well as interested community members. The pull request was only merged once all four browser vendors supported removal (2 had already removed by that point).


I should have checked carefully, sorry!


I agree 100%, it's sad to see XHTML go from declarative and well thought out components to a web UI markup + a pile of JS.


People often forget two things:

- Virtually none of these changes are breaking, by design. If you prefer the web of 1998, then as a web developer by and large you can pretend that's still the world we live in.

- HTML itself has actually been a very small fraction of the "HTML5" (a silly marketing term) rapid iteration over the past decade. CSS has grown dramatically in power, and JS is hardly even the same language (which is good, because it was barely a real programming language at the beginning). But HTML itself is not dramatically different; most of what it's gotten are a handful of native replacements for things people had been implementing in JavaScript on a regular basis.


"not breaking" is an aspirational goal, not remotely a fact. When browser vendors decide to ship a breaking change (and make no mistake, they do this multiple times a year - probably dozens to hundreds) they have to run live experiments to gather data on how many commonly-visited sites use a feature they're going to change or a quirk they're going to remove. Typically if the value is like 1% or above the change is killed unless it can be made compatible, but they will go ahead even if it's over 0%. 1% may sound small to you, but web browsers are used by like... a billion people? More? And 1% of a billion is a pretty sizable number of people.

Pretend if you want, your stuff will probably break eventually. If it's simple enough it won't break and then you can go on with your life - that's certainly the goal of browser developers.

At the moment you start using the DOM or other JS APIs exposed by browsers the odds of your stuff breaking in the next decade go up, especially if you're using things that aren't like 5-year-old parts of the spec. The shelf life of most JS-heavy webapps is like 2 years in my experience.


I'm really not sure what kind of changes you're talking about. They certainly don't do what you describe at the API level. Maybe you're just talking about bugs? I have encountered a few browser regressions over the years but they're always for really exotic use-cases of relatively new APIs, and they always get fixed in the next release. I've been working on a suite of highly complex JS-heavy tools for the past 2.5 years, and the problems that have stemmed specifically from browser bugs in that time are a vanishingly small number. Probably less than five.


1% is _way_ higher than the acceptable breakage thresholds I've seen browsers use. Usually it's more like 0.003% or less.

This can still be a significant number, of course...
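For scale, a back-of-the-envelope calculation, assuming (as the earlier comment does) roughly 10^9 users and treating each percentage as a share of users rather than of page loads:

```latex
\begin{aligned}
1\% \times 10^{9} &= 10^{-2} \times 10^{9} = 10^{7} \\
0.003\% \times 10^{9} &= 3 \times 10^{-5} \times 10^{9} = 3 \times 10^{4}
\end{aligned}
```

So even the much stricter threshold still corresponds to tens of thousands of people under that assumption.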


> When browser vendors decide to ship a breaking change (and make no mistake, they do this multiple times a year - probably dozens to hundreds)

can you share an example of a breaking change to HTML that a browser has intentionally shipped in the last year?


This is not remotely true anymore. I can remember times when browser updates broke site functionality, but that was a decade ago (longer?) when we were still coding for specific bugs in Internet Explorer 6.


> I think HTML deserves better than being tinkered with all the time for no good reasons other than job security and/or achieving Webkit dominance, or other nebulous reasons at this point.

That's good, because these aren't the reasons the HTML standard is changed, and to claim they are is absurd.

HTML may have had its origins in SGML, but it has long, long since grown beyond those origins to become the web platform. Like any other non-dead software platform, it undergoes a process of refinement and enhancement, not for reasons of "job security" or "Webkit dominance", but to provide additional functionality that allows more and better software to be built with it, and to remain competitive with other software platforms.

You, and a large constituency of Hacker News commenters along with you, may utterly loathe the web platform. You may wish the web had evolved along entirely different lines, remaining a simple system for serving hypertext documents. But the fact is, it didn't, and to act as if the global development ecosystem that relies on the web platform doesn't exist is ridiculous.


There is a gulf of nuance between these two extremes you describe.

The web has "evolved" from a medium for simple self-publishing into a medium of mass surveillance and manipulation, big media, privacy-invading ads, uncalled-for browser monopoly, information oligopoly, and arbitrary crap code being sent to you in ridiculous quantities, not only draining your batteries and showing no respect for planet earth wrt energy efficiency, but also actively putting you in danger through phishing, xss and whatnot, and making your future ability to even read your legal, personal, study, business, or banking documents dependent on a needlessly over-complicated technology stack that no-one has the ability to influence in meaningful ways except Google, an ad company.

What it has not evolved into is a medium for long-term preservation of digital information, information autonomy, for simple ecommerce transactions and payments for everyone (as a merchant), for letting content producers thrive with quality content, or one that fosters free speech and diversity.

It has "evolved" by being captured to serve the interests of very few players, and fails the criteria of not having to appease computers or software programs that seem to be at war with one another.


> "Standard" = consensus definition of desired behaviour. "Living" = open to change.

> I don't see what is so oxymoronic about this?

A standard is set in stone as a fixed target for implementers.

No fixed target? No standard.

And no, an unversioned document that changes arbitrarily is not a standard.

To put it differently, a standard is a goal. If the goalpost is arbitrarily kept on the move then there is no goal.


> A standard is set in stone as a fixed target for implementers.

Is that a descriptive or normative statement?


The thing that makes the web different and why the idea of a living standard makes sense for the web is because (by and large) web changes can't break backwards compatibility. Browser vendors are unwilling to make changes that will break existing websites because it could result in losing market share as users switch to browsers that still work for those websites. So anything currently in the standard today is expected to continue working tomorrow. That means even though the standard may change regularly, you can depend on anything that it currently says to keep working.

The idea of versioned standards is only really important if you have to worry about things changing out from under you that could break your existing work.


I don't mind a changing standard as long as two people can agree on exactly which version of the standard they're talking about and actually use that version.

I should be able to build my web app against an LTS version of the standard and expect it to behave identically in all future versions of all major browsers, until the EOL date of said version, no matter what changes they make in future versions. I want this for the same reason I want either CentOS or Ubuntu LTS, not Arch, on my production boxes.

Unfortunately, the whole HTML 5.1/5.2/5.3 business never quite caught on, and browsers don't support anything but the latest snapshot of whatever it is that they call a standard.


> Unfortunately, the whole HTML 5.1/5.2/5.3 business never quite caught on...

Because that was an artifact created by W3C by taking snapshots of the WHATWG HTML5 standard and making arbitrary changes to portions they didn't agree with. Since the WHATWG standard was already considered normative by most browser makers, the influence of the W3C's "standards" work here was essentially nil.


>I should be able to build my web app against an LTS version of the standard and expect it to behave identically in all future versions of all major browsers, until the EOL date of said version, no matter what changes they make in future versions.

That's already true of HTML5. Breaking changes in standardized features almost never happen. HTML5 actually goes further than what you suggest: they aim to never have breaking changes.


Attempts at versioning HTML failed miserably in the past. I prefer a living document to an outdated standard.


Hopefully those aren't in contrast to each other. You can have a living document that gets consistent updates AND still has meaningful versions, much like a lot of well developed software.

The goal is for the time between HTML versions to not be a decade, but instead for consistent, incremental improvements without browsers trailing behind for years. At the same time, these should (hopefully) not be breaking changes.


Sure. IMHO one solution might be to have an HTML standard specification that is stable, which can then be built on. Extensions can be proposed so long as they don't break the standard. This would allow new elements and attributes to be introduced before they are themselves fully stable, as they are now. Breaking changes would require bumping the spec to a new major version and should happen only rarely.


We have HTML standards that are stable, e.g. 4.01. The problem is that no browser ever implemented all of it, and even implemented parts were done differently.


The difference is this time lessons have been learned. Browser vendors are now actively working together. The HTML5 project has done a great job in collecting and codifying real world behaviour.

No browser will ever implement the current spec because it's shifting sand. A smaller but stabilised spec that can be extended is much better for everybody who isn't a major browser vendor.


This new model announced in the OP will do that. Snapshots every 6 months that are taken through the W3C REC track.


It isn't great at all, because even that living standard is never being implemented up to spec, often by the same people who wrote it, and the current browser support status usually ends up being "what's written in bugzilla".


There are very comprehensive browser support matrices at https://kangax.github.io/compat-table/es6/ (JavaScript features) and https://caniuse.com/ (DOM features).

It's only for very recently implemented features that you should need to look in bugzilla.


A better resource is https://wpt.fyi/, which contains comprehensive test results in all browser engines. These days, at least for WHATWG specs, tests are required before any changes land in the specification, so wpt.fyi will necessarily contain all browser support for all features landed in the specs.


"should", sure, but vendors (I mostly run into this w/Chrome but other vendors do it too) happily just ignore the spec because it's inconvenient and sometimes explicitly have no plans to ever align with it. I recently ran into a case where Chrome was intentionally moving away from the spec without making any effort to update the spec, because being correct was... annoying. Not impossible or bad for users, just annoying.

At the time the behavior worked right in Firefox and Edge but now that Chrome is going to ignore the spec I suspect the other vendors might too. (For reference, it was an issue related to the lifetime of javascript objects for frames that have been unloaded or navigated)

This sort of thing will never appear in a compatibility matrix. You find out when it breaks your code.


JavaScript (EcmaScript) follows a more traditional standard process, though at a very high pace.

caniuse.com is amazing, but let's not kid ourselves about what it covers. It's about feature availability. A feature might have 1000s of normative behavioral specifics, but they're condensed down to there/not there.


HTML has always been poorly defined and has mostly been a description of de facto standards since forever. It takes a lot of leverage to make the browser vendors follow any kind of specification.


Extremely condensed version:

W3C is giving up publishing future HTML and DOM standards. They will focus on writing 'recommendations' for the WHATWG's living standards.

Versioned vs living HTML discussions aside, I personally admit some mild sadness that the original group responsible for maintaining Sir TBL's work on HTML has been forced to give up.


It is a sad day, but for a long time now W3C has been a figurehead and nothing more. It was like Japan's modern-day emperor. It carried no weight to actually contradict the WHATWG's version of things. So it's simply acknowledging that fact.


These are the same people who created XHTML, an ivory tower idea nobody was waiting for... who didn't support the most popular layout method at the time, tables, in their new styling language CSS.

W3C became irrelevant because they kept thinking they could just tell the entire web what to do, that they'd make enormous technical investments to satisfy the W3C's latest fashions.

I interacted with them once, over their seamless iframe proposal. I had to point out that the two biggest uses of iframes at the time, i.e. Twitter embeds and Facebook apps, could not make use of it, despite being a perfect fit. They just hadn't considered that.


> These are the same people who created XHTML, an ivory tower idea nobody was waiting for... who didn't support the most popular layout method at the time, tables, in their new styling language CSS.

You certainly could lay out pages using tables in XHTML, but the point of the standard in the first place was to enable semantically sound documents for the sake of interoperability and to facilitate separation of concerns. So maybe if you authored XHTML documents that was already an important consideration to you.

On a side note, now that the web development industry seems to have collectively given up any ambitions for semantically sound documents it's strange that the criticism against table-based layouts prevails.


I honestly don't think either was the purpose. XHTML existed to make parsing easier.

The only things you couldn't do in XHTML 1.1 Transitional that you could do in HTML were having unclosed tags and using uppercase in tag names. That's it.

Now yeah, the strict version tried to force you into semantically sound documents... but that was completely orthogonal to XHTML vs. HTML. Both XHTML and HTML were available in both transitional and strict forms.

The real problem is that HTML is a massive pain in the ass to parse. You can close tags out of order, some tags can't even be closed (<hr>), some tags can optionally be closed (<p>), nothing is case-sensitive, and because of how flexible SGML is, you need a DTD to properly parse any SGML implementation (fun fact: SGML allows for a bunch of different markup styles, but the HTML DTD explicitly disallows most of them). XHTML sought to eliminate all that by mapping HTML onto XML, which was much easier to parse.
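A small Python illustration of the parsing difference, using only the standard library (`parses_as_xml`, `html_start_tags` and the sample strings are hypothetical). The XML parser rejects HTML-style markup with unclosed and mismatched-case tags, while `html.parser`, which is lenient but not a full HTML5 parser, tolerates it and lowercases the tag names:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Records start-tag names; html.parser lowercases them."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parses_as_xml(markup: str) -> bool:
    """True if the markup is well-formed XML (single root, every tag closed)."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_start_tags(markup: str) -> list:
    collector = StartTagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


html_style = "<P>One<br>two<p>three"                        # HTML-style fragment
xhtml_style = "<div><p>One<br />two</p><p>three</p></div>"  # well-formed XML

print(parses_as_xml(html_style), html_start_tags(html_style))
print(parses_as_xml(xhtml_style))
```

This is the trade-off XHTML made: a strict, generic XML parser suffices, but only if every document is well-formed.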


> I honestly don't think either was the purpose. XHTML existed to make parsing easier.

That being the only goal would have resulted in a much simpler standard. Extension modules, for example, are completely orthogonal to that goal. I think that the standards themselves do a good job of describing what motivates them.

> The only things you couldn't do in XHTML 1.1 that you could do in HTML were having unclosed tags and using uppercase in tag names. That's it.

That's true, but ignores the opposite question—what you could do in XHTML that you could not in HTML.


Could you have unquoted attributes in transitional?


Ah, I just checked, and no you can't. So I guess that's a third difference. Thanks for the correction!

Another correction I noticed when looking that up: I meant to say XHTML 1.0 Transitional, not 1.1. 1.1 was Strict-only.


And thinking about it weren't value-less attributes also illegal? I remember doing a lot of

    <button disabled="disabled">...</button>

in XHTML, instead of

    <button disabled>...</button>

in HTML.

Even SGML has problems with expressing an attribute like that from what I remember reading of someone's attempt to specify HTML 5 using an SGML DTD.
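For the attribute question specifically, a hedged Python sketch with the standard library (helper names are hypothetical): `html.parser` reports a value-less attribute with the value `None`, while `xml.etree.ElementTree` only accepts the `disabled="disabled"` form:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collects (tag, attrs) pairs; a value-less attribute has value None."""

    def __init__(self):
        super().__init__()
        self.seen = []

    def handle_starttag(self, tag, attrs):
        self.seen.append((tag, attrs))


def html_attrs(markup: str):
    collector = AttrCollector()
    collector.feed(markup)
    collector.close()
    return collector.seen


def xml_accepts(markup: str) -> bool:
    """True if the markup is well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


print(html_attrs("<button disabled>Go</button>"))               # HTML shorthand
print(xml_accepts('<button disabled="disabled">Go</button>'))   # XHTML form
print(xml_accepts("<button disabled>Go</button>"))              # rejected by XML
```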


> You certainly could lay out pages using tables in XHTML, but the point of the standard in the first place was to enable semantically sound documents for the sake of interoperability and to facilitate separation of concerns

That was a major focus of HTML5, as well, which was more successful at enabling that (arguably, it was more of the focus of HTML5, which didn't try to impose major syntactic change as well, even though it also supported an XML form.)

> On a side note, now that the web development industry seems to have collectively given up any ambitions for semantically sound documents

Semantic soundness in the strict sense has always been something of a niche concern, not something that was a general web dev industry goal and then abandoned.

> it's strange that the criticism against table-based layouts prevails.

Table-based layout remains an accessibility problem, which is a more practical issue than abstract concern for semantic soundness (though related).


> That was a major focus of HTML5, as well, which was more successful at enabling that (arguably, it was more of the focus of HTML5, which didn't try to impose major syntactic change as well, even though it also supported an XML form).

I agree. Don't confuse my interpretation of the point of XHTML with some sort of endorsement. For the record, I think XHTML is an unnecessarily complex standard that results in more work for little added value. I do recognize, though, that the point of XHTML is unrelated to its success to that end.

Comparing HTML5 to XHTML is also a bit of a no-brainer. There are 14 years between their introductions, and HTML5 obviously had a history of lessons learned from XHTML to take into account.

> Semantic soundness in the strict sense has always been something of a niche concern, not something that was a general web dev industry goal and then abandoned.

What is the strict sense of semantic soundness? I'd agree that few people care about semantic soundness even in a general sense. I frequently see documents where the presentation seems prioritized over content. But the ambition did exist at some point, and was more of a mainstream concern in, say, 2005 than it seems to be now, and the idea of a more semantic web was peddled by big names like Tim Berners-Lee.

> Table-based layout remains an accessibility problem, which is a more practical issue than abstract concern for semantic soundness (though related).

I'd say that they are strictly related. You improve accessibility primarily by improving the semantic representation of documents. I don't see semantic soundness as an "abstract concern" for this reason. The most useful screen readers unfortunately seem to rely on some level of guesswork to make verbal sense of documents that rely on spatiality to communicate element relationships.


After writing XHTML for several years (giving it a full-faith attempt) I never understood the point. You keep repeating "semantically sound" but I can't fathom what that means in your context. I never saw any indication that XHTML brought significant practical semantic information or standardization over what HTML5 can do. It did add significant gratuitous verbosity that made XHTML documents much harder to read and edit.


> You keep repeating "semantically sound" but I can't fathom what that means in your context.

Documents built on the principle that their structure should relate to the meaning of their content rather than its presentation are what I consider "semantically sound". By negative example, an HTML document filled with div pyramids just to apply layout information, and with obtuse class and id names, is not what I consider semantically sound. However clumsy it was in practice, XHTML and CSS sought to address this by making the markup extensible and moving style information out of the document.

> I never saw any indication that XHTML brought significant practical semantic information or standardization over what HTML5 can do.

Obviously HTML5 had the advantage of hindsight and the perspective to learn from some of the mistakes of XHTML while adopting some of its more useful qualities. I should also note that I'm discussing the point of XHTML, not trying to tell anyone that it was particularly successful to that end. Don't confuse the two.


One of the interesting things you could do with XHTML was ditch the HTML entirely and write an XML document expressing the semantic content, and then couple that with an XSLT stylesheet to convert it into XHTML for display. This way the exact same resource could be read by machines to get the semantic content, and then read by browsers and transformed into the display content.

I'm not sure if anyone ever actually used this seriously though. Definitely very "ivory tower" design. But in the abstract it's a cool idea.


I wrote an XSLT stylesheet that could turn an XHTML document into a display of its own source, complete with indenting, code folding and syntax highlighting. I was quite pleased with that :)

When I worked at the BMJ all of our content was stored as XML and rendered to the browser using XSLT, although this was done on the back-end and the resulting XHTML embedded in our various Spring applications. This sort of thing is probably the most common use of XHTML today...


XHTML documents have exactly the same semantics as HTML. Neither is more semantically sound than the other.

XHTML just defined a slightly different syntax which was XML compatible. This was useful if you were using XML tools, but it didn't affect semantics at all.


XHTML 2.0 was very different from HTML semantically. Some of its tags (ARTICLE, SECTION, MENU, etc) wound up smashed into HTML5 later and became semantically meaningless again, but XHTML 2.0 tried to be more semantically sound than HTML and is a large part of how W3C lost the war to HTML5, because semantics are hard and most of the browsers didn't care about semantics.


Why do you consider the XHTML 2.0 elements more "semantically sound" than the equivalent HTML5 elements?


A reason XHTML 2.0 got so bogged down in committee and never actually finished a standard was that the attempt was made to define semantically what an ARTICLE would be, how SECTIONS work, and what things a browser or semantic web crawler could infer, summarize, or build from such things. For instance, one group of the committee argued you couldn't have SECTIONs outside of an ARTICLE; that an ARTICLE consisted of zero or more SECTIONs (and maybe SECTIONs could be nested inside of each other). Folks argued for SECTIONs to have concepts of names that could be listed in auto-generated Tables of Contents.

HTML5 mostly just defines ARTICLE and SECTION as optional block-level content elements, with no other real importance. This leaves them as merely fancier synonyms for DIV. Semantics is almost entirely left to ARIA, and while HTML5 has come back around to saying that an ARTICLE tag should imply, for instance, ARIA role="article", there's still a bunch of interesting reasons that people concerned with ARIA semantics continue to write "redundant" things like <ARTICLE role="article"...


> These are the same people who created XHTML, an ivory tower idea nobody was waiting for

I never understood what people had against XHTML. The more common theme seems to be "XML... eww" and nothing else. Can anybody help me understand?


This isn't against XHTML 1.1, which was HTML 4.01 shoved into a container of "if it isn't valid XML, display an error page instead;" rather, a lot of the hate is against XHTML 2.0, which decided to rip out HTML features such as forms, frames, most of the old elements such as <b> or <i>, and generally screw compatibility completely.

For a reaction from a browser developer, see https://dbaron.org/log/20090707-ex-html


Ooph yes. I had forgotten how they were rewriting HTML essentially from scratch and this time basing it on XML. In fairness, it wasn't a completely dumb idea as HTML had picked up a lot of warts from its long history and anyway almost no browser implemented the HTML spec as written. The browser wars especially spread a lot of debris across implementations. The temptation to rip it up and start again is very strong with software engineers even in the best of circumstances (heck, just ask Google about this).

But getting everyone else to start over as well was always going to be an uphill struggle even if it was truly the best thing ever.


XHTML 2.0 also did screwy stuff with MIME types I believe. It required specific MIME types that a lot of browsers at the time didn't support, and because of that most browsers would straight-up refuse to render XHTML 2.0.


That… wasn't even the screwy stuff.

Yes, it was a requirement that XHTML 2.0 documents be served as application/xhtml+xml (which IE didn't support at the time), but that was really a non-issue: with so much renamed and moved around vs. HTML 4.01 and its XML reformulations (XHTML 1.0, XHTML 1.1), there was no graceful fallback story anyway.

The bigger problem was that it required content to be served as application/xhtml+xml, and gave the same element and attribute names, in the same namespace, different semantics to what XHTML 1.0/XHTML5 gave them, with different implementation requirements (and it being impossible to satisfy both).

AFAIK this essentially got resolved by XHTML 2.0 being abandoned (in 2009, years after HTML 5 had moved to being jointly developed with the W3C).


IIRC, the main objection was that error handling was an all or nothing affair. Whereas most HTML on the web is (or was) broken to a greater or lesser degree. There was also the objection that stricter parsing made it harder for hobbyists.

In these days of Typescript, where web devs seem to like having stricter rules, it could play better. But that ship has already sailed.


I never understood that either except, as you said, for the hobbyist's sake. A lot of XHTML was generated from XML and, if one is using XML, chances are programming is involved in the transformation to XHTML. But programming has strict rules itself and will also fail if not adhered to so I never understood the complaint of "draconian" error checking in XML/XHTML.


People underestimated the extent to which markup may be mixed in from sources you didn't control, and the power that this ability gives to your users. Say you built a shiny new forum engine with from-scratch XHTML markup. You try to sell it. Most of your customers say that they've already been running forum software since 1995 but the existing posts allow inline HTML (which was less unsafe in 1995, because no Javascript), which is all badly misnested. As soon as they dump the previous data, their site stops working.

Or you import a small Javascript library from 1999 that generates its own innerHTML for a few elements, but does it with HTML. Oops.

Or you built a new CMS with shiny XHTML markup, but before you had the CMS your org just hand-wrote pages which you now need to parse and import into the CMS.

These were all very real considerations in the 2002-2004 period; I've dealt with all of them. Backwards compatibility is often the most important feature you can offer, because it directly affects the value the end user gets out of the product. Sites that were concerned with "doing it right" in that time period largely failed, while sites that "did it fast" in a hacky, XSS-prone way are now worth hundreds of billions of dollars.


As the scale of a program increases, the probability that someone will do something wrong increases polynomially. Consequently, as web sites got larger and larger, the probability that some component would break the XML goes up quickly. This is a difficult pattern to deal with, pushing up the skill floor required, and as HTML5 shows, it isn't even all that necessary.

There was also similar exposure from the data side; as the amount of data you handled increased, the odds that some data would tickle some code path that you didn't even know could blow up went up. You write your news front page in XHTML, and everything seems fine for six months, until someone finally includes an ampersand in their headline, and your entire front page crashes for two hours (not in any way monitoring will pick up, either, so you're getting customer reports), and it takes you hours to discover that someone was passing through the headline (and just the headline!) unencoded.
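
A minimal reproduction of that failure mode, assuming a hypothetical `page()` template that interpolates the headline without encoding it (Python standard library only; the names are illustrative, not from any real system): the strict XML parser rejects the bare ampersand, the HTML tokenizer shrugs it off, and escaping the headline fixes the XML case.

```python
import xml.etree.ElementTree as ET
from html import escape
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Concatenates the text content seen by the lenient HTML tokenizer."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def page(headline: str) -> str:
    # Hypothetical template: the headline is interpolated without encoding.
    return f"<html><body><h1>{headline}</h1></body></html>"


def parses_as_xml(markup: str) -> bool:
    """True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


def html_text(markup: str) -> str:
    """Text content as recovered by the HTML tokenizer."""
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


headline = "Fish & Chips"
print(parses_as_xml(page(headline)))          # False: a bare "&" is not well-formed XML
print(html_text(page(headline)))              # Fish & Chips
print(parses_as_xml(page(escape(headline))))  # True once "&" becomes "&amp;"
```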

The problem isn't XHTML's rigidity per se; personally I'm inclined more in that direction myself. The problem is when you have a ton of sloppy systems working together (MySQL, old HTML generation code, plugins from third parties you don't control, open source written by people whose belief in their understanding of HTML exceeds their actual understanding, decade-old internal databases with poor validations and unknown provenance, and so on and so on indefinitely), and then try to couple that big sloppy pile of technology to a strict technology at the last minute. That sudden mismatch there at the end was a huge problem.

One of the reasons I tend to prefer being as strict as possible is that in general, starting with existing strict-tech and adding sloppy-tech to it is no big deal; the sloppy tech doesn't complain that it only gets a subset of possible data it will accept. And if you need to couple to strict-tech, you still can. But if you start with a sloppy-tech system and for some reason need to couple it to a strict-tech system... prepare for some long nights and blown deadlines. So, professionally, the correct default is to choose strict-tech whenever possible. But XHTML forced that at almost the worst possible place.


> But programming has strict rules itself and will also fail if not adhered to so I never understood the complaint of "draconian" error checking in XML/XHTML.

In most cases, if your code has some syntax error, the author of the code sees the syntax error; in the web case, if your code has some syntax error, the user sees the syntax error. That's the dramatic difference.

The other reality is unlike program code, there's vastly more often user content intermixed in (X)HTML and it's rare for people to implement sanitisation correctly (do you handle U+0000? U+FFFF? U+1FFFF? most people outputting XHTML historically haven't, even if they get the security critical stuff (like "<") right).
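
As a rough sketch of what that extra sanitisation involves (Python, assuming the XML 1.0 `Char` production is the target; the helper names are hypothetical): escaping `<` and `&` is not enough, because some code points are forbidden in XML 1.0 even when escaped. U+0000 and U+FFFF fall outside the allowed ranges and get stripped here; U+1FFFF is technically allowed by the production but is a noncharacter the spec discourages, so this sketch keeps it.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Anything outside the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# This matches most C0 controls (including U+0000), lone surrogates,
# and U+FFFE/U+FFFF. Discouraged-but-legal noncharacters are NOT matched.
_INVALID_XML_CHARS = re.compile(
    r"[^\t\n\r\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_invalid_xml_chars(text: str) -> str:
    """Remove code points that XML 1.0 forbids even when escaped."""
    return _INVALID_XML_CHARS.sub("", text)


def to_xml_text(text: str) -> str:
    """Drop forbidden code points, then escape &, < and >."""
    return escape(strip_invalid_xml_chars(text))


print(to_xml_text("a < b\x00"))  # a &lt; b

snippet = "<p>" + to_xml_text("x\x00<y\uffff") + "</p>"
print(ET.fromstring(snippet).text)  # x<y
```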


Not true. XHTML and HTML have validators to check your markup for proper syntax and usage. You know that. You made one of them!

Good writers of markup will always check with those before they ship it. In the case of user supplied markup, that's still an issue today with HTML.


I've been involved with multiple HTML and XML parsers, but never validators. :)

The reality ten years ago, when a number of prominent XML advocates were using XHTML (and actually serving it as such), was that almost all of their sites accepted user input that was sanitized well enough for HTML to be secure (with no markup injection), but not for XML well-formedness (they covered all the markup injection risks in XML, but not all the other well-formedness requirements). If the very people who claim XML is easy can't get it right, can everyone else?


Yes. It was your outliner I was thinking of, not a validator, but weren't you involved with the original http://validator.w3.org/nu/ at least in part?

I, too, served my web pages as "real" xhtml 10 years ago and loved it :)


Promise I've never worked on a validator!


Well, now I have to rewrite my book :)


The error handling has already been mentioned. There also was that issue of what mime-type to send in your HTTP headers to get the page displayed correctly in various clients. While you could develop in XHTML, you needed to serve it as technically broken HTML to be compatible with older browsers that still had a significant market share.
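
The usual workaround at the time was to negotiate on the `Accept` request header: send `application/xhtml+xml` to clients that advertise it and `text/html` to everyone else. A minimal sketch, assuming a hand-rolled parser (the function name is hypothetical; it ignores wildcards like `application/*` and parameters other than `q`, which real servers handle more carefully):

```python
def choose_content_type(accept: str) -> str:
    """Pick application/xhtml+xml only if the client prefers it at least as much as text/html."""
    prefs = {}
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs[media] = q

    xhtml_q = prefs.get("application/xhtml+xml", 0.0)
    # text/html is also covered by */*, which old browsers commonly sent.
    html_q = prefs.get("text/html", prefs.get("*/*", 0.0))
    if xhtml_q > 0 and xhtml_q >= html_q:
        return "application/xhtml+xml"
    return "text/html"


print(choose_content_type(
    "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
))  # application/xhtml+xml
print(choose_content_type("image/gif, image/jpeg, */*"))  # text/html
```

A client that never mentions `application/xhtml+xml` falls through to `text/html`, which is how XHTML pages stayed viewable in browsers that couldn't handle the XML MIME type.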


Sounds like the complaints are targeted at faulty implementations, not the document standard.


Yes: the standard wasn't designed with the people actually implementing browsers in mind, and as a result it was unusable for real-world content, whatever advantages it had in a theoretical world where it was widely and correctly implemented. That was a major problem.

Hence the “ivory tower” comment upthread.


The expression "ivory tower" has an entirely different meaning. Just because the couple of dominant players chose to ignore a standard, particularly at a time when they were outright hostile to interoperability initiatives, it doesn't mean the standard was not reasonable.


What good is a standard you can't use in practice? There also were other issues: XHTML was chasing the dream of the semantic, well-formed, machine-readable web, but it didn't do enough to help with pragmatic problems web designers trying to deliver a product actually faced. As much as I like the idea myself from a theoretical perspective, the market had other priorities.


I don't think the idea works from a theoretical perspective.

The browser is a communication channel between a publisher and a reader. They may want different things...

But the "semantic, well-formed, machine-readable web" idea is, in theory, a demand that the channel imposes on both parties. It's not something the publisher wants. It's not something the reader wants. Nobody cares what the channel wants; demanding extra information that isn't relevant to what any party to an actual transaction is trying to accomplish is always going to be doomed.

Readers care about positional information at a fairly minor level. Publishers care about it a lot. And hey, positional information has robust, if annoying, implementations.


Agreed, I was all-in on XHTML, I'm still smarting that HTML5 won with the seeming death of the semantic web.

We did get a handful of different names for a div though, so that's nice, maybe.


I don't see this connection of XHTML and semweb. The preamble to the original XML spec reads as follows:

> The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Basically, XML's purpose was to make parsing rules for web documents generic and DTD-less, in particular to support new vocabularies (eg. SVG and MathML) in addition to HTML/XHTML. The XML namespace spec back then was another pillar for advancing that goal.

HTML5 then simply imported SVG and MathML into HTML as external vocabulary, without a need for namespaces or other modularization meta-technique.

That XML has failed on the web (and was successful only in enterprise and publishing) shouldn't make fans of structured documents on the web bitter, though: SGML is exactly as capable as XML to describe the syntax of markup documents (with tag inference, a feature recently being discussed for re-inclusion into XML under the term invisible markup), and even can parse markdown and other Wiki syntaxes, can do HTML-aware, injection-free macro expansion, and a whole lot of other things waiting to be discovered by XML fans.


I don't see this connection of XHTML and semweb

XHTML continued the spirit of HTML 4 strict, getting rid of presentational elements and replacing them with 'semantic' ones. Eventually, you were also supposed to combine it with RDF to embed machine-readable metadata into your documents (cf https://www.w3.org/2003/03/rdf-in-xml.html and https://en.wikipedia.org/wiki/XHTML%2BRDFa ).


Also, XHTML 2 tried to push people to tags with defined semantics like ARTICLE, SECTION, MENU, etc. Most of the tags exist now in HTML5, but their semantics were neutered in the transition.


If properly implemented tags like that would make stuff like Firefox Reader Mode a breeze to develop, but also easy to subvert.

But it undoubtedly makes development harder. Web Developer 101: Section 1: This 20 page list of categories and their strict formal definitions.


It doesn't really make development harder, because the semantic tags are defined based on properties that aren't visible to a parser. So it doesn't matter whether you use them correctly -- no validator will be able to tell.


> who didn't support the most popular layout method at the time, tables, in their new styling language CSS.

That's quite a misunderstanding of the past. Your proposition is only technically correct: CSS1 (Dec 1996) did not have table display properties, but they were added in CSS2 (Mar 1998). They were available early enough to matter.

The reason why many Web authors did not design with these properties is not the fault of the W3C, but rather because of the typical Microsoft sabotage. In the early 2000s, I personally did not give a shit about IE - my standard compliant CSS rendered fine in other popular browsers.


Thank you - saved me typing it. CSS support for table display predated XHTML 1.0 by nearly 2 years, and worked in XHTML with non IE browsers.


> who didn't support the most popular layout method at the time, tables

I can do tables just fine in XHTML. And luckily, tables have become redundant as a layout technique, since CSS could already do that back then.

The nice thing with X(HT)ML is that you can run queries against any document natively.
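
For example, because XHTML is namespaced, well-formed XML, a generic XML library can query it without any HTML-specific parser. A small sketch using Python's `xml.etree.ElementTree`, which supports only a limited subset of XPath (the document below is made up for illustration):

```python
import xml.etree.ElementTree as ET

# Prefix map for the XHTML namespace; ElementTree needs it for namespaced names.
XHTML = {"h": "http://www.w3.org/1999/xhtml"}

doc = ET.fromstring("""\
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <table id="prices">
      <tr><td>Tea</td><td>2.50</td></tr>
      <tr><td>Coffee</td><td>3.00</td></tr>
    </table>
    <a href="/about">About</a>
  </body>
</html>""")

# Every link target in the document.
links = [a.get("href") for a in doc.iterfind(".//h:a", XHTML)]

# First cell of each row in the table whose id is "prices".
first_cells = [
    row.find("h:td", XHTML).text
    for row in doc.iterfind(".//h:table[@id='prices']/h:tr", XHTML)
]

print(links)        # ['/about']
print(first_cells)  # ['Tea', 'Coffee']
```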


> And luckily, tables have become redundant as layout technique, since CSS would do that, already then.

It took well over a decade to get CSS to play nicely with layouts. It's disingenuous to present CSS as the obvious solution to a problem while criticising table usage to implement grid layouts, as this line of argument entirely ignores the recent history of the web.


Did you miss the 5-10 years during which every CSS designer tortured themselves replicating tables with floats? Google 'pure css page footer' to see the wreckage. CSS tables took years to be supported, and even then, only brought back what people had been irrationally told to stay away from. It took another 5 years for flex box to become usable, adding something actually new.

To this day, changing a site's entire design without touching its markup is a mirage. It only works in contrived scenarios with extremely artificial restrictions, and nobody does it in the real world. CSS Zen Garden was demoscene.


Tabular markup for non-tabulated data is not rational & semantic markup is certainly not irrational.

You don't come from print design by any chance?

With flexbox, for example, you can change order of appearance contrary to the code order in the markup. It's not a mirage. Anyone who's used a browser's reader mode, or distilled view (as Brave calls it), knows the usefulness of applying different styles to a fixed markup.

Separation of presentation is possible.

Indeed, now that we've moved to responsive design and the number of devices and UAs has exploded, the separation of design and data is coming into its own, but pixel-perfection is still being chased with a billion @media declarations instead.


> Tabular markup for non-tabulated data is not rational & semantic markup is certainly not irrational.

Aren't you talking past each other? The issue is that with initial versions of CSS, replicating the powerful layout possibilities of tabular markup was not possible. Developers generally built layouts using "float" instead, and had trouble replicating some of the standard table features like keeping columns the same height.

Only with full CSS 2.1 implementation in IE8 did this state of affairs change, at that point you could apply rules like "display: table" to DIVs and get the same layout possibilities without using tables.

Flexbox, on the other hand, is actually an improvement (as are Grid Layout and Template Layout Modules). But it didn't come until after CSS 2.1.


You don't come from print design by any chance?

You've not been doing that whole web design thing for very long, by any chance? Those of us who have been around for a while do remember the inadequate layouting capabilities of CSS of bygone eras. While I was on the semantic markup side of the debate at the time, it's not as if the pragmatists that just went with the table for convenience's sake did so for no reason at all...


I started web design for lynx browser, back when using pine for email was hot, then moved on to Mosaic and NN.

There's a clear body of web design/dev people that think it's just a visual display medium, and a lot of those seem to come from print design.

The web for me has always been primarily a medium for information transfer, visual design is nice but not at the expense of semantic markup; table markup for visual layout is entirely unnecessary (and was terrible for accessibility).

People went with table layout because marketing people demanded pixel-matched presentation and/or they didn't care to make sure their content was machine readable. The same views led to IE-only sites and give us websites that don't bother with semantic blocks now, or that don't work without JavaScript when that JS is just being used for presentational flair.


My mistake, then. Point is, you didn't need to start off in print design to end up with certain attitudes under discussion. Customers demanded it, and you can only do so much trying to convince them otherwise.

For another, wishing for a certain amount of control over the layout seems like an entirely reasonable demand to me (eg you should not need to jump through hoops just to place something in the center of the viewport).

Lastly, if we're being honest, how much has semantic markup improved the web experience in practice, and how much did layout tables hurt us? When I used to do web design, I was a good citizen, avoiding layout table and arguing for semantic markup. But nowadays, I'm far more forgiving, though I still think there's some value in it insofar as proper markup can improve the screen reader experience.


Personally, I was fine with ditching tables, but stuff like differing box models, unequal CSS support, quirks modes, etc. did make for a rather painful cross-browser development experience: making things work out correctly (or at least gracefully degrade) in multiple IE versions, Mozilla, Opera, ... could be a challenge.


> Did you miss the 5-10 years during which every CSS designer tortured themselves replicating tables with floats?

I never really understood why people had such difficulty with this. I was able to execute table-less layouts while still supporting IE5 on Mac.

> To this day, changing a site's entire design without touching its markup is a mirage.

That's only because HTML authoring is dead. No one writes HTML well these days. Just look at tools like Elementor. How many nested divs do you need to add a faux button to a website? It's ridiculous.

Write well-structured, semantic HTML being mindful of a separation of concerns, and flipping between stylesheets is a piece of cake.


> Write well-structured, semantic HTML being mindful of a separation of concerns

I keep hearing about this mythical beast, and yet I have never seen one beyond the simplest of text-only blogs.


You don't see it because either (a) people don't do it or (b) people are using HTML as a markup language for apps rather than documents.

HTML was created to be used for documents. It was coopted to be used for apps. They really should have come up with a different language for apps.


Why people don’t use it in a) would be an interesting discussion. And I fully agree on b): HTML (+ CSS) is wholly inadequate for app development.


Can you give us a link to your elegant table-less IE5 supporting website? I would like to see how you achieved it.

I lost a lot of hair trying to make a simple 3 column layout where the middle column would scale to the width of the window and could consist of multiple DIVs in a vertical row, all of the same width. AKA "baby's first blog" layout. Something that should have been one of the design cases for CSS.


> Can you give us a link to your elegant table-less IE5 supporting website?

I cannot. They no longer exist. This was 14 years ago. I left web development shortly afterwards.

For your example, horizontal alignment was easy. One container div with a width of 99.9% and a left/right margin of auto. Inside you place three divs (columns) with a width of 33.3% and float left. Add another div at the end to clear the float.

Vertical alignment required a "hack."

http://www.greywyvern.com/code/min-height-hack

And to be clear, I never called it "elegant." I simply suggested it was possible.

Should CSS have been better? Sure. But, by that measure, it still sucks today.


You don't want the sidebars to scale with the screen though, just the middle. And as I recall the obvious solution of just setting a fixed width on the two outer divs and letting the middle one autoscale didn't work for some stupid reason. Maybe because they scaled to the content, not the width of the remaining space.


Exactly. And the same people who keep producing tons and tons of specs no one uses -- and even if anyone wants to use they're so complex it doesn't make any sense.

RDF, JSON-LD, "semantic web", piles and piles of garbage. And all written by the same group of 10 people who have never written a web app by themselves.


I think what really hurt the W3C was its obsession with "semantic web" features (RDF, XHTML 2, etc.) that nobody wanted and few people used.


> seamless iframe proposal.

DOM v3 Document.load() https://www.w3.org/TR/DOM-Level-3-LS/load-save.html was the candidate for client side document loading long before most people even knew what W3C is


Perhaps that is a great example of why the W3C failed?


From what I can see, your example of "DOM v3 Document.load()" is a great example of why the W3C failed to convince browsers to implement their proposals.

That spec seems to be mostly related to Java, and it seems to me that it hardly considers how ECMAScript could use it; i.e., someone had an idea, but couldn't translate that into a useful feature...


> didn't support the most popular layout method at the time, tables

Tables were NEVER a layout method and it was only used for that because CSS was deficient at the time.


> Tables were NEVER a layout method except for the decade or so they were

Okay.


? Tables for layout? Seriously?

Foolishness aside, it is important to understand the DOM came out of work from the XML Schema spec and for many years updates to those two documents were always released together.

The W3C isn’t irrelevant, as there is more to the web than just HTML, just like OASIS isn’t irrelevant for schematic design and business integration. This talk of irrelevance is a hard argument to make for many frontend developers who can’t tell the difference between the standard DOM and React’s VDOM.


Yes, tables. There is a reason why people were abusing tables for layout even though everyone hated it.

In the end, we finally have CSS grid now ( https://css-tricks.com/snippets/css/complete-guide-grid ), which does what we were trying to achieve back in the day with tables. We had to wait until 2016 to have a type of layout which is considered standard in most GUI toolkits...


`display: table` has been around since around 2001/02, for exactly that reason. However IE didn't support it until IE8.


However IE didn't support it until IE8.

Which got released in 2009, but took another year to overtake IE6+7 in market share. This means `display:table` was of limited usefulness for nearly a decade after its introduction...


Sure but that's a different issue. It's not the fault of the standard if a major browser doesn't implement it.

In practice I was using display table in some designs in the mid 00's but also using conditional stylesheets to hack an IE layout. This was not a particularly good solution but it "worked".


But that's kind of the point: we had to do a lot of things that just "worked" because we didn't have the tools that were considered standard in almost any other GUI toolkit.


On the one hand, good.

On the other, it seems like just complete capitulation by the W3C; WHATWG makes all the real decisions still, W3C now performs important administrative services for free, turning the "living standard" into an actual usable standard, without actually having any meaningful power over the results. WHATWG gets to do the part that matters without having to do the hard part, W3C does the hard part for them without any control over what matters.

On the other other hand, sometimes when the battle is already lost, formal capitulation is all that's left.

The browser vendors took control of the web standards process from any body that might represent/balance multiple constituencies/interests, and that's just how it is now.


W3C did try to balance multiple constituencies for HTML, but it couldn't find consensus among them, and broad consensus is the basis for its authority. WHATWG didn't so much take control of the standards process as recognize that what actually works on the web defines the "standard".

It's not a capitulation, it's a way to exert influence in a world that respects rough consensus and running code more than formal processes and authority. Under the agreement, W3C has the power to ratify (or not) changes to HTML/DOM that align with the needs of its broad community for accessibility, internationalization, privacy, security, etc. The agreement provides a way for experts in those "horizontal" areas to participate more effectively in WHATWG to get improvements made upstream, rather than downstream in what amounted to a fork.

And yes, W3C provides the service of providing vetted snapshots of the Living Standards into more formal standards that governments and other standards bodies can reference and ratify. That's adding real value for some constituencies.


If they choose "not to ratify" something... will it have any effect on browser behavior at all? I don't think so. It'd just be a standard none of the relevant software cares about. Pretty useless to anyone. (Much like current W3C html standards...)

Seems to me W3C will straight up be acting as administrative staff for WHATWG, providing free labor to do the "hard parts" of providing a useful standard, without much decision-making ability.

Without much decision-making ability is indeed the status quo. Now they're providing some free labor too. But it's certainly less pointless than what they were doing before, so.


> If they choose "not to ratify" something... will it have any effect on browser behavior at all?

W3C has no authority to change browser behavior, no. But they CAN influence browser behavior by providing expert assistance to promote W3C's traditional values (accessibility, internationalization, privacy, etc.) in WHATWG.

We'll have to see if W3C's ratification or not has an impact on the larger web ecosystem, but it would probably get key customers' (and regulators') attention if W3C refused to ratify changes to HTML, DOM, etc. on accessibility or privacy grounds. Browser developers may not respect W3C's supposed authority to set standards, but they definitely do respect the opinions of customers and the authority of regulators.


The browser vendors always had control. W3C could put anything they want into the standards, it does not matter if nobody implements it.


In practice I've found that web standards are mostly driven by people who are willing to spend the time and effort to iron out all the edge cases.

I've had a very pleasant experience contributing to things like `fetch` and negative experiences contributing to some other things - essentially it's a "people problem" more than a technology problem.


I find the pleasantness of the standards process depends on how many people have strong opinions about the thing you're working on. Being involved in 'new' standards like fetch or gamepad or webgl is generally pretty relaxed because while it's interesting to many people, there aren't a bunch of people with Past Experience with the new api/standard and they aren't about to dedicate a bunch of time to doing it themselves.

Coming in to try and fix problems or propose improvements for existing stuff like canvas or XHR or whatever is another matter entirely, as you've probably noticed. Thanks for helping make fetch good!


Isn’t that the case with most human committees?


It's probably an unpopular opinion but I'm all for progress that may eventually lead to W3C's complete irrelevance. I've talked with folks at the WHATWG before, _Anyone_ can provide input and work with them. With the W3C it's, what, like $30k for the cheapest option and that barely gets you in the door. The W3C is too expensive to ever let small businesses or individuals have impact.

I'd love to see more groups like the WHATWG specifically for this reason.


There were serious issues with the way HTML was done at W3C, and this agreement between W3C and the WHATWG is a good thing.

However, all W3C specs are developed in the open, nowadays on github, and feedback from anyone anywhere is taken seriously. Don't like something about css? go here: https://github.com/w3c/csswg-drafts/issues. Something wrong with SVG? https://github.com/w3c/svgwg/issues. An issue with the Payment Handler API? https://w3c.github.io/payment-handler/. Dislike the way W3C itself works? That's here: https://github.com/w3c/w3process

Yes, the W3C is a membership based organization, and if you want to vote for instance on who gets to be on its Advisory Board, or have a say in the Charter of Working Groups, you need to be a member, and pay.

For the WHATWG, you don't get to vote on who's on the Steering Group, and you don't get a say on what new Workstreams are started / changed / stopped, even if you did want to pay. That's up to Google, Apple, Microsoft and Mozilla. Sure anybody can suggest stuff (but that's true at the W3C too), but they decide, and there's no way into that club.

Not to claim that W3C is perfect. Plenty of improvements are needed. But claiming that small business or individuals cannot participate just isn't true.


Thank you. These are fair points. When attempting to talk to various folks years ago the WHATWG always made it easier for me but I see your point.


Invited experts don’t pay any fees: https://www.w3.org/participate/invited-experts/


I mean, I get that. But if you're a business or even an individual who has an interest in how the internet is being shaped, you have to either pay a ton of money or somehow get them to invite you.

WHATWG seems to have found a way to manage the noise of allowing essentially anyone in to comment and contribute.


Maybe it's just me, but I've never heard about the WHATWG before. And for everybody like me, that acronym stands for "Web Hypertext Application Technology Working Group".


WHATWG is basically the browser vendors sidelining W3C to discuss what features HTML etc should have. The news today is that the W3C finally accepted that situation.


The biggest impact of WHATWG's work wasn't about features, but about defining HTML itself in a way that matched how web pages were written. Before HTML5, every browser parsed HTML in subtly incompatible ways. A document written for one browser could fail to render properly in another. Now, the process for converting bytes into a DOM tree is completely and precisely specified (https://html.spec.whatwg.org/multipage/parsing.html#parsing), down to the level of sniffing text encodings and changing them mid-parse. It's hard to overstate how important this sort of compatibility work has been to the strength of the modern web.
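As a small taste of how precisely this is specified, here is a sketch of only the first step of encoding detection: the BOM (byte order mark) sniff defined in the WHATWG Encoding Standard, which the HTML parser's encoding sniffing starts with. The function name `bomSniff` is hypothetical, and the rest of the real algorithm (HTTP `Content-Type` charset, the `<meta charset>` prescan, mid-parse encoding changes) is deliberately left out.

```typescript
// Hypothetical helper: returns the encoding implied by a byte order mark,
// or null if the bytes don't start with one. Only the BOM step is modeled;
// the other encoding-detection inputs are out of scope for this sketch.
type BomEncoding = "UTF-8" | "UTF-16BE" | "UTF-16LE";

function bomSniff(bytes: Uint8Array): BomEncoding | null {
  // UTF-8 BOM: EF BB BF
  if (bytes.length >= 3 && bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) {
    return "UTF-8";
  }
  // UTF-16 big-endian BOM: FE FF
  if (bytes.length >= 2 && bytes[0] === 0xfe && bytes[1] === 0xff) {
    return "UTF-16BE";
  }
  // UTF-16 little-endian BOM: FF FE
  if (bytes.length >= 2 && bytes[0] === 0xff && bytes[1] === 0xfe) {
    return "UTF-16LE";
  }
  return null;
}

// Usage: a document starting with a UTF-8 BOM followed by "<".
console.log(bomSniff(new Uint8Array([0xef, 0xbb, 0xbf, 0x3c])));
```

When a BOM is present it wins outright; only when this returns null does a browser fall back to the other, fuzzier signals.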


> It's hard to overstate how important this sort of compatibility work has been to the strength of the modern web.

There are fewer compatible browsers than at any time in the past 20 years.

The compatibility problem has been solved by drowning browser maintainers in complexity and convincing them to give up. All the small browsers lack the manpower to implement the rapidly churning spec. This leaves the Google-funded Firefox and the Google-funded Chromium in the space.

Compatibility has been solved by reducing the web to a monoculture, with a puppet competitor to avoid monopoly issues.


I don't think that is a fair analysis.

Firstly, much of the complexity is an emergent property of interaction between a spec and reality: the engineering choices developers make when implementing a spec.

Pixel examples: (a) mitering of borders, (b) sizing four 25% width divs within a 99px div.
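To make example (b) concrete: 25% of 99px is 24.75px, and something has to turn that into whole pixels. The three strategies below are hypothetical illustrations (the names `floorEach`, `roundEach` and `roundEdges` are made up), not the actual algorithm of any browser; real engines may also lay out in subpixel units. The point is only that each reasonable choice produces a visibly different result.

```typescript
// Hypothetical pixel-snapping strategies; not taken from any real engine.
type Strategy = (container: number, percents: number[]) => number[];

// Truncate each child independently: 24+24+24+24 = 96, leaving a 3px gap.
const floorEach: Strategy = (container, percents) =>
  percents.map((p) => Math.floor((container * p) / 100));

// Round each child independently: 25+25+25+25 = 100, overflowing by 1px.
const roundEach: Strategy = (container, percents) =>
  percents.map((p) => Math.round((container * p) / 100));

// Round the cumulative edge positions instead: 25+25+24+25 = 99, an exact fit,
// but one child ends up 1px narrower than its siblings.
const roundEdges: Strategy = (container, percents) => {
  const widths: number[] = [];
  let cumulative = 0;
  let previousEdge = 0;
  for (const p of percents) {
    cumulative += (container * p) / 100;
    const edge = Math.round(cumulative);
    widths.push(edge - previousEdge);
    previousEdge = edge;
  }
  return widths;
};

const strategies: Array<[string, Strategy]> = [
  ["floorEach", floorEach],
  ["roundEach", roundEach],
  ["roundEdges", roundEdges],
];
for (const [name, strategy] of strategies) {
  const widths = strategy(99, [25, 25, 25, 25]);
  const total = widths.reduce((a, b) => a + b, 0);
  console.log(name, widths, total);
}
```

A spec that just says "25%" doesn't pick among these, and pages end up depending on whichever one a given browser happened to choose.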

Pick just about any old spec, then look at the corner cases where developers have discovered different browsers act differently ("bugs"). The programming differences are often emergent and are not covered by the spec.

Developers create web pages that depend on the differences in a browser: that is a hard reality.

Secondly: Chrome mostly works better, follows specifications faster, and it is marketed better. Firefox has a significant budget, but it tools around with a bunch of shit that doesn't make their browser better. My interactions with Mozilla trying to get real bugs fixed have been poor. Safari and Microsoft were worse. Chrome cares about bugs, and fixes them in my experience.


> Firstly, much of the complexity is an emergent property of interaction between a spec and reality: the engineering choices developers make when implementing a spec.

Ah, clearly that makes it easier for third parties to maintain browsers. Or, wait, no: that's yet another way to push out players that don't have hundreds of millions per year worth of funding.

> Chrome mostly works better, follows specifications faster, and it is marketed better.

That's largely because of the way the specs are developed: They rubber stamp Chrome features.


I'll bite: do you think that Chrome is some sort of conspiracy??

I also don't like how Firefox, Safari and Edge have failed to compete with it, especially because I don't like that Google slurps so much private information. I wish Opera could have continued to compete as I loved the underdog for years.

If it were just a problem of marketing, I could be upset. However, technically the Chrome team is just totally obliterating competitors by being so much more competent and by providing features. Features that developers and consumers want: as a web developer I see Chrome kill the other browsers on metrics I care about such as bug fixes, adding features that are relevant to my development, development tools, etc. As a Chrome user, Chrome mostly trounces the other browsers on security, speed, etc. (Safari does win on some metrics on closed iOS and macOS).

Mozilla, Apple, and Microsoft are not poorly funded - they are just being beaten for what are mostly technical reasons (relatively slower, unreliability, wasting time on low-value features).

On topic: Chrome implements many standards it didn't create far better than the other browsers. Browse through the list at https://wpt.fyi/results and see that Chrome has better scores for IndexedDB (I think created by Firefox); better scores for offscreen-canvas and orientation-sensor (I think created by Apple).

A great example: Firebug which was created outside[1] of Mozilla and fantastic at the time. Good developer tools draw in plenty of developers, and help make sure they test everything on your browser. Chrome Dev tools overtook it in every way I cared about: reliability, features, inbuilt F12, remote debugging, async stack, etc. Now developers use Chrome because this massive feature is so much better, and why blame the developers for using the tool that makes their job the easiest?

PS: I didn't downvote you - someone else must have disliked the way you answered.

[1] https://hacks.mozilla.org/2017/10/saying-goodbye-to-firebug/


> I'll bite: do you think that Chrome is some sort of conspiracy?

No, it's a perfectly rational set of business decisions. I'd make the same ones if I was in charge of maximizing Google's control of the web platform.

> Mozilla, Apple, and Microsoft are not poorly funded

Correct. Apple and Microsoft just can't make a business case for burning huge mountains of cash to dominate the web, so they let Google have that pie. Again, a rational business decision.

So we end up with what is largely a monoculture, with fewer compatible web browsers than at any time in the past 20 years, because no one else is willing or able to put in the resources to keep up with the incredible complexity required by the modern web.


> WHATWG is basically the browser vendors sidelining W3C to discuss what features HTML etc should have.

One way of looking at it is who has final say what ends up in the documents. That seems to be what you're discussing.

In the W3C, that's the Director, Tim Berners-Lee. Except, as pointed out elsewhere in this thread, he has delegated all his decision-making powers to W3C staff. So ultimately it is the W3C CEO and the folks he hires.

In the WHATWG, that's the browser vendors (i.e., the WHATWG Steering Group). I find the WHATWG model to make more sense, as I think what ends up in specs should match what the implementers plan to do; that makes for a more useful spec ecosystem for everyone. When you diverge too far from that model, you get circa-2004 W3C, i.e. XHTML 2, XForms, XEvents, etc.

A different way of looking at it is who contributes to the discussion and development of features. In the W3C, that's paying member companies (https://www.w3.org/Consortium/fees, $2.25k-$77k/year in the US). Although anyone can comment on GitHub, to contribute ideas that get incorporated into the specification, you need to be a W3C member, for IPR reasons. (Or invited expert, but that has its own problems; see https://medium.com/@tobie/w3c-doesnt-help-its-invited-expert...) In the WHATWG, anyone can contribute; the IPR concerns are taken care of by signing an agreement, somewhat like an open-source project CLA, but that agreement does not require paying membership fees.

Overall, I think the WHATWG structure, of being open to input from all (instead of pay-to-play), with the spec tracking what browsers intend to implement (instead of tracking what W3C staff deems good), is a pretty great model for standardization.


You can look at theoretical process improvements all you want, but the reality is that the WHATWG process has driven Opera and MS out of producing core web browser tech altogether, and prevented countless unnamed innovative browsers from being even considered, let alone developed, out of the sheer infeasibility of developing a browser from scratch. Worse, the WHATWG "living standard" process is set up to never end and never produce a final spec of any sort, which contributes further to the problem.

WHATWG fails to realize that it cannot represent both the remaining browser vendors (Google and Google-financed Mozilla), who want ever more webapp contamination, and the web users and authors who want a stable content media format.

Also, I have to say your comment doesn't sound very promising wrt the future WHATWG/W3C relations being announced.


>Overall, I think the WHATWG structure, of being open to input from all (instead of pay-to-play),

That's not entirely fair. The WHATWG is also pay-to-play in a way: the browser makers have disproportionately higher power, even more so than in the W3C model, where there is at least some form of consensus forming among a group of people with more international representation. Now technically anyone in the world can open a GitHub issue (the W3C also does that now), but that's about all anyone can do to oppose something in the spec that the browser makers have their hearts set on.


Hmm, you seem to have skipped over some of my points.

The WHATWG is open to input from all, but only things browsers are willing to ship get into the specs. So yes, that's more power. No disputing that. But I would rather not have specs full of unimplemented fiction, even if that fiction has consensus among paying member companies.

The W3C is not open to input from all. If you open a GitHub issue contributing an idea, but do not pay member fees, they cannot incorporate your idea into the spec, because of Intellectual Property Rights concerns. There is no way in the W3C to say "I sign over my IPR rights" without also paying thousands of dollars per year.


> The W3C is not open to input from all. If you open a GitHub issue contributing an idea, but do not pay member fees, they cannot incorporate your idea into the spec, because of Intellectual Property Rights concerns. There is no way in the W3C to say "I sign over my IPR rights" without also paying thousands of dollars per year.

This is not true: https://www.w3.org/2019/Process-20190301/#contributor-licens...


> There is no way in the W3C to say "I sign over my IPR rights" without also paying thousands of dollars per year.

Or, as you mention up-thread, get invited into the group by the chairs and with consent of the W3C staff. Which… is an opaque process, and totally non-obvious to an outsider.


It's not like anybody of us voted for W3C either (so it's not like one body is the "legitimate" one and other is not).

And the W3C had stalled progress so much in the 00s and 10s, that it deservedly got sidelined.


>It's not like anybody of us voted for W3C either

You have more of a voice in the W3C than any other standards body (you can comment on the github issues, or join as an invited expert etc). The W3C was created by Tim Berners-Lee, if anyone should have a vote on how web standards are run, then the creator of the web should be one of the contenders for it.

>And the W3C had stalled progress so much in the 00s and 10s

Not the 10s, just the early part of 00s, due to the xhtml vs html5 thing.


> You have more of a voice in the W3C than any other standards body (you can comment on the github issues, or join as an invited expert etc).

That's… not any different to the WHATWG. You can comment on GitHub issues there just fine too. (And you can't just join as an invited expert for the W3C, but even being an invited expert means relatively little in most groups.)

> The W3C was created by Tim Berners-Lee, if anyone should have a vote on how web standards are run, then the creator of the web should be one of the contenders for it.

Tim is scarcely involved in the W3C nowadays, and hasn't been for a long time. The fact he's in theory the Director (but in practice almost everything is done by W3C staff in the Director's name) means very, very little.

That's an appeal to an authority who isn't even present.


>You can comment on GitHub issues there just fine too.

Yes, but (and you can disagree with me here) the W3C has much better international participation and say in standards. I know you can just file github comments, but none of the world has any real way to oppose any development that the whatwg makers have proposed. There doesn't seem to be any consensus forming or way to really oppose any development apart from filing a github issue. All the editors are pretty much from one company, which feels like handing over control of the web to that one company essentially.


> All the editors are pretty much from one company, which feels like handing over control of the web to that one company essentially.

Which company are you referring to? Per https://github.com/whatwg/sg/blob/master/Workstreams.md, I see

- 3 independent (Dom Farolino, Robert Kowalski, G. P. Hemsley)

- 2 Mozilla (Mike Taylor, Anne van Kesteren)

- 2 Google (Domenic Denicola, Philip Jägenstedt)

- 1 CloudFlare (Terrin Stock)

- 1 Bocoup (Simon Pieters)


Okay, so they have added more people there than the last time I looked (which was pretty much all google except for 1 mozilla person). Thanks for pointing towards that link.

Still seems a pretty small group to control something so important, but I hope they keep improving and adding more people there from other companies and countries.


The main restriction here is people willing to volunteer. We have tons of specs desperate for more maintenance work. If you'd like to get involved, please do so, and if you can spend significant amount of time on any of them, I imagine their current editors would be happy to add you to the team. (I certainly would, for the specs I edit.) You can get started at https://github.com/search?q=is%3Aopen+label%3A%22good+first+....


They were focused on data organization and security instead of rounded corners or querySelectors, but most frontend developers saw that as stalling.


As someone who was involved in some of the W3C work at the time, it's not clear to what extent they were focused on "security" at all.


They were focused on replacing web development practices. Something that was much needed, but completely uncalled for.


Neither data organization nor security is their role...

One belongs to the backend to decide, the other is for HTTP/S-level specs and Javascript, none of which are or should be W3C's concern.


I use all kinds of defined data structures in my JavaScript applications. Defining such as an interface is one of TypeScript's biggest wins. W3C also owns XML and XML Schema, which are all about defining data structures on the client side.


Yeah, I expected something like that. As the other comment mentions, I'm not sure whether to feel bad about this or not.

On the one hand, browser manufacturers now "have won" and can do what they like. On the other hand, the W3C always was kind of a toothless tiger, hoping that everybody would do as they say.

Difficult situation, not sure if this will "help the web".


>On the other hand, the W3C always was kind of a toothless tiger

Not really, it was, and still is, a neutral ground for people to discuss and build consensus on what technical standards should be on the web. CSS, ARIA, WCAG and many other stuff are working in the W3C pretty well.


WHATWG also stands for "What working group", as in "What working group is going to work on extending HTML"

http://ln.hixie.ch/?start=1086387609


Mentally I always read it as "What We Get"


And it's pronounced more or less like "wat-woo-g"



W3C will now publish a Recommendation instead of a "version" as such. The Recommendation will be from a "Review Draft" published by WHATWG, this draft will then go through the W3C process (Candidate Recommendation → Proposed Recommendation → Recommendation).

And then there is patent exclusion (what?).

Anyway, I don't see anything new for developers, apart from the politics and the two parties burying the hatchet.


Why don't W3C just leave it to WHATWG entirely? Why do we need two groups involved? Why should I put any value in what W3C recommends? What would they ever recommend instead of WHATWG?


W3C still has lots of standards and some WGs are working fine, like CSS and accessibility. WHATWG is too browser-focused IMHO; W3C tried to be broader with things like RDF, SPARQL, EPUB, SVG, ...


> Why should I put any value in what W3C recommends?

For the same reason you put any value in, e.g., ISO or ANSI. Because others recognize them as the standard organization and put value in what they recommend.


But no browser maker recognizes the W3C standard or claims to implement it, so what good is it?


That no browser maker recognizes W3C's standard is a bit misleading. MDN (i.e. Mozilla) quotes the w3c recommendations all over the place.

Google, from what I found around, cites nothing, not even WHATWG.

Webkit cites MDN (which cites w3c) as a source.


The W3C citations on MDN are a historical artifact by the documentation folks, not influenced by the browser folks. There's a slow movement to consolidate around what's implemented and stop referencing W3C forks, and I think today's news will help hurry that along. See e.g. the discussion in https://github.com/mdn/kumascript/issues/1019 or my previous work in https://github.com/mdn/kumascript/pull/220.


If I complain to Mozilla that they deviate from the W3C spec in some respect, will they treat that as a bug, or will they more likely say "no-one implements this, so no website uses this, so we have no reason to be the first to implement this, because we don't think this part of the spec is really important"?


Depends on the deviance. If it's 'this was changed in the whatwg spec' the answer will probably be 'the next w3c spec will document this change, for now here's <an explanation and/or an update to mdn>'. If the deviance breaks production apps sometimes Mozilla will roll it back, it's happened before. In other cases they'll treat it as a bug and fix it.


Thanks, that's encouraging to hear!

Although I still worry about the case where fixing the deviance would break production apps. Potentially break them, since people won't complain until you release the breaking browser version. That's what's usually kept browsers from hewing to standards.


I think it's more misleading to claim that any written standard has authority. In practice these written standards are ultimately subservient to the actual implementations—particularly where the competing implementations happen to agree. And especially in disputes that affect real websites.

If there is an inconsistency between what the standards say and what the browsers have done, the browsers will more than likely trump the standards. In all practical senses the standards are descriptive, not prescriptive.


For HTML and DOM, this is true. There are many more things at the W3C, the most obvious of which is probably CSS. Browsers totally pay attention to what these spec says, and are the dominant participants in producing these specs.


Folks may find the more-detailed blog post linked at https://www.w3.org/blog/2019/05/w3c-and-whatwg-to-work-toget... to be more clear.

Also, for the full details, here is a direct link to the agreement: https://www.w3.org/2019/04/WHATWG-W3C-MOU.html


This looks like W3C capitulating:

> WHATWG maintains the HTML and DOM Living Standards

Which is understandable, since all of the main implementers of HTML are active in WHATWG, not so much W3C.

What will be the concrete impact of this agreement?


The biggest concrete impact, in my opinion, will be the removal of confusion in the web developer community around where things are specified. Currently W3C forks (https://wiki.whatwg.org/wiki/Fork_tracking) are often linked to as authoritative, or misleadingly show up in Google search results higher than the document they are forked from.

Per the Memorandum of Understanding at https://www.w3.org/2019/04/WHATWG-W3C-MOU.html#transition, after the W3C first marks a WHATWG Review Draft as a W3C Recommendation, all the forks will be marked as "Superseded" with redirects to the WHATWG originals.


Sounds like a recognition of WHATWG as the master, with W3C becoming more of a publishing / vetting organization for snapshots of the standard?


When in reality the actual implementations are the master, WHATWG are the first to describe the implementations' intended behaviors, and W3C is a glorified proxy server with aggressive caching.


I guess it's more a recognition of the limited resources of both W3C and WHATWG, of the fact that there's a need for a "real" fixed-in-time and versioned document describing HTML rather than a collaboration workspace (WHATWG's "living standard"), and of W3C's choice to not die on the particular hill of redacting WHATWG texts. Practically speaking, though, W3C HTML 5.2 (the last completed recommendation) was sponsored by The Paciello Group, and by MS before that, and with MS' withdrawal from browser development they also gave up redacting W3C HTML to fit IE's feature set. I would love to be corrected by an insider, but that's what I could make out from publicly available info.


As far as I'm aware, MS continued to be involved with the W3C HTML much longer than other browsers due to the W3C patent policy, which the WHATWG only had any equivalent of as of December 2017.

Note also the W3C specs weren't really redacted versions of the WHATWG spec, especially in later years: they were pretty much complete forks and moving changes between them non-trivial.


Or it could be a relationship similar to that between the IAB and IETF


Out of curiosity, does the WHATWG have any commitment to keep the "review drafts" meaningful (e.g., by orienting the things that are actually supported by browsers along them)?

Will it mean anything if one of them becomes a W3C Recommendation?

If the W3C sends a draft back with requests for corrections, will anyone from WHATWG actually be motivated to make those corrections?

Or is this more of a "pick any commit you like and we can declare it the 'review draft' if that makes you happy" kind of thing?

Also, by this agreement, does the W3C have any avenues to influence HTML/DOM design work that exceed those of an average volunteer on the WHATWG mailing list?


Is WHATWG pronounceable?


> It has various pronunciations: what-wee-gee, what-wig, what-double-you-gee.

https://whatwg.org/faq#spell-and-pronounce


"What double you gee"


Just think of "What would Jesus do?" and leave off the last two syllables.


"What working group"


I instinctively read it as "what-wee-gee"


That is how a lot of us say it.


I always thought it to be "what wig".


what-wig


I say "wot-wug", but then I've never actually had a spoken conversation about web-standards in over 20yrs of making websites (mostly non-commercially)


depending on accent, those pronunciations can be pretty close


Good


The only thing that matters is what Chrome and iOS Safari decide to do. Sadly.


So W3C is signing to confirm their irrelevance?


No, the W3C publishes a lot of standards apart from HTML and DOM, and they would still be somehow involved here too.

However, it is indeed sad that the HTML spec is now controlled by browser makers only, and the editors are mostly from just one company.


FWIW, w3c publishes and standardize a lot more than just HTML and DOM. So calling it irrelevant seems a bit of a stretch.


I would also add that having third parties propose specifications to standardization bodies is a widely established practice. In fact, standards are supposed to reflect industry practices and are a way to establish common ground.


[flagged]


What could Mozilla have done with W3C making HTML5 DRM (EME) part of the official spec?

That stance is the main reason W3C lost support from the community (They not only lost Mozilla but also EFF and a lot of trust from independent developers).

That DRM affair was also conducted in a very shady way with secret votes, etc...


W3C didn't lose Mozilla over EME. 1) Mozilla is still an active participant in W3C (there are many more things than HTML there). 2) Mozilla supported EME. Maybe reluctantly, but supported it nonetheless.


Lost trust from a small but very vocal bunch of disruptive people and others who don't know what that actual issue is but seemed like a good band wagon to jump on grrrr big business giving it to the man!


[flagged]


As someone deep in the markup world, I'd say thank god XForms didn't make it, though I don't know Mozilla started WHATWG, let alone for promoting XForms of all things (Mozilla/Netscape once introduced JavaScript). WHATWG was Ian Hickson's (of Google) project, started out very successfully in adding capabilities to the web, but sadly turned into an instrument for browser monopoly.

As for validation, you can parse/validate all published versions of HTML, including W3C HTML 5, 5.1, and 5.2 using SGML and my SGML DTD grammar(s) [1] for HTML 5.x. SGML (ISO 8879) is the original markup meta-language on which both HTML and XML are based. XML was once started as a subset of SGML (dropping tag inference/omission, SGML-style empty elements, attribute shortforms, and other features requiring DTD declarations for the concrete markup language to parse).

[1] http://sgmljs.net/blog/blog1701.html


> WHATWG was Ian Hickson's (of Google) project

He was at Opera when it started, back in 2004.

The original W3C Workshop proposal to do Web Forms 2.0 was an Opera/Mozilla joint paper, and a few months later when the WHATWG started Apple had joined. I don't think Google's name was attached until Hixie moved to work there (admittedly not that much later).


> WHATWG was Ian Hickson's (of Google) project

But WHATWG would never have flown if not for him getting those dissenting Mozilla devs on board.


Impressive



