
The evolution of the web, and a eulogy for XHTML2 - tannhaeuser
https://www.devever.net/~hl/xhtml2
======
pgcj_poster
> This is unsurprising given the total willingness and business incentive of
> major players such as Google to drive the applications use case and their
> comparative disinterest in the semantic hypertext use case

Indeed, I am consistently surprised at the document-display features that are
still missing from web browsers that are increasingly preoccupied with serving
as application runtimes.

The web was originally envisioned as a platform for scientists to share
documents, but thanks to Google dropping MathML, there's still no cross-
browser, non-kludge way to display math.

Browsers still can't justify text properly. Even with hyphenation (which iirc
is still a problem in Chrome), the greedy algorithm used for splitting text
across lines still results in too much space between words when justified.
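
For what it's worth, the knobs that do exist are minimal. A sketch of what
you can ask for in CSS today (`hyphens` needs a `lang` attribute on the
document to pick a dictionary, and support varies by browser):

    p {
      text-align: justify;
      hyphens: auto;            /* ask the browser to hyphenate; support varies */
      text-justify: inter-word; /* hint for where the extra space should go */
    }

Even with all of this, line breaking is still greedy rather than
paragraph-optimizing (as in TeX).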

There seems to be a complete lack of interest in Paged Media support, so if
you want your web-page to be printable with nice formatting, you basically
need to provide it as a PDF.

I've gotten to the point where I'm often happier to read something in-browser
as a PDF than as a web-page. Sure, it can't reflow properly, but at least it
won't have a 2-inch sticky header and make XHR requests to 10 different
domains. My vision-impaired father reports that his text-to-speech software
now works better with PDFs than with real web-pages. This is insanity.

~~~
AndrewDucker
A "page" exists because paper exists. When you aren't using paper why would
you break up your content into arbitrary fixed size increments, rather than
using (say) "sections", which are the size of the content you put into them?

~~~
pgcj_poster
Right, you shouldn't do that. Instead, you should leave your content as-is
when viewed on a screen, but add a separate stylesheet that gets applied only
when you print, to format your content for paper: e.g. putting footnotes at
the bottom of the page. Unfortunately, features like this are largely missing
from browsers.

~~~
gboss
This is exactly what the print media query is for, which is supported in all
modern browsers.

[https://joshuawinn.com/css-print-media-query/](https://joshuawinn.com/css-print-media-query/)
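
A minimal sketch of the idea (the selectors are made up for illustration;
`@media print` is supported everywhere, while `@page` from CSS Paged Media is
spottier):

    /* Screen styles stay as-is; these rules apply only when printing. */
    @media print {
      nav, .sticky-header { display: none; }
      a::after { content: " (" attr(href) ")"; } /* print link targets */
    }

    @page {
      margin: 2cm; /* page-box margins, from the Paged Media module */
    }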

------
superkuh
That last paragraph absolutely nails it. Trying to re-implement the operating
system as the browser is mutually exclusive with it being a good browser or
standard for webpages.

> _If anyone constructed a PDF, which was itself blank but, via embedded
> JavaScript, loaded parts of itself from a remote server, people would
> rightly balk and wonder what on earth the creator of this PDF was thinking —
> yet this is precisely the design of many “websites”. To put it simply,
> websites and webapps are not the same thing, nor should they be. Yet the
> conflation of a platform for hypertext and a platform for applications has
> confused thinking, and led developers with prodigious aptitude for
> JavaScript to mistakenly see mere websites of text as a like nail to their
> applications hammer._

~~~
nitwit005
It's worth remembering that, because there were no JavaScript or HTML APIs for
many things people felt their pages needed, a huge portion of the web was
loading up Flash, Silverlight, or Java applets.

Now, maybe they never should have allowed Flash and friends, but the genie was
well out of the bottle. You could either have HTML and JS based functionality,
or the plugins.

~~~
skunkpocalypse
WHATWG should have published a standard to replace Flash -- all the same goals
as Flash, but a clean-slate standards-driven design.

Instead they infected the HTML standard with everything Flash was being used
for.

~~~
zozbot234
Um, they did exactly that via HTML canvas, which replaced the main use cases
for both Flash and Java applets.
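
For reference, the kind of thing that once required a Flash movie is now a
few lines of markup and script. A minimal sketch (the id and dimensions are
arbitrary):

    <canvas id="stage" width="320" height="240"></canvas>
    <script>
      // Draw a rectangle: the "hello world" of Flash replacements.
      var ctx = document.getElementById('stage').getContext('2d');
      ctx.fillStyle = 'rebeccapurple';
      ctx.fillRect(40, 40, 240, 160);
    </script>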

------
jefftk
A lot of issues with this post, but I'll just take two:

 _> rather than articulating particular requirements and principles but not
how they need be met, the WHATWG specifications tend to be written in a highly
algorithmic and prescriptive style; they read like a web browser's source, if
web browsers were written in natural language._

It turns out that if you want to have pages work the same in every browser you
need to have every browser doing the same thing when it interprets the pages.

 _> The pursuit of the semantic web has changed in the era of HTML5, which
represented a rejection of XHTML — to me, a seemingly bizarre rejection of
having to write well-formed XML as somehow being unreasonably burdensome._

In practice, people won't write valid XML. We had a lot of cargo-culting,
people putting self-closing tags into HTML, but people weren't using XML
editors. And without an editor that understands XML it definitely is
unreasonably burdensome to create XML. What we saw instead was that even most
"XHTML" documents were not valid XHTML and were served with an HTML content
type. If you had served them instead with an XHTML content type the browser
would have simply refused to render them.

Both of these were recognitions that the previous approach wasn't working, and
that if the spec was to achieve its goals we needed to try something
different. Under WHATWG the spec has moved from "yes, the spec says this but
it doesn't matter" to "the spec describes what the browsers do, and the
browsers treat cases where they violate the spec as bugs". Sites now really do
work the same across browsers, and WHATWG deserves a lot of credit for that.

~~~
blablabla123
> In practice, people won't write valid XML. We had a lot of cargo-culting,
> people putting self-closing tags into HTML, but people weren't using XML
> editors.

That's probably the most popular argument against X(HT)ML and in favour of
HTML5. In fact, I think that among the popular programming/markup languages,
only very few are so forgiving: namely HTML(5), JS, CSS, and perhaps shell
script and Perl. But even in these cases, following best practices and using
linters has become extremely popular. On the other hand, you have strongly
typed languages, or even languages like Python or Makefiles, that make sure
you use consistent whitespace.

I think nearly everybody uses quite powerful editors with a load of plugins
these days, because they accelerate editing and also do autoformatting.

> Sites now really do work the same across browsers, and WHATWG deserves a lot
> of credit for that.

On the other hand, there are just 2 popular/"usable" browser engines left. I
think XHTML is far more modular; maybe it would even be possible to outsource
some browser rendering tasks to XSLT transformations. HTML at one point became
a messy standard through the browser competition, and then WHATWG somehow
cemented that situation, I guess. Now there's a massive monoculture of browser
engines.
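
(Browsers do still ship a client-side XSLT 1.0 engine, for what it's worth:
serve an XML document with a stylesheet processing instruction and the
browser does the transformation itself. A minimal sketch, with a hypothetical
stylesheet name:)

    <?xml version="1.0" encoding="utf-8"?>
    <?xml-stylesheet type="text/xsl" href="render.xsl"?>
    <article>
      <title>Hello</title>
      <body>Rendered entirely by the browser's XSLT engine.</body>
    </article>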

~~~
twsted
> On the other hand there are just 2 popular/"usable" browser engines left.

I count at least three.

~~~
jefftk
Yes: Gecko, Blink, WebKit, and EdgeHTML. Edge is switching to Blink, and Blink
is a fork of WebKit, but I'd say there are four now, soon to be three.

~~~
WorldMaker
Given the way developers still test Blink/WebKit (they just test one browser
in the family, be it Chrome or Safari, largely depending on the web
developer's home hardware), it's hard to consider it a hard enough fork to
count as two separate engines. Sure, reality shows more divergence than web
developers who test that way tend to expect or believe (Safari as an
Apple-only "LTS" Chrome), but that perception, and its reflection in testing,
alone seems reason enough to consider the family together (especially with
Edge moving into the "family" for, among other reasons, that very reason of
"simplifying web developer testing requirements").

------
ainar-g
Thank you for posting this! Now I know that I'm not the only “grumpy old man”
who smells the fishy part of the “Modern Web”.

With that said, I would love the author to go on and elaborate on other
advantages of XHTML2, such as possible integrations with XForms (including
more inputs and sending requests without page reloading and without
JavaScript), XFrames, the single header element <h>, every element as a
hyperlink, etc. Then there are MathML and XSLT. If XHTML2 became a reality, we
would probably see XSLT 2.0 more actively adopted by the browser vendors,
which is a good thing in my book.

~~~
techdragon
XForms is one of the things I wish were more widespread; it's such a good
idea in principle. I think it just gets tarred with the “XML is bad” brush and
ignored.
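
For anyone who never saw it, the core idea was a declarative model/view split
expressed entirely in markup. Roughly, from memory of XForms 1.1 (so treat
the details as a sketch rather than gospel):

    <model xmlns="http://www.w3.org/2002/xforms">
      <instance>
        <person xmlns=""><name/></person>
      </instance>
      <!-- Declarative submission: PUT the instance as XML, no script. -->
      <submission id="save" resource="/people/1" method="put"/>
    </model>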

~~~
ainar-g
Someone should write an article and call it something like “‘XML Considered
Harmful’ Considered Harmful”. By no means is XML very nice to work with
(manually). But at some point during the late 1990s it had a real chance of
becoming _the_ document and data mark-up language, with standards allowing you
to do pretty much anything with it, and a bunch of WYSIWYG tools for those who
are allergic to plain text and scriptable editors. And I think it would be a
slightly better world.

~~~
acdha
I wish the XML community had given proper priority to usability, along with
quality tooling and examples. There was a solid decade where people would work
on a spec which sounded cool but, from the perspective of working developers,
effectively never shipped, or did so with enough bugs/inconsistent
support/poor performance/bad UX that it was a net cost. As a simple example,
you still can’t use XPath 2 portably because libxml2 never implemented it –
absolutely nothing that the XML standards community was working on had even 5%
of the value that would have come from fixing that or the countless similar
problems which created a constant pressure to stop using XML. The same was
true of good documentation and examples: the assumption was that other people
would take time to learn these convoluted specs, but most of them started
using JSON instead because they could ship so much faster.

~~~
zmix
> you still can’t use XPath 2 portably because libxml2 never implemented it

That's not the fault of the XML community at large, but just a lack of
resources for the implementation of an unpaid open source project.

You can happily use XPath 3.1 with Saxon, BaseX, and eXist. All three use
Java, so they're not exactly portable, but Saxon has a C library that mirrors
the Java version 1:1, and that C library is also available as open source,
though it still lacks some XPath 3.x features, like higher-order functions.

For the command line, there is a partial XPath 3.1 implementation in 'xidel'.
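
E.g., if I remember xidel's flags right (`-e` takes an XPath/XQuery
expression), extracting every link target from a page is a one-liner:

    xidel page.html -e '//a/@href'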

But I agree: libxml and libxslt being stuck at XPath 1.0 for so long did not
serve XML well.

~~~
acdha
I classed that as something for the XML community to prioritize because
implementations are so important to adoption for standards. If you design
standards and want them to be used, at some point you need to figure out how
to get resources to support development of key implementations, help major
projects migrate to alternative implementations[1], or develop a replacement
if nothing better is available.

The fragmentation you mentioned is part of what made this so frustrating: if
everything you used was within certain toolchains, the experience was fairly
good but then you'd need to use a different ecosystem and either drop back to
good old XPath 1 or take on more technical debt. In many cases, the answer I
saw people favor was leaving the XML world as quickly as possible, which is
something the community has a strong interest in preventing.

1. For example, Saxon added support for Python just a few days ago:
[https://www.saxonica.com/saxon-c/release-notes.xml](https://www.saxonica.com/saxon-c/release-notes.xml)
Imagine if that had happened a decade ago, and everyone who was stuck with
libxml2 could have easily switched.

~~~
zmix
You are right. Too little, too late.

On a fair note, one should also take into account that Saxonica is a rather
small, even if highly skilled, shop, and the program they create is a huge
undertaking.

~~~
acdha
I’m not faulting Saxonica in any way. It’s just a shame that they seem to be
taking this on alone.

------
perlgeek
Hey, I actually wrote an application that produced strict XHTML2, and served
it with an application/xhtml+xml content type (iirc).

It was a page that displayed IRC logs, with linkable anchors for each line,
automatically breaking words that were too long for the browser, turning text
into links by regex, interpreting terminal colors etc.

Each time I had a tiny cross-site scripting bug in there (some part wasn't
XML-escaped properly), some data would eventually trigger it (you wouldn't
believe the amount of encoding junk on IRC), and the browser would simply
refuse to render anything at all. Inconvenient for my users, but it made sure
such things didn't slip by unnoticed.
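
(For anyone wanting to reproduce this: the strictness is triggered entirely
by the response header. Something like

    Content-Type: application/xhtml+xml; charset=utf-8

instead of `text/html` is what flips the browser into its strict XML parser;
the markup itself can stay the same, provided it is already well-formed.)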

---

This was a side project, done for fun, and a few friends that used my site. If
I had been trying to make money from it, the first thing I would've done is to
switch to something less strict, so that tiny errors wouldn't stop rendering
the whole page.

~~~
gwbas1c
Around 2010 I did some web programming. XHTML was so much easier to parse and
render from code because I could use some very powerful XML libraries. HTML is
different enough from pure XML that it requires a different parser.

It was very nice to load an XML document and look for tags in a specific
namespace instead of using a specialized HTML templating engine.
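
As a sketch of why this was pleasant: an XHTML page is just XML in the
`http://www.w3.org/1999/xhtml` namespace, so any namespace-aware tool can
query it generically. E.g., a small XSLT stylesheet that lists every link
target (names here are illustrative):

    <xsl:stylesheet version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:xh="http://www.w3.org/1999/xhtml">
      <xsl:output method="text"/>
      <!-- Print the href of every XHTML anchor, one per line. -->
      <xsl:template match="xh:a[@href]">
        <xsl:value-of select="@href"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:template>
      <!-- Suppress all other text content. -->
      <xsl:template match="text()"/>
    </xsl:stylesheet>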

------
indentit
One of the things I hate most about HTML 5 is that documents don't need to be
well-formed XML, and are even encouraged not to be - `<hr>` instead of `<hr/>`
etc. - thus excluding XML processing tools from being able to work with HTML
documents. Then one needs to support tag soup / follow the HTML5 parsing
guidelines to the letter, when the whole mess could have easily been avoided.
This affects text editor plugins etc., where one might want to utilise a
single plugin/codebase and use XPath to traverse both XML and HTML documents
easily.

~~~
tannhaeuser
I don't quite understand the XML fetishism. HTML is originally based on SGML,
and SGML is every bit as structured as XML by definition, since XML is
specified as a proper subset of SGML. From the XML spec:

> _The Extensible Markup Language (XML) is a subset of SGML that is completely
> described in this document. Its goal is to enable generic SGML to be served,
> received, and processed on the Web in the way that is now possible with
> HTML. XML has been designed for ease of implementation and for
> interoperability with both SGML and HTML._

The "generic" part refers to XML being canonical, fully-tagged markup not
requiring vocabulary-specific markup declarations for tag omission/inference,
empty elements and enumerated attributes like is necessary for HTML and other
SGML vocabularies making use of these features.
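
For the curious, this is what such a declaration looks like in an SGML DTD.
The `- O` pair is the omission indicator: start tag required, end tag
omissible, which is what lets a parser infer `</p>`. (A simplified sketch,
not the actual HTML DTD content models:)

    <!ELEMENT p  - O (#PCDATA)>  <!-- end tag may be omitted -->
    <!ELEMENT br - O EMPTY>      <!-- empty element, no end tag at all -->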

That XML has failed on the web doesn't mean one has to give up structured
documents. In fact, HTML can be converted easily into XHTML using SGML [1]. If
anything, markup geeks should embrace SGML (an ISO standard no less) to
discover the power of a true text authoring format. For example, SGML supports
Wiki syntaxes (short references) such as markdown.

[1]: [http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html](http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html)

~~~
bhaak
SGML is much more complicated than XML.

Look at this: `<p<a href="/">first part of the text</> second part`. This is a
valid document fragment in HTML 4.01, because HTML 4.01 is defined as an SGML
application.
Writing a correct XML parser is much easier than writing a correct SGML
parser, and what's more important, it's much easier to recognize errors.

I agree with OP that HTML5 should have been XML from the start. Nowadays, you
hardly write any HTML by hand and even if you do, it's easy to write
syntactically correct XML.

It's true that you can convert any HTML into XML with ease but it's still a
stupid, unnecessary step.

~~~
tannhaeuser
> _HTML5 should have been XML from the start_

The point is that this hasn't happened, not back in XML's heyday, and much
less today. Now you can bemoan XML's demise until the end of time, or you can
fall back to XML's big sister SGML. As I said, SGML has lots of features over
XML that are in fact desirable for an authoring format, such as Wiki syntaxes,
type-safe/injection-free templating, stylesheets, etc., on top of being able
to parse HTML. Many of these features are being reinvented in modern
file-based CMSs and static site generators, so there's definitely a use case
for this. Whereas editing XML (a delivery rather than authoring format) by
hand is quite cumbersome, verbose and redundant, and still doesn't help at all
with how text content is actually created on the web.

~~~
bhaak
Is SGML even still used? The only use case I remember besides HTML is DocBook,
and that of course has also had an XML variant for a long time.

SGML is needlessly complex as an authoring format. Even HTML was considered
too complex, and that's why we got lightweight markup languages like Markdown
and AsciiDoc.

I would be very surprised if we ever turn back to something like SGML,
especially as there are well-designed LMLs such as AsciiDoc or
reStructuredText.

~~~
tannhaeuser
To give you an idea of what SGML is capable of, see my tutorial at [1]. It
implements a lightweight content app where markdown syntax is parsed and
transformed into HTML via SGML short references, then gets HTML5 sectioning
elements inferred (e.g. the HTML5 outlining algorithm is implemented in SGML),
then gets rendered as a page with a table-of-contents nav list linking to the
full body text, and with HTML boilerplate added, all without procedural code.

[1]: [http://sgmljs.net/docs/producing-html-tutorial/producing-html-tutorial.html](http://sgmljs.net/docs/producing-html-tutorial/producing-html-tutorial.html)

------
ealexhudson
One benefit of XHTML not mentioned is that, as an XML spec, it can be embedded
into other markups. OpenDocument / OASIS ODF for example, uses it extensively
- the written documents are basically just HTML, which for many applications
is much more accessible than the OpenXML equivalent.

A lot of people used to emit XHTML with invalid string builders rather than
XML serialisers though. Much XHTML was ruinously broken.

------
pixelmonkey
> The pursuit of the semantic web has changed in the era of HTML5, which
> represented a rejection of XHTML — to me, a seemingly bizarre rejection of
> having to write well-formed XML as somehow being unreasonably burdensome.

This is a classic "worse is better" situation. HTML5 may be "worse" than
XHTML, from the standpoint of extensibility, namespacing, code cleanliness,
and so on. But HTML5 is simpler to write for people who knew HTML4, and easier
to get right using the one ubiquitous web development practice: staring at the
rendered result in your browser, which every web developer has installed. So
it's "better", and ends up winning.

------
zelly
The horrible "be liberal in what you accept, be conservative in what you do"
meme is the cause of this. At some point it was decided that it would be too
rude to emit a compiler error on someone's junk HTML, so browsers just added
hacks to make it work anyway. It has to be backwards compatible too; don't
dare break behavior of an old tag that someone out there depends on. There's
not much point in switching to XHTML2 with sane versioning, namespacing and
extensibility if it's also going to be hacked around because no one wants
their input rejected. It's hacks all the way down, by design, forever.

------
why-oh-why
This was not avoidable.

Had the XHTML2 standard been adopted by many, browsers would have _still_ had
to support all the other non-X HTML documents, which would have never
disappeared.

HTML5, with the exception of the new elements, just formalized the existing
web parsing strategies for better cross compatibility.

As a web user, I don’t miss the days of XHTML sites randomly _completely_
breaking because some tag wasn’t closed.

As a developer, I only miss XHTML2’s support for `href` on any element.

~~~
ChrisSD
> As a web user, I don’t miss the days of XHTML sites randomly completely
> breaking because some tag wasn’t closed.

Ironically that was a very rare event in practice. For it to even possibly
happen, three things had to come together which were still uncommon even at
XHTML's height:

1. The page had to be written as XHTML

2. The page had to be served as XHTML

3. The page had to be parsed as XHTML

Usually at least two of those things weren't happening.

~~~
Vinnl
The second didn't happen precisely because people wanted to avoid the sites
completely breaking. So, in practice, it didn't happen because people avoided
XML.

~~~
ChrisSD
That was my point. The "days of XHTML" barely existed outside the minds of a
relatively small number of developers.

~~~
zmix
It doesn't make much sense to state things I can't back up with references (I
do not remember where I read it), but a few months ago I read that by 2006 (or
2008) 60% of all web pages were XHTML. Now, whether that means they were
served with the right media type, I do not remember; but what I do know is
that "nerdy" places, and that includes Steam, HumbleBundle.com and GOG.com,
were all running on XHTML. So it was not just "the minds of a relatively small
number of developers".

~~~
ChrisSD
I'm not sure where your information comes from because it sounds very suspect
to me. Sure XHTML was a thing in nerdy circles, I know because I was one of
them, but we were hardly the majority.

The vast majority of sites were "HTML 4.01"; others were at best "XHTML 1.0
Transitional" (which in practice meant the same thing). Those using pure XHTML
were relatively few. And of those who did, no major site served it as such
because it would have locked out IE users, IIRC.

------
thayne
I think both applications and media would benefit from a split of HTML into
separate languages for defining app-like sites and traditional content pages.
HTML and the DOM APIs are currently a strange mix of content-oriented semantic
elements and app-oriented (often non-semantic) elements. Not to mention that
the default flow layout does more harm than good for most applications.

~~~
zmix
> I think both applications and media would benefit from a split of HTML
> into separate languages for defining app-like sites and traditional
> content pages.

This! I would go a step further, even. The "web-applifier" community should
just leave the classic web and do their own:

* protocol (I am sure HTTP is not ideal for serving apps)

* GUI description language (a document markup language for UI design, really?)

* runtime (let them have WebAssembly and whatever they need)

* each app could then have its own window, making it look like a traditional app

~~~
thayne
So... like Java Swing or Adobe AIR? I don't know all the reasons those
technologies failed to deliver, but I do think that being able to deploy from
the same platform that users use to discover the app (that is, the browser),
as well as almost effortless instant updates, has a lot to be said for it.

------
nl
For those who agree with this:

The reason HTML5 became what it is is that many people wanted to see the open
web thrive as a competitor to closed, controlled ecosystems like the mobile
application development platforms.

Just 10 years ago this was a mainstream view - there were groups who were
fighting to give browsers web cam access so they could be used as
video-messaging platforms, groups fighting to give location access so we could
write location-aware documents and apps, etc.

I still believe this was the right decision.

~~~
klez
If you view this in the context of needing separate models for documents and
applications, it means that you wouldn't need all that substrate of APIs for
documents (i.e., why would you need a webcam API or even AJAX to render a blog
article?).

So in that context I do agree with the article.

~~~
nl
 _i.e., why would you need a webcam API or even AJAX to render a blog article?_

This seems like a lack of imagination. The modern scientific publication
Distill.pub makes heavy use of AJAX (e.g.
[https://distill.pub/2019/activation-atlas/](https://distill.pub/2019/activation-atlas/)),
and it's easy to imagine it using a webcam (e.g., to demonstrate semantic
segmentation).

------
sgeisler
I once searched for a way to send PUT requests using HTML forms, because I
despise using JS for core functionality but still wanted to build a nice REST
API. Somehow I stumbled across XForms, which was part of XHTML2 and supports
PUT and DELETE, only to find out that no browser supports it.

Now I hope that the form extensions proposal [1] will gain some traction.

[1] [http://cameronjones.github.io/form-http-extensions/index.html](http://cameronjones.github.io/form-http-extensions/index.html)

~~~
hlandau
The most common "solution" for this is to add middleware on the server side
which accepts a form parameter "_method" that supersedes the real HTTP method.
I believe Rails uses this.
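
A minimal sketch of the convention (`_method` is the field name Rails reads;
other frameworks use their own spellings):

    <form action="/articles/1" method="post">
      <!-- The middleware reads this field and routes the request as a PUT. -->
      <input type="hidden" name="_method" value="put"/>
      <input type="submit" value="Save"/>
    </form>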

~~~
sgeisler
I'm aware of this workaround, but it's still a sad state of affairs,
especially since POSTs have to be handled with more care because they aren't
idempotent. So the browser will ask the user whether they really want to
resend the query if something goes wrong the first time.

------
darepublic
I got on board the js train as a young developer just at the right time, when
the term AJAX was something of a buzzword. Now my fellow developers give me
dirty looks when I suggest our largely static site can do without redux, or
that someone would conceive of building a simple accordion component in
anything less than React.

~~~
acemarke
fwiw, I _maintain_ Redux, and I completely agree that it would be unnecessary
for that kind of site :)

------
scandox
> The number of web browsers capable of consuming plain (X)HTML massively
> exceeds the number of web browser engines capable of consuming the modern
> application platform, a number which stands now at approximately two.

~~~
saagarjha
WebKit, Gecko, Blink?

------
roca
This piece gets many things wrong --- or at least fails to present both sides
of the argument.

WHATWG did not "usurp" the W3C. The W3C abandoned HTML development by focusing
on XHTML2, incompatible with HTML. This left an opening for someone to propose
backwards-compatible extensions of HTML, since the W3C was explicitly not
interested in that. WHATWG was formed to do this and produced HTML5. HTML5 was
adopted by industry, XHTML2 was not. In an attempt to stay relevant, the W3C
tried to stage a hostile takeover of HTML5. That attempt failed because the
W3C had blown their credibility by that point. However there were still some
advantages to having a W3C-approved HTML spec, so an agreement was reached
where the W3C could approve the specs produced by WHATWG.

There are technical reasons why HTML <object> was not suitable for audio and
video elements. For example, media elements need to expose media-specific JS
APIs (e.g. seek()), but the MIME type of an <object> can change over time due
to URL loading and DOM attribute changes, which would mean that the interface
exposed by the element would need to change unpredictably over time, which
would be a nightmare for developers. Also, there were very nasty legacy
browser compatibility constraints around (mis)use of <object>.

The author misunderstands, or misrepresents, the WHATWG's spec design
philosophy. Unlike the W3C, the WHATWG treated compatibility with existing Web
content as essential. That means the WHATWG specifies existing browser
behaviour where there is significant existing Web content that requires it.
The W3C, on the other hand, tended to assume that Web developers pay attention
to specs and that writing down conformance requirements would magically cause
all Web content to be updated to satisfy them. Those assumptions are not true.
(The idea that Web developers would migrate to XHTML2 because the W3C
proclaimed it as the future was in the same vein.)

XML syntax for HTML failed for various reasons but not because of the WHATWG
or browsers, which always supported XML syntax for HTML. One major problem is
ensuring that dynamically generated XML pages are always valid XML. It is very
easy to have bugs so that under some conditions (e.g. malicious user input)
the server outputs invalid XML and produces a "yellow screen of death". Common
examples were bugs that allowed the Unicode 0xFFFE or 0xFFFF code points,
which are not allowed in valid XML, to slip into the output. A
similar problem is when users interrupt a partial download of an XHTML file;
the file has unclosed tags, so a conforming browser will replace the partially
loaded and rendered document with a "yellow screen of death". This is not what
users or developers actually want. (This assumes the browser bends the rules
to allow partial rendering of incompletely loaded and validated XHTML
documents, which is something users and developers do actually want.)
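
(The classic escaping bug, for concreteness: a template that interpolates
user input so that the output contains

    <p>Fish & chips</p>

is fine under HTML's forgiving parser, but the bare `&` is a fatal
well-formedness error in XML, where it must be written `&amp;`. One stray
byte of user data and the whole page becomes a parser error.)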

~~~
jillesvangurp
Exactly. Browser manufacturers had a need for specifications that were of a
higher quality than what the W3C was producing, in order to address (mostly)
accidental implementation differences between different browsers. What was
specified across the existing specifications for HTML, CSS, and JavaScript
just did not cover what was being shipped by browsers. And as these
specifications were stuck in committees, browser developers had to just figure
things out themselves.

HTML 5 started out as a W3C position paper, backed by several browser
developers, to start the work of creating a backwards-compatible successor to
HTML 4. The proposal was rejected in favor of continuing work on the
non-backwards-compatible XHTML 2. As these browser developers had a need for a
spec, the WHATWG was formed to create what became HTML 5.

Several years later when it became clear that their specification was the
closest thing to describing what browsers actually do, W3C basically endorsed
it as a recommendation. However, the WHATWG continues to drive work as it has
been highly successful in producing the high quality specifications required
to achieve the high levels of interoperability between the remaining browser
engines.

XHTML 2 gradually became redundant as most of the functionality that web site
developers actually needed from browsers got absorbed by HTML 5. As browsers
could implement spec changes as they were happening, within a few years of the
working group forming, it had become a huge success in standardizing many new
features.

At the same time, many mobile browsers simply disappeared as WebKit (a fork of
KHTML by Apple) and the Chrome fork of WebKit by Google became the norm on
mobile. This was important because many W3C working groups were dominated by
people from the mobile phone and telecom industry seeking to control the
standard for long-forgotten things like WAP and the various mobile profiles of
XHTML. Once decent browsers appeared on mobile (i.e. browsers that supported
HTML 5), the need to continue work on XHTML 2 disappeared. Most of the
companies that then dominated the mobile web were outcompeted by Google and
Apple, neither of whom was a major mobile player at the time the WHATWG was
formed. By the time the W3C endorsed HTML 5, Android and iOS were dominating
the mobile web to the point that even MS threw in the towel by first creating
Edge (an HTML 5 browser with no backwards compatibility for IE-specific
stuff), and then recently just switching to Chrome entirely.

------
em-bee
i have been creating websites since the early 90s, and every single one of
them was an application. html was always generated dynamically, and never
static. any static html documents were embedded with a dynamically generated
navigation.

the whole idea of a web of semantic hypertext was never a reality. that idea
died with gopher. (anyone remember that?)

why? because gopher had a builtin navigation system that allowed you to manage
directories and document structures that didn't belong to the documents
themselves. semantic hypertext within gopher would have worked well. as would
have applications. but without gopher we were forced to reinvent that
navigation and squeeze it into our documents, overloading them with stuff that
didn't belong inside.

think of a library with books. what the semantic hypertext promised was to
make all those books into interactive texts where you can easily jump from one
reference to another. but those books still need a library to live in.

what the web ended up doing was to remove the library completely, forcing me
to reinvent the library within the book. suddenly the semantic document that i
want to send you not only contains references relevant to its context, but it
has to include the whole navigation for my library, because there is no way to
do that externally. with that navigation included, you are no longer getting a
semantic hypertext, but an application.

now, we get to write that application in javascript and actually run it on
your device, instead of faking it on the server. but on the flip side, on the
server i can now finally go back to serving static documents. i can finally
serve semantic hypertext documents as they were meant to be served because i
can separate the application from the content, and i can treat the content as
static as it was meant to be treated.

i am not reinventing navigation logic in javascript. browsers never had
navigation logic in the first place. gopher had that. i always had to re-
invent navigation logic for every site i built and was forced to embed that
into html dynamically so that site visitors could find their way.

------
jgalt212
If semantic markup had won out, there'd be less need for sophisticated search
engines to make sense of all _this stuff_.

In fact, I'd go one further: If gopher had taken up multimedia quicker and
then beat out the web, there would be no Google today.

These sort of things are not inherent to the web. People and entities want to
transmit data in the format(s) that are easiest for them. It's up to the
aggregators to make sense of it all.

~~~
makecheck
A perfectly-marked-up HTML document only solves the problem for text expressed
in HTML. It doesn’t solve any other format that people publish (even today we
have links to PDF and Word and Excel, etc.). And of course there is meaningful
information in formats that are not documents at all, like images. Some of
those images are pictures of text that we might want to read.

It requires some investment to come up with tools for analyzing files. I’m
wondering if the tools would have been as sophisticated if documents had
lowered their barriers. For example, how do you justify developing a machine-
learning model to look for more, if it _seems_ all your documents are already
semantically tagged with the details that were important to somebody?

------
gfodor
Oddly enough I just wrote an article about this as it relates to predicting
the arrival of the 3D ‘metaverse’:

[https://github.com/mozilla/hubs-cloud/wiki/The-Web-Emergent-Metaverse](https://github.com/mozilla/hubs-cloud/wiki/The-Web-Emergent-Metaverse)

------
commandlinefan
> If it’s an article I simply leave

And many times that choice is made for you: there’s a link to a Vulture.com
post on HN right now (about Disney archiving the Fox catalogue) and every time
I try to read it, Chrome crashes trying to keep up with the volumes of
JavaScript they add, ostensibly to keep advertisers happy.

~~~
acdha
Are you running Chrome on an old phone, or perhaps with some extension which
injects a lot of code? Looking at the page, it's certainly showing why ad
blockers are popular but it doesn't crash or use unusual amounts of memory on
either desktop or mobile Chrome:

[https://webpagetest.org/result/191027_BF_57a6cd57fc6fe629628...](https://webpagetest.org/result/191027_BF_57a6cd57fc6fe629628f8551448417fb/)

[https://webpagetest.org/result/191027_2G_fba74afe0c488b99e53...](https://webpagetest.org/result/191027_2G_fba74afe0c488b99e53f92d73883c9fd/)

------
thosakwe
I'm a bit conflicted. I like the idea of Web browsers as an application
platform, though it borders on "reinventing the operating system." The File
API, Notifications API, MediaSession, etc. all make it possible to essentially
have cross-platform desktop applications that users don't need to manually
install.

I also don't, and never really cared much for "semantic hypertext," linked
data, XHTML, XHTML2, RDF, nor is my browser usage predominantly about just
sharing/receiving information.

That being said, there's so much involved in writing a Web browser that
writing one from scratch would take years. There are only a few active
implementations, and I don't expect that to change for a long time, if ever
(at least not without a lot of funding). So I can understand the author's
viewpoint.

~~~
james-skemp
> nor is my browser usage predominantly about just sharing/receiving
> information.

What do you use the browser for then? Do you primarily play
games/music/movies?

You mention applications in your comment; if that's not sharing/receiving
information, what are you doing in them?

~~~
thosakwe
Yes, primarily music, media, social media - mostly interactive experiences.

I should have been a bit more clear about "sharing/receiving information," or
rather, reworded it as "viewing static documents."

------
diskmuncher
I guess no one would object if you want to write an SPA using HTML5 and ES2020
that retrieves an XML semantic Web document and displays it in all kinds of
CSS glory.

------
dustingetz
semantic web can only happen if paired with a new business model that is not
advertising and e-commerce. Also, to get alignment from coders they need to
fall into the pit of success. So it needs to be a new thing, separate and
distinct from HTML.

