
HTML Tags (1991) - mmoez
https://www.webdesignmuseum.org/web-design-history/tim-berners-lee-published-a-document-called-html-tags-1991
======
tannhaeuser
Linked original doc is at [1].

Interesting tidbits:

> _<H1>, <H2>, <H3>, <H4>, <H5>, <H6>_
>
> _These tags are kept as defined in
> the CERN SGML guide. Their definition is completely historical, deriving
> from the AAP tag set._

This probably refers to [2].

Oh, and

> _...(not good SGML)...:_ <NEXTID 27>

This is in fact not SGML at all (NoSGML?): SGML attribute minimization lets
you leave out the attribute name only when it can be uniquely identified from
an enumerated token value, not from an arbitrary number. E.g. the following is
valid:

    
    
        <!ELEMENT e - - ANY>
        <!ATTLIST e myatt (true|false) #IMPLIED>
        ...
        <e true>
    

and is the short form of

    
    
        <e myatt="true">
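
The lookup rule described above can be sketched in a few lines of Python. This is a toy model rather than an SGML parser: the `ATTLIST` table, the `expand_minimized` helper, and the `n` attribute assumed for `NEXTID` are all hypothetical names made up for illustration.

```python
# Hypothetical, simplified ATTLIST model: attribute name -> set of enumerated
# tokens, or None when the declared value is not an enumeration.
ATTLIST = {
    "e": {"myatt": {"true", "false"}},
    "nextid": {"n": None},  # assumed declaration with a non-enumerated value
}


def expand_minimized(element: str, token: str) -> str:
    """Return the attribute name that a bare token stands for.

    Raises ValueError unless the token is an enumerated value of exactly
    one attribute declared for the element.
    """
    declared = ATTLIST.get(element.lower(), {})
    matches = [
        name
        for name, tokens in declared.items()
        if tokens is not None and token.lower() in tokens
    ]
    if len(matches) != 1:
        raise ValueError(
            f"cannot infer attribute name for {token!r} on <{element}>"
        )
    return matches[0]
```

Under these assumptions, `expand_minimized("e", "true")` resolves to `myatt`, while `expand_minimized("nextid", "27")` raises, because `27` is not a declared enumerated token of any attribute.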
    

[1]:
[https://www.w3.org/History/19921103-hypertext/hypertext/WWW/...](https://www.w3.org/History/19921103-hypertext/hypertext/WWW/MarkUp/Tags.html)

[2]:
[https://en.wikipedia.org/wiki/SGMLguid](https://en.wikipedia.org/wiki/SGMLguid)

~~~
nabla9
The HTML DTD was written later, and bugs were fixed in the first standard
proposal in 1993.

Real-world HTML in the wild frequently failed to validate, so the fact that
HTML was an SGML application became irrelevant almost instantly. Nobody parsed
HTML using SGML parsers with a DTD.

~~~
tannhaeuser
The official W3C validator site [1] begs to differ. And I am, in fact, parsing
lots of HTML5 using SGML (see eg. [2], prepared for an ACM DocEng 2019
workshop with a focus on preserving and acquiring HTML5 corpora into document
engineering and ML approaches for search, text extraction/summarization, etc.)

[1]: [https://validator.w3.org/](https://validator.w3.org/)

[2]: [http://sgmljs.net/docs/sgml-html-tutorial.html](http://sgmljs.net/docs/sgml-html-tutorial.html)

~~~
nabla9
The official W3C validator site seems to agree with me. Did you misunderstand
what I said, or can you show that I'm wrong? Just feed it any widely used
webpage, for example
[https://news.ycombinator.com/](https://news.ycombinator.com/), and it will
not pass.

Just to be clear, the fact that there are still uses for SGML does not make it
relevant in the big picture. Your use case seems to be the exception.

~~~
tannhaeuser
Don't know what you did exactly, but the official W3C validator site uses 20
year old DTDs for DTD-based validation, but then HN's markup uses
presentational elements/attributes from the HTML4 transitional/loose era
intended to ease migration to CSS back then. The errors show exactly what's
wrong with HN's markup eg. missing "alt" attribute on images where required,
use of long-obsolete elements, missing DOCTYPE, etc. so I guess it's working
as expected in suggesting improvements to your site's markup, doesn't it?
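
To make two of those complaints concrete, here is a minimal sketch using Python's standard `html.parser`. It only looks for a missing DOCTYPE and for `img` elements without `alt`; it is not DTD-based validation and not what the W3C validator does internally, and `MarkupChecker` and `check` are names made up for this example.

```python
from html.parser import HTMLParser


class MarkupChecker(HTMLParser):
    """Collect two of the problems mentioned above; not a real validator."""

    def __init__(self):
        super().__init__()
        self.has_doctype = False
        self.images_without_alt = []

    def handle_decl(self, decl):
        # Called for <!DOCTYPE ...> and other markup declarations.
        if decl.lower().startswith("doctype"):
            self.has_doctype = True

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased.
        attributes = dict(attrs)
        if tag == "img" and "alt" not in attributes:
            self.images_without_alt.append(attributes.get("src"))


def check(html: str) -> list:
    """Return human-readable problem descriptions for the given markup."""
    parser = MarkupChecker()
    parser.feed(html)
    parser.close()
    problems = []
    if not parser.has_doctype:
        problems.append("missing DOCTYPE")
    for src in parser.images_without_alt:
        problems.append(f"img without alt: {src}")
    return problems
```

Calling `check()` on a saved copy of a page returns a list of strings, one per problem found, and an empty list when neither problem occurs.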

FYI: if you want to parse modern HTML 5 using SGML (with my HTML5 "mini"-DTD),
see [1]. For example to check the HN homepage, download it using curl, then
add a DOCTYPE to it ('<!DOCTYPE html SYSTEM "about:legacy-compat">'), then
invoke "sgmlproc" on it, and it'll just work and parse without errors (see
downloads and instructions on linked page).
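
The "add a DOCTYPE" step could be scripted roughly as follows. This is only a sketch of that one step: `with_legacy_doctype` is a hypothetical helper, and the download and the `sgmlproc` invocation are left out on purpose, since the linked page is the authority on how to run it.

```python
import re

# The DOCTYPE string quoted in the comment above.
LEGACY_DOCTYPE = '<!DOCTYPE html SYSTEM "about:legacy-compat">'


def with_legacy_doctype(html: str) -> str:
    """Prepend the legacy-compat DOCTYPE unless the page already declares one."""
    if re.match(r"\s*<!doctype\b", html, re.IGNORECASE):
        return html
    return LEGACY_DOCTYPE + "\n" + html
```

Pages that already start with a DOCTYPE are returned unchanged, so the helper is safe to run on a mixed corpus.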

[1]: [http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tu...](http://sgmljs.net/docs/parsing-html-tutorial/parsing-html-tutorial.html)

~~~
nabla9
Yes, but that is not relevant to my argument. Whether the validator validates
is irrelevant. HN's markup is not wrong, because it works. You use sgmljs to
deal with the unnecessary mess that SGML/HTML/XML started.

PS: Since you seem to know this stuff, where can I find the standard DTD for
DTDs from before XML? DTD syntax was itself defined using a DTD, right?

~~~
tannhaeuser
Not sure what you're after exactly but DTDs were introduced with SGML (ISO
8879:1986 [0]) and then used in simplified form with XML (which is specified
as a simplified profile of SGML [1]).

The (historic) SGML-DTDs for HTML, including those used by W3C's validator and
early IETF DTDs for HTML 2.0, can be found at W3C's site eg [2], [3].

[0]:
[https://www.iso.org/standard/16387.html](https://www.iso.org/standard/16387.html)

[1]: [https://www.w3.org/TR/REC-xml/](https://www.w3.org/TR/REC-xml/)

[2]:
[https://www.w3.org/TR/html4/sgml/dtd.html](https://www.w3.org/TR/html4/sgml/dtd.html)

[3]: [https://www.w3.org/TR/2018/SPSD-html32-20180315/](https://www.w3.org/TR/2018/SPSD-html32-20180315/)

~~~
nabla9
My question is this: is there a standard SGML DTD for DTDs? I have no access
to ISO 8879:1986, so I can't check it.

~~~
tannhaeuser
Not really. SGML (and XML) are "meta-markup languages", meaning you declare
your vocabulary yourself or use a ready-made one. There is in fact a simple
general-purpose vocabulary declared in an ISO 8879:1986 appendix
consisting of generic paragraph and heading elements, but it's not widely used
in that form.

~~~
nabla9
This gets close to my point.

Even people working with the standard don't want or don't need to use SGML.
Similarly for CSS.

------
nabla9
> The design of the first version of HTML language was influenced by the SGML
> universal markup language.

HTML was designed as an application of SGML, just like JSON-RPC is an
application of JSON. HTML has a DTD (SGML Document Type Definition). HTML was
technically an SGML application until HTML5.

SGML comes from Latin and means "complex solution to simple problem."

~~~
tannhaeuser
SGML isn't _that_ complex. If you know the XML subset of SGML, there are only
a few additional concepts to learn (mostly markup minimization which is
designed to greatly simplify the directly authored form of a text document
such that you can write markdown-like syntax, with the canonical/internal form
being _exactly_ the same as an XML parser would see it). I'll give you that
the official ISO standard spec sucks to the point of being incomprehensible;
but then most markup-related specs, including the HTML 5 spec, do. This is
what Eliot Kimber (or was it another HyTime editor?) has to say about it (on
an admittedly not so well-known topic even by markup standards):

> _Why can't people understand the SGML Extended Facilities as written and as
> standardized by the ISO?_

> _ISO standards are very hard to understand because they describe very
> technical things in an abstruse techno-legal vocabulary and
> reduced-redundancy style. In short, despite having great things to say, even
> the deathless prose of the HyTime standard tends to be unreadable and, quite
> frankly, to suck as informative literature. (I'm a co-editor of it; may God
> have mercy on us.)_

~~~
nabla9
> SGML isn't that complex.

It's unnecessarily complex. The complexity and the features it has serve no
purpose once you step back and look at the big picture. People don't want to
use it. It's easier to write your own data format than to learn and use SGML.
XML was a move away from it. HTML5 was a move away from it. Heck, just
MicroXML is enough:
[https://blog.jclark.com/2010/12/more-on-microxml.html](https://blog.jclark.com/2010/12/more-on-microxml.html)

Starting from scratch, something like s-expressions or JSON would have been a
better starting point than SGML/XML/HTML.

------
TheVikingOwain
A few months ago I had some high school kids job shadowing me to see if they
wanted to get into development. I was showing them the web dev part of my job
and was asked if I went to college for that. After thinking about it, I
realized the img tag was suggested in my sophomore year of college and
formally accepted in my senior year. [0] So: A) no, I couldn't have. B) I'm
old.

[0] [https://thehistoryoftheweb.com/the-origin-of-the-img-tag/](https://thehistoryoftheweb.com/the-origin-of-the-img-tag/)

