

Semantics in HTML 5 - brandonkm
http://alistapart.com/articles/semanticsinhtml5

======
cgranade
_Edit: Boy, do I feel stupid. What I proposed exists, and is called RDFa._
(<http://www.w3.org/TR/rdfa-syntax/>)

It occurs to me that the W3C already has an extensible semantics framework:
RDF. This framework can be represented in many different languages (primarily
N3 and XML), and is quite well-understood amongst semantic web people. The
trouble is how to associate RDF 3-tuples with (X)HTML elements.

To take the date example from the original article, <span id="date1">May Day
next year</span> could be represented in a machine-readable format by the
following 3-tuple. (I am not an expert in N3, so please pardon any mistakes.
Also, if there's a good existing vocabulary for "equivalent to", please let me
know.)

<#date1> <http://example.com/#equivalent> date"2009-05-01"

In turn, this 3-tuple could be encoded in <meta> and <link> elements a la
Dublin Core, or could be stored in an external RDF/XML or N3 document
referenced by a <link> element. Perhaps a new element could even be introduced
(just one) that allows the direct embedding of RDF content in a manner similar
to the embedding of CSS content via the <style> element.

This solution would have the benefits of being extensible, flexible, robust
and well-understood. The biggest drawback that I can see is that developers
would have to learn RDF along with some common ontologies.
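
As a rough illustration of the RDFa route mentioned in the edit above, here is a minimal sketch that pulls a 3-tuple out of RDFa-style attributes using only Python's standard library. The attribute names (`about`, `property`, `content`) come from RDFa, but the `ex:equivalent` property and the `TripleExtractor` class are hypothetical, and this is nowhere near a conforming RDFa processor.

```python
from html.parser import HTMLParser


class TripleExtractor(HTMLParser):
    """Collect (subject, predicate, object) tuples from RDFa-style attributes.

    Hypothetical helper: it only handles elements that carry all three of
    `about`, `property`, and `content`, and it ignores prefixes, datatypes,
    and nesting.
    """

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if all(key in a for key in ("about", "property", "content")):
            self.triples.append((a["about"], a["property"], a["content"]))


# Assumed markup: "ex:equivalent" stands in for the example.com predicate.
html = (
    '<p>The party is on <span id="date1" about="#date1" '
    'property="ex:equivalent" content="2009-05-01">May Day next year</span>.</p>'
)

extractor = TripleExtractor()
extractor.feed(html)
print(extractor.triples)
```

The human-readable text stays in the element body, while the machine-readable value rides along in the attributes, so no separate document is needed for this case.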

------
s_tec
Since HTML will be around for a long time, the author says, we need a
mechanism for adding new semantics to the language over time. Well, the W3C
already has a nice mechanism for doing that called "adding new elements". This
system works just fine, and there is no reason to define a monstrosity like
<div structure="header"> when defining a new <header> element would accomplish
the same goal. The only valid complaint against adding new elements is that IE
doesn't apply CSS to elements it doesn't recognize. Fortunately, a simple
workaround exists: <http://xopus.com/devblog/2008/style-unknown-elements.html>

~~~
jerf
I think that the situation would end up better in the end if, rather than the
W3C trying to define new element after new element after new element,
especially if we're going to take the viewpoint that the language will be
around after 100 years, we instead simply said:

* Use whatever elements you want, subject to some name limitations.

* Style those elements using stylesheets however you want. (CSS 3.0 finally, AFAIK, completes the set of CSS elements you need to finish emulating/implementing all HTML elements that exist up to this point, including table.)

* Let an entirely separate layer argue about semantics. Use the <link> tag to link in those semantics or something. (The ideal would be XML namespaces but this is HTML, not XHTML.) Define a semantic set that matches today's semantics, let the future define what it needs. Let this be the default if no semantic is chosen. Let a thousand flowers bloom, let the winners win.

Ultimately, the whole W3C process is flawed at its very core. It's some small,
essentially self-selected group of people/organizations trying to decide how
the Web Will Be for the Next Ten Years. That's just stupid, and inimical to
the way of the web itself.

But this is just too darned _simple_ an idea to get "standardized"...

More seriously, one way I know I differ from the HTML and Semantic Web folk is
that I fundamentally disagree about the nature of semantics. Semantics are not
something a document _possesses_; they are something that you _apply_ to a
document. Marking up text is a way to make it _easier_ to apply semantics, but
a sufficiently sophisticated (AI-complete) algorithm could apply semantics to
flat text just fine. Indicating what semantics you are using in the document
somehow (like an XML namespace) makes it easier for the producer and consumer
to agree about their semantics, but the consumer may still ignore it and may
still extract or apply their own semantics to the document without
contradiction. Think about it this way and my proposal is simply the only
natural way to proceed. Believe that "Today is <date>January 20, 2008</date>"
is actually any different than "Today is <menu>January 20, 2008</menu>" in
some "real" way and this W3C exercise of trying to define _THE_ tag set makes
sense. But it's all just tags. Just numbers applied to other numbers. Meaning
is brought by the reader in collaboration with the writer; there is no
intrinsic meaning.

~~~
seldo
A separate layer for semantics is a really excellent idea; something like
"cascading semantics sheets"* -- instead of cluttering your markup with tons
of extra semantic information, frequently duplicated, just use existing CLASS
and ID and attributes, plus CSS selectors, to specify elements and define
roles for them within the linked document. Thus if the browser understands
this extra semantic information it can use it efficiently, and if it doesn't
then you've not made your markup crazily crowded for no reason.

* except those would also be CSS, so maybe "cascading semantics documents", or CSD
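
One way to picture such a sheet, as a minimal sketch only: keep the selector-to-role mapping outside the markup and let the consumer apply it. Everything here is hypothetical (the `SEMANTICS` mapping, the role names, the `RoleAnnotator` class), and only bare tag, `.class`, and `#id` selectors are handled, with no real cascade.

```python
from html.parser import HTMLParser

# Hypothetical "semantics sheet": simple selectors mapped to roles.
SEMANTICS = {
    "#masthead": "header",
    ".when": "date",
    "h1": "title",
}


class RoleAnnotator(HTMLParser):
    """Record which roles the sheet assigns to each start tag."""

    def __init__(self, sheet):
        super().__init__()
        self.sheet = sheet
        self.roles = []  # (tag, matching selector, role) in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Build the simple selectors this element could match.
        candidates = [tag]
        if a.get("id"):
            candidates.append("#" + a["id"])
        candidates += ["." + c for c in (a.get("class") or "").split()]
        for selector in candidates:
            role = self.sheet.get(selector)
            if role is not None:
                self.roles.append((tag, selector, role))


markup = (
    '<div id="masthead"><h1>News</h1></div>'
    '<p>Posted <span class="when">May Day next year</span></p>'
)

annotator = RoleAnnotator(SEMANTICS)
annotator.feed(markup)
print(annotator.roles)
```

The markup itself carries nothing beyond ordinary ID and CLASS attributes; a consumer that does not know about the sheet just sees plain HTML.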

------
eli_s
Allowing developers to create their own semantics as they see fit does not
create a more semantic web.

The whole point of having semantically meaningful documents is so that
_machines_ can better understand them. Having a system where each developer
creates their own "meaning hooks" is worse than useless - it's a waste of time
and bandwidth - and leaves us in the same position we are in today where a
machine ignores markup and tries to infer meaning from context.

Standards and convention are important.

The only way to create a truly searchable and meaningful web is for us all to
use an agreed upon set of tags that add meaning.

~~~
brandonkm
> The only way to create a truly searchable and meaningful web is for us all
> to use an agreed upon set of tags that add meaning.

completely agree.

