HTML5 Microdata: Why isn't anyone talking about it? (briancray.com)
79 points by briancray 2283 days ago | 28 comments

Because it's boring. And no browser implements it. You're not doing anybody any favors by implementing it. That Google rich data thing? It can be done with HTML classes too, which works equally well, and we've had those for a while now.

Seriously, I like microdata, I think it's useful, but this kind of utopian, far-away 'we should use this because it might be better in the future' thinking never got us anywhere. That is the lesson we should learn from XHTML2, not 'adding more attributes is cool'.

I strongly disagree that it's boring and doesn't do anyone any favors. There are all sorts of domain-specific metadata that could serve as a platform for other services. For example, Creative Commons licenses parse RDFa in referring documents to render copy-paste attribution information (http://wiki.creativecommons.org/RDFa#CC_Deeds_using_RDFa_and...).

Standardized metadata can serve as a platform for other services, not just functionality in a browser. And the common sentiment that "we should use this because it might be better in the future" is a hurdle most standards go through at some point.

Well put. I think there are very few examples of people actually implementing a spec that didn't provide an immediate, personal benefit.

There are interesting ties to game theory here. You might see it as a variant of the Prisoner's Dilemma (http://en.wikipedia.org/wiki/Prisoners_dilemma).

On the other hand, you might argue that it's simply a coordination problem (http://en.wikipedia.org/wiki/Coordination_game).

In any case, it's somewhat shortsighted to look for immediate, personal benefits. Of course you need to look at the payoffs for implementing a spec, but I think it's wrong not to take longer-term possibilities into account.

Well, as Cray mentioned, there are a variety of places where someone might be tempted to make a JSON- or XML-based API. We can start implementing those as plain HTML5 instead, and writing Python and Ruby libraries to make it as easy to work with as JSON or XML. That's easy, it can be done today, and it will lay the groundwork for browsers to actually implement it.
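For what it's worth, a first cut at such a library doesn't need much. Here's a minimal Python sketch using only the standard library. It assumes flat (non-nested) items whose properties are either plain text or link hrefs. The names `MicrodataParser` and `extract_microdata` are made up for illustration, the sample vocabulary URL is just an example, and the real microdata algorithm covers a lot more (nested items, itemref, meta and time values, and so on).

```python
from html.parser import HTMLParser


class MicrodataParser(HTMLParser):
    """Collects itemprop values from flat (non-nested) microdata items.

    Hypothetical helper for illustration only; not the full microdata
    extraction algorithm from the HTML5 spec.
    """

    def __init__(self):
        super().__init__()
        self.items = []    # one dict per element carrying itemscope
        self._prop = None  # name of the text property being read, if any
        self._text = []    # text chunks collected for that property

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemscope" in attrs:
            self.items.append({"type": attrs.get("itemtype"), "properties": {}})
        prop = attrs.get("itemprop")
        if prop and self.items:
            if tag == "a" and "href" in attrs:
                # For links, the property value is the URL, not the link text.
                self.items[-1]["properties"][prop] = attrs["href"]
            else:
                # Assumes the element holds only text and has an end tag.
                self._prop, self._text = prop, []

    def handle_data(self, data):
        if self._prop is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if self._prop is not None:
            value = "".join(self._text).strip()
            self.items[-1]["properties"][self._prop] = value
            self._prop = None


def extract_microdata(markup):
    """Return a list of {'type': ..., 'properties': {...}} dicts."""
    parser = MicrodataParser()
    parser.feed(markup)
    parser.close()
    return parser.items


if __name__ == "__main__":
    sample = """
    <div itemscope itemtype="http://data-vocabulary.org/Person">
      <span itemprop="name">Jane Doe</span>
      <a href="http://example.com/" itemprop="url">home page</a>
    </div>
    """
    print(extract_microdata(sample))
```

The result is a plain list of dicts, so it can be passed straight to `json.dumps` if you want the same data as JSON.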

Someday, it would be nice if every single street address on the web were marked up in such a way that, just from looking at the markup, a computer could unambiguously figure out the location you're looking for. Or if there were a way for browsers to unambiguously know that something is a phone number and not a zip code. (Android does a horrible job of this. Try texting your current location from the maps app.)

I guess the semantic web remains a dream, but that's no reason to avoid adding semantic data, especially if you want computers to interact with your data directly.

FYI, phone numbers already have a URI scheme you can use in an href: RFC 3966, http://www.rfc-editor.org/rfc/rfc3966.txt

<a href="tel:+44-555-5555">my phone number</a>
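On the consuming side, a `tel:` href is easy to pick apart. A rough Python sketch, assuming you only care about the number and any trailing parameters; `parse_tel_uri` is a hypothetical helper, and it does not validate the number against the full RFC 3966 grammar:

```python
from urllib.parse import urlsplit

# Visual separators that RFC 3966 allows inside a number for readability.
VISUAL_SEPARATORS = "-.()"


def parse_tel_uri(uri):
    """Split a tel: URI into a bare number and its raw parameter string.

    Hypothetical helper for illustration; no grammar validation is done.
    """
    parts = urlsplit(uri)
    if parts.scheme != "tel":
        raise ValueError("not a tel: URI: %r" % uri)
    # Parameters such as ";ext=12" follow the number after a semicolon.
    number, _, params = parts.path.partition(";")
    bare = number.translate(str.maketrans("", "", VISUAL_SEPARATORS))
    return {
        "number": bare,
        "is_global": bare.startswith("+"),  # global numbers start with "+"
        "params": params,
    }


if __name__ == "__main__":
    print(parse_tel_uri("tel:+44-555-5555"))
```

The point is that the scheme alone tells a program it is looking at a phone number, with no guessing from the digits.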

I agree that there are too many "utopian" technologies out there with impractical implementations. However, Microdata is already seeing Google support. Browser support, I think, could come, but we have to bring fuel to the microdata conversation rather than waving it off as nonsense.

Hear hear.

Until recently, one of the biggest publishers of such microdata (in its earlier incarnation, microformats) was Upcoming.org, an events startup acquired by Yahoo.

Gordon Luk, the lead engineer, wrote an excellent post describing his frustrations with microformats:

    [...] my experience with being a microformat publisher 
    has shown that things are exponentially more complex 
    than they let on in the “sales pitch.”
-- http://getluky.net/2009/01/08/a-warning-about-the-real-cost-...

HTML5 Microdata seems slightly more rigorous and abstract, but still, commingling presentational HTML with an informal data API is just asking for trouble.

The real reason for this, IMHO, is one of the last points of the article: no major browser has embraced it as a core feature. Getting this into, say, Mozilla or Safari would go a long way toward standardizing it.

Also, I'm waiting for a usable "Receipt" microdata - imagine buying something, then when you get to the receipt page, it gives all the information about the transaction, ready to go into your financial management software, or into a program to keep track of the packages being shipped to you.

Then again, this may be because I hate Intuit with a passion...

I've used http://microformats.org/ in the past for marking up dates, addresses and the like.

Looking at the breadcrumb markup examples on the Google page (http://www.google.com/support/webmasters/bin/answer.py?hl=en...), though, there is a clear step away from clean functional markup. It looks like the sort of code people write (or used to write) to add styles to containers and whatnot. Couldn't semantic info simply be added to any available tag, thus:

<a href="/a.htm" typeof="breadcrumb1">a</a><a href="/b.htm" typeof="breadcrumb1">b</a> ...

Why would that be so hard?

Perhaps further detail could be added by having a typeof-sheet that references the id of a node and details the information it presents (like external CSS).

Microformats.org abuses <abbr> and title="", hurting usability and accessibility.

Microdata provides a less horrific way to express the data in question.

Arbitrary semantic data on the web at large is DOA, just like the semantic web.

If you think about it, a big part of the appeal of Facebook is that, for many people, it solves all the actual problems semantic markup is supposed to be able to solve.

The article dismisses XHTML with little explanation: "all the ideology surrounding XML that never came to fruition".

AFAIK the goal of XHTML was extensibility and interoperability with other data formats, by allowing arbitrary XML elements from other XML namespaces to be embedded. It seems to me that this solves exactly the same problem (as microdata) and more in a much cleaner way. Why invent yet another format?

Neither solves 'the problem', if one describes it as 'make all/most web content semantically richer'. Both just provide a mechanism, but nobody is going to invest in tagging their data without a good use case. So it is sort of a chicken-and-egg problem.

Worse, marking up my data in the database from which I serve my web pages may benefit me, but currently, marking up my data in my web pages just costs bandwidth.

From the article: "Unlike all the ideology surrounding XML that never came to fruition, Microdata is already being adopted by Google as part of their rich snippets to aid in providing richer search results"

O RLY: http://www.google.com/support/webmasters/bin/answer.py?hl=en...

Huh? They support RDFa, Microformats and Microdata (relatively) equally.

Exactly - the author has a clear anti-XHTML, pro-HTML bias, even to the point of telling us that Google don't support any XML-born technology.

"XHTML provided no actual benefit"

Yes it did provide great benefits that are now being destroyed. The benefit was that we actually had working, up-to-date XHTML parsers in each and every programming language. Message to web designers: Some people have to parse the crap you put in those pages and it's not just the browser makers! I know HTML parsing was never an exact science even with XHTML, but at least we were moving in the right direction.

XHTML1 depends on a DTD for entities, which means that a parser must either fetch the DTD from w3.org (a heavily hammered server) or support a DTD catalog and have one set up on the system (rarely available out of the box, not always possible to do). IMHO that breaks the "works everywhere out of the box" promise, and sometimes it makes it easier to just use an HTML parser.
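To make the entity problem concrete, here's a small Python sketch using only the standard library. The snippet and the helper names `parse_as_xml` and `parse_as_html` are made up for illustration. With no DTD available, the XML parser rejects `&nbsp;` as an undefined entity, while the HTML parser knows the named entities without fetching anything.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# &amp; is predefined in XML; &nbsp; is only defined by the XHTML DTD.
SNIPPET = "<p>fish&nbsp;&amp;&nbsp;chips</p>"


def parse_as_xml(markup):
    """Return the root element's text, or None if the XML parser rejects it."""
    try:
        return ET.fromstring(markup).text
    except ET.ParseError:
        # No DTD was loaded, so HTML-only named entities are undefined.
        return None


class TextCollector(HTMLParser):
    """Gathers text content; character references are decoded by default."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Return the concatenated text content using the HTML parser."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    print(repr(parse_as_xml(SNIPPET)))
    print(repr(parse_as_html(SNIPPET)))
```

Getting the XML path to work means handing the parser the entity definitions yourself, which is exactly the catalog setup described above.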

It's been 11 years since the XHTML recommendation was published, and you still can't rely on an XML parser for reading content on the web. Even documents labelled as "XHTML" are often ill-formed (and almost all of them are sent as text/html). Even if we were moving in the right direction, we were moving too slowly. Now that the HTML5 parser is specced, it may be quicker to add it to popular languages than to turn the whole web around.

I agree that the DTD issue is a nuisance but it's a nuisance that can be solved in a few hours. XHTML has not solved all parsing problems but the quality of parsing results is infinitely better than it used to be a few years back because most websites do actually use well-formed XML (at least most of the ones I'm parsing).

I don't understand why we need yet another slightly incompatible pointy bracket syntax. Adding that syntax to all languages and ironing out all the quirks will take years and it won't replace the existing mess. It will just add to it. I see no progress at all. It's a pointless waste of resources.

I'd say that parsing XHTML isn't any easier than parsing HTML, in that both require an author who writes strict code. There's HTML strict, too, ya know.

Keep markup, style, script and data separated.

Microformats and microdata don't obey the first law.

How does one keep markup and data separated in XML/HTML?

I think this is what Facebook wants to be with OpenGraph!

That doesn't make sense. Care to elaborate?

Why it matters: Dumb eBooks Must Die, Smart eBooks Must Live http://ebooktest.wordpress.com/2009/07/21/dumb-ebooks-must-d...

Why Publishers Fear Metadata http://www.knowthyshelf.com/?p=30

Stop thinking it's limited to web browsing.
