W3C Approves RDFa 1.1 for HTML5 (w3.org)
66 points by mindcrime on Aug 31, 2013 | 42 comments



I'm a believer in the semantic web.

I highly recommend this app: https://bitbucket.org/alexstolz/rdf-translator for converting between:

Microdata, RDFa, RDF, JSON-LD, N3 and N-Triples

I recommend Apache Fuseki if you want to do some querying over triples.

I'm currently trying to build a semantic living document editor and I've been experimenting with Fuseki to do so.

http://samsquire.github.io/livingdocuments/


Another thing of note here is the set of databases RDF traditionally uses.

Take a look at: https://github.com/openlink/virtuoso-opensource

This is the triple store traditionally used. You will also want to look into SPARQL.

An example link: http://www.ibm.com/developerworks/xml/library/j-sparql/

The syntax is a bit weird, but it's not that different from a document store.
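To make the "not that different from a document store" point concrete, here is a minimal sketch of triple-pattern matching in plain Python. It is not how Virtuoso or Fuseki work internally; the `match` helper, the `TRIPLES` data and the `ex:`/`foaf:` names are all made up for illustration.

```python
# A toy in-memory "triple store": a list of (subject, predicate, object) tuples.
# All identifiers below are hypothetical sample data.
TRIPLES = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:alice", "foaf:name", "Alice"),
]


def match(triples, pattern):
    """Yield variable bindings for one triple pattern.

    Terms starting with '?' are variables, everything else must match exactly.
    """
    for triple in triples:
        bindings = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated in the pattern must bind to the same value.
                if bindings.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield bindings


# Roughly what SPARQL expresses as:
#   SELECT ?who WHERE { ex:alice foaf:knows ?who }
known_by_alice = [b["?who"] for b in match(TRIPLES, ("ex:alice", "foaf:knows", "?who"))]
```

A real SPARQL engine joins many such patterns and uses indexes, but the basic shape of a query is this kind of fill-in-the-blanks match.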

For those of you who use the Facebook API, you can also get RDF data out with the rdf Accept header. Something neat to think about.


Very cool. I'm also a believer in the Semantic Web vision, and we are starting to incorporate some elements of the semweb stack into our projects. Apache Stanbol is another wicked cool SW related project from the ASF, and we're doing a lot of work with it now. Fuseki is also very cool.


Hold on, what's this XHTML5 they mention? We don't want XHTML5!

The W3C is just a weird organization, basically owned by some members who pursue their own agendas, producing reams of "standards" that, largely, no one uses, or even implements. Tim Berners-Lee is a great guy but it is a failed organization as far as I can see.


We don't want XHTML5!

Speak for yourself.


Maybe that is a little harsh but I noticed that. Some working groups have obscure people who enjoy generating standards and come up with alphabet soup standard abbreviations and then spend a lot of effort convincing the world that they need it.

RDF is one of those things, JSON-LD is another. Most efforts I have seen so far involve trying to convince me that I need it. It wasn't me looking for a solution. There is a big difference.


That might be true in some cases, but - for me - that's certainly not true of RDF (and the whole family of semantic web standards produced by the W3C). After a lot of self-study on something quite similar to RDF/triples I was really glad to come across the W3C standards that were already there. Using the 'frameworks' of RDF, RDFS and OWL enormously reduces my workload and greatly enhances my capabilities for the A.I. development I'm doing.


But there are also some working groups that don't seem to understand the concept of a _standard_.


> what's this XHTML5

It's HTML 5 that's an XML dialect. That's right: XHTML 2 died, but you can use XHTML5 instead.


I wish we'd made HTML5 from XHTML, actually. But alas.


Why?


I like requiring closing tags, while I think the HTML5 spec says some tags can be implicitly closed (like <li> for instance).

Could be wrong though, my wife often points out that it happens...


I used to be like you. I believed in the proper correctness of markup; proper closing tags, proper nesting. But I've come to see the light. The WWW succeeded and flourished because of its faults and its lazy error checking. Thousands of non-technical people writing their own HTML. Thankfully it didn't have to be perfect and it worked.

I still like tidy clean code, but I don't agonize over its perfection.


I hear that repeated, but I don't find it convincing. A simple grammar would make it easy to find errors and kick them out immediately. Instead, we ended up with shitty ambiguous standards (common in "friendly" text-based protocols) and still have to deal with cross-browser compatibility.

If HTML had error checking and kicked out unspecified/ambiguous syntax, people may have left off tags (decided not to bold or make a list), omitted some images or something.

It's hard enough writing a spec - there will be unforeseen combinations resulting in conflicting behaviour. The answer isn't to give up and make the spec loose.


HTML5 isn't loose -- it has a well-defined procedure for handling errors.

Which is worlds better than XML's "every error is a fatal error" approach, since real-world XML is often non-well-formed (and, when validity checking is possible, invalid), and tools ignore that to varying degrees and recover or ignore just like they do with older versions of HTML.
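A small sketch of the two error models, using only the Python standard library. Caveat: `html.parser` is a lenient tokenizer, not an implementation of the HTML5 parsing algorithm, so this only illustrates "keep going" versus "every error is fatal"; the `TagCollector` class and helper names are my own.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# <li> elements without closing tags: fine in HTML, not well-formed as XML.
MARKUP = "<ul><li>one<li>two</ul>"


class TagCollector(HTMLParser):
    """Record every start tag the lenient HTML tokenizer reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def html_start_tags(markup):
    """Feed markup to the HTML tokenizer and return the start tags it saw."""
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.start_tags


def xml_is_well_formed(markup):
    """Return True only if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False
```

With the markup above, the HTML side reports the `ul` and both `li` start tags, while the XML side rejects the whole document at the mismatched `</ul>`.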

(my favorite example of all time, with that, is the ability of XHTML documents to have their well-formedness status depend entirely on the HTTP Content-Type header, and at the time none of the major toolchains actually handled it)


Can you detail this often non-well-formed XML? I've not seen any XML parsers that handle invalid XML. Except for people who wrote their own XML parser and think a simple regex is enough.

Validation is another issue, and I don't think you'll find anyone saying that the myriad XML addons are simple or easy :).

The mixing of HTTP and HTML also seems like a bit of a strange hack to me. And let's not start talking about well-formed HTTP; I'd be surprised to find many real-world clients or servers actually following the inane HTTP spec. Just like mail clients don't always handle comments in email addresses.


Well, the classic example is XML + rules about character encoding. Suppose I send you an XHTML document, and I'm a good little XML citizen and in my XML prolog I mention that I've encoded the document UTF-8. And let's say I'm also taking advantage of this -- there are some characters in this document that aren't in ASCII.

So I send it to you over HTTP, and whatever you're using on the other end -- web browser, scraper, whatever -- parses my XML and is happy. Right?

Well, that depends:

* If I sent that document to you over HTTP, with a Content-Type header of "application/xhtml+xml; charset=utf-8", then it's well-formed.

* If I sent it as "text/html; charset=utf-8", then it's well-formed.

* If I sent it as "text/xml; charset=utf-8", then it's well-formed.

* If I sent it as "application/xhtml+xml", then it's well-formed.

* If I sent it as "text/xml", then FATAL ERROR: it's not well-formed.

* If I sent it as "text/html", then FATAL ERROR: it's not well-formed.

Or, at least, that's how it's supposed to work when you take into account the relevant RFCs. This is the example I mentioned in my original comment, and as far back as 2004 the tools weren't paying attention to this:

http://www.xml.com/pub/a/2004/07/21/dive.html
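The six cases in the list above can be sketched in a few lines of Python. This is a simplification that just mirrors the list: an explicit `charset` parameter wins, `text/*` without a charset falls back to US-ASCII, and anything else defers to the encoding named in the XML prolog. The function names and the sample document are mine, and the real RFCs have more corner cases than this.

```python
def effective_encoding(content_type, prolog_encoding="utf-8"):
    """Pick the encoding a strict consumer would use, per the simplified rules above."""
    parts = [part.strip() for part in content_type.split(";")]
    media_type = parts[0].lower()
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset" and value:
            # An explicit charset in the HTTP header overrides everything else.
            return value.strip().strip('"').lower()
    if media_type.startswith("text/"):
        # Assumption taken from the list above: text/* without charset means US-ASCII.
        return "ascii"
    # application/xhtml+xml and friends: trust the XML prolog.
    return prolog_encoding


def is_decodable(body, content_type):
    """Return True if the bytes decode cleanly under the effective encoding."""
    try:
        body.decode(effective_encoding(content_type))
        return True
    except UnicodeDecodeError:
        return False


# A UTF-8 document containing one non-ASCII character.
DOCUMENT = '<?xml version="1.0" encoding="utf-8"?><p>caf\u00e9</p>'.encode("utf-8")
```

The same bytes pass or fail depending only on the header, which is exactly the trap described above.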

These are the kinds of scary corners you can get into with an "every error is a fatal error" model, where ignorance or apathy or a desire to make things work as expected ends up overriding the spec, and making you dependent on what are actually bugs in the system. Except if the bug ever gets fixed, instead of just having something not quite look right, suddenly everyone who's using your data is spewing fatal errors and wondering why.

Meanwhile, look at things like Evan Goer's "XHTML 100":

http://www.goer.org/Journal/2003/04/the_xhtml_100.html

Where he took a sample of 119 sites which claimed to be XHTML, and found that only one managed to pass even a small set of simple tests.


HTML has strict implementation requirements and loose authoring requirements. I recall that it is a goal of HTML that a significant percentage of "anyone" can create useable documents with it, but the closest I can come to a citation at the moment is this: http://wiki.whatwg.org/wiki/FAQ#Why_does_this_new_HTML_spec_...


One of the things I really like about HTML5, actually, is that it recognizes that real-world HTML is not perfect... and then specifies exactly how parsers should deal with imperfections.


It worked because the rendering engines picked up the slack - Gecko, Trident, and WebKit are all orders of magnitude more complex for having to reinterpret pages for the nebulous correctness.


Exactly. I weep to think how many CPU cycles have been wasted processing bogus, mal-formed HTML on the web. :-(


One of the big differences between HTML4 and HTML5 is that implicit closing tags are defined in the spec, and not just a consequence of browser implementations. So "error handling" in HTML4 has essentially become a feature in HTML5.

For XHTML, one of the big ideas was that you could use an XML parser, and embed custom XML. Since an XML parser errors on invalid input, it can be smaller and faster. Having an XML parser also means embedded XML is easy to deal with. However, all this falls down when you consider that nearly all XHTML was sent as HTML, so the XML parser never kicked in. All this meant you required properly formatted files.


Sadly, Microsoft deserve a fair amount of blame for this, for not ever really supporting XHTML in IE back when it was so dominant. Oh, I mean, they "supported" it in that it would render, but they didn't support the application/xhtml+xml content-type, which meant that, in turn, nobody served their XHTML as application/xhtml+xml, and so on.

I won't say the lack of widespread adoption of XHTML was all Microsoft's fault, but they definitely played a role.


Nothing stops you from closing tags if you want to (like me). It is very much allowed by html5.


Plenty of things do. Such as web analytics tools & plugins that can only work in a non-xhtml-compliant way.

My favourite was a Google tool (can't remember what it was - Google Website Optimizer?) that required you to use some godawful <script> construction that was necessarily broken. And you'd have thought Google would know better.


You might not be stopped from going <tag></tag> but you are stopped from going <tag /> 90% of the time, which is annoying.


That space before the closing slash is actually not allowed in XML, but was required for browsers that couldn't interpret XHTML. XHTML was broken from the get-go; the only virtue it had was that it taught a generation of web developers to be consistent in their markup.

(By the way, since sibling nodes have no specified order in XML, there's no reason why one paragraph should have followed another on a web page consistently, and the <ol> was an oxymoron.)


Because maybe we would not have to reinvent the wheel (making it oval, by the way) for each and every "new" feature that comes along with HTML5 (I'm looking at you, Web Components).


Tim Berners-Lee is a great guy but it is a failed organization as far as I can see.

There is a reason why I pushed the TAG to finish this:

http://www.w3.org/wiki/Evolution

(Clue: TimBL is part of the TAG.)


It exists even in WHATWG HTML. It is just not encouraged.


I tend to agree the W3C is a bit pointy headed. Still, some kind of referee is needed to keep the browser-makers playing fair. Firefox's switch from RDF to JSON for bookmarks was anecdotal evidence that the RDF format was just too obtuse for a lot of practical use cases. Way back I tried modelling data in RDF; it was challenging and not well received internally. By comparison the JSON modelling I've done recently has been intuitive, easy, and well received. Although I used to be a big fan of RDF, it's become a relic, and modelling relationships isn't so hard a problem that you need RDF to do it, IMHO.


Quick poll: is anyone here using RDF in their technology stack? What has your experience been like?


Yes, I've been working with RDF(S) and OWL for the past 3 years. I find it extremely useful for reasoning, and in general pretty cool to work with. Having all your data in a forward directed graph, including your ontology (data structure), has some very interesting potential. When I started with this project 3 years ago, I had to work with SPARQL 1.0 (the query language for triples/RDF), which was a bit troublesome sometimes. But now that SPARQL 1.1 has become a recommendation, it is becoming well implemented, and working with it has made my queries quite elegant and fun again.

Some people say the semantic web has long been dead since guys like Google started their own semantic formats. Personally I don't think that's true at all. The vision/potential of the semantic web remains as potent as ever. The tools are slowly but surely maturing. The field perhaps seems not that big, but it is surely active and growing. I think in time we'll see plenty of semantic web stuff entering the general field, and RDFa may be another helpful bridge tool along the way. Though in all honesty, I haven't worked with RDFa at all yet, so can't say much about that, but will surely have a look at this spec soon.


As an early adopter it was tough (since 2008). However, if you are starting today it is not bad. And RDFa (Lite) is a really easy way of getting into it. Just making your site indexable using schema.org and RDFa for Google et al. is a nice start.

I work for UniProt (a database for biologists), and for us it is a really nice way of providing data for other users to integrate into their own systems. We also provide a SPARQL endpoint at beta.sparql.uniprot.org so other scientists can run analytical queries without having to run a large data warehouse of their own, i.e. upload 5 KB of SPARQL queries to us instead of downloading 100 GB of data to run SQL on their own systems.
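As a rough sketch of what "upload a query instead of downloading the data" looks like on the wire, here is how a SPARQL protocol GET request can be built with the Python standard library. Only the host name comes from this comment; the `http://` scheme, the root path, the sample query and the `build_sparql_request` helper are assumptions for illustration, and the request is built but never sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Host taken from the comment above; scheme and path are assumed.
ENDPOINT = "http://beta.sparql.uniprot.org/"

# A deliberately generic query: any five triples from the store.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL protocol GET request.

    The query travels in the 'query' URL parameter, and the Accept header
    asks for the standard JSON results format.
    """
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
```

Passing `request` to `urllib.request.urlopen` would actually run it; the whole request is a few hundred bytes, which is the point being made about query size versus dataset size.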

These days there are a number of off-the-shelf systems that work well enough. Your main danger in the beginning is trying to deal with existing (large) public data before you are ready. The philosophical/reasoning side of the semantic web can also be very confusing but can be ignored in the beginning. Just use RDF as a graph data format and SPARQL as a query language and you are good to go.


How do you like working for UniProt? Where are you guys based?

In the future I'd like to work at a place like this, so if you could tell me more I'd be grateful. (My email is in my profile.)

Thanks!


It's pretty cool. We use Apache Jena and some related Apache projects as our main SW stack, and it works pretty well. If you aren't real experienced with SPARQL, RDF, etc. and want to see some cool demos of what the Semantic Web enables, check this link for a taste:

http://wiki.dbpedia.org/OnlineAccess?v=6yg

The biggest fly in the ointment right now, IMO, is that working with larger datasets, with queries that require complex inferences via OWL, is crazy computationally expensive. And while there is research underway on the subject of parallelizing this stuff so it can run on clusters, I wouldn't say that that is a solved problem yet. So you can need some pretty honking fat software if you want to do really complex stuff.
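To give a feel for why inference gets expensive, here is a deliberately naive forward-chaining sketch for just one rule (transitivity of `rdfs:subClassOf`). Real reasoners, including whatever Jena does internally, are far smarter than this; the function name and the sample class hierarchy are made up.

```python
SUBCLASS = "rdfs:subClassOf"

# Hypothetical three-step class hierarchy.
CLASSES = {
    ("ex:Cat", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
    ("ex:Animal", SUBCLASS, "ex:LivingThing"),
}


def subclass_closure(triples):
    """Repeatedly apply 'A sub B, B sub C => A sub C' until nothing new appears.

    Every pass compares all pairs of triples, so the cost grows roughly
    quadratically per pass, and the number of inferred triples can itself
    grow quadratically with the depth of the hierarchy.
    """
    inferred = set(triples)
    while True:
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if p1 == SUBCLASS and p2 == SUBCLASS and b == c:
                    candidate = (a, SUBCLASS, d)
                    if candidate not in inferred:
                        new.add(candidate)
        if not new:
            return inferred
        inferred |= new


closure = subclass_closure(CLASSES)
```

Three asserted triples become six after closure here. With one rule that is harmless; OWL has many interacting rules, which is where the blow-up on larger datasets comes from.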


I use it in a question answering system. DBPedia/Basekb/other random data sources backed by a search index and sparql allows you to do some cool things.


  <p xmlns:ex="http://example.org/vocab#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    Two rectangles (the example markup for them are stored in a triple):
    <svg xmlns="http://www.w3.org/2000/svg"
      property="ex:markup"
      datatype="rdf:XMLLiteral">
    ... svg ...
    </svg>
  </p>

Do `property="ex:markup"` and `datatype="rdf:XMLLiteral"` seem to anyone else like a phenomenally bad idea? Don't inject namespaces in values! How would you escape that anyway?

Is there some reason we don't have e.g. <svg ex:property="markup" rdf:datatype="XMLLiteral" />, or can you not do that with XML? And if so, why not? It seems like it would be very useful for mixing in data like this.


Property and Datatype are in the HTML namespace* and their values are namespaced identifiers (in XML, QNames).

The identifiers are namespaced so that if you have differing vocabs with the same property names (or datatypes), you can distinguish which one you're talking about.

Your proposed transformation means that Property lives in the example namespace, meaning it has a completely different interpretation, which is specific to the example namespace.

*: (before someone pulls me up on this, it's not 100% correct)
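For what the namespaced identifiers boil down to, here is a minimal sketch of prefix expansion in Python. It is a simplification of what an RDFa processor does, not the full algorithm: the prefix map is copied from the `xmlns:` declarations in the example markup, and the `expand_curie` helper (including its choice to pass through values with an undeclared prefix unchanged) is my own assumption.

```python
# Prefix mappings as declared by the xmlns: attributes in the example markup.
PREFIXES = {
    "ex": "http://example.org/vocab#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand_curie(value, prefixes):
    """Expand 'prefix:reference' to a full IRI when the prefix is declared.

    Values whose prefix is not declared (for example an absolute IRI such as
    'http://...') are returned unchanged in this sketch.
    """
    prefix, separator, reference = value.partition(":")
    if separator and prefix in prefixes:
        return prefixes[prefix] + reference
    return value


property_iri = expand_curie("ex:markup", PREFIXES)
datatype_iri = expand_curie("rdf:XMLLiteral", PREFIXES)
```

So the attribute value is just shorthand for a full IRI, and two vocabularies that both define `markup` end up with different IRIs.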


Ah, so 'property' and 'datatype' are intended to be paired, and intended to be used to store namespaced values? I can't really find anything describing them as such, but then they're about as difficult to search for as it's possible to be :/ If you have a link, I'm curious and I'd appreciate it, but I'll take you at your word :)

Still leaves a bad taste in my mouth (you can't have two on the same element, for example. unless you can whitespace-separate them? and still, what if your value contains a colon?), but I guess less so.


At least it doesn't look actively harmful to what we already have.


Who cares ? Templates are JSON data anyway, in well-established meta-media domains: http://schema.org/AssessAction

All you end up arguing for is how well your Text Editor implements things like git or LightTable watches over text objects.

Sublime Text, for example, only gives a deprecated skeuomorph of the microwave when in Find/Replace mode. Compare this to vim/Fuf,Unite, the fact that in this eco system one also discovers Gundo, grep, etc. easily pluggable within this system. Don't do any of it __at__ compilation, so who wholly cares? (Holy C, anyone?)

Vim enables enhanced Python (mocka chaining permutations of tags, WebObjects) development along with things like Zen Coding and haml linters, which do we decide; does it matter today, tomorrow? And what have you learn'd in the meantime?

I for one welcome our COBOL Overlords: https://github.com/eevee/project-euler/blob/master/heteroglo...



