
Semantic Web, Can it Happen? - messel
http://www.victusspiritus.com/2009/07/30/semantic-web-can-it-happen/
======
keefe
I work for one of the leading semtech companies. It's all about data
interoperability, and it is inevitably coming. I'm not convinced it will come
as the result of a company advancing a standard like RDF; more likely it will
emerge as a grassroots effort growing out of the many existing data formats.
The bottom line is that data interoperability is good and it's really not that
hard to accomplish. RDF puts forward a scheme to accomplish this with a
distributed, directed graph structure that incorporates universally
recognizable IDs. I cringe whenever I hear someone talk about how it makes the
machine more intelligent or how easy it is to do XYZ using semtech... It's
about making extremely precise descriptions of things, and it gives you certain
tools (RDF, OWL, and inference engines) to make working with abstract models
easier, but there is a hell of a lot of work involved in getting something
really useful going. As an aside, Bing used to be Powerset, one of the leading
linguistic semantics companies, which I saw present at ISWC '07.
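The core of that RDF scheme, statements as (subject, predicate, object) triples whose nodes are globally unique URIs, can be sketched in a few lines of plain Python. All the URIs below are illustrative, not real vocabularies:

```python
# Minimal sketch of RDF's data model: a set of (subject, predicate, object)
# triples forming a directed graph whose node and edge IDs are URIs.
# Every URI here is made up for illustration.

triples = {
    ("http://example.org/people#alice", "http://example.org/vocab#knows",
     "http://example.org/people#bob"),
    ("http://example.org/people#alice", "http://example.org/vocab#name",
     "Alice"),
    ("http://example.org/people#bob", "http://example.org/vocab#name",
     "Bob"),
}

def objects(subject, predicate):
    """Follow an edge of the directed graph: what does `subject`
    point to via `predicate`?"""
    return {o for s, p, o in triples if s == subject and p == predicate}

# Because the IDs are URIs, a triple from a *different* data set that
# reuses the same URI merges into the same graph with no mapping step.
triples.add(("http://example.org/people#bob",
             "http://example.org/vocab#knows",
             "http://example.org/people#alice"))

print(objects("http://example.org/people#alice",
              "http://example.org/vocab#knows"))
```

That universally-recognizable-ID property is what does the interoperability work; everything else (OWL, inference) layers on top of this graph.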

~~~
messel
Thanks for the education keefe. Maybe what I'm expecting isn't a fully
directed graph but a set of automated tools that can help associate disparate
data sets. Do you have any good suggested reading you could point me to on the
semantic web?

~~~
keefe
The basic information is pretty well summarized here:
<http://www.rdfabout.com/>. Mapping disparate data sets is a pretty hard
problem that does not tend to automate well. My current company produces tools
for this, and you can download a beta here:
<http://www.topquadrant.com/products/TB_Composer.html>. The basic technique is
to import your data sets (XML files, an RDB, or whatever) into RDF, then define
a master data model in RDF, and write a set of SPARQL CONSTRUCT rules (or SPIN
rules; SPIN is one of TQ's proprietary languages) which say: when I see data
with structure X in data source A, construct structure Y using the fields of X.
The chief scientist at TQ has written a book on how to do modeling effectively
with RDF: <http://workingontologist.org/>. There are a number of other books
and tools; you can check out the Semantic Technology Conference exhibitor list
for a full rundown.
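A SPARQL CONSTRUCT rule is essentially pattern-match-then-emit: match a triple pattern (structure X) in the source graph, emit new triples (structure Y) for the master model. Here is a toy plain-Python analogue of one such rule; the graph contents, predicate names, and the `master:fullName` target are all hypothetical:

```python
# Toy analogue of a SPARQL CONSTRUCT mapping rule. Informally:
#   CONSTRUCT { ?s master:fullName ?name }
#   WHERE     { ?s src:firstName ?f . ?s src:lastName ?l }
# with ?name built from ?f and ?l. All identifiers are invented.

source_graph = [
    ("row:1", "src:firstName", "Ada"),
    ("row:1", "src:lastName", "Lovelace"),
    ("row:2", "src:firstName", "Alan"),
    ("row:2", "src:lastName", "Turing"),
]

def construct_full_names(graph):
    """Match the WHERE pattern, then emit the CONSTRUCT triples."""
    first = {s: o for s, p, o in graph if p == "src:firstName"}
    last = {s: o for s, p, o in graph if p == "src:lastName"}
    # Only subjects matching *both* pattern triples produce output,
    # just as a SPARQL basic graph pattern requires a full match.
    return [(s, "master:fullName", f"{first[s]} {last[s]}")
            for s in first if s in last]

print(construct_full_names(source_graph))
```

In practice you'd run real CONSTRUCT queries against an RDF store; the point is just that each rule is a declarative source-pattern-to-master-pattern mapping.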

~~~
messel
Thanks, I'll check it out. I've been using a simple approach: letting services
like Zemanta extract meta tags from content (natural language text), weighting
them by confidence levels or reuse, and then finding similar
combinations/weights. Link by content.
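That "link by content" idea can be sketched as comparing weighted tag vectors, here with cosine similarity. The documents, tags, and confidence weights are invented for illustration:

```python
from math import sqrt

# Sketch of linking by content: each document is a bag of extracted
# tags with confidence weights; two documents are "linked" when their
# weighted tag vectors are similar. All tags/weights are invented.

doc_a = {"semantic web": 0.9, "rdf": 0.7, "search": 0.3}
doc_b = {"semantic web": 0.8, "sparql": 0.6, "search": 0.4}
doc_c = {"cooking": 0.9, "recipes": 0.8}

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(u.get(tag, 0.0) * w for tag, w in v.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)

print(cosine(doc_a, doc_b))  # shared tags -> high similarity
print(cosine(doc_a, doc_c))  # no shared tags -> zero
```

Any similarity measure over the weight vectors would do; cosine is just a common choice for sparse tag weights.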

~~~
keefe
I'm not sure exactly what you mean; are you talking about entity extraction
with confidence levels à la Calais? That used to be ClearForest before Reuters
bought them. I think entity extraction in general is so closely related to
natural language understanding that it is one of those generally tough
problems. In general, I feel work in this area suffers scalability issues, and
highly optimized point solutions will do better for a while yet. I guess we'll
have to see how Bing, née Powerset, does in mainstreaming this whole thing, or
whether it'll be just a bunch of hype like Twine.

~~~
messel
Yup, Calais & Zemanta have similar APIs: input text, get back tags.

~~~
andraz
It's much more than tags: in-text exact entities, deep categorization,
related news, etc. :)

The exact disambiguation of mentioned concepts and entities is what is
exciting and underused by web developers right now.

Andraz Tori, CTO at Zemanta

~~~
keefe
I think sense disambiguation is certainly one of the most interesting
applications of semantic technologies. Here at TQ, the focus tends to be on
exploiting referential semantics and structured data to establish master data
models, merge data models, and provide a platform for model-driven application
development and analytics. That said, the work here leans more towards
corporate IT than mainstream web dev.

