

Crowdsourcing the semantic web - nimbix
http://lexandera.com/2009/04/crowdsourcing-the-semantic-web/

======
olegp
Vertical search engines that scrape a large number of sites already do
something like this, and some may already have the tools described in the
article. That said, most use regular expressions instead of XPath because of
malformed markup. CSS selectors are another option.
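A minimal sketch of the point about malformed markup: the HTML below (unquoted attribute, unclosed tag) would trip up a strict parser feeding an XPath engine, while a tolerant regex still pulls the values out. The markup and the extraction rule are invented for illustration.

```python
import re

# Malformed markup of the kind the comment describes: an unquoted
# class attribute and an unclosed <span> tag.
html = '''
<div class="listing">
  <span class=price>$19.99<span>
  <span class="price">$5.00</span>
</div>
'''

# A regex-based rule tolerates both the missing quotes and the
# unclosed tag; a strict XPath query over a parsed tree might not.
price_re = re.compile(r'class="?price"?>\s*\$([0-9]+\.[0-9]{2})')
prices = [float(m) for m in price_re.findall(html)]
print(prices)  # → [19.99, 5.0]
```

This is also why CSS selectors (via a lenient parser) are a middle ground: more structure than a regex, less brittleness than strict XPath.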

In my opinion, the other side of the semantic web's uptake problem is that
the tools and formats used to describe and access the data are rather
heavyweight. It would be nice if there were a simple way to define new data
types as well as store and access the data. Perhaps something like Google Base
with a bit of server-side JavaScript for scraping thrown in?
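To make the wish concrete, here is a hypothetical sketch of what such a lightweight system might look like: a data type defined in a few lines, plus a per-field scraping rule (regexes standing in for the server-side JavaScript the comment imagines). All names, the schema shape, and the sample markup are invented.

```python
import re

# Define a new data type: just a name and its fields.
schema = {"type": "Event", "fields": ["title", "date"]}

# One extraction rule per field, keyed by field name.
rules = {
    "title": re.compile(r"<h1[^>]*>(.*?)</h1>"),
    "date": re.compile(r'datetime="([^"]+)"'),
}

def scrape(html):
    """Extract one record matching the schema from raw markup."""
    record = {"type": schema["type"]}
    for field in schema["fields"]:
        m = rules[field].search(html)
        record[field] = m.group(1) if m else None
    return record

html = '<h1>PyCon</h1> <time datetime="2009-04-01">April 1</time>'
record = scrape(html)
print(record)
# → {'type': 'Event', 'title': 'PyCon', 'date': '2009-04-01'}
```

The appeal of something this small is that storing and querying the records is just dictionary handling, with none of the RDF/OWL tooling overhead the comment is pushing back against.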

~~~
Maciek416
We're working on something remarkably close to what the article describes, but
not taking the search engine approach you allude to. Rather, we're trying to
make it into a useful multi-tool close in spirit to systems like Pachube, a
component that coders/bloggers/site authors/whoever can drop into their
projects.

And you're right. Semantic formats are very heavyweight. There are a lot of
useful things that can be done with semi-semantic data before we achieve full
linked data across the web, if that ever happens (you could argue for and
against such visions, IMHO). Check it out at <http://scrapmetl.com/> and give
us a shout on Twitter @Maciek416 and @corban if you're interested in playing.

------
fizx
Also see <http://parselets.com> for an open-source implementation of this from
the guys (tectonic and me) who brought you SelectorGadget.

------
finin
This reminds me of the W3C's Annotea project --
<http://www.w3.org/2001/Annotea/>

