I've always been a proponent of placing all the data in a script element as JSON, which consumers can read easily without screen scraping.
And it seems to me that in real-world web apps all the markup with microdata could be added programmatically. Create a new object with a microdata schema and you get an ORM-type object with an HTML write method. This way you never type microdata markup by hand anyway.
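A minimal sketch of that idea, assuming a hypothetical `MicrodataItem` class (the class, its `set` and `toHtml` methods, and the `escapeHtml` helper are illustrative names, not part of any existing library). It builds an item from a schema URL and writes the microdata markup itself:

```javascript
// Hypothetical helper: escapes text for use in element content and
// double-quoted attribute values.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical ORM-style object: holds properties for one microdata item
// and knows how to write itself out as HTML.
class MicrodataItem {
  constructor(itemtype, tag = "div") {
    this.itemtype = itemtype; // schema URL, e.g. a data-vocabulary.org type
    this.tag = tag;           // element used for the item's own wrapper
    this.props = [];          // [{ name, value, tag }]
  }

  // value is a string or a nested MicrodataItem; for nested items the
  // `tag` argument is ignored and the nested item's own tag is used.
  set(name, value, tag = "span") {
    this.props.push({ name, value, tag });
    return this;
  }

  // itemprop is only passed when this item is nested inside another item.
  toHtml(itemprop) {
    const propAttr = itemprop ? ` itemprop="${escapeHtml(itemprop)}"` : "";
    const children = this.props
      .map(({ name, value, tag }) =>
        value instanceof MicrodataItem
          ? value.toHtml(name)
          : `<${tag} itemprop="${escapeHtml(name)}">${escapeHtml(value)}</${tag}>`
      )
      .join("");
    return `<${this.tag}${propAttr} itemscope itemtype="${escapeHtml(this.itemtype)}">${children}</${this.tag}>`;
  }
}

const address = new MicrodataItem("http://data-vocabulary.org/Address", "p")
  .set("street-address", "1560 Oglethorpe Ave");
const org = new MicrodataItem("http://data-vocabulary.org/Organization")
  .set("name", "Hendershot's Coffee Bar", "h1")
  .set("address", address);

console.log(org.toHtml());
```

The point of the design is that the schema lives in one object and the attribute soup is generated, so a typo in `itemprop` can only happen in one place.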
<h1>Hendershot's Coffee Bar</h1>
<p>1560 Oglethorpe Ave, Athens, GA</p>
<div itemscope itemtype="http://data-vocabulary.org/Organization">
<h1 itemprop="name">Hendershot's Coffee Bar</h1>
<p itemprop="address" itemscope itemtype="http://data-vocabulary.org/Address">
<span itemprop="street-address">1560 Oglethorpe Ave</span>,
name: "Hendershot's Coffee Bar",
line: "1560 Oglethorpe Ave",
The idea of the data island is to keep the data separate from the content and let consumers use the data whenever they need it. I am now looking into using a link tag to make the data island external, like RSS/Atom, which saves bandwidth when 90% of consumers won't care about the data. Those who do care just load the external link and get all the data without scraping.
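Here is a rough sketch of how a consumer might discover and load such an external data island. It assumes the page advertises the data with `<link rel="alternate" type="application/json">`, by analogy with feed autodiscovery; that convention, the example URL, and the function names are assumptions made for illustration. The attribute parsing is a simple regex that expects double-quoted values, so it is not a real HTML parser.

```javascript
// Parse the attributes of a single tag into a plain object.
// Simplification: only handles double-quoted attribute values.
function parseAttributes(tag) {
  const attrs = {};
  const re = /([a-zA-Z-]+)\s*=\s*"([^"]*)"/g;
  let m;
  while ((m = re.exec(tag)) !== null) {
    attrs[m[1].toLowerCase()] = m[2];
  }
  return attrs;
}

// Return the href of the first link tag that advertises a JSON data
// island, or null if the page has none.
function findDataIslandUrl(html) {
  const links = html.match(/<link\b[^>]*>/gi) || [];
  for (const tag of links) {
    const a = parseAttributes(tag);
    if (a.rel === "alternate" && a.type === "application/json" && a.href) {
      return a.href;
    }
  }
  return null;
}

// Only consumers that care about the data pay for the second request.
// fetchFn is injectable so the function can be exercised without a network.
async function loadDataIsland(html, baseUrl, fetchFn = globalThis.fetch) {
  const href = findDataIslandUrl(html);
  if (href === null) return null;
  const response = await fetchFn(new URL(href, baseUrl).toString());
  return response.json();
}

// Hypothetical page; the .json path is made up for the example.
const page = `<html><head>
<link rel="stylesheet" href="/site.css">
<link rel="alternate" type="application/json" href="/places/hendershots.json">
</head><body><h1>Hendershot's Coffee Bar</h1></body></html>`;

console.log(findDataIslandUrl(page)); // prints /places/hendershots.json
```

Ordinary visitors never touch the JSON file, so the page itself stays as small as it would be without any embedded data.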
They're the URLs you see on Facebook that have #! in them.
If I’m writing some text by hand, I’m going to use Markdown and just write:
# Hendershot’s Coffee Bar
1560 Oglethorpe Ave, Athens, GA
Edit: Perhaps that risk could be mitigated by having the human-readable markup issue a JS call to the data island, which would have the benefit of being DRY-compliant.
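One way to read that suggestion is to generate the human-readable markup from the data island, so the name and address are written down exactly once. The sketch below assumes a hypothetical `renderPlace` function and an `address` object with `line`, `city`, and `region` fields; only `name` and `line` appear in the thread, the rest is illustrative.

```javascript
// The data island is the single source of truth.
// Field names other than `name` and `line` are assumptions.
const island = {
  name: "Hendershot's Coffee Bar",
  address: { line: "1560 Oglethorpe Ave", city: "Athens", region: "GA" },
};

// Escape text before placing it in element content.
function escapeText(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Build the visible markup from the island instead of typing it twice.
function renderPlace(place) {
  const { line, city, region } = place.address;
  return [
    `<h1>${escapeText(place.name)}</h1>`,
    `<p>${escapeText([line, city, region].join(", "))}</p>`,
  ].join("\n");
}

console.log(renderPlace(island));
```

In a browser the returned string would be assigned to some container element; the trade-off is that the visible content then depends on JavaScript running.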