
Introducing schema.org: Search engines come together for a richer web - Uncle_Sam
http://googleblog.blogspot.com/2011/06/introducing-schemaorg-search-engines.html
======
rauljara
All I could think while reading through the getting started was: that is an
awful lot of added text. After a little more thought: that is an awful lot of
added work. And while it won't be hard to have tools that make the process
easier, the sort of work that goes into adding that data can never be
completely automated (otherwise, we would have no need for it). Given that all
the search engines will be using it, all major sites basically have to
implement this or they risk falling in their rankings.

So, at the end of the day, Google, Microsoft, and Yahoo just made web
development more expensive. They probably also just made the web a better
place, too.

~~~
blauwbilgorgel
Sites that don't employ this don't risk falling in their rankings. This added
data allows for richer snippets (which absolutely increase clickthrough
ratio), but that won't directly make you rank higher (or lower in their
absence).

If your website employs an SEO or webstandardista, you should already have
your sites marked up with metadata. Reviews and rich breadcrumbs etc. have
been around and supported for years now.

I suppose that, since Google wants to solve these problems algorithmically
first and foremost, many of these structures already get recognized.
Right now you don't _have_ to mark-up your breadcrumbs, for them to still
appear as rich snippets on your search result listing. Google recognized the
structure without added mark-up.

For now, I will build the new types in my CMS like a good web developer. That
won't cost me any time in the future, and now I'll have a way to separate
myself from those that won't add schemas or metadata to their mark-up. So, at
the end of the day, I just got more expensive :)

~~~
benhoyt
Here's what Google Webmaster Central says about schema.org:
[http://www.google.com/support/webmasters/bin/answer.py?answe...](http://www.google.com/support/webmasters/bin/answer.py?answer=1211158)

"Google currently supports rich snippets for people, events, reviews,
products, recipes, and breadcrumb navigation, and you can use the new
schema.org markup for these types, just as with our regular markup formats.
Because we’re always working to expand our functionality and improve the
relevance and presentation of our search results, schema.org contains many new
types that Google may use in future applications."

"Google doesn’t use markup for ranking purposes at this time—but rich snippets
can make your web pages appear more prominently in search results, so you may
see an increase in traffic."

------
ryanisinallofus
This is going to sound curmudgeonly, but it seems like one more way search
engines want to use your data without giving you the page view. It makes a lot
of technical sense and I can imagine some really great ways to use this data,
but in the end I guess I would just need to ask "what's in it for us, the
content providers?"

~~~
jeromec
I think that's shortsighted thinking. This is about improving the user's
search experience. If that happens I think everyone wins.

Most mainstream users are _not_ tech savvy. Imagine a traveling person arrives
in a city and spontaneously decides to see a movie. They enter the search
'good movie playing in nowheresville'. As it stands now their query will
likely be matched by the keywords 'movie', 'playing', and 'nowheresville'. The
returned results might include a news article about local theatres, with no
actual focus on reviews. The searcher might get frustrated and just decide to
rent a movie instead. However, with schemas in wide use search engines will
know _exactly_ what web sites are talking about movies and whether it's in the
context of reviews. The searcher can then be passed on to the relevant site.

In other words, do you think it's better to tell search engines this is _sort
of_ what I have or this is _exactly_ what I have?

~~~
famfamfam
Information in specific types, including reviews, exposed using
microformats, RDFa or microdata has already been used by Google for over 2
years. They call it "Rich Snippets", and it does improve the quality of
experience for users, assuming that you take an increase in clickthroughs to
mean that the user perceives that page to be more useful than other
SERP results (and anecdotally I always go for results which include rich
snippet information gleaned from pages with the required semantic
enhancements).

This announcement is not the proposal of a new technique, but rather the
extension of one which is already working and is a good thing for the web.

------
Loic
For most of my "quick searches", I already have the answer from within the
search results listing. It looks like it will go one step further: we are not
going to have to leave the results page to have complex answers.

I am not sure if I want to provide all my hard work in a format which will
maybe help the search engines a bit, but mainly the spammers a lot as they
will be able to automate the creation of content farms even more.

Mixed feelings... all the world's data in a well structured format is a wonder,
but at the same time, what will be the incentive to create such easy to
digest content if the world in the end does not even know you are the one who
produced it?

Kind of the old media against new media dilemma but applied to the new media.
Interesting.

~~~
jaynate
Bingo, this was my first response as well. What happens when users stop clicking
through to content because it's being served up by Google, Bing or Yahoo?

I guess it could actually hurt them as well. If users aren't providing
information back to the algorithm in the form of a click through related to a
search term, don't the search engines also risk losing a key signal of
relevance?

~~~
ryannielsen
Why unilaterally assert this will decrease click-throughs?

If I see immediately relevant data for restaurant hours, movie times, a
person's bio, etc, I'm _far_ more likely to click-through and start looking at
a menu, making a reservation, or getting more background.

They may well expect _increased_ click-throughs leading to more site traffic
for those who adopt.

~~~
rapind
For me this varies based on device. If I'm at my laptop then I'll usually
click through, but if I'm on my phone I won't unless I have to.

It takes too long to render sites with loads of images and adverts on my
phone, so I always dread clicking a search result.

------
izendejas
Assuming that authoritative sites, at least, don't abuse these schemas, this
will help all search engines and data mining/nlp researchers build better
models. The biggest gain isn't a quick view of search results; it's that
search results will be better in general because Google et al. will now
understand whether a page is really about a person and, in many cases, who
specifically it is about, to pick one example.

Information extraction just got that much easier. Hello, baby semantic web.

------
PaulHoule
if they actually wanted people to use this they'd write better documentation.

if I picked 10 web devs off sitepoint and instructed them to add 10 assertions
to an HTML page and didn't give them a validator, I'd be amazed if more than 3
got 80% of them right.

i like the taxonomy though, but honestly i think instances are much more
interesting than types... rather than saying "George Washington" is a
:US_President, can't we say "George Washington" is :George_Washington where
:George_Washington is his identifier in Freebase?

------
ivan_ah
Very cool, but why didn't they use the existing RDFa format/keywords but
invent their own? [ see <http://en.wikipedia.org/wiki/RDFa> ] [ itemprop vs
property ??? ]

I guess when you are big G, you can do anything you want.

------
andresmh
I am wondering why they didn't go for something that allows for richer
semantics like XHTML+RDFa <http://www.w3.org/TR/rdfa-syntax/>

~~~
hoppipolla
<http://schema.org/docs/faq.html#14>

It seems that they chose Microdata over RDFa because the latter's syntax was
deemed to be unwieldy.

It's not really true that RDFa is more extensible than microdata; there are a
small number of missing features related to XML data, but nothing too
significant for these use cases. See, for example, [1]

[1] [http://bnode.org/blog/2010/01/26/microdata-semantic-markup-f...](http://bnode.org/blog/2010/01/26/microdata-semantic-markup-for-both-rdfers-and-non-rdfers)

------
mfhepp
Hi all: Two comments:

1. The GoodRelations RDFa vocabulary remains the superior way of sending rich
data to Google and Yahoo; even Bing just announced they will support it in the
future.

2. As for tooling, here are two super-easy ways of adding rich GoodRelations
data to your site:

- <http://www.ebusiness-unibw.org/tools/grsnippetgen/> This creates a snippet
of a few additional divs/spans based on your data; simply paste it before (!)
the respective visible content. You are done ;) For products, see the effect
in <http://www.google.com/webmasters/tools/richsnippets>

- If you are using a standard shop package, e.g. Magento, osCommerce,
Joomla/Virtuemart, or WordPress/WPEC, there are free extension modules that
add GoodRelations:

<http://wiki.goodrelations-vocabulary.org/Shop_extensions>

A similar module for Prestashop and Oxid eSales is in the making.

Best Martin Hepp

Disclaimer: I am the inventor and lead developer of GoodRelations.
GoodRelations is free to use, remix, or adapt under a Creative Commons
license.

------
takinola
This sounds rather similar to Facebook's Open Graph protocol
(<http://developers.facebook.com/docs/opengraph>). I wonder if this is related
to Google's planned entry into social. It would help if they knew the context
of search terms so they could match it up to ads in Gmail, for example, or use
your Gmail conversations to help reorder search results...

------
stbullard
Am I alone in thinking schema.org is a direct answer to Facebook's Open Graph
Protocol?

------
tantalor
Did Google just deprecate microformats?

~~~
astro1138
Yeah, what is itemscope and itemprop? That's a kick in the butt for us
Microformats users.

~~~
ericmoritz
The issue I found with microformats when I was evaluating them was that there
was no way to automatically transform a microformatted file into a native data
structure without first knowing the schema. Writing a microformat-to-JSON
parser is hard because you have no way of knowing which classes are
significant and which are just there for styling.

~~~
tantalor
Isn't that true of schema.org (microdata) too? That's not a very apt
criticism.
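
One difference worth noting: microdata flags the significant elements with
dedicated `itemscope`, `itemtype` and `itemprop` attributes, so a generic
extractor can ignore `class` entirely. A minimal sketch using only Python's
standard library, assuming well-formed markup and skipping parts of the spec
such as `itemref`; `MicrodataSketch` and `extract_items` are made-up names, not
part of any library:

```python
from html.parser import HTMLParser

# Elements without a closing tag; a property value comes from an attribute.
VOID_TAGS = {"meta", "link", "img", "br", "hr", "input"}


class MicrodataSketch(HTMLParser):
    """Simplified microdata walker (hypothetical helper, not a full parser)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items, in document order
        self.stack = []   # one frame per currently open element

    def _add(self, prop, value):
        # Attach the value to the nearest enclosing item, if there is one.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                frame["item"]["properties"].setdefault(prop, []).append(value)
                return

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in VOID_TAGS:
            if "itemprop" in a:
                value = a.get("content") or a.get("href") or a.get("src") or ""
                self._add(a["itemprop"], value)
            return
        item = None
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "properties": {}}
        self.stack.append(
            {"tag": tag, "item": item, "prop": a.get("itemprop"), "text": []}
        )

    def handle_data(self, data):
        # Text counts toward every open plain-text property.
        for frame in self.stack:
            if frame["prop"] and frame["item"] is None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack or self.stack[-1]["tag"] != tag:
            return
        frame = self.stack.pop()
        if frame["item"] is not None and frame["prop"]:
            self._add(frame["prop"], frame["item"])       # nested item
        elif frame["item"] is not None:
            self.items.append(frame["item"])              # top-level item
        elif frame["prop"]:
            self._add(frame["prop"], "".join(frame["text"]).strip())


def extract_items(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


# Sample markup; the class attributes are styling noise the walker ignores.
SAMPLE = """
<div itemscope itemtype="http://schema.org/Movie" class="card wide">
  <h1 itemprop="name">Avatar</h1>
  <div itemprop="director" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name" class="bold">James Cameron</span>
  </div>
</div>
"""
print(extract_items(SAMPLE))
```

The walker never consults a vocabulary: it only follows the three attributes,
which is the part a class-based microformat parser cannot do without knowing
the format in advance.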

------
ciex
The question remains: Who needs this?

I am still not really convinced that it is possible to integrate handcoded
schemas for a wide range of use cases into search results in a meaningful way.

The solution Google proposes here will also restrain the content of websites
in a lot of ways if it becomes widely adopted. Look at the recipes-example, it
defines markup for including nutrition information for recipes:

"Can contain the following child elements: servingSize, calories, fat,
saturatedFat, unsaturatedFat, carbohydrates, sugar, fiber, protein,
cholesterol"

Every company that serves recipes on the web and decides not to offer this
information because it deems other properties of recipes more important is now
at a disadvantage. Google will show more information about the recipes of
their competitors and presumably also rank them higher because they have
included 'valuable' markup information in their recipe.

This approach favors shallow information resources over complex ones as the
former can be more easily parsed by metadata-crawlers.

~~~
juiceandjuice
I do, because I can get better search results:

Chicken recipe where calories<=350 and carbs<=20g
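
To make that concrete: once calories and carbs are exposed as fields rather
than free text, a query like this becomes a plain filter. A rough sketch in
Python; the `Recipe` record, its field names and the sample figures are
invented for illustration and are not taken from schema.org or any real site:

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    # Hypothetical record; field names are illustrative, not schema.org's.
    name: str
    calories: int      # per serving
    carbs_g: float     # grams of carbohydrate per serving


def find_recipes(recipes, keyword, max_calories, max_carbs_g):
    """Return recipes whose name contains keyword and that fit both limits."""
    keyword = keyword.lower()
    return [
        r for r in recipes
        if keyword in r.name.lower()
        and r.calories <= max_calories
        and r.carbs_g <= max_carbs_g
    ]


# Made-up sample data.
RECIPES = [
    Recipe("Chicken marsala", 340, 12.0),
    Recipe("Fried chicken", 520, 30.0),
    Recipe("Chicken curry", 350, 25.0),
    Recipe("Lentil soup", 230, 18.0),
]

matches = find_recipes(RECIPES, "chicken", max_calories=350, max_carbs_g=20)
print([r.name for r in matches])
```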

~~~
MatthewPhillips
Enjoy your allrecipes.com dishes.

~~~
juiceandjuice
I usually do.

ranking>=4 and reviews>=50

------
ecaron
They neglected a format for job listings, which is unfortunate. We (LinkUp)
put one together at <http://wp.me/pJYG0-1H>; it will be interesting to see if
it gets any traction and if they're actually seeking external input.

------
netgineer
For a site put together by search engines, the URL structure for the site
search is atrocious. "#q=Product" and not "?q=Product"? Who thought that was a
good idea?

Site also looks a bit like spam. Needs more Firefox-esque awesome graphics, imo.

~~~
blauwbilgorgel
As an SEO, I think that site search structure is a good idea, especially since
they don't employ canonical or noindex/follow on the search pages.

Let us say we both link to <http://schema.org/docs/search_results.html#q=test>
and <http://schema.org/docs/search_results.html#q=product> .

Since it is a URL fragment (the part after the #) we link to, all the link
juice will get consolidated in the search page, which passes it on to the rest
of the site.

If we had linked to <http://schema.org/docs/search_results.html?q=test> and
<http://schema.org/docs/search_results.html?q=product> we would have created
two (low-quality and near-duplicate) pages in the google index.

The same principle applies to pagination. If you can do javascript pagination
#page=2 vs. dynamic pagination ?page=2 you are nearly always better off with
the hash pagination. If you do it right, you get the benefit of a single page,
with the added bonus of being able to bookmark a certain page and having
browser history working.
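
The mechanical part of this is easy to check: the fragment after `#` is
handled by the browser and is not part of the URL that gets requested, whereas
a `?` query string is. A small Python sketch using the standard library's
`urllib.parse.urldefrag`; `requested_url` is a made-up helper name, and whether
a crawler consolidates link weight this way is the claim above, not something
the code demonstrates:

```python
from urllib.parse import urldefrag


def requested_url(url):
    """Strip the fragment, leaving the URL a client would actually request."""
    return urldefrag(url).url


hash_links = [
    "http://schema.org/docs/search_results.html#q=test",
    "http://schema.org/docs/search_results.html#q=product",
]
query_links = [
    "http://schema.org/docs/search_results.html?q=test",
    "http://schema.org/docs/search_results.html?q=product",
]

# Both fragment links collapse to one URL; the query links stay distinct.
print({requested_url(u) for u in hash_links})
print({requested_url(u) for u in query_links})
```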

------
callmeed
What's going to stop people from gaming this by doing things like adding fake
5-star reviews to their website? (especially brick and mortar stores that show
up in google maps/places)

------
baconner
Putting aside the technology and schema decisions they've made, IMO it's great
to see these three throwing their weight behind some common metadata even if
it does step on some toes.

Now if they'd only add some schema targeted towards downloadable public data
sets. I'm dying for a good global public dataset search beyond competing data
markets and data.gov.* sites.

~~~
Maxious
They should start with <http://wiki.ckan.net/Schema_for_Packages> for datasets

There's a lot on schema.org for social media websites/business lookup but
there should be more for open data. I was looking for a linked data schema to
represent financial transactions (X paid Y $999 for Z) but schema.org only
goes so far as Sales. XBRL explicitly states it is not for "A transaction level
activity".

------
fbnt
I thought that snatching the type of content of my pages should be their job.
I'm so naive.

No problem, I'll add a few kbytes to every single page of my sites, so I can
replicate the information I've already stated in a number of sitemaps, video
sitemaps, headers and XML files.

P.S: I'm not against standardization at all, I'm just saying, this comes a bit
late.

------
ericmoritz
Whatever happened to HTML5's data-* attributes?

------
Kilimanjaro
Nothing an xml data island could not have solved. Even an external xml data
island with internal references for better performance and less clutter. Even
an external JSON data island would have been better for web consumption.

Microformats? I'll pass.

~~~
Kilimanjaro
Ok, here is my proposal. Use a link to an external resource with all the
information you want attached to that page like:

    
    
        <link rel="data" type="data/json" href="http://example.com/recipes/chicken.js" />
        <link rel="data" type="data/xml" href="http://example.com/recipes/chicken.xml" />
    

The resource can be cached, served static or even included in the page inside
a <script data> tag

Here is the html:

    
    
        <div itemid="1234">Chicken marsala</div>
        <div itemid="1235">Fried chicken</div>
        <div itemid="1236">Chicken curry</div>
    
    

Here is the data island in json:

    
    
        {
          "head": {
            "title": "",
            "source": "",
            "version": ""
          },
          "items": [
            {
              "id": "1234",
              "type": "recipe",
              "title": "Chicken marsala",
              "ingredients": "here..."
            },
            {
              "id": "1235",
              "type": "recipe",
              "title": "Fried chicken",
              "ingredients": "here..."
            }
          ]
        }
    

Here is the data island in xml:

    
    
        <data>
          <head>
            <title>here</title>
            <source></source>
            <version></version>
          </head>
          <items>
            <item id="1234" type="recipe">
              <title>chicken marsala</title>
              <ingredients>here...</ingredients>
            </item>
            <item id="1235" type="recipe">
              <title>fried chicken</title>
              <ingredients>here...</ingredients>
            </item>
          </items>
        </data>
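
For what it's worth, consuming such an island would be a simple join on
`itemid`. A sketch in Python, assuming the hypothetical JSON layout proposed
above (written here as strict JSON with double quotes); `index_items` and the
sample data are illustrative only:

```python
import json

# Strict-JSON version of the proposed (hypothetical) data island.
ISLAND = """
{
  "head": {"title": "", "source": "", "version": ""},
  "items": [
    {"id": "1234", "type": "recipe", "title": "Chicken marsala",
     "ingredients": "here..."},
    {"id": "1235", "type": "recipe", "title": "Fried chicken",
     "ingredients": "here..."}
  ]
}
"""


def index_items(island_text):
    """Parse the island and index its items by id for lookup from the page."""
    data = json.loads(island_text)
    return {item["id"]: item for item in data["items"]}


items_by_id = index_items(ISLAND)

# itemid values as they appear on the page's <div> elements.
for itemid in ["1234", "1235", "1236"]:
    item = items_by_id.get(itemid)
    print(itemid, item["title"] if item else "(no data in island)")
```

Note that the page lists an item (1236) that the island does not describe, so
a consumer has to tolerate missing entries.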

~~~
ericmoritz
Actually that's what rel="alternate" is used for: an alternative
representation of the current page. Something like this could be done:

    
    
      <link rel="alternate" type="application/event+json" href="http://example.com/events/2010/06/03/schweet.json" />

~~~
ericmoritz
Actually, after reading the microdata spec, there's an
application/microdata+json format that would probably work better:

<http://dev.w3.org/html5/md/#application-microdata-json>

So you'd have an alternate resource

    
    
      <link rel="alternate" type="application/microdata+json" href="http://example.com/events/2010/06/03/schweet.json" />
    

That file would look like this:

    
    
       {
        "items": [
        {
          "id": "http://example.com/events/2010/06/03/schweet",
          "type": "http://schema.org/Event",
          "properties": {
            "startDate": ["2010-06-03"],
            "location": [{
              "id": "http://example.com/places/my-crib",
              "type": "http://schema.org/Place",
              "properties": {
                "url": ["http://example.com/places/my-crib"],
                "address": [{
                  "type": "http://schema.org/PostalAddress",
                  "properties": {
                    "addressLocality": ["Knoxville"],
                    "addressRegion": ["TN"]
                  }
                }]
              }
            }]
          }
        }
        ]
       }
    

Who knows if Google will actually use that file though.

------
MatthewPhillips
So Google both ranks on page speed and encourages you to double your bits by
adding a lot of cruft to your html. Wonderful.

~~~
andrenotgiant
At this point, page speed is affected by things like HTTP requests and
JavaScript; something as insignificant as a couple of kilobytes of compressible
text would have an impact measured in microseconds.

~~~
robrenaud
Maybe your pipes are a lot fatter than mine.

kb/us == mb/ms == gb/s

On the other hand, I agree that even a kilobyte of extra data to have
fundamentally better search result pages is a big win for everyone.
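
The arithmetic behind both comments, as a quick Python sketch. It assumes an
uncompressed 2 kB of extra markup and two illustrative link speeds, and it
ignores latency, compression and TCP behaviour, so treat the results as orders
of magnitude only:

```python
def transfer_time_us(size_bytes, bytes_per_second):
    """Time to push size_bytes through a link, in microseconds."""
    return size_bytes / bytes_per_second * 1_000_000


EXTRA_MARKUP = 2_000  # bytes; assumed size of the added microdata

# 1 kB/us is the same rate as 1 MB/ms and 1 GB/s.
GB_PER_S = 1_000_000_000
# A more ordinary connection: 1 MB/s (roughly 8 Mbit/s), an assumed figure.
MB_PER_S = 1_000_000

print(transfer_time_us(EXTRA_MARKUP, GB_PER_S))
print(transfer_time_us(EXTRA_MARKUP, MB_PER_S))
```

So the cost is about 2 microseconds only on a 1 GB/s pipe; on the assumed
1 MB/s connection the same markup takes about 2 milliseconds, which is still
small next to an extra HTTP request.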

------
mgkimsal
Seems that if you use this, your documents won't be able to be considered
'valid' by validators (tested on the w3 validator). Unless, perhaps, you just
mark your doctype as html and be done with it?

~~~
akmiller
Exactly, I wonder why they wouldn't incorporate the data-* attributes to help
describe this data AND conform to HTML5 specifications.

~~~
wycats
Looks like they're using HTML5 microdata: <http://dev.w3.org/html5/md/>

