HTML5 Microdata, Schema.org and Rich Snippets – Explained and a Free Tool
Rich snippets are all the rage these days. Ever since Google started enhancing its search results with these extra tidbits of information, everyone has been rushing to update their web sites with the metadata that enables them. So what is the benefit of having a “rich” search result for your site? Good question. Other than giving the searcher a little extra detail, I suppose there’s also a subtle psychological factor at work: someone might be more inclined to click on a search result that has a 5-star rating and a friendly face than on one that doesn’t. Plus, they’re just plain cool. Who doesn’t want to add bling to their search results? But this only scratches the surface. There’s much more to them than that.
Instant information aggregation: It’s only a matter of semantics
Rich Snippets, as Google calls them, are actually generated from semantic markup. The idea of marking up a document with meta information for the benefit of machines is not new. Metadata is nearly as old as information technology itself. For example, a Word document contains metadata about its author, and a digital photo contains metadata about the camera it was taken with. You might, for instance, store your digital snapshots in a photo archiving program that uses this metadata to filter your photos by date taken, lens type, flash used, etc. In essence, metadata is data about data.
It should be clear, then, how this “data about data” can be extremely useful to search engines. It lets a search engine derive meaning from a document’s meta information rather than relying purely on the abstract, human-understandable concepts within the text of the document. Searches can become less about keywords in text documents and more about relationships between semantic data types.
To illustrate this point further, consider the following search: Find all restaurants with a 3.5-star or better rating on the Las Vegas Strip that specialize in Italian OR Mexican cuisine AND are open after 11 PM on Sunday nights AND do NOT require reservations. On the semantic web, rather than a list of links to restaurant web sites that may or may not match your criteria, you might get a list of “restaurant result objects” that DO match exactly those criteria, and you might never even have to visit a restaurant’s web site. This is where the real power of semantic data lies: instant information aggregation.
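As a rough illustration, assume each restaurant has already been extracted from its markup into a structured record. The `Restaurant` fields, the `late_night_italian_or_mexican` function and the sample data below are all hypothetical; the point is only that, once the data is typed, the query above becomes a simple filter rather than a keyword match.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Restaurant:
    # Hypothetical, simplified record; real schema.org Restaurant data carries
    # many more properties (address, openingHours, servesCuisine, ...).
    name: str
    rating: float
    area: str
    cuisines: set[str]
    sunday_close_hour: int        # 24-hour clock; 24 = midnight, 25 = 1 AM Monday
    requires_reservations: bool


def late_night_italian_or_mexican(restaurants: list[Restaurant]) -> list[Restaurant]:
    """3.5+ stars, on the Strip, Italian OR Mexican, open past 11 PM Sunday,
    no reservations required."""
    return [
        r for r in restaurants
        if r.rating >= 3.5
        and r.area == "Las Vegas Strip"
        and bool(r.cuisines & {"Italian", "Mexican"})
        and r.sunday_close_hour > 23
        and not r.requires_reservations
    ]


# Made-up sample data, for illustration only.
sample = [
    Restaurant("Trattoria A", 4.2, "Las Vegas Strip", {"Italian"}, 24, False),
    Restaurant("Cantina B", 3.8, "Las Vegas Strip", {"Mexican"}, 22, False),
    Restaurant("Osteria C", 4.6, "Las Vegas Strip", {"Italian"}, 26, True),
    Restaurant("Taqueria D", 3.6, "Las Vegas Strip", {"Mexican", "Tex-Mex"}, 25, False),
    Restaurant("Bistro E", 4.9, "Downtown", {"Italian"}, 24, False),
]
print([r.name for r in late_night_italian_or_mexican(sample)])
```

Every condition in the query maps to one typed comparison, which is exactly what a keyword search over free text cannot do reliably.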
This “semantic web” is not a new idea either. In fact, Tim Berners-Lee himself envisioned the World Wide Web as a kind of “Semantic Network Model”, and even the earliest HTML specifications included the concept of meta tags, which you are undoubtedly familiar with. Later iterations, such as XHTML, took this idea a step further. Most notable is the RDFa specification, which has been around for quite some time.
These later metadata specifications brought with them the concept of a hierarchical type system. Within these type systems each data type (i.e., an abstract representation of a real-world object) might have any number of subtypes. So you might have a base-level abstraction such as “Thing”, which has a derived type such as “Business”, then a further derived type such as “Store”, and further still a more specific type of store such as “Book Store.”
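To make the idea concrete, here’s a small Python sketch of such a hierarchy. The type names are as they appear on schema.org, where the “Business” above corresponds to LocalBusiness; the slice is copied by hand for illustration, so check schema.org for the authoritative hierarchy. The `PARENTS` table and the `ancestors`/`is_a` helpers are hypothetical names, not part of any library.

```python
from __future__ import annotations
from collections import deque

# A hand-copied slice of the schema.org type hierarchy: type -> parent types.
# LocalBusiness has two parents (Organization and Place), so the hierarchy is
# really a graph rather than a strict tree.
PARENTS = {
    "Thing": [],
    "Organization": ["Thing"],
    "Place": ["Thing"],
    "LocalBusiness": ["Organization", "Place"],
    "Store": ["LocalBusiness"],
    "BookStore": ["Store"],
}


def ancestors(type_name: str) -> list[str]:
    """All supertypes of type_name, nearest first (breadth-first walk)."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(PARENTS[type_name])
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            order.append(parent)
            queue.extend(PARENTS[parent])
    return order


def is_a(type_name: str, candidate: str) -> bool:
    """True if type_name is candidate itself or derives from it."""
    return type_name == candidate or candidate in ancestors(type_name)


print(ancestors("BookStore"))
print(is_a("BookStore", "Organization"), is_a("Organization", "Store"))
```

This is what makes the hierarchy useful to a search engine: a query for any Organization can match an item typed as BookStore without the page ever saying “Organization”.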
The latest and greatest, HTML5, introduces yet another form of semantic web data called Microdata. The vocabulary of types to use with it is documented at http://schema.org. Whether Microdata will supplant the earlier semantic markups is unclear at this point. However, the really interesting thing is that the major search engines jointly developed the schema.org vocabulary and are beginning to standardize on it. For example, Google currently recommends Microdata for its rich snippets.
Adding microdata to your HTML does require a little legwork, but the advantages are likely worth the effort in the long term, as search engines appear to be starting to use it for some really interesting things such as Rich Snippets. I have a feeling we’ll see other uses for it as well as HTML5 really takes hold.
Marking up your pages with Microdata
Adding Microdata to your HTML is a relatively painless process, but you do need to consider the semantic relationships between the data on your page. Essentially, each element you want to specify as an “item” carries an itemscope attribute (which marks where the item begins and ends), usually along with an itemtype URL naming its type. Inside that scope, elements marked with itemprop supply the item’s descriptive properties, which may be either primitive values such as strings, dates, or numbers, or other “items” with their own itemscope and properties. Consider the following example of a “Book.” A Book has an author property, which is a Person item. In addition, the Book has a publisher property, which is an Organization item. Thus a single item is related to two other data types.
<div class="listItem" itemprop="itemListElement" itemscope itemtype="http://schema.org/Book">
  <!-- itemprop="itemListElement" assumes this Book sits inside an enclosing ItemList item (not shown); drop it if the Book stands alone, since spec-following parsers do not treat an item with itemprop as a top-level item -->
  <img itemprop="image" src="footfall.jpg" alt="Footfall" />
  <h3 itemprop="name">Footfall</h3>
  <div class="arating" itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">
    <span itemprop="ratingValue">3.9</span> stars based on
    <span itemprop="reviewCount">25</span> reviews
  </div>
  <p itemprop="description">They first appear as a series of dots on astronomical plates, heading from Saturn directly toward Earth...</p>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    By <span itemprop="name">Larry Niven</span>
  </span>
  and
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    <span itemprop="name">Jerry Pournelle</span>
  </span>
  <div class="pubInfo" itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
    <strong>Published by:</strong> <span itemprop="name">Del Rey Books</span>
  </div>
</div>
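Stripped of its HTML, the Book markup above boils down to a small tree of items. Here’s a Python sketch of that structure, purely for illustration: the `Item` class and the `add`/`person` helpers are hypothetical names, and since microdata itself hands you property values as text, every non-item value is a plain `str` here.

```python
from __future__ import annotations
from dataclasses import dataclass, field

SCHEMA = "http://schema.org/"


@dataclass
class Item:
    """One microdata item: an itemtype URL plus named properties.

    A property can repeat (a Book may have several authors), so each name maps
    to a list; every value is either plain text or another nested Item.
    """
    itemtype: str
    properties: dict[str, list[str | Item]] = field(default_factory=dict)

    def add(self, name: str, value: str | Item) -> Item:
        # Append rather than overwrite, and return self so calls can chain.
        self.properties.setdefault(name, []).append(value)
        return self


def person(name: str) -> Item:
    return Item(SCHEMA + "Person").add("name", name)


book = (
    Item(SCHEMA + "Book")
    .add("name", "Footfall")
    .add("author", person("Larry Niven"))
    .add("author", person("Jerry Pournelle"))
    .add("publisher", Item(SCHEMA + "Organization").add("name", "Del Rey Books"))
)

# List the relationships: which properties point at other items, and of what type.
for prop, values in book.properties.items():
    for value in values:
        if isinstance(value, Item):
            print(f"Book.{prop} -> {value.itemtype}")
```

Walking the nested items like this is essentially how you recover the “Book relates to Person and Organization” picture from the markup.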
What type you actually ascribe to your items depends, of course, on your content. The type hierarchy on schema.org is fairly extensive, but it also allows you to define your own types if you need finer-grained data types than the current spec provides.
As for the rest of the nitty-gritty details, well, the documentation at schema.org is pretty clear, so I won’t rehash it here.
A Microdata Parser
To help with marking up your web pages with HTML5 microdata, and for discovering microdata embedded on other websites, I have developed a schema tool that parses a page and gives you a graphical overview of the data types embedded within the HTML. You can see the relationships between the semantic types on your page, as well as all their properties. Clicking on an object in the diagram displays the full list of properties and the HTML it parsed.
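The core of any microdata parser is fairly small. Here’s a minimal Python sketch using only the standard library’s `html.parser`; it is not the code behind the tool above, and it deliberately simplifies the WHATWG microdata algorithm (no `itemref` or `itemid`, only a few URL-valued elements, whitespace collapsed in text values). `MicrodataParser` and `parse_microdata` are hypothetical names.

```python
from html.parser import HTMLParser

# Elements that never get an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}
# Elements whose itemprop value comes from a URL attribute (simplified subset).
URL_ATTR = {"a": "href", "area": "href", "link": "href",
            "img": "src", "audio": "src", "video": "src", "source": "src"}


class MicrodataParser(HTMLParser):
    """Collects items as nested dicts: {"type": ..., "properties": {...}}."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items (itemscope without itemprop)
        self.stack = []   # one frame per open non-void element

    def _current_item(self):
        # Innermost open element that declares an itemscope.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    @staticmethod
    def _add(item, names, value):
        for name in names:
            item["properties"].setdefault(name, []).append(value)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        names = (attrs.get("itemprop") or "").split()
        owner = self._current_item()
        frame = {"tag": tag, "item": None, "text": None, "names": names, "owner": owner}

        if "itemscope" in attrs:
            item = {"type": attrs.get("itemtype"), "properties": {}}
            frame["item"] = item
            if not names:
                self.items.append(item)          # top-level item
            elif owner is not None:
                self._add(owner, names, item)    # nested item as a property value
        elif names and owner is not None:
            if tag == "meta":
                self._add(owner, names, attrs.get("content", ""))
            elif tag in URL_ATTR:
                self._add(owner, names, attrs.get(URL_ATTR[tag], ""))
            else:
                frame["text"] = []               # value is the element's text

        if tag not in VOID:
            self.stack.append(frame)

    def handle_data(self, data):
        # Text counts toward every open text-valued property.
        for frame in self.stack:
            if frame["text"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not any(f["tag"] == tag for f in self.stack):
            return  # void or stray end tag
        while self.stack:
            frame = self.stack.pop()
            if frame["text"] is not None:
                value = " ".join("".join(frame["text"]).split())
                self._add(frame["owner"], frame["names"], value)
            if frame["tag"] == tag:
                break


def parse_microdata(html):
    parser = MicrodataParser()
    parser.feed(html)
    parser.close()
    return parser.items


markup = """
<div itemscope itemtype="http://schema.org/Book">
  <span itemprop="name">Footfall</span>
  <span itemprop="author" itemscope itemtype="http://schema.org/Person">
    By <span itemprop="name">Larry Niven</span>
  </span>
  <div itemprop="publisher" itemscope itemtype="http://schema.org/Organization">
    <strong>Published by:</strong> <span itemprop="name">Del Rey Books</span>
  </div>
</div>
"""
parsed = parse_microdata(markup)[0]
print(parsed["type"], parsed["properties"]["author"][0]["properties"]["name"])
```

The nested dicts it returns are exactly the item/property tree a diagramming tool needs: each item is a node, and each property whose value is another item is an edge.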
This simple web page is marked up with Microdata: http://tools.webmoves.net/microdata/example/
View it with the microdata parser here:
You can see each Microdata item embedded within the HTML.
Meta Utopia or Meta Baloney?
It seems unclear at this point whether the idea of a truly “semantic web” will ever be fully realized. The idea of being able to search not just for keywords in text documents, but for many different related types of data matching a vast array of criteria holds great potential. However, the skeptic inside me sees a few problems.
- Not all abstract concepts can easily be described within a simple schema. E.g., what is “art”?
- The data has to be abundant. Creating metadata and classifications is extra work with no real immediate benefit. There’s currently no “killer application” that rewards people for adding semantic markup to their pages, so the vast majority won’t see any need to do it.
- The data has to be reliable. Spammers will inevitably exploit it, making the data useless. I already see pages ascribing types to data that doesn’t fit the schema. For the semantic web to function, the data must be trustworthy.
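One cheap defense against mismatched types is to check each item’s properties against what its declared type actually defines. Here’s a Python sketch of such a check, assuming items have already been parsed into a type name plus a dict of properties. The `EXPECTED` and `PARENT` tables are a tiny, hand-picked subset of schema.org’s definitions, and `allowed_properties`/`suspicious_properties` are hypothetical helper names.

```python
from __future__ import annotations

# Tiny, hand-picked subset of schema.org: type -> properties it introduces.
# Real types define many more properties than listed here.
EXPECTED = {
    "Thing": {"name", "description", "image", "url"},
    "CreativeWork": {"author", "publisher", "aggregateRating"},
    "Book": {"isbn", "numberOfPages"},
    "Person": {"email", "jobTitle"},
    "Intangible": set(),
    "Rating": {"ratingValue", "bestRating", "worstRating"},
    "AggregateRating": {"ratingCount", "reviewCount", "itemReviewed"},
}
# Single-parent chains for these types (each eventually reaches Thing).
PARENT = {
    "CreativeWork": "Thing",
    "Book": "CreativeWork",
    "Person": "Thing",
    "Intangible": "Thing",
    "Rating": "Intangible",
    "AggregateRating": "Rating",
}


def allowed_properties(type_name: str) -> set[str]:
    """Properties defined on type_name or inherited from its supertypes."""
    allowed: set[str] = set()
    current: str | None = type_name
    while current is not None:
        allowed |= EXPECTED.get(current, set())
        current = PARENT.get(current)
    return allowed


def suspicious_properties(type_name: str, properties: dict) -> list[str]:
    """Property names the declared type does not define, sorted."""
    return sorted(set(properties) - allowed_properties(type_name))


# A star rating bolted onto a Person; this subset gives Person no aggregateRating.
spammy_person = {"name": ["Joe's Pizza"], "aggregateRating": ["5"]}
print(suspicious_properties("Person", spammy_person))
print(suspicious_properties("Book", {"name": ["Footfall"], "author": ["Larry Niven"]}))
```

A check like this only catches structural misuse. It cannot tell whether a correctly typed 5-star rating is honest, which is the harder half of the reliability problem.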
Ultimately, only time will tell whether the semantic web evolves as its proponents envision. In the meantime, it is definitely worth experimenting with microdata, if only for the rich snippet goodness.