Understanding Microdata in HTML5

By: Brian J. Stewart

Microdata is a powerful new feature in HTML5 that enables web developers to embed machine-readable metadata, or structured data, in a web page. A typical web page contains many fragments of information that a human reader can easily pick out, such as event, product, or contact information. This information is generally stored in relational databases; however, its semantics (meaning) and structure are lost when it is rendered for web browsing. To a machine, the information is all just text without context. Microdata enables developers to annotate any HTML element and create structured data that machines can parse and understand programmatically. It ensures that the meaning and structure of the data are not lost when the information is displayed in a web browser.

 

Microdata Essentials

Before looking at how to leverage microdata, let’s start with the essentials of the Microdata specification. Microdata is an HTML5 alternative to two earlier approaches: Resource Description Framework (RDF) markup, which was based on Extensible Markup Language (XML) and required web pages to be written in Extensible Hypertext Markup Language (XHTML, that is, HTML as well-formed XML), and microformats, which use CSS class names to denote an element’s meaning.

The fundamental data unit in microdata is referred to as an item. A microdata item can contain one or more named properties. A property is an attribute or field which contains specific information.

Microdata items and their properties are defined by annotating HTML elements on a web page, that is, by adding attributes to them. Below are the standard attributes defined in the microdata specification:

  • itemscope – Defines a microdata item that will contain one or more named properties

  • itemtype – A URL identifying the microdata item’s vocabulary (type of data), such as event or product

  • itemid – A global identifier for the microdata item

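  • itemprop – Defines a microdata item property; the value is usually the HTML element’s text content, although for some elements it comes from an attribute instead (for example, src on an img element or href on an a element)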

  • itemref – Enables a microdata item to include properties that are not descendants of the element carrying itemscope; the value is a space-separated list of the id values of the elements that hold those properties
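The two less common attributes, itemid and itemref, can be sketched as follows. This is only an illustration: the book title, author name, ISBN, and the id value book-author are placeholders, and whether a consumer honors itemid depends on the vocabulary in use.

```html
<!-- The item is declared here, but one of its properties lives elsewhere on the page. -->
<div itemscope itemtype="https://schema.org/Book"
     itemid="urn:isbn:978-0-00-000000-2"
     itemref="book-author">
  <!-- A normal descendant property -->
  <span itemprop="name">An Example Book</span>
</div>

<!-- Not a descendant of the item above; pulled in because itemref names its id. -->
<p id="book-author">
  Written by <span itemprop="author">A. N. Author</span>
</p>
```

Because itemref points at the paragraph’s id, the author property is treated as part of the Book item even though the paragraph sits outside the div.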

At the core of microdata is the schema, or item type (referenced in the itemtype attribute). The major search engines, including Bing, Google, and Yahoo, maintain a collection of common schemas (refer to https://www.schema.org/). There is a defined schema for many commonly used entities, such as an organization, person, movie, event, and product. The schemas are hierarchical and support inheritance. For example, a Person is a Thing, as is a Product or an Event. This allows properties to be shared across types and ensures consistency. Organizations should leverage these schemas when possible, even for internal websites. Doing so not only ensures interoperability across websites but also saves the significant effort of developing a well-defined schema from scratch.
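A minimal sketch of schema inheritance, assuming the schema.org Restaurant type and a made-up business (the name, URL, and phone number are placeholders): a Restaurant can use properties defined on each of its ancestor types.

```html
<div itemscope itemtype="https://schema.org/Restaurant">
  <!-- name and url are defined on Thing, the root of the hierarchy -->
  <span itemprop="name">Example Diner</span>
  <a itemprop="url" href="https://example.com/">Website</a>

  <!-- telephone is defined further down the hierarchy, on Organization and Place -->
  <span itemprop="telephone">+1-555-0100</span>

  <!-- servesCuisine is defined on FoodEstablishment, the direct parent of Restaurant -->
  <span itemprop="servesCuisine">American</span>
</div>
```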

For example, in the following code fragment, the outer div element describes a person (http://schema.org/Person) and contains a nested item for the person’s mailing address (http://schema.org/PostalAddress).

<!-- The outer item: a Person -->
<div itemscope itemtype="http://schema.org/Person">
        <div itemprop="givenName">Moe</div>
        <div itemprop="familyName">Szyslak</div>
        <div itemprop="jobTitle">Owner/Bartender</div>
        <!-- A nested item: the value of the address property is itself a PostalAddress -->
        <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
                <div itemprop="streetAddress">123 First Street</div>
                <div itemprop="addressLocality">Springfield</div>
                <div itemprop="addressCountry">USA</div>
                <div itemprop="addressRegion">North Takoma</div>
                <div itemprop="postalCode">49007</div>
        </div>
        <!-- For an img element, the property value is the src URL -->
        <img itemprop="image" src="MSzyslak.jpeg" alt="" title="Picture of Moe Szyslak">
        <p itemprop="description">Moe is the owner of Moe’s Tavern, a popular pub located in Springfield. Moe started the pub in 1989.</p>
</div>

A web page can contain multiple microdata items of the same or different types. Each item needs its own itemscope attribute, with itemtype identifying the item’s type.

Leveraging Microdata

The three most important uses of microdata are:

  • Enhanced Internet searching – improved search effectiveness and information accessibility

  • Improved corporate intranets and portals – improved information accessibility and information leverage for knowledge management systems

  • Enhanced user experience – enabling rich client-side scripting and browser functionality

 

Use 1: Enhanced Internet Searching

 

Microdata can be read and understood by major public search engines, such as Google and Bing, particularly if a standard schema is used (https://www.schema.org/). This improves search effectiveness and makes the information on a website more discoverable and accessible.

Microdata improves search effectiveness by providing context to the search engines that crawl public websites, which helps them index and rank pages. For example, a user searching for developer events in Chicago, IL, is more likely to find an event on a website that marks it up with microdata and the Event schema (https://schema.org/Event) than on a website that presents the same events as a plain HTML list, because the plain list gives the search engine no context.
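To make the contrast concrete, here is a sketch of how one such event might be marked up. The event name, venue, and date are invented for illustration; the types and property names are taken from the schema.org Event, Place, and PostalAddress schemas.

```html
<div itemscope itemtype="https://schema.org/Event">
  <h3 itemprop="name">Chicago Developer Meetup</h3>

  <!-- For a time element, the machine-readable value comes from the datetime attribute -->
  <time itemprop="startDate" datetime="2030-05-14T18:00">May 14, 2030, 6:00 PM</time>

  <!-- The location is a nested Place item, which in turn nests a PostalAddress -->
  <div itemprop="location" itemscope itemtype="https://schema.org/Place">
    <span itemprop="name">Example Conference Center</span>
    <div itemprop="address" itemscope itemtype="https://schema.org/PostalAddress">
      <span itemprop="addressLocality">Chicago</span>,
      <span itemprop="addressRegion">IL</span>
    </div>
  </div>
</div>
```

A crawler reading this markup can tell that "Chicago" is the city of the venue and that the date is the start of an event, which a plain list item cannot convey.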

Because search engines understand microdata, they can add ‘rich snippets’ (additional information and links) to the website listing in the search results. This improves the discoverability and accessibility of information such as product reviews, breadcrumbs, ratings, people, postal addresses, and events by linking directly to the specific pages in which the user might be interested.

For example, when a site uses microdata with the EducationEvent schema (https://schema.org/EducationEvent), the results for a user searching for developer training could include not only a link to the corporate training website but also links to specific courses, including date, cost, and location. This enables the user to discover specific training courses more quickly.

 

Use 2: Improved Corporate Intranets and Portals

 

Most corporate intranet and portal search engines can also leverage microdata to improve search effectiveness, discoverability, and accessibility of information. Traditionally, most internal search engines simply executed text-based searches, indexing and ranking results by the number of occurrences of a specific word. Not only is this inefficient, it also does not enable organizations to quickly see the relationships between data across systems.

Beyond search, another challenge organizations typically face is that data is locked in various internal web systems, each designed to perform specific tasks and automate specific processes. These data silos are a major obstacle for knowledge management systems, which strive to put the right information in front of the right people at the right time (when needed).

Microdata has the potential to be the backbone of next-generation knowledge management, pulling data from various internal websites into a cohesive and comprehensive view. Common vocabularies allow data to be related and linked. For example, next-generation knowledge management could enable organizations to relate customer feedback, supply chain information, promotional material, internal contacts, and design and strategy meetings in real time.

 

Use 3: Enhanced User Experience

 

Microdata is also a catalyst for enhancing the user experience of a website. Because it adds context to information and creates structured data, additional functionality can easily be added to a website to leverage this data. There are two ways to leverage microdata:

  • Rich client-side scripting – Microdata enables rich client functionality previously only available through significant and complex client-side scripting or browser plugins

  • Enhanced browser functionality – Microdata enables web browsers to interact with data contained within web pages in new ways

Rich client-side scripting

Microdata can easily be converted into JavaScript Object Notation (JSON) using a variety of libraries. Early drafts of the microdata specification also defined a Microdata DOM API for accessing microdata from JavaScript, but that API was later dropped and current browsers do not support it; standard DOM methods such as querySelectorAll can read the same attributes. The data can be passed to external web services to add functionality to a website. For example, a map can be added to a website with markers for events (https://schema.org/Event) and locations (https://schema.org/Place). Another example is a link to a third-party provider to find nearby hotel deals for a specific location (https://schema.org/Place). The possibilities for integrating with third-party web services are endless. Microdata coupled with standard DOM methods or jQuery enables integration with minimal code.
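As a rough sketch of the JSON conversion described above, the page below uses only standard DOM methods (querySelectorAll, getAttribute, textContent) to collect the top-level items and log them as JSON. The function names propertyValue, collectProperties, and itemToObject, the example.com URL, and the venue name are illustrative assumptions. The value rules are simplified, and itemref and itemid are ignored.

```html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="utf-8">
  <title>Microdata to JSON sketch</title>
</head>
<body>
  <div itemscope itemtype="https://schema.org/Place">
    <span itemprop="name">Example Conference Center</span>
    <a itemprop="url" href="https://example.com/venue">Venue site</a>
  </div>

  <script>
    // Simplified value rules: nested items become objects, URL-bearing
    // elements yield their resolved URL, everything else yields trimmed text.
    function propertyValue(el) {
      if (el.hasAttribute('itemscope')) return itemToObject(el);
      const tag = el.tagName.toLowerCase();
      if (tag === 'meta') return el.getAttribute('content') || '';
      if (['img', 'audio', 'video', 'source', 'iframe', 'embed', 'track'].includes(tag)) return el.src;
      if (['a', 'area', 'link'].includes(tag)) return el.href;
      if (tag === 'time' && el.hasAttribute('datetime')) return el.getAttribute('datetime');
      return el.textContent.trim();
    }

    // Walk the descendants of an item, recording every itemprop found.
    // The walk does not descend into nested items; propertyValue handles those.
    function collectProperties(root, result) {
      for (const child of Array.from(root.children)) {
        if (child.hasAttribute('itemprop')) {
          const value = propertyValue(child);
          const names = child.getAttribute('itemprop').split(/\s+/).filter(Boolean);
          for (const name of names) {
            (result[name] = result[name] || []).push(value);
          }
        }
        if (!child.hasAttribute('itemscope')) collectProperties(child, result);
      }
      return result;
    }

    // Convert one item element into a plain object.
    function itemToObject(el) {
      return {
        type: (el.getAttribute('itemtype') || '').split(/\s+/).filter(Boolean),
        properties: collectProperties(el, {})
      };
    }

    // Top-level items have itemscope but are not themselves a property of another item.
    const items = Array.from(document.querySelectorAll('[itemscope]:not([itemprop])'))
      .map(itemToObject);
    console.log(JSON.stringify({ items: items }, null, 2));
  </script>
</body>
</html>
```

The resulting object could then be handed to a mapping or booking service. Each property maps to an array because a single item may repeat a property.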

 

Enhanced browser functionality

Over time web browsers will increasingly add functionality to take advantage of microdata.

For example, an appointment could be added to the user’s calendar when the user clicks an Event (https://schema.org/Event) and the browser recognizes the item type. Likewise, a contact could be added to the user’s address book when the user clicks a Place (https://schema.org/Place), Organization (https://schema.org/Organization), Local Business (https://schema.org/LocalBusiness), Restaurant (https://schema.org/Restaurant), or Person (https://schema.org/Person).

As web browsers become more sophisticated, microdata will be increasingly leveraged to enable data sharing and interfacing with Office applications and other core applications to create letters to a business, presentations, or spreadsheets.

Web browser vendors will likely also use microdata to sell other products and services or to tie into their own web services to create ‘smart browsing’. For example, if a web browser recognizes a Movie (https://schema.org/Movie), Music (https://schema.org/MusicRecording), or Book (https://schema.org/Book), the user could right-click the item and be offered options to buy the movie, song or album, or book. The integration possibilities are extensive.

Microdata is Powerful!

Microdata has the potential to revolutionize search effectiveness, information discoverability, and data sharing and linking. Businesses will find ways to integrate data spread across multiple internal web systems to improve productivity and leverage information. Microdata will help break down data silos and empower next-generation knowledge management systems and portals. As microdata is more widely adopted, web browsers will become ‘smart browsers’ and provide actions to leverage recognized information. Microdata has the potential to do what Resource Description Framework (RDF) and microformats failed to accomplish: create a web of data.

Additional Sources

  1. World Wide Web Consortium (W3C) HTML Microdata – W3C specification for HTML5 microdata

  2. World Wide Web Consortium (W3C) Semantic Web – W3C Semantic Web Standards site containing standards, vision, and related information for a semantic web, or web of data

  3. Schema.org – Website containing standard microdata schemas. It is highly recommended that developers use the schemas on this website, even if not all fields apply. This ensures the data is recognizable to “smart browsers” and other future data integrations.