By: Brian J. Stewart
The cornerstone of any Enterprise Document Management Strategy is an effective taxonomy and metadata model. According to the Merriam-Webster Dictionary, a taxonomy from a biology perspective is “the process or system of describing the way in which different living things are related by putting them in groups”. A taxonomy from a document management or content management perspective is the process of classifying content into groups. Each group has its own unique characteristics, metadata model, content producers and content consumers.
For example, a Work Instruction or Standard Operating Procedure belongs to a specific Department and relates to a specific System, Equipment, or Process. Work Instructions and Standard Operating Procedures are created, reviewed, and approved by specific users and read by users whose operational responsibilities require them.
An effective taxonomy and metadata model design is dependent on a methodical approach. Below are the high-level steps to an effective design:
Define clear objectives and goals
An effective taxonomy and metadata model must be designed with specific business objectives and goals in mind. To drive real business value and results, the goals must be more than just ‘organizing’ content. Below are a few common objectives around which the design can be framed:
Improve content accuracy by ensuring the content is reviewed by the appropriate individuals
Improve content consistency by ensuring similar documents leverage a common set of document templates
Ensure legal departments can find all content related to specific litigation without manually sifting through volumes of content
Ensure regulatory departments can respond to specific regulatory inquiries
Increase the discoverability of content to drive operational efficiencies
Increase productivity through business process automation
Facilitate the integration of enterprise systems to drive operational efficiencies
Classify content types
The first step in classifying content is to identify the types of content. The content classification process requires striking the right balance in the number of types. Too many content types lead to a taxonomy that is too granular, while too few lead to a taxonomy that is too broad. Either extreme undermines the effectiveness of the taxonomy. For example, segregating content into hundreds of thousands of types might make sense from a scientific perspective, but it will adversely affect the productivity of both content consumers and content producers. Specifically, it will create uncertainty for content producers when they select the content type and attribute values for a new document. Any uncertainty when producing content inevitably results in inconsistencies and misclassification, which make it more difficult for content consumers to locate all relevant content. In contrast, if the taxonomy is too broad (i.e., it has a small number of types), content consumers, such as legal and regulatory departments, will struggle to ensure they have located all relevant content or will need to sift through records to find what is needed.
To strike this balance, the candidate content types should be analyzed to determine which are merely subtypes of a common type and which are truly unique content types. For example, a Packaging Insert and a Patient Information Leaflet are really just different kinds of Labeling Document with the same or similar attributes but a different Labeling Type. Likewise, a Standard Operating Procedure is a Standard Operating Procedure regardless of the Department that owns it. There is no reason to have separate IT Standard Operating Procedure and Manufacturing Standard Operating Procedure content types; an attribute such as Department Owner is typically sufficient. Document types should not be organized by owner (department), but rather by whether each type has a unique metadata model.
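One way to apply the “unique metadata model” rule during analysis is to group candidate types whose proposed attribute sets are identical. The Python sketch below assumes candidates are captured as a simple mapping of type name to attribute names; the candidate names and attributes are illustrative, not a recommended model. Any group with more than one member is a single content type in disguise, and its members become values of an attribute such as Department Owner or Labeling Type.

```python
from collections import defaultdict

# Hypothetical candidate content types and their proposed attributes.
CANDIDATES = {
    "IT Standard Operating Procedure": {"Title", "Department Owner", "System", "Effective Date"},
    "Manufacturing Standard Operating Procedure": {"Title", "Department Owner", "System", "Effective Date"},
    "Packaging Insert": {"Title", "Product", "Market", "Labeling Type"},
    "Patient Information Leaflet": {"Title", "Product", "Market", "Labeling Type"},
    "Work Instruction": {"Title", "Department Owner", "Equipment", "Process"},
}


def group_by_metadata_model(candidates: dict[str, set[str]]) -> list[list[str]]:
    """Group candidate type names that share exactly the same attribute set.

    Each multi-member group is one content type; its members should become
    values of a distinguishing attribute rather than separate types.
    """
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for name, attributes in candidates.items():
        groups[frozenset(attributes)].append(name)
    # Sort for stable, readable output.
    return sorted(sorted(names) for names in groups.values())


for group in group_by_metadata_model(CANDIDATES):
    print(group)
```

Exact matching is deliberately strict here; near-identical models are a separate judgment made after the metadata model is drafted.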
Define the metadata model (attributes/fields) for each content type
The next step in the process is to define the metadata model, or the attributes, for each content type. This involves defining each attribute’s name, label, data type, and whether it is single- or multi-valued. It is also important to identify the source of dictionary values for each selection list.
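A metadata model can be captured in a structured form so it can be reviewed and later loaded into a system. The sketch below is a minimal, hypothetical Python representation of the attribute properties listed above (name, label, data type, single or multi-valued, dictionary source); the attribute names and source identifiers such as `department_master` are assumptions for illustration, not the API of any particular document management product.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class DataType(Enum):
    STRING = "string"
    DATE = "date"
    INTEGER = "integer"
    BOOLEAN = "boolean"


@dataclass(frozen=True)
class AttributeDefinition:
    name: str                  # internal (system) name, e.g. "department_owner"
    label: str                 # label shown to users, e.g. "Department Owner"
    data_type: DataType
    multi_valued: bool = False
    # Source of selection-list values (hypothetical identifier), or None for free entry.
    dictionary_source: Optional[str] = None


# Hypothetical model for a Standard Operating Procedure content type.
SOP_MODEL = [
    AttributeDefinition("title", "Title", DataType.STRING),
    AttributeDefinition("department_owner", "Department Owner", DataType.STRING,
                        dictionary_source="department_master"),
    AttributeDefinition("related_systems", "System", DataType.STRING,
                        multi_valued=True, dictionary_source="it_asset_register"),
    AttributeDefinition("effective_date", "Effective Date", DataType.DATE),
]


def selection_lists(model: list[AttributeDefinition]) -> dict[str, str]:
    """Return each selection-list attribute and where its values come from."""
    return {a.name: a.dictionary_source for a in model if a.dictionary_source}


def check_value(attr: AttributeDefinition, value) -> bool:
    """True if the value's shape matches the single/multi-valued setting."""
    return isinstance(value, list) if attr.multi_valued else not isinstance(value, list)


print(selection_lists(SOP_MODEL))
```

Listing `selection_lists(SOP_MODEL)` gives a quick checklist of the dictionaries that must be sourced before build.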
Too often the metadata model is focused only on what is needed to locate documents through browse and search. Although these attributes are indeed important, the attributes beyond those for locating content typically offer the greatest opportunity to drive business value. Specifically, the following categories of attributes are critically important to consider, even though they generally require more business and technical analysis:
Attributes for automating business processes and integrating systems – This category includes attributes that are required to automate a business process, such as identifying the reviewers and approvers of a document, or to propagate metadata from one system to another. It is important to carefully consider the integration points and the data contract requirements.
Attributes for establishing content relationships – This category includes attributes that are required to link content to other records within a repository or in an external system. It is important to analyze how content relates to other content and to other business information.
Attributes for facilitating reporting and driving business intelligence – This category includes attributes that are required to generate meaningful business reports or to feed big data analytics. It is important to consider what insights and metrics business users hope to gain, as well as to identify the data sources and relationships required to meet those objectives.
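To illustrate the first category, the sketch below derives the reviewers and approvers of a document from its content type and Department Owner attribute instead of having a user pick them each time. The routing table, role names, and the `related_equipment` relationship attribute (with a made-up equipment ID) are hypothetical; a real system would configure these in its workflow engine.

```python
# Hypothetical routing rules keyed by (content type, Department Owner value).
ROUTING_RULES = {
    ("Standard Operating Procedure", "Manufacturing"): {
        "reviewers": ["Manufacturing Supervisor", "Quality Assurance"],
        "approvers": ["Manufacturing Director"],
    },
    ("Standard Operating Procedure", "IT"): {
        "reviewers": ["IT Manager"],
        "approvers": ["IT Director"],
    },
}


def route(metadata: dict) -> dict:
    """Derive the review/approval participants from a document's attributes."""
    key = (metadata["content_type"], metadata["department_owner"])
    if key not in ROUTING_RULES:
        # A missing rule signals a gap in the metadata model or routing design.
        raise KeyError(f"No routing rule for {key}")
    return ROUTING_RULES[key]


doc = {
    "content_type": "Standard Operating Procedure",
    "department_owner": "IT",
    "related_equipment": ["EQ-0042"],  # relationship attribute (hypothetical ID)
}
print(route(doc))
```

The same attributes that drive routing can also be grouped and counted for reporting, which is why they are worth defining carefully up front.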
After developing the initial metadata model, it is important to revisit the content types. The metadata model may reveal opportunities to consolidate or reclassify content types. Two or more content types should be consolidated if their metadata is nearly identical, even if not every attribute applies to every type. In this scenario, it is better to customize the user interface to hide the attributes that don’t apply than to create separate content types.
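When deciding whether metadata is “nearly identical”, a simple overlap measure such as Jaccard similarity (shared attributes divided by all attributes across both types) can flag consolidation candidates for discussion. In the sketch below, the attribute sets and the 0.8 threshold are assumptions chosen for illustration; the score is only a prompt for business review, not a decision rule.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of attributes two types have in common (1.0 = identical models)."""
    return len(a & b) / len(a | b) if a | b else 1.0


def consolidation_candidates(types: dict[str, set[str]], threshold: float = 0.8):
    """Yield pairs of content types whose metadata models are nearly identical.

    The default threshold is an illustrative assumption.
    """
    for (name_a, attrs_a), (name_b, attrs_b) in combinations(sorted(types.items()), 2):
        score = jaccard(attrs_a, attrs_b)
        if score >= threshold:
            yield name_a, name_b, round(score, 2)


# Hypothetical attribute sets for three content types.
TYPES = {
    "Standard Operating Procedure": {"Title", "Department Owner", "System",
                                     "Equipment", "Process", "Effective Date"},
    "Work Instruction": {"Title", "Department Owner", "System", "Equipment",
                         "Process", "Effective Date", "Parent Procedure"},
    "Labeling Document": {"Title", "Product", "Market", "Labeling Type"},
}

print(list(consolidation_candidates(TYPES)))
```

With these made-up sets, only the first two types are flagged (6 shared attributes out of 7); if they were consolidated, the user interface would hide “Parent Procedure” where it does not apply.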
Lastly, the metadata model should be reviewed in the context of other systems to eliminate data redundancy across systems where possible. It is essential not to capture the same data independently in more than one system, as doing so inevitably results in data inaccuracy and user productivity issues. However, if redundant attributes are truly required in two or more systems, it is best to automate the propagation of data from one system to the other to eliminate duplicate data entry. Attributes populated automatically by downstream processes improve data quality while also meeting taxonomy goals and objectives.
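The sketch below shows one hedged interpretation of automated propagation: the other system is treated as the system of record, and mapped fields are copied onto the document's metadata rather than re-keyed by users. The field mapping, the equipment-master record, and its values are hypothetical.

```python
# Hypothetical mapping: document attribute -> field in the system of record.
FIELD_MAP = {
    "equipment_description": "description",
    "equipment_location": "location",
}


def propagate(document: dict, source_record: dict, field_map: dict = FIELD_MAP) -> list[str]:
    """Copy mapped fields from the system of record onto the document.

    The source system is authoritative, so users never re-key these values.
    Returns the names of document attributes whose values changed.
    """
    changed = []
    for doc_attr, source_field in field_map.items():
        new_value = source_record.get(source_field)
        if document.get(doc_attr) != new_value:
            document[doc_attr] = new_value
            changed.append(doc_attr)
    return changed


equipment_master = {"id": "EQ-0042", "description": "Tablet Press 3", "location": "Building B"}
doc = {"equipment_id": "EQ-0042", "equipment_description": "Tablet press #3 (old)"}
print(propagate(doc, equipment_master))
print(doc)
```

Because the function only writes when values differ, it can be run repeatedly (for example on a schedule or on a change event) without creating spurious updates.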
Test taxonomy and metadata model
The last step in defining an effective taxonomy and metadata model is to test the proposed design with real business data. Too often this step is deferred until after system development. Completing this step during informal or formal testing is too late in the software development lifecycle. Rather than face expensive modifications and project delays due to taxonomy and metadata model design changes later in the project, it is better to do the due diligence upfront and validate the design prior to system development.
To test the taxonomy and metadata model:
Create a spreadsheet with a tab for each content type; on each tab, put the file name in the first column of each row and add a column for each attribute.
Have business representatives populate the metadata for each document.
Verify system-populated attributes (whether populated by the current system or an external system) to ensure sufficient data is available to populate each field programmatically.
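If each tab of the test spreadsheet is exported to CSV, a short script can flag empty required attributes and values that are not in the selection-list dictionaries before the results are reviewed with the business. This Python sketch uses only the standard `csv` module; the picklists, required attributes, and sample rows are assumptions for illustration.

```python
import csv
import io

# Hypothetical picklist dictionaries for selection-list attributes.
PICKLISTS = {
    "Department Owner": {"IT", "Manufacturing", "Quality Assurance"},
    "Labeling Type": {"Packaging Insert", "Patient Information Leaflet"},
}


def check_tab(csv_text: str, required: set[str]) -> list[str]:
    """Check one content-type tab exported as CSV.

    Assumes the first column holds the file name and every other
    column is an attribute.
    """
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    file_col, *attributes = reader.fieldnames
    for row in reader:
        name = row[file_col]
        for attr in attributes:
            value = (row.get(attr) or "").strip()
            if not value:
                if attr in required:
                    problems.append(f"{name}: '{attr}' is empty")
            elif attr in PICKLISTS and value not in PICKLISTS[attr]:
                problems.append(f"{name}: '{value}' is not a valid {attr}")
    return problems


sop_tab = """File Name,Title,Department Owner,Effective Date
SOP-001.docx,Change Control,Quality Assurance,2023-01-15
SOP-002.docx,Backup and Restore,Information Technology,
"""

for issue in check_tab(sop_tab, required={"Title", "Department Owner", "Effective Date"}):
    print(issue)
```

An out-of-list value such as “Information Technology” may mean the representative made an error, or that the picklist itself needs revision; either way it is a finding worth discussing before development.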
The sample size for each content type should be as large as possible. For global systems, business owners from multiple countries and regions should participate in this activity. In addition, if the system is multi-departmental, representatives from each department should participate. Including all business owners facilitates the identification and resolution of issues with the taxonomy or metadata model. All feedback from business owners should be analyzed to revise or fine-tune the design.
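When more than one representative classifies the same sample documents, comparing their answers is one way to analyze feedback: attributes on which representatives from different regions or departments frequently disagree point to ambiguous definitions or picklists, the kind of producer uncertainty that leads to misclassification. The sketch below assumes each classification is recorded as a dictionary with `file` and `reviewer` keys; the sample data is made up.

```python
from collections import defaultdict


def attribute_agreement(classifications: list[dict]) -> dict[str, float]:
    """Per attribute, the share of multiply-classified documents on which
    all representatives chose the same value."""
    reviewers = defaultdict(set)                    # file -> representatives
    values = defaultdict(lambda: defaultdict(set))  # attribute -> file -> distinct values
    for record in classifications:
        reviewers[record["file"]].add(record["reviewer"])
        for attr, value in record.items():
            if attr not in ("file", "reviewer"):
                values[attr][record["file"]].add(value)
    # Only documents classified by two or more representatives are comparable.
    shared = {f for f, r in reviewers.items() if len(r) > 1}
    result = {}
    for attr, per_file in values.items():
        files = [f for f in per_file if f in shared]
        if files:
            result[attr] = sum(len(per_file[f]) == 1 for f in files) / len(files)
    return result


samples = [
    {"file": "WI-101.docx", "reviewer": "US", "Department Owner": "Manufacturing", "Process": "Granulation"},
    {"file": "WI-101.docx", "reviewer": "EU", "Department Owner": "Manufacturing", "Process": "Blending"},
    {"file": "WI-102.docx", "reviewer": "US", "Department Owner": "Quality Assurance", "Process": "Inspection"},
    {"file": "WI-102.docx", "reviewer": "EU", "Department Owner": "Quality Assurance", "Process": "Inspection"},
]
print(attribute_agreement(samples))
```

In this made-up sample, Department Owner is fully consistent while Process agrees on only half the documents, suggesting the Process values need clearer definitions.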
Benefits of an Effective Design
An effective taxonomy and metadata model provides several key benefits, including:
Improves information accessibility and discoverability – Enables business users to locate relevant content more easily.
Facilitates business intelligence through reporting – Enables data to be turned into valuable insights.
Increases user productivity through business process automation – Enables business users to do more in less time by reducing manual tasks.
Ensures compliance with regulatory and legal requirements – Ensures regulatory and legal inquiries can be fulfilled and helps ensure regulations are met.
Provides consistent data models across systems, which lowers support costs – Ensures data consistency across systems, which improves their supportability.
Enables enterprise application integrations – Provides the ability to integrate systems and processes to enhance user productivity.
An Effective Taxonomy Must Be Aligned With the Business Goals
An effective taxonomy and metadata model not only meets content management requirements but is also aligned with clear business objectives and goals. This enables businesses to drive more value from their document management systems. A well-designed taxonomy is neither too granular nor too broad. The metadata model should include attributes for locating documents, establishing content relationships, automating business processes, integrating systems, and facilitating reporting and business intelligence. Furthermore, it is important to test the proposed taxonomy and metadata model with real business data to ensure the design meets the requirements and objectives of all business stakeholders.
Lastly, organizations derive many benefits from an effective taxonomy and metadata model design. The key benefits are improved information accessibility through effective search and discoverability of content, increased business intelligence through meaningful data, and enhanced productivity through business process automation. A successful enterprise taxonomy strategy is critical to driving the most business value from enterprise document management investments.