
Normalization for IT Data Warehouse Success

 

by John Pusey, jpusey@bdna.com

A number of large organizations are realizing they have a plethora of IT repositories: discovery tools, security monitoring tools, configuration management tools, and so on. The data in each of these silos is useful for its own specific purpose, but there is additional value to be realized by combining them into a single centralized IT Data Warehouse. Multiple inventory solutions can coexist within a large organization as a result of growth and acquisition: the UNIX servers are likely managed by a different inventory tool than the configuration management tool used for the Windows workstations; a third-party tool may give deeper insight into database installations; an agentless continuous monitoring tool may have the latest count of assets and their operating systems, but not the installed software.

In this age of analytics and big data, combining these multiple IT sources into a single IT Data Warehouse gives your organization a one-stop shop for all IT and business analysis data needs. Think of it as your "go-to" repository for data lookups when performing internal software audits, contract and support renewal analysis, security vulnerability assessments, provisioning and profiling, and more. This central IT Data Warehouse increases the return on investment for each of the individual tools (e.g., inventory tools, configuration management tools, continuous monitoring tools, database analysis tools, etc.) by making their data available to more people. If all of your various IT reporting and analysis tools are required to provide a feed to the central IT Data Warehouse, you only need to evangelize and advertise that one warehouse to your end users.

When undertaking an IT Data Warehouse or analytics project two often overlooked but critical steps for success are:

1.) Identifying the common data dictionary or IT catalog to which you will normalize and map all of your various IT data sources

2.) Defining and implementing the process for maintaining the catalog and the mappings between the data sources and the catalog

The first step, identifying the data dictionary, plays several roles for the IT Data Warehouse. It provides a common language to all customers of the warehouse, so that everyone shares the same definition of a "Laptop," a "Desktop Publishing Tool," and so on. The dictionary also provides a single software publisher name (or hardware manufacturer name) for all of the instances recorded in the various IT data feeds; this value becomes your organization's standard for that publisher or manufacturer, which in turn simplifies how customers extract data from the warehouse for consumption in their business and IT processes. For example, the SCCM and Altiris data feeds to the warehouse both contain references to software published by Microsoft and Adobe, but the values for these publisher names are not consistent between the two sources, nor even within the same source. By normalizing these feeds before they are loaded into the IT Data Warehouse, you get uniform values for these publishers (i.e., Adobe Systems and Microsoft Corporation): variances in how the different inventory tools (or even the same tool) list the manufacturer or publisher, such as abbreviations or punctuation (e.g., Adobe Inc., Adobe Incorporated., etc.), are normalized out. You can think of the data dictionary as a metadata layer wrapping and unifying the underlying data sources, normalizing their varied values to the common language of the catalog. For large organizations, the IT Data Warehouse could be populated with feeds from several different configuration and asset management tools as well as various security monitoring and inventory tools, not to mention the data feeds from the LDAP/user management, HR, and purchasing systems.
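The publisher-name normalization described above can be sketched in a few lines. This is a minimal illustration, not any vendor's implementation: the canonical-name table, suffix list, and function name are all assumptions for the example.

```python
import re

# Hypothetical mapping from a cleaned-up publisher key to the single
# canonical name used in the warehouse (e.g., values seen in SCCM/Altiris feeds).
CANONICAL_PUBLISHERS = {
    "adobe": "Adobe Systems",
    "microsoft": "Microsoft Corporation",
}

# Corporate suffixes that inventory tools append inconsistently.
SUFFIXES = ("inc", "incorporated", "corp", "corporation", "systems", "ltd")

def normalize_publisher(raw):
    """Return the canonical publisher name for a raw inventory value."""
    # Lowercase, replace punctuation with spaces, drop common suffixes.
    tokens = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    key = " ".join(t for t in tokens if t not in SUFFIXES)
    return CANONICAL_PUBLISHERS.get(key, raw)  # unknown names pass through

# Variants from different feeds collapse to one warehouse value.
for variant in ("Adobe Inc.", "Adobe Incorporated.", "Adobe Systems"):
    print(normalize_publisher(variant))  # each prints "Adobe Systems"
```

In practice the lookup table would be the IT catalog itself, and unmatched values would be flagged for review rather than passed through silently.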
By normalizing the data feeds to a common language at the point of entry to the warehouse, you increase the value of the data for end users by reducing the time and effort required to scrub it into a useful form. You also open up your warehouse to a number of technologies that make the data more accessible. For example, once your data sources are normalized to a common language, you can easily index the results with your enterprise search server (e.g., Autonomy or SOLR) so that non-technical users (i.e., those with no familiarity with SQL) can access the data using basic search techniques similar to search engines like Bing and Google. Your business analysts, or anyone with basic SQL or Excel skills, can also query the data without having to worry about which clauses and filters to include in order to cast a wide enough net to get all of the intended results. Take, for example, the query:

SELECT * FROM Hardware_table WHERE type = 'Laptop'

This is much more straightforward than the query:

SELECT * FROM Hardware_table
WHERE model = 'D620'
   OR model = 'D610'
   OR model = 'E4310'
   OR ...

The second form can be a non-starter for a non-technical user who does not know which models are laptops. It also wastes the time of technical users, who have to continuously track new laptop models in order to maintain the query's filter clauses.
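The difference between the two queries above can be demonstrated end to end with an in-memory database. This is a sketch under assumed table and column names (matching the article's illustrative Hardware_table): the catalog mapping has already stamped each model with a normalized type, so one stable filter clause replaces an ever-growing model list.

```python
import sqlite3

# Build a tiny normalized Hardware_table in memory; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Hardware_table (model TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO Hardware_table VALUES (?, ?)",
    [("D620", "Laptop"), ("D610", "Laptop"),
     ("E4310", "Laptop"), ("OptiPlex 780", "Desktop")],
)

# One stable filter, regardless of how many laptop models the feeds report.
laptops = conn.execute(
    "SELECT model FROM Hardware_table WHERE type = 'Laptop'"
).fetchall()
print([m for (m,) in laptops])  # ['D620', 'D610', 'E4310']
```

When a new laptop model appears in an inventory feed, only the catalog mapping changes; every saved query filtering on `type = 'Laptop'` keeps working.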

However, the dictionary is just part of the equation; it also plays a key role when designing and implementing the process to populate the warehouse. One of the key technical discussions during the design phase will be how to handle the mapping of the data feeds to this data dictionary or IT Catalog. There are several paths you can take when designing a repeatable mapping process:

  • Will it be a manual mapping process you undertake on a recurring basis?
  • Will you try to build something in-house to automate this normalization and mapping?
  • Will you outsource this step of the process to a COTS normalization product (e.g., BDNA Normalize)?
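Whichever path you choose, the core of a repeatable mapping pass looks the same: known raw values resolve to catalog entries, and anything unmatched is queued for review rather than loaded unnormalized. Here is a minimal sketch of the in-house option; the mapping table and feed values are hypothetical.

```python
# Hypothetical mapping table from raw feed strings to catalog entries.
CATALOG_MAP = {
    "MS SQL Svr 2012": "Microsoft SQL Server 2012",
    "Acrobat XI Pro": "Adobe Acrobat XI Pro",
}

def map_feed(records):
    """Split feed records into (raw, catalog) pairs and an unmatched queue."""
    mapped, unmatched = [], []
    for raw in records:
        if raw in CATALOG_MAP:
            mapped.append((raw, CATALOG_MAP[raw]))
        else:
            unmatched.append(raw)  # flag for manual mapping, then re-run
    return mapped, unmatched

mapped, unmatched = map_feed(["MS SQL Svr 2012", "Oracle DB 11g"])
```

The manual, in-house, and COTS options differ mainly in who maintains `CATALOG_MAP` and how the unmatched queue gets worked down, which is exactly the recurring cost the next section weighs.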

In addition to the mapping question, you will need to decide whether to do your custom mapping and normalization against a homegrown or community-driven catalog, or against a commercially maintained IT Catalog. Commercial IT Catalogs often have richer feature sets than just clean, normalized market names for software publishers and hardware manufacturers. For example, they maintain software suite associations and multiple levels of categorization for hardware (e.g., Computer, Network Equipment, Laptop, or Workstation) and software (e.g., Productivity Tool, Desktop Publishing Tool, Web Server, Database, etc.). Many are also extensible, so that custom fields (e.g., SKU, internal packaging names, etc.) can be maintained in the form of a private catalog. The point to consider when deciding whether a commercial IT Catalog is right for your central IT Data Warehouse is how many resources (i.e., headcount and time) you want to put toward maintaining a catalog of IT data and categories, as well as the mappings. The IT marketplace changes constantly, and a commercial IT Catalog vendor will maintain these updates for you as part of the licensing terms. Outsourcing the normalization, mapping, and IT Catalog maintenance to a third-party software vendor frees up your resources for analysis and mining of the data. With a commercial catalog you also benefit from a "crowd-sourcing" effect, since the vendor maintains the catalog for a large number of customers.
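The private-catalog extension idea can be pictured as a catalog entry with a vendor-maintained core plus an organization-specific extension slot. The field names below are assumptions for illustration, not any vendor's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class CatalogEntry:
    """One entry in the IT catalog; field names are illustrative."""
    publisher: str                 # normalized market name
    product: str
    categories: List[str]          # multi-level categorization
    suite: Optional[str] = None    # vendor-maintained suite association
    # Org-specific extensions (SKU, internal packaging names, etc.)
    private: Dict[str, str] = field(default_factory=dict)

entry = CatalogEntry(
    publisher="Microsoft Corporation",
    product="SQL Server 2012",
    categories=["Software", "Database"],
)
entry.private["SKU"] = "INTERNAL-SKU-001"  # hypothetical private-catalog field
```

Keeping the private fields in a separate structure means vendor catalog updates can be applied without disturbing your organization's own annotations.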

To summarize, a critical success factor for your IT Data Warehouse project is not only identifying the appropriate IT Catalog to act as the data dictionary, but also implementing a repeatable, maintainable process for keeping the catalog current and mapping the data feeds to it. Consider outsourcing this step to a COTS product to free up resources for higher-value activities such as analysis. By normalizing the various data feeds against a common IT catalog, you open up your data to a wider range of technologies and users, and making the data more usable for end users is what will drive the long-term success of your IT Data Warehouse project.


BDNA provides the most comprehensive, accurate view into an organization’s IT infrastructure in order to drive critical IT processes and projects around data centers, desktops and enterprise architecture. From IT administrators to managers to executives, BDNA delivers business-relevant information required to accelerate corporate initiatives. BDNA is a Gold sponsor of myITforum.com.
