Data Hubs Boost Business Integration

Here we go again -- circling back to a proven concept that has always been a good idea: the data hub.

Hub-and-spoke architectures are a common, practical design for integrating any complex set of linkages such as telephone networks, airline flight connections or distribution centers.

In similar fashion, data hubs minimize the spaghetti of point-to-point linkages between applications and provide key business process integration points in the enterprise.
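To put rough numbers on the "spaghetti," here is a small Python sketch. It assumes the worst case, in which every application exchanges data directly with every other one. Real point-to-point estates usually fall somewhere below that ceiling.

```python
def point_to_point_links(n_apps: int) -> int:
    """Worst case: one interface for every pair of applications."""
    return n_apps * (n_apps - 1) // 2


def hub_links(n_apps: int) -> int:
    """Hub-and-spoke: one interface per application, each to the hub."""
    return n_apps


for n in (5, 10, 50):
    print(f"{n} apps: {point_to_point_links(n)} point-to-point links vs {hub_links(n)} hub links")
```

The point-to-point count grows roughly with the square of the number of applications, while the hub count grows linearly.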

So what's new? Finally, key suppliers and commercial products are emerging in this area, and we may not have to build these things from scratch anymore.

If you haven't heard the term before, data hubs are part of your transaction-processing architecture. Data hubs are a practical way of integrating key data across multiple transaction-processing applications. Their purpose is to support operational integration across all or a subset of the enterprise. If they integrate the whole enterprise, they are sometimes called enterprise data hubs. Data warehouses perform the same data integration task but for decision-support purposes such as historical trend analysis.

What's a data hub?

What is a data hub? A data hub is organized around the integration of a single type of data, such as customer, product or order data, to be used across multiple systems. It's updated in near real time to provide access to current information for operational purposes. A data hub is therefore subject-oriented, integrated, updated on a regular basis (volatile) and contains the most recent view of the data (current-valued and not archival).
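Here is a minimal in-memory Python sketch of the properties just listed. The `CustomerHub` and `CustomerRecord` names are hypothetical. The hub covers one subject area and holds one current record per customer. It overwrites that record as newer updates arrive from any source system, so no history is kept. Rejecting updates older than the stored one is an assumed policy for out-of-order feeds, not something prescribed here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime


@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    address: str
    updated_at: datetime  # when the source system made the change


class CustomerHub:
    """Hypothetical customer data hub: subject-oriented, current-valued, volatile."""

    def __init__(self) -> None:
        self._records: dict[str, CustomerRecord] = {}

    def upsert(self, record: CustomerRecord) -> bool:
        """Apply an update from any source system, keeping only the newest value.

        Returns False (and changes nothing) if the hub already holds a record
        for this customer that is at least as recent.
        """
        current = self._records.get(record.customer_id)
        if current is not None and current.updated_at >= record.updated_at:
            return False
        self._records[record.customer_id] = record  # overwrite: no history kept
        return True

    def get(self, customer_id: str) -> CustomerRecord | None:
        return self._records.get(customer_id)


hub = CustomerHub()
hub.upsert(CustomerRecord("C1", "Acme", "1 Main St", datetime(2005, 1, 10)))  # e.g. from sales
hub.upsert(CustomerRecord("C1", "Acme", "9 Elm St", datetime(2005, 3, 2)))    # e.g. from service
print(hub.get("C1").address)  # 9 Elm St
```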

One type of data hub provides reference data used across multiple transaction systems. For example, a customer or product data hub can be used to support order processing, distribution and service. But data hubs can also be created for transaction data such as customer orders or supplier purchase orders that may be sourced from a distributed set of applications but still need to be integrated.

Most large corporations already have a few de facto data hubs. In the old days, they may have been called systems of record or master files. However, these legacy data hubs typically do not support the current operational integration needs of the enterprise, because they were designed around very narrow integration requirements rather than the entire enterprise.

Perhaps you have a customer data hub for sales but not service, or a part data hub for manufacturing but not engineering. These de facto legacy data hubs are huge opportunities for creating enterprise data hubs when they are up for renewal.

Enterprise data hubs, at minimum, contain those elements of data necessary to operationally integrate at the enterprise level. For example, if an organization requires aggregated customer credit exposure for operational use across the enterprise, then it would be contained in the enterprise customer data hub. If the enterprise requires global inventory visibility for operational purposes, it would be contained in some sort of inventory data hub.
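Aggregated credit exposure is a good example of an element that only makes sense at the enterprise level, because no single transaction system sees all of it. The Python sketch below assumes each source system can report (customer ID, open exposure) pairs. The source names and amounts are invented. Amounts are kept in whole currency units to avoid floating-point rounding.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_exposure(*sources: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum open credit exposure per customer across all source systems."""
    totals = defaultdict(int)
    for source in sources:
        for customer_id, amount in source:
            totals[customer_id] += amount
    return dict(totals)


# Hypothetical feeds from two transaction systems
order_system = [("C1", 5000), ("C2", 1200)]
leasing_system = [("C1", 3000)]

exposure = aggregate_exposure(order_system, leasing_system)
print(exposure)  # {'C1': 8000, 'C2': 1200}
```

The resulting totals would live in the enterprise customer data hub and be refreshed as the source systems change.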

A highly integrated corporation should have a large number of data hubs; a very decentralized corporation should have a smaller number of data hubs or perhaps none. Watch out for scope creep on your data hubs; it's best to have a clear operational business case for each element of data required in the hub.

Extremely large, complex or global organizations may want to deploy data hubs at the subenterprise level, such as regional customer data hubs. Those hubs can then be linked to an enterprise hub in a hub-to-hub data architecture if enterprise integration is also required for that subject area. In this arrangement, the regional customer data hubs may contain more data elements than the enterprise data hub requires. Hub-to-hub architectures are perhaps more technically complex but less politically complex, because data ownership may be easier to manage.
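A hub-to-hub feed can be sketched as a projection. Each regional hub keeps its full record and forwards only the elements the enterprise needs. The field names below are hypothetical. The sketch also assumes a customer ID is unique across regions. A real design would need a matching rule for customers that appear in more than one region.

```python
from __future__ import annotations

# Hypothetical elements required at the enterprise level; regional hubs may hold more.
ENTERPRISE_FIELDS = ("customer_id", "name", "credit_line")


def to_enterprise(regional_record: dict) -> dict:
    """Keep only the enterprise-level elements of a regional record."""
    return {name: regional_record[name] for name in ENTERPRISE_FIELDS}


def build_enterprise_hub(regional_hubs: dict[str, list[dict]]) -> dict[str, dict]:
    """Merge projected records from every regional hub, tagged with their region."""
    enterprise: dict[str, dict] = {}
    for region, records in regional_hubs.items():
        for record in records:
            projected = to_enterprise(record)
            projected["source_region"] = region
            # Assumes customer IDs do not collide across regions.
            enterprise[projected["customer_id"]] = projected
    return enterprise


regional_hubs = {
    "EU": [{"customer_id": "E1", "name": "Acme GmbH", "credit_line": 40000,
            "tax_id": "T-001", "language": "de"}],
    "APAC": [{"customer_id": "A7", "name": "Acme KK", "credit_line": 25000,
              "tax_id": "T-002"}],
}
enterprise_hub = build_enterprise_hub(regional_hubs)
print(enterprise_hub["E1"])
```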

Data hubs vs. data warehouses

What is the difference between a data hub and a data warehouse? Both the data hub and data warehouse are subject-oriented and integrated. However, the data hub is optimized for operational integration requirements and the warehouse is optimized for analytical integration requirements. Therefore, a data warehouse doesn't need to be updated as often as a data hub. A data hub might be updated hourly, whereas a warehouse might be updated weekly or monthly.

A data warehouse typically contains archival data such as a customer's address changes over time or a history of all their order activity for the past 10 years. A customer data hub would likely only contain the customer identification, current address and a few additional elements to be used in transaction processing, and an order data hub would likely only contain open orders that haven't been shipped yet.
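The difference in what each store keeps can be shown by feeding the same stream of address changes to both. This is an illustrative Python sketch with made-up data. It assumes changes arrive in effective-date order.

```python
def apply_changes(changes):
    """Apply (customer_id, address, effective_date) changes to a hub and a warehouse."""
    hub = {}        # current-valued: customer_id -> latest address only
    warehouse = []  # archival: every change is kept as its own row
    for customer_id, address, effective_date in changes:
        hub[customer_id] = address  # overwrite the current value
        warehouse.append((customer_id, address, effective_date))  # append history
    return hub, warehouse


changes = [
    ("C1", "1 Main St", "2004-01-10"),
    ("C2", "4 Oak Ave", "2004-07-15"),
    ("C1", "9 Elm St", "2005-03-02"),
]
hub, warehouse = apply_changes(changes)
print(hub)             # {'C1': '9 Elm St', 'C2': '4 Oak Ave'}
print(len(warehouse))  # 3
```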

There is a risk with data hubs. The hub of any complex architecture sets a ceiling for the whole system: if an enterprise data hub has a poor data structure or a slow distribution mechanism, it affects every application connected to it.

Many of the largest business problems attributed to IT are related to poorly designed data models in legacy data hubs. For this reason, it's critical to replace aging legacy data hubs and/or implement new data hubs with extremely well-thought-out designs. The data elements contained in enterprise data hubs become the de facto data standards for the enterprise, whether this is intentional or not. In fact, a data hub is one of the few effective enforcement strategies for data standards.

Replacing legacy data hubs is usually a difficult task because of the number of applications that rely on their data. The best strategy (and perhaps the only workable approach) to replace a legacy enterprise data hub is to assume that it will need to be backwards-compatible until all applications accessing it can be updated or replaced.
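One way to keep a replacement hub backward-compatible is an adapter that renders new-hub records in the legacy format. Unchanged applications keep working while others migrate. The fixed-width layout below, including the field widths, is a hypothetical stand-in for an old master-file record, not a real format.

```python
# Hypothetical legacy layout: (field in the new hub, fixed width in characters)
LEGACY_LAYOUT = (("customer_id", 8), ("name", 20), ("address", 30))


def to_legacy_record(hub_record: dict) -> str:
    """Render a new-hub record in the old fixed-width format.

    Fields the legacy layout does not know about are ignored; missing
    fields are written as blanks.
    """
    parts = []
    for name, width in LEGACY_LAYOUT:
        value = str(hub_record.get(name, ""))
        parts.append(value[:width].ljust(width))  # truncate or pad to exact width
    return "".join(parts)


new_record = {"customer_id": "C1", "name": "Acme Corp", "address": "9 Elm St",
              "email": "orders@acme.example"}  # email exists only in the new hub
legacy = to_legacy_record(new_record)
print(repr(legacy))
```

The new hub can add elements freely, because the adapter, not the legacy applications, absorbs the difference.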

Implementing a new data hub for a subject area that hasn't been integrated across the enterprise before is relatively easy. The best approach is to build the data hub for use by a single new application but with a design for future enterprise use. Then new applications can come on board as time and money permit, and data elements can be added to the hub as required to support them. It's absolutely critical that the original design of the new hub takes as broad a view of the future needs of the enterprise as possible.

Enterprise data hubs don't have to be designed as monolithic mainframe systems. Although they can have extremely large storage and processing requirements in a large corporation, they can still be built on technical architectures that scale up and out, allowing the enterprise data hub to grow and shrink in size and cost as the enterprise requires.

Enterprise data hubs don't have to be accessed in real time where there are performance or international networking limitations. In these cases, the appropriate subset of enterprise data can be replicated directly to the applications that need it, using a push or pull data distribution mechanism.
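The pull variant can be sketched with a change log and a sequence number. Each application remembers the last change it applied and periodically asks the hub for anything newer, keeping only the fields it needs. `ChangeLog` and `LocalReplica` are hypothetical names for this Python sketch. A push mechanism would instead have the hub send each change to subscribers as it happens.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeLog:
    """Hypothetical hub-side log; each change gets an increasing sequence number."""
    entries: list[tuple[int, str, dict]] = field(default_factory=list)

    def record(self, key: str, values: dict) -> None:
        self.entries.append((len(self.entries) + 1, key, dict(values)))

    def since(self, seq: int) -> list[tuple[int, str, dict]]:
        return [entry for entry in self.entries if entry[0] > seq]


class LocalReplica:
    """Application-side copy that holds only the subset of fields it needs."""

    def __init__(self, fields: tuple[str, ...]) -> None:
        self.fields = fields
        self.data: dict[str, dict] = {}
        self.last_seq = 0  # highest sequence number applied so far

    def pull(self, log: ChangeLog) -> int:
        """Fetch and apply changes newer than last_seq; return how many were applied."""
        applied = 0
        for seq, key, values in log.since(self.last_seq):
            subset = {name: values[name] for name in self.fields if name in values}
            self.data.setdefault(key, {}).update(subset)
            self.last_seq = seq
            applied += 1
        return applied


log = ChangeLog()
log.record("C1", {"name": "Acme", "address": "9 Elm St", "credit_line": 50000})
replica = LocalReplica(fields=("name", "address"))  # this app has no use for credit_line
print(replica.pull(log))  # 1
print(replica.pull(log))  # 0: nothing new since the last pull
print(replica.data)       # {'C1': {'name': 'Acme', 'address': '9 Elm St'}}
```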

Who owns the data?

Data ownership becomes a sticking point for data hubs. The data owner is usually defined as the organization empowered to set the rules of data management, such as accessing, creating, updating and deleting a set of data.

The data steward is usually defined as the organization responsible for administering those rules. The obvious data steward for an enterprise data hub is a corporate IT organization. Stewardship of an enterprise data hub is a relatively simple task if the rules for data ownership have been clearly defined. Since the enterprise data hubs are integrated views of the subject area, data ownership is typically shared across multiple organizations. However, shared ownership can mean no ownership. It's important to clearly establish a single owner for each data element or group of elements in the data hub.

For example, the credit line data element in the customer data hub might be owned by the credit organization, while the customer shipping address might be owned by the order-fulfillment organization. It can be even more complex in large companies, where data ownership must be defined by values within the record itself. So European customer data might be owned by a European organization, and Asia-Pacific customer information might be owned by the Asia-Pacific organization. It sounds like a lot of work, but you'll have to do this part if you want your data hub deployment to succeed.
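Ownership rules of both kinds can be captured as a lookup table. In this Python sketch the organizations, the element names and the `region` field are hypothetical. The precedence is also an assumption: a rule for a specific element wins over a rule based on the record's region. Raising an error when no rule matches is one way to surface elements that still lack a single owner.

```python
# Hypothetical ownership rules: (data element, region) -> owning organization.
# "*" is a wildcard. Element-specific rules are checked before region rules.
OWNERSHIP_RULES = {
    ("credit_line", "*"): "credit",
    ("shipping_address", "*"): "order_fulfillment",
    ("*", "EU"): "europe",
    ("*", "APAC"): "asia_pacific",
}


def owner_of(element: str, record: dict) -> str:
    """Return the single organization that owns `element` in this record."""
    region = record.get("region", "*")
    for key in ((element, region), (element, "*"), ("*", region)):
        if key in OWNERSHIP_RULES:
            return OWNERSHIP_RULES[key]
    raise LookupError(f"no owner defined for {element!r} in region {region!r}")


def can_update(org: str, element: str, record: dict) -> bool:
    """Only the owning organization may change the element."""
    return owner_of(element, record) == org


eu_customer = {"customer_id": "E1", "region": "EU", "name": "Acme GmbH"}
print(owner_of("credit_line", eu_customer))             # credit
print(owner_of("name", eu_customer))                    # europe
print(can_update("asia_pacific", "name", eu_customer))  # False
```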

Data hubs are a key enabler of enterprise integration; they're an old idea but a really good thing. Buy, build or get one today.

Copyright © 2005 IDG Communications, Inc.
