Data Hub Architecture

A strong multi-layered data platform

Managing data for timely decision support is a challenge. Data volumes continue to grow, and data arrives in multiple formats. Many data silos still hold data that needs to be integrated. Organizations have tried to create a single source of truth for the data, information, and knowledge they need to use collaboratively. An early example is the way ERP systems work across multiple functional areas of an organization. Still, this is not enough, because data keeps growing and a widening range of data consumers uses it.

What is Data Hub Architecture?

A Data Hub Architecture is a collection of data and information drawn from multiple disparate sources to support specific consumer decisions. The source data can reside anywhere and in any format. The hub consumes only the necessary data, which helps remove data noise and improves performance for decision-making. Data is integrated and organized efficiently, effectively, and economically to support functional business outcomes.

Data hubs can consume data from various sources, such as data lakes. The design of a Data Hub Architecture depends on understanding three things: the consumers of the data, the decisions they need to make, and the data sources themselves, including how those sources relate to each other. Together, these determine how the hub provides business decision support across functional units.

The Data Hub Architecture has to consider the value chain across all functions and the data transitions that occur between them, including the automated data decision activities found in Artificial Intelligence (AI) and Machine Learning (ML) capabilities. The ultimate value can be thought of in four areas for the services and products delivered, supported, and consumed. These areas relate to decisions about innovation, growth, customer fixes, and the competitive needs of the organization.

Open data hubs help address the different data access criteria of different people in the organization, since functions and roles sometimes consume data differently and sometimes in the same manner. Open data hubs also help with the use of hybrid-cloud infrastructures and with collaboration and integration between functional teams. This is especially true for collaboration between developers, data scientists, and data engineers.

Data integration hub architectures connect multiple data stores through the data hub no matter where they reside, whether in the cloud or on-premises. This helps create the specific systems of record needed for particular applications of the data.

Types of Data Hub Architecture

Data Hub architectures enable data, information, and knowledge sharing by connecting specific data producers with specific data consumers. This should be done in a customer- or consumer-centric fashion so that the data collected benefits the customer of the data. Customer data hub architectures are an approach to managing the lifecycle of customer data for intelligent enterprise decision support. The customer data hub should be considered the epicenter for understanding and responding to customer needs.

The type of data hub architecture depends on the data patterns that the data consumers need. The architecture can differ from hub to hub based on the data inputs and the consumers of the data. Even when hubs follow a basic construct such as hub and spoke, each can be very different in how the data is architected for consumption. Not all source data is copied to the hub, only what is needed for consumer decisions.
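The idea that a hub copies only what its consumers need can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the Spoke and DataHub classes, the field names, and the two in-memory sources ("crm" and "erp") are all hypothetical and stand in for real source systems.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


@dataclass
class Spoke:
    """A hypothetical data source connected to the hub."""
    name: str
    fetch: Callable[[], Iterable[Record]]  # returns the source's raw records


@dataclass
class DataHub:
    """Central hub that keeps only the fields its consumers need."""
    needed_fields: List[str]
    spokes: List[Spoke] = field(default_factory=list)
    records: List[Record] = field(default_factory=list)

    def attach(self, spoke: Spoke) -> None:
        self.spokes.append(spoke)

    def refresh(self) -> None:
        """Pull from every spoke, copying only the needed fields."""
        self.records = []
        for spoke in self.spokes:
            for raw in spoke.fetch():
                kept = {k: raw[k] for k in self.needed_fields if k in raw}
                if kept:
                    kept["_source"] = spoke.name  # remember where the data came from
                    self.records.append(kept)


# Two made-up sources; each carries fields the consumers do not need.
crm = Spoke("crm", lambda: [{"customer_id": 1, "region": "EU", "notes": "long text"}])
erp = Spoke("erp", lambda: [{"customer_id": 1, "open_orders": 3, "warehouse": "W7"}])

hub = DataHub(needed_fields=["customer_id", "region", "open_orders"])
hub.attach(crm)
hub.attach(erp)
hub.refresh()
```

After refresh, hub.records holds the customer_id, region, and open_orders values, while the notes and warehouse fields stay in their source systems. A second hub with a different needed_fields list could sit on the same spokes for a different group of consumers.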

Hub and Spoke vs. Bus Architecture Data Warehouse

Hub and Spoke data architecture is a centralized approach that connects the data hub to multiple inputs, or data providers, on behalf of various data consumers. A Hub can be created with a Spoke or a Bus architecture. A Bus architecture is used to create a data warehouse similar to a Hub, but the data does not have a standard reference related to the consumers of the data. Spoke architectures place the hub at the center of the data, so the hub is often the standard point of reference. Where tight control and governance are needed, enterprises tend to choose spoke over bus architectures. The bus may contain all types of data, but the applications are the main source of data to extract from the large bus warehouse architecture. The Hub and Spoke architecture for the Data Hub is created explicitly for the consumers. Hub and Spoke is considered faster because the data hub contains only the specific data its consumers need.

Data Hub Architecture Diagram

The following data hub architecture diagram depicts a simple hub and spoke perspective between the data and the data consumer. The architecture can contain multiple data hubs, each fit for purpose for its consumers of the data. This helps with performance and with an overall understanding of how data is used to make decisions within the organization.

Inputs to the data hub can come from data warehouses, XML, JSON, SharePoint sites, other data silos, and anywhere else data resides. The critical consideration when building a data hub is to be specific about the data chosen for the consumers’ decisions. Otherwise, having too much data to manage will decrease performance and increase complexity.
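As a rough illustration of taking inputs in different formats while staying specific about the data chosen, the sketch below reads one JSON feed and one XML feed using only the Python standard library and keeps just two fields from each. The feeds, the field names in NEEDED_FIELDS, and the helper functions from_json and from_xml are assumptions made up for this example.

```python
import json
import xml.etree.ElementTree as ET

NEEDED_FIELDS = ("order_id", "status")  # hypothetical consumer requirement


def from_json(text):
    """Parse a JSON array of objects and keep only the needed fields."""
    return [{k: row[k] for k in NEEDED_FIELDS if k in row} for row in json.loads(text)]


def from_xml(text):
    """Parse <order> elements and keep only the needed child elements."""
    root = ET.fromstring(text)
    rows = []
    for order in root.iter("order"):
        row = {}
        for k in NEEDED_FIELDS:
            value = order.findtext(k)  # None when the child element is absent
            if value is not None:
                row[k] = value
        rows.append(row)
    return rows


# Made-up feeds; each carries an extra field the consumers do not need.
json_feed = '[{"order_id": "A1", "status": "shipped", "internal_note": "n/a"}]'
xml_feed = (
    "<orders><order><order_id>B2</order_id>"
    "<status>open</status><carrier>X</carrier></order></orders>"
)

hub_rows = from_json(json_feed) + from_xml(xml_feed)
```

Both feeds end up as rows with the same two keys, and the internal_note and carrier fields never reach the hub. Changing NEEDED_FIELDS is the single place where the data choice for the consumers is made.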

Data provides answers for its consumers either directly or after it has been combined with other data and transformed into information and knowledge. Data consumers can be people or automated systems, such as machine learning and artificial intelligence. In either case, the organization should realize that applying a data hub is a continuous improvement initiative for supporting high-performing business outcomes.

[Figure: Data Hub Architecture Diagram]
