Data Vault


A data vault is a data modeling methodology for organizing analytics data. It encompasses raw data storage, business rules that transform the raw data, and multiple data marts. The data vault architecture is designed to address limitations of alternative approaches, including 3rd Normal Form, Enterprise Data Warehouses, and Dimensional Design.

A data vault uses a specific structure centered around three main elements: hubs, links, and satellites. Here’s a breakdown of each:

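Hubs: These tables store core business entities like customers, products, or locations. Each row holds the entity's unique identifier (the business key), usually with a hash or surrogate key, a load date, and the record source. Descriptive attributes are kept out of hubs and stored in satellites, so hub rows rarely change.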

Links: These tables represent relationships between hubs, such as a customer buying a product. They contain keys referencing each connected hub (in Data Vault 2.0, typically the hubs' hash keys), along with a load date and record source. Links record how entities are associated with each other.

Satellites: These tables hold detailed data associated with hubs or links. They include various descriptive attributes that can change over time. Importantly, satellites also contain metadata like the source of the data and the load date, enabling historical tracking.
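To make the three table types concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in database. The table and column names (hub_customer, customer_hk, sat_customer_details, and so on) are hypothetical, and deriving keys with an MD5 hash of the normalized business key is an assumption that reflects a common Data Vault 2.0 convention rather than a requirement.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone


def hash_key(*business_keys: str) -> str:
    """Derive a deterministic hash key from one or more business keys (assumed convention)."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


DDL = """
CREATE TABLE hub_customer (            -- hub: business key only, no attributes
    customer_hk   TEXT PRIMARY KEY,
    customer_id   TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE hub_product (
    product_hk    TEXT PRIMARY KEY,
    product_code  TEXT NOT NULL,
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE link_customer_product (   -- link: relates two hubs by their keys
    customer_product_hk TEXT PRIMARY KEY,
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    product_hk    TEXT NOT NULL REFERENCES hub_product (product_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL
);
CREATE TABLE sat_customer_details (    -- satellite: changing attributes + metadata
    customer_hk   TEXT NOT NULL REFERENCES hub_customer (customer_hk),
    load_date     TEXT NOT NULL,
    record_source TEXT NOT NULL,
    name          TEXT,
    city          TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

now = datetime.now(timezone.utc).isoformat()
cust_hk = hash_key("C001")
prod_hk = hash_key("P-42")
conn.execute("INSERT INTO hub_customer VALUES (?, ?, ?, ?)", (cust_hk, "C001", now, "crm"))
conn.execute("INSERT INTO hub_product VALUES (?, ?, ?, ?)", (prod_hk, "P-42", now, "erp"))
conn.execute(
    "INSERT INTO link_customer_product VALUES (?, ?, ?, ?, ?)",
    (hash_key("C001", "P-42"), cust_hk, prod_hk, now, "erp"),
)
conn.execute(
    "INSERT INTO sat_customer_details VALUES (?, ?, ?, ?, ?)",
    (cust_hk, now, "crm", "Ada Lovelace", "London"),
)

# Reassemble a readable customer record: key from the hub, attributes from the satellite.
row = conn.execute(
    """
    SELECT h.customer_id, s.name, s.city
    FROM hub_customer AS h
    JOIN sat_customer_details AS s ON s.customer_hk = h.customer_hk
    """
).fetchone()
print(row)  # ('C001', 'Ada Lovelace', 'London')
```

Keeping attributes out of the hub means a changed city adds a satellite row, while the hub and link rows stay as they were.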

Why is a Data Vault Important?

The data vault offers a more flexible approach to data warehousing than traditional 3rd Normal Form (3NF) and Dimensional Design. It retains the original raw data, which makes changes easy to audit over time. The business rules vault (often called the business vault) stores transformations, filters, and calculations that can be changed or extended without reloading the raw data, and the data marts consist mainly of views plus some optional tables, so they are also easy to change.
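The layering described above can be sketched with SQL views. In this minimal example, sqlite3 stands in for the warehouse, and the table names, the "no negative amounts" rule, and the cents-to-units conversion are all hypothetical. The point is that the raw rows are never modified: the business rule and the data mart are both views, so either can be redefined without touching stored data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw vault: source rows stored as received, including rows a rule will later reject.
conn.execute(
    """
    CREATE TABLE raw_sat_order (
        order_id      TEXT,
        region        TEXT,
        amount_cents  INTEGER,
        load_date     TEXT,
        record_source TEXT
    )
    """
)
conn.executemany(
    "INSERT INTO raw_sat_order VALUES (?, ?, ?, ?, ?)",
    [
        ("O1", "EU", 1250, "2024-01-01", "shop"),
        ("O2", "EU", -500, "2024-01-01", "shop"),  # invalid row, kept for auditing
        ("O3", "US", 3000, "2024-01-02", "shop"),
    ],
)

# Business vault: a hypothetical rule (drop negative amounts, convert cents to units)
# defined as a view, so it can change without reloading the raw data.
conn.execute(
    """
    CREATE VIEW bv_order AS
    SELECT order_id, region, amount_cents / 100.0 AS amount
    FROM raw_sat_order
    WHERE amount_cents >= 0
    """
)

# Data mart: another view, shaped for one reporting need.
conn.execute(
    """
    CREATE VIEW mart_revenue_by_region AS
    SELECT region, SUM(amount) AS revenue
    FROM bv_order
    GROUP BY region
    ORDER BY region
    """
)

print(conn.execute("SELECT * FROM mart_revenue_by_region").fetchall())
# [('EU', 12.5), ('US', 30.0)]
print(conn.execute("SELECT COUNT(*) FROM raw_sat_order").fetchone()[0])  # 3
```

If the rule later changes (for example, negative amounts should be treated as refunds), only the bv_order view needs to be redefined; the three raw rows remain available for audit.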

The data vault structure allows an organization to start with a few raw data sets and grow incrementally as its business needs expand. Because the raw data vault remains available, data lineage stays clear. Overall, this approach is best suited to environments where business goals change often and built-in version control of data is needed.

The Advantages of a Data Vault Design

Data vault designs offer several advantages over traditional data warehouse approaches:

Flexibility

A data vault’s structure is designed to be adaptable. New data sources and fields are added as new hubs, links, or satellites without altering existing tables, unlike traditional dimensional models, which can require significant refactoring when requirements change.
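As an illustration of that adaptability, the following minimal sketch (sqlite3 again standing in for the warehouse) adds a new source as a new satellite and then checks that the existing tables' column layouts are unchanged. The "loyalty" source, the table names, and the placeholder key 'hk1' are hypothetical; in practice the key would be derived from the business key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT);
    CREATE TABLE sat_customer_crm (
        customer_hk TEXT, load_date TEXT, name TEXT,
        PRIMARY KEY (customer_hk, load_date)
    );
    INSERT INTO hub_customer VALUES ('hk1', 'C001');
    INSERT INTO sat_customer_crm VALUES ('hk1', '2024-01-01', 'Ada');
    """
)


def columns(table: str) -> list[str]:
    """Return a table's column names in order (PRAGMA table_info row[1] is the name)."""
    return [r[1] for r in conn.execute(f"PRAGMA table_info({table})")]


before = {t: columns(t) for t in ("hub_customer", "sat_customer_crm")}

# A new source arrives: add a satellite for it; no ALTER TABLE on existing tables.
conn.executescript(
    """
    CREATE TABLE sat_customer_loyalty (
        customer_hk TEXT, load_date TEXT, tier TEXT,
        PRIMARY KEY (customer_hk, load_date)
    );
    INSERT INTO sat_customer_loyalty VALUES ('hk1', '2024-02-01', 'gold');
    """
)

after = {t: columns(t) for t in ("hub_customer", "sat_customer_crm")}
assert before == after  # existing tables are untouched

# Queries that need the new data simply join the extra satellite on the hub key.
row = conn.execute(
    """
    SELECT h.customer_id, c.name, l.tier
    FROM hub_customer AS h
    JOIN sat_customer_crm AS c ON c.customer_hk = h.customer_hk
    JOIN sat_customer_loyalty AS l ON l.customer_hk = h.customer_hk
    """
).fetchone()
print(row)  # ('C001', 'Ada', 'gold')
```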

Scalability

Data vaults are built to handle growing data volumes. Because hubs, links, and satellites are separate tables that are loaded largely by inserting new rows, the model can expand one table at a time as data storage needs increase.

Data Lineage

Data vaults excel at tracking the history of your data. Every record is preserved, and each satellite row carries a load date and record source showing when a change arrived and where it came from. This is crucial for regulatory compliance and auditing.
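One common way satellites preserve history is to append a row only when the descriptive attributes actually change, detected by comparing a fingerprint of the attributes (often called a hash diff). The sketch below is a minimal in-memory illustration under that assumption; the SatRow fields, the MD5 fingerprint, and the sample values are hypothetical. Rows are never updated or deleted.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SatRow:
    customer_hk: str
    load_date: str       # ISO dates compare correctly as strings
    record_source: str
    hash_diff: str       # fingerprint of the descriptive attributes
    attributes: tuple    # (name, city)


def hash_diff(attributes: tuple) -> str:
    """Fingerprint the attributes so a change can be detected with one comparison."""
    payload = "||".join("" if a is None else str(a).strip() for a in attributes)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def load_satellite(sat: list[SatRow], customer_hk: str, attributes: tuple,
                   load_date: str, record_source: str) -> bool:
    """Append a row only if the attributes differ from the latest row for this key.

    Existing rows are never modified, so the full history is kept.
    Returns True if a row was appended.
    """
    new_diff = hash_diff(attributes)
    history = [r for r in sat if r.customer_hk == customer_hk]
    latest = max(history, key=lambda r: r.load_date, default=None)
    if latest is not None and latest.hash_diff == new_diff:
        return False  # nothing changed, nothing to record
    sat.append(SatRow(customer_hk, load_date, record_source, new_diff, attributes))
    return True


sat_customer: list[SatRow] = []
load_satellite(sat_customer, "hk1", ("Ada", "London"), "2024-01-01", "crm")
load_satellite(sat_customer, "hk1", ("Ada", "London"), "2024-01-02", "crm")  # unchanged
load_satellite(sat_customer, "hk1", ("Ada", "Paris"), "2024-01-03", "crm")   # moved

for row in sat_customer:
    print(row.load_date, row.attributes)
# 2024-01-01 ('Ada', 'London')
# 2024-01-03 ('Ada', 'Paris')
```

The customer's address as of any date can then be answered by picking the latest row with a load date on or before that date.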

Faster Loading

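Data vault architectures often enable parallel loading because hubs, links, and satellites have few load-time dependencies on one another. When keys are hash values computed from business keys, as in Data Vault 2.0, a link or satellite can be loaded without first looking up keys in its hub. This can significantly improve data ingestion speed.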
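The following minimal sketch shows why deterministic hash keys remove the ordering dependency: the link computes its hub keys directly from the staged business keys instead of reading the hub tables, so all three loads can be submitted at once. The staged orders, the normalization rule (trim and uppercase), and the in-memory dictionaries standing in for tables are assumptions for illustration.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def hash_key(*business_keys: str) -> str:
    """Deterministic key: the same (normalized) business key always yields the same hash."""
    normalized = "||".join(k.strip().upper() for k in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


# Hypothetical staged rows from an orders feed.
orders = [
    {"customer_id": "C001", "product_code": "P-42"},
    {"customer_id": "C002", "product_code": "P-42"},
    {"customer_id": "c001 ", "product_code": "P-7"},  # same customer, untidy key
]


def load_hub_customer(rows):
    return {hash_key(r["customer_id"]): r["customer_id"].strip().upper() for r in rows}


def load_hub_product(rows):
    return {hash_key(r["product_code"]): r["product_code"].strip().upper() for r in rows}


def load_link_customer_product(rows):
    # The link derives its hub keys itself and never reads the hub tables,
    # so it does not have to wait for the hub loads to finish.
    return {
        hash_key(r["customer_id"], r["product_code"]): (
            hash_key(r["customer_id"]),
            hash_key(r["product_code"]),
        )
        for r in rows
    }


with ThreadPoolExecutor() as pool:
    hub_c = pool.submit(load_hub_customer, orders)
    hub_p = pool.submit(load_hub_product, orders)
    link = pool.submit(load_link_customer_product, orders)
    hub_customer, hub_product, link_cp = hub_c.result(), hub_p.result(), link.result()

# Every link row points at keys that exist in the independently loaded hubs.
assert all(c in hub_customer and p in hub_product for c, p in link_cp.values())
print(len(hub_customer), len(hub_product), len(link_cp))  # 2 2 3
```

Python threads are used here only to show that the three loads are independent; in a real platform they would typically run as separate load jobs.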

Simplified ETL Processes

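Because raw data is loaded into the vault as-is, with business rules applied later, the Extract, Transform, Load (ETL) process is streamlined: loads follow a small set of repeatable patterns for hubs, links, and satellites. This reduces development time and maintenance effort.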
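Because every hub is loaded the same way, a load can be generated from a few metadata values rather than written by hand for each table. The sketch below is a hypothetical template (hub_load_sql, the staging tables, and the placeholder keys 'hk1', 'hk2', 'hk9' are all illustrative) run against sqlite3. It inserts only business keys not already present, so running it twice is harmless.

```python
import sqlite3


def hub_load_sql(hub: str, hash_col: str, key_col: str, staging: str) -> str:
    """Generate an insert-only hub load from metadata (identifiers must be trusted).

    One row per new hash key is inserted; keys already in the hub are skipped,
    and MIN() picks a deterministic representative when staging has duplicates.
    """
    return f"""
        INSERT INTO {hub} ({hash_col}, {key_col}, load_date, record_source)
        SELECT s.{hash_col}, MIN(s.{key_col}), MIN(s.load_date), MIN(s.record_source)
        FROM {staging} AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM {hub} AS h WHERE h.{hash_col} = s.{hash_col}
        )
        GROUP BY s.{hash_col}
    """


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hub_customer (customer_hk TEXT PRIMARY KEY, customer_id TEXT,
                               load_date TEXT, record_source TEXT);
    CREATE TABLE hub_product (product_hk TEXT PRIMARY KEY, product_code TEXT,
                              load_date TEXT, record_source TEXT);
    CREATE TABLE stg_customer (customer_hk TEXT, customer_id TEXT,
                               load_date TEXT, record_source TEXT);
    CREATE TABLE stg_product (product_hk TEXT, product_code TEXT,
                              load_date TEXT, record_source TEXT);
    INSERT INTO stg_customer VALUES ('hk1', 'C001', '2024-01-01', 'crm'),
                                    ('hk1', 'C001', '2024-01-01', 'crm'),
                                    ('hk2', 'C002', '2024-01-01', 'crm');
    INSERT INTO stg_product VALUES ('hk9', 'P-42', '2024-01-01', 'erp');
    """
)

# Hypothetical metadata: one entry per hub; the template does the rest.
HUBS = [
    ("hub_customer", "customer_hk", "customer_id", "stg_customer"),
    ("hub_product", "product_hk", "product_code", "stg_product"),
]
for _ in range(2):  # a second run inserts nothing, showing the load is re-runnable
    for meta in HUBS:
        conn.execute(hub_load_sql(*meta))

print(conn.execute("SELECT COUNT(*) FROM hub_customer").fetchone()[0])  # 2
print(conn.execute("SELECT COUNT(*) FROM hub_product").fetchone()[0])   # 1
```

Similar templates can be written for links and satellites, so adding a new entity is mostly a matter of adding a metadata entry.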

The Actian Data Platform and Data Vault

The Actian Data Platform can host a data vault schema with a repository that stores raw data with minimal formatting, a second set of tables containing the business rules with lineage data, and multiple data marts made up of the views and tables that users access for analysis. The Vector Columnar Database provides SQL functions that apply filters and transformations to raw data tables after they are loaded, an approach that resembles ELT (Extract, Load, Transform).

The resulting data marts can be connected to business intelligence solutions for analysis and data visualization. The Vector database delivers high performance through features such as vectorized, parallel query execution that uses chip-level acceleration, multi-threading across cores, and caching across processors.