Data Synchronization

Data synchronization keeps copies of data consistent across multiple repositories or applications. It is an ongoing process that can be automated so that changes and updates are propagated to every copy of the data.

Why is Data Synchronization Important?

Maintaining multiple copies of data is helpful for data protection, resiliency, compliance, performance, and scalability. Offsite copies of data protect against hardware failures, power failures, and natural disasters for mission-critical systems by ensuring current, accurate data is available.

Data Synchronization Schemes

Several mechanisms exist to replicate and synchronize data. Common approaches include the following:

One or Two-Way Data Synchronization

Data synchronization can be configured to be unidirectional or bidirectional. In a bidirectional configuration, care must be taken to avoid synchronization loops, in which a change received from one copy is treated as a new change and sent back to its source.
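One common way to avoid a synchronization loop is to give every change a unique identifier and have each copy ignore changes it has already applied. The following is a minimal sketch of that idea in Python. The Replica class, its method names, and the in-memory dictionaries are assumptions made for illustration and are not taken from any particular product.

```python
import uuid


class Replica:
    """Hypothetical in-memory copy of a data set that syncs with its peers."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.peers = []
        self.seen = set()  # identifiers of changes already applied here

    def write(self, key, value):
        # A local write becomes a new change with a fresh identifier.
        self.apply(uuid.uuid4().hex, key, value)

    def apply(self, change_id, key, value):
        # A change that comes back around is dropped, which breaks the loop.
        if change_id in self.seen:
            return False
        self.seen.add(change_id)
        self.data[key] = value
        for peer in self.peers:
            peer.apply(change_id, key, value)
        return True


a, b = Replica("a"), Replica("b")
a.peers.append(b)  # a sends changes to b
b.peers.append(a)  # b sends changes to a (bidirectional)
a.write("sku-1", 10)
```

After the write, both copies hold the same value and each has applied the change exactly once, even though each copy forwards changes to the other.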

Fan-In and Fan-Out Data Synchronization

Data can be synchronized from many sources into one consolidated dataset, known as fan-in, or fanned out from one master source to multiple target copies. Fan-in requires careful management of data conflicts and duplicate records, typically through a rule-based approach such as giving priority to the most recently updated copy.
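A rule such as "the latest copy wins" can be sketched in a few lines of Python. The record layout below (an id key and a numeric updated_at field) is an assumption made for the example. Real systems often need more careful rules, for instance to cope with clocks that disagree between sources.

```python
def fan_in(*sources):
    """Merge several lists of records into one, keeping the newest per id."""
    merged = {}
    for source in sources:
        for record in source:
            current = merged.get(record["id"])
            # Keep a record if its id is new or it is more recent than the
            # one already held. Keying on id also prevents duplicates.
            if current is None or record["updated_at"] > current["updated_at"]:
                merged[record["id"]] = record
    return sorted(merged.values(), key=lambda r: r["id"])


east = [
    {"id": 1, "name": "Ann", "updated_at": 5},
    {"id": 2, "name": "Bob", "updated_at": 3},
]
west = [{"id": 1, "name": "Anne", "updated_at": 9}]

result = fan_in(east, west)
```

Here the record with id 1 appears in both sources, so only the more recent version from the second source is kept.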

Partial and Full Data Copies

Full copies of data are essential for business continuity-type scenarios. In a situation where distributed copies are only used for regional reports, a partial copy will suffice. An example is a national retailer that rolls up regional sales data into a centralized data warehouse at headquarters and then distributes localized copies for regional store managers to get insights into their stores.
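The retailer example can be sketched as a fan-in step followed by filtered partial copies. The function names and the row layout below are hypothetical and chosen only to illustrate the idea.

```python
def roll_up(regional_feeds):
    """Combine regional sales rows into one list, tagging rows by region."""
    return [
        {"region": region, **row}
        for region, rows in regional_feeds.items()
        for row in rows
    ]


def partial_copy(warehouse, region):
    """Return only the rows a single region needs for its reports."""
    return [row for row in warehouse if row["region"] == region]


feeds = {
    "east": [{"store": "E1", "sales": 120}],
    "west": [{"store": "W1", "sales": 90}, {"store": "W2", "sales": 75}],
}
warehouse = roll_up(feeds)                   # full copy at headquarters
west_copy = partial_copy(warehouse, "west")  # partial copy for one region
```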

Synchronous Data Copies

In many applications, such as finance, a transaction is incomplete unless both the source and destination ledgers have been updated. In this situation, a two-phase commit mechanism ensures that both ledger updates have been confirmed before the transaction is committed. In a cloud scenario where the master copy of the data is in the cloud and a secondary copy is held locally, the cloud copy is updated first and remains pending until the local copy is updated. The downside of maintaining synchronous copies is performance, because the application must wait for both updates before proceeding.
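The two-phase commit idea can be illustrated with a simplified, single-process sketch in Python. The Ledger class and the transfer function are assumptions made for illustration. A production implementation also needs durable logging and recovery from failures between the two phases, which this sketch leaves out.

```python
class Ledger:
    """Hypothetical ledger that can stage a change before committing it."""

    def __init__(self, name, balance, fail_prepare=False):
        self.name = name
        self.balance = balance
        self.fail_prepare = fail_prepare  # simulates an unavailable ledger
        self.pending = None

    def prepare(self, delta):
        # Phase 1: vote on whether the change can be applied.
        if self.fail_prepare or self.balance + delta < 0:
            return False
        self.pending = delta
        return True

    def commit(self):
        # Phase 2: make the staged change permanent.
        self.balance += self.pending
        self.pending = None

    def rollback(self):
        self.pending = None


def transfer(source, destination, amount):
    prepared = []
    for ledger, delta in [(source, -amount), (destination, amount)]:
        if ledger.prepare(delta):
            prepared.append(ledger)
        else:
            # Any "no" vote aborts the transaction for every participant.
            for participant in prepared:
                participant.rollback()
            return False
    for ledger in prepared:
        ledger.commit()
    return True


checking = Ledger("checking", 100)
savings = Ledger("savings", 0)
ok = transfer(checking, savings, 30)
```

The application only proceeds after both ledgers have voted and committed, which is the source of the performance cost described above.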

Asynchronous Replication

The primary advantage of asynchronous data replication is that applications can proceed as soon as one copy of the data is updated. Asynchronous replication is a good choice if the copies of data are distributed over a wide area network (WAN) or if data needs to be distributed to many copies. Many replication systems use a publish-and-subscribe scheme in which the master data repository is updated first, and the data changes are then posted to a queue that all subscribers can consume.
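A publish-and-subscribe scheme of this kind can be sketched as follows. The Publisher and Subscriber classes are hypothetical, and the in-memory deque stands in for a real message queue. The point of the sketch is that the write returns before any subscriber has applied the change.

```python
from collections import deque


class Subscriber:
    """Hypothetical target copy that applies queued changes when it can."""

    def __init__(self):
        self.data = {}
        self.inbox = deque()

    def drain(self):
        # Apply queued changes in the order they were published.
        while self.inbox:
            key, value = self.inbox.popleft()
            self.data[key] = value


class Publisher:
    """Hypothetical master repository that posts changes to subscribers."""

    def __init__(self):
        self.data = {}
        self.inboxes = []

    def subscribe(self, subscriber):
        self.inboxes.append(subscriber.inbox)

    def write(self, key, value):
        self.data[key] = value          # the master copy is updated first
        for inbox in self.inboxes:
            inbox.append((key, value))  # then the change is queued
        # The caller continues without waiting for subscribers.


master = Publisher()
replica = Subscriber()
master.subscribe(replica)
master.write("price", 42)
lag = dict(replica.data)  # still empty: the replica has not caught up yet
replica.drain()
```

Between the write and the drain, the replica is temporarily behind the master, which is the trade-off asynchronous replication accepts in exchange for speed.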

Physical Data Copies

Database systems use physical operational backups containing the database files, configuration, and log archive files.

Logical Data Copies

A logical copy of data stored in a database might be a user schema that a DBA exports to a flat file. Logical copies of schemas can also be replicated using replication software such as High Volume Replicator (HVR), which scrapes database log files for insert and update records and uses them to create structured query language (SQL) statements that are executed against a logical target copy used for reporting. This approach is often referred to as Change Data Capture (CDC).
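The step of turning captured change records into SQL can be illustrated with a small Python sketch that uses the standard sqlite3 module as the target. The change-record layout, the to_sql function, and the assumption that every table has an id key are all invented for this example. They do not represent the format or behavior of HVR or any other replication product.

```python
import sqlite3


def to_sql(change):
    """Build a parameterized SQL statement from one captured change record."""
    table, row = change["table"], change["row"]
    # Table and column names are trusted here because the sketch controls
    # the log. Values are always passed as parameters.
    if change["op"] == "insert":
        columns = ", ".join(row)
        marks = ", ".join("?" for _ in row)
        return f"INSERT INTO {table} ({columns}) VALUES ({marks})", list(row.values())
    if change["op"] == "update":
        assignments = ", ".join(f"{col} = ?" for col in row if col != "id")
        params = [val for col, val in row.items() if col != "id"] + [row["id"]]
        return f"UPDATE {table} SET {assignments} WHERE id = ?", params
    raise ValueError(f"unsupported operation: {change['op']}")


target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")

# A hypothetical stream of changes captured from the source database log.
log = [
    {"op": "insert", "table": "orders", "row": {"id": 1, "status": "new"}},
    {"op": "update", "table": "orders", "row": {"id": 1, "status": "shipped"}},
]
for change in log:
    sql, params = to_sql(change)
    target.execute(sql, params)

rows = target.execute("SELECT id, status FROM orders").fetchall()
```

Replaying the changes in log order leaves the target with the same final state as the source.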

Another form of logical data copy is created when the file format is transformed, for example by unloading or exporting a database table into a flat-file format such as a comma-separated values (CSV) file.
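As an illustration, the following Python sketch exports a table to CSV using only the standard library. The export_table function and the stores table are hypothetical, and an in-memory SQLite database stands in for the source system.

```python
import csv
import io
import sqlite3


def export_table(conn, table, out):
    """Write every row of a table to a file-like object as CSV with a header."""
    # The table name is trusted here because the sketch supplies it itself.
    cursor = conn.execute(f"SELECT * FROM {table}")
    writer = csv.writer(out)
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stores (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO stores VALUES (?, ?)", [(1, "Austin"), (2, "Reno")])

buffer = io.StringIO()  # a real export would open a file instead
export_table(conn, "stores", buffer)
lines = buffer.getvalue().splitlines()
```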

Replication

Many database systems provide the ability to run a stored procedure before or after a change is made to a database table. These procedures are known as database triggers, and they can be used to replicate data. For example, a post-insert trigger (declared as AFTER INSERT in most SQL dialects) may make a copy of the inserted record in a remote copy of that database object. A similar event-driven approach is used for files: products such as Microsoft OneDrive update a cloud-based synchronized copy of a file whenever it is saved or closed locally.
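A trigger of this kind can be demonstrated with SQLite through Python's standard sqlite3 module. In this sketch the table and trigger names are made up, and the copy is a second table in the same database that stands in for a remote copy. Writing to a genuinely remote database from a trigger depends on features specific to each database system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers_copy (id INTEGER PRIMARY KEY, name TEXT);

    -- Runs after every insert into customers and mirrors the new row.
    CREATE TRIGGER customers_after_insert AFTER INSERT ON customers
    BEGIN
        INSERT INTO customers_copy (id, name) VALUES (NEW.id, NEW.name);
    END;
    """
)

# The application only writes to the source table.
conn.execute("INSERT INTO customers (name) VALUES (?)", ("Ada",))
copy_rows = conn.execute("SELECT id, name FROM customers_copy").fetchall()
```

The application never touches customers_copy directly. The trigger keeps it in step with the source table as part of the same insert.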

Benefits of Data Synchronization

Below are some primary benefits of Data Synchronization:

  • Reduces the risk of data loss due to device failure.
  • Removes the vulnerability of a single copy of data.
  • Provides redundancy in a business continuity scenario.

Challenges of Data Synchronization

Below are the challenges associated with maintaining multiple copies of synchronized data:

  • Increases complexity.
  • Potentially slows an application using the data set.
  • Increases cost, because additional cloud resources are consumed or replication software must be licensed.

Data Synchronization Using the Actian Data Platform

The Actian Data Platform provides a unified experience for ingesting, transforming, analyzing, and storing data. Actian has partnered with HVR Software to enable data synchronization.