As businesses become more data-driven, the data they collect and store becomes increasingly valuable. A business process can be internal or outsourced, but the data and metadata it leverages define the business. Over time, the data a business creates and consumes becomes its lifeblood and competitive differentiator.
Data management is the practice of treating data as a valuable business resource, managed from the moment it is created until it is no longer considered valuable. The Data Management Association (DAMA), an international body for data professionals, formally defines data management as “The development and execution of architectures, policies, practices and procedures in order to manage the information lifecycle needs of an enterprise in an effective manner.”
Manual business functions consist of process steps, with branches, that create and use associated data. Digital business applications follow the same pattern of process steps, branches and associated data. The difference is that data from a digital business process can be reused immediately or transformed to add further value.
Data Protection
As data is a valuable resource, it needs to be protected. Below are three aspects of data protection:
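- For storage management, data must be protected from device failures and natural disasters to provide business continuity. RAID technology can mirror disk volumes to survive the failure of a single device, while copies kept at a remote site protect against the loss of an entire data center.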
- Security policies and controls protect data from cybercrime, including theft and ransomware attacks.
- Transaction processing systems and database management systems use data logging to protect inserts and updates from power outages, and locking schemes to keep concurrent changes from corrupting one another; together these maintain data integrity. Storage for transaction systems can be tuned with RAID technology that stripes data across physical volumes to maximize throughput.
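The striping and mirroring described in the list above come down to how a volume maps a logical block onto physical disks. The following is a simplified Python sketch, not how any particular RAID controller works: the function names and the `stripe_unit` parameter (measured in blocks) are hypothetical, and parity-based levels such as RAID 5 are left out.

```python
def raid0_location(logical_block: int, num_disks: int, stripe_unit: int = 1) -> tuple[int, int]:
    """Return (disk index, block offset on that disk) for a striped (RAID 0) volume."""
    if num_disks < 1 or stripe_unit < 1:
        raise ValueError("num_disks and stripe_unit must be positive")
    # Which stripe unit the block falls in, and its position inside that unit.
    stripe_index, within_unit = divmod(logical_block, stripe_unit)
    # Stripe units are dealt round-robin across the disks.
    disk = stripe_index % num_disks
    # Each full pass over all disks advances the on-disk offset by one stripe unit.
    offset = (stripe_index // num_disks) * stripe_unit + within_unit
    return disk, offset


def raid1_locations(logical_block: int, num_mirrors: int) -> list[tuple[int, int]]:
    """Return every (disk index, offset) copy of a block on a mirrored (RAID 1) volume."""
    if num_mirrors < 2:
        raise ValueError("mirroring needs at least two disks")
    # Every mirror holds an identical copy at the same offset.
    return [(disk, logical_block) for disk in range(num_mirrors)]


if __name__ == "__main__":
    # Consecutive blocks land on different disks, so sequential I/O is spread out.
    for block in range(6):
        print(block, raid0_location(block, num_disks=3))
    # A mirrored block exists on both disks, so one disk can fail without data loss.
    print(raid1_locations(7, num_mirrors=2))
```

With three disks and a one-block stripe unit, blocks 0, 1 and 2 go to different disks, which is where striping gets its throughput; mirroring adds no capacity but keeps a full copy on each disk.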
Storage Management
The business value of data can vary over time. A currency exchange rate or stock price is critically important to traders at the moment it changes, because it affects financial transactions. As soon as the value is updated, the previous value becomes much less valuable. When mainframe computing ruled storage management, professionals used Hierarchical Storage Management (HSM) systems to match the choice of storage media to the value of the data. The most frequently used data could be coalesced onto a narrow band of adjacent disk cylinders for the best access speeds, reducing the time the read/write head spent seeking between cylinders.
Mainframe CPU main storage was very small by today’s standards, so Solid State Disk (SSD) sat at the top of the storage hierarchy, avoiding the seek time and rotational delay of spinning disks. Next in the performance hierarchy came the outer cylinders of disk storage, which held less performance-critical data. Disk utilities periodically optimized data placement across cylinders to minimize seek time for the read/write head. When data became less valuable, or colder, it was archived to magnetic tape volumes. Volume management software cataloged what was stored on each tape volume, and older tape volumes were sent to offsite archives.
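An HSM policy of this kind can be sketched as a function of how recently data was accessed. This is an illustrative Python sketch only: the tier names, the age thresholds in `TIER_POLICY` and the `plan_migrations` helper are hypothetical, and real HSM products also weigh dataset size, business value and explicit retention rules.

```python
from datetime import datetime, timedelta

# Hypothetical policy, fastest tier first: data not accessed within max_age
# falls through to the next, cheaper tier.
TIER_POLICY = [
    (timedelta(days=1), "ssd"),
    (timedelta(days=30), "disk"),
    (timedelta(days=365), "tape"),
]
OFFSITE_TIER = "offsite-tape"  # anything older goes to offsite archives


def select_tier(last_access: datetime, now: datetime) -> str:
    """Pick the fastest tier whose age limit the data still satisfies."""
    age = now - last_access
    for max_age, tier in TIER_POLICY:
        if age <= max_age:
            return tier
    return OFFSITE_TIER


def plan_migrations(datasets: dict[str, tuple[str, datetime]], now: datetime) -> list[tuple[str, str, str]]:
    """Return (dataset, from_tier, to_tier) for every dataset sitting on the wrong tier."""
    moves = []
    for name, (current_tier, last_access) in sorted(datasets.items()):
        target = select_tier(last_access, now)
        if target != current_tier:
            moves.append((name, current_tier, target))
    return moves


if __name__ == "__main__":
    now = datetime(2024, 1, 1)
    catalog = {
        "fx_rates": ("disk", now - timedelta(hours=2)),
        "q3_report": ("ssd", now - timedelta(days=10)),
        "audit_2019": ("tape", now - timedelta(days=1500)),
    }
    for move in plan_migrations(catalog, now):
        print(move)
```

Data can move in both directions: a dataset that becomes hot again is promoted back to a faster tier, just as cold data is demoted.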
Today, cloud-based storage is priced by access speed, echoing the cost of the underlying hardware hierarchy: CPU cache is the priciest per byte, followed by RAM and then SSD storage, with spinning disks the least costly tier. Virtual storage managers can create disk volumes that are not limited to the capacity of a single physical device, and can stripe and mirror data under the covers.
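A virtual volume that spans several devices needs a mapping from a logical block on the volume to a device and an offset on that device. The sketch below assumes the simplest layout, plain concatenation, where devices are joined end to end; the `ConcatenatedVolume` class is hypothetical, and striping or mirroring would add further mapping layers on top.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatenatedVolume:
    """Hypothetical logical volume built by joining physical devices end to end."""

    def __init__(self, device_sizes: list[int]):
        if not device_sizes or any(size <= 0 for size in device_sizes):
            raise ValueError("device sizes must be positive")
        self.device_sizes = list(device_sizes)
        # starts[i] is the first logical block stored on device i.
        self.starts = [0] + list(accumulate(self.device_sizes))[:-1]
        # The volume is as large as all devices combined, not just one of them.
        self.capacity = sum(self.device_sizes)

    def locate(self, logical_block: int) -> tuple[int, int]:
        """Return (device index, block offset on that device)."""
        if not 0 <= logical_block < self.capacity:
            raise IndexError("block is outside the volume")
        # Binary search for the last device starting at or before the block.
        device = bisect_right(self.starts, logical_block) - 1
        return device, logical_block - self.starts[device]


if __name__ == "__main__":
    volume = ConcatenatedVolume([100, 50, 200])
    print(volume.capacity)
    for block in (0, 99, 100, 150, 349):
        print(block, volume.locate(block))
```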
Storage Management for a DBMS
Database Management Systems (DBMS) can manage storage internally using file systems or dedicated disk volumes. Clustered file systems such as Hadoop HDFS provide scalable storage by pooling the disks of multiple physical servers, which makes them popular repositories for data lakes. Many database management systems are cluster-aware and can process a single query using multiple physical servers.
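Cluster-aware query processing often follows a scatter-gather pattern: each server aggregates the partition it holds, and a coordinator merges the partial results. The Python sketch below simulates this with threads standing in for servers; the partition layout, node names and function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical table of (region, amount) rows, partitioned across three servers.
PARTITIONS = {
    "node-a": [("EMEA", 120.0), ("APAC", 80.0)],
    "node-b": [("EMEA", 30.0), ("AMER", 200.0)],
    "node-c": [("APAC", 20.0), ("AMER", 50.0), ("EMEA", 10.0)],
}


def local_aggregate(rows):
    """Runs on each node: partial SUM and COUNT per region for its own rows."""
    partial = {}
    for region, amount in rows:
        total, count = partial.get(region, (0.0, 0))
        partial[region] = (total + amount, count + 1)
    return partial


def merge(partials):
    """Runs on the coordinator: combine partial sums and counts, then divide."""
    merged = {}
    for partial in partials:
        for region, (total, count) in partial.items():
            m_total, m_count = merged.get(region, (0.0, 0))
            merged[region] = (m_total + total, m_count + count)
    return {region: total / count for region, (total, count) in merged.items()}


def distributed_avg(partitions):
    """Scatter the local step to every node in parallel, then gather and merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_aggregate, partitions.values()))
    return merge(partials)


if __name__ == "__main__":
    print(distributed_avg(PARTITIONS))
```

The nodes return sums and counts rather than averages, because averaging per-node averages gives the wrong answer when partitions hold different numbers of rows.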
A DBMS keeps the hottest, or most frequently accessed, data in memory, typically in a shared buffer pool in main memory (RAM), while the CPU’s L1 and L2 caches hold the portions in active use. To maintain data integrity, committed changes must be written to non-volatile storage such as SSD or disk. Distributed transactions use mechanisms such as two-phase commit to make writes across multiple nodes atomic: either every node commits or none does. Memory latches, locks and semaphores serialize concurrent access so that changes are not overwritten by other writers before they are committed to disk.
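Two-phase commit can be modeled in a few lines. This is a teaching sketch under simplifying assumptions: the `Participant` class and its `will_vote_yes` switch are hypothetical, and a production implementation would also force each vote and decision to a durable log and handle timeouts and coordinator crashes.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    will_vote_yes: bool = True  # hypothetical switch to simulate a node that cannot commit
    staged: dict = field(default_factory=dict)
    committed: dict = field(default_factory=dict)

    def prepare(self, writes: dict) -> bool:
        """Phase 1: stage the writes and vote on whether this node can commit them."""
        if not self.will_vote_yes:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2 (success): make the staged writes visible."""
        self.committed.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        """Phase 2 (failure): discard the staged writes."""
        self.staged = {}


def two_phase_commit(participants: list[Participant], writes_by_node: dict[str, dict]) -> bool:
    """Coordinator: commit everywhere only if every participant votes yes."""
    votes = [p.prepare(writes_by_node.get(p.name, {})) for p in participants]
    if all(votes):
        for p in participants:
            p.commit()
        return True
    for p in participants:
        p.abort()
    return False


if __name__ == "__main__":
    a, b = Participant("a"), Participant("b")
    print(two_phase_commit([a, b], {"a": {"x": 1}, "b": {"y": 2}}), a.committed, b.committed)
    c, d = Participant("c", will_vote_yes=False), Participant("d")
    print(two_phase_commit([c, d], {"c": {"x": 1}, "d": {"y": 2}}), c.committed, d.committed)
```

In the second run, node `d` voted yes and staged its write, but because `c` voted no, the coordinator aborts both, so neither node applies a partial transaction.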
A DBMS can make physical, block-level copies of the database to support operational backup and recovery. The database administrator can also export data at the object or schema level to create a logical copy of an application dataset. Replication software can use database triggers to intercept SQL INSERT, UPDATE and DELETE statements and apply the same changes to a remote copy of the database. Change Data Capture (CDC) software handles many related database objects more scalably, typically by reading changes from the database’s transaction log files.
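Trigger-based replication can be demonstrated with Python’s built-in `sqlite3` module, used here only as a convenient stand-in DBMS. In this sketch, triggers record each INSERT, UPDATE and DELETE in a hypothetical `change_log` table, and a hypothetical `replicate` function replays new entries on a replica. A CDC tool would instead read the transaction log, which avoids the extra writes that triggers add to every transaction.

```python
import sqlite3

ACCOUNT_DDL = "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)"


def make_primary() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(ACCOUNT_DDL + ";" + """
        -- Hypothetical change table, filled by triggers and shipped to the replica.
        CREATE TABLE change_log (
            seq INTEGER PRIMARY KEY AUTOINCREMENT,
            op TEXT NOT NULL, id INTEGER NOT NULL, balance INTEGER
        );
        CREATE TRIGGER account_ins AFTER INSERT ON account BEGIN
            INSERT INTO change_log (op, id, balance) VALUES ('I', NEW.id, NEW.balance);
        END;
        CREATE TRIGGER account_upd AFTER UPDATE ON account BEGIN
            INSERT INTO change_log (op, id, balance) VALUES ('U', NEW.id, NEW.balance);
        END;
        CREATE TRIGGER account_del AFTER DELETE ON account BEGIN
            INSERT INTO change_log (op, id, balance) VALUES ('D', OLD.id, NULL);
        END;
    """)
    return db


def make_replica() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute(ACCOUNT_DDL)
    return db


def replicate(primary: sqlite3.Connection, replica: sqlite3.Connection, last_seq: int) -> int:
    """Apply changes logged after last_seq to the replica; return the new position.

    Assumes primary keys are never updated, so an UPDATE can be replayed by id.
    """
    rows = primary.execute(
        "SELECT seq, op, id, balance FROM change_log WHERE seq > ? ORDER BY seq",
        (last_seq,),
    ).fetchall()
    for seq, op, row_id, balance in rows:
        if op == "I":
            replica.execute("INSERT INTO account (id, balance) VALUES (?, ?)", (row_id, balance))
        elif op == "U":
            replica.execute("UPDATE account SET balance = ? WHERE id = ?", (balance, row_id))
        else:
            replica.execute("DELETE FROM account WHERE id = ?", (row_id,))
        last_seq = seq
    replica.commit()
    return last_seq


if __name__ == "__main__":
    primary, replica = make_primary(), make_replica()
    primary.execute("INSERT INTO account VALUES (1, 100)")
    primary.execute("INSERT INTO account VALUES (2, 50)")
    primary.execute("UPDATE account SET balance = 75 WHERE id = 1")
    primary.execute("DELETE FROM account WHERE id = 2")
    primary.commit()
    position = replicate(primary, replica, 0)
    print(position, replica.execute("SELECT id, balance FROM account").fetchall())
```

Returning the last applied sequence number lets the replicator resume from where it stopped instead of replaying the whole change table.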
Data Catalogs
Today’s data management professional is less focused on storage and more on data governance: ensuring the data a business uses is trustworthy. Data lineage tracks where data originates, and data catalogs track which applications and users consume it. Together they help reduce redundant data storage and make data easier to find and reuse. Data management covers the whole data lifecycle, including storage use and eventual retirement.
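Lineage and catalog information can be modeled as a small graph. In the hypothetical Python sketch below, `LINEAGE` maps each dataset to the datasets it is derived from and `CONSUMERS` lists who reads it; walking the graph answers where a dataset comes from, and datasets with no downstream use become candidates for retirement. Real catalog tools store far richer metadata, and all dataset names here are invented.

```python
from collections import deque

# Hypothetical lineage: dataset -> datasets it is derived from.
LINEAGE = {
    "sales_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "fx_rates"],
    "orders_clean": ["orders_raw"],
    "orders_raw": [],
    "fx_rates": [],
    "legacy_extract": ["orders_raw"],
}
# Hypothetical catalog entries: dataset -> applications or users that consume it.
CONSUMERS = {
    "sales_dashboard": ["finance-team"],
    "sales_mart": ["sales_dashboard", "forecast-app"],
}


def upstream_sources(dataset: str, lineage: dict[str, list[str]]) -> set[str]:
    """Breadth-first walk back to every dataset this one depends on."""
    seen: set[str] = set()
    queue = deque(lineage.get(dataset, []))
    while queue:
        parent = queue.popleft()
        if parent not in seen:
            seen.add(parent)
            queue.extend(lineage.get(parent, []))
    return seen


def unused_datasets(lineage: dict[str, list[str]], consumers: dict[str, list[str]]) -> list[str]:
    """Datasets that feed nothing and have no catalogued consumer: retirement candidates."""
    referenced = {parent for parents in lineage.values() for parent in parents}
    return sorted(d for d in lineage if d not in referenced and not consumers.get(d))


if __name__ == "__main__":
    print(sorted(upstream_sources("sales_dashboard", LINEAGE)))
    print(unused_datasets(LINEAGE, CONSUMERS))
```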
Data Management With Actian
Data sprawl can be eliminated by putting data analysis capabilities where the data is stored. Data can be on-premises or spread across different cloud platforms. The Actian Data Platform makes it easy to use your data assets wherever they reside. Data can be loaded into a data warehouse, or registered within a data warehouse and accessed as an external file. Analytic queries can span multiple instances wherever they reside.