Structured data is intended to be easily consumed by application programs and humans. It is in a consistent, standard format and follows a data model. Structured data is commonly found in a relational database, where it can be read and manipulated using Structured Query Language (SQL).
How Does Structured Data Differ From Unstructured and Semi-Structured Data?
One way to better understand structured data is to compare it with semi-structured and unstructured data:
Structured Data
Has records with addressable fields. In a relational database, a table would be made up of rows of records, more formally known as tuples. Multiple tables are related to each other through key relationships. Structured data is readily organized for analysis. A relational database schema is the implementation of a data model that maps interrelationships between entities represented by a table structure.
Semi-Structured
Data is commonly a variable-length character construct written in a notation such as JSON or XML, which contains named elements along with their data values. Semi-structured data is self-describing, which makes processing it straightforward, and it may be stored in a Large Object (LOB) field.
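As a minimal sketch of what self-describing means in practice, the Python snippet below parses a small JSON document and discovers its field names and value types at run time. The document and its field names are made up for illustration.

```python
import json

# Hypothetical customer document; the field names are illustrative only.
document = '{"customer_id": 42, "name": "Ada", "phones": ["555-0100"], "notes": null}'

# json.loads turns the text into nested Python dicts, lists, and scalars.
record = json.loads(document)

# The element names travel with the values, so a program can discover the
# fields and their types without a predefined schema.
field_types = {name: type(value).__name__ for name, value in record.items()}
print(field_types)
```

Because the names are carried inside the data, two documents in the same collection can have different fields, which is what separates this form from a fixed table layout.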
Unstructured
A single record containing encoded data such as video, audio, or text files. This type of data is usually stored in a file system rather than a structured database. Many database systems can reference external data, which is often more efficient than storing such objects internally as LOBs.
Examples of Structured Data
A customer table in a database is a good example. The customer table contains details of multiple instances of the customer entity represented by rows. Each row consists of multiple columns, with each containing a specific attribute about the customer, such as First Name, Last Name, Address, and Customer ID. The Customer ID is typically the unique identifier that relates customers to other entities in the schema, such as Orders.
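The customer example can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below are assumptions chosen for illustration, not a schema taken from any particular product.

```python
import sqlite3

# In-memory database; all table and column names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute(
    "CREATE TABLE customers ("
    " customer_id INTEGER PRIMARY KEY,"
    " first_name TEXT NOT NULL,"
    " last_name TEXT NOT NULL,"
    " address TEXT)"
)
# The customer_id column is the key relationship that links orders to customers.
conn.execute(
    "CREATE TABLE orders ("
    " order_id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(customer_id),"
    " total REAL NOT NULL)"
)

conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'Lovelace', '12 Example St')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(100, 1, 25.0), (101, 1, 40.5)],
)

# Join the two tables on the shared key and aggregate per customer.
rows = conn.execute(
    "SELECT c.first_name, c.last_name, COUNT(o.order_id), SUM(o.total)"
    " FROM customers c JOIN orders o ON o.customer_id = c.customer_id"
    " GROUP BY c.customer_id"
).fetchall()
print(rows)
```

Each row of the result is a tuple of named columns, and the join works only because both tables agree on what a Customer ID is.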
You can expect to find data on Employees and Departments in an HR application schema.
A sales force automation database would track Salespeople, Prospects, and Open and Closed Sales Leads.
Processing Structured Data
Spreadsheets are one of the most common forms of structured data. Data import utilities typically operate on the comma-separated values (CSV) version of a spreadsheet to read data values. If the file contains column header labels, the utility can use these as metadata for naming the data values. Data integration products such as Actian DataConnect can ingest, map, transform, and load the data into its final destination.
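A minimal sketch of using header labels as metadata, written with Python's standard csv module. The CSV text stands in for a file exported from a spreadsheet, and its column names are invented for the example.

```python
import csv
import io

# Stand-in for a CSV file exported from a spreadsheet (illustrative data).
csv_text = "customer_id,first_name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n"

# DictReader treats the first row as column headers and uses them as keys.
reader = csv.DictReader(io.StringIO(csv_text))
column_names = reader.fieldnames

records = []
for row in reader:
    record = dict(row)
    # CSV values arrive as text, so numeric columns need explicit conversion.
    record["customer_id"] = int(record["customer_id"])
    records.append(record)

print(column_names)
print(records)
```

The explicit type conversion is the small amount of mapping and transformation that an import utility would normally perform before loading the data.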
Application programs can read flat files using field separators and end-of-line characters to delineate fields and records. Records are typically read into an array of named variables that the application program can process.
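The idea of reading a flat file into named variables can be sketched as follows. The pipe separator, the Employee record type, and the sample lines are all assumptions made for the example.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    """Named fields for one record (hypothetical layout)."""
    employee_id: int
    name: str
    department: str


def parse_flat_file(text, field_sep="|"):
    """Split text into records on line ends, then into fields on field_sep."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        emp_id, name, dept = line.split(field_sep)
        records.append(Employee(int(emp_id), name.strip(), dept.strip()))
    return records


flat_text = "7|Grace Hopper|Engineering\n8|Edgar Codd|Research\n"
employees = parse_flat_file(flat_text)
for emp in employees:
    print(emp.employee_id, emp.name, emp.department)
```

A line with the wrong number of fields raises a ValueError here, which is a reasonable default for a sketch; a production loader would more likely log and skip the bad record.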
Web application services can use streaming APIs to receive data flows. To provide resilience, the output data stream flows into a data store with a memory cache, where it can accumulate in case of a network failure. When connectivity returns, the buffered data is read by the receiving web application asynchronously. Streaming data utilities such as Apache Kafka support publishing and subscribing mechanisms to share source data with multiple subscribing applications. Streaming APIs can be used equally well for sharing structured and semi-structured data.
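The buffering behavior described above can be illustrated with a toy store-and-forward class. This is a single-process, synchronous sketch under simplifying assumptions: it does not use Apache Kafka or any real streaming API, and the class and attribute names are hypothetical.

```python
from collections import deque


class BufferedStream:
    """Toy store-and-forward buffer between a producer and one consumer."""

    def __init__(self):
        self._buffer = deque()
        self.connected = True   # simulated network state
        self.delivered = []     # records the consumer has received

    def publish(self, record):
        # Every record is buffered first, then forwarded if the link is up.
        self._buffer.append(record)
        if self.connected:
            self.flush()

    def pending(self):
        return len(self._buffer)

    def flush(self):
        # Deliver buffered records in arrival order.
        while self._buffer:
            self.delivered.append(self._buffer.popleft())


stream = BufferedStream()
stream.publish({"seq": 1})

stream.connected = False        # simulate a network failure
stream.publish({"seq": 2})
stream.publish({"seq": 3})
pending_during_outage = stream.pending()

stream.connected = True         # connectivity returns
stream.flush()
print([r["seq"] for r in stream.delivered])
```

Records published during the outage accumulate in the buffer and are delivered in order once the connection is restored, so nothing is lost and nothing is reordered.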
Creating Structured Data
Accurate data capture requires some validation if it involves human input, which can be very error-prone. Applications use a graphical user interface (GUI) to collect input into one named field at a time, validating formats and accepting only valid values. Common interface widgets such as radio buttons, checkboxes, and drop-down lists improve the quality of the entered data values and maintain consistency. Calculated fields help eliminate redundant data entry. Examples of human data entry applications are order entry systems, tax preparation software, and surveys.
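Field-by-field validation of this kind might look like the following sketch. The form fields, the allowed shipping options, and the five-digit postal code rule are assumptions made for the example, not rules from any specific application.

```python
import re

ALLOWED_SHIPPING = {"standard", "express"}   # hypothetical drop-down choices
POSTAL_CODE = re.compile(r"\d{5}")           # assumed five-digit format


def validate_order(form):
    """Return a dict of field name -> error message; empty means valid."""
    errors = {}
    if not form.get("customer_name", "").strip():
        errors["customer_name"] = "required"
    if not POSTAL_CODE.fullmatch(form.get("postal_code", "")):
        errors["postal_code"] = "expected five digits"
    if form.get("shipping") not in ALLOWED_SHIPPING:
        errors["shipping"] = "choose one of the listed options"
    try:
        if int(form.get("quantity", "")) < 1:
            errors["quantity"] = "must be at least 1"
    except (TypeError, ValueError):
        errors["quantity"] = "must be a whole number"
    return errors


def order_total(quantity, unit_price_cents):
    """Calculated field: derived from other values, so it is never typed in."""
    return quantity * unit_price_cents


good = {"customer_name": "Ada", "postal_code": "90210",
        "shipping": "express", "quantity": "3"}
bad = {"customer_name": " ", "postal_code": "9021",
       "shipping": "overnight", "quantity": "three"}

print(validate_order(good))
print(sorted(validate_order(bad)))
print(order_total(3, 250))
```

Restricting shipping to a fixed set plays the same role as a drop-down list in a GUI: the user cannot supply a value the system does not understand.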
In the logistics industry, structured data is commonly exchanged between shippers and carriers using Electronic Data Interchange (EDI) technology. The EDI standard has evolved over decades to become prevalent in other industries, including healthcare and telecoms.
Structuring Data at the Edge
IoT systems don’t rely on human input, so they usually employ machine-to-machine processing through APIs. Edge processing is concerned with filtering, transforming, and structuring data close to where data is created at the edge of networks. IoT processing uses smart devices to capture sensor data and preprocess it to make the central processing servers operate more efficiently. Actian ZEN Edge Data Management is a lightweight, compact database suited to edge use cases.
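A minimal sketch of filtering, transforming, and structuring at the edge. The raw "sensor_id;celsius" format, the valid temperature range, and the function names are all assumptions made for illustration.

```python
from statistics import mean


def structure_reading(raw):
    """Parse 'sensor_id;celsius' into a dict, or return None if malformed."""
    parts = raw.split(";")
    if len(parts) != 2:
        return None
    try:
        return {"sensor_id": parts[0].strip(), "celsius": float(parts[1])}
    except ValueError:
        return None


def summarize(raw_readings, low=-40.0, high=85.0):
    """Drop malformed or out-of-range readings, then aggregate per sensor."""
    by_sensor = {}
    for raw in raw_readings:
        reading = structure_reading(raw)
        if reading is None or not (low <= reading["celsius"] <= high):
            continue
        by_sensor.setdefault(reading["sensor_id"], []).append(reading["celsius"])
    return [
        {"sensor_id": sensor, "count": len(values), "mean_celsius": mean(values)}
        for sensor, values in sorted(by_sensor.items())
    ]


raw = ["t1;20.0", "t1;22.0", "t2;999", "garbage", "t2;18.5"]
summary = summarize(raw)
print(summary)
```

Only the compact per-sensor summary needs to travel to the central servers, which is the efficiency gain that edge preprocessing is meant to deliver.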
Log Data
Reactive security and marketing systems need to process data in near real time to capture critical events such as cyber-attacks or a hot prospect visiting a website. These activities are captured as log records, including timestamps, IP addresses, and URLs of visited pages. Data management companies such as Actian have developed specialized data types to map timestamps and IP address formats into database values for more accessible log data analysis.
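To illustrate why typed timestamps and IP addresses make log analysis easier, the sketch below parses log lines into Python's standard datetime and ipaddress types. The three-field log layout and the sample addresses are assumptions, and this is not a depiction of Actian's data types.

```python
import ipaddress
from datetime import datetime


def parse_log_line(line):
    """Parse 'ISO-8601-timestamp ip-address url' into typed values."""
    timestamp, ip, url = line.split()
    return {
        "timestamp": datetime.fromisoformat(timestamp),
        "ip": ipaddress.ip_address(ip),
        "url": url,
    }


lines = [
    "2024-05-01T12:30:00+00:00 203.0.113.7 /pricing",
    "2024-05-01T12:30:05+00:00 198.51.100.9 /blog",
    "2024-05-01T12:31:10+00:00 203.0.113.20 /contact",
]
records = [parse_log_line(line) for line in lines]

# Typed values support questions that plain strings cannot answer directly.
subnet = ipaddress.ip_network("203.0.113.0/24")
urls_from_subnet = [r["url"] for r in records if r["ip"] in subnet]
elapsed_seconds = (records[-1]["timestamp"] - records[0]["timestamp"]).total_seconds()

print(urls_from_subnet)
print(elapsed_seconds)
```

Once the values are typed, subnet membership and time differences become simple comparisons rather than string manipulation.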
Leveraging Structured Data With the Actian Data Platform
The Actian Data Platform has been designed to make it easy to import and analyze structured and semi-structured data. It is available on multiple cloud platforms and on-premises, so analytics processing is handled close to where data resides. Built-in data integration technology uses predefined templates to load common data formats, including CSV, EDI, and log data. Streaming APIs are supported, along with a visual data studio that eases data capture.