A Comprehensive Guide to Understanding Different Data Types

Main Types of Data

There are three broad types of data: structured, semi-structured and unstructured. Data may have the following characteristics:

  • Primary data is from an original source, such as a weighing scale.
  • Secondary data comes from a secondary source, such as a report that interprets the original data.
  • Qualitative data is subjective in nature.
  • Quantitative data is a numerical value such as a score.
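  • Discrete data takes distinct, countable values, such as a whole-number count.
  • Continuous data can take any value within a range, such as a measurement, and is usually rounded when recorded.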

Actian’s Types of Data

In this article, we will focus on data types that Actian’s database management systems (DBMSs) can access. These fall into the following five categories:

  • Character
  • Numeric
  • Date and Time
  • Abstract
  • Boolean

Character Data

Character data types hold strings of ASCII characters, both printable and non-printable. Uppercase and lowercase alphabetic characters are stored exactly as entered. Character data can be fixed-length or variable-length. A variable-length column carries extra storage overhead compared with a fixed-length column because a length specifier must be stored with each value. If a column can contain null values, an additional byte is used to store a null indicator.

Spaces in character strings are treated as part of the string. A value stored in a fixed-length column such as CHAR(4) is padded with trailing spaces, for example “ABC ”. Leading and trailing blanks are significant when comparing values.
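The padding behavior can be sketched in Python. This is only a model of the stored value, not Actian code: the helper name store_char is hypothetical, and rejecting over-length input is an assumption made for the example.

```python
def store_char(value: str, length: int) -> str:
    """Model a fixed-length CHAR(length) column by padding with trailing spaces."""
    if len(value) > length:
        # Assumption for this sketch: over-length input is rejected.
        raise ValueError(f"{value!r} does not fit in CHAR({length})")
    return value.ljust(length)


stored = store_char("ABC", 4)
print(repr(stored))      # 'ABC '
print(len(stored))       # 4
print(stored == "ABC")   # False: the trailing blank is part of the string
```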

As with fixed-length CHAR strings, variable-length VARCHAR strings can contain any character, including non-printing characters, except the ASCII null character. A nullable VARCHAR column uses an additional byte for the null indicator. Blank characters are significant when stored or compared. The Actian Data Platform uses the NCHAR and NVARCHAR data types to store UTF-8 encoded characters.

JSON Data

JSON is an example of a semi-structured data type. JSON does not have its own data type. Instead, JSON values are stored in any string column, such as CHAR, VARCHAR, NCHAR, or NVARCHAR. A value can be a scalar, an array, or a JSON object.

A JSON object is a comma-separated list of key:value pairs surrounded by curly braces {}.

A key must be a double-quoted string. A value can be any JSON value, including a JSON object or JSON array. A value cannot be blank. Whitespace is ignored in a JSON object string, except for whitespace inside a double-quoted string.
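These rules can be checked with Python's standard json module. The sample documents and the helper is_valid_json are made up for illustration. The sketch shows how JSON text behaves in general, not how an Actian database parses it.

```python
import json

# Two spellings of the same object: whitespace outside strings is ignored.
compact = '{"name":"scale","readings":[1.5,2],"meta":{"unit":"kg"}}'
spaced = '{ "name" : "scale", "readings" : [ 1.5, 2 ], "meta" : { "unit" : "kg" } }'

doc = json.loads(compact)
same_document = doc == json.loads(spaced)

# Whitespace inside a double-quoted string is part of the value.
padded_differs = json.loads('{"name":" scale "}') != json.loads('{"name":"scale"}')


def is_valid_json(text: str) -> bool:
    """Return True if text parses as JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


unquoted_key_ok = is_valid_json('{name: "scale"}')  # keys must be double-quoted
missing_value_ok = is_valid_json('{"name": }')      # a value cannot be left out

print(same_document, padded_differs, unquoted_key_ok, missing_value_ok)
```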

XML and JSON semi-structured data strings are stored as variable-length strings.

Numeric Data

Integer Data Types

Four integer data types are used to hold whole numbers. The more bytes a data type uses, the larger the number it can hold. The four integer types that the Actian Data Platform uses are:

  • INTEGER1 or TINYINT (one-byte)
  • INTEGER2 or SMALLINT (two-byte)
  • INTEGER4 or INTEGER (four-byte)
  • INTEGER8 or BIGINT (eight-byte)
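To see how the byte count limits the values, the sketch below computes the range for each size. It assumes signed two's-complement storage, which the text above does not state explicitly. The helper signed_range is a hypothetical name.

```python
def signed_range(num_bytes: int) -> tuple:
    """Return (lowest, highest) for a signed two's-complement integer of num_bytes bytes."""
    bits = 8 * num_bytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, size in [("TINYINT", 1), ("SMALLINT", 2), ("INTEGER", 4), ("BIGINT", 8)]:
    low, high = signed_range(size)
    print(f"{name:<8} {size} byte(s): {low} to {high}")
```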

Decimal Data

The decimal data type stores fractional numbers by specifying the total number of digits (the precision) and the number of decimal places (the scale). For example, DECIMAL(20,5) stores a number with 20 digits of precision, 5 of which are to the right of the decimal point.
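The precision and scale arithmetic can be sketched with Python's decimal module. The helper fits_decimal is hypothetical and handles finite values only. It treats excess fractional digits as not fitting, whereas a real DBMS may round or truncate them instead.

```python
from decimal import Decimal


def fits_decimal(value: str, precision: int, scale: int) -> bool:
    """Return True if value fits DECIMAL(precision, scale) without losing digits."""
    _sign, digits, exponent = Decimal(value).as_tuple()
    fractional_digits = max(-exponent, 0)
    integer_digits = max(len(digits) + exponent, 0)
    # With precision 20 and scale 5, at most 15 digits may precede the decimal point.
    return fractional_digits <= scale and integer_digits <= precision - scale


print(fits_decimal("123.45678", 20, 5))           # True
print(fits_decimal("0.123456", 20, 5))            # False: six decimal places
print(fits_decimal("1234567890123456.0", 20, 5))  # False: sixteen integer digits
```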

Floating Point Data Type

Floating-point values can be stored as FLOAT4, which uses four bytes, or FLOAT8, which uses eight bytes. The exact precision of four-byte numbers is processor dependent. Internally, eight-byte numbers are rounded to fifteen decimal digits.
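The effect of the storage size on precision can be illustrated with Python's struct module. This sketch assumes IEEE 754 single and double precision, which is common but is not stated by the text above. The helper names are hypothetical.

```python
import struct


def round_trip_float4(value: float) -> float:
    """Store value in four bytes (IEEE 754 single precision) and read it back."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


def round_trip_float8(value: float) -> float:
    """Store value in eight bytes (IEEE 754 double precision) and read it back."""
    return struct.unpack("<d", struct.pack("<d", value))[0]


x = 0.1
print(round_trip_float4(x) == x)  # False: four bytes cannot hold 0.1 as precisely
print(round_trip_float8(x) == x)  # True: Python floats are already eight bytes
print(round_trip_float4(0.5))     # 0.5 is exactly representable in both sizes
```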

Money Data Type

MONEY is an example of an abstract data type. Stored values are rounded to 2 decimal places. Values must be in the range of $-999,999,999,999.99 to $999,999,999,999.99. The currency symbol is optional.
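The rounding and range rules can be sketched as follows. The helper to_money is hypothetical, and the half-up rounding mode is an assumption, since the text above only says that values are rounded to 2 decimal places.

```python
from decimal import Decimal, ROUND_HALF_UP

MONEY_MAX = Decimal("999999999999.99")  # upper bound quoted in the text


def to_money(text: str) -> Decimal:
    """Parse text with an optional currency symbol, round to 2 places, check the range."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    # Assumption: half-up rounding; the DBMS may use a different rounding rule.
    amount = Decimal(cleaned).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    if abs(amount) > MONEY_MAX:
        raise ValueError(f"{text!r} is outside the MONEY range")
    return amount


print(to_money("$19.995"))   # 20.00
print(to_money("1,234.5"))   # 1234.50
print(to_money("$-999,999,999,999.99"))
```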

Date and Time Data

Timestamp Data Type

The TIMESTAMP data type is used to record when events happen. It consists of a date and a time, with an optional time zone. For example, a TIMESTAMP(5) WITH TIME ZONE value, which keeps five digits of fractional seconds, could look like this:

2023-12-20 09:30:55.12345-08:00, which would be in the Pacific time zone.
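A value in this layout can be parsed with Python's standard datetime module. This sketch only shows how the parts of the value fit together. The format string and the helper parse_timestamp_tz belong to the example, not to any Actian API.

```python
from datetime import datetime, timedelta


def parse_timestamp_tz(text: str) -> datetime:
    """Parse 'YYYY-MM-DD HH:MM:SS.fffff+HH:MM' into a timezone-aware datetime."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S.%f%z")


ts = parse_timestamp_tz("2023-12-20 09:30:55.12345-08:00")
print(ts.microsecond)                          # 123450 (five digits, padded to six)
print(ts.utcoffset() == timedelta(hours=-8))   # True: eight hours behind UTC

# A date with an impossible month is rejected rather than stored.
try:
    parse_timestamp_tz("2023-13-20 09:30:55.12345-08:00")
    month_13_accepted = True
except ValueError:
    month_13_accepted = False
print(month_13_accepted)                       # False
```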

Abstract Data

Boolean Data Type

BOOLEAN columns contain literal values of ‘FALSE’ or ‘TRUE’, which internally have values of 0 and 1, respectively.

IP Network Address Data Type

An abstract data type for IPv4 and IPv6 addresses is very useful when storing and manipulating weblogs. An IPv4 address might look like 176.12.254.1. The newer IPv6 uses 128-bit addresses, written as eight groups of four hexadecimal digits, such as 2101:0cb8:8ca3:0d42:1900:8d2e:0e70:7734.

Using the IPv4 and IPv6 data types provides input error checking and supports specialized operators and functions.
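Python's standard ipaddress module offers a comparable kind of input checking and address-aware operations, which makes it a convenient way to illustrate the idea. The helper classify_address and the /16 network are made up for this sketch and are unrelated to the Actian operators.

```python
import ipaddress


def classify_address(text: str) -> str:
    """Return 'IPv4', 'IPv6', or 'invalid' for a textual address."""
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        return "invalid"
    return f"IPv{address.version}"


print(classify_address("176.12.254.1"))                             # IPv4
print(classify_address("2101:0cb8:8ca3:0d42:1900:8d2e:0e70:7734"))  # IPv6
print(classify_address("176.12.254.300"))                           # invalid

# A typed address supports operations that plain strings do not,
# such as testing membership in a network.
in_network = ipaddress.ip_address("176.12.254.1") in ipaddress.ip_network("176.12.0.0/16")
print(in_network)                                                   # True
```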

Universally Unique Identifier (UUID)

A Universally Unique Identifier (UUID) is a 128-bit unique identifier that is generated by the local system upon request or loaded from external sources. UUIDs are suitable for reliably identifying persistent objects across a network and for generating unique values such as transaction IDs.
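Both ways of obtaining a UUID, generating one locally and loading one from an external source, can be sketched with Python's standard uuid module. The literal UUID string below is an arbitrary sample value.

```python
import uuid

# Generate a random (version 4) UUID on the local system.
new_id = uuid.uuid4()
print(len(new_id.bytes) * 8)  # 128 bits

# Load a UUID that came from an external source as text.
loaded = uuid.UUID("123e4567-e89b-12d3-a456-426614174000")
print(loaded)

# The textual form round-trips without loss, so it can be stored and reloaded.
round_trip_ok = uuid.UUID(str(new_id)) == new_id
print(round_trip_ok)          # True
```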

Geospatial Data

The Ingres Transactional Database provides deep support for geospatial data types. All spatial data types store features using the Well-Known Binary (WKB) format, a specification of the Open Geospatial Consortium (OGC).

2D data types exist in a two-dimensional coordinate space represented by X (longitude) and Y (latitude) coordinates. These include geometry and line strings, for example. 3D data types add a third dimension, Z, to the X and Y coordinates. 4D data adds a fourth, application-dependent dimension to a 3D coordinate.
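As an illustration of the WKB layout, the sketch below encodes and decodes a single 2D point with Python's struct module. It follows the OGC layout for a Point: a byte-order flag, a 32-bit geometry type code, then X and Y as doubles. The helper names and sample coordinates are hypothetical, and the sketch says nothing about how Ingres stores the bytes internally.

```python
import struct

WKB_POINT = 1      # OGC WKB geometry type code for a 2D Point
LITTLE_ENDIAN = 1  # WKB byte-order flag value for little-endian data


def point_to_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # B = byte-order flag, I = unsigned 32-bit type code, dd = X and Y doubles
    return struct.pack("<BIdd", LITTLE_ENDIAN, WKB_POINT, x, y)


def wkb_to_point(data: bytes) -> tuple:
    """Decode a WKB 2D point, honoring its byte-order flag."""
    prefix = "<" if data[0] == LITTLE_ENDIAN else ">"
    geometry_type, x, y = struct.unpack(prefix + "Idd", data[1:21])
    if geometry_type != WKB_POINT:
        raise ValueError(f"not a 2D point: type code {geometry_type}")
    return x, y


wkb = point_to_wkb(-122.4, 37.8)  # X = longitude, Y = latitude
print(len(wkb))                   # 21
print(wkb.hex()[:10])             # 0101000000
print(wkb_to_point(wkb))          # (-122.4, 37.8)
```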

Unstructured Data

Unstructured data, such as text, is stored in CHAR or VARCHAR columns in the database. Video and audio data are generally accessed as externally stored objects in a file system, using a database connector such as Spark.

Actian and Supported Data Formats

You can learn more about Actian transactional databases by visiting our website.