What is a Data Analytics Platform?
A data analytics platform provides comprehensive capabilities to connect, ingest, organize, visualize, and analyze data at scale. The data platform must support multiple on-premises instances and multiple cloud providers so analytics can run anywhere the data resides. The platform must be secure and work with multiple programming APIs, business intelligence (BI), visualization and data science tools, and development languages. It forms the basis for gaining insights, training and running machine learning models, and supporting other artificial intelligence (AI) applications.
Getting Data into the Data Analytics Platform
One of the most common challenges a business faces is getting vast quantities of data with different data types into the data analytics platform. Data integration technology provides the ability to connect to multiple data sources and load batch, real-time, and streaming data into the data platform. The platform has to ingest and store structured, semi-structured, and unstructured data.
Data Transformation
Data aggregated in the data analytics platform comes from diverse sources. Data transformation is typically a step in the extract, transform, load (ETL) pipeline that converts data into more uniform formats and removes unwanted data. Transformations can include filtering, making date formats uniform, and altering data types to ease analysis. More complex transformations can involve merging multiple data sources, filling gaps using interpolation, enforcing data quality standards, and masking data for compliance reasons.
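The simpler transformations listed above can be illustrated with a minimal sketch. It assumes a handful of hypothetical records (the field names, date formats, and the "test" status flag are all invented for illustration) and uses only the Python standard library. It filters unwanted rows, normalizes dates to one format, converts text fields to numeric types, and fills a gap by linear interpolation between neighboring rows. A production pipeline would normally use an integration tool or a dataframe library instead.

```python
from datetime import datetime

# Hypothetical raw records from two sources with inconsistent formats.
RAW_ROWS = [
    {"id": "1", "date": "2024-01-05", "temp": "20.0", "status": "ok"},
    {"id": "2", "date": "01/06/2024", "temp": "", "status": "ok"},
    {"id": "3", "date": "2024-01-07", "temp": "26.0", "status": "ok"},
    {"id": "4", "date": "01/08/2024", "temp": "99.0", "status": "test"},
]

DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")  # assumed input formats


def normalize_date(text):
    """Return the date as ISO 8601 (YYYY-MM-DD), trying each known format."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def fill_gaps(values):
    """Linearly interpolate None entries that have known values on both sides.

    Interpolation is by row position, which assumes evenly spaced rows.
    """
    filled = list(values)
    for i, value in enumerate(values):
        if value is not None:
            continue
        left = next((j for j in range(i - 1, -1, -1) if values[j] is not None), None)
        right = next((j for j in range(i + 1, len(values)) if values[j] is not None), None)
        if left is not None and right is not None:
            fraction = (i - left) / (right - left)
            filled[i] = values[left] + fraction * (values[right] - values[left])
    return filled


def transform(rows):
    # Filter out unwanted rows, then cast types and repair gaps.
    kept = [r for r in rows if r["status"] != "test"]
    temps = fill_gaps([float(r["temp"]) if r["temp"] else None for r in kept])
    return [
        {"id": int(r["id"]), "date": normalize_date(r["date"]), "temp": t}
        for r, t in zip(kept, temps)
    ]


CLEAN_ROWS = transform(RAW_ROWS)
```

In this sketch the missing reading for the second row is filled from its two neighbors, and the row flagged as test data is dropped before any other step runs.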
Scaling the Data Analytics Platform
Modern columnar databases can provide fast analytics over massive amounts of data for large numbers of concurrent users. On a single server, vector processing allows a single query to parallelize its operations across all available CPU cores and make efficient use of CPU caches. This is a big step up in performance, but multi-user workloads need to go one step further: being cluster aware. A cluster-aware data analytics platform can spread a workload across the multiple servers that make up the cluster.
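The idea of spreading one workload across a cluster can be sketched as a scatter-gather aggregate. This is only an illustration under stated assumptions: threads stand in for servers, the data is a hypothetical list of numbers, and the function names are invented. Each simulated node computes a partial sum and count for its own shard, and a coordinator combines the partial results into a single average.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split rows round-robin into n_parts shards, one per simulated node."""
    return [rows[i::n_parts] for i in range(n_parts)]


def partial_aggregate(shard):
    """Work done locally on one node: a partial (sum, count) for its shard."""
    return sum(shard), len(shard)


def cluster_average(rows, n_nodes=4):
    """Scatter shards to workers, then combine the partial results."""
    shards = partition(rows, n_nodes)
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        partials = list(pool.map(partial_aggregate, shards))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


SALES = list(range(1, 101))        # hypothetical measure column
AVERAGE = cluster_average(SALES)   # same answer as sum(SALES) / len(SALES)
```

Averages are combined from sums and counts rather than from per-node averages, because shards can differ in size and averaging the averages would then give the wrong result.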
Cloud-based data processing offers the added benefit of elastic scaling by allocating computing resources on demand to meet the user’s needs or query load at any instant. Cloud block storage that is attached over the network, rather than local to a server, decouples compute and storage so they can scale independently.
When data cannot be kept on a single cluster, data meshes, data fabrics, and distributed queries can create a single virtual view that spans multiple dispersed database instances.
Using SQL for Analysis
Standards-based SQL remains the most common language for writing queries. Query editors make writing SQL queries much easier because they check the syntax as you type. Editors also provide the ability to save queries, so making tweaks to a query is quick and easy. Most BI tools use visual editors that allow users to construct SQL queries by clicking and dragging database objects to a workspace.
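As a small example of the kind of analytic query described above, the following sketch runs a standard aggregate query against an in-memory SQLite database using Python's built-in sqlite3 module. SQLite is used here only because it needs no setup; the table, columns, and threshold are hypothetical, and the same SELECT would look much the same on any SQL-based analytics platform.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (region TEXT, amount REAL);
    INSERT INTO orders VALUES
        ('east', 120.0), ('east', 80.0),
        ('west', 50.0), ('west', 25.0),
        ('north', 10.0);
""")

# Revenue per region, keeping only regions at or above a minimum revenue.
QUERY = """
    SELECT region, COUNT(*) AS order_count, SUM(amount) AS revenue
    FROM orders
    GROUP BY region
    HAVING SUM(amount) >= ?
    ORDER BY revenue DESC
"""

TOP_REGIONS = conn.execute(QUERY, (50.0,)).fetchall()
conn.close()
```

The threshold is passed as a bound parameter rather than pasted into the SQL text, which keeps a saved query reusable and avoids SQL injection when the value comes from user input.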
Visualizing Data
Many analytics platforms provide query editing, but few provide built-in charting to visualize data relationships immediately. Data analysts commonly export data from the data analytics platform into spreadsheets and visualization tools for graphical or visual representation. Tiled dashboards in tools such as Tableau, Qlik and Looker provide more chart types than a spreadsheet. Their visualizations can refresh often to stay current as the underlying data is updated.
Sharing Analytic Insights
In the 1980s, sharing insights meant printing reports. Today, being able to publish a dashboard in the cloud with secure authentication makes sharing data easy. Many analytics tools can export reports directly to PowerPoint and Excel to easily communicate insights.
Securing Data
A data analytics platform must be able to keep data secure. Mechanisms for securing data include encryption of data at rest and in motion, authentication, access control, and role separation. Sensitive data, such as personally identifiable information, must be masked or obfuscated to protect privacy.
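Masking can take several forms. The sketch below shows two common ones under stated assumptions: the record layout, the function names, and the hard-coded key are hypothetical, and a real deployment would load the key from a secrets manager and would usually rely on the platform's own masking features. A customer identifier is replaced with a keyed hash so that rows can still be joined on it, and an email address is partially hidden so it stays recognizable without being exposed.

```python
import hashlib
import hmac

# Assumption: a placeholder key. In practice, load it from a secrets manager.
SECRET_KEY = b"replace-with-a-managed-secret"


def pseudonymize(value):
    """Replace a value with a keyed hash (HMAC-SHA256, truncated to 16 hex chars)."""
    digest = hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:16]


def mask_email(email):
    """Keep the first character and the domain; hide the rest of the local part."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}{'*' * max(len(local) - 1, 0)}@{domain}"


def mask_row(row):
    return {
        "customer_id": pseudonymize(row["customer_id"]),
        "email": mask_email(row["email"]),
        "amount": row["amount"],  # non-sensitive measure passes through unchanged
    }


MASKED = mask_row(
    {"customer_id": "C-1001", "email": "alice@example.com", "amount": 42.5}
)
```

Because the hash is keyed and deterministic, the same customer always maps to the same token, so analysts can still count and join by customer without seeing the original identifier.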
Actian Data Analytics Capabilities
The Actian Data Platform is a highly scalable data analytics platform with a rich feature set for ingesting, organizing, analyzing, and publishing data.