Where You Do Analytics Processing Matters

Actian Corporation

July 20, 2020

The Vector for Hadoop offering from Actian delivers increased performance for analytic queries without the associated increase in cost. If you are looking for high-performance analytics processing to drive operational decision-making, where you do your processing matters. By minimizing the movement of data and processing locally, you can drastically reduce latency. By using a system like Actian Vector to perform that local processing, you can achieve even higher levels of performance.

On the Box, in the Datacenter or Across the Country

When people hear the statement “where you do your processing matters,” the first thing that comes to mind is network latency. It’s easy to understand how transmitting data over the internet, across the country, or even across town can slow down your processing. The same holds true within your data center: placing storage and compute close together (on the same rack or even the same device) decreases processing latency.

Many companies are leveraging cloud services and distributed systems to increase performance for end-user OLTP operations. When it comes time to perform analytics, the distance issue comes into play again. Where should you be doing your analytics processing? For most companies, the cloud is the right place to host the data warehouse and run analytics because it lets you locate your analytics closer to your data stores and, at the same time, leverage cloud-scale compute resources.

Assuming you’ve addressed these “big distance” issues, is it possible to optimize further? Yes, it is. If your goal is big data processing or real-time analytics that drives operations and decision-making, you need to take your analytics performance to the next level and look at how the databases and software you use can be optimized to take maximum advantage of the available resource capacity.

Disk is Slow. Memory is Better. Chip Cache is the Fastest

Let’s take a look at what happens within an analytics system (the hardware and software you use). These systems typically rely on three hardware components that have a direct influence on performance: disks, memory, and chip cache. When you perform compute operations (which are really just a bunch of mathematical formulas), you are manipulating data that is stored in one of these three places. CPUs have some internal cache memory, which offers the fastest performance but the smallest capacity. RAM has more capacity (though it is still limited) and is fairly fast because data is held temporarily in memory instead of being written to a physical medium, but it is much slower than chip cache. Disk storage is the slowest because data is written to a physical medium (a disk) and read back from it whenever it needs to be accessed. With cloud storage, the disk capacity available is nearly unlimited.
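To make the gap between the three tiers concrete, here is a minimal Python sketch that estimates the average cost of a data access from the share of accesses served by each tier. The latency figures are rough, illustrative orders of magnitude chosen for this example, not measurements of any Actian system or specific hardware, and the names `LATENCY_NS` and `average_access_ns` are hypothetical.

```python
# Illustrative order-of-magnitude latencies in nanoseconds.
# These are assumptions for the sketch, not measured values.
LATENCY_NS = {
    "cache": 1,           # on-chip cache
    "ram": 100,           # main memory
    "disk": 10_000_000,   # spinning disk seek and read
}


def average_access_ns(share_by_tier):
    """Return the weighted average latency for a mix of accesses.

    share_by_tier maps a tier name to the fraction of accesses it serves;
    the fractions must add up to 1.
    """
    total_share = sum(share_by_tier.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must add up to 1")
    return sum(LATENCY_NS[tier] * share for tier, share in share_by_tier.items())


# A workload that goes to disk for half of its accesses...
mostly_disk = average_access_ns({"cache": 0.10, "ram": 0.40, "disk": 0.50})
# ...versus one that keeps almost everything in cache and RAM.
mostly_cache = average_access_ns({"cache": 0.90, "ram": 0.09, "disk": 0.01})

print(f"mostly disk:  {mostly_disk:,.1f} ns per access")
print(f"mostly cache: {mostly_cache:,.1f} ns per access")
```

Under these assumed numbers, even a small share of disk accesses dominates the average, which is why keeping working data in cache and memory matters so much.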

Data warehouse and analytics systems use each of these types of storage, along with the compute capacity of the CPUs, in different ways. This is what gives Actian Vector a performance advantage over other solutions. Vector optimizes the use of each layer in the system infrastructure, eliminating wasted capacity to both maximize performance and minimize costs. Here are a couple of examples:

Maximize Utilization of CPU Cores

Modern CPUs have multiple cores, meaning they can execute multiple operations at the same time. Unfortunately, most software (including data warehouse systems) isn’t designed to take advantage of this parallel processing capability, and as a result, you end up utilizing a small portion of the available capacity. The Actian Data Platform and Vector are designed to efficiently run a large number of concurrent queries requested by a large number of users. Queries are split into small chunks that can be executed in parallel. This is important because it maximizes the use of the CPU capacity you have available. CPU cycles are time-based capacity. Think of it like the hours in the day you have for work tasks. The challenge is to use your available capacity most efficiently and avoid idle time because once the time has passed, you can never get it back.
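The idea of splitting a query into chunks can be sketched in a few lines of Python. This is only an illustration of the general split, compute, and combine pattern, not Actian's implementation; the function names and the chunk size are hypothetical. A thread pool is used here for portability. In CPython, threads do not run pure Python code on several cores at once, so a real engine would rely on native threads or processes to get the parallel speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(values, chunk_size):
    """Cut a list into consecutive chunks of at most chunk_size items."""
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


def partial_sum(chunk):
    """Work done independently on one chunk."""
    return sum(chunk)


def parallel_sum(values, chunk_size=1000, workers=4):
    """Aggregate a column by summing chunks independently, then combining."""
    chunks = split_into_chunks(values, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


# Hypothetical column of values to aggregate.
sales = list(range(10_000))
total = parallel_sum(sales)
print(total)
```

Because each chunk is processed independently, the work can be handed to whichever core is idle, and only the small partial results need to be combined at the end.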

Reducing the Amount of Data That is Written to and Read from Disks

Actian solutions are designed for highly efficient use of disks, reducing the I/O operations that can slow down analytics processing. Actian Data Platform is a pure columnar database. Traditional databases are row-based: records are stored as rows, and you have to read the entire row to perform a query, even when the analysis needs only a few fields. Actian treats data as a series of columns, which is what optimizes it for analytics processing. Because a column of data is all the same data type, analytics operations can be optimized. Going under the hood, you’ll find that each column is stored in files on disk, organized into blocks of data. MinMax indexes on those data blocks speed up queries by helping the platform identify more efficiently which data the user is trying to analyze and which can be ignored.
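One way to picture how minimum and maximum values kept per block let an engine ignore data is the following Python sketch. It is a simplified illustration under stated assumptions, not Actian's storage format: the `Block` class, the block size, and the sample column are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One block of a single column, with its min/max kept as metadata."""
    values: list
    min_value: int
    max_value: int


def build_blocks(column, block_size):
    """Split a column into blocks and record each block's min and max."""
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def count_greater_than(blocks, threshold):
    """Count values above threshold, skipping blocks that cannot match.

    Returns (matching value count, number of blocks actually read).
    """
    matches = 0
    blocks_read = 0
    for block in blocks:
        if block.max_value <= threshold:
            continue  # no value in this block can exceed the threshold
        blocks_read += 1
        matches += sum(1 for value in block.values if value > threshold)
    return matches, blocks_read


# Hypothetical "order amount" column, stored on its own as a column.
order_amounts = [5, 7, 3, 9, 12, 15, 11, 14, 40, 42, 38, 45]
blocks = build_blocks(order_amounts, block_size=4)
matches, blocks_read = count_greater_than(blocks, threshold=20)
print(matches, blocks_read, len(blocks))
```

In this sample, only the last of the three blocks has a maximum above the threshold, so the other two are never scanned. The benefit is largest when similar values sit close together in storage.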

When you are doing operational analytics and trying to drive real-time decision-making with data, you need the best performance you can get. By performing more operations in chip cache and main memory and by managing the data stored on disk more efficiently, Actian can optimize the performance and utilization of database hardware while at the same time minimizing the amount of data written to disk. Both of these are important because they directly translate to lower operating costs. What it comes down to is “use the resources you have more efficiently” to achieve peak performance and minimize costs.

To learn more, visit https://www.actian.com/lp/actian-vector-sql-accelerator-for-hadoop/

About Actian Corporation

Actian makes data easy. We deliver cloud, hybrid, and on-premises data solutions that simplify how people connect, manage, and analyze data. We transform business by enabling customers to make confident, data-driven decisions that accelerate their organization’s growth. Our data platform integrates seamlessly, performs reliably, and delivers at industry-leading speeds.