How Partitioning on Your Data Platform Improves Performance
Colm Ginty
December 14, 2023
One of my goals as Customer Success Manager for Actian is to help organizations improve the efficiency and usability of our modern product suite. That’s why I recently wrote an extensive article on partitioning best practices for the Actian Data Platform, which is available in the Actian community.
In this blog, I’d like to share how partitioning can help improve the manageability and performance of the Actian platform. Partitioning is a powerful function that divides tables and indexes into smaller pieces, which can then be subdivided further. It’s like taking thousands of books and arranging them by category. That is the difference between a massive pile of books in one big room and books strategically arranged into smaller topic areas, as you see in a modern library.
You can gain several business and IT benefits by using the partitioning function that’s available on our platform. For example, partitioning can lower costs by storing data more efficiently and boost performance by executing queries in parallel across smaller, divided tables.
Why Distributing and Partitioning Tables Are Critical to Performance
When we work in the cloud, we use distributed systems. So instead of using one large server, we use multiple regular-sized servers that are networked together and function like the nodes of a single enormous system. Traditionally, these nodes would both store and process data because storing data on the same node it is processed on enables fast performance.
Today, modern object storage in the cloud allows for highly efficient data retrieval by the processing node, regardless of where the data is stored. As a result, we no longer need to place data on the same node that will process it to gain a performance advantage.
Yet, even though we no longer need to worry about how to store data, we do need to pay attention to the most efficient way to process it. Oftentimes, the tables in our data warehouse contain too much data to be efficiently processed using only one node. Therefore, the tables are distributed among multiple nodes.
If a specific table has too much data to be processed by a single node, the table is split into partitions. These partitions are then distributed among the many nodes, so each node processes only its share of the rows. This is the essence of a “distributed system,” and it lends itself to fast performance.
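To make the idea concrete, here is a minimal Python sketch of splitting a table into partitions and spreading those partitions across nodes. It is an illustration only, not how the Actian Data Platform implements distribution. The function names, the node names, and the round-robin placement are all assumptions chosen for simplicity.

```python
def split_into_partitions(rows, num_partitions):
    """Deal rows into num_partitions lists, one row at a time (round-robin)."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def assign_to_nodes(num_partitions, nodes):
    """Map each partition index to a node, cycling through the node list."""
    return {p: nodes[p % len(nodes)] for p in range(num_partitions)}


# A toy "table" of ten rows, split four ways and placed on two hypothetical nodes.
rows = list(range(10))
partitions = split_into_partitions(rows, 4)
placement = assign_to_nodes(len(partitions), ["node-a", "node-b"])

for index, partition in enumerate(partitions):
    print(f"partition {index} on {placement[index]}: {partition}")
```

Each node now holds only a fraction of the table, so the nodes can scan their partitions in parallel.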
Partitioning in the Actian Data Platform
Having a partitioning strategy and a cloud data management strategy can help you get the most value from your data platform. You can partition data in many ways depending on, for example, an application’s needs and the data’s content. If performance is the primary goal, you can spread the load evenly to get the most throughput. Several partitioning methods are available on the Actian Data Platform.
Partitioning is important with our platform because it is architected for parallelism. Distributing rows of a large table to smaller sub-tables, or partitions, helps with fast query performance.
Users have a say in how the Actian platform handles partitions. If you choose not to manage partitioning yourself, the platform defaults to the automatic setting. In that case, the server makes its best effort to partition data in the most appropriate way. The downside of this approach is that joining or grouping data that’s assigned to different nodes can require moving data across the network between nodes, which can increase costs.
Another option is to control the partitions yourself using a hash value to distribute rows evenly among partitions. This allows you to optimize partitioning for joins and aggregations. For example, if you’re querying data in the data warehouse and the query will involve many SQL joins or groupings, you can partition tables in a way that causes certain values in columns to be assigned to the same node, which makes joins more efficient.
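The following Python sketch shows why hashing on a join column helps. Two toy tables are partitioned on the same column with the same hash function, so rows that share a key always land in the same partition and can be joined without looking at any other partition. The table contents, the column name `customer_id`, and the use of CRC32 as the hash are assumptions for illustration. The platform's actual hash function and syntax are not shown here.

```python
import zlib


def partition_for(key, num_partitions):
    """Return a stable partition index for a key (CRC32 is an illustrative choice)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_partitions


def hash_partition(rows, key_column, num_partitions):
    """Place each row in the partition selected by hashing its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        partitions[partition_for(row[key_column], num_partitions)].append(row)
    return partitions


# Hypothetical tables that will be joined on customer_id.
orders = [{"customer_id": c, "amount": a} for c, a in [(1, 10), (2, 25), (1, 5), (3, 40)]]
customers = [{"customer_id": c, "name": n} for c, n in [(1, "Ada"), (2, "Bo"), (3, "Cy")]]

NUM_PARTITIONS = 4
order_parts = hash_partition(orders, "customer_id", NUM_PARTITIONS)
customer_parts = hash_partition(customers, "customer_id", NUM_PARTITIONS)

# Join partition by partition: matching keys are guaranteed to be co-located,
# so no partition needs rows from any other partition.
joined = []
for order_part, customer_part in zip(order_parts, customer_parts):
    names = {c["customer_id"]: c["name"] for c in customer_part}
    for order in order_part:
        joined.append((names[order["customer_id"]], order["amount"]))

print(sorted(joined))
```

If the two tables were partitioned on different columns, the lookup inside the loop could miss, and rows would have to be shipped between partitions first. That shipping is the network cost that a shared hash key avoids.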
When Should You Partition?
It’s a best practice to use the partitioning function in the Actian Data Platform when you create tables and load data. However, you probably have non-partitioned tables in your data warehouse, and redistributing this data can improve performance.
You can run queries that show how evenly the data is currently distributed in the data warehouse. You can then determine whether partitioning is needed.
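Once you have row counts per partition, one simple way to judge evenness is to compare the largest partition with the average. The Python sketch below assumes the counts have already been collected by such a query. The sample numbers and the `skew_ratio` helper are hypothetical, and the point at which a ratio becomes a problem depends on your workload.

```python
def skew_ratio(rows_per_partition):
    """Largest partition size divided by the average size (1.0 means perfectly even)."""
    if not rows_per_partition:
        raise ValueError("need at least one partition")
    total = sum(rows_per_partition)
    if total == 0:
        return 1.0  # an empty table is trivially even
    average = total / len(rows_per_partition)
    return max(rows_per_partition) / average


# Hypothetical row counts for two tables with four partitions each.
even_table = [250, 250, 250, 250]
skewed_table = [700, 100, 100, 100]

print("even table skew:  ", skew_ratio(even_table))
print("skewed table skew:", skew_ratio(skewed_table))
```

A ratio close to 1.0 suggests the work is spread evenly. A much higher ratio means one partition holds far more rows than the others, so the node that owns it is likely to finish last and slow down the whole query.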
With Actian, you have the option to choose the best number of partitions for your needs. You can use the default option, which results in the platform automatically choosing the optimal number of partitions based on the size of your data warehouse.
I encourage customers to start with the default and then, if needed, set the number of partitions manually. Because the Actian Data Platform is architected for parallelism, running queries that give insights into how your data is distributed and then partitioning tables as needed allows you to operate efficiently with optimal performance.
For details on how to perform partitioning, including examples, graphics, and code, join the Actian community and view my article on partitioning best practices. You can learn everything you need to know about partitioning on the Actian Data Platform in just 15 minutes.