Cloud Data Warehouse

Affinity Analytics Using Actian Data Platform

Mary Schulte

October 29, 2021

affinity analytics

Affinity analytics is the practice of finding relationships and patterns in data. Businesses can use the results from affinity analytics for many positive impacts.   Here are just two examples from real customer use cases. First, in retail, management wants to know what products typically sell well together for product placement and advertising purposes. This information is critical to successfully upselling additional products. Another example, telecommunications providers need to study network traffic data to understand routing patterns and maximize equipment and topography. Like these use cases, your business likely has occurrences of data affinity that you can harness to make better business decisions. Actian provides the data warehouse platform to help you do it.

Despite being clearly useful, affinity is difficult to find in traditional data warehouses because it involves executing one of the most difficult, resource-intensive SQL statements known, the fact-table self-join (also known as a “market-basket” query). This query is difficult because data warehouse “fact” tables often contain billions of rows (like mine does here), and joining billions of rows back to themselves to find affinity takes a lot of processing power. In fact, some platforms can’t do it at all, or it takes so long it’s not usable. That is where the power of the Actian Data Warehouse shines.

In this blog, I discuss how to successfully achieve affinity analytics using solely the built-in functionality of the Actian Data Warehouse, with no other tooling required!

Actian provides industry-leading cloud analytics, purpose-built for high performance.  What I will show here is that Actian – natively – provides the necessary tooling to accomplish SQL analytics, allowing you can achieve things like affinity analytics without having to embark on giant, expensive projects involving additional third-party tooling.

Here is My Scenario:

I have a retail data warehouse. Marketing wants to plan an outreach mail campaign to promote sales of products that typically sell well with the store’s best-selling products. In particular, they want to mail coupons to customers that have NOT bought products that are normally bought together, but HAVE purchased at least one of the best-selling products. They would like me to provide data to support this campaign.

My Analytics Process Will Be As Follows:

  1. Investigate the data.
  2. Find best-selling products (A).
  3. Find products commonly sold with top products (B).
  4. Find the customer population who bought A but not B.
  5. Provide appropriate information to marketing.

For this blog, I have created an 8 AU (Actian Unit) warehouse in the Google Cloud Platform.  An Actian Unit is measure of cloud computing power that can be scaled up or down. See Figure 1.

Figure 1: Avalanche console warehouse definition
Figure 1: Avalanche console warehouse definition

My Actian database has a typical retail schema, but for this blog, I will just focus on four tables.  See Figure 2.

Figure 2: Retail ER diagram
Figure 2: Retail ER diagram

I have used a data generator to generate a large amount of data, but I’ve added some artificially superimposed patterns to make this blog more interesting. My tables have the following number of rows in them:

customer 5,182,631
order 1,421,706,929
lineitem 45,622,951,425
product 16,424

 

I can now use the tools provided in the Actian console Query Editor to execute my analytics process. You can find the Query Editor in the top right corner of the warehouse definition page. I have circled it in blue in Figure 1.

For all the queries in this blog, I performed the following sequence: I put my query into the query editor pane (1), formatted the query (optional) (2), then executed the query (3), then saved the query (4) for future reference. See sequence layout in Figure 3. Notice that you can also see the layout of my entire schema (red circle) in the Query Editor.

Figure 3: Query Editor layout
Figure 3: Query Editor layout

Investigate the Data

First, I want to understand my data by executing a few interesting queries.

I want to understand what months of data are in my Actian warehouse and understand some overall numbers.  (Note this blog was authored in early 2021).  I execute this query:

Figure 4: Line item statistics
Figure 4: Line item statistics

Because of the speed of Actian, in just a few seconds, I gleaned some valuable information from my warehouse. It looks like I have five years’ worth of data including over 45 billion line items sold, showing an average sale of $625. That’s terrific! See Figure 4.

Also, I would like to see trended sales by month. I execute this query:

Figure 5: Trended sales
Figure 5: Trended sales

This query also finished in just a few seconds, but with all these big numbers, it’s a little hard to grasp their relative values. It will be helpful to make a chart using the Actian Query Editor’s charting function.

I’ve used the charting function (see Figure 6) to create a bar chart. I’m running the same query essentially, but I’ve simplified it and limited the output to just last year. It’s easy to see now, that my sales really accelerated around Christmas. I’ve shown how I configured this chart in Figure 7.

Figure 6: Trended sales with chart
Figure 6: Trended sales with chart
Figure 7: Chart configuration
Figure 7: Chart configuration

Find Best-selling Products (A)

Now that I understand my data, I execute this query to find the best-selling product categories by spend in the last year:

Figure 8: Top categories by spend
Figure 8: Top categories by spend

In just a few seconds, I learn that Clothing and Electronics were my best-selling product categories overall. I know that marketing always likes to work with Electronics, so I’m going to concentrate there.

Next, I want to find the top-selling products in Electronics last year. I execute this query:

Figure 9: Top products in Electronics
Figure 9: Top products in Electronics

Again, because of the speed of Actian, in a few seconds, I learn that many of the top products in my Electronics category are Canon products.  See Figure 9.

Find Products Commonly Sold with Top Products (B)

Now I want to find the Electronics products that are most often sold with these top-selling Canon products in the last six months. This is the resource-intensive market-basket query that I referred to in my introduction.  To execute, this query will join my 45 billion line items back to the same 45 billion line items to see which items are typically bought together. I execute this query:

Figure 10: Market-basket query
Figure 10: Market-basket query

This query is much more complex than the previous queries, still, it only took a mere 17 seconds to execute in Actian. It is obvious from this query that Canon customers often buy SDHC Memory Cards of different types. This is something that seems logical, of course, but I have now proven this with analytics.

Find the Customer Population Who Bought A But Not B

Now I need to find the names and addresses of customers who have NOT bought memory cards. This is basically a reverse market-basket query. Actian will join the 45 billion row line item table back to itself, this time to find missing relationships…customers who have not bought memory cards. It then also needs to join the line item and order information back to the customer table to get the corresponding name and address information. Also, I need to make sure I don’t send duplicate mailings to any customer that may have bought multiple Canon products, so I have added the DISTINCT keyword to my SQL. I execute the query below. Once it is finished, I then choose the .csv download option to create an output file.  See the red circles in Figure 11.

Figure 11: Reverse market-basket.  No affinity.
Figure 11: Reverse market-basket.  No affinity.

Provide Appropriate Information to Marketing

I can now easily mail the .csv file of prospect customers to market so they can send out their marketing mail campaign.

Figure 12: Email with target list
Figure 12: Email with target list

In conclusion, the Actian Data Warehouse is a very powerful cloud data warehouse platform that also includes the basic tools and speed you need to be productive with affinity analytics in your business.

mary schulte headshot

About Mary Schulte

Mary Schulte is the Senior Sales Engineer for Actian. She spent the bulk of her career working for powerhouse database vendors such as Informix, Netezza, and now Actian. For over two decades, she worked solely with Informix as a consultant at a boutique consulting firm, then later as a trainer and sales engineer for Informix Corp. and then IBM. She has written thousands of lines of Informix 4GL and ESQL/C programs for clients in varieties of industries worldwide. Based in Dallas, she worked closely with American Airlines on its ground-breaking implementation of the then-new Informix datablade technology in the late 1990s. Mary has been at Actian since 2006 primarily working with their Vector analytics database technology. She feels that it was serendipity that IBM entered into an agreement with HCL and that HCL acquired Actian because she is now working with her beloved Informix again!