Data Mining

Data mining is the practice of discovering hidden insights in large data sets using a combination of database queries, statistical analysis, Machine Learning (ML), and Artificial Intelligence (AI) techniques. It is less sophisticated than advanced analytics because it does not go as far as offering recommendations based on the insights it uncovers. It can, however, uncover hidden trends, patterns, and anomalies in data that traditional Structured Query Language (SQL) queries would miss.

Why is it Important?

Data mining is particularly useful for risk management and fraud detection applications because it can analyze data streams in real time. It is more sophisticated than typical Business Intelligence (BI) queries because it applies statistical models to uncover hidden patterns in data. The two are complementary: BI dashboards can be populated with data mining insights.

Is KDD the Same as Data Mining?

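Knowledge Discovery in Databases (KDD) is related to data mining but is not the same thing. KDD refers to the overall process of uncovering high-level patterns in large databases, from selecting and preparing the data through to interpreting the results. Data mining is one step in that broader KDD process: the step in which methods are applied to extract the patterns.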

Types of Data Mining

Below are some methods used in data mining:

  • Clustering assesses groupings of data elements with common attributes. Data elements are placed in the same cluster if they can be classified as similar objects. Clustering methods can be hierarchical or non-hierarchical. Non-hierarchical methods divide a data set of N objects into M clusters. K-means is an example of a non-hierarchical clustering method that divides observations into K groups of related observations.
  • Path or sequence analysis looks for a set of observations that appear to lead to other ones to form a sequence or path.
  • Regression analysis predicts the value of a dependent variable from one or more independent variables. The strength of the relationship is determined by comparing the dependent variable with the independent variables. This knowledge can be used, in turn, to predict future values using forward regression.
  • Neural networks and deep learning use layered models loosely inspired by the workings of the human brain to seek out and derive patterns in a data set.
  • Association rule mining applies if-then analysis to pairs of items in a data set to look for potential relationships. The more observations exhibit a given relationship, the more confident an analyst can be about the rule.
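
As an illustration of the K-means method described in the list above, the following is a minimal sketch in plain Python. The function name `k_means`, the sample observations, and the choice to seed centroids with the first K points are all assumptions made for this example; production libraries typically use more careful seeding and optimized numeric code.

```python
from math import dist
from statistics import fmean


def k_means(points, k, max_iter=100):
    """Partition points into k clusters with Lloyd's algorithm.

    Centroids are seeded with the first k points. This is an assumption
    made to keep the example reproducible, not a recommended strategy.
    """
    centroids = [tuple(p) for p in points[:k]]
    assignments = []
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(fmean(axis) for axis in zip(*members))
    return assignments, centroids


# Hypothetical observations: (monthly visits, average basket value).
observations = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (8, 78)]
labels, centers = k_means(observations, k=2)
print(labels)   # cluster index assigned to each observation
print(centers)  # final centroid of each cluster
```

With this sample data, the three low-value observations end up in one cluster and the three high-value observations in the other. Results on real data depend heavily on the choice of K and on how the centroids are seeded.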

Benefits of Data Mining

Data mining provides benefits beyond basic analytics through forecasting and predictive analytics. These include:

  • Improving customer interactions. Gaming companies and online retailers depend on predictive analysis of clickstreams to drive recommendation engines. Personalization of online interactions is the key to keeping customers coming back.
  • Financial services companies use factors such as interaction analysis, credit scoring, and demographics to tailor offers that maximize the value they provide to customers and increase the lifetime revenue each customer contributes. On the flip side, customer behavior data can be used for churn analysis to highlight potential customer losses.
  • Manufacturers use data mining to increase the uptime and productive life of expensive industrial machinery. IoT sensors embedded in complex machines such as jet engines, power plant turbines, and locomotive diesel engines continuously produce data streams that are analyzed as they arrive. The results are used to proactively schedule maintenance and to identify operational adjustments that can extend the machine’s working life.
  • Marketing automation systems use the interactions of prospective customers to predict which response email or digital asset is best to share next to keep them on the journey to becoming a customer.
  • Sales automation systems study customer touchpoints, including website visits, digital assets consumed, search keywords, and digital ads clicked, to predict purchase intent. Subtle buying signals can be combined to alert the sales team that a prospect is seriously considering a product or service, so a salesperson can engage directly.
  • Fraud prevention teams use data mining to detect anomalous credit card transactions and bank transfers, as well as bogus insurance claims.
  • Network management systems look for signs of traffic jams in routers and network routing nodes to predict potential packet loss and proactively reroute traffic to minimize latency. These same algorithms can be applied to optimize routing through road navigation systems and rail networks.
  • Healthcare benefits from data mining patient records and test results to predict outcomes and potential complications so doctors can proactively prescribe appropriate treatments.
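
To make the fraud prevention item above more concrete, here is a minimal sketch of one simple way to flag anomalous transaction amounts. It uses a modified z-score based on the median and the median absolute deviation (MAD). The function name `flag_anomalies`, the sample amounts, and the 3.5 threshold are assumptions for illustration; real fraud systems combine many more signals than the amount alone.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold.

    The score is 0.6745 * |x - median| / MAD, which is less sensitive to
    extreme values than a score based on the mean and standard deviation.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [a for a in amounts if 0.6745 * abs(a - med) / mad > threshold]


# Hypothetical card transaction amounts for one customer.
transactions = [42.0, 38.5, 45.1, 40.0, 39.9, 41.3, 980.0, 43.7]
suspicious = flag_anomalies(transactions)
print(suspicious)  # only the 980.0 transaction is flagged
```

The median-based score is used here because a single very large transaction would otherwise inflate the mean and standard deviation and partly hide itself.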

Data Mining on the Actian Data Platform

The Actian Data Platform can build and schedule data pipelines for data mining projects. It uses a vectorized columnar database that Actian reports outperforms alternatives by 7.9x. Because it stores table data as columns, the values needed by a query are smaller and more densely packed, so they make better use of the available CPU cache. Actian also uses Single Instruction, Multiple Data (SIMD) capabilities, which let a single instruction operate on multiple data values at once, together with efficient use of the L1 CPU caches across a server, to achieve industry-leading analytic processing. Traditional databases that store data as rows have to scan and cache wide rows, which uses the cache less efficiently.
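
The difference between row and column storage can be sketched in a few lines of Python. This is only an illustration of the layout idea, not of Actian's implementation: the table, its field names, and its values are hypothetical, and plain Python does not expose SIMD instructions or CPU cache behavior.

```python
from array import array

# Row layout: each record stores all of its fields together.
rows = [
    {"order_id": 1, "region": "EU", "amount": 120.0},
    {"order_id": 2, "region": "US", "amount": 75.5},
    {"order_id": 3, "region": "EU", "amount": 210.25},
]

# Column layout: one contiguous, typed array per numeric field.
columns = {
    "order_id": array("q", (r["order_id"] for r in rows)),
    "region": [r["region"] for r in rows],
    "amount": array("d", (r["amount"] for r in rows)),
}

# Row scan: every whole record is visited to read a single field.
total_from_rows = sum(r["amount"] for r in rows)

# Column scan: only the densely packed "amount" values are read.
total_from_columns = sum(columns["amount"])

print(total_from_rows, total_from_columns)
```

Both scans produce the same total. The point of the column layout is that an aggregate over one field touches only that field's values, which is what allows a database engine to keep them in cache and process several of them per instruction.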