Using cloud-based services, Big Data as a Service (BDaaS) allows a business to analyze large or complex data sets.
Why is Big Data as a Service Important?
The traditional approach to accessing the insights that big data delivers is to acquire a cluster of servers housed in an on-premise data center. This requires upfront expense and delays while IT procures, installs, configures, and tests hardware and software components. Scalability is limited to the amount of hardware purchased, and running out of capacity means expensive upgrades and more delays.
What are the Benefits of Big Data as a Service?
The primary advantages of using big data as a service include the following:
- Cloud Economics – The business can sign up for a subscription service with no upfront investment and pay only for the CPU and storage it consumes.
- Elastic Scalability – A cloud-based service can provide CPU and storage on demand almost instantly, and in far greater quantities than an in-house data center can.
- Time to Value – Time to value is shorter with a big data service because the required hardware and software are available as soon as the subscription or trial starts, with no waiting for IT.
- System Management – Cloud-based providers manage most low-level systems management tasks, reducing the management overhead on in-house IT teams.
- Software Upgrades – These are handled by the cloud provider and/or SaaS vendor, reducing downtime compared to traditional on-premise implementations.
- High Availability – Cloud-based solutions can offer disaster recovery by replicating services to multiple cloud data centers in different geographies, reducing downtime from power outages or natural disasters that affect a single data center.
Potential Challenges
- Data Compliance – Many businesses deal with regulated data that must be kept on-premise in a company’s data center. A private cloud often meets compliance requirements as the data never leaves the organization’s dedicated cloud environment.
- Data Analysis – The data has to be uploaded to the cloud before it can be analyzed, and the upload can take time. Ideally, the business will create its data lake or big data warehouse in its preferred cloud platform so the data can be processed where it is created.
- Data Migration – Moving to a different cloud provider can be expensive because providers often charge an egress fee based on data volume. This applies both to data offerings from cloud-based SaaS providers and to data solutions from cloud service providers like Google and AWS.
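Because egress fees scale with data volume, a rough migration estimate is simple arithmetic. The sketch below is illustrative only: the function name, the 50 TB volume, and the per-gigabyte rate are assumptions, not any provider's published pricing.

```python
def estimate_egress_cost(data_tb: float, fee_per_gb: float) -> float:
    """Return the one-off transfer fee for moving data_tb terabytes out of a cloud."""
    if data_tb < 0 or fee_per_gb < 0:
        raise ValueError("data volume and fee must be non-negative")
    # Assumes decimal terabytes (1 TB = 1000 GB); billing units vary by provider.
    gigabytes = data_tb * 1000
    return gigabytes * fee_per_gb


if __name__ == "__main__":
    # Hypothetical figures: 50 TB at an assumed rate of 0.09 per GB.
    print(f"Estimated egress fee: {estimate_egress_cost(50, 0.09):,.2f}")
```

Real pricing is usually tiered and changes over time, so the provider's current rate card should replace the assumed flat rate.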
Big Data Services
All the major cloud providers offer services for customers planning to move their existing big data stores to the cloud. These include:
- Amazon Web Services (AWS) provides Elastic MapReduce (EMR) to deliver BDaaS.
- Microsoft Azure HDInsight is a cloud-based big data solution that provides managed Hadoop, Spark, and Hive clusters.
- Google Cloud Platform (GCP) – Google Cloud Dataproc is a fully managed cloud service for running Apache Spark and Apache Hadoop clusters.
- Actian offers BDaaS, which is portable across all three cloud providers, offering customers the flexibility to run analytics regardless of which cloud platform hosts their data.
Actian
The Actian Data Platform is built to offer BDaaS across AWS, Azure, and Google Cloud. The platform uses the Hadoop Spark API to access data formats, including ORC and Parquet. Actian Data Platform queries can connect data across data warehouse instances. This distributed query feature lets customers store their data close to where it is created and analyze it wherever it resides.
The data integration capabilities of the Actian Data Platform work with popular data storage structures, including S3 buckets, Google Drive folders, and Azure Blob storage.
Fraud Detection
One of the UK’s largest automotive insurers uses the Actian Data Platform for fraud detection. Many variables go into calculating a potential client’s insurance premium, such as demographics, credit score, insurance claims, and driving history. The insurer also uses Actian to build models that detect anomalies and potentially fraudulent accounts.
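As a rough illustration of what anomaly detection can mean in practice, the sketch below flags values that sit far from the median using a modified z-score. This is a generic statistical rule, not the insurer's model or an Actian API; the function name, the threshold, and the claim amounts are all assumptions made up for the example.

```python
from statistics import median


def flag_anomalies(amounts, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(amounts)
    # Median absolute deviation: a spread measure that outliers barely affect.
    mad = median(abs(x - med) for x in amounts)
    if mad == 0:
        return []  # At least half the values equal the median; nothing to flag.
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [
        i for i, x in enumerate(amounts)
        if 0.6745 * abs(x - med) / mad > threshold
    ]


if __name__ == "__main__":
    # Hypothetical claim amounts; the last one is deliberately extreme.
    claims = [1200, 1350, 1100, 1280, 1420, 1190, 9800]
    print(flag_anomalies(claims))
```

A production fraud model would combine many variables rather than a single amount, but the same idea of scoring how unusual a record is carries over.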
Healthcare Claims Processing
Healthcare claims can be very complex and time-consuming to process. Health insurers rely on external agencies to validate claims, ensure care providers are not overcharging patients for treatment, and detect fraudulent claims. Big data as a service plays a crucial role in ensuring enough compute power is available for processing peaks. Claims data includes unstructured data in the form of scanned documents and structured data from billing systems that must be processed.
Retail
A large U.S. retailer that operates hundreds of convenience stores at truck stops and in small towns uses BDaaS to analyze shoppers’ baskets so it can optimize the product mix for each outlet. A French-owned hardware store chain uses big data as a service to project future demand based on seasons, holidays, and expected weather patterns so it can proactively stock for expected conditions.
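The basket analysis described for the convenience-store retailer can be sketched, in its simplest form, as counting which products are bought together most often. The function name and the sample baskets below are hypothetical, and a real deployment would run a similar aggregation over far larger data in the cloud platform rather than in a single Python process.

```python
from collections import Counter
from itertools import combinations


def top_pairs(baskets, n=3):
    """Return the n product pairs that appear together in the most baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so each pair is counted once per basket
        # and always appears in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical baskets from a single outlet.
    baskets = [
        ["coffee", "donut"],
        ["coffee", "donut", "fuel additive"],
        ["jerky", "coffee"],
    ]
    for pair, count in top_pairs(baskets, n=1):
        print(pair, count)
```

Running the same count per outlet would show which combinations matter locally, which is the kind of signal a retailer could use to tailor stock.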