Streaming Data

Streaming data is a continuous flow of data from one or more sources that is processed in near real time. Depending on time criticality or server resource constraints, a data stream can instead be processed at short intervals as micro-batches.
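
As a rough illustration of micro-batching, the sketch below groups time-ordered events into fixed-width windows. The event shape (a timestamp and a payload), the function name micro_batches and the one-second interval are assumptions made for this example, not part of any particular product.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> Iterator[List[Event]]:
    """Group time-ordered events into consecutive windows of `interval` seconds."""
    batch: List[Event] = []
    window_end = None
    for ts, payload in events:
        if window_end is None:
            # The first event opens the first window.
            window_end = ts + interval
        # Close the current window (and skip empty ones) once an event falls past it.
        while ts >= window_end:
            if batch:
                yield batch
                batch = []
            window_end += interval
        batch.append((ts, payload))
    if batch:
        # Flush whatever is left when the input ends.
        yield batch


events = [(0.0, "a"), (0.4, "b"), (1.1, "c"), (1.2, "d"), (3.5, "e")]
batches = list(micro_batches(events, interval=1.0))
```

A shorter interval lowers latency at the cost of more, smaller batches. A longer interval does the opposite.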

Three Forms of Data Sharing

The following are the three broad categories of data sharing:

  • Data streams. A stock price tracking application, for example, creates a data record every time the stock price changes. The record is stored in a queue that is immediately read by an application subscribing to that queue. This provides subscribers with the latest stock price as soon as it changes, with latency measured in milliseconds.
  • Batch data. A new data file is produced periodically, such as nightly, and is processed overnight. An end-of-day process at a bank branch, for example, produces a daily journal of transactions used to calculate cash on hand, which is then carried over to the next day's opening balance.
  • Incremental data sharing. In this case, the receiving application maintains a copy of past data that is updated to reflect changes since the previous update. This form of change data capture is commonly used for data backups to a remote site or to maintain multiple copies of the source data set.
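
Incremental data sharing can be sketched as a replica that applies only the changes it has not yet seen. The Replica class, the (sequence number, key, value) change record and the use of None to mark a deletion are all assumptions for this example. Real change data capture tools define their own record formats.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical change record: (sequence number, key, new value).
# A value of None means the key was deleted at the source.
Change = Tuple[int, str, Optional[str]]


class Replica:
    """Keeps a copy of the source data and applies only unseen changes."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}
        self.last_seq = 0  # highest sequence number applied so far

    def apply(self, changes: List[Change]) -> int:
        """Apply changes in sequence order and return how many were new."""
        applied = 0
        for seq, key, value in sorted(changes, key=lambda c: c[0]):
            if seq <= self.last_seq:
                continue  # already applied in an earlier update
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
            self.last_seq = seq
            applied += 1
        return applied


replica = Replica()
first = replica.apply([(1, "acct-1", "100"), (2, "acct-2", "250")])
# The second update overlaps the first: change 2 is skipped, 3 and 4 are applied.
second = replica.apply([(2, "acct-2", "250"), (3, "acct-1", "80"), (4, "acct-2", None)])
```

Tracking the last applied sequence number makes it safe to resend an overlapping update, since repeated changes are ignored.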

Characteristics

Stream or event data typically have the following characteristics:

  • Data streams are continuous: an event stream has no defined beginning or end, so the data set is never complete.
  • Data streams can be configured to be resilient in that every event is captured and stored until every receiver has acknowledged receipt.
  • Streaming data events are timestamped so they can be analyzed on a timeline. For example, sensor data in a factory drive downstream operations based on what is identified in the stream.
  • Data streams can contain mixed formats, as IoT streams often do. Gateway processes at the edge can filter and standardize formats.
  • Data streams can have gaps and be unordered due to different latencies of connecting networks.
  • Streams can be incomplete because one event can supersede a previous event before the reader has processed it. In real-time use cases like odds tracking for a casino or sporting event, only the latest value matters, so previous values can be immediately dropped.
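
The last two characteristics, unordered arrival and superseded events, can be illustrated with a small conflation sketch that uses the event timestamp to keep only the latest value per key. The event shape and the conflate function are assumptions for this example.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp, key, value), e.g. the odds for a match.
Event = Tuple[float, str, float]


def conflate(events: Iterable[Event]) -> List[Event]:
    """Keep only the most recent event per key, judged by timestamp.

    Events may arrive out of order, so the timestamp rather than the
    arrival order decides which event supersedes which.
    """
    latest: Dict[str, Event] = {}
    for event in events:
        ts, key, _ = event
        current = latest.get(key)
        if current is None or ts >= current[0]:
            latest[key] = event
    # Return the survivors on a timeline, oldest first.
    return sorted(latest.values())


arrived = [
    (3.0, "match-1", 1.8),
    (1.0, "match-1", 2.1),  # late arrival, already superseded
    (2.0, "match-2", 3.4),
    (2.5, "match-1", 1.9),  # also older than the 3.0 update
]
current_odds = conflate(arrived)
```

Conflation only suits use cases where intermediate values have no worth. A stream that must capture every event, such as a transaction log, would retain all of them.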

Streaming Data Software

There is a spectrum of tools that are classed as stream processors:

  • Many evolved from message-queuing systems such as IBM MQ and TIBCO Enterprise Message Service.
  • Apache Spark provides a streaming API and commonly runs on Hadoop clusters. It processes streams as micro-batches, so it is well suited to processing data in groups of rows.
  • Apache Kafka and Apache NiFi are open-source broker-based services that process events one record at a time and operate at a lower latency than Spark. Kafka uses a publish-subscribe model for connecting data streams to consuming applications.
  • Real-time data-sharing platforms such as DiffusionData push streamed data to clients.
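
The publish-subscribe model mentioned above can be sketched with a minimal in-memory broker. This is an illustration of the pattern only and is not the Kafka API: Kafka consumers pull records from a persistent log, whereas this hypothetical Broker class pushes each message straight to its subscribers.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[str], None]


class Broker:
    """Hypothetical in-memory broker that routes messages by topic."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        """Register a callback to receive every message published to `topic`."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: str) -> int:
        """Deliver `message` to each subscriber and return the delivery count."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
received_a: List[str] = []
received_b: List[str] = []
broker.subscribe("prices", received_a.append)
broker.subscribe("prices", received_b.append)

delivered = broker.publish("prices", "ACME 101.25")
undelivered = broker.publish("news", "no one is listening")
```

The publisher never refers to its consumers directly, so subscribers can be added or removed without changing the publishing application.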

Streaming Data Examples

Financial trading platforms use streaming data to provide real-time price changes for stocks and currencies. Stock information services use it to share company news as it breaks, helping institutional and individual investors make more informed trading decisions.

Gaming businesses need to keep players engaged, so they use streaming data to learn which teams interest each player and adapt the experience with relevant offers and promotions. Streaming data is also used to share odds and scores for betting on sports events.

Security systems use sensors to detect suspicious activity. Sensors collect video streams that are analyzed, and alerts are generated when potential threats are observed.

Autonomous driving uses real-time sensor input to control vehicle speed and safety systems. Cameras, sonar and lidar sensors generate data streams for image processing software to analyze.

Industrial systems use sensors to monitor manufacturing systems for quality control and to drive production. Digital streams enable manufacturers to remotely monitor the health of systems such as locomotive engines so they can time preventive maintenance, order parts and adjust performance to maximize the equipment’s useful life.

Marketing systems use clickstream data to analyze which ads and web pages a prospect views so chatbots can offer the most compelling real-time engagement tactics.

Retailers use streamed data from in-store beacon systems to tailor text and email offers to the shopper’s location.

Streaming Data With Actian Solutions

The Actian Data Platform has built-in support for streamed data integration.