Apache Flink - Batch vs Real-time Processing



In Big Data, there are two broad types of data processing −

  • Batch Processing
  • Real-time Processing

Processing data that has been collected over a period of time is called Batch Processing. For example, a bank manager may process the past month's data to find out how many cheques were cancelled during that month.
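The bank example can be sketched as a batch job in plain Python. This does not use Flink's API. It only illustrates the idea that the whole dataset already exists before the job runs, and the job scans it once to produce a single answer. The `Cheque` record, its `status` values, and the choice of "past month" as the last 30 days are all hypothetical assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Cheque:
    cheque_id: str
    issued_on: date
    status: str  # hypothetical field: "cleared" or "cancelled"


def count_cancelled_last_month(cheques, today):
    """Batch job: scan the full, already-collected dataset once and return one result."""
    # Assumption: "past month" is approximated as the last 30 days.
    window_start = today - timedelta(days=30)
    return sum(
        1
        for c in cheques
        if c.status == "cancelled" and window_start <= c.issued_on <= today
    )


# Hypothetical historical data, as it might be loaded from disk storage.
history = [
    Cheque("C1", date(2024, 5, 2), "cleared"),
    Cheque("C2", date(2024, 5, 10), "cancelled"),
    Cheque("C3", date(2024, 5, 28), "cancelled"),
    Cheque("C4", date(2024, 3, 15), "cancelled"),  # older than 30 days, so excluded
]

print(count_cancelled_last_month(history, today=date(2024, 6, 1)))  # prints 2
```

The result is only available after the job has read every record, which is why batch jobs are typically scheduled periodically rather than run per event.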

Processing data as soon as it arrives, to produce an instant result, is called Real-time Processing. For example, a bank manager receives a fraud alert immediately after a fraudulent transaction has occurred.
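By contrast, a real-time job handles each event the moment it arrives and can raise an alert straight away, keeping any state it needs in memory. The sketch below is plain Python, not Flink code. The `Transaction` record, the generator standing in for an unbounded source, and the fraud rule (an amount above 10,000 or a sudden change of city) are hypothetical assumptions chosen only to show the per-event pattern.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class Transaction:
    account: str
    amount: float
    city: str


def process_stream(events: Iterable[Transaction], on_alert: Callable[[str], None]) -> None:
    """Real-time style: handle each event as it arrives and alert immediately."""
    last_city: Dict[str, str] = {}  # per-account state kept in memory between events
    for tx in events:
        previous = last_city.get(tx.account)
        # Hypothetical fraud rule: a large amount, or a sudden change of city.
        if tx.amount > 10_000 or (previous is not None and previous != tx.city):
            on_alert(f"ALERT {tx.account}: {tx.amount} in {tx.city}")
        last_city[tx.account] = tx.city


def incoming():
    # Stand-in for an unbounded source such as a message queue; finite here for the demo.
    yield Transaction("A1", 120.0, "Pune")
    yield Transaction("A1", 80.0, "Pune")
    yield Transaction("A1", 95.0, "London")
    yield Transaction("B7", 25_000.0, "Delhi")


alerts: List[str] = []
process_stream(incoming(), alerts.append)
print(alerts)  # ['ALERT A1: 95.0 in London', 'ALERT B7: 25000.0 in Delhi']
```

Each alert is emitted while the stream is still being consumed, not after all data has been collected, which is the key difference from the batch example.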

The table below lists the differences between Batch and Real-Time Processing −

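Batch Processing                              | Real-Time Processing
----------------------------------------------|------------------------------------------------------
Static files                                  | Event streams
Processed periodically (minutes, hours, days) | Processed immediately, typically within milliseconds
Past data on disk storage                     | In-memory storage
Example − Bill generation                     | Example − ATM transaction alert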

Real-time processing is now widely used across many organizations. Use cases such as fraud detection, real-time alerts in healthcare, and network attack alerts require data to be processed the moment it arrives; a delay of even a few milliseconds can have a large impact.

An ideal tool for such real-time use cases is one that takes in data as a stream rather than as a batch. Apache Flink is such a real-time processing tool.
