Sunday, 16 October 2016

Apache Spark vs Apache Flink

In last few years the real time processing is one of hot topic in the market and also it has gained lots of value as well. The emerging tools tools which provides support for real time processing are Apache Storm, Apache Spark and Apache Flink. In this blog I am going to compare different features of Spark and Flink. 

Feature by feature Comparison of Apache Spark and Flink:

Exactly once semantic:
Both Spark Streaming and Flink provide exactly once guarantee, which means that every record will be processed exactly once, thereby eliminating processing of data multiple times.

High Throughput and Fault Tolerance overhead:
Both Spark Streaming and Flink provides very high throughput compared to other processing systems like Storm. The overhead of fault tolerance is also low in both the processing engines.

Computational Model:
Spark Streaming and Flink differs is in its computation model. Spark has adopted micro-batching model whereas Flink has adopted a continuous flow, operator-based streaming model.

Data Windowing/Batching: 
Spark has a time-based Window criteria, whereas Flink has Windows over time, record counts, sessions, data-driven windows or any custom user-defined Window criteria.

Stream Splitting: 
Flink has direct API for splitting the input Data Stream into multiple streams, whereas in Spark it is not possible.
 
Complex Event Processing:
Flink comes along with the complex event processing API which means Flink has support for Event Time and Out-of-Order Events while Spark does not have it.

Memory Management: 
Spark provides configurable memory management while Flink provides automatic memory management. Spark has also moved towards automating memory management (Unified memory management) in Spark 1.6.0 version.

No comments:

Post a Comment