The table below compares three of the most popular big data frameworks: Apache Hadoop, Apache Spark, and Apache Flink.
Feature | Apache Hadoop | Apache Spark | Apache Flink
---|---|---|---
Year of Origin | 2005 | 2009 | 2009
Place of Origin | MapReduce (Google), Hadoop (Yahoo) | University of California, Berkeley | Technical University of Berlin
Data Processing Engine | Batch | Batch | Stream
Processing Speed | Slower than Spark and Flink | Up to 100x faster than Hadoop MapReduce (in memory) | Faster than Spark
Programming Languages | Java, C, C++, Ruby, Groovy, Perl, Python | Java, Scala, Python, and R | Java and Scala
Programming Model | MapReduce | Resilient Distributed Datasets (RDD) | Cyclic dataflows
Data Transfer | Batch | Batch | Pipelined and batch
Memory Management | Disk-based | JVM-managed | Actively managed
Latency | High | Medium | Low
Throughput | Medium | High | High
Optimization | Manual | Manual | Automatic
API | Low-level | High-level | High-level
Streaming Support | N/A | Spark Streaming | Flink Streaming
SQL Support | Hive, Impala | Spark SQL | Table API and SQL
Graph Support | N/A | GraphX | Gelly
Machine Learning Support | N/A | MLlib | FlinkML
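
To make the "MapReduce" entry in the Programming Model row more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, using word count as the example. It is plain Python and does not use the Hadoop API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. A real Hadoop job would distribute these phases across a cluster and write intermediate results to disk.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every input line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's list of values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a MapReduce job would (here in one process)."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    # Illustrative input only; not taken from any real dataset.
    sample = ["Spark and Flink", "Hadoop and Spark"]
    print(word_count(sample))
```

The explicit shuffle step is what distinguishes this model from a simple loop: every value for a given key is collected before the reducer sees it. In a distributed setting that grouping is the point where data moves between machines, and it accounts for much of the cost of a batch job.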