MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++.
The whole MapReduce process goes through four phases of execution.
For example, take the following text as input for a MapReduce word count:
Easy exam notes Notes are good Notes are bad
The final output of the MapReduce task is a list of each distinct word paired with the number of times it occurs in the input.
The four phases of MapReduce are:
- Splitting: The input is divided into fixed-size pieces called input splits.
- Mapping: The data in each split is passed to a mapping function, which emits intermediate key-value pairs (here, each word paired with a count of 1).
- Shuffling: The same words are grouped together along with their respective counts, so that all values for one key end up at the same reducer.
- Reducing: The grouped values for each key are aggregated, summarizing the complete dataset into the final output.
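The four phases above can be sketched in plain Python on the sample input. This is only an illustration of the data flow, not Hadoop itself: the division of the sentence into three splits is a hypothetical choice, and a real cluster would run the map and reduce steps in parallel across nodes.

```python
from itertools import groupby
from operator import itemgetter

# Splitting: the sample input, divided here (hypothetically) into three splits
splits = ["Easy exam notes", "Notes are good", "Notes are bad"]

# Mapping: emit a (word, 1) pair for every word in every split
mapped = [(word, 1) for split in splits for word in split.split()]

# Shuffling: sort by key, then group identical words together.
# Note this is case-sensitive, so "Notes" and "notes" are distinct keys.
mapped.sort(key=itemgetter(0))
shuffled = {word: [count for _, count in pairs]
            for word, pairs in groupby(mapped, key=itemgetter(0))}

# Reducing: sum the grouped counts for each word
reduced = {word: sum(counts) for word, counts in shuffled.items()}
print(reduced)
# {'Easy': 1, 'Notes': 2, 'are': 2, 'bad': 1, 'exam': 1, 'good': 1, 'notes': 1}
```

Summing the per-word counts in the reduce step is what "summarizes the complete dataset": nine mapped pairs collapse into seven distinct words with their frequencies.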