MapReduce is a programming model designed for processing large volumes of data in parallel by dividing the work into a set of independent tasks.
Hadoop is capable of running MapReduce programs written in various languages: Java, Ruby, Python, and C++.
A MapReduce job goes through four phases of execution:
- Splitting
- Mapping
- Shuffling
- Reducing
For example, take the following text as input to a MapReduce word-count job:
Easy exam notes
Notes are good
Notes are bad
The final output of the MapReduce task is:
| Word | Count |
|---|---|
| Easy | 1 |
| Exam | 1 |
| Notes | 3 |
| Are | 2 |
| Good | 1 |
| Bad | 1 |
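To make the example concrete, here is a minimal sketch of a Hadoop word-count job in Java that would produce output like the table above. It follows the structure of the classic Hadoop WordCount example; the class names (`WordCount`, `TokenizerMapper`, `IntSumReducer`) are illustrative, and the mapper lowercases each token here so that "Notes" and "notes" are counted together, as in the table.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    // Mapping phase: emit a (word, 1) pair for every word in the split.
    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken().toLowerCase());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    // Reducing phase: sum the counts shuffled to this reducer for each word.
    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

The job would typically be packaged into a JAR and launched with something like `hadoop jar wordcount.jar WordCount <input dir> <output dir>`, where the input and output paths are placeholders.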
Each phase of the MapReduce process works as follows (a step-by-step sketch follows the list):
- Splitting: The input is divided into fixed-size pieces called input splits; each split is processed by a single map task.
- Mapping: The data in each split is passed to a mapping function that produces intermediate key-value pairs. In the word-count example, the mapper emits a (word, 1) pair for every word.
- Shuffling: The intermediate pairs are grouped by key, so all counts for the same word are brought together.
- Reducing: The grouped values are aggregated to produce the final result; here, the counts for each word are summed.
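For intuition, the sketch below simulates the four phases in plain Java on the sample input, with no Hadoop dependency. The class name `MapReducePhases` and the structure of the code are purely illustrative and not part of any framework; in a real cluster, the phases run in parallel across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapReducePhases {

  public static void main(String[] args) {
    String input = "Easy exam notes\nNotes are good\nNotes are bad";

    // 1. Splitting: divide the input into splits (here, one split per line).
    String[] splits = input.split("\n");

    // 2. Mapping: turn each split into (word, 1) pairs.
    List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
    for (String split : splits) {
      for (String word : split.toLowerCase().split("\\s+")) {
        mapped.add(Map.entry(word, 1));
      }
    }

    // 3. Shuffling: group the pairs by key so that all counts for the
    //    same word end up together.
    Map<String, List<Integer>> shuffled = new TreeMap<>();
    for (Map.Entry<String, Integer> pair : mapped) {
      shuffled.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
              .add(pair.getValue());
    }

    // 4. Reducing: sum the grouped counts to get the final tally per word.
    shuffled.forEach((word, counts) ->
        System.out.println(word + "\t"
            + counts.stream().mapToInt(Integer::intValue).sum()));
  }
}
```

Running this prints one line per word (are 2, bad 1, easy 1, exam 1, good 1, notes 3), matching the table above apart from capitalization.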