MapReduce
Imperative Programming
A programming paradigm that describes computation in terms of statements that change program state.
Functional Programming
A programming paradigm that treats computation as the evaluation of mathematical functions, avoiding changes to program state.
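To make the contrast between the two paradigms concrete, here is a minimal sketch that computes a sum of squares both ways. The list `numbers` and the variable names are made up for illustration.

```python
from functools import reduce

numbers = [1, 2, 3, 4]

# Imperative style: statements change the state of `total` step by step.
total = 0
for n in numbers:
    total += n * n

# Functional style: the result is the value of composed function calls;
# no variable is reassigned along the way.
squares = map(lambda n: n * n, numbers)
functional_total = reduce(lambda acc, sq: acc + sq, squares, 0)

print(total, functional_total)  # 30 30
```

The functional version is built from the same two operations, map and reduce, that give MapReduce its name.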
Reduce phase (reducers)
Aggregates key/value pairs based on the user-defined code.
Map Phase
Transforms the input splits into key/value pairs based on the user-defined code.
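The "user-defined code" in the map and reduce phases can be sketched with a word count. The function names `map_words` and `reduce_counts` are hypothetical, and the sketch runs in plain Python rather than on a real cluster.

```python
from typing import Iterator, List, Tuple

def map_words(line: str) -> Iterator[Tuple[str, int]]:
    """User-defined map: emit a (word, 1) pair for every word in a line."""
    for word in line.lower().split():
        yield (word, 1)

def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """User-defined reduce: aggregate all values that share one key."""
    return (key, sum(values))

pairs = list(map_words("the cat saw the dog"))
print(pairs)
# [('the', 1), ('cat', 1), ('saw', 1), ('the', 1), ('dog', 1)]

# The reducer receives one key together with every value emitted for it.
print(reduce_counts("the", [1, 1]))  # ('the', 2)
```

The mapper sees one record at a time and the reducer sees one key at a time, which is what lets a framework run many copies of each in parallel.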
MapReduce phases
1. Split phase 2. Map phase 3. Shuffle & sort 4. Reduce phase 5. Output
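The five phases can be traced end to end with a single-process simulation. This is only a sketch under simplifying assumptions: the sample data (year and temperature per line), the function names, and the choice of `max` as the aggregation are all invented for illustration, and nothing here is distributed.

```python
from collections import defaultdict

raw_input = "1950 22\n1950 31\n1949 11\n1949 18\n"

def split_phase(text):
    # 1. Split: one record per line, as with a line-oriented text format.
    return text.splitlines()

def map_phase(records):
    # 2. Map: turn each record into a (year, temperature) pair.
    for record in records:
        year, temp = record.split()
        yield year, int(temp)

def shuffle_and_sort(pairs):
    # 3. Shuffle & sort: group values by key, then order the keys.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())

def reduce_phase(grouped):
    # 4. Reduce: aggregate each key's values (here, the maximum).
    return [(key, max(values)) for key, values in grouped]

def output_phase(results):
    # 5. Output: format each result as a tab-separated line.
    return [f"{key}\t{value}" for key, value in results]

lines = output_phase(
    reduce_phase(shuffle_and_sort(map_phase(split_phase(raw_input))))
)
print(lines)  # ['1949\t18', '1950\t31']
```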
Job Tracker (Master)
Coordinates jobs: comes up with the execution plan and coordinates the phases.
Task Tracker (Slave)
Executes the map and reduce tasks that each job is broken down into. Takes the map and reduce functions out of the compiled binaries and puts them into its task slots. Reports progress back to the Job Tracker.
Split phase
Input data is divided into input splits based on the input format. With the text format, each split is read line by line, one record per line.
Shuffle & sort
Moves map outputs to the reducers and sorts them by key, so each reducer receives all values for a given key together.
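A minimal sketch of what shuffle & sort does to the map outputs, assuming a single reducer and two hypothetical map tasks. A real framework would also partition keys across several reducers and move the data over the network.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical outputs of two separate map tasks.
map_outputs = [
    [("b", 1), ("a", 1)],
    [("a", 1), ("c", 1), ("b", 1)],
]

def shuffle_and_sort(outputs):
    # Shuffle: gather the pairs from every map task in one place.
    merged = [pair for task in outputs for pair in task]
    # Sort: order the pairs by key so equal keys become adjacent.
    merged.sort(key=itemgetter(0))
    # Group: hand the reducer each key once, with all of its values.
    return [
        (key, [value for _, value in group])
        for key, group in groupby(merged, key=itemgetter(0))
    ]

reducer_input = shuffle_and_sort(map_outputs)
print(reducer_input)  # [('a', [1, 1]), ('b', [1, 1]), ('c', [1])]
```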
Job Client
Submits jobs to the Job Tracker.
Input format
Determines how the input files are parsed into records for the MapReduce pipeline.
Output format
Determines how the results are written to the output directory.
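The roles of the input and output formats can be sketched as a pair of pluggable functions. This assumes a line-oriented text format that yields (byte offset, line) records and writes tab-separated key/value lines. The names `read_text_records` and `write_text_records` are hypothetical, and in-memory streams stand in for files in an input or output directory.

```python
import io

def read_text_records(stream):
    # Input format sketch: parse raw bytes into (byte offset, line) records.
    offset = 0
    for raw in stream:
        yield offset, raw.rstrip(b"\n").decode("utf-8")
        offset += len(raw)

def write_text_records(results, stream):
    # Output format sketch: write each result as "key<TAB>value".
    for key, value in results:
        stream.write(f"{key}\t{value}\n")

source = io.BytesIO(b"hello world\nfoo\n")
records = list(read_text_records(source))
print(records)  # [(0, 'hello world'), (12, 'foo')]

sink = io.StringIO()
write_text_records([("foo", 1), ("hello", 1)], sink)
written = sink.getvalue()
print(repr(written))  # 'foo\t1\nhello\t1\n'
```

Swapping in a different reader or writer changes how data enters and leaves the pipeline without touching the map or reduce code.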
MapReduce
A programmable framework for processing data in parallel across the nodes of a cluster.