Fault tolerance in MapReduce

Explain how fault tolerance works in MapReduce.




In a MapReduce job, the master pings each worker periodically. If a worker does not respond within a set amount of time, the master marks that worker as failed. Any task that was in progress on the failed worker is reset and rescheduled on another worker. Completed map tasks are rescheduled as well, because their output was stored on the local disk of the failed worker and can no longer be reached. Completed reduce tasks do not need to be re-executed, since their output is stored in a global file system. In this way MapReduce handles large-scale worker failures simply by re-executing the lost work. The master can also write periodic checkpoints of its state, so that if the master fails, a new copy can be started from the last checkpoint.
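The worker-failure handling described above can be sketched in a few lines of Python. This is only an illustrative model, not code from any real MapReduce implementation: the names `Master`, `Task`, `record_ping_reply`, `check_workers`, and the 10-second `timeout` are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

IDLE, IN_PROGRESS, COMPLETED = "idle", "in_progress", "completed"


@dataclass
class Task:
    task_id: str
    kind: str                      # "map" or "reduce"
    state: str = IDLE
    worker: Optional[str] = None   # worker the task is (or was) assigned to


@dataclass
class Master:
    # Assumed value: seconds without a ping reply before a worker counts as failed.
    timeout: float = 10.0
    tasks: Dict[str, Task] = field(default_factory=dict)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_ping_reply(self, worker: str, now: float) -> None:
        """Remember the time of the latest ping reply from a worker."""
        self.last_seen[worker] = now

    def check_workers(self, now: float) -> List[str]:
        """Mark silent workers as failed and reset the tasks they held."""
        failed = [w for w, t in self.last_seen.items() if now - t > self.timeout]
        for w in failed:
            del self.last_seen[w]
            for task in self.tasks.values():
                if task.worker != w:
                    continue
                lost_in_progress = task.state == IN_PROGRESS
                # Completed map output lived on the failed worker's local disk.
                lost_map_output = task.state == COMPLETED and task.kind == "map"
                if lost_in_progress or lost_map_output:
                    task.state = IDLE      # eligible for rescheduling
                    task.worker = None
        return failed


# Small usage example: worker-a goes silent, worker-b keeps answering.
master = Master(timeout=10.0)
master.tasks = {
    "m1": Task("m1", "map", COMPLETED, "worker-a"),
    "r1": Task("r1", "reduce", COMPLETED, "worker-a"),
    "m2": Task("m2", "map", IN_PROGRESS, "worker-b"),
}
master.record_ping_reply("worker-a", now=0.0)
master.record_ping_reply("worker-b", now=0.0)
master.record_ping_reply("worker-b", now=12.0)
failed = master.check_workers(now=15.0)
```

After the call, only `worker-a` is reported as failed. Its completed map task `m1` goes back to the idle state so it can be rescheduled, while its completed reduce task `r1` stays completed and the task on `worker-b` is untouched.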
