
Failure detection

p97-felber.pdf (Pascal Felber - Putting OO Distributed Programming to Work)

Depending on the network topology and the communication pattern of the application, the choice between a push (heartbeat) and a pull ("are you alive?") monitoring model can have an important impact on the performance of the system. In a push model, every component of the system is expected to send heartbeat messages regularly to the other components; a component is considered faulty if the other components do not receive its heartbeats in time (a heartbeat protocol is presented comprehensively in [ACT97]). In a pull model, a component A monitors a component B by sending it "are you alive?" messages. If B fails to respond within some timeout, A considers B to be faulty. Neither the push nor the pull model fits all situations. In a large-scale system, one might use either model, or mix them, to reduce the number of messages exchanged in the network.

[AG02] explains the push and pull models, lists their advantages and disadvantages, and then introduces QoS techniques to enhance the performance of failure detectors.

Push Model

The push model requires only the monitored component to send messages to the monitor, whereas in the pull model both the monitor and the monitored component send messages. The push model therefore appears more efficient, since it involves only one-way messages where the pull model requires two-way messages. The argument in favor of the pull model is that the monitor need not send liveness requests regularly; it can send one only when it actually needs to know whether the process is alive. There are also variations of the pull model that try to make it more efficient, and hybrid models that combine both models.
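The difference between the two monitoring models can be made concrete with a small sketch. The code below is only an illustration under stated assumptions and is not taken from [ACT97] or [AG02]: the class names `PushMonitor` and `PullMonitor`, the `timeout` parameter, and the `probe` callable (standing in for the network round trip) are all hypothetical. Time is passed in explicitly so the behaviour is deterministic.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class PushMonitor:
    """Push model: monitored components send heartbeats, the monitor only listens."""
    timeout: float  # longest tolerated silence in seconds (assumed parameter)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def on_heartbeat(self, component: str, now: float) -> None:
        # One-way message: record when the component was last heard from.
        self.last_seen[component] = now

    def is_suspected(self, component: str, now: float) -> bool:
        # A component never heard from is suspected too (a choice made for this sketch).
        last = self.last_seen.get(component)
        return last is None or now - last > self.timeout


@dataclass
class PullMonitor:
    """Pull model: the monitor asks "are you alive?" only when it needs to know."""
    timeout: float  # how long to wait for a reply in seconds (assumed parameter)
    requests_sent: int = 0

    def is_suspected(self, probe: Callable[[float], bool]) -> bool:
        # `probe` is a hypothetical stand-in for sending the request and waiting
        # up to `timeout` seconds; it returns True if a reply arrived in time.
        self.requests_sent += 1
        return not probe(self.timeout)


# Push: B sends a heartbeat at t=0, then falls silent.
push = PushMonitor(timeout=3.0)
push.on_heartbeat("B", now=0.0)
print(push.is_suspected("B", now=2.0))  # still within the timeout
print(push.is_suspected("B", now=5.0))  # silent for longer than the timeout

# Pull: A queries B on demand; the lambdas simulate a reply and a missing reply.
pull = PullMonitor(timeout=1.0)
print(pull.is_suspected(lambda timeout: True))
print(pull.is_suspected(lambda timeout: False))
```

In the push variant the monitor sends nothing and detection depends only on the arrival times of heartbeats. In the pull variant each check costs a request plus a reply, but no traffic is generated while nobody asks.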


Tolga Ayav 2004-10-25