Off-line real-time fault-tolerant scheduling

Catalin Dima, Alain Girault, Christophe Lavarenne, and Yves Sorel
Euromicro Workshop on Parallel and Distributed Processing
Mantova, Italy, February 2001


We address the problem of off-line fault-tolerant scheduling of an algorithm onto a given distributed-memory architecture, and provide a generic algorithm that solves it. We take into account two kinds of failures: permanent fail-stop and intermittent fail-silent. The basic technique we use is the replication of operations and data communications. We then discuss the principles that govern the execution of schedules with replication under the state-machine and primary/backup arbitrations between replicas. We also show how to compute the execution date of each operation and the timeouts used to detect failures. We end with a heuristic which, using this calculus, computes a possibly non-optimal schedule that tries to locally minimize the total execution time of the distributed fault-tolerant algorithm.
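To make the idea of replication-based scheduling concrete, here is a minimal Python sketch of a greedy placement that gives every operation k+1 replicas on distinct processors. It is an illustration only, not the heuristic from the paper: the function name `greedy_replicated_schedule`, the uniform `comm_delay`, the processor-independent durations, and the rule that a replica starts as soon as the earliest copy of each input arrives (a failure-free, state-machine-style assumption) are all assumptions made for this example. Timeout computation is not covered.

```python
def greedy_replicated_schedule(ops, preds, duration, n_procs, k, comm_delay):
    """Place each operation on k+1 distinct processors, greedily by finish time.

    ops        -- operation names in topological order (assumed given)
    preds      -- dict: operation -> list of predecessor operations
    duration   -- dict: operation -> execution time (same on every processor)
    n_procs    -- number of processors
    k          -- number of processor failures to tolerate
    comm_delay -- hypothetical uniform delay for inter-processor transfers
    """
    if k + 1 > n_procs:
        raise ValueError("need at least k+1 processors for k+1 replicas")

    proc_free = [0.0] * n_procs   # date at which each processor becomes idle
    placement = {}                # op -> list of (processor, start, finish)

    for op in ops:
        candidates = []
        for p in range(n_procs):
            ready = 0.0
            for pred in preds.get(op, []):
                # Failure-free assumption: use the earliest replica's result.
                arrival = min(
                    finish + (0.0 if q == p else comm_delay)
                    for q, _start, finish in placement[pred]
                )
                ready = max(ready, arrival)
            start = max(ready, proc_free[p])
            candidates.append((start + duration[op], start, p))

        # Local (greedy) choice: the k+1 processors with the earliest finish.
        chosen = sorted(candidates)[: k + 1]
        placement[op] = [(p, start, finish) for finish, start, p in chosen]
        for p, _start, finish in placement[op]:
            proc_free[p] = finish

    return placement


# Two dependent operations, three processors, one tolerated failure.
plan = greedy_replicated_schedule(
    ops=["A", "B"],
    preds={"B": ["A"]},
    duration={"A": 2, "B": 3},
    n_procs=3,
    k=1,
    comm_delay=1,
)
makespan = max(finish for reps in plan.values() for _p, _s, finish in reps)
```

Because each choice is made per operation, the resulting schedule can be non-optimal overall, which is the same trade-off the abstract notes for a locally minimizing heuristic.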

BibTeX entry

  author = 	 {C. Dima and A. Girault and C. Lavarenne and Y. Sorel},
  title = 	 {Off-Line Real-Time Fault-Tolerant Scheduling},
  booktitle = 	 {Euromicro Workshop on Parallel and Distributed Processing},
  year =	 {2001},
  address =	 {Mantova, Italy},
  month =	 {February},
  pages =	 {410--417},

