[Sch90]: Presents the state machine approach to FT embedded systems and describes protocols for two kinds of failures: a) Byzantine and b) fail-stop. When processors can experience Byzantine failures, a system implementing a $t$ fault-tolerant state machine must have at least $2t+1$ replicas, and the output of the system is the output produced by the majority of the replicas (the majority of the outputs remains correct even after as many as $t$ failures). If processors experience only fail-stop failures, then a system containing $t+1$ replicas suffices, and the output can be the output produced by any one of its members.
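The replica counts and the voting rule can be illustrated with a minimal sketch. It assumes that $t$ denotes the number of faulty replicas to be tolerated and that replica outputs can be compared for equality; the names `replicas_needed`, `majority_vote`, and `run_replicated` are hypothetical and do not come from [Sch90].

```python
from collections import Counter


def replicas_needed(t, byzantine):
    """Minimum number of replicas needed to tolerate t faulty replicas."""
    return 2 * t + 1 if byzantine else t + 1


def majority_vote(outputs):
    """Return the value produced by a strict majority of replicas, else raise."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


def run_replicated(state_machines, command):
    """Apply the same command to every replica and vote on the results."""
    return majority_vote([sm(command) for sm in state_machines])


# Example: t = 1 Byzantine fault, so 2*1 + 1 = 3 replicas are used.
correct = lambda x: x * x
faulty = lambda x: -1  # stands in for an arbitrary (Byzantine) output
result = run_replicated([correct, faulty, correct], 7)
```

With one faulty replica out of three, the two correct replicas still form a majority, so `result` is the correct square. In the fail-stop case no voting is needed, because a replica that produces an output at all is assumed to produce a correct one.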
[SSO$^+$99]:
Presents work on tolerating hardware faults by means of software
(software-implemented hardware fault tolerance).
The work mainly targets transient failures and proposes error detection by
duplicated instructions (EDDI).
The other technique proposed is error detection and correction (EDAC)
for protecting memory against single-event upsets (SEUs).
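The idea behind detection by duplication can be sketched at source level. This is only an analogy: EDDI itself works on machine instructions, and the instruction-level details of [SSO$^+$99] are not reproduced here. The function `checked_dot`, the exception `TransientFaultDetected`, and the `inject_fault_at` parameter are hypothetical names introduced for illustration.

```python
class TransientFaultDetected(Exception):
    """Raised when the master and shadow copies disagree."""


def checked_dot(xs, ys, inject_fault_at=None):
    """Integer dot product with every update duplicated on a shadow accumulator."""
    acc = 0         # master copy
    acc_shadow = 0  # duplicate copy, kept in a separate variable
    for i, (x, y) in enumerate(zip(xs, ys)):
        acc += x * y
        acc_shadow += x * y       # duplicated computation
        if i == inject_fault_at:
            acc ^= 1 << 3         # simulate a bit flip in the master copy only
        if acc != acc_shadow:     # compare before the value is used further
            raise TransientFaultDetected(f"mismatch at step {i}")
    return acc
```

A call without fault injection returns the ordinary dot product, while a call such as `checked_dot([1, 2, 3], [4, 5, 6], inject_fault_at=1)` raises `TransientFaultDetected`. Duplication of this kind only detects the fault; recovery would need a separate mechanism.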
[TPDF98] presents xOLT, an automatic code generation tool that generates an FT layer to separate functional from non-functional requirements. It assumes fail-silent node failures. In the node domain, the FT layer provides the following software-based error detection strategies: double execution, double execution with reference check, validity checks (for messages, history states, and resources), assertion checking, and signature checks. All of them are meant for detecting transient fail-silent failures. In the distributed domain, on the other hand, the FT layer provides active replication. All protocols are described comprehensively.
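Two of the node-level strategies, double execution and assertion checking, can be combined in a small sketch. It is not the code that xOLT generates; the decorator `fail_silent`, the task `read_speed`, and the convention of returning `None` to represent a silent node are all assumptions made for illustration.

```python
import functools


def fail_silent(assertion):
    """Wrap a task so it runs twice and is checked before its result is released."""
    def decorate(task):
        @functools.wraps(task)
        def wrapper(*args):
            first = task(*args)
            second = task(*args)      # double execution
            if first != second:       # results differ: transient fault suspected
                return None           # stay silent instead of emitting a wrong value
            if not assertion(first):  # assertion check on the agreed result
                return None
            return first
        return wrapper
    return decorate


# Hypothetical task: convert a raw sensor reading and accept only plausible speeds.
@fail_silent(lambda speed: 0 <= speed <= 300)
def read_speed(raw):
    return raw * 0.5
```

A plausible reading is passed through unchanged, whereas an implausible one (for example `read_speed(1000)`) yields `None`, so the node either delivers a checked value or nothing at all.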
[NJD94] states that there are three layers (hardware, OS, and application) and that a key issue in designing multi-layered FT systems is how to balance the amount of redundancy at the various levels in order to obtain the best performance. Too much redundancy at the lower levels of abstraction can be wasteful from an overall cost-effectiveness point of view, and it may be cost-effective to put more effort into the higher levels to avoid duplication of effort. On the other hand, a small investment at a lower level can contribute to cost savings and speed improvements at higher levels and can result in a lower overall cost. Therefore, a good balance is required. The paper only lists the methods that can be implemented at each layer, without any explanation.
[KE04] and [KA00] give algorithms for transforming a fault-intolerant program into a fault-tolerant one, considering fail-safe, non-masking, and masking fault tolerance for both high-atomicity and low-atomicity models.
[Liu91] (Thesis: Fault-Tolerant Programming by Transformations) is a comprehensive work on FT program transformation. However, it does not address how to satisfy real-time constraints during the transformations and leaves this problem for future work. [LJ99] discusses preserving RT constraints in FT transformation. There does not seem to be any other significant work on this matter.
[GSPS02] explains how to use AOP by considering distribution, real-time, and fault-tolerance aspects. It points out that a distribution aspect program (containing the middleware-specific code) enables the user to support many different middleware platforms with the same functional code. The real-time aspect proposed as an example in the paper simply adds watchdog timers to the functional code. In the fault-tolerance example, the aspect adds redundancy by sending a specific request in the functional code several times. Very basic paper.
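The fault-tolerance aspect described above can be approximated with a Python decorator, which here plays the role of the woven aspect. This is an analogy only, not the AOP tooling used in [GSPS02]; `redundant_send`, `send_request`, and `sent_log` are hypothetical names.

```python
import functools


def redundant_send(copies):
    """Aspect: repeat every request `copies` times without touching functional code."""
    def aspect(send):
        @functools.wraps(send)
        def wrapper(message):
            # Build a list first so that every copy is sent before results are combined.
            results = [send(message) for _ in range(copies)]
            return any(results)
        return wrapper
    return aspect


sent_log = []  # records what the functional code actually transmitted


@redundant_send(copies=3)
def send_request(message):
    """Functional code: knows nothing about redundancy."""
    sent_log.append(message)
    return True
```

Calling `send_request("open valve")` once transmits the message three times. The functional code stays unchanged if the number of copies is altered or the aspect is removed, which is the separation of concerns the paper argues for.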
[SH01] introduces replication as a non-functional property and a new language called JReplica that provides replication of objects to achieve fault tolerance.
[SP02] implements an aspect weaver for integrating aspect and component code in the .NET environment. In the examples presented, it considers only crash faults at the object level. After the weaver has transparently added fault tolerance, whenever a client creates an object (via new), multiple instances of the object are created and managed consistently (replication in space).
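Replication in space at object creation can be sketched as follows. The sketch uses a Python class decorator in place of the .NET weaver of [SP02], treats any exception raised by a replica as a crash, and forwards method calls only (not attribute reads); `ReplicaGroup`, `replicated`, and `Tally` are hypothetical names.

```python
class ReplicaGroup:
    """Proxy that forwards every method call to all live replicas."""

    def __init__(self, replicas):
        self._replicas = list(replicas)

    def __getattr__(self, name):
        def call_all(*args, **kwargs):
            results, survivors = [], []
            for replica in self._replicas:
                try:
                    results.append(getattr(replica, name)(*args, **kwargs))
                    survivors.append(replica)
                except Exception:
                    pass  # assumption: an exception means the replica crashed
            self._replicas = survivors  # crashed replicas are dropped
            if not results:
                raise RuntimeError("all replicas have crashed")
            return results[0]
        return call_all


def replicated(copies):
    """Class decorator: creating an object now builds `copies` instances behind a proxy."""
    def decorate(cls):
        def factory(*args, **kwargs):
            return ReplicaGroup(cls(*args, **kwargs) for _ in range(copies))
        return factory
    return decorate


@replicated(copies=3)
class Tally:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
        return self.value
```

Client code writes `tally = Tally()` and `tally.increment()` as usual. Every call is applied to all surviving instances, so their states stay consistent and a single crashed instance does not affect the result seen by the client.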
[AMS04] explores the semantics of aspect programming languages in the context of reactive embedded systems. A short paper.
[PDF$^+$00] presents a framework for aspect-oriented distributed programming based on aspect components.
[Bau01]: (PhD thesis) Transparent Fault Tolerance in a Time-Triggered Architecture.
[CLW]: A calculus for studying aspect oriented programming.