Next: Aspect Oriented Programming (POOL) Up: Towards Fault-Tolerance Aspects for Previous: Checkpointing and Roll-back

Layered FT Approaches

An FT-layer has the task of separating the implementation of the non-functional fault-tolerance requirements from the functional requirements implemented by the application software. Thus, the FT-layer must handle error detection and error recovery transparently with respect to the application software [Laprie92]. Furthermore, the FT-layer is responsible for transparently handling information transfer over its replicated channels.
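
As a minimal illustration of the last point, the following Python sketch (not part of the original design) shows how an FT-layer might hide replicated channels from the application. The sender tags each message with a sequence number and copies it onto every channel. The receiver delivers each sequence number exactly once, whichever channel copy survives. The classes `FTSender` and `FTReceiver` and the modeling of a channel as an in-memory `deque` are hypothetical simplifications.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple

# A channel is modeled (hypothetically) as an in-memory FIFO of (seq, payload) pairs.
Channel = Deque[Tuple[int, bytes]]


class FTSender:
    """Sender side of a hypothetical FT-layer: replicate every message on all channels."""

    def __init__(self, channels: List[Channel]) -> None:
        self.channels = channels
        self.next_seq = 0

    def send(self, payload: bytes) -> None:
        seq = self.next_seq
        self.next_seq += 1
        for ch in self.channels:            # same (seq, payload) copy on every channel
            ch.append((seq, payload))


class FTReceiver:
    """Receiver side: deliver each sequence number once, whichever copy arrives."""

    def __init__(self, channels: List[Channel]) -> None:
        self.channels = channels
        self.delivered: Set[int] = set()

    def receive(self) -> List[bytes]:
        fresh: Dict[int, bytes] = {}
        for ch in self.channels:
            while ch:
                seq, payload = ch.popleft()
                if seq not in self.delivered:   # drop duplicates and late copies
                    fresh[seq] = payload
        self.delivered.update(fresh)            # remember the delivered sequence numbers
        return [fresh[seq] for seq in sorted(fresh)]  # in sequence order within this call


if __name__ == "__main__":
    a, b = deque(), deque()
    tx, rx = FTSender([a, b]), FTReceiver([a, b])
    tx.send(b"m0")
    tx.send(b"m1")
    a.clear()           # channel a loses m0 and m1
    tx.send(b"m2")
    b.pop()             # channel b loses m2
    print(rx.receive())  # [b'm0', b'm1', b'm2']
```

The application only ever calls `send` and `receive`; losing any single copy is masked as long as one channel delivers it.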

Questions for brainstorming:

1. Can fail-silent behavior be transient? In other words, should we deal with both permanent and transient failures, or only with permanent fail-silent failures?

2. A distributed checkpointing/rollback (CP/rollback) mechanism is required if we also consider transient and intermittent failures. However, if we restrict the work to fail-silent failures, do we still need CP/rollback?

3. Do we always have to maintain a globally consistent state of the system?

4. Is clock synchronization necessary? If so, can Byzantine-resilient clock synchronization be used?

5. In real-time embedded systems, tasks are typically very small and have very short execution times, but they are usually released at high frequencies. It is therefore more efficient not to insert even a few CPs into the body of a task. If checkpointing is necessary, would one CP at the end of each task instance be enough (modular checkpointing)?

6. Should our work distinguish between time-triggered and event-triggered architectures? Each requires different protocols and approaches.

7. How can a fail-silent failure be detected? By implementing push (heartbeat) and pull ("are you alive?") models?

8. Could our aspects be:
- Information redundancy: insert aspect code into the communication (i.e., send and receive) functions, e.g., a CRC checksum or triplication of sent messages.
- Time redundancy: insert aspect code that duplicates or triplicates a task and applies a voting scheme to the results. Strictly, re-execution on the same processor is time redundancy, while running the copies on different processors is space (hardware) redundancy.
- N-version programming or recovery blocks: these may help recover from fail-silent failures and, moreover, tolerate software design faults. Can we implement them with aspect-oriented programming?
- ???
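
For questions 2 and 3: whatever distributed CP/rollback scheme is chosen, it has to roll back to a consistent global state, i.e. a set of local checkpoints, one per process, in which no message is recorded as received without also being recorded as sent (an orphan message). The Python sketch below checks that condition. It assumes FIFO channels, so per-channel message counts are sufficient; `LocalCheckpoint` and `is_consistent` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LocalCheckpoint:
    pid: int
    # sent[j]: messages this process had sent to process j when the checkpoint was taken
    sent: Dict[int, int] = field(default_factory=dict)
    # received[i]: messages this process had received from process i at that point
    received: Dict[int, int] = field(default_factory=dict)


def is_consistent(cut: List[LocalCheckpoint]) -> bool:
    """True if no checkpoint records more receipts from a peer than that peer's
    checkpoint records sends to it, i.e. the cut contains no orphan messages.
    Assumes FIFO channels and exactly one checkpoint per process in `cut`."""
    by_pid = {cp.pid: cp for cp in cut}
    for cp in cut:
        for sender, n_received in cp.received.items():
            n_sent = by_pid[sender].sent.get(cp.pid, 0)
            if n_received > n_sent:     # received but "not yet sent": orphan
                return False
    return True


if __name__ == "__main__":
    p0 = LocalCheckpoint(0, sent={1: 3})
    p1 = LocalCheckpoint(1, received={0: 2})
    print(is_consistent([p0, p1]))       # True: one message is merely in transit
    p1_late = LocalCheckpoint(1, received={0: 4})
    print(is_consistent([p0, p1_late]))  # False: p1 received a message p0 never sent
```

Messages in transit are allowed in a consistent state, but they must be logged if they are to be replayed after rollback.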
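
On question 4: one well-known way to synchronize clocks despite Byzantine faults is the fault-tolerant midpoint function of Lundelius Welch and Lynch. Each node collects readings of the other clocks, discards the f smallest and f largest, and corrects toward the midpoint of the remaining extremes. This tolerates up to f faulty clocks when there are at least 3f + 1 readings. The Python sketch below shows only this averaging step, not the full resynchronization protocol; the function name is hypothetical.

```python
from typing import Sequence


def fault_tolerant_midpoint(readings: Sequence[float], f: int) -> float:
    """Discard the f smallest and f largest clock readings, then return the
    midpoint of the remaining extremes. Tolerates up to f arbitrarily
    (Byzantine) faulty clocks provided len(readings) >= 3f + 1."""
    n = len(readings)
    if f < 0 or n < 3 * f + 1:
        raise ValueError("need at least 3f + 1 readings to tolerate f faulty clocks")
    trimmed = sorted(readings)[f:n - f]
    return (trimmed[0] + trimmed[-1]) / 2


if __name__ == "__main__":
    # Four clocks, one of them wildly wrong; with f = 1 the outlier is discarded.
    print(fault_tolerant_midpoint([100, 102, 99, 10000], f=1))  # 101.0
```

Because extreme values are trimmed rather than averaged, a single faulty clock cannot drag the result outside the range of the correct clocks' readings.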
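
The one-checkpoint-per-instance idea in question 5 could look roughly like the Python sketch below. The task state is saved only after an instance completes. A transient error detected during an instance is handled by restoring the last checkpoint and re-executing that instance. Error detection is simulated by a hypothetical `faults` map from instance index to the number of consecutive faulty executions. The sketch also assumes that an instance produces no externally visible output before it completes; otherwise re-execution could duplicate outputs.

```python
import copy
from typing import Callable, Dict

State = Dict[str, int]


def run_with_modular_checkpoints(
    step: Callable[[State, int], State],
    initial: State,
    n_instances: int,
    faults: Dict[int, int],
    max_retries: int = 1,
) -> State:
    """Run n_instances of a periodic task with one checkpoint at the end of each
    instance. faults[k] (hypothetical fault injection) is the number of
    consecutive executions of instance k on which an error is detected."""
    checkpoint = copy.deepcopy(initial)
    remaining = dict(faults)
    for k in range(n_instances):
        for _attempt in range(max_retries + 1):
            state = step(copy.deepcopy(checkpoint), k)  # always start from the last CP
            if remaining.get(k, 0) > 0:                 # simulated error detection
                remaining[k] -= 1
                continue                                # discard result: roll back
            checkpoint = state                          # CP at the end of instance k
            break
        else:
            raise RuntimeError(f"instance {k} still failing after {max_retries} retries")
    return checkpoint


if __name__ == "__main__":
    def accumulate(s: State, k: int) -> State:
        return {"acc": s["acc"] + k}

    # Instance 2 hits one transient error and is re-executed from the previous CP.
    print(run_with_modular_checkpoints(accumulate, {"acc": 0}, 5, faults={2: 1}))
    # {'acc': 10}
```

The rollback never reaches further back than the previous instance boundary, which keeps the recovery cost bounded by one task execution time.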
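
For question 7, both detection styles come down to timeouts. The Python sketch below uses hypothetical names and explicit timestamps instead of a real clock. It shows a push-style heartbeat monitor and a pull-style liveness check. A timeout can only declare a fail-silent node failed with certainty when message delays and execution times are bounded, as in a synchronous real-time system; otherwise the node can only be suspected.

```python
from typing import Callable, Dict, Optional, Set


class HeartbeatMonitor:
    """Push model: each node periodically sends a heartbeat; a node is declared
    failed if no heartbeat has arrived for more than `timeout` time units."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def on_heartbeat(self, node: str, now: float) -> None:
        self.last_seen[node] = now

    def failed_nodes(self, now: float) -> Set[str]:
        return {node for node, t in self.last_seen.items() if now - t > self.timeout}


def pull_check(ping: Callable[[], Optional[float]], deadline: float) -> bool:
    """Pull model: send an 'are you alive?' request. `ping` (hypothetical) returns
    the reply latency, or None if no reply arrived; the node counts as alive only
    if it answered within `deadline`."""
    latency = ping()
    return latency is not None and latency <= deadline


if __name__ == "__main__":
    monitor = HeartbeatMonitor(timeout=3)
    monitor.on_heartbeat("A", now=0)
    monitor.on_heartbeat("B", now=0)
    monitor.on_heartbeat("A", now=4)    # B has gone silent
    print(monitor.failed_nodes(now=5))  # {'B'}
    print(pull_check(lambda: 0.5, deadline=1.0), pull_check(lambda: None, deadline=1.0))
    # True False
```

Push costs bandwidth on every period even when nothing fails; pull shifts that cost to the monitor and lets it probe only when it needs an answer.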
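
Regarding question 8, Python decorators can approximate "around" advice for the first two bullets. They lack the pointcut languages of AOP tools such as AspectJ and must be applied explicitly. In the hypothetical sketch below, `crc_aspect` appends a CRC-32 (computed with the standard `zlib.crc32`) to outgoing messages, and `check_crc` verifies and strips it on receipt. `tmr_aspect` runs a task three times and majority-votes the results. The replicas run sequentially on one processor (time redundancy), and voting assumes the results are hashable.

```python
import functools
import zlib
from collections import Counter
from typing import Callable, List


def crc_aspect(send: Callable[[bytes], None]) -> Callable[[bytes], None]:
    """Information redundancy: append a 4-byte CRC-32 to every outgoing message."""
    @functools.wraps(send)
    def wrapper(payload: bytes) -> None:
        send(payload + zlib.crc32(payload).to_bytes(4, "big"))
    return wrapper


def check_crc(frame: bytes) -> bytes:
    """Receive-side counterpart: verify and strip the CRC-32, or raise."""
    payload, crc = frame[:-4], frame[-4:]
    if zlib.crc32(payload).to_bytes(4, "big") != crc:
        raise ValueError("CRC mismatch: message corrupted")
    return payload


def tmr_aspect(task: Callable[..., object]) -> Callable[..., object]:
    """Time redundancy: execute the task three times and return the majority result."""
    @functools.wraps(task)
    def wrapper(*args, **kwargs):
        results = [task(*args, **kwargs) for _ in range(3)]
        value, votes = Counter(results).most_common(1)[0]
        if votes < 2:
            raise RuntimeError("no majority among replicas")
        return value
    return wrapper


if __name__ == "__main__":
    wire: List[bytes] = []

    @crc_aspect
    def send(frame: bytes) -> None:
        wire.append(frame)              # stand-in for a real network send

    send(b"hello")
    print(check_crc(wire[0]))           # b'hello'

    replies = iter([5, 7, 5])           # the second replica returns a corrupted value

    @tmr_aspect
    def sensor_task() -> int:
        return next(replies)

    print(sensor_task())                # 5
```

The application functions (`send`, `sensor_task`) contain no fault-tolerance code themselves; the redundancy is woven in by the decorators, which is the separation the FT-layer aims for.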


Tolga Ayav 2004-10-25