TOLÈRE joint research project: Bip - Sosso


Fault tolerant distributed code
for embedded systems




Research project context

Constraints for embedded systems

Embedded systems play a growing part in numerous applications, whose behavior they control with increasing autonomy. Long present in costly applications (space, aeronautics, military, and so on), where they account for a major part of the application, they are now beginning to appear in consumer applications such as automatic car driving. Their main features are:

Synchronous programming [Hal93] offers specification methods and formal verification tools that give satisfactory answers to the needs mentioned above. These methods are based on modeling with automata, specification in formally defined high-level languages, and theoretical analysis of the models used, in order to obtain formal proof methods.

However, the following aspects, which are extremely important in the targeted application fields, are not taken into account by synchronous programming:

In summary, synchronous programming allows the design of embedded systems with several key advantages: timing constraints, formal verification, and clean and safe programming, thanks to the synchrony hypothesis. Modern distribution techniques then allow the automatic production of distributed code. However, the fault tolerance of the final distributed program is not ensured. We intend to address this difficult problem in a collaboration between the Bip and Sosso teams of Inria.

Software involved in the TOLÈRE research project

The approaches and methods presented above are implemented in two software environments, which allow critical embedded systems to be designed safely and within a reasonable development time.

The Orccad environment [SECK93] (for Open Robot Controller CAD) is a high-level software environment well adapted to the specification of robotic applications involving automatic control and discrete-event aspects. Orccad is developed jointly by the Icare team at Sophia-Antipolis, the Bip team at Montbonnot, and the robotics systems administrators at Montbonnot. Within Orccad, an elementary action is modeled as a Robot-Task (RT), which is a control law merged with a logical reactive behavior. RTs are the interface between continuous time (in fact sampled time) and discrete events. The control laws are built from modules connected through a data-flow network. The logical reactive behavior is specified in the Esterel language [BG92]. Such elementary actions are then combined to form Robot-Procedures (RPs) of growing complexity, up to the final application. Each RP, also specified in Esterel, describes the behavior of a robotic mission along with predefined exception handling. Programming all the logical aspects of the application in Esterel allows us to benefit from the associated formal proof environment (FC2Tools and Xeve). A prototype version of Orccad is currently available. So far, the generated code targets a single processor running VxWorks or Solaris.
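To make the Robot-Task idea more concrete, here is a minimal sketch in Python of a sampled control law coupled with a small reactive automaton. It is only an illustration and not Orccad code: the `RobotTask` class, its `State` values, the proportional control law, the thresholds, and the trivial plant model in `run` are all assumptions introduced for the example. In Orccad itself the reactive part is written in Esterel.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    RUNNING = auto()
    DONE = auto()     # postcondition reached
    FAILED = auto()   # exception raised


@dataclass
class RobotTask:
    """Hypothetical Robot-Task: a control law plus a logical reactive behavior."""
    target: float
    gain: float = 0.5         # proportional gain (assumed value)
    tolerance: float = 0.01   # postcondition threshold (assumed value)
    limit: float = 10.0       # exception threshold (assumed value)
    state: State = State.RUNNING

    def step(self, position: float) -> float:
        """One sampling period: update the automaton, then compute the command."""
        error = self.target - position
        # Discrete-event part: events are derived from the sampled measurement.
        if abs(position) > self.limit:
            self.state = State.FAILED
        elif abs(error) < self.tolerance:
            self.state = State.DONE
        # Continuous (sampled) part: the control law runs only while RUNNING.
        if self.state is not State.RUNNING:
            return 0.0
        return self.gain * error


def run(task: RobotTask, position: float, max_steps: int = 100):
    """Drive the task against a trivial plant that integrates the command."""
    for _ in range(max_steps):
        command = task.step(position)
        if task.state is not State.RUNNING:
            break
        position += command
    return task.state, position
```

Calling `run(RobotTask(target=1.0), 0.0)` drives the position towards the target until the postcondition event stops the task, while a start position beyond `limit` raises the exception on the first sampling period.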

The SynDEx environment [Sor94] (for Synchronized Distributed Executive), developed by the Sosso team at Rocquencourt, is a software environment dedicated to the multi-processor implementation of synchronous programs. SynDEx supports the "Algorithm/Architecture Adequation" method. It takes into account the real-time and embedding constraints that must be satisfied by the application. The application algorithm is described as a conditioned data-flow graph, either specified graphically or produced in the DC format by one of the synchronous compilers. The heterogeneous target architecture is specified as a network of hardware components (processors and/or specific circuits) connected through communication media (links, buses). Fast heuristics distribute and schedule the program automatically, and as statically as possible, on the target architecture, while minimizing its execution time as well as the number of components needed. Finally, SynDEx generates the minimal real-time distributed executive required to run the distributed algorithm.
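The flavor of such a distribution and scheduling heuristic can be sketched with a generic greedy list scheduler. This is not the SynDEx heuristic, whose details are not given here. It assumes identical processors, a fixed communication cost between any two processors, and unconditioned data-flow. The function name `list_schedule` and its parameters are hypothetical.

```python
from graphlib import TopologicalSorter


def list_schedule(durations, deps, n_procs, comm_cost=1):
    """Greedy list scheduling of a data-flow graph onto identical processors.

    durations: {operation: execution time}
    deps:      {operation: iterable of predecessor operations}
    Returns ({operation: (processor, start, end)}, makespan).
    """
    placement = {}
    proc_free = [0] * n_procs  # time at which each processor becomes idle
    # Visit operations so that every predecessor is placed before its successors.
    for op in TopologicalSorter(deps).static_order():
        best = None
        for p in range(n_procs):
            # Data produced on another processor arrives comm_cost later.
            ready = max(
                (placement[d][2] + (comm_cost if placement[d][0] != p else 0)
                 for d in deps.get(op, ())),
                default=0,
            )
            start = max(ready, proc_free[p])
            # Keep the processor offering the earliest start time.
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        end = start + durations[op]
        placement[op] = (p, start, end)
        proc_free[p] = end
    return placement, max(proc_free)
```

For a diamond-shaped graph such as `deps = {"b": ["a"], "c": ["a"], "d": ["b", "c"]}`, the scheduler places `b` and `c` on different processors when the communication cost is lower than the time saved by running them in parallel. The whole schedule is computed offline, which matches the static approach described above.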

The TOLÈRE research project

Work topics

For each topic, the researchers involved are indicated:

  1. To deal with the automatic-control/discrete-event aspects (Sorel and Simon). In numerous computer systems, in particular embedded ones controlling physical processes, the computing power of the machine is shared between a periodic computing task (control law) and a discrete-event task related to reactive systems (mode change, exception handling, and so on). Since design safety relies on the ability to formally verify the program, both aspects must be made to cooperate and their behaviors must be modeled accurately. Although they are complementary, both aspects are addressed separately in the robotics field. It seems interesting to merge them under a unified model. To do so, several problems must be carefully addressed:
  2. To propose solutions to make the distributed code fault tolerant (Sorel and Girault). This concerns two aspects: 1) taking faults into account at the level of the algorithm itself, by modifying it to provide increasingly degraded modes, and 2) adding hardware redundancy at the architecture level, with a voting mechanism between processors and communication media, as well as adding dedicated hardware components to embed the most critical parts of the program. For instance, to prevent a faulty processor from blocking the whole application, we can imagine a sub-network dedicated to fault detection, in charge of switching the application into a degraded mode. A final point concerns performance, which must remain at a sufficient level.
  3. To propose solutions to ensure that the specification of the problem is complete (Sorel and Simon). Currently, exception handling within Orccad is programmed in the Esterel synchronous language. This specification and coding phase is followed by a formal verification phase performed with FC2Tools and Xeve. During this phase, generic properties are checked automatically. However, application-specific properties must be checked manually, which is of course less reliable. Taking distribution and fault tolerance constraints into account will inevitably make things worse. Another solution consists in synthesizing the discrete-event part from symbolic constraints. These synthesis techniques seem to derive from the Ramadge and Wonham theory [RW87].
  4. To combine Orccad and SynDEx (Sorel, Simon, and Girault). The goal here is to propose a single software environment that is coherent from the system's specification down to the optimized real-time implementation. This environment must also take the fault tolerance aspects into account, at both the hardware and software levels. This requires studying a way to unify the Orccad and SynDEx semantics, with respect to the models described in the previous topics, with DC as an intermediate format. Note that the Esterel compiler can already produce DC code, and that SynDEx accepts DC programs as input.
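The voting mechanism and the switch to a degraded mode mentioned in topic 2 can be illustrated with a small Python sketch. It is a generic majority voter over replicated results and not a design produced by the project. The names `vote` and `control_step`, and the convention that `None` marks a silent replica, are assumptions made for the example.

```python
from collections import Counter


def vote(replies):
    """Majority vote over replica outputs; None marks a silent (crashed) replica.

    Returns (value, suspects): the value backed by a strict majority of all
    replicas, or None when no such value exists, together with the indices of
    the replicas that were silent or disagreed with the majority.
    """
    answers = [r for r in replies if r is not None]
    if not answers:
        return None, list(range(len(replies)))
    value, count = Counter(answers).most_common(1)[0]
    if 2 * count <= len(replies):
        # No strict majority: every replica is suspect.
        return None, list(range(len(replies)))
    suspects = [i for i, r in enumerate(replies) if r != value]
    return value, suspects


def control_step(replies, nominal, degraded):
    """Apply the voted value, or switch to a degraded mode when the vote fails."""
    value, _suspects = vote(replies)
    if value is None:
        return degraded()
    return nominal(value)
```

With three replicas, one faulty or silent processor is masked by the two others. When no strict majority exists, `control_step` falls back on the `degraded` callback, which stands in for the degraded mode described above.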

Application domains

The two teams are involved in the following domains:

Other applications are in the transport (automotive and aeronautics) field.

Inria teams involved in the TOLÈRE research project

References

[BG92] G. Berry and G. Gonthier. The Esterel synchronous programming language: Design, semantics, implementation. Science of Computer Programming, 19(2):87-152, 1992.

[Hal93] N. Halbwachs. Synchronous programming of reactive systems. Kluwer Academic Pub., 1993.

[RW87] P.J. Ramadge and W.M. Wonham. Supervisory control of a class of discrete event processes. SIAM Journal on Control and Optimization, 25(1):206-230, January 1987.

[SECK93] D. Simon, B. Espiau, E. Castillo, and K. Kapellos. Computer-aided design of a generic robot controller handling reactivity and real-time control issues. IEEE Transactions on Control Systems Technology, 1(4), December 1993.

[Sor94] Y. Sorel. Massively parallel computing systems with real-time constraints, the "algorithm/architecture adequation" methodology. In Massively Parallel Computing Systems Conference, Ischia, Italy, May 1994.