Tree exchange example |

- Core and Instrumentation predicates
- Core predicates
- Standard instrumentation predicates
- Invariant satisfied by a tree node
- Cross instrumentation predicates
- The property we want to prove
- Additional constraint rules
- CFG and equation system
- Invariants enforced and checked during postcondition
- Experiments

The previous example is not so convincing to demonstrate the usefulness of interprocedural analysis, as the list reversal algorithm can be implemented without recursion (it is tail recursive). Its purpose was just to experiment the interprocedural method and to help understanding the necessity to introduce instrumentation predicates linking the input and output states of a procedure. Such instrumentation predicates allows to represent with enough accuracy the transformer associated to the procedure.

We afford now a algorithm on binary trees, which consists in traversing every node of the tree and to exchange for each node the left and the right subtree. This example is more challenging and also more interesting, as it works on binary (full) trees, which is a more complex data-structure, and it is not tail-recursive, as it performs a depth-first search traversal.

Here is the program:

procedure main(tree) is begin main_entry: exchange(tree); main_end: end procedure exchange(x) is var bRight,bLeft; begin exchange_entry: if (x!=null) then begin exchange1: bRight := x->right; exchange2: bLeft := x->left; exchange3: x->right := null; // Exchange of the two subtrees exchange4: x->left := null; exchange5: x->right := bLeft; exchange6: x->left := bRight; exchange7: exchange(bRight); // Recursive calls exchange8: exchange(bLeft); end exchange_end: exchange_summary: end

The specification is split in several files.

File | Description |

exchange.tvp | main file, including auxiliary files and defining the equation system |

predicate.tvp | standard instrumentation predicates and constraint rules for trees |

functionInterprocedural.tvp | standard semantic functions for procedure calls and returns (same as list) |

function.tvp | semantic functions for various intraprocedural instructions for trees (depends on predicate.tvp for some update-formulae) |

exchange_inputA.tvs | set of initial states for the analysis, namely any subtree of a full binary tree |

exchange_inputB.tvs | set of initial states for the analysis, namely any full binary tree |

The core predicates we will use are the unary predicates corresponding to
variables of reference type, and the two binary predicates `left` and
`right` corresponding to the two record fields of a tree node, wich are
also of reference type.

We detail below rather "standard" instrumentation predicates.

Our algorithm will manipulate trees of unbounded sizes, so we should be able
to characterize the fact that a set of cells/nodes belongs to a tree by some
invariant, instead of relying on the (abstract) universe. In the latter case
indeed, any bluring of node will loose the information. Here are the
properties that defines a node *v* belonging to a full binary tree:

*full_left[m](v) & full_right[m](v)**!both[m](v)*(left and right subtrees are different)*!shared_down[m](v)*(the cell is not pointed by more than one cell)*!cyc_down[m](v)*(a cell cannot be both an ancestor and a son of another node

Of course, we should at least prove that our tree exchange procedure returns a structure satisfying this invariant, if it is applied to a structure satisfying it. The reachability and ancestor predicates will be needed to establish local invariants, for instance to distinguish nodes belonging to left or right subtree of a node, and to distinguish the ancestors cells from other cells in the memory heap.

We will also need so-called "cross" instrumentation predicates, to represent relationship between the input memory heap and the current/output memory heap.

Coming back to the example, assuming that all cells reachable from the
variable *tree* at the entry of procedure *main* are satisfying the treeness
invariant, we want to prove that after the call to *exchange*

- These cells still satisfy the treeness invariant;
- They also satisfy
*exch_right_succ[inp,out](v) & exch_left_succ[inp,out](v)*; - All the other cells satisfy
*id_succ|inp,out](v)*, i.e. they have not been modified.

Notice that the leaf cells of the tree will satisfy not only
*exch_right_succ[inp,out](v) & exch_left_succ[inp,out](v)* but also
*id _{s}ucc[right,inp,out](v) & id_{s}ucc[left,inp,out](v)*, as their fields

Among others, we have to show in the algorithm that

- At point
*exchange7*, after the first recursive calls, nodes reachable from*bRight*have been exchanged, node pointed by*x*has been exchanged, all the other are unmodified. - At point
*exchange8*, all nodes reachables from*x*have been exchanged, and none of the other.

Thus, we will need to focus on *r_down[m,x](v)*, *r_down[m,bRight](v)* and
*r_down[m,bLeft](v)* to separate the different kind of nodes, and to make sure
that the treeness invariant is preserved w.r.t. the absence of sharing
(*not (exist v : r_down[m,bRight](v) & r_down[m,bLeft](v))*).

TVLA does not infer all logical relationships bewteen the different core and instrumentation predicates, for the good reason that the problem is not decidable. It infers some elementary ones. However, we needed to specify manually some others.

- As for the list example, we need to specify transitivity rules between
the "cross" instrumentation predicates. For instance:
%r id_succ[f,m1,m2](v) & id_succ[f,m2,m3](v) ==> id_succ[f,m1,m3](v) %r id_succ[right,m1,m2](v) & exch_right_succ[m2,m3](v) ==> exch_right_succ[m1,m3](v) %r exch_right_succ[m1,m2](v) & id_succ[left,m2,m3](v) ==> exch_right_succ[m1,m3](v) %r exch_right_succ[m1,m2](v) & exch_left_succ[m2,m3](v) ==> id_succ[right,m1,m3](v) ...

This is standard in our approach - However, we discovered, after lengthy investigations on experimental
results, that the following rule, also of type "cross" and which is much
less standard,
**was needed**. They link reachability properties in different modes:%r r_down[m1,z](v) & (A(v_1) (r_down[m1,z](v) -> ((id_succ[right,m1,m2](v_1) & id_succ[left,m1,m2](v_1)) | (exch_right_succ[m1,m2](v_1) & exch_left_succ[m1,m2](v_1))))) ==> r_down[m2,z](v)

It says that if a node is reachable via*down*from*z*in mode*m*, and if either it has not been modified or its fields have been exchanged, then it is still reachable via_{1}*down*from*z*in mode*m*. This is mandatory so that the focusing on reachability predicates in say input mode is propagated to reachability predicates in output mode._{2} - Another constraint is needed at point
*exchange7*, to make apparent the fact that a node cannot be reachable from both*bLeft*and*bRight*, if the tree invariant is satisfied. More generally, if two (pointer) variables are not reachable from each other, and if the no-sharing assumption is satisfied, then a node of the subtree of the first variable cannot be reachable from tjhe other variable. This gives, in the case of the two variables*bLeft*and*bRight*:%r E(v_1,v_2) bLeft(v_1) & bRight(v_2) & v_1!=v_2 & // Variables points to different nodes !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) & // These nodes cannot be reached from each other (A(v_3) r_down[m,bLeft](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bLeft r_down[m,bLeft](v) ==> !r_down[m,bRight](v) %r E(v_1,v_2) bLeft(v_1) & bRight(v_2) & v_1!=v_2 & // Variables points to different nodes !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) & // These nodes cannot be reached from each other (A(v_3) r_down[m,bRight](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bRight r_down[m,bRight](v) ==> !r_down[m,bLeft](v)

We have already mention the necessity to focus on reachability predicates at
chosen point in the *exchange* procedure, so as to be able ot establish the
relevant invariants. As a consequence, we inserted between some control points
of the CFG the appropriate focusing transformations. In our case, in the
*exchange* procedure, after having initialized *bLeft* and *bRight*, between CFG nodes *exchange2* and *exchange3*, we inserted
focus operations on *r_down[inp,bRight](v)* and *r_down[inp,bLeft](v)*.

However, we also inserted bluring transformations, to limit the combinatorial explosion. The complexity of the procedure return operation is indeed quadratic in the number of the structures reachable at the exit point of the callee and the call point of the callee in the caller. Moreover, it appears that many structures could be merged, because they differ only in minor aspects that is irrelevant w.r.t. the property we want to check. There are typically two places where many things can be abstracted:

- At the entry point of the procedure, what is important is the property of the cells reachable from the tree root
*x*, whereas the number of cells in the context of the tree for instance is indifferent; - At the exit point of the procedure, we have exactly the same phenomenon.

Generally speaking, this depends on the effect of the considered procedure on the memory heap, the intuition being that precision is needed only in places where modification take place.

Technically, we would have enjoyed to make the set of abstraction predicates depend on the CFG node/equation system variable, as well as the set of active constraint rules. As this feature is currently not implemented and any way not necessarily desirable, we used Update transformations with update formula like:

sm(v) = r_down[inp,fp1](v) ? sm(v) : 1/2 exch_right_succ[m1,m2](v) = r_down[inp,fp1](v) ? exch_right_succ(v) : 1/2 ...

followed immediately by a Blur transformation (before applying Coerce, as Coerce can transform non-definite valuations (1/2) in definite valuations (0 or 1)).

We will give experiments an idea of the gain in the number of structures that are generated during the analysis.

Different input structures tested:

- A: single summary node, with no cyclicity neither sharing by
*down*,*tree*maybe pointing somewhere. Focusing on*tree*and*r_down[tree]*is performed before the call to*exchange*, which results in 5 different structures. - B: not yet

In the "Number of Structures" field, we indicate the number of reachable structures at the following control points:

*exchange_entry*;*exchange7*(before the first recursive call);*exchange8*(after the first and before the second recursive call);*exchange_end*(after the second recursive call);*exchange_summary*(after elimination of local variables and possibly additional blur, which gives the summary function).

The Postscript files give the result of the analysis (reachable structures) at the following control points:

*exchange7*(before the first recursive call);*exchange8*(after the first and before the second recursive call);*exchange_summary*;*main_begin*;*main_call*;*main_end*

The predicates *downStar[f,m](v _{1},v_{2})* does not appear in the graphs, in order to make them more readable.

Input TVS | Full Trees | Ancestor | Blur at entry | Blur at exit | Nb of predicates (abs/total) and constraints | Time | Success ? | Nb of Structures | File | Comment |

A | yes | no | no | no | 46/71 & 500 | 1m32s | yes | 6/10/14/22/10 | exchangeA1000.ps.gz | -- |

A | yes | no | no | yes | 46/71 & 500 | 1m36s | yes | 6/10/14/22/10 | exchangeA1001.ps.gz | -- |

A | yes | no | yes | yes | 46/71 & 500 | 1m36s | yes | 6/10/14/22/10 | exchangeA1011.ps.gz | -- |

A | yes | no | yes | no | 46/71 & 500 | 1m32s | yes | 6/10/14/22/10 | exchangeA1010.ps.gz | -- |

A | yes | yes | no | no | 58/83 & 680 | >13m | ? | >35/50/32/23/23 | -- | combinatorial explosion if more running time is given (no blur at entry) |

A | yes | yes | no | yes | 58/83 & 680 | >13m | ? | 34/80/100/136/11 | -- | combinatorial explosion if more running time is given (no blur at entry) |

A | yes | yes | yes | yes | 58/83 & 680 | 2m14s | yes | 18/10/14/22/10 | exchangeA1111.ps.gz | -- |

A | yes | yes | yes | no | 58/83 & 680 | 2m03s | yes | 18/10/14/22/10 | exchangeA1110.ps.gz | -- |

A | no | yes | yes | yes | ? & ? | >590m | no | >26/58/329/1176/639 | ? | failure to sharing violation |

B | yes | no | no | no | 46/71 & 500 | 1m48s | yes | 7/10/14/22/10 | exchangeB1000.ps.gz | -- |

We did not yet succeed to show the property on non-full binary trees: either the tree invariant is violated during the analysis, or the additional instrumentation predicates induces a combinatorial explosion. Also, a phenomenon is that our meet is approximated, because the additional constraints representing one of the structure are taken into account only if their body has a definite value, which requires in theory focusing on all the involved predicates, which too expensive in the general case. Progress should be made to improve the implementation of the meet operation, probably using a different technique.

Tree exchange example |