Tree exchange example

Tree exchange example

Core and Instrumentation predicates

Core predicates
Standard instrumentation predicates
Invariant satisfied by a tree node
Cross instrumentation predicates

The property we want to prove
Additional constraint rules
CFG and equation system

Insertion of Focus operations
Insertion of Blur operations

Invariants enforced and checked during postcondition
Experiments

The previous example is not so convincing to demonstrate the usefulness of interprocedural analysis, as the list reversal algorithm can be implemented without recursion (it is tail recursive). Its purpose was just to experiment the interprocedural method and to help understanding the necessity to introduce instrumentation predicates linking the input and output states of a procedure. Such instrumentation predicates allows to represent with enough accuracy the transformer associated to the procedure.

We afford now a algorithm on binary trees, which consists in traversing every node of the tree and to exchange for each node the left and the right subtree. This example is more challenging and also more interesting, as it works on binary (full) trees, which is a more complex data-structure, and it is not tail-recursive, as it performs a depth-first search traversal.

Here is the program:



procedure main(tree) is
begin
main_entry:     exchange(tree);
main_end:
end

procedure exchange(x) is
var bRight,bLeft;
begin
exchange_entry: if (x!=null) then begin
exchange1:         bRight := x->right;
exchange2:         bLeft := x->left;
exchange3:         x->right := null; // Exchange of the two subtrees
exchange4:         x->left := null;
exchange5:         x->right := bLeft;
exchange6:         x->left := bRight;
exchange7:         exchange(bRight); // Recursive calls
exchange8:         exchange(bLeft);
               end
exchange_end:
exchange_summary:
end

The specification is split in several files.

File	Description
exchange.tvp	main file, including auxiliary files and defining the equation system
predicate.tvp	standard instrumentation predicates and constraint rules for trees
functionInterprocedural.tvp	standard semantic functions for procedure calls and returns (same as list)
function.tvp	semantic functions for various intraprocedural instructions for trees (depends on predicate.tvp for some update-formulae)
exchange_inputA.tvs	set of initial states for the analysis, namely any subtree of a full binary tree
exchange_inputB.tvs	set of initial states for the analysis, namely any full binary tree

Core and Instrumentation predicates

Core predicates

The core predicates we will use are the unary predicates corresponding to variables of reference type, and the two binary predicates left and right corresponding to the two record fields of a tree node, wich are also of reference type.

Standard instrumentation predicates

We detail below rather "standard" instrumentation predicates.

The down and downStar predicates allows mainly to define the predicates shared_down, cyc_down, r_down[z] and a_down[z]. The two predicates full_left and full_right allows to model the fact that a tree is full, i.e. a node has either two or zero sons.

Invariant satisfied by a tree node

Our algorithm will manipulate trees of unbounded sizes, so we should be able to characterize the fact that a set of cells/nodes belongs to a tree by some invariant, instead of relying on the (abstract) universe. In the latter case indeed, any bluring of node will loose the information. Here are the properties that defines a node v belonging to a full binary tree:

full_left[m](v) & full_right[m](v)
!both[m](v) (left and right subtrees are different)
!shared_down[m](v) (the cell is not pointed by more than one cell)
!cyc_down[m](v) (a cell cannot be both an ancestor and a son of another node

Of course, we should at least prove that our tree exchange procedure returns a structure satisfying this invariant, if it is applied to a structure satisfying it. The reachability and ancestor predicates will be needed to establish local invariants, for instance to distinguish nodes belonging to left or right subtree of a node, and to distinguish the ancestors cells from other cells in the memory heap.

Cross instrumentation predicates

We will also need so-called "cross" instrumentation predicates, to represent relationship between the input memory heap and the current/output memory heap.

Predicate id_succ allows to model the identity relationship of a cell between input and output state, and the predicates exch_right_succ and exch_left_succ the fact the two fields of a node have been exchanged.

The property we want to prove

Coming back to the example, assuming that all cells reachable from the variable tree at the entry of procedure main are satisfying the treeness invariant, we want to prove that after the call to exchange

These cells still satisfy the treeness invariant;
They also satisfy exch_right_succ[inp,out](v) & exch_left_succ[inp,out](v);
All the other cells satisfy id_succ|inp,out](v), i.e. they have not been modified.

Notice that the leaf cells of the tree will satisfy not only exch_right_succ[inp,out](v) & exch_left_succ[inp,out](v) but also id_succ[right,inp,out](v) & id_succ[left,inp,out](v), as their fields left and right are null.

Among others, we have to show in the algorithm that

At point exchange7, after the first recursive calls, nodes reachable from bRight have been exchanged, node pointed by x has been exchanged, all the other are unmodified.
At point exchange8, all nodes reachables from x have been exchanged, and none of the other.

Thus, we will need to focus on r_down[m,x](v), r_down[m,bRight](v) and r_down[m,bLeft](v) to separate the different kind of nodes, and to make sure that the treeness invariant is preserved w.r.t. the absence of sharing (not (exist v : r_down[m,bRight](v) & r_down[m,bLeft](v))).

Additional constraint rules

TVLA does not infer all logical relationships bewteen the different core and instrumentation predicates, for the good reason that the problem is not decidable. It infers some elementary ones. However, we needed to specify manually some others.

As for the list example, we need to specify transitivity rules between the "cross" instrumentation predicates. For instance:


%r id_succ[f,m1,m2](v) & id_succ[f,m2,m3](v) ==> id_succ[f,m1,m3](v)
%r id_succ[right,m1,m2](v) & exch_right_succ[m2,m3](v) ==> 
        exch_right_succ[m1,m3](v)
%r exch_right_succ[m1,m2](v) & id_succ[left,m2,m3](v) ==> 
        exch_right_succ[m1,m3](v)
%r exch_right_succ[m1,m2](v) & exch_left_succ[m2,m3](v) ==> 
        id_succ[right,m1,m3](v)
...

This is standard in our approach

However, we discovered, after lengthy investigations on experimental results, that the following rule, also of type "cross" and which is much less standard, was needed. They link reachability properties in different modes:
```
%r r_down[m1,z](v) & 
   (A(v_1) (r_down[m1,z](v) -> 
        ((id_succ[right,m1,m2](v_1) & id_succ[left,m1,m2](v_1)) | 
         (exch_right_succ[m1,m2](v_1) & exch_left_succ[m1,m2](v_1)))))
   ==> 
   r_down[m2,z](v)
```
It says that if a node is reachable via down from z in mode m₁, and if either it has not been modified or its fields have been exchanged, then it is still reachable via down from z in mode m₂. This is mandatory so that the focusing on reachability predicates in say input mode is propagated to reachability predicates in output mode.

Another constraint is needed at point exchange7, to make apparent the fact that a node cannot be reachable from both bLeft and bRight, if the tree invariant is satisfied. More generally, if two (pointer) variables are not reachable from each other, and if the no-sharing assumption is satisfied, then a node of the subtree of the first variable cannot be reachable from tjhe other variable. This gives, in the case of the two variables bLeft and bRight:


%r E(v_1,v_2)
    bLeft(v_1) & bRight(v_2) & v_1!=v_2 &                   // Variables points to different nodes
    !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) &        // These nodes cannot be reached from each other
    (A(v_3) r_down[m,bLeft](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bLeft
    r_down[m,bLeft](v) ==> !r_down[m,bRight](v)

%r E(v_1,v_2)
    bLeft(v_1) & bRight(v_2) & v_1!=v_2 &                   // Variables points to different nodes
    !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) &         // These nodes cannot be reached from each other
    (A(v_3) r_down[m,bRight](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bRight
    r_down[m,bRight](v) ==> !r_down[m,bLeft](v)

CFG and equation system

Insertion of Focus operations

We have already mention the necessity to focus on reachability predicates at chosen point in the exchange procedure, so as to be able ot establish the relevant invariants. As a consequence, we inserted between some control points of the CFG the appropriate focusing transformations. In our case, in the exchange procedure, after having initialized bLeft and bRight, between CFG nodes exchange2 and exchange3, we inserted focus operations on r_down[inp,bRight](v) and r_down[inp,bLeft](v).

Insertion of Blur operations

However, we also inserted bluring transformations, to limit the combinatorial explosion. The complexity of the procedure return operation is indeed quadratic in the number of the structures reachable at the exit point of the callee and the call point of the callee in the caller. Moreover, it appears that many structures could be merged, because they differ only in minor aspects that is irrelevant w.r.t. the property we want to check. There are typically two places where many things can be abstracted:

At the entry point of the procedure, what is important is the property of the cells reachable from the tree root x, whereas the number of cells in the context of the tree for instance is indifferent;
At the exit point of the procedure, we have exactly the same phenomenon.

Generally speaking, this depends on the effect of the considered procedure on the memory heap, the intuition being that precision is needed only in places where modification take place.

Technically, we would have enjoyed to make the set of abstraction predicates depend on the CFG node/equation system variable, as well as the set of active constraint rules. As this feature is currently not implemented and any way not necessarily desirable, we used Update transformations with update formula like:

  sm(v) = r_down[inp,fp1](v) ? sm(v) : 1/2
  exch_right_succ[m1,m2](v) = r_down[inp,fp1](v) ?  exch_right_succ(v) : 1/2
  ...

followed immediately by a Blur transformation (before applying Coerce, as Coerce can transform non-definite valuations (1/2) in definite valuations (0 or 1)).

We will give experiments an idea of the gain in the number of structures that are generated during the analysis.

Invariants enforced and checked during postcondition

Experiments

Different input structures tested:

A: single summary node, with no cyclicity neither sharing by down, tree maybe pointing somewhere. Focusing on tree and r_down[tree] is performed before the call to exchange, which results in 5 different structures.
B: not yet

In the "Number of Structures" field, we indicate the number of reachable structures at the following control points:

exchange_entry;
exchange7 (before the first recursive call);
exchange8 (after the first and before the second recursive call);
exchange_end (after the second recursive call);
exchange_summary (after elimination of local variables and possibly additional blur, which gives the summary function).

The Postscript files give the result of the analysis (reachable structures) at the following control points:

exchange7 (before the first recursive call);
exchange8 (after the first and before the second recursive call);
exchange_summary;
main_begin;
main_call;
main_end

The predicates downStar[f,m](v₁,v₂) does not appear in the graphs, in order to make them more readable.

Input TVS	Full Trees	Ancestor	Blur at entry	Blur at exit	Nb of predicates (abs/total) and constraints	Time	Success ?	Nb of Structures	File	Comment
A	yes	no	no	no	46/71 & 500	1m32s	yes	6/10/14/22/10	exchangeA1000.ps.gz	--
A	yes	no	no	yes	46/71 & 500	1m36s	yes	6/10/14/22/10	exchangeA1001.ps.gz	--
A	yes	no	yes	yes	46/71 & 500	1m36s	yes	6/10/14/22/10	exchangeA1011.ps.gz	--
A	yes	no	yes	no	46/71 & 500	1m32s	yes	6/10/14/22/10	exchangeA1010.ps.gz	--
A	yes	yes	no	no	58/83 & 680	>13m	?	>35/50/32/23/23	--	combinatorial explosion if more running time is given (no blur at entry)
A	yes	yes	no	yes	58/83 & 680	>13m	?	34/80/100/136/11	--	combinatorial explosion if more running time is given (no blur at entry)
A	yes	yes	yes	yes	58/83 & 680	2m14s	yes	18/10/14/22/10	exchangeA1111.ps.gz	--
A	yes	yes	yes	no	58/83 & 680	2m03s	yes	18/10/14/22/10	exchangeA1110.ps.gz	--
A	no	yes	yes	yes	? & ?	>590m	no	>26/58/329/1176/639	?	failure to sharing violation
B	yes	no	no	no	46/71 & 500	1m48s	yes	7/10/14/22/10	exchangeB1000.ps.gz	--

We did not yet succeed to show the property on non-full binary trees: either the tree invariant is violated during the analysis, or the additional instrumentation predicates induces a combinatorial explosion. Also, a phenomenon is that our meet is approximated, because the additional constraints representing one of the structure are taken into account only if their body has a definite value, which requires in theory focusing on all the involved predicates, which too expensive in the general case. Progress should be made to improve the implementation of the meet operation, probably using a different technique.

Tree exchange example