Tree exchange example |
The previous example is not so convincing to demonstrate the usefulness of interprocedural analysis, as the list reversal algorithm can be implemented without recursion (it is tail recursive). Its purpose was just to experiment the interprocedural method and to help understanding the necessity to introduce instrumentation predicates linking the input and output states of a procedure. Such instrumentation predicates allows to represent with enough accuracy the transformer associated to the procedure.
We afford now a algorithm on binary trees, which consists in traversing every node of the tree and to exchange for each node the left and the right subtree. This example is more challenging and also more interesting, as it works on binary (full) trees, which is a more complex data-structure, and it is not tail-recursive, as it performs a depth-first search traversal.
Here is the program:
procedure main(tree) is begin main_entry: exchange(tree); main_end: end procedure exchange(x) is var bRight,bLeft; begin exchange_entry: if (x!=null) then begin exchange1: bRight := x->right; exchange2: bLeft := x->left; exchange3: x->right := null; // Exchange of the two subtrees exchange4: x->left := null; exchange5: x->right := bLeft; exchange6: x->left := bRight; exchange7: exchange(bRight); // Recursive calls exchange8: exchange(bLeft); end exchange_end: exchange_summary: end
The specification is split in several files.
File | Description |
exchange.tvp | main file, including auxiliary files and defining the equation system |
predicate.tvp | standard instrumentation predicates and constraint rules for trees |
functionInterprocedural.tvp | standard semantic functions for procedure calls and returns (same as list) |
function.tvp | semantic functions for various intraprocedural instructions for trees (depends on predicate.tvp for some update-formulae) |
exchange_inputA.tvs | set of initial states for the analysis, namely any subtree of a full binary tree |
exchange_inputB.tvs | set of initial states for the analysis, namely any full binary tree |
The core predicates we will use are the unary predicates corresponding to variables of reference type, and the two binary predicates left and right corresponding to the two record fields of a tree node, wich are also of reference type.
We detail below rather "standard" instrumentation predicates.
Our algorithm will manipulate trees of unbounded sizes, so we should be able to characterize the fact that a set of cells/nodes belongs to a tree by some invariant, instead of relying on the (abstract) universe. In the latter case indeed, any bluring of node will loose the information. Here are the properties that defines a node v belonging to a full binary tree:
Of course, we should at least prove that our tree exchange procedure returns a structure satisfying this invariant, if it is applied to a structure satisfying it. The reachability and ancestor predicates will be needed to establish local invariants, for instance to distinguish nodes belonging to left or right subtree of a node, and to distinguish the ancestors cells from other cells in the memory heap.
We will also need so-called "cross" instrumentation predicates, to represent relationship between the input memory heap and the current/output memory heap.
Coming back to the example, assuming that all cells reachable from the variable tree at the entry of procedure main are satisfying the treeness invariant, we want to prove that after the call to exchange
Notice that the leaf cells of the tree will satisfy not only exch_right_succ[inp,out](v) & exch_left_succ[inp,out](v) but also idsucc[right,inp,out](v) & idsucc[left,inp,out](v), as their fields left and right are null.
Among others, we have to show in the algorithm that
Thus, we will need to focus on r_down[m,x](v), r_down[m,bRight](v) and r_down[m,bLeft](v) to separate the different kind of nodes, and to make sure that the treeness invariant is preserved w.r.t. the absence of sharing (not (exist v : r_down[m,bRight](v) & r_down[m,bLeft](v))).
TVLA does not infer all logical relationships bewteen the different core and instrumentation predicates, for the good reason that the problem is not decidable. It infers some elementary ones. However, we needed to specify manually some others.
%r id_succ[f,m1,m2](v) & id_succ[f,m2,m3](v) ==> id_succ[f,m1,m3](v) %r id_succ[right,m1,m2](v) & exch_right_succ[m2,m3](v) ==> exch_right_succ[m1,m3](v) %r exch_right_succ[m1,m2](v) & id_succ[left,m2,m3](v) ==> exch_right_succ[m1,m3](v) %r exch_right_succ[m1,m2](v) & exch_left_succ[m2,m3](v) ==> id_succ[right,m1,m3](v) ...This is standard in our approach
%r r_down[m1,z](v) & (A(v_1) (r_down[m1,z](v) -> ((id_succ[right,m1,m2](v_1) & id_succ[left,m1,m2](v_1)) | (exch_right_succ[m1,m2](v_1) & exch_left_succ[m1,m2](v_1))))) ==> r_down[m2,z](v)It says that if a node is reachable via down from z in mode m1, and if either it has not been modified or its fields have been exchanged, then it is still reachable via down from z in mode m2. This is mandatory so that the focusing on reachability predicates in say input mode is propagated to reachability predicates in output mode.
%r E(v_1,v_2) bLeft(v_1) & bRight(v_2) & v_1!=v_2 & // Variables points to different nodes !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) & // These nodes cannot be reached from each other (A(v_3) r_down[m,bLeft](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bLeft r_down[m,bLeft](v) ==> !r_down[m,bRight](v) %r E(v_1,v_2) bLeft(v_1) & bRight(v_2) & v_1!=v_2 & // Variables points to different nodes !r_down[m,bLeft](v_2) & !r_down[m,bRight](v_1) & // These nodes cannot be reached from each other (A(v_3) r_down[m,bRight](v_3) -> !shared_down[m](v_3)) & // There is no sharing in the subtree of bRight r_down[m,bRight](v) ==> !r_down[m,bLeft](v)
We have already mention the necessity to focus on reachability predicates at chosen point in the exchange procedure, so as to be able ot establish the relevant invariants. As a consequence, we inserted between some control points of the CFG the appropriate focusing transformations. In our case, in the exchange procedure, after having initialized bLeft and bRight, between CFG nodes exchange2 and exchange3, we inserted focus operations on r_down[inp,bRight](v) and r_down[inp,bLeft](v).
However, we also inserted bluring transformations, to limit the combinatorial explosion. The complexity of the procedure return operation is indeed quadratic in the number of the structures reachable at the exit point of the callee and the call point of the callee in the caller. Moreover, it appears that many structures could be merged, because they differ only in minor aspects that is irrelevant w.r.t. the property we want to check. There are typically two places where many things can be abstracted:
Generally speaking, this depends on the effect of the considered procedure on the memory heap, the intuition being that precision is needed only in places where modification take place.
Technically, we would have enjoyed to make the set of abstraction predicates depend on the CFG node/equation system variable, as well as the set of active constraint rules. As this feature is currently not implemented and any way not necessarily desirable, we used Update transformations with update formula like:
sm(v) = r_down[inp,fp1](v) ? sm(v) : 1/2 exch_right_succ[m1,m2](v) = r_down[inp,fp1](v) ? exch_right_succ(v) : 1/2 ...
followed immediately by a Blur transformation (before applying Coerce, as Coerce can transform non-definite valuations (1/2) in definite valuations (0 or 1)).
We will give experiments an idea of the gain in the number of structures that are generated during the analysis.
Different input structures tested:
In the "Number of Structures" field, we indicate the number of reachable structures at the following control points:
The Postscript files give the result of the analysis (reachable structures) at the following control points:
The predicates downStar[f,m](v1,v2) does not appear in the graphs, in order to make them more readable.
Input TVS | Full Trees | Ancestor | Blur at entry | Blur at exit | Nb of predicates (abs/total) and constraints | Time | Success ? | Nb of Structures | File | Comment |
A | yes | no | no | no | 46/71 & 500 | 1m32s | yes | 6/10/14/22/10 | exchangeA1000.ps.gz | -- |
A | yes | no | no | yes | 46/71 & 500 | 1m36s | yes | 6/10/14/22/10 | exchangeA1001.ps.gz | -- |
A | yes | no | yes | yes | 46/71 & 500 | 1m36s | yes | 6/10/14/22/10 | exchangeA1011.ps.gz | -- |
A | yes | no | yes | no | 46/71 & 500 | 1m32s | yes | 6/10/14/22/10 | exchangeA1010.ps.gz | -- |
A | yes | yes | no | no | 58/83 & 680 | >13m | ? | >35/50/32/23/23 | -- | combinatorial explosion if more running time is given (no blur at entry) |
A | yes | yes | no | yes | 58/83 & 680 | >13m | ? | 34/80/100/136/11 | -- | combinatorial explosion if more running time is given (no blur at entry) |
A | yes | yes | yes | yes | 58/83 & 680 | 2m14s | yes | 18/10/14/22/10 | exchangeA1111.ps.gz | -- |
A | yes | yes | yes | no | 58/83 & 680 | 2m03s | yes | 18/10/14/22/10 | exchangeA1110.ps.gz | -- |
A | no | yes | yes | yes | ? & ? | >590m | no | >26/58/329/1176/639 | ? | failure to sharing violation |
B | yes | no | no | no | 46/71 & 500 | 1m48s | yes | 7/10/14/22/10 | exchangeB1000.ps.gz | -- |
We did not yet succeed to show the property on non-full binary trees: either the tree invariant is violated during the analysis, or the additional instrumentation predicates induces a combinatorial explosion. Also, a phenomenon is that our meet is approximated, because the additional constraints representing one of the structure are taken into account only if their body has a definite value, which requires in theory focusing on all the involved predicates, which too expensive in the general case. Progress should be made to improve the implementation of the meet operation, probably using a different technique.
Tree exchange example |