The evolution of molecular sequences was simulated along the edges of these phylogenies, from the root to the leaves, under a given condition of evolution (fast evolutionary rates on all edges, medium rates on all edges, slow rates on all edges, fast/slow rates on half of the edges, fast/slow rates on half of the sites) and under the Kimura 2-parameter model of evolution (transition/transversion ratio of 2). In this way, we generated 25,000 data sets for each condition of evolution.
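As an illustration of the simulation step, the following is a minimal sketch of how a sequence could be evolved along a single edge under the Kimura 2-parameter model. It is not the simulator used for the experiments. The names `k2p_probs` and `evolve` are hypothetical, the edge length `d` is assumed to be expressed in expected substitutions per site, and the ratio of 2 is assumed here to be the rate ratio kappa = alpha/beta (transition rate over the rate of each transversion), which is only one possible reading of the ratio.

```python
import math
import random

# Transition partner and the two transversion targets of each base.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}
TRANSVERSIONS = {"A": "CT", "G": "CT", "C": "AG", "T": "AG"}


def k2p_probs(d, kappa=2.0):
    """Return (P(transition), P(one given transversion)) for an edge of length d.

    Assumption: kappa = alpha / beta, with the total rate alpha + 2*beta
    normalised to 1 so that d is in expected substitutions per site.
    """
    beta = 1.0 / (kappa + 2.0)
    alpha = kappa * beta
    e1 = math.exp(-4.0 * beta * d)
    e2 = math.exp(-2.0 * (alpha + beta) * d)
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return p_ts, p_tv


def evolve(seq, d, kappa=2.0, rng=random):
    """Evolve each site of seq independently along an edge of length d."""
    p_ts, p_tv = k2p_probs(d, kappa)
    out = []
    for base in seq:
        u = rng.random()
        if u < p_ts:
            out.append(TRANSITION[base])                 # transition
        elif u < p_ts + 2.0 * p_tv:
            out.append(rng.choice(TRANSVERSIONS[base]))  # transversion
        else:
            out.append(base)                             # no net change
    return "".join(out)
```

Applying `evolve` recursively from the root sequence down to the leaves, with one call per edge, would yield one simulated data set. Conditions with unequal rates among edges or sites could be mimicked by scaling `d` per edge or per site.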
We applied the Qstar method independently to each data set, measuring each time the number of incorrect edges inferred by the method (which may be seen as false positives), as well as the size of the tree T* it outputs. The same process was applied to the NJ method. NJ always infers a fully resolved tree, so each wrong edge it infers implies that a correct edge was missed (which may be seen as a false negative). Depending on the condition of evolution, the sequence length and the data set, some edges of the model phylogeny T did not support any mutation. As a result, data sets did not always contain information for each edge of T. In these cases, the method had no support to infer the corresponding edges. To account for this phenomenon, we also measured, for each data set, the number e_R of "realized" internal edges (i.e., edges which supported at least one substitution, cf. Kumar 96).
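The edge counts described above can be sketched as follows, assuming each internal edge is represented by the set of leaves on one of its sides (a split). The helper names `canonical_split` and `edge_errors` are hypothetical and only illustrate the bookkeeping; they are not taken from the original experiments.

```python
def canonical_split(side, leaves):
    """Return a canonical representation of the split induced by an edge.

    A split and its complement describe the same edge, so we always keep
    the side that does not contain the smallest leaf label.
    """
    side = frozenset(side)
    other = frozenset(leaves) - side
    ref = min(leaves)
    return other if ref in side else side


def edge_errors(inferred, true, leaves):
    """Count (false positives, false negatives) among internal edges.

    inferred, true: iterables of leaf sets, one per internal edge.
    A false positive is an inferred edge absent from the model tree;
    a false negative is a model-tree edge absent from the inferred tree.
    """
    inf = {canonical_split(s, leaves) for s in inferred}
    tru = {canonical_split(s, leaves) for s in true}
    return len(inf - tru), len(tru - inf)


leaves = "abcde"
model = [{"a", "b"}, {"d", "e"}]            # fully resolved model tree
partial = [{"a", "b"}]                      # partially resolved inference
resolved_wrong = [{"a", "b"}, {"c", "d"}]   # fully resolved, one wrong edge

print(edge_errors(partial, model, leaves))         # (0, 1)
print(edge_errors(resolved_wrong, model, leaves))  # (1, 1)
```

The second call illustrates why, for a fully resolved tree, every wrong edge is matched by a missed correct edge, whereas a partially resolved tree can miss edges without introducing wrong ones.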
The results confirm that the Qstar method usually produces trees that contain almost exclusively safe edges. More precisely, it inferred fewer than one wrong edge per ten trees (average of 1.3% incorrect edges) over all conditions of evolution. Even for the most difficult condition considered, i.e., unequal rates of evolution among different sites (which violates an assumption of the Kimura model and thus lowers the accuracy of the distance corrections), the Qstar method inferred only $\approx 3.9\%$ incorrect edges on average.
As a consequence of inferring almost exclusively safe edges, Qstar
usually produces trees that are only partially
resolved. This implies that some correct edges were not
inferred. However, fewer than one third of the correct edges were missing
on average. Moreover, we can see from the
table that there is a clear correlation between %e_R and %e_T,
meaning that the Qstar method does not try to randomly resolve
edges for which the data set does not contain any information.
This behavior contrasts with that of most other methods, which infer
fully resolved trees but usually with a non-negligible percentage of
unsafe edges. For example, the NJ trees contained on average more than
one wrong edge per tree (15.3% incorrect edges).
Thus, the resulting tree usually contains some edges that reflect the
particular data set rather than the species' history. The Qstar
method is one of the few methods that try to avoid this overfitting
effect (see Berry (98) for other methods designed with the same aim).