Test sets from random trees
¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤¤

For the data sets 24_tax.tar.gz and 96_tax.tar.gz
-------------------------------------------------

The trees were generated by the stochastic speciation process
described in (Kuhner and Felsenstein 1994). Deviation from the
molecular clock was obtained by multiplying every branch length by an
exponentially distributed variable. The parameter of this exponential
distribution was tuned to produce a realistic deviation.

The following values are the distribution quantiles of the ratio
between the lengths of the longest and the shortest lineages from the
root to the taxa; under a strict molecular clock this ratio is equal
to 1.

24 taxa   0%: 1.28, 25%: 1.78, 50%: 2.12, 75%: 2.51, 100%: 5.31
96 taxa   0%: 1.46, 25%: 2.13, 50%: 2.37, 75%: 2.62, 100%: 4.22

The average maximum pairwise divergence is close to 0.4 in both the
24-taxon and the 96-taxon trees. The following values are the
distribution quantiles of the maximum pairwise divergence.

24 taxa   0%: 0.26, 25%: 0.36, 50%: 0.40, 75%: 0.45, 100%: 0.70
96 taxa   0%: 0.30, 25%: 0.37, 50%: 0.41, 75%: 0.45, 100%: 0.65

Trees are given in NEWICK format; there is one tree per line.

Sequences of length 500 were obtained using Seq-Gen 1.1 (Rambaut and
Grassly 1997). We used the Kimura two-parameter model (Kimura 1980),
with a transition/transversion ratio equal to 2. Sequences are given
in PHYLIP interleaved format, one data set after the other.

For the data sets 24_xxx_seq.gz and 96_xxx_seq.gz
-------------------------------------------------

These files contain only homologous sequence data sets.

For "small" data sets, each branch length of the original trees was
divided by 2.0. These trees were then used to generate new sequence
data sets (500 bp long sequences generated with Seq-Gen 1.1 under the
K2P model, with a transition/transversion ratio equal to 2).

For "large" data sets, each branch length of the original trees was
divided by 0.4 (i.e. multiplied by 2.5).
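
The ratio statistic and the "small"/"large" rescaling described above
can be sketched in a few lines of Python. This is only an illustration,
not the code used to build the data sets: the function names
(parse_newick, tip_depths, clock_ratio, scale) are hypothetical, the
parser handles only plain NEWICK without quoted labels or comments, and
it assumes the tree is rooted at the outermost pair of parentheses.

```python
import re

SEPARATORS = {"(", ")", ",", ";"}

def parse_newick(text):
    """Parse one NEWICK line into nested (name, length, children) tuples."""
    tokens = re.findall(r"[(),;]|:[^(),;:]+|[^(),;:]+", text.strip())
    pos = 0

    def node():
        nonlocal pos
        children = []
        if tokens[pos] == "(":
            pos += 1                      # consume "("
            children.append(node())
            while tokens[pos] == ",":
                pos += 1                  # consume ","
                children.append(node())
            pos += 1                      # consume ")"
        name, length = "", 0.0            # a missing length is treated as 0
        if (pos < len(tokens) and tokens[pos] not in SEPARATORS
                and not tokens[pos].startswith(":")):
            name = tokens[pos].strip()
            pos += 1
        if pos < len(tokens) and tokens[pos].startswith(":"):
            length = float(tokens[pos][1:])
            pos += 1
        return (name, length, children)

    return node()

def tip_depths(node, depth=0.0):
    """Return the root-to-tip path length of every taxon."""
    _, length, children = node
    depth += length
    if not children:
        return [depth]
    return [d for child in children for d in tip_depths(child, depth)]

def clock_ratio(tree):
    """Longest over shortest root-to-tip lineage (1 under a strict clock)."""
    depths = tip_depths(tree)
    return max(depths) / min(depths)

def scale(node, factor):
    """Return a copy of the tree with every branch length times factor."""
    name, length, children = node
    return (name, length * factor, [scale(c, factor) for c in children])

# Toy tree, not taken from the data sets.
tree = parse_newick("((A:0.1,B:0.2):0.3,C:0.4);")
print(round(clock_ratio(tree), 2))   # 1.25
small = scale(tree, 1 / 2.0)         # "small": branch lengths divided by 2.0
large = scale(tree, 1 / 0.4)         # "large": branch lengths divided by 0.4
```

Rescaling every branch by the same factor leaves the ratio unchanged;
only the overall divergence shrinks ("small") or grows ("large").
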

Files:
------

The files 24tax.tar.gz and 96tax.tar.gz have to be decompressed. On
UNIX systems, use the following commands:

    tar -zxvf 24tax.tar.gz
    tar -zxvf 96tax.tar.gz

The following directories will appear: 24tax and 96tax. In each of
these directories you will find trees (24_tree, 96_tree) and sequences
(24_seq, 96_seq).

For the files 24_xxx_seq.gz and 96_xxx_seq.gz, type:

    gzip -d 24_xxx_seq.gz

or

    gzip -d 96_xxx_seq.gz

You can then use the sequence files 24_xxx_seq or 96_xxx_seq.

----------------------------------------------------------------------

BIBLIOGRAPHY

Kuhner, M and Felsenstein, J (1994). A simulation comparison of
phylogeny algorithms under equal and unequal evolutionary rates.
Mol. Biol. Evol., 11:459-468.

Rambaut, A and Grassly, N (1997). Seq-Gen: an application for the
Monte Carlo simulation of DNA sequence evolution along phylogenetic
trees. Comput. Appl. Biosci., 13:235-238.

Kimura, M (1980). A simple method for estimating evolutionary rates
of base substitutions through comparative studies of nucleotide
sequences. J. Mol. Evol., 16:111-120.

----------------------------------------------------------------------

These data sets have been generated by Stephane Guindon:
guindon@lirmm.fr.