PhyloQuart v1.4 - a package of programs for reconstructing phylogenies from quartets

This PHYLIP-compatible package contains quartet-based phylogeny reconstruction programs (i.e., programs that take sets of quartets on the taxa as input), as well as programs for inferring sets of quartets from nucleotide or distance datasets.

The tree-reconstruction programs include the Q* method described in the paper Inferring Evolutionary Trees with Strong Combinatorial Evidence (TCS, to appear) [Berry and Gascuel 97], and the Buneman construction described in [Buneman 71] and revisited in the paper Faster reliable phylogenetic analysis (RECOMB 99) [Berry and Bryant 99].

Precise description of PhyloQuart v1.4

PhyloQuart is a collection of programs that estimate phylogenies using a quartet principle. The idea is to proceed in two steps:
  • i) every set of four taxa is considered and one (or none) of the three possible bifurcating trees is chosen;
  • ii) the "quartets" (or trees on four taxa) are combined into an overall tree on the entire set of taxa.
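
Step i) can be sketched in a few lines of Python. This is only an illustration of the principle, not the code of the package: the function names are hypothetical, and the selection rule (keep the split with the smallest sum of within-pair distances, keep nothing on a tie) is an assumption made for the example, not necessarily the criterion used by PARCIQUART or DISTQUART.

```python
from itertools import combinations

def three_topologies(a, b, c, d):
    """Return the three possible bifurcating trees on four taxa, as splits."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]

def choose_quartet(dist, a, b, c, d):
    """Choose one split, or None when the two best splits tie.

    dist maps frozenset({x, y}) to a distance. The rule used here
    (smallest sum of within-pair distances) is an assumption.
    """
    scored = sorted(
        ((dist[frozenset(p)] + dist[frozenset(q)], (p, q))
         for p, q in three_topologies(a, b, c, d)),
        key=lambda item: item[0],
    )
    if scored[0][0] == scored[1][0]:
        return None  # no tree is chosen for this set of four taxa
    return scored[0][1]

def infer_quartets(taxa, dist):
    """Consider every set of four taxa and keep the chosen quartets."""
    quartets = []
    for a, b, c, d in combinations(taxa, 4):
        chosen = choose_quartet(dist, a, b, c, d)
        if chosen is not None:
            quartets.append(chosen)
    return quartets
```

With n taxa, the loop visits n(n-1)(n-2)(n-3)/24 sets of four taxa, which is why quartet storage matters for large data sets.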

    The PhyloQuart package contains different kinds of programs:

  • quartet inference programs (PARCIQUART and DISTQUART) which infer a set of quartets from biological data, mainly nucleotide sequences for the taxa or inter-taxa distances.

  • phylogeny reconstruction programs (QSTAR and ADDQUART) which infer a tree from a set of quartets.

  • translation programs (TREE-POPPING) which translate data from one format to another.



    If you downloaded the file "phyloquart-v1.4.tgz", you first need to create the directory containing the files of the package with the following two commands:

    gzip -d phyloquart-v1.4.tgz
    tar -xf phyloquart-v1.4.tar

    Then you need to build the executable files by running the command:

    make all

    That's it! You can use the different programs by entering their names on the command line (possibly followed by parameters, separated by blanks).



    - DISTANCE FILES (e.g., "infile.dist", "outfile.dist"):

    they have the same format as the distance files of the PHYLIP package. The first line states the number of taxa; the remaining lines describe the lower-left half of the inter-taxa distance matrix. Here is an example of a distance file:

    Chimp 0.4664
    Gorilla 0.7236 0.6774
    Orang 1.3870 1.4985 1.3158
    Gibbon 1.4696 1.5257 1.4446 0.4860
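
    A minimal Python sketch of a reader for this layout is given below. It is not part of the package. The function name is hypothetical, and two points are assumptions: taxon names contain no blanks, and the row of taxon number i carries exactly i-1 distances (so the first taxon has none), as in PHYLIP lower-triangular files.

```python
def parse_lower_triangular(text):
    """Parse a lower-left distance matrix into (names, distances).

    distances maps frozenset({name1, name2}) to a float.
    Assumption: row i (1-based) holds a name followed by i-1 values.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_taxa = int(lines[0].split()[0])
    names, dist = [], {}
    for i, line in enumerate(lines[1:n_taxa + 1]):
        fields = line.split()
        name, values = fields[0], [float(v) for v in fields[1:]]
        if len(values) != i:
            raise ValueError(
                f"{name}: expected {i} distances, found {len(values)}")
        # The k-th value on a row is the distance to the k-th earlier taxon.
        for other, value in zip(names, values):
            dist[frozenset((other, name))] = value
        names.append(name)
    if len(names) != n_taxa:
        raise ValueError(f"expected {n_taxa} taxa, found {len(names)}")
    return names, dist
```

    For instance, the three-taxon text "3", "A", "B 0.5", "C 1.0 1.5" (one item per line, made-up values) gives the names A, B, C and the distance 1.5 between B and C.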

    - CHARACTER FILES (e.g., "infile.nuc"):

    they contain the nucleotide sequences of the taxa, in the same format as the PHYLIP character files. The first line indicates the number of taxa, then the number of characters. The following lines give the sequences. Each line begins with the name of the taxon (on exactly 10 characters, padded with blanks at the end if need be); the nucleotide sequence associated with the taxon then begins at the first nucleotide encountered from the 11th position of the line onwards, and ends when as many characters have been read as indicated in the first line of the file.

    Sequences may include blank characters that split them at regular intervals (e.g., every 10 characters), as in the output produced by the readseq program.

    Note that, unlike in PHYLIP, sequences cannot be interleaved (each sequence must be given at once). Here is an example of a character file:

    5 100

    Note that gaps are currently not allowed and that the programs expect nucleotide sequences (T and U are treated in the same way).
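
    The reading rules for one sequence line can be sketched as follows in Python. This is an illustration, not the code of the package: the function name is hypothetical, mapping U to T is just one way of treating the two in the same way, and rejecting every symbol other than A, C, G, T and U is an assumption (the text above only states that gaps are not allowed).

```python
def read_sequence_line(line, n_chars):
    """Read one taxon line of a character file.

    The name occupies the first 10 characters; the sequence starts
    at the 11th position, may be split by blanks, and stops once
    n_chars nucleotides have been read.
    """
    name = line[:10].rstrip()
    # Drop the blanks that split the sequence, and treat U like T.
    residues = "".join(line[10:].split()).upper().replace("U", "T")
    if len(residues) < n_chars:
        raise ValueError(f"{name}: only {len(residues)} characters found")
    sequence = residues[:n_chars]
    unexpected = set(sequence) - set("ACGT")
    if unexpected:
        # Assumption: anything else (including the gap '-') is rejected.
        raise ValueError(f"{name}: unexpected symbols {sorted(unexpected)}")
    return name, sequence
```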

    Morphological characters can be incorporated by coding the presence of a character for a taxon as, e.g., the nucleotide A, and its absence as, e.g., the nucleotide C (see fossile-horses.nuc for an example; this file corresponds to the morphological example of input for the PHYLIP package).
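
    Such a coding is easy to produce with a short script. The Python sketch below is an illustration only: the function names and the taxon names used to try it are hypothetical, and the choice of A for presence and C for absence simply follows the example given above.

```python
def encode_taxon(name, presence, present="A", absent="C"):
    """Return one character-file line for a list of True/False values.

    The name is padded with blanks to exactly 10 characters.
    """
    if len(name) > 10:
        raise ValueError(f"taxon name longer than 10 characters: {name}")
    coded = "".join(present if p else absent for p in presence)
    return name.ljust(10) + coded

def build_character_file(rows):
    """rows is a list of (name, presence list); returns the file text."""
    lengths = {len(presence) for _, presence in rows}
    if len(lengths) != 1:
        raise ValueError("all taxa must have the same number of characters")
    # First line: number of taxa, then number of characters.
    header = f"{len(rows)} {lengths.pop()}"
    body = [encode_taxon(name, presence) for name, presence in rows]
    return "\n".join([header] + body) + "\n"
```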

    - QUARTET FILES ("quartfile", "quartfile.res", "quartfile.left"):
    they contain the number of taxa, an assignment of a number to every taxon (one per line), then a list of quartets on the taxa numbers (one per line). Here is an example of a quartet file:

    01 Human
    02 Chimp
    03 Gorilla
    04 Orang
    05 Gibbon

    01 02 || 03 04
    01 02 || 03 05
    01 02 || 04 05
    01 03 || 04 05
    02 03 || 04 05
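
    A quartet file of this shape can be read with the following Python sketch. It is not the parser of the package and the function name is hypothetical. It assumes that a line holding a single integer is the number of taxa, that a line containing "||" is a quartet, and that any other non-empty line assigns a number to a taxon name.

```python
def parse_quartet_file(text):
    """Return (number of taxa, {number: name}, list of quartets).

    A quartet "01 02 || 03 04" is returned as ((1, 2), (3, 4)).
    """
    n_taxa, names, quartets = None, {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "||" in line:
            left, right = line.split("||")
            a, b = (int(x) for x in left.split())
            c, d = (int(x) for x in right.split())
            quartets.append(((a, b), (c, d)))
            continue
        fields = line.split(None, 1)
        if len(fields) == 1:
            n_taxa = int(fields[0])  # assumed to be the taxa count line
        else:
            names[int(fields[0])] = fields[1]
    return n_taxa, names, quartets
```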

    - BIPARTITION FILES (e.g., "bipfile"):
    such a file describes the edges of a tree in terms of the bipartitions they induce on the taxa set X: removing any edge of a tree splits the tree into two components, and thus splits X into two subsets according to the component each taxon belongs to. The file contains the number of taxa, an assignment of a number to every taxon (one per line), then a list of bipartitions on the taxa numbers (one per line). Each bipartition is followed by a bracketed weight: e.g., 32000 (i.e., a large constant) in the case of the QSTAR program, or, in the case of the ADDQUART program, the ratio of the number of quartets satisfied by the edge to the number of quartets it contradicts. Here is an example of a bipartition file:

    01 Human
    02 Chimp
    03 Gorilla
    04 Orang
    05 Gibbon

    03 05 04 02 | 01 (32000)
    02 | 03 05 04 01 (32000)
    04 05 03 | 02 01 (32000)
    03 | 04 05 01 02 (32000)
    05 04 | 03 01 02 (32000)
    04 | 05 02 01 03 (32000)
    05 | 04 02 01 03 (32000)
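
    The link between bipartitions and quartets can be made concrete with a short Python sketch. It is an illustration only, with hypothetical function names. It assumes that a bipartition line has the form shown above (two groups of taxa numbers separated by "|", optionally followed by a weight in parentheses), and it uses the usual definition under which an edge satisfies a quartet when the two pairs of the quartet lie on opposite sides of the bipartition.

```python
def parse_bipartition(line):
    """Parse "04 05 03 | 02 01 (32000)" into (left, right, weight).

    left and right are frozensets of taxa numbers; weight is a float,
    or None when the line carries no weight.
    """
    body, _, tail = line.partition("(")
    weight = float(tail.strip().rstrip(")")) if tail else None
    left, right = body.split("|")
    return (frozenset(int(x) for x in left.split()),
            frozenset(int(x) for x in right.split()),
            weight)

def satisfies(left, right, quartet):
    """True when the edge separates the two pairs of the quartet."""
    (a, b), (c, d) = quartet
    return (({a, b} <= left and {c, d} <= right)
            or ({a, b} <= right and {c, d} <= left))
```

    On the example files of this document, the edge "04 05 03 | 02 01" satisfies the quartet "01 02 || 03 04", since taxa 01 and 02 are on one side and taxa 03 and 04 on the other.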

    Further information


    See the documentation files specific to each program (they have the same name as the program, but have extension ".doc", e.g., "qstar.doc").

    You may want to read the following papers for more information on the Q* method and other quartet-based phylogeny reconstruction methods:

    - Berry V. and Bryant D., 1999, Faster reliable phylogenetic analysis, 3rd Ann. Int. Conf. on Computational Biology (RECOMB'99).

    - Berry V. and Gascuel O., 1998, Inferring evolutionary trees with strong combinatorial evidence, Theoretical Computer Science (to appear).

    - Berry V., 1997, Méthodes et Algorithmes pour Reconstruire les Arbres de l'Evolution, Thèse de doctorat, Université de Montpellier, France.

    - Bandelt H.J. and Dress A., 1986, Reconstructing the shape of a tree from observed dissimilarity data, Adv. in Appl. Math, 7:309-343.

    - Buneman P., 1971, The recovery of trees from measures of dissimilarity, in Mathematics in the Archaeological and Historical Sciences, 387-395, Edinburgh University Press.




    v1.4 : Improvement of quartet storing (after discussion with D. Swofford and K. Strimmer) allowing the handling of data sets with more taxa. Change in the bipfile format (weights removed). Also: running time of QSTAR improved.

    Addquart can now accept weighted quartets and accepts more parameters.

    Choice of edges improved.

    v1.3 : the accepted format of character files (infile.nuc) has been extended. The sequences can now include blanks that split a sequence every so often, as in files output by the readseq program.

    v1.3 : the parameter indicating the taxa number to the various programs is no longer necessary. This information is now included in the files exchanged by the programs.

    v1.2 : taxa names can be specified instead of the 2-digit numbers used previously.



    ------------------------
    This package is distributed as freeware. No money is required for its use.
    If you find yourself using this software several times, please let me know (this will enable me to measure the need for this kind of software).
    Register (which is also free) as a user of PhyloQuart by filling in the form in the file REGISTER.SVP and emailing it to me. Feel free to distribute or advertise the COMPLETE package and to ask any question that the included documentation files do not answer. Suggestions for further improvement are also welcome. Thanks.

    Vincent Berry (vberry AT lirmm.fr)

    (Comments on the package or on this page are welcome).