Phyl-ARIANE - Phylogenomics: integrated algorithms and visualizations for analyzing the evolution of life

Phylariane meeting 8-9/12/09

Several members from the 3 labs involved in the project are present.

8/12/09 morning:

Scornavacca defends her thesis (jury = D. Huson, MF Sagot, S. Gribaldi, O .Gascuel*, V. Daubin*, V. Ranwez*, V. Berry*)

*project members.

8/12/09 afternoon:

J.F. Dufayard presents the 2nd version of the web site, due to the project entering into a second operational phase (sharing of tools and experiments on real data). Mailing list of the project members is now effective, moreover, everyone can log in and put some material for other members (data, software, scripts, etc). Publications of work done in this project are listed, software and data bases are listed, shortly presented with links to specific pages.

S.Penel presents Hogenom 5. Sources = Genome Reviews (EBI): Archaea & Bacteria, Microbial Genomes, EnsEMBL (9 mammals), EBI (eukaryotes), ...

There is a process of detection for taxon name errors
- CDS -> proteins -> UNIPROT Annotation.
Now 235k families
- Currently : 2 trees are stored per gene family (FastTree / TreeFinder / Phyml depending on #taxa).
- New naming convention : use of the NCBI 5-letter codes (use of 6 letters home code otherwise).

P.Gambette presents the Galled Network method (joint work with D. Huson) and a solution to find a «core» of genes (work in data representation with Veronis).

Pbm of finding a core of genes: «find a set of genes such that every such gene is present in exactly one copy of the retained subset of taxa.

input = single labeled trees.
output = network with a min nb of reticulations that contains all clusters of the input trees.

Discussion of interest of data on presence/absence of genes in genomes.

JP Doyon presents his work on the space of reconciliations.

«Lca reconciliation» (most parsimonious one in terms of D+L...[various references]).

He generalized reconciliation definition to investigate more possibilities (mainly one node of gene tree G can be mapped in species tree S to the lca of L(G) *OR* on an edge above.

There exists an exponential nb of reconciliations but:

there is polynomial algorithm that computes their number. Same algorithm is able to compute the nb of reconciliations that minimize the nb of duplications.
Another algorithm is uniform random generation of a reconciliation (given G & S).
Operator to go from one reconciliation to a neighboring reconciliation.
Nb of D&L updatable in O(1) when doing one move in the reconciliation space.

Probabilistic framework to compute the Probability (G, reconciliation alpha given Species tree and rates of events along its edges).

[Arvestaad et al] have linear or quadratic algorithms to compute P(G,alpha), P(G), and P (alpha given G) Rmk if alpha and alpha’ are 1 move away, P(G,alpha’) can computed in O(n+m).

Then important remarks are done that lead to future research in collaboration with partners of the project.

9/12/09 morning:

G. Szöllözi presents the current model of maximum likelihood investigated by Lyon for dealing with transfers, duplication and losses of gene trees Gi wrt species tree S (closely inspired from Tofigh et al 09).

Restriction to time consistent scenario by determining a total order on the internal nodes of S according to HGT constraints from previous reconciliations (they will be contradictions but the hope is that some maximally satisfactory scenario can still be obtained (combinatorial problem to solve). Then, each edge of S is split into small regular slices of dT. Then a Birth&Death process can be applied to compute the likelihood of a transfer (birth) or loss (death). This is done for each Gi.

Then, once the likelihood are computed, there is a cycle of optimization processes where alternative shapes of S, then of the Gis are sought, etc until convergence.

Long disussion follows on how to improve the procedure, among which several tracks for collaboration in the following months.

9/12/09 afternoon:

We explore different variants of models. We focus on the combinatorial aspects that make the problem go from P to NP complexity classes, discussing examples & conjectures (one of them being solved).

Then, we list the collaborations between our groups to be realized during the next months:

List possible models for reconciliation. Investigate counter-examples and proofs to sort out the different versions of the reconciliation problems from a computer science point of view (as started during the last meeting day afternoon). State similarities/differences between the «SPR» model and the «Russian» model. Actors: ET+CS+CP+VB+VR+GS+JPD.
Tree of 90%. test the speciation extraction procedure proposed in CS’s thesis as well as the STC preprocess to correct source trees, on simulated gene sequences obtained from a model with macro-events. Test also whether it’s better to use a cloud of «best» gene trees for each dataset rather than just one best gene tree. Actors: GS+CS+VR+VB.
Choose a model for reconciliations, specify it completely (badly done in current publications of the domain) and stick to it. Actors: GS+JPD.
Compare detection of transfers between different kind of methods on existing datasets: Galled networks, SA’s method, reconciliations. The first two need single-labeled trees. Actors: SA+PG.
Compare reconciliation algorithms (Lagergren, GS, CS) on concrete and simulated examples.

Phylariane meeting 8-9/12/09

8/12/09 morning:

8/12/09 afternoon:

9/12/09 morning:

9/12/09 afternoon:

A visio-conference is planned for the end of the following month.