Using repeated measurements to validate hierarchical gene clusters

Laurent Bréhélin, Olivier Gascuel and Olivier Martin



Supplementary Material

The Supplementary Material in PDF format can be downloaded here.

Source code

Here is the source code used to compute and visualize the stability of all clusters of a hierarchical clustering, as described in the paper.

To use this code:

  1. Download the stability.c and stability.R files.
  2. Compile stability.c with R:
        R CMD SHLIB stability.c
      

    This produces the shared library stability.so (stability.dll on Windows).

  3. Run R, then load the stability.so library and the R functions:
        dyn.load("stability.so")
        source("stability.R")
      
  4. The following short example uses the wood dataset:
        # Load the wood file that contains the array.bois object
        load("Wood.RData")
        
        ## Run a hierarchical clustering and compute the stability of every cluster
        ## array.dat is an array object: rows are genes/proteins, columns are
        ## biological conditions, and the depth (third dimension) corresponds to experimental repetitions
        ## iterhc: number of samplings for the bootstrap procedure
        reswood<-validTree(array.dat=array.bois,iterhc=10)
        
        ## Compute the node stabilities by averaging on the different samplings
        stabwood<-apply(reswood$stab,2,mean)
        
        ## Plot the dendrogram annotated with the stabilities.
        ## minstab is the stability threshold: only stabilities above it are plotted.
        ## minmembers is the cluster-size threshold: only stabilities of clusters larger than it are plotted.
        dend <- decoreDendo(hc=reswood$hc,stab=stabwood,minstab=0.8,minmembers=5)
        plot(dend,leaflab="none",cex.axis=1.8)
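
To run the example on your own data, the input has to be arranged as the three-dimensional array described in the comments above: genes in rows, conditions in columns, repetitions in depth. The sketch below builds such an array in base R. The names toy.array, n_genes, n_cond and n_rep, the dimensions and the random values are all hypothetical and serve only to show the layout; they are not part of the distributed code.

```r
# Hypothetical dimensions, chosen only for illustration
n_genes <- 6   # rows: genes/proteins
n_cond  <- 4   # columns: biological conditions
n_rep   <- 3   # depth: experimental repetitions

set.seed(1)
# Random values stand in for real expression measurements
toy.array <- array(rnorm(n_genes * n_cond * n_rep),
                   dim = c(n_genes, n_cond, n_rep),
                   dimnames = list(paste0("gene", seq_len(n_genes)),
                                   paste0("cond", seq_len(n_cond)),
                                   paste0("rep",  seq_len(n_rep))))

# Each slice along the third dimension is one repetition,
# i.e. a genes x conditions matrix
rep1 <- toy.array[, , 1]

# Averaging over the third dimension gives one genes x conditions matrix
mean.mat <- apply(toy.array, c(1, 2), mean)
```

An array with this layout should have the shape that the array.dat argument expects, although these toy values carry no biological meaning.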
      

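The averaging step in the example, apply(reswood$stab,2,mean), can be illustrated on a toy matrix. The sketch assumes that the stability values are stored with one row per sampling and one column per node, which is what averaging over the second margin suggests. The matrix toy.stab and its values are invented for illustration, and the final threshold comparison only mimics the idea behind minstab; it is not the implementation used by decoreDendo.

```r
# Hypothetical stability matrix: 3 samplings (rows) x 4 nodes (columns)
toy.stab <- matrix(c(0.9, 0.5, 1.0, 0.7,
                     0.8, 0.6, 1.0, 0.9,
                     1.0, 0.4, 1.0, 0.9),
                   nrow = 3, byrow = TRUE)

# Average each column, i.e. each node, over the samplings
node.stab <- apply(toy.stab, 2, mean)

# colMeans computes the same per-column averages
stopifnot(isTRUE(all.equal(node.stab, colMeans(toy.stab))))

# Indices of the nodes whose average stability exceeds a threshold of 0.8
stable.nodes <- which(node.stab > 0.8)
```

With these toy values, nodes 1, 3 and 4 pass the threshold, while node 2 (average 0.5) does not.
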
brehelin at lirmm.fr