Health data mining

Since 2008, the team has applied the new methods to health and environmental applications.

The health domain opens new prospects for purposeful research. Our studies have focused on extracting patterns (broadly defined) from microarrays to reveal unexpected knowledge to biologists. In collaboration with the MMDN-UM2 and in the framework of the ANR Pradnet (Fondation de Coopération Scientifique Maladie d’Alzheimer et Maladies Apparentées), we have reconsidered the problem by organizing the data according to the expression level of the different genes (PhD of Paola Salle). This new approach has allowed biologists to reveal new correlations. In addition, to assist biologists in analyzing the extracted patterns, we have defined and implemented a visualization tool that uses the PubMed articles dealing with the genes involved in those patterns (article selected for the Journal of Biomedical Informatics).

The discovery of new knowledge through patterns raises the question of their generalization: are they representative of a particular context or of a more general one? This question led us to develop new approaches for pattern classification (Master's work of Mickael Fabregue). This promising work, in partnership with the INSERM Val d'Aurelle, enabled us to characterize the different grades of breast cancer. The results significantly improve on those of existing studies (precision and recall of 96%) (another article selected for the Journal of Biomedical Informatics).
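As a reminder of what the two reported measures capture, here is a minimal sketch of how precision and recall can be computed for one class. The function name, the grade labels and the sample data are hypothetical and invented for illustration; they are not the team's data or classifier.

```python
def precision_recall(y_true, y_pred, positive):
    """Return (precision, recall) for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    # Guard against empty denominators when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical tumour grades, invented for illustration only.
true_grades = ["G1", "G3", "G3", "G2", "G3", "G1"]
pred_grades = ["G1", "G3", "G2", "G2", "G3", "G3"]

p, r = precision_recall(true_grades, pred_grades, positive="G3")
print(f"precision={p:.2f} recall={r:.2f}")
```

Precision is the share of predicted cases of a grade that are correct, and recall is the share of true cases of that grade that are found; reporting both avoids rewarding a classifier that only does well on one of them.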

In the framework of a PEPS project with the IGMM, we are currently working on the problem of the different strains of HIV. Beyond applying patterns, we define gene trajectories, i.e. groups of genes expressed in the same manner over time. Biological experiments are being carried out to validate the newly obtained knowledge.
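The notion of genes "expressed in the same manner over time" can be illustrated with a small sketch. It assumes, purely for illustration, that each gene's time series is encoded as a sequence of up/down/stable steps and that genes sharing the same sequence form one trajectory; the tolerance value, function names and expression values are hypothetical, and the team's actual definition may differ.

```python
from collections import defaultdict


def trend_signature(values, tolerance=0.1):
    """Encode a time series as one symbol per step: '+' up, '-' down, '=' stable."""
    symbols = []
    for previous, current in zip(values, values[1:]):
        delta = current - previous
        if delta > tolerance:
            symbols.append("+")
        elif delta < -tolerance:
            symbols.append("-")
        else:
            symbols.append("=")
    return "".join(symbols)


def group_by_trajectory(expression):
    """Group gene names by their trend signature."""
    groups = defaultdict(list)
    for gene, values in expression.items():
        groups[trend_signature(values)].append(gene)
    return dict(groups)


# Hypothetical expression levels at four time points, invented for illustration.
expression = {
    "geneA": [1.0, 2.0, 2.05, 1.0],
    "geneB": [0.5, 1.5, 1.5, 0.2],
    "geneC": [3.0, 2.0, 1.0, 1.0],
}

trajectories = group_by_trajectory(expression)
for signature, genes in trajectories.items():
    print(signature, genes)
```

Here geneA and geneB rise, plateau and fall together, so they end up in the same group even though their absolute expression levels differ.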

At another level, the team is interested in rare cells in the blood. The problem is to identify, among millions of cells, the few cells (5 to 10) that correspond to the onset of an infection or a disease (cancer, stroke). In the framework of the LABEX Numev, we have defined an innovative approach to detect outliers. It allows the physician to extract a minimal subset of clusters in which the rare cells appear. A European patent has been filed, and an entrepreneurship application via the regional SATT, managed by physicians of the LCCRH - Institute for Research in Biotherapy, is underway. Recently, using visual analytics techniques, we showed that it is possible to extract only the truly rare cells. This result limits false positives and offers new therapeutic perspectives for physicians.

To better understand the evolution of a disease, we are interested in the data that patients exchange in the media. In this context, and in collaboration with the text team of the LIRMM and the INVS, we studied the echoes of H1N1 on the web to better understand how this disease spread in the media. The method is based on classification and natural language processing techniques. This work has given us a better understanding of the impact of social media. Building on it, we propose a new text cube approach to analyze the trends, the peak of a disease, etc. We have defined new aggregation functions (paper selected for Emerging Trends in Knowledge Discovery and Data Mining, 2013). We are still working on this topic in a collaboration with the University of Alberta, funded by the MSH-M and involving the Clinical Research Centre of the CHU Montpellier, to identify the feelings and emotions expressed in health forums.
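To make the text cube idea concrete, here is a minimal sketch in which each cell, indexed by (week, region), holds term counts, and a roll-up over regions gives weekly counts from which a peak can be read. The dimensions, the summing aggregate, the function names and the sample posts are all assumptions made for illustration; the aggregation functions actually defined by the team are not described here.

```python
from collections import Counter, defaultdict


def build_text_cube(posts):
    """Map each (week, region) cell to a Counter of the terms posted in it."""
    cube = defaultdict(Counter)
    for week, region, text in posts:
        cube[(week, region)].update(text.lower().split())
    return cube


def roll_up_by_week(cube):
    """Aggregate cells over the region dimension by summing term counts."""
    rolled = defaultdict(Counter)
    for (week, _region), counts in cube.items():
        rolled[week].update(counts)
    return rolled


def peak_week(rolled, term):
    """Return the week in which `term` is mentioned most often."""
    return max(rolled, key=lambda week: rolled[week][term])


# Hypothetical posts as (week, region, text), invented for illustration.
posts = [
    (1, "north", "mild fever today"),
    (2, "north", "h1n1 fever and cough"),
    (2, "south", "h1n1 outbreak h1n1 fear"),
    (3, "south", "h1n1 vaccine arrived"),
]

cube = build_text_cube(posts)
weekly = roll_up_by_week(cube)
peak = peak_week(weekly, "h1n1")
print("peak week for h1n1:", peak)
```

Summing counts is the simplest possible aggregate; other measures (for example weighted or normalized term scores) could be swapped into `roll_up_by_week` without changing the cube structure.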

Last update on 01/12/2014