Mastodons objectives


Managing and analyzing the data tackled in this project raises two major challenges. The first challenge is sharing these datasets among scientists of different disciplines who want to collaborate. The datasets are stored in different formats and systems, each specific to a community. They consist of raw data, and their contents end up being described in associated documents (i.e., published scientific papers). Thus, a scientist who simply wants to select the dataset that best matches his/her requirements (i.e., to address a scientific question) needs to search through a number of documents and understand the associated datasets. The problem gets worse as more heterogeneous datasets and associated documents are produced. To complicate matters further, the different communities typically want to keep full control over their transformed data. Thus, simply storing the datasets in a single database server does not work.

The second challenge is data analysis. Data analysis for scientific applications can uncover different kinds of knowledge, the main ones being association rules (frequent correlations), sequential patterns (frequent sequences of events), and clustering (how to build consistent groups of values). In this project, we focus on association rule discovery. In data mining generally, and with scientific data coming from observations in particular, some strong correlations cannot be considered valuable information or knowledge because they are obvious and lead to uncontroversial conclusions. A simple, somewhat ironic illustration would be discovering that high temperature and strong light have a positive impact on plant growth. Such first-order mechanisms may hide less obvious patterns, or the effects of other environmental conditions, that would be of interest for identifying novel mechanisms or for breeding plants better adapted to the changing environmental conditions associated with climate change.
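To make the notion of an association rule concrete, the following minimal sketch computes the two standard measures, support and confidence, and enumerates rules by brute force. It is an illustration under stated assumptions, not the project's method: the observations are invented toy data, and the item names (`high_temp`, `strong_light`, and so on) and the thresholds are hypothetical.

```python
from itertools import combinations

# Hypothetical toy observations (invented for illustration only):
# each set lists the conditions recorded together in one observation.
observations = [
    {"high_temp", "strong_light", "fast_growth"},
    {"high_temp", "strong_light", "fast_growth"},
    {"high_temp", "fast_growth"},
    {"strong_light", "fast_growth"},
    {"low_temp", "slow_growth"},
    {"high_temp", "strong_light", "slow_growth"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    denom = support(antecedent, transactions)
    if denom == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / denom


def rules(transactions, min_support, min_confidence):
    """Brute-force enumeration of rules with 1 or 2 antecedent items
    and a single consequent item, kept if they pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for size in (1, 2):
        for antecedent in combinations(items, size):
            for consequent in items:
                if consequent in antecedent:
                    continue
                s = support(set(antecedent) | {consequent}, transactions)
                c = confidence(antecedent, {consequent}, transactions)
                if s >= min_support and c >= min_confidence:
                    found.append((antecedent, consequent, s, c))
    return found


if __name__ == "__main__":
    for antecedent, consequent, s, c in rules(observations, 0.3, 0.6):
        print(f"{set(antecedent)} -> {consequent}: "
              f"support={s:.2f}, confidence={c:.2f}")
```

On this toy data, the "obvious" rule that high temperature and strong light lead to fast growth passes both thresholds, which illustrates how such first-order rules can crowd the output. Realistic datasets would need an algorithm that prunes the search space rather than this exhaustive enumeration.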