
ZENITH Team: Scientific Data Management

Head: Patrick VALDURIEZ
Deputy: Florent MASSEGLIA

Scientific Data Management

The three main challenges of scientific data management can be summarized as follows: (1) scale (large data, large applications); (2) complexity (uncertain, multi-scale data with many dimensions); and (3) heterogeneity (in particular, the semantic heterogeneity of data). These are also the challenges of data science, whose goal is to make sense of data by combining data management, machine learning, statistics and other disciplines.
 
Zenith’s overall goal is to address these challenges by offering innovative solutions with significant benefits in terms of scalability, functionality, ease of use and performance. To produce generic results, we express these solutions as architectures, models and algorithms that can be implemented as components or services in clusters or the cloud.
 
We design and validate our solutions by working closely with our scientific application partners such as INRAe and CIRAD in France, or MACC in Brazil. To further validate our solutions and extend the reach of our results, we also encourage industrial collaborations, even in non-scientific applications, provided they present similar challenges.

Our approach is to capitalise on the principles of distributed and parallel data management. In particular, we exploit: high-level languages as a basis for data independence and automatic optimisation; data semantics to improve information retrieval and automate data integration; declarative languages (algebra, calculus) to manipulate data and workflows; and highly distributed and parallel environments such as P2P, cluster and cloud. To reflect our approach, we organise our research programme into five complementary themes:

  • Data integration, including polystores;
  • Query processing, including indexing and privacy;
  • Management of scientific workflows;
  • Data analysis, including data mining and statistics; and
  • Machine learning for high-dimensional data processing and retrieval.