
ZENITH Team: Scientific Data Management

Patrick VALDURIEZ, Head
Florent MASSEGLIA, Deputy

The three main challenges of scientific data management can be summarized as follows: (1) scale (large data, large applications); (2) complexity (uncertain, multi-scale data with many dimensions); and (3) heterogeneity (in particular, the semantic heterogeneity of data). These are also the challenges of data science, whose goal is to make sense of data by combining data management, machine learning, statistics and other disciplines.
 
Zenith’s overall goal is to address these challenges by offering innovative solutions with significant benefits in terms of scalability, functionality, ease of use and performance. To produce generic results, these solutions take the form of architectures, models and algorithms that can be implemented as components or services in clusters or the cloud.
 
We design and validate our solutions in close collaboration with scientific application partners such as INRAE and CIRAD in France, or MACC in Brazil. To further validate our solutions and extend the reach of our results, we also encourage industrial collaborations, even in non-scientific applications, provided they present similar challenges.

Staff
Esther Pacitti, Professor, UM
Florent Masseglia, Research Director, INRIA
Alexis Joly, Research Director, INRIA
Reza Akbarinia, Research Scientist, INRIA
Patrick Valduriez, Research Director, INRIA
Jean-Christophe Lombardo, Research Engineer, INRIA
Antoine Affouard, Engineer, INRIA

Associates & Students
Tanguy Lefort, UM
Joaquim Estopinan, INRIA
Camille Garcin, UM

Regular Co-workers
Mathias Chouet, Engineer/Technician (fixed-term contract), INRIA
Titouan Lorieul, Researcher (fixed-term contract), INRIA
François Munoz, Long-term visitor (Inria Chair), INRIA
Hugo Gresse, Engineer/Technician (fixed-term contract), INRIA
Baldwin Dumortier, Researcher (fixed-term contract), UM
Cathy Desseaux, Engineer/Technician (fixed-term contract), INRIA
Benjamin Bourel, Engineer/Technician (fixed-term contract), CNRS
Ondrej Cifka, UM

Our approach is to capitalise on the principles of distributed and parallel data management. In particular, we exploit: high-level languages as a basis for data independence and automatic optimisation; data semantics to improve information retrieval and automate data integration; declarative languages (algebra, calculus) to manipulate data and workflows; and highly distributed and parallel environments such as P2P networks, clusters and the cloud. To reflect this approach, we organise our research programme into five complementary themes:

  • Data integration, including polystores;
  • Query processing, including indexing and privacy;
  • Management of scientific workflows;
  • Data analysis, including data mining and statistics; and
  • Machine learning for high-dimensional data processing and retrieval.