Yoann Pitarch, Ph.D

General Research Interests

Data mining, Knowledge Discovery in Databases, Data Streams Summarization, Multidimensional Sequential Patterns, Hierarchical Data, Spatio-temporal Mining, Database Design, Data Warehouse Design.

Major Research

My research was carried out at the « Laboratoire d’Informatique, de Microélectronique et de Robotique de Montpellier (LIRMM) », in the database and information systems team, from 2008 to 2011.

Preamble

Unless otherwise mentioned, all of this research was conducted under the supervision of my two advisors (Pr. Anne Laurent and Pr. Pascal Poncelet).

A detailed CV, my thesis report and my thesis slides can be found in the Download section.

Summary

Multidimensional Data Stream Summarization (from 2008 to 2011)

Due to the rapid growth of information and communication technologies, the amount of generated and available data has exploded, and a new kind of data, stream data, has appeared. A common definition of a data stream is an unbounded sequence of very detailed data arriving at a high rate. It is therefore impossible to store such a stream in full to perform a posteriori analysis. Moreover, more and more streams carry multidimensional and multilevel data, and very few approaches take these specificities into account. We therefore propose practical and efficient solutions to deal with such data in a dynamic context. More specifically, we are interested in adapting OLAP (On-Line Analytical Processing) techniques to build relevant summaries of the data. First, after describing and discussing existing similar approaches, we propose two solutions for building a data cube on stream data. Second, in collaboration with CEREGMIA in Martinique (France), we investigate the combination of frequent patterns and hierarchies to build a summary based on new generalized sequences. Finally, since this work is funded by the ANR (French National Research Agency) through the MIDAS project (with LIRMM, ENST, INRIA, EDF R&D, Orange Labs), we evaluate our approaches on real datasets provided by the industrial partners of this project (e.g., Orange Labs or EDF R&D).

Designing Contextual Data Warehouses (2009-present)

One of the main strengths of data warehouses is the ability to navigate through dimension hierarchies thanks to OLAP operators (drill-down and roll-up). Unfortunately, existing hierarchy types lack flexibility. Two arguments support this claim. First, in most cases, it is not possible to design hierarchies on measure attributes, even though aggregating numerical values into categorical, more meaningful values could be very helpful for decision makers. Second, and related to this point, we observed that attaching semantics to numerical values is not straightforward. For instance, in a medical scenario, a given blood pressure value can be labeled either “normal” or “low” depending on some features (e.g., the patient's age). In general, designing hierarchies on attributes whose finest level of granularity is defined on a numerical domain requires some “expert knowledge” to correctly generalize these values. Since no such type of hierarchy existed in the literature, we introduced a conceptual model for “contextual hierarchies”. Moreover, in collaboration with Cecile Favre from the ERIC lab in Lyon, we developed a new data warehouse model, namely the “contextual data warehouse”, to handle such hierarchies. This model was defined at the conceptual, graphical and logical levels. We also proposed efficient solutions to store and exploit the expert knowledge needed to correctly generalize (and specialize) data. Parts of this model and the associated solutions were published at both national and international conferences (DOLAP 2010, MEDES 2009, EDA 2010 and 2011), and we are currently working on an international journal paper presenting a global description of this model.
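
The blood pressure example can be sketched in a few lines of Python. This is only an illustration of the idea of a context-dependent generalization, not the published contextual data warehouse model: the names `Rule`, `RULES` and `generalize`, as well as every threshold below, are hypothetical and have no medical validity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Rule:
    """One piece of (hypothetical) expert knowledge."""
    applies: Callable[[dict], bool]  # does the rule hold in this context?
    low: float                       # inclusive lower bound of the value
    high: float                      # exclusive upper bound of the value
    label: str                       # categorical value at the coarser level


# Illustrative thresholds only: the same value maps to different labels
# depending on the patient's age.
RULES = [
    Rule(lambda c: c["age"] < 18, 0, 90, "low"),
    Rule(lambda c: c["age"] < 18, 90, 120, "normal"),
    Rule(lambda c: c["age"] < 18, 120, float("inf"), "high"),
    Rule(lambda c: c["age"] >= 18, 0, 100, "low"),
    Rule(lambda c: c["age"] >= 18, 100, 140, "normal"),
    Rule(lambda c: c["age"] >= 18, 140, float("inf"), "high"),
]


def generalize(value: float, context: dict) -> Optional[str]:
    """Roll a numerical measure up to a categorical level, given its context."""
    for rule in RULES:
        if rule.applies(context) and rule.low <= value < rule.high:
            return rule.label
    return None  # no expert rule covers this value in this context


child_label = generalize(95, {"age": 10})
adult_label = generalize(95, {"age": 40})
```

With these made-up rules, the same value of 95 is generalized differently for a child and for an adult, which is exactly what a classical hierarchy with a fixed value-to-parent mapping cannot express.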

Introducing Flexibility when Querying Databases (2009-present)

When a user is looking in a database for products that match his preferences (e.g., cars, hotels, flats), he generally adopts an iterative process, relaxing at least one of his criteria at each step until the displayed set of products is satisfactory. This tedious process can slow down the data analysis and cause user dissatisfaction. To overcome this drawback, some research focuses on introducing flexibility when querying a database in order to return satisfactory results. Thus, in collaboration with Pr. Pascal Poncelet (LIRMM, Montpellier), Pr. Dominique Laurent (ETIS, Cergy-Pontoise) and Pr. Nicolas Spyratos (LRI, Paris), we proposed a solution for providing flexibility in the database query phase. The solution is based on a cost function that exploits user preferences and computes (when necessary) an alternative query that is as close as possible to the original one but returns much more interesting results. This approach was published at both international and national conferences (SOCPAR 2010, BDA 2010), and we are currently working on skyline-based computational optimizations in order to publish the results in a relevant international journal.
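
As a rough illustration of cost-based query relaxation, the following Python sketch searches for the cheapest alternative query that returns at least one result. It is not the published cost function: the data (`HOTELS`), the weighting scheme, the candidate bounds in `steps` and the function names are all assumptions made for the example, and the exhaustive search is only suitable for toy inputs.

```python
from itertools import product

# Hypothetical toy table.
HOTELS = [
    {"name": "A", "price": 80, "distance": 3.0},
    {"name": "B", "price": 105, "distance": 0.5},
    {"name": "C", "price": 95, "distance": 2.5},
    {"name": "D", "price": 150, "distance": 0.2},
]


def run(query, rows):
    """Return the rows whose attributes are all <= the query's upper bounds."""
    return [r for r in rows if all(r[a] <= bound for a, bound in query.items())]


def cost(original, relaxed, weights):
    """Weighted relative widening of each bound.

    A higher weight means the user cares more about that criterion,
    so relaxing it is more expensive.
    """
    return sum(
        weights[a] * (relaxed[a] - original[a]) / original[a] for a in original
    )


def relax(original, rows, weights, steps, min_results=1):
    """Try every combination of candidate bounds and keep the cheapest
    one that returns at least `min_results` rows."""
    best = None
    attrs = list(original)
    for bounds in product(*(steps[a] for a in attrs)):
        candidate = dict(zip(attrs, bounds))
        if len(run(candidate, rows)) < min_results:
            continue
        c = cost(original, candidate, weights)
        if best is None or c < best[0]:
            best = (c, candidate)
    return best  # (cost, relaxed query), or None if nothing qualifies


original = {"price": 100, "distance": 1.0}        # returns no hotel as is
weights = {"price": 1.0, "distance": 2.0}         # distance matters more
steps = {"price": [100, 110, 150], "distance": [1.0, 2.5, 3.0]}

best_cost, best_query = relax(original, HOTELS, weights, steps)
```

Here the original query is empty, and the cheapest relaxation raises the price bound slightly rather than touching the distance bound, because the user weights distance more heavily.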

Mining Multidimensional and Multilevel Sequential Patterns (2010-present)

OLAP mining goes one step further than OLAP operators by providing tools to extract hidden knowledge from multidimensional databases. Since data warehouses often integrate a temporal dimension, mining sequential patterns appears appropriate. However, few approaches exploit both the multiple dimensions and the hierarchies, mainly because of the huge search space that must be explored. In collaboration with Lionel Vinceslas and Jean Emile Symphor (CEREGMIA, Fort de France), we proposed a multidimensional and multilevel sequential pattern mining approach that exploits the hierarchies more efficiently. As a result, we are able to extract longer and more descriptive frequent sequences than existing approaches. A paper was published at a national conference (EGC 2010) and others are in preparation.
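
To show why hierarchies matter when counting sequence support, here is a deliberately simplified Python sketch. It handles a single dimension and one item per time step, and it is not the algorithm published at EGC 2010; the hierarchy `PARENT`, the database `DB` and the function names are hypothetical.

```python
# Hypothetical item hierarchy: child -> parent.
PARENT = {"Coke": "soda", "Pepsi": "soda", "chips": "snack", "peanuts": "snack"}


def ancestors_or_self(item):
    """Return the item followed by all of its ancestors in the hierarchy."""
    out = [item]
    while item in PARENT:
        item = PARENT[item]
        out.append(item)
    return out


def contains(sequence, pattern):
    """True if `pattern` is a subsequence of `sequence`, where a pattern
    item matches a data item when it equals it or is one of its ancestors."""
    pos = 0
    for item in sequence:
        if pos < len(pattern) and pattern[pos] in ancestors_or_self(item):
            pos += 1
    return pos == len(pattern)


def support(db, pattern):
    """Fraction of sequences in `db` that contain `pattern`."""
    return sum(contains(s, pattern) for s in db) / len(db)


DB = [
    ["Coke", "chips"],
    ["Pepsi", "peanuts"],
    ["Coke", "peanuts"],
    ["chips", "Pepsi"],
]

specific = support(DB, ["Coke", "chips"])   # finest level only
general = support(DB, ["soda", "snack"])    # one level up in the hierarchy
```

On this toy database the fully specific pattern is supported by a single sequence, whereas its generalization is supported by three of the four, so a pattern that is infrequent at the finest level can become frequent once the hierarchy is taken into account.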

Exploiting Frequent Patterns for Automatic Crop Recognition (2009-present)

Mining environmental data is an emerging research area that helps address some high-stakes issues. Among these, automatic mapping makes it possible to quickly take stock of the crops grown in a given area. For instance, this can be very useful for preventing famine risk in arid countries (e.g., Mali). In collaboration with researchers from the CEMAGREF institute in Montpellier, we proposed a multidimensional-pattern-based classification technique to tackle this issue. One of the main strengths of the proposed methodology is that it combines texture indicators extracted from satellite images with data collected from a site survey. The approach was run on data from Mali, and the results were analyzed and validated by domain experts. We empirically showed that our approach outperforms classical techniques used in remote sensing. Parts of this work have already been published (or are under submission) in national and international conferences and journals (AGILE 2011, Revue Internationale de Géomatique, Information Sciences).