Sequential pattern mining

Since 1996, the Tatoo team has focused on the extraction of patterns in large databases. Earlier work concentrated on the extraction of sequential patterns (i.e. temporal expression patterns); for example, the PSP approach (1998) and the ISE incremental approach have been cited more than 259 and 160 times, respectively, on Google Scholar. This work was continued by addressing the complexity of the data: symbolic vs. numerical, dynamic, many occurrences, incomplete, etc. Recently, we have investigated the following questions: How can patterns be extracted from data available as a stream? How can these patterns be summarized? Can new condensed representations be found? How can specific patterns be identified in different contexts?
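As an illustration of what a sequential pattern is, the sketch below counts how often a pattern occurs in a small database of sequences. It is not the PSP or ISE algorithm, only the usual textbook definition of pattern support; the names `contains` and `support` and the toy database are assumptions made for this example.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
DataSequence = Sequence[Itemset]


def contains(sequence: DataSequence, pattern: DataSequence) -> bool:
    """Return True if `pattern` is a subsequence of `sequence`.

    Each itemset of the pattern must be included in a distinct itemset of
    the sequence, and the order of the itemsets must be preserved.
    """
    position = 0
    for itemset in sequence:
        # Greedily match the next pattern itemset as early as possible.
        if position < len(pattern) and pattern[position] <= itemset:
            position += 1
    return position == len(pattern)


def support(database: List[DataSequence], pattern: DataSequence) -> float:
    """Fraction of the sequences in `database` that contain `pattern`."""
    if not database:
        return 0.0
    return sum(contains(seq, pattern) for seq in database) / len(database)


# Hypothetical toy database: each entry is a time-ordered list of itemsets.
database = [
    [frozenset({"a"}), frozenset({"b", "c"}), frozenset({"d"})],
    [frozenset({"a", "b"}), frozenset({"d"})],
    [frozenset({"b"}), frozenset({"a"}), frozenset({"c"})],
]
pattern = [frozenset({"a"}), frozenset({"d"})]
pattern_support = support(database, pattern)
```

Here the pattern "a, later followed by d" is contained in the first two sequences but not in the third. A pattern is usually called frequent when its support reaches a user-defined minimum threshold.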

The community has proposed many approaches for finding condensed representations that minimize the extracted knowledge and facilitate the extraction. In the context of sequential patterns, we proved that it is not possible to find a better representation than the closed patterns. This result concluded the work of the community on this topic and was selected as a Best Paper at the ECML/PKDD conference. In the context of data streams, it is essential to extract patterns quickly. We have proposed a new approach for extracting sequential patterns using a sampling technique (thesis of Chedy Raïssy). In addition, we have summarized the patterns over time by using hierarchies. This approach, carried out within the ANR MIDAS project (thesis of Yoann Pitarch), allows users to query the past of the stream with an approximation that is sufficient for multiple analyses.
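To make the notion of a closed representation concrete, the following sketch filters a set of frequent patterns down to the closed ones, i.e. those that have no proper super-sequence with the same support. It is a naive quadratic filter written for illustration, not the method used in the work described above. For brevity it assumes that each pattern is a sequence of single items, and the supports in `frequent` are hypothetical values.

```python
from typing import Dict, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(small: Pattern, large: Pattern) -> bool:
    """True if `small` can be obtained from `large` by deleting items."""
    remaining = iter(large)
    # `item in remaining` consumes the iterator, which enforces the order.
    return all(item in remaining for item in small)


def closed_patterns(frequent: Dict[Pattern, int]) -> Dict[Pattern, int]:
    """Keep only the patterns with no proper super-sequence of equal support."""
    closed = {}
    for pattern, count in frequent.items():
        absorbed = any(
            other != pattern
            and other_count == count
            and is_subsequence(pattern, other)
            for other, other_count in frequent.items()
        )
        if not absorbed:
            closed[pattern] = count
    return closed


# Hypothetical frequent patterns with their absolute supports.
frequent = {
    ("a",): 3,
    ("b",): 2,
    ("c",): 2,
    ("a", "b"): 2,
    ("a", "c"): 2,
}
closed = closed_patterns(frequent)
```

In this toy example the patterns `("b",)` and `("c",)` are dropped because `("a", "b")` and `("a", "c")` have the same support, so their supports can be recovered from the closed set without storing them.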

Choosing an analysis dimension is often difficult for the user. We have therefore defined a new type of pattern, called multi-dimensional patterns (thesis of Marc Plantevit), which integrates different dimensions to provide new knowledge to the decision maker. Besides the dimensions, the context in which the data occur is often important for the decision maker. We proposed another type of pattern, called contextual patterns, which are representative of the different situations encountered. Used in a real railway application, this approach has, for instance, allowed the identification of specific patterns and improved train maintenance operations (thesis of Julien Rabatel) (selected as Best Chapter and as a chapter in Advances in Knowledge Discovery and Management).

The patterns described above occur frequently in a database. However, the decision maker may also be interested in unexpected patterns. With unexpected sequential patterns (thesis of Hayuan Li), our goal was to extract the patterns that contradict the beliefs generally held about the data (selected for the "mining of complex data" section of the journal RNTI).

Last update on 01/12/2014