3 October 2013, LIRMM, Arno Siebes, University of Utrecht, Netherlands

Thursday 3 October 2013, LIRMM

Arno Siebes, University of Utrecht, Netherlands

http://www.cs.uu.nl/staff/siebes.html

Title: MDL for Pattern Mining

Arguably the most important contribution the field of data mining has made to the data analysis problem is that of pattern mining. Patterns are local models that describe only part of the data. While pattern mining has proven its use in many different cases, it does have its drawbacks. The most important one is the so-called pattern explosion: you easily end up with more patterns than you have data, hardly a good way to get insight into your data! The solution to this problem is easy to see: select a (small) subset of patterns that does give you this insight. The question is, of course, which subset should you choose? In Utrecht we decided sometime in 2005 to use the Minimum Description Length (MDL) principle to guide this choice, i.e., the best set of patterns is the set of patterns that compresses the data best. Unfortunately this leads to an intractable problem, hence we devised the heuristic algorithm Krimp. In follow-up research we have seen that MDL in general, and Krimp in particular, is a very versatile tool for a pattern miner. In this talk I will try to convince you of this observation. If time permits I will also discuss some of the weaknesses of our approach and indicate how I am currently addressing these weaknesses.
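To make the selection criterion in the abstract concrete, here is a minimal Python sketch of the idea that the best pattern set is the one that compresses the data best. It is not the Krimp algorithm as published: the cover order, the candidate order, the simplified model cost, and the names `cover`, `total_length` and `select_patterns` are all assumptions made for illustration. A candidate itemset is kept only if adding it lowers the total encoded size (model plus data), measured in bits.

```python
from collections import Counter
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with disjoint itemsets, in code-table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_length(db, code_table, singleton_len):
    """Two-part description length in bits: L(model) + L(data | model)."""
    usage = Counter()
    for t in db:
        usage.update(cover(t, code_table))
    total = sum(usage.values())
    # Shannon-optimal code length for each itemset that is actually used.
    code_len = {s: -log2(u / total) for s, u in usage.items()}
    data_bits = sum(usage[s] * code_len[s] for s in usage)
    # Simplified model cost (an assumption): each used itemset is spelled out
    # in singleton codes, plus the length of its own code.
    model_bits = sum(code_len[s] + sum(singleton_len[i] for i in s) for s in usage)
    return data_bits + model_bits


def select_patterns(db, candidates):
    """Keep a candidate only if it reduces the total description length."""
    def support(s):
        return sum(1 for t in db if s <= t)

    def order(table):
        # Assumed cover order: longer itemsets first, then higher support.
        return sorted(table, key=lambda s: (-len(s), -support(s), sorted(s)))

    items = Counter(i for t in db for i in t)
    n = sum(items.values())
    singleton_len = {i: -log2(c / n) for i, c in items.items()}

    # Start from the singletons only, so every transaction can be covered.
    table = order([frozenset([i]) for i in items])
    best = total_length(db, table, singleton_len)
    chosen = []
    for cand in sorted(candidates, key=lambda s: (-support(s), -len(s), sorted(s))):
        trial = order(table + [cand])
        bits = total_length(db, trial, singleton_len)
        if bits < best:
            table, best = trial, bits
            chosen.append(cand)
    return chosen, best
```

For example, on a toy database where the items `a` and `b` always occur together, `select_patterns(db, [frozenset("ab")])` accepts the itemset `{a, b}`, because a single code for the pair is cheaper than two separate singleton codes.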

Bio: In 1985 Arno became a PhD student (in computer science) at CWI, the Dutch national research institute for Mathematics and Computer Science. He stayed there until 2000, growing from PhD student to group leader and changing research field from databases to data mining. While changing subject he co-founded a data mining company called Data Distilleries, which after a few years became part of SPSS and is thus now part of IBM. In 1999 he became part-time full professor at the Technical University Eindhoven and in 2000 full-time full professor at Utrecht University, where he leads the "Algorithmic Data Analysis" group. After being head of the department of Information and Computing Sciences for six years, he now has a sabbatical year, which he spends trying to understand what he has been doing with MDL for the past 7 or 8 years.

Last updated on 20/11/2013