Data Preparation and Analysis of the Results of DEFT'05

Erick Alphonse, Ahmed Amrani, Jérôme Azé, Thomas Heitz, Amar-Djalil Mezaour, Mathieu Roche


Abstract

The DEFT text-mining challenge consisted of removing non-relevant sentences from French corpora of political speeches. It took place in 2005 and brought together about thirty participants from eleven teams. This paper describes the preprocessing carried out on the corpora of F. Mitterrand and J. Chirac for this challenge: conversion to text format, sentence segmentation, classification of the speeches, insertion of F. Mitterrand's sentences into J. Chirac's speeches, and identification of dates and people's names. The results obtained by the eleven participating teams are also presented.