Data Preparation and Analysis of the Results of DEFT'05
Erick Alphonse, Ahmed Amrani, Jérôme Azé, Thomas Heitz, Amar-Djalil Mezaour, Mathieu Roche
Abstract
The text-mining challenge (DEFT) consisted of removing non-relevant
sentences from French corpora of political speeches. It took place
in 2005 and brought together about thirty participants from eleven
teams. This paper describes the preprocessing carried out on the
corpora of F. Mitterrand and J. Chirac for this challenge, in
particular conversion to text format, sentence segmentation,
classification of the speeches, insertion of F. Mitterrand's
sentences into J. Chirac's speeches, and identification of dates and
people's names. The results obtained by the eleven participating
teams are also presented.