Mining texts by association rules discovery in a technical corpus

Mathieu Roche, Jérôme Azé, Oriane Matte-Tailliez, Yves Kodratoff


Abstract

The text mining tool proposed in this paper extracts association rules from a corpus, that is, a set of specialized and homogeneous texts. The tool is built in several steps, and a field expert plays a fundamental role in each of them. The first step extracts the terms from the corpus and clusters them into classes by semantic similarity, associating each class with a concept meaningful to the expert. Using the knowledge thus obtained, we build a table of concept frequencies in the texts. Next, we discretize the values of this table, and finally we extract association rules among the concept occurrences.
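
The last three steps (frequency table, discretization, rule extraction) can be illustrated with a minimal sketch. It is not the implementation described in this paper: the concept names and counts are invented, the discretization is assumed to be a simple presence threshold, and rules are limited to one concept on each side, filtered by the usual support and confidence measures.

```python
from itertools import combinations

# Hypothetical concept-frequency table: one row per text, one count per concept.
# Concept names and counts are invented for illustration only.
table = [
    {"concept_a": 3, "concept_b": 0, "concept_c": 2},
    {"concept_a": 4, "concept_b": 1, "concept_c": 3},
    {"concept_a": 0, "concept_b": 2, "concept_c": 0},
    {"concept_a": 5, "concept_b": 0, "concept_c": 1},
]


def discretize(rows, threshold=1):
    """Assumed binary discretization: a concept is present if its count >= threshold."""
    return [frozenset(c for c, n in row.items() if n >= threshold) for row in rows]


def support(itemset, transactions):
    """Fraction of texts that contain every concept in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def mine_rules(transactions, min_support=0.5, min_confidence=0.8):
    """Return rules (lhs, rhs, support, confidence) with one concept on each side."""
    concepts = sorted(set().union(*transactions))
    rules = []
    for a, b in combinations(concepts, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s_both = support(frozenset((lhs, rhs)), transactions)
            s_lhs = support(frozenset((lhs,)), transactions)
            if s_both >= min_support and s_lhs > 0:
                confidence = s_both / s_lhs
                if confidence >= min_confidence:
                    rules.append((lhs, rhs, s_both, confidence))
    return rules


transactions = discretize(table)
rules = mine_rules(transactions)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}  (support={s:.2f}, confidence={c:.2f})")
```

On this toy table, only the pair concept_a and concept_c co-occurs often enough to pass both thresholds, so a rule is produced in each direction. In practice the thresholds and the discretization scheme would be chosen with the expert.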