From Texts to the Associations Between the Concepts They Contain

Yves Kodratoff, Jérome Azé, Mathieu Roche, Oriane Matte-Tailliez

Abstract

Extracting knowledge from texts requires completing a succession of steps. Because structuring knowledge demands a great deal of an expert's time, such tools remain relatively scarce in the various specialized fields. Automation is therefore necessary, and this paper presents some progress on automating one fundamental step in ontology building, namely gathering the significant terms (the "terminology") that will constitute the nodes of the ontology. We obtained the complete terminology of four homogeneous sets of texts (corpora) that differ in language and size. Validation of these terminologies by experts showed that our method provides a very large number of terms of satisfactory quality. These terms made it possible to build classes of concepts semi-automatically. Using this knowledge, we extract association rules specific to each field. The rules thus obtained were validated on three corpora by comparing our results with those given by a new measure called "Normalized Implication Intensity." Two of these corpora were real-life, and an expert discussed the interest of the rules generated by the two methods.