From the texts to the concepts they contain: a chain of linguistic
Ahmed Amrani, Jérôme Azé, Thomas Heitz, Yves Kodratoff, Mathieu Roche,
Abstract
The text-mining system we are building deals with the specific problem of
identifying the instances of relevant concepts present in the texts.
Our system therefore relies on interactions between a field expert and the
various linguistic modules we use, many of them adapted from existing tools
such as Brill's tagger or CMU's Link parser. We have developed learning
procedures suited to several steps of the linguistic processing, mainly
grammatical tagging, terminology, and concept learning.
Our interaction with the expert differs from classical supervised learning
in that the expert is not treated merely as a source of examples, unable to
provide the formalized knowledge underlying them. We are developing specific
programming languages that enable the field expert to intervene directly in
some of the linguistic tasks.
Our approach is thus intended to help an expert in a given field detect the
concepts relevant to his/her field from a large collection of texts. It
consists of two steps. The first is automatic and finds relevant and novel
sentences in the texts. The second relies on the expert's knowledge to find
more specific relevant sentences.
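The abstract does not say how the automatic first step scores relevance and novelty. As a purely illustrative sketch, assuming a bag-of-words representation, cosine similarity, and two hypothetical thresholds `rel_min` and `sim_max` (none of which come from the paper), a generic relevant-and-novel sentence filter could look like this:

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a deliberately simple representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relevant_and_novel(sentences, topic, rel_min=0.1, sim_max=0.8):
    """Keep sentences similar enough to the topic (relevance) that are not
    near-duplicates of a sentence already kept (novelty).
    rel_min and sim_max are hypothetical thresholds, not values from the paper."""
    topic_vec = bag_of_words(topic)
    kept, kept_vecs = [], []
    for s in sentences:
        vec = bag_of_words(s)
        if cosine(vec, topic_vec) < rel_min:
            continue  # not relevant to the topic
        if any(cosine(vec, k) > sim_max for k in kept_vecs):
            continue  # redundant with an already selected sentence
        kept.append(s)
        kept_vecs.append(vec)
    return kept


# Hypothetical usage on toy sentences.
sentences = [
    "The tagger assigns grammatical tags to each word.",
    "The tagger assigns grammatical tags to each word.",  # exact duplicate
    "The weather was sunny yesterday.",  # off-topic
    "Tags from the tagger are then corrected by the expert.",
]
selected = relevant_and_novel(sentences, "grammatical tagger tags")
print(selected)
```

Here the duplicate is rejected as not novel and the weather sentence as not relevant, so only the first and last sentences are kept. In the paper's setting, the second, expert-driven step would then refine such a selection toward more specific sentences.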
Working on 50 different domains without an expert was a challenge in
itself, and it explains our relatively poor results on the first Novelty task.