A Precise Automatic Extraction of Terminology in Genomics

Oriane Matte-Tailliez, Mathieu Roche, Yves Kodratoff


Abstract

Modern scientists tend to be overloaded with new data and new information. Many research fields work on solutions to this problem, such as Statistics (for data) and Natural Language Processing (for information written as text). Two new research fields have even emerged over the last 10 years or so, namely Data Mining and Text Mining, which attempt to bring together and cross-fertilise all methods dealing with this overload. This paper presents how our success in one unavoidable step of Text Mining, namely the detection of terminology, can be applied to texts about eukaryotic DNA-binding proteins.