Multilingual semantic annotations of Electronic Health Records and Pharmacogenomics data with ontologies.


Type:      Postdoc position

Employer:  University of Montpellier

Context:   PractiKPharma project (Practice-based evidences for actioning Knowledge in Pharmacogenomics) – ANR project

When:      September 2016, for 24 months (12 months, renewable for another 12)

Where:     LIRMM, Montpellier (requires collaboration with Stanford University, USA)

Keywords & background:

semantic annotation, biomedical ontologies & terminologies, multilingual context (French & English), text mining, semantic web, natural language processing, medical informatics, knowledge extraction.


The goal of the PractiKPharma project (http://practikpharma.loria.fr) is to validate or moderate state-of-the-art Pharmacogenomics knowledge on the basis of practice-based evidence, i.e., knowledge extracted from Electronic Health Records (EHRs). During this project, we will extract state-of-the-art knowledge from (English) structured and unstructured descriptions in reference databases (e.g., PharmGKB) and the literature (e.g., PubMed), and extract observational knowledge from (French) EHRs. Part of this multilingual knowledge extraction process will rely on the semantic annotation (using relevant biomedical ontologies) of plain-text data. We plan to reuse and enhance tools developed in the context of the NCBO (www.bioontology.org) and SIFR (www.lirmm.fr/sifr) projects.

We are seeking a motivated, curious and interested postdoc candidate to design and develop the semantic annotation workflows and help the project annotate EHR and Pharmacogenomics data. The postdoc will develop new methodologies to capture the context of French clinical narratives, such as negation, specific sections, modality and word sense disambiguation. To support this French-English context, we will also investigate the generation and use of multilingual ontology mappings (mostly reusing LIRMM's YAM++ approach).


Pharmacogenomics (PGx) studies how individual gene variations cause variability in drug responses, and constitutes a basis for implementing personalized medicine, i.e., a medicine tailored to each patient by considering her/his genomic context [8]. PGx data is often not yet validated, because most of it results from studies that do not meet statistical validation standards and that are difficult to reproduce given the rarity of the gene variations studied. The goal of the PractiKPharma project is to validate publicly available (English) state-of-the-art PGx knowledge on the basis of practice-based evidence, i.e., knowledge extracted from (French) EHRs. The project is mainly funded by the French ANR and led by A. Coulet (INRIA Nancy) in collaboration with HEGP (Georges Pompidou European Hospital, Paris), CHU Saint-Etienne and LIRMM.

Units of knowledge in PGx typically take the form of ternary relationships (gene variant–drug–adverse event) and can be formalized to different extents using biomedical ontologies. To achieve our goal, we will (1) extract state-of-the-art knowledge from PGx databases (such as PharmGKB [3]) and from the literature (such as PubMed); (2) extract observational knowledge (i.e., knowledge extracted from observational data) from French EHRs; (3) compare the knowledge units extracted from these two sources to confirm or moderate state-of-the-art knowledge, with the goal of enabling personalized medicine. (4) Finally, we will emphasize newly confirmed knowledge by investigating omics databases, using biomedical Linked Open Data [1], for the molecular mechanisms that underlie and explain adverse drug events.

Clarifying the validity of PGx data will have an important impact on clinical care. It would permit the definition of guidelines that enable clinicians to implement personalized medicine by choosing and dosing drugs more precisely. Concretely, this will reduce the toxicity and increase the efficacy of prescribed drugs, consequently reducing costs and improving the quality of care.

Postdoc description:

Within this project, we (in a task led by LIRMM) will design and implement a semantic annotation workflow that uses the relevant reference ontologies to facilitate the knowledge extraction process. We plan to reuse and improve the outcomes of the National Center for Biomedical Ontology (www.bioontology.org - NCBO) and Semantic Indexing of French Biomedical Data Resources (www.lirmm.fr/sifr - SIFR) projects, which have developed tools for the semantic annotation of English and French data, respectively [5] [4].

The biggest improvements will come from introducing more natural language processing (NLP) mechanisms into the annotation process. We will specifically work on better capturing the context of French clinical or patient narratives. The new workflow improvements will include:

- identifying polysemous terms during the annotation process and choosing the proper concept (e.g., "cell" as the biological unit or as a mobile telephone).

- detecting negation (e.g., "the patient does not have the symptom"), for instance by reusing state-of-the-art approaches such as NegEx [2], already tested by the NCBO team, and by designing new ones.

- detecting, to some extent, other elements of context: modifier words (e.g., "hypothetically", "strongly"), in order to give annotations different levels of importance, as well as temporal information.

- publishing and linking the annotations produced by our workflow on the Web of data, using standard Semantic Web technologies and the new W3C Web Annotation Data Model.

- improving the annotation scoring, which ranks annotations by importance according to the context in which each annotation was made and to the frequency of the matches [6].

- enabling the workflow to handle multilingual data and to annotate with multilingual ontologies by leveraging multilingual mappings, which will have to be generated between English ontologies and their French counterparts.
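As an illustration of the negation item above, here is a minimal sketch of a NegEx-style check, loosely following the idea of [2]: a concept is considered negated when it occurs within a fixed window of tokens after a negation trigger, unless a scope-terminating word intervenes. The trigger lists (including the French triggers), the five-token window and the function names are hypothetical choices made for this example; they are neither the actual NegEx lexicon nor the SIFR or NCBO Annotator code.

```python
import re

# Hypothetical, deliberately tiny trigger lists (English and French) for
# illustration only; the real NegEx lexicon is much larger.
PRE_NEGATION_TRIGGERS = [
    ("no",), ("denies",), ("without",),            # English
    ("pas", "de"), ("sans",), ("absence", "de"),   # French
]
SCOPE_TERMINATORS = {"but", "however", "mais", "cependant"}
SCOPE_WINDOW = 5  # assumed maximum number of tokens negated by one trigger


def tokenize(text):
    """Lowercase the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def is_negated(sentence, concept):
    """Return True if `concept` occurs inside the scope of a pre-negation trigger."""
    tokens = tokenize(sentence)
    concept_tokens = tokenize(concept)
    n = len(concept_tokens)
    if n == 0:
        return False
    # Start positions of every occurrence of the concept in the sentence.
    starts = [i for i in range(len(tokens) - n + 1)
              if tokens[i:i + n] == concept_tokens]
    for i in range(len(tokens)):
        for trigger in PRE_NEGATION_TRIGGERS:
            end = i + len(trigger)
            if tuple(tokens[i:end]) != trigger:
                continue
            # Scope: from the token after the trigger up to the window limit,
            # cut short at the first scope terminator.
            scope_end = min(end + SCOPE_WINDOW, len(tokens))
            for j in range(end, scope_end):
                if tokens[j] in SCOPE_TERMINATORS:
                    scope_end = j
                    break
            if any(end <= s and s + n <= scope_end for s in starts):
                return True
    return False


if __name__ == "__main__":
    print(is_negated("The patient denies chest pain.", "chest pain"))      # True
    print(is_negated("Le patient ne présente pas de fièvre.", "fièvre"))   # True
    print(is_negated("Chest pain but no fever.", "chest pain"))            # False
```

The last call returns False because the concept precedes the trigger. A realistic workflow would also need post-negation triggers, pseudo-triggers (phrases such as "not only" that contain a trigger word but do not negate) and proper sentence segmentation.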

The annotation of EHR data will have to be run in-house at HEGP, in collaboration with the team there. The work done on text mining and semantic annotation in French will be generalized to English, and improvements of the SIFR Annotator will be incorporated into the NCBO Annotator when possible, in collaboration with Stanford University. Multiple visits are planned. The work on ontology alignment will capitalize on the work of the OpenData team at LIRMM (e.g., on YAM++ [7]) and involve Zohra Bellahsene and Konstantin Todorov.

The ontologies used will include: SNOMED CT, NCIt, HDO, HPO, MeSH, LOINC, RxNorm, WHO-ATC and MedDRA.

For reference, SIFR Annotator code and API are available here:

https://github.com/sifrproject & http://data.bioportal.lirmm.fr/documentation

Expected profile:

We are seeking a motivated, curious and interested postdoc candidate to design and develop the semantic annotation workflow and bring interesting new ideas to the project team. A PhD in computer science or bioinformatics is required. Besides strong motivation for the research questions, we are looking for someone with good technical skills and a drive for concrete outcomes. The best candidate will demonstrate good programming skills in addition to a good publication track record. Supervision will be done mostly remotely, as Clement Jonquet is currently a visiting scholar at Stanford University. The candidate should match some of the following criteria:

- Research experience matching the proposed subject.

- Experience with some of the technologies involved: Java/JEE, Ruby/Rails, RESTful web services, XML/JSON.

- Experience with the Semantic Web vision and technologies (OWL, RDF, SPARQL).

- Experience in biomedical informatics (knowledge extraction, use of ontologies, BioPortal)

- A good track record of publications and of communicating his/her work.

- Excellent remote working capabilities (emails, trackers, collaborative tools, etc.)

- Perfect oral and written English skills.

- Some knowledge of French, with the objective of learning the language during the contract.

- Availability for the multiple project meetings planned in France (Paris or Nancy).

- Acceptance of international trips (collaboration with Stanford) and eligibility for a US visa.

- Excellent writing skills, as reports, publications and technical notes will always be necessary.

- Autonomy and initiative: the ability to make technical decisions within the project and to justify them.

- Friendly person to join a small research team in Montpellier.


For more information about this position, please contact Clement Jonquet (jonquet@lirmm.fr). To apply, please send an email including links to the following documents (NO ATTACHMENTS):

- a motivation letter explaining YOUR interest in the position;

- a curriculum vitae describing your experience and how it matches the expected profile;

- copies of diplomas (PhD) and other relevant certificates (MSc grades, PhD jury evaluation);

- names and contact details of referees.


- The postdoc will be hired by the University of Montpellier (social security, etc. included).

- The salary will be around €2,000 net per month, depending on experience.

- The contract should start on September 1st or October 1st, 2016.

- The contract will be for 1 year, renewable for another year.


[1]    C. Bizer, T. Heath, and T. Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3):1–22, 2009.

[2]    W. Chapman, W. Bridewell, P. Hanbury, G. F. Cooper, and B. G. Buchanan. A simple algorithm for identifying negated findings and diseases in discharge summaries. Journal of Biomedical Informatics, 34(5):301–310, October 2001.

[3]    M. Hewett, D. E. Oliver, D. L. Rubin, K. L. Easton, J. M. Stuart, R. B. Altman, and T. E. Klein. PharmGKB: the pharmacogenetics knowledge base. Nucleic Acids Research, 30(1):163–165, 2002.

[4]    C. Jonquet, A. Annane, K. Bouarech, V. Emonet, and S. Melzi. SIFR BioPortal : Un portail ouvert et générique d’ontologies et de terminologies biomédicales françaises au service de l’annotation sémantique. In 16th Journées Francophones d’Informatique Médicale, JFIM’16, Genève, July 2016.

[5]    C. Jonquet, N. H. Shah, and M. A. Musen. The Open Biomedical Annotator. In American Medical Informatics Association Symposium on Translational BioInformatics, AMIA-TBI’09, pages 56–60, San Francisco, CA, USA, March 2009.

[6]    S. Melzi and C. Jonquet. Scoring semantic annotations returned by the NCBO Annotator. In A. Paschke, A. Burger, P. Romano, M. Marshall, and A. Splendiani, editors, 7th International Semantic Web Applications and Tools for Life Sciences, SWAT4LS’14, Berlin, Germany, December 2014.

[7]    D. Ngo and Z. Bellahsene. YAM++ : A Multi-strategy Based Approach for Ontology Matching Task. In A. ten Teije, J. Völker, S. Handschuh, H. Stuckenschmidt, M. d'Aquin, A. Nikolov, N. Aussenac-Gilles, and N. Hernandez, editors, 18th International Conference on Knowledge Engineering and Knowledge Management, EKAW'12, pages 421–425, Galway, Ireland, 2012. Springer.

[8]    H.-G. Xie and F. W. Frueh. Pharmacogenomics steps toward personalized medicine. Personalized Medicine, 2(4):325–337, 2005.