Adverse drug reaction (ADR) information is available in narrative text from sources such as drug labels (including prescribing information and package inserts) and the biomedical literature. A comprehensive ADR knowledge base is highly useful in biomedical research and clinical practice. Its data can support applications such as pharmacovigilance and the prevention of unexpected incidents when a medication containing a particular drug is administered. Unfortunately, manually extracting ADR information from free-text sources is costly, time-consuming, and inefficient. A natural language processing (NLP) approach with high precision and recall is attractive because it can be automated: knowledge bases can be built rapidly from existing text-based ADR sources and kept up to date by routinely scanning newly published literature and adding the results to a drug knowledge base.
As a concrete use case for this work, the FDA is interested in automatically extracting ADRs from drug labels, both to compare the ADRs listed in labels from different manufacturers of the same drug and to perform post-marketing safety analysis (pharmacovigilance) by identifying new ADRs not currently present in the labels.
In 2017, Melax Tech participated in and ranked first in the Text Analysis Conference (TAC) ADR challenge,* a joint activity of the U.S. Food and Drug Administration (FDA) and the U.S. National Library of Medicine (NLM), a part of the National Institutes of Health (NIH). The challenge organizers provided participants with an annotated set of drug labels and asked them to:
Extract adverse reaction mentions and modifier terms such as negation, severity, and drug class;
Identify relations between adverse reaction mentions and those modifiers, including negation, hypothetical, and effect relations;
Determine the unique set of positive adverse reaction mention strings across all sections of a drug label; and
Normalize those adverse reaction strings to a standard terminology, MedDRA.
The Melax Tech team participated in all four tasks and was ranked first.
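The last two tasks above (collecting the unique positive reaction strings and normalizing them) can be illustrated with a minimal Python sketch. This is not the Melax Tech system or the official challenge scoring: the `Mention` class, the `unique_positive_reactions` and `normalize` helpers, the sample mentions, and the `TERM_...` identifiers are all hypothetical, and the lookup table uses placeholder IDs rather than real MedDRA codes. "Positive" is simplified here to mean a mention that is neither negated nor hypothetical, and normalization is reduced to an exact string match.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    """A hypothetical adverse reaction mention found in a drug label."""
    text: str                   # surface string as it appears in the label
    section: str                # label section the mention came from
    negated: bool = False       # True if a negation modifier applies
    hypothetical: bool = False  # True if the mention is only hypothetical


# Toy terminology: placeholder IDs, NOT real MedDRA codes.
TOY_TERMINOLOGY = {
    "nausea": "TERM_0001",
    "headache": "TERM_0002",
    "rash": "TERM_0003",
}


def unique_positive_reactions(mentions):
    """Return sorted, lowercased strings of mentions that are neither
    negated nor hypothetical, deduplicated across all sections."""
    return sorted(
        {m.text.strip().lower() for m in mentions
         if not (m.negated or m.hypothetical)}
    )


def normalize(strings, terminology):
    """Map each string to a terminology ID by exact match.
    Strings with no match map to None."""
    return {s: terminology.get(s) for s in strings}


mentions = [
    Mention("Nausea", "adverse reactions"),
    Mention("nausea", "warnings and precautions"),
    Mention("headache", "adverse reactions"),
    Mention("rash", "adverse reactions", negated=True),
    Mention("dizziness", "boxed warnings"),
]

positive = unique_positive_reactions(mentions)
print(positive)
print(normalize(positive, TOY_TERMINOLOGY))
```

In this toy data, the two spellings of "nausea" collapse into one entry, the negated "rash" is dropped, and "dizziness" has no entry in the placeholder table, so it maps to `None`. A real system would need fuzzier matching than exact lookup to handle synonyms and spelling variants.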
Melax Tech’s ADR extraction NLP pipeline is built with deep learning-based Named Entity Recognition (NER) and Relation Extraction (RE) models that extract drug names, severity, condition factors, the reasons a drug is prescribed, and possible adverse reactions. The extracted adverse reactions can be mapped to Medical Dictionary for Regulatory Activities (MedDRA) codes. The pipeline is part of our CLAMP suite of tools. CLAMP offers a wide variety of features designed to let clients easily extract many common data elements, and the relationships between them, from clinical text. The tool also allows data to be mapped to standard clinical vocabularies such as ICD-10 and SNOMED-CT, as well as others such as the MedDRA terminology used in this solution.
To learn more about our wide range of NLP solutions for various clinical, research, and pharmaceutical document types, request a demo today!
*Overview of the TAC 2017 Adverse Reaction Extraction from Drug Labels Track
The details of our approach can be found in Xu, J., Lee, H. J., Ji, Z., Wang, J., Wei, Q., & Xu, H. (2017, November). UTH_CCB System for Adverse Drug Reaction Extraction from Drug Labels at TAC-ADR 2017. In TAC. (See also https://clamp.uth.edu/challenges-publications/UTH_CCB%20System%20for%20Adverse%20Drug%20Reaction%20Extraction%20from%20Drug%20Labels%20at%20TAC-ADR%202017.%20In%20TAC.pdf)