Natural Language Processing for Informed Consent and Clinical Trial Protocol Documents


Clinical trials are crucial for examining the safety and efficacy of new treatment and prevention options. However, patient recruitment and trial data management impose a heavy burden. Natural language processing (NLP), which extracts structured information from unstructured text, can help automate and simplify clinical trial recruitment and management.


Melax Tech provides NLP and semantic modeling capabilities for working with the textual documents involved in initiating and managing clinical trials. We have done extensive work for both industry and the National Cancer Institute on extracting inclusion/exclusion criteria from clinical protocols and on understanding permissions, privacy protections, and data use restrictions in informed consent forms.


Clinical trials for personalized cancer therapy provide tailored treatments based on a patient’s specific characteristics (e.g., genetic status) and have shown great promise for improving outcomes for cancer patients. Hundreds of clinical trials are investigating drugs and drug combinations that target specific genetic alterations in tumors. Melax Tech developed a CLAMP-based system that extracts genetic alteration information relevant to personalized cancer therapy from trial protocol documents, such as those found on ClinicalTrials.gov. This system can be extended to applications in which clients need to extract genetic information and inclusion/exclusion criteria from clinical protocol documents. Easy access to this information could help patients find open trials and help research scientists understand the design and availability of those trials.
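To make the extraction task concrete, here is a minimal Python sketch. It is not the CLAMP-based system described above: it assumes a hypothetical, hard-coded gene list and a simple regular expression for protein-level substitutions such as V600E, whereas a production system would rely on curated vocabularies and trained models. The names `extract_alterations` and `Alteration` are illustrative only.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical, deliberately tiny gene lexicon, for illustration only.
GENES = ["EGFR", "BRAF", "KRAS", "ALK", "ERBB2", "HER2", "PIK3CA", "ROS1"]

# A gene symbol, optionally followed by a protein-level substitution such as V600E
# (one-letter amino acid code, position, one-letter amino acid code).
MENTION_RE = re.compile(
    r"\b(" + "|".join(GENES) + r")\b"
    r"(?:\s+([ACDEFGHIKLMNPQRSTVWY]\d{1,4}[ACDEFGHIKLMNPQRSTVWY]))?"
)
ALTERATION_TERMS = ("mutation", "amplification", "fusion",
                    "rearrangement", "deletion", "overexpression")


@dataclass
class Alteration:
    section: str                    # "inclusion", "exclusion", or "unknown"
    gene: str
    variant: Optional[str]          # e.g. "V600E"; None if only the gene is named
    alteration_type: Optional[str]  # first alteration keyword on the line, if any


def extract_alterations(criteria_text: str) -> List[Alteration]:
    section = "unknown"
    found: List[Alteration] = []
    for line in criteria_text.splitlines():
        lower = line.lower()
        # Section headers set the context for all following criteria lines.
        if "inclusion criteria" in lower:
            section = "inclusion"
            continue
        if "exclusion criteria" in lower:
            section = "exclusion"
            continue
        alteration_type = next((t for t in ALTERATION_TERMS if t in lower), None)
        for match in MENTION_RE.finditer(line):
            found.append(Alteration(section, match.group(1), match.group(2), alteration_type))
    return found


if __name__ == "__main__":
    protocol = (
        "Inclusion Criteria:\n"
        "- Histologically confirmed melanoma with BRAF V600E mutation\n"
        "- ERBB2 amplification by FISH\n"
        "Exclusion Criteria:\n"
        "- Known EGFR mutation"
    )
    for alt in extract_alterations(protocol):
        print(alt)
```

Tracking the inclusion/exclusion section matters because the same alteration can make a patient eligible in one trial and ineligible in another.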


For informed consent documents, we have been one of the leading industry/academic partnerships working on the complex problem of extracting information about permissions, human subject protections, data sharing and use, and other major concepts of interest to the clinical research and regulatory community. In late 2019, Melax Tech was awarded an NIH/NCI SBIR contract to extend our prior work on informed consent and develop information models and NLP tools usable by the community at large. To do this, we leveraged our CLAMP technology and demonstrated the use of deep learning (DL) transformer-based NLP models to locate relevant permission attributes in consent material. Using formal semantics based on our team's previous work on the Informed Consent Ontology (ICO), part of the OBO Foundry, these permission attributes can be mapped to formal vocabularies being developed in concert with important groups such as the Global Alliance for Genomics and Health. We are extending this work to extract named entities and relationships from informed consent material. We also developed a mechanism to report on the “completeness” of permission information contained in consent material, laying a foundation for what we term “regulatory decision support” tools aimed at improving study participants' comprehension of the material.
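The idea of a completeness report can be illustrated with a small Python sketch. It assumes that an upstream NLP model (stubbed out here as a plain dictionary) has already assigned consent-form sentences to permission attributes. The labels in `EXPECTED_ATTRIBUTES` are hypothetical placeholders, not actual ICO terms, and the coverage score is simply the fraction of expected attributes that have at least one supporting sentence.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical checklist of permission attributes a consent form is expected to address.
# These labels are illustrative placeholders, not terms from ICO or any formal vocabulary.
EXPECTED_ATTRIBUTES = (
    "data_sharing",
    "future_research_use",
    "recontact",
    "withdrawal",
    "return_of_results",
)


@dataclass
class CompletenessReport:
    present: List[str]
    missing: List[str]
    coverage: float  # fraction of expected attributes found in the document
    evidence: Dict[str, List[str]] = field(default_factory=dict)


def completeness_report(extracted: Dict[str, List[str]]) -> CompletenessReport:
    """`extracted` maps an attribute label to the consent-form sentences that a
    (stubbed) upstream NLP model assigned to it. Unknown labels are ignored."""
    present = [a for a in EXPECTED_ATTRIBUTES if extracted.get(a)]
    missing = [a for a in EXPECTED_ATTRIBUTES if not extracted.get(a)]
    coverage = len(present) / len(EXPECTED_ATTRIBUTES)
    evidence = {a: extracted[a] for a in present}
    return CompletenessReport(present, missing, coverage, evidence)


if __name__ == "__main__":
    extracted = {
        "data_sharing": ["Your coded data may be shared with other researchers."],
        "withdrawal": ["You may withdraw from the study at any time."],
        "recontact": [],  # attribute detected as a label but with no supporting text
    }
    report = completeness_report(extracted)
    print(report.missing)   # ['future_research_use', 'recontact', 'return_of_results']
    print(report.coverage)  # 0.4
```

Keeping the supporting sentences as evidence lets a reviewer check why an attribute was counted as present, rather than trusting a single score.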

Figure 1 shows a high-level overview of the semantic information model derived from this work.


The work to date can support the automatic generation of per-trial “question answering” (Q&A) systems that trial sponsors and study coordinators can use to recruit participants. A Q&A system facilitates patient discussions on mobile and web recruitment platforms. Such systems could also power automated tools on websites for potential trial participants. This is by no means an exhaustive list of potential applications.


Selected relevant publications by our team:

  1. Amith M, Harris MR, Stansbury C, Ford K, Manion FJ, Tao C. Expressing and Executing Informed Consent Permissions Using SWRL: The All of Us Use Case. arXiv preprint arXiv:2108.10221. 2021 Aug 23.

  2. Lin Y, Harris MR, Manion FJ, et al. Development of a BFO-based Informed Consent Ontology (ICO). International Conference on Biomedical Ontology (ICBO); 2014. p. 84-6.

  3. Manion FJ, He Y, Eisenhauer E, Lin Y, Karnovsky A, Harris MR. Towards a Common Semantic Representation of Informed Consent for Biobank Specimens. International Conference on Biomedical Ontology (ICBO); 2014. CEUR Workshop Proceedings. p. 61-3.

  4. Du J, Wang Q, Wang J, Ramesh P, Xiang Y, Jiang X, Tao C. COVID-19 trial graph: a linked graph for COVID-19 clinical trials. Journal of the American Medical Informatics Association. 2021 Sep;28(9):1964-9.


The work introduced here demonstrates Melax Tech’s capabilities in applying sophisticated data science solutions to challenging problems in the biomedical field. To learn more about the availability of this work, request a demo today!