Natural Language Processing for Informed Consent and Clinical Trial Protocol Documents


Clinical trials are crucial for examining the safety and efficacy of new treatment and prevention options. However, patient recruitment and trial data management impose a heavy burden. Natural language processing (NLP), which can extract structured information from unstructured text, has the power to automate and simplify clinical trial recruitment and management.


Melax Tech has provided NLP and semantic modeling capabilities for the textual documents involved in initiating and managing clinical trials. We have done extensive work for both industry and the National Cancer Institute on extracting inclusion/exclusion criteria from clinical protocols and on understanding permissions, privacy protections, and data use restrictions in informed consent forms.


Clinical trials for personalized cancer therapy provide treatments tailored to a patient’s specific characteristics (e.g., genetic status) and have shown great promise for improving outcomes for cancer patients. Hundreds of clinical trials are investigating drugs and drug combinations that target specific genetic alterations in tumors. Melax Tech developed a system based on CLAMP (Clinical Language Annotation, Modeling, and Processing) that extracts genetic alteration information relevant to personalized cancer therapy from trial protocol documents, such as those found on ClinicalTrials.gov. The system can be extended to client applications that need to extract genetic information, as well as inclusion/exclusion criteria, from clinical protocol documents. Easy access to this information can help patients find open trials and help research scientists understand the design and availability of those trials.
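To make the extraction task more concrete, the Python sketch below splits eligibility text into inclusion and exclusion sections and pulls out protein-change mentions such as “BRAF V600E”. It is a toy, rule-based illustration, not the CLAMP-based system, and it assumes ClinicalTrials.gov-style “Inclusion Criteria:” / “Exclusion Criteria:” headers on their own lines and alterations written as a gene symbol followed by an amino-acid change. The pattern and function names (`ALTERATION`, `split_criteria`, `alterations_by_section`) are hypothetical.

```python
import re

# Hypothetical, simplified pattern: a gene-like symbol followed by
# <reference residue><position><alternate residue or stop>, e.g. "BRAF V600E".
# Fusions, amplifications, exon deletions, HGVS syntax, etc. are NOT covered.
ALTERATION = re.compile(
    r"\b([A-Z][A-Z0-9]{1,9})\s+([ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY*])(?!\w)"
)

# Assumes ClinicalTrials.gov-style section headers on their own lines.
HEADER = re.compile(r"(?im)^\s*(inclusion|exclusion) criteria:?\s*$")


def split_criteria(text: str) -> dict[str, str]:
    """Split eligibility text into inclusion and exclusion sections."""
    sections = {"inclusion": "", "exclusion": ""}
    # With one capture group, split() returns
    # [preamble, header1, body1, header2, body2, ...].
    parts = HEADER.split(text)
    for header, body in zip(parts[1::2], parts[2::2]):
        sections[header.lower()] = body.strip()
    return sections


def find_alterations(text: str) -> list[tuple[str, str]]:
    """Return (gene, protein change) pairs matched by ALTERATION."""
    return ALTERATION.findall(text)


def alterations_by_section(text: str) -> dict[str, list[tuple[str, str]]]:
    """Hypothetical helper: alterations mentioned in each eligibility section."""
    return {name: find_alterations(body) for name, body in split_criteria(text).items()}


if __name__ == "__main__":
    sample = """Inclusion Criteria:
- Histologically confirmed melanoma with BRAF V600E mutation
- Age >= 18 years

Exclusion Criteria:
- Prior treatment with an EGFR T790M-targeted therapy
"""
    print(alterations_by_section(sample))
```

Run on the sample, this should print `{'inclusion': [('BRAF', 'V600E')], 'exclusion': [('EGFR', 'T790M')]}`. Keeping the section split separate from mention extraction means an alteration can be labeled as an eligibility requirement or an exclusion, which is the distinction a trial-matching tool needs; a production system would replace the regular expression with trained models and normalization to standard gene and variant identifiers.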


For informed consent documents, we have been one of the leading industry/academic partnerships working on the difficult problem of extracting information about permissions, human subject protections, data sharing and use, and other major concepts of interest to the clinical research and regulatory community. In late 2019, Melax Tech was awarded an NIH/NCI SBIR contract to extend our prior work on informed consent and to develop information models and NLP tools usable by the community at large. To do this, we leveraged our CLAMP technology and demonstrated the use of deep learning (DL), transformer-based NLP models to locate relevant permission attributes in consent material. Using formal semantics based on our team’s earlier work on the Informed Consent Ontology (ICO), part of the OBO Foundry, these permission attributes can be mapped to formal vocabularies being developed in concert with important groups such as the Global Alliance for Genomics and Health (GA4GH).


We are extending this work to extract named entities and relationships from informed consent material. We also developed a mechanism to report on the “completeness” of the permission information contained in consent material, laying a foundation for what we term “regulatory decision support” tools aimed at improving study participants’ comprehension of the material. Figure 1 shows a high-level overview of the semantic information model derived from this work.
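As a rough illustration of what a completeness report could compute, the Python sketch below compares the permission attributes found in one consent document against an expected checklist and reports what is covered, what is missing, and the fraction covered. The attribute labels, `EXPECTED_ATTRIBUTES`, and `completeness_report` are hypothetical placeholders, not ICO or GA4GH terms, and the extraction step itself (for example, a transformer-based model) is assumed to have already produced the set of labels.

```python
from dataclasses import dataclass

# Hypothetical checklist of permission-related attributes a consent form might
# be expected to address. Illustrative placeholders only: these are NOT terms
# from the Informed Consent Ontology (ICO) or any GA4GH vocabulary.
EXPECTED_ATTRIBUTES = frozenset({
    "data_sharing",
    "future_research_use",
    "genetic_testing",
    "recontact",
    "withdrawal",
})


@dataclass(frozen=True)
class CompletenessReport:
    covered: frozenset[str]
    missing: frozenset[str]

    @property
    def score(self) -> float:
        """Fraction of expected attributes found (1.0 if nothing is expected)."""
        total = len(self.covered) + len(self.missing)
        return len(self.covered) / total if total else 1.0


def completeness_report(extracted, expected=EXPECTED_ATTRIBUTES) -> CompletenessReport:
    """Compare labels from an upstream extractor against the expected checklist.

    Labels outside the checklist are ignored rather than counted as coverage.
    """
    found = frozenset(extracted) & expected
    return CompletenessReport(covered=found, missing=expected - found)


if __name__ == "__main__":
    # Labels an NLP model might have assigned to passages of one consent form.
    report = completeness_report({"data_sharing", "withdrawal", "recontact", "compensation"})
    print(f"score={report.score:.2f}, missing={sorted(report.missing)}")
```

For the example labels this prints `score=0.60, missing=['future_research_use', 'genetic_testing']`. A plain set comparison keeps the report easy to explain to reviewers; a fuller version would likely map labels to ontology terms, weight attributes by regulatory importance, and link each covered attribute back to its supporting text span.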







The work to date can support the automatic generation of per-trial question answering (Q&A) systems that trial sponsors and study coordinators can use when recruiting participants. Such a Q&A system can facilitate discussions with patients on both mobile and web recruitment platforms. It could also power automated tools on websites for prospective trial participants. These examples are by no means an exhaustive list of potential applications.


The work introduced here demonstrates Melax Tech’s advanced capabilities in bringing sophisticated data science solutions to bear on state-of-the-art problems in the biomedical field. To learn more about the availability of this work, please use the “Contact Us” menu item or send email to contact@melaxtech.com.


Selected relevant publications by our team:

  1. Amith M, Harris MR, Stansbury C, Ford K, Manion FJ, Tao C. Expressing and executing informed consent permissions using SWRL: the All of Us use case. arXiv preprint arXiv:2108.10221. 2021 Aug 23.

  2. Lin Y, Harris MR, Manion FJ, et al. Development of a BFO-based Informed Consent Ontology (ICO). International Conference on Biomedical Ontology (ICBO); 2014. p. 84-6.

  3. Manion FJ, He Y, Eisenhauer E, Lin Y, Karnovsky A, Harris MR. Towards a common semantic representation of informed consent for biobank specimens. International Conference on Biomedical Ontology (ICBO); 2014. CEUR Workshop Proceedings. p. 61-3.

  4. Du J, Wang Q, Wang J, Ramesh P, Xiang Y, Jiang X, Tao C. COVID-19 trial graph: a linked graph for COVID-19 clinical trials. Journal of the American Medical Informatics Association. 2021 Sep;28(9):1964-9.