Recently published work by the School of Biomedical Informatics at the University of Texas, done in collaboration with Melax Tech, demonstrated the feasibility of AI-based approaches that combine deep learning Natural Language Processing (NLP) methods with knowledge graph (KG) technology to build advanced search, retrieval, and visualization capabilities for finding COVID-19 clinical trials.
Given the urgent need to find safe and efficacious treatments for COVID-19, clinical trials are an essential tool for the clinical research community. However, the sheer number and rate of growth of clinical trials now pose a challenge for those looking to join a trial or to enroll patients in one. For example, from the start of the pandemic in late 2019 until October 2020, 3,392 COVID-19 clinical trials were registered in ClinicalTrials.gov. A year later, as of November 21, 2021, there were 7,031 clinical trials, 3,085 of which were actively recruiting.
Our novel approach to searching this material is to construct a knowledge graph from the trials registered on ClinicalTrials.gov that includes both structured information about each trial and information extracted from the study protocol and eligibility criteria. Inclusion/exclusion criteria and protocols are represented as free, unstructured text, so deep learning-based NLP is used to extract named entities and clinical concepts from them. Both the structured information and the concepts extracted from unstructured text are then normalized and used as nodes in a KG, allowing the construction of sophisticated user queries and visualization of the results.
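To make the idea of trials as graph nodes concrete, here is a minimal sketch of how trial records might be turned into subject-relation-object triples. It is an illustration under stated assumptions, not the implementation described in the paper: the trial IDs, field names, and relation names are hypothetical, and the extracted concepts are assumed to be already normalized. Only the two OMOP concept IDs are taken from the example queries in this article.

```python
# Hypothetical trial records. Trial IDs, field names, and relation names are
# invented for illustration; only the two OMOP concept IDs come from the
# example queries in this article.
trials = [
    {
        "id": "TRIAL-A",
        "intervention": "remdesivir",
        "country": "United States",
        "inclusion": [],
        "exclusion": ["OMOP:4299535"],  # pregnant women
    },
    {
        "id": "TRIAL-B",
        "intervention": "hydroxychloroquine",
        "country": "United States",
        "inclusion": ["OMOP:312437"],  # shortness of breath
        "exclusion": [],
    },
]


def build_triples(records):
    """Turn trial records into a set of (subject, relation, object) triples."""
    triples = set()
    for rec in records:
        trial = rec["id"]
        # Structured fields become edges directly.
        triples.add((trial, "has_intervention", rec["intervention"]))
        triples.add((trial, "located_in", rec["country"]))
        # Concepts extracted from free text (assumed already normalized)
        # become edges to shared concept nodes.
        for concept in rec["inclusion"]:
            triples.add((trial, "includes", concept))
        for concept in rec["exclusion"]:
            triples.add((trial, "excludes", concept))
    return triples


graph = build_triples(trials)
for triple in sorted(graph):
    print(triple)
```

Because both trials point at the same "United States" node, and any trials sharing a concept would point at the same concept node, questions that span structured and extracted information reduce to following edges in one graph.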
During evaluation, our methods demonstrated high precision and recall in retrieving relevant trials from ClinicalTrials.gov, both for queries on the eligibility criteria and for searches involving the structured trial information. Melax Tech’s AI-based NLP system, CLAMP, was used for the development and NLP work on this project.
This method supports diverse search queries and provides graph-based visualization of COVID-19 clinical trials. For example, the following are some of the use cases on which we evaluated the system.
Case query 1: Retrieve all COVID-19 clinical trials that target “remdesivir” as the intervention.
Case query 2: Retrieve all COVID-19 clinical trials that target “remdesivir” as the intervention but exclude pregnant women [OMOP ID: 4299535] from participating.
Case query 3: Retrieve all COVID-19 clinical trials that target “hydroxychloroquine” as the intervention and allow patients with shortness of breath [OMOP ID: 312437] to participate.
Case query 4: Retrieve all COVID-19 clinical trials in the United States that target “hydroxychloroquine” as the intervention and allow patients with diabetes to participate.
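As a rough illustration of how case queries like these map onto graph lookups, the sketch below filters a small, made-up set of triples. The trial IDs, relation names, and the `find_trials` helper are hypothetical and are not the system's actual query interface. The sketch also simplifies "allow patients with X" to an explicit inclusion edge, which a real system may handle differently.

```python
from collections import defaultdict

# Hypothetical triples; trial IDs and relation names are invented.
# The OMOP concept IDs are the ones quoted in the case queries.
TRIPLES = [
    ("TRIAL-A", "has_intervention", "remdesivir"),
    ("TRIAL-A", "excludes", "OMOP:4299535"),  # pregnant women
    ("TRIAL-B", "has_intervention", "remdesivir"),
    ("TRIAL-C", "has_intervention", "hydroxychloroquine"),
    ("TRIAL-C", "includes", "OMOP:312437"),  # shortness of breath
    ("TRIAL-C", "located_in", "United States"),
]


def index_by_trial(triples):
    """Group edges as trial -> relation -> set of target nodes."""
    index = defaultdict(lambda: defaultdict(set))
    for subj, rel, obj in triples:
        index[subj][rel].add(obj)
    return index


def find_trials(index, **required):
    """Return sorted trial IDs that have every required relation=object edge."""
    return sorted(
        trial
        for trial, edges in index.items()
        if all(obj in edges.get(rel, set()) for rel, obj in required.items())
    )


index = index_by_trial(TRIPLES)

# Case query 1: remdesivir as the intervention.
query1 = find_trials(index, has_intervention="remdesivir")

# Case query 2: remdesivir, and pregnant women are excluded.
query2 = find_trials(index, has_intervention="remdesivir", excludes="OMOP:4299535")

# Case query 3: hydroxychloroquine, and shortness of breath is an inclusion concept.
query3 = find_trials(
    index, has_intervention="hydroxychloroquine", includes="OMOP:312437"
)

print(query1, query2, query3)
```

Case query 4 would follow the same pattern by adding a `located_in="United States"` condition and the concept ID for diabetes, which is not given here.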
Note that the high-dimensional graph embedding vectors we produce can also benefit many downstream applications, such as predicting a trial's end recruitment status and comparing the similarity of trials. This methodology is generalizable to clinical trials in other areas, such as oncology.
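One common way to compare trials through their embedding vectors is cosine similarity. The sketch below assumes that measure purely for illustration, since this article does not state which one is used. The three-dimensional vectors and trial IDs are made up, and real graph embeddings would have many more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up embeddings for three hypothetical trials.
embeddings = {
    "TRIAL-A": [0.9, 0.1, 0.0],
    "TRIAL-B": [0.8, 0.2, 0.1],
    "TRIAL-C": [0.0, 0.1, 0.9],
}


def most_similar(trial_id, vectors):
    """Return the ID of the other trial whose embedding is closest to trial_id."""
    query = vectors[trial_id]
    scores = {
        other: cosine_similarity(query, vec)
        for other, vec in vectors.items()
        if other != trial_id
    }
    return max(scores, key=scores.get)


print(most_similar("TRIAL-A", embeddings))
```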
A full description of our methodology and results is given in “COVID-19 trial graph: a linked graph for COVID-19 clinical trials.” Jingcheng Du, Qing Wang, Jingqi Wang, Prerana Ramesh, Yang Xiang, Xiaoqian Jiang, Cui Tao. Journal of the American Medical Informatics Association, Volume 28, Issue 9, September 2021, Pages 1964–1969, PMCID: PMC8135317, https://doi.org/10.1093/jamia/ocab078