At the 17th International Conference on Computational Intelligence Methods for Bioinformatics and Biostatistics (CIBB 2021), a team led by our colleagues at the University of Texas School of Biomedical Informatics presented a paper on a system that uses knowledge graphs to explore the relationships between diet and neurodegenerative diseases. Melax Tech's Director of NLP Research, Dr. Jingcheng Du, was a member of the team. The system applies literature mining to construct a knowledge graph for neurodegenerative diseases and then uses that graph to explore relationships with diet.
To construct the system, abstracts from 4,300 peer-reviewed publications relevant to both diet and neurodegenerative diseases were selected from PubMed at the National Library of Medicine (NLM) using the NIH PubTator system, which annotates biomedical concepts in the text. The resulting annotations were then used to construct a knowledge graph in Neo4j (Figure 1). Node2vec was used to train graph embedding vectors, which were subsequently used to support concept identification and the development of potential concept-level clusters. The embeddings were visualized using t-distributed stochastic neighbor embedding (t-SNE). Our findings suggest that several diet-derived chemicals may have an impact on neurodegenerative disease. A t-SNE plot of the node embeddings is shown in Figure 2 below.
Figure 1: The constructed knowledge graph for neurodegenerative diseases and diet.
Figure 2: t-SNE visualization of the node embeddings. Clear clusters of diseases can be seen in the lower part of the figure. Several diseases at the bottom right are separated from the majority; we found that these are diseases that spread among both animals and humans, for example, Chronic Wasting Disease, Creutzfeldt-Jakob Syndrome, and Prion Disease.
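To make the pipeline above more concrete, here is a minimal sketch of two of its steps: turning concept annotations into a graph, and sampling node2vec-style biased random walks over it. It is an illustration under stated assumptions, not the implementation used in the paper. The toy abstracts, the rule that two concepts are linked when they are tagged in the same abstract, the in-memory dictionary standing in for Neo4j, and all function names and parameter values are hypothetical.

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical toy input: each abstract reduced to the set of concepts
# that an annotator such as PubTator might tag in it.
ANNOTATED_ABSTRACTS = [
    {"Alzheimer Disease", "curcumin", "amyloid beta"},
    {"Parkinson Disease", "caffeine", "alpha-synuclein"},
    {"Alzheimer Disease", "caffeine"},
    {"Prion Disease", "Creutzfeldt-Jakob Syndrome"},
]


def build_cooccurrence_graph(abstracts):
    """Undirected graph; an edge links two concepts tagged in the same abstract.

    Assumption: co-occurrence is used here only for illustration. The paper
    may define edges differently.
    """
    graph = defaultdict(set)
    for concepts in abstracts:
        for a, b in combinations(sorted(concepts), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def node2vec_walk(graph, start, length, p, q, rng):
    """Sample one second-order biased random walk in the style of node2vec."""
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(graph[current])
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so it is uniform.
            walk.append(rng.choice(neighbors))
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # step back to the previous node
            elif candidate in graph[previous]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


def sample_walks(graph, walks_per_node=5, length=6, p=1.0, q=0.5, seed=0):
    """Start several walks from every node, using a seeded generator."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in sorted(graph):
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks


graph = build_cooccurrence_graph(ANNOTATED_ABSTRACTS)
walks = sample_walks(graph)
```

In a full node2vec setup, walks like these would be passed to a skip-gram model to learn one embedding vector per concept, and those vectors could then be projected to two dimensions with t-SNE for a plot like Figure 2. Both of those steps need third-party libraries and are left out of this sketch.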
The development of this system was motivated by a desire to add to the tools available to both clinicians and researchers for studying potential relationships between diseases and diet. Using literature mining to build the knowledge graph allows us to encode biomedical concepts derived from the literature directly into the graph. Consequently, the tool allows researchers to study large amounts of literature rapidly and, as demonstrated in this work, can reveal potential relationships that are otherwise hard to discover. The framework of our system is flexible and could, for example, support the linking of sparse knowledge from rapidly expanding literature resources, helping to uncover new relationships and knowledge. The system could also be repurposed for other diseases and applied to problems in therapeutic discovery, clinical decision support, and drug repurposing.
A full description of our methodology and results is given in Yi Nian, Jingcheng Du, Larry Bu, Fang Li, Xinyue Hu, Yuji Zhang, and Cui Tao, “Knowledge Graph-based Neurodegenerative Diseases and Diet Relationship Discovery,” in Proceedings of CIBB 2021 (Computational Intelligence Methods for Bioinformatics and Biostatistics), 15-17 November 2021, online. Until the conference proceedings are published, a preprint of the work may be obtained from https://arxiv.org/abs/2109.06123.