
Facilitate Biomedical Data Re-use with Natural Language Processing


Biomedical research produces complex datasets that range from the molecular level to individuals and populations. Many data repositories have been created to house these datasets and make them available to the research community for reuse. These repositories greatly improve the availability and utility of biomedical data. However, dataset metadata is often not standardized, which makes the datasets less compliant with the FAIR principles (Findable, Accessible, Interoperable, and Reusable).


In a project funded by the National Institute of Allergy and Infectious Diseases (NIAID), Melax Tech proposed to tackle this challenge for immunology research. We developed natural language processing (NLP) and ontology-based methods and tools that extract and normalize the metadata of immunology datasets, making them easier to discover through both general-purpose and specialized search engines. We extended our flagship NLP product, CLAMP, to extract biomedical entities such as genes, diseases, and drugs from dataset descriptions and to normalize them to standard biomedical ontologies, which makes the datasets interoperable. Over 20,000 immunology datasets, along with their linked publications, were processed by CLAMP and indexed through a public search engine. Because the framework automatically extracts and normalizes dataset metadata and can feed it into any search engine, it makes existing datasets much easier to reuse, increasing biomedical research productivity and reproducibility.
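
The page does not describe how CLAMP performs extraction and normalization internally. To illustrate what "extract and normalize" means in principle, here is a minimal sketch using a plain dictionary lookup, a deliberately simplified stand-in rather than CLAMP's method. Different surface forms of the same concept (for example "IL-6" and "interleukin 6") are mapped to a single concept identifier, so a search for that concept finds every dataset that mentions it under any name. The lexicon, the function name normalize_mentions, and the concept IDs such as GENE:0001 are hypothetical placeholders, not real ontology identifiers or a real API.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: surface form -> (entity type, concept ID).
# The IDs are illustrative placeholders, NOT real ontology identifiers.
LEXICON: Dict[str, Tuple[str, str]] = {
    "interleukin 6": ("gene", "GENE:0001"),
    "il-6": ("gene", "GENE:0001"),
    "il6": ("gene", "GENE:0001"),
    "rheumatoid arthritis": ("disease", "DISEASE:0001"),
    "tocilizumab": ("drug", "DRUG:0001"),
}


def normalize_mentions(text: str) -> List[Tuple[str, str, str]]:
    """Return (matched text, entity type, concept ID) for each lexicon hit, in text order."""
    taken = [False] * len(text)  # marks characters already covered by a match
    hits = []
    # Try longer surface forms first so they win over shorter overlapping entries.
    for surface in sorted(LEXICON, key=len, reverse=True):
        # Require non-word characters (or string edges) on both sides of the match.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match found earlier
            for i in range(m.start(), m.end()):
                taken[i] = True
            entity_type, concept_id = LEXICON[surface]
            hits.append((m.start(), m.group(0), entity_type, concept_id))
    hits.sort()  # order by position in the text
    return [(span, etype, cid) for _, span, etype, cid in hits]


if __name__ == "__main__":
    description = "Serum IL-6 levels in rheumatoid arthritis patients treated with tocilizumab."
    for mention in normalize_mentions(description):
        print(mention)
```

Running the example prints one tuple per mention, for instance ('IL-6', 'gene', 'GENE:0001'). A production system would replace the hand-written lexicon with standard biomedical ontologies and use context to resolve ambiguous abbreviations, which simple string matching cannot do.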


Melax Tech used AI to standardize the metadata of over 20,000 biomedical datasets, making them more accessible and discoverable and improving productivity and reproducibility in biomedical research. To learn more about how CLAMP makes datasets more interoperable, request a demo today.
