NLP PIPELINES

Diseases and Symptoms (mapped to SNOMED-CT)

This pipeline extracts patients’ medical problems, such as diseases and symptoms, from clinical reports, together with associated modifiers including “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location”. The recognized diseases and symptoms are mapped to SNOMED-CT concept IDs.

Comprehensive Clinical Information

This is the most comprehensive pipeline from Melax. It recognizes four primary clinical entities in clinical notes: “medical problems”, “medications”, “treatments”, and “lab tests”. It also recognizes their modifiers: (1) “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location” for “medical problems”; (2) “form”, “dosage”, “strength”, “route”, “duration”, and “frequency” for “medications”; (3) “negation” for “treatments”; and (4) “negation” and “value” for “lab tests”. All “temporal information” associated with the primary entities is extracted as well. In addition, all extracted primary entities are mapped to standard codes in the corresponding medical terminologies: SNOMED-CT for “medical problems”, RxNorm for “medications”, ICD-10-PCS for “treatments”, and LOINC for “lab tests”.

Biomedical Entity Extraction

This pipeline extracts gene, chemical, and disease mentions from MEDLINE titles and abstracts.

Opioid

This pipeline extracts opioid-related medications and their dosage information, then converts the doses to morphine milligram equivalents (MME) to support opioid overdose recognition.
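The MME conversion step can be illustrated with a minimal Python sketch. It is not the pipeline's actual implementation: the function names, the input shape, and the small factor table are assumptions made for illustration. The factors follow commonly published CDC oral conversion values and should be checked against the current official table before any real use.

```python
# Illustrative oral conversion factors (assumed; verify against the
# current CDC table before relying on them).
MME_FACTORS = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
    "codeine": 0.15,
    "oxymorphone": 3.0,
}


def daily_mme(drug: str, strength_mg: float, doses_per_day: float) -> float:
    """Return the daily MME for one opioid prescription.

    daily MME = strength per dose (mg) x doses per day x conversion factor
    """
    factor = MME_FACTORS[drug.lower()]  # raises KeyError for unknown drugs
    return strength_mg * doses_per_day * factor


def total_daily_mme(prescriptions) -> float:
    """Sum daily MME over (drug, strength_mg, doses_per_day) tuples."""
    return sum(daily_mme(drug, mg, n) for drug, mg, n in prescriptions)


if __name__ == "__main__":
    # Hypothetical extracted medications, e.g. oxycodone 10 mg four times a day.
    extracted = [("oxycodone", 10, 4), ("hydrocodone", 5, 2)]
    print(total_daily_mme(extracted))
```

In this sketch the extraction step is assumed to have already produced the drug name, strength, and frequency; the conversion itself is a simple multiplication and sum.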

Language Barriers

This pipeline recognizes the patient's primary language and the patient's level of fluency in each language mentioned.

Intimate Partner Violence (IPV)

This pipeline identifies behaviours related to family violence, such as kicking, hitting, and insulting, from clinical notes.

Demographics

This pipeline extracts patient demographic information, including gender, age, and ethnicity, from clinical notes.

Alcohol Status

This pipeline recognizes alcohol consumption of patients from clinical notes.

Smoking Status

This pipeline extracts mentions of smoking status of patients in clinical notes and classifies them into three categories: current smoker, past smoker, and non-smoker.

Human Phenotype Ontology (HPO) Concepts

This pipeline recognizes HPO terms in clinical text and maps them to HPO codes.

Bleeding Events

This pipeline automatically identifies clopidogrel-induced bleeding events from clinical notes. Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5543340/

Stressor Information Extraction

This pipeline extracts diverse types of stressors (e.g., lost job, family violence, financial difficulty) from psychiatric notes. Reference: https://link.springer.com/chapter/10.1007/978-3-319-60045-1_41

COVID-19 Signs and Symptoms

This pipeline extracts COVID-19-related signs and symptoms defined by WHO, as well as eight associated attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course), from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership Common Data Model. Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7480086/

Colorectal Cancer Cases

This pipeline identifies colorectal cancer cases from multiple types of clinical notes. Reference: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3243156/

Lung Cancer Metastases Status

This pipeline extracts metastases-related information from pathology reports of lung cancer patients, including histological type, grade, specimen site, metastatic status indicators, and the procedure.

Cancer Information in Pathology Notes

This pipeline extracts comprehensive types of cancer-related information in pathology reports (e.g., tumor size, tumor stage, and biomarkers). Reference: https://pubmed.ncbi.nlm.nih.gov/31438083/

De-identification

This pipeline recognizes Protected Health Information (PHI), such as patient names, doctor names, addresses, and dates. It also provides two types of post-processing: replacing the recognized PHI with placeholders, or replacing it with synthetic (fake) data. It can also shift dates at the patient level.
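The placeholder replacement and patient-level date shifting described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline's actual implementation: the function names, the (start, end, type) span format, the ISO date pattern, and the hash-based offset are all hypothetical choices made for the example.

```python
import hashlib
import re
from datetime import datetime, timedelta

# Assumption: dates appear in ISO form (YYYY-MM-DD); real notes vary widely.
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def replace_with_placeholders(text, spans):
    """Replace recognized PHI spans with typed placeholders.

    spans: non-overlapping (start, end, phi_type) character offsets.
    """
    # Work from right to left so earlier offsets remain valid.
    for start, end, phi_type in sorted(spans, reverse=True):
        text = text[:start] + f"[{phi_type}]" + text[end:]
    return text


def patient_offset_days(patient_id, max_days=365):
    """Derive a fixed offset (1..max_days) from the patient ID.

    Assumption: a plain hash is used only to keep the sketch deterministic;
    a real system would use a secret key or a stored random offset.
    """
    digest = hashlib.sha256(patient_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % max_days + 1


def shift_dates(text, patient_id):
    """Shift every ISO date in text by the same per-patient offset."""
    offset = timedelta(days=patient_offset_days(patient_id))

    def _shift(match):
        try:
            original = datetime.strptime(match.group(0), "%Y-%m-%d")
        except ValueError:
            return match.group(0)  # leave invalid dates such as 2020-13-45 untouched
        return (original + offset).strftime("%Y-%m-%d")

    return DATE_PATTERN.sub(_shift, text)


if __name__ == "__main__":
    note = "John Smith seen on 2020-01-15."
    print(replace_with_placeholders(note, [(0, 10, "NAME"), (19, 29, "DATE")]))
    print(shift_dates(note, "patient-001"))
```

Because every date for a given patient moves by the same number of days, intervals between events in that patient's record are preserved while the true dates are hidden.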

Procedures and Other Treatments

This pipeline extracts procedures, as well as other non-medication treatments for patients from clinical reports. Recognized entities will be mapped to ICD-10 procedure codes.

Medication and Signature Information (mapped to RxNorm)

This pipeline identifies mentions of medications, as well as their signature information, including “form”, “dosage”, “strength”, “route”, “duration”, and “frequency”, from clinical reports. It then maps the recognized medication and signature information to RxNorm codes.

Lab Tests (mapped to LOINC)

This pipeline recognizes lab test-related information from clinical reports. Examples of lab tests include panels and tests run on body fluids, procedures performed on a patient (such as x-rays and biopsies), and vital signs. It also extracts numeric values associated with lab tests. Extracted lab test entities are mapped to LOINC codes where applicable.

Diseases (mapped to ICD-10)

This pipeline extracts disease mentions from clinical reports, together with associated modifiers including “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location”. The recognized diseases are mapped to ICD-10-CM codes.