NLP Pipelines

With Melax Tech NLP pipelines, users can amass information on any number of conditions, co-morbidities, or cohorts within an unstructured data set.

Diseases and Symptoms (mapped to SNOMED-CT)

This pipeline extracts patients’ medical problems, such as diseases and symptoms, together with associated modifiers, including “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location”, from clinical reports. The recognized diseases/symptoms will be mapped to SNOMED-CT concept IDs.


This pipeline extracts opioid-related medications and dosage information, then converts them to morphine milligram equivalents (MME) for opioid overdose recognition.
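As a rough illustration of the MME conversion step, the sketch below multiplies a daily opioid dose by a per-drug conversion factor. It is not Melax's implementation: the function name `daily_mme` and the table `MME_FACTORS` are hypothetical, and the factors are commonly published oral conversion values included only as examples, which should be checked against a current clinical reference before any real use.

```python
# Illustrative oral conversion factors (assumption: example values only,
# to be verified against a current clinical reference).
MME_FACTORS = {
    "morphine": 1.0,
    "hydrocodone": 1.0,
    "oxycodone": 1.5,
    "codeine": 0.15,
    "oxymorphone": 3.0,
}


def daily_mme(drug: str, strength_mg: float, doses_per_day: float) -> float:
    """Return the daily morphine milligram equivalents for one prescription.

    daily MME = strength per dose (mg) x doses per day x conversion factor
    """
    factor = MME_FACTORS[drug.lower()]  # raises KeyError for unknown drugs
    return strength_mg * doses_per_day * factor


# Oxycodone 10 mg taken three times a day: 10 * 3 * 1.5
print(daily_mme("oxycodone", 10, 3))  # 45.0
```

In a pipeline like the one described, the drug name, strength, and frequency would come from the extracted medication and dosage entities rather than being passed in by hand.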

Language Barriers

This pipeline recognizes a patient's primary language and the patient's fluency level in each language mentioned.

Human Phenotype Ontology (HPO) Concepts

This pipeline will recognize HPO terms in clinical text and map them into HPO codes.

COVID-19 Signs and Symptoms

This pipeline extracts COVID-19-related signs and symptoms defined by the WHO, along with eight associated attributes (body location, severity, temporal expression, subject, condition, uncertainty, negation, and course), from clinical text. The extracted information is also mapped to standard concepts in the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Reference:

Cancer Information in Pathology Notes

This pipeline extracts a wide range of cancer-related information from pathology reports (e.g., tumor size, tumor stage, and biomarkers). Reference:

Procedures and Other Treatments

This pipeline extracts procedures, as well as other non-medication treatments for patients from clinical reports. Recognized entities will be mapped to ICD-10 procedure codes.

Diseases (mapped to ICD-10)

This pipeline extracts disease mentions, together with associated modifiers, including “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location”, from clinical reports. The recognized diseases are mapped to ICD-10-CM codes.

Comprehensive Clinical Information

This is the most comprehensive pipeline from Melax. It recognizes four primary clinical entities in clinical notes: “medical problems”, “medications”, “treatments”, and “lab tests”. It also extracts their modifiers: (1) “negation”, “severity”, “uncertainty”, “condition”, “subject”, and “body location” for “medical problems”; (2) “form”, “dosage”, “strength”, “route”, “duration”, and “frequency” for “medications”; (3) “negation” for “treatments”; and (4) “negation” and “value” for “lab tests”. All “temporal information” associated with the primary entities is extracted as well. In addition, every extracted primary entity is mapped to a standard code in the corresponding medical terminology: SNOMED-CT for “medical problems”, RxNorm for “medications”, ICD-10-PCS for “treatments”, and LOINC for “lab tests”.
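To make the entity, modifier, and terminology structure described above concrete, here is a minimal sketch of how one extracted entity might be represented. The class `Entity`, its field names, and the modifier value `"negated"` are assumptions made for illustration and do not reflect Melax's actual output format. The code field is deliberately left empty rather than showing an invented concept ID.

```python
from dataclasses import dataclass, field
from typing import Optional

# Target terminology for each primary entity type, as listed in the description.
TERMINOLOGY = {
    "medical problem": "SNOMED-CT",
    "medication": "RxNorm",
    "treatment": "ICD-10-PCS",
    "lab test": "LOINC",
}


@dataclass
class Entity:
    """One extracted mention (hypothetical schema, for illustration only)."""
    text: str                  # surface text of the mention
    entity_type: str           # one of the keys in TERMINOLOGY
    start: int                 # character offset where the mention starts
    end: int                   # character offset just past the mention
    modifiers: dict[str, str] = field(default_factory=dict)
    code: Optional[str] = None  # standard code, left unset in this sketch

    @property
    def terminology(self) -> str:
        """Name of the terminology this entity type is mapped to."""
        return TERMINOLOGY[self.entity_type]


note = "Patient denies chest pain."
problem = Entity("chest pain", "medical problem", 15, 25, {"negation": "negated"})

print(note[problem.start:problem.end])  # chest pain
print(problem.terminology)              # SNOMED-CT
```

Keeping modifiers in a dictionary lets each entity type carry its own set (for example “route” and “frequency” for medications) without a separate class per type.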

Intimate Partner Violence (IPV)

This pipeline identifies family violence-related behaviors such as kicking, hitting, and insulting from clinical notes.

Smoking Status

This pipeline extracts mentions of the smoking status of patients in clinical notes and classifies them into three categories: current smoker, past smoker, and non-smoker.

Stressor Information Extraction

This pipeline extracts diverse types of stressors (e.g., lost job, family violence, financial difficulty) from psychiatry notes. Reference:

Colorectal Cancer Cases

This pipeline identifies colorectal cancer cases from multiple types of clinical notes. Reference:

Medication and Signature Information (mapped to RxNorm)

This pipeline identifies mentions of medications as well as their signature information including “form”, “dosage”, “strength”, “route”, “duration”, and “frequency”, from clinical reports. It then maps recognized medication/signature information to RxNorm codes.



Biomedical Entity Extraction

This pipeline extracts genes, chemicals, and diseases from MEDLINE titles/abstracts.


This pipeline extracts patients’ demographic information, including gender, age, and ethnicity, from clinical notes.

Lab Tests (mapped to LOINC)

This pipeline recognizes lab test-related information in clinical reports. Examples include panels and tests run on body fluids; procedures performed on a patient, such as x-rays and biopsies; and vital signs. It also extracts the numeric values associated with lab tests. Extracted lab test entities are mapped to LOINC codes where applicable.

Alcohol Status

This pipeline recognizes alcohol consumption of patients from clinical notes.

Bleeding Events

This pipeline automatically identifies clopidogrel-induced bleeding events from clinical notes. Reference:

Lung Cancer Metastases Status

This pipeline extracts metastasis-related information from pathology reports of lung cancer patients, including histological type, grade, specimen site, metastatic status indicators, and procedure.


This pipeline recognizes Protected Health Information (PHI), including patient names, doctor names, addresses, and dates. It also provides two types of post-processing: replacing the recognized PHI with placeholders, or replacing it with synthetic (fake) data. It can also shift dates at the patient level.
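The two post-processing ideas that are simplest to show, placeholder replacement and patient-level date shifting, can be sketched as follows. This is an illustrative approach under stated assumptions, not the pipeline's actual method: the function names, the `(start, end, phi_type)` span format, and the use of a keyed hash to derive a per-patient offset are all hypothetical choices.

```python
import hashlib
from datetime import date, timedelta


def replace_with_placeholders(text, spans):
    """Replace each PHI span with a bracketed type tag such as [NAME].

    spans: non-overlapping (start, end, phi_type) character offsets.
    Spans are processed right to left so earlier offsets stay valid.
    """
    out = text
    for start, end, phi_type in sorted(spans, reverse=True):
        out = out[:start] + f"[{phi_type}]" + out[end:]
    return out


def patient_offset_days(patient_id, secret, max_days=365):
    """Derive a stable offset of 1..max_days days for one patient."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    return int.from_bytes(digest[:4], "big") % max_days + 1


def shift_date(d, patient_id, secret):
    """Shift a date back by the patient's offset.

    Every date for the same patient moves by the same amount, so the
    intervals between that patient's events are preserved.
    """
    return d - timedelta(days=patient_offset_days(patient_id, secret))


text = "John Smith was seen on 2020-01-05."
spans = [(0, 10, "NAME"), (23, 33, "DATE")]
print(replace_with_placeholders(text, spans))  # [NAME] was seen on [DATE].
```

Deriving the offset from the patient identifier and a secret, rather than storing a random number per patient, is one way to keep the shift consistent across documents without a lookup table.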