What is clinical natural language processing?
Clinical natural language processing (NLP), also known as Medical NLP, Health NLP, or Biomedical NLP, refers to automated methods to manipulate human language (i.e., spoken or textual narratives) in the biomedical domain, ranging from natural language understanding to natural language generation tasks. Other terms, such as computational linguistics and text mining, are often used interchangeably with NLP. However, they may have slightly different focuses (e.g., computational linguistics is more about modeling language using computational technologies, and text mining aims to apply NLP technologies to extensive textual collections to derive insights).
The terms Clinical NLP and Medical NLP often refer to NLP for clinical documents in electronic health records (EHRs), such as discharge summaries, pathology reports, and radiology reports. Health NLP and Biomedical NLP, by contrast, can cover broader types of textual documents related to healthcare or life science, e.g., biomedical literature, clinical trial documents, drug labels, social media, and online forums.
What are the different types of NLP tasks from a linguistics aspect?
From the linguistics aspect, NLP tasks can be organized by the type of linguistic information they aim to understand, including:
Phonology: focusing on understanding sounds, e.g., the automatic speech recognition task
Morphology: about how words are composed from atomic units, which can support tasks such as stemming
Lexicography: to understand words, e.g., part-of-speech tagging to determine if a word is a noun or a verb
Syntax: how words form a sentence (i.e., grammatical structures such as noun phrases), often involving the task of syntactic parsing
Semantics: to understand the meaning of words, phrases, and sentences, which is often related to information extraction tasks such as named entity recognition (NER) and relation extraction (RE)
Pragmatics: about how context or world knowledge can be used to interpret the meaning of texts, e.g., co-reference resolution
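As a rough illustration of the lower linguistic levels listed above, the sketch below tokenizes a sentence into words and then strips a few common suffixes to approximate stemming. The suffix list, the `tokenize` and `stem` helpers, and the sample sentence are all assumptions made for this example; production stemmers apply far more rules than this.

```python
import re

# Hypothetical, deliberately tiny suffix list (longest first).
# Real stemmers use many more rules and exceptions.
SUFFIXES = ("ations", "ation", "ings", "ing", "ed", "es", "s")


def tokenize(text):
    """Split text into lowercase alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def stem(word, min_stem_len=3):
    """Strip the first matching suffix, keeping at least min_stem_len characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem_len:
            return word[: -len(suffix)]
    return word


tokens = tokenize("Patient reported coughing and fevers")
stems = [stem(t) for t in tokens]
print(stems)  # ['patient', 'report', 'cough', 'and', 'fever']
```

Higher levels such as syntax, semantics, and pragmatics generally need trained models or richer rule sets and are not covered by this sketch.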
What are the common NLP tasks in the biomedical domain?
From the application aspect, common NLP tasks in the biomedical domain include:
information retrieval (IR) – to search a text collection and find documents relevant to a user-specified query, e.g., PubMed for searching biomedical articles;
text/document classification – to assign a document to one or more defined labels, e.g., if an email is spam or not;
information extraction (IE) – to extract specific information of interest from a document, e.g., diseases mentioned in a patient report. Many clinical NLP systems currently target the IE task. In addition to extracting specific types of entities such as diseases (the NER task), contextual information around entities (e.g., negation, certainty, temporal information about a disease entity) is also needed, which can be tackled as a relation extraction task. Furthermore, recognized entities often need to be mapped to concept codes in standard medical terminologies (e.g., diseases to ICD-10-CM codes), which is an entity linking or concept mapping task;
advanced applications – tasks such as language translation, text summarization, question answering, and conversational bots, which often build on other tasks such as IR and IE.
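The information extraction steps described above (recognizing disease entities, capturing context such as negation, and mapping entities to codes) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular clinical NLP system works: the three-entry `LEXICON`, the `NEGATION_CUES` list, the `extract_diseases` function, and the sample note are all hypothetical, and negation is handled here with a simple cue-word rule rather than as a relation extraction task.

```python
import re

# Hypothetical mini-lexicon mapping disease mentions to ICD-10-CM codes.
LEXICON = {
    "pneumonia": "J18.9",
    "type 2 diabetes": "E11.9",
    "hypertension": "I10",
}

# Hypothetical cue words; a cue anywhere earlier in the sentence negates the entity.
NEGATION_CUES = ("no", "denies", "without", "negative for")
_CUE_RE = re.compile(r"\b(?:" + "|".join(map(re.escape, NEGATION_CUES)) + r")\b")


def extract_diseases(text):
    """Return one dict per lexicon term found, with its code and negation flag."""
    results = []
    # Naive sentence splitting on '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", text.lower()):
        for term, code in LEXICON.items():
            match = re.search(r"\b" + re.escape(term) + r"\b", sentence)
            if match is None:
                continue
            preceding = sentence[: match.start()]
            negated = _CUE_RE.search(preceding) is not None
            results.append({"entity": term, "code": code, "negated": negated})
    return results


note = (
    "Patient has a history of hypertension and type 2 diabetes. "
    "Chest X-ray shows no pneumonia."
)
findings = extract_diseases(note)
for finding in findings:
    print(finding)
```

On the sample note, hypertension and type 2 diabetes are reported as affirmed and pneumonia as negated. A dictionary lookup like this misses abbreviations, misspellings, and synonyms, and the cue rule over-negates in long sentences, which is part of why trained NER and entity linking models are used in practice.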
What is the primary purpose of NLP?
Over the past two decades, abundant textual data have accumulated in the biomedical domain, e.g., clinical documents in EHRs and biomedical articles in bibliographic databases, which has created high demand for clinical NLP technology. Many new applications have been developed, demonstrating the value of NLP in healthcare and life science, such as:
NLP to extract information from clinical documents to support real-world studies using EHRs
NLP to support clinical decision-making, improving care quality and safety
NLP to identify evidence and generate hypotheses for drug discovery from massive biomedical literature
NLP to monitor public perception of drugs and vaccines by mining social media data
NLP to parse clinical trial protocols for patient recruitment
Clinical NLP is now an essential technology for biomedical data processing and analyses. It applies novel language technology to biomedical textual data and has shown great promise in improving human health.