What is Clinical Natural Language Processing (NLP)?



Introduction to Clinical NLP

Clinical natural language processing (NLP), also known as Medical NLP, Health NLP, or Biomedical NLP, refers to automated methods for processing human language (i.e., spoken or textual narratives) in the biomedical domain, ranging from natural language understanding to natural language generation tasks. Other terms such as computational linguistics and text mining are often used interchangeably with NLP, although their focus differs slightly: computational linguistics is more about modeling language with computational techniques, while text mining applies NLP technologies to large text collections to derive insights.


The terms Clinical NLP and Medical NLP usually refer to NLP for clinical documents in electronic health records (EHRs), such as discharge summaries, pathology reports, and radiology reports. Health NLP and Biomedical NLP, by contrast, can cover a broader range of text related to healthcare or the life sciences, e.g., biomedical literature, clinical trial documents, drug labels, social media posts, online forums, and many more.


From a linguistic perspective, NLP tasks can be organized by the type of linguistic information they aim to understand, including:

  • phonology: focusing on understanding sounds, e.g., the automatic speech recognition task

  • morphology: about how words are composed from atomic units, which can support tasks such as stemming

  • lexical analysis: understanding individual words, e.g., part-of-speech tagging to determine whether a word is a noun or a verb

  • syntax: how words form a sentence (i.e., grammatical structures such as noun phrases), often involving the task of syntactic parsing

  • semantics: understanding the meaning of words, phrases, and sentences, which is often related to information extraction tasks such as named entity recognition (NER) and relation extraction (RE)

  • pragmatics: how context or world knowledge can be used to interpret the meaning of a text, e.g., co-reference resolution
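To make the morphology level concrete, here is a minimal Python sketch of tokenization followed by suffix-stripping stemming. The suffix list and the function names `tokenize` and `naive_stem` are assumptions made for illustration; production stemmers (for example, the Porter stemmer) use much richer rule sets, and clinical text often needs domain-specific handling.

```python
import re

# Hypothetical suffix list for illustration only, ordered longest first
# so that "ations" is tried before "ation", and "ies" before "es" and "s".
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token, min_stem=3):
    """Strip the first matching suffix, keeping at least `min_stem` characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= min_stem:
            return token[: -len(suffix)]
    return token


tokens = tokenize("Patient reports coughing and fevers.")
stems = [naive_stem(t) for t in tokens]
# stems == ["patient", "report", "cough", "and", "fever"]
```

A rule list this small will over- and under-stem many words; it is meant only to show how a morphological rule maps surface forms to a shared base form.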

What are the common NLP tasks in the biomedical domain?

From the application aspect, common NLP tasks in the biomedical domain include:

  1. information retrieval (IR) – to search a text collection and find documents relevant to a user-specified query, e.g., PubMed for searching biomedical articles;

  2. text/document classification – to assign one or more predefined labels to a document, e.g., whether an email is spam or not;

  3. information extraction (IE) – to extract specific information of interest from a document, e.g., diseases mentioned in a patient report. Many clinical NLP systems currently target the IE task. In addition to extracting specific types of entities such as diseases (the NER task), these systems also need contextual information around each entity (e.g., negation, certainty, and temporal information about a disease entity), which can be tackled as a relation extraction task. Furthermore, recognized entities often need to be mapped to concept codes in standard medical terminologies (e.g., diseases to ICD-10-CM codes), which is an entity linking or concept mapping task.

  4. advanced applications – tasks such as language translation, text summarization, question answering, and conversational bots, which often build on other tasks such as IR and IE.
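The three parts of the IE task described in item 3 (entity recognition, context detection, and concept mapping) can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular clinical NLP system works: the three-term lexicon, the negation cue list, the three-token context window, and the function name `extract_entities` are all hypothetical. It also swaps in a simple rule-based cue lookup (in the spirit of NegEx-style approaches) for negation instead of treating it as relation extraction.

```python
import re

# Illustrative lexicon (an assumption): surface term -> ICD-10-CM code.
LEXICON = {
    "pneumonia": "J18.9",        # pneumonia, unspecified organism
    "hypertension": "I10",       # essential (primary) hypertension
    "type 2 diabetes": "E11.9",  # type 2 diabetes mellitus without complications
}

# Illustrative negation cues, looked up in a short window before the entity.
NEGATION_CUES = ("no", "denies", "without", "negative for")


def extract_entities(text, window=3):
    """Dictionary-based NER with rule-based negation and concept mapping."""
    lowered = text.lower()
    results = []
    for term, code in LEXICON.items():
        for match in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # Only look back within the current sentence (naive: split on ".").
            preceding = lowered[: match.start()]
            sentence_start = preceding.rfind(".") + 1
            context_tokens = re.findall(r"[a-z]+", preceding[sentence_start:])
            context = " ".join(context_tokens[-window:])
            negated = any(
                re.search(r"\b" + re.escape(cue) + r"\b", context)
                for cue in NEGATION_CUES
            )
            results.append(
                {"text": term, "start": match.start(), "code": code, "negated": negated}
            )
    return sorted(results, key=lambda entity: entity["start"])


note = "Patient has hypertension. Denies chest pain. No evidence of pneumonia."
for entity in extract_entities(note):
    print(entity["text"], entity["code"], "negated" if entity["negated"] else "affirmed")
```

On this note the sketch reports hypertension (I10) as affirmed and pneumonia (J18.9) as negated. Real systems replace the exact-match dictionary with statistical or neural NER models and much larger terminologies, and they handle sentence boundaries, abbreviations, and cue scope far more carefully.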

What is the main purpose of NLP?

Over the past two decades, abundant textual data have accumulated in the biomedical domain, e.g., clinical documents in EHRs and biomedical articles in bibliographic databases, which has led to high demand for clinical NLP technology. Many new applications have been developed that demonstrate the value of NLP in healthcare and the life sciences, such as (1) extracting information from clinical documents to support real-world studies using EHRs; (2) supporting clinical decision making to improve care quality and safety; (3) identifying evidence and generating hypotheses for drug discovery from the massive biomedical literature; (4) monitoring public perception of drugs and vaccines by mining social media data; and (5) parsing clinical trial protocols for patient recruitment.


Conclusion

In summary, clinical NLP is now an essential technology for biomedical data processing and analysis. It applies novel language technology to biomedical textual data and has shown great promise in improving human health.