Clinical Text Mining

A General Supervised Approach for Segmentation of Clinical Texts


Segmenting clinical texts into logical groups is critical for many tasks, such as medical coding for billing, automatic drafting of discharge summaries, patient problem list generation, and population studies on allergies. Although previous studies have applied supervised approaches to the segmentation of clinical texts, those approaches were trained and tested on fairly limited data sets and showed low adaptability to new, unseen documents. We propose a highly generalized model for segmenting clinical texts, based on a set of line-wise predictions by a classifier, with constraints that enforce coherence among those predictions. Evaluation results on 5 independent test sets show that the proposed approach works across a wide range of note types and performs consistently across different organizations (i.e., hospitals).
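To make the idea of line-wise predictions with a coherence constraint concrete, here is a minimal sketch. It is not the model from the paper: it assumes a classifier has already produced a score per label for each line, and it swaps in a simple Viterbi decoder with a fixed penalty for switching labels between adjacent lines. The function names, the `switch_penalty` parameter, the label names, and the toy scores are all hypothetical.

```python
from typing import Dict, List, Tuple


def decode_coherent_labels(
    line_scores: List[Dict[str, float]], switch_penalty: float = 0.5
) -> List[str]:
    """Pick one label per line, maximizing the summed classifier scores
    minus a penalty for every label change between adjacent lines.

    Assumption: every line has a score for the same set of labels.
    """
    if not line_scores:
        return []
    labels = sorted(line_scores[0])
    best = {lab: line_scores[0][lab] for lab in labels}
    backpointers: List[Dict[str, str]] = []

    for scores in line_scores[1:]:
        new_best: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in labels:
            # Score of arriving at `lab` from each possible previous label.
            def arrival(prev: str) -> float:
                return best[prev] - (0.0 if prev == lab else switch_penalty)

            prev_best = max(labels, key=arrival)
            new_best[lab] = arrival(prev_best) + scores[lab]
            pointers[lab] = prev_best
        backpointers.append(pointers)
        best = new_best

    # Trace back from the best final label.
    path = [max(labels, key=lambda lab: best[lab])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def to_segments(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group consecutive identical labels into (label, first_line, last_line)."""
    segments: List[Tuple[str, int, int]] = []
    for i, lab in enumerate(labels):
        if segments and segments[-1][0] == lab:
            segments[-1] = (lab, segments[-1][1], i)
        else:
            segments.append((lab, i, i))
    return segments


# Toy scores for five lines; line 1 is a noisy prediction.
toy_scores = [
    {"history": 0.9, "medications": 0.1},
    {"history": 0.4, "medications": 0.6},
    {"history": 0.8, "medications": 0.2},
    {"history": 0.2, "medications": 0.8},
    {"history": 0.1, "medications": 0.9},
]
smoothed = decode_coherent_labels(toy_scores, switch_penalty=0.5)
print(to_segments(smoothed))
```

With a penalty of zero the decoder reduces to an independent per-line argmax, so the noisy second line would split the first segment. With the penalty, isolated label flips are absorbed into the surrounding segment.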


Example segmented document:

Presentation Slides

Discovering Related Clinical Concepts Using Large Amounts of Clinical Notes


The ability to find highly related clinical concepts is essential for many applications, such as hypothesis generation, query expansion for medical literature search, search results filtering, and ICD-10 code filtering. While manually constructed medical terminologies such as SNOMED CT can surface certain related concepts, these terminologies are inadequate because they depend on the expertise of several subject matter experts, which leaves the curation process open to geographic and language bias. In addition, these terminologies provide no quantifiable evidence of how related the concepts are. In this work, we explore an unsupervised graphical approach to mining related concepts by leveraging the volume of large collections of clinical notes. Our evaluation shows that this data-driven approach can discover highly related concepts for various search terms, including medications, symptoms, and diseases.

Mining Related Concepts

The Concept-Graph is used to mine related clinical terminology. For example, if the query term is advair, related concepts can include singulair, combivent, inhaler, and nebs. The Concept-Graph is an undirected graph in which each node represents a concept and each link between two nodes indicates the presence of a relationship between the two concepts. The results of this work were evaluated by experts in the medical field.

A similar graph data structure has been used for text summarization tasks.
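The Concept-Graph described above can be illustrated with a small sketch. The description here does not say how links are created or weighted, so this example assumes, purely for illustration, that two concepts are linked when they appear in the same note and that the link weight is the number of notes in which they co-occur. The function names and the toy notes are hypothetical, and concept extraction from raw text is assumed to have happened already.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Dict[str, int]]


def build_concept_graph(notes: Iterable[Set[str]]) -> Graph:
    """Build an undirected weighted graph from per-note concept sets.

    Assumption: the weight of a link is the number of notes in which
    the two concepts co-occur.
    """
    graph: Graph = defaultdict(lambda: defaultdict(int))
    for concepts in notes:
        for a, b in combinations(sorted(concepts), 2):
            # Store the link in both directions so the graph is undirected.
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def related_concepts(graph: Graph, query: str, top_k: int = 5) -> List[Tuple[str, int]]:
    """Return the query's neighbors, strongest link first (ties alphabetical)."""
    neighbors = graph.get(query, {})
    ranked = sorted(neighbors.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy notes, each already reduced to a set of concepts (illustrative only).
toy_notes = [
    {"advair", "singulair", "inhaler"},
    {"advair", "inhaler", "nebs"},
    {"advair", "combivent"},
    {"chest pain", "nitroglycerin"},
]
concept_graph = build_concept_graph(toy_notes)
print(related_concepts(concept_graph, "advair"))
```

Because the link weights are counts, this kind of structure also offers the quantifiable evidence of relatedness that a manually curated terminology lacks, although raw counts would favor very frequent concepts unless normalized.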


  • Paper
  • Journal link

    Example Related Concepts

    Concepts related to chest pain


    Concepts related to advair
