
Unsupervised Mapping of Sentences to Biomedical Concepts based on Integrated Information Retrieval Model and Clustering

Full Text: ACMBCB10.pdf

Structured information obtained by manually annotating disease descriptions with UMLS Metathesaurus concepts can provide high-quality, reliable data for the research community. Although progress has been made in annotation, only a limited range of diseases has been annotated, largely because of the human effort required. Because annotating text is time-consuming and the variability of disease descriptions makes the task difficult, it is useful to develop systems that automatically map biomedical sentences to an ontology. Our goal is to automatically map biomedical sentences to UMLS disease concepts. Previous methods, including statistical ones, still perform worse than simple dictionary-based matching. As an alternative to both, we show how the mapping problem can be viewed as a document retrieval problem; from this perspective, the mapping integrates information from a language model, document frequency, and distance measures. Our approach is a three-step method that combines information retrieval and clustering. First, we retrieve the top-10 ranked relevant UMLS concept entries using an integrated information retrieval model. Second, we cluster the retrieved concept entries according to shared words. Finally, we select one answer from each cluster using a threshold. Our experiments are promising: on typical data, the method achieves a precision of 73.28%, a recall of 77.51%, and an F-measure of 75.34%, significantly outperforming previous methods based on statistics, dictionaries, and MetaMap by 6.95 to 9.95 percent.
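The three-step procedure summarized above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the abstract does not give the exact formulas, so the Jelinek-Mercer smoothed language model, the summed IDF of shared words, the word-coverage distance, the equal weights `alpha` and `beta`, the coverage threshold of 0.5, and the toy concept entries `C1` to `C5` (not real UMLS CUIs) are all assumptions. The `f_measure` helper only checks that the reported F-measure is the harmonic mean of the reported precision and recall.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def integrated_score(sent, doc, coll_counts, coll_len, df, n_docs,
                     lam=0.5, alpha=1.0, beta=1.0):
    """Hypothetical combination of the three signals named in the abstract.

    Returns (score, coverage) for one concept entry `doc`.
    """
    doc_counts = Counter(doc)
    # (a) Jelinek-Mercer smoothed query likelihood; the sentence is the query.
    lm = 0.0
    for t in sent:
        if t not in coll_counts:  # skip words never seen in the concept entries
            continue
        p = lam * doc_counts[t] / len(doc) + (1 - lam) * coll_counts[t] / coll_len
        lm += math.log(p)
    # (b) Document-frequency evidence: summed IDF of the shared words.
    shared = set(sent) & set(doc)
    idf = sum(math.log(n_docs / df[t]) for t in shared)
    # (c) Distance: fraction of the entry's words missing from the sentence.
    distance = 1 - len(shared) / len(set(doc))
    return lm + alpha * idf - beta * distance, 1 - distance


def map_sentence(sentence, concepts, top_k=10, threshold=0.5):
    """Map a sentence to concept IDs: retrieve, cluster, then threshold."""
    docs = {cid: tokenize(name) for cid, name in concepts.items()}
    coll_counts = Counter(t for toks in docs.values() for t in toks)
    coll_len = sum(coll_counts.values())
    df = Counter(t for toks in docs.values() for t in set(toks))
    sent = tokenize(sentence)

    # Step 1: score every entry and keep the top-k by integrated score.
    scored = []
    for cid, toks in docs.items():
        score, cov = integrated_score(sent, toks, coll_counts, coll_len, df, len(docs))
        scored.append((score, cov, cid))
    scored.sort(key=lambda x: x[0], reverse=True)
    top = scored[:top_k]

    # Step 2: group retrieved entries that share at least one word
    # (connected components: a new entry merges every cluster it overlaps).
    clusters = []  # list of (word set, member tuples)
    for item in top:
        words = set(docs[item[2]])
        merged_words, merged_members, rest = set(words), [item], []
        for cw, members in clusters:
            if cw & words:
                merged_words |= cw
                merged_members += members
            else:
                rest.append((cw, members))
        clusters = rest + [(merged_words, merged_members)]

    # Step 3: one candidate per cluster, kept only if its coverage passes.
    answers = []
    for _, members in clusters:
        _, cov, cid = max(members, key=lambda m: m[0])
        if cov >= threshold:
            answers.append(cid)
    return answers


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Toy, hypothetical concept entries (not real UMLS data).
CONCEPTS = {
    "C1": "myocardial infarction",
    "C2": "cerebral infarction",
    "C3": "diabetes mellitus",
    "C4": "diabetes insipidus",
    "C5": "asthma",
}
SENTENCE = "The patient had a myocardial infarction and type 2 diabetes mellitus."

print(sorted(map_sentence(SENTENCE, CONCEPTS)))  # ['C1', 'C3']
print(round(f_measure(73.28, 77.51), 2))         # 75.34
```

In this toy run, `C1` and `C2` cluster on "infarction" and `C3` and `C4` on "diabetes"; the best entry of each cluster fully covers the sentence's words and is kept, while `asthma` forms its own cluster and is dropped by the threshold. Clustering before thresholding is what lets a single sentence map to several distinct concepts without returning near-duplicate entries.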

Citation

M. Kim, Q. Dou, O. Zaiane, R. Goebel. "Unsupervised Mapping of Sentences to Biomedical Concepts based on Integrated Information Retrieval Model and Clustering". ACM Conference on Bioinformatics, Computational Biology and Biomedicine, Niagara Falls, USA, pp. 322–329, August 2010.

Keywords: Algorithms, Experimentation, Languages
Category: In Conference

BibTeX

@inproceedings{Kim+al:ACMBCB10,
  author = {Mi-Young Kim and Qing Dou and Osmar R. Zaiane and Randy Goebel},
  title = {Unsupervised Mapping of Sentences to Biomedical Concepts based on
    Integrated Information Retrieval Model and Clustering},
  pages = {322--329},
  booktitle = {ACM Conference on Bioinformatics, Computational Biology and
    Biomedicine},
  address = {Niagara Falls, USA},
  month = aug,
  year = 2010,
}

Last Updated: January 15, 2020
Submitted by Sabina P
