Improving Subcellular Localization Prediction using Text Classification and the Gene Ontology

Each protein performs its functions at specific locations within a cell. Knowing this subcellular location is important for understanding a protein's function and for facilitating its purification. Many computational techniques now predict location from sequence analysis and from database information about homologs. A few recent techniques use text from biological abstracts; our goal is to improve the prediction accuracy of such text-based techniques. We identify three techniques for improving text-based prediction: a rule for removing ambiguous abstracts, a mechanism for using synonyms from the Gene Ontology (GO), and a mechanism for using the GO hierarchy to generalize terms. We show that these three techniques can significantly improve the accuracy of protein subcellular-location predictors that use text extracted from PubMed abstracts whose references are recorded in Swiss-Prot.
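
The three techniques named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not spell out the rules, so everything here is an assumption made for illustration. In particular, the synonym table and parent links are hypothetical toy data rather than entries copied from the Gene Ontology, the function names are invented, and "ambiguous" is read as "an abstract linked to proteins with more than one location label", which is only one plausible interpretation.

```python
import re
from collections import Counter

# Hypothetical toy data: illustrative stand-ins, NOT real Gene Ontology entries.
SYNONYMS = {
    "mitochondria": "mitochondrion",
    "cell nucleus": "nucleus",
}
PARENTS = {
    "mitochondrion": ["organelle"],
    "nucleus": ["organelle"],
    "organelle": ["cellular component"],
}
KNOWN_TERMS = set(PARENTS) | {p for ps in PARENTS.values() for p in ps}


def remove_ambiguous(labeled_abstracts):
    """Return ids of abstracts whose linked proteins share a single location.

    Assumed rule (not taken from the paper): an abstract is ambiguous when
    it is paired with more than one distinct location label.
    """
    labels = {}
    for abstract_id, location in labeled_abstracts:
        labels.setdefault(abstract_id, set()).add(location)
    return {aid for aid, locs in labels.items() if len(locs) == 1}


def canonicalize(text):
    """Lower-case the text and replace each synonym with its canonical term."""
    text = text.lower()
    for synonym, canonical in SYNONYMS.items():
        text = re.sub(r"\b" + re.escape(synonym) + r"\b", canonical, text)
    return text


def ancestors(term):
    """Return every term reachable by following parent links upward."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, []))
    return seen


def extract_features(abstract):
    """Count known terms in an abstract, crediting each hit to its ancestors."""
    text = canonicalize(abstract)
    features = Counter()
    for term in KNOWN_TERMS:
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        if hits:
            features[term] += hits
            for parent in ancestors(term):
                features[parent] += hits
    return features
```

Under these assumptions, calling `extract_features` on a sentence that mentions "cell nucleus" and "mitochondria" yields counts for the canonical terms and for their shared ancestors, so two abstracts that use different specific terms can still share a more general feature. The resulting counts could then be passed to any standard text classifier.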

Citation

A. Fyshe, Y. Liu, D. Szafron, R. Greiner, P. Lu. "Improving Subcellular Localization Prediction using Text Classification and the Gene Ontology". Bioinformatics, August 2008.

Keywords: bioinformatics, proteome analyst, machine learning, medical informatics, natural language, subcellular, gene ontology
Category: In Journal
Web Links: Journal DOI

BibTeX

@article{Fyshe+al:Bioinformatics08,
  author = {Alona Fyshe and Yifeng Liu and Duane Szafron and Russ Greiner and
    Paul Lu},
  title = {Improving Subcellular Localization Prediction using Text
    Classification and the Gene Ontology},
  journal = {Bioinformatics},
  year = 2008,
}

Last Updated: April 28, 2012
Submitted by Russ Greiner
