Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling

Full Text: Revised-ACL06.pdf

We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein mentions in biological texts, and show that incorporating unlabeled data improves the performance of the supervised CRF in this case.
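As a rough illustration of the objective the abstract describes, the sketch below adds the conditional log-likelihood of the labeled sequences to a weighted negative conditional entropy over the unlabeled sequences. It is a minimal sketch under stated assumptions, not the authors' implementation: the feature layout (`"emit"` and `"trans"` keys), the function names, and the trade-off parameter `gamma` are hypothetical, the label distribution is computed by brute-force enumeration rather than dynamic programming, and any prior or penalty term on the weights is omitted.

```python
import itertools
import math


def sequence_score(weights, x, y):
    """Unnormalized log-score of label sequence y for observations x.

    Assumed (hypothetical) feature layout: weights[("emit", label, obs)]
    and weights[("trans", prev_label, label)]; missing keys count as 0.
    """
    score = 0.0
    for t, (obs, label) in enumerate(zip(x, y)):
        score += weights.get(("emit", label, obs), 0.0)
        if t > 0:
            score += weights.get(("trans", y[t - 1], label), 0.0)
    return score


def label_distribution(weights, x, labels):
    """p(y | x) over all label sequences, by brute-force enumeration.

    Only feasible for tiny examples; a practical CRF would use
    forward-backward style dynamic programming instead.
    """
    seqs = list(itertools.product(labels, repeat=len(x)))
    scores = [sequence_score(weights, x, y) for y in seqs]
    top = max(scores)
    log_z = top + math.log(sum(math.exp(s - top) for s in scores))
    return {y: math.exp(s - log_z) for y, s in zip(seqs, scores)}


def semi_supervised_objective(weights, labeled, unlabeled, labels, gamma):
    """Labeled log-likelihood plus gamma times negative unlabeled entropy."""
    log_lik = 0.0
    for x, y in labeled:
        log_lik += math.log(label_distribution(weights, x, labels)[tuple(y)])

    neg_entropy = 0.0
    for x in unlabeled:
        for p in label_distribution(weights, x, labels).values():
            if p > 0.0:
                neg_entropy += p * math.log(p)

    return log_lik + gamma * neg_entropy


if __name__ == "__main__":
    labels = ("GENE", "O")  # hypothetical tag set
    labeled = [(("p53", "binds"), ("GENE", "O"))]
    unlabeled = [("BRCA1", "binds")]
    weights = {("emit", "GENE", "p53"): 1.0, ("emit", "O", "binds"): 1.0}
    print(semi_supervised_objective(weights, labeled, unlabeled, labels, gamma=0.5))
```

Because the entropy term makes the objective non-concave, a sketch like this would only be used to score candidate weight settings during iterative ascent from an initial supervised model, as the abstract notes.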

Citation

F. Jiao, S. Wang, C. Lee, R. Greiner, D. Schuurmans. "Semi-Supervised Conditional Random Fields for Improved Sequence Segmentation and Labeling". International Conference on Computational Linguistics and the Association for Computational Linguistics, July 2006.

Keywords: Conditional Random Field, Semi-Supervised, bioinformatics, gene mentions, machine learning
Category: In Conference

BibTeX

@inproceedings{Jiao+al:ACL06,
  author = {Feng Jiao and Shaojun Wang and Chi-Hoon Lee and Russ Greiner and
    Dale Schuurmans},
  title = {Semi-Supervised Conditional Random Fields for Improved Sequence
    Segmentation and Labeling},
  booktitle = {International Conference on Computational Linguistics and the
    Association for Computational Linguistics},
  year = 2006,
}

Last Updated: September 21, 2012
Submitted by Russ Greiner