Metric-based approaches for semi-supervised regression and classification

Full Text: ssl06.pdf (PDF)

Semi-supervised learning methods typically require an explicit relationship to be asserted between the labeled and unlabeled data—as illustrated, for example, by the neighbourhoods used in graph-based methods. Semi-supervised model selection and regularization methods are presented here that instead require only that the labeled and unlabeled data are drawn from the same distribution. From this assumption, a metric can be constructed over hypotheses based on their predictions for unlabeled data. This metric can then be used to detect untrustworthy training error estimates, leading to model selection strategies that select the richest hypothesis class while providing theoretical guarantees against over-fitting. This general approach is then adapted to regularization for supervised regression and supervised classification with probabilistic classifiers. The regularization adapts not only to the hypothesis class but also to the specific data sample provided, allowing for better performance than regularizers that account only for class complexity.
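
As a rough illustration of the idea in the abstract, the Python sketch below measures the distance between two hypotheses by how much their predictions disagree on unlabeled inputs, then uses that distance to flag training errors that look too optimistic. It is not taken from the chapter. The root-mean-square distance, the triangle-inequality style check, and the names rms_distance and select_by_triangle_check are all assumptions made for this example.

```python
import numpy as np


def rms_distance(pred_a, pred_b):
    """Root-mean-square disagreement between two prediction vectors."""
    pred_a = np.asarray(pred_a, dtype=float)
    pred_b = np.asarray(pred_b, dtype=float)
    return float(np.sqrt(np.mean((pred_a - pred_b) ** 2)))


def select_by_triangle_check(hypotheses, x_labeled, y_labeled, x_unlabeled):
    """Return the index of the richest hypothesis that passes the check.

    hypotheses: callables ordered from simplest to richest (assumed),
    each already fitted on the labeled data.
    """
    # Estimated distance from each hypothesis to the target, measured
    # on the labeled sample (i.e. its training error).
    train_err = [rms_distance(h(x_labeled), y_labeled) for h in hypotheses]
    # Predictions on unlabeled inputs give distances between hypotheses.
    unl_pred = [h(x_unlabeled) for h in hypotheses]

    chosen = 0
    for k in range(1, len(hypotheses)):
        # If the training errors were trustworthy, two hypotheses could not
        # be farther apart than the sum of their distances to the target.
        consistent = all(
            rms_distance(unl_pred[j], unl_pred[k]) <= train_err[j] + train_err[k]
            for j in range(k)
        )
        if consistent:
            chosen = k
    return chosen
```

In this sketch, a hypothesis that fits the labeled points closely but swings far from the simpler hypotheses on unlabeled inputs violates the inequality, so its low training error is treated as untrustworthy and it is not selected.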

Citation

D. Schuurmans, F. Southey, D. Wilkinson, Y. Guo. "Metric-based approaches for semi-supervised regression and classification". Semi-Supervised Learning, MIT Press, (ed: O. Chapelle, B. Schoelkopf, A. Zien), January 2006.

Keywords: machine learning
Category: In Book

BibTeX

@inbook{Schuurmans+al:Semi-SupervisedLearning06,
  author = {Dale Schuurmans and Finnegan Southey and Dana Wilkinson and Yuhong
    Guo},
  title = {Metric-based approaches for semi-supervised regression and
    classification},
  booktitle = {Semi-Supervised Learning},
  Publisher = {MIT Press},
  Editor = {O. Chapelle and B. Schoelkopf and A. Zien},
  year = 2006,
}

Last Updated: September 20, 2009
Submitted by Dale Schuurmans