
Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data

Full Text: superfluous-journal.ps (PS)

Most learning algorithms work most effectively when their training data contain completely specified, labeled samples. In many diagnostic tasks, however, the data will include the values of only some of the attributes; we model this as a blocking process that hides the values of those attributes from the learner. While blockers that remove the values of critical attributes can handicap a learner, this paper instead focuses on blockers that remove only irrelevant attribute values, i.e., values that are not needed to classify an instance given the values of the other, unblocked attributes. We first motivate and formalize this model of "superfluous-value blocking," and then demonstrate that these omissions can be useful, by proving that certain classes that seem hard to learn in the general PAC model (viz., decision trees and DNF formulae) are trivial to learn in this setting. We also show that this model can be extended to deal with (1) theory revision (i.e., modifying an existing formula); (2) blockers that occasionally include superfluous values or exclude required values; and (3) other corruptions of the training data.
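To give a rough intuition for why such omissions help: under superfluous-value blocking, the unblocked values of a positive example are exactly the ones needed to classify it, so for a DNF target they essentially spell out one satisfied term. The toy sketch below is not the paper's algorithm; the function names, the data encoding, and the simplifying assumption that each blocked positive example reveals precisely the literals of one satisfied term are all illustrative.

  # Toy sketch: learning a DNF when a "superfluous-value blocker" hides every
  # attribute value not needed to classify the instance.
  # Assumption (illustrative, not from the paper): each positive example arrives
  # as a dict holding only the literals of one term the instance satisfies.

  def learn_dnf_from_blocked_positives(blocked_examples):
      """Collect each visible set of literals as a term of the hypothesis DNF."""
      terms = set()
      for example in blocked_examples:
          terms.add(frozenset(example.items()))  # the unblocked values form one term
      return [dict(t) for t in terms]

  def classify(instance, terms):
      """A full assignment is positive iff it satisfies some collected term."""
      return any(all(instance.get(a) == v for a, v in term.items())
                 for term in terms)

  if __name__ == "__main__":
      # Hypothetical target: (x1 AND NOT x3) OR (x2 AND x4)
      blocked_positives = [
          {"x1": 1, "x3": 0},   # blocker kept only the first term's literals
          {"x2": 1, "x4": 1},   # ... and only the second term's literals
          {"x1": 1, "x3": 0},
      ]
      terms = learn_dnf_from_blocked_positives(blocked_positives)
      print(classify({"x1": 1, "x2": 0, "x3": 0, "x4": 0}, terms))  # True
      print(classify({"x1": 0, "x2": 0, "x3": 1, "x4": 0}, terms))  # False

Under this assumption, learning reduces to simple bookkeeping over the observed literals, which is the flavor of the result the abstract describes for decision trees and DNF formulae.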

Citation

A. Grove, R. Greiner, A. Kogan. "Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data". Artificial Intelligence (AIJ), 97(1-2), pp. 345--380, December 1997.

Keywords: omission, irrelevance, missing information, PAC, machine learning
Category: In Journal

BibTeX

@article{Grove+al:AIJ97,
  author  = {Adam Grove and Russ Greiner and Alexander Kogan},
  title   = {Knowing What Doesn't Matter: Exploiting the Omission of Irrelevant Data},
  journal = {Artificial Intelligence (AIJ)},
  volume  = {97},
  number  = {1-2},
  pages   = {345--380},
  year    = {1997}
}

Last Updated: September 30, 2010
Submitted by Russ Greiner
