
Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms

Full Text: SNP-CCR.pdf

Hereditary predisposition and causative environmental exposures have long been recognized in human malignancies. In most instances, cancer cases occur sporadically, suggesting that environmental influences are critical in determining cancer risk. To test the influence of genetic polymorphisms on breast cancer risk, we measured 98 single nucleotide polymorphisms (SNPs) distributed over 45 genes of potential relevance to breast cancer etiology in 174 patients and compared these with matched normal controls. Using machine learning techniques such as support vector machines (SVMs), decision trees, and naïve Bayes, we identified a subset of three SNPs as key discriminators between breast cancer patients and controls. SVMs performed best among the predictive models, achieving 69% predictive accuracy in distinguishing between the two groups, compared with a 50% baseline obtained from the data after repeated random permutation of the class labels (individuals with cancer or controls). However, the simpler naïve Bayes and decision tree models performed similarly to the SVM. The three SNP sites most useful in this model were (a) the 4536T/C site of the aldosterone synthase gene CYP11B2 at amino acid residue 386 Val/Ala (T/C) (rs4541); (b) the 4328C/G site of the aryl hydrocarbon hydroxylase CYP1B1 at amino acid residue 293 Leu/Val (C/G) (rs5292); and (c) the 4449C/T site of the transcription factor BCL6 at amino acid 387 Asp/Asp (rs1056932). No single SNP site on its own achieved more than 60% predictive accuracy. We have shown that multiple SNP sites from different genes in distant parts of the genome are better at identifying breast cancer patients than any single SNP alone. As high-throughput SNP technology improves and more SNPs are identified, it is likely that much higher predictive accuracy will be achieved and a useful clinical tool developed.
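The label-permutation baseline and the multi-SNP versus single-SNP comparison described in the abstract can be illustrated with a small, self-contained sketch. Everything in it is an assumption for illustration only: the genotypes are synthetic (coded 0/1/2 as minor-allele counts, with deliberately exaggerated case/control frequency differences, not data from the study), a hand-written categorical naive Bayes classifier stands in for the SVM, decision tree, and naive Bayes models used in the paper, and leave-one-out cross-validation is assumed as the evaluation scheme. The helper names (`simulate`, `loocv_accuracy`, `permutation_baseline`, and so on) are hypothetical.

```python
import math
import random
from collections import Counter

# Hypothetical illustration: synthetic genotypes, NOT data from the study.
# Genotypes are coded 0/1/2 (number of minor alleles), an assumed encoding.


def simulate(n_per_class=100, n_snps=3, seed=1):
    """Draw synthetic genotypes; cases (label 1) carry more minor alleles."""
    rng = random.Random(seed)
    weights = {1: [0.1, 0.3, 0.6], 0: [0.6, 0.3, 0.1]}  # exaggerated effect
    X, y = [], []
    for label, w in weights.items():
        for _ in range(n_per_class):
            X.append(rng.choices([0, 1, 2], weights=w, k=n_snps))
            y.append(label)
    return X, y


def train_naive_bayes(X, y, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace (add-alpha) smoothing."""
    totals = Counter(y)
    priors = {c: n / len(y) for c, n in totals.items()}
    # counts[c][j][v]: number of class-c samples with genotype v at SNP j
    counts = {c: [Counter() for _ in X[0]] for c in totals}
    for row, label in zip(X, y):
        for j, v in enumerate(row):
            counts[label][j][v] += 1
    return priors, counts, totals, alpha


def predict(model, row, n_values=3):
    """Return the class with the highest log posterior for one genotype row."""
    priors, counts, totals, alpha = model

    def log_posterior(c):
        return math.log(priors[c]) + sum(
            math.log((counts[c][j][v] + alpha) / (totals[c] + alpha * n_values))
            for j, v in enumerate(row))

    return max(sorted(priors), key=log_posterior)


def loocv_accuracy(X, y, cols=None):
    """Leave-one-out accuracy, optionally restricted to a subset of SNP columns."""
    if cols is not None:
        X = [[row[j] for j in cols] for row in X]
    correct = 0
    for i in range(len(X)):
        model = train_naive_bayes(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        correct += predict(model, X[i]) == y[i]
    return correct / len(X)


def permutation_baseline(X, y, n_perm=10, seed=0):
    """Mean LOOCV accuracy after randomly shuffling the case/control labels."""
    rng = random.Random(seed)
    scores = []
    for _ in range(n_perm):
        shuffled = y[:]
        rng.shuffle(shuffled)
        scores.append(loocv_accuracy(X, shuffled))
    return sum(scores) / n_perm


if __name__ == "__main__":
    X, y = simulate()
    print("all-SNP LOOCV accuracy:", loocv_accuracy(X, y))
    print("best single-SNP accuracy:",
          max(loocv_accuracy(X, y, cols=[j]) for j in range(len(X[0]))))
    print("permuted-label baseline:", permutation_baseline(X, y))
```

Shuffling the labels breaks any genotype-phenotype link while keeping the genotype distribution unchanged, so the permuted accuracy estimates what the same model and evaluation procedure achieve by chance on data of the same shape. Any real analysis would use the study's own genotypes, classifiers, and validation protocol instead of these stand-ins.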

Citation

J. Listgarten, S. Damaraju, B. Poulin, L. Cook, J. Dufour, A. Driga, J. Mackey, D. Wishart, R. Greiner, B. Zanke. "Predictive Models for Breast Cancer Susceptibility from Multiple Single Nucleotide Polymorphisms". Clinical Cancer Research (CCR), April 2004.

Keywords: bioinformatics, polyomx, SNPs, cancer, machine learning, medical informatics
Category: In Journal

BibTeX

@article{Listgarten+al:CCR04,
  author = {Jennifer Listgarten and Sambasivarao Damaraju and Brett Poulin and
    Lillian Cook and Jennifer Dufour and Adrian Driga and John Mackey and David
    S. Wishart and Russ Greiner and Brent Zanke},
  title = {Predictive Models for Breast Cancer Susceptibility from Multiple
    Single Nucleotide Polymorphisms},
  journal = {Clinical Cancer Research (CCR)},
  year = 2004,
}

Last Updated: April 27, 2012
Submitted by Russ Greiner
