Breeding value estimation and quantitative trait loci detection by Machine Learning methods based on high dimensional Single Nucleotide Polymorphisms dataset

Full Text: my_thesis.pdf (PDF)

A Quantitative Trait Locus (QTL) is a region of DNA that is associated with a particular phenotypic trait. QTL mapping is the statistical study that relates the alleles that occur at a locus to the associated phenotypes. If we knew the QTLs that affect the economically important traits in the dairy cattle breeding industry, we could greatly improve the estimation of breeding values, which would in turn lead to more accurate selection of dairy sires for breeding. With the advances in DNA chip technology and the discovery of thousands of single nucleotide polymorphisms (SNPs) in genome-sequencing projects, we can now identify the QTLs associated with traits of interest based on the SNP information. In this study, we consider the challenge of learning the QTL mapping from an SNP dataset in order to predict important traits, which are then turned into breeding values. This is especially challenging due to the high dimensionality of the dataset. We examine the use of two machine-learning kernel methods, Support Vector Machine (SVM) and Gaussian Process (GP), as well as several statistical methods, including partial least squares regression (PLS) and LASSO. We also explore several feature selection techniques to identify the SNPs associated with the QTLs affecting the traits for prediction, including correlation-based feature selection, logic regression, M5 prime for linear regression and haplotype blocks. We focus on a dataset from a dairy-industry breeding program, where 1341 SNPs are genotyped for 462 dairy sires to predict 5 economically important traits. Our empirical results indicate that the average correlation between predicted and true values of these 5 traits is about 0.56 using GP, our best predictor. The results also suggest that the two kernel methods perform better than the other statistical methods under both the correlation and root-mean-square error performance criteria. However, the feature selection methods we tried failed to identify the SNPs most relevant to the traits in this dataset.
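The two performance criteria named in the abstract, correlation between predicted and true values and root-mean-square error, can be illustrated with a minimal sketch. This assumes that "correlation" means the Pearson correlation coefficient, which the abstract does not state explicitly. The function names and the toy values are hypothetical and are not taken from the thesis.

```python
import math


def pearson_correlation(y_true, y_pred):
    """Pearson correlation between true and predicted trait values."""
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root-mean-square error between true and predicted trait values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


# Hypothetical values for illustration only; not data from the thesis.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.5, 3.5, 4.5]

r = pearson_correlation(y_true, y_pred)
e = rmse(y_true, y_pred)
print(f"correlation = {r:.3f}, RMSE = {e:.3f}")
```

The toy predictions are offset from the true values by a constant, so the correlation is perfect while the RMSE is not zero. The two criteria capture different aspects of prediction quality, which is one reason to report both.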

Citation

W. Wei. "Breeding value estimation and quantitative trait loci detection by Machine Learning methods based on high dimensional Single Nucleotide Polymorphisms dataset". MSc Thesis, Dept of Computing Science, University of Alberta, December 2008.

Keywords: machine learning, bovine, QTL, SNP
Category: MSc Thesis

BibTeX

@mastersthesis{Wei:08,
  author = {Wei Wei},
  title = {Breeding value estimation and quantitative trait loci detection by
    Machine Learning methods based on high dimensional Single Nucleotide
    Polymorphisms dataset},
  school = {Dept of Computing Science, University of Alberta},
  year = 2008,
}

Last Updated: July 14, 2009
Submitted by Nelson Loyola