The Budgeted Biomarker Discovery Problem

Full Text: Khan_Sheehan_V_201505_PhD.pdf

Researchers conduct association studies to discover biomarkers in order to gain new biological insight into complex diseases and phenotypes. Although most researchers have intuitions about what defines a biomarker and how to assess the results of an association study, there is neither a formal definition of what a biomarker is, nor an objective goal for association studies. As a result, the literature is full of association studies with conflicting results – e.g., studies on the same phenotype that produce lists of biomarkers with little to no overlap.

This thesis presents the “Budgeted Biomarker Discovery (BBD) problem”, which clearly defines (1) what a biomarker is, and (2) rewards for correctly identifying biomarkers and penalties for incorrectly identifying biomarkers. Furthermore, the BBD problem allows researchers to use a mixture of high- and low-throughput technologies. In the context of discovering biomarkers from gene expression data, we show how future association studies can use both microarray and qPCR data to objectively find the genes that are biomarkers in a cost-efficient manner.

We present several algorithms for solving the BBD problem, and show that good algorithms must make use of both microarrays and qPCR. Also, they must be able to adapt to the data as it is collected. For example, when solving a new BBD problem, we must begin by collecting microarrays because we do not yet know how many biomarkers we expect to identify, or which qPCR arrays would be most informative. Thus, we use the high-throughput microarrays to survey the problem, until we can identify which specific low-throughput qPCR arrays to use for focusing on those genes that are potentially biomarkers. To identify when this transition should occur, we present the problem of estimating the density of univariate statistics in high-throughput data, and we present our Fused Density Estimation (FDE) algorithm as a solution. We use FDE as the backbone of our adaptive algorithms for solving BBD problems. In a series of experiments on real microarray data and realistic synthetic data, we show that our BBD1 algorithm is the most robust solution, amongst those considered, to the BBD problem.
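
The survey-then-focus idea described above can be illustrated with a toy simulation. This is only a minimal sketch under stated assumptions, not the thesis's BBD1 or FDE algorithms: the function name `two_phase_discovery`, the cost values, the Gaussian noise model, and the fixed `survey_fraction` are all hypothetical. In particular, the fixed fraction is a crude stand-in for the adaptive transition point that the thesis determines with FDE, and each simulated qPCR run measures a single gene rather than an array of genes.

```python
import random
import statistics


def two_phase_discovery(true_effects, budget, array_cost=10.0, qpcr_cost=1.0,
                        survey_fraction=0.5, top_k=5, noise=1.0, seed=0):
    """Toy survey-then-focus loop. Every parameter here is an illustrative
    assumption; none of the values come from the thesis."""
    rng = random.Random(seed)
    n_genes = len(true_effects)
    samples = [[] for _ in range(n_genes)]
    spent = 0.0

    def score(g):
        # Absolute mean of the noisy measurements collected so far for gene g.
        return abs(statistics.fmean(samples[g])) if samples[g] else 0.0

    # Phase 1 (survey): each simulated microarray measures every gene once.
    # A fixed share of the budget replaces the adaptive stopping rule.
    while spent + array_cost <= budget * survey_fraction:
        for g in range(n_genes):
            samples[g].append(rng.gauss(true_effects[g], noise))
        spent += array_cost

    # Shortlist the genes that look most promising after the survey.
    candidates = sorted(range(n_genes), key=score, reverse=True)[:top_k]

    # Phase 2 (focus): spend the remaining budget on cheap single-gene
    # measurements, cycling through the shortlisted genes.
    i = 0
    while candidates and spent + qpcr_cost <= budget:
        g = candidates[i % len(candidates)]
        samples[g].append(rng.gauss(true_effects[g], noise))
        spent += qpcr_cost
        i += 1

    # Return the shortlist ordered by refined score, plus the total spend.
    return sorted(candidates, key=score, reverse=True), spent
```

Calling `two_phase_discovery([0.0] * 20 + [5.0] * 3, budget=100.0, top_k=3, noise=0.5)` surveys all 23 simulated genes with five microarrays and then spends the rest of the budget refining the three shortlisted genes. A fixed split like this cannot adapt to how many biomarkers the data suggest, which is the gap the adaptive algorithms in the thesis are meant to address.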

Citation

S. Khan. "The Budgeted Biomarker Discovery Problem". PhD Thesis, June 2015.

Keywords: bioinformatics, machine learning, budgeted learning
Category: PhD Thesis

BibTeX

@phdthesis{Khan:15,
  author = {Sheehan Khan},
  title = {The Budgeted Biomarker Discovery Problem},
  year = 2015,
}

Last Updated: September 18, 2017
Submitted by Russ Greiner