
The challenge of applying machine learning techniques to diagnose schizophrenia using multi-site fMRI data

Full Text: VegaRomero_Roberto_I_201701_MSc.pdf

One of the main challenges in applying machine learning techniques to neuroimaging data is the small n, large p problem: datasets usually contain only a few hundred instances (n), each described by hundreds of thousands of features (p). In this dissertation, we explore two ways of addressing this problem when distinguishing people with schizophrenia from healthy controls: reducing the number of features by analyzing 264 specific regions of interest of the brain, and increasing the number of instances by merging imaging data obtained at different scanning sites. Empirical results show that, using features related to the functional connectivity of the brain, we can achieve an accuracy above chance level (over 70%) when data from a single scanning site are used for both training and testing. However, this performance decreases when additional data from a different scanning site are included in the training set. We attribute this decrease to batch effects: technical noise introduced at different scanning sites that confounds the biological signal of interest. Batch effects are frequently disregarded in association studies because there is usually no statistically significant interaction between the scanning site and the variables being analyzed. In this work, we highlight important differences between association studies and prediction studies, and we argue that in the latter, batch effects matter. Our experiments reveal that ignoring them reduces the performance of a learned classifier compared with using data from a single scanning site, even though restricting training to one site drastically reduces the size of the training set. In addition, we can train a classifier that distinguishes among scanning sites (rather than cases from controls) with an accuracy above 80%. We also show empirically that if the same subjects are scanned at two different sites, a neural network that maps the fMRI scan from one scanner to the other is sufficient to correct the batch effects.
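The abstract does not say exactly how the functional-connectivity features are built. A common construction, assumed here rather than taken from the thesis, correlates the mean BOLD time series of every pair of regions of interest and keeps the upper triangle of the resulting correlation matrix. The sketch below uses the hypothetical helper names `pearson`, `connectivity_features`, and `n_pairwise_features`; it shows that even after reducing the brain to 264 regions, each subject is still described by 34,716 pairwise features, so p remains far larger than n.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equally long, non-constant time series."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant series


def connectivity_features(roi_series: List[Sequence[float]]) -> List[float]:
    """Assumed feature construction (hypothetical): correlate every pair of
    ROI time series and flatten the upper triangle (diagonal excluded)."""
    r = len(roi_series)
    return [
        pearson(roi_series[i], roi_series[j])
        for i in range(r)
        for j in range(i + 1, r)
    ]


def n_pairwise_features(n_rois: int) -> int:
    """Length of the feature vector: number of distinct ROI pairs."""
    return n_rois * (n_rois - 1) // 2


if __name__ == "__main__":
    print(n_pairwise_features(264))  # 34716
    # Three toy ROIs observed at four time points.
    toy = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
    print(connectivity_features(toy))  # pairs (0,1), (0,2), (1,2)
```

Keeping only the upper triangle avoids the redundant symmetric entries and the trivial diagonal of ones; whether the thesis applied any further transformation (such as Fisher's z) is not stated in the abstract.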
In more realistic situations involving disjoint sets of subjects, simple techniques can remove batch effects caused by linear distortions of the feature matrix: z-score normalization handles translations and scalings, and whitening handles translations and rotations. Both approaches reduced the accuracy of scanning-site classification to near chance level, but neither improved the accuracy of schizophrenia diagnosis using multi-site data. This strongly indicates that batch effects go beyond these simple linear transformations. Finally, we explored BECCA (batch effects correction using canonical correlation analysis) and approaches based on autoencoders for decreasing the influence of batch effects. These attempts were also unsuccessful in our test scenarios, suggesting that batch effects are a serious problem in prediction studies using fMRI data and that more effort is needed to understand their nature in order to reduce their influence.
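Per-site z-score normalization can be sketched as follows, assuming the features are stored as one row per subject and each subject carries a scanning-site label; the function name `zscore_per_site` is hypothetical and not from the thesis. Each feature is centered and scaled using only the subjects from the same site, so two sites that differ only by a per-feature shift and scaling end up with identical normalized values. Whitening goes one step further by also decorrelating the features with each site's covariance matrix; it is omitted here to keep the sketch free of linear-algebra dependencies.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import Dict, List, Sequence


def zscore_per_site(X: List[Sequence[float]], sites: Sequence[str]) -> List[List[float]]:
    """Standardize each feature separately within each scanning site.

    Rows are subjects, columns are features, and sites[i] is the site label
    of row i. Every column is shifted to zero mean and scaled to unit
    (population) standard deviation using only rows from the same site.
    """
    rows_by_site: Dict[str, List[int]] = defaultdict(list)
    for i, site in enumerate(sites):
        rows_by_site[site].append(i)

    n_features = len(X[0])
    Z = [list(row) for row in X]
    for rows in rows_by_site.values():
        for j in range(n_features):
            col = [X[i][j] for i in rows]
            mu = mean(col)
            sd = pstdev(col, mu)
            for i in rows:
                # A feature that is constant within a site is mapped to 0.
                Z[i][j] = (X[i][j] - mu) / sd if sd > 0 else 0.0
    return Z


if __name__ == "__main__":
    # Site B's values are an affine transform (2x + 10) of site A's values.
    X = [[1.0], [2.0], [3.0], [12.0], [14.0], [16.0]]
    sites = ["A", "A", "A", "B", "B", "B"]
    for row, site in zip(zscore_per_site(X, sites), sites):
        print(site, row)
```

In the toy example the two sites become indistinguishable after normalization, which is why a site classifier trained on such data falls toward chance. As the abstract reports, removing these linear differences did not improve diagnosis, consistent with batch effects that are not purely affine.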

Citation

R. Vega. "The challenge of applying machine learning techniques to diagnose schizophrenia using multi-site fMRI data". MSc Thesis, University of Alberta, January 2017.

Keywords: machine learning, fMRI, batch effects, domain adaptation, schizophrenia, multi-site fMRI, classification
Category: MSc Thesis

BibTeX

@mastersthesis{Vega:17,
  author = {Roberto Vega},
  title = {The challenge of applying machine learning techniques to diagnose
    schizophrenia using multi-site fMRI data},
  school = {University of Alberta},
  year = 2017,
}

Last Updated: January 23, 2017
Submitted by Nelson Loyola
