Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature
- Cynthia Stretch
- Sheehan Khan, Department of Computing Science
- Nasimeh Asgarian, AICML
- Roman Eisner
- Saman Vaisipour
- Sambasivarao Damaraju, Cross Cancer Institute
- OF Bathe
- Helen Steed
- Russ Greiner, Dept of Computing Science; PI of AICML
- Vickie Baracos, Division of Experimental Oncology
Top differentially expressed gene lists are often inconsistent between studies, and it has been suggested that small sample sizes contribute to lack of reproducibility and poor prediction accuracy in discriminative models. We considered sex differences (69♂, 65♀) in 134 human skeletal muscle biopsies using DNA microarrays. The full dataset and subsamples thereof (n = 10 (5♂, 5♀) to n = 120 (60♂, 60♀)) were used to assess the effect of sample size on the differential expression of single genes, gene rank order and prediction accuracy. Using our full dataset (n = 134), we identified 717 differentially expressed transcripts (p<0.0001) and we were able to predict sex with ~90% accuracy, both within our dataset and on external datasets. Both p-values and rank order of top differentially expressed genes became more variable with smaller subsamples. For example, at n = 10 (5♂, 5♀), no gene was considered differentially expressed at p<0.0001 and prediction accuracy was ~50% (no better than chance). We found that sample size clearly affects microarray analysis results; small sample sizes result in unstable gene lists and poor prediction accuracy. We anticipate this will apply to other phenotypes, in addition to sex.
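The subsampling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are synthetic, the Welch t-statistic is an assumed ranking criterion, and all function names (simulate, top_genes, mean_overlap) and parameter values other than the 69/65 group sizes are hypothetical. It draws balanced subsamples (equal numbers from each group) and measures how much of the full-data top-k gene list is recovered at each subsample size.

```python
import math
import random


def welch_t(a, b):
    """Welch's t-statistic for two independent samples (each needs >= 2 values)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    va = sum((x - ma) ** 2 for x in a) / (len(a) - 1)
    vb = sum((x - mb) ** 2 for x in b) / (len(b) - 1)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (ma - mb) / se if se > 0 else 0.0


def top_genes(expr, males, females, k):
    """Set of the k gene indices with the largest |t| between the two groups."""
    scores = []
    for gene, row in enumerate(expr):
        t = welch_t([row[i] for i in males], [row[i] for i in females])
        scores.append((abs(t), gene))
    scores.sort(reverse=True)
    return {gene for _, gene in scores[:k]}


def mean_overlap(expr, males, females, n_per_group, k, repeats, rng):
    """Average fraction of the full-data top-k list recovered by balanced subsamples."""
    reference = top_genes(expr, males, females, k)
    overlaps = []
    for _ in range(repeats):
        m = rng.sample(males, n_per_group)    # balanced draw: same size per group
        f = rng.sample(females, n_per_group)
        overlaps.append(len(reference & top_genes(expr, m, f, k)) / k)
    return sum(overlaps) / len(overlaps)


def simulate(n_genes=200, n_signal=20, n_male=69, n_female=65, effect=1.0, seed=0):
    """Synthetic expression matrix (genes x samples); NOT the study's data.

    The first n_signal genes are shifted by `effect` standard deviations in the
    male group; the gene count and effect size are illustrative assumptions.
    """
    rng = random.Random(seed)
    males = list(range(n_male))
    females = list(range(n_male, n_male + n_female))
    expr = []
    for gene in range(n_genes):
        shift = effect if gene < n_signal else 0.0
        row = [rng.gauss(shift, 1.0) for _ in males]
        row += [rng.gauss(0.0, 1.0) for _ in females]
        expr.append(row)
    return expr, males, females


if __name__ == "__main__":
    expr, males, females = simulate()
    rng = random.Random(1)
    for n_per_group in (5, 10, 20, 40, 60):
        frac = mean_overlap(expr, males, females, n_per_group, k=20, repeats=10, rng=rng)
        print(f"n = {2 * n_per_group:3d}: mean top-20 overlap = {frac:.2f}")
```

On synthetic data of this kind, the overlap with the full-data list would be expected to rise as the subsample grows, which mirrors the instability of small-sample gene lists described above. Counting genes below a p-value threshold, as the abstract does, would additionally require a t-distribution routine that this stdlib-only sketch leaves out.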
Citation
C. Stretch, S. Khan, N. Asgarian, R. Eisner, S. Vaisipour, S. Damaraju, O. Bathe, H. Steed, R. Greiner, V. Baracos. "Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature". PLoS One, 8(6), pp e65380, April 2013.
Keywords: bioinformatics, microarray, sample size, machine learning
Category: In Journal
Web Links: DOI, URL
BibTeX
@article{Stretch+al:PLoSONE13,
  author = {Cynthia Stretch and Sheehan Khan and Nasimeh Asgarian and Roman Eisner and Saman Vaisipour and Sambasivarao Damaraju and OF Bathe and Helen Steed and Russ Greiner and Vickie Baracos},
  title = {Effects of sample size on differential gene expression, rank order and prediction accuracy of a gene signature},
  volume = "8",
  number = "6",
  pages = {e65380},
  journal = {PLoS One},
  year = 2013,
}
Last Updated: February 10, 2020
Submitted by Sabina P