BMC Bioinformatics. 2015 Apr 28;16:132.

doi: 10.1186/s12859-015-0571-7.

Moment based gene set tests

Jessica L Larson¹,², Art B Owen³

Affiliations

¹ Department of Bioinformatics and Computational Biology, Genentech, Inc., South San Francisco, USA. larson.jess@gmail.com.
² Currently at GenePeeks, Inc., Cambridge, USA. larson.jess@gmail.com.
³ Department of Statistics, Stanford University, Stanford, USA. owen@stanford.edu.

PMID: 25928861
PMCID: PMC4419444
DOI: 10.1186/s12859-015-0571-7

Jessica L Larson et al. BMC Bioinformatics. 2015.

Abstract

Background: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests.
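The arithmetic behind the 1/(M+1) floor can be sketched as follows. This is a minimal illustration, not code from the paper: the function names are hypothetical, the p-value uses the common "add one" convention that is consistent with the 1/(M+1) bound, and a Bonferroni correction is assumed as the way of accounting for simultaneous tests.

```python
from fractions import Fraction
import math


def permutation_p_value(observed, permuted):
    """One-sided permutation p-value using the add-one convention (assumed)."""
    exceed = sum(1 for t in permuted if t >= observed)
    return Fraction(1 + exceed, 1 + len(permuted))


def min_permutations(alpha, n_tests):
    """Smallest M with 1/(M+1) <= alpha / n_tests (Bonferroni, an assumption)."""
    threshold = Fraction(str(alpha)) / n_tests   # exact rational arithmetic
    return math.ceil(1 / threshold) - 1


# With M = 4 permutations, the smallest attainable p-value is 1/5.
print(permutation_p_value(5.0, [1.0, 2.0, 3.0, 4.0]))

# Permutations needed per gene set to reach significance at level 0.05
# when 6,303 gene sets are tested at once.
print(min_permutations(0.05, 6303))
```

Under these assumptions, every additional gene set tested raises the number of permutations needed for each individual test, which is the cost the moment-based approximation aims to avoid.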

Results: We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|² permutations, which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson's Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared.
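The idea of fitting a parametric distribution from exact permutation moments can be illustrated for a linear statistic. The sketch below is an assumption-laden illustration, not the npGSEA implementation: it takes the statistic to be T = Σ_g Σ_i X[g, i]·y[i] (if each row of X and the vector y are centered and scaled to unit norm, each inner product is a sample correlation), uses the standard closed-form mean and variance of a linear permutation statistic, and shows only a Gaussian fit. The function names are hypothetical.

```python
import math
import numpy as np


def linear_stat_moments(X, y):
    """Exact permutation mean and variance of T = sum_g sum_i X[g, i] * y[pi(i)]."""
    s = X.sum(axis=0)            # per-sample sum over the genes in the set
    n = y.size
    mean = n * s.mean() * y.mean()
    var = ((s - s.mean()) ** 2).sum() * ((y - y.mean()) ** 2).sum() / (n - 1)
    return mean, var


def gaussian_p_value(X, y):
    """Upper-tail p-value from a normal fit to the permutation distribution."""
    t_obs = float(X.sum(axis=0) @ y)
    mean, var = linear_stat_moments(X, y)
    z = (t_obs - mean) / math.sqrt(var)
    return 0.5 * math.erfc(z / math.sqrt(2))


# Toy data: 3 genes in the set, 5 samples (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.normal(size=(3, 5))
y = rng.normal(size=5)
print(linear_stat_moments(X, y))
print(gaussian_p_value(X, y))
```

No permutations are sampled here: the two moments come directly from sums over the data, which is why the approximation can be so much cheaper than permutation sampling. A quadratic (sum of squares) statistic needs more involved moment calculations and is not covered by this sketch.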

Conclusions: We have developed a moment-based approximation to the permutation distributions of linear and quadratic gene set test statistics. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org).

Figures

**Figure 1**
Distributions of permuted statistics resemble known probability densities. Top panel shows a permutation histogram for a linear test statistic for the steroid hormone signaling pathway gene set as described in the text. The bottom panel shows a quadratic test statistic. Solid red dots indicate the observed values and curves indicate parametric fits, based on normal and χ² distributions.

**Figure 2**
Permutation and moment-based p-values are tightly correlated. Permutation p-values (x-axis) versus moment-based p-values (y-axis) for 6,303 gene sets. The left two columns represent results for a linear test statistic versus the beta and Gaussian approximations; the right-most column represents results for the sum of squares statistic versus the χ² approximation. Data come from three genome-wide expression studies. We applied the transformation −log10(p) to stretch the lower range of these distributions for a more informative visual. Red dotted lines represent the line y=x.

Cited by

Roastgsa: a comparison of rotation-based scores for gene set enrichment analysis.
Caballé-Mestres A, Berenguer-Llergo A, Stephan-Otto Attolini C. BMC Bioinformatics. 2023 Oct 30;24(1):408. doi: 10.1186/s12859-023-05510-x. PMID: 37904108. Free PMC article.
SEMgsa: topology-based pathway enrichment analysis with structural equation models.
Grassi M, Tarantino B. BMC Bioinformatics. 2022 Aug 17;23(1):344. doi: 10.1186/s12859-022-04884-8. PMID: 35978279. Free PMC article.
Patient-derived xenografts undergo mouse-specific tumor evolution.
Ben-David U, Ha G, Tseng YY, Greenwald NF, Oh C, Shih J, McFarland JM, Wong B, Boehm JS, Beroukhim R, Golub TR. Nat Genet. 2017 Nov;49(11):1567-1575. doi: 10.1038/ng.3967. Epub 2017 Oct 9. PMID: 28991255. Free PMC article.
Overlapping Group Logistic Regression with Applications to Genetic Pathway Selection.
Zeng Y, Breheny P. Cancer Inform. 2016 Sep 15;15:179-87. doi: 10.4137/CIN.S40043. eCollection 2016. PMID: 27679461. Free PMC article.
Bioconductor's EnrichmentBrowser: seamless navigation through combined results of set- & network-based enrichment analysis.
Geistlinger L, Csaba G, Zimmer R. BMC Bioinformatics. 2016 Jan 20;17:45. doi: 10.1186/s12859-016-0884-1. PMID: 26791995. Free PMC article.

