Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2015 Apr 28:16:132.
doi: 10.1186/s12859-015-0571-7.

Moment based gene set tests

Affiliations

Moment based gene set tests

Jessica L Larson et al. BMC Bioinformatics. .

Abstract

Background: Permutation-based gene set tests are standard approaches for testing relationships between collections of related genes and an outcome of interest in high throughput expression analyses. Using M random permutations, one can attain p-values as small as 1/(M+1). When many gene sets are tested, we need smaller p-values, hence larger M, to achieve significance while accounting for the number of simultaneous tests being made. As a result, the number of permutations to be done rises along with the cost per permutation. To reduce this cost, we seek parametric approximations to the permutation distributions for gene set tests.

Results: We study two gene set methods based on sums and sums of squared correlations. The statistics we study are among the best performers in the extensive simulation of 261 gene set methods by Ackermann and Strimmer in 2009. Our approach calculates exact relevant moments of these statistics and uses them to fit parametric distributions. The computational cost of our algorithm for the linear case is on the order of doing |G| permutations, where |G| is the number of genes in set G. For the quadratic statistics, the cost is on the order of |G|(2) permutations which can still be orders of magnitude faster than plain permutation sampling. We applied the permutation approximation method to three public Parkinson's Disease expression datasets and discovered enriched gene sets not previously discussed. We found that the moment-based gene set enrichment p-values closely approximate the permutation method p-values at a tiny fraction of their cost. They also gave nearly identical rankings to the gene sets being compared.

Conclusions: We have developed a moment based approximation to linear and quadratic gene set test statistics' permutation distribution. This allows approximate testing to be done orders of magnitude faster than one could do by sampling permutations. We have implemented our method as a publicly available Bioconductor package, npGSEA (www.bioconductor.org) .

PubMed Disclaimer

Figures

Figure 1
Figure 1
Distributions of permuted statistics resemble known probability densities. Top panel shows a permutation histogram for a linear test statistic for the steroid hormone signaling pathway gene set as described in the text. The bottom panel shows a quadratic test statistic. Solid red dots indicate the observed values and curves indicate parametric fits, based on normal and χ 2 distributions.
Figure 2
Figure 2
Permutation and moment-based p-values are tightly correlated. Permutation p-values (x-axis) versus moment-based p-values (y-axis) for 6,303 gene sets. The left two column represents results for a linear test statistic versus the beta and Gaussian approximations; the right-most column represents results for the sum of squares statistic versus the χ 2 approximation. Data come from three genome-wide expression studies. We applied the transformation − log10(p) to stretch the lower range of these distributions for a more informative visual. Red dotted lines represent the line y=x.

Similar articles

Cited by

References

    1. Mootha VK, Lindgren CM, Eriksson KF, Subramanian A, Sihag S, Lehar J, et al. PGC-1 α-responsive genes involved in oxidative phosphorylation are coordinately downregulated in human diabetes. Nat Genet. 2003;34:267–73. doi: 10.1038/ng1180. - DOI - PubMed
    1. Newton MA, Quintana FA, den Boon JA, Sengupta S, Ahlquist P. Random-set methods identify distinct aspects of the enrichment signal in gene-set analysis. Ann Appl Stat. 2007;1:85–106. doi: 10.1214/07-AOAS104. - DOI
    1. Tian L, Greenberg SA, Kong SW, Altschuler J, Kohane IS, Park PJ. Discovering statistically significant pathways in expression profiling studies. Proc Natl Acad Sci. 2005;102(38):13544–49. doi: 10.1073/pnas.0506577102. - DOI - PMC - PubMed
    1. Goeman JJ, Bühlmann P. Analyzing gene expression data in terms of gene sets: methodological issues. Bioinformatics. 2007;23(8):980–7. doi: 10.1093/bioinformatics/btm051. - DOI - PubMed
    1. Jiang Z, Gentleman R. Extensions to gene set enrichment. Bioinformatics. 2007;23(3):306–13. doi: 10.1093/bioinformatics/btl599. - DOI - PubMed