Flexible signal denoising via flexible empirical Bayes shrinkage. (English) Zbl 1540.62023

Summary: Signal denoising – also known as non-parametric regression – is often performed through shrinkage estimation in a transformed (e.g., wavelet) domain; shrinkage in the transformed domain corresponds to smoothing in the original domain. A key question in such applications is how much to shrink, or, equivalently, how much to smooth. Empirical Bayes shrinkage methods provide an attractive solution to this problem; they use the data to estimate a distribution of underlying “effects,” hence automatically select an appropriate amount of shrinkage. However, most existing implementations of empirical Bayes shrinkage are less flexible than they could be – both in their assumptions on the underlying distribution of effects, and in their ability to handle heteroskedasticity – which limits their signal denoising applications. Here we address this by adopting a particularly flexible, stable and computationally convenient empirical Bayes shrinkage method and applying it to several signal denoising problems. These applications include smoothing of Poisson data and heteroskedastic Gaussian data. We show through empirical comparisons that the results are competitive with other methods, including both simple thresholding rules and purpose-built empirical Bayes procedures. Our methods are implemented in the R package smashr, “SMoothing by Adaptive SHrinkage in R,” available at https://www.github.com/stephenslab/smashr.
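As a rough illustration of the summary's point that shrinkage in a transformed domain corresponds to smoothing in the original domain, the Python sketch below applies an orthonormal Haar wavelet transform, soft-thresholds the detail coefficients, and inverts the transform. It uses a fixed universal threshold, i.e. one of the "simple thresholding rules" the summary mentions as a comparison, not the empirical Bayes method of the paper and not the smashr API. All function names, the test signal, and the assumption of a known noise level `sigma` are hypothetical choices made for this example.

```python
import math
import random


def haar_forward(x):
    """Orthonormal Haar transform of a list whose length is a power of two.

    Returns (approx, details); details[0] holds the finest-scale coefficients.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    s = math.sqrt(2.0)
    approx = list(x)
    details = []
    while len(approx) > 1:
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details.append([(a - b) / s for a, b in pairs])
        approx = [(a + b) / s for a, b in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Invert haar_forward, rebuilding the signal from coarse to fine."""
    s = math.sqrt(2.0)
    x = list(approx)
    for d in reversed(details):
        finer = []
        for a, c in zip(x, d):
            finer.append((a + c) / s)
            finer.append((a - c) / s)
        x = finer
    return x


def soft_threshold(c, t):
    """Shrink c toward zero by t; coefficients smaller than t become zero."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(y, sigma):
    """Shrink Haar detail coefficients with the universal threshold.

    Assumes homoskedastic Gaussian noise with known standard deviation sigma
    (an assumption of this sketch, not of the paper).
    """
    approx, details = haar_forward(y)
    t = sigma * math.sqrt(2.0 * math.log(len(y)))
    shrunk = [[soft_threshold(c, t) for c in level] for level in details]
    return haar_inverse(approx, shrunk)


def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)


def demo(seed=0, n=256, sigma=1.0):
    """Compare the error of noisy and denoised versions of a step signal."""
    rng = random.Random(seed)
    truth = [0.0] * (n // 2) + [4.0] * (n // 2)
    noisy = [v + rng.gauss(0.0, sigma) for v in truth]
    smooth = denoise(noisy, sigma)
    return mse(noisy, truth), mse(smooth, truth)


if __name__ == "__main__":
    mse_noisy, mse_smooth = demo()
    print("MSE of noisy data:    ", mse_noisy)
    print("MSE of denoised data: ", mse_smooth)
```

The single threshold `t` here fixes the amount of shrinkage in advance. In the empirical Bayes setting described in the summary, that amount would instead be chosen from the data by estimating a distribution of the underlying effects.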

MSC:

62-08 Computational methods for problems pertaining to statistics
62C12 Empirical decision procedures; empirical Bayes procedures
62G08 Nonparametric regression and quantile regression
