
Non-reversible guided Metropolis kernel. (English) Zbl 1520.65004

Summary: We construct a class of non-reversible Metropolis kernels as a multivariate extension of the guided-walk kernel proposed by P. Gustafson [Stat. Comput. 8, 357-364 (1998; doi:10.1023/A:1008880707168)]. The main idea of our method is to introduce a projection that maps the state space to a totally ordered group. By using the Haar measure, we construct a novel Markov kernel, termed the Haar mixture kernel, which is of interest in its own right. This is achieved by inducing a topological structure on the totally ordered group. Our proposed method, the \(\Delta\)-guided Metropolis-Haar kernel, uses the Haar mixture kernel as a proposal kernel. In terms of effective sample size per second, the proposed non-reversible kernel is at least 10 times better than the random-walk Metropolis kernel and the Hamiltonian Monte Carlo kernel for logistic regression and for a discretely observed stochastic process.
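To make the starting point concrete, the one-dimensional guided walk of Gustafson that the paper generalises can be sketched as follows. The chain carries a direction variable \(v \in \{-1, +1\}\); proposals always move in the current direction, the usual Metropolis ratio decides acceptance, and the direction is flipped on rejection, which makes the kernel non-reversible. The function name, step size, and Gaussian increment below are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def guided_walk(log_target, x0, n_steps, step=0.5, seed=None):
    """Gustafson's (1998) one-dimensional guided-walk Metropolis kernel.

    A direction v in {-1, +1} is carried with the state x; each proposal
    moves by a positive increment in direction v, and v is reversed
    whenever a proposal is rejected.
    """
    rng = np.random.default_rng(seed)
    x, v = float(x0), 1.0
    samples = np.empty(n_steps)
    for i in range(n_steps):
        # Propose strictly in the current direction.
        y = x + v * abs(rng.normal(0.0, step))
        # Standard Metropolis accept/reject on the log scale.
        if np.log(rng.uniform()) < log_target(y) - log_target(x):
            x = y        # accept: keep moving the same way
        else:
            v = -v       # reject: reverse direction
        samples[i] = x
    return samples

# Example: sample a standard normal target.
draws = guided_walk(lambda z: -0.5 * z * z, x0=0.0, n_steps=20000, seed=1)
```

The paper's \(\Delta\)-guided Metropolis-Haar kernel lifts this direction-flipping idea to multivariate state spaces via the projection onto a totally ordered group described above.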

MSC:

65C05 Monte Carlo methods
65C40 Numerical analysis or methods applied to Markov chains
60J05 Discrete-time Markov processes on general state spaces

References:

[1] Andrieu, C. (2016). On random- and systematic-scan samplers. Biometrika 103, 719-726. · Zbl 1506.62010
[2] Andrieu, C. and Livingstone, S. (2021). Peskun-Tierney ordering for Markovian Monte Carlo: beyond the reversible scenario. Ann. Statist. 49, 1958-1981. · Zbl 1489.65005
[3] Berger, J. O. (1993). Statistical Decision Theory and Bayesian Analysis. Springer, New York.
[4] Beskos, A. et al. (2017). Geometric MCMC for infinite-dimensional inverse problems. J. Comput. Phys. 335, 327-351. · Zbl 1375.35627
[5] Beskos, A., Papaspiliopoulos, O. and Roberts, G. (2009). Monte Carlo maximum likelihood estimation for discretely observed diffusion processes. Ann. Statist. 37, 223-245. · Zbl 1169.65004
[6] Beskos, A., Papaspiliopoulos, O., Roberts, G. O. and Fearnhead, P. (2006). Exact and computationally efficient likelihood-based estimation for discretely observed diffusion processes (with discussion). J. R. Statist. Soc. B [Statist. Methodology] 68, 333-382. · Zbl 1100.62079
[7] Beskos, A., Pillai, N., Roberts, G., Sanz-Serna, J.-M. and Stuart, A. (2013). Optimal tuning of the hybrid Monte Carlo algorithm. Bernoulli 19, 1501-1534. · Zbl 1287.60090
[8] Beskos, A., Roberts, G., Stuart, A. and Voss, J. (2008). MCMC methods for diffusion bridges. Stoch. Dynamics 8, 319-350. · Zbl 1159.65007
[9] Bierkens, J. (2016). Non-reversible Metropolis-Hastings. Statist. Comput. 26, 1213-1228. · Zbl 1360.65040
[10] Bierkens, J., Fearnhead, P. and Roberts, G. (2019). The zig-zag process and super-efficient sampling for Bayesian analysis of big data. Ann. Statist. 47, 1288-1320. · Zbl 1417.65008
[11] Bouchard-Côté, A., Vollmer, S. J. and Doucet, A. (2018). The bouncy particle sampler: a nonreversible rejection-free Markov chain Monte Carlo method. J. Amer. Statist. Assoc. 113, 855-867. · Zbl 1398.60084
[12] Chopin, N. and Ridgway, J. (2017). Leave Pima Indians alone: binary regression as a benchmark for Bayesian computation. Statist. Sci. 32, 64-87. · Zbl 1442.62007
[13] Cotter, S. L., Roberts, G. O., Stuart, A. M. and White, D. (2013). MCMC methods for functions: modifying old algorithms to make them faster. Statist. Sci. 28, 424-446. · Zbl 1331.62132
[14] Diaconis, P., Holmes, S. and Neal, R. M. (2000). Analysis of a nonreversible Markov chain sampler. Ann. Appl. Prob. 10, 726-752. · Zbl 1083.60516
[15] Diaconis, P. and Saloff-Coste, L. (1993). Comparison theorems for reversible Markov chains. Ann. Appl. Prob. 3, 696. · Zbl 0799.60058
[16] Dua, D. and Graff, C. (2017). UCI Machine Learning Repository. Available at https://archive.ics.uci.edu/ml/index.php. University of California, Irvine, School of Information and Computer Science.
[17] Duane, S., Kennedy, A., Pendleton, B. J. and Roweth, D. (1987). Hybrid Monte Carlo. Phys. Lett. B 195, 216-222.
[18] Eddelbuettel, D. and Sanderson, C. (2014). RcppArmadillo: accelerating R with high-performance C++ linear algebra. Comput. Statist. Data Anal. 71, 1054-1063. · Zbl 1471.62055
[19] Florens-Zmirou, D. (1989). Approximate discrete-time schemes for statistics of diffusion processes. Statistics 20, 547-557. · Zbl 0704.62072
[20] Gagnon, P. and Maire, F. (2020). Lifted samplers for partially ordered discrete state-spaces. Preprint. Available at https://arxiv.org/abs/2003.05492v1.
[21] Ghosh, J. K., Delampady, M. and Samanta, T. (2006). An Introduction to Bayesian Analysis. Springer, New York. · Zbl 1135.62002
[22] Green, P. J., Łatuszyński, K., Pereyra, M. and Robert, C. P. (2015). Bayesian computation: a summary of the current state, and samples backwards and forwards. Statist. Comput. 25, 835-862. · Zbl 1331.62017
[23] Gustafson, P. (1998). A guided walk Metropolis algorithm. Statist. Comput. 8, 357-364.
[24] Halmos, P. R. (1950). Measure Theory. D. Van Nostrand, New York. · Zbl 0040.16802
[25] Hobert, J. P. and Marchev, D. (2008). A theoretical comparison of the data augmentation, marginal augmentation and PX-DA algorithms. Ann. Statist. 36, 532-554. · Zbl 1155.60031
[26] Horowitz, A. M. (1991). A generalized guided Monte Carlo algorithm. Phys. Lett. B 268, 247-252.
[27] Hosseini, B. (2019). Two Metropolis-Hastings algorithms for posterior measures with non-Gaussian priors in infinite dimensions. SIAM/ASA J. Uncertainty Quantif. 7, 1185-1223. · Zbl 07118427
[28] Kamatani, K. (2017). Ergodicity of Markov chain Monte Carlo with reversible proposal. J. Appl. Prob. 54, 638-654. · Zbl 1401.65009
[29] Kamatani, K. (2018). Efficient strategy for the Markov chain Monte Carlo in high-dimension with heavy-tailed target probability distribution. Bernoulli 24, 3711-3750. · Zbl 1407.60102
[30] Kipnis, C. and Varadhan, S. R. S. (1986). Central limit theorem for additive functionals of reversible Markov processes and applications to simple exclusions. Commun. Math. Phys. 104, 1-19. · Zbl 0588.60058
[31] Kontoyiannis, I. and Meyn, S. P. (2011). Geometric ergodicity and the spectral gap of non-reversible Markov chains. Prob. Theory Relat. Fields 154, 327-339. · Zbl 1263.60064
[32] Kotz, S. and Nadarajah, S. (2004). Multivariate t Distributions and Their Applications. Cambridge University Press. · Zbl 1100.62059
[33] Lewis, P. A. W., Mckenzie, E. and Hugus, D. K. (1989). Gamma processes. Commun. Statist. Stoch. Models 5, 1-30. · Zbl 0665.62090
[34] Liu, J. S. and Sabatti, C. (2000). Generalised Gibbs sampler and multigrid Monte Carlo for Bayesian computation. Biometrika 87, 353-369. · Zbl 0960.65015
[35] Liu, J. S. and Wu, Y. N. (1999). Parameter expansion for data augmentation. J. Amer. Statist. Assoc. 94, 1264-1274. · Zbl 1069.62514
[36] Ludkin, M. and Sherlock, C. (2022). Hug and hop: a discrete-time, nonreversible Markov chain Monte Carlo algorithm. To appear in Biometrika. · Zbl 07689558
[37] Ma, Y.-A., Chen, T. and Fox, E. B. (2015). A complete recipe for stochastic gradient MCMC. In Proc. 28th International Conference on Neural Information Processing Systems (NIPS ’15), Vol. 2, MIT Press, pp. 2917-2925.
[38] Ma, Y.-A., Fox, E. B., Chen, T. and Wu, L. (2019). Irreversible samplers from jump and continuous Markov processes. Statist. Comput. 29, 177-202. · Zbl 1430.62060
[39] Neal, R. M. (1999). Regression and classification using Gaussian process priors. In Bayesian Statistics 6, Oxford University Press, New York, pp. 475-501. · Zbl 0974.62072
[40] Neal, R. M. (2011). MCMC using Hamiltonian dynamics. In Handbook of Markov Chain Monte Carlo, CRC Press, Boca Raton, FL, pp. 113-162. · Zbl 1229.65018
[41] Neal, R. M. (2020). Non-reversibly updating a uniform [0,1] value for Metropolis accept/reject decisions. Preprint. Available at https://arxiv.org/abs/2001.11950.
[42] Neiswanger, W., Wang, C. and Xing, E. P. (2014). Asymptotically exact, embarrassingly parallel MCMC. In Proc. Thirtieth Conference on Uncertainty in Artificial Intelligence (UAI ’14), AUAI Press, Arlington, VA, pp. 623-632.
[43] Ottobre, M., Pillai, N. S., Pinski, F. J. and Stuart, A. M. (2016). A function space HMC algorithm with second order Langevin diffusion limit. Bernoulli 22, 60-106. · Zbl 1346.60119
[44] Plummer, M., Best, N., Cowles, K. and Vines, K. (2006). CODA: convergence diagnosis and output analysis for MCMC. R News 6, 7-11.
[45] Prakasa Rao, B. L. S. (1983). Asymptotic theory for non-linear least squares estimator for diffusion processes. Ser. Statist. 14, 195-209. · Zbl 0532.62060
[46] Prakasa Rao, B. L. S. (1988). Statistical inference from sampled data for stochastic processes. In Statistical Inference from Stochastic Processes (Ithaca, NY, 1987), American Mathematical Society, Providence, RI, pp. 249-284. · Zbl 0687.62069
[47] R Core Team (2020). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna.
[48] Robert, C. and Casella, G. (2011). A short history of Markov chain Monte Carlo: subjective recollections from incomplete data. Statist. Sci. 26, 102-115. · Zbl 1222.65006
[49] Robert, C. P. (2007). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd edn. Springer, New York. · Zbl 1129.62003
[50] Roberts, G. O. and Rosenthal, J. S. (1997). Geometric ergodicity and hybrid Markov chains. Electron. Commun. Prob. 2, 13-25. · Zbl 0890.60061
[51] Roberts, G. O. and Rosenthal, J. S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Statist. Soc. B [Statist. Methodology] 60, 255-268. · Zbl 0913.60060
[52] Roberts, G. O. and Tweedie, R. L. (1996). Exponential convergence of Langevin diffusions and their discrete approximations. Bernoulli 2, 341-363. · Zbl 0870.60027
[53] Roberts, G. O. and Tweedie, R. L. (2001). Geometric \(L^2\) and \(L^1\) convergence are equivalent for reversible Markov chains. J. Appl. Prob. 38A, 37-41. · Zbl 1011.60050
[54] Rossky, P. J., Doll, J. D. and Friedman, H. L. (1978). Brownian dynamics as smart Monte Carlo simulation. J. Chem. Phys. 69, 4628-4633.
[55] Sato, K. (1999). Lévy Processes and Infinitely Divisible Distributions. Cambridge University Press. · Zbl 0973.60001
[56] Scott, S. L. et al. (2016). Bayes and big data: the consensus Monte Carlo algorithm. Internat. J. Manag. Sci. Eng. Manag. 11, 78-88.
[57] Shariff, R., György, A. and Szepesvári, C. (2015). Exploiting symmetries to construct efficient MCMC algorithms with an application to SLAM. In Proc. Eighteenth International Conference on Artificial Intelligence and Statistics (Proc. Machine Learning Research 38), eds G. Lebanon and S. V. N. Vishwanathan, PMLR, San Diego, CA, pp. 866-874.
[58] Sherlock, C. and Thiery, A. H. (2022). A discrete bouncy particle sampler. Biometrika 109, 335-349. · Zbl 07543327
[59] Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. Available at http://mc-stan.org.
[60] Titsias, M. K. and Papaspiliopoulos, O. (2018). Auxiliary gradient-based sampling algorithms. J. R. Statist. Soc. B [Statist. Methodology] 80, 749-767. · Zbl 1398.62035
[61] Tripuraneni, N., Rowland, M., Ghahramani, Z. and Turner, R. (2017). Magnetic Hamiltonian Monte Carlo. In Proc. 34th International Conference on Machine Learning (Proc. Machine Learning Research 70), eds D. Precup and Y. W. Teh, PMLR, Sydney, pp. 3453-3461.
[62] Turitsyn, K. S., Chertkov, M. and Vucelja, M. (2011). Irreversible Monte Carlo algorithms for efficient sampling. Physica D 240, 410-414. · Zbl 1216.82022
[63] Vucelja, M. (2016). Lifting—a nonreversible Markov chain Monte Carlo algorithm. Amer. J. Phys. 84, 958-968.
[64] Wang, X. and Dunson, D. B. (2013). Parallelizing MCMC via Weierstrass sampler. Preprint. Available at https://arxiv.org/abs/1312.4605.
[65] Welling, M. and Teh, Y. W. (2011). Bayesian learning via stochastic gradient Langevin dynamics. In Proc. 28th International Conference on Machine Learning (ICML ’11), Omnipress, Madison, WI, pp. 681-688.
[66] Yoshida, N. (1992). Estimation for diffusion processes from discrete observation. J. Multivariate Anal. 41, 220-242. · Zbl 0811.62083