Efficient multimodal sampling via tempered distribution flow. (English) Zbl 07877173

Summary: Sampling from high-dimensional distributions is a fundamental problem in statistical research and practice. However, great challenges emerge when the target density function is unnormalized and contains isolated modes. We tackle this difficulty by fitting an invertible transformation mapping, called a transport map, between a reference probability measure and the target distribution, so that sampling from the target distribution can be achieved by pushing forward a reference sample through the transport map. We theoretically analyze the limitations of existing transport-based sampling methods using the Wasserstein gradient flow theory, and propose a new method called TemperFlow that addresses the multimodality issue. TemperFlow adaptively learns a sequence of tempered distributions to progressively approach the target distribution, and we prove that it overcomes the limitations of existing methods. Various experiments demonstrate the superior performance of this novel sampler compared to traditional methods, and we show its applications in modern deep learning tasks such as image generation. The programming code for the numerical experiments is available in the supplementary material.
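The two ingredients of the summary — pushing a reference sample forward through a transport map, and a tempering schedule that flattens the target before approaching it — can be illustrated with a minimal sketch. This is not the TemperFlow algorithm itself: the transport map here is a hypothetical 1-D affine map (real transport-based samplers learn invertible neural maps such as normalizing flows), and the tempering schedule `betas` is an illustrative hand-picked sequence rather than the adaptively learned one described in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical transport map T: an affine map pushing the standard
# normal reference measure onto N(mu, sigma^2).  In practice T is an
# invertible neural network fitted to the (unnormalized) target.
mu, sigma = 3.0, 0.5

def transport(z):
    """Pushforward map T applied to reference samples z."""
    return mu + sigma * z

def log_det_jacobian(z):
    """log |T'(z)|, needed for the change-of-variables density."""
    return np.full_like(z, np.log(sigma))

# Pushforward sampling: draw from the reference, map through T.
z = rng.standard_normal(10_000)
x = transport(z)

# Tempered targets p_beta(x) ∝ p(x)**beta are flatter for small beta,
# merging isolated modes; a schedule beta_1 < ... < beta_K = 1
# progressively approaches the original target (illustrative values).
betas = [0.1, 0.3, 0.6, 1.0]
```

By the change of variables formula, the pushforward density is `log q(x) = log p_ref(z) - log |T'(z)|` with `z = T^{-1}(x)`, which is what allows the fitted map to be compared against the unnormalized target at each temperature.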

MSC:

62-XX Statistics

References:

[1] Ambrosio, L., Gigli, N., and Savaré, G. (2008), Gradient Flows: In Metric Spaces and in the Space of Probability Measures, Basel: Birkhäuser. · Zbl 1145.35001
[2] Bhattacharya, R. (1978), “Criteria for Recurrence and Existence of Invariant Measures for Multidimensional Diffusions,” The Annals of Probability, 6, 541-553. · Zbl 0386.60056
[3] Bond-Taylor, S., Leach, A., Long, Y., and Willcocks, C. G. (2021), “Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-based and Autoregressive Models,” arXiv preprint arXiv:2103.04922.
[4] Brooks, S., Gelman, A., Jones, G., and Meng, X.-L. (2011), Handbook of Markov Chain Monte Carlo, Boca Raton, FL: Chapman & Hall/CRC. · Zbl 1218.65001
[5] Che, T., Zhang, R., Sohl-Dickstein, J., Larochelle, H., Paull, L., Cao, Y., and Bengio, Y. (2020), “Your GAN is Secretly an Energy-based Model and You Should Use Discriminator Driven Latent Sampling,” in Advances in Neural Information Processing Systems (Vol. 33).
[6] Cheng, X., Chatterji, N. S., Abbasi-Yadkori, Y., Bartlett, P. L., and Jordan, M. I. (2018), “Sharp Convergence Rates for Langevin Dynamics in the Nonconvex Setting,” arXiv preprint arXiv:1805.01648.
[7] Chwialkowski, K., Strathmann, H., and Gretton, A. (2016), “A Kernel Test of Goodness of Fit,” in International Conference on Machine Learning, pp. 2606-2615. PMLR.
[8] Dinh, L., Krueger, D., and Bengio, Y. (2014), “NICE: Non-Linear Independent Components Estimation,” arXiv preprint arXiv:1410.8516.
[9] Dinh, L., Sohl-Dickstein, J., and Bengio, S. (2016), “Density Estimation Using Real NVP,” arXiv preprint arXiv:1605.08803.
[10] Dolatabadi, H. M., Erfani, S., and Leckie, C. (2020), “Invertible Generative Modeling Using Linear Rational Splines,” in International Conference on Artificial Intelligence and Statistics, pp. 4236-4246.
[11] Dongarra, J., and Sullivan, F. (2000), “Guest Editors’ Introduction: The Top 10 Algorithms,” Computing in Science & Engineering, 2, 22-23.
[12] Dunson, D. B., and Johndrow, J. (2020), “The Hastings Algorithm at Fifty,” Biometrika, 107, 1-23. · Zbl 1435.62042
[13] Durkan, C., Bekasov, A., Murray, I., and Papamakarios, G. (2019), “Neural Spline Flows,” in Advances in Neural Information Processing Systems (Vol. 32).
[14] Earl, D. J., and Deem, M. W. (2005), “Parallel Tempering: Theory, Applications, and New Perspectives,” Physical Chemistry Chemical Physics, 7, 3910-3916.
[15] Falcioni, M., and Deem, M. W. (1999), “A Biased Monte Carlo Scheme for Zeolite Structure Solution,” The Journal of Chemical Physics, 110, 1754-1766.
[16] Gao, Y., Huang, J., Jiao, Y., Liu, J., Lu, X., and Yang, Z. (2022), “Deep Generative Learning via Euler Particle Transport,” in Mathematical and Scientific Machine Learning, pp. 336-368.
[17] Gao, Y., Jiao, Y., Wang, Y., Wang, Y., Yang, C., and Zhang, S. (2019), “Deep Generative Learning via Variational Gradient Flow,” in International Conference on Machine Learning, pp. 2093-2101. PMLR.
[18] Ge, R., Lee, H., and Risteski, A. (2018), “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition,” arXiv preprint arXiv:1812.00793.
[19] Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014), Bayesian Data Analysis, Boca Raton, FL: Chapman & Hall/CRC. · Zbl 1279.62004
[20] Geyer, C. J. (1991), “Markov Chain Monte Carlo Maximum Likelihood,” in 23rd Symposium on the Interface.
[21] Geyer, C. J., and Thompson, E. A. (1995), “Annealing Markov Chain Monte Carlo with Applications to Ancestral Inference,” Journal of the American Statistical Association, 90, 909-920. · Zbl 0850.62834
[22] Gilks, W., Richardson, S., and Spiegelhalter, D. (1995), Markov Chain Monte Carlo in Practice, Boca Raton, FL: Chapman & Hall/CRC.
[23] Goodfellow, I., Bengio, Y., and Courville, A. (2016), Deep Learning, Cambridge, MA: MIT Press. · Zbl 1373.68009
[24] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B., and Smola, A. (2006), “A Kernel Method for the Two-Sample-Problem,” in Advances in Neural Information Processing Systems (Vol. 19).
[25] Hastings, W. K. (1970), “Monte Carlo Sampling Methods Using Markov Chains and Their Applications,” Biometrika, 57, 97-109. · Zbl 0219.65008
[26] Hoffman, M., Sountsov, P., Dillon, J. V., Langmore, I., Tran, D., and Vasudevan, S. (2019), “NeuTra-lizing Bad Geometry in Hamiltonian Monte Carlo Using Neural Transport,” arXiv preprint arXiv:1903.03704.
[27] Huang, J., Jiao, Y., Kang, L., Liao, X., Liu, J., and Liu, Y. (2021), “Schrödinger-Föllmer Sampler: Sampling Without Ergodicity,” arXiv preprint arXiv:2106.10880.
[28] Kingma, D. P., and Ba, J. (2015), “Adam: A Method for Stochastic Optimization,” in International Conference on Learning Representations, pp. 1-13.
[29] Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016), “Improved Variational Inference with Inverse Autoregressive Flow,” in Advances in Neural Information Processing Systems (Vol. 29).
[30] Kirkpatrick, S., Gelatt, C. D., and Vecchi, M. P. (1983), “Optimization by Simulated Annealing,” Science, 220, 671-680. · Zbl 1225.90162
[31] Kofke, D. A. (2002), “On the Acceptance Probability of Replica-Exchange Monte Carlo Trials,” The Journal of Chemical Physics, 117, 6911-6914.
[32] Kone, A., and Kofke, D. A. (2005), “Selection of Temperature Intervals for Parallel-Tempering Simulations,” The Journal of Chemical Physics, 122, 206101.
[33] Levine, R. A., and Casella, G. (2001), “Implementations of the Monte Carlo EM Algorithm,” Journal of Computational and Graphical Statistics, 10, 422-439.
[34] Ley, C., and Swan, Y. (2013), “Stein’s Density Approach and Information Inequalities,” Electronic Communications in Probability, 18, 1-14. · Zbl 1307.60009
[35] Liu, Q., Xu, J., Jiang, R., and Wong, W. H. (2021), “Density Estimation Using Deep Generative Neural Networks,” Proceedings of the National Academy of Sciences, 118, e2101344118.
[36] Liu, Z., Luo, P., Wang, X., and Tang, X. (2015), “Deep Learning Face Attributes in the Wild,” in Proceedings of the IEEE International Conference on Computer Vision, pp. 3730-3738.
[37] Marinari, E., and Parisi, G. (1992), “Simulated Tempering: A New Monte Carlo Scheme,” Europhysics Letters, 19, 451.
[38] Marzouk, Y., Moselhy, T., Parno, M., and Spantini, A. (2016), “Sampling via Measure Transport: An Introduction,” in Handbook of Uncertainty Quantification, eds. Ghanem, R., Higdon, D., and Owhadi, H., pp. 785-825, Cham: Springer.
[39] Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., and Teller, E. (1953), “Equation of State Calculations by Fast Computing Machines,” The Journal of Chemical Physics, 21, 1087-1092. · Zbl 1431.65006
[40] Neal, R. M. (1996), “Sampling from Multimodal Distributions Using Tempered Transitions,” Statistics and Computing, 6, 353-366.
[41] Nelsen, R. B. (2006), An Introduction to Copulas, New York: Springer. · Zbl 1152.62030
[42] Owen, A. B. (2013), Monte Carlo Theory, Methods and Examples. Available at https://artowen.su.domains/mc/
[43] Pang, B., Han, T., Nijkamp, E., Zhu, S.-C., and Wu, Y. N. (2020), “Learning Latent Space Energy-based Prior Model,” in Advances in Neural Information Processing Systems (Vol. 33).
[44] Papamakarios, G., Pavlakou, T., and Murray, I. (2017), “Masked Autoregressive Flow for Density Estimation,” in Advances in Neural Information Processing Systems (Vol. 30).
[45] Qiu, Y., and Wang, X. (2021), “ALMOND: Adaptive Latent Modeling and Optimization via Neural Networks and Langevin Diffusion,” Journal of the American Statistical Association, 116, 1224-1236. · Zbl 1512.62087
[46] Raginsky, M., Rakhlin, A., and Telgarsky, M. (2017), “Non-Convex Learning via Stochastic Gradient Langevin Dynamics: A Nonasymptotic Analysis,” in Conference on Learning Theory, pp. 1674-1703. PMLR.
[47] Rezende, D., and Mohamed, S. (2015), “Variational Inference with Normalizing Flows,” in International Conference on Machine Learning, pp. 1530-1538. PMLR.
[48] Robbins, H., and Monro, S. (1951), “A Stochastic Approximation Method,” The Annals of Mathematical Statistics, 22, 400-407. · Zbl 0054.05901
[49] Romano, Y., Sesia, M., and Candès, E. (2020), “Deep Knockoffs,” Journal of the American Statistical Association, 115, 1861-1872. · Zbl 1452.62710
[50] Salakhutdinov, R. (2015), “Learning Deep Generative Models,” Annual Review of Statistics and Its Application, 2, 361-385.
[51] Santambrogio, F. (2017), “{Euclidean, Metric, and Wasserstein} Gradient Flows: An Overview,” Bulletin of Mathematical Sciences, 7, 87-154. · Zbl 1369.34084
[52] Sullivan, T. J. (2015), Introduction to Uncertainty Quantification, Cham: Springer. · Zbl 1336.60002
[53] Sun, Y., Song, Q., and Liang, F. (2021), “Consistent Sparse Deep Learning: Theory and Computation,” Journal of the American Statistical Association, accepted.
[54] Swendsen, R. H., and Wang, J.-S. (1986), “Replica Monte Carlo Simulation of Spin-Glasses,” Physical Review Letters, 57, 2607.
[55] Tabak, E. G., and Turner, C. V. (2013), “A Family of Nonparametric Density Estimation Algorithms,” Communications on Pure and Applied Mathematics, 66, 145-164. · Zbl 06132669
[56] Tabak, E. G., and Vanden-Eijnden, E. (2010), “Density Estimation by Dual Ascent of the Log-Likelihood,” Communications in Mathematical Sciences, 8, 217-233. · Zbl 1189.62063
[57] Vempala, S., and Wibisono, A. (2019), “Rapid Convergence of the Unadjusted Langevin Algorithm: Isoperimetry Suffices,” in Advances in Neural Information Processing Systems (Vol. 32).
[58] Villani, C. (2009), Optimal Transport: Old and New, Berlin: Springer. · Zbl 1156.53003
[59] Vousden, W., Farr, W. M., and Mandel, I. (2016), “Dynamic Temperature Selection for Parallel Tempering in Markov Chain Monte Carlo Simulations,” Monthly Notices of the Royal Astronomical Society, 455, 1919-1937.
[60] Wei, G. C., and Tanner, M. A. (1990), “A Monte Carlo Implementation of the EM Algorithm and the Poor Man’s Data Augmentation Algorithms,” Journal of the American Statistical Association, 85, 699-704.
[61] Woodard, D. B., Schmidler, S. C., and Huber, M. (2009), “Conditions for Rapid Mixing of Parallel and Simulated Tempering on Multimodal Distributions,” The Annals of Applied Probability, 19, 617-640. · Zbl 1171.65008
[62] Xiao, H., Rasul, K., and Vollgraf, R. (2017), “Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms,” arXiv preprint arXiv:1708.07747.
[63] Yuan, Y., Deng, Y., Zhang, Y., and Qu, A. (2020), “Deep Learning from a Statistical Perspective,” Stat, 9, e294. · Zbl 07851180
[64] Zheng, Z. (2003), “On Swapping and Simulated Tempering Algorithms,” Stochastic Processes and their Applications, 104, 131-154. · Zbl 1075.60545
[65] Zhou, X., Jiao, Y., Liu, J., and Huang, J. (2021), “A Deep Generative Approach to Conditional Sampling,” Journal of the American Statistical Association, accepted.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases the data have been complemented or enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible, without claiming completeness or a perfect matching.