
Principal feature detection via \(\phi \)-Sobolev inequalities. (English) Zbl 07898707

Summary: We investigate the approximation of high-dimensional target measures as low-dimensional updates of a dominating reference measure. This approximation class replaces the associated density with the composition of: (i) a feature map that identifies the leading principal components or features of the target measure, relative to the reference, and (ii) a low-dimensional profile function. When the reference measure satisfies a subspace \(\phi \)-Sobolev inequality, we construct a computationally tractable approximation that yields certifiable error guarantees with respect to the Amari \(\alpha \)-divergences. Our construction proceeds in two stages. First, for any feature map and any \(\alpha \)-divergence, we obtain an analytical expression for the optimal profile function. Second, for linear feature maps, the principal features are obtained from eigenvectors of a matrix involving gradients of the log-density. Neither step requires explicit access to normalizing constants. Notably, by leveraging the \(\phi \)-Sobolev inequalities, we demonstrate that these features universally certify approximation errors across the range of \(\alpha \)-divergences \(\alpha \in(0, 1]\). We then propose an application to Bayesian inverse problems and provide an analogous construction with approximation guarantees that hold in expectation over the data. We conclude with an extension of the proposed dimension reduction strategy to nonlinear feature maps.
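The second stage of the construction lends itself to a simple sample-based illustration. Below is a minimal numpy sketch, assuming that, as in related gradient-based dimension reduction work (cf. [25, 64, 65]), the principal features are obtained from the leading eigenvectors of the second-moment matrix of the log-density-ratio gradient; the function and variable names are illustrative and not taken from the paper:

```python
import numpy as np

def principal_linear_features(grad_log_ratio, samples, r):
    """Sketch: detect a rank-r linear feature map from gradient information.

    grad_log_ratio : callable returning the gradient of log(d pi / d rho) at x
    samples        : (N, d) array of draws from the reference measure rho
    r              : number of principal features to retain
    """
    grads = np.stack([grad_log_ratio(x) for x in samples])  # (N, d)
    H = grads.T @ grads / len(samples)     # Monte Carlo estimate of E_rho[g g^T]
    eigvals, eigvecs = np.linalg.eigh(H)   # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]      # reorder to descending
    U_r = eigvecs[:, order[:r]]            # leading eigenvectors span the feature subspace
    return U_r, eigvals[order]

# The linear feature map is x -> U_r.T @ x. Under a subspace phi-Sobolev
# inequality for rho, the discarded eigenvalues eigvals[r:] typically
# control the certified approximation error (an assumption of this sketch).
```

Note that the sketch needs only gradients of the unnormalized log-density, consistent with the summary's remark that neither step requires explicit access to normalizing constants.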

MSC:

68T10 Pattern recognition, speech recognition
68T09 Computational aspects of data analysis and big data
94A17 Measures of information, entropy
62H25 Factor analysis and principal components; correspondence analysis
62F15 Bayesian inference
26D10 Inequalities involving derivatives and differential and integral operators

References:

[1] Amari, S.-I. (2009). \(α\)-divergence is unique, belonging to both \(f\)-divergence and Bregman divergence classes. IEEE Trans. Inf. Theory 55 4925-4931. 10.1109/TIT.2009.2030485 MathSciNet: MR2596950 · Zbl 1367.94133
[2] Andrieu, C., Lee, A.W.L., Power, S. and Wang, A. (2022). Explicit convergence bounds for Metropolis Markov chains: Isoperimetry, spectral gaps and profiles. Available at arXiv:2211.08959.
[3] Bakry, D., Gentil, I. and Ledoux, M. (2014). Analysis and Geometry of Markov Diffusion Operators. Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences] 348. Cham: Springer. 10.1007/978-3-319-00227-9 MathSciNet: MR3155209 · Zbl 1376.60002
[4] Bakry, D. and Ledoux, M. (2006). A logarithmic Sobolev form of the Li-Yau parabolic inequality. Rev. Mat. Iberoam. 22 683-702. 10.4171/RMI/470 MathSciNet: MR2294794 · Zbl 1116.58024
[5] Banerjee, A., Guo, X. and Wang, H. (2005). On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. Inf. Theory 51 2664-2669. 10.1109/TIT.2005.850145 MathSciNet: MR2246384 · Zbl 1284.94025
[6] Baptista, R., Marzouk, Y. and Zahm, O. (2022). Gradient-based data and parameter dimension reduction for Bayesian models: An information theoretic perspective. Available at arXiv:2207.08670.
[7] Beckner, W. (1989). A generalized Poincaré inequality for Gaussian measures. Proc. Amer. Math. Soc. 105 397-400. 10.2307/2046956 MathSciNet: MR0954373 · Zbl 0677.42020
[8] Beskos, A., Girolami, M., Lan, S., Farrell, P.E. and Stuart, A.M. (2017). Geometric MCMC for infinite-dimensional inverse problems. J. Comput. Phys. 335 327-351. 10.1016/j.jcp.2016.12.041 MathSciNet: MR3612501 · Zbl 1375.35627
[9] Bigoni, D., Marzouk, Y., Prieur, C. and Zahm, O. (2022). Nonlinear dimension reduction for surrogate modeling using gradient information. Inf. Inference 11 1597-1639. 10.1093/imaiai/iaac006 MathSciNet: MR4526330 · Zbl 1505.65155
[10] Bolley, F. and Gentil, I. (2010). Phi-entropy inequalities for diffusion semigroups. J. Math. Pures Appl. 93 449-473. 10.1016/j.matpur.2010.02.004 MathSciNet: MR2609029 · Zbl 1193.47046
[11] Brennan, M., Bigoni, D., Zahm, O., Spantini, A. and Marzouk, Y. (2020). Greedy inference with structure-exploiting lazy maps. In Adv. Neural Inf. Process. Syst. 33 8330-8342.
[12] Canonne, C.L. (2022). A short note on an inequality between KL and TV. Available at arXiv:2202.07198.
[13] Chafaï, D. (2004). Entropies, convexity, and functional inequalities: On Φ-entropies and Φ-Sobolev inequalities. J. Math. Kyoto Univ. 44 325-363. 10.1215/kjm/1250283556 MathSciNet: MR2081075 · Zbl 1079.26009
[14] Chen, P. and Ghattas, O. (2020). Projected Stein variational gradient descent. In Adv. Neural Inf. Process. Syst. 33 1947-1958.
[15] Chewi, S., Erdogdu, M.A., Li, M., Shen, R. and Zhang, S. (2022). Analysis of Langevin Monte Carlo from Poincaré to Log-Sobolev. In Proceedings of Thirty Fifth Conference on Learning Theory (P.-L. Loh and M. Raginsky, eds.). PMLR 178 1-2. Proc. Mach. Learn. Res.
[16] Constantine, P.G. (2015). Active Subspaces: Emerging Ideas for Dimension Reduction in Parameter Studies. SIAM Spotlights 2. Philadelphia, PA: SIAM. 10.1137/1.9781611973860 MathSciNet: MR3486165 · Zbl 1431.65001
[17] Constantine, P.G. and Diaz, P. (2017). Global sensitivity metrics from active subspaces. Reliab. Eng. Syst. Saf. 162 1-13.
[18] Constantine, P.G., Kent, C. and Bui-Thanh, T. (2016). Accelerating Markov chain Monte Carlo with active subspaces. SIAM J. Sci. Comput. 38 A2779-A2805. 10.1137/15M1042127 MathSciNet: MR3543164 · Zbl 1348.65010
[19] Cotter, S.L., Roberts, G.O., Stuart, A.M. and White, D. (2013). MCMC methods for functions: Modifying old algorithms to make them faster. Statist. Sci. 28 424-446. 10.1214/13-STS421 MathSciNet: MR3135540 · Zbl 1331.62132
[20] Cui, T., Dolgov, S. and Zahm, O. (2023). Scalable conditional deep inverse Rosenblatt transports using tensor trains and gradient-based dimension reduction. J. Comput. Phys. 485 Paper No. 112103, 31. 10.1016/j.jcp.2023.112103 MathSciNet: MR4574525 · Zbl 07690210
[21] Cui, T., Law, K.J.H. and Marzouk, Y.M. (2016). Dimension-independent likelihood-informed MCMC. J. Comput. Phys. 304 109-137. 10.1016/j.jcp.2015.10.008 MathSciNet: MR3422405 · Zbl 1349.65009
[22] Cui, T., Martin, J., Marzouk, Y.M., Solonen, A. and Spantini, A. (2014). Likelihood-informed dimension reduction for nonlinear inverse problems. Inverse Probl. 30 114015, 28. 10.1088/0266-5611/30/11/114015 MathSciNet: MR3274599 · Zbl 1310.62030
[23] Cui, T. and Tong, X.T. (2022). A unified performance analysis of likelihood-informed subspace methods. Bernoulli 28 2788-2815. 10.3150/21-bej1437 MathSciNet: MR4474562 · Zbl 1501.65002
[24] Cui, T., Tong, X.T. and Zahm, O. (2022). Prior normalization for certified likelihood-informed subspace detection of Bayesian inverse problems. Inverse Probl. 38 Paper No. 124002, 36. 10.1088/1361-6420/ac9582 MathSciNet: MR4500900 · Zbl 07611864
[25] Cui, T. and Zahm, O. (2021). Data-free likelihood-informed dimension reduction of Bayesian inverse problems. Inverse Probl. 37 Paper No. 045009, 41. 10.1088/1361-6420/abeafb MathSciNet: MR4234453 · Zbl 1464.62239
[26] Del Moral, P., Doucet, A. and Jasra, A. (2006). Sequential Monte Carlo samplers. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 411-436. 10.1111/j.1467-9868.2006.00553.x MathSciNet: MR2278333 · Zbl 1105.62034
[27] Ehre, M., Flock, R., Fußeder, M., Papaioannou, I. and Straub, D. (2023). Certified dimension reduction for Bayesian updating with the cross-entropy method. SIAM/ASA J. Uncertain. Quantificat. 11 358-388. 10.1137/22M1484031 MathSciNet: MR4558760 · Zbl 1515.62042
[28] Fan, K. (1949). On a theorem of Weyl concerning eigenvalues of linear transformations. I. Proc. Natl. Acad. Sci. USA 35 652-655. 10.1073/pnas.35.11.652 MathSciNet: MR0034519 · Zbl 0041.00602
[29] Gross, L. (1975). Logarithmic Sobolev inequalities. Amer. J. Math. 97 1061-1083. 10.2307/2373688 MathSciNet: MR0420249 · Zbl 0359.46038
[30] Guillin, A., Léonard, C., Wu, L. and Yao, N. (2009). Transportation-information inequalities for Markov processes. Probab. Theory Related Fields 144 669-695. 10.1007/s00440-008-0159-5 MathSciNet: MR2496446 · Zbl 1169.60304
[31] Harremoës, P. and Vajda, I. (2011). On pairs of \(f\)-divergences and their joint range. IEEE Trans. Inf. Theory 57 3230-3235. 10.1109/TIT.2011.2137353 MathSciNet: MR2817015 · Zbl 1365.94145
[32] Kaipio, J. and Somersalo, E. (2005). Statistical and Computational Inverse Problems. Applied Mathematical Sciences 160. New York: Springer. MathSciNet: MR2102218 · Zbl 1068.65022
[33] Kallenberg, O. (1997). Foundations of Modern Probability. Probability and Its Applications (New York). New York: Springer. MathSciNet: MR1464694 · Zbl 0892.60001
[34] Kim, K.-T., Villa, U., Parno, M., Marzouk, Y., Ghattas, O. and Petra, N. (2023). HIPPYlib-MUQ: A Bayesian inference software framework for integration of data with complex predictive models under uncertainty. ACM Trans. Math. Software 49 Art. 17, 31. 10.1145/3580278 MathSciNet: MR4616414 · Zbl 07910032
[35] Kingma, D.P. and Welling, M. (2014). Auto-encoding variational Bayes. In 2nd International Conference on Learning Representations, ICLR.
[36] Laparra, V., Camps-Valls, G. and Malo, J. (2011). Iterative Gaussianization: From ICA to random rotations. IEEE Trans. Neural Netw. 22 537-549. 10.1109/TNN.2011.2106511
[37] Latała, R. and Oleszkiewicz, K. (2000). Between Sobolev and Poincaré. In Geometric Aspects of Functional Analysis. Lecture Notes in Math. 1745 147-168. Berlin: Springer. 10.1007/BFb0107213 MathSciNet: MR1796718 · Zbl 0986.60017
[38] Lee, M.R. (2019). Modified active subspaces using the average of gradients. SIAM/ASA J. Uncertain. Quantificat. 7 53-66. 10.1137/17M1140662 MathSciNet: MR3895346 · Zbl 07003656
[39] Li, M.T.C., Marzouk, Y. and Zahm, O. (2024). Supplement to “Principal feature detection via \(ϕ\)-Sobolev inequalities.” 10.3150/23-BEJ1702SUPP
[40] Liese, F. and Vajda, I. (2006). On divergences and informations in statistics and information theory. IEEE Trans. Inf. Theory 52 4394-4412. 10.1109/TIT.2006.881731 MathSciNet: MR2300826 · Zbl 1287.94025
[41] Liu, S. and Owen, A.B. (2023). Preintegration via active subspace. SIAM J. Numer. Anal. 61 495-514. 10.1137/22M1479129 MathSciNet: MR4558754 · Zbl 1511.65008
[42] Liu, X., Zhu, H., Ton, J.-F., Wynne, G. and Duncan, A. (2022). Grassmann Stein variational gradient descent. In Proceedings of the 25th International Conference on Artificial Intelligence and Statistics. PMLR 2002-2021. Proc. Mach. Learn. Res.
[43] Mangoubi, O. and Vishnoi, N.K. (2018). Dimensionally tight bounds for second-order Hamiltonian Monte Carlo. In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18 6030-6040. Red Hook, NY, USA: Curran Associates Inc.
[44] Marzouk, Y., Moselhy, T., Parno, M. and Spantini, A. (2017). Sampling via measure transport: An introduction. In Handbook of Uncertainty Quantification. Vol. 1, 2, 3 785-825. Cham: Springer. MathSciNet: MR3821485
[45] Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S. and Lakshminarayanan, B. (2021). Normalizing flows for probabilistic modeling and inference. J. Mach. Learn. Res. 22 Paper No. 57, 64. MathSciNet: MR4253750 · Zbl 07370574
[46] Pillai, N.S., Stuart, A.M. and Thiéry, A.H. (2012). Optimal scaling and diffusion limits for the Langevin algorithm in high dimensions. Ann. Appl. Probab. 22 2320-2356. 10.1214/11-AAP828 MathSciNet: MR3024970 · Zbl 1272.60053
[47] Pillaud-Vivien, L., Bach, F., Lelièvre, T., Rudi, A. and Stoltz, G. (2020). Statistical estimation of the Poincaré constant and application to sampling multimodal distributions. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics. PMLR 2753-2763. Proc. Mach. Learn. Res.
[48] Polyanskiy, Y. and Wu, Y. (2023). Information Theory: From Coding to Learning. Cambridge University Press. Forthcoming.
[49] Rebeschini, P. and van Handel, R. (2015). Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 25 2809-2866. 10.1214/14-AAP1061 MathSciNet: MR3375889 · Zbl 1325.60058
[50] Rezende, D. and Mohamed, S. (2015). Variational inference with normalizing flows. In Proceedings of the 32nd International Conference on Machine Learning (F. Bach and D. Blei, eds.). PMLR 37 1530-1538.
[51] Rezende, D.J., Mohamed, S. and Wierstra, D. (2014). Stochastic backpropagation and approximate inference in deep generative models. In Proceedings of the 31st International Conference on Machine Learning (E.P. Xing and T. Jebara, eds.). PMLR 32 1278-1286.
[52] Roberts, G.O. and Rosenthal, J.S. (1998). Optimal scaling of discrete approximations to Langevin diffusions. J. R. Stat. Soc. Ser. B. Stat. Methodol. 60 255-268. 10.1111/1467-9868.00123 MathSciNet: MR1625691 · Zbl 0913.60060
[53] Roberts, G.O. and Rosenthal, J.S. (2004). General state space Markov chains and MCMC algorithms. Probab. Surv. 1 20-71. 10.1214/154957804100000024 MathSciNet: MR2095565 · Zbl 1189.60131
[54] Roy, O. and Vetterli, M. (2007). The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference 606-610. IEEE.
[55] Rudolf, D. and Sprungk, B. (2018). On a generalization of the preconditioned Crank-Nicolson Metropolis algorithm. Found. Comput. Math. 18 309-343. 10.1007/s10208-016-9340-x MathSciNet: MR3777781 · Zbl 1391.60169
[56] Samarov, A.M. (1993). Exploring regression structure using nonparametric functional estimation. J. Amer. Statist. Assoc. 88 836-847. MathSciNet: MR1242934 · Zbl 0790.62035
[57] Snyder, C., Bengtsson, T., Bickel, P. and Anderson, J. (2008). Obstacles to high-dimensional particle filtering. Mon. Weather Rev. 136 4629-4640.
[58] Spantini, A., Solonen, A., Cui, T., Martin, J., Tenorio, L. and Marzouk, Y. (2015). Optimal low-rank approximations of Bayesian linear inverse problems. SIAM J. Sci. Comput. 37 A2451-A2487. 10.1137/140977308 MathSciNet: MR3418226 · Zbl 1325.62060
[59] Stuart, A.M. (2010). Inverse problems: A Bayesian perspective. Acta Numer. 19 451-559. 10.1017/S0962492910000061 MathSciNet: MR2652785 · Zbl 1242.65142
[60] Tabak, E.G. and Turner, C.V. (2013). A family of nonparametric density estimation algorithms. Comm. Pure Appl. Math. 66 145-164. 10.1002/cpa.21423 MathSciNet: MR2999294 · Zbl 06132669
[61] Tong, S. and Stadler, G. (2023). Large deviation theory-based adaptive importance sampling for rare events in high dimensions. SIAM/ASA J. Uncertain. Quantificat. 11 788-813. 10.1137/22M1524758 MathSciNet: MR4612613 · Zbl 1518.65009
[62] Uribe, F., Papaioannou, I., Marzouk, Y.M. and Straub, D. (2021). Cross-entropy-based importance sampling with failure-informed dimension reduction for rare event simulation. SIAM/ASA J. Uncertain. Quantificat. 9 818-847. 10.1137/20M1344585 MathSciNet: MR4274842 · Zbl 1472.62023
[63] Vempala, S.S. and Wibisono, A. (2023). Rapid convergence of the unadjusted Langevin algorithm: Isoperimetry suffices. In Geometric Aspects of Functional Analysis. Lecture Notes in Math. 2327 381-438. Cham: Springer. 10.1007/978-3-031-26300-2_15 MathSciNet: MR4651229
[64] Zahm, O., Constantine, P.G., Prieur, C. and Marzouk, Y.M. (2020). Gradient-based dimension reduction of multivariate vector-valued functions. SIAM J. Sci. Comput. 42 A534-A558. 10.1137/18M1221837 MathSciNet: MR4069329 · Zbl 1433.41007
[65] Zahm, O., Cui, T., Law, K., Spantini, A. and Marzouk, Y. (2022). Certified dimension reduction in nonlinear Bayesian inverse problems. Math. Comp. 91 1789-1835. 10.1090/mcom/3737 MathSciNet: MR4435948 · Zbl 07541892