
Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization

Journal of Optimization Theory and Applications

Abstract

We propose a stochastic variance-reduced cubic regularized Newton algorithm for optimizing finite-sum problems over a Riemannian submanifold of the Euclidean space. The proposed algorithm requires a full gradient and Hessian update at the beginning of each epoch, while it performs stochastic variance-reduced updates in the iterations within each epoch. An iteration complexity of \(O(\epsilon ^{-3/2})\) for obtaining an \((\epsilon ,\sqrt{\epsilon })\)-second-order stationary point, i.e., a point at which the norm of the Riemannian gradient is at most \(\epsilon \) and the minimum eigenvalue of the Riemannian Hessian is at least \(-\sqrt{\epsilon }\), is established when the manifold is embedded in the Euclidean space. Furthermore, the paper proposes a computationally more appealing modification of the algorithm that requires only an inexact solution of the cubic regularized Newton subproblem while retaining the same iteration complexity. The proposed algorithm is evaluated and compared with three other Riemannian second-order methods in two numerical studies: estimating the inverse scale matrix of the multivariate t-distribution on the manifold of symmetric positive definite matrices, and estimating the parameter of a linear classifier on the sphere manifold.
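As an informal complement to the abstract, the following is a minimal sketch of such an epoch structure on the unit sphere. It is not the authors' R-SVRC implementation; the component functions \(f_i(\textbf{x})=\frac{1}{2}\textbf{x}^\top A_i\textbf{x}\), the projection-based vector transport, the gradient-descent subproblem solver, and all constants are illustrative assumptions.

```python
# Minimal sketch of the epoch structure described in the abstract: a full gradient
# and Hessian snapshot at the start of each epoch, variance-reduced estimates in
# the inner iterations, and an inexactly solved cubic regularized Newton subproblem.
# This is NOT the authors' R-SVRC implementation. Assumptions: the manifold is the
# unit sphere S^{d-1}, f_i(x) = 0.5 * x' A_i x, a projection-based vector transport,
# a plain gradient-descent subproblem solver, and illustrative constants throughout.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200
A = rng.standard_normal((n, d, d))
A = 0.5 * (A + A.transpose(0, 2, 1))      # symmetric component matrices A_i
A_bar = A.mean(axis=0)                    # Euclidean Hessian of the full objective


def proj(x, v):                           # orthogonal projection onto T_x S^{d-1}
    return v - (x @ v) * x


def rgrad(x, egrad):                      # Riemannian gradient from the Euclidean one
    return proj(x, egrad)


def rhess(x, egrad, ehess, xi):           # Riemannian Hessian-vector product (sphere)
    return proj(x, ehess @ xi) - (x @ egrad) * xi


def retract(x, s):                        # metric-projection retraction
    y = x + s
    return y / np.linalg.norm(y)


def cubic_step(g, hess_vec, sigma, iters=50, eta=0.05):
    """Inexact subproblem solver: gradient descent on the cubic model
    m(s) = <g, s> + 0.5 <s, H[s]> + (sigma / 3) ||s||^3 over the tangent space."""
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s - eta * (g + hess_vec(s) + sigma * np.linalg.norm(s) * s)
    return s


x = rng.standard_normal(d)
x /= np.linalg.norm(x)
sigma, n_epochs, n_inner, batch = 5.0, 10, 5, 20

for _ in range(n_epochs):
    xs = x.copy()                                         # epoch snapshot
    eg_full = A_bar @ xs                                  # full Euclidean gradient at xs
    for _ in range(n_inner):
        S = rng.choice(n, size=batch, replace=False)
        Ab = A[S].mean(axis=0)                            # batch Euclidean Hessian
        eg_x, eg_s = Ab @ x, Ab @ xs                      # batch Euclidean gradients
        # Variance-reduced gradient: batch term at x, corrected by the snapshot terms,
        # with projection onto T_x playing the role of a vector transport.
        g = rgrad(x, eg_x) - proj(x, rgrad(xs, eg_s)) + proj(x, rgrad(xs, eg_full))

        def hess_vec(xi):
            # Same correction pattern for the Hessian-vector product. For these
            # quadratics the Euclidean Hessians are constant; the pattern is kept
            # only to show the structure of the estimator.
            return (rhess(x, eg_x, Ab, xi)
                    - proj(x, rhess(xs, eg_s, Ab, proj(xs, xi)))
                    + proj(x, rhess(xs, eg_full, A_bar, proj(xs, xi))))

        s = cubic_step(g, hess_vec, sigma)
        x = retract(x, proj(x, s))

print("Riemannian gradient norm:", np.linalg.norm(rgrad(x, A_bar @ x)))
print("objective value 0.5*x'Ax:", 0.5 * x @ A_bar @ x)
```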



Data availability statement

The data generated and analyzed in the current study can be entirely reproduced by running our code, which is publicly available in the GitHub repository at https://github.com/samdavanloo/R-SVRC.


Author information

Corresponding author

Correspondence to Sam Davanloo Tajbakhsh.

Additional information

Communicated by Defeng Sun.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Lemma 3.2

Denote the orthogonal projection operator onto \(T_{\textbf{x}}\mathcal {M}\), i.e., the tangent space of \(\mathcal {M}\) at \(\textbf{x}\), by \(P_{\textbf{x}}\), and denote the Euclidean gradient and Hessian by \(\nabla f(\textbf{x})\) and \(\nabla ^2 f(\textbf{x})\), respectively. For any \(\textbf{y}\) such that \(d(\textbf{x},\textbf{y})<\text {inj}(\mathcal {M})\) and any \(\xi \in T_{\textbf{y}}\mathcal {M}\) with \(\Vert \xi \Vert =1\), we have

$$\begin{aligned} \text {Hess} f(\textbf{y})[\xi ]&=P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}}\nabla f(\textbf{y}))(\textbf{y})[\xi ])\\&=P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\nabla f(\textbf{y})])+P_{\textbf{y}} (\nabla ^2 f(\textbf{y})[\xi ])\\&\equiv A_1+B_1. \end{aligned}$$

The first equality follows from (7), and the second equality comes from the chain rule and the fact that the projection operator is linear. Similarly, we have

$$\begin{aligned} \varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) [\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]&= \varGamma _{\textbf{x}}^{\textbf{y}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}}\nabla f(\textbf{x}))(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ])\\&=\varGamma ^{\textbf{y}}_{\textbf{x}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}})(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ][\nabla f(\textbf{x})])+\varGamma ^{\textbf{y}}_{\textbf{x}}P_{\textbf{x}} (\nabla ^2 f(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ])\\&\equiv A_2+B_2. \end{aligned}$$
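As a concrete illustration, not part of the original proof, consider the unit sphere \(\mathcal {M}=\{\textbf{x}\in \mathbb {R}^n:\Vert \textbf{x}\Vert =1\}\), for which \(P_{\textbf{y}}=I-\textbf{y}\textbf{y}^\top \) and hence \(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ]=-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )\). Using \(\textbf{y}^\top \xi =0\) and \(P_{\textbf{y}}\textbf{y}=0\),

$$\begin{aligned} A_1=P_{\textbf{y}}\big (-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )\nabla f(\textbf{y})\big )=-(\textbf{y}^\top \nabla f(\textbf{y}))\,\xi ,\qquad B_1=P_{\textbf{y}}\nabla ^2 f(\textbf{y})\xi , \end{aligned}$$

which recovers the familiar expression \(\text {Hess} f(\textbf{y})[\xi ]=P_{\textbf{y}}\nabla ^2 f(\textbf{y})\xi -(\textbf{y}^\top \nabla f(\textbf{y}))\xi \) for the sphere.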

First, to quantify \(\Vert A_1-A_2\Vert \), we have,

$$\begin{aligned} \Vert A_1-A_2\Vert&=\Vert O_{A_1}[\nabla f(\textbf{y})+\nabla f(\textbf{x})-\nabla f(\textbf{x})]-O_{A_2}[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(74)
$$\begin{aligned}&=\Vert O_{A_1}[\nabla f(\textbf{y})-\nabla f(\textbf{x})]+(O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(75)
$$\begin{aligned}&\le \Vert O_{A_1}[\nabla f(\textbf{y})-\nabla f(\textbf{x})]\Vert +\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(76)
$$\begin{aligned}&\le \Vert O_{A_1}\Vert _{op}\cdot \Vert \nabla f(\textbf{y})-\nabla f(\textbf{x})\Vert +\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert , \end{aligned}$$
(77)

where \(O_{A_1}\triangleq P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\) and \(O_{A_2}\triangleq \varGamma ^{\textbf{y}}_{\textbf{x}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}})(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ][\cdot ])\).

Due to the smoothness and compactness of \(\mathcal {M}\) and since \(\Vert \xi \Vert =1\), the operator norm \(\Vert P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\Vert _{op}\) is finite and uniformly upper bounded, i.e., there exists a finite \(M_1\), independent of \(\textbf{x}\), \(\textbf{y}\), and \(\xi \), s.t. \(\Vert P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\Vert _{op}\le M_1\) for all \(\textbf{x},\textbf{y}\in \mathcal {M}\) and all \(\xi \) with \(\Vert \xi \Vert =1\).
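For instance, on the unit sphere (again only an illustration, not part of the proof), \(O_{A_1}\) acts as

$$\begin{aligned} O_{A_1}[v]=P_{\textbf{y}}\big (-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )v\big )=-(\textbf{y}^\top v)\,\xi ,\qquad \text {so}\quad \Vert O_{A_1}\Vert _{op}\le \Vert \textbf{y}\Vert \cdot \Vert \xi \Vert =1, \end{aligned}$$

and the uniform bound holds with \(M_1=1\) in this case.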

For any \(\textbf{z}\) such that \(d(\textbf{z},\textbf{y})<\text {inj}(\mathcal {M})\), define \(Q_{\textbf{z},\textbf{y},\xi }\triangleq \varGamma ^{\textbf{y}}_{\textbf{z}} P_{\textbf{z}}(D(\textbf{z}\rightarrow P_{\textbf{z}})(\textbf{z})[\varGamma _{\textbf{y}}^{\textbf{z}}\xi ][\cdot ])\). Note that \(O_{A_1}=Q_{\textbf{y},\textbf{y},\xi }\) and \(O_{A_2}=Q_{\textbf{x},\textbf{y},\xi }\). For fixed \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\), and \(\tilde{\xi }\), \(Q_{\textbf{z},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\) is a continuously differentiable function of \(\textbf{z}\), since the manifold is smooth and \(f\) has a Lipschitz continuous Hessian. Since \(\textbf{z}\) ranges over a compact set, \(Q_{\textbf{z},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\) is Lipschitz continuous in \(\textbf{z}\), i.e.,

$$\begin{aligned} \Vert Q_{\textbf{x},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]-Q_{\textbf{y},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\Vert \le M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert , \forall \textbf{x},\textbf{y}\in \mathcal {M}, \end{aligned}$$
(78)

where \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\) is a finite constant depending on \(\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }\). In particular, since the manifold is smooth and \(f\) has a Lipschitz continuous Hessian, the map from \((\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi })\) to \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\) can be taken to be continuous. To argue the existence of such a continuous map, note that for fixed \(x\), \(y\), and \(\xi \), \(Q_{z,y,\xi }[\nabla f(x)]\) is a continuously differentiable function of \(z\). For simplicity of notation, let \(q(z;x,y,\xi )\triangleq Q_{z,y,\xi }[\nabla f(x)]\) and \(H_z(x,y,\xi )\triangleq D(z\rightarrow q(z;x,y,\xi ))\). Since parallel transport and the projection are smooth maps and \(f\in C^2\), \(H_z(x,y,\xi )\) is a Lipschitz continuous operator for \(x,y,z\) in the compact manifold \(\mathcal {M}\) and \(\xi \) in the compact set \(\{\xi \in T_z\mathcal {M}: \Vert \xi \Vert =1\}\). Define

$$\begin{aligned} M(x,y,\xi )=\sup _{z\in \mathcal {M}} \Vert D(z\rightarrow q(z;x,y,\xi ))\Vert _{op}=\sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )\Vert _{op}. \end{aligned}$$

We have

$$\begin{aligned} |M(x,y,\xi )-M(x_0,y_0,\xi _0)|&= |\sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )\Vert _{op} - \sup _{z\in \mathcal {M}} \Vert H_z(x_0,y_0,\xi _0)\Vert _{op}| \\&\le \sup _{z\in \mathcal {M}} |\Vert H_z(x,y,\xi )\Vert _{op} - \Vert H_z(x_0,y_0,\xi _0)\Vert _{op}| \\&\le \sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )-H_z(x_0,y_0,\xi _0)\Vert _{op} \\&\le \sup _{z\in \mathcal {M}} L(z) \left( d(x,x_0)+d(y,y_0)+\Vert \xi -\varGamma _{y_0}^y\xi _0\Vert \right) , \end{aligned}$$

where the first inequality follows from the \(\ell _\infty \)-type triangle inequality \(|\sup _z a_z-\sup _z b_z|\le \sup _z |a_z-b_z|\), the second from the reverse triangle inequality, and the third from the Lipschitz continuity of \(H\). Taking \(\bar{L}=\sup _{z\in \mathcal {M}} L(z)\), which is finite by the compactness of \(\mathcal {M}\), the continuity of \(M(x,y,\xi )\) is established.

Now, since \(\tilde{\textbf{x}},\tilde{\textbf{y}}\in \mathcal {M}\), which is compact, and \(\Vert \tilde{\xi }\Vert =1\), there exists a finite constant \(M_2\) s.t. \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\le M_2\) for all \({\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\). Letting \(\textbf{x}=\tilde{\textbf{x}}\) and \(\textbf{y}=\tilde{\textbf{y}}\) in (78), we have

$$\begin{aligned} \Vert Q_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]-Q_{\tilde{\textbf{y}},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\Vert \le M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\Vert \tilde{\textbf{x}}-\tilde{\textbf{y}}\Vert \le M_2\Vert \tilde{\textbf{x}}-\tilde{\textbf{y}}\Vert . \end{aligned}$$
(79)

Due to the arbitrariness of \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\), and \(\tilde{\xi }\), we conclude that the second term in (77) satisfies \(\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \le M_2\Vert \textbf{x}-\textbf{y}\Vert \).

On the other hand, the gradient of a twice continuously differentiable function on a compact manifold is Lipschitz continuous. Therefore, there exists a finite \(L_1\), s.t.

$$\begin{aligned} \Vert A_1-A_2\Vert&\le M_1\Vert \nabla f(\textbf{y})-\nabla f(\textbf{x})\Vert +M_2\Vert \textbf{x}-\textbf{y}\Vert \le (M_1\cdot L_1+M_2) \Vert \textbf{y}-\textbf{x}\Vert \nonumber \\&\le (M_1\cdot L_1+M_2)d(\textbf{x},\textbf{y}), \end{aligned}$$
(80)

where \(d(\textbf{x},\textbf{y})\) is the Riemannian distance between \(\textbf{x}\) and \(\textbf{y}\). The third inequality holds since the manifold is embedded in the Euclidean space, so the Euclidean distance \(\Vert \textbf{y}-\textbf{x}\Vert \) is bounded above by the length of any curve on \(\mathcal {M}\) joining \(\textbf{x}\) and \(\textbf{y}\), and in particular by \(d(\textbf{x},\textbf{y})\).

Second, to quantify \(\Vert B_1-B_2\Vert \), we define

$$\begin{aligned} R_{\textbf{y},\xi }(\textbf{z})\triangleq \varGamma ^{\textbf{y}}_{\textbf{z}}P_{\textbf{z}} (\nabla ^2 f(\textbf{z})[\varGamma _{\textbf{y}}^{\textbf{z}}\xi ]). \end{aligned}$$
(81)

Fixing \(\textbf{y}\) and \(\xi \) to be \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), \(R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{z})\) is Lipschitz continuous in \(\textbf{z}\) due to the smoothness of the manifold and the Lipschitz continuity of \(\nabla ^2 f(\textbf{z})\). Therefore, there exists a constant \(N_{\tilde{\textbf{y}},\tilde{\xi }}\), depending on \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), s.t. \(\Vert R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{x})-R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{y})\Vert \le N_{\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert \) for all \(\textbf{x}\), \(\textbf{y}\in \mathcal {M}\). In particular, by the same argument as above, there is a continuous mapping from \((\tilde{\textbf{y}},\tilde{\xi })\) to \(N_{\tilde{\textbf{y}},\tilde{\xi }}\). Since \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\) range over compact sets, there exists a finite constant \(M_3\), s.t. \(\Vert R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{x})-R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{y})\Vert \le N_{\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert \le M_3 \Vert \textbf{x}-\textbf{y}\Vert \) for all \(\textbf{x}\), \(\textbf{y}\in \mathcal {M}\).

Letting \(\textbf{x}=\tilde{\textbf{x}}\), \(\textbf{y}=\tilde{\textbf{y}}\), and due to the arbitrariness of \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), we have

$$\begin{aligned} \Vert B_1-B_2\Vert =\Vert R_{\textbf{y},\xi }(\textbf{y})-R_{\textbf{y},\xi }(\textbf{x})\Vert \le M_3 \Vert \textbf{x}-\textbf{y}\Vert \le M_3\cdot d(\textbf{x},\textbf{y}). \end{aligned}$$
(82)

Combining (80) and (82), and setting \(L\triangleq M_1\cdot L_1+M_2+M_3\), which is finite, we have

$$\begin{aligned} \Vert \text {Hess} f(\textbf{y})[\xi ]-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) [\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]\Vert \le \Vert A_1-A_2\Vert +\Vert B_1-B_2\Vert \le L\cdot d(\textbf{x}, \textbf{y}). \end{aligned}$$

Since \(\xi \) is an arbitrary unit tangent vector in \(T_{\textbf{y}}\mathcal {M}\), we have

$$\begin{aligned} \Vert \text {Hess} f(\textbf{y})-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) \varGamma _{\textbf{y}}^{\textbf{x}}\Vert _{op}\le L\cdot d(\textbf{x}, \textbf{y}). \end{aligned}$$

\(\square \)
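As a numerical sanity check of the lemma, and not part of the paper's analysis or experiments, the following sketch evaluates the ratio \(\Vert \text {Hess} f(\textbf{y})[\xi ]-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]\Vert /d(\textbf{x},\textbf{y})\) on the unit sphere for the illustrative choice \(f(\textbf{x})=\frac{1}{2}\textbf{x}^\top A\textbf{x}\), using the closed-form parallel transport and \(d(\textbf{x},\textbf{y})=\arccos (\textbf{x}^\top \textbf{y})\). The bounded ratio observed over random pairs is consistent with the finite constant \(L\) above.

```python
# Numerical sanity check of the Lipschitz-type Hessian bound on the unit sphere,
# not part of the paper's analysis or experiments. Assumptions: f(x) = 0.5 * x' A x,
# closed-form parallel transport along the minimizing geodesic, and Riemannian
# distance d(x, y) = arccos(x' y).
import numpy as np

rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)


def proj(x, v):                              # projection onto T_x S^{n-1}
    return v - (x @ v) * x


def rhess(x, xi):                            # Riemannian Hessian-vector product
    return proj(x, A @ xi) - (x @ (A @ x)) * xi


def transport(x, y, v):
    """Parallel transport of v in T_x S^{n-1} to T_y along the minimizing
    geodesic (valid for x != -y)."""
    return v - (y @ v) / (1.0 + x @ y) * (x + y)


ratios = []
for _ in range(5000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = x + 0.1 * rng.standard_normal(n)     # a nearby point, then renormalized
    y /= np.linalg.norm(y)
    xi = proj(y, rng.standard_normal(n))
    xi /= np.linalg.norm(xi)                 # unit tangent vector at y
    diff = rhess(y, xi) - transport(x, y, rhess(x, transport(y, x, xi)))
    dist = np.arccos(np.clip(x @ y, -1.0, 1.0))
    ratios.append(np.linalg.norm(diff) / dist)

print("max observed ||Hess f(y)[xi] - transported Hess f(x)|| / d(x,y):", max(ratios))
```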

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, D., Davanloo Tajbakhsh, S. Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization. J Optim Theory Appl 196, 324–361 (2023). https://doi.org/10.1007/s10957-022-02137-5
