
Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization

Journal of Optimization Theory and Applications

Abstract

We propose a stochastic variance-reduced cubic regularized Newton algorithm for optimizing finite-sum problems over a Riemannian submanifold of the Euclidean space. The proposed algorithm requires a full gradient and Hessian update at the beginning of each epoch, while it performs stochastic variance-reduced updates in the iterations within each epoch. An iteration complexity of \(O(\epsilon ^{-3/2})\) for obtaining an \((\epsilon ,\sqrt{\epsilon })\)-second-order stationary point, i.e., a point at which the norm of the Riemannian gradient is at most \(\epsilon \) and the minimum eigenvalue of the Riemannian Hessian is at least \(-\sqrt{\epsilon }\), is established when the manifold is embedded in the Euclidean space. Furthermore, the paper proposes a computationally more appealing modification of the algorithm that requires only an inexact solution of the cubic regularized Newton subproblem while retaining the same iteration complexity. The proposed algorithm is evaluated and compared with three other Riemannian second-order methods in two numerical studies: estimating the inverse scale matrix of the multivariate t-distribution on the manifold of symmetric positive definite matrices, and estimating the parameter of a linear classifier on the sphere manifold.
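As an informal complement to the abstract, the following is a minimal sketch of such an epoch structure on the unit sphere. It is not the authors' R-SVRC implementation; the component functions \(f_i(\textbf{x})=\frac{1}{2}\textbf{x}^\top A_i\textbf{x}\), the projection-based vector transport, the gradient-descent subproblem solver, and all constants are illustrative assumptions.

```python
# Minimal sketch of the epoch structure described in the abstract: a full gradient
# and Hessian snapshot at the start of each epoch, variance-reduced estimates in
# the inner iterations, and an inexactly solved cubic regularized Newton subproblem.
# This is NOT the authors' R-SVRC implementation. Assumptions: the manifold is the
# unit sphere S^{d-1}, f_i(x) = 0.5 * x' A_i x, a projection-based vector transport,
# a plain gradient-descent subproblem solver, and illustrative constants throughout.
import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 200
A = rng.standard_normal((n, d, d))
A = 0.5 * (A + A.transpose(0, 2, 1))      # symmetric component matrices A_i
A_bar = A.mean(axis=0)                    # Euclidean Hessian of the full objective


def proj(x, v):                           # orthogonal projection onto T_x S^{d-1}
    return v - (x @ v) * x


def rgrad(x, egrad):                      # Riemannian gradient from the Euclidean one
    return proj(x, egrad)


def rhess(x, egrad, ehess, xi):           # Riemannian Hessian-vector product (sphere)
    return proj(x, ehess @ xi) - (x @ egrad) * xi


def retract(x, s):                        # metric-projection retraction
    y = x + s
    return y / np.linalg.norm(y)


def cubic_step(g, hess_vec, sigma, iters=50, eta=0.05):
    """Inexact subproblem solver: gradient descent on the cubic model
    m(s) = <g, s> + 0.5 <s, H[s]> + (sigma / 3) ||s||^3 over the tangent space."""
    s = np.zeros_like(g)
    for _ in range(iters):
        s = s - eta * (g + hess_vec(s) + sigma * np.linalg.norm(s) * s)
    return s


x = rng.standard_normal(d)
x /= np.linalg.norm(x)
sigma, n_epochs, n_inner, batch = 5.0, 10, 5, 20

for _ in range(n_epochs):
    xs = x.copy()                                         # epoch snapshot
    eg_full = A_bar @ xs                                  # full Euclidean gradient at xs
    for _ in range(n_inner):
        S = rng.choice(n, size=batch, replace=False)
        Ab = A[S].mean(axis=0)                            # batch Euclidean Hessian
        eg_x, eg_s = Ab @ x, Ab @ xs                      # batch Euclidean gradients
        # Variance-reduced gradient: batch term at x, corrected by the snapshot terms,
        # with projection onto T_x playing the role of a vector transport.
        g = rgrad(x, eg_x) - proj(x, rgrad(xs, eg_s)) + proj(x, rgrad(xs, eg_full))

        def hess_vec(xi):
            # Same correction pattern for the Hessian-vector product. For these
            # quadratics the Euclidean Hessians are constant; the pattern is kept
            # only to show the structure of the estimator.
            return (rhess(x, eg_x, Ab, xi)
                    - proj(x, rhess(xs, eg_s, Ab, proj(xs, xi)))
                    + proj(x, rhess(xs, eg_full, A_bar, proj(xs, xi))))

        s = cubic_step(g, hess_vec, sigma)
        x = retract(x, proj(x, s))

print("Riemannian gradient norm:", np.linalg.norm(rgrad(x, A_bar @ x)))
print("objective value 0.5*x'Ax:", 0.5 * x @ A_bar @ x)
```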



Data availability statement

The data generated and analyzed in the current study can be entirely reproduced by running our code, which is publicly available in the GitHub repository at https://github.com/samdavanloo/R-SVRC.


Author information

Corresponding author

Correspondence to Sam Davanloo Tajbakhsh.

Additional information

Communicated by Defeng Sun.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proof of Lemma 3.2

Denote the orthogonal projection operator onto \(T_{\textbf{x}}\mathcal {M}\), i.e., the tangent space of \(\mathcal {M}\) at \(\textbf{x}\), by \(P_{\textbf{x}}\), and denote the Euclidean gradient and Hessian by \(\nabla f(\textbf{x})\) and \(\nabla ^2 f(\textbf{x})\), respectively. For any \(\textbf{y}\) such that \(d(\textbf{x},\textbf{y})<\text {inj}(\mathcal {M})\) and any \(\xi \in T_{\textbf{y}}\mathcal {M}\) with \(\Vert \xi \Vert =1\), we have

$$\begin{aligned} \text {Hess} f(\textbf{y})[\xi ]&=P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}}\nabla f(\textbf{y}))(\textbf{y})[\xi ])\\&=P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\nabla f(\textbf{y})])+P_{\textbf{y}} (\nabla ^2 f(\textbf{y})[\xi ])\\&\equiv A_1+B_1. \end{aligned}$$

The first equality follows from (7), and the second equality comes from the chain rule and the fact that the projection operator is linear. Similarly, we have

$$\begin{aligned} \varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) [\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]&= \varGamma _{\textbf{x}}^{\textbf{y}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}}\nabla f(\textbf{x}))(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ])\\&=\varGamma ^{\textbf{y}}_{\textbf{x}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}})(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ][\nabla f(\textbf{x})])+\varGamma ^{\textbf{y}}_{\textbf{x}}P_{\textbf{x}} (\nabla ^2 f(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ])\\&\equiv A_2+B_2. \end{aligned}$$
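As a concrete illustration, not part of the original proof, consider the unit sphere \(\mathcal {M}=\{\textbf{x}\in \mathbb {R}^n:\Vert \textbf{x}\Vert =1\}\), for which \(P_{\textbf{y}}=I-\textbf{y}\textbf{y}^\top \) and hence \(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ]=-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )\). Using \(\textbf{y}^\top \xi =0\) and \(P_{\textbf{y}}\textbf{y}=0\),

$$\begin{aligned} A_1=P_{\textbf{y}}\big (-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )\nabla f(\textbf{y})\big )=-(\textbf{y}^\top \nabla f(\textbf{y}))\,\xi ,\qquad B_1=P_{\textbf{y}}\nabla ^2 f(\textbf{y})\xi , \end{aligned}$$

which recovers the familiar expression \(\text {Hess} f(\textbf{y})[\xi ]=P_{\textbf{y}}\nabla ^2 f(\textbf{y})\xi -(\textbf{y}^\top \nabla f(\textbf{y}))\xi \) for the sphere.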

First, to quantify \(\Vert A_1-A_2\Vert \), we have,

$$\begin{aligned} \Vert A_1-A_2\Vert&=\Vert O_{A_1}[\nabla f(\textbf{y})+\nabla f(\textbf{x})-\nabla f(\textbf{x})]-O_{A_2}[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(74)
$$\begin{aligned}&=\Vert O_{A_1}[\nabla f(\textbf{y})-\nabla f(\textbf{x})]+(O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(75)
$$\begin{aligned}&\le \Vert O_{A_1}[\nabla f(\textbf{y})-\nabla f(\textbf{x})]\Vert +\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \end{aligned}$$
(76)
$$\begin{aligned}&\le \Vert O_{A_1}\Vert _{op}\cdot \Vert \nabla f(\textbf{y})-\nabla f(\textbf{x})\Vert +\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert , \end{aligned}$$
(77)

where \(O_{A_1}\triangleq P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\) and \(O_{A_2}\triangleq \varGamma ^{\textbf{y}}_{\textbf{x}} P_{\textbf{x}}(D(\textbf{x}\rightarrow P_{\textbf{x}})(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ][\cdot ])\).

Due to the smoothness and compactness of \(\mathcal {M}\) and since \(\Vert \xi \Vert =1\), the operator norm \(\Vert P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\Vert _{op}\) is finite and uniformly upper bounded, i.e., there exists a finite \(M_1\), independent of \(\textbf{x}\), \(\textbf{y}\), and \(\xi \), s.t. \(\Vert P_{\textbf{y}}(D(\textbf{y}\rightarrow P_{\textbf{y}})(\textbf{y})[\xi ][\cdot ])\Vert _{op}\le M_1\) for all \(\textbf{x},\textbf{y}\in \mathcal {M}\) and all \(\xi \) with \(\Vert \xi \Vert =1\).
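For instance, on the unit sphere (again only an illustration, not part of the proof), \(O_{A_1}\) acts as

$$\begin{aligned} O_{A_1}[v]=P_{\textbf{y}}\big (-(\xi \textbf{y}^\top +\textbf{y}\xi ^\top )v\big )=-(\textbf{y}^\top v)\,\xi ,\qquad \text {so}\quad \Vert O_{A_1}\Vert _{op}\le \Vert \textbf{y}\Vert \cdot \Vert \xi \Vert =1, \end{aligned}$$

and the uniform bound holds with \(M_1=1\) in this case.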

For any \(\textbf{z}\) such that \(d(\textbf{z},\textbf{y})<\text {inj}(\mathcal {M})\), define \(Q_{\textbf{z},\textbf{y},\xi }\triangleq \varGamma ^{\textbf{y}}_{\textbf{z}} P_{\textbf{z}}(D(\textbf{z}\rightarrow P_{\textbf{z}})(\textbf{z})[\varGamma _{\textbf{y}}^{\textbf{z}}\xi ][\cdot ])\). Note that \(O_{A_1}=Q_{\textbf{y},\textbf{y},\xi }\) and \(O_{A_2}=Q_{\textbf{x},\textbf{y},\xi }\). For fixed \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\), and \(\tilde{\xi }\), \(Q_{\textbf{z},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\) is a continuously differentiable function of \(\textbf{z}\), since the manifold is smooth and \(f\) has a Lipschitz continuous Hessian. Since \(\textbf{z}\) ranges over a compact set, \(Q_{\textbf{z},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\) is Lipschitz continuous in \(\textbf{z}\), i.e.,

$$\begin{aligned} \Vert Q_{\textbf{x},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]-Q_{\textbf{y},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\Vert \le M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert , \forall \textbf{x},\textbf{y}\in \mathcal {M}, \end{aligned}$$
(78)

where \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\) is a finite constant depending on \(\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }\). In particular, since the manifold is smooth and \(f\) has a Lipschitz continuous Hessian, the map from \((\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi })\) to \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\) can be taken to be continuous. To argue the existence of such a continuous map, note that for fixed \(x\), \(y\), and \(\xi \), \(Q_{z,y,\xi }[\nabla f(x)]\) is a continuously differentiable function of \(z\). For simplicity of notation, let \(q(z;x,y,\xi )\triangleq Q_{z,y,\xi }[\nabla f(x)]\) and \(H_z(x,y,\xi )\triangleq D(z\rightarrow q(z;x,y,\xi ))\). Since parallel transport and the projection are smooth maps and \(f\in C^2\), \(H_z(x,y,\xi )\) is a Lipschitz continuous operator for \(x,y,z\) in the compact manifold \(\mathcal {M}\) and \(\xi \) in the compact set \(\{\xi \in T_z\mathcal {M}: \Vert \xi \Vert =1\}\). Define

$$\begin{aligned} M(x,y,\xi )=\sup _{z\in \mathcal {M}} \Vert D(z\rightarrow q(z;x,y,\xi ))\Vert _{op}=\sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )\Vert _{op}. \end{aligned}$$

We have

$$\begin{aligned} |M(x,y,\xi )-M(x_0,y_0,\xi _0)|&= |\sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )\Vert _{op} - \sup _{z\in \mathcal {M}} \Vert H_z(x_0,y_0,\xi _0)\Vert _{op}| \\&\le \sup _{z\in \mathcal {M}} |\Vert H_z(x,y,\xi )\Vert _{op} - \Vert H_z(x_0,y_0,\xi _0)\Vert _{op}| \\&\le \sup _{z\in \mathcal {M}} \Vert H_z(x,y,\xi )-H_z(x_0,y_0,\xi _0)\Vert _{op} \\&\le \sup _{z\in \mathcal {M}} L(z) \left( d(x,x_0)+d(y,y_0)+\Vert \xi -\varGamma _{y_0}^y\xi _0\Vert \right) , \end{aligned}$$

where the first inequality follows from the \(\ell _\infty \)-type triangle inequality \(|\sup _z a_z-\sup _z b_z|\le \sup _z |a_z-b_z|\), the second from the reverse triangle inequality, and the third from the Lipschitz continuity of \(H\). Taking \(\bar{L}=\sup _{z\in \mathcal {M}} L(z)\), which is finite by the compactness of \(\mathcal {M}\), the continuity of \(M(x,y,\xi )\) is established.

Now, since \(\tilde{\textbf{x}},\tilde{\textbf{y}}\in \mathcal {M}\), which is compact, and \(\Vert \tilde{\xi }\Vert =1\), there exists a finite constant \(M_2\) s.t. \(M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\le M_2\) for all \({\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\). Letting \(\textbf{x}=\tilde{\textbf{x}}\) and \(\textbf{y}=\tilde{\textbf{y}}\) in (78), we have

$$\begin{aligned} \Vert Q_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]-Q_{\tilde{\textbf{y}},\tilde{\textbf{y}},\tilde{\xi }}[\nabla f(\tilde{\textbf{x}})]\Vert \le M_{\tilde{\textbf{x}},\tilde{\textbf{y}},\tilde{\xi }}\Vert \tilde{\textbf{x}}-\tilde{\textbf{y}}\Vert \le M_2\Vert \tilde{\textbf{x}}-\tilde{\textbf{y}}\Vert . \end{aligned}$$
(79)

Due to the arbitrariness of \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\), and \(\tilde{\xi }\), we conclude that the second term in (77) satisfies \(\Vert (O_{A_1}-O_{A_2})[\nabla f(\textbf{x})]\Vert \le M_2\Vert \textbf{x}-\textbf{y}\Vert \).

On the other hand, the gradient of a twice continuously differentiable function on a compact manifold is Lipschitz continuous. Therefore, there exists a finite \(L_1\), s.t.

$$\begin{aligned} \Vert A_1-A_2\Vert&\le M_1\Vert \nabla f(\textbf{y})-\nabla f(\textbf{x})\Vert +M_2\Vert \textbf{x}-\textbf{y}\Vert \le (M_1\cdot L_1+M_2) \Vert \textbf{y}-\textbf{x}\Vert \nonumber \\&\le (M_1\cdot L_1+M_2)d(\textbf{x},\textbf{y}), \end{aligned}$$
(80)

where \(d(\textbf{x},\textbf{y})\) is the Riemannian distance between \(\textbf{x}\) and \(\textbf{y}\). The third inequality holds since the manifold is embedded in the Euclidean space, so the Euclidean distance \(\Vert \textbf{y}-\textbf{x}\Vert \) is bounded above by the length of any curve on \(\mathcal {M}\) joining \(\textbf{x}\) and \(\textbf{y}\), and in particular by \(d(\textbf{x},\textbf{y})\).

Second, to quantify \(\Vert B_1-B_2\Vert \), we define

$$\begin{aligned} R_{\textbf{y},\xi }(\textbf{z})\triangleq \varGamma ^{\textbf{y}}_{\textbf{z}}P_{\textbf{z}} (\nabla ^2 f(\textbf{z})[\varGamma _{\textbf{y}}^{\textbf{z}}\xi ]). \end{aligned}$$
(81)

Fixing \(\textbf{y}\) and \(\xi \) to be \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), \(R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{z})\) is Lipschitz continuous in \(\textbf{z}\) due to the smoothness of the manifold and the Lipschitz continuity of \(\nabla ^2 f(\textbf{z})\). Therefore, there exists a constant \(N_{\tilde{\textbf{y}},\tilde{\xi }}\), depending on \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), s.t. \(\Vert R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{x})-R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{y})\Vert \le N_{\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert \) for all \(\textbf{x}\), \(\textbf{y}\in \mathcal {M}\). In particular, by the same argument as above, there is a continuous mapping from \((\tilde{\textbf{y}},\tilde{\xi })\) to \(N_{\tilde{\textbf{y}},\tilde{\xi }}\). Since \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\) range over compact sets, there exists a finite constant \(M_3\), s.t. \(\Vert R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{x})-R_{\tilde{\textbf{y}},\tilde{\xi }}(\textbf{y})\Vert \le N_{\tilde{\textbf{y}},\tilde{\xi }}\Vert \textbf{x}-\textbf{y}\Vert \le M_3 \Vert \textbf{x}-\textbf{y}\Vert \) for all \(\textbf{x}\), \(\textbf{y}\in \mathcal {M}\).

Letting \(\textbf{x}=\tilde{\textbf{x}}\), \(\textbf{y}=\tilde{\textbf{y}}\), and due to the arbitrariness of \(\tilde{\textbf{x}}\), \(\tilde{\textbf{y}}\) and \(\tilde{\xi }\), we have

$$\begin{aligned} \Vert B_1-B_2\Vert =\Vert R_{\textbf{y},\xi }(\textbf{y})-R_{\textbf{y},\xi }(\textbf{x})\Vert \le M_3 \Vert \textbf{x}-\textbf{y}\Vert \le M_3\cdot d(\textbf{x},\textbf{y}). \end{aligned}$$
(82)

Combining (80) and (82), and setting \(L\triangleq M_1\cdot L_1+M_2+M_3\), which is finite, we have

$$\begin{aligned} \Vert \text {Hess} f(\textbf{y})[\xi ]-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) [\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]\Vert \le \Vert A_1-A_2\Vert +\Vert B_1-B_2\Vert \le L\cdot d(\textbf{x}, \textbf{y}). \end{aligned}$$

Since \(\xi \) is an arbitrary unit tangent vector in \(T_{\textbf{y}}\mathcal {M}\), we have

$$\begin{aligned} \Vert \text {Hess} f(\textbf{y})-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x}) \varGamma _{\textbf{y}}^{\textbf{x}}\Vert _{op}\le L\cdot d(\textbf{x}, \textbf{y}). \end{aligned}$$

\(\square \)
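As a numerical sanity check of the lemma, and not part of the paper's analysis or experiments, the following sketch evaluates the ratio \(\Vert \text {Hess} f(\textbf{y})[\xi ]-\varGamma _{\textbf{x}}^{\textbf{y}}\text {Hess} f(\textbf{x})[\varGamma _{\textbf{y}}^{\textbf{x}}\xi ]\Vert /d(\textbf{x},\textbf{y})\) on the unit sphere for the illustrative choice \(f(\textbf{x})=\frac{1}{2}\textbf{x}^\top A\textbf{x}\), using the closed-form parallel transport and \(d(\textbf{x},\textbf{y})=\arccos (\textbf{x}^\top \textbf{y})\). The bounded ratio observed over random pairs is consistent with the finite constant \(L\) above.

```python
# Numerical sanity check of the Lipschitz-type Hessian bound on the unit sphere,
# not part of the paper's analysis or experiments. Assumptions: f(x) = 0.5 * x' A x,
# closed-form parallel transport along the minimizing geodesic, and Riemannian
# distance d(x, y) = arccos(x' y).
import numpy as np

rng = np.random.default_rng(1)
n = 10
A = rng.standard_normal((n, n))
A = 0.5 * (A + A.T)


def proj(x, v):                              # projection onto T_x S^{n-1}
    return v - (x @ v) * x


def rhess(x, xi):                            # Riemannian Hessian-vector product
    return proj(x, A @ xi) - (x @ (A @ x)) * xi


def transport(x, y, v):
    """Parallel transport of v in T_x S^{n-1} to T_y along the minimizing
    geodesic (valid for x != -y)."""
    return v - (y @ v) / (1.0 + x @ y) * (x + y)


ratios = []
for _ in range(5000):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    y = x + 0.1 * rng.standard_normal(n)     # a nearby point, then renormalized
    y /= np.linalg.norm(y)
    xi = proj(y, rng.standard_normal(n))
    xi /= np.linalg.norm(xi)                 # unit tangent vector at y
    diff = rhess(y, xi) - transport(x, y, rhess(x, transport(y, x, xi)))
    dist = np.arccos(np.clip(x @ y, -1.0, 1.0))
    ratios.append(np.linalg.norm(diff) / dist)

print("max observed ||Hess f(y)[xi] - transported Hess f(x)|| / d(x,y):", max(ratios))
```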

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhang, D., Davanloo Tajbakhsh, S. Riemannian Stochastic Variance-Reduced Cubic Regularized Newton Method for Submanifold Optimization. J Optim Theory Appl 196, 324–361 (2023). https://doi.org/10.1007/s10957-022-02137-5
