
Distributed stochastic compositional optimization problems over directed networks. (English) Zbl 1536.90137

Summary: We study distributed stochastic compositional optimization problems over directed communication networks, in which each agent privately owns a stochastic compositional objective function and all agents collaborate to minimize the sum of these objectives. We propose a distributed stochastic compositional gradient descent method that employs gradient tracking and a stochastic correction technique to adapt to the directed structure of the networks and to increase the accuracy of the inner-function estimates. When the objective function is smooth, the proposed method achieves the convergence rate \(\mathcal{O}\left(k^{-1/2}\right)\) and the sample complexity \(\mathcal{O}\left(\frac{1}{\epsilon^2}\right)\) for finding an \(\epsilon\)-stationary point. When the objective function is strongly convex, the convergence rate improves to \(\mathcal{O}\left(k^{-1}\right)\). Moreover, we establish the asymptotic normality of the Polyak-Ruppert averaged iterates of the proposed method. We demonstrate the empirical performance of the method on a model-agnostic meta-learning problem and a logistic regression problem.
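To illustrate the stochastic correction idea mentioned above, the following is a minimal single-agent sketch (not the paper's distributed method) of stochastic compositional gradient descent for \(F(x) = f(\mathbb{E}[g(x;\xi)])\): a moving-average estimate \(u_k\) tracks the inner expectation, and the gradient step uses the chain rule through that estimate. The toy objective, step sizes, and function names here are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy compositional objective F(x) = f(E[g(x; xi)]) with
# f(u) = 0.5 * ||u||^2 and g(x; xi) = x + xi, E[xi] = 0,
# so F(x) = 0.5 * ||x||^2 and the minimizer is x* = 0.
rng = np.random.default_rng(0)

def sample_g(x):
    # Noisy sample of the inner function g(x; xi).
    return x + 0.1 * rng.standard_normal(x.shape)

def grad_g(x):
    # Jacobian of g with respect to x (identity for this toy g).
    return np.eye(x.size)

def grad_f(u):
    # Gradient of the outer function f(u) = 0.5 * ||u||^2.
    return u

def scgd(x0, steps=2000, alpha=0.05, beta=0.2):
    """Stochastic compositional gradient descent with a moving-average
    (stochastic correction) estimate u of the inner expectation E[g(x)]."""
    x = x0.copy()
    u = sample_g(x)
    for _ in range(steps):
        # Correction step: u tracks E[g(x_k)] without needing its exact value.
        u = (1 - beta) * u + beta * sample_g(x)
        # Chain-rule gradient step through the tracked inner estimate.
        x = x - alpha * grad_g(x).T @ grad_f(u)
    return x

x_final = scgd(np.array([2.0, -1.5]))
```

In the distributed setting studied by the paper, each agent would additionally mix its iterate and a gradient-tracking variable with its in-neighbors' values using weights compatible with the directed graph; the correction step for the inner function is the same in spirit.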

MSC:

90C15 Stochastic programming
90C35 Programming involving graphs or networks
