Abstract
It is observed that the inexact convergence of the well-known distributed gradient descent algorithm can be caused by the inaccuracy of its consensus procedures. Motivated by this, to achieve improved convergence, we ensure sufficiently accurate consensus via approximate consensus steps, whose accuracy is controlled by a predefined sequence of consensus error bounds. It is shown that exact convergence can be achieved when the sequence decays to zero; furthermore, a linear convergence rate can be obtained when the sequence decays to zero linearly. To implement the approximate consensus step with given error bounds, an inexact average consensus scheme is proposed in a distributed manner. Thanks to the flexibility in choosing both the consensus error bounds and the consensus schemes, the proposed framework offers the potential to balance communication and computation in distributed optimization.
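As a rough numerical illustration of the framework described above, the following self-contained sketch (hypothetical code, not the paper's algorithm verbatim; the quadratic losses, step size, mixing matrix, and the linearly decaying error bounds \(c_0 \rho^k\) are all illustrative assumptions) alternates a local gradient step with an approximate consensus step whose accuracy tightens over time:

```python
import numpy as np

def inexact_dgd(grads, x0, W, alpha=0.2, c0=1.0, rho=0.9, iters=200):
    """Sketch of distributed gradient descent with eps^k-inexact consensus.

    grads : list of local gradient functions, one per agent
    W     : doubly stochastic mixing matrix (preserves the average)

    Note: the consensus loop below compares against the true average
    directly, a centralized surrogate for the distributed stopping
    rules that such a framework would use in practice.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(iters):
        # Local gradient step: x^{k+1/2} = x^k - alpha * grad_i(x_i^k).
        x_half = x - alpha * np.array([g(xi) for g, xi in zip(grads, x)])
        # Approximate consensus step: mix with W until every agent is
        # within eps_k of the exact average of x^{k+1/2}.
        eps_k = c0 * rho ** k
        avg = x_half.mean()  # invariant under multiplication by W
        while np.max(np.abs(x_half - avg)) > eps_k:
            x_half = W @ x_half
        x = x_half
    return x

# Toy problem: f_i(x) = 0.5 * (x - a_i)^2, so the minimizer of
# sum_i f_i is mean(a).
a = np.array([1.0, 2.0, 3.0, 4.0])
grads = [lambda x, ai=ai: x - ai for ai in a]
W = np.array([[0.50, 0.25, 0.00, 0.25],   # lazy Metropolis weights
              [0.25, 0.50, 0.25, 0.00],   # on a 4-node ring
              [0.00, 0.25, 0.50, 0.25],
              [0.25, 0.00, 0.25, 0.50]])
x = inexact_dgd(grads, a.copy(), W)
```

With the linearly decaying error sequence, all agents approach the global minimizer, mirroring statement 2) of Theorem 3.1; a slower-decaying sequence would trade consensus communication per iteration for accuracy.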
References
Bazerque, J.A., Giannakis, G.B.: Distributed spectrum sensing for cognitive radio networks by exploiting sparsity. IEEE Trans. Signal Process. 58(3), 1847–1862 (2010)
Schizas, I.D., Giannakis, G.B., Roumeliotis, S.I., Ribeiro, A.: Consensus in ad hoc WSNs with noisy links—part II: distributed estimation and smoothing of random signals. IEEE Trans. Signal Process. 56(4), 1650–1666 (2008)
Cao, Y., Yu, W., Ren, W., Chen, G.: An overview of recent progress in the study of distributed multi-agent coordination. IEEE Trans. Ind. Inform. 9(1), 427–438 (2013)
Bullo, F., Cortes, J., Martinez, S.: Distributed Control of Robotic Networks: A Mathematical Approach to Motion Coordination Algorithms, vol. 27. Princeton University Press, Princeton (2009)
Cevher, V., Becker, S., Schmidt, M.: Convex optimization for big data: scalable, randomized, and parallel algorithms for big data analytics. IEEE Signal Process. Mag. 31(5), 32–43 (2014)
Bekkerman, R., Bilenko, M., Langford, J.: Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, Cambridge (2011)
Nedic, A., Ozdaglar, A.: Distributed subgradient methods for multi-agent optimization. IEEE Trans. Autom. Control 54(1), 48–61 (2009)
Tsianos, K.I., Rabbat, M.G.: Distributed strongly convex optimization. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 593–600. IEEE (2012)
Nedic, A., Olshevsky, A., Shi, W.: Achieving geometric convergence for distributed optimization over time-varying graphs. SIAM J. Optim. 27(4), 2597–2633 (2017)
Shi, W., Ling, Q., Wu, G., Yin, W.: EXTRA: an exact first-order algorithm for decentralized consensus optimization. SIAM J. Optim. 25(2), 944–966 (2015)
Jakovetić, D.: A unification and generalization of exact distributed first-order methods. IEEE Trans. Signal Inf. Process. Netw. 5(1), 31–46 (2019)
Shi, W., Ling, Q., Yuan, K., Wu, G., Yin, W.: On the linear convergence of the ADMM in decentralized consensus optimization. IEEE Trans. Signal Process. 62(7), 1750–1761 (2014)
Mokhtari, A., Shi, W., Ling, Q., Ribeiro, A.: A decentralized second-order method with exact linear convergence rate for consensus optimization. IEEE Trans. Signal Inf. Process. Netw. 2(4), 507–522 (2016)
Chen, A.I., Ozdaglar, A.: A fast distributed proximal-gradient method. In: 2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 601–608. IEEE (2012)
Jakovetić, D., Xavier, J., Moura, J.M.: Fast distributed gradient methods. IEEE Trans. Autom. Control 59(5), 1131–1146 (2014)
Berahas, A.S., Bollapragada, R., Keskar, N.S., Wei, E.: Balancing communication and computation in distributed optimization. IEEE Trans. Autom. Control 64(8), 3141–3155 (2018)
Manitara, N.E., Hadjicostis, C.N.: Distributed stopping for average consensus in undirected graphs via event-triggered strategies. Automatica 70, 121–127 (2016)
Manitara, N.E., Hadjicostis, C.N.: Distributed stopping for average consensus in digraphs. IEEE Trans. Control Netw. Syst. 5(3), 957–967 (2018)
Yadav, V., Salapaka, M.V.: Distributed protocol for determining when averaging consensus is reached. In: 45th Annual Allerton Conference, pp. 715–720 (2007)
Sundaram, S., Hadjicostis, C.N.: Finite-time distributed consensus in graphs with time-invariant topologies. In: 2007 American Control Conference, pp. 711–716. IEEE (2007)
Mou, S., Morse, A.S.: Finite-time distributed averaging. In: 2014 American Control Conference, pp. 5260–5263. IEEE (2014)
Nesterov, Y.: Introductory Lectures on Convex Programming, Volume I: Basic Course. Lecture notes (1998)
Nedic, A., Ozdaglar, A., Parrilo, P.A.: Constrained consensus and optimization in multi-agent networks. IEEE Trans. Autom. Control 55(4), 922–938 (2010)
Appendix: Proof of Theorem 3.1
We begin the proof by introducing some additional notation. Recalling that \({\mathbf {x}}^{k+1/2}\) represents the intermediate result obtained by a pure gradient step, as shown in (5), we denote the exact averages of \({\mathbf {x}}^k\) and \({\mathbf {x}}^{k+1/2}\) by \(\bar{{\mathbf {x}}}^k\) and \(\bar{{\mathbf {x}}}^{k+1/2}\), respectively, i.e., \(\bar{{\mathbf {x}}}^k = \frac{1}{n}\sum _{i=1}^n {\mathbf {x}}_i^k\) and \(\bar{{\mathbf {x}}}^{k+1/2} = \frac{1}{n}\sum _{i=1}^n {\mathbf {x}}_i^{k+1/2}\).
Moreover, we use \(\bar{{\mathbf {g}}}^k\) to denote the global gradient evaluated at \(\bar{{\mathbf {x}}}^k\) and \({\mathbf {g}}^k\) to denote the average of the local gradients at \({\mathbf {x}}^k\), i.e., \(\bar{{\mathbf {g}}}^k = \frac{1}{n}\sum _{i=1}^n \nabla f_i(\bar{{\mathbf {x}}}^k)\) and \({\mathbf {g}}^k = \frac{1}{n}\sum _{i=1}^n \nabla f_i({\mathbf {x}}_i^k)\).
Next, we show in Lemma A.1 below that, once \(\varepsilon \)-inexact average consensus is guaranteed by \({\mathcal {C}}^{\varepsilon ^k}\), the distance \(\Vert {{\mathbf {x}}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1}\Vert \) can be bounded in terms of the error \(\varepsilon ^k\).
Lemma A.1
Given that \({\mathbf {x}}^{k+1}\) is the output of \({\mathcal {C}}^{\varepsilon ^k}({\mathbf {x}}^{k+1/2})\), so that \(\Vert {\mathbf {x}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1/2}\Vert \le \varepsilon ^{k}\) holds for all \(i\in {\mathcal {N}}\), one has
Proof
By the definition of \(\bar{{\mathbf {x}}}^{k+1}\), it holds that
Therefore, for all \(i \in {\mathcal {N}}\),
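The display equations of this proof are not reproduced in this version; a plausible reconstruction, consistent with the constant \(c_2 = 2L\) appearing in Lemma A.2 (and therefore assuming the lemma's conclusion is \(\Vert {\mathbf {x}}_i^{k+1} - \bar{{\mathbf {x}}}^{k+1}\Vert \le 2\varepsilon ^k\)), is:

```latex
\begin{align*}
\|\bar{\mathbf{x}}^{k+1} - \bar{\mathbf{x}}^{k+1/2}\|
  &= \Big\| \tfrac{1}{n}\textstyle\sum_{j=1}^{n} \mathbf{x}_j^{k+1} - \bar{\mathbf{x}}^{k+1/2} \Big\|
   \le \tfrac{1}{n}\textstyle\sum_{j=1}^{n} \|\mathbf{x}_j^{k+1} - \bar{\mathbf{x}}^{k+1/2}\|
   \le \varepsilon^{k}, \\
\|\mathbf{x}_i^{k+1} - \bar{\mathbf{x}}^{k+1}\|
  &\le \|\mathbf{x}_i^{k+1} - \bar{\mathbf{x}}^{k+1/2}\|
     + \|\bar{\mathbf{x}}^{k+1/2} - \bar{\mathbf{x}}^{k+1}\|
   \le 2\varepsilon^{k}.
\end{align*}
```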
\(\square \)
With the help of Lemma A.1, we next bound the distance between \(\bar{{\mathbf {x}}}^k\) and the optimal solution \({\mathbf {x}}^\star \). Before doing that, we will need a few supporting lemmas.
Lemma A.2
Suppose that Assumption 1 holds; then one has
where \(c_2 = 2L\).
Proof
By the definitions of \({\mathbf {g}}^k\) and \(\bar{{\mathbf {g}}}^k\), as shown in (26), it holds that
where (A.2.a) is due to Assumption 1 and (A.2.b) follows from Lemma A.1. \(\square \)
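The omitted display of this proof is presumably the following chain (a reconstruction under the assumption that Lemma A.1 gives \(\Vert {\mathbf {x}}_i^{k} - \bar{{\mathbf {x}}}^{k}\Vert \le 2\varepsilon ^{k-1}\), with \(L\) the Lipschitz constant of the local gradients from Assumption 1):

```latex
\begin{align*}
\|\mathbf{g}^k - \bar{\mathbf{g}}^k\|
  &= \Big\| \tfrac{1}{n}\textstyle\sum_{i=1}^{n}
     \big(\nabla f_i(\mathbf{x}_i^k) - \nabla f_i(\bar{\mathbf{x}}^k)\big) \Big\| \\
  &\overset{\text{(A.2.a)}}{\le}
     \tfrac{L}{n}\textstyle\sum_{i=1}^{n} \|\mathbf{x}_i^k - \bar{\mathbf{x}}^k\|
   \overset{\text{(A.2.b)}}{\le} 2L\,\varepsilon^{k-1}
   = c_2\,\varepsilon^{k-1}.
\end{align*}
```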
Lemma A.3
Suppose that Assumptions 1 and 2 hold; then for all \({\mathbf {x}}, {\mathbf {y}} \in {\mathbb {R}}^p\),
Proof
The proof can be found in [22, Theorem 2.1.11] and is thus omitted. \(\square \)
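The omitted display is presumably the standard coercivity inequality for a \(\mu \)-strongly convex function with \(L\)-Lipschitz gradient, as stated in [22, Theorem 2.1.11] (reconstruction, assuming the usual form of that result):

```latex
\langle \nabla f(\mathbf{x}) - \nabla f(\mathbf{y}),\, \mathbf{x} - \mathbf{y} \rangle
  \;\ge\; \frac{\mu L}{\mu + L}\,\|\mathbf{x} - \mathbf{y}\|^2
        + \frac{1}{\mu + L}\,\|\nabla f(\mathbf{x}) - \nabla f(\mathbf{y})\|^2 .
```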
Now, we bound the distance \(\Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \) by the following lemma.
Lemma A.4
Suppose that Assumptions 1 and 2 hold, and let the constant step size satisfy \(\alpha \le 2/(\mu + L)\); then one has
where \(c_3 = \sqrt{1-2\alpha \mu L/(\mu +L)} < 1\) and \(c_4 = \alpha c_2 +1\).
Proof
Consider,
where (A.4.a) follows from Lemma A.3 and (A.4.b) is due to the assumption \(\alpha \le 2/(\mu + L)\). Therefore, it holds that
where (A.4.c) is due to the fact that \(\bar{{\mathbf {x}}}^{k+1/2} = \bar{{\mathbf {x}}}^{k} - \alpha {\mathbf {g}}^k\), (A.4.d) follows from Lemma A.1, (A.4.e) is due to (34) and Lemma A.2, and (A.4.f) is due to the assumption \(\varepsilon ^{k-1} \ge \varepsilon ^{k}\). \(\square \)
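Since the displays are missing here, the following is a plausible reconstruction of the chain of inequalities, consistent with the stated constants \(c_3 = \sqrt{1-2\alpha \mu L/(\mu +L)}\) and \(c_4 = \alpha c_2 + 1\) (the exact grouping of steps in the published proof is an assumption):

```latex
\begin{align*}
\|\bar{\mathbf{x}}^{k} - \alpha\bar{\mathbf{g}}^{k} - \mathbf{x}^\star\|^2
  &\overset{\text{(A.4.a)}}{\le}
     \Big(1 - \tfrac{2\alpha\mu L}{\mu+L}\Big)\|\bar{\mathbf{x}}^{k} - \mathbf{x}^\star\|^2
     + \alpha\Big(\alpha - \tfrac{2}{\mu+L}\Big)\|\bar{\mathbf{g}}^{k}\|^2
   \overset{\text{(A.4.b)}}{\le} c_3^2\,\|\bar{\mathbf{x}}^{k} - \mathbf{x}^\star\|^2, \\
\|\bar{\mathbf{x}}^{k+1} - \mathbf{x}^\star\|
  &\le \|\bar{\mathbf{x}}^{k+1/2} - \mathbf{x}^\star\| + \varepsilon^{k}
   \overset{\text{(A.4.c)}}{=}
     \|\bar{\mathbf{x}}^{k} - \alpha\mathbf{g}^{k} - \mathbf{x}^\star\| + \varepsilon^{k} \\
  &\le c_3\,\|\bar{\mathbf{x}}^{k} - \mathbf{x}^\star\|
     + \alpha\,\|\mathbf{g}^{k} - \bar{\mathbf{g}}^{k}\| + \varepsilon^{k}
   \le c_3\,\|\bar{\mathbf{x}}^{k} - \mathbf{x}^\star\|
     + (\alpha c_2 + 1)\,\varepsilon^{k-1}.
\end{align*}
```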
Here, we introduce the last supporting lemma, which concerns the convergence of the convolution of a geometrically decaying sequence with a vanishing sequence.
Lemma A.5
Given a constant \(0<\lambda <1\) and a positive scalar sequence \(\{\beta ^k\}_{k\in {\mathbb {N}}_+}\) with \(\lim _{k \rightarrow \infty } \beta ^k = 0\), one has
Proof
The proof can be found in [23, Lemma 7] and is thus omitted. \(\square \)
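The omitted conclusion is presumably the standard fact that the geometric convolution of a vanishing sequence also vanishes (reconstruction of [23, Lemma 7]):

```latex
\lim_{k \to \infty} \;\sum_{\ell=0}^{k} \lambda^{\,k-\ell}\, \beta^{\ell} \;=\; 0 .
```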
With the help of all the above lemmas, we are now ready to prove Theorem 3.1. By the result obtained from Lemma A.4, it holds that
According to Lemma A.5 and the fact that \(c_3 <1\), as shown in Lemma A.4, we have \(\lim _{k \rightarrow \infty } \Vert \bar{{\mathbf {x}}}^{k+1} - {\mathbf {x}}^\star \Vert = 0\). Furthermore, it holds that
where (6.a) is due to (37) and Lemma A.1, and (6.b) follows from the fact that \(c_3 < 1\) together with Lemma A.5. Therefore, statement 1) in Theorem 3.1 is proved.
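A plausible reconstruction of the two missing displays for statement 1) (unrolling the recursion of Lemma A.4 and then applying Lemma A.1; the precise indexing of the \(\varepsilon \)-terms is an assumption):

```latex
\begin{align*}
\|\bar{\mathbf{x}}^{k+1} - \mathbf{x}^\star\|
  &\le c_3^{\,k+1}\,\|\bar{\mathbf{x}}^{0} - \mathbf{x}^\star\|
     + c_4 \sum_{t=0}^{k} c_3^{\,k-t}\, \varepsilon^{t-1}
     \;\xrightarrow{\;k\to\infty\;}\; 0, \\
\|\mathbf{x}_i^{k+1} - \mathbf{x}^\star\|
  &\overset{\text{(6.a)}}{\le}
     \|\mathbf{x}_i^{k+1} - \bar{\mathbf{x}}^{k+1}\|
     + \|\bar{\mathbf{x}}^{k+1} - \mathbf{x}^\star\|
   \;\overset{\text{(6.b)}}{\xrightarrow{\;k\to\infty\;}}\; 0 .
\end{align*}
```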
Finally, we consider statement 2), where the sequence of consensus error bounds decays linearly, i.e., there exist constants \(c_0 > 0\) and \(0< \rho _0 <1\) such that \(\varepsilon ^k \le c_0 (\rho _0)^k\). Following the same path as in the proof of statement 1), we first show that the distance \(\Vert \bar{{\mathbf {x}}}^{k+1}-{\mathbf {x}}^\star \Vert \) converges to zero at a linear rate, i.e., there exist constants \(c' > 0\) and \(0< \rho ' = \max \{\rho _0, c_3 + c_4c_0/(c'\rho _0)\} <1\) such that
Recalling inequality (33) from Lemma A.4, we prove the statement by induction. First, (39) clearly holds when \(k=0\). Supposing that (39) holds at the \(k\)th iteration, we consider the \((k+1)\)th iteration and obtain,
where (7.a) is due to the fact \(\rho ' = \max \{\rho _0, c_3 + c_4c_0/(c'\rho _0)\}\). Now, following the same procedure as shown in (38), there exist constants \(c_1 = \max \{2c_0/\rho _0, c'\}\) and \(\rho _1 = \max \{\rho _0, \rho '\}\) such that
where (8.a) follows from Lemma A.1. Therefore, the proof is completed. \(\square \)
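A plausible reconstruction of the missing induction step and final bound (matching the definitions \(\rho ' = \max \{\rho _0, c_3 + c_4c_0/(c'\rho _0)\}\), \(c_1 = \max \{2c_0/\rho _0, c'\}\), and \(\rho _1 = \max \{\rho _0, \rho '\}\); the absorbed numerical factors are assumptions):

```latex
\begin{align*}
\|\bar{\mathbf{x}}^{k+1} - \mathbf{x}^\star\|
  &\le c_3\,\|\bar{\mathbf{x}}^{k} - \mathbf{x}^\star\| + c_4\,\varepsilon^{k-1}
   \le c_3\, c' (\rho')^{k} + \tfrac{c_4 c_0}{\rho_0}\,(\rho_0)^{k} \\
  &\le \Big( c_3 + \tfrac{c_4 c_0}{c' \rho_0} \Big)\, c' (\rho')^{k}
   \overset{\text{(7.a)}}{\le} c' (\rho')^{k+1}, \\
\|\mathbf{x}_i^{k+1} - \mathbf{x}^\star\|
  &\overset{\text{(8.a)}}{\le} 2\varepsilon^{k} + c'(\rho')^{k+1}
   \le \tfrac{2 c_0}{\rho_0}\,(\rho_0)^{k+1} + c'(\rho')^{k+1}
   \le 2 c_1 (\rho_1)^{k+1}.
\end{align*}
```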
Du, B., Zhou, J. & Sun, D. Improving the Convergence of Distributed Gradient Descent via Inexact Average Consensus. J Optim Theory Appl 185, 504–521 (2020). https://doi.org/10.1007/s10957-020-01648-3