High-dimensional \(M\)-estimation for Byzantine-robust decentralized learning. (English) Zbl 1527.62029

Summary: In this paper, we focus on robust sparse \(M\)-estimation over decentralized networks in the presence of Byzantine attacks. In particular, the decentralized network is modeled as an undirected graph without a central node, in which a small fraction of nodes may behave arbitrarily and send erroneous information due to system breakdowns, cyber attacks, and the like. To address the Byzantine issue, pre-determined robust aggregation rules are applied. Moreover, gradient tracking is combined with a proximal algorithm to ensure convergence and sparsity simultaneously. Theoretically, our proposed algorithms are provably robust against Byzantine attacks and achieve linear convergence rates. The finite-sample performance is studied through numerical experiments under various settings, and an application to the Communities and Crime data set is also presented.
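
To make the scheme concrete, here is a minimal sketch of one decentralized proximal gradient-tracking iteration with a pre-determined robust aggregation rule. It assumes a least-squares loss with an \(\ell_1\) penalty, coordinate-wise median aggregation, and soft-thresholding as the proximal operator; all names are illustrative, and this is a sketch under those assumptions, not the authors' exact algorithm.

```python
import numpy as np

def soft_threshold(x, lam):
    """Proximal operator of the l1 penalty lam * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def robust_aggregate(msgs):
    """Coordinate-wise median of the received vectors: one pre-determined
    robust aggregation rule that tolerates a minority of Byzantine senders."""
    return np.median(np.stack(msgs), axis=0)

def byzantine_robust_prox_gt(X_list, y_list, adj, lam=0.1, step=0.01, iters=200):
    """Decentralized sparse least-squares estimation with gradient tracking.

    X_list[i], y_list[i]: local data at node i; adj[i]: neighbors of node i in
    the undirected communication graph. Illustrative sketch only.
    """
    m = len(X_list)
    d = X_list[0].shape[1]
    grad = lambda i, th: X_list[i].T @ (X_list[i] @ th - y_list[i]) / len(y_list[i])

    theta = [np.zeros(d) for _ in range(m)]
    track = [grad(i, theta[i]) for i in range(m)]  # gradient trackers

    for _ in range(iters):
        new_theta, new_track = [], []
        for i in range(m):
            # Robustly aggregate the parameters received from neighbors (and
            # self), then take a proximal step along the tracked gradient.
            agg = robust_aggregate([theta[j] for j in adj[i]] + [theta[i]])
            new_theta.append(soft_threshold(agg - step * track[i], step * lam))
        for i in range(m):
            # Gradient-tracking correction: aggregated trackers plus the
            # local gradient increment between consecutive iterates.
            agg_g = robust_aggregate([track[j] for j in adj[i]] + [track[i]])
            new_track.append(agg_g + grad(i, new_theta[i]) - grad(i, theta[i]))
        theta, track = new_theta, new_track
    return theta
```

Coordinate-wise median is only one possible aggregation rule; a trimmed mean could be substituted without changing the overall structure. On a ring graph, for instance, one would take adj[i] = [(i - 1) % m, (i + 1) % m].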

MSC:

62G08 Nonparametric regression and quantile regression
62J05 Linear regression; mixed models
90C25 Convex programming
Full Text: DOI
