
Solving fused penalty estimation problems via block splitting algorithms. (English) Zbl 07499257

Summary: We propose a method for solving a penalized estimation problem in which the penalty function depends on differences between pairs of parameter vectors. In most cases such a penalty function is not separable in the parameter vectors, a property that often makes large-scale estimation with the penalty difficult. To solve the estimation problem in a separable way, we introduce a set of equality constraints that connect each parameter vector to a group of auxiliary variables. These auxiliary variables allow us to reformulate the estimation problem so that it is separable either in the parameter vectors or in the auxiliary variables. This separability in turn lets us solve the problem with an iterative scheme in which the tasks within each iteration can be carried out in parallel. Our simulation results show that the iterative scheme has advantages over its traditional counterpart in both computational time and memory usage. Additional theoretical analysis shows that the iterative scheme drives the objective function to the optimal value at a convergence rate of \(O(r^{-1})\), where \(r\) is the iteration number. Supplementary materials for this article are available online.
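The splitting idea in the summary can be illustrated on the simplest fused penalty: the 1-D fused lasso, where the non-separable term \(\lambda\sum_i|\beta_{i+1}-\beta_i|\) is decoupled by introducing an auxiliary variable \(z = D\beta\) (with \(D\) the first-difference operator) and enforcing the equality constraint iteratively. The sketch below is a minimal ADMM-style implementation of that constraint-splitting idea, not the paper's block splitting algorithm; the function name, step size `rho`, and dense handling of \(D\) are illustrative choices.

```python
import numpy as np

def soft_threshold(x, t):
    """Elementwise soft-thresholding, the proximal operator of t * |.|_1."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def fused_lasso_admm(y, lam, rho=1.0, n_iter=1000):
    """Sketch: min_b 0.5*||y - b||^2 + lam * sum_i |b[i+1] - b[i]|.

    The auxiliary variable z = D b makes the non-separable fusion
    penalty separable: b is updated by a quadratic solve, z by an
    elementwise proximal step, and a scaled dual variable u enforces
    the equality constraint D b = z over the iterations.
    """
    n = len(y)
    # First-difference matrix D: (D b)[i] = b[i+1] - b[i].
    D = np.diff(np.eye(n), axis=0)
    b = y.astype(float).copy()
    z = D @ b
    u = np.zeros(n - 1)
    A = np.eye(n) + rho * D.T @ D  # fixed system matrix for the b-update
    for _ in range(n_iter):
        b = np.linalg.solve(A, y + rho * D.T @ (z - u))  # quadratic step
        z = soft_threshold(D @ b + u, lam / rho)         # separable proximal step
        u = u + D @ b - z                                # dual update
    return b
```

Each of the three updates decomposes over coordinates (or, in the paper's multivariate setting, over parameter vectors), which is what permits the parallel execution within each iteration described above.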

MSC:

62-XX Statistics
Full Text: DOI
