
High-performance statistical computing in the computing environments of the 2020s. (English) Zbl 07612069

Summary: Technological advances in the past decade, hardware and software alike, have made access to high-performance computing (HPC) easier than ever. We review these advances from a statistical computing perspective. Cloud computing makes access to supercomputers affordable. Deep learning software libraries make programming statistical algorithms easy and enable users to write code once and run it anywhere – from a laptop to a workstation with multiple graphics processing units (GPUs) or a supercomputer in a cloud. Highlighting how these developments benefit statisticians, we review recent optimization algorithms that are useful for high-dimensional models and can harness the power of HPC. Code snippets are provided to demonstrate the ease of programming. We also provide an easy-to-use distributed matrix data structure suitable for HPC. Employing this data structure, we illustrate various statistical applications including large-scale positron emission tomography and \(\ell_1\)-regularized Cox regression. Our examples easily scale up to an 8-GPU workstation and a 720-CPU-core cluster in a cloud. As a case in point, we analyze the onset of type 2 diabetes from the UK Biobank with 200,000 subjects and about 500,000 single nucleotide polymorphisms using the HPC \(\ell_1\)-regularized Cox regression. Fitting this half-million-variate model takes less than 45 minutes and reconfirms known associations. To our knowledge, this is the first demonstration of the feasibility of penalized regression of survival outcomes at this scale.
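As a minimal illustration (not taken from the paper) of the write-once-run-anywhere property the summary attributes to deep learning libraries, the following PyTorch sketch runs a proximal gradient (ISTA) iteration for the lasso on whichever device is available; all data and variable names here are invented for the example.

```python
import torch

# Pick the best available backend; the identical code then runs on a
# laptop CPU or on a GPU workstation without modification.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Simulated data (illustrative only): lasso solved by proximal gradient (ISTA).
n, p, lam = 1000, 500, 0.1
X = torch.randn(n, p, device=device)
y = torch.randn(n, device=device)
beta = torch.zeros(p, device=device)

# Step size 1/L, where L = (spectral norm of X)^2 upper-bounds the
# Lipschitz constant of the least-squares gradient below.
step = 1.0 / torch.linalg.matrix_norm(X, ord=2).item() ** 2

for _ in range(100):
    grad = X.T @ (X @ beta - y) / n      # gradient of the smooth part
    z = beta - step * grad               # gradient step
    beta = torch.sign(z) * torch.clamp(z.abs() - step * lam, min=0.0)  # soft-threshold
```

The same loop executes unchanged on a CPU-only laptop or a multi-GPU workstation; scaling further, to a cluster in a cloud, is what the distributed matrix data structure described in the summary is designed to make equally transparent.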

MSC:

62-XX Statistics
