
An unsupervised machine-learning checkpoint-restart algorithm using Gaussian mixtures for particle-in-cell simulations. (English) Zbl 07513838

Summary: We propose an unsupervised machine-learning checkpoint-restart (CR) algorithm for particle-in-cell (PIC) simulations based on Gaussian mixtures (GM). The algorithm compresses the particle population in each spatial cell by representing its velocity distribution function as a GM. Particles are reconstructed at restart time by local resampling of the Gaussians. To guarantee fidelity of the CR process, we enforce exact preservation of invariants such as charge, momentum, and energy in both the compression and reconstruction stages, everywhere on the mesh. We also preserve Gauss’ law after particle reconstruction by exactly matching the density profile at restart time. As a result, the GM CR algorithm provides a clean, conservative restart capability while potentially affording orders-of-magnitude savings in input/output requirements. We demonstrate the algorithm with a recently developed, exactly energy- and charge-conserving PIC scheme on both electrostatic and electromagnetic tests. The tests demonstrate not only a high-fidelity CR capability, but also its potential for enhancing the fidelity of the PIC solution at a given particle resolution.
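The following is a minimal, hypothetical sketch of the core idea described in the summary, not the authors' implementation: per spatial cell, the particle velocities are compressed into a Gaussian mixture (here fitted with scikit-learn's EM-based GaussianMixture, with a hand-picked number of components, whereas the paper selects the mixture order in an unsupervised way), and at restart the particles are resampled from the mixture and then shifted and rescaled so that the cell's particle count, mean velocity, and kinetic energy are reproduced exactly. The paper additionally enforces these invariants and Gauss' law on the mesh (via exact density matching), which this velocity-space-only sketch does not attempt.

```python
# Hypothetical per-cell GM compression/reconstruction sketch (assumes numpy + scikit-learn).
import numpy as np
from sklearn.mixture import GaussianMixture

def compress_cell(velocities, n_components=4, seed=0):
    """Fit a Gaussian mixture to the (N, 3) velocity sample of one cell.
    Returns the fitted mixture plus the cell moments to be preserved."""
    gm = GaussianMixture(n_components=n_components, random_state=seed).fit(velocities)
    moments = {
        "count": velocities.shape[0],                      # proxy for charge density
        "mean": velocities.mean(axis=0),                   # proxy for momentum density
        "energy": np.mean(np.sum(velocities**2, axis=1)),  # proxy for kinetic energy density
    }
    return gm, moments

def restart_cell(gm, moments):
    """Resample particles from the stored mixture and enforce the cell moments exactly."""
    v, _ = gm.sample(moments["count"])      # raw resample: moments match only statistically
    v = v - v.mean(axis=0)                  # remove the sampled drift
    # rescale the thermal spread so the kinetic energy matches exactly
    e_thermal_target = moments["energy"] - np.sum(moments["mean"]**2)
    e_thermal_sample = np.mean(np.sum(v**2, axis=1))
    v *= np.sqrt(e_thermal_target / e_thermal_sample)
    return v + moments["mean"]              # restore the exact mean velocity

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # a two-stream-like cell: two drifting Maxwellian beams
    cell = np.vstack([rng.normal(+1.0, 0.2, (5000, 3)),
                      rng.normal(-1.0, 0.2, (5000, 3))])
    gm, m = compress_cell(cell, n_components=2)
    restarted = restart_cell(gm, m)
    assert np.allclose(restarted.mean(axis=0), m["mean"])
    assert np.isclose(np.mean(np.sum(restarted**2, axis=1)), m["energy"])
```

Storing only the mixture parameters and the cell moments, rather than every particle, is what yields the potential orders-of-magnitude reduction in checkpoint I/O noted in the summary.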

MSC:

62Fxx Parametric inference
62Hxx Multivariate analysis
62-XX Statistics

Software:

PRMLT; SPRNG; Anderson; zfp

References:

[1] Kahle, J. A.; Moreno, J.; Dreps, D., 2.1 Summit and Sierra: designing AI/HPC supercomputers, (2019 IEEE International Solid-State Circuits Conference (ISSCC) (2019), IEEE), 42-43
[2] Nightingale, E. B.; Douceur, J. R.; Orgovan, V., Cycles, cells and platters: an empirical analysis of hardware failures on a million consumer PCs, (Proceedings of the Sixth Conference on Computer Systems (2011)), 343-356
[3] Liu, R.-T.; Chen, Z.-N., A large-scale study of failures on petascale supercomputers, J. Comput. Sci. Technol., 33, 1, 24-41 (2018)
[4] Rojas, E.; Meneses, E.; Jones, T.; Maxwell, D., Analyzing a five-year failure record of a leadership-class supercomputer, (2019 31st International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (2019), IEEE), 196-203
[5] Dauwe, D.; Pasricha, S.; Maciejewski, A. A.; Siegel, H. J., An analysis of resilience techniques for exascale computing platforms, (2017 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (2017), IEEE), 914-923
[6] Miao, Z.; Calhoun, J.; Ge, R., Energy analysis and optimization for resilient scalable linear systems, (2018 IEEE International Conference on Cluster Computing (CLUSTER) (2018), IEEE), 24-34
[7] Birdsall, C. K.; Langdon, A. B., Plasma Physics via Computer Simulation (2004), CRC Press
[8] Hockney, R. W.; Eastwood, J. W., Computer Simulation Using Particles (1988), CRC Press · Zbl 0662.76002
[9] Yee, K., Numerical solution of initial boundary value problems involving Maxwell’s equations in isotropic media, IEEE Trans. Antennas Propag., 14, 3, 302-307 (1966) · Zbl 1155.78304
[10] Taflove, A.; Hagness, S. C., Computational Electrodynamics: the Finite-Difference Time-Domain Method (2005), Artech House
[11] McOwen, R. C., Partial Differential Equations: Methods and Applications (2004), Pearson
[12] Lofstead, J. F.; Klasky, S.; Schwan, K.; Podhorszki, N.; Jin, C., Flexible IO and integration for scientific codes through the adaptable IO system (ADIOS), (Proceedings of the 6th International Workshop on Challenges of Large Applications in Distributed Environments (2008)), 15-24
[13] Moody, A.; Bronevetsky, G.; Mohror, K.; De Supinski, B. R., Design, modeling, and evaluation of a scalable multi-level checkpointing system, (SC’10: Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (2010), IEEE), 1-11
[14] Ferreira, K. B.; Riesen, R.; Bridges, P.; Arnold, D.; Brightwell, R., Accelerating incremental checkpointing for extreme-scale computing, Future Gener. Comput. Syst., 30, 66-77 (2014)
[15] Tiwari, D.; Gupta, S.; Vazhkudai, S. S., Lazy checkpointing: exploiting temporal locality in failures to mitigate checkpointing overheads on extreme-scale systems, (2014 44th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (2014), IEEE), 25-36
[16] Garg, R.; Patel, T.; Cooperman, G.; Tiwari, D., Shiraz: exploiting system reliability and application resilience characteristics to improve large scale system throughput, (2018 48th Annual IEEE/IFIP International Conference on Dependable Systems and Networks (DSN) (2018), IEEE), 83-94
[17] Son, S. W.; Chen, Z.; Hendrix, W.; Agrawal, A.; Liao, W.-k.; Choudhary, A., Data compression for the exascale computing era-survey, Supercomput. Front. Innov., 1, 2, 76-88 (2014)
[18] Cappello, F.; Di, S.; Li, S.; Liang, X.; Gok, A. M.; Tao, D.; Yoon, C. H.; Wu, X.-C.; Alexeev, Y.; Chong, F. T., Use cases of lossy compression for floating-point data in scientific data sets, Int. J. High Perform. Comput. Appl., 33, 6, 1201-1220 (2019)
[19] Chen, Z.; Son, S. W.; Hendrix, W.; Agrawal, A.; Liao, W.-k.; Choudhary, A., NUMARCK: machine learning algorithm for resiliency and checkpointing, (SC’14: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (2014), IEEE), 733-744
[20] Sasaki, N.; Sato, K.; Endo, T.; Matsuoka, S., Exploration of lossy compression for application-level checkpoint/restart, (2015 IEEE International Parallel and Distributed Processing Symposium (2015), IEEE), 914-922
[21] Tao, D.; Di, S.; Liang, X.; Chen, Z.; Cappello, F., Improving performance of iterative methods by lossy checkponting, (Proceedings of the 27th International Symposium on High-Performance Parallel and Distributed Computing (2018)), 52-65
[22] Zhang, J.; Zhuo, X.; Moon, A.; Liu, H.; Son, S. W., Efficient encoding and reconstruction of HPC datasets for checkpoint/restart, (2019 35th Symposium on Mass Storage Systems and Technologies (MSST) (2019), IEEE), 79-91
[23] Calhoun, J.; Cappello, F.; Olson, L. N.; Snir, M.; Gropp, W. D., Exploring the feasibility of lossy compression for PDE simulations, Int. J. High Perform. Comput. Appl., 33, 2, 397-410 (2019)
[24] Reza, T.; Calhoun, J.; Keipert, K.; Di, S.; Cappello, F., Analyzing the performance and accuracy of lossy checkpointing on sub-iteration of NWChem, (2019 IEEE/ACM 5th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-5) (2019), IEEE), 23-27
[25] Triantafyllides, P.; Reza, T.; Calhoun, J. C., Analyzing the impact of lossy compressor variability on checkpointing scientific simulations, (2019 IEEE International Conference on Cluster Computing (CLUSTER) (2019), IEEE), 1-5
[26] Zhang, J.; Chen, J.; Moon, A.; Zhuo, X.; Son, S. W., Bit-error aware quantization for DCT-based lossy compression, (2020 IEEE High Performance Extreme Computing Conference (HPEC) (2020), IEEE), 1-7
[27] Tao, D.; Di, S.; Chen, Z.; Cappello, F., In-depth exploration of single-snapshot lossy compression techniques for N-body simulations, (2017 IEEE International Conference on Big Data (Big Data) (2017), IEEE), 486-493
[28] Cappello, F.; Di, S.; Gok, A. M., Fulfilling the promises of lossy compression for scientific applications, (Smoky Mountains Computational Sciences and Engineering Conference (2020), Springer), 99-116
[29] Lindstrom, P., Fixed-rate compressed floating-point arrays, IEEE Trans. Vis. Comput. Graph., 20, 12, 2674-2683 (2014)
[30] Di, S.; Cappello, F., Fast error-bounded lossy HPC data compression with SZ, (2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (2016), IEEE), 730-739
[31] McLachlan, G. J.; Peel, D., Finite Mixture Models (2004), John Wiley & Sons
[32] Figueiredo, M. A.; Jain, A. K., Unsupervised selection and estimation of finite mixture models, (Proceedings 15th International Conference on Pattern Recognition. ICPR-2000, vol. 2 (2000), IEEE), 87-90
[33] Dempster, A. P.; Laird, N. M.; Rubin, D. B., Maximum likelihood from incomplete data via the EM algorithm, J. R. Stat. Soc., Ser. B, Methodol., 1-38 (1977) · Zbl 0364.62022
[34] Wallace, C. S., Statistical and Inductive Inference by Minimum Message Length (2005), Springer Science & Business Media · Zbl 1085.62002
[35] Behboodian, J., On a mixture of normal distributions, Biometrika, 57, Part 1, 215-217 (1970) · Zbl 0193.18104
[36] Nguyen, T.; Chen, G.; Chacón, L., An adaptive EM accelerator for unsupervised learning of Gaussian mixture models (2020), arXiv preprint
[37] Lemons, D. S.; Winske, D.; Daughton, W.; Albright, B., Small-angle Coulomb collision model for particle-in-cell simulations, J. Comput. Phys., 228, 5, 1391-1403 (2009) · Zbl 1157.76059
[38] Burgess, D.; Sulsky, D.; Brackbill, J., Mass matrix formulation of the FLIP particle-in-cell method, J. Comput. Phys., 103, 1, 1-15 (1992) · Zbl 0761.73117
[39] Dupuis, R.; Goldman, M. V.; Newman, D. L.; Amaya, J.; Lapenta, G., Characterizing magnetic reconnection regions using Gaussian mixture models on particle velocity distributions, Astrophys. J., 889, 1, 22 (2020)
[40] Bowers, K. J.; Devolder, B. G.; Yin, L.; Kwan, T. J., A maximum likelihood method for linking particle-in-cell and Monte-Carlo transport simulations, Comput. Phys. Commun., 164, 1-3, 311-317 (2004) · Zbl 1196.82005
[41] Larson, D. J.; Young, C. V., A finite mass based method for Vlasov-Poisson simulations, J. Comput. Phys., 284, 171-185 (2015) · Zbl 1351.82081
[42] Everitt, B. S., Finite mixture distributions, (Wiley StatsRef: Statistics Reference Online (2014))
[43] Efron, B., Bayes’ theorem in the 21st century, Science, 340, 6137, 1177-1178 (2013) · Zbl 1355.00005
[44] Blitzstein, J. K.; Hwang, J., Introduction to Probability (2019), CRC Press · Zbl 1407.60001
[45] MacKay, D. J., Information Theory, Inference and Learning Algorithms (2003), Cambridge University Press · Zbl 1055.94001
[46] Good, I. J., The Estimation of Probabilities: An Essay on Modern Bayesian Methods (1965), MIT Press · Zbl 0168.39603
[47] Rousseau, J.; Mengersen, K., Asymptotic behaviour of the posterior distribution in overfitted mixture models, J. R. Stat. Soc., Ser. B, Stat. Methodol., 73, 5, 689-710 (2011) · Zbl 1228.62034
[48] Zivkovic, Z., Improved adaptive Gaussian mixture model for background subtraction, (Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004, vol. 2 (2004), IEEE), 28-31
[49] Tu, K., Modified Dirichlet distribution: allowing negative parameters to induce stronger sparsity, (Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing (2016)), 1986-1991
[50] Hasselblad, V., Estimation of parameters for a mixture of normal distributions, Technometrics, 8, 3, 431-444 (1966)
[51] Bishop, C. M., Pattern Recognition and Machine Learning (2006), Springer · Zbl 1107.68072
[52] Gauvain, J.-L.; Lee, C.-H., Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, IEEE Trans. Speech Audio Process., 2, 2, 291-298 (1994)
[53] Celeux, G.; Chrétien, S.; Forbes, F.; Mkhadri, A., A component-wise EM algorithm for mixtures, J. Comput. Graph. Stat., 10, 4, 697-712 (2001)
[54] Redner, R. A.; Walker, H. F., Mixture densities, maximum likelihood and the EM algorithm, SIAM Rev., 26, 2, 195-239 (1984) · Zbl 0536.62021
[55] Walker, H. F.; Ni, P., Anderson acceleration for fixed-point iterations, SIAM J. Numer. Anal., 49, 4, 1715-1735 (2011) · Zbl 1254.65067
[56] Plasse, J. H., The EM algorithm in multivariate Gaussian mixture models using Anderson acceleration (2013), Worcester Polytechnic Institute, PhD thesis
[57] Tong, Y. L., The Multivariate Normal Distribution (2012), Springer Science & Business Media · Zbl 0689.62036
[58] Mascagni, M.; Srinivasan, A., Algorithm 806: SPRNG: a scalable library for pseudorandom number generation, ACM Trans. Math. Softw., 26, 3, 436-461 (2000)
[59] Chen, G.; Chacón, L., A multi-dimensional, energy- and charge-conserving, nonlinearly implicit, electromagnetic Vlasov-Darwin particle-in-cell algorithm, Comput. Phys. Commun., 197, 73-87 (2015) · Zbl 1352.65405
[60] Mardia, K. V., Measures of multivariate skewness and kurtosis with applications, Biometrika, 57, 519-530 (1970) · Zbl 0214.46302
[61] Richtmyer, R. D., A non-random sampling method, based on congruences, for Monte Carlo problems (1958), New York Univ., New York. Atomic Energy Commission Computing and Applied Mathematics Center, Tech. Rep.
[62] Hammersley, J. M., Monte Carlo methods for solving multivariable problems, Ann. N.Y. Acad. Sci., 86, 3, 844-874 (1960) · Zbl 0111.12405
[63] Lampert, M. A., Plasma oscillations at extremely high frequencies, J. Appl. Phys., 27, 1, 5-11 (1956) · Zbl 0071.23701
[64] Roberts, K.; Berk, H. L., Nonlinear evolution of a two-stream instability, Phys. Rev. Lett., 19, 6, 297 (1967)
[65] Weibel, E. S., Spontaneously growing transverse waves in a plasma due to an anisotropic velocity distribution, Phys. Rev. Lett., 2, 3, 83 (1959)
[66] Wang, B.; Miller, G. H.; Colella, P., A particle-in-cell method with adaptive phase-space remapping for kinetic plasmas, SIAM J. Sci. Comput., 33, 6, 3509-3537 (2011) · Zbl 1232.76046
[67] Myers, A.; Colella, P.; Straalen, B. V., A 4th-order particle-in-cell method with phase-space remapping for the Vlasov-Poisson equation, SIAM J. Sci. Comput., 39, 3, B467-B485 (2017) · Zbl 1520.65074
[68] Faghihi, D.; Carey, V.; Michoski, C.; Hager, R.; Janhunen, S.; Chang, C.-S.; Moser, R., Moment preserving constrained resampling with applications to particle-in-cell methods, J. Comput. Phys., 409, Article 109317 pp. (2020) · Zbl 1435.76100
[69] Bowers, K. J.; Albright, B.; Yin, L.; Bergen, B.; Kwan, T., Ultrahigh performance three-dimensional electromagnetic relativistic kinetic plasma simulation, Phys. Plasmas, 15, 5, Article 055703 pp. (2008)
[70] Byna, S.; Uselton, A.; Prabhat, D. K.; He, Y., Trillion particles, 120,000 cores, and 350 TBs: lessons learned from a hero I/O run on Hopper, (Cray User Group Meeting (2013))
[71] Behzad, B.; Byna, S.; Snir, M., Optimizing I/O performance of HPC applications with autotuning, ACM Trans. Parallel Comput. (TOPC), 5, 4, 1-27 (2019)
[72] Neal, R. M.; Hinton, G. E., A view of the EM algorithm that justifies incremental, sparse, and other variants, (Learning in Graphical Models (1998), Springer), 355-368 · Zbl 0916.62019
[73] Corduneanu, A.; Bishop, C. M., Variational Bayesian model selection for mixture distributions, (Artificial Intelligence and Statistics, vol. 2001 (2001), Morgan Kaufmann: Morgan Kaufmann Waltham, MA), 27-34
[74] Schervish, M. J., Theory of Statistics (2012), Springer Science & Business Media
[75] Casella, G.; Berger, R. L., Statistical Inference, vol. 2 (2002), Duxbury: Duxbury Pacific Grove, CA
[76] Lanterman, A. D., Schwarz, Wallace, and Rissanen: intertwining themes in theories of model selection, Int. Stat. Rev., 69, 2, 185-212 (2001) · Zbl 1211.62007
[77] Kass, R. E.; Wasserman, L., The selection of prior distributions by formal rules, J. Am. Stat. Assoc., 91, 435, 1343-1370 (1996) · Zbl 0884.62007
[78] Titterington, D. M.; Smith, A. F.; Makov, U. E., Statistical Analysis of Finite Mixture Distributions (1985), Wiley · Zbl 0646.62013
[79] Raim, A. M.; Neerchal, N. K.; Morel, J. G., An approximation to the information matrix of exponential family finite mixtures, Ann. Inst. Stat. Math., 69, 2, 333-364 (2017) · Zbl 1396.62155
[80] Bernardo, J.; Girón, F., A Bayesian analysis of simple mixture problems, Bayesian Stat., 3, 3, 67-78 (1988) · Zbl 0706.62029