
Getting away with more network pruning: from sparsity to geometry and linear regions. (English) Zbl 07745662

Cire, Andre A. (ed.), Integration of constraint programming, artificial intelligence, and operations research. 20th international conference, CPAIOR 2023, Nice, France, May 29 – June 1, 2023. Proceedings. Cham: Springer. Lect. Notes Comput. Sci. 13884, 200-218 (2023).
Summary: One surprising trait of neural networks is the extent to which their connections can be pruned with little to no effect on accuracy. Beyond a critical level of parameter sparsity, however, any further pruning leads to a sudden drop in accuracy. This drop plausibly reflects a loss of model complexity, which we aim to avoid. In this work, we explore how sparsity also affects the geometry of the linear regions defined by a neural network and, consequently, reduces the expected maximum number of linear regions attainable by the architecture. We observe that pruning affects accuracy in much the same way that sparsity affects the number of linear regions and our proposed bound on their maximum number. Conversely, we find that selecting the sparsity of each layer so as to maximize our bound very often improves accuracy over pruning by the same overall amount with uniform sparsity across all layers, thereby providing guidance on where to prune.
For the entire collection see [Zbl 1521.68007].
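As background for the architecture-based bound mentioned in the summary, the following is a minimal sketch of a classical upper bound of this type for fully connected ReLU networks, assuming the statement given by Serra, Tjandraatmadja and Ramalingam (ICML 2018); the sparsity-dependent bound proposed in the paper refines bounds of this kind and is not reproduced here. Here $n_0$ denotes the input dimension and $n_1,\ldots,n_L$ the hidden layer widths:
\[
\#\{\text{linear regions}\} \;\le\; \sum_{(j_1,\ldots,j_L)\in J}\; \prod_{l=1}^{L} \binom{n_l}{j_l},
\qquad
J = \Bigl\{(j_1,\ldots,j_L)\in\mathbb{Z}^L : 0 \le j_l \le \min\{n_0,\, n_1-j_1,\, \ldots,\, n_{l-1}-j_{l-1},\, n_l\}\ \ \forall\, l\Bigr\}.
\]
For a single hidden layer ($L=1$) this reduces to Zaslavsky's hyperplane-arrangement bound $\sum_{j=0}^{n_0}\binom{n_1}{j}$.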

MSC:

68T20 Problem solving in the context of artificial intelligence (heuristics, search strategies, etc.)
90Bxx Operations research and management science
90C27 Combinatorial optimization
