
Scale-covariant and scale-invariant Gaussian derivative networks. (English) Zbl 07510341

Summary: This paper presents a hybrid approach between scale-space theory and deep learning, in which a deep learning architecture is constructed by coupling parameterized scale-space operations in cascade. By sharing the learnt parameters between multiple scale channels, and by exploiting the transformation properties of the scale-space primitives under scaling transformations, the resulting network becomes provably scale covariant. By additionally performing max pooling, or some other permutation-invariant pooling, over the multiple scale channels, the resulting network architecture for image classification also becomes provably scale invariant. We investigate the performance of such networks on the MNIST Large Scale dataset, which contains rescaled images from the original MNIST dataset spanning a scale factor of 4 for the training data and a scale factor of 16 for the test data. It is demonstrated that the resulting approach allows for scale generalization, with good performance when classifying patterns at scales not spanned by the training data.
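To make the construction concrete, the following is a minimal sketch, assuming a PyTorch implementation, of how Gaussian derivative layers with weights shared over multiple scale channels and max pooling over scales could be coupled in cascade. The kernel truncation, the specific scale levels, the layer widths, and the names GaussDerivLayer and GaussDerivNet are illustrative assumptions, not the authors' reference implementation.

```python
# Minimal sketch of a scale-channel Gaussian derivative network (assumptions:
# layer widths, scale levels, one sigma per scale channel, sampled Gaussian kernels).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F


def gaussian_derivative_kernels(sigma, radius):
    """2-D kernels for the scale-normalized Gaussian derivatives
    {L, Lx, Ly, Lxx, Lxy, Lyy} at scale sigma (sampled Gaussian model)."""
    x = torch.arange(-radius, radius + 1, dtype=torch.float32)
    g = torch.exp(-x ** 2 / (2 * sigma ** 2))
    g = g / g.sum()                                   # 1-D Gaussian
    gx = -x / sigma ** 2 * g                          # first-order derivative
    gxx = (x ** 2 / sigma ** 4 - 1 / sigma ** 2) * g  # second-order derivative
    # Separable outer products give 2-D kernels; multiplication by sigma^|alpha|
    # yields scale-normalized derivatives.
    return torch.stack([
        torch.outer(g, g),                 # L
        sigma * torch.outer(g, gx),        # Lx
        sigma * torch.outer(gx, g),        # Ly
        sigma ** 2 * torch.outer(g, gxx),  # Lxx
        sigma ** 2 * torch.outer(gx, gx),  # Lxy
        sigma ** 2 * torch.outer(gxx, g),  # Lyy
    ])                                     # shape (6, 2*radius+1, 2*radius+1)


class GaussDerivLayer(nn.Module):
    """One layer: fixed Gaussian derivative filters at a given scale,
    followed by a learned 1x1 linear combination and a ReLU."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        # Only the mixing weights are learned; they are shared across all
        # scale channels by reusing the same layer instance at every scale.
        self.mix = nn.Conv2d(6 * in_ch, out_ch, kernel_size=1)

    def forward(self, x, sigma):
        radius = max(2, int(math.ceil(4 * sigma)))
        k = gaussian_derivative_kernels(sigma, radius).to(x.device)
        k = k.unsqueeze(1)                            # (6, 1, K, K)
        b, c, h, w = x.shape
        # Apply the same 6 derivative filters to every input channel.
        y = F.conv2d(x.reshape(b * c, 1, h, w), k, padding=radius)
        y = y.reshape(b, 6 * c, h, w)
        return F.relu(self.mix(y))


class GaussDerivNet(nn.Module):
    """Scale-channel network: the same (weight-shared) cascade of Gaussian
    derivative layers is applied at several scales, and the scale channels
    are combined by max pooling over scales after spatial pooling."""

    def __init__(self, num_classes=10, widths=(16, 32, 64),
                 sigmas=(1.0, 2.0, 4.0)):
        super().__init__()
        self.sigmas = sigmas
        chans = (1,) + tuple(widths)
        self.layers = nn.ModuleList(
            GaussDerivLayer(chans[i], chans[i + 1]) for i in range(len(widths)))
        self.head = nn.Linear(widths[-1], num_classes)

    def forward(self, x):
        per_scale = []
        for sigma in self.sigmas:            # same learned weights at every scale
            y = x
            for layer in self.layers:
                y = layer(y, sigma)          # one sigma per scale channel (simplification)
            per_scale.append(y.mean(dim=(2, 3)))      # global average pooling
        z = torch.stack(per_scale).max(dim=0).values  # max pooling over scales
        return self.head(z)


if __name__ == "__main__":
    net = GaussDerivNet()
    logits = net(torch.randn(2, 1, 64, 64))
    print(logits.shape)                               # torch.Size([2, 10])
```

The design choice illustrated here is that the learned parameters enter only through the 1x1 mixing weights, which are reused unchanged at every scale level; scale covariance then follows from the transformation properties of the Gaussian derivative kernels, and the permutation-invariant max over the scale channels removes the dependence on the unknown image scale.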

MSC:

68T07 Artificial neural networks and deep learning

References:

[1] Jansson, Y., Lindeberg, T.: Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges. In: International Conference on Pattern Recognition (ICPR 2020), pp. 1181-1188. (2021). Extended version in arXiv:2004.01536
[2] Lindeberg, T., Provably scale-covariant continuous hierarchical networks based on scale-normalized differential expressions coupled in cascade, J. Math. Imaging Vis., 62, 120-148 (2020) · Zbl 1435.68291
[3] Lindeberg, T.: Scale-covariant and scale-invariant Gaussian derivative networks. In: Proceedings of Scale Space and Variational Methods in Computer Vision (SSVM 2021), volume 12679 of Springer LNCS, pp. 3-14 (2021) · Zbl 1484.68207
[4] Lindeberg, T., Feature detection with automatic scale selection, Int. J. Comput. Vis., 30, 77-116 (1998)
[5] Lindeberg, T., Edge detection and ridge detection with automatic scale selection, Int. J. Comput. Vis., 30, 117-154 (1998)
[6] Bretzner, L.; Lindeberg, T., Feature tracking with automatic selection of spatial scales, Comput. Vis. Image Underst., 71, 385-392 (1998)
[7] Chomat, O., de Verdiere, V., Hall, D., Crowley, J.: Local scale selection for Gaussian based description techniques. In: Proceedings of European Conference on Computer Vision (ECCV 2000), volume 1842 of Springer LNCS, Dublin, Ireland, pp. 117-133 (2000)
[8] Mikolajczyk, K.; Schmid, C., Scale and affine invariant interest point detectors, Int. J. Comput. Vis., 60, 63-86 (2004)
[9] Lowe, DG, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis., 60, 91-110 (2004)
[10] Bay, H.; Ess, A.; Tuytelaars, T.; van Gool, L., Speeded up robust features (SURF), Comput. Vis. Image Underst., 110, 346-359 (2008)
[11] Tuytelaars, T.; Mikolajczyk, K., A Survey on Local Invariant Features (2008), New York: Now Publishers, New York
[12] Lindeberg, T.; Hawkes, P., Generalized axiomatic scale-space theory, Advances in Imaging and Electron Physics, 1-96 (2013), Amsterdam: Elsevier, Amsterdam
[13] Lindeberg, T., Image matching using generalized scale-space interest points, J. Math. Imaging Vis., 52, 3-36 (2015) · Zbl 1357.94023
[14] Fawzi, A., Frossard, P.: Manitest: are classifiers really invariant? In: British Machine Vision Conference (BMVC 2015) (2015)
[15] Singh, B., Davis, L.S.: An analysis of scale invariance in object detection—SNIP. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2018), pp. 3578-3587 (2018)
[16] Xu, Y., Xiao, T., Zhang, J., Yang, K., Zhang, Z.: Scale-invariant convolutional neural networks. arXiv preprint arXiv:1411.6369 (2014)
[17] Kanazawa, A., Sharma, A., Jacobs, D.W.: Locally scale-invariant convolutional neural networks. In: NIPS 2014 Deep Learning and Representation Learning Workshop (2014). arXiv preprint arXiv:1412.5104
[18] Marcos, D., Kellenberger, B., Lobry, S., Tuia, D.: Scale equivariance in CNNs with vector fields. In: ICML/FAIM 2018 Workshop on Towards Learning with Limited Labels: Equivariance, Invariance, and Beyond (2018). arXiv preprint arXiv:1807.11783
[19] Ghosh, R., Gupta, A.K.: Scale steerable filters for locally scale-invariant convolutional neural networks. In: ICML Workshop on Theoretical Physics for Deep Learning (2019). arXiv preprint arXiv:1906.03861
[20] Jansson, Y., Lindeberg, T.: Exploring the ability of CNNs to generalise to previously unseen scales over wide scale ranges. In: International Conference on Pattern Recognition (ICPR 2020), pp. 1181-1188 (2021)
[21] Jansson, Y., Lindeberg, T.: MNIST Large Scale dataset. Zenodo (2020). https://www.zenodo.org/record/3820247
[22] Sosnovik, I., Szmaja, M., Smeulders, A.: Scale-equivariant steerable networks. In: International Conference on Learning Representations (ICLR 2020) (2020)
[23] Worrall, D., Welling, M.: Deep scale-spaces: Equivariance over scale. In: Advances in Neural Information Processing Systems (NeurIPS 2019), pp. 7366-7378 (2019)
[24] Lindeberg, T.: Provably scale-covariant networks from oriented quasi quadrature measures in cascade. In: Proceedings of Scale Space and Variational Methods in Computer Vision (SSVM 2019), vol. 11603 of Springer LNCS, pp. 328-340 (2019)
[25] Bekkers, E.J.: B-spline CNNs on Lie groups. In: International Conference on Learning Representations (ICLR 2020) (2020)
[26] Singh, B., Najibi, M., Sharma, A., Davis, L.S.: Scale normalized image pyramids with AutoFocus for object detection. IEEE Trans. Pattern Anal. Mach. Intell. (2021)
[27] Li, Y., Chen, Y., Wang, N., Zhang, Z.: Scale-aware trident networks for object detection. In: Proceedings of International Conference on Computer Vision (ICCV 2019), pp. 6054-6063 (2019)
[28] Schiele, B.; Crowley, J., Recognition without correspondence using multidimensional receptive field histograms, Int. J. Comput. Vis., 36, 31-50 (2000)
[29] Linde, O.; Lindeberg, T., Object recognition using composed receptive field histograms of higher dimensionality, Int. Conf. Pattern Recognit., 2, 1-6 (2004)
[30] Laptev, I., Lindeberg, T.: Local descriptors for spatio-temporal recognition. In: Proceedings of ECCV’04 Workshop on Spatial Coherence for Visual Motion Analysis, vol. 3667 of Springer LNCS, Prague, Czech Republic, pp. 91-103 (2004)
[31] Linde, O.; Lindeberg, T., Composed complex-cue histograms: an investigation of the information content in receptive field based image descriptors for object recognition, Comput. Vis. Image Underst., 116, 538-560 (2012)
[32] Larsen, A.B.L., Darkner, S., Dahl, A.L., Pedersen, K.S.: Jet-based local image descriptors. In: Proceedings of European Conference on Computer Vision (ECCV 2012), vol. 7574 of Springer LNCS, pp. 638-650. Springer (2012)
[33] Cohen, T., Welling, M.: Group equivariant convolutional networks. In: International Conference on Machine Learning (ICML 2016), pp. 2990-2999 (2016)
[34] Jaderberg, M., Simonyan, K., Zisserman, A., Kavukcuoglu, K.: Spatial transformer networks. In: Proceedings of Neural Information Processing Systems (NIPS 2015), pp. 2017-2025 (2015)
[35] Lin, C.H., Lucey, S.: Inverse compositional spatial transformer networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017), pp. 2568-2576 (2017)
[36] Finnveden, L., Jansson, Y., Lindeberg, T.: Understanding when spatial transformer networks do not support invariance, and what to do about it. In: International Conference on Pattern Recognition (ICPR 2020), pp. 3427-3434 (2021). Extended version in arXiv:2004.11678
[37] Jansson, Y., Maydanskiy, M., Finnveden, L., Lindeberg, T.: Inability of spatial transformations of CNN feature maps to support invariant recognition. arXiv preprint arXiv:2004.14716 (2020)
[38] Sermanet, P., Eigen, D., Zhang, X., Mathieu, M., Fergus, R., LeCun, Y.: OverFeat: integrated recognition, localization and detection using convolutional networks. arXiv preprint arXiv:1312.6229 (2013)
[39] Girshick, R.: Fast R-CNN. In: Proceedings of International Conference on Computer Vision (ICCV 2015), pp. 1440-1448 (2015)
[40] Lin, T.Y., Dollár, P., Girshick, R., He, K., Hariharan, B., Belongie, S.: Feature pyramid networks for object detection. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017) (2017)
[41] Lin, T.Y., Goyal, P., Girshick, R., He, K., Dollár, P.: Focal loss for dense object detection. In: Proceedings of International Conference on Computer Vision (ICCV 2017), pp. 2980-2988 (2017)
[42] He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: Proceedings of International Conference on Computer Vision (ICCV 2017), pp. 2961-2969 (2017)
[43] Hu, P., Ramanan, D.: Finding tiny faces. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017), pp. 951-959 (2017)
[44] Ren, S.; He, K.; Girshick, R.; Zhang, X.; Sun, J., Object detection networks on convolutional feature maps, IEEE Trans. Pattern Anal. Mach. Intell., 39, 1476-1481 (2016)
[45] Nah, S., Kim, T.H., Lee, K.M.: Deep multi-scale convolutional neural network for dynamic scene deblurring. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017), pp. 3883-3891 (2017)
[46] Chen, LC; Papandreou, G.; Kokkinos, I.; Murphy, K.; Yuille, AL, DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell., 40, 834-848 (2017)
[47] Yang, F., Choi, W., Lin, Y.: Exploit all the layers: fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2016), pp. 2129-2137 (2016)
[48] Cai, Z., Fan, Q., Feris, R.S., Vasconcelos, N.: A unified multi-scale deep convolutional neural network for fast object detection. In: Proceedings of European Conference on Computer Vision (ECCV 2016), vol. 9908 of Springer LNCS, pp. 354-370 (2016)
[49] Yu, F., Koltun, V.: Multi-scale context aggregation by dilated convolutions. In: International Conference on Learning Representations (ICLR 2016) (2016)
[50] Yu, F., Koltun, V., Funkhouser, T.: Dilated residual networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2017), pp. 472-480 (2017)
[51] Mehta, S., Rastegari, M., Caspi, A., Shapiro, L., Hajishirzi, H.: ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation. In: Proceedings of European Conference on Computer Vision (ECCV 2018), pp. 552-568 (2018)
[52] Zhang, R., Tang, S., Zhang, Y., Li, J., Yan, S.: Scale-adaptive convolutions for scene parsing. In: Proceedings of International Conference on Computer Vision (ICCV 2017), pp. 2031-2039 (2017)
[53] Wang, H., Kembhavi, A., Farhadi, A., Yuille, A.L., Rastegari, M.: ELASTIC: improving CNNs with dynamic scaling policies. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2019), pp. 2258-2267 (2019)
[54] Chen, Y., Fang, H., Xu, B., Yan, Z., Kalantidis, Y., Rohrbach, M., Yan, S., Feng, J.: Drop an octave: Reducing spatial redundancy in convolutional neural networks with octave convolution. In: Proceedings of International Conference on Computer Vision (ICCV 2019) (2019)
[55] Iijima, T.: Basic theory on normalization of pattern (in case of typical one-dimensional pattern). Bull. Electrotech. Lab. 26, 368-388 (1962). (in Japanese)
[56] Witkin, A.P.: Scale-space filtering. In: Proceedings of 8th International Joint Conference on Artificial Intelligence, Karlsruhe, Germany, pp. 1019-1022 (1983)
[57] Koenderink, JJ, The structure of images, Biol. Cybern., 50, 363-370 (1984) · Zbl 0537.92011
[58] Babaud, J.; Witkin, AP; Baudin, M.; Duda, RO, Uniqueness of the Gaussian kernel for scale-space filtering, IEEE Trans. Pattern Anal. Mach. Intell., 8, 26-33 (1986) · Zbl 0574.93054
[59] Koenderink, JJ; van Doorn, AJ, Generic neighborhood operators, IEEE Trans. Pattern Anal. Mach. Intell., 14, 597-605 (1992)
[60] Lindeberg, T., Scale-Space Theory in Computer Vision (1993), New York: Springer, New York · Zbl 0812.68040
[61] Lindeberg, T., Scale-space theory: a basic tool for analysing structures at different scales, J. Appl. Stat., 21, 225-270 (1994)
[62] Florack, LMJ, Image Structure (1997), New York: Springer, New York
[63] Weickert, J.; Ishikawa, S.; Imiya, A., Linear scale-space has first been proposed in Japan, J. Math. Imaging Vis., 10, 237-252 (1999) · Zbl 1002.68177
[64] ter Haar Romeny, B., Front-End Vision and Multi-Scale Image Analysis (2003), New York: Springer, New York
[65] Lindeberg, T., Generalized Gaussian scale-space axiomatics comprising linear scale-space, affine scale-space and spatio-temporal scale-space, J. Math. Imaging Vis., 40, 36-81 (2011) · Zbl 1255.68250
[66] Lindeberg, T., A computational theory of visual receptive fields, Biol. Cybern., 107, 589-635 (2013) · Zbl 1294.92009
[67] Bruna, J.; Mallat, S., Invariant scattering convolution networks, IEEE Trans. Pattern Anal. Mach. Intell., 35, 1872-1886 (2013)
[68] Sifre, L., Mallat, S.: Rotation, scaling and deformation invariant scattering for texture discrimination. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2013), pp. 1233-1240 (2013)
[69] Oyallon, E., Mallat, S.: Deep roto-translation scattering for object classification. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2015), pp. 2865-2873 (2015)
[70] Jacobsen, J.J., van Gemert, J., Lou, Z., Smeulders, A.W.M.: Structured receptive fields in CNNs. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2016), pp. 2610-2619 (2016)
[71] Luan, S.; Chen, C.; Zhang, B.; Han, J.; Liu, J., Gabor convolutional networks, IEEE Trans. Image Process., 27, 4357-4366 (2018)
[72] Shelhamer, E., Wang, D., Darrell, T.: Blurring the line between structure and learning to optimize and adapt receptive fields. arXiv preprint arXiv:1904.11487 (2019)
[73] Henriques, JF; Vedaldi, A., Warped convolutions: efficient invariance to spatial transformations, Int. Conf. Mach. Learn., 70, 1461-1469 (2017)
[74] Esteves, C., Allen-Blanchette, C., Zhou, X., Daniilidis, K.: Polar transformer networks. In: International Conference on Learning Representations (ICLR 2018) (2018)
[75] Poggio, TA; Anselmi, F., Visual Cortex and Deep Networks: Learning Invariant Representations (2016), Cambridge: MIT Press, Cambridge
[76] Laptev, D., Savinov, N., Buhmann, J.M., Pollefeys, M.: TI-pooling: transformation-invariant pooling for feature learning in convolutional neural networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2016), pp. 289-297 (2016)
[77] Kondor, R., Trivedi, S.: On the generalization of equivariance and convolution in neural networks to the action of compact groups. In: International Conference on Machine Learning (ICML 2018) (2018)
[78] Lindeberg, T., Normative theory of visual receptive fields, Heliyon, 7, e05897, 1-20 (2021)
[79] Roux, N.L., Bengio, Y.: Continuous neural networks. In: Artificial Intelligence and Statistics (AISTATS 2007), vol. 2 of Proceedings of Machine Learning Research, pp. 404-411 (2007)
[80] Wang, S., Suo, S., Ma, W.C., Pokrovsky, A., Urtasun, R.: Deep parametric continuous convolutional neural networks. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2018), pp. 2589-2597 (2018)
[81] Wu, W., Qi, Z., Fuxin, L.: PointConv: deep convolutional networks on 3D point clouds. In: Proceedings of Computer Vision and Pattern Recognition (CVPR 2019), pp. 9621-9630 (2019)
[82] Shocher, A., Feinstein, B., Haim, N., Irani, M.: From discrete to continuous convolution layers. arXiv preprint arXiv:2006.11120 (2020)
[83] Duits, R., Smets, B., Bekkers, E., Portegies, J.: Equivariant deep learning via morphological and linear scale space PDEs on the space of positions and orientations. In: Proceedings of Scale Space and Variational Methods in Computer Vision (SSVM 2021), vol. 12679 of Springer LNCS, pp. 27-39 (2021) · Zbl 1484.68206
[84] Ruthotto, L.; Haber, E., Deep neural networks motivated by partial differential equations, J. Math. Imaging Vis., 62, 352-364 (2020) · Zbl 1434.68522
[85] Shen, Z., He, L., Lin, Z., Ma, J.: Partial differential operator based equivariant convolutions. In: International Conference on Machine Learning (ICML 2020), pp. 8697-8706 (2020)
[86] Duits, R.; Florack, L.; de Graaf, J.; ter Haar Romeny, B., On the axioms of scale space theory, J. Math. Imaging Vis., 22, 267-298 (2004) · Zbl 1435.94064
[87] Lindeberg, T., Invariance of visual operations at the level of receptive fields, PLoS ONE, 8, e66990 (2013)
[88] Lindeberg, T.: On the axiomatic foundations of linear scale-space. In: Sporring, J., Nielsen, M., Florack, L., Johansen, P. (eds.) Gaussian Scale-Space Theory: Proceedings, pp. 75-97. PhD School on Scale-Space Theory, Copenhagen, Denmark, Springer (1996)
[89] Pauwels, EJ; Fiddelaers, P.; Moons, T.; van Gool, LJ, An extended class of scale-invariant and recursive scale-space filters, IEEE Trans. Pattern Anal. Mach. Intell., 17, 691-701 (1995)
[90] Felsberg, M.; Sommer, G., The monogenic scale-space: a unifying approach to phase-based image processing in scale-space, J. Math. Imaging Vis., 21, 5-26 (2004) · Zbl 1478.94037
[91] Koenderink, JJ; van Doorn, AJ, Representation of local geometry in the visual system, Biol. Cybern., 55, 367-375 (1987) · Zbl 0617.92024
[92] Lindeberg, T., Dense scale selection over space, time and space-time, SIAM J. Imag. Sci., 11, 407-441 (2018) · Zbl 1401.65024
[93] Valois, RLD; Cottaris, NP; Mahon, LE; Elfer, SD; Wilson, JA, Spatial and temporal receptive fields of geniculate and cortical cells and directional selectivity, Vis. Res., 40, 3685-3702 (2000)
[94] LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P., Gradient-based learning applied to document recognition, Proc. IEEE, 86, 2278-2324 (1998)
[95] Paszke, A., Gross, S., Chintala, S., Chanan, G., Yang, E., De Vito, Z., Lin, Z., Desmaison, A., Antiga, L., Lerer, A.: Automatic differentiation in PyTorch. In: Proceedings of Neural Information Processing Systems (NIPS 2017) (2017)
[96] Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization. In: International Conference on Learning Representations (ICLR 2015) (2015)
[97] Lindeberg, T., Scale-space for discrete signals, IEEE Trans. Pattern Anal. Mach. Intell., 12, 234-254 (1990)
[98] Lindeberg, T., Discrete derivative approximations with scale-space properties: a basis for low-level feature extraction, J. Math. Imaging Vis., 3, 349-376 (1993)
[99] Lindeberg, T.: Scale selection. In: Ikeuchi, K. (ed.) Computer Vision. Springer, Berlin (2021) · doi:10.1007/978-3-030-03243-2_242-1
[100] Loog, M., Li, Y., Tax, D.M.J.: Maximum membership scale selection. In: Multiple Classifier Systems, vol. 5519 of Springer LNCS, pp. 468-477 (2009)
[101] Li, Y.; Tax, DMJ; Loog, M., Scale selection for supervised image segmentation, Image Vis. Comput., 30, 991-1003 (2012)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.