Document Zbl 1429.92021

Sánchez Giraldo, Luis Gonzalo; Schwartz, Odelia

Integrating flexible normalization into midlevel representations of deep convolutional neural networks. (English)

Neural Comput. 31, No. 11, 2138-2176 (2019).

Summary: Deep convolutional neural networks (CNNs) are becoming increasingly popular models to predict neural responses in visual cortex. However, contextual effects, which are prevalent in neural processing and in perception, are not explicitly handled by current CNNs, including those used for neural prediction. In primary visual cortex, neural responses are modulated by stimuli spatially surrounding the classical receptive field in rich ways. These effects have been modeled with divisive normalization approaches, including flexible models, where spatial normalization is recruited only to the degree that responses from center and surround locations are deemed statistically dependent. We propose a flexible normalization model applied to midlevel representations of deep CNNs as a tractable way to study contextual normalization mechanisms in midlevel cortical areas. This approach captures nontrivial spatial dependencies among midlevel features in CNNs, such as those present in textures and other visual stimuli, that arise from tiling high-order features geometrically. We expect that the proposed approach can make predictions about when spatial normalization might be recruited in midlevel cortical areas. We also expect this approach to be useful as part of the CNN tool kit, therefore going beyond more restrictive fixed forms of normalization.
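The gating idea in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' model: the function names, the `sigma` constant, and the caller-supplied probability `p_dependent` are assumptions made for illustration. In a full flexible model that probability would be inferred from the center and surround responses. The sketch only shows how a response can mix a surround-normalized estimate with a center-only estimate according to how dependent the two regions are judged to be.

```python
import numpy as np


def normalize(center, pool_energy, sigma=1.0):
    """Divisive normalization: divide the center response by the pooled energy."""
    return center / np.sqrt(sigma**2 + pool_energy)


def flexible_normalization(center, surround, p_dependent, sigma=1.0):
    """Mix two normalized estimates of the center response.

    p_dependent (assumed given, in [0, 1]) is the probability that center
    and surround are statistically dependent. The surround joins the
    normalization pool only to that degree.
    """
    surround = np.asarray(surround, dtype=float)
    center_energy = center**2
    surround_energy = np.sum(surround**2)

    # Estimate used when the surround is deemed dependent on the center.
    with_surround = normalize(center, center_energy + surround_energy, sigma)
    # Estimate used when the surround is deemed independent of the center.
    center_only = normalize(center, center_energy, sigma)

    return p_dependent * with_surround + (1.0 - p_dependent) * center_only
```

Setting `p_dependent` to 1 recovers a fixed normalization that always includes the surround, and setting it to 0 ignores the surround entirely.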

MSC:

92B20

Neural networks for/in biological studies, artificial life and related topics

Keywords:

deep convolutional neural networks; neural response; visual cortex

Software:

ImageNet; Steerable pyramid; AlexNet

