
Learning visual spatial pooling by strong PCA dimension reduction. (English) Zbl 1414.92099

Summary: In visual modeling, invariance properties of visual cells are often explained by a pooling mechanism, in which outputs of neurons with similar selectivities to some stimulus parameters are integrated so as to gain some degree of invariance to other parameters. For example, the classical energy model of phase-invariant V1 complex cells pools model simple cells preferring similar orientations but different phases. Prior studies, such as independent subspace analysis, have shown that the phase-invariance properties of V1 complex cells can be learned from the spatial statistics of natural inputs. However, those previous approaches assumed a squaring nonlinearity on the neural outputs to capture energy correlation; such a nonlinearity is arguably unnatural from a neurobiological viewpoint but is hard to change because it is tightly integrated into their formalisms. Moreover, they used somewhat complicated objective functions whose optimization requires expensive computation. In this study, we show that visual spatial pooling can be learned in a much simpler way using strong dimension reduction based on principal component analysis. This approach learns to ignore a large part of the detailed spatial structure of the input and thereby estimates a linear pooling matrix. Using this framework, we demonstrate that pooling of model V1 simple cells learned in this way, even with nonlinearities other than squaring, can reproduce standard tuning properties of V1 complex cells. To understand this result further, we analyze several variants of the pooling model and argue that a reasonable pooling can generally be obtained from any linear transformation that retains the first several principal components and suppresses the remaining ones. In particular, we show how the classic Wiener filtering theory leads to one such variant.
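The central idea of the summary, that a linear transformation keeping the leading principal components of nonlinear simple-cell outputs acts as a pooling matrix, can be illustrated with a small NumPy sketch. It rests on assumptions not taken from the paper: the data generator `simulate_simple_cells` is hypothetical (units in a group share an amplitude and phase and differ only in preferred phase, with |x| used instead of squaring), the pooling matrix is assumed to be the projection U_k U_k^T onto the top k principal components, and the Wiener-type variant is assumed to be the standard shrinkage U diag(λ_i/(λ_i + σ²)) U^T for white noise of variance σ². The authors' exact formulation may differ.

```python
import numpy as np


def simulate_simple_cells(n_samples, n_groups=2, units_per_group=4, seed=0):
    """Hypothetical toy stand-in for nonlinear model simple-cell outputs.

    Within a group, units share one stimulus amplitude and phase per sample and
    differ only in preferred phase; groups are independent. A rectifying |x|
    nonlinearity is used instead of squaring.
    """
    rng = np.random.default_rng(seed)
    amp = rng.exponential(1.0, size=(n_samples, n_groups))
    phase = rng.uniform(0.0, 2.0 * np.pi, size=(n_samples, n_groups))
    offsets = np.arange(units_per_group) * np.pi / units_per_group
    # Linear responses, shape (n_samples, n_groups, units_per_group).
    linear = amp[:, :, None] * np.cos(phase[:, :, None] + offsets[None, None, :])
    # Flatten so that units 0..units_per_group-1 belong to group 0, and so on.
    return np.abs(linear).reshape(n_samples, n_groups * units_per_group)


def covariance_eig(X):
    """Eigen-decomposition of the sample covariance, sorted by decreasing variance."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / Xc.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    eigvals = np.clip(eigvals, 0.0, None)  # guard against tiny negative round-off
    return eigvals[::-1], eigvecs[:, ::-1]


def pca_pooling_matrix(X, k):
    """Projection onto the top-k principal components (assumed pooling form)."""
    _, U = covariance_eig(X)
    Uk = U[:, :k]
    return Uk @ Uk.T


def wiener_pooling_matrix(X, noise_var):
    """Wiener-type shrinkage U diag(l / (l + noise_var)) U^T (assumed variant)."""
    eigvals, U = covariance_eig(X)
    gains = eigvals / (eigvals + noise_var)
    # Scaling column j of U by gains[j] gives U diag(gains); then multiply by U^T.
    return (U * gains) @ U.T


if __name__ == "__main__":
    X = simulate_simple_cells(20000)
    P = pca_pooling_matrix(X, k=2)
    print(np.round(P[0], 2))  # pooling weights of unit 0 over all 8 units
```

With these toy settings, the two largest eigenvalues (one per group) are well separated from the rest, so each row of `P` should put roughly equal positive weight on the four units of its own group and near-zero weight on the other group; that is, it pools units of different preferred phases, much as an energy-model complex cell does, without any squaring step. The Wiener-type matrix keeps all components but shrinks each by λ_i/(λ_i + σ²), so low-variance components are suppressed rather than discarded outright.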

MSC:

92C20 Neural biology
91E40 Memory and learning in psychology

Software:

AlexNet; ImageNet
