An information-geometric approach to a theory of pragmatic structuring. (English) Zbl 1010.62007

From the paper: In the field of neural networks, so-called infomax principles like the principle of “maximum information preservation” by R. Linsker [Computer 21, 105-117 (1988)] are formulated to derive learning rules that improve the information processing properties of neural systems. These principles, which are based on information-theoretic measures, are intended to describe the mechanism of learning in the brain. There, the starting point is a low-dimensional and biophysiologically motivated parametrization of the neural system, which need not necessarily be compatible with the given optimization principle. In contrast to this, we establish theoretical results about the low complexity of optimal solutions for the optimization problem of frequently used measures like the mutual information in an unconstrained and more theoretical setting. We do not comment on applications to modeling neural networks.
Within the framework of information geometry, the interaction among units of a stochastic system is quantified in terms of the Kullback-Leibler divergence of the underlying joint probability distribution from an appropriate exponential family. In the present paper, the main example for such a family is given by the set of all factorizable random fields. Motivated by this example, the locally farthest points from an arbitrary exponential family \({\mathcal E}\) are studied. In the corresponding dynamical setting, such points can be generated by the structuring process with respect to \({\mathcal E}\) as a repelling set. The main results concern the low complexity of such distributions which can be controlled by the dimension of \({\mathcal E}\).
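As a minimal illustration of the quantity described above, the following sketch computes the Kullback-Leibler divergence of a joint distribution from the family of factorizable distributions for a small finite system. It relies on the standard fact that the closest factorizable distribution, in this sense, is the product of the marginals, so the divergence equals the multi-information. The function names, the dictionary representation of the joint distribution, and the two toy distributions are assumptions made for this example; they are not taken from the paper.

```python
import math
from itertools import product


def marginals(joint, n_units):
    """Return one marginal distribution (dict: state -> prob) per unit."""
    margs = [dict() for _ in range(n_units)]
    for config, p in joint.items():
        for i, state in enumerate(config):
            margs[i][state] = margs[i].get(state, 0.0) + p
    return margs


def multi_information(joint, n_units):
    """KL divergence (in bits) of `joint` from the product of its marginals."""
    margs = marginals(joint, n_units)
    total = 0.0
    for config, p in joint.items():
        if p == 0.0:
            continue  # convention: 0 * log 0 = 0
        q = math.prod(margs[i][s] for i, s in enumerate(config))
        total += p * math.log2(p / q)
    return total


# Hypothetical toy example: two binary units that always agree.
coupled = {(0, 0): 0.5, (1, 1): 0.5}
# Hypothetical toy example: two independent fair binary units.
independent = {c: 0.25 for c in product((0, 1), repeat=2)}

print(multi_information(coupled, 2))      # 1.0
print(multi_information(independent, 2))  # 0.0
```

The independent distribution is itself factorizable, so its divergence from the family is zero, while the fully coupled distribution lies one bit away from it.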

MSC:

62B10 Statistical aspects of information-theoretic topics
62M40 Random fields; image analysis
62M45 Neural nets and related approaches to inference from stochastic processes
68T05 Learning and adaptive systems in artificial intelligence
92B20 Neural networks for/in biological studies, artificial life and related topics
92C20 Neural biology

References:

[1] AMARI, S.-I. (1985). Differential-Geometric Methods in Statistics. Lecture Notes in Statist. 28. Springer, Berlin. · Zbl 0559.62001
[2] AMARI, S.-I. (1997). Information geometry. Contemp. Math. 203 81-95. · Zbl 0881.62034 · doi:10.2307/3318651
[3] AMARI, S.-I. (2001). Information geometry on hierarchy of probability distributions. IEEE Trans. Inform. Theory 47 1701-1711. · Zbl 0997.94009 · doi:10.1109/18.930911
[4] AMARI, S.-I. and NAGAOKA, H. (2000). Methods of Information Geometry. Math. Monogr. 191. Oxford Univ. Press. · Zbl 0960.62005
[5] AMARI, S.-I., BARNDORFF-NIELSEN, O. E., KASS, R. E., LAURITZEN, S. L. and RAO, C. R. (1987). Differential Geometry in Statistical Inference. IMS, Hayward, CA. · Zbl 0694.62001
[6] AY, N. (2000). Aspekte einer Theorie pragmatischer Informationsstrukturierung. Ph.D. dissertation, Univ. Leipzig.
[7] BOOTHBY, W. M. (1975). An Introduction to Differentiable Manifolds and Riemannian Geometry. Pure Appl. Math. 63. Academic Press, New York. · Zbl 0333.53001
[8] BRØNDSTED, A. (1983). An Introduction to Convex Polytopes. Springer, New York.
[9] COVER, T. M. and THOMAS, J. A. (1991). Elements of Information Theory. Wiley-Interscience, New York. · Zbl 0762.94001
[10] CSISZÁR, I. (1967). On topological properties of f-divergence. Studia Sci. Math. Hungar. 2 329-339.
[11] CSISZÁR, I. (1975). I-divergence geometry of probability distributions and minimization problems. Ann. Probab. 3 146-158. · Zbl 0318.60013 · doi:10.1214/aop/1176996454
[12] DECO, G. and OBRADOVIC, D. (1996). An Information-Theoretic Approach to Neural Computing. Perspectives in Neural Computing. Springer, New York. · Zbl 0849.68103
[13] FUJIWARA, A. and AMARI, S.-I. (1995). Gradient systems in view of information geometry. Phys. D 80 317-327. · Zbl 0883.53020 · doi:10.1016/0167-2789(94)00175-P
[14] GZYL, H. (1995). The Method of Maximum Entropy. Ser. Adv. Math. Appl. Sci. 29. World Scientific, Singapore. · Zbl 0822.62001
[15] HIRSCH, M. and SMALE, S. (1974). Differential Equations, Dynamical Systems, and Linear Algebra. Academic Press, New York. · Zbl 0309.34001
[16] INGARDEN, R. S., KOSSAKOWSKI, A. and OHYA, M. (1997). Information Dynamics and Open Systems, Classical and Quantum Approach. Kluwer, Dordrecht. · Zbl 0891.94007
[17] JAYNES, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106. · Zbl 0084.43701 · doi:10.1103/PhysRev.106.620
[18] KULLBACK, S. (1968). Information Theory and Statistics. Dover, Mineola, NY. · Zbl 0149.37901
[19] KULLBACK, S. and LEIBLER, R. A. (1951). On information and sufficiency. Ann. Math. Statist. 22 79-86. · Zbl 0042.38403 · doi:10.1214/aoms/1177729694
[20] LINSKER, R. (1988). Self-organization in a perceptual network. Computer 21 105-117.
[21] MARTIGNON, L., VON HASSELN, H., GRÜN, S., AERTSEN, A. and PALM, G. (1995). Detecting higher-order interactions among the spiking events in a group of neurons. Biol. Cybernet. 73 69-81. · Zbl 0826.92008 · doi:10.1007/BF00199057
[22] MURRAY, M. K. and RICE, J. W. (1994). Differential Geometry and Statistics. Chapman and Hall, London. · Zbl 0804.53001
[23] NAGAOKA, H. and AMARI, S. (1982). Differential geometry of smooth families of probability distributions. AETR 82-7, Univ. Tokyo.
[24] NAKAMURA, Y. (1993). Completely integrable gradient systems on the manifolds of Gaussian and multinomial distributions. Japan J. Indust. Appl. Math. 10 179-189. · Zbl 0814.58021 · doi:10.1007/BF03167571
[25] RAO, C. R. (1945). Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37 81-91. · Zbl 0063.06420
[26] ROCKAFELLAR, R. T. and WETS, J. B. R. (1998). Variational Analysis. Springer, New York. · Zbl 0888.49001
[27] ROMAN, S. (1992). Coding and Information Theory. Springer, New York. · Zbl 0752.94001
[28] SHANNON, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379-423, 623-656. · Zbl 1154.94303
[29] VAPNIK, V. (1998). Statistical Learning Theory. Adaptive and Learning Systems for Signal Processing, Communications, and Control. Wiley, New York. · Zbl 0935.62007
[30] VAPNIK, V. and CHERVONENKIS, A. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 16 264-280. · Zbl 0247.60005 · doi:10.1137/1116025
[31] WEBSTER, R. (1994). Convexity. Oxford Univ. Press. · Zbl 0835.52001