On the capabilities of multilayer perceptrons. (English) Zbl 0648.68085

What is the smallest multilayer perceptron able to compute arbitrary and random functions? Previous results show that a net with one hidden layer containing \(N-1\) threshold units can implement an arbitrary dichotomy of \(N\) points. A construction is presented here that implements an arbitrary dichotomy of any set of \(N\) points in general position in \(d\) dimensions with one hidden layer containing only \(\lceil N/d\rceil\) units. This is in fact the smallest such net: dichotomies are exhibited that cannot be implemented by any net with fewer units.
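As a concrete illustration of the simpler \(N-1\)-unit result (not of the \(\lceil N/d\rceil\) construction of the paper, which exploits general position more cleverly), the following minimal Python sketch projects the points onto a generic direction, places one hidden threshold unit at each label transition along that direction, and lets a single output threshold unit sum the transitions with alternating signs; all names are hypothetical.

```python
import numpy as np

def build_dichotomy_net(X, y, seed=0):
    """One-hidden-layer threshold net reproducing the dichotomy y of the rows of X.

    Uses at most N-1 hidden units, one per label transition along a generic
    projection direction (a sketch of the classical bound, not Baum's ceil(N/d) net).
    """
    N, d = X.shape
    rng = np.random.default_rng(seed)
    w = rng.normal(size=d)                  # generic direction; ties occur with probability 0
    order = np.argsort(X @ w)
    p, lab = (X @ w)[order], y[order]

    thresholds, signs = [], []
    for i in range(N - 1):                  # one hidden threshold unit per label change
        if lab[i] != lab[i + 1]:
            thresholds.append((p[i] + p[i + 1]) / 2.0)
            signs.append(1.0 if lab[i + 1] == 1 else -1.0)  # +1 for 0->1, -1 for 1->0
    thresholds, signs = np.array(thresholds), np.array(signs)
    bias = float(lab[0])

    def predict(x):
        h = (x @ w > thresholds).astype(float)   # hidden-layer activations
        return int(bias + signs @ h > 0.5)       # single output threshold unit
    return predict

# The net classifies every training point correctly, whatever the dichotomy.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))
y = rng.integers(0, 2, size=20)
net = build_dichotomy_net(X, y)
assert all(net(x) == yi for x, yi in zip(X, y))
```

The output value at a training point equals the first label plus the signed count of transitions crossed, which telescopes to exactly that point's label.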
Several constructions are presented of one-hidden-layer nets implementing arbitrary functions into the \(e\)-dimensional hypercube; one of these uses only \(\lfloor 4N/d\rfloor \lceil e/\lfloor \log_2(N/d)\rfloor \rceil\) hidden units. Arguments based on a function-counting theorem of T. M. Cover [IEEE Trans. Electron. Comput. EC-14, 326-334 (1965; Zbl 0152.182)] establish that any net implementing arbitrary functions must have at least \(Ne/\log_2 N\) weights, so that no net with one hidden layer containing fewer than \(Ne/(d \log_2 N)\) units will suffice. Simple counts also show that if the weights are only allowed to assume one of \(n_g\) possible values, no net with fewer than \(Ne/\log_2 n_g\) weights will suffice. Thus the gain coming from using real-valued synapses appears to be only logarithmic, and the circuit implementing functions into the \(e\)-dimensional hypercube realizes such logarithmic gains.
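The quantized-weight count, at least, can be spelled out in one line (a sketch; the constant in the real-weight bound is left implicit): a net whose \(W\) weights each range over \(n_g\) values computes at most \(n_g^W\) distinct functions, while \(N\) fixed input points admit \(2^{Ne}\) maps into the \(e\)-dimensional hypercube, so
\[
n_g^{\,W} \;\ge\; 2^{Ne} \quad\Longrightarrow\quad W \;\ge\; \frac{Ne}{\log_2 n_g}.
\]
For real weights, Cover's theorem plays the analogous role, capping the number of realizable dichotomies of \(N\) points at roughly \(N^{O(W)}\), whence \(W \gtrsim Ne/\log_2 N\).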
Since the counting arguments bound from below only the number of weights, the possibility is suggested that, if suitable restrictions are imposed on the input vector set to avoid topological obstructions, two-hidden-layer nets with \(O(N)\) weights but only \(O(\sqrt{N})\) threshold units (e.g., \(O(\sqrt{N})\) units of fan-in \(O(\sqrt{N})\)) might suffice for arbitrary dichotomies. Interesting and potentially sufficient restrictions include (a) that the vectors are binary, i.e., lie on the \(d\)-dimensional hypercube, or (b) that they are selected randomly and uniformly from a bounded region.

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Citations:

Zbl 0152.182
Full Text: DOI

References:

[1] Abu-Mostafa, Y. S., Number of synapses per neuron, (Mead, C. A., Analog VLSI and Neural Systems (1987), Addison-Wesley: Addison-Wesley Reading, MA)
[2] Abu-Mostafa, Y. S., Random problems, J. Complexity (1987), in press
[3] Abu-Mostafa, Y. S.; Psaltis, D., Optical neural computers, Sci. Amer., 256, No. 3, 88-95 (1987)
[4] Abu-Mostafa, Y. S.; St. Jacques, J., Information capacity of the Hopfield model, IEEE Trans. Inf. Theory, IT-31, 461-464 (1985) · Zbl 0571.94030
[5] Ackley, D. H.; Hinton, G. E.; Sejnowski, T. J., A learning algorithm for Boltzmann machines, Cog. Sci., 9, 147-169 (1985)
[6] Baum, E. B., Generalizing back propagation to computation, (Denker, J., Neural Networks for Computing. Neural Networks for Computing, AIP Conf. Proc., 151 (1986), Snowbird: Snowbird UT), 47-53
[7] Baum, E. B.; Haussler, D.
[8] Baum, E. B.; Moody, J.; Wilczek, F., Internal representations for associative memory (1986), Biol. Cybernetics, to appear · Zbl 0655.68022
[9] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M., Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension, (Proc. 18th ACM Symp. on Theory of Computing (1986), Assoc. Comput. Mach., New York), 273-282
[10] Borsellino, A.; Gamba, A., An outline of a mathematical theory of PAPA, Nuovo Cimento Suppl. 2, 20, 221-231 (1961) · Zbl 0101.10502
[11] Chaitin, G., Information theoretic computational complexity, IEEE Trans. Inf. Theory, IT-20, 10-15 (1974) · Zbl 0282.68022
[12] Cover, T. M., Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition, IEEE Trans. Electron. Comput., EC-14, 326-334 (1965) · Zbl 0152.18206
[13] Dertouzos, M., Threshold Logic: A Synthesis Approach (1965), MIT Press: MIT Press Cambridge, MA
[14] Gamba, A.; Gamberini, L.; Palmieri, G.; Sanna, R., Further experiments with PAPA, Nuovo Cimento Suppl. 2, 20, 112 (1961) · Zbl 0101.10503
[15] Gorman, P.; Sejnowski, T. J., Learned classification of sonar targets using a massively parallel network, (Workshop on Neural Network Devices and Applications. Workshop on Neural Network Devices and Applications, JPL D-4406 (1987)), 224-237 · Zbl 0709.94576
[16] Hinton, G. E.; Sejnowski, T. J., Learning and relearning in Boltzmann machines, (Rumelhart, D. E.; McClelland, J. L., Parallel Distributed Processing, Vol. 1 (1986), MIT Press: MIT Press Cambridge, MA), 282-317
[17] Hopfield, J. J., Neural networks and physical systems with emergent collective computational abilities, (Proc. Natl. Acad. Sci. U.S.A., 79 (1982)), 2554-2558 · Zbl 1369.92007
[18] Hu, S.-T., Threshold Logic (1965), Univ. of California Press: Univ. of California Press Berkeley · Zbl 0166.27101
[19] Kolmogorov, A. N., Three approaches for defining the concept of information quantity, Probl. Inf. Transmission, 1, 3-11 (1965) · Zbl 0271.94018
[20] Lapedes, A.; Farber, R., Nonlinear signal processing using neural networks: Prediction and system modelling (1987), Los Alamos
[21] Le Cun, Y., A learning procedure for asymmetric threshold networks, (Proc. Cognitiva, 85 (1985)), 599-604
[22] Lewis, P. M.; Coates, C. L., Threshold Logic (1967), Wiley: Wiley New York · Zbl 0153.31901
[23] McCulloch, W. S.; Pitts, W., A logical calculus of the ideas immanent in nervous activity, Bull. Math. Biophys., 5, 115-133 (1943) · Zbl 0063.03860
[24] McEliece, R. J.; Posner, E. C.; Rodemich, E. R.; Venkatesh, S. S., The capacity of the Hopfield associative memory, IEEE Trans. Inf. Theory, IT-33 (1987) · Zbl 0631.94016
[25] Minsky, M.; Papert, S., Perceptrons: An Introduction to Computational Geometry (1969), MIT Press: MIT Press Cambridge, MA · Zbl 0197.43702
[26] Muroga, S., Logic Design and Switching Theory (1979), Wiley: Wiley New York · Zbl 0473.94018
[27] Nilsson, N. J., Learning Machines (1965), McGraw-Hill: McGraw-Hill New York · Zbl 0152.35705
[28] Papert, S., Some mathematical models of learning, (Cherry, C., Proceedings, 4th London Symp. on Information Theory (1961), Academic Press: Academic Press New York) · Zbl 0178.33703
[29] Parker, D. B., Learning Logic, (MIT Tech. Report TR-47 (1985), Center for Computational Research in Economics and Management Science)
[30] Rosenblatt, F., Principles of Neurodynamics (1962), Spartan: Spartan New York · Zbl 0143.43504
[31] Rumelhart, D. E.; Hinton, G. E.; Williams, R. J., Learning internal representations by error propagation, (Rumelhart, D. E.; McClelland, J. L., Parallel Distributed Processing, Vol. 1 (1986), MIT Press: MIT Press Cambridge, MA)
[32] Samuel, A., Some studies in machine learning using the game of checkers, IBM J. Res. Dev., 3, 211-229 (1959)
[33] Samuel, A., Some studies in machine learning using the game of checkers, Part II, IBM J. Res. Dev., 11, 601-617 (1967)
[34] Sejnowski, T. J.; Rosenberg, C. R., NETtalk: A parallel network that learns to read aloud, Complex Syst., 1, 145-168 (1987) · Zbl 0655.68107
[35] Werbos, P., Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences (1974), Harvard University dissertation
[36] Werbos, P.; Titus, J., An empirical test of new forecasting methods derived from a theory of intelligence: The prediction of conflict in Latin America, IEEE Trans. Syst. Man Cybernetics, SMC-8, No. 9, 657-666 (1978)