
Decision theoretic generalizations of the PAC model for neural net and other learning applications. (English) Zbl 0762.68050

The paper introduces an extension of the probably approximately correct (PAC) model of learning from examples. Whereas the functions on the instance space are usually restricted to take values in \(\{0,1\}\), the extended model allows them to take values in arbitrary sets, and it is grounded in statistical decision theory. The results rest on a proposed generalization of the Vapnik-Chervonenkis dimension that is applicable to real-valued functions. Within the extended model it becomes possible to formulate distribution-independent upper bounds on the sample size needed for learning in feed-forward neural networks.
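The generalized dimension in question is, in now-standard terminology, the pseudo-dimension (following Pollard); the following sketch of its definition is added here for orientation and is not quoted from the paper. A class \(\mathcal{F}\) of real-valued functions on an instance space \(X\) pseudo-shatters points \(x_1,\dots,x_d\in X\) if there exist thresholds \(r_1,\dots,r_d\in\mathbb{R}\) such that for every pattern \((b_1,\dots,b_d)\in\{0,1\}^d\) some \(f\in\mathcal{F}\) satisfies
\[
f(x_i) > r_i \iff b_i = 1, \qquad i=1,\dots,d.
\]
The pseudo-dimension \(\dim_P(\mathcal{F})\) is the largest such \(d\) (infinite if arbitrarily large point sets are pseudo-shattered); for \(\{0,1\}\)-valued classes it reduces to the usual Vapnik-Chervonenkis dimension.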

MSC:

68T05 Learning and adaptive systems in artificial intelligence
Full Text: DOI

References:

[1] Abe, N.; Warmuth, M., On the computational complexity of approximating distributions by probabilistic automata, (Proceedings of the 3rd Workshop on Computational Learning Theory (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo), 52-66
[2] Alexander, K., Rates of growth for weighted empirical processes, (Proceedings of Berkeley Conference in Honor of Jerzy Neyman and Jack Kiefer, Vol. 2 (1985)), 475-493 · Zbl 1372.60051
[3] Alexander, K., Rates of growth and sample moduli for weighted empirical processes indexed by sets, Probab. Theory Related Fields, 75, 379-423 (1987) · Zbl 0596.60029
[4] Angluin, D., Queries and concept learning, Mach. Learning, 2, 319-342 (1988) · Zbl 1470.68050
[5] Angluin, D.; Laird, P., Learning from noisy examples, Mach. Learning, 2, 4, 343-370 (1988)
[6] Angluin, D.; Valiant, L. G., Fast probabilistic algorithms for Hamiltonian circuits and matchings, J. Comput. System Sci., 18, 155-193 (1979) · Zbl 0437.05040
[7] Anthony, M.; Shawe-Taylor, J., A result of Vapnik with applications, (Technical Report CSD-TR-628 (1990), University of London: University of London Surrey) · Zbl 0822.68088
[8] Assouad, P., Densité et dimension, Ann. Inst. Fourier, 33, 3, 233-282 (1983) · Zbl 0504.60006
[9] Barron, A., Statistical properties of artificial neural networks, (28th Conference on Decision and Control (1989)), 280-285
[10] Barron, A.; Cover, T., Minimum complexity density estimation, IEEE Trans. Inform. Theory (1990), to appear
[11] Barto, A. G.; Anandan, P., Pattern recognizing stochastic learning automata, IEEE Trans. Systems Man Cybernet., 15, 360-374 (1985) · Zbl 0561.68042
[12] Baum, E.; Haussler, D., What size net gives valid generalization?, Neural Comput., 1, 1, 151-160 (1989)
[13] Benedek, G. M.; Itai, A., Learnability by fixed distributions, (Proceedings of the 1988 Workshop on Computational Learning Theory (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 80-90
[14] Berger, J., (Statistical Decision Theory and Bayesian Analysis (1985), Springer-Verlag: Springer-Verlag New York) · Zbl 0572.62008
[15] Billingsley, P., (Probability and Measure (1986), Wiley: Wiley New York) · Zbl 0649.60001
[16] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M. K., Learnability and the Vapnik-Chervonenkis dimension, J. Assoc. Comput. Mach., 36, 4, 929-965 (1989) · Zbl 0697.68079
[17] Breiman, L.; Friedman, J. H.; Olshen, R. A.; Stone, C. J., Classification and Regression Trees (1984), Wadsworth · Zbl 0541.62042
[18] Buntine, W. L., A Theory of Learning Classification Rules, (Ph.D. Thesis (1990), University of Technology: University of Technology Sydney)
[19] Buntine, W.; Weigend, A., Bayesian back propagation (1991), unpublished manuscript · Zbl 0761.62031
[20] Clarke, B.; Barron, A.
[21] Clarke, B.; Barron, A., Information-theoretic asymptotics of Bayes methods, IEEE Trans. Inform. Theory, 36, 3, 453-471 (1990) · Zbl 0709.62008
[22] Cover, T. M., Capacity problems for linear machines, (Kanal, L., Pattern Recognition (1968), Thompson), 283-289
[23] Denker, J.; Schwartz, D.; Wittner, B.; Solla, S.; Howard, R.; Jackel, L.; Hopfield, J., Automatic learning, rule extraction and generalization, Complex Systems, 1, 877-922 (1987) · Zbl 0662.68085
[24] Devroye, L., Automatic pattern recognition: A study of the probability of error, IEEE Trans. Pattern Anal. Mach. Intelligence, 10, 4, 530-543 (1988) · Zbl 0661.62056
[25] Duda, R. O.; Hart, P. E., (Pattern Classification and Scene Analysis (1973), Wiley: Wiley New York) · Zbl 0277.68056
[26] Dudley, R. M., Central limit theorems for empirical measures, Ann. Probab., 6, 6, 899-929 (1978) · Zbl 0404.60016
[27] Dudley, R. M., A course on empirical processes, (Lecture Notes in Mathematics, Vol. 1097 (1984), Springer-Verlag: Springer-Verlag Berlin/New York), 2-142 · Zbl 0554.60029
[28] Dudley, R. M., Universal Donsker classes and metric entropy, Ann. Probab., 15, 4, 1306-1326 (1987) · Zbl 0631.60004
[29] Durbin, R.; Rumelhart, D. E., Product units: A computationally powerful and biologically plausible extension to backpropagation networks, Neural Comput., 1, 1, 133-142 (1989)
[30] Edelsbrunner, H., (Algorithms in Combinatorial Geometry (1987), Springer-Verlag: Springer-Verlag Berlin/New York) · Zbl 0634.52001
[31] Ehrenfeucht, A.; Haussler, D.; Kearns, M.; Valiant, L., A general lower bound on the number of examples needed for learning, Inform. Comput., 82, 247-261 (1989) · Zbl 0679.68158
[32] Farmer, J. D., Information dimension and the probabilistic structure of chaos, Z. Naturforsch. A, 37, 1304-1325 (1982)
[33] Farmer, J. D.; Ott, E.; Yorke, J. A., The dimension of chaotic attractors, Physica D, 7, 153-180 (1983) · Zbl 0561.58032
[34] Ferguson, T., (Mathematical Statistics: A Decision Theoretic Approach (1967), Academic Press: Academic Press New York) · Zbl 0153.47602
[35] Gullapalli, V., A stochastic reinforcement algorithm for learning real-valued functions, Neural Networks, 3, 6, 671-692 (1990)
[36] Györgyi, G.; Tishby, N., Statistical theory of learning a rule, (Theumann, W. K.; Koeberle, R., Neural Networks and Spin Glasses (1990), World Scientific)
[37] Haussler, D., Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework, Artificial Intelligence, 36, 177-221 (1988) · Zbl 0651.68104
[38] Haussler, D., Learning conjunctive concepts in structural domains, Mach. Learning, 4, 7-40 (1989)
[39] Haussler, D., Decision theoretic generalizations of the PAC learning model, (Proceedings, First Workshop on Algorithmic Learning Theory. Proceedings, First Workshop on Algorithmic Learning Theory, Tokyo, Japan (1990)), 21-41
[40] Haussler, D.; Kearns, M.; Littlestone, N.; Warmuth, M. K., Equivalence of models for polynomial learnability, Inform. Comput., 95, 129-161 (1991) · Zbl 0743.68115
[41] Haussler, D.; Kearns, M.; Schapire, R., Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension, (Proceedings of the Fourth Workshop on Computational Learning Theory (1991)), 61-74
[42] Haussler, D.; Littlestone, N.; Warmuth, M., (Predicting {0, 1}-Functions on Randomly Drawn Points (1990), Computer Research Laboratory, University of California: Computer Research Laboratory, University of California Santa Cruz), Technical Report UCSC-CRL-90-54 · Zbl 0938.68785
[43] Inform. Comput.
[44] Haussler, D.; Welzl, E., Epsilon nets and simplex range queries, Discrete Comput. Geom., 2, 127-151 (1987) · Zbl 0619.68056
[45] Kearns, M.; Li, M.; Pitt, L.; Valiant, L., On the learnability of Boolean formulae, (Proceedings of the Nineteenth Annual ACM Symposium on Theory of Computing (1987)), 285-295, New York
[46] Kearns, M.; Schapire, R., Efficient distribution-free learning of probabilistic concepts, (31st Annual IEEE Symposium on Foundations of Computer Science (1990)), 382-391
[47] Kiefer, J., (Introduction to Statistical Inference (1987), Springer-Verlag: Springer-Verlag Berlin/New York) · Zbl 0626.62001
[48] Kolmogorov, A. N.; Tihomirov, V. M., ε-entropy and ε-capacity of sets in functional spaces, Amer. Math. Soc. Transl. Ser. 2, 17, 277-364 (1961)
[49] Kulkarni, S., (On Metric Entropy, Vapnik-Chervonenkis Dimension, and Learnability for a Class of Distributions (1989), Center for Intelligent Control Systems, MIT), Technical Report LIDS-P-1910
[50] Kullback, S., (Information Theory and Statistics (1959), Wiley: Wiley New York) · Zbl 0149.37901
[51] LeCun, Y.; Denker, J.; Solla, S., Optimal brain damage, (Touretsky, D., Advances in Neural Information Processing Systems, Vol. 2 (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 589
[52] Lindley, D. V., The present position in Bayesian Statistics, Statist. Sci., 5, 1, 44-89 (1990) · Zbl 0955.62515
[53] Linial, N.; Mansour, Y.; Rivest, R., Results on learnability and the Vapnik-Chervonenkis dimension, (Proceedings of the 1988 Workshop on Computational Learning Theory (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 56-68
[54] Littlestone, N., Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Mach. Learning, 2, 285-318 (1988)
[55] MacKay, D., Bayesian Methods for Adaptive Models, (Ph.D. Thesis (1992), California Institute of Technology)
[56] Mandelbrot, B. B., (The Fractal Geometry of Nature (1982), Freeman: Freeman San Francisco) · Zbl 0504.28001
[57] McCullagh, P.; Nelder, J. A., (Generalized Linear Models (1989), Chapman & Hall: Chapman & Hall London) · Zbl 0744.62098
[58] Moody, J.; Darken, C., Fast learning in networks of locally-tuned processing units, Neural Comput., 1, 2, 281-294 (1989)
[59] Narendra, K. S.; Thathachar, M. A. L., (Learning Automata—An Introduction (1989), Prentice-Hall: Prentice-Hall Englewood Cliffs, NJ)
[60] Natarajan, B. K., Learning over classes of distributions, (Proceedings of the 1988 Workshop on Computational Learning Theory (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 408-409 · Zbl 0761.68080
[61] Natarajan, B. K., (Probably-Approximate Learning over Classes of Distributions (1989), Hewlett Packard Labs: Hewlett Packard Labs Palo Alto, CA), Technical Report HPL-SAL-89-29 · Zbl 0761.68080
[62] Natarajan, B. K., Some Results on Learning (1989), Technical Report CMU-RI-TR-89-6, Carnegie Mellon · Zbl 0747.68056
[63] Natarajan, B. K.; Tadepalli, P., Two new frameworks for learning, (Proceedings of the 5th International Conference on Machine Learning (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 402-415
[64] Nobel, A.; Dembo, A., (On Uniform Convergence for Dependent Processes (1990), Dept. of Statistics, Stanford University: Dept. of Statistics, Stanford University Stanford, CA), Technical Report 74
[65] Nolan, D.; Pollard, D., U-processes: Rates of convergence, Ann. Statist., 15, 2, 780-799 (1987) · Zbl 0624.60048
[66] Nowlan, S., Maximum likelihood competitive learning, (Touretsky, D., Advances in Neural Information Processing Systems, Vol. 2 (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 574-582
[67] Nowlan, S.; Hinton, G., (Soft Weight-Sharing (1991), Department of Computer Science, University of Toronto), Technical Report
[68] Opper, M.; Haussler, D., Calculation of the learning curve of Bayes optimal classification algorithm for learning a perceptron with noise, (Computational Learning Theory: Proceedings of the Fourth Annual Workshop (1991), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 75-87
[69] Opper, M.; Haussler, D., Generalization performance of Bayes optimal classification algorithm for learning a perceptron, Phys. Rev. Lett., 66, 20, 2677-2680 (1991) · Zbl 0968.92502
[70] Poggio, T.; Girosi, F., (A Theory of Networks for Approximation and Learning (1989), Massachusetts Institute of Technology: Massachusetts Institute of Technology Cambridge, MA), Technical Report A.I. Memo No. 1140 · Zbl 1226.92005
[71] Pollard, D., (Convergence of Stochastic Processes (1984), Springer-Verlag: Springer-Verlag Berlin/New York) · Zbl 0544.60045
[72] Pollard, D., Rates of uniform almost-sure convergence for empirical processes indexed by unbounded classes of functions (1986), manuscript
[73] Pollard, D., Empirical Processes: Theory and Applications, (NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 2 (1990), Inst. of Math. Stat. and Am. Stat. Assoc) · Zbl 0741.60001
[74] Quiroz, A. J., Metric entropy and learnability (1989), Universidad Simón Bolívar: Universidad Simón Bolívar Caracas, Venezuela, unpublished manuscript
[75] Renyi, A., (Probability Theory (1970), North-Holland: North-Holland Amsterdam) · Zbl 0206.18002
[76] Rissanen, J., Stochastic complexity and modeling, Ann. Statist., 14, 3, 1080-1100 (1986) · Zbl 0602.62008
[77] Rumelhart, D. E.; McClelland, J. L., (Parallel Distributed Processing: Explorations in the Microstructure of Cognition (1986), MIT Press: MIT Press Cambridge, MA), Volume 1: Foundations
[78] Sauer, N., On the density of families of sets, J. Combin. Theory Ser. A, 13, 145-147 (1972) · Zbl 0248.05005
[79] Shackelford, G.; Volper, D., Learning \(k\)-DNF with noise in the attributes, (Proceedings of the 1988 Workshop on Computational Learning Theory (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 97-103
[80] Simmons, G. F., (Introduction to Topology and Modern Analysis (1963), McGraw-Hill: McGraw-Hill New York) · Zbl 0105.30603
[81] Sloan, R., Types of noise in data for concept learning, (Proceedings of the 1988 Workshop on Computational Learning Theory (1988), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA), 91-96
[82] Sompolinsky, H.; Tishby, N.; Seung, H. S., Learning from examples in large neural networks, Phys. Rev. Lett., 65, 1683-1686 (1990)
[83] Talagrand, M., Sharper bounds for empirical processes (1991), unpublished manuscript · Zbl 0798.60051
[84] Tishby, N.; Levin, E.; Solla, S., Consistent inference of probabilities in layered networks: Predictions and generalizations, (IJCNN International Joint Conference on Neural Networks, Vol. II (1989), IEEE: IEEE New York), 403-409
[85] Touretsky, D., (Advances in Neural Information Processing Systems, Vol. 1 (1989), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[86] Touretsky, D., (Advances in Neural Information Processing Systems, Vol. 2 (1990), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[87] Valiant, L. G., A theory of the learnable, Comm. ACM, 27, 11, 1134-1142 (1984) · Zbl 0587.68077
[88] Vapnik, V. N., (Estimation of Dependences Based on Empirical Data (1982), Springer-Verlag: Springer-Verlag New York) · Zbl 0499.62005
[89] Vapnik, V. N., Inductive principles of the search for empirical dependences (methods based on weak convergence of probability measures), (Proceedings of the 2nd Workshop on Computational Learning Theory (1989), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA) · Zbl 0746.68075
[90] Vapnik, V. N.; Chervonenkis, A. Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theory Probab. Appl., 16, 264-280 (1971) · Zbl 0247.60005
[91] Weigend, A.; Huberman, B.; Rumelhart, D., Predicting the future: A connectionist approach, Internat. J. Neural Systems, 1, 193-209 (1990)
[92] Weiss, S.; Kulikowski, C., (Computer Systems That Learn (1991), Morgan Kaufmann: Morgan Kaufmann San Mateo, CA)
[93] Welzl, E., Partition trees for triangle counting and other range search problems, (4th ACM Symposium on Computational Geometry (1988)), 23-33, Urbana, IL
[94] Wenocur, R. S.; Dudley, R. M., Some special Vapnik-Chervonenkis classes, Discrete Math., 33, 313-318 (1981) · Zbl 0459.60008
[95] White, H., Connectionist nonparametric regression: Multilayer feedforward networks can learn arbitrary mappings, Neural Networks, 3, 535-549 (1990)
[96] White, H., Learning in artificial neural networks: A statistical perspective, Neural Comput., 1, 4, 425-464 (1990)
[97] Yamanishi, K., A learning criterion for stochastic rules, (Proceedings of the 3rd Workshop on Computational Learning Theory (1990), Morgan Kaufmann), 67-81
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.