
Quantifying inductive bias: AI learning algorithms and Valiant’s learning framework. (English) Zbl 0651.68104

We show that the notion of inductive bias in concept learning can be quantified in a way that directly relates to learning performance in the framework recently introduced by Valiant. Our measure of bias is based on the growth function introduced by Vapnik and Chervonenkis, and on the Vapnik-Chervonenkis dimension. We measure some common language biases, including restriction to conjunctive concepts, conjunctive concepts with internal disjunction, k-DNF and k-CNF concepts. We also measure certain types of bias that result from a preference for simpler hypotheses. Using these bias measurements we analyze the performance of the classical learning algorithm for conjunctive concepts from the perspective of Valiant’s learning framework.
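The growth function and Vapnik-Chervonenkis dimension underlying this bias measure can be illustrated by brute force on a tiny instance space. The sketch below (not from the paper; monotone conjunctions over n boolean attributes are assumed as the hypothesis class) enumerates all hypotheses, checks which point sets are shattered, and tabulates the growth function Π_H(m):

```python
from itertools import combinations, product

n = 3  # number of boolean attributes (kept tiny so brute force is feasible)

# Hypothesis class: monotone conjunctions over n attributes.
# A hypothesis is a subset S of attribute indices; x is positive iff x[i] = 1 for all i in S.
hypotheses = [set(s) for r in range(n + 1) for s in combinations(range(n), r)]
instances = list(product([0, 1], repeat=n))

def label(h, x):
    return all(x[i] == 1 for i in h)

def shattered(points):
    # A point set is shattered when every +/- labeling is realized by some hypothesis.
    labelings = {tuple(label(h, x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)

def growth(m):
    # Growth function Pi_H(m): max over m-point sets of the number of distinct labelings.
    return max(
        len({tuple(label(h, x) for x in pts) for h in hypotheses})
        for pts in combinations(instances, m)
    )

# VC dimension: size of the largest shattered set.
vc_dim = max(m for m in range(len(instances) + 1)
             if any(shattered(pts) for pts in combinations(instances, m)))
print(vc_dim)  # equals n for monotone conjunctions
```

For monotone conjunctions the VC dimension equals n (the points with a single 0 coordinate are shattered, and |H| = 2^n caps it), and the growth function saturates at |H| once m exceeds the dimension, in line with the Sauer bound.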
We then augment this algorithm with a hypothesis simplification routine that uses a greedy heuristic and show how this improves learning performance on simpler target concepts. Improved learning algorithms are also developed for conjunctive concepts with internal disjunction, k-DNF and k-CNF concepts. We show that all our algorithms are within a logarithmic factor of optimal in terms of the number of examples they require to achieve a given level of learning performance in the Valiant framework. Our results hold for arbitrary attribute-based instance spaces defined by either tree-structured or linear attributes.
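The greedy simplification step can be viewed as a set-cover instance in the style of Chvatal's heuristic [8]: each literal of a consistent conjunction "covers" the negative examples it rules out, and one greedily keeps literals covering the most still-unexcluded negatives. A minimal sketch (an illustration, not the paper's exact routine; literals are assumed encoded as (attribute index, required value) pairs over boolean instances):

```python
def greedy_simplify(literals, negatives):
    """Pick a small subset of literals that still excludes every negative example.

    literals:  (attr_index, required_value) pairs from a conjunction
               consistent with the sample (so together they exclude all negatives).
    negatives: negative examples as tuples of attribute values.
    A literal (i, v) excludes a negative x when x[i] != v.  Chvatal's analysis
    bounds the chosen subset within a ln(#negatives) + 1 factor of the smallest one.
    """
    chosen, uncovered = [], set(range(len(negatives)))
    while uncovered:
        # Greedy choice: the literal excluding the most still-uncovered negatives.
        best = max(literals,
                   key=lambda lit: sum(1 for j in uncovered
                                       if negatives[j][lit[0]] != lit[1]))
        newly = {j for j in uncovered if negatives[j][best[0]] != best[1]}
        assert newly, "input conjunction does not exclude all negatives"
        chosen.append(best)
        uncovered -= newly
    return chosen

# Conjunction x0 AND x1 AND (NOT x2); the single literal (2, 0) already
# excludes both negatives, so the hypothesis simplifies to NOT x2.
simplified = greedy_simplify([(0, 1), (1, 1), (2, 0)], [(0, 1, 1), (1, 0, 1)])
print(simplified)  # → [(2, 0)]
```

Because a simpler consistent hypothesis comes from a smaller effective hypothesis space, this shortening is what yields the improved sample-complexity bounds for simpler target concepts.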

MSC:

68T05 Learning and adaptive systems in artificial intelligence

References:

[1] Angluin, D., Queries and concept learning, Machine Learning, 2, 4, 319-342 (1988) · Zbl 1470.68050
[2] Angluin, D.; Laird, P. D., Learning from noisy examples, Machine Learning, 2, 4, 343-370 (1988)
[3] Banerji, R., The logic of learning: a basis for pattern recognition and improvement of performance, Adv. Comput., 24, 177-216 (1985)
[4] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M., Occam’s razor, (Inf. Proc. Lett., 24 (1987)), 377-380 · Zbl 0653.68084
[5] Blumer, A.; Ehrenfeucht, A.; Haussler, D.; Warmuth, M., Learnability and the Vapnik-Chervonenkis dimension, J. ACM · Zbl 0697.68079
[6] Bruner, J. S.; Goodnow, J.; Austin, G. A., (A Study of Thinking (1956), Wiley: Wiley New York)
[7] Bundy, A.; Silver, B.; Plummer, D., An analytical comparison of some rule-learning programs, Artificial Intelligence, 27, 137-181 (1985) · Zbl 0583.68043
[8] Chvatal, V., A greedy heuristic for the set covering problem, Math. Oper. Res., 4, 3, 233-235 (1979) · Zbl 0443.90066
[9] Cover, T., Geometrical and statistical properties of systems of linear inequalities with applications to pattern recognition, IEEE Trans. Elect. Comput., 14, 326-334 (1965) · Zbl 0152.18206
[10] Dietterich, T. G.; Michalski, R. S., A comparative review of selected methods for learning from examples, (Michalski, R. S.; Carbonell, J. G.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach (1983), Tioga: Tioga Palo Alto, CA), 41-81
[11] Ehrenfeucht, A.; Haussler, D., Learning decision trees from random examples, (Tech. Rept. UCSC-CRL-87-15 (1987), University of California: University of California Santa Cruz. CA) · Zbl 0679.68157
[12] Ehrenfeucht, A.; Haussler, D.; Kearns, M.; Valiant, L. G., A general lower bound on the number of examples needed for learning, Inf. Comput. · Zbl 0679.68158
[13] Garey, M.; Johnson, D., (Computers and Intractability: A Guide to the Theory of NP-Completeness (1979), Freeman: Freeman San Francisco, CA) · Zbl 0411.68039
[14] Haussler, D., Quantifying the inductive bias in concept learning, (Tech. Rept. UCSC-CRL-86-25 (1986), University of California: University of California Santa Cruz, CA)
[15] Haussler, D., Learning conjunctive concepts in structural domains, Machine Learning
[16] (Michalski, R. S.; Kodratoff, Y., Machine Learning, III (1987)), Los Altos, CA
[17] Haussler, D.; Welzl, E., Epsilon nets and simplex range queries, Discrete Comput. Geometry, 2, 127-151 (1987) · Zbl 0619.68056
[18] Johnson, D. S., Approximation algorithms for combinatorial problems, J. Comput. Syst. Sci., 9, 256-278 (1974) · Zbl 0296.65036
[19] Johnson, D. S.; Niemi, K. A., On knapsacks, partitions and a new dynamic programming technique for trees, Math. Oper. Res., 8, 1, 1-14 (1983) · Zbl 0506.90035
[20] Kearns, M.; Li, M., Learning in the presence of malicious errors, (Proceedings 20th ACM Symposium on Theory of Computing. Proceedings 20th ACM Symposium on Theory of Computing, Chicago, IL (1988)), 267-280
[21] Kearns, M.; Li, M.; Pitt, L.; Valiant, L., Recent results on Boolean concept learning, (Proceedings 4th International Workshop on Machine Learning. Proceedings 4th International Workshop on Machine Learning, Irvine, CA (1987)), 337-352
[22] Laird, P. D., Inductive inference by refinement, (Tech. Rept. YALEU/DCS/TR-376 (1986), Yale University: Yale University New Haven, CT)
[23] Littlestone, N., Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Machine Learning, 2, 4, 245-318 (1988)
[24] Michalski, R. S., A theory and methodology of inductive learning, (Michalski, R. S.; Carbonell, J. G.; Mitchell, T. M., Machine Learning: An Artificial Intelligence Approach (1983), Tioga: Tioga Palo Alto, CA), 83-134
[25] Milosavljevic, A., Learning in the presence of background knowledge, (Tech. Rept. UCSC-CRL-87-27 (1987), University of California: University of California Santa Cruz, CA)
[26] Mitchell, T. M., The need for biases in learning generalizations, (Tech. Rept. CBM-TR-117 (1980), Department of Computer Science, Rutgers University: Department of Computer Science, Rutgers University New Brunswick, NJ)
[27] Mitchell, T. M., Generalization as search, Artificial Intelligence, 18, 203-226 (1982)
[28] Natarajan, B. K.; Tadepalli, P., Two new frameworks for learning, (Proceedings 5th International Workshop on Machine Learning. Proceedings 5th International Workshop on Machine Learning, Ann Arbor, MI (1988)) · Zbl 0900.68365
[29] Nigmatullin, R. G., The fastest descent method for covering problems, (Proceedings Symposium on Questions of Precision and Efficiency of Computer Algorithms, 5 (1969)), 116-126, Kiev, U.S.S.R.
[30] Pearl, J., On the connection between the complexity and credibility of inferred models, Int. J. General Syst., 4, 255-264 (1978) · Zbl 0386.93005
[31] Quinlan, J. R., Induction of decision trees, Machine Learning, 1, 1, 81-106 (1986)
[32] Rendell, L., A general framework for induction and a study of selective induction, Machine Learning, 1, 2, 177-226 (1986)
[33] Rissanen, J., Modeling by shortest data description, Automatica, 14, 465-471 (1978) · Zbl 0418.93079
[34] Rivest, R., Learning decision-lists, Machine Learning, 2, 3, 229-246 (1987)
[35] Schlimmer, J. C., Incremental adjustment of representations for learning, (Proceedings 4th International Workshop on Machine Learning. Proceedings 4th International Workshop on Machine Learning, Irvine, CA (1987)), 79-90
[36] Sammut, C.; Banerji, R., Learning concepts by asking questions, (Michalski, R.; Carbonell, J.; Mitchell, T., Machine Learning, II (1986), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA)
[37] Subramanian, D.; Feigenbaum, J., Factorization in experiment generation, (Proceedings AAAI-86. Proceedings AAAI-86, Philadelphia, PA (1986)), 518-522
[38] Utgoff, P., Shift of bias for inductive concept learning, (Michalski, R.; Carbonell, J.; Mitchell, T., Machine Learning, II (1986), Morgan Kaufmann: Morgan Kaufmann Los Altos, CA)
[39] Valiant, L. G., A theory of the learnable, Commun. ACM, 27, 11, 1134-1142 (1984) · Zbl 0587.68077
[40] Valiant, L. G., Learning disjunctions of conjunctions, (Proceedings IJCAI-85. Proceedings IJCAI-85, Los Angeles, CA (1985)), 560-566
[41] Pitt, L.; Valiant, L. G., Computational limitations on learning from examples, (Tech. Rept. TR-05-86 (1986), Aiken Computing Lab., Harvard University: Aiken Computing Lab., Harvard University Cambridge, MA), also: J. ACM, to appear · Zbl 0662.68086
[42] Vapnik, V. N., (Estimation of Dependences Based on Empirical Data (1982), Springer: Springer New York) · Zbl 0499.62005
[43] Vapnik, V. N.; Chervonenkis, A. Ya., On the uniform convergence of relative frequencies of events to their probabilities, Theor. Probab. Appl., 16, 2, 264-280 (1971) · Zbl 0247.60005
[44] Wenocur, R. S.; Dudley, R. M., Some special Vapnik-Chervonenkis classes, Discrete Math., 33, 313-318 (1981) · Zbl 0459.60008
[45] Winston, P., Learning structural descriptions from examples, (Winston, P. H., The Psychology of Computer Vision (1975), McGraw-Hill: McGraw-Hill New York) · Zbl 1007.68915