Learning juntas in the presence of noise. (English) Zbl 1124.68051

Summary: We investigate the combination of two major challenges in computational learning: dealing with huge amounts of irrelevant information and learning from noisy data. It is shown that large classes of Boolean concepts that depend only on a small fraction of their variables – so-called juntas – can be learned efficiently from uniformly distributed examples that are corrupted by random attribute and classification noise. We present solutions to cope with the manifold problems that inhibit a straightforward generalization of the noise-free case. Additionally, we extend our methods to non-uniformly distributed examples and derive new results for monotone juntas and for parity juntas in this setting. It is assumed that the attribute noise is generated by a product distribution. Without any restriction on the attribute noise distribution, learning in the presence of noise is in general impossible: we construct a noise distribution \(P\) and a concept class \(\mathcal C\) such that \(\mathcal C\) cannot be learned under \(P\)-noise.
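The noise model described above can be made concrete with a small sketch. The following code is an illustrative toy example, not the paper's algorithm: it estimates the first-order Fourier coefficients \(\hat f(\{i\}) = \mathbf E[f(x)x_i]\) of a junta from examples corrupted by product-distribution attribute noise (each bit flipped independently with probability \(\gamma\)) and classification noise (label flipped with probability \(\eta\)). Because both noise sources are independent of the example, they attenuate each first-order coefficient by the known factor \((1-2\gamma)(1-2\eta)\), which the estimator divides out; coefficients of relevant variables then stand out against the irrelevant ones. All function names here are hypothetical.

```python
import random

def sample_noisy(f, n, gamma, eta):
    """Draw one noisy example of the Boolean function f on {-1,1}^n."""
    # Uniformly distributed clean example.
    x = [random.choice([-1, 1]) for _ in range(n)]
    y = f(x)
    # Product-distribution attribute noise: flip bit i independently w.p. gamma.
    a = [xi * (-1 if random.random() < gamma else 1) for xi in x]
    # Classification noise: flip the label w.p. eta.
    b = y * (-1 if random.random() < eta else 1)
    return a, b

def estimate_first_order(f, n, gamma, eta, m):
    """Estimate the first-order Fourier coefficients of f from m noisy examples.

    Since the noise is independent of the example, E[b * a_i] equals
    (1 - 2*gamma) * (1 - 2*eta) * E[f(x) * x_i]; dividing by that factor
    recovers (an estimate of) the noise-free coefficient.
    """
    sums = [0.0] * n
    for _ in range(m):
        a, b = sample_noisy(f, n, gamma, eta)
        for i in range(n):
            sums[i] += b * a[i]
    corr = (1 - 2 * gamma) * (1 - 2 * eta)
    return [s / (m * corr) for s in sums]
```

For a monotone junta such as the majority of the first three of twenty variables, each relevant variable has \(\hat f(\{i\}) = 1/2\), so with a few tens of thousands of samples the three relevant coordinates are clearly separated from the seventeen irrelevant ones even at 10% attribute and classification noise. Parity juntas, by contrast, have vanishing first-order coefficients, which is one reason the paper treats them separately.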

MSC:

68Q32 Computational learning theory

References:

[1] Akutsu, Tatsuya; Miyano, Satoru; Kuhara, Satoru, Algorithms for identifying Boolean networks and related biological networks based on matrix multiplication and fingerprint function, J. Comput. Biol., 7, 3-4, 331-343 (2000)
[2] Alon, Noga; Spencer, Joel, (The Probabilistic Method. The Probabilistic Method, Wiley-Intersci. Ser. Discrete Math. Optim. (1992), John Wiley and Sons) · Zbl 0767.05001
[3] Angluin, Dana; Laird, Philip D., Learning from noisy examples, Machine Learning, 2, 4, 343-370 (1988)
[4] Bahadur, Raghu Raj, A representation of the joint distribution of responses to \(n\) dichotomous items, (Solomon, Herbert, Studies in Item Analysis and Prediction (1961), Stanford University Press: Stanford University Press Stanford, CA), 158-168 · Zbl 0103.36701
[5] Beckner, William, Inequalities in Fourier analysis, Ann. of Math. (2), 102, 1, 159-182 (1975) · Zbl 0338.42017
[6] Benjamini, Itai; Kalai, Gil; Schramm, Oded, Noise sensitivity of Boolean functions and applications to percolation, Inst. Hautes Études Sci. Publ. Math., 90, 5-43 (1999) · Zbl 0986.60002
[7] Anna Bernasconi, Mathematical techniques for the analysis of Boolean functions, Ph.D. Thesis, Università degli Studi di Pisa, Dipartimento di Ricerca in Informatica, March 1998 · Zbl 0914.94029
[8] Blum, Avrim; Langley, Pat, Selection of relevant features and examples in machine learning, Artificial Intelligence, 97, 1-2, 245-271 (1997) · Zbl 0904.68142
[9] Blumer, Anselm; Ehrenfeucht, Andrzej; Haussler, David; Warmuth, Manfred K., Occam’s razor, Inform. Process. Lett., 24, 6, 377-380 (1987) · Zbl 0653.68084
[10] Bonami, Aline, Étude des coefficients de Fourier des fonctions de \(L^p(G)\), Ann. Inst. Fourier, 20, 2, 335-402 (1970) · Zbl 0195.42501
[11] Bshouty, Nader H.; Jackson, Jeffrey C.; Tamon, Christino, Uniform-distribution attribute noise learnability, Inform. Comput., 187, 2, 277-290 (2003) · Zbl 1076.68035
[12] Decatur, Scott E.; Gennaro, Rosario, On learning from noisy and incomplete examples, (Proceedings of the Eighth Annual Conference on Computational Learning Theory. Proceedings of the Eighth Annual Conference on Computational Learning Theory, COLT 1995, Santa Cruz, California, USA (1995), ACM Press), 353-360
[13] Furst, Merrick L.; Jackson, Jeffrey C.; Smith, Sean W., Improved learning of \(AC^0\) functions, (Valiant, Leslie G.; Warmuth, Manfred K., Proceedings of the Fourth Annual Workshop on Computational Learning Theory. Proceedings of the Fourth Annual Workshop on Computational Learning Theory, COLT 1991, Santa Cruz, California, USA (1991), Morgan Kaufmann), 317-325
[14] Goldman, Sally A.; Sloan, Robert H., Can PAC learning algorithms tolerate random attribute noise?, Algorithmica, 14, 1, 70-84 (1995) · Zbl 0837.68094
[15] Hoeffding, Wassily, Probability inequalities for sums of bounded random variables, J. Amer. Statist. Assoc., 58, 13-30 (1963) · Zbl 0127.10602
[16] Kahn, Jeff; Kalai, Gil; Linial, Nathan, The influence of variables on Boolean functions (extended abstract), (29th Annual Symposium on Foundations of Computer Science. 29th Annual Symposium on Foundations of Computer Science, FOCS ’88 (1988), IEEE Computer Society: IEEE Computer Society White Plains, NY), 68-80
[17] Mihail N. Kolountzakis, Evangelos Markakis, Aranyak Mehta, Learning symmetric juntas in time \(n^{o ( k )}\), arXiv:math.CO/0504246v1, http://arxiv.org/abs/math.CO/0504246v1
[18] Linial, Nathan; Mansour, Yishay; Nisan, Noam, Constant depth circuits, Fourier transform, and learnability, J. ACM, 40, 3, 607-620 (1993) · Zbl 0781.94006
[19] Littlestone, Nick, Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm, Machine Learning, 2, 4, 285-318 (1987)
[20] Miyata, Akinobu; Tarui, Jun; Tomita, Etsuji, Learning Boolean functions in \(AC^0\) on attribute and classification noise, (Ben-David, Shai; Case, John; Maruoka, Akira, Algorithmic Learning Theory, 15th International Conference, ALT 2004, Padova, Italy, October 2-5, 2004, Proceedings. Algorithmic Learning Theory, 15th International Conference, ALT 2004, Padova, Italy, October 2-5, 2004, Proceedings, Lecture Notes in Artificial Intelligence, vol. 3244 (2004), Springer), 142-155 · Zbl 1110.68403
[21] Mossel, Elchanan; O’Donnell, Ryan W., On the noise sensitivity of monotone functions, Random Structures Algorithms, 23, 3, 333-350 (2003) · Zbl 1047.68106
[22] Mossel, Elchanan; O’Donnell, Ryan W.; Servedio, Rocco A., Learning functions of \(k\) relevant variables, J. Comput. System Sci., 69, 3, 421-434 (2004) · Zbl 1084.68057
[23] Ryan W. O’Donnell, Computational applications of noise sensitivity, Ph.D. Thesis, Department of Mathematics, Massachusetts Institute of Technology, June 2003
[24] Servedio, Rocco A., On learning monotone DNF under product distributions, Inform. Comput., 193, 1, 57-74 (2004) · Zbl 1076.68037
[25] Shackelford, George; Volper, Dennis, Learning \(k\)-DNF with noise in the attributes, (Proceedings of the 1988 Workshop on Computational Learning Theory. Proceedings of the 1988 Workshop on Computational Learning Theory, August 3-5, 1988, MIT (1988), Morgan Kaufmann), 97-103
[26] Valiant, Leslie G., A theory of the learnable, Commun. ACM, 27, 11, 1134-1142 (1984) · Zbl 0587.68077