
Score-based methods for learning Markov boundaries by searching in constrained spaces. (English) Zbl 1260.68318

Summary: In probabilistic classification problems, learning the Markov boundary of the class variable is the optimal approach to feature subset selection. In this paper we propose two algorithms that learn the Markov boundary of a selected variable. Both algorithms are based on the score+search paradigm for learning Bayesian networks: they use standard scoring functions but perform the search in constrained spaces of class-focused directed acyclic graphs, traversing these spaces with operators adapted to the problem. The algorithms have been validated experimentally on a wide spectrum of databases, and the results show performance competitive with the state of the art.
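
The constrained space of class-focused directed acyclic graphs and the specific operators used by the proposed algorithms are not detailed in this summary. As a rough, hypothetical sketch of the score+search paradigm the summary refers to, the Python snippet below greedily toggles candidate variables in and out of the class variable's parent set and keeps a move only when a BIC-style score improves; the scoring function, variable names, and synthetic data are illustrative assumptions, not the authors' method.

# Minimal illustrative sketch, NOT the authors' algorithms: generic score+search
# (greedy hill climbing with a BIC-style score) for a candidate Markov boundary
# of a discrete class variable. Names and synthetic data are hypothetical.
import math
import random
from collections import Counter

def bic_score(data, class_idx, parents):
    """BIC-style score of the class variable given a candidate parent set."""
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[class_idx]) for row in data)
    parent_counts = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximum-likelihood log-likelihood of the class given its parents.
    loglik = sum(c * math.log(c / parent_counts[pa]) for (pa, _), c in counts.items())
    n_class_vals = len({row[class_idx] for row in data})
    n_parent_cfgs = 1
    for p in parents:
        n_parent_cfgs *= len({row[p] for row in data})
    # Penalise (|C|-1) free parameters per parent configuration.
    penalty = 0.5 * math.log(n) * n_parent_cfgs * (n_class_vals - 1)
    return loglik - penalty

def greedy_markov_boundary(data, class_idx):
    """Hill climbing: toggle single variables in/out while the score improves."""
    candidates = [i for i in range(len(data[0])) if i != class_idx]
    current, best = set(), bic_score(data, class_idx, [])
    improved = True
    while improved:
        improved = False
        for v in candidates:
            trial = sorted(current ^ {v})      # add v if absent, remove it if present
            score = bic_score(data, class_idx, trial)
            if score > best + 1e-9:
                current, best, improved = set(trial), score, True
    return sorted(current)

if __name__ == "__main__":
    # Synthetic data: the class X2 depends on X0; X1 is irrelevant noise.
    random.seed(0)
    data = []
    for _ in range(500):
        x0, x1 = random.randint(0, 1), random.randint(0, 1)
        x2 = x0 if random.random() < 0.9 else 1 - x0
        data.append((x0, x1, x2))
    print(greedy_markov_boundary(data, class_idx=2))   # typically prints [0]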

MSC:

68T05 Learning and adaptive systems in artificial intelligence

Software:

WEKA; Hailfinder; TETRAD
Full Text: DOI

References:

[1] Abramson B, Brown J, Murphy A, Winkler RL (1996) Hailfinder: a Bayesian system for forecasting severe weather. Int J Forecast 12: 57–71 · doi:10.1016/0169-2070(95)00664-8
[2] Acid S, de Campos LM (2003) Searching for Bayesian network structures in the space of restricted acyclic partially directed graphs. J Artif Intell Res 18: 445–490 · Zbl 1056.68142
[3] Acid S, de Campos LM, Castellano JG (2005) Learning Bayesian network classifiers: searching in a space of partially directed acyclic graphs. Mach Learn 59(3): 213–235 · Zbl 1101.68710
[4] Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19: 716–723 · Zbl 0314.62039 · doi:10.1109/TAC.1974.1100705
[5] Aliferis CF, Tsamardinos I, Statnikov A (2003) HITON, a novel Markov blanket algorithm for optimal variable selection. In: Proceedings of the 2003 American Medical Informatics Association annual symposium, pp 21–25
[6] Alizadeh AA et al (2000) Distinct types of diffuse large B-cell lymphoma identified by gene expression profiling. Nature 403: 503–511 · doi:10.1038/35000501
[7] Alon U et al (1999) Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA 96: 6745–6750 · doi:10.1073/pnas.96.12.6745
[8] Armstrong SA et al (2002) MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet 30: 41–47 · doi:10.1038/ng765
[9] Beinlich IA, Suermondt HJ, Chavez RM, Cooper GF (1989) The alarm monitoring system: a case study with two probabilistic inference techniques for belief networks. In: Proceedings of the European conference on artificial intelligence in medicine, pp 247–256
[10] Binder J, Koller D, Russell S, Kanazawa K (1997) Adaptive probabilistic networks with hidden variables. Mach Learn 29: 213–244 · Zbl 0892.68079 · doi:10.1023/A:1007421730016
[11] Blum A, Langley P (1997) Selection of relevant features and examples in machine learning. Artif Intell 97(1-2): 245–271 · Zbl 0904.68142 · doi:10.1016/S0004-3702(97)00063-5
[12] Buntine W (1991) Theory refinement of Bayesian networks. In: Proceedings of the seventh conference on uncertainty in artificial intelligence, pp 52–60
[13] Chickering DM (2002) Learning equivalence classes of Bayesian network structures. J Mach Learn Res 2: 445–498 · Zbl 1007.68179
[14] Chickering DM, Geiger D, Heckerman D (1995) Learning Bayesian networks: search methods and experimental results. In: Preliminary papers of the fifth international workshop on artificial intelligence and statistics, pp 112–128
[15] Cooper GF, Herskovits E (1992) A Bayesian method for the induction of probabilistic networks from data. Mach Learn 9: 309–348 · Zbl 0766.68109
[16] Cowell R (2001) Conditions under which conditional independence and scoring methods lead to identical selection of Bayesian network models. In: Proceedings of the seventeenth annual conference on uncertainty in artificial intelligence, pp 91–97
[17] de Campos LM (2006) A scoring function for learning Bayesian networks based on mutual information and conditional independence tests. J Mach Learn Res 7: 2149–2187 · Zbl 1222.62036
[18] de Campos LM, Castellano JG (2007) Bayesian network learning algorithms using structural restrictions. Int J Approx Reason 45(2): 233–254 · Zbl 1122.68104 · doi:10.1016/j.ijar.2006.06.009
[19] de Campos LM, Huete JF (2000) A new approach for learning belief networks using independence criteria. Int J Approx Reason 24(1): 11–37 · Zbl 0995.68109 · doi:10.1016/S0888-613X(99)00042-0
[20] Ding C, Peng H (2005) Minimum redundancy feature selection from microarray gene expression data. J Bioinform Comput Biol 3: 185–205 · doi:10.1142/S0219720005001004
[21] Friedman N, Geiger D, Goldszmidt M (1997) Bayesian network classifiers. Mach Learn 29: 131–163 · Zbl 0892.68077 · doi:10.1023/A:1007465528199
[22] Fu S, Desmarais MC (2007) Local learning algorithm for Markov blanket discovery. Lect Notes Comput Sci 4830: 68–79 · doi:10.1007/978-3-540-76928-6_9
[23] Fu S, Desmarais MC (2008a) Tradeoff analysis of different Markov blanket local learning approaches. Lect Notes Artif Intell 5012: 562–571
[24] Fu S, Desmarais MC (2008b) Fast Markov blanket discovery algorithm via local learning within single pass. Lect Notes Comput Sci 5032: 96–107 · doi:10.1007/978-3-540-68825-9_10
[25] Gámez JA, Mateo JL, Puerta JM (2007) A fast hill-climbing algorithm for Bayesian networks structure learning. Lect Notes Comput Sci 4724: 585–597 · Zbl 1148.68514 · doi:10.1007/978-3-540-75256-1_52
[26] Golub TR et al (1999) Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science 286: 531–537 · doi:10.1126/science.286.5439.531
[27] Gordon GJ et al (2002) Translation of microarray data into clinically relevant cancer diagnostic tests using gene expression ratios in lung cancer and mesothelioma. Cancer Res 62: 4963–4967
[28] Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3: 1157–1182 · Zbl 1102.68556
[29] Guyon I, Weston J, Barnhill S, Vapnik V (2002) Gene selection for cancer classification using support vector machines. Mach Learn 46(1–3): 389–422 · Zbl 0998.68111 · doi:10.1023/A:1012487302797
[30] Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1): 10–18 · doi:10.1145/1656274.1656278
[31] Heckerman D, Meek C (1997a) Embedded Bayesian network classifiers. Technical Report MSR-TR-97-06, Microsoft Research, Redmond
[32] Heckerman D, Meek C (1997b) Models and selection criteria for regression and classification. In: Proceedings of the thirteenth conference on uncertainty in artificial intelligence, pp 223–228
[33] Heckerman D, Geiger D, Chickering DM (1995) Learning Bayesian networks: The combination of knowledge and statistical data. Mach Learn 20: 197–243 · Zbl 0831.68096
[34] Inza I, Merino M, Larrañaga P, Quiroga J, Sierra B, Girala M (2001) Feature subset selection by genetic algorithms and estimation of distribution algorithms–a case study in the survival of cirrhotic patients treated with TIPS. Artif Intell Med 23(2): 187–205 · Zbl 0986.68825 · doi:10.1016/S0933-3657(01)00085-9
[35] Jensen CS (1997) Blocking Gibbs sampling for inference in large and complex Bayesian networks with applications in genetics. PhD Thesis, Aalborg University
[36] John GH, Kohavi R (1994) Irrelevant features and the subset selection problem. In: Proceedings of the eleventh international conference on machine learning, pp 121–129
[37] Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1-2): 273–324 · Zbl 0904.68143 · doi:10.1016/S0004-3702(97)00043-X
[38] Koller D, Friedman N (2009) Probabilistic graphical models: principles and techniques. MIT, Cambridge · Zbl 1183.68483
[39] Koller D, Sahami M (1996) Toward optimal feature selection. In: ICML-96: Proceedings of the thirteenth international conference on machine learning, pp 284–292
[40] Kristensen K, Rasmussen IA (2002) The use of a Bayesian network in the design of a decision support system for growing malting barley without use of pesticides. Comput Electron Agric 33: 197–217 · doi:10.1016/S0168-1699(02)00007-8
[41] Lam W, Bacchus F (1994) Learning Bayesian belief networks. An approach based on the MDL principle. Comput Intell 10: 269–293 · doi:10.1111/j.1467-8640.1994.tb00166.x
[42] Langley P, Sage S (1994) Induction of selective Bayesian classifiers. In: Proceedings of the tenth conference on uncertainty in artificial intelligence, pp 399–406
[43] Langley P, Iba W, Thompson K (1992) An analysis of Bayesian classifiers. In: Proceedings of the national conference on artificial intelligence, pp 223–228
[44] Li H, Huang H, Sun J, Lin C (2010) On sensitivity of case-based reasoning to optimal feature subsets in business failure prediction. Expert Syst Appl 37(7): 4811–4821 · doi:10.1016/j.eswa.2009.12.034
[45] Mamitsuka H (2003) Empirical evaluation of ensemble feature subset selection methods for learning from a high-dimensional database in drug design. In: Proceedings of the third IEEE symposium on bioinformatics and bioengineering, pp 253–257
[46] Margaritis D, Thrun S (2000) Bayesian network induction via local neighborhoods. In: Proceedings of neural information processing systems, pp 505–511
[47] Moral S (2004) An empirical comparison of score measures for independence. In: Proceedings of the 10th IPMU international conference, pp 1307–1314
[48] Neapolitan RE (2003) Learning Bayesian networks. Prentice Hall · Zbl 1069.91094
[49] Pearl J (1988) Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, San Francisco · Zbl 0746.68089
[50] Pellet JP, Elisseeff A (2008) Using Markov blankets for causal structure learning. J Mach Learn Res 9: 1295–1342 · Zbl 1225.68205
[51] Peña JM, Björkegren J, Tegner J (2005) Growing Bayesian network models of gene networks from seed genes. Bioinformatics 21: ii224–ii229 · doi:10.1093/bioinformatics/bti1137
[52] Peña JM, Nilsson R, Björkegren J, Tegner J (2007) Towards scalable and data efficient learning of Markov boundaries. Int J Approx Reason 45(2): 211–232 · Zbl 1122.68136 · doi:10.1016/j.ijar.2006.06.008
[53] Peng H, Long F, Ding C (2005) Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans Pattern Anal Mach Intell 27(8): 1226–1238 · doi:10.1109/TPAMI.2005.159
[54] Pearl J, Verma T (1990) Equivalence and synthesis of causal models. In: Proceedings of the sixth conference on uncertainty in artificial intelligence, pp 220–227
[55] Pomeroy SL et al (2002) Prediction of central nervous system embryonal tumour outcome based on gene expression. Nature 415: 436–442 · doi:10.1038/415436a
[56] Ramsey J (2006) A PC-style Markov blanket search for high dimensional datasets. Technical Report CMU-PHIL-177, Carnegie Mellon University
[57] Rasmussen LK (1995) Bayesian network for blood typing and parentage verification of cattle. PhD Thesis, Research Centre Foulum, Denmark
[58] Rodrigues de Morais S, Aussem A (2008a) A novel scalable and data efficient feature subset selection algorithm. Lect Notes Comput Sci 5212: 298–312 · doi:10.1007/978-3-540-87481-2_20
[59] Rodrigues de Morais S, Aussem A (2008b) A novel scalable and correct Markov boundary learning algorithm under faithfulness condition. In: Proceedings of the 4th European workshop on probabilistic graphical models, pp 81–88
[60] Roos T, Wettig H, Grünwald P, Myllymäki P, Tirri H (2005) On discriminative Bayesian network classifiers and logistic regression. Mach Learn 59: 267–296 · Zbl 1101.68785
[61] Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6: 461–464 · Zbl 0379.62005 · doi:10.1214/aos/1176344136
[62] Shipp MA et al (2002) Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nat Med 8(1): 68–74 · doi:10.1038/nm0102-68
[63] Sierra B, Larrañaga P (1998) Predicting survival in malignant skin melanoma using Bayesian networks automatically induced by genetic algorithms. An empirical comparison between different approaches. Artif Intell Med 14(1–2): 215–230 · doi:10.1016/S0933-3657(98)00024-4
[64] Singh M, Provan GM (1996) Efficient learning of selective Bayesian network classifiers. In: Proceedings of the thirteenth international conference on machine learning, pp 453–461
[65] Singh D et al (2002) Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 1: 203–209 · doi:10.1016/S1535-6108(02)00030-2
[66] Song X, Ding Y, Huang J, Ge Y (2010) Feature selection for support vector machine in financial crisis prediction: a case study in China. Expert Syst 27(4): 299–310 · doi:10.1111/j.1468-0394.2010.00546.x
[67] Spirtes P, Glymour C, Scheines R (1993) Causation, prediction, and search. Springer, New York · Zbl 0806.62001
[68] Tsamardinos I, Aliferis CF, Statnikov A (2003a) Algorithms for large scale Markov blanket discovery. In: Proceedings of the sixteenth international Florida artificial intelligence research society conference, pp 376–380
[69] Tsamardinos I, Aliferis CF, Statnikov A (2003b) Time and sample efficient discovery of Markov blankets and direct causal relations. In: Proceedings of the ninth ACM SIGKDD international conference on knowledge discovery and data mining, pp 673–678
[70] Tsamardinos I, Brown LE, Aliferis CF (2006) The max-min hill-climbing Bayesian network structure learning algorithm. Mach Learn 65: 31–78 · Zbl 1470.68192 · doi:10.1007/s10994-006-6889-7
[71] Yang Y, Pedersen JO (1997) A comparative study on feature selection in text categorization. In: Proceedings of the fourteenth international conference on machine learning, pp 412–420
[72] Yaramakala S, Margaritis D (2005) Speculative Markov blanket discovery for optimal feature selection. In: Proceedings of the fifth IEEE international conference on data mining, pp 809–812
[73] Yeoh EJ et al (2002) Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. Cancer Cell 1: 133–143 · doi:10.1016/S1535-6108(02)00032-6
[74] Yishi Z, Hong X, Yang H, Gangyi Q (2009) S-IAMB algorithm for Markov blanket discovery. In: Proceedings of the Asia-Pacific conference on information processing, vol 2, pp 379–382
[75] Zheng H, Zhang Y (2008) Feature selection for high-dimensional data in astronomy. Adv Space Res 41(12): 1960–1964 · doi:10.1016/j.asr.2007.08.033