
Network-based naive Bayes model for social network. (English) Zbl 1392.62183

Summary: Naive Bayes (NB) is one of the most popular classification methods. It is particularly useful when the dimension of the predictor is high and the data are generated independently. Meanwhile, social network data are becoming increasingly accessible, due to the rapid development of various social network services and websites. In contrast to the classical setting, data generated by a social network are most likely dependent, with the dependency largely determined by the social network relationships. How to extend the classical NB method to social network data therefore becomes a problem of great interest. To this end, we propose a network-based naive Bayes (NNB) method, which generalizes the classical NB model to social network data. The key advantage of the NNB method is that it takes the network relationships into consideration. Its computational efficiency makes the NNB method feasible even for large-scale social networks. The statistical properties of the NNB model are theoretically investigated. Simulation studies are conducted to demonstrate its finite sample performance. A real data example is also analyzed for illustration purposes.

MSC:

62H30 Classification and discrimination; cluster analysis (statistical aspects)
91D30 Social networks; opinion dynamics

Software:

ergm; NetKit

References:

[1] Antonakis A C, Sfakianakis M E. Assessing naïve Bayes as a method for screening credit applicants. J Appl Stat, 2009, 36: 537-545 · Zbl 1473.62346 · doi:10.1080/02664760802554263
[2] Belkin M, Niyogi P, Sindhwani V. Manifold regularization: A geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res, 2006, 7: 2399-2434 · Zbl 1222.68144
[3] Bickel P J, Chen A. A nonparametric view of network models and Newman-Girvan and other modularities. Proc Natl Acad Sci USA, 2009, 106: 21068-21073 · Zbl 1359.62411 · doi:10.1073/pnas.0907096106
[4] Breiman L. Random forests. Mach Learn, 2001, 45: 5-32 · Zbl 1007.68152 · doi:10.1023/A:1010933404324
[5] Bühlmann P, Yu B. Boosting with the L2 loss: Regression and classification. J Amer Statist Assoc, 2003, 98: 324-340 · Zbl 1041.62029 · doi:10.1198/016214503000125
[6] Choi D, Wolfe P, Airoldi E. Stochastic blockmodels with a growing number of classes. Biometrika, 2012, 99: 273-284 · Zbl 1318.62207 · doi:10.1093/biomet/asr053
[7] Craven, M.; McCallum, A.; DiPasquo, D.; etal., Learning to extract symbolic knowledge from the World Wide Web, 509-516 (1998), Menlo Park
[8] Erdős P, Rényi A. On the evolution of random graphs. Magyar Tud Akad Mat Kutató Int Közl, 1960, 5: 17-61 · Zbl 0103.16301
[9] Fan J, Feng Y, Jiang J, et al. Feature augmentation via nonparametrics and selection (FANS) in high-dimensional classification. J Amer Statist Assoc, 2016, 111: 275-287 · doi:10.1080/01621459.2015.1005212
[10] Friedman N, Geiger D, Goldszmidt M. Bayesian network classifiers. Mach Learn, 1997, 29: 131-163 · Zbl 0892.68077 · doi:10.1023/A:1007465528199
[11] Guan G, Guo J, Wang H. Varying naive Bayes models with applications to classification of Chinese text documents. J Bus Econom Statist, 2014, 32: 445-456 · doi:10.1080/07350015.2014.903086
[12] Guan G, Shan N, Guo J. Feature screening for ultrahigh dimensional binary data. Stat Interface, 2018, 11: 41-50 · Zbl 06938679 · doi:10.4310/SII.2018.v11.n1.a4
[13] Hastie T, Tibshirani R, Friedman J. The Elements of Statistical Learning. New York: Springer, 2001 · Zbl 0973.62007 · doi:10.1007/978-0-387-21606-5
[14] Holland P W, Leinhardt S. An exponential family of probability distributions for directed graphs. J Amer Statist Assoc, 1981, 76: 33-50 · Zbl 0457.62090 · doi:10.1080/01621459.1981.10477598
[15] Hunter D R, Handcock M S. Inference in curved exponential family models for networks. J Comput Graph Statist, 2006, 15: 565-583 · doi:10.1198/106186006X133069
[16] Hunter D R, Handcock M S, Butts C T, et al. ergm: A package to fit, simulate and diagnose exponential-family models for networks. J Statist Softw, 2008, 24: 1-29 · doi:10.18637/jss.v024.i03
[17] Lewis, D. D., Evaluating and optimizing autonomous text classification systems, 246-254 (1995), New York
[18] Lewis, D. D., Naive Bayes at forty: The independence assumption in information retrieval, 4-15 (1998), London · doi:10.1007/BFb0026666
[19] Macskassy S A, Provost F. Classification in networked data: A toolkit and a univariate case study. J Mach Learn Res, 2007, 8: 935-983
[20] Minnier J, Yuan M, Liu J S, et al. Risk classification with an adaptive naive Bayes kernel machine model. J Amer Statist Assoc, 2015, 110: 393-404 · Zbl 1373.62297 · doi:10.1080/01621459.2014.908778
[21] Neville, J.; Jensen, D., Iterative classification in relational data, 42-49 (2000), Palo Alto
[22] Nowicki K, Snijders T A B. Estimation and prediction for stochastic block structures. J Amer Statist Assoc, 2001, 96: 1077-1087 · Zbl 1072.62542 · doi:10.1198/016214501753208735
[23] Ozuysal M, Calonder M, Lepetit V, et al. Fast keypoint recognition using random ferns. IEEE Trans Pattern Anal Mach Intell, 2010, 32: 448-461 · doi:10.1109/TPAMI.2009.23
[24] Robins G, Pattison P, Elliott P. Network models for social influence processes. Psychometrika, 2001, 66: 161-189 · Zbl 1293.62270 · doi:10.1007/BF02294834
[25] Wang Y J, Wong G Y. Stochastic blockmodels for directed graphs. J Amer Statist Assoc, 1987, 82: 8-19 · Zbl 0613.62146 · doi:10.1080/01621459.1987.10478385
[26] Wasserman S, Faust K. Social Network Analysis: Methods and Applications. Cambridge: Cambridge University Press, 1994 · Zbl 0926.91066 · doi:10.1017/CBO9780511815478
[27] Webb G I, Boughton J R, Wang Z. Not so naive Bayes: Aggregating one-dependence estimators. Mach Learn, 2005, 58: 5-24 · Zbl 1075.68078 · doi:10.1007/s10994-005-4258-6
[28] Wu Y, Liu Y. Robust truncated-hinge-loss support vector machines. J Amer Statist Assoc, 2007, 102: 974-983 · Zbl 1469.62293 · doi:10.1198/016214507000000617
[29] Zaidi N A, Cerquides J, Carman M, et al. Alleviating naive Bayes attribute independence assumption by attribute weighting. J Mach Learn Res, 2013, 14: 1947-1988 · Zbl 1317.68199
[30] Zanin M, Papo D, Sousa P A, et al. Combining complex networks and data mining: Why and how. Phys Rep, 2016, 635: 1-44 · doi:10.1016/j.physrep.2016.04.005
[31] Zheng Z, Webb G I. Lazy learning of Bayesian rules. Mach Learn, 2000, 41: 53-84 · doi:10.1023/A:1007613203719