×

ORCA: outlier detection and robust clustering for attributed graphs. (English) Zbl 1481.90274

Summary: A framework is proposed to simultaneously cluster objects and detect anomalies in attributed graph data. Our objective function along with the carefully constructed constraints promotes interpretability of both the clustering and anomaly detection components, as well as scalability of our method. In addition, we developed an algorithm called Outlier detection and Robust Clustering for Attributed graphs (ORCA) within this framework. ORCA is fast and convergent under mild conditions, produces high quality clustering results, and discovers anomalies that can be mapped back naturally to the features of the input data. The efficacy and efficiency of ORCA is demonstrated on real world datasets against multiple state-of-the-art techniques.

MSC:

90C27 Combinatorial optimization
90C17 Robustness in mathematical programming
Full Text: DOI

References:

[1] Aggarwal, C.C.: Outlier analysis. In: Data Mining, pp. 237-263. Springer (2015). doi:10.1007/978-3-319-14142-8_8
[2] Aggarwal, C.C.: An introduction to outlier analysis. In: Outlier Analysis, pp. 1-34. Springer (2017). doi:10.1007/978-3-319-47578-3_1 · Zbl 1353.68004
[3] Akoglu, L., McGlohon, M., Faloutsos, C.: Oddball: spotting anomalies in weighted graphs. In: Advances in Knowledge Discovery and Data Mining, pp. 410-421 (2010). doi:10.1007/978-3-642-13672-6_40
[4] Akoglu, L.; Tong, H.; Koutra, D., Graph based anomaly detection and description: a survey, Data Min. Knowl. Discov., 29, 3, 626-688 (2015) · doi:10.1007/s10618-014-0365-y
[5] Bertsekas, DP, Nonlinear Programming (1999), Belmont: Athena Scientific, Belmont · Zbl 1015.90077
[6] Bouwmans, T.; Sobral, A.; Javed, S.; Jung, SK; Zahzah, EH, Decomposition into low-rank plus additive matrices for background/foreground separation: a review for a comparative evaluation with a large-scale dataset, Comput. Sci. Rev., 23, 1-71 (2017) · Zbl 1398.68572 · doi:10.1016/j.cosrev.2016.11.001
[7] Candès, EJ; Li, X.; Ma, Y.; Wright, J., Robust principal component analysis?, J. ACM (JACM), 58, 3, 11 (2011) · Zbl 1327.62369 · doi:10.1145/1970392.1970395
[8] Candès, EJ; Recht, B., Exact matrix completion via convex optimization, Found. Comput. Math., 9, 6, 717 (2009) · Zbl 1219.90124 · doi:10.1007/s10208-009-9045-5
[9] Chakrabarti, D.; Faloutsos, C., Graph mining: laws, generators, and algorithms, ACM Comput. Surv. (CSUR), 38, 1, 2-es (2006) · doi:10.1145/1132952.1132954
[10] Davis, J., Goadrich, M.: The relationship between precision-recall and roc curves. In: Proceedings of the 23rd International Conference on Machine Learning, pp. 233-240. ACM (2006). doi:10.1145/1143844.1143874
[11] Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means: spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 551-556. ACM (2004). doi:10.1145/1014052.1014118
[12] Drineas, P.; Frieze, A.; Kannan, R.; Vempala, S.; Vinay, V., Clustering large graphs via the singular value decomposition, Mach. Learn., 56, 1, 9-33 (2004) · Zbl 1089.68090 · doi:10.1023/B:MACH.0000033113.59016.96
[13] Du, R.; Drake, B.; Park, H., Hybrid clustering based on content and connection structure using joint nonnegative matrix factorization, J. Glob. Optim., 74, 861-877 (2017) · Zbl 1434.15011 · doi:10.1007/s10898-017-0578-x
[14] Du, R.; Kuang, D.; Drake, B.; Park, H., Hierarchical community detection via rank-2 symmetric nonnegative matrix factorization, Computat. Soc. Netw., 4, 1, 7 (2017) · doi:10.1186/s40649-017-0043-5
[15] Dunlavy, DM; Kolda, TG; Acar, E., Temporal link prediction using matrix and tensor factorizations, ACM Trans. Knowl. Discov. Data (TKDD), 5, 2, 1-27 (2011) · doi:10.1145/1921632.1921636
[16] Fortunato, S., Community detection in graphs, Phys. Rep., 486, 3-5, 75-174 (2010) · doi:10.1016/j.physrep.2009.11.002
[17] Gao, H.; Chen, Y.; Lee, K.; Palsetia, D.; Choudhary, AN, Towards online spam filtering in social networks, NDSS, 12, 1-16 (2012) · doi:10.1109/ICDM.2011.124
[18] Gao, J., Liang, F., Fan, W., Wang, C., Sun, Y., Han, J.: On community outliers and their efficient detection in information networks. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 813-822. ACM (2010). doi:10.1145/1835804.1835907
[19] Henderson, K., Gallagher, B., Li, L., Akoglu, L., Eliassi-Rad, T., Tong, H., Faloutsos, C.: It’s who you know: graph mining using recursive structural features. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 663-671. ACM (2011). doi:10.1145/2020408.2020512
[20] Huber, PJ, Robust Statistics (2004), Hoboken: Wiley, Hoboken · Zbl 1276.62022 · doi:10.1002/9780470434697
[21] Kannan, R.; Ballard, G.; Park, H., Mpi-faun: an mpi-based framework for alternating-updating nonnegative matrix factorization, IEEE Trans. Knowl. Data Eng., 30, 3, 544-558 (2018) · doi:10.1109/TKDE.2017.2767592
[22] Kannan, R., Woo, H., Aggarwal, C.C., Park, H.: Outlier detection for text data. In: Proceedings of the 2017 SIAM International Conference on Data Mining, pp. 489-497. SIAM (2017). doi:10.1137/1.9781611974973.55
[23] Kim, J.; He, Y.; Park, H., Algorithms for nonnegative matrix and tensor factorizations: a unified view based on block coordinate descent framework, J. Glob. Optim., 58, 2, 285-319 (2014) · Zbl 1321.90129 · doi:10.1007/s10898-013-0035-4
[24] Kim, J.; Park, H., Fast nonnegative matrix factorization: an active-set-like method and comparisons, SIAM J. Sci. Comput., 33, 6, 3261-3281 (2011) · Zbl 1232.65068 · doi:10.1137/110821172
[25] Kleinberg, JM, Authoritative sources in a hyperlinked environment, J. ACM (JACM), 46, 5, 604-632 (1999) · Zbl 1065.68660 · doi:10.1145/324133.324140
[26] Kuang, D.; Yun, S.; Park, H., Symnmf: nonnegative low-rank approximation of a similarity matrix for graph clustering, J. Glob. Optim., 62, 3, 545-574 (2015) · Zbl 1326.90080 · doi:10.1007/s10898-014-0247-2
[27] Kumar, S., Hooi, B., Makhija, D., Kumar, M., Faloutsos, C., Subrahmanian, V.: Rev2: fraudulent user prediction in rating platforms. In: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 333-341. ACM (2018). doi:10.1145/3159652.3159729
[28] Lee, D.D., Seung, H.S.: (1999) Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788-791. doi:10.1038/44565 · Zbl 1369.68285
[29] Lee, D.D., Seung, H.S.: Algorithms for non-negative matrix factorization. In: Advances in Neural Information Processing Systems, pp. 556-562 (2001). doi:10.5555/3008751.3008829
[30] Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Statistical properties of community structure in large social and information networks. In: Proceedings of the 17th International Conference on World Wide Web, pp. 695-704. ACM (2008). doi:10.1145/1367497.1367591
[31] Leskovec, J., Lang, K.J., Mahoney, M.: Empirical comparison of algorithms for network community detection. In: Proceedings of the 19th International Conference on World Wide Web, pp. 631-640. ACM (2010). doi:10.1145/1772690.1772755
[32] Li, J., Dani, H., Hu, X., Liu, H.: Radar: Residual analysis for anomaly detection in attributed networks. In: IJCAI, pp. 2152-2158 (2017). doi:10.24963/ijcai.2017/299
[33] Liu, N., Huang, X., Hu, X.: Accelerated local anomaly detection via resolving attributed networks. In: Proceedings of the 26th International Joint Conference on Artificial Intelligence (2017). doi:10.24963/ijcai.2017/325
[34] Lu, Q., Getoor, L.: Link-based classification. In: Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 496-503 (2003). doi:10.5555/3041838.3041901
[35] Mahoney, MW; Drineas, P., Cur matrix decompositions for improved data analysis, Proc. Natl. Acad. Sci., 106, 3, 697-702 (2009) · Zbl 1202.68480 · doi:10.1073/pnas.0803205106
[36] McCallum, AK; Nigam, K.; Rennie, J.; Seymore, K., Automating the construction of internet portals with machine learning, Inf. Retrieval, 3, 2, 127-163 (2000) · doi:10.1023/A:1009953814988
[37] Muller, E., Sánchez, P.I., Mulle, Y., Bohm, K.: Ranking outlier nodes in subspaces of attributed graphs. In: 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW), pp. 216-222. IEEE (2013). doi:10.1109/ICDEW.2013.6547453
[38] Peng, Z., Luo, M., Li, J., Liu, H., Zheng, Q.: Anomalous: a joint modeling approach for anomaly detection on attributed networks. In: IJCAI, pp. 3513-3519 (2018). doi:10.5555/3304222.3304256
[39] Pfeiffer III, J.J., Moreno, S., La Fond, T., Neville, J., Gallagher, B.: Attributed graph models: modeling network structure with correlated attributes. In: Proceedings of the 23rd International Conference on World Wide Web, pp. 831-842. ACM (2014). doi:10.1145/2566486.2567993
[40] Revelle, M., Domeniconi, C., Sweeney, M., Johri, A.: Finding community topics and membership in graphs. In: Machine Learning and Knowledge Discovery in Databases. Lecture Notes in Computer Science, vol. 9285, pp. 625-640. Springer (2015). doi:10.1007/978-3-319-23525-7_38
[41] She, Y.; Owen, AB, Outlier detection using nonconvex penalized regression, J. Am. Stat. Assoc., 106, 494, 626-639 (2011) · Zbl 1232.62068 · doi:10.1198/jasa.2011.tm10390
[42] Tong, H., Lin, C.Y.: Non-negative residual matrix factorization with application to graph anomaly detection. In: Proceedings of the 2011 SIAM International Conference on Data Mining, pp. 143-153. SIAM (2011). doi:10.1137/1.9781611972818.13
[43] Von Luxburg, U., A tutorial on spectral clustering, Stat. Comput., 17, 4, 395-416 (2007) · doi:10.1007/s11222-007-9033-z
[44] Wang, G., Xie, S., Liu, B., Philip, S.Y.: Review graph based online store review spammer detection. In: 2011 IEEE 11th International Conference on Data Mining (ICDM), pp. 1242-1247. IEEE (2011)
[45] Whang, JJ; Du, R.; Jung, S.; Lee, G.; Drake, B.; Liu, Q.; Kang, S.; Park, H., Mega: multi-view semi-supervised clustering of hypergraphs, Proc. VLDB Endowment, 13, 5, 698-711 (2020) · doi:10.14778/3377369.3377378
[46] Wright, J., Ganesh, A., Rao, S., Peng, Y., Ma, Y.: Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Advances in Neural Information Processing Systems, pp. 2080-2088 (2009). doi:10.5555/2984093.2984326
[47] Xu, H., Caramanis, C., Sanghavi, S.: Robust PCA via outlier pursuit. In: Advances in Neural Information Processing Systems, pp. 2496-2504 (2010). doi:10.5555/2997046.2997174 · Zbl 1365.62228
[48] Yu, R.; He, X.; Liu, Y., Glad: group anomaly detection in social media analysis, ACM Trans. Knowl. Discov. Data (TKDD), 10, 2, 18 (2015) · doi:10.1145/2811268
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.