Community detection in attributed collaboration network for statisticians. (English) Zbl 07858650

Summary: Scientific collaboration helps to promote the dissemination of knowledge and is essential in breeding innovation. Collaboration network analysis is a useful tool to study researchers’ collaborations. In this work, we collect papers published between 2001 and 2018 in 43 statistical journals and investigate the collaborative trends and patterns. We find that more and more researchers take part in statistical research, and cooperation among them is strengthening. We further construct an attributed collaboration network and extract its core. Community detection is conducted on the core network by using the edge cross-validation (ECV) method and the attributed network clustering algorithm (ANCA). In particular, we extend the ANCA to deal with networks having both categorical and continuous attributes. Influential researchers are identified in each community. Furthermore, two kinds of homophily are revealed in our collaboration network: research topic homophily and spatial proximity homophily. Based on the homophily and transitivity, we can make recommendations for researchers. Finally, we compare ANCA with three other methods and confirm that the combination of nodal attributes and network structure improves the quality of community detection. Our studies show the features of the collaboration among statisticians and present a new perspective to explore researchers.
© 2022 John Wiley & Sons Ltd.
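
The summary's recommendation idea (homophily plus transitivity over nodes carrying both categorical and continuous attributes) can be illustrated with a minimal sketch. This is not the paper's ANCA extension or its recommendation procedure. It is an assumed toy construction: a Gower-style similarity for mixed attributes, and a rule that suggests non-collaborating pairs who share coauthors and have similar attributes. The function names, the `topic` and `lat` attributes, the `min_sim` threshold, and the data are all hypothetical.

```python
from itertools import combinations


def mixed_similarity(a, b, ranges):
    """Gower-style similarity between two attribute dicts (an assumption,
    not the paper's measure). String values score 1 on an exact match and
    0 otherwise; numeric values score 1 - |x - y| / range."""
    scores = []
    for key in a:
        x, y = a[key], b[key]
        if isinstance(x, str):
            scores.append(1.0 if x == y else 0.0)
        else:
            scores.append(1.0 - abs(x - y) / ranges[key])
    return sum(scores) / len(scores)


def recommend(edges, attrs, ranges, min_sim=0.5):
    """Suggest pairs that are not yet coauthors but share at least one
    coauthor (transitivity) and have similar attributes (homophily)."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    suggestions = []
    for u, v in combinations(sorted(neighbors), 2):
        if v in neighbors[u]:
            continue  # already collaborators
        common = neighbors[u] & neighbors[v]
        sim = mixed_similarity(attrs[u], attrs[v], ranges)
        if common and sim >= min_sim:
            suggestions.append((u, v, len(common), round(sim, 3)))
    # Most shared coauthors first, then highest attribute similarity.
    return sorted(suggestions, key=lambda r: (-r[2], -r[3]))


# Hypothetical toy network: one categorical and one continuous attribute.
edges = [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C"), ("C", "E")]
attrs = {
    "A": {"topic": "bayes", "lat": 40.0},
    "B": {"topic": "bayes", "lat": 41.0},
    "C": {"topic": "bayes", "lat": 42.0},
    "D": {"topic": "networks", "lat": 10.0},
    "E": {"topic": "networks", "lat": 12.0},
}
ranges = {"lat": 32.0}  # max minus min of the continuous attribute

recs = recommend(edges, attrs, ranges)
for u, v, shared, sim in recs:
    print(f"{u}-{v}: shared coauthors={shared}, attribute similarity={sim}")
```

In this toy data, A and C are suggested first because they share two coauthors and agree on topic and location, while B and D share two coauthors but are filtered out by their low attribute similarity.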

MSC:

62-XX Statistics

References:

[1] Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent Dirichlet allocation. The Journal of Machine Learning Research, 3, 993-1022. · Zbl 1112.68379
[2] Bondy, J. A., & Murty, U. S. R. (1976). Graph theory with applications, Vol. 290: Macmillan London. · Zbl 1226.05083
[3] Bordons, M., Aparicio, J., González‐Albo, B., & Díaz‐Faes, A. A. (2015). The relationship between the research performance of scientists and their position in co‐authorship networks in three fields. Journal of Informetrics, 9(1), 135-144.
[4] Brunson, J. C., Fassino, S., McInnes, A., Narayan, M., Richardson, B., Franck, C., Ion, P., & Laubenbacher, R. (2014). Evolutionary events in a mathematical sciences research collaboration network. Scientometrics, 99(3), 973-998.
[5] Çavuşoğlu, A., & Türker, I. (2013). Scientific collaboration network of Turkey. Chaos, Solitons & Fractals, 57, 9-18.
[6] Chakraborty, T., Dalmia, A., Mukherjee, A., & Ganguly, N. (2017). Metrics for community analysis: A survey. ACM Computing Surveys (CSUR), 50(4), 1-37.
[7] Chunaev, P. (2020). Community detection in node‐attributed social networks: A survey. Computer Science Review, 37, 100286. · Zbl 1478.91146
[8] Chunaev, P., Gradov, T., & Bochenina, K. (2020). Community detection in node‐attributed social networks: How structure‐attributes correlation affects clustering quality. Procedia Computer Science, 178, 355-364.
[9] Ding, Y. (2011). Scientific collaboration and endorsement: Network analysis of coauthorship and citation networks. Journal of Informetrics, 5(1), 187-203.
[10] Dráždilová, P., Babskova, A., Martinovič, J., Slaninová, K., & Minks, S. (2012). Method for identification of suitable persons in collaborators’ networks. In IFIP International Conference on Computer Information Systems and Industrial Management, Springer, pp. 101-110.
[11] Evans, T. S., Lambiotte, R., & Panzarasa, P. (2011). Community structure and patterns of scientific collaboration in business and management. Scientometrics, 89(1), 381-396.
[12] Falih, I., Grozavu, N., Kanawati, R., & Bennani, Y. (2017). ANCA: Attributed network clustering algorithm. In International Conference on Complex Networks and Their Applications, Springer, pp. 241-252.
[13] Falih, I., Grozavu, N., Kanawati, R., & Bennani, Y. (2018). Community detection in attributed network. In Companion Proceedings of the the Web Conference 2018, pp. 1299-1306.
[14] Freeman, L. C., Borgatti, S. P., & White, D. R. (1991). Centrality in valued graphs: A measure of betweenness based on network flow. Social Networks, 13(2), 141-154.
[15] Gao, T., Pan, R., Wang, S., Yang, Y., & Zhang, Y. (2021). Community detection for statistical citation network by d‐score. Statistics and Its Interface, 14(3), 279-294. · Zbl 07342196
[16] Gao, Y., Zhang, Z., Lin, H., Zhao, X., Du, S., & Zou, C. (2022). Hypergraph learning: Methods and practices. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(5), 2548-2566.
[17] Gel, Y. R., Lyubchich, V., & Ramirez, L. L. R. (2017). Bootstrap quantification of estimation uncertainties in network degree distributions. Scientific Reports, 7(1), 1-12.
[18] Ghoshdastidar, D., & Dukkipati, A. (2015). A provable generalized tensor spectral method for uniform hypergraph partitioning. In International Conference on Machine Learning, PMLR, pp. 400-409.
[19] Girvan, M., & Newman, M. E. J. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821-7826. · Zbl 1032.91716
[20] González‐Alcaide, G., Aleixandre‐Benavent, R., Navarro‐Molina, C., & Valderrama‐Zurián, J. C. (2008). Coauthorship networks and institutional collaboration patterns in reproductive biology. Fertility and Sterility, 90(4), 941-956.
[21] Gui, Q., Liu, C., & Du, D. (2019). The structure and dynamic of scientific collaboration network among countries along the belt and road. Sustainability, 11(19), 5187.
[22] Han, H., Xu, S., Gui, J., Qiao, X., Zhu, L., & Zhang, H. (2014). Uncovering research topics of academic communities of scientific collaboration network. International Journal of Distributed Sensor Networks, 10(4), 529842.
[23] Ji, P., & Jin, J. (2016). Coauthorship and citation networks for statisticians. The Annals of Applied Statistics, 10(4), 1779-1812. · Zbl 1454.62541
[24] Ji, P., Jin, J., Ke, Z. T., & Li, W. (2022). Co‐citation and co‐authorship networks of statisticians. Journal of Business & Economic Statistics, 40(2), 469-485.
[25] Jin, J. (2015). Fast community detection by SCORE. The Annals of Statistics, 43(1), 57-89. · Zbl 1310.62076
[26] Kossinets, G., & Watts, D. J. (2009). Origins of homophily in an evolving social network. American Journal of Sociology, 115(2), 405-450.
[27] Li, T., Levina, E., & Zhu, J. (2020). Network cross‐validation by edge sampling. Biometrika, 107(2), 257-276. · Zbl 1441.62049
[28] Ma, Z., Ma, Z., & Yuan, H. (2020). Universal latent space model fitting for large networks with edge covariates. Journal of Machine Learning Research, 21(4), 1-67. · Zbl 1497.68432
[29] McPherson, M., Smith‐Lovin, L., & Cook, J. M. (2001). Birds of a feather: Homophily in social networks. Annual Review of Sociology, 27(1), 415-444.
[30] Mohammadamin, E., Ali, R. V., & Abrizah, A. (2017). Co‐authorship network of scientometrics research collaboration. Malaysian Journal of Library & Information Science, 17(3), 73-93.
[31] Moody, J. (2004). The structure of a social science collaboration network: Disciplinary cohesion from 1963 to 1999. American Sociological Review, 69(2), 213-238.
[32] Newman, M. E. J. (2004). Coauthorship networks and patterns of scientific collaboration. Proceedings of the National Academy of Sciences, 101(suppl 1), 5200-5205.
[33] Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74(3), 036104.
[34] Newman, M. E. J., & Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69(2), 026113.
[35] Parés, F., Gasulla, D. G., Vilalta, A., Moreno, J., Ayguadé, E., Labarta, J., Cortés, U., & Suzumura, T. (2017). Fluid communities: A competitive, scalable and diverse community detection algorithm. In International Conference on Complex Networks and Their Applications, Springer, pp. 229-240.
[36] Qing, H., & Wang, J. (2020). An improved spectral clustering method for mixed membership community detection. arXiv preprint arXiv:2012.04867.
[37] Rodriguez, M. A., & Pepe, A. (2008). On the relationship between the structural and socioacademic communities of a coauthorship network. Journal of Informetrics, 2(3), 195-201.
[38] Ruan, Y., Fuhry, D., & Parthasarathy, S. (2013). Efficient community detection in large networks using content and links. In Proceedings of the 22nd International Conference on World Wide Web, pp. 1089-1098.
[39] Saldana, D. F., Yu, Y., & Feng, Y. (2017). How many communities are there? Journal of Computational and Graphical Statistics, 26(1), 171-181.
[40] Sampaio, R. B., Fonseca, M. V. D. A., & Zicker, F. (2016). Co‐authorship network analysis in health research: method and potential use. Health Research Policy and Systems, 14(1), 1-10.
[41] Savić, M., Ivanović, M., Radovanović, M., Ognjanović, Z., Pejović, A., & Krüger, T. J. (2014). Exploratory analysis of communities in co‐authorship networks: A case study. In International Conference on ICT Innovations, Springer, pp. 55-64.
[42] Signorelli, M., & Wit, E. C. (2018). A penalized inference approach to stochastic block modelling of community structure in the Italian parliament. Journal of the Royal Statistical Society: Series C (Applied Statistics), 67(2), 355-369.
[43] Steinhaeuser, K., & Chawla, N. V. (2008). Community detection in a large real‐world social network, Social computing, behavioral modeling, and prediction: Springer, pp. 168-175.
[44] Sun, L., & Rahwan, I. (2017). Coauthorship network in transportation research. Transportation Research Part A: Policy and Practice, 100, 135-151.
[45] Wang, S., & Rohe, K. (2016). Discussion of “coauthorship and citation networks for statisticians”. The Annals of Applied Statistics, 10(4), 1820-1826. · Zbl 1454.62548
[46] Watts, D. J., & Strogatz, S. H. (1998). Collective dynamics of ‘small‐world’ networks. Nature, 393(6684), 440-442. · Zbl 1368.05139
[47] Weng, H., & Feng, Y. (2022). Community detection with nodal information: likelihood and its variational approximation. Stat, 11(1), e428. · Zbl 07853566
[48] Xu, Z., Ke, Y., Wang, Y., Cheng, H., & Cheng, J. (2012). A model‐based approach to attributed graph clustering. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, pp. 505-516.
[49] Yan, B., & Sarkar, P. (2021). Covariate regularized community detection in sparse graphs. Journal of the American Statistical Association, 116(534), 734-745. · Zbl 1464.62333
[50] Yang, J., McAuley, J., & Leskovec, J. (2013). Community detection in networks with node attributes. In 2013 IEEE 13th International Conference on Data Mining, IEEE, pp. 1151-1156.
[51] Yao, L., Wang, L., Pan, L., & Yao, K. (2016). Link prediction based on common‐neighbors for dynamic social network. Procedia Computer Science, 83, 82-89.
[52] Yuan, M., Liu, R., Feng, Y., & Shang, Z. (2022). Testing community structure for hypergraphs. The Annals of Statistics, 50(1), 147-169. · Zbl 1486.62179
[53] Zhang, C., Bu, Y., Ding, Y., & Xu, J. (2018). Understanding scientific collaboration: Homophily, transitivity, and preferential attachment. Journal of the Association for Information Science and Technology, 69(1), 72-86.
[54] Zhang, X., Xu, G., & Zhu, J. (2021). Joint latent space models for network data with high‐dimensional node variables. Biometrika, 109(3), 707-720. · Zbl 07582647
[55] Zhen, Y., & Wang, J. (2022). Community detection in general hypergraph via graph embedding. Journal of the American Statistical Association, 1-10. https://doi.org/10.1080/01621459.2021.2002157 · Zbl 07751795
[56] Zhou, Y., Cheng, H., & Yu, J. X. (2009). Graph clustering based on structural/attribute similarities. Proceedings of the VLDB Endowment, 2(1), 718-729.
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.