Power laws, the Price model, and the Pareto type-2 distribution. (English) Zbl 07605486

Summary: We consider a version of D. Price’s model for the growth of a bibliographic network in which, at each iteration, a constant number of citations is randomly allocated according to a weighted combination of the accidental (uniformly distributed) and the preferential (rich-get-richer) rules. Instead of relying on the typical master equation approach, we formulate and solve this problem in terms of the rank-size distribution. We show that, asymptotically, such a process leads to a Pareto type-2 distribution with a new, appealingly interpretable parametrisation. We prove that the solution to the Price model expressed in terms of the rank-size distribution coincides with the expected values of order statistics in an independent Paretian sample. An empirical analysis of a large repository of academic papers yields a good fit not only in the tail of the distribution (as is usually the case in the power-law-like framework), but also across a significantly larger fraction of the data domain.
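
The allocation rule described in the summary can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the authors' formulation: the function name `simulate_price_mixture` and the parameters `n_papers`, `m` (citations per iteration) and `rho` (weight of the preferential rule) are hypothetical, each citation is assigned independently, a newly added paper may receive citations in its own iteration, and the preferential rule falls back to the uniform one while no citations exist yet.

```python
import random


def simulate_price_mixture(n_papers, m, rho, seed=0):
    """Toy mixture of accidental and preferential citation allocation.

    Assumptions (not taken from the paper): one paper is added per
    iteration, then m citations are handed out one at a time; each is
    preferential with probability rho, otherwise accidental (uniform).
    Returns the citation counts sorted in non-increasing order,
    i.e. an empirical rank-size vector.
    """
    rng = random.Random(seed)
    citations = []   # citations[i] = current citation count of paper i
    cited_log = []   # one entry (paper index) per citation given so far

    for _ in range(n_papers):
        citations.append(0)  # a new, uncited paper enters the network
        for _ in range(m):
            if cited_log and rng.random() < rho:
                # Preferential rule: a uniform draw from the log picks
                # paper i with probability proportional to citations[i].
                target = rng.choice(cited_log)
            else:
                # Accidental rule: every existing paper is equally likely.
                target = rng.randrange(len(citations))
            citations[target] += 1
            cited_log.append(target)

    return sorted(citations, reverse=True)


if __name__ == "__main__":
    ranked = simulate_price_mixture(n_papers=1000, m=5, rho=0.5, seed=42)
    print("top 10 citation counts:", ranked[:10])
```

Under these assumptions, values of `rho` closer to 1 concentrate citations on fewer papers, whereas `rho = 0` reduces the process to purely uniform allocation. The sorted output is the quantity one would compare against a fitted rank-size curve.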

MSC:

82-XX Statistical mechanics, structure of matter

Software:

ArnetMiner; plfit

References:

[1] Thelwall, M.; Sud, P., Do new research issues attract more citations? A comparison between 25 Scopus subject categories, J. Assoc. Inf. Sci. Technol., 72, 269-279 (2021)
[2] Lyu, D.; Ruan, X.; Xie, J.; Cheng, Y., The classification of citing motivations: A meta-synthesis, Scientometrics, 126, 3243-3264 (2021)
[3] Gagolewski, M., Scientific impact assessment cannot be fair, J. Informetr., 7, 792-802 (2013)
[4] Dorogovtsev, S. N.; Mendes, J. F., Ranking scientists, Nat. Phys., 11, 882-883 (2015)
[5] Price, D., Little Science, Big Science (1963), Columbia Univ. Press: New York
[6] Merton, R. K., The Matthew effect in science, Science, 159, 56-63 (1968)
[7] Perc, M., The Matthew effect in empirical data, J. R. Soc. Interface, 11, Article 20140378 pp. (2014)
[8] Ionescu, G.; Chopard, B., An agent-based model for the bibliometric h-index, Eur. Phys. J. B, 86, 426 (2013) · Zbl 1515.91122
[9] Żogała-Siudem, B.; Siudem, G.; Cena, A.; Gagolewski, M., Agent-based model for the h-index – exact solution, Eur. Phys. J. B, 21 (2016)
[10] Sinatra, R.; Wang, D.; Deville, P.; Song, C.; Barabási, A. L., Quantifying the evolution of individual scientific impact, Science, 354 (2016)
[11] Janosov, M.; Battiston, F.; Sinatra, R., Success and luck in creative careers, EPJ Data Sci., 9, 9 (2020)
[12] Pluchino, A.; Biondo, A. E.; Rapisarda, A., Talent versus luck: The role of randomness in success and failure, Adv. Complex Syst., 21, Article 1850014 pp. (2018) · Zbl 07825206
[13] Pluchino, A.; Burgio, G.; Rapisarda, A.; Biondo, A. E.; Pulvirenti, A.; Ferro, A.; Giorgino, T., Exploring the role of interdisciplinarity in physics: Success, talent and luck, PLoS One, 14, Article e0218793 pp. (2019)
[14] Heesen, R., Academic superstars: Competent or lucky?, Synthese, 194, 4499-4518 (2017) · Zbl 1382.03032
[15] Kharel, S. R.; Mezei, T. R.; Chung, S.; Erdős, P. L.; Toroczkai, Z., Degree-preserving network growth, Nat. Phys., 1-7 (2021)
[16] Golosovsky, M.; Solomon, S., Growing complex network of citations of scientific papers: Modeling and measurements, Phys. Rev. E, 95, Article 012324 pp. (2017)
[17] Steinbock, C.; Biham, O.; Katzav, E., Distribution of shortest path lengths in a class of node duplication network models, Phys. Rev. E, 96, Article 032301 pp. (2017)
[18] Golosovsky, M., Mechanisms of complex network growth: Synthesis of the preferential attachment and fitness models, Phys. Rev. E, 97, Article 062310 pp. (2018)
[19] Steinbock, C.; Biham, O.; Katzav, E., Analytical results for the distribution of shortest path lengths in directed random networks that grow by node duplication, Eur. Phys. J. B, 92, 1-16 (2019) · Zbl 1515.05169
[20] Steinbock, C.; Biham, O.; Katzav, E., Analytical results for the in-degree and out-degree distributions of directed random networks that grow by node duplication, J. Stat. Mech. Theory Exp., 2019, Article 083403 pp. (2019) · Zbl 1457.90047
[21] Price, D., A general theory of bibliometric and other cumulative advantage processes, J. Am. Soc. Inf. Sci., 27, 292-306 (1976)
[22] Newman, M., Networks (2018), Oxford University Press · Zbl 1391.94006
[23] Liu, Z.; Lai, Y. C.; Ye, N.; Dasgupta, P., Connectivity distribution and attack tolerance of general networks with both preferential and random attachments, Phys. Lett. A, 303, 337-344 (2002) · Zbl 0999.82055
[24] Bedogne, C.; Rodgers, G. J., Complex growing networks with intrinsic vertex fitness, Phys. Rev. E, 74, Article 046115 pp. (2006)
[25] Shao, Z. G.; Zou, X. W.; Tan, Z. J.; Jin, Z. Z., Growing networks with mixed attachment mechanisms, J. Phys. A: Math. Gen., 39, 2035 (2006) · Zbl 1082.92001
[26] Peterson, G. J.; Pressé, S.; Dill, K. A., Nonuniversal power law scaling in the probability distribution of scientific citations, Proc. Natl. Acad. Sci., 107, 16023-16027 (2010)
[27] Néda, Z.; Varga, L.; Biró, T. S., Science and Facebook: The same popularity law!, PLoS One, 12, 1-11 (2017)
[28] Siudem, G.; Żogała-Siudem, B.; Cena, A.; Gagolewski, M., Three dimensions of scientific impact, Proc. Natl. Acad. Sci., 117, 13896-13900 (2020)
[29] Evans, T. S.; Calmon, L.; Vasiliauskaite, V., The longest path in the Price model, Sci. Rep., 10, 1-9 (2020)
[30] Dorogovtsev, S. N.; Mendes, J. F.F.; Samukhin, A. N., Structure of growing networks with preferential linking, Phys. Rev. Lett., 85, 4633-4636 (2000)
[31] Arnold, B. C., Pareto Distributions (2015), Chapman and Hall/CRC: New York, NY, USA · Zbl 1361.62004
[32] Bourguignon, M.; Saulo, H.; Fernandez, R. N., A new Pareto-type distribution with applications in reliability and income data, Physica A, 457, 166-175 (2016) · Zbl 1400.60015
[33] Figueira, F. C.; Moura, N. J.; Ribeiro, M. B., The Gompertz-Pareto income distribution, Physica A, 390, 689-698 (2011)
[34] Tang, J.; Zhang, J.; Yao, L.; Li, J.; Zhang, L.; Su, Z., ArnetMiner: Extraction and mining of academic social networks, (Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2008)), 990-998
[35] Gautschi, W., Some elementary inequalities relating to the gamma and incomplete gamma function, J. Math. Phys., 38, 77-81 (1959) · Zbl 0094.04104
[36] Billingsley, P., Probability and Measure (1995), Wiley · Zbl 0822.60002
[37] David, H. A.; Nagaraja, H. N., Order Statistics (2003), Wiley · Zbl 1053.62060
[38] Newman, M. E., Power laws, Pareto distributions and Zipf’s law, Contemp. Phys., 46, 323-351 (2005)
[39] Beirlant, J.; Goegebeur, Y.; Teugels, J.; Segers, J., Statistics of Extremes: Theory and Applications (2004), Wiley · Zbl 1070.62036
[40] Clauset, A.; Shalizi, C. R.; Newman, M. E., Power-law distributions in empirical data, SIAM Rev., 51, 661-703 (2009) · Zbl 1176.62001
[41] Zhang, J.; Stephens, M. A., A new and efficient estimation method for the generalized Pareto distribution, Technometrics, 51, 316-325 (2009)
[42] Luceno, A., Fitting the generalized Pareto distribution to data using maximum goodness-of-fit estimators, Comput. Statist. Data Anal., 1, 904-917 (2006) · Zbl 1157.62399
[43] Bermudez, P. Z.; Kotz, S., Parameter estimation of the generalized Pareto distribution. Part II, J. Statist. Plann. Inference, 140, 1374-1388 (2010) · Zbl 1190.62039
[44] Zhang, J., Improving on estimation for the generalized Pareto distribution, Technometrics, 52, 335-339 (2010)
[45] Schubert, A.; Glänzel, W., A systematic analysis of Hirsch-type indices for journals, J. Informetr., 1, 179-184 (2007)
[46] Glänzel, W., On the h-index – a mathematical approach to a new measure of publication activity and citation impact, Scientometrics, 67, 315-321 (2006)
[47] Gagolewski, M., Statistical hypothesis test for the difference between Hirsch indices of two Pareto-distributed random samples, (Kruse, R.; etal., Synergies of Soft Computing and Statistics for Intelligent Data Analysis, Advances in Intelligent Systems and Computing, vol. 190 (2013), Springer), 359-367 · Zbl 1347.62034
[48] Gagolewski, M., Sugeno integral-based confidence intervals for the theoretical h-index, (Grzegorzewski, P.; etal., Strengthening Links Between Data Analysis and Soft Computing, Advances in Intelligent Systems and Computing, vol. 315 (2015), Springer), 233-240 · Zbl 1337.62389
[49] Arnold, N. A.; Mondragón, R. J.; Clegg, R. G., Likelihood-based approach to discriminate mixtures of network models that vary in time, Sci. Rep., 11, 1-13 (2021)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases these data have been complemented or enhanced by data from zbMATH Open. The list attempts to reflect the references in the original paper as accurately as possible, without claiming completeness or a perfect matching.