×

A topological approach for capturing high-order interactions in graph data with applications to anomaly detection in time-varying cryptocurrency transaction graphs. (English) Zbl 07927553

Summary: Time-varying graphs are increasingly common in financial, social and biological data analysis applications. Feature extraction that efficiently encodes the complex structure of sparse, multi-layered, dynamic graphs presents computational and methodological challenges. In the past decade, topological data analysis has become a popular method of studying the shape of data. This is achieved by building an increasing sequence of simplicial complexes (called filtration) indexed by a scale parameter on top of the data to keep track of topological changes along with the filtration. This multi-scale summary, called persistence diagram (PD), is often vectorized to be used in machine learning algorithms. This paper introduces a topological approach to extract information on higher-order interactions encoded in persistence diagrams from graph data. Our framework has two main steps: first, we convert the graph into a higher-dimensional simplicial complex by adding structures such as triangles, tetrahedrons etc., and compute a PD using the so-called lower-star filtration which utilizes quantitative node attributes. Then, we vectorize the PD by averaging the associated Betti function over successive scale values of a one-dimensional grid using integration. A notable aspect of our procedure is that it avoids embedding a graph into a metric space. We show that the proposed vectorization summary is robust against input noise with respect to the \(L_1\) 1-Wasserstein distance. In simulation studies, the proposed approach leads to improved change point detection rates and outperforms one of the state-of-the-art methods for anomaly detection in time-varying graphs. In real data application, our approach leads to up to a 20% gain in anomalous price prediction in the Ethereum cryptocurrency transaction network.

MSC:

55N31 Persistent homology and applications, topological data analysis
62R40 Topological data analysis
68T09 Computational aspects of data analysis and big data
Full Text: DOI

References:

[1] N. C. Abay, C. G. Akcora, Y. R. Gel, M. Kantarcioglu, U. D. Islambekov, Y. Tian and B. Thuraisingham, Chainnet: Learning on blockchain graphs with topological features, in 2019 IEEE international conference on data mining (ICDM), IEEE, 2019, 946-951. doi: 10.1109/ICDM.2019.00105. · doi:10.1109/ICDM.2019.00105
[2] H. Adams, T. Emerson, M. Kirby, R. Neville, C. Peterson, P. Shipman, S. Chepushtanova, E. Hanson, F. Motta and L. Ziegelmeier, Persistence images: A stable vector representation of persistent homology, The Journal of Machine Learning Research, 18 (2017), Paper No. 8, 35 pp. · Zbl 1431.68105
[3] P. Bubenik, Statistical topological data analysis using persistence landscapes, The Journal of Machine Learning Research, 16 (2015), 77-102. · Zbl 1337.68221
[4] P. Bubenik and T. Vergili, Topological spaces of persistence modules and their properties, Journal of Applied and Computational Topology, 2 (2018), 233-269. doi: 10.1007/s41468-018-0022-4. · Zbl 1423.55012 · doi:10.1007/s41468-018-0022-4
[5] G. Carlsson, Topology and data, Bulletin of the American Mathematical Society, 46 (2009), 255-308. doi: 10.1090/S0273-0979-09-01249-X. · Zbl 1172.62002 · doi:10.1090/S0273-0979-09-01249-X
[6] M. Carrière, F. Chazal, Y. Ike, T. Lacombe, M. Royer and Y. Umeda, Perslay: A neural network layer for persistence diagrams and new graph topological signatures, in International Conference on Artificial Intelligence and Statistics, PMLR, 2020, 2786-2796.
[7] M. Carrière and R. Rabadán, Topological data analysis of single-cell Hi-C contact maps, in Topological Data Analysis-the Abel Symposium 2018, Springer, Cham, Abel Symp., 15 (2020), 147-162. doi: 10.1007/978-3-030-43408-3_6. · Zbl 1448.62213 · doi:10.1007/978-3-030-43408-3_6
[8] F. Chazal, V. De Silva and S. Oudot, Persistence stability for geometric complexes, Geometriae Dedicata, 173 (2014), 193-214. doi: 10.1007/s10711-013-9937-z. · Zbl 1320.55003 · doi:10.1007/s10711-013-9937-z
[9] F. Chazal and B. Michel, An introduction to topological data analysis: Fundamental and practical aspects for data scientists, Frontiers in Artificial Intelligence, 4 (2021). doi: 10.3389/frai.2021.667963. · doi:10.3389/frai.2021.667963
[10] Y.-M. Chung and A. Lawson, Persistence curves: A canonical framework for summarizing persistence diagrams, Advances in Computational Mathematics, 48 (2022), Paper No. 6, 42 pp. doi: 10.1007/s10444-021-09893-4. · Zbl 1497.55008 · doi:10.1007/s10444-021-09893-4
[11] D. Cohen-Steiner, H. Edelsbrunner and J. Harer, Stability of persistence diagrams, Discrete & Computational Geometry, 37 (2007), 103-120. doi: 10.1007/s00454-006-1276-5. · Zbl 1117.54027 · doi:10.1007/s00454-006-1276-5
[12] D. Cohen-Steiner, H. Edelsbrunner, J. Harer and Y. Mileyko, Lipschitz functions have l p-stable persistence, Foundations of Computational Mathematics, 10 (2010), 127-139. doi: 10.1007/s10208-010-9060-6. · Zbl 1192.55007 · doi:10.1007/s10208-010-9060-6
[13] G. Csardi and T. Nepusz, The igraph software package for complex network research, InterJournal, Complex Systems, 1695 (2006), 1-9.
[14] H. Edelsbrunner and J. Harer, Computational Topology: An Introduction, American Mathematical Soc., Providence, RI, 2010. doi: 10.1090/mbk/069. · Zbl 1193.55001 · doi:10.1090/mbk/069
[15] H. Edelsbrunner, J. Harer et al., Persistent homology-a survey, Surveys on Discrete and Computational Geometry, Contemporary Mathematics, American Mathematical Society, Providence, RI, 453 (2008), 257-282. doi: 10.1090/conm/453/08802. · Zbl 1145.55007 · doi:10.1090/conm/453/08802
[16] H. Edelsbrunner and J. L. Harer, Computational Topology: An Introduction, American Mathematical Society, Providence, RI, 2010. doi: 10.1090/mbk/069. · Zbl 1193.55001 · doi:10.1090/mbk/069
[17] B. T. Fasy, J. Kim, F. Lecci and C. Maria, Introduction to the R package TDA, arXiv preprint, arXiv: 1411.1830.
[18] B. T. Fasy, J. Kim, F. Lecci, C. Maria, D. L. Millman and V. Rouvreau., TDA: Statistical Tools for Topological Data Analysis, 2021, URL https://CRAN.R-project.org/package=TDA, R package version 1.7.7.
[19] R. Ghrist, Barcodes: the persistent topology of data, Bulletin of the American Mathematical Society, 45 (2008), 61-75. doi: 10.1090/S0273-0979-07-01191-3. · Zbl 1391.55005 · doi:10.1090/S0273-0979-07-01191-3
[20] M. Gidea, Topology data analysis of critical transitions in financial networks, arXiv preprint arXiv: 1701.06081. doi: 10.2139/ssrn.2903278. · doi:10.2139/ssrn.2903278
[21] M. Gidea, D. Goldsmith, Y. Katz, P. Roldan and Y. Shmalo, Topological recognition of critical transitions in time series of cryptocurrencies, Physica A: Statistical Mechanics and Its Applications, 548 (2020), 123843. doi: 10.1016/j.physa.2019.123843. · Zbl 07530523 · doi:10.1016/j.physa.2019.123843
[22] M. Hajij, B. Wang, C. Scheidegger and P. Rosen, Persistent homology guided exploration of time-varying graphs, arXiv preprint, arXiv: 1707.06683.
[23] P. W. Holland, K. B. Laskey and S. Leinhardt, Stochastic blockmodels: First steps, Social Networks, 5 (1983), 109-137. doi: 10.1016/0378-8733(83)90021-7. · doi:10.1016/0378-8733(83)90021-7
[24] S. Huang, Y. Hitti, G. Rabusseau and R. Rabbany, Laplacian change point detection for dynamic graphs, in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2020, 349-358. doi: 10.1145/3394486.3403077. · doi:10.1145/3394486.3403077
[25] U. Islambekov and A. Luchinsky, TDAvec: Vector Summaries of Persistence Diagrams, 2022, URL https://CRAN.R-project.org/package=TDAvec, R package version 0.1.2.
[26] U. Islambekov and H. Pathirana, Vector summaries of persistence diagrams for permutation-based hypothesis testing, Foundations of Data Science, 6 (2024), 41-61. doi: 10.3934/fods.2024002. · Zbl 07927536 · doi:10.3934/fods.2024002
[27] M. Kerber, D. Morozov and A. Nigmetov, Geometry helps to compare persistence diagrams, ACM J. Exp. Algorithmics, 22 (2017), Art. 1.4, 20 pp. doi: 10.1145/3064175. · Zbl 1414.68129 · doi:10.1145/3064175
[28] M. Kraetzl, C. Nickel and E. R. Scheinerman, Random dot product graphs: A model for social networks, Preliminary manuscript.
[29] H. W. Kuhn, The hungarian method for the assignment problem, Naval Research Logistics Quarterly, 2 (1955), 83-97. doi: 10.1002/nav.3800020109. · Zbl 0143.41905 · doi:10.1002/nav.3800020109
[30] M. Z. Li, M. S. Ryerson and H. Balakrishnan, Topological data analysis for aviation applications, Transportation Research Part E: Logistics and Transportation Review, 128 (2019), 149-174. doi: 10.1016/j.tre.2019.05.017. · doi:10.1016/j.tre.2019.05.017
[31] Y. Li, U. Islambekov, C. Akcora, E. Smirnova, Y. R. Gel and M. Kantarcioglu, Dissecting ethereum blockchain analytics: What we learn from topology and geometry of the ethereum graph?, in Proceedings of the 2020 SIAM International Conference on Data Mining, SIAM, 2020, 523-531. doi: 10.1137/1.9781611976236.59. · doi:10.1137/1.9781611976236.59
[32] S. López-Pintado and J. Romo, On the concept of depth for functional data, Journal of the American statistical Association, 104 (2009), 718-734. doi: 10.1198/jasa.2009.0108. · Zbl 1388.62139 · doi:10.1198/jasa.2009.0108
[33] D. S. Matteson and N. A. James, A nonparametric approach for multiple change point analysis of multivariate data, Journal of the American Statistical Association, 109 (2014), 334-345. doi: 10.1080/01621459.2013.849605. · Zbl 1367.62260 · doi:10.1080/01621459.2013.849605
[34] Y. Mileyko, S. Mukherjee and J. Harer, Probability measures on the space of persistence diagrams, Inverse Problems, 27 (2011), 124007. doi: 10.1088/0266-5611/27/12/124007. · Zbl 1247.68310 · doi:10.1088/0266-5611/27/12/124007
[35] V. Nanda and R. Sazdanović, Simplicial models and topological inference in biological systems, in Discrete and Topological Models in Molecular Biology, Springer, 2014, 109-141. doi: 10.1007/978-3-642-40193-0_6. · Zbl 1290.92005 · doi:10.1007/978-3-642-40193-0_6
[36] S. W. Park, Y. Y. Choi, D. Joe, U. J. Choi and Y. Woo, The pwlr graph representation: A persistent weisfeiler-lehman scheme with random walks for graph classification, in Topological, Algebraic and Geometric Learning Workshops 2022, PMLR, 2022, 287-297.
[37] J. A. Pike, A. O. Khan, C. Pallini, S. G. Thomas, M. Mund, J. Ries, N. S. Poulter and I. B. Styles, Topological data analysis quantifies biological nano-structure from single molecule localization microscopy, Bioinformatics, 36 (2020), 1614-1621. doi: 10.1093/bioinformatics/btz788. · doi:10.1093/bioinformatics/btz788
[38] B. B. F. Pontiveros, M. Steichen and R. State, Mint centrality: A centrality measure for the bitcoin transaction graph, in 2019 IEEE International Conference on Blockchain and Cryptocurrency (ICBC), IEEE, 2019, 159-162. doi: 10.1109/BLOC.2019.8751401. · doi:10.1109/BLOC.2019.8751401
[39] T. Qaiser, Y.-W. Tsang, D. Taniyama, N. Sakamoto, K. Nakane, D. Epstein and N. Rajpoot, Fast and accurate tumor segmentation of histology images using persistent homology and deep convolutional features, Medical Image Analysis, 55 (2019), 1-14. doi: 10.1016/j.media.2019.03.014. · doi:10.1016/j.media.2019.03.014
[40] R Core Team, R: A Language and Environment for Statistical Computing, R Foundation for Statistical Computing, Vienna, Austria, 2021, URL https://www.R-project.org/.
[41] B. Rieck, C. Bock and K. Borgwardt, A persistent weisfeiler-lehman procedure for graph classification, in International Conference on Machine Learning, PMLR, 2019, 5448-5458.
[42] B. Rieck, U. Fugacci, J. Lukasczyk and H. Leitte, Clique community persistence: A topological visual analysis approach for complex networks, IEEE Transactions on Visualization and Computer Graphics, 24 (2018), 822-831. doi: 10.1109/TVCG.2017.2744321. · doi:10.1109/TVCG.2017.2744321
[43] S. Rudkin, W. Rudkin and P. Dłotko, On the topology of cryptocurrency markets, International Review of Financial Analysis, 89 (2023), 102759. doi: 10.1016/j.irfa.2023.102759. · doi:10.1016/j.irfa.2023.102759
[44] K. Shamsi, F. Victor, M. Kantarcioglu, Y. Gel and C. G. Akcora, Chartalist: Labeled graph datasets for utxo and account-based blockchains, Advances in Neural Information Processing Systems, 35 (2022), 34926-34939.
[45] A. D. Smith, P. Dłotko and V. M. Zavala, Topological data analysis: concepts, computation, and applications in chemical engineering, Computers & Chemical Engineering, 146 (2021), 107202. doi: 10.1016/j.compchemeng.2020.107202. · doi:10.1016/j.compchemeng.2020.107202
[46] Y. Umeda, Time series classification via topological data analysis, Information and Media Technologies, 32 (2017), 1-12. doi: 10.1527/tjsai.D-G72. · doi:10.1527/tjsai.D-G72
[47] F. Victor, Address clustering heuristics for ethereum, in International Conference on Financial Cryptography and Data Security, Springer, 12059 (2020), 617-633. doi: 10.1007/978-3-030-51280-4_33. · doi:10.1007/978-3-030-51280-4_33
[48] F. Victor, C. G. Akcora, Y. R. Gel and M. Kantarcioglu, Alphacore: Data depth based core decomposition, in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, 2021, 1625-1633. doi: 10.1145/3447548.3467322. · doi:10.1145/3447548.3467322
[49] G. Wood, Ethereum: A secure decentralised generalised transaction ledger, Ethereum Project Yellow Paper, 151 (2014), 1-32.
[50] Q. Zhao and Y. Wang, Learning metrics for persistence-based summaries and applications for graph classification, Advances in Neural Information Processing Systems, 32.
[51] A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete & Computational Geometry, 33 (2005), 249-274. doi: 10.1007/s00454-004-1146-y. · Zbl 1069.55003 · doi:10.1007/s00454-004-1146-y
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.