The minimum description length principle for pattern mining: a survey. (English) Zbl 1509.68240

Summary: Mining patterns is a core task in data analysis and, beyond issues of efficient enumeration, the selection of patterns constitutes a major challenge. The Minimum Description Length (MDL) principle, a model selection method grounded in information theory, has been applied to pattern mining with the aim of obtaining compact, high-quality sets of patterns. After giving an outline of relevant concepts from information theory and coding, we review MDL-based methods for mining different kinds of patterns from various types of data. Finally, we open a discussion on some issues regarding these methods.

MSC:

68T09 Computational aspects of data analysis and big data
68P30 Coding and information theory (compaction, compression, models of communication, encoding schemes, etc.) (aspects in computer science)
68T10 Pattern recognition, speech recognition
68-02 Research exposition (monographs, survey articles) pertaining to computer science

References:

[1] Adriaens, F.; Lijffijt, J.; De Bie, T., Subjectively interesting connecting trees and forests, Data Min Knowl Disc, 33, 4, 1088-1124 (2019) · Zbl 1458.68155 · doi:10.1007/s10618-019-00627-1
[2] Agrawal R, Srikant R (1994) Fast algorithms for mining association rules. In: Proceedings of 20th International Conference on Very Large Data Bases, VLDB’94, Morgan Kaufmann, pp 487-499
[3] Agrawal, R.; Imieliński, T.; Swami, A., Mining association rules between sets of items in large databases, ACM SIGMOD Rec, 22, 2, 207-216 (1993) · doi:10.1145/170036.170072
[4] Akoglu L, Tong H, Meeder B, Faloutsos C (2012a) PICS: Parameter-free identification of cohesive subgroups in large attributed graphs. In: Proceedings of the 2012 SIAM International Conference on Data Mining, SDM’12, SIAM, pp 439-450, doi:10.1137/1.9781611972825.38
[5] Akoglu L, Tong H, Vreeken J, Faloutsos C (2012b) Fast and reliable anomaly detection in categorical data. In: Proceedings of the 21st ACM international conference on Information and knowledge management, CIKM’12, ACM, pp 415-424, doi:10.1145/2396761.2396816
[6] Akoglu L, Chau DH, Vreeken J, Tatti N, Tong H, Faloutsos C (2013) Mining connection pathways for marked nodes in large graphs. In: Proceedings of the 2013 SIAM International Conference on Data Mining, SDM’13, SIAM, pp 37-45, doi:10.1137/1.9781611972832.5
[7] Anderson, EC; Novembre, J., Finding haplotype block boundaries by using the minimum-description-length principle, Am J Hum Genet, 73, 2, 336-354 (2003) · doi:10.1086/377106
[8] Aoga JOR, Guns T, Nijssen S, Schaus P (2018) Finding probabilistic rule lists using the minimum description length principle. In: Proceedings of the International Conference on Discovery Science, DS’18, Springer, pp 66-82, doi:10.1007/978-3-030-01771-2_5
[9] Araujo M, Günnemann S, Mateos G, Faloutsos C (2014a) Beyond blocks: Hyperbolic community detection. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’14, Springer, pp 50-65, doi:10.1007/978-3-662-44848-9_4
[10] Araujo M, Papadimitriou S, Günnemann S, Faloutsos C, Basu P, Swami A, Papalexakis EE, Koutra D (2014b) Com2: Fast automatic discovery of temporal (‘comet’) communities. In: Proceedings of 18th Pacific-Asia Conference on the Advances in Knowledge Discovery and Data Mining, PAKDD’14, Springer, pp 271-283, doi:10.1007/978-3-319-06605-9_23
[11] Araujo, M.; Günnemann, S.; Papadimitriou, S.; Faloutsos, C.; Basu, P.; Swami, A.; Papalexakis, EE; Koutra, D., Discovery of “comet” communities in temporal and labeled graphs \(COM^2\), Knowl Inf Syst, 46, 3, 657-677 (2016) · doi:10.1007/s10115-015-0847-2
[12] Asadi B, Varadharajan V (2019a) An MDL-based classifier for transactional datasets with application in malware detection. arXiv:1910.03751
[13] Asadi B, Varadharajan V (2019b) Towards a robust classifier: An MDL-based method for generating adversarial examples. arXiv:1912.05945
[14] Bariatti F (2021) Mining tractable sets of graph patterns with the minimum description length principle. PhD thesis, Université de Rennes 1, https://hal.inria.fr/tel-03523742
[15] Bariatti F, Cellier P, Ferré S (2020a) GraphMDL: Graph pattern selection based on minimum description length. In: Proceedings of the 18th International Symposium on Advances in Intelligent Data Analysis, IDA’20, Springer, pp 54-66, doi:10.1007/978-3-030-44584-3_5
[16] Bariatti F, Cellier P, Ferré S (2020b) GraphMDL visualizer: Interactive visualization of graph patterns. In: Proceedings of the Graph Embedding and Mining Workshop GEM@ECML/PKDD’20, https://hal.inria.fr/hal-03142207
[17] Bariatti F, Cellier P, Ferré S (2021) GraphMDL+: interleaving the generation and MDL-based selection of graph patterns. In: Proceedings of the 36th Annual ACM Symposium on Applied Computing, SAC’21, ACM, pp 355-363, doi:10.1145/3412841.3441917
[18] Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L (2000) Mining minimal non-redundant association rules using frequent closed itemsets. In: Proceedings of the First International Conference on Computational Logic, CL’00, Springer, pp 972-986 · Zbl 0983.68511
[19] Begum N, Hu B, Rakthanmanon T, Keogh E (2013) Towards a minimum description length based stopping criterion for semi-supervised time series classification. In: Proceedings of the 14th IEEE International Conference on Information Reuse Integration, IRI’13, IEEE Computer Society, pp 333-340, doi:10.1109/IRI.2013.6642490
[20] Begum N, Hu B, Rakthanmanon T, Keogh E (2014) A minimum description length technique for semi-supervised time series classification. Integration of Reusable Systems pp 171-192, doi:10.1007/978-3-319-04717-1_8
[21] Belth C, Zheng X, Vreeken J, Koutra D (2020) What is normal, what is strange, and what is missing in a knowledge graph: Unified characterization via inductive summarization. In: Proceedings of The Web Conference, WWW’20, ACM, pp 1115-1126, doi:10.1145/3366423.3380189
[22] Bertens R (2017) Insight in information: from abstract to anomaly. PhD thesis, Universiteit Utrecht, Netherlands
[23] Bertens R, Siebes A (2014) Characterising seismic data. In: Proceedings of the 2014 SIAM International Conference on Data Mining, SDM’14, SIAM, pp 884-892, doi:10.1137/1.9781611973440.101
[24] Bertens R, Vreeken J, Siebes A (2015) Beauty and brains: Detecting anomalous pattern co-occurrences. arXiv:1512.07048
[25] Bertens R, Vreeken J, Siebes A (2016) Keeping it short and simple: Summarising complex event sequences with multivariate patterns. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, ACM
[26] Bertens R, Vreeken J, Siebes A (2017) Efficiently discovering unexpected pattern-co-occurrences. In: Proceedings of the 2017 SIAM International Conference on Data Mining, SDM’17, SIAM, pp 126-134, doi:10.1137/1.9781611974973.15
[27] Bhattacharyya A, Vreeken J (2017) Efficiently summarising event sequences with rich interleaving patterns. In: Proceedings of the 2017 SIAM International Conference on Data Mining, SDM’17, SIAM
[28] Blanco, F.; Calatayud, J.; Martín-Perea, DM; Domingo, MS; Menéndez, I.; Müller, J.; Fernández, MH; Cantalapiedra, JL, Punctuated ecological equilibrium in mammal communities over evolutionary time scales, Science, 372, 6539, 300-303 (2021) · doi:10.1126/science.abd5110
[29] Bloem P (2013) Compression-based inference on graph data. In: Proceedings of the 22nd annual Belgian-Dutch Conference on Machine Learning, BENELEARN’13
[30] Bloem P, de Rooij S (2018) A tutorial on MDL hypothesis testing for graph analysis. arXiv:1810.13163
[31] Bloem, P.; de Rooij, S., Large-scale network motif analysis using compression, Data Min Knowl Disc, 34, 5, 1421-1453 (2020) · Zbl 1455.68134 · doi:10.1007/s10618-020-00691-y
[32] Bobed C, Maillot P, Cellier P, Ferré S (2019) Data-driven assessment of structural evolution of RDF graphs. Semantic Web - Interoperability, Usability, Applicability
[33] Bohlin L, Edler D, Lancichinetti A, Rosvall M (2014) Community detection and visualization of networks with the map equation framework. In: Ding Y, Rousseau R, Wolfram D (eds) Measuring Scholarly Impact: Methods and Practice, Springer International Publishing, pp 3-34
[34] Boley M, Lucchese C, Paurat D, Gärtner T (2011) Direct local pattern sampling by efficient two-step random procedures. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, ACM, pp 582-590, doi:10.1145/2020408.2020500
[35] Boley M, Mampaey M, Kang B, Tokmakov P, Wrobel S (2013) One click mining: Interactive local pattern discovery through implicit preference and performance learning. In: Proceedings of the Workshop on Interactive Data Exploration and Analytics, IDEA @KDD’13, ACM, pp 27-35, doi:10.1145/2501511.2501517
[36] Bonchi F, van Leeuwen M, Ukkonen A (2011) Characterizing uncertain data using compression. In: Proceedings of the 2011 SIAM International Conference on Data Mining, SDM’11, SIAM, pp 534-545
[37] Bourrand E, Galárraga L, Galbrun E, Fromont E, Termier A (2021a) Discovering useful compact sets of sequential rules in a long sequence. In: Proceedings of the 2021 IEEE 33rd International Conference on Tools with Artificial Intelligence, ICTAI’21, IEEE Computer Society, pp 1295-1299, doi:10.1109/ICTAI52525.2021.00204
[38] Bourrand E, Galárraga L, Galbrun E, Fromont E, Termier A (2021b) Discovering useful compact sets of sequential rules in a long sequence. arXiv:2109.07519
[39] Budhathoki K, Vreeken J (2015) The difference and the norm – characterising similarities and differences between databases. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’15, Springer, vol 9285, pp 206-223, doi:10.1007/978-3-319-23525-7_13
[40] Budhathoki K, Vreeken J (2017a) Correlation by compression. In: Proceedings of the 2017 SIAM International Conference on Data Mining, SDM’17, SIAM, pp 525-533, doi:10.1137/1.9781611974973.59
[41] Budhathoki K, Vreeken J (2017b) MDL for causal inference on discrete data. In: Proceedings of the 17th IEEE International Conference on Data Mining, ICDM’17, IEEE Computer Society, pp 751-756, doi:10.1109/ICDM.2017.87
[42] Calatayud, J.; Bernardo-Madrid, R.; Neuman, M.; Rojas, A.; Rosvall, M., Exploring the solution landscape enables more reliable network community detection, Phys Rev E, 100, 5 (2019) · doi:10.1103/PhysRevE.100.052308
[43] Chakrabarti D (2004) AutoPart: Parameter-free graph partitioning and outlier detection. In: Proceedings of the European Conference on Knowledge Discovery in Databases, PKDD’04, Springer, pp 112-124, doi:10.1007/978-3-540-30116-5_13
[44] Chakrabarti D, Papadimitriou S, Modha DS, Faloutsos C (2004) Fully automatic cross-associations. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’04, ACM, pp 79-88, doi:10.1145/1014052.1014064
[45] Chen L, Amiri SE, Prakash BA (2018) Automatic segmentation of data sequences. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, AAAI’18, Association for the Advancement of Artificial Intelligence
[46] Cook, DJ; Holder, LB, Substructure discovery using minimum description length and background knowledge, J Artif Intell Res, 1, 1, 231-255 (1994) · doi:10.1613/jair.43
[47] Coupette C, Vreeken J (2021) Graph similarity description: How are these graphs similar? In: Proceedings of the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’21, ACM
[48] Cover, TM; Thomas, JA, Elements of Information Theory (2012), US: John Wiley & Sons · Zbl 1140.94001
[49] Cüppers J, Vreeken J (2020) Just wait for it...mining sequential patterns with reliable prediction delays. In: Proceedings of the 20th IEEE International Conference on Data Mining, ICDM’20, IEEE Computer Society
[50] Das SK, Cook DJ (2004) Health monitoring in an agent-based smart home. In: Proceedings of the International Conference on Smart Homes and Health Telematics, ICOST’04, IOS Press, pp 3-14
[51] De Bie T, Kontonasios KN, Spyropoulou E (2010) A framework for mining interesting pattern sets. SIGKDD Explorations (and Proceedings of the ACM SIGKDD Workshop on Useful Patterns, UP’10) 12(2):92-100
[52] De Domenico, M.; Lancichinetti, A.; Arenas, A.; Rosvall, M., Identifying modular flows on multilayer networks reveals highly overlapping organization in interconnected systems, Phys Rev X, 5, 1, 11027 (2015) · doi:10.1103/PhysRevX.5.011027
[53] De Raedt L, Zimmermann A (2007) Constraint-based pattern set mining. In: Proceedings of the 2007 SIAM International Conference on Data Mining, SDM’07, SIAM, pp 237-248, doi:10.1137/1.9781611972771.22
[54] Edler, D.; Bohlin, L.; Rosvall, M., Mapping higher-order network flows in memory and multilayer networks with infomap, Algorithms, 10, 4, 112 (2017) · Zbl 1462.90027 · doi:10.3390/a10040112
[55] Edler, D.; Guedes, T.; Zizka, A.; Rosvall, M.; Antonelli, A., Infomap bioregions: Interactive mapping of biogeographical regions from species distributions, Syst Biol, 66, 2, 197-204 (2017) · doi:10.1093/sysbio/syw087
[56] Emmons, S.; Mucha, PJ, Map equation with metadata: Varying the role of attributes in community detection, Phys Rev E, 100, 2 (2019) · doi:10.1103/PhysRevE.100.022301
[57] Evans S, Saulnier G, Bush SF (2003) A new universal two part code for estimation of string Kolmogorov complexity and algorithmic minimum sufficient statistic. In: Proceedings of the DIMACS Workshop on Complexity and Inference
[58] Evans S, Markham TS, Torres A, Kourtidis A, Conklin D (2006) An improved minimum description length learning algorithm for nucleotide sequence analysis. In: Proceedings of the 2006 Fortieth Asilomar Conference on Signals, Systems and Computers, ACSSC’06, pp 1843-1850, doi:10.1109/ACSSC.2006.355081
[59] Evans, S.; Kourtidis, A.; Markham, TS; Miller, J.; Conklin, DS; Torres, AS, MicroRNA target detection and analysis for genes related to breast cancer using MDLcompress, EURASIP J Bioinf Syst Biol, 1, 43670 (2007) · doi:10.1186/1687-4153-2007-43670
[60] Faas M, van Leeuwen M (2020) Vouw: Geometric pattern mining using the MDL principle. In: Proceedings of the 18th International Symposium on Advances in Intelligent Data Analysis, IDA’20, Springer, pp 158-170, doi:10.1007/978-3-030-44584-3_13
[61] Feng J (2015) Information-theoretic Graph Mining. PhD thesis, Ludwig-Maximilians-Universität München, Germany
[62] Feng J, He X, Konte B, Böhm C, Plant C (2012) Summarization-based mining bipartite graphs. In: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’12, ACM, pp 1249-1257, doi:10.1145/2339530.2339725
[63] Feng J, He X, Hubig N, Böhm C, Plant C (2013) Compression-based graph mining exploiting structure primitives. In: Proceedings of the 13th IEEE International Conference on Data Mining, ICDM’13, IEEE Computer Society, pp 181-190, doi:10.1109/ICDM.2013.56
[64] Fischer J, Vreeken J (2019) Sets of robust rules, and how to find them. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’19, Springer, pp 38-54, doi:10.1007/978-3-030-46150-8_3
[65] Fischer J, Vreeken J (2020) Discovering succinct pattern sets expressing co-occurrence and mutual exclusivity. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’20, ACM
[66] Fischer J, Oláh A, Vreeken J (2021) What’s in the box? explaining neural networks with robust rules. In: Proceedings of the 38th International Conference on Machine Learning, ICML’21
[67] Fowkes J, Sutton C (2016) A subsequence interleaving model for sequential pattern mining. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, ACM, pp 835-844
[68] Galbrun E, Cellier P, Tatti N, Termier A, Crémilleux B (2018) Mining periodic patterns with a MDL criterion. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’18, pp 535-551
[69] Gallo A, De Bie T, Cristianini N (2007) MINI: Mining informative non-redundant itemsets. In: Proceedings of the European Conference on Knowledge Discovery in Databases, PKDD’07, Springer, pp 438-445, doi:10.1007/978-3-540-74976-9_44
[70] Gautrais C, Cellier P, van Leeuwen M, Termier A (2020) Widening for MDL-based retail signature discovery. In: Proceedings of the 18th International Symposium on Advances in Intelligent Data Analysis, IDA’20, Springer, pp 197-209, doi:10.1007/978-3-030-44584-3_16
[71] Geng, L.; Hamilton, HJ, Interestingness measures for data mining: A survey, ACM Comput Surv, 38, 3, 9 (2006) · doi:10.1145/1132960.1132963
[72] Gionis, A.; Mannila, H.; Mielikäinen, T.; Tsaparas, P., Assessing data mining results via swap randomization, ACM Transactions on Knowledge Discovery from Data, 1, 3, 14 (2007) · doi:10.1145/1297332.1297338
[73] Goebl S, Tonch A, Böhm C, Plant C (2016) MeGS: Partitioning meaningful subgraph structures using minimum description length. In: Proceedings of the 16th IEEE International Conference on Data Mining, ICDM’16, IEEE Computer Society, pp 889-894, doi:10.1109/ICDM.2016.0108
[74] Greenspan G, Geiger D (2003) Model-based inference of haplotype block variation. In: Proceedings of the seventh annual international conference on Research in computational molecular biology, RECOMB’03, ACM, pp 131-137, doi:10.1145/640075.640092
[75] Greenspan, G.; Geiger, D., Model-based inference of haplotype block variation, J Comput Biol, 11, 2, 493-504 (2004) · doi:10.1089/1066527041410300
[76] Grosse K, Vreeken J (2017) Summarising event sequences using serial episodes and an ontology. In: Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing @ECML/PKDD’17
[77] Grünwald, PD, The Minimum Description Length Principle (2007), Cambridge, MA: MIT Press · doi:10.7551/mitpress/4643.001.0001
[78] Guns, T.; Nijssen, S.; De Raedt, L., Itemset mining: A constraint programming perspective, Artif Intell, 175, 12, 1951-1983 (2011) · Zbl 1353.68233 · doi:10.1016/j.artint.2011.05.002
[79] Guns, T.; Nijssen, S.; De Raedt, L., k-pattern set mining under constraints, IEEE Trans Knowl Data Eng, 25, 2, 402-418 (2013) · doi:10.1109/TKDE.2011.204
[80] Hämäläinen, W.; Webb, GI, A tutorial on statistically sound pattern discovery, Data Min Knowl Disc (2018) · Zbl 1464.62305 · doi:10.1007/s10618-018-0590-x
[81] Hanhijärvi S, Ojala M, Vuokko N, Puolamäki K, Tatti N, Mannila H (2009) Tell me something I don’t know: Randomization strategies for iterative data mining. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’09, ACM, pp 379-388, doi:10.1145/1557019.1557065
[82] He J, Tong H, Papadimitriou S, Eliassi-Rad T, Faloutsos C, Carbonell J (2009) PaCK: Scalable parameter-free clustering on k-partite graphs. In: Proceedings of the 2009 SIAM International Conference on Data Mining, SDM’09, SIAM, pp 1278-1287
[83] He X, Feng J, Plant C (2011) Automatically spotting information-rich nodes in graphs. In: Proceedings of the 11th IEEE International Conference on Data Mining Workshops, ICDMW’11, IEEE Computer Society, pp 941-948, doi:10.1109/ICDMW.2011.37
[84] He X, Feng J, Konte B, Mai ST, Plant C (2014) Relevant overlapping subspace clusters on categorical data. In: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’14, ACM, pp 213-222, doi:10.1145/2623330.2623652
[85] Heierman EO, Cook DJ (2003) Improving home automation by discovering regularly occurring device usage patterns. In: Proceedings of the 3rd IEEE International Conference on Data Mining, ICDM’03, IEEE Computer Society, pp 537-540, doi:10.1109/ICDM.2003.1250971
[86] Heierman EO, Youngblood GM, Cook DJ (2004) Mining temporal sequences to discover interesting patterns. In: Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’04, ACM
[87] Heikinheimo H, Siebes A, Vreeken J, Mannila H (2009) Low-entropy set selection. In: Proceedings of the 2009 SIAM International Conference on Data Mining, SDM’09, SIAM, pp 569-580, doi:10.1137/1.9781611972795.49
[88] Hess S, Piatkowski N, Morik K (2014) SHrimp: Descriptive patterns in a tree. In: Proceedings of the LWA (Lernen, Wissen, Adaption) 2014 Workshops: KDML, IR, FGWM
[89] Hess, S.; Morik, K.; Piatkowski, N., The PRIMPING routine - tiling through proximal alternating linearized minimization, Data Min Knowl Disc, 31, 4, 1090-1131 (2017) · Zbl 1409.68233 · doi:10.1007/s10618-017-0508-z
[90] Hinrichs F, Vreeken J (2017) Characterising the difference and the norm between sequence databases. In: Proceedings of the Workshop on Interactions between Data Mining and Natural Language Processing @ECML/PKDD’17
[91] Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E (2011) Discovering the intrinsic cardinality and dimensionality of time series using MDL. In: Proceedings of the 11th IEEE International Conference on Data Mining, ICDM’11, IEEE Computer Society, pp 1086-1091, doi:10.1109/ICDM.2011.54 · Zbl 1403.68189
[92] Hu B, Rakthanmanon T, Hao Y, Evans S, Lonardi S, Keogh E (2013) Towards discovering the intrinsic cardinality and dimensionality of time series using MDL. In: Proceedings of the Ray Solomonoff 85th Memorial Conference, Algorithmic Probability and Friends. Bayesian Prediction and Artificial Intelligence, Springer, pp 184-197, doi:10.1007/978-3-642-44958-1_14 · Zbl 1403.68189
[93] Hu, B.; Rakthanmanon, T.; Hao, Y.; Evans, S.; Lonardi, S.; Keogh, E., Using the minimum description length to discover the intrinsic cardinality and dimensionality of time series, Data Min Knowl Disc, 29, 2, 358-399 (2015) · Zbl 1403.68190 · doi:10.1007/s10618-014-0345-2
[94] Ibrahim, A.; Sastry, S.; Sastry, PS, Discovering compressing serial episodes from event sequences, Knowl Inf Syst, 47, 2, 405-432 (2016) · doi:10.1007/s10115-015-0854-3
[95] Jaroszewicz S, Simovici DA (2004) Interestingness of frequent itemsets using Bayesian networks as background knowledge. In: Proceedings of the 10th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’04, ACM, pp 178-186, doi:10.1145/1014052.1014074
[96] Jiang M, Faloutsos C, Han J (2016) CatchTartan: Representing and summarizing dynamic multicontextual behaviors. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’16, ACM, pp 945-954, doi:10.1145/2939672.2939749
[97] Jonyer, I.; Holder, LB; Cook, DJ, MDL-based context-free graph grammar induction and applications, Int J Artif Intell Tools, 13, 1, 65-79 (2004) · doi:10.1142/S0218213004001429
[98] Kameya Y (2011) Time series discretization via MDL-based histogram density estimation. In: Proceedings of the 23rd IEEE International Conference on Tools with Artificial Intelligence, ICTAI’11, IEEE Computer Society, pp 732-739, doi:10.1109/ICTAI.2011.115
[99] Kang U, Faloutsos C (2011) Beyond ‘caveman communities’: Hubs and spokes for graph compression and mining. In: Proceedings of the 11th IEEE International Conference on Data Mining, ICDM’11, IEEE Computer Society, pp 300-309, doi:10.1109/ICDM.2011.26
[100] Ketkar NS, Holder LB, Cook DJ (2005) Subdue: compression-based frequent pattern discovery in graph data. In: Proceedings of the 1st international workshop on open source data mining: frequent pattern mining implementations, OSDM’05, ACM, pp 71-76, doi:10.1145/1133905.1133915
[101] Khan KU (2015) Set-based approach for lossless graph summarization using locality sensitive hashing. In: Proceedings of the 31st IEEE International Conference on Data Engineering Workshops, ICDEW’15, IEEE Computer Society, pp 255-259, doi:10.1109/ICDEW.2015.7129586
[102] Khan KU, Nawaz W, Lee YK (2014) Set-based unified approach for attributed graph summarization. In: Proceedings of the 4th IEEE International Conference on Big Data and Cloud Computing, BDCloud’14, IEEE Computer Society, pp 378-385, doi:10.1109/BDCloud.2014.108
[103] Khan KU, Nawaz W, Lee YK (2015a) Lossless graph summarization using dense subgraphs discovery. In: Proceedings of the 9th International Conference on Ubiquitous Information Management and Communication, IMCOM’15, ACM, pp 1-7, doi:10.1145/2701126.2701157
[104] Khan, KU; Nawaz, W.; Lee, YK, Set-based approximate approach for lossless graph summarization, Computing, 97, 12, 1185-1207 (2015) · Zbl 1347.68370 · doi:10.1007/s00607-015-0454-9
[105] Kiernan J, Terzi E (2008) Constructing comprehensive summaries of large event sequences. In: Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’08, ACM, pp 417-425, doi:10.1145/1401890.1401943
[106] Kiernan, J.; Terzi, E., Constructing comprehensive summaries of large event sequences, ACM Transactions on Knowledge Discovery from Data, 3, 4, 21:1-21:31 (2009) · doi:10.1145/1631162.1631169
[107] Kiernan J, Terzi E (2009b) EventSummarizer: A tool for summarizing large event sequences. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology, EDBT’09, ACM, pp 1136-1139, doi:10.1145/1516360.1516497
[108] Koivisto M, Perola M, Varilo T, Hennah W, Ekelund J, Lukk M, Peltonen L, Ukkonen E, Mannila H (2002) An MDL method for finding haplotype blocks and for estimating the strength of haplotype block boundaries. In: Proceedings of the 2003 Pacific Symposium on Biocomputing, PSB’03, World Scientific, pp 502-513, doi:10.1142/9789812776303_0047 · Zbl 1256.92038
[109] Kontkanen P, Myllymäki P (2007) MDL histogram density estimation. In: Proceedings of the Eleventh International Conference on Artificial Intelligence and Statistics, AISTATS’07, pp 219-226
[110] Kontonasios KN, De Bie T (2012) Formalizing complex prior information to quantify subjective interestingness of frequent pattern sets. In: Proceedings of the 11th International Symposium on Advances in Intelligent Data Analysis, IDA’12, Springer, pp 161-171
[111] Kontonasios KN, Vreeken J, De Bie T (2013) Maximum entropy models for iteratively identifying subjectively interesting structure in real-valued data. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’13, Springer, pp 256-271
[112] Koopman A, Siebes A (2008) Discovering relational item sets efficiently. In: Proceedings of the 2008 SIAM International Conference on Data Mining, SDM’08, SIAM, pp 108-119, doi:10.1137/1.9781611972788.10
[113] Koopman A, Siebes A (2009) Characteristic relational patterns. In: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’09, ACM, pp 437-446, doi:10.1145/1557019.1557071
[114] Koutra D, Kang U, Vreeken J, Faloutsos C (2014) VOG: Summarizing and understanding large graphs. In: Proceedings of the 2014 SIAM International Conference on Data Mining, SDM’14, SIAM, pp 91-99, doi:10.1137/1.9781611973440.11 · Zbl 07260433
[115] Koutra, D.; Kang, U.; Vreeken, J.; Faloutsos, C., Summarizing and understanding large graphs, Statistical Analysis and Data Mining, 8, 3, 183-202 (2015) · Zbl 07260433 · doi:10.1002/sam.11267
[116] Lakshmanan LVS, Ng RT, Wang CX, Zhou X, Johnson TJ (2002) The generalized MDL approach for summarization. In: Proceedings of the 28th international conference on Very Large Data Bases, VLDB’02, VLDB Endowment, pp 766-777
[117] Lam HT, Mörchen F, Fradkin D, Calders T (2012) Mining compressing sequential patterns. In: Proceedings of the 2012 SIAM International Conference on Data Mining, SDM’12, SIAM, pp 319-330, doi:10.1137/1.9781611972825.28 · Zbl 07260381
[118] Lam HT, Calders T, Yang J, Mörchen F, Fradkin D (2013) Zips: Mining compressing sequential patterns in streams. In: Proceedings of the Workshop on Interactive Data Exploration and Analytics, IDEA @KDD’13, ACM, pp 54-62, doi:10.1145/2501511.2501520
[119] Lam HT, Kiseleva J, Pechenizkiy M, Calders T (2014a) Decomposing a sequence into independent subsequences using compression algorithms. In: Proceedings of the Workshop on Interactive Data Exploration and Analytics, IDEA @KDD’14, pp 67-75
[120] Lam, HT; Mörchen, F.; Fradkin, D.; Calders, T., Mining compressing sequential patterns, Stat Anal Data Mining, 7, 1, 34-52 (2014) · Zbl 07260381 · doi:10.1002/sam.11192
[121] Lee K, Jo H, Ko J, Lim S, Shin K (2020) SSumM: Sparse summarization of massive graphs. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’20, ACM, pp 144-154, doi:10.1145/3394486.3403057
[122] LeFevre K, Terzi E (2010) GraSS: Graph structure summarization. In: Proceedings of the 2010 SIAM International Conference on Data Mining, SDM’10, SIAM, pp 454-465, doi:10.1137/1.9781611972801.40
[123] Lim, Y.; Kang, U.; Faloutsos, C., SlashBurn: Graph compression and mining beyond caveman communities, IEEE Trans Knowl Data Eng, 26, 12, 3077-3089 (2014) · doi:10.1109/TKDE.2014.2320716
[124] Liu Y, Shah N, Koutra D (2015) An empirical comparison of the summarization power of graph clustering methods. arXiv:1511.06820
[125] Liu Y, Safavi T, Shah N (2016) Reducing million-node graphs to a few structural patterns: A unified approach. In: Proceedings of the 12th International Workshop on Mining and Learning with Graphs, MLG @KDD’16, p 8
[126] Liu, Y.; Safavi, T.; Dighe, A.; Koutra, D., Graph summarization methods and applications: A survey, ACM Computing Surveys, 51, 3, 62:1-62:34 (2018) · doi:10.1145/3186727
[127] Liu, Y.; Safavi, T.; Shah, N.; Koutra, D., Reducing large graphs to small supergraphs: a unified approach, Soc Netw Anal Min, 8, 1, 17 (2018) · doi:10.1007/s13278-018-0491-4
[128] Lucchese C, Orlando S, Perego R (2010a) A generative pattern model for mining binary datasets. In: Proceedings of the 2010 ACM Symposium on Applied Computing, SAC’10, ACM, pp 1109-1110, doi:10.1145/1774088.1774320
[129] Lucchese C, Orlando S, Perego R (2010b) Mining top-k patterns from binary datasets in presence of noise. In: Proceedings of the 2010 SIAM International Conference on Data Mining, SDM’10, SIAM, pp 165-176, doi:10.1137/1.9781611972801.15
[130] Lucchese, C.; Orlando, S.; Perego, R., A unifying framework for mining approximate top-\(k\) binary patterns, IEEE Trans Knowl Data Eng, 26, 12, 2900-2913 (2014) · doi:10.1109/TKDE.2013.181
[131] Makhalova T (2021) Contributions to pattern set mining: from complex datasets to significant and useful pattern sets. PhD thesis, Université de Lorraine, https://hal.univ-lorraine.fr/tel-03342124
[132] Makhalova T, Trnecka M (2019) From-below boolean matrix factorization algorithm based on MDL. arXiv:1901.09567 · Zbl 07363864
[133] Makhalova, T.; Trnecka, M., From-below boolean matrix factorization algorithm based on MDL, Adv Data Anal Classif, 15, 1, 37-56 (2021) · Zbl 07363864 · doi:10.1007/s11634-019-00383-6
[134] Makhalova T, Kuznetsov SO, Napoli A (2018a) A first study on what MDL can do for FCA. In: Proceedings of the Fifteenth International Conference on Concept Lattices and Their Applications, CLA’18, pp 25-36
[135] Makhalova T, Kuznetsov SO, Napoli A (2018b) MDL for FCA: Is there a place for background knowledge? In: Proceedings of the 6th International Workshop “What can FCA do for Artificial Intelligence?” @ IJCAI/ECAI’18, CEUR Workshop Proceedings, vol 2149, pp 45-56, http://ceur-ws.org/Vol-2149/paper5.pdf
[136] Makhalova T, Kuznetsov SO, Napoli A (2019a) Numerical pattern mining through compression. In: Proceedings of the Data Compression Conference, DCC’19, pp 112-121, doi:10.1109/DCC.2019.00019
[137] Makhalova T, Kuznetsov SO, Napoli A (2019b) On coupling FCA and MDL in pattern mining. In: Proceedings of the international conference on Formal Concept Analysis, FCA’19, Springer, pp 332-340, doi:10.1007/978-3-030-21462-3_23 · Zbl 1529.68263
[138] Makhalova T, Kuznetsov SO, Napoli A (2020) Mint: MDL-based approach for mining INTeresting numerical pattern sets. arXiv:2011.14843 · Zbl 1494.68223
[139] Makhalova T, Kuznetsov SO, Napoli A (2021) Likely-occurring itemsets for pattern mining. In: Proceedings of the 6th International Workshop “What can FCA do for Artificial Intelligence?” @ IJCAI’21, CEUR Workshop Proceedings, vol 2972, pp 39-50, http://ceur-ws.org/Vol-2972/paper4.pdf
[140] Makhalova, T.; Kuznetsov, SO; Napoli, A., Mint: MDL-based approach for mining INTeresting numerical pattern sets, Data Min Knowl Disc, 36, 1, 108-145 (2022) · Zbl 1494.68223 · doi:10.1007/s10618-021-00799-9
[141] Mampaey M (2010) Mining non-redundant information-theoretic dependencies between itemsets. In: Proceedings of the 12th International Conference on Data Warehousing and Knowledge Discovery, DaWaK’10, Springer, pp 130-141, doi:10.1007/978-3-642-15105-7_11
[142] Mampaey M, Vreeken J (2010) Summarising data by clustering items. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’10, pp 321-336, doi:10.1007/978-3-642-15883-4_21
[143] Mampaey, M.; Vreeken, J., Summarizing categorical data by clustering attributes, Data Min Knowl Disc, 26, 1, 130-173 (2013) · Zbl 1260.68339 · doi:10.1007/s10618-011-0246-6
[144] Mampaey M, Tatti N, Vreeken J (2011) Tell me what I need to know: succinctly summarizing data with itemsets. In: Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’11, ACM, pp 573-581, doi:10.1145/2020408.2020499
[145] Mampaey, M.; Vreeken, J.; Tatti, N., Summarizing data succinctly with the most informative itemsets, ACM Transactions on Knowledge Discovery from Data, 6, 4, 16:1-16:42 (2012) · doi:10.1145/2382577.2382580
[146] Mannila H, Toivonen H, Verkamo AI (1994) Efficient algorithms for discovering association rules. In: Proceedings of the KDD Workshop, Association for the Advancement of Artificial Intelligence, pp 181-192
[147] Mannila, H.; Koivisto, M.; Perola, M.; Varilo, T.; Hennah, W.; Ekelund, J.; Lukk, M.; Peltonen, L.; Ukkonen, E., Minimum description length block finder, a method to identify haplotype blocks and to compare the strength of block boundaries, The American Journal of Human Genetics, 73, 1, 86-94 (2003) · Zbl 1256.92038 · doi:10.1086/376438
[148] Markham TS, Evans S, Impson J, Steinbrecher E (2009) Implementation of an incremental MDL-based two part compression algorithm for model inference. In: Proceedings of the 2009 Data Compression Conference, DCC’09, pp 322-331, doi:10.1109/DCC.2009.66
[149] Matsubara Y, Sakurai Y, Faloutsos C (2014) AutoPlait: automatic mining of co-evolving time sequences. In: Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, SIGMOD’14, ACM, pp 193-204, doi:10.1145/2588555.2588556
[150] Miettinen P, Vreeken J (2011) Model order selection for boolean matrix factorization. In: Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’11, ACM, pp 51-59, doi:10.1145/2020408.2020424
[151] Miettinen, P.; Vreeken, J., MDL4BMF: Minimum description length for boolean matrix factorization, ACM Transactions on Knowledge Discovery from Data, 8, 4, 18:1-18:31 (2014) · doi:10.1145/2601437
[152] Mitra S, Sastry PS (2019) Summarizing event sequences with serial episodes: A statistical model and an application. arXiv:1904.00516
[153] Navlakha S, Rastogi R, Shrivastava N (2008) Graph summarization with bounded error. In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD’08, ACM, pp 419-432, doi:10.1145/1376616.1376661
[154] Nguyen, HV; Müller, E.; Vreeken, J.; Böhm, K., Unsupervised interaction-preserving discretization of multivariate data, Data Min Knowl Disc, 28, 5, 1366-1397 (2014) · Zbl 1342.62001 · doi:10.1007/s10618-014-0350-5
[155] Otaki K, Yamamoto A (2015) Edit operations on lattices for MDL-based pattern summarization. In: Proceedings of the International Workshop on Formal Concept Analysis and Applications @ICFCA’15
[156] Papadimitriou S, Gionis A, Tsaparas P, Väisänen RA, Mannila H, Faloutsos C (2005) Parameter-free spatial data mining using MDL. In: Proceedings of the 5th IEEE International Conference on Data Mining, ICDM’05, IEEE Computer Society, pp 346-353, doi:10.1109/ICDM.2005.117
[157] Papadimitriou S, Sun J, Faloutsos C, Yu PS (2008) Hierarchical, parameter-free community discovery. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’08, Springer, pp 170-187, doi:10.1007/978-3-540-87481-2_12
[158] Phan NH, Ienco D, Poncelet P, Teisseire M (2013) Mining representative movement patterns through compression. In: Advances in Knowledge Discovery and Data Mining, Springer, pp 314-326, doi:10.1007/978-3-642-37453-1_26
[159] Plant C, Biedermann S, Böhm C (2020) Data compression as a comprehensive framework for graph drawing and representation learning. In: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’20, ACM, pp 1212-1222, doi:10.1145/3394486.3403174
[160] Prakash, BA; Vreeken, J.; Faloutsos, C., Efficiently spotting the starting points of an epidemic in a large graph, Knowl Inf Syst, 38, 1, 35-59 (2014) · doi:10.1007/s10115-013-0671-5
[161] Proença, HM; van Leeuwen, M., Interpretable multiclass classification by MDL-based rule lists, Inf Sci, 512, 1372-1393 (2020) · doi:10.1016/j.ins.2019.10.050
[162] Proença HM, van Leeuwen M (2020b) Interpretable multiclass classification by MDL-based rule lists. arXiv:1905.00328
[163] Proença HM, Grünwald PD, Bäck T, van Leeuwen M (2020) Discovering outstanding subgroup lists for numeric targets using MDL. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’20
[164] Proença HM, Bäck T, van Leeuwen M (2021a) Robust subgroup discovery. arXiv:2103.13686
[165] Proença HM, Grünwald PD, Bäck T, van Leeuwen M (2021b) Discovering outstanding subgroup lists for numeric targets using MDL. arXiv:2006.09186
[166] Puolamäki, K.; Oikarinen, E.; Kang, B.; Lijffijt, J.; De Bie, T., Interactive visual data exploration with subjective feedback: an information-theoretic approach, Data Min Knowl Disc, 34, 1, 21-49 (2020) · doi:10.1007/s10618-019-00655-x
[167] Rakthanmanon T, Keogh EJ, Lonardi S, Evans S (2011) Time series epenthesis: Clustering time series streams requires ignoring some data. In: Proceedings of the 11th IEEE International Conference on Data Mining, ICDM’11, IEEE Computer Society, pp 547-556, doi:10.1109/ICDM.2011.146
[168] Rakthanmanon, T.; Keogh, EJ; Lonardi, S.; Evans, S., MDL-based time series clustering, Knowl Inf Syst, 33, 2, 371-399 (2012) · doi:10.1007/s10115-012-0508-7
[169] Rashidi, P.; Cook, DJ, COM: A method for mining and monitoring human activity patterns in home-based health monitoring systems, ACM Transactions on Intelligent Systems and Technology, 4, 4, 64:1-64:20 (2013) · doi:10.1145/2508037.2508045
[170] Rissanen, J., Modeling by shortest data description, Automatica, 14, 5, 465-471 (1978) · Zbl 0418.93079 · doi:10.1016/0005-1098(78)90005-5
[171] Rojas, A.; Calatayud, J.; Kowalewski, M.; Neuman, M.; Rosvall, M., A multiscale view of the Phanerozoic fossil record reveals the three major biotic transitions, Communications Biology, 4, 1, 1-8 (2021) · doi:10.1038/s42003-021-01805-y
[172] Rosvall, M.; Bergstrom, CT, An information-theoretic framework for resolving community structure in complex networks, Proc Natl Acad Sci, 104, 18, 7327-7331 (2007) · doi:10.1073/pnas.0611034104
[173] Rosvall, M.; Bergstrom, CT, Maps of random walks on complex networks reveal community structure, Proc Natl Acad Sci, 105, 4, 1118-1123 (2008) · doi:10.1073/pnas.0706851105
[174] Rosvall, M.; Bergstrom, CT, Mapping change in large networks, PLoS ONE, 5, 1, 1-7 (2010) · doi:10.1371/journal.pone.0008694
[175] Rosvall, M.; Bergstrom, CT, Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems, PLoS ONE, 6, 4 (2011) · doi:10.1371/journal.pone.0018209
[176] Rosvall, M.; Axelsson, D.; Bergstrom, CT, The map equation, The European Physical Journal Special Topics, 178, 1, 13-23 (2009) · doi:10.1140/epjst/e2010-01179-1
[177] Sampson O, Berthold MR (2014) Widened KRIMP: Better performance through diverse parallelism. In: Proceedings of the 13th International Symposium on Advances in Intelligent Data Analysis, IDA’14, Springer, pp 276-285, doi:10.1007/978-3-319-12571-8_24
[178] Saran, D.; Vreeken, J., Summarizing dynamic graphs using MDL. Tech. rep. (2019), Saarland University, Germany
[179] Shah N, Koutra D, Zou T, Gallagher B, Faloutsos C (2015) TimeCrunch: Interpretable dynamic graph summarization. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’15, ACM, pp 1055-1064, doi:10.1145/2783258.2783321
[180] Shah, N.; Koutra, D.; Jin, L.; Zou, T.; Gallagher, B.; Faloutsos, C., On summarizing large-scale dynamic graphs, IEEE Data Engineering Bulletin, 40, 3, 75-88 (2017)
[181] Shannon, CE, A mathematical theory of communication, Bell Syst Tech J, 27, 3, 379-423 (1948) · Zbl 1154.94303 · doi:10.1002/j.1538-7305.1948.tb01338.x
[182] Shokoohi-Yekta M, Chen Y, Campana B, Hu B, Zakaria J, Keogh E (2015) Discovery of meaningful rules in time series. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’15, ACM, pp 1085-1094, doi:10.1145/2783258.2783306
[183] Siebes A (2012) Queries for data analysis. In: Proceedings of the 11th International Symposium on Advances in Intelligent Data Analysis, IDA’12, Springer, pp 7-22
[184] Siebes A (2014) MDL in pattern mining: A brief introduction to krimp. In: Proceedings of the international conference on Formal Concept Analysis, FCA’14, Springer, pp 37-43, doi:10.1007/978-3-319-07248-7_3 · Zbl 1444.68162
[185] Siebes A, Kersten R (2011) A structure function for transaction data. In: Proceedings of the 2011 SIAM International Conference on Data Mining, SDM’11, SIAM, pp 558-569, doi:10.1137/1.9781611972818.48
[186] Siebes A, Kersten R (2012) Smoothing categorical data. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’12, Springer, pp 42-57, doi:10.1007/978-3-642-33460-3_8
[187] Siebes A, Vreeken J, van Leeuwen M (2006) Item sets that compress. In: Proceedings of the 2006 SIAM International Conference on Data Mining, SDM’06, SIAM · Zbl 1235.68071
[188] Smets K, Vreeken J (2011) The odd one out: Identifying and characterising anomalies. In: Proceedings of the 2011 SIAM International Conference on Data Mining, SDM’11, SIAM, pp 804-815, doi:10.1137/1.9781611972818.69
[189] Smets K, Vreeken J (2012) Slim: Directly mining descriptive patterns. In: Proceedings of the 2012 SIAM International Conference on Data Mining, SDM’12, SIAM, pp 236-247
[190] Soulet A, Raïssi C, Plantevit M, Crémilleux B (2011) Mining dominant patterns in the sky. In: Proceedings of the 11th IEEE International Conference on Data Mining, ICDM’11, IEEE Computer Society, pp 655-664, doi:10.1109/ICDM.2011.100
[191] Stone, JV, Information Theory: A Tutorial Introduction (2013), Sheffield: Sebtel Press
[192] Sun J, Faloutsos C, Papadimitriou S, Yu PS (2007) GraphScope: parameter-free mining of large time-evolving graphs. In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’07, ACM, pp 687-696, doi:10.1145/1281192.1281266
[193] Tanaka Y, Uehara K (2003) Discover motifs in multi-dimensional time-series using the principal component analysis and the MDL principle. In: Proceedings of the 3rd international conference on Machine learning and data mining in pattern recognition, MLDM’03, Springer, pp 252-265 · Zbl 1029.68592
[194] Tanaka, Y.; Iwamoto, K.; Uehara, K., Discovery of time-series motif from multi-dimensional data based on MDL principle, Mach Learn, 58, 2, 269-300 (2005) · Zbl 1075.62084 · doi:10.1007/s10994-005-5829-2
[195] Tatti N (2010) Probably the best itemsets. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, KDD’10, ACM, pp 293-302, doi:10.1145/1835804.1835843
[196] Tatti N, Heikinheimo H (2008) Decomposable families of itemsets. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’08, pp 472-487, doi:10.1007/978-3-540-87481-2_31
[197] Tatti N, Vreeken J (2008) Finding good itemsets by packing data. In: Proceedings of the 8th IEEE International Conference on Data Mining, ICDM’08, IEEE Computer Society, pp 588-597, doi:10.1109/ICDM.2008.39
[198] Tatti N, Vreeken J (2012a) Discovering descriptive tile trees. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’12, Springer, pp 9-24, doi:10.1007/978-3-642-33460-3_6
[199] Tatti N, Vreeken J (2012b) The long and the short of it: Summarising event sequences with serial episodes. In: Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’12, ACM, pp 462-470
[200] van Leeuwen M (2010) Patterns that matter. PhD thesis, Universiteit Utrecht
[201] van Leeuwen, M.; Galbrun, E., Association discovery in two-view data, IEEE Trans Knowl Data Eng, 27, 12, 3190-3202 (2015) · doi:10.1109/TKDE.2015.2453159
[202] van Leeuwen M, Siebes A (2008) StreamKrimp: Detecting change in data streams. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’08, Springer, pp 672-687, doi:10.1007/978-3-540-87479-9_62
[203] van Leeuwen M, Vreeken J (2014) Mining and using sets of patterns through compression. In: Frequent Pattern Mining, Springer, pp 165-198, doi:10.1007/978-3-319-07821-2_8 · Zbl 1298.68250
[204] van Leeuwen M, Vreeken J, Siebes A (2006) Compression picks item sets that matter. In: Proceedings of the European Conference on Knowledge Discovery in Databases, PKDD’06, Springer, pp 585-592, doi:10.1007/11871637_59
[205] van Leeuwen M, Bonchi F, Sigurbjörnsson B, Siebes A (2009a) Compressing tags to find interesting media groups. In: Proceedings of the 18th ACM conference on Information and knowledge management, CIKM’09, ACM, pp 1147-1156, doi:10.1145/1645953.1646099
[206] van Leeuwen, M.; Vreeken, J.; Siebes, A., Identifying the components, Data Min Knowl Disc, 19, 2, 176-193 (2009) · doi:10.1007/s10618-009-0137-2
[207] van Leeuwen, M.; De Bie, T.; Spyropoulou, E.; Mesnage, C., Subjective interestingness of subgraph patterns, Mach Learn, 105, 1, 41-75 (2016) · Zbl 1392.68376 · doi:10.1007/s10994-015-5539-3
[208] Vanetik N, Litvak M (2017) Query-based summarization using MDL principle. In: Proceedings of the MultiLing 2017 Workshop on Summarization and Summary Evaluation Across Source Types and Genres @ACL’17, pp 22-31
[209] Vanetik N, Litvak M (2018) DRIM: MDL-based approach for fast diverse summarization. In: Proceedings of the 2018 IEEE/WIC/ACM International Conference on Web Intelligence, WI’18, pp 660-663, doi:10.1109/WI.2018.00-17
[210] Vespier U, Knobbe A, Nijssen S, Vanschoren J (2012) MDL-based analysis of time series at multiple time-scales. In: Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML/PKDD’12, Springer, pp 371-386, doi:10.1007/978-3-642-33486-3_24
[211] Viamontes Esquivel, A.; Rosvall, M., Compression of flow can reveal overlapping-module organization in networks, Phys Rev X, 1, 2 (2011) · doi:10.1103/PhysRevX.1.021025
[212] Vreeken J (2009) Making pattern mining useful. PhD thesis, Universiteit Utrecht
[213] Vreeken J, Siebes A (2008) Filling in the blanks – krimp minimisation for missing data. In: Proceedings of the 8th IEEE International Conference on Data Mining, ICDM’08, IEEE Computer Society, pp 1067-1072, doi:10.1109/ICDM.2008.40
[214] Vreeken J, van Leeuwen M, Siebes A (2007a) Characterising the difference. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’07, ACM, pp 765-774, doi:10.1145/1281192.1281274
[215] Vreeken J, van Leeuwen M, Siebes A (2007b) Preserving privacy through data generation. In: Proceedings of the 7th IEEE International Conference on Data Mining, ICDM’07, IEEE Computer Society, pp 685-690, doi:10.1109/ICDM.2007.25
[216] Vreeken, J.; van Leeuwen, M.; Siebes, A., Krimp: Mining itemsets that compress, Data Min Knowl Disc, 23, 1, 169-214 (2011) · Zbl 1235.68071 · doi:10.1007/s10618-010-0202-x
[217] Wang P, Wang H, Liu M, Wang W (2010) An algorithmic approach to event summarization. In: Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data, SIGMOD’10, ACM, pp 183-194, doi:10.1145/1807167.1807189
[218] Webb, GI, Discovering significant patterns, Mach Learn, 68, 1, 1-33 (2007) · Zbl 1470.68195 · doi:10.1007/s10994-007-5006-x
[219] Webb, GI; Vreeken, J., Efficient discovery of the most interesting associations, ACM Transactions on Knowledge Discovery from Data, 8, 3, 15:1-15:31 (2013) · doi:10.1145/2601433
[220] Wiegand B, Klakow D, Vreeken J (2021) Mining easily understandable models from complex event logs. In: Proceedings of the 2021 SIAM International Conference on Data Mining, SDM’21, SIAM, pp 244-252, doi:10.1137/1.9781611976700.28
[221] Wiegand B, Klakow D, Vreeken J (2022) Mining interpretable data-to-sequence generators. In: Proceedings of the Thirty-Sixth AAAI Conference on Artificial Intelligence, AAAI’22, Association for the Advancement of Artificial Intelligence
[222] Witteveen J, Duivesteijn W, Knobbe A, Grünwald PD (2014) RealKrimp – finding hyperintervals that compress with MDL for real-valued data. In: Proceedings of the 13th International Symposium on Advances in Intelligent Data Analysis, IDA’14, Springer, pp 368-379, doi:10.1007/978-3-319-12571-8_32
[223] Wu D, Gundimeda S, Mou S, Quinn CJ (2020) Modeling piece-wise stationary time series. In: Proceedings of the 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP’20, IEEE Computer Society, pp 3817-3821, doi:10.1109/ICASSP40776.2020.9053470
[224] Yan X, Cheng H, Han J, Xin D (2005) Summarizing itemset patterns: a profile-based approach. In: Proceedings of the 11th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD’05, ACM, pp 314-323, doi:10.1145/1081870.1081907
[225] Yan Y, Cao L, Madden S, Rundensteiner EA (2018) SWIFT: Mining representative patterns from large event streams. Proc VLDB Endow 12(3):265-277. doi:10.14778/3291264.3291271
[226] Yang L, Baratchi M, van Leeuwen M (2020) Unsupervised discretization by two-dimensional MDL-based histogram. arXiv:2006.01893
[227] Youngblood GM, Heierman EO, Cook DJ, Holder LB (2005) Automated HPOMDP construction through data-mining techniques in the intelligent environment domain. In: Proceedings of the Eighteenth International Florida Artificial Intelligence Research Society Conference, FLAIRS’05
[228] Yurov M, Ignatov DI (2017) Turning krimp into a triclustering technique on sets of attribute-condition pairs that compress. In: Proceedings of the International Joint Conference on Rough Sets, IJCRS’17, Springer, pp 558-569, doi:10.1007/978-3-319-60840-2_40
[229] Zhao P, Zhao Q, Zhang C, Su G, Zhang Q, Rao W (2019) CLEAN: Frequent pattern-based trajectory spatial-temporal compression on road networks. In: Proceedings of the 20th IEEE International Conference on Mobile Data Management, MDM’19, IEEE Computer Society, pp 605-610, doi:10.1109/MDM.2019.00127