×

Discovering all associations in discrete data using frequent minimally infrequent attribute sets. (English) Zbl 1245.68072

Summary: Associating categories with measured or observed attributes is a central challenge for discrete mathematics in life sciences. We propose a new concept to formalize this question: Given a binary matrix of objects and attributes, determine all attribute sets characterizing object sets of cardinality \(t_{1}\) that do not characterize any object set of size \(t_{2}>t_{1}\). We determine how many such attribute sets exist, give an output-sensitive quasi-polynomial time algorithm to determine them, and show that \(k\)-sum matrix decompositions known from matroid theory are compatible with the characterization.

MSC:

68P05 Data structures
92C42 Systems biology, networks

Software:

LCM
Full Text: DOI

References:

[1] Bioch, J. C.; Ibaraki, T., Complexity of identification and dualization of positive Boolean functions, Information and Computation, 123, 1, 50-63 (1995) · Zbl 1096.68633
[2] Boros, E.; Elbassioni, K. M.; Gurvich, V.; Khachiyan, L., An efficient implementation of a joint generation algorithm, (Ribeiro, C. C.; Martins, S. L., WEA. WEA, Lecture Notes in Computer Science, vol. 3059 (2004), Springer), 114-128
[3] Boros, E.; Gurvich, V.; Khachiyan, L.; Makino, K., On maximal frequent and minimal infrequent sets in binary matrices, Annals of Mathematics and Artificial Intelligence, 39, 3, 211-221 (2003), URL http://dx.doi.org/10.1023/A:1024605820527 · Zbl 1038.68041
[4] Eisenschmidt, E.; Haus, U.-U.; Schmidt, E.; Saemann, J., Monotone and almost-monotone Boolean functions: frequent maximally infrequent sets, (Franco, J., Proceedings of the Eleventh International Symposium on Artificial Intelligence and Mathematics (2010), Ft. Lauderdale)
[5] Fredman, M. L.; Khachiyan, L., On the complexity of dualization of monotone disjunctive normal forms, Journal of Algorithms, 21, 3, 618-628 (1996) · Zbl 0864.68038
[6] Goethals, B., Frequent set mining, (Maimon, O.; Rokach, L., The Data Mining and Knowledge Discovery Handbook (2005), Springer), 377-397 · Zbl 1087.68029
[7] Gurvich, V.; Khachiyan, L., On generating the irredundant conjunctive and disjunctive normal forms of monotone Boolean functions, Discrete Applied Mathematics, 96-97, 363-373 (1999) · Zbl 0953.06013
[8] Ou, G.; Vale, R. D., Molecular signatures of cell migration in C. elegans \(Q\) neuroblasts, The Journal of Cell Biology, 185, 1, 77-85 (2009)
[9] (Pardalos, P. M.; Hansen, P., Data Mining and Mathematical Programming. Data Mining and Mathematical Programming, CRM Proceedings & Lecture Notes, vol. 45 (2008), American Mathematical Society: American Mathematical Society Providence, RI), Papers from the workshop held in Montreal, QC, October 10-13, 2006
[10] Seymour, P. D., Decomposition of regular matroids, Journal of Combinatorial Theory, 28, 305-359 (1980) · Zbl 0443.05027
[11] Steven, R.; Kubiseski, T. J.; Zheng, H.; Kulkarni, S.; Mancillas, J.; Morales, A. R.; Hogue, C. W.V.; Pawson, T.; Culotti, J., UNC-73 activates the Rac GTPase and is required for and growth cone migrations in C. elegans, Cell, 92, 785-795 (1998)
[12] Trümper, K., Matroid Decomposition (1992), Academic Press: Academic Press New York, NY · Zbl 0760.05001
[13] T. Uno, M. Kiyomi, H. Arimura, LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, in: R. Bayardo, B. Goethals, M.J. Zaki (Eds.), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations Brighton, UK, November 1, 2004, vol. 126, CEUR Workshop Proceedings, 2005.; T. Uno, M. Kiyomi, H. Arimura, LCM ver. 2: efficient mining algorithms for frequent/closed/maximal itemsets, in: R. Bayardo, B. Goethals, M.J. Zaki (Eds.), Proceedings of the IEEE ICDM Workshop on Frequent Itemset Mining Implementations Brighton, UK, November 1, 2004, vol. 126, CEUR Workshop Proceedings, 2005.
[14] Van Mechelen, I.; Bock, H.-H.; De Boeck, P., Two-mode clustering methods: a structured overview, Statistical Methods in Medical Research, 13, 5, 363-394 (2004), URL http://smm.sagepub.com/content/13/5/363.abstract · Zbl 1053.62078
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.