×

A two-stage integer programming model considering transaction equivalence for privacy preservation. (English) Zbl 1520.68033

Summary: Preserving privacy is one of the fundamental requirements of firms that share data with their business partners for building advanced data mining models. Firms often aim to protect the disclosure of sensitive knowledge or information discovered during the data mining process. In this study, we investigate the problem of Frequent Itemset Hiding (FIH) which aims to hide sensitive itemset relationships present in a transactional database. We propose a two-stage integer programming model that maximizes the proportion of unaltered transactions in the sanitized database and protects sensitive itemset relationships. The model exploits the concept of transactional equivalence and significantly reduces the size of the FIH problem. In addition, our model enables the identification of solutions with minimal side effects. We conduct an experimental evaluation on both real and synthetic databases to show that our approach is scalable and produces a sanitized database with maximum accuracy. The generated solution is also found to have lower side effects (itemset information loss) compared to other state-of-the-art methods. Our experiments on very large problem instances show problem size reductions of one to three orders of magnitude. The proposed approach is quite attractive and practically useful for solving large-scale FIH problem instances and preserving privacy in increasingly shared and big data-driven organizational environments.

MSC:

68P27 Privacy of data
90C10 Integer programming
Full Text: DOI

References:

[1] Agrawal, R., Srikant, R., et al., 1994. Fast algorithms for mining association rules. In: Proc. 20th Int. Conf. Very Large Data Bases, VLDB, Vol. 1215. pp. 487-499.
[2] Ahmed, U.; Srivastava, G.; Lin, J. C.-W., A machine learning model for data sanitization, Comput. Netw., 189, Article 107914 pp. (2021)
[3] Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., Verykios, V., 1999. Disclosure limitation of sensitive rules. In: Proceedings 1999 Workshop on Knowledge and Data Engineering Exchange (KDEX’99). pp. 45-52.
[4] Bayardo, R.J., Jr., 1998. Efficiently mining long patterns from databases. In: Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. pp. 85-93.
[5] Brijs, T., Swinnen, G., Vanhoof, K., Wets, G., 1999. Using association rules for product assortment decisions: A case study. In: Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 254-260.
[6] Clifton, C.; Marks, D., Security and privacy implications of data mining, (ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (1996)), 15-19
[7] Dasseni, E.; Verykios, V. S.; Elmagarmid, A. K.; Bertino, E., Hiding association rules by using confidence and support, (International Workshop on Information Hiding (2001), Springer), 369-383 · Zbl 1008.68671
[8] Dolley, S.; Gonzales, H.; Henry, K., Retailer-Direct Data Report (2009), Grocery Manufacturers Association
[9] Ehrgott, M., Multicriteria Optimization, Vol. 491 (2005), Springer Science & Business Media · Zbl 1132.90001
[10] Gkoulalas-Divanis, A., Verykios, V.S., 2006. An integer programming approach for frequent itemset hiding. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management. pp. 748-757.
[11] Gkoulalas-Divanis, A.; Verykios, V. S., Exact knowledge hiding through database extension, IEEE Trans. Knowl. Data Eng., 21, 5, 699-713 (2009)
[12] Goethals, B.; Zaki, M. J., Advances in frequent itemset mining implementations: report on FIMI’03, Acm Sigkdd Explor. Newsl., 6, 1, 109-117 (2004)
[13] Grean, M.; Shaw, M. J., Supply-chain partnership between P&G and Wal-Mart, (E-Business Management (2002)), 155-171
[14] Kagklis, V.; Verykios, V. S.; Tzimas, G.; Tsakalidis, A. K., An integer linear programming scheme to sanitize sensitive frequent itemsets, (2014 IEEE 26th International Conference on Tools with Artificial Intelligence (2014)), 771-775
[15] Kappelman, A. C.; Sinha, A. K., Optimal control in dynamic food supply chains using big data, Comput. Oper. Res., 126, Article 105117 pp. (2021) · Zbl 1510.90013
[16] Keifer, S., Beyond point of sale data: Looking forward, not backwards for demand forecasting, GXS White Pap. (2010)
[17] Kenthapadi, K., Mironov, I., Thakurta, A.G., 2019. Privacy-preserving data mining in industry. In: Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining. pp. 840-841.
[18] Kohavi, R.; Brodley, C. E.; Frasca, B.; Mason, L.; Zheng, Z., KDD-Cup 2000 organizers’ report: Peeling the onion, Acm Sigkdd Explor. Newsl., 2, 2, 86-93 (2000)
[19] Krasadakis, P.; Verykios, V. S.; Sakkopoulos, E., Resolving infeasibility in linear programs for the frequent itemset hiding problem, (2021 IEEE 33rd International Conference on Tools with Artificial Intelligence (ICTAI) (2021), IEEE), 1067-1074
[20] Lee, H. L.; Whang, S., Information sharing in a supply chain, Int. J. Manuf. Technol. Manage., 1, 1, 79-93 (2000)
[21] Leloğlu, E.; Ayav, T.; Ergenc, B., Coefficient-based exact approach for frequent itemset hiding, (Proceedngs of the 6th International Conference on Information, Process, and Knowledge Management (2014), IARIA), 124-130
[22] Li, S.; Mu, N.; Le, J.; Liao, X., Privacy preserving frequent itemset mining: Maximizing data utility based on database reconstruction, Comput. Secur., 84, 17-34 (2019)
[23] Lin, J. C.-W.; Liu, Q.; Fournier-Viger, P.; Hong, T.-P.; Voznak, M.; Zhan, J., A sanitization approach for hiding sensitive itemsets based on particle swarm optimization, Eng. Appl. Artif. Intell., 53, 1-18 (2016)
[24] Lin, J. C.-W.; Srivastava, G.; Zhang, Y.; Djenouri, Y.; Aloqaily, M., Privacy-preserving multiobjective sanitization model in 6G IoT environments, IEEE Internet Things J., 8, 7, 5340-5349 (2021)
[25] Menon, S.; Ghoshal, A.; Sarkar, S., Modifying transactional databases to hide sensitive association rules, Inf. Syst. Res., 33, 1, 152-178 (2022)
[26] Menon, S.; Sarkar, S., Minimizing information loss and preserving privacy, Manage. Sci., 53, 1, 101-116 (2007) · Zbl 1232.68038
[27] Menon, S.; Sarkar, S., Privacy and big data: Scalable approaches to sanitize large transactional databases for sharing, MIS Q., 40, 4, 963-981 (2016)
[28] Menon, S.; Sarkar, S.; Mukherjee, S., Maximizing accuracy of shared databases when concealing sensitive patterns, Inf. Syst. Res., 16, 3, 256-270 (2005)
[29] Miettinen, K., Nonlinear Multiobjective Optimization, Vol. 12 (2012), Springer Science & Business Media
[30] Moustakides, G. V.; Verykios, V. S., A maxmin approach for hiding frequent itemsets, Data Knowl. Eng., 65, 1, 75-89 (2008)
[31] Oliveira, S.R., Zaiane, O.R., 2002. Privacy preserving frequent itemset mining. In: Proceedings of the IEEE International Conference on Privacy, Security and Data Mining-Volume 14. pp. 43-54.
[32] Oliveira, S. R.; Zaiane, O. R., A unified framework for protecting sensitive association rules in business collaboration, Int. J. Bus. Intell. Data Min., 1, 3, 247-287 (2006)
[33] Saygin, Y.; Verykios, V. S.; Clifton, C., Using unknowns to prevent discovery of association rules, ACM Sigmod Rec., 30, 4, 45-54 (2001)
[34] Schwarz, L. B.; Zhao, H., The unexpected impact of information sharing on US pharmaceutical supply chains, Interfaces, 41, 4, 354-364 (2011)
[35] Shang, W.; Ha, A. Y.; Tong, S., Information sharing in a supply chain with a common retailer, Manage. Sci., 62, 1, 245-263 (2016)
[36] Stavropoulos, E. C.; Verykios, V. S.; Kagklis, V., A transversal hypergraph approach for the frequent itemset hiding problem, Knowl. Inf. Syst., 47, 3, 625-645 (2016)
[37] Sun, X.; Yu, P. S., A border-based approach for hiding sensitive frequent itemsets, (Fifth IEEE International Conference on Data Mining (ICDM’05) (2005)), 426-433
[38] Sun, X.; Yu, P. S., Hiding sensitive frequent itemsets by a border-based approach, J. Comput. Sci. Eng., 1, 1, 74-94 (2007)
[39] Tulabandhula, T.; Sinha, D.; Karra, S., Optimizing revenue while showing relevant assortments at scale, European J. Oper. Res. (2021) · Zbl 1495.62121
[40] Verykios, V. S.; Elmagarmid, A. K.; Bertino, E.; Saygin, Y.; Dasseni, E., Association rule hiding, IEEE Trans. Knowl. Data Eng., 16, 4, 434-447 (2004)
[41] Verykios, V. S.; Stavropoulos, E. C.; Krasadakis, P.; Sakkopoulos, E., Frequent itemset hiding revisited: pushing hiding constraints into mining, Appl. Intell., 52, 2539-2555 (2022)
[42] Weinswig, D., Measuring the value of retail data sharing and analytics, (Coresight Research/SPS Commerce Research Report (2020), Coresight Research: Coresight Research New York)
[43] Wu, C., Applying frequent itemset mining to identify a small itemset that satisfies a large percentage of orders in a warehouse, Comput. Oper. Res., 33, 11, 3161-3170 (2006) · Zbl 1113.90092
[44] Wu, Y.-H.; Chiang, C.-M.; Chen, A. L., Hiding sensitive association rules with limited side effects, IEEE Trans. Knowl. Data Eng., 19, 1, 29-42 (2007)
[45] Wu, J. M.-T.; Srivastava, G.; Jolfaei, A.; Pirouz, M.; Lin, J. C.-W., Security and privacy in shared HitLCPS using a GA-based multiple-threshold sanitization model, IEEE Trans. Emerg. Top. Comput. Intell. (2021)
[46] Wu, J. M.-T.; Srivastava, G.; Lin, J. C.-W.; Teng, Q., A multi-threshold ant colony system-based sanitization model in shared medical environments, ACM Trans. Internet Technol. (TOIT), 21, 2, 1-26 (2021)
[47] Wu, J. M.-T.; Teng, Q.; Huda, S.; Chen, Y.-C.; Chen, C.-M., A privacy frequent itemsets mining framework for collaboration in IoT using federated learning, ACM Trans. Sensor Netw. (2022)
[48] Zhao, H.; Xiong, C.; Gavirneni, S.; Fein, A., Fee-for-service contracts in pharmaceutical distribution supply chains: design, analysis, and management, Manuf. Serv. Oper. Manage., 14, 4, 685-699 (2012)
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.