Abstract
Outsourcing data to external parties for analysis is risky, as the privacy of confidential variables can easily be violated. To eliminate this threat, the values of these variables should be perturbed before the data are released. However, the perturbation itself may significantly change the underlying properties of the data and thus affect the analysis results. What is required is a subtle transformation that generates perturbed data which preserve privacy while maintaining, as far as possible, the statistical properties and effectiveness (i.e. the utility) of the original data. We examine privacy-preserving transformations in the context of data clustering. In particular, this paper demonstrates how non-metric multidimensional scaling (MDS) can be profitably used as a perturbation tool, and how the perturbed data can be used effectively in clustering analysis without compromising privacy or utility. We apply the proposed technique to real datasets and compare the clustering results with those obtained from the original data; in some circumstances, they were exactly the same.
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Alotaibi, K., Rayward-Smith, V.J., de la Iglesia, B. (2011). Non-metric Multidimensional Scaling for Privacy-Preserving Data Clustering. In: Yin, H., Wang, W., Rayward-Smith, V. (eds) Intelligent Data Engineering and Automated Learning - IDEAL 2011. IDEAL 2011. Lecture Notes in Computer Science, vol 6936. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23878-9_35
DOI: https://doi.org/10.1007/978-3-642-23878-9_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23877-2
Online ISBN: 978-3-642-23878-9
eBook Packages: Computer Science, Computer Science (R0)