Abstract
The problem of anonymization in large networks and the utility of released data are considered in this paper. Although there are some anonymization methods for networks, most of them cannot be applied in large networks because of their complexity. In this paper, we devise a simple and efficient algorithm for k-degree anonymity in large networks. Our algorithm constructs a k-degree anonymous network by the minimum number of edge modifications. We compare our algorithm with other well-known k-degree anonymous algorithms and demonstrate that information loss in real networks is lowered. Moreover, we consider the edge relevance in order to improve the data utility on anonymized networks. By considering the neighbourhood centrality score of each edge, we preserve the most important edges of the network, reducing the information loss and increasing the data utility. An evaluation of clustering processes is performed on our algorithm, proving that edge neighbourhood centrality increases data utility. Lastly, we apply our algorithm to different large real datasets and demonstrate their efficiency and practical utility.
Similar content being viewed by others
Notes
A preliminary, short version [9] of this paper appeared at ASONAM 2013.
Java implementation and source code available at: https://jcasasr.wordpress.com.
References
Adamic LA, Glance N (2005) The political blogosphere and the 2004 US election. In: International workshop on link discovery. ACM, New York, NY, USA, pp 36–43
Backstrom L, Dwork C, Kleinberg J (2007) Wherefore art thou r3579x? Anonymized social networks, hidden patterns, and structural steganography. In: International conference on world wide web. ACM, New York, NY, USA, pp 181–190
Blondel VD, Guillaume J-L, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):P10008
Boldi P, Bonchi F, Gionis A, Tassa T (2012) Injecting uncertainty in graphs for identity obfuscation. Proc VLDB Endow 5(11):1376–1387
Bredereck R, Froese V, Hartung S, Nichterlein A, Niedermeier R, Talmon N (2014) The complexity of degree anonymization by vertex addition. In: Proceedings of the 10th international conference on algorithmic aspects in information and management. Springer, Vancouver, BC, Canada, pp 44–55
Cai B-J, Wang H-Y, Zheng H-R, Wang H (2010) Evaluation repeated random walks in community detection of social networks. In: International conference on machine learning and cybernetics. IEEE, Qingdao, China, pp 1849–1854
Campan A, Alufaisan Y, Truta TM (2015) Preserving communities in anonymized social networks. Trans Data Priv 8(1):55–87
Campan A, Truta TM (2009) Data and structural \(k\)-anonymity in social networks. In: Bonchi F, Ferrari E, Jiang W, Malin B (eds) Privacy, security, and trust in KDD. Springer, New York, pp 33–54
Casas-Roma J, Herrera-Joancomartí J, Torra V (2013) An algorithm for \(k\)-degree anonymity on large networks. In: IEEE international conference on advances on social networks analysis and mining. IEEE, Niagara Falls, CA, USA, pp 671–675
Casas-Roma J, Herrera-Joancomartí J, Torra V (2013) Analyzing the impact of edge modifications on networks. In: International conference on modeling decisions for artificial intelligence. Springer, Barcelona, Spain, pp 296–307
Cheng J, Fu AW, Liu J (2010) \(k\)-isomorphism: privacy preserving network publication against structural attacks. In: International conference on management of data. ACM, New York, NY, USA, pp 459–470
Chester S, Kapron BM, Ramesh G, Srivastava G, Thomo A, Venkatesh S (2011) \(k\)-anonymization of social networks by vertex addition. In: ADBIS 2011 research communications, pp 107–116
Chester S, Gaertner J, Stege U, Venkatesh S (2012) Anonymizing subsets of social networks with degree constrained subgraphs. In: IEEE International conference on advances on social networks analysis and mining. IEEE, Washington, DC, USA, pp 418–422
Chester S, Kapron BM, Ramesh G, Srivastava G, Thomo A, Venkatesh S (2013) Why Waldo befriended the dummy? \(k\)-anonymization of social networks with pseudo-nodes. Soc Netw Anal Min 3(3):381–399
Clauset A, Newman MEJ, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6):1–6
De Capitani di Vimercati S, Foresti S, Livraga G, Samarati P (2012) Data privacy: definitions and techniques. Int J Uncertain Fuzziness Knowl Based Syst 20(6):793–818
Dwork C (2006) Differential privacy. In: International conference on automata, languages and programming, vol 4052, pp 1–12
Dwork C (2011) A firm foundation for private data analysis. Commun ACM 54(1):86–95
Ferri F, Grifoni P, Guzzo T (2011) New forms of social and professional digital relationships: the case of Facebook. Soc Netw Anal Min 2(2):121–137
Girvan M, Newman MEJ (2002) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–7826
Hansen SL, Mukherjee S (2003) A polynomial algorithm for optimal univariate microaggregation. IEEE Trans Knowl Data Eng 15(4):1043–1044
Hartung S, Hoffmann C, Nichterlein A (2014) Improved upper and lower bound heuristics for degree anonymization in social networks. In: Proceedings of the 13th international symposium on experimental algorithms. Springer, Copenhagen, Denmark, pp 376–387
Hay M, Miklau G, Jensen D, Weis P, Srivastava S (2007) Anonymizing social networks. Technical Report No. 07-19, Computer Science Department, University of Massachusetts, Amherst
Hay M, Miklau G, Jensen D, Towsley D, Weis P (2008) Resisting structural re-identification in anonymized social networks. Proc VLDB Endow 1(1):102–114
Hay M, Li C, Miklau G, Jensen D (2009) Accurate estimation of the degree distribution of private networks. In: IEEE International conference on data mining (ICDM). IEEE, Miami, FL, USA, pp 169–178
Hay M, Liu K, Miklau G, Pei J, Terzi E (2011) Privacy-aware data management in information networks. In: International conference on management of data. ACM Press, New York, NY, USA, pp 1201–1204
Krebs V (2006). US politics book. Retrieved from http://www.orgnet.com
Lancichinetti A, Fortunato S (2009) Community detection algorithms: a comparative analysis. Phys Rev E 80(5):056117
Leskovec J, Kleinberg J, Faloutsos C (2007) Graph evolution: densification and shrinking diameters. ACM Trans Knowl Discov Data 1(1):1–40
Leskovec J, Adamic LA, Huberman BA (2007) The dynamics of viral marketing. ACM Trans Web 1(1):5:1–5:46
Leskovec J, Lang K, Dasgupta A, Mahoney M (2009) Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math 6(1):29–123
Li N, Li T, Venkatasubramanian S (2007) \(t\)-closeness: privacy beyond \(k\)-anonymity and \(\ell \)-diversity. In: IEEE international conference on data engineering. IEEE, pp 106–115
Liu K, Terzi E (2008) Towards identity anonymization on graphs. In ACM SIGMOD international conference on management of data. ACM, New York, NY, USA, pp 93–106
Lu X, Song Y, Bressan S (2012) Fast identity anonymization on graphs. In: Proceedings of the 23rd international conference on database and expert systems applications. Springer, Vienna, Austria, pp 281–295
Nagle F (2013) Privacy breach analysis in social networks. In: Özyer T, Erdem Z, Rokne J, Khoury S (eds) Mining social networks and security informatics. Springer, Dordrecht, pp 63–77
Nguyen HH, Imine A, Rusinowitch M (2015) Anonymizing social graphs via uncertainty semantics. In: Proceedings of the 10th ACM symposium on information, computer and communications security. ACM, Singapore, pp 495–506
Machanavajjhala A, Kifer D, Gehrke J, Venkitasubramaniam M (2007) \(\ell \)-diversity: privacy beyond k\(k\)-anonymity. ACM Trans Knowl Discov Data 1(1):3:1–3:12
McSherry F, Mironov I (2009) Differentially private recommender systems. In: Proceedings of the 15th ACM SIGKDD international conference on knowledge discovery and data mining. ACM, New York, NY, USA, pp 627–636
Pons P, Latapy M (2005) Computing communities in large networks using random walks. In: 20th international symposium computer and information sciences. Springer, Istanbul, Turkey, pp 284–293
Rosvall M, Bergstrom CT (2008) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci USA 105(4):1118–1123
Samarati P (2001) Protecting respondents’ identities in microdata release. IEEE Trans Knowl Data Eng 13(6):1010–1027
Sweeney L (2002) \(k\)-anonymity: a model for protecting privacy. Int J Uncertain Fuzziness Knowl Based Syst 10(5):557–570
Tripathy BK, Panda GK (2010) A new approach to manage security against neighborhood attacks in social networks. In: IEEE international conference on advances on social networks analysis and mining. IEEE, pp 264–269
Wu W, Xiao Y, Wang W, He Z, Wang Z (2010) \(k\)-symmetry model for identity anonymization in social networks. In: International conference on extending database technology. ACM, New York, NY, USA, pp 111–122
Yahoo! Webscope, Yahoo! Instant Messenger friends connectivity graph, version 1.0. http://research.yahoo.com/Academic_Relations
Ying X, Wu X (2008) Randomizing social networks: a spectrum preserving approach. In: SIAM Conference on data mining. SIAM, Atlanta, GA, USA, pp 739–750
Ying X, Pan K, Wu X, Guo L (2009) Comparisons of randomization and \(k\)-degree anonymization schemes for privacy preserving social network publishing. In: Workshop on social network mining and analysis. ACM, New York, NY, USA, pp 10:1–10:10
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zhang K, Lo D, Lim E-P, Prasetyo PK (2013) Mining indirect antagonistic communities from social interactions. Knowl Inf Syst (KAIS) 35(3):553–583
Zheleva E, Getoor L (2011) Privacy in social networks: a survey. Social network data analytics. Springer, Berlin
Zhou B, Pei J (2008) Preserving privacy in social networks against neighborhood attacks. In: IEEE International conference on data engineering (ICDE). IEEE, Washington, DC, USA, pp 506–515
Zhou B, Pei J (2011) The \(k\)-anonymity and \(\ell \)-diversity approaches for privacy preservation in social networks against neighborhood attacks. Knowl Inf Syst 28(1):47–77
Zhou B, Pei J, Luk W (2008) A brief survey on anonymization techniques for privacy preserving publishing of social network data. ACM SIGKDD Explor Newsl 10(2):12–22
Zou L, Chen L, Özsu MT (2009) \(k\)-automorphism: a general framework for privacy preserving network publication. Proc VLDB Endow 2(1):946–957
Acknowledgments
This work was partly funded by the Spanish Government through Grants TIN2011-27076-C03-02 “CO-PRIVACY” and TIN2014-57364-C2-2-R “SMARTGLACIS.”
Author information
Authors and Affiliations
Corresponding author
Rights and permissions
About this article
Cite this article
Casas-Roma, J., Herrera-Joancomartí, J. & Torra, V. k-Degree anonymity and edge selection: improving data utility in large networks. Knowl Inf Syst 50, 447–474 (2017). https://doi.org/10.1007/s10115-016-0947-7
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10115-016-0947-7