
Bounds on the sample complexity for private learning and private data release. (English) Zbl 1274.94038

Micciancio, Daniele (ed.), Theory of cryptography. 7th theory of cryptography conference, TCC 2010, Zurich, Switzerland, February 9–11, 2010. Proceedings. Berlin: Springer (ISBN 978-3-642-11798-5/pbk). Lecture Notes in Computer Science 5978, 437-454 (2010).
Summary: Learning is a task that generalizes many of the analyses that are applied to collections of data, and in particular, collections of sensitive individual information. Hence, it is natural to ask what can be learned while preserving individual privacy. S. P. Kasiviswanathan et al. [SIAM J. Comput. 40, No. 3, 793–826 (2011; Zbl 1235.68093)] initiated such a discussion. They formalized the notion of private learning, as a combination of PAC learning and differential privacy, and investigated what concept classes can be learned privately. Somewhat surprisingly, they showed that, ignoring time complexity, every PAC learning task could be performed privately with polynomially many samples, and in many natural cases this could even be done in polynomial time.
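The basic privacy primitive underlying such private learners is differential privacy, typically achieved by adding calibrated noise to low-sensitivity statistics. As a hedged illustration (function names are mine, not from the paper), a counting query has sensitivity 1, so Laplace noise with scale 1/ε yields an ε-differentially private answer:

```python
# Minimal sketch of the Laplace mechanism, the standard building block
# of differential privacy (Dwork et al. [7]). Names here are
# illustrative; the paper's learners are built from such primitives.
import math
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))


def private_count(data, predicate, epsilon, rng=None):
    """epsilon-differentially private count of records satisfying predicate.

    A counting query has sensitivity 1 (changing one record changes the
    count by at most 1), so Laplace noise of scale 1/epsilon suffices.
    """
    rng = rng or random.Random()
    true_count = sum(1 for x in data if predicate(x))
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

Averaging many independent runs recovers the true count, while any single answer reveals little about any one record.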
While these results seem to equate non-private and private learning, there is still a significant gap: the sample complexity of (non-private) PAC learning is crisply characterized in terms of the VC-dimension of the concept class, whereas this relationship is lost in the constructions of private learners, which generally exhibit higher sample complexity.
Looking into this gap, we examine several private learning tasks and give tight bounds on their sample complexity. In particular, we show strong separations between sample complexities of proper and improper private learners (such separation does not exist for non-private learners), and between sample complexities of efficient and inefficient proper private learners. Our results show that VC-dimension is not the right measure for characterizing the sample complexity of proper private learning.
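The gap the summary refers to can be made concrete (standard bounds, stated here up to logarithmic factors and not taken from this review): non-private PAC learning of a class $C$ with accuracy $\alpha$ and confidence $1-\beta$ requires

```latex
% Non-private PAC learning (cf. Blumer et al. [5], Ehrenfeucht et al. [9]):
m \;=\; \Theta\!\left(\frac{\mathrm{VC}(C) + \log(1/\beta)}{\alpha}\right)
\quad\text{(up to a $\log(1/\alpha)$ factor in the upper bound),}

% whereas the generic private learner of Kasiviswanathan et al. [10] uses
m \;=\; O\!\left(\frac{\log|C| + \log(1/\beta)}{\epsilon\,\alpha}\right)
\quad\text{samples under $\epsilon$-differential privacy.}
```

Since $\mathrm{VC}(C)$ can be exponentially smaller than $\log|C|$, the private bound may be far from the non-private one; the paper shows that for proper private learning this gap is sometimes unavoidable.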
We also examine the task of private data release (as initiated in [A. Blum et al., in: Proceedings of the 40th annual ACM symposium on theory of computing, STOC 2008. New York, NY: ACM Press. 609–618 (2008; Zbl 1231.68120)]), and give new lower bounds on the sample complexity. Our results show that the logarithmic dependence on the size of the instance space is essential for private data release.
For the entire collection see [Zbl 1181.94004].

MSC:

94A60 Cryptography
68T05 Learning and adaptive systems in artificial intelligence

Software:

SuLQ

References:

[1] Beimel, A., Kasiviswanathan, S., Nissim, K.: Bounds on the Sample Complexity for Private Learning and Private Data Release (Full version) (2009) · Zbl 1311.68131
[2] Blum, A., Dwork, C., McSherry, F., Nissim, K.: Practical privacy: The SuLQ framework. In: PODS, pp. 128–138. ACM, New York (2005)
[3] Blum, A., Ligett, K., Roth, A.: A learning theory approach to non-interactive database privacy. In: STOC, pp. 609–618. ACM, New York (2008) · Zbl 1231.68120
[4] Blum, A., Ligett, K., Roth, A.: Private communication (2008)
[5] Blumer, A., Ehrenfeucht, A., Haussler, D., Warmuth, M.K.: Learnability and the Vapnik-Chervonenkis dimension. Journal of the Association for Computing Machinery 36(4), 929–965 (1989) · Zbl 0697.68079 · doi:10.1145/76359.76371
[6] Dwork, C.: The differential privacy frontier (extended abstract). In: Reingold, O. (ed.) TCC 2009. LNCS, vol. 5444, pp. 496–502. Springer, Heidelberg (2009) · Zbl 1213.94156 · doi:10.1007/978-3-642-00457-5_29
[7] Dwork, C., McSherry, F., Nissim, K., Smith, A.: Calibrating noise to sensitivity in private data analysis. In: Halevi, S., Rabin, T. (eds.) TCC 2006. LNCS, vol. 3876, pp. 265–284. Springer, Heidelberg (2006) · Zbl 1112.94027 · doi:10.1007/11681878_14
[8] Dwork, C., Naor, M., Reingold, O., Rothblum, G., Vadhan, S.: On the complexity of differentially private data release. In: STOC, pp. 381–390. ACM, New York (2009) · Zbl 1304.94050
[9] Ehrenfeucht, A., Haussler, D., Kearns, M.J., Valiant, L.G.: A general lower bound on the number of examples needed for learning. Inf. Comput. 82(3), 247–261 (1989) · Zbl 0679.68158 · doi:10.1016/0890-5401(89)90002-3
[10] Kasiviswanathan, S.P., Lee, H.K., Nissim, K., Raskhodnikova, S., Smith, A.: What can we learn privately? In: FOCS, pp. 531–540. IEEE Computer Society, Los Alamitos (2008) · Zbl 1235.68093
[11] Kasiviswanathan, S.P., Smith, A.: A note on differential privacy: Defining resistance to arbitrary side information. CoRR, arXiv:0803.3946 [cs.CR] (2008)
[12] Kearns, M.J.: Efficient noise-tolerant learning from statistical queries. Journal of the ACM 45(6), 983–1006 (1998); Preliminary version in Proceedings of STOC 1993 · Zbl 1065.68605 · doi:10.1145/293347.293351
[13] Kearns, M.J., Vazirani, U.V.: An Introduction to Computational Learning Theory. MIT Press, Cambridge (1994)
[14] McSherry, F., Talwar, K.: Mechanism design via differential privacy. In: FOCS, pp. 94–103. IEEE, Los Alamitos (2007) · Zbl 1232.68047
[15] Mishra, N., Sandler, M.: Privacy via pseudorandom sketches. In: PODS, pp. 143–152. ACM, New York (2006)
[16] Pitt, L., Valiant, L.G.: Computational limitations on learning from examples. Journal of the ACM 35(4), 965–984 (1988) · Zbl 0662.68086 · doi:10.1145/48014.63140
[17] Valiant, L.G.: A theory of the learnable. Communications of the ACM 27, 1134–1142 (1984) · Zbl 0587.68077 · doi:10.1145/1968.1972
[18] Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications 16(2), 264–280 (1971) · Zbl 0247.60005 · doi:10.1137/1116025