×

Testing distributions of huge objects. (English) Zbl 07875618

Summary: We initiate a study of a new model of property testing that is a hybrid of testing properties of distributions and testing properties of strings. Specifically, the new model refers to testing properties of distributions, but these are distributions over huge objects (i.e., very long strings). Accordingly, the model accounts for the total number of local probes into these objects (resp., queries to the strings) as well as for the distance between objects (resp., strings), and the distance between distributions is defined as the earth mover’s distance with respect to the relative Hamming distance between strings.
We study the query complexity of testing in this new model, focusing on three directions. First, we try to relate the query complexity of testing properties in the new model to the sample complexity of testing these properties in the standard distribution-testing model. Second, we consider the complexity of testing properties that arise naturally in the new model (e.g., distributions that capture random variations of fixed strings). Third, we consider the complexity of testing properties that were extensively studied in the standard distribution-testing model: Two such cases are uniform distributions and pairs of identical distributions, where we obtain the following results.
Testing whether a distribution over \(n\)-bit long strings is uniform on some set of size \(m\) can be done with query complexity \(\widetilde{O}(m/\epsilon^3)\) where \(\epsilon>(\log_2m)/n\) is the proximity parameter. Testing whether two distribution over \(n\)-bit long strings that have support size at most \(m\) are identical can be done with query complexity \(\widetilde{O}(m^{2/3}/\epsilon^3)\).
Both upper bounds are quite tight; that is, for \(\epsilon=\Omega(1)\), the first task requires \(\Omega(m^\epsilon)\) queries for any \(c<1\) and \(n=\omega(\log m)\), whereas the second task requires \(\Omega(m^{2/3})\) queries. Note that the query complexity of the first task is higher than the sample complexity of the corresponding task in the standard distribution-testing model, whereas in the case of the second task the bounds almost match.

MSC:

68-XX Computer science

References:

[1] Jayadev Acharya, Clément L. Canonne, and Himanshu Tyagi. Inference under information constraints I: lower bounds from chi-square contraction. IEEE Transactions on Information Theory, 66(12):7835-7855, 2020 DOI (14). · Zbl 1457.62091 · doi:10.1109/TIT.2020.3028440
[2] Tugkan Batu. Testing properties of distributions. PhD thesis, Computer Science department, Cornell University, 2001 (32).
[3] Tugkan Batu and Clement L. Canonne. Generalized uniformity testing. Proceedings of the Fiftieth-Eighth Annual Symposium on Foundations of Computer Science (FOCS), pages 880-889, 2017 DOI (3, 9, 10). · doi:10.1109/FOCS.2017.86
[4] Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick White. Testing random variables for independence and identity. Proceedings of the Forty-Second Annual Symposium on Foundations of Computer Science (FOCS), pages 442-451, 2001 DOI . · doi:10.1109/SFCS.2001.959920
[5] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing closeness of discrete distributions. Journal of the ACM, 60(1):4:1-4:25, 2013 DOI . This is a long version of [6]. (10). · Zbl 1281.68227 · doi:10.1145/2432622.2432626
[6] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. Testing that distributions are close. Proceedings of the Forty-First Annual Symposium on Foundations of Computer Science (FOCS), pages 259-269, 2000 DOI (10, 46). · doi:10.1109/SFCS.2000.892113
[7] Eli Ben-Sasson, Oded Goldreich, Prahladh Harsha, Madhu Sudan, and Salil Vadhan. Robust PCPs of proximity, shorter PCPs, and applications to coding. SIAM Journal on Computing, 36(4):889-974, 2006 DOI (31, 32). · Zbl 1118.68071 · doi:10.1137/S0097539705446810
[8] Clément L. Canonne. A Survey on Distribution Testing: Your Data is Big. But is it Blue?, number 9 in Graduate Surveys. Theory of Computing Library, 2020, pages 1-100 DOI (2, 14). · doi:10.4086/toc.gs.2020.009
[9] Clément L. Canonne. Topics and techniques in distribution testing: A biased but representative sample. Foundations and Trends in Communication and Information Theory, 19(6):1032-1198, 2022 DOI (2). · Zbl 1524.62128 · doi:10.1561/0100000114
[10] Sourav Chakraborty, Eldar Fischer, Arijit Ghosh, Gopinath Mishra, and Sayantan Sen. Testing of Index-Invariant Properties in the Huge Object Model. Technical report TR22-155, Electronic Colloquium on Computational Complexity (ECCC), 2022 URL (14).
[11] Alessandro Chiesa, Tom Gur, and Igor Shinkar. Relaxed locally correctable codes with nearly-linear block length and constant query complexity. Proceedings of the Thirty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1395-1411, 2020 DOI (32). · Zbl 1502.68119 · doi:10.1137/1.9781611975994.84
[12] Ilias Diakonikolas and Daniel Kane. A new approach for testing properties of discrete distributions. Proceedings of the Fiftieth-Seventh Annual Symposium on Foundations of Computer Science (FOCS), pages 685-694, 2016 DOI (44, 52). · doi:10.1109/FOCS.2016.78
[13] Ilias Diakonikolas, Daniel M. Kane, and Alistair Stewart. Sharp bounds for generalized uniformity testing. Advances in Neural Information Processing Systems (NeurIPS), volume 31, 2018 URL (10).
[14] Eldar Fischer and Arieh Matsliah. Testing graph isomorphism. SIAM Journal on Computing, 38(1):207-225, 2008 DOI (11, 40, 41). · Zbl 1165.05026 · doi:10.1137/070680795
[15] Oded Goldreich. Introduction to Property Testing. Cambridge University Press, 2017 DOI (2, 14, 16, 23, 30, 32, 33, 38, 40, 41, 44, 48, 52, 53). · Zbl 06797790 · doi:10.1017/9781108135252
[16] Oded Goldreich. On multiple input problems in property testing. Proceedings of the Eighteenth International Workshop on Randomization and Computation (RANDOM), pages 704-720, 2014 DOI (29). · Zbl 1359.68085 · doi:10.4230/LIPICS.APPROX-RANDOM.2014.704
[17] Oded Goldreich. Testing Isomorphism in the Bounded-Degree Graph Model. Technical report TR19-102, Electronic Colloquium on Computational Complexity (ECCC), 2019 URL (41).
[18] Oded Goldreich and Dana Ron. Lower bounds on the complexity of testing grained distributions. Computational Complexity, 32(2):Article number 11, 2023 DOI (9, 22, 27). · Zbl 07773324 · doi:10.1007/S00037-023-00245-W
[19] Oded Goldreich and Dana Ron. Testing Distributions of Huge Objects. Technical report TR21-133, Electronic Colloquium on Computational Complexity (ECCC), 2021 URL (14).
[20] Tom Gur, Govind Ramnarayan, and Ron Rothblum. Relaxed locally correctable codes. Proceedings of the Ninth Innovations in Theoretical Computer Science conference (ITCS), 27:1-27:11, 2018 DOI (31, 32). · Zbl 1462.68050 · doi:10.4230/LIPICS.ITCS.2018.27
[21] Krzysztof Onak and Xiaorui Sun. The query complexity of graph isomorphism: bypassing distribution testing lower bounds. Proceedings of the Fifteeth Annual ACM Symposium on the Theory of Computing (STOC), pages 165-171, 2018 DOI (40). · Zbl 1428.68390 · doi:10.1145/3188745.3188952
[22] Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bonds for approximating distributions support size and the distinct elements problem. SIAM Journal on Computing, 39(3):813-842, 2009 DOI (9). · Zbl 1194.68124 · doi:10.1137/070701649
[23] Gregory Valiant and Paul Valiant. Estimating the unseen: an 𝑛/log(𝑛)-sample estimator for entropy and support size, shown optimal via new CLTs. Proceedings of the Fourty-Third Annual ACM Symposium on the Theory of Computing (STOC), pages 685-694, 2011 DOI (3, 25). · Zbl 1288.68186 · doi:10.1145/1993636.1993727
[24] Gregory Valiant and Paul Valiant. Estimating the unseen: improved estimators for entropy and other properties. Journal of the ACM, 64(6), 2017 DOI (3, 21). · Zbl 1426.62107 · doi:10.1145/3125643
[25] Paul Valiant. Testing symmetric properties of distributions. SIAM Journal on Computing, 40(6):1927-1968, 2011 DOI (10). · Zbl 1233.62112 · doi:10.1137/080734066
This reference list is based on information provided by the publisher or from digital mathematics libraries. Its items are heuristically matched to zbMATH identifiers and may contain data conversion errors. In some cases that data have been complemented/enhanced by data from zbMATH Open. This attempts to reflect the references listed in the original paper as accurately as possible without claiming completeness or a perfect matching.