Exploring the Gap between Tolerant and Non-Tolerant Distribution Testing
Article Type
Research Article
Publication Title
IEEE Transactions on Information Theory
Abstract
The framework of distribution testing is currently ubiquitous in the field of property testing. In this model, the input is a probability distribution accessible via independently drawn samples from an oracle. The testing task is to distinguish a distribution that satisfies some property from a distribution that is far in some distance measure from satisfying it. The task of tolerant testing imposes a further restriction, that distributions close to satisfying the property are also accepted. This work focuses on the connection between the sample complexities of non-tolerant testing of distributions and their tolerant testing counterparts. When limiting our scope to label-invariant (symmetric) properties of distributions, we prove that the gap is at most quadratic, ignoring poly-logarithmic factors. Conversely, the property of being the uniform distribution is indeed known to have an almost-quadratic gap. When moving to general, not necessarily label-invariant properties, the situation is more complicated, and we show some partial results. We show that if a property requires the distributions to be non-concentrated, that is, the probability mass of the distribution is sufficiently spread out, then it cannot be non-tolerantly tested with o(√n) many samples, where n denotes the universe size. Clearly, this implies at most a quadratic gap, because a distribution can be learned (and hence tolerantly tested against any property) using O(n) many samples. Being non-concentrated is a strong requirement on properties, as we also prove a close to linear lower bound against their tolerant tests. Apart from the case where the distribution is non-concentrated, we also show that if an input distribution is very concentrated, in the sense that it is mostly supported on a subset of size s of the universe, then it can be learned using only O(s) many samples. The learning procedure adapts to the input, and works without knowing s in advance.
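The abstract's remark that a distribution "can be learned (and hence tolerantly tested against any property)" can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes the simplest possible learner (the empirical distribution), checks closeness to a single known target distribution rather than to a general property, and uses total variation distance. All function names, the sample budget m, and the thresholds eps1 and eps2 are hypothetical choices made for illustration.

```python
import random
from collections import Counter


def empirical_distribution(samples, n):
    """Learn a distribution over {0, ..., n-1} as its empirical frequencies."""
    counts = Counter(samples)
    m = len(samples)
    return [counts[i] / m for i in range(n)]


def tv_distance(p, q):
    """Total variation distance between two distributions given as lists."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def tolerant_test_by_learning(sample_oracle, target, eps1, eps2, m):
    """Accept if the learned distribution looks eps1-close to `target`,
    reject if it looks eps2-far (eps1 < eps2).

    Illustrative assumption: m is taken large enough (on the order of n)
    that the empirical distribution is accurate to well within
    (eps2 - eps1) / 2 in total variation distance.
    """
    n = len(target)
    samples = [sample_oracle() for _ in range(m)]
    learned = empirical_distribution(samples, n)
    # Threshold at the midpoint between the "close" and "far" parameters.
    return tv_distance(learned, target) <= (eps1 + eps2) / 2


# Usage: a uniform source should be accepted, a point mass rejected.
n = 50
uniform = [1.0 / n] * n
rng = random.Random(0)  # fixed seed so the run is reproducible

accepts_uniform = tolerant_test_by_learning(
    lambda: rng.randrange(n), uniform, eps1=0.1, eps2=0.3, m=20000
)
accepts_point_mass = tolerant_test_by_learning(
    lambda: 0, uniform, eps1=0.1, eps2=0.3, m=20000
)
```

The point of the sketch is only the reduction: once the distribution is learned to small error, the distance to the target can be estimated directly, which is why a learning bound of O(n) samples also bounds the tolerant testing cost.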
First Page
1153
Last Page
1170
DOI
10.1109/TIT.2024.3483995
Publication Date
1-1-2025
Recommended Citation
Chakraborty, Sourav; Fischer, Eldar; Ghosh, Arijit; Mishra, Gopinath; and Sen, Sayantan, "Exploring the Gap between Tolerant and Non-Tolerant Distribution Testing" (2025). Journal Articles. 5359.
https://digitalcommons.isical.ac.in/journal-articles/5359