Improved Streaming Algorithm for the Klee’s Measure Problem and Generalizations

Document Type

Conference Article

Publication Title

Leibniz International Proceedings in Informatics, LIPIcs

Abstract

Estimating the size of the union of a stream of sets $S_1, S_2, \ldots, S_M$, where each set is a subset of a known universe $\Omega$, is a fundamental problem in data streaming. This problem naturally generalizes the well-studied $F_0$ estimation problem in the streaming literature, where each set contains a single element from the universe. We consider the general case when the sets $S_i$ can be succinctly represented and allow efficient membership, cardinality, and sampling queries (called a Delphic family of sets). A notable example in this framework is Klee’s Measure Problem (KMP), where every set $S_i$ is an axis-parallel rectangle in $d$-dimensional space ($\Omega = [\Delta]^d$, where $[\Delta] := \{1, \ldots, \Delta\}$ and $\Delta \in \mathbb{N}$). Recently, Meel, Chakraborty, and Vinodchandran (PODS-21, PODS-22) designed a streaming algorithm for $(\varepsilon, \delta)$-estimation of the size of the union of set streams over a Delphic family with space and update time complexity $O\left(\frac{\log^3 |\Omega|}{\varepsilon^2} \cdot \log \frac{1}{\delta}\right)$ and $\tilde{O}\left(\frac{\log^4 |\Omega|}{\varepsilon^2} \cdot \log \frac{1}{\delta}\right)$, respectively. This work presents a new, sampling-based algorithm for estimating the size of the union of Delphic sets that has space and update time complexity $\tilde{O}\left(\frac{\log^2 |\Omega|}{\varepsilon^2} \cdot \log \frac{1}{\delta}\right)$. This improves the space complexity bound by a $\log |\Omega|$ factor and the update time complexity bound by a $\log^2 |\Omega|$ factor. A critical question is whether the quadratic dependence of the space and update time complexities on $\log |\Omega|$ is necessary. Specifically, can we design a streaming algorithm for estimating the size of the union of sets over a Delphic family with space complexity linear in $\log |\Omega|$ and update time $\mathrm{poly}(\log |\Omega|)$? While this appears technically challenging, we show that establishing a space lower bound of $\omega(\log |\Omega|)$ for algorithms with $\mathrm{poly}(\log |\Omega|)$ update time is beyond the reach of current techniques. Specifically, we show that under a certain hard-to-prove computational complexity hypothesis, there is a streaming algorithm for the problem with optimal space complexity $O(\log |\Omega|)$ and update time $\mathrm{poly}(\log |\Omega|)$. Thus, establishing a space lower bound of $\omega(\log |\Omega|)$ would lead to breakthrough complexity class separation results.
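
As an illustration of the Delphic-set interface described in the abstract, the following minimal Python sketch models an axis-parallel rectangle in $[\Delta]^d$ with membership, cardinality, and sampling queries. The names `Rectangle` and `estimate_union_size` are hypothetical, and the estimator is a classical offline Karp-Luby style sampler that keeps all sets in memory. It is not the streaming algorithm of the paper and does not achieve its space or update time bounds; it only shows how the three queries suffice to estimate the size of a union.

```python
import random
from dataclasses import dataclass
from typing import Sequence, Tuple

Point = Tuple[int, ...]


@dataclass(frozen=True)
class Rectangle:
    """Axis-parallel rectangle in [Delta]^d, given by inclusive corners lo <= hi."""
    lo: Point
    hi: Point

    def contains(self, x: Point) -> bool:
        # Membership query: every coordinate must lie within its interval.
        return all(l <= xi <= h for l, xi, h in zip(self.lo, x, self.hi))

    def cardinality(self) -> int:
        # Cardinality query: product of the side lengths.
        size = 1
        for l, h in zip(self.lo, self.hi):
            size *= h - l + 1
        return size

    def sample(self, rng: random.Random) -> Point:
        # Sampling query: a uniform point, one independent coordinate at a time.
        return tuple(rng.randint(l, h) for l, h in zip(self.lo, self.hi))


def estimate_union_size(sets: Sequence[Rectangle], trials: int,
                        rng: random.Random) -> float:
    """Offline Karp-Luby style estimate of |S_1 ∪ ... ∪ S_M| (illustrative only)."""
    sizes = [s.cardinality() for s in sets]
    total = sum(sizes)
    hits = 0
    for _ in range(trials):
        # Pick a set with probability proportional to its size, then a point in it.
        i = rng.choices(range(len(sets)), weights=sizes)[0]
        x = sets[i].sample(rng)
        # Count the pair (i, x) only if S_i is the first set containing x,
        # so each element of the union is counted exactly once.
        if not any(s.contains(x) for s in sets[:i]):
            hits += 1
    return total * hits / trials


if __name__ == "__main__":
    r1 = Rectangle((1, 1), (4, 4))
    r2 = Rectangle((3, 3), (6, 6))
    print(estimate_union_size([r1, r2], trials=20000, rng=random.Random(1)))
```

In this example the two rectangles each contain 16 points and overlap in 4, so the exact union size is 28, and the printed estimate should be close to that value. The sketch stores every set, which is exactly what a streaming algorithm must avoid.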

DOI

10.4230/LIPIcs.APPROX/RANDOM.2024.26

Publication Date

9-1-2024
