Algorithms and hardness results for nearest neighbor problems in bicolored point sets
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
In the context of computational supervised learning, the primary objective is the classification of data. Specifically, the goal is to provide the system with “training” data and design a method which uses the training data to classify new objects with the correct label. A standard scenario is that the examples are points from a metric space, and “nearby” points should have “similar” labels. In practice, it is desirable to reduce the size of the training set without compromising too much on the ability to correctly label new objects. Such subsets of the training data are called edited sets. Wilfong [SOCG ’91] defined two types of edited subsets: consistent subsets (those which correctly label all objects from the training data) and selective subsets (those which correctly label all new objects the same way as the original training data). This leads to the following two optimization problems:
- k-MCS-(X): Given k sets of points P1, P2, …, Pk in a metric space X, the goal is to choose subsets of points Pi′ ⊆ Pi for i = 1, 2, …, k such that ∀p ∈ Pi its nearest neighbor among P1′ ∪ P2′ ∪ … ∪ Pk′ lies in Pi′ for each i ∈ [k], while minimizing the quantity |P1′| + |P2′| + … + |Pk′|. (Note that we also enforce the condition |Pi′| ≥ 1 ∀i ∈ [k].)
- k-MSS-(X): Given k sets of points P1, P2, …, Pk in a metric space X, the goal is to choose subsets of points Pi′ ⊆ Pi for i = 1, 2, …, k such that ∀p ∈ Pi its nearest neighbor among (⋃j≠i Pj) ∪ Pi′ lies in Pi′ for each i ∈ [k], while minimizing the quantity |P1′| + |P2′| + … + |Pk′|. (Note that we again enforce the condition |Pi′| ≥ 1 ∀i ∈ [k].)
While several heuristics have been proposed for these two problems in the computer vision and machine learning communities, the only theoretical results for these problems (to the best of our knowledge) are due to Wilfong [SOCG ’91], who showed that both 3-MCS-(ℝ²) and 2-MSS-(ℝ²) are NP-complete.
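To make the two feasibility conditions concrete, the following is a minimal Python sketch of checkers for consistent and selective subsets under the Euclidean metric. It is an illustration under stated assumptions, not code from the paper: the names `is_consistent`, `is_selective`, and `nearest_labels` are hypothetical, the selective check compares each point against its own chosen subset together with all points of the other classes, and a tie between classes at the minimum distance is treated as a failure.

```python
from typing import List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance (enough for comparing distances)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_labels(p: Point, pool: Sequence[Tuple[Point, int]]) -> Set[int]:
    """Labels of all pool points that attain the minimum distance to p."""
    best = min(dist2(p, q) for q, _ in pool)
    return {label for q, label in pool if dist2(p, q) == best}


def is_consistent(classes: List[List[Point]], chosen: List[List[Point]]) -> bool:
    """Every point's nearest neighbor among all chosen points is in its own class."""
    if any(len(c) == 0 for c in chosen):  # enforce |Pi'| >= 1
        return False
    pool = [(q, j) for j, sub in enumerate(chosen) for q in sub]
    return all(
        nearest_labels(p, pool) == {i}  # assumption: ties count as failure
        for i, cls in enumerate(classes)
        for p in cls
    )


def is_selective(classes: List[List[Point]], chosen: List[List[Point]]) -> bool:
    """Every point is nearer to its own chosen subset than to any point of another class."""
    if any(len(c) == 0 for c in chosen):  # enforce |Pi'| >= 1
        return False
    for i, cls in enumerate(classes):
        pool = [(q, i) for q in chosen[i]]
        pool += [(q, j) for j, other in enumerate(classes) if j != i for q in other]
        if not all(nearest_labels(p, pool) == {i} for p in cls):
            return False
    return True


# Two classes on the real line, written as 1-tuples.
classes = [[(0,), (4,)], [(6,), (9,)]]
chosen = [[(0,)], [(9,)]]
print(is_consistent(classes, chosen), is_selective(classes, chosen))
```

In this small instance the chosen subset is consistent but not selective: the point 4 is closer to the chosen point 0 than to the chosen point 9, yet the unchosen point 6 of the other class is closer still.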
We initiate the study of these two problems from a theoretical perspective, and obtain several algorithmic and hardness results. On the algorithmic side, we first design an O(n²) time exact algorithm and an O(n log n) time 2-approximation for the 2-MCS-(ℝ) problem, i.e., when the points are located on the real line. Moreover, we show that the exact algorithm also extends to the case when the points are located on the circumference of a circle. Next, we design an O(r²) time online algorithm for the 2-MCS-(ℝ) problem, where n is the number of points and r < n is an integer. Finally, we give a PTAS for the k-MSS-(ℝ²) problem. On the hardness side, we show that both the 2-MCS and 2-MSS problems are NP-complete on graphs. Additionally, the problems are W-hard parameterized by the size k of the solution. For points on the Euclidean plane, we show that the 2-MSS problem is contained in W. Finally, we show a lower bound of Ω(√n) bits for the storage of any (randomized) algorithm which solves both 2-MCS-(ℝ) and 2-MSS-(ℝ).
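As an illustration of how an O(n log n) factor-2 guarantee can arise for two colors on the real line, here is a minimal Python sketch of a simple boundary heuristic. It is not claimed to be the algorithm of the paper. The function name `boundary_consistent_subset` is hypothetical, and the sketch assumes distinct coordinates and that both colors are present. After sorting, it keeps every point whose left or right neighbor has the other color. This selects at most two points per maximal monochromatic run, and an exchange argument suggests that any consistent subset needs at least one point from each run, which gives the factor of 2.

```python
from typing import List, Sequence, Tuple

ColoredPoint = Tuple[float, int]  # (coordinate on the real line, color 0 or 1)


def boundary_consistent_subset(points: Sequence[ColoredPoint]) -> List[ColoredPoint]:
    """Keep the points adjacent to a color change in sorted order.

    Assumptions of this sketch: coordinates are distinct and both colors
    occur (otherwise the result is empty and violates |Pi'| >= 1).
    Sorting dominates the running time, so the sketch runs in O(n log n).
    """
    pts = sorted(points)
    chosen: List[ColoredPoint] = []
    for idx, (x, color) in enumerate(pts):
        left_differs = idx > 0 and pts[idx - 1][1] != color
        right_differs = idx + 1 < len(pts) and pts[idx + 1][1] != color
        if left_differs or right_differs:
            chosen.append((x, color))
    return chosen


# Runs in sorted order: {0, 1, 2} color 0, {5, 6, 7} color 1, {9} color 0.
sample = [(6, 1), (0, 0), (9, 0), (1, 0), (5, 1), (2, 0), (7, 1)]
print(boundary_consistent_subset(sample))
```

Every point lies between the kept endpoints of its own run (or on the inner side of the single kept endpoint for the first and last run), so its nearest kept point has its own color.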
Banerjee, Sandip; Bhore, Sujoy; and Chitnis, Rajesh, "Algorithms and hardness results for nearest neighbor problems in bicolored point sets" (2018). Conference Articles. 162.