Samenvatting
Given a point set P ⊆ X of size n in a metric space (X, dist) of doubling dimension d and two parameters k ∈ ℕ and z ∈ ℕ, the k-center problem with z outliers asks to return a set (Equation Presented)of k centers such that the maximum distance of all but z points of P to their nearest center in C∗ is minimized. An (ϵ, k, z)-coreset for this problem is a weighted point set P∗ such that an optimal solution for the k-center problem with z outliers on P∗ gives a (1 ± ϵ)-approximation for the k-center problem with z outliers on P. We study the construction of such coresets in the Massively Parallel Computing (MPC) model, and in the insertion-only as well as the fully dynamic streaming model. We obtain the following results, for any given 0 < ϵ ≤ 1: In all cases, the size of the computed coreset is O(k/ϵd + z).• In the MPC model the data are distributed over m machines. One is the coordinator machine, which will contain the final answer, the others are worker machines.We present a deterministic 2-round algorithm using (Equation Presented)machines, where the worker machines have (Equation Presented)local memory, and the coordinator has (Equation Presented local memory. The algorithm can handle point sets P that are distributed arbitrarily (possibly adversarially) over the machines. We also present a randomized algorithm that uses only a single round, under the assumption that the input set P is initially distributed randomly over the machines. Then we present a deterministic algorithm that obtains a trade-off between the number of rounds, R, and the storage per machine.In the streaming model we have a single machine with limited storage, and P is revealed in a streaming fashion.○ We present the first lower bound for the insertion-only streaming model, where the points arrive one by one and no points are deleted. We show that any deterministic algorithm that maintains an (ϵ, k, z)-coreset must use Ω(k/ϵd + z) space. We complement this by a deterministic streaming algorithm using O(k/ϵd + z) space, which is thus optimal. ○ For the fully dynamic data streams, where points can be inserted as well as deleted we give a randomized algorithm for point sets from a d-dimensional discrete Euclidean space [Δ]d, where Δ ∈ ℕ indicates the size of the universe from which the coordinates are taken. Our algorithm uses only O((k/ϵd + z)log4(kΔ/ϵδ)) space, and it is the first algorithm for this setting. We also present an Ω((k/ϵd)logΔ + z) lower bound for deterministic fully dynamic streaming algorithms. ○ For the sliding-window model, we show that any deterministic streaming algorithm that guarantees a (1 + ϵ)-approximation for the k-center problem with outliers in ℝd must use Ω((kz/ϵd) logσ) space, where σ is the ratio of the largest and smallest distance between any two points in the stream. This (negatively) answers a question posed by De Berg, Monemizadeh, and Zhong [1].
Originele taal-2 | Engels |
---|---|
Titel | Proceedings - 2023 IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 |
Uitgeverij | Institute of Electrical and Electronics Engineers |
Pagina's | 853-863 |
Aantal pagina's | 11 |
ISBN van elektronische versie | 9798350337662 |
DOI's | |
Status | Gepubliceerd - 2023 |
Evenement | 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 - St. Petersburg, Verenigde Staten van Amerika Duur: 15 mei 2023 → 19 mei 2023 |
Congres
Congres | 37th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2023 |
---|---|
Land/Regio | Verenigde Staten van Amerika |
Stad | St. Petersburg |
Periode | 15/05/23 → 19/05/23 |
Bibliografische nota
Publisher Copyright:© 2023 IEEE.