Independent range sampling
Refereed conference paper presented and published in conference proceedings


摘要This paper studies the independent range sampling prob-lem. The input is a set P of n points in ℝ. Given an interval q = [x, y] and an integer t ≥ 1, a query returns t elements uniformly sampled (with/without replacement) from P ∩ q. The sampling result must be independent from those returned by the previous queries. The objective is to store P in a structure for answering all queries efficiently. If P fits in memory, the problem is interesting when P is dynamic (i.e., allowing insertions and deletions). The state of the art is a structure of O(n) space that answers a query in O(t log n) time, and supports an update in O(log n) time. We describe a new structure of O(n) space that answers a query in O(log n+t) expected time, and supports an update in O(log n) time. If P does not fit in memory, the problem is challeng-ing even when P is static. The best known structure in-curs O(logB n + t) I/Os per query, where B is the block size. We develop a new structure of O(n/B) space that an-swers a query in O(log*(n/B)+logB n+(t/B) logM/B(n/B)) amortized expected I/Os, where M is the memory size, and log*(n/B) is the number of iterative log2(.) operations we need to perform on n/B before going below a constant. We also give a lower bound argument showing that this is nearly optimal-in particular, the multiplicative term log M/B(n/B) is necessary. Copyright 2014 ACM.
著者Hu X., Qiao M., Tao Y.
會議名稱33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014
會議地點Snowbird, UT
詳細描述organized by ACM,
頁次246 - 255
關鍵詞Independent range sampling, Lower bound, Range reporting

上次更新時間 2021-21-04 於 00:54