Continuously maintaining quantile summaries of the most recent N elements over a data stream
Refereed conference paper presented and published in conference proceedings


摘要Statistics over the most recently observed data elements are often required in applications involving data streams, such as intrusion detection in network monitoring, stock price prediction in financial markets, web log mining for access prediction, and user click stream mining for personalization. Among various statistics, computing quantile summary is probably most challenging because of its complexity. In this paper, we study the problem of continuously maintaining quantile summary of the most recently observed N elements over a stream so that quantile queries can be answered with a guaranteed precision of εN. We developed a space efficient algorithm for pre-defined N that requires only one scan of the input data stream and O(log(ε2N)/ ε + 1/ε2) space in the worst cases. We also developed an algorithm that maintains quantile summaries for most recent N elements so that quantile queries on any most recent n elements (n ≤ N) can be answered with a guaranteed precision of εn. The worst case space requirement for this algorithm is only O(log2(εN)/ε2). Our performance study indicated that not only the actual quantile estimation error is far below the guaranteed precision but the space requirement is also much less than the given theoretical bound.
著者Lin X., Lu H., Xu J., Yu J.X.
會議名稱Proceedings - 20th International Conference on Data Engineering - ICDE 2004
會議地點Boston, MA.
頁次362 - 373

上次更新時間 2021-03-12 於 00:12