A General Framework for Estimating Graphlet Statistics via Random Walk
Publication in refereed journal


摘要Graphlets are induced subgraph patterns and have been fre- quently applied to characterize the local topology structures of graphs across various domains, e.g., online social networks (OSNs) and biological networks. Discovering and computing graphlet statistics are highly challenging. First, the massive size of real-world graphs makes the exact computation of graphlets extremely expensive. Secondly, the graph topol- ogy may not be readily available so one has to resort to web crawling using the available application programming inter- faces (APIs). In this work, we propose a general and novel framework to estimate graphlet statistics of “any size”. Our framework is based on collecting samples through consecu- tive steps of random walks. We derive an analytical bound on the sample size (via the Chernoff-Hoeffding technique) to guarantee the convergence of our unbiased estimator. To further improve the accuracy, we introduce two novel opti- mization techniques to reduce the lower bound on the sample size. Experimental evaluations demonstrate that our meth- ods outperform the state-of-the-art method up to an order of magnitude both in terms of accuracy and time cost.
著者Xiaowei Chen, Yongkun Li, Pinghui Wang, John C.S. Lui
期刊名稱Proceedings of the VLDB Endowment
出版社VLDB Endowment
頁次253 - 264
關鍵詞Big Data, Network Science, Random Walk

上次更新時間 2021-12-09 於 00:05