Sparse Poisson coding for high dimensional document clustering
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: Document clustering plays an important role in large-scale textual data analysis, which generally faces the challenge of high-dimensional textual data. One remedy is to learn a high-level sparse representation using sparse coding techniques. In contrast to traditional Gaussian noise-based sparse coding methods, in this paper we employ a Poisson distribution model to represent the word-count frequency features of a text for sparse coding. Moreover, a novel sparse-constrained Poisson regression algorithm is proposed to solve the induced optimization problem. Unlike previous Poisson regression methods, which use the family of ℓ1-regularization to encourage sparse solutions, we introduce a sparsity ratio measure that makes use of both the ℓ1-norm and the ℓ2-norm of the learned weights. An important advantage of the sparsity ratio is that it is bounded between 0 and 1, which makes it easy to set in practical applications. To further make the algorithm tractable for high-dimensional textual data, a projected gradient descent algorithm is proposed to solve the regression problem. Extensive experiments have been conducted to show that our proposed approach can achieve effective representations for document clustering compared with state-of-the-art regression methods. © 2013 IEEE.
Authors: Chenxia Wu, Haiqin Yang, Jianke Zhu, Jiemi Zhang, Irwin King, Michael R. Lyu
Conference name: 2013 IEEE International Conference on Big Data, Big Data 2013
Conference start date: 06.10.2013
Conference end date: 09.10.2013
Conference location: Santa Clara, CA
Conference country/region: United States
Proceedings title: Proceedings of the 2013 IEEE International Conference on Big Data
Year of publication: 2013
Month: 12
Day: 1
Pages: 512 - 517
ISBN: 9781479912926
eISBN: 978-1-4799-1293-3
Language: British English
Keywords: document clustering, Poisson regression, sparse coding
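
Illustrative note: the abstract describes a sparsity ratio built from the ℓ1-norm and ℓ2-norm of the learned weights and bounded between 0 and 1, but this record does not give its formula. The sketch below assumes a Hoyer-style sparseness measure, which has those properties, and pairs it with a plain Poisson negative log-likelihood. The function names `sparsity_ratio` and `poisson_nll`, the dictionary matrix `D`, and the smoothing constant `eps` are hypothetical and are not taken from the paper.

```python
import numpy as np


def sparsity_ratio(w):
    """Hoyer-style sparseness (an assumed form, not confirmed by the paper).

    Returns 0 when all entries have equal magnitude and 1 when exactly
    one entry is non-zero.
    """
    w = np.asarray(w, dtype=float)
    n = w.size
    l1 = np.abs(w).sum()
    l2 = np.sqrt((w ** 2).sum())
    if n < 2 or l2 == 0.0:
        raise ValueError("need at least two entries and a non-zero vector")
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1.0)


def poisson_nll(D, w, x, eps=1e-12):
    """Negative Poisson log-likelihood of word counts x under rate D @ w.

    The log(x!) term is dropped because it does not depend on w.
    eps (hypothetical) keeps the logarithm finite when a rate is zero.
    """
    rate = D @ np.asarray(w, dtype=float) + eps
    return float(np.sum(rate - np.asarray(x, dtype=float) * np.log(rate)))
```

Because the measure is confined to the interval from 0 to 1, a target sparsity can be chosen without knowing the scale of the weights, which is the practical advantage the abstract points to.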

Last updated: 2020-25-10 at 01:14