Website Community Mining from Query Logs with Two-Phase Clustering
Refereed conference paper presented and published in conference proceedings



Times Cited: Web of Science 2 (as at 09/04/2021)

Other information
Abstract: A website community refers to a set of websites that concentrate on the same or similar topics. There are two major challenges in the website community mining task. First, websites on the same topic may not link directly to one another because of competition concerns. Second, one website may contain information about several topics. Accordingly, a website community mining method should be able to capture such phenomena and assign such a website to different communities. In this paper, we propose a method to automatically mine website communities by exploiting query log data from Web search. Query log data can be regarded as a comprehensive summarization of the real Web. The queries that result in a particular website being clicked can be regarded as a summarization of that website's content. Websites on the same topic are indirectly connected by the queries that convey information needs on this topic. This observation can help us overcome the first challenge. The proposed two-phase method can tackle the second challenge. In the first phase, we cluster the queries of the same host to obtain different content aspects of the host. In the second phase, we further cluster the obtained content aspects from different hosts. Because of the two-phase clustering, one host may appear in more than one website community.
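The two-phase idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the similarity measure or clustering algorithm, so the bag-of-words term vectors, cosine similarity, greedy single-pass clustering, the thresholds `t1`/`t2`, the `(query, clicked_host)` log format and the example hosts and queries below are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(v * b[t] for t, v in a.items() if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(vectors, threshold):
    """Greedy single-pass clustering (an illustrative stand-in, not the
    paper's algorithm): each vector joins the most similar existing cluster
    if the cosine similarity to that cluster's summed term vector reaches
    `threshold`; otherwise it starts a new cluster. Returns lists of indices."""
    clusters, centroids = [], []
    for i, vec in enumerate(vectors):
        scores = [cosine(vec, cen) for cen in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            clusters[best].append(i)
            centroids[best].update(vec)  # add the term counts to the centroid
        else:
            clusters.append([i])
            centroids.append(Counter(vec))
    return clusters


def two_phase_communities(log, t1=0.3, t2=0.3):
    """log: iterable of (query, clicked_host) pairs (hypothetical format)."""
    # Phase 1: per host, cluster the queries that led to clicks on that host;
    # each resulting cluster is treated as one content aspect of the host.
    queries_by_host = defaultdict(list)
    for query, host in log:
        queries_by_host[host].append(Counter(query.lower().split()))
    aspects = []  # (host, aggregated term vector of one aspect)
    for host, vecs in queries_by_host.items():
        for cluster in single_pass_cluster(vecs, t1):
            aspect = Counter()
            for i in cluster:
                aspect.update(vecs[i])
            aspects.append((host, aspect))
    # Phase 2: cluster aspects from all hosts; a host whose aspects land in
    # different groups therefore belongs to more than one community.
    groups = single_pass_cluster([vec for _, vec in aspects], t2)
    return [sorted({aspects[i][0] for i in g}) for g in groups]


LOG = [
    ("cheap flights", "a.example"),
    ("weather forecast", "a.example"),
    ("cheap flights deals", "b.example"),
    ("weather forecast today", "c.example"),
]
print(two_phase_communities(LOG))
# [['a.example', 'b.example'], ['a.example', 'c.example']]
```

In this toy log, `a.example` ends up in both communities because its two query aspects are clustered separately in the first phase, and `b.example` and `c.example` are grouped with it purely through shared query terms, with no hyperlink information involved.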
All Author(s) List: Bing LD, Lam W, Jameel S, Lu CL
Name of Conference: 15th Annual Conference on Intelligent Text Processing and Computational Linguistics (CICLing)
Start Date of Conference: 06/04/2014
End Date of Conference: 12/04/2014
Place of Conference: Kathmandu
Country/Region of Conference: Nepal
Journal name: Lecture Notes in Artificial Intelligence
Detailed description: organized by the Association for Computational Linguistics (ACL)
Year: 2014
Month: 1
Day: 1
Volume Number: 8404
Publisher: SPRINGER-VERLAG BERLIN
Pages: 201-212
ISBN: 978-3-642-54902-1
eISBN: 978-3-642-54903-8
ISSN: 0302-9743
Languages: English-United Kingdom
Keywords: Query Logs; Two-phase Clustering; Website Community
Web of Science Subject Categories: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods

Last updated on 2021-10-04 at 00:55