Adaptive Concept Resolution for document representation and its applications in text mining
Publication in refereed journal


Abstract: It is well known that synonymous and polysemous terms often introduce noise when calculating the similarity between documents. Existing ontology-based document representation methods are static, so the semantic concepts selected to represent a document have a fixed resolution. Therefore, they cannot adapt to the characteristics of the document collection and the text mining problem at hand. We propose an Adaptive Concept Resolution (ACR) model to overcome this problem. ACR can learn a concept border from an ontology, taking into consideration the characteristics of the particular document collection. This border then provides a tailor-made semantic concept representation for a document from the same domain. Another advantage of ACR is that it is applicable both to classification tasks, where the groups are given in the training document set, and to clustering tasks, where no group information is available. The experimental results show that ACR outperforms an existing static method in almost all cases. We also present a method to integrate Wikipedia entities into an expert-edited ontology, namely WordNet, to generate an enhanced ontology named WordNet-Plus, and its performance is also examined under the ACR model. Owing to its high coverage, WordNet-Plus can outperform WordNet in classification on data sets containing more recent documents. © 2014 Elsevier B.V. All rights reserved.
Authors: Bing LD, Jiang S, Lam W, Zhang Y, Jameel S
Journal: Knowledge-Based Systems
Pages: 1 - 13
Keywords: Adaptive Concept Resolution; Ontology; Wikipedia; WordNet; WordNet-Plus
Web of Science subject categories: Computer Science; Computer Science, Artificial Intelligence

Last updated: 2021-18-01 at 00:13