Entropy-based Term Weighting Schemes for Text Categorization in VSM
Refereed conference paper presented and published in conference proceedings


Times Cited
Web of Science: 13 (as at 02/08/2020)

Other information
Abstract: Term weighting schemes have been widely used in information retrieval and text categorization models. In this paper, we first investigate the limitations of several state-of-the-art term weighting schemes in the context of text categorization tasks. Because category-specific terms are more useful for discriminating between categories, and these terms tend to have smaller entropy with respect to those categories, we then explore the relationship between a term's discriminating power and its entropy with respect to a set of categories. To this end, we propose two entropy-based term weighting schemes (tf.dc and tf.bdc), which measure the discriminating power of a term based on its global distributional concentration in the categories of a corpus. To demonstrate the effectiveness of the proposed schemes, we compare them with seven state-of-the-art schemes on a long-text corpus and on a short-text corpus. Our experimental results show that the proposed schemes outperform the state-of-the-art schemes in text categorization tasks with KNN and SVM classifiers.
All Author(s) List: Wang T, Cai Y, Leung HF, Cai ZW, Min HQ
Name of Conference: 27th IEEE International Conference on Tools with Artificial Intelligence (ICTAI)
Start Date of Conference: 09/11/2015
End Date of Conference: 11/11/2015
Place of Conference: Vietri sul Mare
Country/Region of Conference: Italy
Journal name: 2011 23RD IEEE INTERNATIONAL CONFERENCE ON TOOLS WITH ARTIFICIAL INTELLIGENCE (ICTAI 2011)
Year: 2015
Month: 1
Day: 1
Publisher: IEEE
Pages: 325 - 332
eISBN: 978-1-5090-0163-7
ISSN: 1082-3409
Languages: English-United Kingdom
Keywords: Entropy; Term Weighting; Text Categorization
Web of Science Subject Categories: Computer Science; Computer Science, Artificial Intelligence; Computer Science, Theory & Methods
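
Illustrative sketch: the abstract states that category-specific terms tend to have smaller entropy over the categories, and that the proposed schemes score a term by how concentrated its distribution is. This record does not give the formulas, so the Python below is only a minimal sketch under stated assumptions: a term's concentration is taken to be one minus its category entropy normalized by log |C|, and the document weight is raw term frequency times that score. The function names `concentration_weights` and `weight_document` are hypothetical, and this is not claimed to be the authors' exact tf.dc or tf.bdc definition.

```python
import math
from collections import Counter, defaultdict


def concentration_weights(docs, labels):
    """Map each term to 1 - H(term) / log(|C|).

    Assumption (not taken from this record): H(term) is the entropy of the
    term's occurrence counts across categories. A term confined to a single
    category scores 1.0; a term spread evenly over all categories scores 0.0.
    """
    categories = set(labels)
    if len(categories) < 2:
        raise ValueError("need at least two categories to normalize entropy")

    # term -> category -> number of occurrences in the training corpus
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in tokens:
            counts[tok][label] += 1

    max_entropy = math.log(len(categories))
    weights = {}
    for term, per_cat in counts.items():
        total = sum(per_cat.values())
        entropy = -sum((n / total) * math.log(n / total)
                       for n in per_cat.values())
        weights[term] = 1.0 - entropy / max_entropy
    return weights


def weight_document(tokens, weights):
    """Weight each term by raw term frequency times its concentration.

    Terms unseen in training get weight 0.0 (an illustrative choice).
    """
    tf = Counter(tokens)
    return {term: freq * weights.get(term, 0.0) for term, freq in tf.items()}


# Toy corpus: tokenized documents with one category label each.
docs = [["goal", "match", "team"], ["team", "election"], ["election", "vote"]]
labels = ["sport", "politics", "politics"]

weights = concentration_weights(docs, labels)
vector = weight_document(["team", "goal", "goal"], weights)
```

In this toy corpus "goal" occurs only under "sport", so it receives the maximum score, while "team" is split evenly between the two categories and contributes nothing to the document vector. That is the behaviour the abstract describes for category-specific versus evenly spread terms.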

Last updated on 2020-03-08 at 04:54