A Model of Extended Paragraph Vector for Document Categorization and Trend Analysis
Other conference paper


Other information
Abstract: The increasing number of academic papers published each year has led to a growing demand for organizing papers into categories according to their topics and for analyzing topic trends over time. Domain knowledge such as journal categories and conference sessions is potentially useful for categorizing papers and for obtaining trends that users can easily interpret. In this paper, we aim to organize a collection of papers into journal categories, which describe the research areas of a field, and then analyze the trend of each research area. Conference sessions are linked to journal categories under the assumption that papers from the same session belong to the same category. Sessions are also used to reflect the trend of a category over the years, since they are defined by domain experts to describe each year's topics. First, we present a model of extended paragraph vector that models the hierarchical structure of sessions, papers and words, and captures their semantics with distributed vector representations in the same space. Then, we propose a two-stage approach to document categorization, which first chooses a subset of journal categories covering the major research areas in the corpus and then associates each session with its most similar category based on session vectors. Finally, we present the research trend of a category through its matching sessions, ordered in time and shown with the most similar words of each session.
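The second stage of the categorization and the trend listing described in the abstract can be illustrated with a minimal sketch. It assumes that session and category vectors already live in the same space and that "most similar" means highest cosine similarity; the abstract does not name the similarity measure, so that choice is an assumption. The function names, the toy two-dimensional vectors and the session titles are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def assign_sessions(session_vecs, category_vecs):
    """Map each session to the category whose vector is most similar to it."""
    return {
        session: max(category_vecs, key=lambda c: cosine(vec, category_vecs[c]))
        for session, vec in session_vecs.items()
    }


def category_trend(category, assignment, session_year):
    """List the sessions matched to a category, ordered by conference year."""
    matched = [s for s, c in assignment.items() if c == category]
    return sorted(matched, key=lambda s: session_year[s])


# Hypothetical toy data: in practice these vectors would come from a trained
# embedding model, and the categories from the first (subset selection) stage.
category_vecs = {"speech": [1.0, 0.1], "vision": [0.1, 1.0]}
session_vecs = {
    "speech synthesis": [0.8, 0.3],
    "object detection": [0.2, 0.8],
    "acoustic modeling": [0.9, 0.2],
}
session_year = {
    "speech synthesis": 2017,
    "object detection": 2016,
    "acoustic modeling": 2015,
}

assignment = assign_sessions(session_vecs, category_vecs)
speech_trend = category_trend("speech", assignment, session_year)
print(assignment)
print(speech_trend)
```

The sketch covers only the session-to-category matching and the time ordering. How the subset of journal categories is chosen in the first stage, and how the vectors are trained, is not specified in the abstract and is left out here.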
All Author(s) List: Pengfei Liu, King Keung Wu, Helen Meng
Name of Conference: International Joint Conference on Neural Networks (IJCNN)
Start Date of Conference: 14/05/2017
End Date of Conference: 19/05/2017
Place of Conference: Alaska
Country/Region of Conference: United States of America
Year: 2017
Languages: English-United States

Last updated on 2018-22-01 at 09:20