A corpus-based analysis of lexical richness of Beijing Mandarin speakers: variable identification and model construction
Abstract: This work concerns the lexical richness of Beijing Mandarin speakers, measured by entropy. The data used for the study are the Beijing Mandarin Spoken Corpora, a conversational and spontaneous speech corpus of contemporary Beijing Mandarin speakers. Based on sociovariational linguistic hypotheses and data analysis, the study attempts to identify and explain the key demographic and socioeconomic parameters that affect the entropy of each subject's spoken texts. Both one-dimensional and multi-dimensional statistical models are proposed to quantify the relationships between the pertinent measure of lexical richness and the prominent indicative variables, including age, level of education, and profession premium. A multi-dimensional nonlinear model encompassing these findings is designed and calibrated with statistical estimation methods. Possible future directions and applications in relevant fields of applied linguistics are provided. © 2014 The Author. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/3.0/).
All Author(s) List: Zhang YH
Journal name: Language Sciences
Volume Number: 44
Pages: 60-69
Languages: English-United Kingdom
Keywords: Beijing Mandarin; Corpus linguistics; Entropy; Lexical richness; Sociovariational analysis; Statistical modeling
Web of Science Subject Categories: Language & Linguistics; Linguistics
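
The abstract describes lexical richness as measured by entropy but does not give the formula. As a minimal sketch, assuming the measure is Shannon entropy over the relative frequencies of word types in a speaker's text, it could be computed as below. The function name `shannon_entropy` and the toy token list are hypothetical and are not taken from the paper or from the Beijing Mandarin Spoken Corpora.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the word-type distribution of a token list.

    Assumption: each token is already a segmented word; the paper's actual
    tokenization and entropy definition may differ.
    """
    counts = Counter(tokens)          # frequency of each word type
    total = sum(counts.values())      # total number of tokens
    if total == 0:
        return 0.0                    # empty text: define entropy as zero
    # H = sum over types of p * log2(1 / p), with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Hypothetical toy input: four tokens, three distinct word types.
sample_tokens = ["我", "去", "我", "来"]
sample_entropy = shannon_entropy(sample_tokens)
```

Under this assumption, a speaker who reuses a few words heavily gets a lower value than one who spreads usage evenly across many word types, which is the sense in which entropy can serve as a richness measure.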

