Similarity and Degree of Perplexity Analysis of Chinese Characters
Publication in refereed journal

Times Cited
Web of Science: 2 (as at 03/07/2020)

Other information
Abstract: The study of Chinese scripts has always been a topic drawing continuous scholarly attention from various perspectives. Facilitated with the development of computing technology, recent years saw an increasing interest in the graphic pattern of Chinese characters. This paper focuses on the likelihood of orthographical confusions among any given set of Chinese characters. By defining a normalized metric based on character stroke sequences, we introduce the similarity between two Chinese characters, a unique numerical value which falls within the interval [0, 1], measuring to what extent one character is prone to be confused with another. For a set consisting of a large quantity of characters, we introduce the concept of degree of perplexity (DP), measuring the number of strokes weighted average similarities between a given character and the rest of the characters in the set. An efficient and easy-to-implement algorithm is designed to compute the similarity and degree of perplexity. Our formulas are calibrated with numerical experiments simulated with the most frequently used 200 characters. Based on the numerical simulations, an exponential functional relationship between the degree of perplexity and the number of strokes is proposed and is calibrated with least square regression. Finally, possible applications of the measures introduced are discussed.
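The abstract does not give the paper's formulas, so the following is only an illustrative sketch of the two quantities it names, under stated assumptions. It assumes the normalized metric is a Levenshtein edit distance over stroke sequences divided by the longer sequence length, and that the degree of perplexity weights each similarity by the stroke count of the other character. Both choices, the one-letter stroke codes, and the function names are hypothetical stand-ins, not the author's method.

```python
from typing import Dict, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two stroke sequences (assumed metric)."""
    prev = list(range(len(b) + 1))
    for i, stroke_a in enumerate(a, start=1):
        curr = [i]
        for j, stroke_b in enumerate(b, start=1):
            cost = 0 if stroke_a == stroke_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity in [0, 1]: one minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


def degree_of_perplexity(target: str, strokes: Dict[str, Sequence[str]]) -> float:
    """Average similarity of `target` to every other character in the set,
    weighted by the other character's stroke count (assumed weighting)."""
    others = [c for c in strokes if c != target]
    total_weight = sum(len(strokes[c]) for c in others)
    if total_weight == 0:
        return 0.0
    weighted = sum(len(strokes[c]) * similarity(strokes[target], strokes[c])
                   for c in others)
    return weighted / total_weight


# Toy stroke table with hypothetical codes:
# h = horizontal, s = vertical, p = left-falling, n = right-falling.
STROKES = {
    "十": "hs",
    "土": "hsh",
    "干": "hhs",
    "人": "pn",
}

if __name__ == "__main__":
    print(similarity(STROKES["十"], STROKES["土"]))
    print(degree_of_perplexity("十", STROKES))
```

Under these assumptions, characters that share most of their stroke sequence score close to 1, and a character's DP rises as the set contains more look-alikes with many strokes.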
All Author(s) List: Zhang YH
Journal name: Journal of Quantitative Linguistics
Volume Number: 18
Issue Number: 3
Pages: 189 - 206
Languages: English-United Kingdom
Web of Science Subject Categories: Language & Linguistics; Linguistics

Last updated on 2020-04-07 at 02:12