Advancing Sino-Philippine linguistics and sociolinguistics using the Lannang Corpus (LanCorp) – a multilingual, POS-tagged, and audio-textual databank
Publication in refereed journal
Officially Accepted for Publication

Other information
Abstract: This paper introduces the Lannang Corpus (LanCorp), a public 375,000-word collection of raw and transcribed recordings of Lannang languages spoken in metropolitan Manila, which have been annotated with part-of-speech tags and linked to 40 types of sociolinguistic metadata. It begins by providing an overview of the LanCorp (e.g., design, formats, accessibility). Then, it goes on to show various examples of how the corpus can be used for variationist sociolinguistic research, using Lánnang-uè data as a case study. The findings from the exploratory studies indicate that Lannang languages are influenced by sociolinguistic factors, demonstrating the intricate nature of the Sino-Philippine sociolinguistic ecology. Due to its large size, sociolinguistic metadata, and various formats, LanCorp can be used to study Lannang languages in general and how they are used by specific social groups. It enables scholars to investigate multilingual interactions in a wide range of sociolinguistic factors, furthering the field of Sino-Philippine linguistics.
Acceptance Date: 28/03/2023
All Author(s) List: Wilkinson Daniel Wong Gonzales
Journal name: International Journal of Corpus Linguistics
Year: 2023
Publisher: John Benjamins Publishing
ISSN: 1384-6655
Languages: English-United States
Keywords: Sino-Philippine sociolinguistics, language variation and change, mixed language and multilingual corpora, computational methods to multilingual phenomena, language documentation through corpora

Last updated on 2024-04-23 at 10:30