Collection of user judgments on spoken dialog system with crowdsourcing
Refereed conference paper presented and published in conference proceedings

Other information
Abstract: This paper presents an initial attempt at using crowdsourcing to collect user judgments on spoken dialog systems (SDSs). The collection is implemented on Amazon Mechanical Turk (MTurk), where a Requester can design a human intelligence task (HIT) to be performed by a large number of Workers efficiently and cost-effectively. We describe a design methodology for two types of HITs: the first targets fast rating of a large number of dialogs on some dimensions of the SDS's performance, and the second aims to assess the reliability of Workers on MTurk through the variability in ratings across different Workers. A set of approval rules is also designed to control the quality of ratings from MTurk. By the end of the collection work, user judgments for about 8,000 dialogs, rated by around 700 Workers, had been collected in 45 days. We observe reasonable consistency between the manual MTurk ratings and an automatic categorization of dialogs in terms of task completion, which partially verifies the reliability of the approved ratings from MTurk. From the second type of HITs, we also observe moderate inter-rater agreement for ratings of task completion, which supports the use of MTurk as a platform for collecting judgments. Further research on SDS evaluation models could be developed based on the collected corpus. ©2010 IEEE.
All Author(s) List: Yang Z., Li B., Zhu Y., King I., Levow G., Meng H.
Name of Conference: 2010 IEEE Workshop on Spoken Language Technology, SLT 2010
Start Date of Conference: 12/12/2010
End Date of Conference: 15/12/2010
Place of Conference: Berkeley, CA
Country/Region of Conference: United States of America
Detailed description: organized by IEEE
Pages: 277-282
Languages: English-United Kingdom
Keywords: Amazon Mechanical Turk, Crowdsourcing, Let's Go, Spoken dialog system, User judgment

Last updated on 2021-02-18 at 23:48