An online-updating algorithm on probabilistic matrix factorization with active learning for task recommendation in crowdsourcing systems
Publication in refereed journal


To ensure output quality, current crowdsourcing systems rely heavily on redundant answers provided by multiple workers with varying expertise; however, massive redundancy is very expensive and time-consuming. Task recommendation can help requesters receive good-quality output more quickly and help workers find suitable tasks faster. To reduce the cost, a number of previous works adopted active learning in crowdsourcing systems for quality assurance. Active learning is a learning approach that achieves a given accuracy at very low cost. However, previous works do not consider the varying expertise of workers across task categories in real crowdsourcing scenarios, nor do they consider new workers who are not willing to work on a large number of tasks before being recommended a list of preferred tasks. In this paper, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), on a task recommendation framework called TaskRec to recommend tasks to workers in crowdsourcing systems for quality assurance. By assigning the most uncertain task for new workers to work on, this paper identifies a flaw in our previous ActivePMFv1, Probabilistic Matrix Factorization with Active Learning (version 1). Therefore, ActivePMFv2 can give new workers a list of recommended preferred tasks faster than ActivePMFv1 can. Our factor analysis model considers not only worker task selection preference but also worker performance history. It actively selects the most uncertain task for the most reliable workers to work on in order to retrain the classification model. Moreover, we propose a generic online-updating method for learning the ActivePMFv2 model. The larger the profile of a worker (or task), the less important it is to retrain that profile on each new piece of work done.
When a worker (or task) has a large profile, our online-updating algorithm retrains the whole feature vector of that worker (or task) and keeps all other entries in the matrix fixed. The algorithm runs updates in batches to reduce the running time of model updates.
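
The idea of retraining one worker's feature vector while every other entry stays fixed can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a plain dot-product prediction, a squared-error loss with L2 regularization, and ordinary gradient descent. The function name `retrain_worker_vector` and the hyperparameters `lam`, `lr`, and `steps` are hypothetical.

```python
def retrain_worker_vector(u, task_vectors, ratings, lam=0.1, lr=0.05, steps=200):
    """Refit one worker's latent vector; task vectors are treated as fixed.

    Minimizes 0.5 * sum((u . v_j - r_j)^2) + 0.5 * lam * ||u||^2 by gradient
    descent. All names and default values here are illustrative assumptions.

    u            -- current latent vector of the worker (list of floats)
    task_vectors -- latent vectors of the tasks this worker has completed
    ratings      -- observed scores for those tasks, in the same order
    """
    u = list(u)  # work on a copy so the caller's vector is left untouched
    for _ in range(steps):
        grad = [lam * x for x in u]  # gradient of the L2 regularizer
        for v, r in zip(task_vectors, ratings):
            err = sum(a * b for a, b in zip(u, v)) - r  # prediction error
            for k in range(len(u)):
                grad[k] += err * v[k]  # gradient of the squared-error term
        u = [x - lr * g for x, g in zip(u, grad)]
    return u
```

Because only one vector is refit against the tasks that worker has done, the cost of each update depends on the size of that worker's profile and not on the size of the whole worker-task matrix.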

Complexity analysis shows that our model is efficient and scalable to large datasets. Experiments on real-world datasets show that ActivePMFv2 improves MAE and RMSE by up to 29% and 35%, respectively, compared with ActivePMFv1, which in turn significantly outperforms PMF combined with other active learning approaches, as shown in previous work. The experimental results also show that our online-updating algorithm accurately approximates a full retrain of the learning model while reducing the average runtime of a model update for each work done by more than 80% (from a few minutes to several seconds).
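
For reference, the two error metrics named above have standard definitions, sketched below. The helper `relative_improvement` is a hypothetical name and reflects an assumption that the reported percentages are relative reductions in error against the baseline; the abstract does not state how they were computed.

```python
import math


def mae(predicted, actual):
    """Mean absolute error between predicted and observed scores."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def rmse(predicted, actual):
    """Root mean squared error between predicted and observed scores."""
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def relative_improvement(baseline_error, new_error):
    """Fractional error reduction relative to a baseline (assumed definition)."""
    return (baseline_error - new_error) / baseline_error
```

RMSE penalizes large individual errors more heavily than MAE, which is one reason the two metrics can improve by different amounts on the same data.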

To the best of our knowledge, we are the first to use PMF, active learning, and dynamic model update to recommend tasks for quality assurance in crowdsourcing systems in real scenarios.
Authors: Man-Ching Yuen, Irwin King, Kwong-Sak Leung
Journal name: Big Data Analytics
Publisher: BioMed Central
Keywords: Crowdsourcing, Task recommendation, Matrix factorization, Probabilistic matrix factorization

Last updated: 2021-13-09 at 00:21