
An online-updating algorithm on probabilistic matrix factorization with active learning for task recommendation in crowdsourcing systems

Abstract

Background

To ensure output quality, current crowdsourcing systems rely heavily on redundant answers provided by multiple workers with varying expertise; however, massive redundancy is expensive and time-consuming. Task recommendation can help requesters receive good-quality output more quickly and help workers find suitable tasks faster. To reduce the cost, a number of previous works adopted active learning in crowdsourcing systems for quality assurance. Active learning is a learning approach that achieves a given accuracy at very low cost. However, previous works do not consider the varying expertise of workers across task categories in real crowdsourcing scenarios, and they do not consider new workers, who are unwilling to work on a large number of tasks before receiving a list of recommended tasks. In this paper, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), on a task recommendation framework called TaskRec to recommend tasks to workers in crowdsourcing systems for quality assurance. This paper identifies a flaw in our previous ActivePMFv1, Probabilistic Matrix Factorization with Active Learning (version 1), and addresses it by assigning the most uncertain task to new workers. As a result, ActivePMFv2 can give new workers a list of recommended tasks faster than ActivePMFv1. Our factor analysis model considers not only worker task selection preference, but also worker performance history. It actively selects the most uncertain task for the most reliable workers to work on in order to retrain the classification model. Moreover, we propose a generic online-updating method for learning the model, ActivePMFv2. The larger the profile of a worker (or task), the less important it is to retrain that profile after each new piece of work. In case of the worker (or task) having a large profile, our online-updating algorithm retrains the whole feature vector of the worker (or task) and keeps all other entries in the matrix fixed. Our online-updating algorithm runs batch updates to reduce the running time of model update.

Results

Complexity analysis shows that our model is efficient and scalable to large datasets. Experiments on real-world datasets show that the MAE and RMSE of our proposed ActivePMFv2 improve by up to 29 % and 35 %, respectively, compared with ActivePMFv1, where ActivePMFv1 significantly outperforms PMF with other active learning approaches, as shown in previous work. Experimental results also show that our online-updating algorithm closely approximates a full retrain of the learning model, while the average runtime of model update for each piece of work done is reduced by more than 80 % (from a few minutes to several seconds).

Conclusions

To the best of our knowledge, we are the first to use PMF, active learning and dynamic model update to recommend tasks for quality assurance in crowdsourcing systems for real scenarios.

Background

Crowdsourcing is the idea of outsourcing a task to a large group of networked people in the form of an open call to reduce the production cost [1, 2]. In recent years, crowdsourcing systems have attracted much attention [3, 4]. Examples of crowdsourcing systems include Amazon Mechanical Turk (or MTurk) [5], CrowdFlower [6], Taskcn [7] and TopCoder [8]. The output quality of a completed task in a crowdsourcing system is “the extent to which the provided outcome fulfills the requirements of the requester” [9]. For quality assurance, a requester has to verify the quality of every answer submitted by workers, which is very time-consuming. Alternatively, requesters rely heavily on redundant answers provided by multiple workers with varying expertise, but massive redundancy is expensive and time-consuming. “If we ask 10 workers to complete the same task, then the cost of crowdsourcing solutions tends to be comparable to the cost of in-house solutions” [10]. Therefore, it is important to investigate how to help task requesters verify correct answers on crowdsourcing platforms easily and effectively. On the other hand, it is inefficient for a worker to spend as much time selecting a task as working on it when the monetary reward of the task is small. To address this problem, task recommendation can provide workers with a list of preferred tasks. However, new workers do not want to work on a large number of tasks or wait a long time before receiving a list of recommended tasks. Therefore, it is important to help workers find suitable tasks as quickly as possible and to minimize the number of task assignments needed to achieve a target output quality [11]. Worker performance history makes it possible to mine workers’ preferences on tasks and provides an indication of worker quality on tasks. Based on worker performance history, an active learning approach to task recommendation can help requesters receive good-quality output more quickly and at lower cost, and thus achieve quality assurance in crowdsourcing systems. Moreover, assigning the most uncertain task to new workers not only helps new workers receive a list of recommended tasks faster, but also improves the output quality of crowdsourcing systems.

Task recommendation can help requesters receive good-quality output more quickly and help workers find suitable tasks faster. Probabilistic Matrix Factorization (PMF) [12] is the state-of-the-art approach for recommendation systems. A factorization model has to be trained before it can be applied for prediction. In real-world applications, the performance of a factorization model is highly affected by how the model is updated, and thus dynamically updating the model is very important [13]. When a worker’s profile is updated, the profile does not change much if the worker has a large profile, whereas it changes greatly if the worker has a small profile.

Active learning is a learning approach that improves prediction accuracy at low cost. Performing active learning in a task recommendation model can maintain the accuracy of recommendations at very low cost, but the user waiting time still has to be minimized. The tasks recommended to new workers have to be carefully selected, because new workers are not willing to work on a lot of tasks before having their preferred tasks recommended. Therefore, systems that require long user waiting times before providing recommendations are not suitable for real-world applications [14]. Moreover, it does not make sense to retrain the model from scratch whenever a worker with a large profile completes a task, because the performance improvement from retraining in this case is tiny while the cost of retraining is high. Furthermore, when a large number of workers are working in the crowdsourcing system at the same time, the computational cost is very high if the model is retrained after each worker completes a task. Batch update provides a way to reduce both user waiting time and computational cost.

Our contributions are as follows:

  • First, we propose an approach to quality assurance that applies ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), to task recommendation in crowdsourcing systems. ActivePMF is an active learning approach to factor analysis based on probabilistic matrix factorization, in which the worker latent feature space, the task latent feature space and the task category latent feature space are learned. ActivePMF considers the varying expertise of workers for different tasks in real crowdsourcing scenarios. The most informative task and the most skillful worker are selected to learn the factor analysis model.

  • Second, we first assign all new tasks to the most reliable workers based on the task categories in our proposed ActivePMFv2, so new workers can receive a list of preferred tasks recommended faster than that of ActivePMFv1 [15]. Our proposed ActivePMFv2 also has better output prediction quality than that of ActivePMFv1.

  • Third, we propose a generic online-updating method for learning a factor analysis model, ActivePMFv2. Our online-updating algorithm is applied to a learned PMF model without having to retrain the whole model. The proposed update methods are generic and applicable to all PMF models.

  • Fourth, we demonstrate the performance of our proposed ActivePMFv2 on a real-world dataset. The experimental results show that ActivePMFv2 outperforms ActivePMFv1 by up to 29 % in MAE and 35 % in RMSE, where ActivePMFv1 significantly outperforms PMF with various active learning approaches (PMF is the state-of-the-art approach for recommendation systems).

  • Fifth, we demonstrate the performance of our online-updating algorithm by using the real world dataset. The experiment results show that the prediction of online-updating ActivePMF on TaskRec model approximates to that of a full retrain of ActivePMF on TaskRec model while the running time of online-updating algorithm is significantly lower than that of a full retrain of the model. By using online-updating algorithm, the average runtime of model update for each work done is reduced by more than 80 % (decreases from a few minutes to several seconds).

  • Finally, complexity analysis shows that our model is efficient and is scalable to large datasets.

Related work

Crowdsourcing systems

Crowdsourcing is outsourcing a task to a large group of networked people in the form of an open call to reduce the production cost. A crowdsourcing process involves operations of both requesters and workers. A requester submits a task request; a worker selects and completes a task; and the requester only pays the worker for the successful completion of the task. Task recommendation in crowdsourcing is important for the following reasons:

  • Motivate workers of diverse backgrounds to work on crowdsourcing tasks in the long run. Currently, on crowdsourcing sites, most workers only provide moderate contributions [16] and there is a significant population of young and well-educated Indian workers [17]. If workers can easily find suitable tasks on a crowdsourcing site, more of them can be attracted to contribute their efforts in the long run.

  • Improve the quality of work. Workers perform better if they are familiar with the tasks. Chilton et al. [18] showed that workers only browsed the first few pages on crowdsourcing sites when searching for tasks. The task list for a worker on the Amazon MTurk site is usually displayed over hundreds of pages. A worker selects a task from the list of available tasks sorted by a specified task feature, such as task creation date or reward amount. When the tasks posted on the first few pages are not suitable for a worker, the worker might choose a task that he or she is not familiar with and try to complete it to earn the reward; otherwise, the worker does not select any task. Working on an unfamiliar task might decrease the quality of work.

Recommendation systems

Broadly speaking, recommendation systems are based on either content filtering approach or collaborative filtering approach. The content filtering approach creates a profile for each user or product, for example, a movie, to characterize its nature. The profiles of users and products allow programs to associate users with matching products. The advantage is that it can address the system’s new products and users. However, the profile information might not be available or easy to collect. On the other hand, the collaborative filtering approach relies only on past user behavior. This approach analyzes relationships between users and interdependencies among products to identify new user-item associations. It is generally more accurate than content filtering approach. However, collaborative filtering cannot address the system’s new products and users [19], which is the cold-start problem.

To address the cold-start problem, latent factor models are an alternative approach that can approximate the ratings by characterizing both users and items on a number of factors inferred from the ratings patterns. “Some of the most successful realizations of latent factor models are based on matrix factorization.” [19] Matrix factorization has a lot of applications [20, 21]. Although matrix factorization can solve the cold-start problem, it is not scalable. Probabilistic matrix factorization (PMF) model [22] can scale linearly with the number of observations, and performs very well on large, sparse, and imbalanced datasets.

Recently, several probabilistic matrix factorization methods [23] have been proposed for collaborative filtering approach in recommendation systems. These methods focus on using low-rank approximations to model the user-item rating matrix for making further predictions. The premise behind a low-dimensional factor model is that there is only a small number of factors influencing preferences, and that a user’s preference vector is determined by how each factor applies to that user. The above approaches are used for user recommendation in social tagging systems.

Task recommendation in crowdsourcing systems

A recommendation system can improve the performance of crowdsourcing systems by providing task requesters with output quality controls based on a number of parameters, such as task requirements, task properties, worker interests, worker incentives, and costs [24]. Based on collaborative filtering, Organisciak et al. [25] proposed two approaches for capturing personal preferences in personalized item recommendation in crowdsourcing systems: taste-matching and taste-grokking. Taste-matching infers the requester’s taste from workers who have similar tastes to the requester, while taste-grokking uses workers’ explicit predictions of the requester’s taste. Both taste-matching and taste-grokking perform better than the use of generic workers. Later, Organisciak et al. [26] demonstrated the performance of personalized crowdsourcing in a more complex environment by carrying out case studies on personalized text highlighting in film reviews. The results show that both approaches perform better than a non-personalized baseline. In addition, the taste-grokking approach performs well on simpler tasks, and the taste-matching approach performs well with larger crowds and tasks with latent decision-making variables. Ambati et al. [27] proposed a classification-based task recommendation approach that recommends tasks to users based on implicit modeling of skills and interests. However, these approaches cannot solve the cold-start problem. Moreover, task recommendation is much more difficult than product recommendation, because workers do not have to give ratings to tasks to indicate how much they favor each task. A crowdsourcing system needs some signals indicating the types of available tasks, and the number of tasks workers select and complete [28].

Active learning

Active learning is a learning approach that achieves a given accuracy at very low cost. Broadly speaking, active learning systems are based on either a stream-based approach [29] or a pool-based approach [30]. The stream-based approach considers one unlabeled instance at a time, and decides whether to query its label or ignore it. This approach is useful when unlabeled instances are continuously available but cannot be stored easily, such as sensor data. However, as the stream-based approach relies on a real underlying input distribution, it is difficult to decide whether to query the label of an instance or ignore it at its arrival time. On the other hand, the pool-based approach ranks all unlabeled instances in order of informativeness, and queries the label of the most informative unlabeled instance in the pool. This approach is important because large amounts of unlabeled data are now available in many domains. However, it is difficult to find a good way to choose good queries from the pool. In both the stream-based and the pool-based approach, a query selection strategy is required to achieve high accuracy with as low a labeling cost as possible.

Several query selection strategies are commonly used in active learning systems. For example, uncertainty sampling queries the instances about which the current model is the most uncertain. There are many ways to measure the uncertainty, such as smallest margin [31], least confidence [32] and maximum entropy [33]. Another example of a query selection strategy is query by committee [34]. This approach maintains a committee of independently trained classification models, and queries the instance on which the committee models disagree the most. Among all query selection strategies, uncertainty sampling is one of the simplest and most widely used in active learning systems [35].
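The three uncertainty measures named above can be illustrated with a minimal sketch. It assumes a classifier that outputs class probabilities for each unlabeled instance; the function names and the example pool are hypothetical and are not taken from the cited works.

```python
import numpy as np

def least_confidence(probs):
    """1 minus the top class probability; higher means more uncertain."""
    return 1.0 - probs.max(axis=1)

def smallest_margin(probs):
    """Gap between the two most probable classes; smaller means more uncertain."""
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]

def entropy(probs, eps=1e-12):
    """Shannon entropy of the predicted distribution; higher means more uncertain."""
    return -(probs * np.log(probs + eps)).sum(axis=1)

# Hypothetical predicted class probabilities for three unlabeled instances.
pool = np.array([
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.60, 0.30, 0.10],
])

# Each strategy picks the index of the instance to query next.
query_lc = int(np.argmax(least_confidence(pool)))
query_margin = int(np.argmin(smallest_margin(pool)))
query_entropy = int(np.argmax(entropy(pool)))
```

In this toy pool all three measures select the second instance, whose predicted distribution is closest to uniform; on other pools the measures can disagree.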

Active learning for task recommendation in crowdsourcing systems to achieve quality assurance

Our motivation is the observation that it is increasingly difficult for requesters to obtain good output at low cost. Active learning has been applied in a number of quality assurance methods in crowdsourcing systems [35–37], because it can achieve a given accuracy with fewer annotations even in noisy annotation scenarios. Laws et al. [36] demonstrated that actively selecting instances for label query can achieve performance gains in natural language processing tasks. However, they did not consider actively selecting annotators. To further improve the output quality, Yan et al. [35] proposed using uncertainty sampling to select the most uncertain instance to query, together with an optimization formulation to choose multiple confident workers to query from. However, it is costly to pay for multiple labelers. To characterize the strength of each worker and improve the competency of weak workers, Fang et al. [37] proposed a Self-Taught Active Learning paradigm, where a weak worker can learn complementary knowledge from a strong worker. Each labeler has a knowledge set (a set of confidence scores), which is used for worker selection. By using the knowledge set of the most reliable labeler to replace that of the most unreliable labeler, the most reliable labeler can teach the unreliable labeler. However, the learning curves of different labelers vary in real scenarios. Later, Fang and Zhu [38] proposed using diversity density to characterize the oracle’s uncertain knowledge. However, an oracle does not exist in real-world applications.

In recent years, a number of research works [39–42] proposed recommendation systems based on a Probabilistic Matrix Factorization (PMF) model to improve the output quality in crowdsourcing systems, where PMF is the state-of-the-art approach for recommendation systems. Jung and Lease [39] proposed using a PMF model to infer unobserved labels in order to reduce the bias of the existing crowdsourced labels, and thus improve the quality of labels. Later, Jung [40] proposed using a PMF model to improve the quality of crowdsourcing tasks. Experimental results demonstrated the strength of PMF over Singular Value Decomposition (SVD) and baseline methods. However, the approach is not suitable for the huge number of tasks on real crowdsourcing systems. Yuen et al. [41, 42] considered various task categories in real crowdsourcing scenarios and proposed a PMF model for task recommendation in crowdsourcing systems. They showed that considering task categories in PMF can improve the performance. However, the model does not reduce the labeling cost by applying active learning. Later, Yuen et al. [15] proposed Probabilistic Matrix Factorization with Active Learning (version 1) for task recommendation systems. The model outperforms the PMF model with other active learning approaches, but new workers have to wait a long time before receiving a list of preferred tasks, because no performance history is available for new workers.

Task recommendation for new workers in crowdsourcing systems

In crowdsourcing systems, workers prefer to have a list of recommended tasks, but they are not willing to work on a large number of tasks before having a list of preferred tasks recommended. Since new workers have not worked on a lot of tasks yet, it is difficult for a recommendation system to make a better recommendation for new workers due to the small working profiles of new workers. When performing active learning in recommendation systems, besides accuracy of recommendations and minimization of cost, it is also important for new workers to have a list of preferred tasks recommended as soon as they start working in a crowdsourcing system.

Dynamic-updating crowdsourcing systems for real-world scenarios

Some previous works proposed various ways to improve the performance of recommendation systems for real-world scenarios [13, 14]. Rendle et al. [13] proposed an online-updating algorithm for three kernel matrix factorization models: linear, logistic and linear non-negative matrix factorization. They demonstrated that the output quality of their proposed online-updating algorithm approximates that of fully retraining the models. However, the three kernel matrix factorization models are designed for user-item matrices. Moreover, the user preference on item category, which is necessary for real-world applications, is not considered in the models. Karimi et al. [14] proposed an active learning method for the aspect model in recommendation systems. They observed that “users are not willing to provide information for a large amount of items, thus the quality of recommendations is affected specially for new users” [14], and that a full retrain of an aspect model takes a long time, especially for a system with a large number of existing users. In their online-updating method, the aspect model is updated to learn user latent factors for new users only and not for the other users. Experimental results show that the user waiting time of their method is significantly less than that of the Bayesian method, but the prediction accuracy of their method is not always better than that of the Bayesian method.

Our motivation

Our motivation is threefold: it is increasingly difficult for workers to find their preferred tasks [18, 43, 44]; the demand for task recommendation in crowdsourcing systems is increasing [45–47]; and no previous work on task recommendation models for crowdsourcing systems considers dynamic updating to reduce the user waiting time on model update. To achieve a given output quality at very low cost, we propose ActivePMFv2, Probabilistic Matrix Factorization with Active Learning (version 2), for task recommendation in crowdsourcing systems, which actively selects the most uncertain tasks and the most reliable workers for retraining the classification model. To reduce the user waiting time on the learning model update, we propose a generic online-updating method for learning the model, ActivePMFv2.

Task recommendation framework

Our task recommendation framework (TaskRec) is based on matrix factorization and performs factor analysis to learn the worker latent features, the task latent features and the task category latent features. To reduce the labeling cost and guarantee the output quality, Probabilistic Matrix Factorization with Active Learning (version 2) selects the most informative task to be learned and selects the best worker to query from.

The problem we study in this paper is how to effectively predict the missing values in the worker-task performance matrix so as to select the most informative task for the best worker to query from to achieve high accuracy with as few instances as possible. We define the problem of quality assurance in crowdsourcing systems as follows:

Definition 1

Quality assurance problem: Given a set of workers \(WS = \{ w_{i} \}^{m}_{i=1}\), a set of tasks \(VS = \{ v_{j} \}^{n}_{j=1}\), and a matrix of ratings \(R = \{r_{ij}\}\) with \(R \in \mathbb {R}^{m \times n}\), where \(r_{ij}\) is the rating associated with worker \(w_{i}\) and task \(v_{j}\), only certain elements of R are initially known. The binary matrix \(I = \{I_{ij}\}\), of the same shape as R, marks the known entries, so that \(I_{ij}\) is 1 if \(r_{ij}\) is observed and 0 otherwise. The set of index pairs (i, j) with \(I_{ij} = 0\) is denoted by PS. The problem is to predict the unknown elements of R, that is, \(r_{ij}\) with \((i,j) \in PS\). The aim of ActivePMF is to query the most informative tasks selected from these unknown elements, and to query them from the most reliable workers.
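As a small illustration of Definition 1, the indicator matrix I and the index set PS can be built from a partially observed rating matrix. This is only a sketch: the toy matrix is hypothetical, `np.nan` is our own convention for marking an unobserved rating, and indices are zero-based here rather than one-based as in the definition.

```python
import numpy as np

# Hypothetical 3-worker x 4-task rating matrix; np.nan marks an unobserved r_ij.
R = np.array([
    [1.0, np.nan, 0.6, np.nan],
    [np.nan, 0.8, np.nan, 0.2],
    [0.4, np.nan, np.nan, 1.0],
])

# I_ij = 1 if r_ij is observed and 0 otherwise.
I = (~np.isnan(R)).astype(int)

# PS: the (i, j) index pairs whose ratings are unknown and must be predicted.
PS = [(int(i), int(j)) for i, j in np.argwhere(I == 0)]
```

The entries listed in PS are the candidates from which an active learning strategy would choose the next task to query.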

To facilitate our discussions, Table 1 defines basic terms and notations used throughout this paper.

Table 1 Basic notations throughout this paper

Probabilistic Matrix Factorization on Task Recommendation Framework

Our model consists of three parts. First, we connect workers’ task preferring information with workers’ category preferring information through the shared worker latent feature space. Second, we connect workers’ task preferring information with tasks’ category grouping information through the shared task latent feature space. Third, we connect workers’ category preferring information with tasks’ category grouping information through the shared category latent feature space. The graphical model of the TaskRec framework is represented in Fig. 1.

Fig. 1 Graphical model for TaskRec, a task recommendation framework in crowdsourcing systems

By using a worker-task preferring matrix, we can measure the extent to which a worker prefers to work on a task and provides output that is accepted by requesters. Unlike in traditional recommendation systems, workers do not have to give ratings to tasks to indicate how much they favor each task. To obtain ratings on tasks, we transform workers’ behaviors into values as follows:

Worker Behavior                                               Value
Worker’s work done is accepted by requester.                  5
Worker’s work done is rejected by requester.                  4
Worker completes a task and submits the work done.            3
Worker selects a task to work on but not complete it.         2
Worker browses the detailed information of a task.            1
Worker does not browse the detailed information of a task.    0

In some cases, the ratings obtained by transforming worker behavior into values might not accurately reflect workers’ task preferences. For example, a worker’s work might be accepted even though he or she does not like the task very much.

Worker-task preferring matrix factorization

We have m workers and n tasks. The worker-task preferring matrix is denoted as R, where the element \(r_{ij}\) in R represents the extent to which worker \(w_{i}\) favors task \(v_{j}\), and the values of \(r_{ij}\) are within the range [0, 1]. Without loss of generality, we first map the ratings inferred from worker behavior, 1, ..., 5, to the interval [0, 1] using the function \(f(x) = x/5\). Hence, we are given a partially observed worker-task preferring matrix, R, with m workers and n tasks.
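The transformation from worker behavior to a rating in [0, 1] can be sketched as below. The dictionary keys are hypothetical labels that we introduce for the six behaviors listed earlier; only the values 0 to 5 and the function f(x) = x/5 come from the text.

```python
# Hypothetical behavior labels mapped to the values given in the text.
BEHAVIOR_VALUE = {
    "accepted": 5,                # work done is accepted by requester
    "rejected": 4,                # work done is rejected by requester
    "submitted": 3,               # task completed and work submitted
    "selected_not_completed": 2,  # task selected but not completed
    "browsed": 1,                 # task details browsed
    "not_browsed": 0,             # task details not browsed
}

def normalized_rating(behavior):
    """Map a worker behavior to r_ij in [0, 1] using f(x) = x / 5."""
    return BEHAVIOR_VALUE[behavior] / 5

ratings = {name: normalized_rating(name) for name in BEHAVIOR_VALUE}
```

For example, an accepted submission maps to 1.0 and a submitted but not yet judged task maps to 0.6.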

To learn the workers’ preferences on the tasks, we employ matrix factorization, more specifically Probabilistic Matrix Factorization (PMF) [12], to recover the worker-task preferring matrix. Given the partially observed matrix R, we aim to decompose R into two l-dimensional low-rank feature matrices, W and V, where \(W \in \mathbb {R}^{l \times m}\) is the latent feature matrix for workers with column vectors \(W_{i}\), and \(V \in \mathbb {R}^{l \times n}\) is the latent feature matrix for tasks with column vectors \(V_{j}\).

To learn the matrices, a Gaussian distribution on the residual of the observed ratings is assumed, as in [12], and it is defined in Eq. (1):

$$\begin{array}{*{20}l} p(R|W, V, {\sigma^{2}_{R}})= \prod^{m}_{i=1} \prod^{n}_{j=1} \left[N (r_{ij} | g({W^{T}_{i}} V_{j}), {\sigma^{2}_{R}}) \right]^{I^{R}_{ij}}, \end{array} $$
(1)

where \(N(x|\mu,\sigma^{2})\) is the probability density function of the Gaussian distribution with mean \(\mu\) and variance \(\sigma^{2}\), and \(I^{R}_{ij}\) is the indicator function that is equal to 1 if the entry \(r_{ij}\) is observed and equal to 0 otherwise. A plain Gaussian model can make predictions outside the range of valid values. The function g(x) is the logistic function \(g(x) = 1/(1+\exp(-x))\), which bounds the prediction \(g({W^{T}_{i}} V_{j})\) within the range [0, 1]. Similar to [48], to avoid overfitting, zero-mean spherical Gaussian priors are also placed on the worker and task feature matrices, which are defined in Eq. (2):

$$\begin{array}{*{20}l} p\left(W|{\sigma^{2}_{W}}\right) = \prod^{m}_{i=1} N \left(W_{i} | 0, {\sigma^{2}_{W}}\right), p\left(V|{\sigma^{2}_{V}}\right) = \prod^{n}_{j=1} N \left(V_{j} | 0, {\sigma^{2}_{V}}\right). \end{array} $$
(2)

Hence, through Bayesian inference, the posterior distribution of W and V based only on the observed ratings is derived in Eq. (3):

$$\begin{array}{*{20}l} &p\left(W,V|R,{\sigma^{2}_{R}},{\sigma^{2}_{W}},{\sigma^{2}_{V}}\right) \propto p\left(R|W, V, {\sigma^{2}_{R}}\right) p\left(W|{\sigma^{2}_{W}}\right) p\left(V|{\sigma^{2}_{V}}\right) \\ = &\prod^{m}_{i=1} \prod^{n}_{j=1} \left[N \left(r_{ij} | g({W^{T}_{i}} V_{j}), {\sigma^{2}_{R}}\right) \right]^{I^{R}_{ij}} \times \prod^{m}_{i=1} N (W_{i} | 0, {\sigma^{2}_{W}}) \times \prod^{n}_{j=1} N \left(V_{j} | 0, {\sigma^{2}_{V}}\right). \end{array} $$
(3)
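Taking the negative logarithm of Eq. (3) and dropping the terms that do not depend on W or V gives a sum of squared errors over the observed ratings plus quadratic penalties on W and V. The sketch below evaluates this quantity with NumPy; the function names, the toy dimensions and the variance values are our own assumptions for illustration, and the code is not the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def neg_log_posterior(R, I, W, V, sigma_R, sigma_W, sigma_V):
    """Negative log of Eq. (3), up to constants that do not depend on W or V.

    R, I: (m, n) ratings and 0/1 indicator; W: (l, m); V: (l, n).
    """
    pred = sigmoid(W.T @ V)                     # (m, n) matrix of g(W_i^T V_j)
    residual = np.where(I == 1, R - pred, 0.0)  # only observed entries contribute
    data_term = (residual ** 2).sum() / (2.0 * sigma_R ** 2)
    prior_W = (W ** 2).sum() / (2.0 * sigma_W ** 2)
    prior_V = (V ** 2).sum() / (2.0 * sigma_V ** 2)
    return data_term + prior_W + prior_V

# Toy example with hypothetical sizes: l = 2 factors, m = 3 workers, n = 4 tasks.
rng = np.random.default_rng(0)
l, m, n = 2, 3, 4
W = 0.1 * rng.standard_normal((l, m))
V = 0.1 * rng.standard_normal((l, n))
R = rng.uniform(0.0, 1.0, size=(m, n))
I = (rng.uniform(size=(m, n)) < 0.5).astype(int)
loss = neg_log_posterior(R, I, W, V, sigma_R=1.0, sigma_W=1.0, sigma_V=1.0)
```

Minimizing this quantity with respect to W and V, for fixed variances, is equivalent to maximizing the posterior in Eq. (3).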

Worker-category preferring matrix factorization

We have m workers and o task categories. The worker-category preferring matrix is denoted as U, where the element \(u_{ik}\) in U represents the extent of worker \(w_{i}\)’s preference for task category \(c_{k}\). Workers’ performance histories indicate workers’ preference for task categories, so the meaning of \(u_{ik}\) can be interpreted as whether the worker \(w_{i}\) has completed a task of the category \(c_{k}\) where the task is accepted (a binary representation), or how strong the worker \(w_{i}\)’s preference is for the task category \(c_{k}\) (a real value representation). We represent \(u_{ik}\) as shown in Eq. (4):

$$\begin{array}{*{20}l} u_{ik} = g(f(w_{i}, c_{k})), \end{array} $$
(4)

where \(g(\cdot)\) is the logistic function, and \(f(w_{i},c_{k})\) represents the number of times worker \(w_{i}\) completes a task of the category \(c_{k}\) where the task is accepted.

The idea of worker-category preferring matrix factorization is to derive two low-rank l-dimensional matrices W and C, where \(W \in \mathbb {R}^{l \times m}\) and \(C \in \mathbb {R}^{l \times o}\) are the latent feature matrices for workers and task categories, respectively. The column vectors \(W_{i}\) and \(C_{k}\) represent the l-dimensional worker-specific and category-specific latent feature vectors of worker \(w_{i}\) and category \(c_{k}\), respectively. We can define the conditional distribution over the observed worker-category preferring matrix as in Eq. (5):

$$\begin{array}{*{20}l} p(U|W, C, {\sigma^{2}_{U}})= \prod^{m}_{i=1} \prod^{o}_{k=1} \left[N (u_{ik} | g({W^{T}_{i}} C_{k}), {\sigma^{2}_{U}}) \right]^{I^{U}_{ik}}, \end{array} $$
(5)

where \(N(x|\mu,\sigma^{2})\) is the probability density function of the Gaussian distribution with mean \(\mu\) and variance \(\sigma^{2}\), and \(I^{U}_{ik}\) is the indicator function that is equal to 1 if worker \(w_{i}\) has at least one completed task of the category \(c_{k}\) being accepted and equal to 0 otherwise.
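To make Eq. (4) and the indicator \(I^{U}_{ik}\) concrete, the sketch below builds both from a matrix of accepted-task counts. The count matrix is hypothetical and the variable names are ours.

```python
import numpy as np

def sigmoid(x):
    """Logistic function g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# counts[i, k]: hypothetical number of accepted tasks of category k by worker i,
# playing the role of f(w_i, c_k) in Eq. (4).
counts = np.array([
    [0, 3],
    [1, 0],
])

U = sigmoid(counts)             # u_ik = g(f(w_i, c_k))
I_U = (counts > 0).astype(int)  # 1 only if at least one accepted task exists
```

Note that g(0) = 0.5, so a worker with no accepted task in a category would still receive a nonzero \(u_{ik}\); the indicator \(I^{U}_{ik}\) excludes such entries from Eq. (5).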

To avoid overfitting, zero-mean spherical Gaussian priors are placed on the worker and the category latent feature matrices, which are defined in Eq. (6):

$$\begin{array}{*{20}l} p(W|{\sigma^{2}_{W}}) = \prod^{m}_{i=1} N (W_{i} | 0, {\sigma^{2}_{W}}), p(C|{\sigma^{2}_{C}}) = \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}). \end{array} $$
(6)

Hence, through a Bayesian inference, the posterior distributions of W and C based only on the observed ratings are derived in Eq. (7):

$$\begin{array}{*{20}l} &p(W,C|U,{\sigma^{2}_{C}},{\sigma^{2}_{W}},{\sigma^{2}_{U}}) \propto p(U|W, C, {\sigma^{2}_{U}}) p(W|{\sigma^{2}_{W}}) p(C|{\sigma^{2}_{C}}) \\ = &\prod^{m}_{i=1} \prod^{o}_{k=1} \left[N (u_{ik} | g({W^{T}_{i}} C_{k}), {\sigma^{2}_{U}}) \right]^{I^{U}_{ik}} \times \prod^{m}_{i=1} N (W_{i} | 0, {\sigma^{2}_{W}}) \times \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}). \end{array} $$
(7)

Task-category grouping matrix factorization

We have n tasks and o task categories. The task-category grouping matrix is denoted as D, where the element d jk in D shows the category c k that task v j belongs to. The meaning of d jk can be interpreted as whether the task v j belongs to the category c k (a binary representation). We represent d jk as shown in Eq. (8):

$$\begin{array}{*{20}l} d_{jk} = f(v_{j}, c_{k}), \end{array} $$
(8)

where f(v j ,c k ) is an indicator variable with the value of 1 if the task v j belongs to the category c k , and 0 otherwise.

The idea of task-category grouping matrix factorization is to derive two low-rank l-dimensional matrices V and C, where \(V \in \mathbb {R}^{l \times n}\) and \(C \in \mathbb {R}^{l \times o}\) are the latent feature matrices for tasks and task categories, respectively. The column vectors V j and C k represent the l-dimensional task-specific and category-specific latent feature vectors of task v j and category c k , respectively. We can define the conditional distribution over the observed task-category grouping matrix in Eq. (9):

$$\begin{array}{*{20}l} p(D|V, C, {\sigma^{2}_{D}})= \prod^{n}_{j=1} \prod^{o}_{k=1} \left[N (d_{jk} | g({V^{T}_{j}} C_{k}), {\sigma^{2}_{D}}) \right]^{I^{D}_{jk}}, \end{array} $$
(9)

where N(x|μ,σ 2) is the probability density function of the Gaussian distribution with mean μ and variance σ 2, and \(I^{D}_{jk}\) is the indicator function that is equal to 1 if the entry d jk is observed and equal to 0 otherwise.

To avoid overfitting, zero-mean spherical Gaussian priors are placed on the task and the category latent feature matrices, which are defined in Eq. (10):

$$\begin{array}{*{20}l} p(V|{\sigma^{2}_{V}}) = \prod^{n}_{j=1} N (V_{j} | 0, {\sigma^{2}_{V}}), p(C|{\sigma^{2}_{C}}) = \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}). \end{array} $$
(10)

Hence, through a Bayesian inference, the posterior distributions of V and C based only on the observed ratings are derived in Eq. (11):

$$\begin{array}{*{20}l} &p(V,C|D,{\sigma^{2}_{C}},{\sigma^{2}_{V}},{\sigma^{2}_{D}}) \propto p(D|V, C, {\sigma^{2}_{D}}) p(V|{\sigma^{2}_{V}}) p(C|{\sigma^{2}_{C}}) \\ = &\prod^{n}_{j=1} \prod^{o}_{k=1} \left[N (d_{jk} | g({V^{T}_{j}} C_{k}), {\sigma^{2}_{D}}) \right]^{I^{D}_{jk}} \times \prod^{n}_{j=1} N (V_{j} | 0, {\sigma^{2}_{V}}) \times \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}). \end{array} $$
(11)

A unified matrix factorization for TaskRec

According to the graphical model of the TaskRec framework described in Fig. 1, we derive the log function of the posterior distributions of TaskRec in Eq. (12):

$$\begin{array}{*{20}l} &\ln p(W, V, C|R, U, D, {\sigma^{2}_{W}}, {\sigma^{2}_{V}}, {\sigma^{2}_{C}}, {\sigma^{2}_{R}}, {\sigma^{2}_{U}}, {\sigma^{2}_{D}}) \\ = &- \frac{1}{2 {\sigma^{2}_{R}}} \sum^{m}_{i=1} \sum^{n}_{j=1} { I^{R}_{ij} \left(r_{ij} - g \left({W^{T}_{i}} V_{j} \right) \right)^{2} } - \frac{1}{2 {\sigma^{2}_{U}}} \sum^{m}_{i=1} \sum^{o}_{k=1} { I^{U}_{ik} \left(u_{ik} - g \left({W^{T}_{i}} C_{k} \right) \right)^{2}} \\ &- \frac{1}{2 {\sigma^{2}_{D}}} \sum^{n}_{j=1} \sum^{o}_{k=1} { I^{D}_{jk} \left(d_{jk} - g \left({V^{T}_{j}} C_{k} \right) \right)^{2} } - \frac{1}{2 {\sigma^{2}_{W}}} \sum^{m}_{i=1} {W^{T}_{i}} W_{i} - \frac{1}{2 {\sigma^{2}_{V}}} \sum^{n}_{j=1} {V^{T}_{j}} V_{j} \\ & - \frac{1}{2 {\sigma^{2}_{C}}} \sum^{o}_{k=1} {C^{T}_{k}} C_{k} - \sum^{m}_{i=1} \sum^{n}_{j=1} { I^{R}_{ij} \ln \sigma_{R}} - \sum^{m}_{i=1} \sum^{o}_{k=1} { I^{U}_{ik} \ln \sigma_{U}} - \sum^{n}_{j=1} \sum^{o}_{k=1} { I^{D}_{jk} \ln \sigma_{D}} \\ & - l\sum^{m}_{i=1} { \ln \sigma_{W}} - l\sum^{n}_{j=1} { \ln \sigma_{V}} - l\sum^{o}_{k=1} { \ln \sigma_{C}} + \mathcal{C}, \end{array} $$
(12)

where \(\mathcal {C}\) is a constant independent of the parameters. Maximizing Eq. (12) is an unconstrained optimization problem, and maximizing the log-posterior distribution with fixed hyperparameters is equivalent to minimizing the sum-of-squared-errors objective function with quadratic regularization terms in Eq. (13):

$$\begin{array}{*{20}l} &E(W, V, C, R, U, D) \\ = &\frac{1}{2} \sum^{m}_{i=1} \sum^{n}_{j=1} { I^{R}_{ij} \left(r_{ij} - g \left({W^{T}_{i}} V_{j} \right) \right)^{2}} + \frac{\theta_{U}}{2} \sum^{m}_{i=1} \sum^{o}_{k=1} { I^{U}_{ik} \left(u_{ik} - g \left({W^{T}_{i}} C_{k} \right) \right)^{2}} \\ & + \frac{\theta_{D}}{2} \sum^{n}_{j=1} \sum^{o}_{k=1} { I^{D}_{jk} \left(d_{jk} - g \left({V^{T}_{j}} C_{k} \right) \right)^{2}} \\ & + \frac{\theta_{W}}{2} \sum^{m}_{i=1} {W^{T}_{i}} W_{i} + \frac{\theta_{V}}{2} \sum^{n}_{j=1} {V^{T}_{j}} V_{j} + \frac{\theta_{C}}{2} \sum^{o}_{k=1} {C^{T}_{k}} C_{k}, \end{array} $$
(13)

where \(\theta _{U} = {\sigma ^{2}_{R}} / {\sigma ^{2}_{U}} \), \(\theta _{D} = {\sigma ^{2}_{R}} / {\sigma ^{2}_{D}} \), \(\theta _{W} = {\sigma ^{2}_{R}} / {\sigma ^{2}_{W}} \), \(\theta _{V} = {\sigma ^{2}_{R}} / {\sigma ^{2}_{V}} \), and \(\theta _{C} = {\sigma ^{2}_{R}} / {\sigma ^{2}_{C}} \). A local minimum can be found by performing gradient descent on W i , V j and C k ; the corresponding gradients are given in Eq. (14), Eq. (15) and Eq. (16) respectively:

$$\begin{array}{*{20}l} \frac {\partial{E}} {\partial{W_{i} }} =& \sum^{n}_{j=1} { I^{R}_{ij} \left(g \left({W^{T}_{i}} V_{j} \right) - r_{ij} \right) g^{'} \left({W^{T}_{i}} V_{j} \right) V_{j} + \theta_{W} W_{i}} \\ & + \theta_{U} \sum^{o}_{k=1} { I^{U}_{ik} \left(g \left({W^{T}_{i}} C_{k} \right) - u_{ik} \right) g^{'} \left({W^{T}_{i}} C_{k} \right) C_{k}}, \end{array} $$
(14)
$$\begin{array}{*{20}l} \frac {\partial{E}} {\partial{V_{j} }} =& \sum^{m}_{i=1} { I^{R}_{ij} \left(g \left({W^{T}_{i}} V_{j} \right) - r_{ij} \right) g^{'} \left({W^{T}_{i}} V_{j} \right) W_{i} + \theta_{V} V_{j}} \\ & + \theta_{D} \sum^{o}_{k=1} { I^{D}_{jk} \left(g \left({V^{T}_{j}} C_{k} \right) - d_{jk} \right) g^{'} \left({V^{T}_{j}} C_{k} \right) C_{k}}, \end{array} $$
(15)
$$\begin{array}{*{20}l} \frac {\partial{E}} {\partial{C_{k} }} =& \theta_{U} \sum^{m}_{i=1} { I^{U}_{ik} \left(g \left({W^{T}_{i}} C_{k} \right) - u_{ik} \right) g^{'} \left({W^{T}_{i}} C_{k} \right) W_{i} + \theta_{C} C_{k}} \\ & + \theta_{D} \sum^{n}_{j=1} { I^{D}_{jk} \left(g \left({V^{T}_{j}} C_{k} \right) - d_{jk} \right) g^{'} \left({V^{T}_{j}} C_{k} \right) V_{j}}, \end{array} $$
(16)

where g′(.) is the first-order derivative of the logistic function. To reduce the model complexity, we set θ W =θ V =θ C in our experiments. The training time for our model scales linearly with the number of observations.
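The objective in Eq. (13) and the gradient in Eq. (14) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the sparse matrices R, U and D are stored as dictionaries that hold observed entries only (so key presence plays the role of the indicator functions), the toy data and the learning rate are hypothetical, and the observed values are assumed to lie in [0, 1] so that they are comparable with the logistic output.

```python
import math

def g(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def g_prime(x):
    """First-order derivative of the logistic function."""
    s = g(x)
    return s * (1.0 - s)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def objective(W, V, C, R, U, D, th_U, th_D, th_reg):
    """E of Eq. (13), with theta_W = theta_V = theta_C = th_reg."""
    e = 0.5 * sum((r - g(dot(W[i], V[j]))) ** 2 for (i, j), r in R.items())
    e += 0.5 * th_U * sum((u - g(dot(W[i], C[k]))) ** 2 for (i, k), u in U.items())
    e += 0.5 * th_D * sum((d - g(dot(V[j], C[k]))) ** 2 for (j, k), d in D.items())
    e += 0.5 * th_reg * sum(dot(x, x) for M in (W, V, C) for x in M)
    return e

def grad_W(i, W, V, C, R, U, th_U, th_reg):
    """Gradient of E with respect to W_i, following Eq. (14)."""
    grad = [th_reg * w for w in W[i]]
    for (a, j), r in R.items():
        if a == i:
            x = dot(W[i], V[j])
            coef = (g(x) - r) * g_prime(x)
            grad = [gv + coef * v for gv, v in zip(grad, V[j])]
    for (a, k), u in U.items():
        if a == i:
            x = dot(W[i], C[k])
            coef = th_U * (g(x) - u) * g_prime(x)
            grad = [gv + coef * c for gv, c in zip(grad, C[k])]
    return grad

# Hypothetical toy data: 2 workers, 3 tasks, 2 categories, l = 2.
W = [[0.1, -0.2], [0.3, 0.4]]
V = [[0.5, 0.1], [-0.3, 0.2], [0.2, 0.2]]
C = [[0.1, 0.1], [-0.1, 0.3]]
R = {(0, 0): 1.0, (0, 2): 0.5, (1, 1): 0.75}
U = {(0, 0): 0.73, (1, 1): 0.5}
D = {(0, 0): 1.0, (1, 1): 1.0, (2, 0): 1.0}
th_U, th_D, th_reg = 0.0001, 0.01, 0.00004  # values quoted in the experiments

# One gradient-descent step on W_0 with a hypothetical learning rate.
lr = 0.1
e_before = objective(W, V, C, R, U, D, th_U, th_D, th_reg)
step = grad_W(0, W, V, C, R, U, th_U, th_reg)
W_new = [row[:] for row in W]
W_new[0] = [w - lr * s for w, s in zip(W[0], step)]
e_after = objective(W_new, V, C, R, U, D, th_U, th_D, th_reg)
print(e_before, e_after)
```

The gradients for V j and C k in Eq. (15) and Eq. (16) follow the same pattern with the roles of the matrices exchanged. Each evaluation touches only the observed entries, which is the reason the cost grows with the number of observations.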

ActivePMFv2 - active learning for concurrent selection of task and worker

We first query all new tasks, which have not been tried by anyone, from the most reliable worker. Then, we query the most uncertain task from all new workers. Next, we query the most uncertain task, and select the most reliable worker for the task to query from. Our proposed ActivePMFv2 is presented in Algorithm 1.

Random new task selection for reliable worker

To learn the most accurate classifier with the least amount of work done, we first query all new tasks, as given in Eq. (17), and for each of them select the most reliable worker in the task category to query from, as given in Eq. (18).

$$\begin{array}{*{20}l} {v}^{*} = \{{v}_{j} | \exists {v}_{j} \in VS; {I}_{ij} = 0,\forall {w}_{i} \in WS\}, \end{array} $$
(17)
$$\begin{array}{*{20}l} {w}^{*} = \arg \max_{{w}_{i} \in WS} {u}_{ik} \quad where \quad {d}_{jk}=1, {v}_{j} = {v}^{*}, \end{array} $$
(18)

In Algorithm 1, Step 6 represents the process of new task selection for most reliable worker in the category.
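The selection rules in Eq. (17) and Eq. (18) can be sketched as below. The data layout is an assumption of this sketch: `observed` is a set of (worker, task) pairs with a recorded rating, `D` is a task-by-category 0/1 table in which every task belongs to exactly one category, and `U` is a worker-by-category table of preference scores. Ties are broken in favor of the lowest worker index, which the text does not specify.

```python
def new_tasks(num_tasks, observed):
    """Tasks v_j with I_ij = 0 for every worker w_i (Eq. 17)."""
    tried = {j for (_, j) in observed}
    return [j for j in range(num_tasks) if j not in tried]

def most_reliable_worker(task, D, U):
    """Worker maximizing u_ik for the category k with d_jk = 1 (Eq. 18)."""
    k = next(k for k, d in enumerate(D[task]) if d == 1)
    return max(range(len(U)), key=lambda i: U[i][k])

# Hypothetical toy data: 2 workers, 4 tasks, 2 categories.
observed = {(0, 0), (1, 0), (1, 2)}
D = [[1, 0], [0, 1], [1, 0], [1, 0]]
U = [[0.9, 0.5],
     [0.6, 0.8]]

for task in new_tasks(4, observed):
    print(task, most_reliable_worker(task, D, U))
```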

Uncertainty sampling for task selection for randomly selected new worker

The algorithm assumes that a particular active learning heuristic is specified as an input. We adopt uncertainty sampling [30], using the maximum difference between the predicted rate and the observed rate as in Eq. (19), to choose the most uncertain task, i.e. the task whose uncertainty most needs to be reduced. So that new workers can receive a list of recommended preferred tasks without first having to work on a large number of tasks, we randomly select a new worker (if any) to query from, as given in Eq. (20).

$$\begin{array}{*{20}l} &{v}^{**} = \arg \max_{{v}_{j} \in VS} \sum^{m}_{i=1} { \frac{1}{ \sum^{m}_{i=1} I_{ij}} \left| I^{R}_{ij} \left(g \left({W^{T}_{i}} V_{j} \right) - r_{ij} \right) \right| }, \end{array} $$
(19)
$$\begin{array}{*{20}l} {w}^{**} = \{{w}_{i} | \exists {w}_{i} \in WS; {I}_{ij} = 0,\forall {v}_{j} \in VS\}. \end{array} $$
(20)

In Algorithm 1, Step 8 represents the process of most uncertain task selection for new worker.
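A minimal sketch of Eq. (19) and Eq. (20) follows. It assumes that ratings are stored in a dictionary `R` keyed by (worker, task) with values in [0, 1], and it skips tasks without any observed rating when scoring, since their normalizing sum in Eq. (19) would be zero (such tasks are handled by the new-task stage). The toy data and the seeded random generator are hypothetical.

```python
import math
import random

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def uncertainty(j, W, V, R):
    """Mean absolute difference between predicted and observed rates of task j."""
    errs = [abs(g(dot(W[i], V[j])) - r) for (i, jj), r in R.items() if jj == j]
    return sum(errs) / len(errs) if errs else None

def most_uncertain_task(W, V, R):
    """Task with the maximum difference score (Eq. 19)."""
    scores = {j: uncertainty(j, W, V, R) for j in range(len(V))}
    rated = {j: s for j, s in scores.items() if s is not None}
    return max(rated, key=rated.get)

def new_workers(num_workers, R):
    """Workers w_i with I_ij = 0 for every task v_j (Eq. 20)."""
    active = {i for (i, _) in R}
    return [i for i in range(num_workers) if i not in active]

# Hypothetical toy data: 3 workers (worker 2 is new), 2 tasks, l = 2.
W = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
R = {(0, 0): 0.5, (1, 0): 0.5, (1, 1): 1.0}

task = most_uncertain_task(W, V, R)
candidates = new_workers(len(W), R)
worker = random.Random(0).choice(candidates) if candidates else None
print(task, worker)
```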

Uncertainty sampling for task selection for reliable worker

The algorithm assumes that a particular active learning heuristic is specified as an input. We adopt uncertainty sampling [30], using the maximum difference between the predicted rate and the observed rate as in Eq. (21), to choose the most uncertain task, i.e. the task whose uncertainty most needs to be reduced. To select the most reliable worker for the most uncertain task, we select the worker with the maximum worker-category preferring score for the category that the task belongs to, as in Eq. (22).

$$\begin{array}{*{20}l} &{v}^{***} = \arg \max_{{v}_{j} \in VS} \sum^{m}_{i=1} { \frac{1}{ \sum^{m}_{i=1} I_{ij}} \left| I^{R}_{ij} \left(g \left({W^{T}_{i}} V_{j} \right) - r_{ij} \right) \right| }, \end{array} $$
(21)
$$\begin{array}{*{20}l} {w}^{***} = \arg \max_{{w}_{i} \in WS} {u}_{ik} \quad where \quad {d}_{jk}=1, {v}_{j} = {v}^{***}. \end{array} $$
(22)

In Algorithm 1, Step 9 represents the most uncertain task selection for the most reliable worker.

After annotation, the selected task is removed from the unlabeled data set. Next, the selected task and its rate are added to the set of labeled dataset. The model is then retrained on the labeled tasks and the informativeness of the remaining tasks in the unlabeled data set is re-evaluated.

Online-updating on ActivePMFv2 - online-update on active learning for concurrent selection of task and worker

In this section, we present our online-updating approach for learning the ActivePMFv2 matrix factorization model. The online-updating approach is presented in Algorithm 2, while the full retrain approach is presented in Algorithm 1. Both approaches use the same sampling heuristics and the same active learning query method; they differ in how the model is updated. The full retrain approach retrains the whole learning model after each work done. The online-updating approach has two main parts: (1) it retrains the learning model in batch mode, where a model update occurs after a number of work done; (2) for each work done related to a worker (or task) whose profile is larger than the threshold, it updates the whole feature vector of that worker (or task) and keeps all other entries in the feature matrix fixed, whereas for work done related to a worker (or task) whose profile is smaller than the threshold, it updates the whole feature matrix.

Partial update

The impact of retraining the whole learning model decreases as the profile size of the worker (or task) increases. Updating the feature matrix is crucial especially for work done by new workers or work done on tasks with small profiles. For a new worker, each work done changes the task preference in his worker profile considerably, while for a worker who has already completed many tasks, each work done changes his worker profile only slightly. Updating the feature vectors for a worker (or task) with a smaller profile therefore results in a much better model. Conversely, for a worker (or task) with a large profile, we observe that the model learned by retraining only the feature vector of that worker (or task) approximates the model learned from a full retrain.

As mentioned before, through a Bayesian inference, the posterior distributions of W and V based only on the observed ratings are derived in Eq. (3), while the posterior distributions of W and C based only on the observed ratings are derived in Eq. (7). For partial update on a large worker profile, we only retrain the feature vector of the selected worker \(w_{m^{\prime }}\phantom {\dot {i}\!}\) in worker-task preferring matrix and worker-category preferring matrix as shown in Eq. (23) and Eq. (24) respectively.

$$\begin{array}{*{20}l} &FV_{WV}(w_{m'}) \\ = &\prod^{m'}_{i=m'} \prod^{n}_{j=1} \left[N (r_{ij} | g({W^{T}_{i}} V_{j}), {\sigma^{2}_{R}}) \right]^{I^{R}_{ij}} \times \prod^{m'}_{i=m'} N (W_{i} | 0, {\sigma^{2}_{W}}\mathbb{I}) \times \prod^{n}_{j=1} N (V_{j} | 0, {\sigma^{2}_{V}}\mathbb{I}), \end{array} $$
(23)
$$\begin{array}{*{20}l} &FV_{WC}(w_{m'}) \\ = &\prod^{m'}_{i=m'} \prod^{o}_{k=1} \left[N (u_{ik} | g({W^{T}_{i}} C_{k}), {\sigma^{2}_{U}}) \right]^{I^{U}_{ik}} \times \prod^{m'}_{i=m'} N (W_{i} | 0, {\sigma^{2}_{W}}\mathbb{I}) \times \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}\mathbb{I}). \end{array} $$
(24)

As stated previously, through a Bayesian inference, the posterior distributions of W and V based only on the observed ratings are derived in Eq. (3), while the posterior distributions of V and C based only on the observed ratings are derived in Eq. (11). For partial update on a large task profile, we only retrain the feature vector of the selected task \(v_{n^{\prime }}\) in worker-task preferring matrix and task-category grouping matrix as shown in Eq. (25) and Eq. (26) respectively.

$$\begin{array}{*{20}l} &FV_{WV}(v_{n'}) \\ = &\prod^{m}_{i=1} \prod^{n'}_{j=n'} \left[N (r_{ij} | g({W^{T}_{i}} V_{j}), {\sigma^{2}_{R}}) \right]^{I^{R}_{ij}} \times \prod^{m}_{i=1} N (W_{i} | 0, {\sigma^{2}_{W}}\mathbb{I}) \times \prod^{n'}_{j=n'} N (V_{j} | 0, {\sigma^{2}_{V}}\mathbb{I}), \end{array} $$
(25)
$$\begin{array}{*{20}l} &FV_{VC}(v_{n'}) \\ = &\prod^{n'}_{j=n'} \prod^{o}_{k=1} \left[N (d_{jk} | g({V^{T}_{j}} C_{k}), {\sigma^{2}_{D}}) \right]^{I^{D}_{jk}} \times \prod^{n'}_{j=n'} N (V_{j} | 0, {\sigma^{2}_{V}}\mathbb{I}) \times \prod^{o}_{k=1} N (C_{k} | 0, {\sigma^{2}_{C}}\mathbb{I}). \end{array} $$
(26)

In Algorithm 2, Steps 12, 14 and 16 represent the process of partial update. For a worker (or task) whose profile size is larger than the threshold, the algorithm retrains the feature vector of that worker (or task) and keeps all other entries in the matrix unchanged; otherwise, the algorithm retrains the feature vectors of all workers and all tasks.
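To make the partial update concrete, the sketch below runs gradient descent on a single worker vector while every task vector and every other worker vector stays fixed. It is a simplified illustration rather than Algorithm 2 itself: only the rating term and the regularizer on the selected worker are used (the worker-category term is omitted for brevity), and the learning rate, step count and toy data are hypothetical.

```python
import math

def g(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def worker_error(i, W, V, R):
    """Sum of squared rating errors over the tasks rated by worker i."""
    return sum((r - g(dot(W[i], V[j]))) ** 2 for (a, j), r in R.items() if a == i)

def partial_update_worker(i, W, V, R, theta_W=0.00004, lr=0.5, steps=50):
    """Retrain W_i only; V and the other rows of W are left untouched."""
    w = W[i][:]
    rated = [(j, r) for (a, j), r in R.items() if a == i]
    for _ in range(steps):
        grad = [theta_W * x for x in w]
        for j, r in rated:
            p = g(dot(w, V[j]))
            coef = (p - r) * p * (1.0 - p)  # (g - r) * g'
            grad = [gv + coef * v for gv, v in zip(grad, V[j])]
        w = [x - lr * gv for x, gv in zip(w, grad)]
    W_new = [row[:] for row in W]
    W_new[i] = w
    return W_new

# Hypothetical toy data: 2 workers, 2 tasks, l = 2.
W = [[0.1, 0.1], [0.2, -0.1]]
V = [[0.5, 0.5], [-0.5, 0.5]]
R = {(0, 0): 0.9, (0, 1): 0.2, (1, 0): 0.5}

W_new = partial_update_worker(0, W, V, R)
print(worker_error(0, W, V, R), worker_error(0, W_new, V, R))
```

Because the loop visits only the ratings of the selected worker, its cost depends on that worker's profile rather than on the size of the whole rating matrix.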

Batch update

The time for retraining a learning model is proportional to the computational complexity of the model and the amount of information stored in the model; while the amount of information depends on the number of workers, the number of tasks and the number of work done. Retraining a large learning model takes a long time. For a large real-world crowdsourcing system, it is inefficient if the whole model is retrained from scratch once a worker completes a task.

In Algorithm 2, Steps 24 and 25 represent the process of batch update. When the number of work done is smaller than the batch size, the learning model is not retrained. On the other hand, when the number of work done is larger than the batch size, the algorithm retrains the learning model.
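The batch rule can be sketched with a small counter. The class and method names are hypothetical, the retraining routine is passed in as a callable, and the sketch assumes that a retrain is triggered as soon as the number of pending work items reaches the batch size, a boundary case the description above leaves open.

```python
class BatchRetrainer:
    """Trigger a model retrain once enough new work has accumulated."""

    def __init__(self, batch_size, retrain):
        self.batch_size = batch_size
        self.retrain = retrain  # callable that retrains the learning model
        self.pending = 0        # work done since the last retrain

    def record_work_done(self):
        """Register one completed work item; return True if a retrain ran."""
        self.pending += 1
        if self.pending >= self.batch_size:
            self.retrain()
            self.pending = 0
            return True
        return False

# Usage with a stand-in retrain function that only counts its calls.
calls = []
retrainer = BatchRetrainer(batch_size=10, retrain=lambda: calls.append("retrain"))
for _ in range(25):
    retrainer.record_work_done()
print(len(calls), retrainer.pending)
```

With a batch size of 1 the counter degenerates to an update after every work done.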

General update problem

Our proposed online-updating approach can also be applied in a general update problem. In a general recommendation system, a new rating r u,i might affect the features of both user u and item i. The partial update method works based on the following conditions: (1) If the profiles of both user and item are large, it could update the feature vectors of both the selected user and the selected item; (2) If the user profile is large (but the item profile is small), it could update the user feature vector only; (3) If the item profile is large (but the user profile is small), it could update the item feature vector only; (4) If the profiles of both user and item are small, it could update the feature vectors of all users and all items. Besides, based on the number of new incoming ratings, the batch update method retrains the whole learning model. A general update algorithm is shown in Algorithm 3.
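The four conditions above amount to a small decision rule, sketched here. The function name, the returned labels and the way profile size is measured (here simply a number compared against the threshold) are assumptions of this sketch and are not taken from Algorithm 3.

```python
def update_scope(user_profile_size, item_profile_size, threshold):
    """Decide which feature vectors to retrain after a new rating r_{u,i}."""
    user_large = user_profile_size > threshold
    item_large = item_profile_size > threshold
    if user_large and item_large:
        return "user_and_item_vectors"   # condition (1)
    if user_large:
        return "user_vector"             # condition (2)
    if item_large:
        return "item_vector"             # condition (3)
    return "all_vectors"                 # condition (4)

# Hypothetical profile sizes with a threshold of 50 ratings.
print(update_scope(120, 300, 50))
print(update_scope(120, 3, 50))
print(update_scope(2, 300, 50))
print(update_scope(2, 3, 50))
```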

Complexity analysis

To compute the complexity of our ActivePMFv2, we consider both the computation of the gradient descent methods and the computation of selecting the most uncertain task for the most reliable worker. The main computation of the gradient descent methods is evaluating the objective function E and the corresponding gradients on the variables. Because of the sparsity of matrices R, U, and D, the complexity of evaluating the objective function in Eq. (13) is \(\mathcal {O} \left (n_{R}l + n_{U}l + n_{D}l \right)\), where n R , n U and n D are the numbers of non-zero entries in matrices R, U, and D respectively, and l is the number of dimensions of the latent feature space. By a similar approach, we can derive the complexities of Eq. (14), Eq. (15) and Eq. (16). For the computation of selecting the most uncertain task for the most reliable worker, the complexity of selecting the most uncertain task in Eq. (21) is \(\mathcal {O} \left (n_{R}l \right)\), the complexity of selecting the most reliable worker in Eq. (22) is \(\mathcal {O} \left (m \right)\), and thus the total complexity of assigning a task to a worker is \(\mathcal {O} \left (m + n_{R}l \right)\). As a result, the total complexity for one iteration is \(\mathcal {O} \left (m + n_{R}l + n_{U}l + n_{D}l \right)\). That is, the complexity is linear with respect to the number of workers and the number of observations in the three sparse matrices. The complexity analysis shows that ActivePMFv2 can scale to very large datasets.

To apply the online-updating approach when learning the ActivePMFv2 model, we consider both the partial update method and the batch update method. Since only some non-zero entries in the matrices are updated in the partial update method, the complexity of evaluating the objective function in Eq. (13) is \(\mathcal {O} \left (z_{R}l + z_{U}l + z_{D}l \right)\), where z R , z U and z D are the numbers of non-zero entries to be updated in matrices R, U, and D respectively, and l is the number of dimensions of the latent feature space. As a result, the total complexity for one iteration is \(\mathcal {O} \left (m + z_{R}l + z_{U}l + z_{D}l \right)\), where z R <<n R , z U <<n U , z D <<n D . Based on our observation, the complexity of learning the ActivePMFv2 model can be greatly reduced by using the partial update method. Besides, the batch update method can greatly reduce the number of iterations, thus further reducing the computational cost.

Experimental analysis

In this section, our experiments are intended to address the following five research questions:

  1. How does the ActivePMFv1 approach compare with PMF with various active learning approaches?

  2. How does the ActivePMFv1 approach compare with PMF with various active sampling heuristics?

  3. How does our ActivePMFv2 approach compare with ActivePMFv1?

  4. How does the partial update part of the online-updating approach compare with the full retrain when learning the ActivePMFv2 model?

  5. How does the batch update part of the online-updating approach compare with the full retrain when learning the ActivePMFv2 model?

Description of dataset

We use the same dataset as shown in [41]. Our dataset is retrieved from the recent NAACL 2010 workshop on crowdsourcing, which has made publicly available all the data collected as part of the workshop [49]. The data was collected within a month from multiple requesters seeking data for a diverse variety of tasks on MTurk. Table 2 provides some statistics about our dataset. Our dataset is mainly related to tasks for creating speech and language data. The task categorization is shown in Table 3.

Table 2 Statistics of our dataset
Table 3 Task categorization by both language and keywords given by MTurk in our dataset

Evaluation metrics

For performance comparison with our proposed ActivePMFv2, we implement PMF (Probabilistic Matrix Factorization) with the following Active Learning baselines:

  • T[Rand]W[Rand]: It assigns a randomly selected task to a randomly selected worker.

  • T[MaxDiff]W[Rand]: It assigns the task of maximum difference between its observed values and its predicted values to a randomly selected worker.

  • T[Rand]W[Reli]: It assigns a randomly selected task to the most reliable worker in the task category.

  • T[MaxDiff]W[Reli]: It assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

  • T[N]W[Reli]+T[MaxPredictErr]W[Reli]: It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum average prediction error of all its predicted values to the most reliable worker in the task category.

  • T[N]W[Reli]+T[MaxEntropy]W[Reli]: It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum Entropy on the posterior variance to the most reliable worker in the task category.

  • T[N]W[Reli]+T[MaxDiff]W[Reli] (ActivePMFv1): It has two parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, it assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

  • T[N]W[Reli]+T[MaxDiff]W[N]+T[MaxDiff]W[Reli] (ActivePMFv2): It has three parts. First, it assigns all new tasks to the most reliable worker in the task categories. Second, for all new workers, it assigns the task of maximum difference between its observed values and its predicted values to a randomly selected new worker. Third, it assigns the task of maximum difference between its observed values and its predicted values to the most reliable worker in the task category.

To compare the prediction quality of our method ActivePMFv2, we use the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE) as the comparison metrics. MAE and RMSE are defined in Eq. (27):

$$\begin{array}{*{20}l} MAE = \frac {\sum_{i, j} | r_{i, j} -\hat{r}_{i, j} |} {N}, RMSE = \sqrt {\frac {\sum_{i, j} {\left(r_{i, j} -\hat{r}_{i, j} \right)}^{2}} {N}}, \end{array} $$
(27)

where r i,j denotes the rating that indicates the extent of the favor of task j for worker i, \(\hat {r}_{i, j}\) denotes the predicted rating, and N is the total number of testing ratings.
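The two metrics in Eq. (27) can be computed as in the following sketch. The rating values are hypothetical and serve only to exercise the functions.

```python
import math

def mae(actual, predicted):
    """Mean Absolute Error over N testing ratings."""
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root Mean Squared Error over N testing ratings."""
    return math.sqrt(sum((r - p) ** 2 for r, p in zip(actual, predicted)) / len(actual))

# Hypothetical testing ratings and predictions.
actual = [5.0, 3.0, 5.0, 3.0]
predicted = [4.5, 3.5, 5.0, 2.0]
print(mae(actual, predicted), rmse(actual, predicted))
```

Because the errors are squared before averaging, RMSE is never smaller than MAE on the same data, and the gap widens when a few predictions are far off.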

Performance comparison

To show the prediction performance improvements of ActivePMFv2, we first compare ActivePMFv1 [15] with PMF with various active learning approaches and different active sampling heuristics, where Probabilistic Matrix Factorization (PMF) [12] is the state-of-the-art approach for recommendation systems. Next, we compare our proposed ActivePMFv2 with ActivePMFv1.

From our dataset, we randomly select 80 % of ratings as training data (20 % as initial training set + 60 % as active set), and leave the remaining 20 % for prediction performance testing. The procedure is carried out 10 times independently, and we report the average values in this paper. For the value transformation, we have 10,411 approved tasks (value transformed to 5), 9,399 submitted tasks (value transformed to 3) and only 5 rejected tasks (value transformed to 4). Most rejected tasks are already removed in our dataset. In the comparison, we set θ W = θ V = θ C = 0.00004, θ U = 0.0001 and θ D = 0.01. The MAE results and the RMSE results are reported in Tables 4, 5, 6, 7, 8 and 9. MAE measures the average magnitude of the errors in predicted values, while RMSE gives a relatively high weight to large errors.
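The 20 % / 60 % / 20 % split described above can be sketched as follows. The function name, the seeds and the use of integer identifiers in place of real ratings are assumptions of this sketch; the count 19,815 is simply the sum of the three task counts quoted above and is used here only as a plausible size.

```python
import random

def split_ratings(ratings, seed):
    """Shuffle and split into initial training, active and testing sets."""
    rng = random.Random(seed)
    shuffled = ratings[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_init = n * 20 // 100
    n_active = n * 60 // 100
    initial = shuffled[:n_init]
    active = shuffled[n_init:n_init + n_active]
    test = shuffled[n_init + n_active:]
    return initial, active, test

# Ten independent runs, one seed per run.
ratings = list(range(19815))  # stand-in identifiers for the ratings
for seed in range(10):
    initial, active, test = split_ratings(ratings, seed)
print(len(initial), len(active), len(test))
```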

Table 4 MAE comparison among various active sampling heuristics in PMF (A smaller MAE means a better performance)
Table 5 RMSE comparison among various active sampling heuristics in PMF (A smaller RMSE means a better performance)
Table 6 MAE comparison among various active learning approaches in PMF (A smaller MAE means a better performance)
Table 7 RMSE comparison among various active learning approaches in PMF (A smaller RMSE means a better performance)
Table 8 MAE comparison between ActivePMFv2 and ActivePMFv1 (A smaller MAE means a better performance)
Table 9 RMSE comparison between ActivePMFv2 and ActivePMFv1 (A smaller RMSE means a better performance)

In Tables 4 and 5, we compare among PMF with three sampling heuristics on selecting tasks, which are Maximum Difference (ActivePMFv1), Maximum Prediction Error and Maximum Entropy. The performance results of the three approaches are similar when the number of selected samples is small. However, as the number of selected samples increases, Maximum Difference approach (ActivePMFv1) outperforms the other two approaches.

In Tables 6 and 7, we compare among PMF with different active learning approaches on task selection and worker selection. When the number of selected samples is very small, assigning tasks to workers randomly gives the best performance in both the MAE results and the RMSE results. However, as the number of selected samples increases, assigning new tasks in the first stage can greatly improve the performance in both the MAE results and the RMSE results. Compared with random selection on task and worker to the PMF learning model (i.e. T[Rand]W[Rand]), ActivePMFv1 can greatly improve both MAE and RMSE performance.

In Tables 8 and 9, we compare our proposed ActivePMFv2 with ActivePMFv1. The performance results of the two approaches are similar when the number of selected samples is small. However, as the number of selected samples increases, ActivePMFv2 outperforms ActivePMFv1 in both MAE and RMSE results. Compared with ActivePMFv1, the MAE results and the RMSE results of ActivePMFv2 are improved by up to 29 % and 35 % respectively.

In Table 10, we compare the partial update part of our online-updating approach with the full retrain on ActivePMFv2 model learning. The prediction quality of ActivePMFv2 with partial update (threshold t = 0.001) approximates that of ActivePMFv2 with full retrain, while the average runtime on model update per work done of ActivePMFv2 with partial update (threshold t = 0.001) is reduced by 12 % compared with that of ActivePMFv2 with full retrain.

Table 10 Comparison on a Full-Retrain with Partial Update on Online-Updating Approach on ActivePMFv2 model learning (Feature k = 20; No of Work Done = 11,000; Batch = 1)

By using the batch update method, the average runtime on model update per work done can be further reduced. In Table 11, we compare the batch update part of our online-updating approach with the full retrain on ActivePMFv2 model learning. As the batch size increases, both the MAE results and the RMSE results also increase, while the average runtime on model update per work done decreases significantly. For instance, compared with full retrain, when the batch size is 500 with partial update (threshold t = 0.001), the average runtime on model update per work done is reduced by 99.6 % (from 3.839 min to 0.017 min), but both the MAE results and the RMSE results increase severalfold. On the other hand, compared with full retrain, when the batch size is 10 with partial update (threshold t = 0.001), the average runtime on model update per work done is reduced by 82.4 % (from 3.839 min to 0.675 min), while the MAE result increases by only 22.4 % (from 0.0156 to 0.0191) and the RMSE result increases by only 8.2 % (from 0.0845 to 0.0914). Therefore, by adjusting the batch size, the average runtime on model update per work done can be reduced significantly at the cost of only a small performance degradation. In the cases shown in Table 11, batch size 10 is the best choice among all the listed choices.
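The percentages quoted above follow directly from the reported runtimes and errors, as this short check shows. Only the figures stated in the text are used; the helper name is hypothetical.

```python
def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100.0

# Average runtime on model update per work done, in minutes.
print(round(pct_change(3.839, 0.017), 1))    # batch size 500
print(round(pct_change(3.839, 0.675), 1))    # batch size 10

# Prediction errors for batch size 10 versus full retrain.
print(round(pct_change(0.0156, 0.0191), 1))  # MAE
print(round(pct_change(0.0845, 0.0914), 1))  # RMSE
```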

Table 11 Comparison on a Full-Retrain with Batch Update on Online-Updating Approach on ActivePMFv2 model learning (Feature k = 20; No of Work Done = 11,000)

Conclusion

In this paper, we have proposed Probabilistic Matrix Factorization with Active Learning (version 2), ActivePMFv2, on the Task Recommendation framework, TaskRec, for quality assurance in crowdsourcing systems. It first assigns new tasks to the most reliable worker in the task categories. Second, it actively selects the most uncertain task, and then requests new workers to complete the task. Third, it actively selects the most uncertain task, and then requests the most reliable workers to complete the task for retraining the classification model. Experimental results show that our ActivePMFv2 outperforms ActivePMFv1, where the MAE results and the RMSE results of ActivePMFv2 are improved by up to 29 % and 35 % respectively. Previous work shows that ActivePMFv1 significantly outperforms PMF with other active learning approaches.

Moreover, we have proposed online-updating on the task recommendation model to reduce the runtime of retraining the model. In a large-scale crowdsourcing system, it does not make sense to retrain the model from scratch whenever a worker with a large profile completes a task, because the performance improvement from retraining the model in this case is tiny but the cost of retraining the model is high. Moreover, when a large number of workers are working in the crowdsourcing system at the same time, the computational cost is very high if the model is retrained after each worker completes a task. The larger the profile of a worker (or task) is, the less important is retraining its profile on each new work done. For a worker (or task) with a large profile, our online-updating algorithm retrains the whole feature vector of the worker (or task) and keeps all other entries in the matrix fixed. Our online-updating algorithm also runs batch updates to reduce the running time of model update. Experimental results show that our online-updating algorithm accurately approximates a full retrain of the learning model while the average runtime of model update for each work done is reduced by more than 80 % (decreases from a few minutes to several seconds).

References

  1. Howe J. The rise of crowdsourcing. Wired. 2006;14(6).

  2. Howe J. Crowdsourcing: Why the Power of the Crowd Is Driving the Future of Business. New York: Crown Business; 2008.

  3. Yuen MC, King I, Leung KS. A survey of crowdsourcing systems. In: SocialCom ’11: Proceedings of The Third IEEE International Conference on Social Computing. Boston: IEEE Computer Society: 2011. p. 766–73.

  4. Yuen MC, Chen LJ, King I. A survey of human computation systems. In: CSE ’09: Proceedings of IEEE International Conference on Computational Science and Engineering. Vancouver: IEEE Computer Society: 2009. p. 723–8, doi:10.1109/CSE.2009.395.

  5. Amazon Mechanical Turk. https://www.mturk.com/.

  6. CrowdFlower. http://crowdflower.com/.

  7. Taskcn. http://www.taskcn.com/.

  8. TopCoder. http://www.topcoder.com/.

  9. Allahbakhsh M, Benatallah B, Ignjatovic A, Motahari-Nezhad HR, Bertino E, Dustdar S. Quality control in crowdsourcing systems: Issues and directions. IEEE Internet Computing. 2013; 17(2):76–81. doi:10.1109/MIC.2013.20.

  10. Ipeirotis PG, Provost F, Wang J. Quality management on amazon mechanical turk. In: HCOMP ’10: Proceedings of the ACM SIGKDD Workshop on Human Computation. New York: ACM: 2010. p. 64–7.

    Google Scholar 

  11. Karger DR, Oh S, Shah D. Iterative learning for reliable crowdsourcing systems. In: Advances in Neural Information Processing Systems 24: 25th Annual Conference on Neural Information Processing Systems 2011. Proceedings of a Meeting Held 12–14 December 2011, Granada: 2011. p. 1953–1961. http://papers.nips.cc/paper/4396-iterative-learning-for-reliable-crowdsourcing-systems.

  12. Salakhutdinov R, Mnih A. Probabilistic matrix factorization. In: NIPS ’07: Proceedings of the Twenty-First Annual Conference on Neural Information Processing Systems. Taipei: Curran Associates, Inc.: 2007.

    Google Scholar 

  13. Rendle S, Schmidt-Thieme L. Online-updating regularized kernel matrix factorization models for large-scale recommender systems. In: Proceedings of the 2008 ACM Conference on Recommender Systems. RecSys ’08. New York: ACM: 2008. p. 251–8, doi:10.1145/1454008.1454047. http://doi.acm.org/10.1145/1454008.1454047.

    Google Scholar 

  14. Karimi R, Freudenthaler C, Nanopoulos A, Schmidt-Thieme L. Active learning for aspect model in recommender systems. In: Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, CIDM 2011, Part of the IEEE Symposium Series on Computational Intelligence 2011, April 11–15, 2011, Paris: 2011. p. 162–7, doi:10.1109/CIDM.2011.5949431. http://dx.doi.org/10.1109/CIDM.2011.5949431.

  15. Yuen MC, King I, Leung KS. Probabilistic matrix factorization with active learning for quality assurance in crowdsourcing systems. In: ICWI 2015: Proceedings of The IADIS International Conference WWW/Internet 2015. Greater Dublin: 2015.

  16. Stewart O, Lubensky D, Huerta JM. Crowdsourcing participation inequality: a scout model for the enterprise domain. In: Proceedings of the ACM SIGKDD Workshop on Human Computation. HCOMP ’10. New York: ACM: 2010. p. 30–3, doi:10.1145/1837885.1837895. http://doi.acm.org/10.1145/1837885.1837895.

    Google Scholar 

  17. Ross J, Irani L, Silberman MS, Zaldivar A, Tomlinson B. Who are the crowdworkers?: shifting demographics in mechanical turk. In: CHI EA ’10: Proceedings of the 28th of the International Conference Extended Abstracts on Human Factors in Computing Systems. New York: ACM: 2010. p. 2863–872, doi:10.1145/1753846.1753873. http://doi.acm.org/10.1145/1753846.1753873.

  18. Chilton LB, Horton JJ, Miller RC, Azenkot S. Task search in a human computation market. In: HCOMP ’10: Proceedings of the ACM SIGKDD Workshop on Human Computation. New York: ACM: 2010. p. 1–9, doi:10.1145/1837885.1837889. http://doi.acm.org/10.1145/1837885.1837889.

    Google Scholar 

  19. Koren Y, Bell R, Volinsky C. Matrix factorization techniques for recommender systems. Computer. 2009; 42(8):30–7. doi:10.1109/MC.2009.263.

    Article  Google Scholar 

  20. Yang S, Ye M. Global Minima Analysis of Lee and Seung’s NMF Algorithms. Neural Process Lett. 2013; 38(1):29–51. doi:10.1007/s11063-012-9261-x.

    Article  Google Scholar 

  21. Yang S, Yi Z. Convergence Analysis of Non-Negative Matrix Factorization for BSS Algorithm. Neural Process Lett. 2010; 31(1):45–64. doi:10.1007/s11063-009-9126-0.

    Article  Google Scholar 

  22. Salakhutdinov R, Mnih A. Probabilistic matrix factorization. In: Advances in Neural Information Processing Systems: 2008.

  23. Zhou TC, Ma H, King I, Lyu MR. Tagrec: Leveraging tagging wisdom for recommendation. In: Proceedings of the 2009 International Conference on Computational Science and Engineering - Volume 04. Washington: IEEE Computer Society: 2009. p. 194–9, doi:10.1109/CSE.2009.75. http://dl.acm.org/citation.cfm?id=1632710.1633781.

  24. Allahbakhsh M, Benatallah B, Ignjatovic A, Motahari-Nezhad HR, Bertino E, Dustdar S. Quality control in crowdsourcing systems: Issues and directions. IEEE Internet Computing. 2013; 17(2):76–81. doi:10.1109/MIC.2013.20.

    Article  Google Scholar 

  25. Organisciak P, Teevan J, Dumais ST, Miller RC, Kalai AT. A crowd of your own: Crowdsourcing for on-demand personalization. In: Proceedings of the Seconf AAAI Conference on Human Computation and Crowdsourcing, HCOMP 2014, November 2–4, 2014, Pittsburgh: 2014. http://www.aaai.org/ocs/index.php/HCOMP/HCOMP14/paper/view/8972.

  26. Organisciak P, Teevan J, Dumais S, Miller RC, Kalai AT. Matching and grokking: Approaches to personalized crowdsourcing. In: Proceedings of the 24th International Conference on Artificial Intelligence. IJCAI’15. Pittsburgh: AAAI Press: 2015. p. 4296–302. http://dl.acm.org/citation.cfm?id=2832747.2832856.

    Google Scholar 

  27. Ambati V, Vogel S, Carbonell J. Towards task recommendation in micro-task markets. In: AAAI ’11: Proceedings of The 25th AAAI Workshop in Human Computation. Pittsburgh: AAAI Publications: 2011.

    Google Scholar 

  28. Lin CH, Kamar E, Horvitz E. Signals in the silence: Models of implicit feedback in a recommendation system for crowdsourcing. In: Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, July 27 -31, 2014, Québec City: 2014. p. 908–15. http://www.aaai.org/ocs/index.php/AAAI/AAAI14/paper/view/8425.

  29. Atlas L, Cohn D, Ladner R, El-Sharkawi MA, Marks II RJ. Training connectionist networks with queries and selective sampling. Adv Neural Inf Process Syst. 1990; 2:566–573.

    Google Scholar 

  30. Lewis DD, Gale WA. A sequential algorithm for training text classifiers. In: SIGIR ’94: Proceedings of the 17th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval. Dublin: Springer: 1994. p. 3–12.

    Google Scholar 

  31. Scheffer T, Decomain C, Wrobel S. Active hidden markov models for information extraction. In: IDA ’01: Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis. Cascais: Springer: 2001. p. 309–18.

    Google Scholar 

  32. Culotta A, McCallum A. Reducing labeling effort for structured prediction tasks. In: AAAI ’05: Proceedings of the 20th National Conference on Artificial Intelligence - Volume 2. Pittsburgh: AAAI Press: 2005. p. 746–51.

    Google Scholar 

  33. Dagan I, Engelson SP. Committee-based sampling for training probabilistic classifiers. In: ICML ’95: Proceedings of the Twelfth International Conference on Machine Learning. Tahoe City, California: Morgan Kaufmann: 1995. p. 150–7.

    Google Scholar 

  34. Seung HS, Opper M, Sompolinsky H. Query by committee. In: COLT ’92: Proceedings of the Fifth Annual Workshop on Computational Learning Theory. Pittsburgh: ACM: 1992. p. 287–94.

    Google Scholar 

  35. Yan Y, Rosales R, Fung G, Dy JG. Active learning from crowds. In: ICML’11: Proceedings of the 28th International Conference on Machine Learning. Bellevue: Omnipress: 2011. p. 1161–1168.

    Google Scholar 

  36. Laws F, Scheible C, Schütze H. Active learning with amazon mechanical turk. In: EMNLP ’11: Proceedings of the Conference on Empirical Methods in Natural Language Processing. Edinburgh: Association for Computational Linguistics: 2011. p. 1546–1556.

    Google Scholar 

  37. Fang M, Zhu X, Li B, Ding W, Wu X. Self-taught active learning from crowds. In: ICDM: 2012. p. 858–63.

  38. Fang M, Zhu X. Active learning with uncertain labeling knowledge. Pattern Recogn Lett. 2014; 43:98–108. doi:10.1016/j.patrec.2013.10.011.

    Article  Google Scholar 

  39. Jung HJ, Lease M. Improving quality of crowdsourced labels via probabilistic matrix factorization. In: Human Computation Workshop at the Twenty-Sixth AAAI Conference on Artificial Intelligence: 2012.

  40. Jung HJ. Quality assurance in crowdsourcing via matrix factorization based task routing. In: International Conference on World Wide Web 2014: 2014.

  41. Yuen MC, King I, Leung KS. Taskrec: Probabilistic matrix factorization in task recommendation in crowdsourcing systems. In: ICONIP (2): 2012. p. 516–25.

  42. Yuen MC, King I, Leung KS. Taskrec: A task recommendation framework in crowdsourcing systems. Neural Process Lett. 2015; 41(2):223–38. doi:10.1007/s11063-014-9343-z.

    Article  Google Scholar 

  43. Yuen MC, King I, Leung KS. Task matching in crowdsourcing. In: CPSCom ’11: Proceedings of The 4th IEEE International Conference on Cyber, Physical and Social Computing. Boston: IEEE Computer Society: 2011. p. 409–12.

    Google Scholar 

  44. Yuen MC, King I, Leung KS. Task recommendation in crowdsourcing systems. In: KDD ’12: Proceedings of ACM KDD 2012 Workshop on Data Mining and Knowledge Discovery with Crowdsourcing (CrowdKDD). New York: ACM: 2012.

    Google Scholar 

  45. Schnitzer S, Rensing C, Schmidt S, Borchert K, Hirth M, Tran-Gia P. Demands on Task Recommendation in Crowdsourcing Platforms - The Worker’s Perspective. In: CrowdRec Workshop. Vienna: 2015.

  46. Geiger D, Schader M. Personalized task recommendation in crowdsourcing information systems – current state of the art. Decision Support Systems. 2014; 65:3–16. doi:10.1016/j.dss.2014.05.007. Crowdsourcing and Social Networks Analysis.

    Article  Google Scholar 

  47. Aldhahri E, Shandilya V, Shiva SG. Towards an effective crowdsourcing recommendation system: A survey of the state-of-the-art. In: SOSE. Boston: IEEE: 2015. p. 372–7. http://dblp.uni-trier.de/db/conf/sose/sose2015.html\#AldhahriSS15.

    Google Scholar 

  48. Dueck D, Frey BJ, Dueck D, Frey BJ. Probabilistic sparse matrix factorization. Technical report, University of Toronto. 2004.

  49. NAACL 2010 Workshop. http://sites.google.com/site/amtworkshop2010/data-1.

Download references

Acknowledgements

This research was in part supported by grants from the National Grand Fundamental Research 973 Program of China (No. 2014CB340405), the Research Grants Council of the Hong Kong Special Administrative Region, China (Project No. CUHK 413212), and Microsoft Research Asia Regional Seed Fund in Big Data Research (Grant No. FY13-RES-SPONSOR-036).

Authors’ contributions

All authors made substantial contributions to the conception and design of the work. MC carried out the experiments, data analysis and data interpretation, and participated in drafting and revising the manuscript. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Author information

Corresponding author

Correspondence to Man-Ching Yuen.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Yuen, MC., King, I. & Leung, KS. An online-updating algorithm on probabilistic matrix factorization with active learning for task recommendation in crowdsourcing systems. Big Data Anal 1, 14 (2016). https://doi.org/10.1186/s41044-016-0012-2

  • DOI: https://doi.org/10.1186/s41044-016-0012-2

Keywords