Spam_Score
- class Spam_Score(answers, **kwargs)
Spammer score (Raykar and Yu, 2011)
Compute the distance between the confusion matrix of each worker and the closest rank-1 matrix. The closer the score is to 0, the more likely the worker is a spammer.
- __init__(answers, **kwargs)
Compute the spammer score for each worker. The larger the score, the more we can trust the worker. Conversely, the closer the score is to 0, the more likely the worker is a spammer.
The score is the squared Frobenius distance between the estimated confusion matrix \(\hat{\pi}^{(j)}\) and the closest rank-1 matrix of the form \(\mathbf{e}u^\top\), where \(\mathbf{e}\) denotes the vector of ones in \(\mathbb{R}^K\).
\[\forall j\in [n_\texttt{worker}],\ s_j = \|\pi^{(j)}- \mathbf{e}u_j^\top\|_F^2\enspace \text{with } u_j = \underset{u\in\mathbb{R}^K,\ u^\top \mathbf{e}=1}{\mathrm{argmin}} \|\pi^{(j)}- \mathbf{e}u^\top\|_F^2 \enspace.\]
Solving this problem and standardizing the result in \([0,1]\) gives the spammer score:
\[\forall j \in [n_\texttt{worker}],\ s_j = \frac{1}{K(K-1)}\sum_{1\leq k<k'\leq K}\sum_{\ell\in[K]} (\pi^{(j)}_{k,\ell} - \pi^{(j)}_{k',\ell})^2 \enspace.\]
- Parameters:
answers (dict) –
Dictionary of workers' answers with format
{ task0: {worker0: label, worker1: label}, task1: {worker1: label} }
The number of classes n_classes and the number of workers n_workers should be specified as keyword arguments. If the confusion matrices are known and stored in a npy or pth file, the file can be specified as matrix_file. Otherwise, the DS model is run to obtain the matrices.
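
As an illustration of the closed-form score above, here is a minimal NumPy sketch for a single worker. It is not the library's implementation: the function names spam_score and rank1_distance are hypothetical, and the confusion matrix is assumed to be a row-stochastic K x K array whose pairwise row differences are summed over all K columns.

```python
import numpy as np


def spam_score(pi):
    """Normalized spammer score of one K x K confusion matrix (hypothetical helper)."""
    pi = np.asarray(pi, dtype=float)
    K = pi.shape[0]
    # diff[k, k', l] = pi[k, l] - pi[k', l] for every ordered pair of rows
    diff = pi[:, None, :] - pi[None, :, :]
    # Ordered pairs count each unordered pair (k < k') twice, hence the division by 2
    pair_sum = (diff ** 2).sum() / 2.0
    return pair_sum / (K * (K - 1))


def rank1_distance(pi):
    """Squared Frobenius distance to the closest matrix e u^T (hypothetical helper)."""
    pi = np.asarray(pi, dtype=float)
    # For a row-stochastic matrix the minimizer u is the vector of column means,
    # which already sums to 1, so the constraint is satisfied automatically
    u = pi.mean(axis=0)
    return ((pi - u) ** 2).sum()


perfect = np.eye(3)                          # always answers the true label
spammer = np.tile([0.2, 0.5, 0.3], (3, 1))   # answers independently of the true label
noisy = np.array([[0.8, 0.1, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7]])
```

With these inputs, spam_score(perfect) evaluates to 1 and spam_score(spammer) to 0, matching the interpretation that identical rows (a rank-1 matrix) characterize a spammer. The two helpers are related by spam_score(pi) = rank1_distance(pi) / (K - 1), which is the standardization step mentioned above.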