Spam_Score

class Spam_Score(answers, **kwargs)

Spammer score (Raykar and Yu, 2011)

Compute the distance between the confusion matrix of each worker and the closest rank-1 matrix. The closer this distance is to 0, the more likely the worker is a spammer.

__init__(answers, **kwargs)

Compute the spammer score for each worker. The larger the score, the more we can trust the worker. Conversely, the closer the score is to 0, the more likely the worker is a spammer.

The score is the squared Frobenius distance between the estimated confusion matrix \(\hat{\pi}^{(j)}\) and the closest rank-1 matrix whose rows are all equal. Denote by \(\mathbf{e}\) the vector of ones in \(\mathbb{R}^K\), where \(K\) is the number of classes.

\[\forall j\in [n_\texttt{worker}],\ s_j = \|\pi^{(j)}- \mathbf{e}u_j^\top\|_F^2\enspace \text{with } u_j = \underset{u\in\mathbb{R}^K,\ u^\top \mathbf{e}=1}{\mathrm{argmin}} \|\pi^{(j)}- \mathbf{e}u^\top\|_F^2 \enspace.\]

Solving this problem and rescaling the result to \([0,1]\) gives the spammer score:

\[\forall j \in [n_\texttt{worker}],\ s_j = \frac{1}{K(K-1)}\sum_{1\leq k<k'\leq K}\sum_{\ell\in[K]} (\pi^{(j)}_{k,\ell} - \pi^{(j)}_{k',\ell})^2 \enspace.\]
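As an illustration, the closed-form score can be sketched in NumPy for a single worker. This is a minimal sketch and not the library's implementation: the function names `spammer_score` and `rank_one_residual` are hypothetical, and the confusion matrix is assumed to be a row-stochastic \(K\times K\) array. The second function computes the rank-1 residual directly, using the fact that the minimizer \(u_j\) is the average of the rows of the matrix, so the two quantities can be compared.

```python
import numpy as np


def spammer_score(pi):
    """Closed-form spammer score of one K x K confusion matrix (hypothetical helper)."""
    pi = np.asarray(pi, dtype=float)
    K = pi.shape[0]
    # diffs[k, k', l] = pi[k, l] - pi[k', l] for every pair of rows (k, k')
    diffs = pi[:, None, :] - pi[None, :, :]
    # Squared Euclidean distance between each pair of rows, shape (K, K)
    sq_dist = (diffs ** 2).sum(axis=2)
    # Keep only the pairs with k < k', then normalize by K(K-1)
    return np.triu(sq_dist, k=1).sum() / (K * (K - 1))


def rank_one_residual(pi):
    """Squared Frobenius distance to the closest matrix of the form e u^T."""
    pi = np.asarray(pi, dtype=float)
    # The minimizer u is the mean of the rows; it sums to 1 when the rows do
    u = pi.mean(axis=0)
    return ((pi - u) ** 2).sum()


perfect = np.eye(3)              # worker who always answers correctly
spammer = np.full((3, 3), 1 / 3)  # worker who answers uniformly at random

print(spammer_score(perfect), spammer_score(spammer))
```

Under these assumptions, dividing the residual by \(K-1\) gives the same value as the closed form, which is what maps the score to \([0,1]\).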

Parameters:

answers (dict) –

Dictionary of workers' answers with format

{
    task0: {worker0: label, worker1: label},
    task1: {worker1: label}
}

The number of classes n_classes and the number of workers n_workers should be specified as keyword arguments. If the confusion matrices are already known and stored in a npy or pth file, the file can be specified as matrix_file. Otherwise, the DS model is run to estimate the matrices.
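The following sketch shows one way to build an answers dictionary in this format and to derive the two keyword arguments from it. It assumes that tasks, workers and labels are identified by integers starting at 0, which the description above does not state; the variable names are illustrative only.

```python
# Toy answers dictionary: {task: {worker: label}}
answers = {
    0: {0: 1, 1: 1, 2: 0},
    1: {1: 0, 2: 0},
    2: {0: 2, 2: 0},
}

# Collect every worker id and every label that appears in the votes
workers = {w for votes in answers.values() for w in votes}
labels = {lab for votes in answers.values() for lab in votes.values()}

# Assumption: ids are 0-indexed integers, so the largest label + 1 is the class count
n_workers = len(workers)
n_classes = max(labels) + 1

# Keyword arguments named in the parameter description above
kwargs = {"n_classes": n_classes, "n_workers": n_workers}
print(kwargs)
```

These values would then be passed along with the dictionary, following the signature Spam_Score(answers, **kwargs) documented above.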

run(path)

Compute the spam score for each worker and save it at <path>/identification/spam_score.npy in a numpy array of size n_worker.

Parameters: path (str) – path to save the results
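Once the scores are saved, they can be read back with NumPy. The sketch below assumes only the file location given above; the helper names `load_spam_scores` and `flag_spammers` and the threshold of 0.5 are hypothetical choices for illustration, and the demo writes made-up scores to a temporary directory instead of calling run.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_spam_scores(path):
    """Load the array saved at <path>/identification/spam_score.npy."""
    return np.load(Path(path) / "identification" / "spam_score.npy")


def flag_spammers(scores, threshold=0.5):
    """Return indices of workers whose score is below a hypothetical threshold."""
    return np.flatnonzero(np.asarray(scores) < threshold)


# Demo with made-up scores written to a temporary directory
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp) / "identification"
    out_dir.mkdir(parents=True)
    np.save(out_dir / "spam_score.npy", np.array([0.9, 0.05, 0.6, 0.2]))

    scores = load_spam_scores(tmp)
    print(flag_spammers(scores))
```

The threshold is a judgment call that depends on the dataset; inspecting the distribution of scores before choosing one is usually safer than fixing a value in advance.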