


Mathematical Definition


The total number of tasks in a dataset


The total number of workers in a dataset


The set of labels a task can take

\([K] = \{1,...,K\}\)


The simplex of dimension \(K-1\), used to represent soft labels (i.e., labels given as a probability vector over \([K]\))

\(\Delta_K = \{ p \in \mathbb{R}^K : \sum_{k=1}^K p_k=1, \ p_k \geq 0 \ \forall k \in [K] \}\)


The set of workers that answered the task \(i\)

\(\{j\in[n_{worker}] : w_j \text{ answered } x_i\}\)


The set of tasks answered by the worker \(j\)

\(\{i\in[n_{task}] : w_j \text{ answered } x_i\}\)


The vector of answered labels of the task \(i\)



The true label of the task \(i\)

\(y_i^* \in [K]\)


The label computed for the task \(i\) by the aggregation method \(agg\)

\(\begin{cases}\hat{y}_i^{agg} \in [K] \text{ if a hard label} \\ \hat{y}_i^{agg} \in \Delta_K \text{ if a soft label} \end{cases}\)


The label (hard) that the worker \(j\) assigned to the task \(i\)


The confusion matrix of the worker \(j\)

\(\pi^{(j)}_{k,\ell}=\mathbb{P}(y_i^{(j)}=\ell \mid y_i^*=k), \, \forall (k,\ell)\in [K]^2\)


A metric that measures the accuracy of an aggregation strategy

\(AccTrain(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \mathbf{1}_{\Big\{y_i^* = \operatorname*{argmax}\limits_{k\in [K]}(\hat{y}_i^{agg})_k\Big\}}\)
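
To make the notation above concrete, here is a minimal Python sketch on a toy dataset. Everything in it is an assumption made for illustration: the `answers` dictionary layout, the function names, and the use of plain vote frequencies as the aggregation method are hypothetical and are not taken from any particular library. Labels run from 0 to K-1 in the code, whereas the definitions above use 1 to K.

```python
from collections import Counter

K = 3  # number of classes; labels are 0..K-1 here (the text uses 1..K)

# Hypothetical toy dataset: answers[i][j] is the hard label that
# worker j assigned to task i.
answers = {
    0: {0: 0, 1: 0, 2: 1},
    1: {0: 2, 2: 2},
    2: {1: 1, 2: 0, 3: 1},
}
# True labels y_i^*, assumed known only for this illustration.
true_labels = {0: 0, 1: 2, 2: 1}


def workers_of_task(answers, i):
    """Set of workers j that answered task i."""
    return set(answers[i])


def tasks_of_worker(answers, j):
    """Set of tasks i answered by worker j."""
    return {i for i, votes in answers.items() if j in votes}


def soft_label(answers, i, n_classes):
    """Vote frequencies for task i: a probability vector in the simplex."""
    counts = Counter(answers[i].values())
    total = sum(counts.values())
    return [counts.get(k, 0) / total for k in range(n_classes)]


def confusion_matrix(answers, true_labels, j, n_classes):
    """Empirical pi^(j)[k][l]: frequency of answer l when the true label is k."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for i in tasks_of_worker(answers, j):
        counts[true_labels[i]][answers[i][j]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # Rows with no observation are left at zero rather than guessed.
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


def acc_train(soft_labels, true_labels):
    """Fraction of tasks whose argmax soft label equals the true label."""
    hits = 0
    for i, p in soft_labels.items():
        # Ties are broken in favor of the lowest class index.
        predicted = max(range(len(p)), key=lambda k: p[k])
        hits += predicted == true_labels[i]
    return hits / len(soft_labels)


soft = {i: soft_label(answers, i, K) for i in answers}
accuracy = acc_train(soft, true_labels)
```

Vote frequencies are used here only because they are the simplest way to produce a soft label; any aggregation method that returns a vector in the simplex could be passed to `acc_train` in the same way.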