Glossary

Name

Definition

Mathematical Definition

\(n_{task}\)

The total number of tasks in a dataset

\(n_{worker}\)

The total number of workers in a dataset

\([K]\)

The set of labels a task can take

\([K] = \{1,...,K\}\)

\(\Delta_K\)

The simplex of dimension \(K-1\), used to represent soft labels (i.e., labels given as a probability vector over \([K]\))

\(\Delta_K = \{ p \in \mathbb{R}^K : \sum_{k=1}^K p_k=1,\ p_k \geq 0 \ \forall k \in [K] \}\)

\(\mathcal{A}(x_i)\)

The set of workers that answered the task \(i\)

\(\{j\in[n_{worker}] : w_j \text{ answered } x_i\}\)

\(\mathcal{T}(w_j)\)

The set of tasks answered by the worker \(j\)

\(\{i\in[n_{task}] : w_j \text{ answered } x_i\}\)

\(\mathrm{Lab}(x_i)\)

The vector of labels given to the task \(i\) by the workers that answered it

\((y_i^{(j)})_{j\in\mathcal{A}(x_i)}\)

\(y_i^*\)

The true label of the task \(i\)

\(y_i^* \in [K]\)

\(\hat{y}_i^{agg}\)

The label of the task \(i\) computed by the aggregation method \(agg\)

\(\begin{cases}\hat{y}_i^{agg} \in [K] \text{ if a hard label} \\ \hat{y}_i^{agg} \in \Delta_K \text{ if a soft label} \end{cases}\)

\(y^{(j)}_i\)

The hard label that the worker \(j\) assigned to the task \(i\)

\(\pi^{(j)}\)

The confusion matrix of the worker \(j\)

\(\pi^{(j)}_{k,\ell}=\mathbb{P}(y_i^{(j)}=\ell \mid y_i^*=k), \, \forall (k,\ell)\in [K]^2\)

\(AccTrain(\mathcal{D})\)

A metric that measures the accuracy of an aggregation strategy on the dataset \(\mathcal{D}\)

\(AccTrain(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \mathbf{1}_{\Big\{y_i^* = \operatorname*{argmax}\limits_{k\in [K]}(\hat{y}_i)_k\Big\}}\)
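
To make the notation above concrete, here is a minimal Python sketch on a made-up toy dataset. Everything in it is an assumption made for illustration and does not come from any particular library: the data, the function names (`workers_of`, `tasks_of`, `soft_label`, `confusion_matrix`, `acc_train`), the use of vote frequencies as the soft label, and the empirical estimate of the confusion matrix. Labels are 0-indexed here, whereas the glossary uses \(\{1,...,K\}\).

```python
K = 3  # number of classes (0-indexed here; the glossary uses 1..K)

# Hypothetical toy data: answers[i][j] is y_i^{(j)}, the label worker j gave to task i.
answers = {
    0: {0: 0, 1: 0, 2: 1},
    1: {0: 2, 2: 2},
    2: {1: 1, 2: 0},
}
# y_i^*, assumed known for this toy example.
true_labels = {0: 0, 1: 2, 2: 1}


def workers_of(task):
    """A(x_i): the set of workers that answered task i."""
    return set(answers[task])


def tasks_of(worker):
    """T(w_j): the set of tasks answered by worker j."""
    return {i for i, votes in answers.items() if worker in votes}


def soft_label(task):
    """A point of the simplex Delta_K: the vote frequencies for task i.

    This naive frequency aggregation is only one possible choice of soft label.
    """
    votes = answers[task]
    counts = [0] * K
    for label in votes.values():
        counts[label] += 1
    return [c / len(votes) for c in counts]


def confusion_matrix(worker):
    """Empirical estimate of pi^{(j)}.

    Row k holds the frequencies of worker j's answers on tasks whose true
    label is k; a row stays at zero if the worker saw no task of class k.
    """
    counts = [[0] * K for _ in range(K)]
    for i in tasks_of(worker):
        counts[true_labels[i]][answers[i][worker]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def acc_train(soft_labels):
    """AccTrain: the fraction of tasks whose argmax label equals the true label.

    Ties in the argmax are broken in favour of the smallest class index.
    """
    hits = sum(
        true_labels[i] == max(range(K), key=lambda k: p[k])
        for i, p in soft_labels.items()
    )
    return hits / len(soft_labels)


soft = {i: soft_label(i) for i in answers}
print(workers_of(0), tasks_of(0))
print(soft)
print(confusion_matrix(2))
print(acc_train(soft))
```

On this toy data, tasks 0 and 1 are recovered correctly, while task 2 has a tied vote that the tie-breaking rule resolves to the wrong class, so `acc_train` returns 2/3.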