peerannot¶
–Handling your crowdsourced datasets easily–
The peerannot library was created to handle crowdsourced labels in classification problems.
Getting started¶
Start here to get up and running
Tutorials and additional examples¶
Want to dive deeper into the library? Check out the tutorials. You will find resources to add your own datasets and strategies, and to run your first label aggregations.
More examples can be found in the paper published in the Computo journal.
An interactive tool to compare the AUM and WAUM identification scores is available here
API and CLI Reference¶
Want to take a deep dive into the library? In addition to the tutorials, you can find the full API and CLI reference here.
Run peerannot from a Python script
Run peerannot from your terminal
Glossary¶
Name | Definition | Mathematical Definition |
---|---|---|
\(n_{task}\) | The total number of tasks in a dataset | |
\(n_{worker}\) | The total number of workers in a dataset | |
\([K]\) | The set of labels a task can take | \([K] = \{1,...,K\}\) |
\(\Delta_K\) | The simplex of dimension \(K-1\), used to represent soft labels (i.e., labels as a probability vector over \([K]\)) | \(\Delta_K = \{ p \in \mathbb{R}^K : \sum_{k=1}^K p_k=1, p_k \geq 0 \}\) |
\(\mathcal{A}(x_i)\) | The set of workers that answered the task \(i\) | \(\{j\in[n_{worker}] : w_j \text{ answered } x_i\}\) |
\(\mathcal{T}(w_j)\) | The set of tasks answered by the worker \(j\) | \(\{i\in[n_{task}] : w_j \text{ answered } x_i\}\) |
\(\mathcal{Lab}(x_i)\) | The vector of answered labels of the task \(i\) | \((y_i^{(j)})_{j\in\mathcal{A}(x_i)}\) |
\(y_i^*\) | The true label of the task \(i\) | \(y_i^* \in [K]\) |
\(\hat{y}_i^{agg}\) | The label of the task \(i\) computed by the aggregation method \(agg\) | \(\begin{cases}\hat{y}_i^{agg} \in [K] \text{ if a hard label} \\ \hat{y}_i^{agg} \in \Delta_K \text{ if a soft label} \end{cases}\) |
\(y^{(j)}_i\) | The (hard) label that the worker \(j\) assigned to the task \(i\) | |
\(\pi^{(j)}\) | The confusion matrix of the worker \(j\) | \(\pi^{(j)}_{k,\ell}=\mathbb{P}(y_i^{(j)}=\ell \mid y_i^*=k), \, \forall (k,\ell)\in [K]^2\) |
\(AccTrain(\mathcal{D})\) | A metric that measures the accuracy of aggregation strategies | \(AccTrain(\mathcal{D}) = \frac{1}{\lvert\mathcal{D}\rvert} \sum_{i=1}^{\lvert\mathcal{D}\rvert} \mathbf{1}_{\Big\{y_i^* = \operatorname*{argmax}\limits_{k\in [K]}(\hat{y}_i^{agg})_k\Big\}}\) |
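To make the notation above concrete, here is a minimal NumPy sketch on toy data. It does not use the peerannot API: the nested `answers` dictionary, the helper names, and the vote-frequency aggregation are assumptions chosen for illustration only. Labels are 0-indexed in the code, so \([K]\) becomes `{0, ..., K-1}`.

```python
import numpy as np

# Hypothetical toy data: answers[i][j] = y_i^{(j)}, the label worker j gave to task i.
# Task ids are assumed to be 0, ..., n_task - 1 so they can index array rows.
K = 3
answers = {
    0: {0: 1, 1: 1, 2: 2},
    1: {0: 0, 2: 0},
    2: {1: 2, 2: 2, 3: 1},
    3: {0: 0, 1: 1, 3: 1},
}
y_true = np.array([1, 0, 2, 0])  # y_i^*, known here only because the data is synthetic


def workers_of_task(answers, i):
    """A(x_i): the set of workers that answered task i."""
    return set(answers[i])


def tasks_of_worker(answers, j):
    """T(w_j): the set of tasks answered by worker j."""
    return {i for i, votes in answers.items() if j in votes}


def soft_labels(answers, n_classes):
    """Vote frequencies per task; each row is a point of the simplex Delta_K."""
    y_hat = np.zeros((len(answers), n_classes))
    for i, votes in answers.items():
        for label in votes.values():
            y_hat[i, label] += 1
    return y_hat / y_hat.sum(axis=1, keepdims=True)


def acc_train(y_true, y_hat):
    """AccTrain: share of tasks whose argmax aggregated label equals the true label."""
    return float(np.mean(y_true == y_hat.argmax(axis=1)))


y_hat = soft_labels(answers, K)
print(workers_of_task(answers, 1))  # workers who answered task 1
print(tasks_of_worker(answers, 3))  # tasks answered by worker 3
print(acc_train(y_true, y_hat))     # 0.75: task 3 is mislabeled by the vote
```

Normalized vote counts are only one simple way to obtain a soft label; any aggregation that returns a vector in \(\Delta_K\) can be scored with the same `acc_train` helper.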
Citation¶
Cite us, join us, and let us collaboratively improve our toolbox!
@article{lefort2024,
author = {Lefort, Tanguy and Charlier, Benjamin and Joly, Alexis and Salmon, Joseph},
publisher = {French Statistical Society},
title = {Peerannot: Classification for Crowdsourced Image Datasets with {Python}},
journal = {Computo},
date = {2024-04-04},
url = {https://computo.sfds.asso.fr/published-202402-lefort-peerannot/},
}