peerannot

–Handling your crowdsourced datasets easily–

Python 3.8+

The peerannot library was created to handle crowdsourced labels in classification problems.

Getting started

Start here to get up and running

Tutorials and additional examples

Want to dive deeper into the library? Check out the tutorials. You will find resources to add your own datasets and strategies, and to run your first label aggregations.

API and CLI Reference

Want to deep dive into the library? In addition to the tutorials, you can find the full API and CLI reference here.

Run peerannot from a python script

API Reference

Run peerannot from your terminal

CLI Reference

Glossary

Name

Definition

Mathematical Definition

\(n_{task}\)

The total number of tasks in a dataset

\(n_{worker}\)

The total number of workers in a dataset

\([K]\)

The set of labels a task can take

\([K] = \{1,...,K\}\)

\(\Delta_K\)

The simplex of dimension \(K-1\), used to represent soft labels (i.e., labels given as a probability vector over \([K]\))

\(\Delta_K = \{ p \in \mathbb{R}^K : \sum_{k=1}^K p_k=1, p_k \geq 0 \}\)

\(\mathcal{A(x_i)}\)

The set of workers that answered the task \(i\)

\(\{j\in[n_{worker}] : w_j \text{ answered } x_i\}\)

\(\mathcal{T(w_j)}\)

The set of tasks answered by the worker \(j\)

\(\{i\in[n_{task}] : w_j \text{ answered } x_i\}\)

\(\mathcal{Lab(x_i)}\)

The vector of answered labels of the task \(i\)

\((y_i^{(j)})_{j\in\mathcal{A(x_i)}}\)

\(y_i^*\)

The true label of the task \(i\)

\(y_i^* \in [K]\)

\(\hat{y}_i^{agg}\)

The label of the task \(i\) computed by the aggregation method \(agg\)

\(\begin{cases}\hat{y}_i^{agg} \in [K] \text{ if a hard label} \\ \hat{y}_i^{agg} \in \Delta_K \text{ if a soft label} \end{cases}\)

\(y^{(j)}_i\)

The label (hard) that the worker \(j\) assigned to the task \(i\)

\(\pi^{(j)}\)

The confusion matrix of the worker \(j\)

\(\pi^{(j)}_{k,\ell}=\mathbb{P}(y_i^{(j)}=\ell \mid y_i^*=k), \, \forall (k,\ell)\in [K]^2\)

\(AccTrain(\mathcal{D})\)

A metric that measures the accuracy of an aggregation strategy

\(AccTrain(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \mathbf{1}_{\Big\{y_i^* = \operatorname*{argmax}\limits_{k\in [K]}(\hat{y}_i^{agg})_k\Big\}}\)
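
To make the glossary notation concrete, here is a minimal pure-Python sketch on a toy dataset. It does not use the peerannot API: the `answers` layout, the `true_labels` values, and every function name below are assumptions made for illustration only. Labels are 0-indexed in the code (\(0,\dots,K-1\)), whereas the glossary uses \([K] = \{1,...,K\}\). The aggregation shown is a plain majority vote, chosen only because it is the simplest strategy to write down.

```python
from collections import Counter

K = 3  # number of classes; labels are 0..K-1 here (the glossary uses 1..K)

# Toy votes, assumed layout: answers[task][worker] = hard label y_i^(j)
answers = {
    0: {0: 2, 1: 2, 2: 1},
    1: {0: 0, 2: 0},
    2: {1: 1, 2: 1, 3: 0},
}
true_labels = {0: 2, 1: 0, 2: 0}  # y_i^*, hypothetical ground truth


def workers_of_task(i):
    """A(x_i): the set of workers that answered task i."""
    return set(answers[i])


def tasks_of_worker(j):
    """T(w_j): the set of tasks answered by worker j."""
    return {i for i, votes in answers.items() if j in votes}


def soft_label(i):
    """Vote frequencies for task i: a point of the simplex Delta_K."""
    counts = Counter(answers[i].values())
    n_votes = len(answers[i])
    return [counts[k] / n_votes for k in range(K)]


def hard_label(i):
    """Majority vote: argmax of the soft label (ties go to the smallest k)."""
    p = soft_label(i)
    return max(range(K), key=lambda k: p[k])


def acc_train(tasks):
    """AccTrain: fraction of tasks whose argmax label equals the true label."""
    return sum(hard_label(i) == true_labels[i] for i in tasks) / len(tasks)


def confusion_matrix(j):
    """Empirical pi^(j): row k holds the frequencies of worker j's answers
    on the tasks whose true label is k (all zeros if there are none)."""
    counts = [[0] * K for _ in range(K)]
    for i in tasks_of_worker(j):
        counts[true_labels[i]][answers[i][j]] += 1
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]
```

On this toy data, `soft_label(i)` always sums to 1, `hard_label(i)` returns an element of the label set, and `acc_train(list(answers))` compares the majority-vote labels with `true_labels`. In practice the true labels are usually unknown, so a confusion matrix has to be estimated by the aggregation strategy itself rather than counted as it is here.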

Citation

Cite us, join us, and let us collaboratively improve our toolbox!

@article{lefort2024,
   author = {Lefort, Tanguy and Charlier, Benjamin and Joly, Alexis and Salmon, Joseph},
   publisher = {French Statistical Society},
   title = {Peerannot: Classification for Crowdsourced Image Datasets with {Python}},
   journal = {Computo},
   date = {2024-04-04},
   url = {https://computo.sfds.asso.fr/published-202402-lefort-peerannot/},
}