`peerannot`

–Handling your crowdsourced datasets easily–

The peerannot library was created to handle crowdsourced labels in classification problems.

Getting started

Start here to get up and running

Get started

Tutorials and additional examples

Want to dive deeper into the library? Check out the tutorials. You will find resources to add your own datasets and strategies, and to run your first label aggregations.

Tutorials

More examples can be found in the paper published in the Computo journal.
An interactive tool to compare the AUM and WAUM identification scores is available here

API and CLI Reference

Want to deep dive into the library? In addition to the tutorials, you can find the full API and CLI reference here.

Run peerannot from a Python script

API Reference

Run peerannot from your terminal

CLI Reference

Glossary

Name	Definition	Mathematical Definition
\(n_{task}\)	The total number of tasks in a dataset
\(n_{worker}\)	The total number of workers in a dataset
\([K]\)	The set of labels a task can take	\([K] = \{1,...,K\}\)
\(\Delta_K\)	The simplex of dimension \(K-1\), used to represent soft labels (i.e., labels given as a probability vector over \([K]\))	\(\Delta_K = \{ p \in \mathbb{R}^K : \sum_{k=1}^K p_k=1, \; p_k \geq 0 \}\)
\(\mathcal{A}(x_i)\)	The set of workers that answered the task \(i\)	\(\{j\in[n_{worker}] : w_j \text{ answered } x_i\}\)
\(\mathcal{T}(w_j)\)	The set of tasks answered by the worker \(j\)	\(\{i\in[n_{task}] : w_j \text{ answered } x_i\}\)
\(\mathcal{Lab}(x_i)\)	The vector of answered labels of the task \(i\)	\((y_i^{(j)})_{j\in\mathcal{A}(x_i)}\)
\(y_i^*\)	The true label of the task \(i\)	\(y_i^* \in [K]\)
\(\hat{y}_i^{agg}\)	The computed label of the task \(i\) given the aggregation \(agg\) method	\(\begin{cases}\hat{y}_i^{agg} \in [K] \text{ if a hard label} \\ \hat{y}_i^{agg} \in \Delta_K \text{ if a soft label} \end{cases}\)
\(y^{(j)}_i\)	The label (hard) that the worker \(j\) assigned to the task \(i\)
\(\pi^{(j)}\)	The confusion matrix of the worker \(j\)	\(\pi^{(j)}_{k,\ell}=\mathbb{P}(y_i^{(j)}=\ell \mid y_i^*=k), \, \forall (k,\ell)\in [K]^2\)
\(\mathrm{AccTrain}(\mathcal{D})\)	A metric that measures the accuracy of an aggregation strategy on the dataset \(\mathcal{D}\)	\(\mathrm{AccTrain}(\mathcal{D}) = \frac{1}{|\mathcal{D}|} \sum_{i=1}^{|\mathcal{D}|} \mathbf{1}_{\left\{y_i^* = \operatorname*{argmax}_{k\in [K]}(\hat{y}_i)_k\right\}}\)
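
To make the notation above concrete, here is a minimal sketch in plain Python that computes the glossary quantities on a toy dataset. It does not use the peerannot API: the nested dictionary `answers`, the list `true_labels`, and every function name below are assumptions made for illustration only. Labels are 0-indexed, so \([K]\) is represented as `range(K)`, and the soft label is a simple normalized vote count.

```python
from collections import Counter

# Hypothetical toy data (not a peerannot format guarantee):
# answers[i][j] is the hard label y_i^(j) given by worker j to task i.
answers = {
    0: {0: 1, 1: 1, 2: 0},
    1: {0: 2, 2: 2},
    2: {1: 1, 2: 1},
}
true_labels = [1, 2, 0]  # y_i^*, assumed known for this toy example
K = 3  # number of classes


def workers_of_task(answers, i):
    """A(x_i): the set of workers that answered task i."""
    return set(answers[i])


def tasks_of_worker(answers, j):
    """T(w_j): the set of tasks answered by worker j."""
    return {i for i, votes in answers.items() if j in votes}


def labels_of_task(answers, i):
    """Lab(x_i): the labels given to task i, ordered by worker index."""
    return [answers[i][j] for j in sorted(answers[i])]


def vote_frequencies(answers, i, n_classes):
    """A soft label in Delta_K: the normalized vote counts for task i."""
    counts = Counter(answers[i].values())
    total = sum(counts.values())
    return [counts[k] / total for k in range(n_classes)]


def confusion_matrix(answers, true_labels, j, n_classes):
    """Empirical estimate of pi^(j): entry [k][l] is the share of worker j's
    answers equal to l among the tasks whose true label is k."""
    counts = [[0] * n_classes for _ in range(n_classes)]
    for i in tasks_of_worker(answers, j):
        counts[true_labels[i]][answers[i][j]] += 1
    # Rows for classes the worker never encountered are left at zero.
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in counts]


def acc_train(soft_labels, true_labels):
    """AccTrain: share of tasks whose argmax label equals the true label.
    Ties in the argmax are broken in favor of the smallest class index."""
    hits = 0
    for y_hat, y_true in zip(soft_labels, true_labels):
        predicted = max(range(len(y_hat)), key=y_hat.__getitem__)
        hits += predicted == y_true
    return hits / len(true_labels)


soft_labels = [vote_frequencies(answers, i, K) for i in sorted(answers)]
print(acc_train(soft_labels, true_labels))
```

On this toy data the vote counts recover the true label for tasks 0 and 1 but not for task 2, where both workers answered 1 while the true label is 0, so the printed accuracy is two thirds.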

Citation

Cite us, join us, and let us collaboratively improve our toolbox!

@article{lefort2024,
   author = {Lefort, Tanguy and Charlier, Benjamin and Joly, Alexis and Salmon, Joseph},
   publisher = {French Statistical Society},
   title = {Peerannot: Classification for Crowdsourced Image Datasets with {Python}},
   journal = {Computo},
   date = {2024-04-04},
   url = {https://computo.sfds.asso.fr/published-202402-lefort-peerannot/},
}
