Generative model of Labels, Abilities, and Difficulties
Model
GLAD incorporates both worker ability and task difficulty into the label aggregation scheme. We denote by $\alpha_j\in\mathbb{R}$ the ability of worker $j$ and by $\beta_i\in\mathbb{R}^+_\star$ the difficulty of task $i$.
The model assumes that worker $j$ gives the true label $y_i^\star$ of task $i$ with probability
\[\mathbb{P}\bigg(y_i^{(j)}=y_i^\star\bigg)=\frac{1}{1+e^{-\alpha_j\beta_i}}.\]
We also assume that the error is uniform over the other labels, i.e. in a classification setting with $K$ classes, for each $k\neq y_i^\star$,
\[\mathbb{P}\bigg(y_i^{(j)}=k\bigg)=\frac{1}{K-1}\bigg(1-\frac{1}{1+e^{-\alpha_j\beta_i}}\bigg).\]
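To make the two formulas concrete, the following minimal sketch computes the distribution of a single worker's answer. The function name `glad_label_distribution` and the numerical values are illustrative assumptions and are not part of peerannot.

```python
import numpy as np

def glad_label_distribution(alpha_j, beta_i, true_label, n_classes):
    """Distribution of one worker's answer on one task under the model above."""
    # Probability of answering the true label: sigmoid(alpha_j * beta_i)
    p_correct = 1.0 / (1.0 + np.exp(-alpha_j * beta_i))
    # Remaining mass is spread uniformly over the K - 1 other labels
    probs = np.full(n_classes, (1.0 - p_correct) / (n_classes - 1))
    probs[true_label] = p_correct
    return probs

probs = glad_label_distribution(alpha_j=1.5, beta_i=2.0, true_label=0, n_classes=3)
print(probs, probs.sum())
```

With $\alpha_j=0$ the worker answers correctly with probability 0.5 whatever the task, and for $\alpha_j>0$ a larger $\beta_i$ pushes the probability of a correct answer toward 1.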
CLI
With peerannot, in a terminal located in the directory containing the answers, the GLAD model can be used as follows:
peerannot aggregate . --strategy GLAD --answers-file answers.json
Note that by default, if the answers are in a file named answers.json, the --answers-file argument can be omitted.
API
Import the aggregation model in the current session:
import pathlib
import numpy as np
from peerannot.models import GLAD
Assuming the answers are in a dictionary named answers, run:
glad = GLAD(answers, n_workers, n_classes, dataset=pathlib.Path.cwd() / "glad")
glad.run()
yhat = glad.get_probas()
In the implementation, the priors on $(\alpha_j)_j$ and $(\beta_i)_i$ are each set to a vector of ones. They can be changed as follows, for example to a prior of 2 on every $\alpha_j$ and of 3 on every $\beta_i$:
glad = GLAD(answers, n_workers, n_classes, dataset=pathlib.Path.cwd() / "glad")
glad.priorAlpha(2*np.ones(glad.n_workers))
glad.priorBeta(3*np.ones(glad.n_task))
glad.run()
Estimated abilities
To access the estimated worker abilities and task difficulties in variables alpha and beta, run:
beta = glad.beta
alpha = glad.alpha
print(alpha.shape, beta.shape)
# (n_worker,) (n_task,)
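As an illustration of how the two vectors combine, the sketch below plugs stand-in values for alpha and beta into the model to obtain, for every task and worker, the probability of a correct answer. The arrays are made-up values and not the output of a real run.

```python
import numpy as np

# Stand-in values; in practice these would be glad.alpha and glad.beta
alpha = np.array([2.0, 0.0, -1.0])      # one ability per worker
beta = np.array([0.5, 1.0, 3.0, 1.5])   # one coefficient per task

# p_correct[i, j] = 1 / (1 + exp(-alpha_j * beta_i))
p_correct = 1.0 / (1.0 + np.exp(-np.outer(beta, alpha)))

print(p_correct.shape)  # (4, 3)
print(p_correct.round(3))
```

A worker with $\alpha_j=0$ sits at 0.5 on every task, and a negative $\alpha_j$ gives a probability of a correct answer below 0.5.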
Aggregate into hard labels
After running the aggregation strategy, instead of soft labels one can recover hard labels by running:
yhat_hard = glad.get_answers()
Note that this is an argmax on the first dimension, with ties broken at random.
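The tie-breaking behaviour can be sketched as follows. This is an illustration, not peerannot's code: the helper `argmax_random_ties` is hypothetical, and the soft labels are assumed here to be stored as an array of shape (n_task, n_classes).

```python
import numpy as np

def argmax_random_ties(probas, rng):
    """Per-task argmax over classes, picking uniformly among tied maxima."""
    probas = np.asarray(probas, dtype=float)
    is_max = probas == probas.max(axis=1, keepdims=True)
    # Draw a random score for each maximal entry; non-maximal entries get -1
    scores = np.where(is_max, rng.random(probas.shape), -1.0)
    return scores.argmax(axis=1)

rng = np.random.default_rng(0)
yhat = np.array([[0.1, 0.9, 0.0],
                 [0.4, 0.4, 0.2]])
yhat_hard = argmax_random_ties(yhat, rng)
print(yhat_hard)  # first entry is 1; second entry is 0 or 1
```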
API details: class models.GLAD
GLAD model class that inherits from CrowdModel
__init__(answers, n_classes, **kwargs)
Parameters:
- answers: (dict) Dictionary of workers' answers with format { task0: {worker0: label, worker1: label}, task1: {worker1: label} }
- n_classes: (int) Number of possible classes
- kwargs: (dict) Dictionary that should contain at least n_workers, the number of workers. Another argument is path_remove, to remove tasks identified from the WAUM or another method.
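A small dictionary in this format might look as follows. The labels are made up, and inferring n_workers and n_classes from the answers is only a convenience for this sketch: it assumes labels are integers starting at 0 and that every class receives at least one vote.

```python
# Three tasks answered by three workers: {task: {worker: label}}
answers = {
    0: {0: 1, 1: 1},
    1: {1: 0},
    2: {0: 2, 2: 2},
}

n_task = len(answers)
# Distinct worker identifiers across all tasks
n_workers = len({worker for votes in answers.values() for worker in votes})
# Largest label seen, plus one (assumption: labels are 0, ..., K-1)
n_classes = max(label for votes in answers.values() for label in votes.values()) + 1

print(n_task, n_workers, n_classes)  # 3 3 3
```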
EM(epsilon, maxiter)
Parameters:
- epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
- maxiter: (int) maximum number of iterations in the EM algorithm
run(epsilon, maxiter)
Run the EM algorithm for a given set of parameters
Parameters:
- epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
- maxiter: (int) maximum number of iterations in the EM algorithm
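The role of the two parameters can be sketched with a generic loop. This is an assumption about how such a stopping rule is typically written, not peerannot's implementation: `step` is a hypothetical callable that performs one EM iteration and returns the current value of the expected joint likelihood.

```python
def run_until_converged(step, epsilon, maxiter):
    """Call step() until the relative change is at most epsilon or maxiter is hit."""
    previous = None
    for iteration in range(1, maxiter + 1):
        current = step()
        # Relative error between two consecutive iterates
        if previous is not None and abs(current - previous) <= epsilon * abs(previous):
            return iteration, current
        previous = current
    return maxiter, previous

# Toy sequence of likelihood values standing in for real EM iterations
values = iter([-10.0, -5.0, -4.0, -3.9, -3.9])
n_iter, final = run_until_converged(lambda: next(values), epsilon=1e-5, maxiter=50)
print(n_iter, final)  # 5 -3.9
```

A smaller epsilon makes the stopping rule stricter, while maxiter bounds the running time when the relative change never drops below epsilon.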
save_difficulty(path)
Save the coefficients $(\beta_i)_i$ at a given path as a numpy array.
Parameters:
- path: (str) file path in which coefficients are saved using the np.save function.
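Because the coefficients are written with np.save, they can be read back with np.load. The sketch below uses a stand-in array and a temporary directory rather than a real GLAD run; the file name difficulty.npy is an arbitrary choice.

```python
import pathlib
import tempfile

import numpy as np

# Stand-in for the estimated coefficients of three tasks
beta = np.array([0.5, 1.0, 2.5])

with tempfile.TemporaryDirectory() as tmp:
    path = pathlib.Path(tmp) / "difficulty.npy"
    np.save(path, beta)        # same function the method relies on
    reloaded = np.load(path)   # read the array back

print(reloaded)
```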