Generative model of Labels, Abilities, and Difficulties



Model

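GLAD incorporates both worker ability and task difficulty into the label aggregation scheme. To do so, we denote by $\alpha_j\in\mathbb{R}$ the ability of worker $j$ and by $\beta_i\in\mathbb{R}^+_\star$ the difficulty of task $i$. With this parametrization, a larger $\beta_i$ makes a worker with $\alpha_j>0$ more likely to answer correctly.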

The probability that worker $j$ gives the true label $y_i^\star$ on task $i$ is modeled as follows:

\[\mathbb{P}\bigg(y_i^{(j)}=y_i^\star\bigg)=\frac{1}{1+e^{-\alpha_j\beta_i}}.\]

We also assume that the error is uniform over the other classes, i.e. in a classification setting with $K$ classes, for any class $k\neq y_i^\star$: \(\mathbb{P}\bigg(y_i^{(j)}=k\bigg)=\frac{1}{K-1}\bigg(1-\frac{1}{1+e^{-\alpha_j\beta_i}}\bigg).\)
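As an illustration, the model can be evaluated numerically for one worker and one task. This is a minimal sketch using NumPy only; the helper name glad_label_probabilities is hypothetical and is not part of peerannot.

```python
import numpy as np


def glad_label_probabilities(alpha_j, beta_i, true_label, n_classes):
    """Distribution of worker j's answer on task i under the GLAD model."""
    # Probability of giving the true label: sigmoid(alpha_j * beta_i)
    p_correct = 1.0 / (1.0 + np.exp(-alpha_j * beta_i))
    # The remaining mass is spread uniformly over the K - 1 other classes
    probas = np.full(n_classes, (1.0 - p_correct) / (n_classes - 1))
    probas[true_label] = p_correct
    return probas


probas = glad_label_probabilities(alpha_j=2.0, beta_i=1.5, true_label=0, n_classes=3)
print(probas, probas.sum())
```

With $\alpha_j=0$ the probability of a correct answer is $1/2$ whatever the task, and a negative $\alpha_j$ makes the worker more likely to be wrong than right.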

CLI

With peerannot installed and a terminal opened in the directory containing the answers, the GLAD model can be used as follows.

peerannot aggregate . --strategy GLAD --answers-file answers.json

Note that if the answers are in a file named answers.json, the --answers-file argument can be omitted.

API

Import the aggregation model in the current session:

from peerannot.models import GLAD

Assuming the answers are in a dictionary named answers, run:

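import pathlib

# n_workers is passed as a keyword argument, matching __init__(answers, n_classes, **kwargs)
glad = GLAD(answers, n_classes, n_workers=n_workers, dataset=pathlib.Path.cwd() / "glad")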
glad.run()
yhat = glad.get_probas()

In the implementation, the priors on $(\alpha_j)_j$ and $(\beta_i)_i$ are each set to a vector of ones. They can be changed as follows, for example with a prior of 2 on every $\alpha_j$ and a prior of 3 on every $\beta_i$:

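import pathlib
import numpy as np

# n_workers is passed as a keyword argument, matching __init__(answers, n_classes, **kwargs)
glad = GLAD(answers, n_classes, n_workers=n_workers, dataset=pathlib.Path.cwd() / "glad")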
glad.priorAlpha(2*np.ones(glad.n_workers))
glad.priorBeta(3*np.ones(glad.n_task))
glad.run()

Estimated abilities

To access the estimated worker abilities and task difficulties in the variables alpha and beta, run:

beta = glad.beta
alpha = glad.alpha
print(alpha.shape, beta.shape)
# (n_workers,) (n_task,)

Aggregate into hard labels

After running the aggregation strategy, instead of soft labels one can recover hard labels by running:

yhat_hard = glad.get_answers()

Note that this is an argmax on the first dimension, with ties broken at random.
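The idea of an argmax with random tie-breaking can be sketched as follows. This is an illustration only, not the peerannot implementation: it assumes soft labels stored as an array of shape (n_task, n_classes), and the helper argmax_random_ties is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical soft labels for 3 tasks and 3 classes (each row sums to 1)
yhat = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.4, 0.2],  # tie between classes 0 and 1
    [0.1, 0.3, 0.6],
])


def argmax_random_ties(probas, rng):
    """Row-wise argmax where ties are broken uniformly at random."""
    hard = np.empty(probas.shape[0], dtype=int)
    for i, row in enumerate(probas):
        # Indices of all classes reaching the maximum probability
        candidates = np.flatnonzero(row == row.max())
        hard[i] = rng.choice(candidates)
    return hard


yhat_hard = argmax_random_ties(yhat, rng)
print(yhat_hard)
```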

API details: class models.GLAD

GLAD model class that inherits from CrowdModel


__init__(answers, n_classes, **kwargs)

Parameters:

  • answers: (dict) Dictionary of workers' answers with format
    
              {
                  task0: {worker0: label, worker1: label},
                  task1: {worker1: label}
              }
    
  • n_classes: (int) Number of possible classes
  • kwargs: (dict) Dictionary that must contain at least n_workers, the number of workers. Another possible argument is path_remove, used to remove tasks identified by the WAUM or another method.
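To make the expected input concrete, here is a small sketch that builds a toy answers dictionary in the format above and derives the quantities the constructor needs. The data are invented for illustration, and the sketch assumes that tasks, workers and labels are encoded as integers starting at 0.

```python
# Toy crowdsourced answers: {task: {worker: label}}
answers = {
    0: {0: 1, 1: 1, 2: 0},
    1: {1: 2},
    2: {0: 0, 2: 0},
}

# Collect the distinct workers and labels that appear in the answers
workers = {worker for votes in answers.values() for worker in votes}
labels = {label for votes in answers.values() for label in votes.values()}

n_task = len(answers)
n_workers = len(workers)
n_classes = max(labels) + 1  # assumes labels are integers 0, ..., K - 1

print(n_task, n_workers, n_classes)
```

Following the signature above, these values would then be passed as n_classes and as the n_workers keyword argument.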

EM(epsilon, maxiter)

Parameters:

  • epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
  • maxiter: (int) maximum number of iterations in the EM algorithm

run(epsilon, maxiter)

Run the EM algorithm for a given set of parameters

Parameters:

  • epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
  • maxiter: (int) maximum number of iterations in the EM algorithm
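The role of epsilon and maxiter can be illustrated with a generic stopping rule. This sketch is not the peerannot code: step is a hypothetical callable standing for one EM iteration that returns the current value of the objective, and the default values are arbitrary.

```python
def run_until_converged(step, epsilon=1e-5, maxiter=50):
    """Call `step` until the relative change of its return value is below epsilon.

    `step` is a hypothetical callable performing one iteration and
    returning the current value of the objective.
    """
    previous = None
    history = []
    for _ in range(maxiter):
        current = step()
        history.append(current)
        if previous is not None:
            # Relative error between two successive iterates
            relative_change = abs(current - previous) / max(abs(previous), 1e-12)
            if relative_change < epsilon:
                break
        previous = current
    return history


# Toy objective that converges geometrically towards -1.0
values = iter(-1.0 - 0.5 ** k for k in range(100))
history = run_until_converged(lambda: next(values), epsilon=1e-3, maxiter=50)
print(len(history), history[-1])
```

Whichever criterion is met first ends the loop: the relative change falls below epsilon, or maxiter iterations have been performed.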

save_difficulty(path)

Save the coefficients $(\beta_i)_i$ at a given path as a numpy array.

Parameters:

  • path: (str) file path in which coefficients are saved using the np.save function.
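Since the coefficients are written with np.save, they can be read back with np.load. The round trip below is a standalone sketch using an invented array and a temporary file; it does not call save_difficulty itself.

```python
import pathlib
import tempfile

import numpy as np

# Hypothetical difficulty coefficients for 4 tasks
beta = np.array([0.5, 1.0, 2.0, 3.5])

with tempfile.TemporaryDirectory() as tmpdir:
    path = pathlib.Path(tmpdir) / "difficulties.npy"
    np.save(path, beta)       # write the array in NumPy's .npy format
    restored = np.load(path)  # read it back as a NumPy array

print(np.array_equal(beta, restored))
```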