Generative model of Labels, Abilities, and Difficulties

Model
CLI
API
- Estimated abilities
- Aggregate into hard labels
API details: class models.GLAD

Model

GLAD’s model models both the worker ability and task difficulty into the label aggregation scheme. In order to do so, we write $\alpha_j\in\mathbb{R}$ the worker ability and $\beta_i\in\mathbb{R}^+_\star$ the task difficulty.

The model is as follows:

\[\mathbb{P}\bigg(y_i^{(j)}=y_i^\star\bigg)=\frac{1}{1+e^{-\alpha_j\beta_i}}.\]

We also assume that the error is uniform elsewhere, i.e. in a classification setting with $K$ classes that $\mathbb{P}\bigg(y_i^{(j)}\neq y_i^\star\bigg)=\frac{1}{K-1}\bigg(1-\frac{1}{1+e^{-\alpha_j\beta_i}}\bigg).$

CLI

With peerannot in a terminal located in the directory of answers, the GLAD model can be used as follows.

peerannnot aggregate . --strategy GLAD --answers-file answers.json

Note that by default, if the answers are in a file names answers.json the --answers-file argument can be omitted.

API

Import the aggregation model in the current session

from peerannot.models import GLAD

Assuming the answers are in a dictionary names answers then run:

glad = GLAD(answers, n_workers, n_classes, dataset=pathlib.Path.cwd() / "glad")
glad.run()
yhat = glad.get_probas()

In the implementation, the prior on $(\alpha_j)_j$ and $(\beta_i)_i$ is set to a vector of ones. This can be altered as follows for example with a prior on alphas of 2 and a prior on betas of 3:

glad = GLAD(answers, n_workers, n_classes, dataset=pathlib.Path.cwd() / "glad")
glad.priorAlpha(2*np.ones(glad.n_workers))
glad.priorBeta(3*np.ones(glad.n_task))
glad.run()

Estimated abilities

To access the estimated confusion matrices in a variable pi, run:

beta = glad.beta
alpha = glad.alpha
print(alpha.shape, beta.shape)
# (n_worker,) (n_task,)

Aggregate into hard labels

After running the aggregation strategy, instead of soft labels one can recover hard labels by running:

yhat_hard = glad.get_answers()

Note that this is an argmax on the first dimension with a random split in case of equalities.

API details: class models.GLAD

GLAD model class that herits from CrowdModel

__init__(answers, n_classes,**kwargs)

Parameters:

answers:(dict) Dictionary of workers answers with format

          {
              task0: {worker0: label, worker1: label},
              task1: {worker1: label}
          }

n_classes: (int) Number of possible classes
kwargs: (dict) Dictionary that should contain at least n_workers the number of workers. Other arguments are path_remove to remove tasks identified from the WAUM or another method.

EM(epsilon, maxiter)

Parameters:

epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
maxiter: (int) maximum number of iterations in the EM algorithm

run(epsilon, maxiter)

Run the EM algorithm for a given set of parameters

Parameters:

epsilon: (float) relative error between two iterates of the expectation of the joint likelihood
maxiter: (int) maximum number of iterations in the EM algorithm

save_difficulty(path)

Save coefficients $(\beta)_i$ at a given path as numpy arrays.

Parameters:

path: (str) file path in which coefficients are saved using the np.save function.

Search for

Generative model of Labels, Abilities, and Difficulties

Model

CLI

API

Estimated abilities

Aggregate into hard labels

API details: class models.GLAD