GLAD

class GLAD(answers, n_classes, **kwargs)

GLAD (Whitehill et al., 2009)

Each worker's ability is modeled by a single scalar. Each task's difficulty is represented by a positive scalar. Given these coefficients, the probability of answering correctly is a sigmoid of their product.

Assumption:
  • The errors are uniform over classes

Using:
  • One scalar per task and one per worker (task difficulty and worker ability)

__init__(answers, n_classes, **kwargs)

The probability that a worker gives the right answer is a sigmoid (denoted \(\mathrm{sig}\)) of the product of the worker ability \(\alpha_j\) and the task difficulty coefficient \(\beta_i\). Given a label \(k\in [K]\),

\[\mathbb{P}(y_i^{(j)}=k |y_i^\star=k, \alpha_j,\beta_i) = \mathrm{sig}(\alpha_j\beta_i) = \frac{1}{1+e^{-\alpha_j\beta_i}} \enspace.\]

The following marginal likelihood is then maximized:

\[\prod_{i\in[n_\text{task}]} \sum_{k\in[K]}\mathbb{P}(y_i^\star=k)\prod_{j\in [n_\text{worker}]} \left(\frac{1}{K-1}\left(1-\mathrm{sig}(\alpha_j\beta_i)\right)\right)^{1-\mathbf{1}_{\{y_i^{(j)}=k\}}}\mathrm{sig}(\alpha_j\beta_i)^{\mathbf{1}_{\{y_i^{(j)}=k\}}} \enspace.\]
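As a concrete illustration, here is a minimal NumPy sketch of this model. The helper names, the class prior vector, and the indexing of tasks and workers by consecutive integers are assumptions made for the example, not part of the class API.

    import numpy as np

    def label_probability(alpha_j, beta_i, n_classes, correct):
        # P(reported label | true label): sig(alpha_j * beta_i) when the
        # report matches the truth, with the remaining mass spread
        # uniformly over the K - 1 wrong labels (hypothetical helper).
        sig = 1.0 / (1.0 + np.exp(-alpha_j * beta_i))
        return sig if correct else (1.0 - sig) / (n_classes - 1)

    def log_likelihood(answers, alpha, beta, prior, n_classes):
        # Marginal log-likelihood: for each task, sum over candidate
        # true labels k of the prior times the per-worker probabilities.
        ll = 0.0
        for i, votes in answers.items():
            per_class = np.array([
                prior[k] * np.prod([
                    label_probability(alpha[j], beta[i], n_classes, lab == k)
                    for j, lab in votes.items()
                ])
                for k in range(n_classes)
            ])
            ll += np.log(per_class.sum())
        return ll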
Parameters:
  • answers (dict) –

    Dictionary of workers answers with format

    {
        task0: {worker0: label, worker1: label},
        task1: {worker1: label}
    }
    

  • n_classes (int) – Number of possible classes

  • dataset (path) – path to the folder where the model's estimated parameters are stored. Defaults to the current directory
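A minimal instantiation sketch following the dictionary format above. The import path is a placeholder; adjust it to wherever GLAD lives in your installation.

    # Hypothetical import path; adapt to your package layout.
    from mypackage.models import GLAD

    answers = {
        0: {0: 1, 1: 1, 2: 0},  # task 0: workers 0, 1 and 2 answered
        1: {1: 0},              # task 1: only worker 1 answered
    }
    glad = GLAD(answers, n_classes=2)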

EM(epsilon, maxiter)

Infer true labels, task difficulties, and worker abilities

calcLogProbL(item, *args)

Compute the log probability of a label given the task and worker parameters

EStep()

Evaluate the posterior probability of true labels given observed labels and parameters
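A sketch of what this step computes, under the same assumptions as the likelihood sketch above (integer task and worker indices, a class prior vector):

    import numpy as np

    def e_step(answers, alpha, beta, prior, n_classes):
        # Posterior over true labels: T[i, k] is proportional to the
        # class prior times the likelihood of task i's votes given k.
        T = np.tile(np.asarray(prior, dtype=float), (len(beta), 1))
        for i, votes in answers.items():
            for j, lab in votes.items():
                sig = 1.0 / (1.0 + np.exp(-alpha[j] * beta[i]))
                for k in range(n_classes):
                    T[i, k] *= sig if lab == k else (1.0 - sig) / (n_classes - 1)
        return T / T.sum(axis=1, keepdims=True)  # normalize per task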

packX()

Pack the current parameters \(\alpha\) and \(\beta\) into a single flat vector for the optimizer

unpackX(x)

Unpack the flat optimizer vector x back into the parameters \(\alpha\) and \(\beta\)

getBoundsX(alpha=(-100, 100), beta=(-100, 100))

Return the box bounds on \(\alpha\) and \(\beta\) used by the bounded optimizer

f(x)

Return the value of the objective function

df(x)

Return the gradient of the objective function

MStep()

Maximization step
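The helper methods above (packX, unpackX, getBoundsX, f, df) suggest the usual bounded quasi-Newton pattern. A sketch of how the M-step plausibly wires them together; model stands for a GLAD instance, and the sign convention (f returning the negated objective so that minimizing it maximizes Q) is an assumption:

    from scipy.optimize import minimize

    x0 = model.packX()                  # flatten the current parameters
    res = minimize(model.f, x0, jac=model.df,
                   method="L-BFGS-B", bounds=model.getBoundsX())
    model.unpackX(res.x)                # write back optimized parameters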

computeQ()

Calculate the expectation of the joint likelihood
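In the notation above, \(Q = \sum_i \sum_k T_{ik}\left[\log \mathbb{P}(y_i^\star=k) + \sum_j \log \mathbb{P}(y_i^{(j)}\mid y_i^\star=k,\alpha_j,\beta_i)\right]\), where \(T_{ik}\) is the posterior from the E-step. A sketch under the same assumptions as before:

    import numpy as np

    def compute_Q(answers, T, alpha, beta, prior, n_classes):
        # Expected complete-data log-likelihood under the posterior T.
        Q = float((T * np.log(prior)).sum())
        for i, votes in answers.items():
            for j, lab in votes.items():
                sig = 1.0 / (1.0 + np.exp(-alpha[j] * beta[i]))
                for k in range(n_classes):
                    p = sig if lab == k else (1.0 - sig) / (n_classes - 1)
                    Q += T[i, k] * np.log(p)
        return Q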

dAlpha(item, *args)

Compute the derivative of the objective function with respect to the worker ability

dBeta(item, *args)

Compute the derivative of the objective function with respect to the task difficulty

gradientQ()

Compute the gradient of the expected joint likelihood with respect to \(\alpha\) and \(\beta\)

run(epsilon=1e-05, maxiter=50)

Run the label aggregation via EM algorithm

Parameters:
  • epsilon (float, optional) – stopping tolerance on the relative change in the likelihood. Defaults to 1e-5

  • maxiter (int, optional) – Maximum number of iterations. Defaults to 50
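A typical end-to-end call, continuing the instantiation sketch above:

    glad.run(epsilon=1e-5, maxiter=50)  # EM until tolerance or iteration cap
    probas = glad.get_probas()          # soft labels, shape (n_task, n_classes)
    y_hat = glad.get_answers()          # hard labels via argmax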

get_probas()

Get the soft label distribution for each task

Returns:

Soft labels

Return type:

numpy.ndarray(n_task, n_classes)

get_answers()

Argmax of soft labels.

Returns:

Hard labels

Return type:

numpy.ndarray
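Up to tie-breaking, the hard labels are therefore the row-wise argmax of the soft labels:

    import numpy as np

    assert np.array_equal(glad.get_answers(),
                          np.argmax(glad.get_probas(), axis=1))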

save_difficulty(path)

Save task difficulty coefficients

Parameters:

path (str) – path to the folder where the difficulty coefficients are saved

save_ability(path)

Save worker ability coefficients

Parameters:

path (str) – path to the folder where the ability coefficients are saved
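Both savers take a destination folder; the file names written inside it are determined by the library. Continuing the sketch above:

    glad.save_difficulty("./results")  # per-task difficulty coefficients
    glad.save_ability("./results")     # per-worker ability coefficients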