GLAD

class GLAD(answers, n_classes, **kwargs)

GLAD (Whitehill et al., 2009)

Each worker's ability is modeled by a single scalar. Each task's difficulty is represented by a positive scalar. Given these coefficients, the probability of answering correctly is a sigmoid of their product.

Assumption:
  • The errors are uniform over classes

Using:
  • One scalar per task and one per worker (task difficulty and worker ability)

__init__(answers, n_classes, **kwargs)

The probability that a worker gives the right answer is a sigmoid (denoted \(\mathrm{sig}\)) of the product of the worker ability \(\alpha_j\) and the task difficulty coefficient \(\beta_i\). Given a label \(k\in [K]\),

\[\mathbb{P}(y_i^{(j)}=k |y_i^\star=k, \alpha_j,\beta_i) = \mathrm{sig}(\alpha_j\beta_i) = \frac{1}{1+e^{-\alpha_j\beta_i}} \enspace.\]

The following marginal likelihood is then maximized:

\[\prod_{i\in[n_\text{task}]} \sum_{k\in[K]}\mathbb{P}(y_i^\star=k)\prod_{j\in [n_\text{worker}]} \left(\frac{1}{K-1}\left(1-\mathrm{sig}(\alpha_j\beta_i)\right)\right)^{1-\mathbf{1}_{\{y_i^{(j)}=k\}}}\mathrm{sig}(\alpha_j\beta_i)^{\mathbf{1}_{\{y_i^{(j)}=k\}}} \enspace.\]
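As a concrete illustration, here is a minimal NumPy sketch of this model. The helper names, the class prior vector, and the indexing of tasks and workers by consecutive integers are assumptions made for the example, not part of the class API.

    import numpy as np

    def label_probability(alpha_j, beta_i, n_classes, correct):
        # P(reported label | true label): sig(alpha_j * beta_i) when the
        # report matches the truth, with the remaining mass spread
        # uniformly over the K - 1 wrong labels (hypothetical helper).
        sig = 1.0 / (1.0 + np.exp(-alpha_j * beta_i))
        return sig if correct else (1.0 - sig) / (n_classes - 1)

    def log_likelihood(answers, alpha, beta, prior, n_classes):
        # Marginal log-likelihood: for each task, sum over candidate
        # true labels k of the prior times the per-worker probabilities.
        ll = 0.0
        for i, votes in answers.items():
            per_class = np.array([
                prior[k] * np.prod([
                    label_probability(alpha[j], beta[i], n_classes, lab == k)
                    for j, lab in votes.items()
                ])
                for k in range(n_classes)
            ])
            ll += np.log(per_class.sum())
        return ll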
Parameters:
  • answers (dict) –

    Dictionary of workers answers with format

    {
        task0: {worker0: label, worker1: label},
        task1: {worker1: label}
    }
    

  • n_classes (int) – Number of possible classes

  • dataset (path) – path to the folder where the model's estimated parameters are stored. Defaults to the current directory
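A minimal instantiation sketch following the dictionary format above. The import path is a placeholder; adjust it to wherever GLAD lives in your installation.

    # Hypothetical import path; adapt to your package layout.
    from mypackage.models import GLAD

    answers = {
        0: {0: 1, 1: 1, 2: 0},  # task 0: workers 0, 1 and 2 answered
        1: {1: 0},              # task 1: only worker 1 answered
    }
    glad = GLAD(answers, n_classes=2)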

EM(epsilon, maxiter)

Infer true labels, task difficulties, and worker abilities

calcLogProbL(item, *args)

Compute the log probability of a label given the task and worker parameters

EStep()

Evaluate the posterior probability of true labels given observed labels and parameters
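A sketch of what this step computes, under the same assumptions as the likelihood sketch above (integer task and worker indices, a class prior vector):

    import numpy as np

    def e_step(answers, alpha, beta, prior, n_classes):
        # Posterior over true labels: T[i, k] is proportional to the
        # class prior times the likelihood of task i's votes given k.
        T = np.tile(np.asarray(prior, dtype=float), (len(beta), 1))
        for i, votes in answers.items():
            for j, lab in votes.items():
                sig = 1.0 / (1.0 + np.exp(-alpha[j] * beta[i]))
                for k in range(n_classes):
                    T[i, k] *= sig if lab == k else (1.0 - sig) / (n_classes - 1)
        return T / T.sum(axis=1, keepdims=True)  # normalize per task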

packX()

Pack the current parameters \(\alpha\) and \(\beta\) into a single flat vector for the optimizer

unpackX(x)

Unpack the flat optimizer vector x back into the parameters \(\alpha\) and \(\beta\)

getBoundsX(alpha=(-100, 100), beta=(-100, 100))

Return the box bounds on \(\alpha\) and \(\beta\) used by the bounded optimizer

f(x)

Return the value of the objective function

df(x)

Return the gradient of the objective function

MStep()

Maximization step
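The helper methods above (packX, unpackX, getBoundsX, f, df) suggest the usual bounded quasi-Newton pattern. A sketch of how the M-step plausibly wires them together; model stands for a GLAD instance, and the sign convention (f returning the negated objective so that minimizing it maximizes Q) is an assumption:

    from scipy.optimize import minimize

    x0 = model.packX()                  # flatten the current parameters
    res = minimize(model.f, x0, jac=model.df,
                   method="L-BFGS-B", bounds=model.getBoundsX())
    model.unpackX(res.x)                # write back optimized parameters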

computeQ()

Calculate the expectation of the joint likelihood
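In the notation above, \(Q = \sum_i \sum_k T_{ik}\left[\log \mathbb{P}(y_i^\star=k) + \sum_j \log \mathbb{P}(y_i^{(j)}\mid y_i^\star=k,\alpha_j,\beta_i)\right]\), where \(T_{ik}\) is the posterior from the E-step. A sketch under the same assumptions as before:

    import numpy as np

    def compute_Q(answers, T, alpha, beta, prior, n_classes):
        # Expected complete-data log-likelihood under the posterior T.
        Q = float((T * np.log(prior)).sum())
        for i, votes in answers.items():
            for j, lab in votes.items():
                sig = 1.0 / (1.0 + np.exp(-alpha[j] * beta[i]))
                for k in range(n_classes):
                    p = sig if lab == k else (1.0 - sig) / (n_classes - 1)
                    Q += T[i, k] * np.log(p)
        return Q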

dAlpha(item, *args)

Compute the derivative of the objective function with respect to the worker ability

dBeta(item, *args)

Compute the derivative of the objective function with respect to the task difficulty

gradientQ()

Compute the gradient of the expected joint likelihood with respect to \(\alpha\) and \(\beta\)

run(epsilon=1e-05, maxiter=50)

Run the label aggregation via EM algorithm

Parameters:
  • epsilon (float, optional) – stopping tolerance on the relative change in the likelihood. Defaults to 1e-5

  • maxiter (int, optional) – Maximum number of iterations. Defaults to 50
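A typical end-to-end call, continuing the instantiation sketch above:

    glad.run(epsilon=1e-5, maxiter=50)  # EM until tolerance or iteration cap
    probas = glad.get_probas()          # soft labels, shape (n_task, n_classes)
    y_hat = glad.get_answers()          # hard labels via argmax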

get_probas()

Get the soft label distribution for each task

Returns:

Soft labels

Return type:

numpy.ndarray(n_task, n_classes)

get_answers()

Argmax of soft labels.

Returns:

Hard labels

Return type:

numpy.ndarray
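Up to tie-breaking, the hard labels are therefore the row-wise argmax of the soft labels:

    import numpy as np

    assert np.array_equal(glad.get_answers(),
                          np.argmax(glad.get_probas(), axis=1))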

save_difficulty(path)

Save task difficulty coefficients

Parameters:

path (str) – path to the folder where the difficulty coefficients are saved

save_ability(path)

Save worker ability coefficients

Parameters:

path (str) – path to the folder where the ability coefficients are saved
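Both savers take a destination folder; the file names written inside it are determined by the library. Continuing the sketch above:

    glad.save_difficulty("./results")  # per-task difficulty coefficients
    glad.save_ability("./results")     # per-worker ability coefficients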