CLI simulate¶
The help documentation is available in the terminal with:
peerannot simulate --help
Simulate independent mistakes¶
In the independent mistakes setting, each worker \(w_j\) answers following a multinomial distribution whose weights are given by row \(y_i^{\star}\) of their confusion matrix \(\pi^{(j)}\). Each row of the confusion matrix is drawn uniformly from the simplex. We then make the matrix diagonally dominant (to represent non-adversarial workers) by swapping, in each row, the diagonal term with the row's maximum value. Answers are independent of one another: each matrix is generated independently, and each worker answers independently of the other workers. In this setting, the DS model is expected to perform better given enough data, since the data are simulated from its assumed noise model.
peerannot simulate --n-worker=30 --n-task=200 --n-classes=5 \
--strategy independent-confusion \
--feedback=10 --seed 0 \
--folder ./simus/independent
This example generates 200 tasks and 30 workers with \(K=5\) classes. Each task receives \(\mathcal{A}(x_i)=10\) votes. This leads to around \(200\times 10/30\simeq 67\) tasks per worker (variations are due to the randomness of the assignments).
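The generation procedure described above can be sketched in plain Python. This is a minimal illustration and not peerannot's actual implementation: the function names `random_confusion_matrix` and `simulate_answer` are hypothetical, and uniform sampling on the simplex is done here by normalizing i.i.d. exponential draws (equivalent to a flat Dirichlet distribution).

```python
import random


def random_confusion_matrix(n_classes, rng):
    """Draw one confusion matrix whose diagonal holds each row's maximum."""
    matrix = []
    for k in range(n_classes):
        # Normalized i.i.d. Exp(1) draws are uniform on the simplex.
        draws = [rng.expovariate(1.0) for _ in range(n_classes)]
        total = sum(draws)
        row = [d / total for d in draws]
        # Swap the diagonal term with the row's maximum value.
        j = row.index(max(row))
        row[k], row[j] = row[j], row[k]
        matrix.append(row)
    return matrix


def simulate_answer(matrix, true_label, rng):
    """Sample one answer from the row of the matrix given by the true label."""
    return rng.choices(range(len(matrix)), weights=matrix[true_label], k=1)[0]


rng = random.Random(0)  # fixed seed, in the spirit of --seed 0
pi = random_confusion_matrix(5, rng)  # one worker, K = 5 classes
answer = simulate_answer(pi, true_label=2, rng=rng)
```

Calling `random_confusion_matrix` once per worker gives matrices that are independent across workers, which matches the independence assumption of this setting.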
Note
To create an imbalanced number of votes per task, use the --imbalance-votes option. The number of votes is then chosen uniformly at random between 1 and the number of workers available.
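As a rough sketch of how votes could be distributed with and without this option, the snippet below assigns workers to tasks in plain Python. It rests on several assumptions that are not confirmed by peerannot's source: the helper `assign_votes` is hypothetical, workers are drawn uniformly without replacement for each task, the "number of workers available" is taken to be the total number of workers, and any workerload constraint is ignored.

```python
import random


def assign_votes(n_task, n_worker, feedback, imbalance_votes, rng):
    """Return, for each task, the list of distinct workers voting on it."""
    assignments = []
    for _ in range(n_task):
        if imbalance_votes:
            # Uniform over {1, ..., n_worker}; randint is inclusive on both ends.
            n_votes = rng.randint(1, n_worker)
        else:
            n_votes = feedback
        assignments.append(rng.sample(range(n_worker), n_votes))
    return assignments


rng = random.Random(0)
balanced = assign_votes(200, 30, 10, imbalance_votes=False, rng=rng)
imbalanced = assign_votes(200, 30, 10, imbalance_votes=True, rng=rng)

# Average number of tasks per worker in the balanced case: 200 * 10 / 30.
mean_load = sum(len(workers) for workers in balanced) / 30
```

In the balanced case the average load per worker is exactly \(200\times 10/30\), while the load of any single worker fluctuates around that value because the assignments are random.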
Simulate mistakes with discrete difficulty levels¶
In this setting, introduced in Whitehill et al. (2009), workers are either good or bad and tasks are either easy or hard. The keyword ratio-diff indicates the prevalence of each level of difficulty as the ratio of easy tasks over hard tasks:
peerannot simulate --n-worker=100 --n-task=200 --n-classes=5 \
--strategy discrete-difficulty \
--ratio 0.35 --ratio-diff 1 \
--feedback 10 --seed 0 \
--folder ./simus/discrete_difficulty
We simulate 200 tasks and 100 workers with \(K=5\) classes. Each task receives \(\mathcal{A}(x_i)=10\) votes. The ratio of good workers is 0.35, so 35% of the workers are good. The ratio of easy tasks over hard tasks is 1, so 50% of the tasks are easy.
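The conversion from these two ratios to counts can be checked with a few lines of Python. This is only an arithmetic sketch under the reading given above (ratio-diff as easy tasks over hard tasks); the helper `easy_fraction` is hypothetical and the rounding that peerannot applies internally is not specified here.

```python
def easy_fraction(ratio_diff):
    """Turn an easy/hard ratio r into the share of easy tasks, r / (1 + r)."""
    return ratio_diff / (1.0 + ratio_diff)


n_worker, n_task = 100, 200
ratio, ratio_diff = 0.35, 1.0

# Share of good workers is given directly by --ratio.
n_good_workers = round(ratio * n_worker)
# Share of easy tasks follows from the easy/hard ratio.
n_easy_tasks = round(easy_fraction(ratio_diff) * n_task)
```

With the values of the example, this gives 35 good workers out of 100 and 100 easy tasks out of 200.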
peerannot simulate¶
Crowdsourcing simulations of workers
peerannot simulate [OPTIONS]
Options
- --n-worker <n_worker>¶
Number of workers
- --n-task <n_task>¶
Number of tasks
- -K, --n-classes <n_classes>¶
Number of classes
- --folder <folder>¶
Folder in which the produced simulations are stored.
- -s, --strategy <strategy>¶
Type of worker simulation
- --matrix-file <matrix_file>¶
Numpy file containing a tensor of confusion matrices of size (n_worker, n_classes, n_classes)
- -r, --ratio <ratio>¶
Number in (0,1) representing the ratio of spammers/students/good workers among the total number of workers (depending on the strategy used)
- --ratio-diff <ratio_diff>¶
Ratio of easy tasks over hard tasks. Only used in simulations based on task difficulty
- --random <random>¶
Probability for a given task to have a random difficulty, i.e., to be unidentifiable
- -wl, --workerload <workerload>¶
Upper bound on the number of tasks answered per worker
- -fb, --feedback <feedback>¶
Upper bound on the number of labels per task
- --imbalance-votes¶
If set, the number of votes per task is chosen at random between 1 and the maximum possible number of votes, given the constraints on the workerload and feedback.
- --seed <seed>¶
Random state for reproducibility
- --verbose¶
Display more information