{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulate the dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We simulate 40 workers in a hammer-spammer setting. There are $100\\times 0.7=70$ spammers that will answer randomly. All other workers answer the true labels." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "path = (Path() / \"..\" / \"_build\" / \"notebooks\")\n", "path.mkdir(exist_ok=True, parents=True)\n", "\n", "! peerannot simulate --n-worker=100 --n-task=300 --n-classes=5 \\\n", " --strategy hammer-spammer \\\n", " --ratio 0.7 \\\n", " --feedback=10 --seed 0 \\\n", " --folder {path}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that if the dataset comes with an install file (like the `LabelMe` dataset available in peerannot), simply run the install file to download the dataset:\n", "\n", "```\n", "$ peerannot install labelme.py\n", "```\n", "\n", "Below, we always precise where the labels are stored in the dataset. This is to hilight that multiple datasets can be used with the same code, as long as the labels are stored in the same way." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Value of the krippendorff alpha\n", "\n", "The closer to 0, the less reliable the data. The closer to 1, the more reliable the data.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! peerannot identify -s krippendorffalpha {path} \\\n", " --labels {path}/answers.json \\\n", " --n-classes 5\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We obtain $\\alpha\\simeq 0.08$ which indicates that the data is not reliable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identify spammers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If there are ambiguities, we can identify spammers by looking at the spammer score. The closer to 0, the more likely the annotator is a spammer." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "! peerannot identify -s spam_score {path} \\\n", " --labels {path}/answers.json \\\n", " --n-classes 5" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "spam_scores = np.load(path / \"identification\" / \"spam_score.npy\")\n", "plt.figure()\n", "plt.hist(spam_scores, bins=20)\n", "plt.xlabel(\"Spam score\")\n", "plt.ylabel(\"Count\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get the ID of workers with a spam score below $0.5$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(np.where(spam_scores < 0.5))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Aggregation with and without identification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from peerannot.models import DawidSkene as DS\n", "from peerannot.models import MajorityVoting as MV\n", "import json\n", "\n", "with open(path / \"answers.json\") as f:\n", " answers = json.load(f)\n", "\n", "gt = np.load(path / \"ground_truth.npy\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_mv = MV(answers, n_classes=5).get_answers()\n", "ds = DS(answers, n_classes=5, n_workers=100)\n", "ds.run()\n", "y_ds = ds.get_answers()\n", "print(f\"\"\"\n", " - MV accuracy: {np.mean(y_mv == gt)}\n", " - DS accuracy: {np.mean(y_ds == gt)}\n", " \"\"\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Because the DS model models the confusions, it was able to generate better predicions than the majority vote. Let's see if we can identify the spammers and improve the predictions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "id_spammers = list(np.where(spam_scores < 0.5)[0])\n", "\n", "ans_cleaned = {}\n", "worker_ids = {}\n", "for task in answers.keys():\n", " ans_cleaned[task] = {}\n", " for worker, label in answers[task].items():\n", " if int(worker) in id_spammers:\n", " pass\n", " else:\n", " if worker_ids.get(worker, None) is None:\n", " worker_ids[worker] = len(worker_ids)\n", " ans_cleaned[task][worker_ids[worker]] = label" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y_mv = MV(ans_cleaned, n_classes=5).get_answers()\n", "ds = DS(ans_cleaned, n_classes=5, n_workers=len(worker_ids))\n", "ds.run()\n", "y_ds = ds.get_answers()\n", "print(\n", " f\"\"\"\n", " - MV accuracy: {np.mean(y_mv == gt)}\n", " - DS accuracy: {np.mean(y_ds == gt)}\n", " \"\"\"\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we cleaned the data, we can aggregate the labels again and obtain a majority vote that performs as good as the DS strategy !\n", "\n", "Similar modifications can be done by identifying the ambiguous tasks and not the workers." ] } ], "metadata": { "kernelspec": { "display_name": "peerannot", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.19" } }, "nbformat": 4, "nbformat_minor": 2 }