Add a dataset to peerannot

This tutorial shows how to add a new dataset in peerannot.

Hint

If not yet done, please go to get started page to install the peerannot library.

What is a dataset?

Datasets are located in the datasets/ directory. You can create a new dataset by creating a new directory in datasets/ and adding the following files:

  • mydataset.py containing how to install the dataset

  • metadata.json containing all relevant information about the dataset

  • answers.json a .json file containing the answers to the questions asked in the crowdsourced experiment

If the task images are also available, they can be installed in the mydataset.py file using the setfolders method.

datasets/mydataset/mydataset.py
import json
from pathlib import Path

class MyDataset:
    name = 'mydataset'

    def __init__(self):
        self.DIR = Path(__file__).parent.resolve()
        ...  # download all necessary files

    def setfolders(self):
        ...  # split data into train/val/test folders
        selg.get_crowd_labels()

    def get_crowd_labels(self):
        ...  # save the crowd labels in a .json file
        with open(self.DIR / "answers.json", "w") as answ:
            json.dump(mylabels, answ, ensure_ascii=False, indent=3)

The dataset folder should look like this:

mydataset/
├── mydataset.py     # install file
├── train/
│   ├── ...          # existing images with crowd labels
├── val/
│   ├── ...          # existing images for validation
├── test/
│   ├── ...          # existing images with known labels
├── answers.json     # crowdsourced answers
└── metadata.json    # dataset information