Add a dataset to peerannot
This tutorial shows how to add a new dataset to peerannot.
Hint
If you have not done so yet, follow the get started page to install the peerannot library.
What is a dataset?
Datasets are located in the datasets/ directory. You can create a new dataset by creating a new directory in datasets/ and adding the following files:
mydataset.py: the install script, which downloads and prepares the dataset
metadata.json: all relevant information about the dataset
answers.json: the answers to the questions asked in the crowdsourced experiment
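Before installing a new dataset, it can help to check that its directory contains the three files listed above. The helper below is a hypothetical sketch and is not part of peerannot. It assumes that the install script is named after its directory, for example datasets/mydataset/mydataset.py.

```python
from pathlib import Path

# Files every dataset directory is expected to contain (besides the install script)
REQUIRED_FILES = ("metadata.json", "answers.json")


def missing_dataset_files(dataset_dir):
    """Return the names of required files absent from dataset_dir.

    Hypothetical helper: assumes the install script is <directory name>.py.
    """
    dataset_dir = Path(dataset_dir)
    expected = [f"{dataset_dir.name}.py", *REQUIRED_FILES]
    # Keep only the expected names that do not exist as regular files
    return [name for name in expected if not (dataset_dir / name).is_file()]
```

For example, missing_dataset_files("datasets/mydataset") returns an empty list once mydataset.py, metadata.json and answers.json are all present.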
If the task images are also available, mydataset.py can install them through its setfolders method. The install file follows this template:
import json
from pathlib import Path

class MyDataset:
    name = 'mydataset'

    def __init__(self):
        # directory containing this install file
        self.DIR = Path(__file__).parent.resolve()
        ...  # download all necessary files

    def setfolders(self):
        ...  # split data into train/val/test folders
        self.get_crowd_labels()

    def get_crowd_labels(self):
        ...  # collect the crowd labels
        mylabels = {}  # placeholder: replace with the collected crowd labels
        # save the crowd labels in answers.json next to this file
        with open(self.DIR / "answers.json", "w") as answ:
            json.dump(mylabels, answ, ensure_ascii=False, indent=3)
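The template leaves the contents of mylabels open. The sketch below assumes that answers.json maps each task id to a dictionary of worker id to label, that is {task: {worker: label}}. Check this layout against the existing datasets before relying on it. The helpers build_answers and save_answers are hypothetical names. They group raw (task, worker, label) votes into that shape and write the result the same way the template does.

```python
import json
from pathlib import Path


def build_answers(votes):
    """Group (task_id, worker_id, label) triples into {task: {worker: label}}.

    Assumed answers.json layout. Ids are converted to strings because
    JSON object keys are always strings.
    """
    answers = {}
    for task, worker, label in votes:
        # a later vote by the same worker on the same task overwrites the earlier one
        answers.setdefault(str(task), {})[str(worker)] = label
    return answers


def save_answers(votes, directory):
    """Write the grouped votes to <directory>/answers.json and return its path."""
    path = Path(directory) / "answers.json"
    with open(path, "w") as answ:
        json.dump(build_answers(votes), answ, ensure_ascii=False, indent=3)
    return path
```

For example, build_answers([(0, 1, 2), (0, 3, 2), (1, 1, 0)]) gives {"0": {"1": 2, "3": 2}, "1": {"1": 0}}. Inside get_crowd_labels, mylabels would then be build_answers(votes) for whatever votes the dataset provides.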
The dataset folder should look like this:
mydataset/
├── mydataset.py       # install file
├── train/
│   └── ...            # existing images with crowd labels
├── val/
│   └── ...            # existing images for validation
├── test/
│   └── ...            # existing images with known labels
├── answers.json       # crowdsourced answers
└── metadata.json      # dataset information
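The setfolders step could create the train/, val/ and test/ folders shown above by copying images into them. The sketch below is a hypothetical helper with illustrative default fractions, and it uses a seeded random split. A real dataset should instead follow its own split, for instance by putting the images with known labels in test/. It also assumes that image file names are unique, because files are copied by name.

```python
import random
import shutil
from pathlib import Path


def split_into_folders(image_paths, dataset_dir, val_frac=0.1, test_frac=0.2, seed=0):
    """Copy images into train/, val/ and test/ subfolders of dataset_dir.

    Hypothetical helper: the fractions are illustrative defaults.
    Returns the number of files copied into each split.
    """
    paths = sorted(Path(p) for p in image_paths)
    random.Random(seed).shuffle(paths)  # reproducible shuffle for a given seed
    n_test = int(len(paths) * test_frac)
    n_val = int(len(paths) * val_frac)
    splits = {
        "test": paths[:n_test],
        "val": paths[n_test:n_test + n_val],
        "train": paths[n_test + n_val:],  # remaining images
    }
    counts = {}
    for split, files in splits.items():
        out_dir = Path(dataset_dir) / split
        out_dir.mkdir(parents=True, exist_ok=True)
        for src in files:
            shutil.copy2(src, out_dir / src.name)  # copy, keeping file metadata
        counts[split] = len(files)
    return counts
```

With 10 images and the default fractions, this copies 2 images to test/, 1 to val/ and 7 to train/. Sorting the paths before the seeded shuffle keeps the split identical across runs, whatever order the files are listed in.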