CLI datasets

To install a dataset (either one of the example datasets or your own custom dataset), use the command:

peerannot install installationFile.py

In either case, a Python file describing the installation is required.

Example datasets

To install an example dataset, only the installation file is needed. For example, to install the cifar10H dataset, run:

cd datasets/cifar10H && peerannot install cifar10h.py

Custom datasets

To install your own dataset, use the customDataset.py installation file located in datasets/ and pass the arguments that match the structure of your dataset.

Only the --answers and --answers-format arguments are always required. For more information about the answers formats, see the format page.

Taskless dataset

If your dataset has no task, add the --no-task flag along with the --answers and --answers-format arguments.

cd datasets/MyDataset && peerannot install ../customDataset.py --no-task --answers answersFile.json --answers-format 1

Dataset with tasks

If your dataset has tasks (for example, if you want to train a model for image classification), a train set must be included and its path given with --train-path. The --files-path option is also required: it points to a file listing the task filenames in the same order as in the answers file.

A validation set can be provided with the --val-path option but is not mandatory. If no validation set is provided, one is created from 20% of the train set.
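To make the 20% hold-out concrete, here is a minimal sketch of such a split. It is an illustration only: the function name split_train_val, the shuffling and the seed are assumptions, and the way peerannot actually selects the validation files is not described here.

```python
import random


def split_train_val(filenames, val_fraction=0.2, seed=0):
    """Hold out a fraction of the training files as a validation set.

    Hypothetical helper: peerannot's own splitting logic may differ
    (for instance, it may or may not shuffle or stratify by label).
    """
    shuffled = list(filenames)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    # The first n_val shuffled files go to validation, the rest stay in train.
    return shuffled[n_val:], shuffled[:n_val]


all_files = [f"img_{i}.png" for i in range(10)]
train, val = split_train_val(all_files)
print(len(train), len(val))
```

With ten files and the default fraction, two files end up in the validation set and eight remain in the train set.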

Finally, label names can be provided in a file with the --label-names option, which helps construct the structure of the dataset (especially if the test set has no ground truth file). If it is not given, the dataset is assumed to follow the structure of a PyTorch ImageFolder dataset (see https://pytorch.org/vision/main/generated/torchvision.datasets.ImageFolder.html), where tasks are arranged in one folder per label.
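The folder-per-label layout can be sketched as follows. The example builds a small throwaway directory tree and derives a label-to-index mapping from the sorted folder names, which mirrors how ImageFolder assigns class indices. The folder names (cat, dog), the file names and the helper find_classes are hypothetical, and this is not peerannot's own code.

```python
import tempfile
from pathlib import Path


def find_classes(root):
    """Map each label folder under root to an index, in sorted name order."""
    classes = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: train/<label>/<task file>
    for label, fname in [("cat", "0.png"), ("dog", "1.png"), ("cat", "2.png")]:
        folder = Path(tmp) / "train" / label
        folder.mkdir(parents=True, exist_ok=True)
        (folder / fname).touch()
    class_to_idx = find_classes(Path(tmp) / "train")

print(class_to_idx)
```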

Here are some examples of commands to create custom datasets:

Creation of a dataset with no task:

peerannot install datasets/customDataset.py --answers-format 2 --answers PATH_TO_ANSWERS_FILE/answers.json --no-task

Creation of a dataset with a train, val and test set:

peerannot install datasets/customDataset.py --train-path PATH_TO_TRAIN_DIR --test-path PATH_TO_TEST_DIR --val-path PATH_TO_VAL_DIR --answers PATH_TO_ANSWERS_FILE/answers.txt --files-path PATH_TO_FILENAMES_FILE/filenames.txt --label-names PATH_TO_LABELNAMES_FILE/labelNames.txt

Creation of a dataset with only a train set:

peerannot install datasets/customDataset.py --train-path PATH_TO_TRAIN_DIR --answers-format 1 --files-path PATH_TO_FILENAME_FILE/filenames.txt --answers PATH_TO_ANSWERS_FILE/answers.json --label-names PATH_TO_LABELNAMES_FILE/labelNames.txt
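When installations are scripted, the same commands can be assembled programmatically. The sketch below only builds and prints the argument list for the no-task example above; it does not run peerannot. The helper build_install_command is hypothetical, and the only flags used are the ones documented on this page.

```python
import shlex


def build_install_command(install_file, no_task=False, **options):
    """Build the argument list for `peerannot install` (hypothetical helper).

    Keyword names are turned into flags, e.g. answers_format -> --answers-format.
    """
    cmd = ["peerannot", "install", install_file]
    if no_task:
        cmd.append("--no-task")
    for name, value in options.items():
        cmd += [f"--{name.replace('_', '-')}", str(value)]
    return cmd


cmd = build_install_command(
    "datasets/customDataset.py",
    no_task=True,
    answers_format=2,
    answers="PATH_TO_ANSWERS_FILE/answers.json",
)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run once the placeholder paths have been replaced with real ones.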

The help documentation is available in the terminal with:

peerannot install --help

peerannot install

Install dataset from .py file

peerannot install [OPTIONS] PATH

Options

--no-task

True if no task is associated with the dataset

--answers-format <answers_format>

annotation file format

--answers <answers>

annotation file

--metadata <metadata>

metadata information file

--label-names <label_names>

path to label names files

--files-path <files_path>

path to train filenames

--train-path <train_path>

path to train data

--test-ground-truth-format <test_ground_truth_format>

annotation file format

--test-ground-truth <test_ground_truth>

test ground truth file

--test-path <test_path>

path to test data

--val-path <val_path>

path to val data

Arguments

PATH

Required argument

Each dataset is a folder with:

  • name.py: Python file describing how to download and format the data

  • answers.json: JSON file containing the voted labels for each task

  • metadata.json: all metadata for the dataset, at least name, n_task and n_classes
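As an illustration of the metadata.json file listed above, the sketch below writes a minimal file with the three fields named in the list and checks that they are present when read back. The values (MyDataset, 100, 10) are made up, and real datasets may store additional fields.

```python
import json
import tempfile
from pathlib import Path

# Field names come from the list above; the values are hypothetical.
metadata = {"name": "MyDataset", "n_task": 100, "n_classes": 10}
required = {"name", "n_task", "n_classes"}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "metadata.json"
    path.write_text(json.dumps(metadata, indent=2))
    loaded = json.loads(path.read_text())

# Any required field absent from the file would show up here.
missing = required - loaded.keys()
print(loaded, missing)
```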