
Appendix II: Dataset Preparation for Common AI

XGen provides a set of popular base AI models for common AI tasks. For such a task, users may provide their own datasets and use XGen to optimize one of these base models into a model that suits their specific needs.

The simplest way to use your own dataset is to convert it to the format expected by the common AI model of interest. This appendix explains the expected data format for each of the common AI tasks.

Image Classification

Dataset Preparation

For multi-class classification, we recommend using the ImageNet format.

The images should be arranged in the following structure:

train/
├── cat
│   ├── xxx.png
│   ├── xxy.png
│   ├── ...
│   └── xxz.png
├── bird
│   ├── bird1.png
│   ├── bird2.png
│   └── ...
└── dog
    ├── 123.png
    ├── nsdf3.png
    ├── ...
    └── asd932_.png

In this way, users don't need to provide separate annotations: all images in a given directory (e.g., cat) are regarded as samples of that class.

Usually, the whole dataset should be split into two subsets, train and val, for training and validation respectively:

image_classification
└── my_dataset
    ├── train
    └── val
The val folder should have the same structure as the train folder.
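
As a quick sanity check of the layout, the directory tree can be loaded with torchvision's ImageFolder, which maps each subdirectory name to a class label. This is only an illustrative sketch and assumes torchvision is installed; XGen itself does not require this step.

# Illustrative check (not part of XGen): ImageFolder maps each
# subdirectory name to a class label, so loading succeeds only if
# the layout above is correct.
from torchvision import datasets

train_set = datasets.ImageFolder("image_classification/my_dataset/train")
val_set = datasets.ImageFolder("image_classification/my_dataset/val")

print(train_set.classes)   # e.g., ['bird', 'cat', 'dog']
print(len(train_set))      # number of training samples
assert train_set.classes == val_set.classes, "train/val class folders must match"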

Use in XGen

With the dataset organized as instructed, the user may use the custom AI in XGen on any of the included image classification models that have ImageNet as their default dataset. Below is a sample of the inputs in a custom AI:

# Training data absolute path should be absolute train path in the container
Training data absolute path: /data/prepare_data/image_classification/my_dataset/train
# Training labels absolute path should be null for image classification
Training labels absolute path:
# Eval data absolute path should be absolute eval path in the container
Eval data absolute path: /data/prepare_data/image_classification/my_dataset/val
# Eval labels absolute path should be null for image classification
Eval labels absolute path:
# Number of classes is total classes in your image classification dataset
Number of classes: 5

Object Detection (YOLOX)

Dataset Preparation

For object detection, we use COCO format.

Organize your train and val images and labels according to the example below.

object_detection_datasets/
  - YourDataSetName/
    - annotations/
      - instances_train2017.json
      - instances_val2017.json
    - train2017/
      - <image files for training>
    - val2017/
      - <image files for validation>

Note that the names of your train and val annotation files must be instances_train2017.json and instances_val2017.json, respectively.
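
As an optional check (an illustrative sketch, not part of XGen), the annotation files can be loaded with Python's json module to verify that every image listed in the annotations exists on disk:

# Illustrative check (not part of XGen): verify the annotation files load
# and that every image file named in the json exists in train2017/ or val2017/.
import json
import os

root = "object_detection_datasets/YourDataSetName"
for split in ("train2017", "val2017"):
    with open(os.path.join(root, "annotations", f"instances_{split}.json")) as f:
        coco = json.load(f)
    missing = [img["file_name"] for img in coco["images"]
               if not os.path.exists(os.path.join(root, split, img["file_name"]))]
    print(f"{split}: {len(coco['images'])} images, "
          f"{len(coco['categories'])} categories, {len(missing)} missing files")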

Use in XGen

Then, when using XGen, the user enters the path information when XGen prompts for it.

# Dataset absolute path should be the absolute path to your dataset directory in the container
The root directory of the dataset:  /data/prepare_data/object_detection_datasets/YourDataSetName/
# Number of classes is the total number of classes in your object detection dataset; it must equal the number of categories in the annotation files
Number of classes: 80

Object Detection (YOLOV6)

Dataset Preparation

For object detection, we use the YOLOv5 format.

Organize your train and val images and labels according to the example below.

object_detection_datasets
└── YourDataSetName
    ├── images
    │   ├── train
    │   │   ├── train_01.png
    │   │   ├── train_02.png
    │   │   ├── train_03.png
    │   │   └── ...
    │   └── val
    │       ├── val_01.png
    │       ├── val_02.png
    │       └── ...
    └── labels
        ├── train
        │   ├── train_01.txt
        │   ├── train_02.txt
        │   ├── train_03.txt
        │   └── ...
        └── val
            ├── val_01.txt
            ├── val_02.txt
            └── ...

The labels folder contains one text label file for each image; an image that contains no object needs no label file. The label file specifications are as follows:

  • One row per object
  • Each row describes one object in the image in the format:

class x_center y_center width height

where class is the zero-indexed class label of the object, and x_center, y_center, width, and height are the center coordinates, width, and height of the object's bounding box. All four values must be normalized to 0-1 by the width and height of the image.

[Image: an example photo with bounding boxes drawn around the objects]

The label file corresponding to the image above contains two persons (class 0) and a tie (class 27):

[Image: the contents of the corresponding label file]
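
To make the normalization concrete, here is a small illustrative helper (hypothetical, not part of XGen) that converts a pixel-space box into one YOLO label row:

# Illustrative helper (not part of XGen): convert a pixel-space bounding box
# (x_min, y_min, x_max, y_max) into one YOLO label row, normalized to 0-1.
def to_yolo_row(cls, x_min, y_min, x_max, y_max, img_w, img_h):
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{cls} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

# A 100x200-pixel box at the top-left corner of a 640x480 image, class 0:
print(to_yolo_row(0, 0, 0, 100, 200, 640, 480))
# -> 0 0.078125 0.208333 0.156250 0.416667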

Label Images

Users can use Roboflow to help prepare datasets: create a free Roboflow account, upload your dataset, and label the images as needed. After that, you can export the labels in the needed format (YOLO v5 PyTorch format). The website also allows converting a dataset with labels in other formats (e.g., COCO) to the required format.

Note: if you already have a labeled dataset in the COCO format, you may use coco_json2yolo.py to convert the label files to the format expected by YOLO. Example:

python coco_json2yolo.py ./object_detection_datasets/coco/annotations/instances_val2017.json

Use in XGen

To use XGen on the prepared dataset, the user needs to create a YAML file that indicates the number and names of the classes, as illustrated below:

train: /data/object_detection_datasets/images/train
val: /data/object_detection_datasets/images/val
test: /data/object_detection_datasets/images/test
# If you use the COCO dataset, set is_coco: True and set the original annotation json path here
anno_path: /data/object-detection-yolov6/coco/annotations/instances_val2017.json
# whether it is the COCO dataset; only the COCO dataset should set this to True
is_coco: True
nc: 80  # number of classes
# Classes
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
        'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
        'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
        'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
        'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
        'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
        'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
        'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
        'hair drier', 'toothbrush']  # class names

Then, when using XGen, the user enters the path information when XGen prompts for it.

Path to the dataset description yaml file (coco.yaml in yolov6): /data/prepare_data/object_detection_datasets/coco/coco.yaml
# Number of classes is the total number of classes in your object detection dataset; it must equal the number of names in the .yaml file
Number of classes: 80

Object Detection (YOLOV8)

Dataset Preparation

The data format for YOLOv8 is the same as for YOLOv6; refer to the YOLOv6 dataset preparation section above.

Use in XGen

To use XGen on the prepared dataset, the user needs to create a YAML file that indicates the number and names of the classes, as illustrated below:

path: /data/object-detection/coco  # dataset root dir
train: train2017.txt  # train images (relative to 'path') 118287 images
val: val2017.txt  # val images (relative to 'path') 5000 images
test: test-dev2017.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794

# Classes
names:
  0: person
  1: bicycle
  2: car
  3: motorcycle
  4: airplane
  5: bus
  6: train
  7: truck
  8: boat
  9: traffic light
  10: fire hydrant
  11: stop sign
  12: parking meter
  13: bench
  14: bird
  15: cat
  16: dog
  17: horse
  18: sheep
  19: cow
  20: elephant
  21: bear
  22: zebra
  23: giraffe
  24: backpack
  25: umbrella
  26: handbag
  27: tie
  28: suitcase
  29: frisbee
  30: skis
  31: snowboard
  32: sports ball
  33: kite
  34: baseball bat
  35: baseball glove
  36: skateboard
  37: surfboard
  38: tennis racket
  39: bottle
  40: wine glass
  41: cup
  42: fork
  43: knife
  44: spoon
  45: bowl
  46: banana
  47: apple
  48: sandwich
  49: orange
  50: broccoli
  51: carrot
  52: hot dog
  53: pizza
  54: donut
  55: cake
  56: chair
  57: couch
  58: potted plant
  59: bed
  60: dining table
  61: toilet
  62: tv
  63: laptop
  64: mouse
  65: remote
  66: keyboard
  67: cell phone
  68: microwave
  69: oven
  70: toaster
  71: sink
  72: refrigerator
  73: book
  74: clock
  75: vase
  76: scissors
  77: teddy bear
  78: hair drier
  79: toothbrush

Then, when using XGen, the user enters the path information when XGen prompts for it.

Path to the dataset description yaml file (coco.yaml in yolov8): /data/prepare_data/object_detection_datasets/coco/coco.yaml
# Number of classes is the total number of classes in your object detection dataset; it must equal the number of names in the .yaml file
Number of classes: 80

Object Segmentation

Dataset Preparation

For object segmentation, we use the ISBI Challenge Dataset format, as it is the format expected by UNet, the segmentation model included in XGen. The original structure of the ISBI dataset is as follows.

object_segmentation_datasets
└── ISBI
    └── em
    │    ├── train-volume.tif
    │    ├── train-labels.tif
    │    └── test-volume.tif

If the users' data are in that format, they can use the ~/Projects/segmentation-unet/run_make_database.py script to convert the dataset to the format required by UNet. Please customize the data path in the script before running it.

python run_make_database.py

The users should make sure that the train and val images and labels are in the structure shown below.

object_segmentation_datasets
└── ISBI
    ├── em
    │    ├── train
    │    │   ├── input_001.npy
    │    │   ├── input_002.npy
    │    │   ├── input_003.npy
    │    │   ├── ...
    │    │   ├── label_001.npy
    │    │   ├── label_002.npy
    │    │   ├── label_003.npy
    │    │   └── ...
    │    ├── val
    │    │   ├── input_001.npy
    │    │   ├── input_002.npy
    │    │   ├── ...
    │    │   ├── label_001.npy
    │    │   ├── label_002.npy
    │    │   └── ...
    │    ├── test
    │    │   ├── input_001.npy
    │    │   ├── input_002.npy
    │    │   ├── ...
    │    │   ├── label_001.npy
    │    │   ├── label_002.npy
    │    │   └── ...
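
As an optional check (an illustrative sketch, not part of XGen), each input/label pair can be inspected with numpy to confirm the .npy files are readable and their shapes match:

# Illustrative check (not part of XGen): load one input/label pair and
# confirm the arrays are readable and spatially aligned.
import numpy as np

x = np.load("object_segmentation_datasets/ISBI/em/train/input_001.npy")
y = np.load("object_segmentation_datasets/ISBI/em/train/label_001.npy")
print(x.shape, x.dtype)
print(y.shape, y.dtype)
assert x.shape[:2] == y.shape[:2], "input and label spatial sizes must match"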

Use in XGen

# Training data absolute path should be absolute train images path
Training data absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Training labels absolute path should be absolute train label path
Training labels absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Eval data absolute path should be absolute eval images path
Eval data absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Eval labels absolute path should be absolute eval labels path
Eval labels absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/

Video Classification

Dataset Preparation

For video classification, the expected format is derived from the UCF-101 dataset format.

The original structure of the UCF-101 dataset is as follows:

video_classification_datasets
└── ucf101
   ├── ApplyEyeMakeup
   │   ├── v_ApplyEyeMakeup_g08_c01.avi
   │   ├── v_ApplyEyeMakeup_g08_c02.avi
   │   └── ...
   ├── ApplyLipstick
   │   ├── v_ApplyLipstick_g08_c01.avi
   │   ├── v_ApplyLipstick_g08_c02.avi
   │   └── ...
   ├── Archery
   │   ├── v_Archery_g08_c01.avi
   │   ├── v_Archery_g08_c02.avi
   │   └── ...
   └── ...

If the users' dataset is in such a format and structure, then before applying XGen to it, the user needs to convert it with several scripts included in ~/Projects/video-classification-s2-1d/, as follows:

  • Extract frames from the videos by running python extract_videos.py -d ucf101 (see the frame-extraction sketch after this list).
  • Convert the frames into LMDB files with create_lmdb.py:
    python create_lmdb.py -d ucf101_frame -s train -vr 0 10000
    python create_lmdb.py -d ucf101_frame -s val -vr 0 4000
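
For illustration only, the following sketch shows what frame extraction amounts to, using OpenCV (assumed installed as opencv-python); the project's extract_videos.py performs this role and may differ in details such as frame naming and sampling rate:

# Illustrative frame extraction with OpenCV (assumes opencv-python is
# installed); the project's extract_videos.py plays this role.
import os
import cv2

def extract_frames(video_path, out_dir):
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        cv2.imwrite(os.path.join(out_dir, f"{idx:05d}.jpg"), frame)
    cap.release()

extract_frames("ucf101/Archery/v_Archery_g08_c01.avi",
               "ucf101_frame/v_Archery_g08_c01")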

The users should make sure that, after the preparation, the dataset is in the following structure:

video_classification_datasets
├── ucf101_frame
│   ├── v_ApplyEyeMakeup_g08_c01
│   │    ├── 00001.jpg
│   │    ├── 00002.jpg
│   │    ├── 00003.jpg
│   │    └── ...
│   ├── v_ApplyEyeMakeup_g08_c02
│   │    ├── 00001.jpg
│   │    ├── 00002.jpg
│   │    ├── 00003.jpg
│   │    └── ...
│   ├── ...
│   ├── v_ApplyLipstick_g08_c01
│   │    ├── 00001.jpg
│   │    ├── 00002.jpg
│   │    ├── 00003.jpg
│   │    └── ...
│   ├── v_ApplyLipstick_g08_c02
│   │    ├── 00001.jpg
│   │    ├── 00002.jpg
│   │    ├── 00003.jpg
│   │    └── ...
│   ├── ...
│   ├── v_Archery_g08_c01
│   │    ├── 00001.jpg
│   │    ├── 00002.jpg
│   │    ├── 00003.jpg
│   │    └── ...
│   ├── ...
└── ucfTrainTestlist
    ├── classInd.txt
    ├── testlist01.txt
    └── trainlist01.txt
The classInd.txt file records the indices of the classes, as illustrated below:
1 ApplyEyeMakeup
2 ApplyLipstick
3 Archery
4 BabyCrawling
5 BalanceBeam
...
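
As a small illustration (hypothetical, not part of XGen), classInd.txt can be parsed into an index-to-name mapping like so:

# Illustrative snippet (not part of XGen): parse classInd.txt into an
# index -> class-name mapping.
with open("ucfTrainTestlist/classInd.txt") as f:
    class_names = dict(line.split() for line in f if line.strip())
print(class_names["1"])  # ApplyEyeMakeup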

Use in XGen

# Training data absolute path should be absolute train images path
Training data absolute path: /data/prepare_data/video_classification_datasets/ucf101_frame
# Training labels absolute path should be null for video classification
Training labels absolute path:
# Eval data absolute path should be absolute eval path in the container
Eval data absolute path:
# Eval labels absolute path should be null for video classification
Eval labels absolute path:
# Number of classes is total classes in your video classification dataset
Number of classes: 5