Appendix II: Dataset Preparation for Common AI
XGen provides a set of popular base AI models for common AI tasks. For such a task, users may provide their own datasets and use XGen to optimize one of those base models into a model suiting their specific needs.
The simplest way to use your own dataset is to convert it to the format expected by the common AI model of interest. This appendix explains the expected data format for each of the common AI tasks.
Image Classification
Dataset Preparation
For multi-class classification, we recommend using the ImageNet format.
The images should be arranged in the following structure:
train/
├── cat
│   ├── xxx.png
│   ├── xxy.png
│   ├── xxz.png
│   └── ...
├── bird
│   ├── bird1.png
│   ├── bird2.png
│   └── ...
└── dog
    ├── 123.png
    ├── nsdf3.png
    ├── asd932_.png
    └── ...
In this way, the users don't need to provide separate annotations: all images in a directory (e.g., cat) will be regarded as samples of that class.
Usually, the whole dataset should be split into two subsets, train and val, for training and validation respectively:
image_classification
└── my_dataset
    ├── train
    └── val
The val folder should have the same structure as the train folder.
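If the images are not yet split, a small script can create this layout. Below is a minimal sketch, assuming the unsplit images sit in a hypothetical my_dataset_raw/<class>/ layout and using an illustrative 80/20 split:

import random
import shutil
from pathlib import Path

random.seed(0)
src = Path("my_dataset_raw")                      # hypothetical unsplit dataset: one folder per class
dst = Path("image_classification/my_dataset")

for class_dir in src.iterdir():
    if not class_dir.is_dir():
        continue
    images = sorted(class_dir.glob("*.png"))
    random.shuffle(images)
    n_val = max(1, int(0.2 * len(images)))        # illustrative 80/20 split
    for split, files in (("val", images[:n_val]), ("train", images[n_val:])):
        out = dst / split / class_dir.name
        out.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, out / f.name)         # copy image into train/<class> or val/<class>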
Use in XGen
With the dataset organized as instructed, the user may use custom AI in XGen with any of the included image classification models that have ImageNet as the default dataset. Below is a sample of the inputs in a custom AI run:
# Training data absolute path should be the absolute path of the train folder in the container
Training data absolute path: /data/prepare_data/image_classification/my_dataset/train
# Training labels absolute path should be null for image classification
Training labels absolute path:
# Eval data absolute path should be the absolute path of the val folder in the container
Eval data absolute path: /data/prepare_data/image_classification/my_dataset/val
# Eval labels absolute path should be null for image classification
Eval labels absolute path:
# Number of classes is the total number of classes in your image classification dataset
Number of classes: 5
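Number of classes should equal the number of class subdirectories under train/. A quick sanity check (a sketch, using the sample path above):

from pathlib import Path

train_dir = Path("/data/prepare_data/image_classification/my_dataset/train")
classes = sorted(d.name for d in train_dir.iterdir() if d.is_dir())
print(len(classes), classes)   # the count is the value to enter for "Number of classes"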
Object Detection (YOLOX)
Dataset Preparation
For object detection with YOLOX, we use the COCO format.
Organize your train and val images and annotations according to the example below.
object_detection_datasets
└── YourDataSetName
    ├── annotations
    │   ├── instances_train2017.json
    │   └── instances_val2017.json
    ├── train2017
    │   └── <image files for training>
    └── val2017
        └── <image files for validation>
Note that your train and val annotation files must be named instances_train2017.json and instances_val2017.json, respectively.
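For reference, each COCO annotation file is a single JSON object with images, annotations, and categories arrays; each bounding box is [x_min, y_min, width, height] in pixels. A minimal, hand-written skeleton with illustrative values:

{
  "images": [
    {"id": 1, "file_name": "train_0001.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100.0, 120.0, 200.0, 150.0], "area": 30000.0, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "person", "supercategory": "person"}
  ]
}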
Use in XGen
Then, when using XGen, the user enters the path information when XGen prompts for it.
# The root directory should be the absolute path of YourDataSetName in the container
The root directory of the dataset: /data/prepare_data/object_detection_datasets/YourDataSetName/
# Number of classes is the total number of classes in your object detection dataset
Number of classes: 80
Object Detection (YOLOV6)
Dataset Preparation
For object detection with YOLOv6, we use the YOLOv5 data format.
Organize your train and val images and labels according to the example below.
object_detection_datasets
└── YourDataSetName
    ├── images
    │   ├── train
    │   │   ├── train_01.png
    │   │   ├── train_02.png
    │   │   ├── train_03.png
    │   │   └── ...
    │   └── val
    │       ├── val_01.png
    │       ├── val_02.png
    │       └── ...
    └── labels
        ├── train
        │   ├── train_01.txt
        │   ├── train_02.txt
        │   ├── train_03.txt
        │   └── ...
        └── val
            ├── val_01.txt
            ├── val_02.txt
            └── ...
The labels folder contains one text label file for each image, unless that image contains no objects. The label file specification is as follows:
- One row per object
- Each row describes one object in the image:
class x_center y_center width height
where class is the class label of that object (zero-indexed), and x_center, y_center, width, and height are the center coordinates, width, and height of that object's bounding box. The coordinates, width, and height must all be normalized to 0-1 by the width and height of the image.
For example, a label file for an image containing two persons (class 0) and a tie (class 27) has three rows, one per object; the coordinates below are illustrative:
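0 0.481 0.634 0.691 0.713
0 0.736 0.247 0.499 0.476
27 0.377 0.673 0.074 0.354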
Label Images
Users can use Roboflow to help prepare datasets. Create a free Roboflow account, upload your dataset, and label the images as needed. After that, you can export the labels in the needed format (YOLO v5 PyTorch format). The website can also convert a dataset labeled in another format (e.g., COCO) to the required format.
Note that if you already have a labeled dataset in the COCO format, you may use coco_json2yolo.py to convert the label files to the format expected by YOLO.
Example
python coco_json2yolo.py ./object_detection_datasets/coco/annotations/instances_val2017.json
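If you prefer to write your own converter, its core is the box transform: COCO stores [x_min, y_min, width, height] in pixels, while YOLO expects [x_center, y_center, width, height] normalized by the image size. A minimal sketch of such a converter (an illustration of the idea, not the bundled coco_json2yolo.py):

import json
from collections import defaultdict
from pathlib import Path

def coco_to_yolo(json_path, out_dir):
    data = json.loads(Path(json_path).read_text())
    images = {img["id"]: img for img in data["images"]}
    # COCO category ids can be sparse (e.g., 1..90); YOLO classes must be 0..nc-1
    cats = sorted(data["categories"], key=lambda c: c["id"])
    cat_map = {c["id"]: i for i, c in enumerate(cats)}
    rows = defaultdict(list)
    for ann in data["annotations"]:
        img = images[ann["image_id"]]
        x, y, w, h = ann["bbox"]                   # top-left corner, pixels
        xc = (x + w / 2) / img["width"]            # normalized center x
        yc = (y + h / 2) / img["height"]           # normalized center y
        rows[ann["image_id"]].append(
            f"{cat_map[ann['category_id']]} {xc:.6f} {yc:.6f} "
            f"{w / img['width']:.6f} {h / img['height']:.6f}")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for img_id, lines in rows.items():
        stem = Path(images[img_id]["file_name"]).stem
        (out / f"{stem}.txt").write_text("\n".join(lines) + "\n")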
Use in XGen
To use XGen on the prepared dataset, the user needs to create a yaml file that specifies the dataset paths and the number and names of the classes, as illustrated below:
# Dataset paths
train: /data/object_detection_datasets/images/train
val: /data/object_detection_datasets/images/val
test: /data/object_detection_datasets/images/test
# If you use the coco dataset, set is_coco: True and set the path of the original json file here
anno_path: /data/object-detection-yolov6/coco/annotations/instances_val2017.json
# Whether it is the coco dataset; only the coco dataset should set this to True
is_coco: True
nc: 80 # number of classes
# Classes
names: ['person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
'hair drier', 'toothbrush'] # class names
Then, when using XGen, the user enters the path information when XGen prompts for it.
Path to the dataset description yaml file (coco.yaml in yolov6): /data/prepare_data/object_detection_datasets/coco/coco.yaml
# Number of classes is the total number of classes in your object detection dataset; it must equal the number of names in the yaml file
Number of classes: 80
Object Detection (YOLOV8)
Dataset Preparation
The data format for YOLOv8 is the same as for YOLOv6; refer to the YOLOv6 dataset preparation section above.
Use in XGen
To use XGen on the prepared dataset, the user needs to create a yaml file that specifies the dataset paths and the number and names of the classes, as illustrated below:
path: /data/object-detection/coco # dataset root dir
train: train2017.txt # train images (relative to 'path') 118287 images
val: val2017.txt # val images (relative to 'path') 5000 images
test: test-dev2017.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
# Classes
names:
0: person
1: bicycle
2: car
3: motorcycle
4: airplane
5: bus
6: train
7: truck
8: boat
9: traffic light
10: fire hydrant
11: stop sign
12: parking meter
13: bench
14: bird
15: cat
16: dog
17: horse
18: sheep
19: cow
20: elephant
21: bear
22: zebra
23: giraffe
24: backpack
25: umbrella
26: handbag
27: tie
28: suitcase
29: frisbee
30: skis
31: snowboard
32: sports ball
33: kite
34: baseball bat
35: baseball glove
36: skateboard
37: surfboard
38: tennis racket
39: bottle
40: wine glass
41: cup
42: fork
43: knife
44: spoon
45: bowl
46: banana
47: apple
48: sandwich
49: orange
50: broccoli
51: carrot
52: hot dog
53: pizza
54: donut
55: cake
56: chair
57: couch
58: potted plant
59: bed
60: dining table
61: toilet
62: tv
63: laptop
64: mouse
65: remote
66: keyboard
67: cell phone
68: microwave
69: oven
70: toaster
71: sink
72: refrigerator
73: book
74: clock
75: vase
76: scissors
77: teddy bear
78: hair drier
79: toothbrush
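Note that train2017.txt, val2017.txt, and test-dev2017.txt are plain text files that list one image path per line, relative to path. If you need to produce such a listing for your own data, below is a minimal sketch (the images/train2017 layout is an assumption):

from pathlib import Path

root = Path("/data/object-detection/coco")        # must match 'path' in the yaml
images = sorted((root / "images" / "train2017").glob("*.jpg"))
(root / "train2017.txt").write_text(
    "\n".join(f"./images/train2017/{p.name}" for p in images) + "\n")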
Then, when using XGen, the user enters the path information when XGen prompts for it.
Path to the dataset description yaml file (coco.yaml in yolov8): /data/prepare_data/object_detection_datasets/coco/coco.yaml
# Number of classes is the total number of classes in your object detection dataset; it must equal the number of names in the yaml file
Number of classes: 80
Object Segmentation
Dataset Preparation
For object segmentation, we use the ISBI Challenge dataset format, as it is the format expected by UNet, the segmentation model included in XGen. The original structure of the ISBI dataset is as follows.
object_segmentation_datasets
└── ISBI
    └── em
        ├── train-volume.tif
        ├── train-labels.tif
        └── test-volume.tif
If the users' data are in that format, they can use the ~/Projects/segmentation-unet/run_make_database.py script to convert the dataset to the format required by UNet. Please customize the data path in the script before running.
python run_make_database.py
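For reference, the conversion conceptually slices each multi-page .tif into per-slice .npy files. Below is a minimal sketch of that idea, not the actual run_make_database.py (paths, file naming, and the lack of a train/val split here are illustrative):

import numpy as np
from PIL import Image
from pathlib import Path

def tif_to_npy(tif_path, out_dir, prefix):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with Image.open(tif_path) as tif:             # multi-page tif: one slice per frame
        for i in range(tif.n_frames):
            tif.seek(i)
            np.save(out / f"{prefix}_{i + 1:03d}.npy", np.asarray(tif))

tif_to_npy("ISBI/em/train-volume.tif", "ISBI/em/train", "input")
tif_to_npy("ISBI/em/train-labels.tif", "ISBI/em/train", "label")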
The users should make sure that, after the conversion, the train and val images and labels are in the structure shown below.
object_segmentation_datasets
└── ISBI
    └── em
        ├── train
        │   ├── input_001.npy
        │   ├── input_002.npy
        │   ├── input_003.npy
        │   ├── ...
        │   ├── label_001.npy
        │   ├── label_002.npy
        │   ├── label_003.npy
        │   └── ...
        ├── val
        │   ├── input_001.npy
        │   ├── input_002.npy
        │   ├── ...
        │   ├── label_001.npy
        │   ├── label_002.npy
        │   └── ...
        └── test
            ├── input_001.npy
            ├── input_002.npy
            ├── ...
            ├── label_001.npy
            ├── label_002.npy
            └── ...
Use in XGen
# Training data absolute path should be the absolute path of the train images
Training data absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Training labels absolute path should be the absolute path of the train labels
Training labels absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Eval data absolute path should be the absolute path of the eval images
Eval data absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
# Eval labels absolute path should be the absolute path of the eval labels
Eval labels absolute path: /data/prepare_data/object_segmentation_datasets/ISBI/
Video Classification
Dataset Preparation
For video classification, the expected format is derived from the UCF-101 dataset format.
The original structure of the UCF-101 dataset is as follows:
video_classification_datasets
└── ucf101
    ├── ApplyEyeMakeup
    │   ├── v_ApplyEyeMakeup_g08_c01.avi
    │   ├── v_ApplyEyeMakeup_g08_c02.avi
    │   └── ...
    ├── ApplyLipstick
    │   ├── v_ApplyLipstick_g08_c01.avi
    │   ├── v_ApplyLipstick_g08_c02.avi
    │   └── ...
    ├── Archery
    │   ├── v_Archery_g08_c01.avi
    │   ├── v_Archery_g08_c02.avi
    │   └── ...
    └── ...
If the users' dataset is in such a format and structure, before applying XGen to it, the user needs to convert it with several scripts included in ~/Projects/video-classification-s2-1d/ as follows:
- Extract frames from the videos with extract_videos.py (a sketch of this step appears after this list):
python extract_videos.py -d ucf101
- Convert the frames into LMDB files with create_lmdb.py:
python create_lmdb.py -d ucf101_frame -s train -vr 0 10000
python create_lmdb.py -d ucf101_frame -s val -vr 0 4000
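For reference, the frame-extraction step conceptually decodes each .avi into numbered .jpg frames, one directory per video. A minimal sketch of the same idea using OpenCV (an illustration, not the bundled extract_videos.py):

import cv2
from pathlib import Path

def extract_frames(video_path, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    cap = cv2.VideoCapture(str(video_path))
    i = 0
    while True:
        ok, frame = cap.read()                    # decode the next frame
        if not ok:
            break
        i += 1
        cv2.imwrite(str(out / f"{i:05d}.jpg"), frame)
    cap.release()

extract_frames("ucf101/ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi",
               "ucf101_frame/v_ApplyEyeMakeup_g08_c01")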
The users should make sure that, after the preparation, the datasets are in the following structure:
video_classification_datasets
├── ucf101_frame
│   ├── v_ApplyEyeMakeup_g08_c01
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── 00003.jpg
│   │   └── ...
│   ├── v_ApplyEyeMakeup_g08_c02
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── 00003.jpg
│   │   └── ...
│   ├── ...
│   ├── v_ApplyLipstick_g08_c01
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── 00003.jpg
│   │   └── ...
│   ├── v_ApplyLipstick_g08_c02
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── 00003.jpg
│   │   └── ...
│   ├── ...
│   ├── v_Archery_g08_c01
│   │   ├── 00001.jpg
│   │   ├── 00002.jpg
│   │   ├── 00003.jpg
│   │   └── ...
│   └── ...
└── ucfTrainTestlist
    ├── classInd.txt
    ├── testlist01.txt
    └── trainlist01.txt
classInd.txt records the indices of the classes, as illustrated below:
1 ApplyEyeMakeup
2 ApplyLipstick
3 Archery
4 BabyCrawling
5 BalanceBeam
...
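trainlist01.txt and testlist01.txt list the videos of each split. In the standard UCF-101 split files, each trainlist row holds a video path (relative to the dataset root) followed by its class index, while testlist rows hold only the path:

ApplyEyeMakeup/v_ApplyEyeMakeup_g08_c01.avi 1
ApplyLipstick/v_ApplyLipstick_g08_c01.avi 2
...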
Use in XGen
# Training data absolute path should be the absolute path of the train images
Training data absolute path: /data/prepare_data/video_classification_datasets/ucf101_frame
# Training labels absolute path should be null for video classification
Training labels absolute path:
# Eval data absolute path should be the absolute eval path in the container
Eval data absolute path:
# Eval labels absolute path should be null for video classification
Eval labels absolute path:
# Number of classes is the total number of classes in your video classification dataset
Number of classes: 5