Appendix IV Customized Model Sample
Prepare YOLOX for XGen Optimizations
This document explains how to revise the original YOLOX training script so that XGen can optimize it. It also gives an example configuration file for running XGen, along with notes on multi-GPU use and data preparation.
1. Prepare the training script
YOLOX ships with a default training script, which needs to be revised so that XGen can control the training process during its optimization of the model.
The modifications largely follow the XGen Document. Because the YOLOX training code is spread across several Python files, the changes touch multiple files. The default YOLOX training script also has a special limitation that prevents the compression from flexibly scaling the neural network layers, so an extra minor modification is applied to remove it. We detail the changes below; in the example code, `# XGEN:` comments mark the applied changes.
- Step-1. Rename the default main training script `YOLOX/tools/train.py` to `train_script_main.py`. (In the YOLOX code included in XGen, we put `train_script_main.py` into `YOLOX/`.)
- Steps-2 and 3. Import the XGen modules and connect the training control variables with XGen. The modified `train_script_main.py` is as follows; see the `# XGEN:` comments for the changed places.

```python
...
# XGEN: Import the xgen_tools utilities.
from xgen_tools import *


def make_parser():
    ...


# XGEN: Add an extra argument so that the XGen control variables args_ai
# can be passed to the rest of the training script.
@logger.catch
def main(exp: Exp, args, args_ai):
    if exp.seed is not None:
        random.seed(exp.seed)
        torch.manual_seed(exp.seed)
        cudnn.deterministic = True
        warnings.warn(
            "You have chosen to seed training. This will turn on the CUDNN deterministic setting, "
            "which can slow down your training considerably! You may see unexpected behavior "
            "when restarting from checkpoints."
        )

    # set environment variables for distributed training
    configure_nccl()
    configure_omp()
    cudnn.benchmark = True

    # XGEN: Pass args_ai to get_trainer.
    trainer = exp.get_trainer(args, args_ai)
    trainer.train()


# XGEN: XGen needs to find a training_main function; create it with the following content.
def training_main(args_ai=None):
    configure_module()
    args = make_parser().parse_args()
    # XGEN: Call xgen_init.
    args, args_ai = xgen_init(args, args_ai)
    # XGEN: The 'training_script_args' field contains the XGen control variables.
    xgen_control_args = args_ai['training_script_args']
    # XGEN: For multi-GPU training, determine the number of GPUs.
    args.devices = xgen_control_args['device_num']
    # XGEN: Use the exp_file parameter in the XGen config to select the exp_file
    # of YOLOX (XXXX is a placeholder).
    exp = get_exp(XXXX)
    exp.merge(args.opts)
    # XGEN: Connect the control variables.
    exp.data_dir = xgen_control_args['train_data_path']
    exp.max_epoch = xgen_control_args['common_train_epochs']
    exp.num_classes = xgen_control_args['num_classes']
    # XGEN: This part is for scaling (Step-11 below).
    # A simple change would be to scale only the width of the network:
    #     exp.width = exp.width * xgen_control_args['scaling_factor']
    # We suggest the following change instead, as it empirically gives better results.
    if args.scaling_factor != 1:
        exp.width = exp.width * xgen_control_args['scaling_factor'] + 0.3375
    else:
        exp.width = exp.width * xgen_control_args['scaling_factor']
    exp.depth = exp.depth * xgen_control_args['scaling_factor']
    # XGEN: Disable EMA, as it can affect the accuracy of pruning sparsity statistics.
    exp.ema = False
    check_exp_value(exp)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = get_num_devices() if args.devices is None else args.devices
    assert num_gpu <= get_num_devices()

    if args.cache is not None:
        exp.dataset = exp.get_dataset(cache=True, cache_type=args.cache)

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args, args_ai),
    )


# XGEN: Create an entry point that calls training_main.
if __name__ == "__main__":
    args_ai = None
    training_main(args_ai)
```
Make corresponding modifications to `YOLOX/yolox/core/trainer.py`, which defines the trainer used in the main training script. Add the following to the beginning of the file:

```python
# XGEN: Import XGen's built-in optimization methods at the beginning of the file.
from co_lib import Co_Lib as CL
from xgen_tools import xgen_record, xgen_init, xgen_load, XgenArgs, helper

helper.ENABLE_HELPER = False
```
Then modify `__init__` to connect the control variables with the trainer:

```python
class Trainer:
    # XGEN: Add argument args_ai to access the XGen control variables.
    def __init__(self, exp: Exp, args, args_ai):
        # The init function only defines some basic attrs; other attrs,
        # like model and optimizer, are built in the before_train method.
        self.exp = exp
        self.args = args
        # XGEN: Connect the XGen controls with the training process.
        self.args_ai = args_ai
        self.epoch = 0
        ...
```
Because of the `args_ai` parameter added to the `trainer = exp.get_trainer(args, args_ai)` line in `train_script_main.py`, a corresponding modification is needed in `YOLOX/yolox/exp/yolox_base.py`:

```python
# XGEN: Add argument args_ai.
def get_trainer(self, args, args_ai):
    from yolox.core import Trainer

    # XGEN: Pass args_ai in the call to Trainer.
    trainer = Trainer(self, args, args_ai)
    ...
```
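To summarize Steps 2 and 3, the wiring between the XGen control variables and YOLOX's `Exp` object can be read as one small function. The helper below is hypothetical and purely illustrative; in the actual script the same assignments live inline in `training_main`:

```python
# Hypothetical helper mirroring the wiring done inline in training_main.
def apply_xgen_controls(exp, args, args_ai):
    ctl = args_ai['training_script_args']
    args.devices = ctl['device_num']            # number of GPUs to use
    exp.data_dir = ctl['train_data_path']       # training data location
    exp.max_epoch = ctl['common_train_epochs']  # epoch budget set by XGen
    exp.num_classes = ctl['num_classes']        # dataset class count
    sf = ctl['scaling_factor']                  # Step-11 network scaling
    exp.width = exp.width * sf + (0.3375 if sf != 1 else 0.0)
    exp.depth = exp.depth * sf
    exp.ema = False                             # EMA off for pruning statistics
    return exp, args
```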
- Steps-4 and 5. Insert `xgen_load` for DNN model loading, and initialize XGen's optimization. The changes go where the model is loaded, before training starts. For YOLOX, the location is the function `before_train(self)` in `YOLOX/yolox/core/trainer.py`; after the modification, the following is inserted into that function:

```python
# XGEN: After the model, train_loader, and optimizer are defined, add the following code.
# XGEN: Use the xgen_load function to load pretrained model weights.
xgen_load(model, self.args_ai)
# XGEN: The optimization method initializes its parameters here.
cl_model = CL.init(args=self.args_ai, model=model,
                   optimizer=self.optimizer, data_loader=self.train_loader)
if cl_model:
    model = cl_model
```
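The same pattern applies beyond YOLOX: build the model, optimizer, and data loader first, then load weights with `xgen_load`, then hand everything to `CL.init`. Below is a minimal generic sketch with toy stand-ins for the model and loader, assuming only the `xgen_tools` and `co_lib` calls shown in this document:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from co_lib import Co_Lib as CL
from xgen_tools import xgen_load


def build_for_xgen(args_ai):
    # Toy stand-ins; in YOLOX these objects are built inside before_train.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loader = DataLoader(TensorDataset(torch.randn(8, 3, 32, 32)), batch_size=4)

    # 1) Load pretrained weights under XGen's control.
    xgen_load(model, args_ai)
    # 2) Let XGen's optimization methods wrap the model if needed.
    cl_model = CL.init(args=args_ai, model=model,
                       optimizer=optimizer, data_loader=loader)
    if cl_model:
        model = cl_model
    return model, optimizer, loader
```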
- Steps-6,7,8,9,10. Add XGen API calls inside the training loop to keep track of the training status and apply controls, and finally output the optimized model. As the training loop is inside `YOLOX/yolox/core/trainer.py`, the changes are in that file:

```python
def train_in_epoch(self):
    # XGEN: XGen updates its parameters once before and after each training epoch,
    # so the XGen update methods are added to train_in_epoch.
    # XGEN: In XGen's compatibility test, max_epoch is set to 0 and the accuracy of
    # the model is evaluated directly; the following code supports that mode.
    if self.max_epoch == 0:
        self.evaluate_and_save_model()
        # XGEN: Save the model weights and best AP; XGen converts the model to ONNX.
        xgen_record(self.args_ai, self.model, self.best_ap, epoch=-1)
    for self.epoch in range(self.start_epoch, self.max_epoch):
        # XGEN: XGen-related optimization hook, before each epoch.
        CL.before_each_train_epoch(epoch=self.epoch)
        self.before_epoch()
        self.train_in_iter()
        self.after_epoch()
        # XGEN: XGen-related optimization hook, after each epoch.
        CL.after_scheduler_step(epoch=self.epoch)
```

```python
def train_one_iter(self):
    # XGEN: XGen's optimization generates its own loss term, which must be updated
    # together with the original training loss in train_one_iter so that both are
    # optimized jointly.
    iter_start_time = time.time()

    inps, targets = self.prefetcher.next()
    inps = inps.to(self.data_type)
    targets = targets.to(self.data_type)
    targets.requires_grad = False
    inps, targets = self.exp.preprocess(inps, targets, self.input_size)
    data_end_time = time.time()

    with torch.cuda.amp.autocast(enabled=self.amp_training):
        outputs = self.model(inps, targets)

    loss = outputs["total_loss"]
    # XGEN: Update the loss with XGen.
    loss = CL.update_loss(loss)
```

```python
# XGEN: Add code in the after_epoch function to save the optimized models.
def after_epoch(self):
    self.save_ckpt(ckpt_name="latest")
    # XGEN: Save the optimized models.
    xgen_record(self.args_ai, self.model, self.best_ap, epoch=self.epoch)

    if (self.epoch + 1) % self.exp.eval_interval == 0:
        all_reduce_norm(self.model)
        self.evaluate_and_save_model()
```
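Putting Steps 6-10 together, the hooks wrap the training loop in a fixed order: the pre-epoch update, the per-iteration loss update, the checkpoint record, and the post-epoch update. The condensed loop below is a sketch of that ordering with a toy loss, not YOLOX code; it assumes only the `co_lib` and `xgen_tools` calls introduced above (e.g., it can be fed the toy objects from the `build_for_xgen` sketch):

```python
from co_lib import Co_Lib as CL
from xgen_tools import xgen_record


def toy_train(model, optimizer, loader, args_ai, start_epoch=0, max_epoch=2):
    best_metric = 0.0  # stand-in for YOLOX's best_ap
    for epoch in range(start_epoch, max_epoch):
        CL.before_each_train_epoch(epoch=epoch)       # XGen pre-epoch hook
        for (x,) in loader:
            loss = model(x).square().mean()           # stand-in training loss
            loss = CL.update_loss(loss)               # fold in XGen's loss term
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        xgen_record(args_ai, model, best_metric, epoch=epoch)  # record progress
        CL.after_scheduler_step(epoch=epoch)          # XGen post-epoch hook
```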
- Step-11. To enable network scaling, modify `train_script_main.py` following Configure Scaling for XGen so that XGen can control the width and depth of the network (a numeric illustration follows this step):

```python
# XGEN: This part is for scaling (Step-11).
# A simple change would be to scale only the width of the network:
#     exp.width = exp.width * xgen_control_args['scaling_factor']
# We suggest the following change instead, as it empirically gives better results.
if args.scaling_factor != 1:
    exp.width = exp.width * xgen_control_args['scaling_factor'] + 0.3375
else:
    exp.width = exp.width * xgen_control_args['scaling_factor']
exp.depth = exp.depth * xgen_control_args['scaling_factor']
```
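To make the suggested rule concrete, here is what it produces for the yolox-m base values (`width` 0.75 and `depth` 0.67, as in the example config in Section 2). The `+ 0.3375` offset shifts scaled widths upward relative to plain multiplication, which the step above reports as empirically better:

```python
base_width, base_depth = 0.75, 0.67  # yolox-m, as in the example config

for sf in (1.0, 0.6, 0.2):
    width = base_width * sf if sf == 1 else base_width * sf + 0.3375
    depth = base_depth * sf
    print(f"scaling_factor={sf}: width={width:.4f}, depth={depth:.4f}")
# scaling_factor=1.0: width=0.7500, depth=0.6700
# scaling_factor=0.6: width=0.7875, depth=0.4020
# scaling_factor=0.2: width=0.4875, depth=0.1340
```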
- Extra step. The default YOLOX has a special limitation: when `64 * scaling_factor` is not an integer, the output tensor shapes of one layer can mismatch the input shapes expected by the next layer, due to rounding bugs in the YOLOX code. Removing the bug unlocks the flexibility needed for model compression; a quick check of the mismatch is sketched below.
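To see where the mismatch comes from, compare the two rounding formulas in stock YOLOX (this sketch assumes the unmodified code, where `CSPDarknet` computes `base_channels = int(wid_mul * 64)` while `YOLOPAFPN` computes per-layer channels as `int(in_channels[i] * width)`):

```python
import math

width = 0.7125  # hypothetical scaled width; 64 * 0.7125 = 45.6 is not an integer

# Stock YOLOX: the backbone floors the base channel count...
backbone_dark5 = int(64 * width) * 16  # int(45.6) * 16 = 720
# ...while the FPN floors the full product independently.
fpn_expects = int(1024 * width)        # int(729.6) = 729
print(backbone_dark5, fpn_expects)     # 720 != 729 -> shape mismatch

# The modified files below instead derive all FPN and head channel counts
# from a single ceil-ed base, so they always agree with each other.
base = math.ceil(64 * width)           # 46
print(base * 16)                       # 736 on both sides -> shapes match
```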
The corresponding modified files are `YOLOX/yolox/models/yolo_pafpn.py` and `YOLOX/yolox/models/yolo_head.py`. The specific changes are as follows. In the `__init__` function in `yolo_pafpn.py`:
```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Copyright (c) Megvii Inc. All rights reserved.

import math

import torch
import torch.nn as nn

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv


class YOLOPAFPN(nn.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self,
        depth=1.0,
        width=1.0,
        in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024],
        depthwise=False,
        act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.lateral_conv0 = BaseConv(
            int(math.ceil(64 * width) * 16), int(math.ceil(64 * width) * 8), 1, 1, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_p4 = CSPLayer(
            int(2 * math.ceil(64 * width) * 8),
            int(math.ceil(64 * width) * 8),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.reduce_conv1 = BaseConv(
            int(math.ceil(64 * width) * 8), int(math.ceil(64 * width) * 4), 1, 1, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_p3 = CSPLayer(
            int(2 * math.ceil(64 * width) * 4),
            int(math.ceil(64 * width) * 4),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.bu_conv2 = Conv(
            int(math.ceil(64 * width) * 4), int(math.ceil(64 * width) * 4), 3, 2, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_n3 = CSPLayer(
            int(2 * math.ceil(64 * width) * 4),
            int(math.ceil(64 * width) * 8),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.bu_conv1 = Conv(
            int(math.ceil(64 * width) * 8), int(math.ceil(64 * width) * 8), 3, 2, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_n4 = CSPLayer(
            int(2 * math.ceil(64 * width) * 8),
            int(math.ceil(64 * width) * 16),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )
```
And in the `__init__` function in `yolo_head.py`:
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii Inc. All rights reserved.

import math

from loguru import logger
import torch
import torch.nn as nn
import torch.nn.functional as F

from yolox.utils import bboxes_iou, cxcywh2xyxy, meshgrid, visualize_assign

from .losses import IOUloss
from .network_blocks import BaseConv, DWConv


class YOLOXHead(nn.Module):
    def __init__(
        self,
        num_classes,
        width=1.0,
        strides=[8, 16, 32],
        in_channels=[256, 512, 1024],
        act="silu",
        depthwise=False,
    ):
        """
        Args:
            act (str): activation type of conv. Default value: "silu".
            depthwise (bool): whether to apply depthwise conv in conv branch. Default value: False.
        """
        super().__init__()
        self.num_classes = num_classes
        self.decode_in_inference = True  # for deploy, set to False

        self.cls_convs = nn.ModuleList()
        self.reg_convs = nn.ModuleList()
        self.cls_preds = nn.ModuleList()
        self.reg_preds = nn.ModuleList()
        self.obj_preds = nn.ModuleList()
        self.stems = nn.ModuleList()
        Conv = DWConv if depthwise else BaseConv

        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        for i in range(len(in_channels)):
            self.stems.append(
                BaseConv(
                    in_channels=int(math.ceil(64 * width) * (2 ** (i + 1)) * 2),
                    out_channels=int(math.ceil(64 * width) * 4),
                    ksize=1,
                    stride=1,
                    act=act,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.cls_convs.append(
                nn.Sequential(
                    *[
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.reg_convs.append(
                nn.Sequential(
                    *[
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.cls_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=self.num_classes,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.reg_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=4,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.obj_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=1,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )

        self.use_l1 = False
        self.l1_loss = nn.L1Loss(reduction="none")
        self.bcewithlog_loss = nn.BCEWithLogitsLoss(reduction="none")
        self.iou_loss = IOUloss(reduction="none")
        self.strides = strides
        self.grids = [torch.zeros(1)] * len(in_channels)
```
All the above file modifications can be found in the YOLOX code included in XGen, under `/root/Projects/object-detection-yolox/yolox_xgen`.
2. Prepare the config file
XGen's optimization methods are launched through configuration files, so when integrating your own model into XGen you need to prepare a corresponding configuration file. An example is as follows:
```json
{
    "general": {
        "user_id": "test",
        "work_place": null,
        "tran_scripts_path": null,
        "pretrained_model_path": "/root/Projects/object-detection-yolox/yolox_xgen/xgen.pt",
        "enable_ddp": false,
        "CUDA_VISIBLE_DEVICES": "0",
        "random_seed": 3407
    },
    "user_requirements": {
        "power": null,
        "accuracy": null,
        "accuracy_reverse_yn": 0,
        "model_size": null,
        "memory_size": null,
        "latency": 90,
        "margin": 0.1,
        "target_type": "latency",
        "max_sampling": 10,
        "searching_variable": "scaling_factor",
        "searching_range": [0.2, 1],
        "searching_granularity": null
    },
    "compiler": {
        "input_shape": "(1,3,640,640)",
        "opset_version": 11
    },
    "training_script_args": {
        "train_data_path": "/data/object-detection-yolox/coco",
        "train_label_path": null,
        "eval_data_path": "/data/object-detection-yolox/coco",
        "eval_label_path": null,
        "dataset_path": "/data/object-detection-yolox/coco",
        "learning_rate": 0.01,
        "common_train_epochs": 300,
        "device_num": 1,
        "batch_size": 16,
        "root_path": "./Xgen/",
        "name": "yolox-m",
        "exp_file": null,
        "weights": null,
        "device": null,
        "imgsz": 640,
        "width": 0.75,
        "depth": 0.67,
        "num_classes": 80,
        "scaling_factor": 1,
        "workers": 16,
        "noplots": true
    },
    "pruning": {
        "sp_retrain": false,
        "sp_admm": false,
        "sp_config_file": null,
        "sp_admm_update_epoch": 5,
        "sp_admm_rho": 0.001,
        "sparsity_type": "block_punched",
        "sp_admm_lr": 0.01,
        "sp_global_weight_sparsity": false,
        "sp_admm_block": "(8,4)",
        "sp_prune_ratios": 0,
        "sp_prune_threshold": -1.0
    },
    "pas": {
        "pas_ratio": 0,
        "pas": false,
        "limit_loss_weights": 5.0,
        "use_limit_loss": false,
        "pas_debug": false,
        "pas_rebuild": false,
        "pas_finetune_epoch": 200,
        "pas_pretrained_weight_path": null,
        "pas_ignore": null,
        "pas_searching_ratio": [0.1],
        "part_name": "backbone"
    },
    "train": {
        "log_save_folder": "logs",
        "eval_dataset_id": null,
        "training_dataset_id": null,
        "uuid": "yolox",
        "accuracy_reverse_yn": 0,
        "common_save_best_yn": 1
    },
    "scaling": {
        "searching_variable": "scaling_factor",
        "searching_range": [0.2, 1],
        "searching_granularity": 0.05
    }
}
```
Please see the configuration included in XGen for YOLOX at `/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json`.
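Before launching XGen, it can help to sanity-check the fields the latency-targeted search depends on. The snippet below is a minimal sketch that only reads the JSON file cited above; the field names are the ones shown in the example:

```python
import json

# Path of the YOLOX example configuration cited above.
cfg_path = "/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json"

with open(cfg_path) as f:
    cfg = json.load(f)

req = cfg["user_requirements"]
# The search tunes 'scaling_factor' within 'searching_range' toward the latency
# target, so the user_requirements and scaling sections should agree.
assert req["searching_variable"] == cfg["scaling"]["searching_variable"]
print("target:", req["target_type"], "=", req["latency"], "with margin", req["margin"])
print("search range:", req["searching_range"],
      "granularity:", cfg["scaling"]["searching_granularity"])
```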
3. Other considerations
In multi-GPU training, the optimized COCO evaluation (which makes multi-process C++ calls) can lead to process conflicts. It is recommended to remove the following code, at line 293 of `YOLOX/yolox/evaluators/coco_evaluator.py`:

```python
try:
    from yolox.layers import COCOeval_opt as COCOeval
except ImportError:
    from pycocotools.cocoeval import COCOeval
    logger.warning("Use standard COCOeval.")
```

and use the standard implementation unconditionally instead:

```python
from pycocotools.cocoeval import COCOeval
logger.warning("Use standard COCOeval.")
```
4. Dataset Preparation
Currently, YOLOX only supports the original COCO dataset format. To use your own dataset, you will need to convert it to the COCO format; please see Dataset Preparation for Object Detection (YOLOX) for details. The directory structure of the dataset is as follows:

```
datasets/
    coco/
        annotations/
            instances_train2017.json
            instances_val2017.json
        train2017/
            (image files for training)
        val2017/
            (image files for validation)
```
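To confirm the layout before training, a quick existence check can save a failed run. This is a minimal sketch; the root path below matches the `train_data_path` used in the example config and should be adjusted to your dataset location:

```python
import os

# Root of the COCO-format dataset (matches train_data_path in the example config).
root = "/data/object-detection-yolox/coco"

expected = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]
for rel in expected:
    path = os.path.join(root, rel)
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)
```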