Appendix IV Customized Model Sample
Prepare YOLOX for XGen Optimizations
This document explains how to revise the original YOLOX training script so that XGen can optimize it. It also gives an example configuration file for running XGen, along with notes on multi-GPU use and data preparation.
1. Prepare the training script
YOLOX ships with a default training script, which needs to be revised so that XGen can control the training process during its optimization of the model.
The modifications largely follow the XGen Document. Because the YOLOX training code is spread across several Python files, the changes touch multiple files. The default YOLOX training script also has a special limitation that prevents the compression from flexibly scaling the neural network layers, so an extra minor modification is applied to remove it. We detail the changes below; in the example code, `# XGEN:` comments mark the applied changes.
- Step-1. Rename the default main training script `YOLOX/tools/train.py` to `train_script_main.py`. (In the YOLOX code included in XGen, we put `train_script_main.py` into `YOLOX/`.)
- Steps-2 and 3. Import the XGen modules and connect the training control variables with XGen. The modified `train_script_main.py` is as follows; see the `# XGEN:` comments for the changed places.

```python
...
# XGEN: Import the xgen_tools utilities.
from xgen_tools import *


def make_parser():
    ...


# XGEN: Add an extra argument so that the XGen control variables args_ai
# can be passed to the rest of the training script.
@logger.catch
def main(exp: Exp, args, args_ai):
    if exp.seed is not None:
        random.seed(exp.seed)
        torch.manual_seed(exp.seed)
        cudnn.deterministic = True
        warnings.warn(
            "You have chosen to seed training. This will turn on the CUDNN deterministic setting, "
            "which can slow down your training considerably! You may see unexpected behavior "
            "when restarting from checkpoints."
        )

    # set environment variables for distributed training
    configure_nccl()
    configure_omp()
    cudnn.benchmark = True

    # XGEN: Pass args_ai to get_trainer.
    trainer = exp.get_trainer(args, args_ai)
    trainer.train()


# XGEN: XGen needs to find a training_main function; create it with the following content.
def training_main(args_ai=None):
    configure_module()
    args = make_parser().parse_args()
    # XGEN: Call xgen_init.
    args, args_ai = xgen_init(args, args_ai)
    # XGEN: The 'training_script_args' field contains the XGen control variables.
    xgen_control_args = args_ai['training_script_args']
    # XGEN: For multi-GPU training, determine the number of GPUs.
    args.devices = xgen_control_args['device_num']
    # XGEN: Use the exp_file parameter in the XGen config to select the exp_file
    # of YOLOX (XXXX is a placeholder).
    exp = get_exp(XXXX)
    exp.merge(args.opts)
    # XGEN: Connect the control variables.
    exp.data_dir = xgen_control_args['train_data_path']
    exp.max_epoch = xgen_control_args['common_train_epochs']
    exp.num_classes = xgen_control_args['num_classes']
    # XGEN: This part is for scaling (Step-11 below).
    # A simple change would be to scale only the width of the network:
    #     exp.width = exp.width * xgen_control_args['scaling_factor']
    # We suggest the following change instead, as it empirically gives better results.
    if args.scaling_factor != 1:
        exp.width = exp.width * xgen_control_args['scaling_factor'] + 0.3375
    else:
        exp.width = exp.width * xgen_control_args['scaling_factor']
    exp.depth = exp.depth * xgen_control_args['scaling_factor']
    # XGEN: Disable EMA, as it can affect the accuracy of pruning sparsity statistics.
    exp.ema = False
    check_exp_value(exp)

    if not args.experiment_name:
        args.experiment_name = exp.exp_name

    num_gpu = get_num_devices() if args.devices is None else args.devices
    assert num_gpu <= get_num_devices()

    if args.cache is not None:
        exp.dataset = exp.get_dataset(cache=True, cache_type=args.cache)

    dist_url = "auto" if args.dist_url is None else args.dist_url
    launch(
        main,
        num_gpu,
        args.num_machines,
        args.machine_rank,
        backend=args.dist_backend,
        dist_url=dist_url,
        args=(exp, args, args_ai),
    )


# XGEN: Create an entry point that calls training_main.
if __name__ == "__main__":
    args_ai = None
    training_main(args_ai)
```
Make corresponding modifications to `YOLOX/yolox/core/trainer.py`, which defines the trainer used in the main training script. Add the following to the beginning of the file:

```python
# XGEN: Import XGen's built-in optimization methods at the beginning of the file.
from co_lib import Co_Lib as CL
from xgen_tools import xgen_record, xgen_init, xgen_load, XgenArgs, helper

helper.ENABLE_HELPER = False
```
Then modify `__init__` to connect the control variables with the trainer:

```python
class Trainer:
    # XGEN: Add argument args_ai to access the XGen control variables.
    def __init__(self, exp: Exp, args, args_ai):
        # The init function only defines some basic attrs; other attrs,
        # like model and optimizer, are built in the before_train method.
        self.exp = exp
        self.args = args
        # XGEN: Connect the XGen controls with the training process.
        self.args_ai = args_ai
        self.epoch = 0
        ...
```
Because of the `args_ai` parameter added to the `trainer = exp.get_trainer(args, args_ai)` line in `train_script_main.py`, a corresponding modification is needed in `YOLOX/yolox/exp/yolox_base.py`:

```python
# XGEN: Add argument args_ai.
def get_trainer(self, args, args_ai):
    from yolox.core import Trainer

    # XGEN: Pass args_ai in the call to Trainer.
    trainer = Trainer(self, args, args_ai)
    ...
```
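To summarize Steps 2 and 3, the wiring between the XGen control variables and YOLOX's `Exp` object can be read as one small function. The helper below is hypothetical and purely illustrative; in the actual script the same assignments live inline in `training_main`:

```python
# Hypothetical helper mirroring the wiring done inline in training_main.
def apply_xgen_controls(exp, args, args_ai):
    ctl = args_ai['training_script_args']
    args.devices = ctl['device_num']            # number of GPUs to use
    exp.data_dir = ctl['train_data_path']       # training data location
    exp.max_epoch = ctl['common_train_epochs']  # epoch budget set by XGen
    exp.num_classes = ctl['num_classes']        # dataset class count
    sf = ctl['scaling_factor']                  # Step-11 network scaling
    exp.width = exp.width * sf + (0.3375 if sf != 1 else 0.0)
    exp.depth = exp.depth * sf
    exp.ema = False                             # EMA off for pruning statistics
    return exp, args
```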
- Steps-4 and 5. Insert `xgen_load` for DNN model loading, and initialize XGen's optimization. The changes go where the model is loaded, before training starts. For YOLOX, the location is the function `before_train(self)` in `YOLOX/yolox/core/trainer.py`; after the modification, the following is inserted into that function:

```python
# XGEN: After the model, train_loader, and optimizer are defined, add the following code.
# XGEN: Use the xgen_load function to load pretrained model weights.
xgen_load(model, self.args_ai)
# XGEN: The optimization method initializes its parameters here.
cl_model = CL.init(args=self.args_ai, model=model,
                   optimizer=self.optimizer, data_loader=self.train_loader)
if cl_model:
    model = cl_model
```
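The same pattern applies beyond YOLOX: build the model, optimizer, and data loader first, then load weights with `xgen_load`, then hand everything to `CL.init`. Below is a minimal generic sketch with toy stand-ins for the model and loader, assuming only the `xgen_tools` and `co_lib` calls shown in this document:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

from co_lib import Co_Lib as CL
from xgen_tools import xgen_load


def build_for_xgen(args_ai):
    # Toy stand-ins; in YOLOX these objects are built inside before_train.
    model = nn.Sequential(
        nn.Conv2d(3, 8, 3), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
    )
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loader = DataLoader(TensorDataset(torch.randn(8, 3, 32, 32)), batch_size=4)

    # 1) Load pretrained weights under XGen's control.
    xgen_load(model, args_ai)
    # 2) Let XGen's optimization methods wrap the model if needed.
    cl_model = CL.init(args=args_ai, model=model,
                       optimizer=optimizer, data_loader=loader)
    if cl_model:
        model = cl_model
    return model, optimizer, loader
```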
- Steps-6,7,8,9,10. Add XGen API calls inside the training loop to keep track of the training status and apply controls, and finally output the optimized model. As the training loop is inside `YOLOX/yolox/core/trainer.py`, the changes are in that file:

```python
def train_in_epoch(self):
    # XGEN: XGen updates its parameters once before and after each training epoch,
    # so the XGen update methods are added to train_in_epoch.
    # XGEN: In XGen's compatibility test, max_epoch is set to 0 and the accuracy of
    # the model is evaluated directly; the following code supports that mode.
    if self.max_epoch == 0:
        self.evaluate_and_save_model()
        # XGEN: Save the model weights and best AP; XGen converts the model to ONNX.
        xgen_record(self.args_ai, self.model, self.best_ap, epoch=-1)
    for self.epoch in range(self.start_epoch, self.max_epoch):
        # XGEN: XGen-related optimization hook, before each epoch.
        CL.before_each_train_epoch(epoch=self.epoch)
        self.before_epoch()
        self.train_in_iter()
        self.after_epoch()
        # XGEN: XGen-related optimization hook, after each epoch.
        CL.after_scheduler_step(epoch=self.epoch)
```

```python
def train_one_iter(self):
    # XGEN: XGen's optimization generates its own loss term, which must be updated
    # together with the original training loss in train_one_iter so that both are
    # optimized jointly.
    iter_start_time = time.time()

    inps, targets = self.prefetcher.next()
    inps = inps.to(self.data_type)
    targets = targets.to(self.data_type)
    targets.requires_grad = False
    inps, targets = self.exp.preprocess(inps, targets, self.input_size)
    data_end_time = time.time()

    with torch.cuda.amp.autocast(enabled=self.amp_training):
        outputs = self.model(inps, targets)

    loss = outputs["total_loss"]
    # XGEN: Update the loss with XGen.
    loss = CL.update_loss(loss)
```

```python
# XGEN: Add code in the after_epoch function to save the optimized models.
def after_epoch(self):
    self.save_ckpt(ckpt_name="latest")
    # XGEN: Save the optimized models.
    xgen_record(self.args_ai, self.model, self.best_ap, epoch=self.epoch)

    if (self.epoch + 1) % self.exp.eval_interval == 0:
        all_reduce_norm(self.model)
        self.evaluate_and_save_model()
```
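Putting Steps 6-10 together, the hooks wrap the training loop in a fixed order: the pre-epoch update, the per-iteration loss update, the checkpoint record, and the post-epoch update. The condensed loop below is a sketch of that ordering with a toy loss, not YOLOX code; it assumes only the `co_lib` and `xgen_tools` calls introduced above (e.g., it can be fed the toy objects from the `build_for_xgen` sketch):

```python
from co_lib import Co_Lib as CL
from xgen_tools import xgen_record


def toy_train(model, optimizer, loader, args_ai, start_epoch=0, max_epoch=2):
    best_metric = 0.0  # stand-in for YOLOX's best_ap
    for epoch in range(start_epoch, max_epoch):
        CL.before_each_train_epoch(epoch=epoch)       # XGen pre-epoch hook
        for (x,) in loader:
            loss = model(x).square().mean()           # stand-in training loss
            loss = CL.update_loss(loss)               # fold in XGen's loss term
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
        xgen_record(args_ai, model, best_metric, epoch=epoch)  # record progress
        CL.after_scheduler_step(epoch=epoch)          # XGen post-epoch hook
```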
- Step-11. To enable network scaling, modify `train_script_main.py` following Configure Scaling for XGen so that XGen can control the width and depth of the network (a numeric illustration follows this step):

```python
# XGEN: This part is for scaling (Step-11).
# A simple change would be to scale only the width of the network:
#     exp.width = exp.width * xgen_control_args['scaling_factor']
# We suggest the following change instead, as it empirically gives better results.
if args.scaling_factor != 1:
    exp.width = exp.width * xgen_control_args['scaling_factor'] + 0.3375
else:
    exp.width = exp.width * xgen_control_args['scaling_factor']
exp.depth = exp.depth * xgen_control_args['scaling_factor']
```
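To make the suggested rule concrete, here is what it produces for the yolox-m base values (`width` 0.75 and `depth` 0.67, as in the example config in Section 2). The `+ 0.3375` offset shifts scaled widths upward relative to plain multiplication, which the step above reports as empirically better:

```python
base_width, base_depth = 0.75, 0.67  # yolox-m, as in the example config

for sf in (1.0, 0.6, 0.2):
    width = base_width * sf if sf == 1 else base_width * sf + 0.3375
    depth = base_depth * sf
    print(f"scaling_factor={sf}: width={width:.4f}, depth={depth:.4f}")
# scaling_factor=1.0: width=0.7500, depth=0.6700
# scaling_factor=0.6: width=0.7875, depth=0.4020
# scaling_factor=0.2: width=0.4875, depth=0.1340
```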
- Extra step. The default YOLOX has a special limitation: when `64 * scaling_factor` is not an integer, the output tensor shapes of one layer can mismatch the input shapes expected by the next layer, due to rounding bugs in the YOLOX code. Removing the bug unlocks the flexibility needed for model compression; a quick check of the mismatch is sketched below.
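To see where the mismatch comes from, compare the two rounding formulas in stock YOLOX (this sketch assumes the unmodified code, where `CSPDarknet` computes `base_channels = int(wid_mul * 64)` while `YOLOPAFPN` computes per-layer channels as `int(in_channels[i] * width)`):

```python
import math

width = 0.7125  # hypothetical scaled width; 64 * 0.7125 = 45.6 is not an integer

# Stock YOLOX: the backbone floors the base channel count...
backbone_dark5 = int(64 * width) * 16  # int(45.6) * 16 = 720
# ...while the FPN floors the full product independently.
fpn_expects = int(1024 * width)        # int(729.6) = 729
print(backbone_dark5, fpn_expects)     # 720 != 729 -> shape mismatch

# The modified files below instead derive all FPN and head channel counts
# from a single ceil-ed base, so they always agree with each other.
base = math.ceil(64 * width)           # 46
print(base * 16)                       # 736 on both sides -> shapes match
```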
The corresponding modified files are `YOLOX/yolox/models/yolo_pafpn.py` and `YOLOX/yolox/models/yolo_head.py`. The specific changes are as follows. In the `__init__` function in `yolo_pafpn.py`:
```python
#!/usr/bin/env python
# -*- encoding: utf-8 -*-
# Copyright (c) Megvii Inc. All rights reserved.

import math

import torch
import torch.nn as nn

from .darknet import CSPDarknet
from .network_blocks import BaseConv, CSPLayer, DWConv


class YOLOPAFPN(nn.Module):
    """
    YOLOv3 model. Darknet 53 is the default backbone of this model.
    """

    def __init__(
        self,
        depth=1.0,
        width=1.0,
        in_features=("dark3", "dark4", "dark5"),
        in_channels=[256, 512, 1024],
        depthwise=False,
        act="silu",
    ):
        super().__init__()
        self.backbone = CSPDarknet(depth, width, depthwise=depthwise, act=act)
        self.in_features = in_features
        self.in_channels = in_channels
        Conv = DWConv if depthwise else BaseConv

        self.upsample = nn.Upsample(scale_factor=2, mode="nearest")
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.lateral_conv0 = BaseConv(
            int(math.ceil(64 * width) * 16), int(math.ceil(64 * width) * 8), 1, 1, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_p4 = CSPLayer(
            int(2 * math.ceil(64 * width) * 8),
            int(math.ceil(64 * width) * 8),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )  # cat

        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.reduce_conv1 = BaseConv(
            int(math.ceil(64 * width) * 8), int(math.ceil(64 * width) * 4), 1, 1, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_p3 = CSPLayer(
            int(2 * math.ceil(64 * width) * 4),
            int(math.ceil(64 * width) * 4),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.bu_conv2 = Conv(
            int(math.ceil(64 * width) * 4), int(math.ceil(64 * width) * 4), 3, 2, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_n3 = CSPLayer(
            int(2 * math.ceil(64 * width) * 4),
            int(math.ceil(64 * width) * 8),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )

        # bottom-up conv
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.bu_conv1 = Conv(
            int(math.ceil(64 * width) * 8), int(math.ceil(64 * width) * 8), 3, 2, act=act
        )
        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        self.C3_n4 = CSPLayer(
            int(2 * math.ceil(64 * width) * 8),
            int(math.ceil(64 * width) * 16),
            round(3 * depth),
            False,
            depthwise=depthwise,
            act=act,
        )
```
And in the `__init__` function in `yolo_head.py`:
```python
#!/usr/bin/env python3
# -*- coding:utf-8 -*-
# Copyright (c) Megvii Inc. All rights reserved.

import math

from loguru import logger
import torch
import torch.nn as nn
import torch.nn.functional as F

from yolox.utils import bboxes_iou, cxcywh2xyxy, meshgrid, visualize_assign

from .losses import IOUloss
from .network_blocks import BaseConv, DWConv


class YOLOXHead(nn.Module):
    def __init__(
        self,
        num_classes,
        width=1.0,
        strides=[8, 16, 32],
        in_channels=[256, 512, 1024],
        act="silu",
        depthwise=False,
    ):
        """
        Args:
            act (str): activation type of conv. Default value: "silu".
            depthwise (bool): whether to apply depthwise conv in conv branch. Default value: False.
        """
        super().__init__()
        self.num_classes = num_classes
        self.decode_in_inference = True  # for deploy, set to False

        self.cls_convs = nn.ModuleList()
        self.reg_convs = nn.ModuleList()
        self.cls_preds = nn.ModuleList()
        self.reg_preds = nn.ModuleList()
        self.obj_preds = nn.ModuleList()
        self.stems = nn.ModuleList()
        Conv = DWConv if depthwise else BaseConv

        # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
        for i in range(len(in_channels)):
            self.stems.append(
                BaseConv(
                    in_channels=int(math.ceil(64 * width) * (2 ** (i + 1)) * 2),
                    out_channels=int(math.ceil(64 * width) * 4),
                    ksize=1,
                    stride=1,
                    act=act,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.cls_convs.append(
                nn.Sequential(
                    *[
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.reg_convs.append(
                nn.Sequential(
                    *[
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                        Conv(
                            in_channels=int(math.ceil(64 * width) * 4),
                            out_channels=int(math.ceil(64 * width) * 4),
                            ksize=3,
                            stride=1,
                            act=act,
                        ),
                    ]
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.cls_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=self.num_classes,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.reg_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=4,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )
            # XGEN: Modify the in_channels/out_channels calculation to support arbitrary widths.
            self.obj_preds.append(
                nn.Conv2d(
                    in_channels=int(math.ceil(64 * width) * 4),
                    out_channels=1,
                    kernel_size=1,
                    stride=1,
                    padding=0,
                )
            )

        self.use_l1 = False
        self.l1_loss = nn.L1Loss(reduction="none")
        self.bcewithlog_loss = nn.BCEWithLogitsLoss(reduction="none")
        self.iou_loss = IOUloss(reduction="none")
        self.strides = strides
        self.grids = [torch.zeros(1)] * len(in_channels)
```
All the above file modifications can be found in the YOLOX code included in XGen, under `/root/Projects/object-detection-yolox/yolox_xgen`.
2. Prepare the config file
XGen's optimization methods are launched through configuration files, so when integrating your own model into XGen you need to prepare a corresponding configuration file. An example is as follows:
```json
{
    "general": {
        "user_id": "test",
        "work_place": null,
        "tran_scripts_path": null,
        "pretrained_model_path": "/root/Projects/object-detection-yolox/yolox_xgen/xgen.pt",
        "enable_ddp": false,
        "CUDA_VISIBLE_DEVICES": "0",
        "random_seed": 3407
    },
    "user_requirements": {
        "power": null,
        "accuracy": null,
        "accuracy_reverse_yn": 0,
        "model_size": null,
        "memory_size": null,
        "latency": 90,
        "margin": 0.1,
        "target_type": "latency",
        "max_sampling": 10,
        "searching_variable": "scaling_factor",
        "searching_range": [0.2, 1],
        "searching_granularity": null
    },
    "compiler": {
        "input_shape": "(1,3,640,640)",
        "opset_version": 11
    },
    "training_script_args": {
        "train_data_path": "/data/object-detection-yolox/coco",
        "train_label_path": null,
        "eval_data_path": "/data/object-detection-yolox/coco",
        "eval_label_path": null,
        "dataset_path": "/data/object-detection-yolox/coco",
        "learning_rate": 0.01,
        "common_train_epochs": 300,
        "device_num": 1,
        "batch_size": 16,
        "root_path": "./Xgen/",
        "name": "yolox-m",
        "exp_file": null,
        "weights": null,
        "device": null,
        "imgsz": 640,
        "width": 0.75,
        "depth": 0.67,
        "num_classes": 80,
        "scaling_factor": 1,
        "workers": 16,
        "noplots": true
    },
    "pruning": {
        "sp_retrain": false,
        "sp_admm": false,
        "sp_config_file": null,
        "sp_admm_update_epoch": 5,
        "sp_admm_rho": 0.001,
        "sparsity_type": "block_punched",
        "sp_admm_lr": 0.01,
        "sp_global_weight_sparsity": false,
        "sp_admm_block": "(8,4)",
        "sp_prune_ratios": 0,
        "sp_prune_threshold": -1.0
    },
    "pas": {
        "pas_ratio": 0,
        "pas": false,
        "limit_loss_weights": 5.0,
        "use_limit_loss": false,
        "pas_debug": false,
        "pas_rebuild": false,
        "pas_finetune_epoch": 200,
        "pas_pretrained_weight_path": null,
        "pas_ignore": null,
        "pas_searching_ratio": [0.1],
        "part_name": "backbone"
    },
    "train": {
        "log_save_folder": "logs",
        "eval_dataset_id": null,
        "training_dataset_id": null,
        "uuid": "yolox",
        "accuracy_reverse_yn": 0,
        "common_save_best_yn": 1
    },
    "scaling": {
        "searching_variable": "scaling_factor",
        "searching_range": [0.2, 1],
        "searching_granularity": 0.05
    }
}
```
Please see the configuration included in XGen for YOLOX at `/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json`.
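Before launching XGen, it can help to sanity-check the fields the latency-targeted search depends on. The snippet below is a minimal sketch that only reads the JSON file cited above; the field names are the ones shown in the example:

```python
import json

# Path of the YOLOX example configuration cited above.
cfg_path = "/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json"

with open(cfg_path) as f:
    cfg = json.load(f)

req = cfg["user_requirements"]
# The search tunes 'scaling_factor' within 'searching_range' toward the latency
# target, so the user_requirements and scaling sections should agree.
assert req["searching_variable"] == cfg["scaling"]["searching_variable"]
print("target:", req["target_type"], "=", req["latency"], "with margin", req["margin"])
print("search range:", req["searching_range"],
      "granularity:", cfg["scaling"]["searching_granularity"])
```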
3. Other considerations
In multi-GPU training, the optimized COCO evaluation (which makes multi-process C++ calls) can lead to process conflicts. It is recommended to remove the following code, at line 293 of `YOLOX/yolox/evaluators/coco_evaluator.py`:

```python
try:
    from yolox.layers import COCOeval_opt as COCOeval
except ImportError:
    from pycocotools.cocoeval import COCOeval
    logger.warning("Use standard COCOeval.")
```

and use the standard implementation unconditionally instead:

```python
from pycocotools.cocoeval import COCOeval
logger.warning("Use standard COCOeval.")
```
4. Dataset Preparation
Currently, YOLOX only supports the original COCO dataset format. To use your own dataset, you will need to convert it to the COCO format; please see Dataset Preparation for Object Detection (YOLOX) for details. The directory structure of the dataset is as follows:

```
datasets/
    coco/
        annotations/
            instances_train2017.json
            instances_val2017.json
        train2017/
            (image files for training)
        val2017/
            (image files for validation)
```
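To confirm the layout before training, a quick existence check can save a failed run. This is a minimal sketch; the root path below matches the `train_data_path` used in the example config and should be adjusted to your dataset location:

```python
import os

# Root of the COCO-format dataset (matches train_data_path in the example config).
root = "/data/object-detection-yolox/coco"

expected = [
    "annotations/instances_train2017.json",
    "annotations/instances_val2017.json",
    "train2017",
    "val2017",
]
for rel in expected:
    path = os.path.join(root, rel)
    print(("OK      " if os.path.exists(path) else "MISSING ") + path)
```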