
Chapter 4. Usage of XGen

XGen offers two ways to help users obtain desirable AI models and code:

  • Common AI: [First option to consider] This approach uses the already included AI models in XGen as the base for its optimizations. XGen includes a set of AI models for common AI tasks. Users may choose to let XGen optimize the models on the default or their own datasets. This approach should be considered first as it is simple and can meet most needs.

  • Custom AI: [Only if Common AI is not applicable] This approach allows users to invoke XGen to optimize their own AI models. This approach is more flexible but also more complicated, requiring more work from the users. To optimize their own models, the users must have a workable training script of their AI model, written in PyTorch (other front-end support will be added in the future).

In addition to optimizing AI, XGen has several other functionalities that may sometimes come in handy, such as AI model training, AI model/code quality assessment, and so on. This chapter explains those functionalities after describing the two main ways of using XGen for AI optimizations.

Overview of XGen Modes and Executions

XGen has a command-line interface. When users start XGen in a shell terminal, XGen collects the users' requirements and other inputs through a series of prompts. After that, XGen starts its execution. XGen provides five modes:

  • Compatibility testing: to test whether the to-be-optimized AI model and its training script are compatible with XGen. The execution of this mode usually takes a much shorter time than other modes do.

  • Pruning: to figure out which weights to prune from the to-be-optimized AI model and prune them such that the model becomes smaller and faster. To apply it to a custom AI model, the user needs to specify the pruning range and other parameters (detailed later). For now, this mode is available only for Android devices.

  • Scaling: to figure out the width of each layer and scale the network accordingly such that the model becomes smaller and faster. It is an alternative to pruning for model compression and optimization. To apply it to a custom AI model, the user needs to embed a search variable in their model and specify some parameters (detailed later).

  • PaS: to figure out the width of each layer through a model's computational graph and scale the network accordingly such that the model becomes smaller and faster. It is an alternative to pruning for model compression and optimization. To apply it to a custom AI model, the user needs to specify the pas_ratio and other parameters (detailed later).

  • Customization: to allow customized usage of XGen, which can be pruning an AI model in a specific way or training a model with a specific configuration.

The pruning, scaling, and PaS modes conduct an automatic search of the optimization space and generate the optimization results that best meet users' requirements. The customization mode does not conduct a search, but executes training or pruning in a specific way defined by the user. It is worth noting that XGen also offers distillation, a way to enhance model accuracy; it can be used in both the pruning and scaling modes. All five modes can be used in both common AI and custom AI, explained next.

Common AI

XGen includes a set of base DNN models for some common types of AI tasks, listed in the following table along with their default datasets. DigitNet is included mainly for demo purposes, as it completes in several minutes. Please note that these models are only samples of the many models XGen supports. Most AI models are supported by XGen thanks to its broad coverage of DNN operators, as detailed in Appendix I.

| Category | Model | Dataset | Note |
| --- | --- | --- | --- |
| Image classification | DigitNet | MNIST | 10 classes |
| Image classification | EfficientNet | ImageNet (auto-download) | 1000 classes |
| Image classification | ResNet | ImageNet (auto-download) | 1000 classes |
| Image classification | MobileNet | ImageNet (auto-download) | 1000 classes |
| Image classification | ViT | ImageNet (auto-download) | 1000 classes |
| Segmentation | UNet | ISBI-2012 (included) | -- |
| Video classification | R2+1d | UCF101 (auto-download) | 101 classes |
| Natural Language Processing (Question Answering) | MobileBERT | Squad1.1 (auto-download) | -- |
| Object detection | YOLOv6 | CoCo (auto-download) | 80 classes |
| Object detection | YOLOX | CoCo (auto-download) | 80 classes |
| Object detection | YOLOv8 | CoCo (auto-download) | 80 classes |

If the AI task needed by a user is in one of the covered categories, common AI is the way to go. Through XGen's interface, users may pick one of the models that suits their task, and input their desired latency or accuracy and the dataset of interest. XGen will then transform the base model and generate code that meets the users' needs. The specific procedure is as follows.

Step 1: Start XGen

 XGen  # assuming that the XGen container was started already (Chapter 2)

Step 2: Input the Requirements

After XGen is started, it will collect the user's needs through a series of questions.

  • Target Platform: Choose Android or iOS as the target platform. XGen will generate the optimized AI model and code for the selected platform. If no devices are available, the user can choose Simulation as the platform, in which case XGen will still optimize the AI model to find the smallest model that meets the desired accuracy, even though the reported latencies are imprecise; what it produces will be optimized model files rather than generated model code for a particular platform.

  • Target device(s): The wizard lists the devices of the chosen platform currently connected to this machine. The user may select zero or more devices of interest. XGen will try to generate the AI model and code that best meet the requirements on all of the selected devices. If no device is selected (or no devices are connected/listed), XGen will behave as if Simulation were chosen as the platform: it uses pseudo-latency in its optimizations; the exact latency values are imprecise, but the quality scores measured in the optimization process are still accurate; what it produces will be optimized model files rather than generated model code for a particular platform.

  • Pick the base model: XGen then lists the already included base AI models (along with their default datasets). Users pick the model that best fits their needs. If none of the models fit the nature of the task, the user may consider Custom AI of XGen.

  • Specify datasets: If users want to use datasets other than the default datasets associated with the picked AI model, the users need to specify the datasets of interest for training and evaluation. In that case, the users have to make sure that the datasets are organized and formatted in the same way as the default dataset of the model (detailed in Appendix II). Otherwise, the users would need to revise the data loader in the default training script of the model. If a default dataset is chosen but the dataset is not included in XGen, XGen will automatically download the dataset.

  • Specify other model parameters: If non-default datasets are used, besides the dataset paths, some model parameters may need to be specified as well, such as the number of classes for classification models.

  • Pick the task for XGen to perform. XGen lists five options as follows:

    • Compatibility test: a quick check of whether the model and setup are compatible with XGen. Such a test is much quicker than optimizations (pruning or scaling). So it is a good practice to run this test first before starting an optimization to find out possible problems sooner.

    • Pruning: one of the ways that XGen uses to compress a model and generate efficient code. It uses the world-leading proprietary pruning technology of CoCoPIE to prune DNN weights.

    • Scaling: one of the ways XGen uses to compress a model, which scales down (or up) the width of DNN layers.

    • PaS: one of the ways XGen uses to compress a model, which scales down the width of DNN layers.

    • Customization: customized operation. Users can change the XGen configuration of a model to enable special operations, detailed later in Advanced Common AI.

The three compression methods, pruning, scaling, and PaS, suit different AI models differently. Their differences are detailed later in Custom AI. For a model in Common AI, the XGen prompt already marks the recommended choice, as shown in the following example:

Pruning Recommended for this model

When some of the common AI models are selected, XGen may show only one of pruning and scaling in the option list because the other mode does not apply to the model at all.

  • If one of Image classification I: EfficientNet (high resolution, default: ImageNet), Image classification II: ResNet (high resolution, default: ImageNet), or Image classification III: MobileNet (high resolution, default: ImageNet) is chosen as the base model, the user can also optionally optimize the model with distillation, detailed later in Configure Distillation; XGen will prompt several relevant questions:

    ```bash
    # example: using the default distillation model
    Do you want to use distillation?
    > Yes
    No
    
    Will you use the default distillation model?
    > Yes
    No
    ...
    # alternatively, users can use their own distillation model
    
    Do you want to use distillation?
    > Yes
    No
    
    Will you use the default distillation model?
    Yes
    > No
    
    Teacher model name: regnety_160
    Teacher model path: /root/Projects/classification-multi-model-efficientnet/timm/teacher_models/regnety_160-a5fe301d.pth
    Number of classes: 1000
    ```
    
  • XGen then asks for the optimization objectives (except in the Compatibility test mode). Users may specify either "Quality score" or "Latency" as the primary interest, and the other as the secondary interest. XGen will try to meet the primary interest first and then try to meet the secondary interest. The acceptable specifications are as follows:

    • Latency: the format is NumberUnit, where Number can be either an integer or a decimal number, and Unit can be "ms" (milliseconds), "s" (seconds), "m" (minutes), or "hr" (hours); for example, "30ms" or "0.5s". If latency is the secondary interest, users may also input min to indicate the intent to minimize the latency while meeting the primary goal.

    • Quality score: the format is op number, where op can be >, <, >=, or <=, and number can be either an integer or a decimal number; for example, ">= 0.7". Quality score corresponds to the score used to measure the quality of the AI model. It can be accuracy for classification models, PSNR for image enhancement, loss for some other tasks, and so on. No matter what the actual measure is, XGen represents it uniformly as Quality score and handles it in a unified manner. If quality score is the secondary interest, users may also input min or max to indicate the intent to minimize or maximize the quality score while meeting the latency goal.

  • XGen then asks which GPU(s) the user would like XGen to use. The user may select one or multiple GPUs; XGen will try to conduct training on all the selected GPUs if the training script of the model supports the use of multiple GPU devices.

  • XGen then asks the users to input their desired training parameters of the AI model (e.g., batch size, learning rate, fine-tuning epochs).

  • XGen then asks for the path of a pretrained model if there is one (except in the Compatibility test mode). If none, XGen will create a pretrained model before doing the model optimizations.

  • XGen will automatically determine the level of optimization based on the user's selections in earlier questions. Depending on the chosen models and other selections, XGen will default to using Extensive Optimization.

    • Extensive Optimization (taking several rounds of training time): Users don't need to make any choices, as this is the default optimization method provided by XGen. This option is executed by default when users select Latency as the target. XGen first searches within its built-in models to see if there is one that satisfies the user's requirements. If a suitable model is found, that built-in model is used to optimize on the corresponding data. If no suitable built-in model is found, XGen resorts to the optimization method previously chosen by the user to obtain an optimized model.
  • Finally, XGen asks for xgen-workplace. It is a directory where the user wants to store models and data. If the user skips this question, XGen will create a unique folder and inform the user.

If the directory already contains the results of a previous run of XGen, XGen will check whether the XGen configuration used in the previous run is identical to the configuration in this current run. If so, XGen would ask the user whether to continue its work from where the previous run stopped to avoid unnecessary repeated work.

Step 3: Wait for the Results

After inputting the requirements, XGen will start compressing and optimizing the base AI model. The process may include a series of searches and one or several rounds of model training. The time it takes depends on how long the model training takes. For the models listed above, the time ranges from several minutes to several days. For reference, below is a list showing the training time of each of the models on their default datasets:

* DigitNet: 5-10 minutes on a 1-card TitanX
* EfficientNet: 2 days on a 4-card V100
* ResNet: 3 days on a 4-card V100
* MobileNet: 7 days on a 4-card 3090
* ViT: 7 days on a 4-card V100
* UNet: 1 hour on a 1-card 3090 Ti
* R2+1d: 2-3 days on an 8-card TitanX
* YOLOv6: 7 days on a 2-card 3090
* YOLOX: 7 days on a 2-card 3090
* YOLOv8: 5 days on a 2-card 3090

During the execution of XGen, the status will be printed on the screen in real time, including the actual latency, quality score, and model size of the models generated by XGen. At the end, the screen will show the path where the final results are stored. Please see XGen Results for a detailed explanation of the final results and how to use them.

Use Non-Default Datasets

In Common AI, each AI model comes with a default dataset. XGen supports the use of other datasets. To take that option, the users need to ensure that the datasets are in a format and structure compatible with the expectation of XGen. Please refer to Appendix II for details.

Advanced Common AI

The standard common AI already produces results far more competitive than those of other existing tools. Advanced common AI may allow XGen to achieve even better results. To use it, the user needs to adjust the content of the XGen configuration file associated with the AI model, and then follow the same procedure as described earlier.

The XGen configuration file determines many aspects of the optimizations by XGen. Each of the included AI models has one. Users can create the configuration for their own AI models as well. The following list provides the locations of the configurations of the included AI models and some other info.

  • Image Classification 0: DigitNet (MNIST)
    • XGen config path: /root/Projects/classification-digitnet/xgen.json
    • Training script path: /root/Projects/classification-digitnet/
    • Recommended mode: pruning
  • Image classification I: EfficientNet (ImageNet)
    • XGen config path: /root/Projects/classification-multi-model-efficientnet/timm/efficientnet_config/xgen.json
    • Training script path: /root/Projects/classification-multi-model-efficientnet/timm
    • Recommended mode: scaling
  • Image classification II: ResNet (ImageNet)
    • XGen config path: /root/Projects/classification-multi-model-resnet/timm/resnet_config/xgen.json
    • Training script path: /root/Projects/classification-multi-model-resnet/timm
    • Recommended mode: pruning
  • Image classification III: MobileNet (ImageNet)
    • XGen config path: /root/Projects/classification-mobilenet/timm/mobilenet_config/xgen.json
    • Training script path: /root/Projects/classification-mobilenet/timm/
    • Recommended mode: scaling
  • Image classification IV: ViT (ImageNet)
    • XGen config path: /root/Projects/classification-vit/timm/vit_config/xgen.json
    • Training script path: /root/Projects/classification-vit/timm/
    • Recommended mode: scaling
  • Segmentation: UNet (ISBI-2012)
    • XGen config path: /root/Projects/segmentation-unet/unet_config/xgen.json
    • Training script path: /root/Projects/segmentation-unet/
    • Recommended mode: scaling
  • Video classification: R2+1d (UCF101)
    • XGen config path: /root/Projects/video-classification-s2-1d/s2+1d_config/xgen.json
    • Training script path: /root/Projects/video-classification-s2-1d/
    • Recommended mode: pruning
  • Natural language processing: MobileBERT (Squad1.1)
    • XGen config path: /root/Projects/nlp-mobilebert/mobilebert_config/xgen.json
    • Training script path: /root/Projects/nlp-mobilebert/
    • Recommended mode: scaling
  • Object detection: YOLOv6 (CoCo2017)
    • XGen config path: /root/Projects/object-detection-yolov6/yolov6_xgen/yolov6_config/xgen.json
    • Training script path: /root/Projects/object-detection-yolov6/yolov6_xgen
    • Recommended mode: scaling
  • Object detection: YOLOX (CoCo2017)
    • XGen config path: /root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json
    • Training script path: /root/Projects/object-detection-yolox/yolox_xgen
    • Recommended mode: scaling
  • Object detection: YOLOv8 (CoCo2017)
    • XGen config path: /root/Projects/object-detection-yolov8/yolov8_xgen/yolov8_config/xgen.json
    • Training script path: /root/Projects/object-detection-yolov8/yolov8_xgen
    • Recommended mode: scaling

Users can change the values of the parameters specified in the configuration files to change how XGen works with the model. Next is an explanation of the content of an XGen model configuration. Hint: Before changing the default configuration of a common AI model, users should create a backup copy of the configuration file (e.g., cp xgen.json xgen.json.bak) for easy resetting when necessary.

Model Configuration for XGen

This part explains the essential entries in an XGen configuration by drawing on the following example.

{

  # general parameters of XGen.
  "general": {
    "work_place": null, # the folder for XGen to store results
    "pretrained_model_path": null, # pretrained model location
    "enable_ddp": true, # enable DDP training
    "CUDA_VISIBLE_DEVICES":  "0,1,2,3", # which GPUs are used
  },

  # parameters of the original training script of the AI model can be put into this section
  # XGen will pass them into the training script
  "training_script_args": {
    "common_train_epochs": 30, # the number of epochs in one training
    # Optional: Any argument of the training_main script can appear here
    "video-scale": 2,
    "learning-rate": 0.01,
    "batch_size": 16, # the batch size per GPU
  },

  "user_requirements": {
    # Auto generated options from XGen interface. Do not change.
    "accuracy": null,
    "latency": 30,
    "use_distillation": true,
    "use_default_distillation_model": false,
    # Other options that are manually changeable.
    "margin": 2, # Acceptable error margin for the termination of XGen optimization
    "max_sampling": 3     # The max sampling number in the pruning or scaling mode
                          # If latency is primary: suggested value is 20;
                          # If accuracy is primary: suggested value is at least 3, no more than 5
    ### distillation default model options
    "teacher_model": "efficientnet_b4",   # default model name
    "teacher_path": "/model_path_to/efficientnet_b4.pth", # default model weights path
  },

  # parameters defining the pruning of XGen
  "pruning" : {
        "prune_method": "magnitude"  # "magnitude" (for magnitude-based pruning; default) or "admm" (for ADMM pruning) or "null" (no pruning)
        "prune_weight_ranking": global # "global" or "intra-layer"
        "prune_rate": 0.5, # uniform pruning rate (take effect only in the customization mode)
        "prune_layers": null, # the list of layers to prune and their respective pruning rates
                              # If "null", XGen automatically decides the layers to prune

        "admm_lr": 0.01, # admm special learning rate
        "admm_update_epoch": 5, # how frequently we update z and u in admm pruning
        "admm_rho": 0.001, # ( a hyperparameter in admm loss function)
  },

  # parameters defining the scaling of XGen
  "scaling" : {
    "searching_variable": null, # Variable to search; must be a variable in the "pruning" or "training_script_args" section
    "searching_range": null, # A range (e.g., [0,1])
    "searching_granularity": null, # a number determines the smallest unit in the search space; null means continuous space
  },
  "pas": {
    "pas_ratio": 0, # uniform pas scaling rate (take effect only in the customization mode)
    "pas": false , # using pas slection
    "limit_loss_weights": 5.0, # pas limit loss weight which can limit model size
    "use_limit_loss":true, # using limit loss or not 
    "pas_debug": false , # If ture while show loss item
    "pas_rebuild": false , # If true while rebuild model for pas_model
    "pas_finetune_epoch": 200, # number of epochs for fine-tune model for pas_model
    "pas_pretrained_weight_path": null # pretrained pas_model weight path
},
  # distillation key args
  "distillation": {
    "distillation_method" : "classic_distillation", # "null" (no distillation; default) or "classic_distillation" (for classic distillation; )

    # classic_distillation config start
    # general setting
    "original_loss_weights":0.1, # loss weights for original loss which define by user
    # Tag loss setting:
    "tag_loss_weights": 0.9, # tag loss weights
    "tag_loss": 'kl',  # None will ignore tag loss, only support mse loss and kl loss for now
    "tag_temperature": 4,   # tag temperature
    "tag_loss_combination_method": "avg", # if there are multiple teacher, how we combined them, we support "avg" and "random" method for now
    # feature loss setting
    "feature_loss_weights":0.9, # feature loss weights
    "feature_default_temperature":1, # feature default temperature
    "advance_feature_mapping": {}, # teacher_model_name: [student_layer_name, teacher_layer_name,loss_type,weights,{adv_config}]
    # regularization loss
    "regularization_loss_weights": 1, # regulization loss weighst
    "discriminator_lr":1e-4  # tag_discriminator learning rate
    # classic_distillation config end
  },

  "compiler": {
    "input_shape": "(1,3,360,640)", # The shape of the input tensor of the AI model
    "opset_version": 11,  # The version of ONNX (11 or 9); default 11
  }
}

The configuration is a JSON file, consisting of eight sections. The inputs from the XGen interface automatically update the corresponding entries in this JSON file. Users may also directly modify these entries. Most entries are self-explanatory in the example.
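For instance, an entry can also be edited programmatically rather than by hand. Below is a minimal sketch, assuming the shipped file is plain JSON (the # comments in the example above are explanatory only); it uses the DigitNet config path listed earlier and field names from the example:

    import json

    # DigitNet config path from the list above
    cfg_path = "/root/Projects/classification-digitnet/xgen.json"

    with open(cfg_path) as f:
        cfg = json.load(f)

    # field names as in the example configuration above
    cfg["training_script_args"]["batch_size"] = 64

    with open(cfg_path, "w") as f:
        json.dump(cfg, f, indent=2)

Several notes on the individual sections are worth mentioning.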

Configure Pruning

  • XGen has two pruning methods, magnitude-based pruning and ADMM-based pruning. The former prunes the weights with the least magnitudes; the latter uses a more advanced method to select the weights to prune. The parameter prune_weight_ranking determines whether the ranking of weights for pruning is within each individual layer or across all layers.

  • In the pruning mode, parameters prune_method and prune_rate are ignored, and XGen uses magnitude-based pruning and automatically determines the appropriate pruning rates for the layers listed in prune_layers. If prune_layers is null, XGen automatically determines the layers to prune. The preset pruning rates in prune_layers are ignored in this mode.

  • In the customization mode, XGen conducts either pruning (if prune_method is not null) or training of the AI model (otherwise); the parameter prune_method determines the pruning method, and the parameter prune_layers determines the layers to prune and their pruning rates. If the prune_rate parameter is not null, it takes a higher priority over the pruning rates in prune_layers. In the customization mode, if prune_method is not null, prune_rate and prune_layers cannot be both null.

  • The prune_layers entry is a JSON string consisting of a list of layers in the AI model and their pruning rates (i.e., the percentage of weights to be removed). An example is as follows.

'{"prune_rates": {"layer1.0.conv1.weight": 0.2, "layer1.0.conv2.weight": 0.4,
"layer1.1.conv1.weight": 0.5, "layer1.1.conv2.weight": 0.5, "layer2.0.conv1.weight": 0.5,
"layer2.0.conv2.weight": 0.5, "layer2.0.shortcut.0.weight": 0.5, "layer2.1.conv1.weight": 0.5,
"layer2.1.conv2.weight": 0.5, "layer3.0.conv1.weight": 0.5, "layer3.0.conv2.weight": 0.5,
"layer3.0.shortcut.0.weight": 0.5, "layer3.1.conv1.weight": 0.4, "layer3.1.conv2.weight": 0.2}}'
  • Users can also create a separate yaml file to specify the layers and the pruning rate of each layer, and then provide the file path as the value of "prune_layers"; XGen will get the value from the file content. To help users set prune_layers, XGen provides a utility function for getting the list of names of the layers in an AI model amenable to pruning. Users just need to add from co_lib.utils import export_pruning_layer to the beginning of the training script of the AI model and use export_pruning_layer(the-AI-model, output-yaml-file-name) in the training script to export the model's prunable layers into a file (a short sketch follows this list). The pruning rates in the exported list are all set to 0.5 by default. Below is an example of such a yaml file:
prune_ratios:
   module.layer1.0.conv1.0.0.weight:
     0.6
   module.layer2.0.conv2.0.3.weight:
     0.8
  • To use ADMM-based pruning, the customization mode has to be used, and two runs of XGen are needed. The first run must set prune_method to admm, and the second run must set it to magnitude. The two runs must use the same prune_rate and prune_layers.
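Returning to the yaml-export utility mentioned above, here is a minimal sketch; build_model is a hypothetical stand-in for however the training script constructs the AI model:

    from co_lib.utils import export_pruning_layer

    model = build_model()  # hypothetical: however the training script builds the model

    # Writes the prunable layers into a yaml file, each with the default 0.5 rate;
    # edit the rates as needed, then point "prune_layers" in xgen.json to this file.
    export_pruning_layer(model, "prunable_layers.yaml")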

Configure Scaling

IMPORTANT: To use scaling for a custom AI model, it is necessary for the user to add a search variable into the AI model and configure the scaling in the configuration file. Without that, scaling would not function and would give only the original model.

  • In the scaling mode, XGen has a generic design. It changes the value of the searching_variable while assessing the quality of the AI model. So in principle, the user may express any aspect of the AI model with the searching_variable, and then use XGen's scaling to search for the best value. A typical use of this feature is to express, in the training script, the widths or depths of the layers of the AI model as expressions of the searching_variable. Then, XGen will automatically scale down (or up) those layers towards the objectives. For instance, in the training script, the user may set the first three layers' widths to 100*w, 200*w, 300*w. If in the configuration file w is set as the searching_variable and the search range is [0.5, 1] with a step size of 0.1, XGen will investigate six variants of the AI model with different widths. Users can also use this feature to change other aspects of the AI model. Note that the searching_variable must be a variable in the training_script_args section of the configuration file.

    Below is an example showing how scaling is configured in UNet:

    # Inside the configuration file of UNet (config.json),
    # nch_ker is set as the searching variable.
    "searching_variable": "nch_ker",
    
    # Inside the model construction file of UNet, `nch_ker` is used as a variable to control
    # the width of each layer. Notice that different factors are used with `nch_ker` at different
    # layers.
    
    self.enc1_1 = CNR2d(1 * self.nch_in, 1 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    self.enc1_2 = CNR2d(1 * self.nch_ker, 1 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    
    self.pool1 = Pooling2d(pool=2, type='max')
    
    self.enc2_1 = CNR2d(1 * self.nch_ker, 2 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    self.enc2_2 = CNR2d(2 * self.nch_ker, 2 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    
    self.pool2 = Pooling2d(pool=2, type='max')
    
    self.enc3_1 = CNR2d(2 * self.nch_ker, 4 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    self.enc3_2 = CNR2d(4 * self.nch_ker, 4 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    
    self.pool3 = Pooling2d(pool=2, type='max')
    
    self.enc4_1 = CNR2d(4 * self.nch_ker, 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    self.enc4_2 = CNR2d(8 * self.nch_ker, 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    
    self.pool4 = Pooling2d(pool=2, type='max')
    
    self.enc5_1 = CNR2d(8 * self.nch_ker, 2 * 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
    
  • Model accuracy is sensitive to the shapes of the stem (i.e., input) layer and the output head layer. So when setting up for scaling, try to avoid scaling down those layers much (see the sketch after this list). There are various online resources on how to best do AI model scaling.

  • In the customization mode, XGen does not do scaling; the scaling section in the configuration file is ignored.
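Following the advice above about the stem and head layers, here is a hedged sketch; the layer choices are illustrative assumptions, with w as the searching variable connected through training_script_args:

    import torch.nn as nn

    class ScaledNet(nn.Module):
        def __init__(self, w, num_classes):
            super().__init__()
            self.stem = nn.Conv2d(3, 32, kernel_size=3)             # stem width kept fixed
            self.body = nn.Conv2d(32, int(128 * w), kernel_size=3)  # body layers scaled by w
            self.head = nn.Linear(int(128 * w), num_classes)        # number of outputs kept fixed

        def forward(self, x):
            x = self.body(self.stem(x)).mean(dim=(2, 3))  # global average pooling
            return self.head(x)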

Configure Distillation

Distillation is a way to improve the accuracy of a pruned or scaled model.

To use distillation, users just need to answer Yes when being asked in the XGen interactive interface. In the XGen configuration file (xgen.json), there are several options in the user_requirements section about distillation and a separate distillation section that control the behavior of distillation, as shown in the example configuration file in the Model Configuration for XGen part. Please see the comments in that example configuration file for the explanations of those options.

Configure PaS

PaS (Pruning as Search) is a novel structured-pruning method that searches through a model's computational graph to find a smaller and more optimal model. To use the PaS algorithm, the user must select the PaS option when asked in the XGen interactive interface, or select the customization mode with the pas parameter set in the configuration file.

The main PaS configurations include pas_ratio, which indicates the degree of compression desired for the model, and pas_finetune_epoch, which specifies the number of training epochs for fine-tuning the final model. See the example configuration above for the other parameters.

Initially, a larger PaS-generated model is trained to identify more suitable weights; then, using the fine-tune epochs specified by the user, the previously generated model is pruned to the desired size. Notably, this method cannot adequately handle dynamic computational graphs and requires the user's model to have a static graph, with no conditional statements in the forward function, among other requirements. In experimental comparisons, the use of torch.batchnorm may significantly impact accuracy. When using the PaS algorithm, it is recommended to replace torch.batchnorm with nn.BatchNorm, as sketched below.
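For illustration, a minimal sketch of the module-form batch normalization that suits PaS; the block structure is an assumption for illustration, not taken from XGen:

    import torch.nn as nn

    class ConvBN(nn.Module):
        def __init__(self, ch):
            super().__init__()
            self.conv = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
            self.bn = nn.BatchNorm2d(ch)  # module form, traceable in a static graph

        def forward(self, x):
            # keep forward() free of conditionals so the computational graph stays static
            return self.bn(self.conv(x))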

Hints

Here are some hints on using the various options provided in the XGen configuration.

  • The pretrained_model_path entry is for holding the location of the pretrained model if there is any. The format of the pretrained model can be one of the following:

    • result saved by torch.save(), such as torch.save({'net': net.state_dict(),'acc': acc,'epoch': epoch,}, file), or torch.save(net.state_dict(), file)
    • a pickle file created by, for instance, pickle.dump(quant_sim_model, file)
  • An optimization done by XGen (pruning or scaling) involves the training of multiple AI models, and may hence take hours or days to complete. Usually fewer trainings are needed if "latency" rather than "quality score" is chosen as the primary interest.

  • If the original model weights were obtained by distillation (or transfer learning) from some other networks, the pruning mode is often a better option than scaling.

  • Pruning is applicable to a layer only if the sizes of the kernels in that layer are multiples of 8, such as 4x4.

  • When users are not sure of what configurations work best, they can try to run XGen multiple times with a different configuration used each time.

  • The pruning and scaling modes both involve an iterative search process, finding the appropriate pruning rates or scaling variable values. The customization mode does not involve a search process: XGen simply trains the model based on the provided configuration; if prune_method is not null, XGen conducts a one-time pruning of the AI model before training it.

  • XGen supports data parallel training on multiple GPUs in a machine. To use that support, the user just needs to enable parallel training in the training script of the AI model (a sketch follows this list). This version of XGen does not support training one model on multiple machines.
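A minimal sketch of one common way to enable data parallel training in a plain PyTorch training script; the variable names follow the examples in Custom AI below:

    import torch

    # after the model is constructed (and after xgen_load / CL.init)
    if torch.cuda.device_count() > 1:
        AImodel = torch.nn.DataParallel(AImodel)
    AImodel = AImodel.cuda()

Note that with DataParallel (or DistributedDataParallel), xgen_record should receive AImodel.module, as noted in Step III of Custom AI.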

Example

As an example, the following table shows the differences between the default configuration of the R2+1d model and modified configurations that achieve better speed-quality tradeoffs. The modified configurations use the customization mode and include two runs of XGen. The first run uses the ADMM method to identify the weights that are good to prune and make them small in magnitude, and the second run uses magnitude-based pruning to set those weights to zero and retrain the model.

| Parameter | Default configuration | Modified configurations |
| --- | --- | --- |
| mode | pruning | customization |
| prune_method | magnitude | admm (run-1), magnitude (run-2) |
| common_train_epochs | 35 | 50 (run-1), 130 (run-2) |
| learning_rate | 0.005 | 0.0005 |
| admm_update_epoch | 5 | 10 |
| admm_rho | 0.001 | 0.0001 |
| prune_layers | null | {module.layer1.0.conv1.0.0.weight: 0.8, module.layer1.0.conv1.0.3.weight: 0.8, module.layer1.0.conv2.0.0.weight: 0.8, module.layer1.0.conv2.0.3.weight: 0.8, module.layer1.1.conv1.0.0.weight: 0.8, module.layer1.1.conv1.0.3.weight: 0.8, module.layer1.1.conv2.0.0.weight: 0.8, module.layer1.1.conv2.0.3.weight: 0.8, module.layer2.0.conv1.0.0.weight: 0.7, module.layer2.0.conv1.0.3.weight: 0.7, module.layer2.0.conv2.0.0.weight: 0.7, module.layer2.0.conv2.0.3.weight: 0.7, module.layer2.1.conv1.0.0.weight: 0.7, module.layer2.1.conv1.0.3.weight: 0.7, module.layer2.1.conv2.0.0.weight: 0.7, module.layer2.1.conv2.0.3.weight: 0.7} |

The customization mode leads to both a higher accuracy and a lower latency on three devices, as shown below.

[Figure: R2+1d results on three devices]

Custom AI

If users have their own model to optimize, custom AI of XGen can be the way to go. But before using this option, users are recommended to first consider common AI, as custom AI involves more effort on the user's end.

To use custom AI, the user must already have a workable training script of their own AI model. The script must be written in PyTorch. (Support of other front ends---such as TensorFlow, Keras---will be added in the future.)

The usage of custom AI is similar to common AI except that the user would need to pick the "Your own model" option when selecting the base model in the XGen interface. The rest of the procedure is self-explanatory. There are however four steps to go through before XGen can start working on a custom AI:

  • Checking the model applicability;
  • Preparing the running environment;
  • Preparing the training script. An example is included inside XGen (/root/examples inside the XGen container);
  • Preparing the configuration file.

Step I: Checking Model Applicability

Although XGen has a broad coverage of DNN operators, some AI models may still contain operators beyond XGen's coverage and cause XGen to fail. This first step is to run such a check.

  1. Export the AI model to an ONNX file

    The user may use the PyTorch function torch.onnx.export() to export the model to an ONNX file. There are four requirements for making the call:

    • The second argument, the model input, must have one as the batch size (i.e., the first dimension of the tensor must be 1);
    • the do_constant_folding argument must be True;
    • the opset_version argument must be 11 or 9;
    • the dynamic_axes argument must not be set.

    Here is an example:
    torch.onnx.export(
                    model,  # model being run
                    x,  # model input
                    onnx_save_path,  # where to save the model
                    do_constant_folding=True,
                    input_names=['input'],  # the model's input names (an arbitrary string)
                    output_names=['output'],  # the model's output names (an arbitrary string)
                    opset_version=11  # XGen supports 11 or 9
                )
    

    Sometimes a user's training script may already have a customized function to export the ONNX format of the model. In that case, the user should use that function instead of the default PyTorch function.

  2. Run the check

    Use XGen onnx-latency-benchmark to test the support status of the generated ONNX file (see Using the Compiler and Testing Components Only below for the command syntax). If the benchmark succeeds, the ONNX file is supported; otherwise, it is not.

    The supported operators are listed in Appendix I.

Step II: Prepare Running Environment

Because XGen trains the AI model inside its docker container, users need to install into the docker container the dependencies required for training the AI model (e.g., via pip install -r requirements.txt). It is better to ensure that the training script can run properly in the docker container before moving to the next step.

Step III: Prepare Training Scripts

To allow XGen to optimize a custom AI model, the training script of the model has to be revised so that XGen can interact with the training script during the optimization process. This part explains the needed changes.

Descriptions of the APIs used below can be found in Appendix III.

  1. Rename the main training script to train_script_main.py, and ensure that (i) at the end of the script, the script invokes the main training function; (ii) the main training function is named training_main; (iii) the renamed script train_script_main.py resides in the top-level folder of the custom AI model. (XGen currently requires the main training function to be named training_main; this requirement will become obsolete in the next version.)

    if __name__ == '__main__':
        training_main()
    
  2. Import the XGen modules into the training script by adding the following lines to the beginning of the training script:

    from co_lib import Co_Lib as CL
    from xgen_tools import *
    
  3. Initialize XGen, and connect the control variables in the training script with the internal variables inside XGen so that XGen can control the training process. Some of the internal variables of XGen to be connected with are listed as follows. For the full list, please see the training_script_args field in xgen config json file (e.g., /root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json).

    common_train_epochs: control the number of training epochs
    learning_rate: control the learning rate
    batch_size: control the batch size
    device_num: the number of GPUs to use
    num_classes: the number of classes if this model is a classification model
    
    train_data_path: path of the training data 
    eval_data_path: path of the eval data
    train_label_path: path of the train label
    eval_label_path: path of the eval label
    pretrain_model_weights_path: the path of the pretrained model weights
    

    XGen sets these internal variable values based on the values it attains during the interactions with the user. So for the training script to follow these controls, the user would need to ensure that the corresponding variables inside the training script take their values from these XGen internal variables.

    XGen offers an API xgen_init(user_args, args_ai=None, map=COCOPIE_MAP) to facilitate the process. The call, as shown below, returns two values, with the first (originalArgs in the example) holding the original arguments in the training script, and the second (args_ai) holding the internal variables of XGen.

    # user_args represents a user variable holding arguments in the training_main function
    # the returned args_ai is the same as the argument `args_ai`
    # the return type of originalArgs is identical to the type of user_args (e.g., type argparse.Namespace)
    originalArgs, args_ai = xgen_init(user_args, args_ai=None, map=COCOPIE_MAP)
    

    Specifically, the variables that XGen actually controls for the training process are in the args_ai['training_script_args'] field. So for convenience of reference, the user may add this line:

    xgen_control_args = args_ai['training_script_args']
    

    Note that the call to xgen_init should be put after the parsing of arguments in the original training script, where user_args should be replaced with the variable in the original training script that holds the argument parsing results (of type argparse.Namespace or dict).

    Users can then connect the variables in originalArgs with the corresponding variables in args_ai. For instance, if the original training script uses a batchSize variable for the batch size of training, the following code will connect it with the internal variable of XGen.

    originalArgs.batchSize = xgen_control_args['batch_size']
    

    If the original training script uses other ways to organize the control variables of training, the modifications would just need to ensure that the connection is done properly. For example, the training script of YOLOX passes the training parameters into the training script through a Python file (config.py) and then uses a variable exp of type ABCMeta to hold them. The corresponding code changes would need to connect the fields in exp with the corresponding internal variables of XGen:

    # XGen: invoke xgen_init
    # If args is not provided, it can be set as an empty dictionary{}.
    # If args is {}, then args will be consistent with the args_ai generated by initialization in xgen_init
    user_args, args_ai = xgen_init(args, args_ai=None)
    # XGEN: The variable that xgen actually controls the training process is in args_ai['training_script_args'], 
    # which is a dictionary, so you can use it in the following ways.
    # Note: The args_ai['training_script_args'] field is actually mapped by the 'training_script_args' field in xgen config json
    xgen_control_args = args_ai['training_script_args']
    # XGen: connect variables with XGen internal variables
    exp.max_epoch = xgen_control_args['common_train_epochs']
    exp.data_dir = xgen_control_args['train_data_path']
    

    It is worth mentioning that it is not necessary to connect every one of XGen's internal variables. If the training script has no use for some of the controls, those variables do not need to be connected with the user's variables. Normally, at the minimum, common_train_epochs should be connected with the user's variable.

    For convenience, XGen provides a built-in mechanism for forming the connection. But note that this mechanism applies only if the original variables are directly passed into the training function through arguments. Rather than writing those assignment statements, the user may use a dictionary COCOPIE_MAP, as illustrated below. The variables in the original training script are put as the keys and the XGen internal variables as the values. Put this map at the beginning of the training script. Then, after the call to user_args, args_ai = xgen_init(args, args_ai=None), user_args will contain all the original variables in the COCOPIE_MAP, and they will all be connected with the internal variables of XGen.

    from xgen_tools import XgenArgs
    COCOPIE_MAP = {
          'your_train_epochs': 'common_train_epochs', # assign `common_train_epochs` in xgen configuration to `your_train_epochs`
          'your_lr': 'learning_rate', # assign `learning_rate` in xgen configuration to `your_lr`
          'your_batch_size': 'batch_size', # assign `batch_size` in xgen configuration to `your_batch_size`
          'your_train_data_path': 'train_data_path', # assign `train_data_path` in xgen configuration to `your_train_data_path`
          'your_val_data_path': 'eval_data_path', # assign `eval_data_path` in xgen configuration to `your_val_data_path`
          'your_train_label_path': 'train_label_path', # assign `train_label_path` in xgen configuration to `your_train_label_path`
          'your_eval_label_path': 'eval_label_path', # assign `eval_label_path` in xgen configuration to `your_eval_label_path`
          'your_pretrain_model_weights_path': 'pretrain_model_weights_path', # assign `pretrain_model_weights_path` in xgen configuration to `your_pretrain_model_weights_path`,
        }
    

    After the connection is done, the user would need to ensure that the rest of the training script uses those control variables to control the training process.

  4. Insert the following line after the line where the AI model architecture is assigned to a variable (AImodel in the example). This line should replace the original model weight loading function in the script. It loads the model weights from the file specified by the pretrained_model_path entry in the XGen configuration file. If that entry is null, the model will be initialized by the default model initialization in the training script.

    # AImodel represents a variable holding the model architecture
    xgen_load(AImodel,args_ai=args_ai)
    
  5. Insert the following line after the DNN model is loaded, but before the training starts. AImodel, dataLoader, and AIoptimizer represent variables in the training script: AImodel is the holder of the DNN model, dataLoader is the training data loader, and AIoptimizer is the optimizer of the training process.

    # AImodel: torch.nn.Module; AIoptimizer: a torch.optim optimizer;
    # dataLoader: torch.utils.data.DataLoader
    cl_model = CL.init(args=args_ai, model=AImodel, optimizer=AIoptimizer,
            data_loader=dataLoader)
    if cl_model:
        AImodel = cl_model
    
  6. Insert the following line at the beginning of each training epoch (inside the training epoch loop), where epochCount is the counter of epochs:

    CL.before_each_train_epoch(epoch=epochCount)
    
  7. Insert the following line after the changes to the learning rate in the training loop, where epochCount is the counter of epochs. Skip this step if the training loop has no changes to the learning rate.

    CL.after_scheduler_step(epoch=epochCount)
    
  8. Insert the following line after the loss function, where loss is the loss tensor calculated by the loss function in the training script. This line allows XGen to adjust the loss for its optimization purpose.

    loss = CL.update_loss(loss)
    
  9. Inside the training loop, insert the following line right after an evaluation of the current model, where AImodel and epochCount have been explained earlier, and evaluationResult represents a variable for the evaluation result, such as the accuracy, average PSNR (for super resolution), and so on. The evaluation can be invoked for every one or more epochs. If there are no evaluations in the training loop, the user would need to add them.

    xgen_record(args_ai,AImodel,evaluationResult,epoch=epochCount)
    
    • Note:

      • If the training script uses EMA, DataParallel or DistributedDataParallel, the actual model is in a field module of the model (i.e., AImodel.module), in which case, the line should be xgen_record(args_ai, AImodel.module, evaluationResult, epoch=epochCount)

      • Sometimes the training script already has a customized function to export the model in ONNX format. In that case, the user should use that customized function to export the model to an ONNX file. The user may then pass the onnx_file_path to xgen_record as follows: xgen_record(args_ai, AImodel, evaluationResult, epoch=epochCount, onnx_file_path= the onnx file path );

  10. At the end of the training_main function, insert the following. If the training script has no evaluation function, use -float('inf') for evaluationResult.

    xgen_record(args_ai,AImodel,evaluationResult,epoch=-1)
    
  11. If the user wants to use the "Scaling" mode of XGen to compress the model, the user needs to modify the training script of the AI model to give XGen the control of the scale (e.g., width, depth) of the AI model. Please see Configure Scaling for XGen part for details.

  12. XGen has a built-in order check that inspects the training script to identify obvious API call errors. The API order check is enabled by default. If users want to disable it, they can add the following lines to the training script:

    from xgen_tools import helper
    helper.ENABLE_HELPER = False
    

Step IV: Prepare Config File

The next step is to prepare a configuration file which determines how XGen applies optimizations to the model. Please see Model Configuration for details.

Example Documents

You can refer to our documentation on YOLOX model modification to understand this process: YOLOX example.

Example Code

Below is an example of a training script after being revised for XGen. The changed parts are surrounded by xgen_tools START and xgen_tools END comments.

# xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
from co_lib import Co_Lib as CL
from xgen_tools import *
COCOPIE_MAP = {'train_epochs' : 'common_train_epochs'}
# xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

def training_main():

    originalArgs = ... # parsing users' parameters (including "train_epochs" in this example)

    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    user_args, args_ai = xgen_init(originalArgs, map=COCOPIE_MAP)
    # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    # define dataloader
    trainDataLoader = torch.utils.data.DataLoader(...)
    testDataLoader = torch.utils.data.DataLoader(...)

    # define DNN model
    model = ...

    # define loss function
    criterion = ...


    # define optimizer
    optimizer = torch.optim.Adam(
            model.parameters(),
            lr=user_args.learning_rate,
            betas=(0.9, 0.999),
            eps=1e-08,
            weight_decay=user_args.decay_rate
        )
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.7)

    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    #load model
    xgen_load(model,args_ai=args_ai)
    cl_model = CL.init(args=args_ai, model=model, optimizer=optimizer, data_loader=trainDataLoader)
    if cl_model:
        model = cl_model
    # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    for epochCount in range(0,user_args.train_epochs):

      # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        CL.before_each_train_epoch(epoch=epochCount)
      # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

        scheduler.step()

      # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        CL.after_scheduler_step( epoch = epochCount)
      # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

        for batch_id, (inputs, targets) in enumerate(trainDataLoader):
            model.train()

            output = model(inputs)

            loss = criterion(...) # regular loss, i.e., cross-entropy, mse, ...

        # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            loss = CL.update_loss(loss)
        # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        if epochCount % user_args.eval_epochs == 0:
            accuracy = eval(model)
            # save the model
            save_path = 'path_name.pth'
            state = {
                'epoch': epochCount,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }
            torch.save(state, save_path)
            # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            #if users want to save the intermediate models and log info (optional)
            xgen_record(args_ai,model,accuracy,epoch=epochCount)
            # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    accuracy = eval(model) # final evaluation (needed if no evaluation happened inside the training loop)
    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    xgen_record(args_ai,model,accuracy,epoch=-1)

if __name__ == '__main__':
# don't forget to add this line to make sure the main function starts
    training_main()

Customized Operation

It is common for users to want to include a new model in XGen with some customized operations: controlling the pruning rate or pruning method, or just running the model without XGen's optimizations. The customized operation mode serves these needs. Before starting, please make sure the model has already been correctly adapted to XGen as described in Advanced Common AI.

Here, let's take a look at how to use customized operation mode.

  1. Choose DigitNet (MNIST) as an example.

    Please choose the base model (and the default dataset) to start with:
    > Image Classification 0: DigitNet (MNIST)
      Image classification I: EfficientNet (high resolution, default: ImageNet)
      Image classification II: ResNet (high resolution, default: ImageNet)
      Image classification III: MobileNet (high resolution, default: ImageNet)
      Image classification IV: ViT (default: ImageNet (high resolution))
      Segmentation: UNet (ISBI-2012)
      Object detection: YOLOv6 (CoCo2017)
      Object detection: YOLOX (CoCo2017)
      Object detection: YOLOv8 (CoCo2017)
      Video classification: R2+1d (UCF101)
      Natural language processing: MobileBERT (Squad1.1)
      Your own model
    
  2. In the mode choice, choose customized operation mode.

    What do you want to do? (Pruning recommended):
      Compatibility test
      Pruning
      Scaling
      PaS
    > Customized operation
    
  3. Answer the remaining questions by choosing or typing inputs.

      Which GPU(s) do you want to use:
      > [*] GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-e75cbd31-7281-3c03-xxxx-xxxxxxx)
        [ ] GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-647eedd7-0986-f7aa-xxxx-xxxxxxx)
      Press <space> or <tab> for multi-selection, <up> and <down> or letter key <j> and <k> to move, <enter> to accept.
    
      What is the batch size per GPU: 128
      What is the learning rate: 0.01
      How many epochs: 3
    
      XGen pretrained model absolute path (optional):
      No value detected, confirm to continue?
      > Yes
        No
    
      Your workplace for this session (if you don't input a path, XGen will make one for you):
      This is your default workplace for this session: /root/output/DigitNet_MNIST/20221219121928
    

In the output, you'll see there is only one stage in XGen, because XGen does not perform optimizing operations like pruning or scaling in the customized operation mode. The latency and accuracy of the original model are still measured by XGen. That is to say, if users want to know a model's latency or accuracy before it is optimized by XGen, they can use the customized operation mode.

Another use case is controlling the pruning rate or pruning method, which the customized operation mode also supports. There is a little more work to do: simply edit the pruning field of the xgen.json config file in your model project. For DigitNet, the file is located at ~/Projects/classification-digitnet/xgen.json. Users can change the pruning field to the following:

{
  ...
  "pruning": {
    "prune_method": "magnitude",
    "prune_layers": null,
    "prune_weight_ranking": "intra-layer",
    "prune_rate": 0.1,
    "admm_update_epoch": 5,
    "admm_rho": 0.001,
    "admm_sparsity_type": "block_punched",
    "admm_lr": 0.01,
    "admm_block": "(8,4)",
    "prune_threshold": -1.0
  },
  ...
}

There are many choices for the value of the pruning field; please refer to the pruning field in Model Configuration for XGen for more details. Now when you run XGen in the customized operation mode, the pruning rate is 0.1 and the pruning method is magnitude. That is to say, XGen conducts a one-time magnitude-based pruning with the specified rate, without any search.

Other Usage

At the center of XGen are three components: the AI model optimizer for model compression and optimization, the compiler for code optimization and generation, and the model testing component for measuring the speed of the models. The usage of XGen described so far automatically invokes all of those components, but they can also be used separately. One benefit of this flexibility is that even if no devices are connected, XGen can still be used to optimize an AI model via the AI model optimizer (with some limitations); another is to skip the model optimization step and quickly generate executable code for a model to test it out or profile its execution.

Model Optimization Alone Without Running on Devices

This option is for the case where the user wants to find a model with a certain accuracy and minimum latency but does not have or want to use any connected device. The usage of XGen is the same as in the default case except that in the step of platform selection, select Other Device, or in the step of device selection, select no device.

In this case, XGen does not run the AI models on the physical devices to measure the speed. It instead uses a simple oracle to output a pseudo-latency. The oracle only ensures that the pseudo-latencies it gives out follow the rule that the smaller the AI model is, the faster it runs. The pseudo-latency values can be far away from the real latencies. But because the rule it follows is roughly consistent with actual observations, despite the imprecision of latency, XGen can still find satisfying AI models when the users' requirement is to meet an accuracy while minimizing the latency. (The accuracy it gets is still correct.) If the users' requirement is to find a model with a certain latency, this option is not applicable.
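As a mental model, the oracle behaves like a monotone function of model size. The following sketch illustrates the idea only; it is not XGen's actual oracle, and the constant is arbitrary:

# Illustrative only: a monotone pseudo-latency oracle.
# Larger models (more multiply-accumulate operations) always get a
# larger pseudo-latency, so the ranking of candidate models is
# preserved even though the absolute values are not meaningful.
def pseudo_latency_ms(macs: float, k: float = 1e-6) -> float:
    return k * macs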

Using the Compiler and Testing Components Only

This usage allows the compiler and the testing components to be invoked without the AI optimizer. It can come in handy if the users have collected one or more AI models and would like to measure their speeds (in a batch).

Users can do that through the following command:

XGen onnx-latency-benchmark --model-path [onnx-model-path] --output-path [output-path] --devices device_id1 device_id2 ... --verbose

Note: The standalone onnx-latency-benchmark command (without XGen) is deprecated. Please use XGen onnx-latency-benchmark instead.

The script measures the speed of every ONNX model under the onnx-model-path and outputs a single CSV file in the output path. The CSV file reports the inference latency of each model along with other detailed information, such as where the compiler-generated code resides (output_dir), the number of parameters in the model (params), and the amount of computation in the model (MACs).
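To post-process that CSV in a batch, a few lines of pandas suffice. Below is a minimal sketch; the file name and the column names (latency, params, MACs, output_dir) are assumptions based on the description above and may differ in your XGen version:

import pandas as pd

# Hypothetical file name and column names; check the CSV produced in your output path.
df = pd.read_csv("results.csv")

# Rank the benchmarked models from fastest to slowest.
print(df.sort_values("latency")[["output_dir", "params", "MACs", "latency"]])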

Parameter devices is optional; if you don't specify it, you will enter an interactive interface to select the devices. Parameter verbose is also optional; if you specify it, the script prints out detailed information.

XGen onnx-latency-benchmark --benchmark --model-path [onnx-model-path] --output-path [output-path] --devices device_id1 device_id2 ... --verbose 

Parameter benchmark is optional. If specified, the tool also records the running time of the model and detailed per-operator information, as in the following example output:

I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359]                                                                       Sort by Run Order
I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359] |         Op Type | Avg(ms) |      % |  GMACPS | MACs(Dense) | Stride |   Pad |    Filter Shape |   Output Shape | Dilation |                            name |
I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359] | BufferTransform |   0.046 |  0.873 |   0.000 |     0.00000 |        |       |                 |  [1,224,224,3] |          | deepvan_node_input.1_mem_type_2 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.076 |  1.427 |  53.477 |     4.06426 |  [2,2] | [2,2] |      [12,3,3,3] | [1,112,112,12] |    [1,1] |                          Clip_1 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.071 |  1.329 |  19.135 |     1.35475 |  [1,1] | [2,2] |      [1,12,3,3] | [1,112,112,12] |    [1,1] |                          Clip_3 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.037 |  0.690 |  49.152 |     1.80634 |  [1,1] | [0,0] |     [12,12,1,1] | [1,112,112,12] |    [1,1] |                          Conv_4 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.029 |  0.540 |   0.000 |     0.00000 |        |       |                 | [1,112,112,12] |          |                           Add_5 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.071 |  1.331 |  76.432 |     5.41901 |  [1,1] | [0,0] |     [36,12,1,1] | [1,112,112,36] |    [1,1] |                          Clip_7 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.106 |  1.999 |   9.541 |     1.01606 |  [2,2] | [2,2] |      [1,36,3,3] |   [1,56,56,36] |    [1,1] |                          Clip_9 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.037 |  0.697 |  48.623 |     1.80634 |  [1,1] | [0,0] |     [16,36,1,1] |   [1,56,56,16] |    [1,1] |                         Conv_10 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.035 |  0.652 |  57.757 |     2.00704 |  [1,1] | [0,0] |     [40,16,1,1] |   [1,56,56,40] |    [1,1] |                         Clip_12 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.072 |  1.361 |  15.572 |     1.12896 |  [1,1] | [2,2] |      [1,40,3,3] |   [1,56,56,40] |    [1,1] |                         Clip_14 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.037 |  0.701 |  53.736 |     2.00704 |  [1,1] | [0,0] |     [16,40,1,1] |   [1,56,56,16] |    [1,1] |                         Conv_15 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.017 |  0.316 |   0.000 |     0.00000 |        |       |                 |   [1,56,56,16] |          |                          Add_16 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.046 |  0.871 |  90.836 |     4.21478 |  [1,1] | [0,0] |     [84,16,1,1] |   [1,56,56,84] |    [1,1] |                         Clip_18 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.077 |  1.454 |   7.653 |     0.59270 |  [2,2] | [2,2] |      [1,84,3,3] |   [1,28,28,84] |    [1,1] |                         Clip_20 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.057 |  1.075 |  32.209 |     1.84397 |  [1,1] | [0,0] |     [28,84,1,1] |   [1,28,28,28] |    [1,1] |                         Conv_21 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.028 |  0.532 |  55.751 |     1.58054 |  [1,1] | [0,0] |     [72,28,1,1] |   [1,28,28,72] |    [1,1] |                         Clip_23 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.041 |  0.775 |  12.301 |     0.50803 |  [1,1] | [2,2] |      [1,72,3,3] |   [1,28,28,72] |    [1,1] |                         Clip_25 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.053 |  0.992 |  29.906 |     1.58054 |  [1,1] | [0,0] |     [28,72,1,1] |   [1,28,28,28] |    [1,1] |                         Conv_26 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.011 |  0.207 |   0.000 |     0.00000 |        |       |                 |   [1,28,28,28] |          |                          Add_27 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.028 |  0.533 |  58.745 |     1.66835 |  [1,1] | [0,0] |     [76,28,1,1] |   [1,28,28,76] |    [1,1] |                         Clip_29 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.041 |  0.775 |  12.984 |     0.53626 |  [1,1] | [2,2] |      [1,76,3,3] |   [1,28,28,76] |    [1,1] |                         Clip_31 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.056 |  1.052 |  29.765 |     1.66835 |  [1,1] | [0,0] |     [28,76,1,1] |   [1,28,28,28] |    [1,1] |                         Conv_32 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.011 |  0.206 |   0.000 |     0.00000 |        |       |                 |   [1,28,28,28] |          |                          Add_33 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.047 |  0.890 |  81.510 |     3.86355 |  [1,1] | [0,0] |    [176,28,1,1] |  [1,28,28,176] |    [1,1] |                         Clip_35 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.044 |  0.819 |   7.121 |     0.31046 |  [2,2] | [2,2] |     [1,176,3,3] |  [1,14,14,176] |    [1,1] |                         Clip_37 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.059 |  1.105 |  32.825 |     1.93178 |  [1,1] | [0,0] |    [56,176,1,1] |   [1,14,14,56] |    [1,1] |                         Conv_38 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.036 |  0.681 |  48.379 |     1.75616 |  [1,1] | [0,0] |    [160,56,1,1] |  [1,14,14,160] |    [1,1] |                         Clip_40 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.027 |  0.510 |  10.396 |     0.28224 |  [1,1] | [2,2] |     [1,160,3,3] |  [1,14,14,160] |    [1,1] |                         Clip_42 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.053 |  0.998 |  33.042 |     1.75616 |  [1,1] | [0,0] |    [56,160,1,1] |   [1,14,14,56] |    [1,1] |                         Conv_43 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.009 |  0.169 |   0.000 |     0.00000 |        |       |                 |   [1,14,14,56] |          |                          Add_44 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.034 |  0.635 |  58.452 |     1.97568 |  [1,1] | [0,0] |    [180,56,1,1] |  [1,14,14,180] |    [1,1] |                         Clip_46 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.030 |  0.556 |  10.727 |     0.31752 |  [1,1] | [2,2] |     [1,180,3,3] |  [1,14,14,180] |    [1,1] |                         Clip_48 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.061 |  1.140 |  32.548 |     1.97568 |  [1,1] | [0,0] |    [56,180,1,1] |   [1,14,14,56] |    [1,1] |                         Conv_49 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.009 |  0.173 |   0.000 |     0.00000 |        |       |                 |   [1,14,14,56] |          |                          Add_50 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.036 |  0.681 |  58.135 |     2.10739 |  [1,1] | [0,0] |    [192,56,1,1] |  [1,14,14,192] |    [1,1] |                         Clip_52 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.032 |  0.596 |  10.667 |     0.33869 |  [1,1] | [2,2] |     [1,192,3,3] |  [1,14,14,192] |    [1,1] |                         Clip_54 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.065 |  1.211 |  32.673 |     2.10739 |  [1,1] | [0,0] |    [56,192,1,1] |   [1,14,14,56] |    [1,1] |                         Conv_55 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.009 |  0.166 |   0.000 |     0.00000 |        |       |                 |   [1,14,14,56] |          |                          Add_56 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.044 |  0.835 |  77.942 |     3.46842 |  [1,1] | [0,0] |    [316,56,1,1] |  [1,14,14,316] |    [1,1] |                         Clip_58 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.041 |  0.776 |  13.481 |     0.55742 |  [1,1] | [2,2] |     [1,316,3,3] |  [1,14,14,316] |    [1,1] |                         Clip_60 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.097 |  1.817 |  46.068 |     4.45939 |  [1,1] | [0,0] |    [72,316,1,1] |   [1,14,14,72] |    [1,1] |                         Conv_61 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.048 |  0.908 |  75.887 |     3.66912 |  [1,1] | [0,0] |    [260,72,1,1] |  [1,14,14,260] |    [1,1] |                         Clip_63 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.037 |  0.692 |  12.446 |     0.45864 |  [1,1] | [2,2] |     [1,260,3,3] |  [1,14,14,260] |    [1,1] |                         Clip_65 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.082 |  1.533 |  44.937 |     3.66912 |  [1,1] | [0,0] |    [72,260,1,1] |   [1,14,14,72] |    [1,1] |                         Conv_66 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.010 |  0.184 |   0.000 |     0.00000 |        |       |                 |   [1,14,14,72] |          |                          Add_67 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.049 |  0.915 |  82.211 |     4.00781 |  [1,1] | [0,0] |    [284,72,1,1] |  [1,14,14,284] |    [1,1] |                         Clip_69 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.036 |  0.679 |  13.858 |     0.50098 |  [1,1] | [2,2] |     [1,284,3,3] |  [1,14,14,284] |    [1,1] |                         Clip_71 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.088 |  1.649 |  45.621 |     4.00781 |  [1,1] | [0,0] |    [72,284,1,1] |   [1,14,14,72] |    [1,1] |                         Conv_72 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.010 |  0.185 |   0.000 |     0.00000 |        |       |                 |   [1,14,14,72] |          |                          Add_73 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.086 |  1.606 | 107.551 |     9.20102 |  [1,1] | [0,0] |    [652,72,1,1] |  [1,14,14,652] |    [1,1] |                         Clip_75 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.049 |  0.920 |   5.868 |     0.28753 |  [2,2] | [2,2] |     [1,652,3,3] |    [1,7,7,652] |    [1,1] |                         Clip_77 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.161 |  3.025 |  38.064 |     6.13402 |  [1,1] | [0,0] |   [192,652,1,1] |    [1,7,7,192] |    [1,1] |                         Conv_78 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.100 |  1.882 |  76.578 |     7.67693 |  [1,1] | [0,0] |   [816,192,1,1] |    [1,7,7,816] |    [1,1] |                         Clip_80 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.042 |  0.783 |   8.630 |     0.35986 |  [1,1] | [2,2] |     [1,816,3,3] |    [1,7,7,816] |    [1,1] |                         Clip_82 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.202 |  3.798 |  37.948 |     7.67693 |  [1,1] | [0,0] |   [192,816,1,1] |    [1,7,7,192] |    [1,1] |                         Conv_83 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.008 |  0.149 |   0.000 |     0.00000 |        |       |                 |    [1,7,7,192] |          |                          Add_84 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.102 |  1.905 |  81.196 |     8.24141 |  [1,1] | [0,0] |   [876,192,1,1] |    [1,7,7,876] |    [1,1] |                         Clip_86 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.044 |  0.824 |   8.800 |     0.38632 |  [1,1] | [2,2] |     [1,876,3,3] |    [1,7,7,876] |    [1,1] |                         Clip_88 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.211 |  3.959 |  39.077 |     8.24141 |  [1,1] | [0,0] |   [192,876,1,1] |    [1,7,7,192] |    [1,1] |                         Conv_89 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.009 |  0.162 |   0.000 |     0.00000 |        |       |                 |    [1,7,7,192] |          |                          Add_90 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.108 |  2.031 |  98.428 |    10.64986 |  [1,1] | [0,0] |  [1132,192,1,1] |   [1,7,7,1132] |    [1,1] |                         Clip_92 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d |   0.061 |  1.142 |   8.204 |     0.49921 |  [1,1] | [2,2] |    [1,1132,3,3] |   [1,7,7,1132] |    [1,1] |                         Clip_94 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.297 |  5.579 |  61.216 |    18.19350 |  [1,1] | [0,0] |  [328,1132,1,1] |    [1,7,7,328] |    [1,1] |                         Conv_95 |
I deepvan/run/statistics.cc:359] |          Conv2D |   0.207 |  3.895 | 110.606 |    22.95082 |  [1,1] | [0,0] |  [1428,328,1,1] |   [1,7,7,1428] |    [1,1] |                         Clip_97 |
I deepvan/run/statistics.cc:359] |         Pooling |   0.030 |  0.573 |   0.000 |     0.00000 |  [7,7] | [0,0] |           [7,7] |   [1,1,1,1428] |          |                  AveragePool_99 |
I deepvan/run/statistics.cc:359] |  FullyConnected |   0.692 | 12.998 |   2.063 |     1.42800 |        |       | [1000,1428,1,1] |   [1,1,1,1000] |          |                        Conv_100 |
I deepvan/run/statistics.cc:359] | BufferTransform |   0.011 |  0.216 |   0.000 |     0.00000 |        |       |                 |   [1,1000,1,1] |          |     deepvan_node_469_mem_type_0 |
I deepvan/run/statistics.cc:359] |         Reshape |   0.005 |  0.096 |   0.000 |     0.00000 |        |       |                 |       [1,1000] |          |                     Reshape_101 |
I deepvan/run/statistics.cc:359] |  FullyConnected |   0.694 | 13.027 |   2.058 |     1.42800 |        |       | [1000,1428,1,1] |   [1,1,1,1000] |          |                        Conv_102 |
I deepvan/run/statistics.cc:359] | BufferTransform |   0.012 |  0.223 |   0.000 |     0.00000 |        |       |                 |   [1,1000,1,1] |          |     deepvan_node_476_mem_type_0 |
I deepvan/run/statistics.cc:359] |         Reshape |   0.005 |  0.095 |   0.000 |     0.00000 |        |       |                 |       [1,1000] |          |                     Reshape_103 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.006 |  0.108 |   0.000 |     0.00000 |        |       |                 |       [1,1000] |          |                         Add_104 |
I deepvan/run/statistics.cc:359] |         Eltwise |   0.006 |  0.112 |   0.000 |     0.00000 |        |       |                 |       [1,1000] |          |                         Div_105 |

Hint: Users first need to put the ONNX model files into the XGen container. There are two approaches:

  • They may put the ONNX model files under the Projects/ folder of XGen on the host machine. Because that folder is mapped to the /root/Projects folder inside the XGen container, the files will be directly accessible to XGen.
  • They may use docker cp. Example: docker cp ./onnxModels xgen_<version>_<username>:/root/speedTests/. The onnxModels folder will then be available under /root/speedTests/ inside the XGen container.

Using XGen in Non-Interactive Mode

XGen can be used in a non-interactive mode. The user needs to provide a task-description file: a JSON config file that contains all the information needed to run XGen. The command for this mode is as follows:

XGen non-interactive --config-file /path/to/config.json --platform [android|ios|simulation] --devices device_id_a device_id_b ...

The argument --platform is optional; if it is not specified, XGen uses the device_platform field of the config file. The argument --devices takes one or more device UUIDs. It is optional in general, but required when --platform is android or ios.

The config file also has device_platform and devices fields corresponding to the --platform and --devices command-line arguments. The command-line values override those in the config file.
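For example, the following run forces the simulation platform regardless of what device_platform the config file specifies (no --devices needed in that case):

XGen non-interactive --config-file /path/to/config.json --platform simulation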

The non-interactive mode of XGen allows the use of the models included in XGen (Common AI) or customized AI models, with default or customized datasets.

XGen includes some sample config files (in /root/noninteractive_config_sample) for users' reference.

The following is an example config file for EfficientNet on the ImageNet dataset:

{
  "description": "this is a sample of common ai config file in non-interactive mode",
  "device_platform_options": [
    "android",
    "ios",
    "simulation"
  ],
  "device_platform": "simulation",
  "devices": [],
  "xgen_model_options": {
    "description": "different platforms support different models, 'simulation' platform supports all models'",
    "android_support": [
      "DigitNet_MNIST",
      "EfficientNet_ImageNet",
      "ResNet_ImageNet",
      "MobileNet_ImageNet",
      "ViT_ImageNet",
      "YOLOv6_CoCo2017",
      "UNet_ISBI-2012",
      "R2+1d_UCF101",
      "MobileBERT_Squad1.1"
    ],
    "ios_support": [
      "DigitNet_MNIST",
      "EfficientNet_ImageNet",
      "MobileNet_ImageNet",
      "YOLOv6_CoCo2017",
      "UNet_ISBI-2012",
      "MobileBERT_Squad1.1"
    ]
  },
  "select_model": "EfficientNet_ImageNet",
  "use_default_dataset": false,
  "custom_dataset_config": {
    "description": "if use_default_dataset == false, config the section, section title is the same as your select_model",
    "dataset": {
      "train_data_path": {
        "required": true,
        "value": "/data/jenkins_common_ai/data/imagenet_subsets/train"
      },
      "train_label_path": {
        "required": false,
        "value": ""
      },
      "eval_data_path": {
        "required": true,
        "value": "/data/jenkins_common_ai/data/imagenet_subsets/val"
      },
      "eval_label_path": {
        "required": false,
        "value": ""
      },
      "customized": {
        "description": "if using customized distillation, please make sure num_classes as the same as this section",
        "num_classes": {
          "type": "int",
          "required": true,
          "path": 50
        }
      }
    }
  },
  "use_distillation": false,
  "use_default_distillation_config": true,
  "custom_distillation_config": {
    "teacher_model": {
      "description": "if using customized dataset, please make sure num_classes as the same as this section",
      "required": true,
      "name": "efficientnet_b4",
      "file_path": "/data/jenkins_common_ai/teacher_model/efficientnet_b4_best.pth.tar",
      "num_classes": 50
    }
  },
  "finetune_teacher_model_batch_size": 64,
  "xgen_mode_options": {
    "description": "different platforms support different xgen mode, 'simulation' platform supports all models'",
    "android_support": [
      "compatible_testing",
      "scaling",
      "pas",
      "customization"
    ],
    "ios_support": [
      "compatible_testing",
      "scaling",
      "pas",
      "customization"
    ]
  },
  "xgen_mode_recommended": "scaling",
  "xgen_mode": "scaling",
  "xgen_gpus": "0",
  "xgen_learning_rate": 0.048,
  "xgen_batch_size": 192,
  "xgen_epochs": 1,
  "pas_finetune_epoch": 200,
  "primary_interest_options": [
    "latency",
    "quality_score"
  ],
  "primary_interest": "latency",
  "latency": "21ms",
  "quality_score": ">=73",
  "resume_workplace_path": "",
  "custom_workplace_path": "",
  "pretrained_model_path": "",
  "express_path_config": {
    "express_mode_description": "In the case of using default datasets and custom datasets, different operation modes support different express path optimization methods. The specific options are shown in the \"express_mode_options\" field.",
    "express_mode_options": {
      "compatible_testing": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      },
      "pruning": {
        "default_dataset": {
          "Express Optimization (Quickest, likely no training needed)": 0,
          "Extensive Optimization (Taking several rounds of training time)": 2
        },
        "customized_dataset": {
          "Fine Optimization (Taking about one round of training time)": 1,
          "Extensive Optimization (Taking several rounds of training time)": 2
        }
      },
      "scaling": {
        "default_dataset": {
          "Express Optimization (Quickest, likely no training needed)": 0,
          "Extensive Optimization (Taking several rounds of training time)": 2
        },
        "customized_dataset": {
          "Fine Optimization (Taking about one round of training time)": 1,
          "Extensive Optimization (Taking several rounds of training time)": 2
        }
      },
      "pas": {
        "default_dataset": {
          "Express Optimization (Quickest, likely no training needed)": 0,
          "Extensive Optimization (Taking several rounds of training time)": 2
        },
        "customized_dataset": {
          "Fine Optimization (Taking about one round of training time)": 1,
          "Extensive Optimization (Taking several rounds of training time)": 2
        }
      },
      "customization": {
        "default_dataset": {
          "Express Optimization (Quickest, likely no training needed)": 0,
          "Extensive Optimization (Taking several rounds of training time)": 2
        },
        "customized_dataset": {
          "Fine Optimization (Taking about one round of training time)": 1,
          "Extensive Optimization (Taking several rounds of training time)": 2
        }
      }
    },
    "express_mode": 0
  }
}

The following is an example config file for running XGen on custom models (Custom AI):

{
  "description": "this is a sample of customized ai config file in non-interactive mode",
  "device_platform_options": [
    "android",
    "ios",
    "simulation"
  ],
  "device_platform": "simulation",
  "devices": [],
  "xgen_config_file": "/path/to/xgen.json",
  "training_script_folder": "/path/to/training/script/folder",
  "xgen_mode_options": {
    "description": "different platforms support different xgen mode, 'simulation' platform supports all models'",
    "android_support": [
      "compatible_testing",
      "pruning",
      "scaling",
      "pas",
      "customization"
    ],
    "ios_support": [
      "compatible_testing",
      "scaling",
      "pas",
      "customization"
    ]
  },
  "xgen_mode": "scaling",
  "xgen_gpus": "0,1",
  "xgen_learning_rate": 0.01,
  "xgen_batch_size": 128,
  "xgen_epochs": 3,
  "pas_finetune_epoch": 200,
  "primary_interest_options": [
    "latency",
    "quality_score"
  ],
  "primary_interest": "latency",
  "latency": "30ms",
  "quality_score": ">=85",
  "resume_workplace_path": "",
  "custom_workplace_path": "",
  "pretrained_model_path": "",
  "use_distillation": false,
  "use_default_distillation_config": false,
  "custom_distillation_config": {
    "teacher_model": {
      "description": "if using customized dataset, please make sure num_classes as the same as this section",
      "required": true,
      "name": "EfficientNet_ImageNet",
      "file_path": "/data/jenkins_common_ai/data/teacher_model/efficientnet-b0.pth",
      "num_classes": 50
    }
  },
  "finetune_teacher_model_batch_size": 16,
  "express_path_config": {
    "express_mode_description": "In the case of using default datasets and custom datasets, different operation modes support different express path optimization methods. The specific options are shown in the \"express_mode_options\" field.",
    "express_mode_options": {
      "compatible_testing": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      },
      "pruning": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      },
      "scaling": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      },
      "pas": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      },
      "customization": {
        "default_dataset": {
        },
        "customized_dataset": {
        }
      }
    },
    "express_mode": 0
  }
}
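Since the task-description file is plain JSON, it can also be generated or patched programmatically instead of being edited by hand. Below is a minimal sketch, assuming a sample copied from /root/noninteractive_config_sample (the exact sample file name here is hypothetical):

import json

# Start from a shipped sample (the file name here is hypothetical).
with open("/root/noninteractive_config_sample/common_ai.json") as f:
    task = json.load(f)

# Patch the fields that typically vary between runs.
task["select_model"] = "MobileNet_ImageNet"
task["xgen_mode"] = "scaling"
task["primary_interest"] = "latency"
task["latency"] = "15ms"

with open("/root/my_task.json", "w") as f:
    json.dump(task, f, indent=2)

The resulting file can then be passed to XGen via XGen non-interactive --config-file /root/my_task.json.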

Parameter descriptions for the task-description file

  • description

A free-form description of the task in the non-interactive mode.

  • device_platform_options

It is a list of platforms that the user can choose from when filling the field device_platform. There are currently three options: android, ios, simulation.

  • device_platform

Users can pick one platform from the list in device_platform_options for this field.

  • devices

A list of IDs of the devices that the user wants to use. The user can run the command xgen_devices to see the IDs of all the connected devices. If simulation is chosen as the platform, this field is ignored.

  • xgen_config_file

The path of the xgen.json file. Its content is about the specifications on the model and dataset to use, as in the interactive mode.

  • training_script_folder

The path of the training script folder. The training script in the folder should be named as train_script_main.py.

  • xgen_model_options

A list of AI models that the user may choose from to fill the next field select_model. This list depends on the device_platform field as some models may work well on only some platforms. The user can refer to the sub-fields android_support and ios_support to find the supported models.

  • select_model

Pick one AI model from the list in xgen_model_options.

  • use_default_dataset

If the user wants to use the default dataset for the selected model, set it to true; otherwise, set it to false and configure custom_dataset_config with the information of the custom dataset.

  • custom_dataset_config

If the user wants to use a custom dataset, set the field use_default_dataset to false, and set the dataset's detailed information in this field of the config file (a sample is in the /root/noninteractive_config_sample folder).

There are five fields in this sub-section: train_data_path, train_label_path, eval_data_path, eval_label_path, and customized. Users set their own dataset paths in the first four fields; the required field is only a hint about whether a value is mandatory. customized holds user-defined parameters (such as num_classes in the sample above). Note that for each such parameter, the type field must be set, and the path field must be set to the value of the parameter.

  • use_distillation

If the user wants to use distillation, set it to true, otherwise, set it to false. (So far, only the EfficientNet_ImageNet, ResNet_ImageNet, MobileNet_ImageNet models support distillation.)

  • use_default_distillation_config

If the user wants to use the default distillation config, set it to true; otherwise, set it to false and configure custom_distillation_config with the information for the custom distillation.

  • custom_distillation_config

If the users want to use custom distillation, they need to set the field use_default_distillation_config to false and set the distillation details (teacher model details) in this field (sample config files are in the /root/noninteractive_config_sample folder).

  • xgen_mode_options

This is a list of XGen modes that the user can choose from to fill the field xgen_mode. It depends on the device_platform field. The user can refer to the sub-fields android_support and ios_support to find the modes that can work on the platform selected in the device_platform field.

  • xgen_mode_recommended

It is the recommended mode to use for AI model compression (pruning or scaling). Note, this field is meaningful only for the Android platform because at this point only scaling mode applies to iOS devices.

  • xgen_mode

Pick one xgen mode from xgen_mode_options to use in this run.

  • xgen_gpus

It is a list of GPUs to use, given as a comma-separated string. Running nvidia-smi -L lists all the GPUs on your machine. In the examples above, "0" selects the first GPU and "0,1" selects the first two.

  • xgen_batch_size

It is the per-GPU batch size to use. It is an integer.

  • xgen_learning_rate

It is the learning rate to use at the beginning of training. It is a floating-point number.

  • xgen_epochs

It is the number of epochs to use in training. It is an integer.

  • primary_interest_options

It is a list of interests a user can choose from to fill in primary_interest. For now, XGen supports latency and quality_score only.

  • primary_interest

Pick one primary interest from primary_interest_options.

  • latency

It is the target latency. It is a string (e.g., "30ms").

  • quality_score

It is the target quality score. It is a string (e.g., ">=85").

  • resume_workplace_path

If the user wants to resume previous work, set the path of that work in this field; leave it empty otherwise. If resume_workplace_path is filled, XGen checks whether select_model, xgen_mode, and other parameters are the same as in the previous work; if not, it raises an error.

  • custom_workplace_path

It is the workplace path to use. It is a string (e.g., "/path/to/output"); if the path does not exist, XGen will create it. As in the interactive mode, it is optional; if it is not specified, XGen creates a unique folder and reports it on the screen. (If resume_workplace_path is filled, custom_workplace_path is ignored.) If the path contains training data, XGen raises an error.

  • pretrained_model_path

It is the pretrained model path to use. It is a string (e.g., "/path/to/pretrained_model_path"). As in the interactive mode, it is optional.

  • finetune_teacher_model_batch_size

It is the batch size to use in finetuning the teacher model. It is an integer.

  • express_mode_options

It is the set of express modes that the user can choose from to fill the field express_mode. The available options depend on the xgen_mode and on whether the default or a customized dataset is used; refer to the sub-fields under each mode in the express_mode_options of the sample configs above.

  • express_mode

Pick one express mode from express_mode_options to use in this run.
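Because XGen raises an error on inconsistent settings (for example, an xgen_mode that the chosen platform does not support), a quick sanity check before launching a long run can save time. Below is a minimal sketch, assuming the field layout of the sample configs above:

import json
import sys

with open(sys.argv[1]) as f:
    cfg = json.load(f)

platform = cfg["device_platform"]
if platform != "simulation":
    # Per-platform support lists use keys such as "android_support".
    modes = cfg["xgen_mode_options"][platform + "_support"]
    assert cfg["xgen_mode"] in modes, \
        cfg["xgen_mode"] + " is not supported on " + platform
    # Common AI configs also restrict the model choice per platform.
    if "xgen_model_options" in cfg:
        models = cfg["xgen_model_options"][platform + "_support"]
        assert cfg["select_model"] in models, \
            cfg["select_model"] + " is not supported on " + platform
print("Config looks consistent.")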