Chapter 4. Usage of XGen
XGen offers two ways to help users obtain desirable AI models and code:
- Common AI: [First option to consider] This approach uses the AI models already included in XGen as the base for its optimizations. XGen includes a set of AI models for common AI tasks. Users may let XGen optimize the models on the default or their own datasets. This approach should be considered first, as it is simple and can meet most needs.
- Custom AI: [Only if Common AI is not applicable] This approach allows users to invoke XGen to optimize their own AI models. It is more flexible but also more complicated, requiring more work from the users. To optimize their own models, users must have a workable training script for their AI model, written in PyTorch (other front-end support will be added in the future).
In addition to optimizing AI, XGen has several other functionalities that may sometimes come in handy, such as AI model training, AI model/code quality assessment, and so on. This chapter explains those functionalities after describing the two main ways of using XGen for AI optimizations.
Overview of XGen Modes and Executions
XGen has a command-line interface. When users start XGen in a shell terminal, XGen collects users' requirements and other inputs through a series of prompts, and then starts its execution. XGen provides five modes:
- Compatibility testing: tests whether the to-be-optimized AI model and its training script are compatible with XGen. This mode usually takes much less time than the other modes.
- Pruning: figures out which weights to prune from the to-be-optimized AI model and prunes them so that the model becomes smaller and faster. To apply it to a custom AI model, the user needs to specify the pruning range and other parameters (detailed later). For now, this mode is available only for Android devices.
- Scaling: figures out the width of each layer and scales the network accordingly so that the model becomes smaller and faster. It is an alternative to pruning for model compression and optimization. To apply it to a custom AI model, the user needs to embed a search variable in their model and specify some parameters (detailed later).
- PaS: figures out the width of each layer through a model's computational graph and scales the network accordingly so that the model becomes smaller and faster. It is an alternative to pruning for model compression and optimization. To apply it to a custom AI model, the user needs to specify the PaS ratio and other parameters (detailed later).
- Customization: allows customized usage of XGen, such as pruning an AI model in a specific way or training a model with a specific configuration.
The pruning, scaling, and PaS modes conduct an automatic search of the optimization space and generate the optimization results that best meet users' requirements. The customization mode does not conduct a search; it executes training or pruning in a specific way defined by the user. It is worth noting that XGen also offers distillation, a way to enhance model accuracy; it can be used in both the pruning and scaling modes. All of these modes can be used in both common AI and custom AI, explained next.
Common AI
XGen includes a set of base DNN models for some common types of AI tasks, listed in the following table along with their default datasets. DigitNet is included mainly for demo purposes, as it completes in several minutes. Please note that these models are only samples of the many models XGen supports; most AI models are supported thanks to XGen's broad coverage of DNN operators, as detailed in Appendix I.
Category | Model | Dataset | Note |
---|---|---|---|
Image classification | DigitNet | MNIST | 10 classes |
Image classification | EfficientNet | ImageNet (auto-download) | 1000 classes |
Image classification | ResNet | ImageNet (auto-download) | 1000 classes |
Image classification | MobileNet | ImageNet (auto-download) | 1000 classes |
Image classification | ViT | ImageNet (auto-download) | 1000 classes |
Segmentation | UNet | ISBI-2012 (included) | -- |
Video classification | R2+1d | UCF101 (auto-download) | 101 classes |
Natural Language Processing (Question Answering) | MobileBERT | Squad1.1 (auto-download) | -- |
Object detection | YOLOv6 | CoCo (auto-download) | 80 classes |
Object detection | YOLOX | CoCo (auto-download) | 80 classes |
Object detection | YOLOv8 | CoCo (auto-download) | 80 classes |
If the AI task needed by a user is in one of the covered categories, common AI is the way to go. Through XGen's interface, users may pick one of the models that suits their task, and input their desired latency or accuracy and the dataset of interest. XGen will then transform the base model and generate code that meets the users' needs. The specific procedure is as follows.
Step 1: Start XGen
XGen # assuming that the XGen container was started already (Chapter 2)
Step 2: Input the Requirements
After XGen is started, it will collect the user's needs through a series of questions.
- Target platform: Choose `Android` or `iOS` as the target platform. XGen will generate the optimized AI model and code for the selected platform. If no devices are available, the user can choose `Simulation` as the platform, in which case XGen will still optimize the AI model to find the smallest model that meets the desired accuracy, even though the reported latencies are imprecise; what it produces will be optimized model files rather than generated model code for a particular platform.
- Target device(s): The wizard lists the devices of the chosen platform currently connected to this machine. The user may select zero or more devices of interest. XGen will try to generate the AI model and code that best meet the requirements on all of the selected devices. If no device is selected (or no devices are connected/listed), XGen behaves as if `Simulation` were chosen as the platform: it uses pseudo-latency in its optimizations; the exact latency values are imprecise, but the accuracy in the optimization process is still accurate; what it produces will be optimized model files rather than generated model code for a particular platform.
- Pick the base model: XGen lists the included base AI models (along with their default datasets). Users pick the model that best fits their needs. If none of the models fit the nature of the task, the user may consider Custom AI of XGen.
- Specify datasets: If users want to use datasets other than the default datasets associated with the picked AI model, they need to specify the datasets of interest for training and evaluation. In that case, the users have to make sure that the datasets are organized and formatted in the same way as the default dataset of the model (detailed in Appendix II); otherwise, the users would need to revise the data loader in the default training script of the model. If a default dataset is chosen but not included in XGen, XGen will automatically download it.
- Specify other model parameters: If non-default datasets are used, besides the dataset paths, some model parameters may need to be specified as well, such as the number of classes for classification models.
- Pick the task for XGen to perform. XGen lists five options as follows:
  - Compatibility test: a quick check of whether the model and setup are compatible with XGen. This test is much quicker than an optimization (pruning or scaling), so it is good practice to run it before starting an optimization, to find possible problems sooner.
  - Pruning: one of the ways XGen compresses a model and generates efficient code. It uses the world-leading proprietary pruning technology of CoCoPIE to prune DNN weights.
  - Scaling: one of the ways XGen compresses a model, which scales down (or up) the width of DNN layers.
  - PaS: one of the ways XGen compresses a model, which scales down the width of DNN layers through a search over the model's computational graph.
  - Customization: customized operation. Users can change the XGen configuration of a model to enable special operations, detailed later in Advanced Common AI.

  The three compression methods (pruning, scaling, and PaS) suit different AI models differently; their differences are detailed later in Custom AI. For a model in Common AI, the XGen prompt already indicates the recommended choice. When some of the common AI models are selected, XGen may show only one of `pruning` and `scaling` in the option list, because the other mode does not apply to that model at all.
- If one of `Image classification I: EfficientNet (high resolution, default: ImageNet)`, `Image classification II: ResNet (high resolution, default: ImageNet)`, or `Image classification III: MobileNet (high resolution, default: ImageNet)` is chosen as the base model, the user can also optionally optimize the model with distillation, detailed later in Configure Distillation; XGen will prompt several relevant questions:

```bash
# users can do distillation with the default teacher model
Do you want to use distillation?
> Yes
  No
Will you use the default distillation model?
> Yes
  No
...
# of course, users can also use their own distillation model
Do you want to use distillation?
> Yes
  No
Will you use the default distillation model?
  Yes
> No
Teacher model name: regnety_160
Teacher model path: /root/Projects/classif/root/Projects/classification-multi-model-efficientnet/timm/teacher_models/regnety_160-a5fe301d.pth
Number of classes: 1000
```
- XGen then asks for the optimization objectives (except in the `Compatibility test` mode). Users may specify either "Quality score" or "Latency" as the primary interest, and the other as the secondary interest. XGen will try to meet the primary interest first and then the secondary interest. The acceptable specifications are as follows:
  - Latency: the format is NumberUnit, where Number can be an integer or a decimal number, and Unit can be "ms" (milliseconds), "s" (seconds), "m" (minutes), or "hr" (hours). If latency is the secondary interest, users may also input `min` to indicate the intent to minimize the latency while meeting the primary goal.
  - Quality score: the format is "op number", where `op` can be `>`, `<`, `>=`, or `<=`, and `number` can be an integer or a decimal number. The quality score corresponds to the score used to measure the quality of the AI model: accuracy for classification models, PSNR for image enhancement, loss for some other tasks, and so on. No matter what the actual measure is, XGen represents it uniformly as a quality score and handles it in a unified manner. If quality score is the secondary interest, users may also input `min` or `max` to indicate the intent to minimize or maximize the quality score while meeting the latency goal.
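  For instance, to ask for a model no slower than 30 ms while maximizing accuracy, the two answers might look as follows (the values are illustrative):

```
Latency (primary interest): 30ms
Quality score (secondary interest): max
```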
- XGen then asks which GPU(s) the user would like XGen to use. The user may select one or multiple GPUs; XGen will try to conduct training on all the selected GPUs if the training script of the model supports multiple GPU devices.
- XGen then asks the users to input their desired training parameters of the AI model (e.g., batch size, learning rate, fine-tuning epochs).
- XGen then asks for the path of a pretrained model if there is one (except in the `Compatibility test` mode). If none is given, XGen will create a pretrained model before doing the model optimizations.
- XGen automatically determines the level of optimization based on the user's selections in the earlier questions. Depending on the chosen model and other selections, XGen defaults to Extensive Optimization.
  - Extensive Optimization (takes several rounds of training time): Users do not need to make any choice, as this is the default optimization method provided by XGen, and it is executed by default when users select Latency as the target. XGen first searches among its built-in models for one that satisfies the user's requirements; if a suitable model is found, XGen uses that built-in model and optimizes it on the corresponding data. If no suitable built-in model is found, XGen resorts to the optimization method previously chosen by the user to obtain an optimized model.
- Finally, XGen asks for the xgen-workplace, a directory where the user wants to store models and data. If the user skips this question, XGen will create a unique folder and inform the user.

If the directory already contains the results of a previous run of XGen, XGen checks whether the XGen configuration used in the previous run is identical to the configuration of the current run. If so, XGen asks the user whether to continue from where the previous run stopped, to avoid unnecessary repeated work.
Step 3: Wait for the Results
After the requirements are input, XGen starts compressing and optimizing the base AI model. The process may include a series of searches and one or several model trainings. The time it takes depends on how long the model training takes. For the models listed above, the time ranges from several minutes to several days. For reference, below is the training time of each of the models on its default dataset:
* DigitNet: 5-10 minutes on a 1-card TitanX
* EfficientNet: 2 days on a 4-card V100
* ResNet: 3 days on a 4-card V100
* MobileNet: 7 days on a 4-card 3090
* ViT: 7 days on a 4-card V100
* UNet: 1 hour on a 1-card 3090 Ti
* R2+1d: 2-3 days on an 8-card TitanX
* YOLOv6: 7 days on a 2-card 3090
* YOLOX: 7 days on a 2-card 3090
* YOLOv8: 5 days on a 2-card 3090
During the execution of XGen, the status is printed on the screen in real time, including the actual latency, quality score, and model size of the models generated by XGen. At the end, the screen shows the path where the final results are stored. Please see XGen Results for a detailed explanation of the final results and how to use them.
Use Non-Default Datasets
In Common AI, each AI model comes with a default dataset. XGen supports the use of other datasets. To take that option, the users need to ensure that the datasets are in a format and structure compatible with the expectation of XGen. Please refer to Appendix II for details.
Advanced Common AI
The standard common AI already produces more competitive results than existing tools. Advanced common AI may allow XGen to achieve even better results. To use it, the user adjusts the content of the XGen configuration file associated with the AI model and then follows the same procedure as described earlier.
The XGen configuration file determines many aspects of the optimizations by XGen. Each of the included AI models has one. Users can create the configuration for their own AI models as well. The following list provides the locations of the configurations of the included AI models and some other info.
- Image Classification 0: DigitNet (MNIST)
  - XGen config path: `/root/Projects/classification-digitnet/xgen.json`
  - Training script path: `/root/Projects/classification-digitnet/`
  - Recommended mode: pruning
- Image classification I: EfficientNet (ImageNet)
  - XGen config path: `/root/Projects/classification-multi-model-efficientnet/timm/efficientnet_config/xgen.json`
  - Training script path: `/root/Projects/classification-multi-model-efficientnet/timm`
  - Recommended mode: scaling
- Image classification II: ResNet (ImageNet)
  - XGen config path: `/root/Projects/classification-multi-model-resnet/timm/resnet_config/xgen.json`
  - Training script path: `/root/Projects/classification-multi-model-resnet/timm`
  - Recommended mode: pruning
- Image classification III: MobileNet (ImageNet)
  - XGen config path: `/root/Projects/classification-mobilenet/timm/mobilenet_config/xgen.json`
  - Training script path: `/root/Projects/classification-mobilenet/timm/`
  - Recommended mode: scaling
- Image classification IV: ViT (ImageNet)
  - XGen config path: `/root/Projects/classification-vit/timm/vit_config/xgen.json`
  - Training script path: `/root/Projects/classification-vit/timm/`
  - Recommended mode: scaling
- Segmentation: UNet
  - XGen config path: `/root/Projects/segmentation-unet/unet_config/xgen.json`
  - Training script path: `/root/Projects/segmentation-unet/`
  - Recommended mode: scaling
- Video classification: R2+1D
  - XGen config path: `/root/Projects/video-classification-s2-1d/s2+1d_config/xgen.json`
  - Training script path: `/root/Projects/video-classification-s2-1d/`
  - Recommended mode: pruning
- Natural language processing: MobileBERT (Squad1.1)
  - XGen config path: `/root/Projects/nlp-mobilebert/mobilebert_config/xgen.json`
  - Training script path: `/root/Projects/nlp-mobilebert/`
  - Recommended mode: scaling
- Object detection: YOLOv6 (CoCo2017)
  - XGen config path: `/root/Projects/object-detection-yolov6/yolov6_xgen/yolov6_config/xgen.json`
  - Training script path: `/root/Projects/object-detection-yolov6/yolov6_xgen`
  - Recommended mode: scaling
- Object detection: YOLOX (CoCo2017)
  - XGen config path: `/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json`
  - Training script path: `/root/Projects/object-detection-yolox/yolox_xgen`
  - Recommended mode: scaling
- Object detection: YOLOv8 (CoCo2017)
  - XGen config path: `/root/Projects/object-detection-yolov8/yolov8_xgen/yolov8_config/xgen.json`
  - Training script path: `/root/Projects/object-detection-yolov8/yolov8_xgen`
  - Recommended mode: scaling
Users can change the values of the parameters specified in the configuration files to change how XGen works with the model. Next is an explanation of the content of an XGen model configuration. Hint: before changing the default configuration of a common AI model, users should create a backup copy of the configuration file for easy resetting when necessary.
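For example, a backup copy of the DigitNet configuration can be made with:

```bash
cp /root/Projects/classification-digitnet/xgen.json /root/Projects/classification-digitnet/xgen.json.bak
```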
Model Configuration for XGen
This part explains the essential entries in an XGen configuration by drawing on the following example.
{
# general parameters of XGen.
"general": {
"work_place": null, # the folder for XGen to store results
"pretrained_model_path": null, # pretrained model location
"enable_ddp": true, # enable DDP training
"CUDA_VISIBLE_DEVICES": "0,1,2,3", # which GPUs are used
},
# parameters of the original training script of the AI model can be put into this section
# XGen will pass them into the training script
"training_script_args": {
"common_train_epochs": 30, # the number of epochs in one training
# Optional: Any argument of the training_main script can appear here
"video-scale": 2,
"learning-rate": 0.01,
"batch_size": 16, # the batch size per GPU
},
"user_requirements": {
# Auto generated options from XGen interface. Do not change.
"accuracy": null,
"latency": 30,
"use_distillation": true,
"use_default_distillation_model": false,
# Other options that are manually changeable.
"margin": 2, # Acceptable error margin for the termination of XGen optimization
"max_sampling": 3 # The max sampling number in the pruning or scaling mode
# If latency is primary: suggested value is 20;
# If accuracy is primary: suggested value is at least 3, no more than 5
### distillation default model options
"teacher_model": "efficientnet_b4", # default model name
"teacher_path": "/model_path_to/efficientnet_b4.pth", # default model weights path
},
# parameters defining the pruning of XGen
"pruning" : {
"prune_method": "magnitude" # "magnitude" (for magnitude-based pruning; default) or "admm" (for ADMM pruning) or "null" (no pruning)
"prune_weight_ranking": global # "global" or "intra-layer"
"prune_rate": 0.5, # uniform pruning rate (take effect only in the customization mode)
"prune_layers": null, # the list of layers to prune and their respective pruning rates
# If "null", XGen automatically decides the layers to prune
"admm_lr": 0.01, # admm special learning rate
"admm_update_epoch": 5, # how frequently we update z and u in admm pruning
"admm_rho": 0.001, # ( a hyperparameter in admm loss function)
},
# parameters defining the scaling of XGen
"scaling" : {
"searching_variable": null, # Variable to search; must be a variable in the "pruning" or "training_script_args" section
"searching_range": null, # A range (e.g., [0,1])
"searching_granularity": null, # a number determines the smallest unit in the search space; null means continuous space
},
"pas": {
"pas_ratio": 0, # uniform pas scaling rate (take effect only in the customization mode)
"pas": false , # using pas slection
"limit_loss_weights": 5.0, # pas limit loss weight which can limit model size
"use_limit_loss":true, # using limit loss or not
"pas_debug": false , # If ture while show loss item
"pas_rebuild": false , # If true while rebuild model for pas_model
"pas_finetune_epoch": 200, # number of epochs for fine-tune model for pas_model
"pas_pretrained_weight_path": null # pretrained pas_model weight path
},
# distillation key args
"distillation": {
"distillation_method" : "classic_distillation", # "null" (no distillation; default) or "classic_distillation" (for classic distillation; )
# classic_distillation config start
# general setting
"original_loss_weights":0.1, # loss weights for original loss which define by user
# Tag loss setting:
"tag_loss_weights": 0.9, # tag loss weights
"tag_loss": 'kl', # None will ignore tag loss, only support mse loss and kl loss for now
"tag_temperature": 4, # tag temperature
"tag_loss_combination_method": "avg", # if there are multiple teacher, how we combined them, we support "avg" and "random" method for now
# feature loss setting
"feature_loss_weights":0.9, # feature loss weights
"feature_default_temperature":1, # feature default temperature
"advance_feature_mapping": {}, # teacher_model_name: [student_layer_name, teacher_layer_name,loss_type,weights,{adv_config}]
# regularization loss
"regularization_loss_weights": 1, # regulization loss weighst
"discriminator_lr":1e-4 # tag_discriminator learning rate
# classic_distillation config end
},
"compiler": {
"input_shape": "(1,3,360,640)", # The shape of the input tensor of the AI model
"opset_version": 11, # The version of ONNX (11 or 9); default 11
}
}
The configuration is a JSON file consisting of eight sections. The inputs from the XGen interface automatically update the corresponding entries in this JSON file. Users may also directly modify these entries. Most entries are self-explanatory in the example; several notes are worth mentioning.
Configure Pruning
- XGen has two pruning methods: magnitude-based pruning and ADMM-based pruning. The former prunes the weights with the smallest magnitudes; the latter uses a more advanced method to select the weights to prune. The parameter `prune_weight_ranking` determines whether the ranking of weights for pruning is done within each individual layer or across all layers.
- In the `pruning` mode, the parameters `prune_method` and `prune_rate` are ignored; XGen uses magnitude-based pruning and automatically determines the appropriate pruning rates for the layers listed in `prune_layers`. If `prune_layers` is null, XGen automatically determines the layers to prune. The preset pruning rates in `prune_layers` are ignored in this mode.
- In the `customization` mode, XGen conducts either pruning (if `prune_method` is not null) or training of the AI model (otherwise); the parameter `prune_method` determines the pruning method, and the parameter `prune_layers` determines the layers to prune and their pruning rates. If the `prune_rate` parameter is not null, it takes priority over the pruning rates in `prune_layers`. In the `customization` mode, if `prune_method` is not null, `prune_rate` and `prune_layers` cannot both be null.
- The `prune_layers` entry is a JSON string consisting of a list of layers in the AI model and their pruning rates (i.e., the percentage of weights to be removed). An example is as follows.
'{"prune_rates": {"layer1.0.conv1.weight": 0.2, "layer1.0.conv2.weight": 0.4,
"layer1.1.conv1.weight": 0.5, "layer1.1.conv2.weight": 0.5, "layer2.0.conv1.weight": 0.5,
"layer2.0.conv2.weight": 0.5, "layer2.0.shortcut.0.weight": 0.5, "layer2.1.conv1.weight": 0.5,
"layer2.1.conv2.weight": 0.5, "layer3.0.conv1.weight": 0.5, "layer3.0.conv2.weight": 0.5,
"layer3.0.shortcut.0.weight": 0.5, "layer3.1.conv1.weight": 0.4, "layer3.1.conv2.weight": 0.2}}'
- Users can also create a separate YAML file that specifies the layers and the pruning rate of each layer, and then provide the file path as the value of `prune_layers`; XGen will read the values from the file content. To help users set `prune_layers`, XGen provides a utility function that exports the list of names of the layers in an AI model amenable to pruning. Users just need to add `from co_lib.utils import export_pruning_layer` to the beginning of the training script of the AI model and call `export_pruning_layer(the-AI-model, output-yaml-file-name)` in the training script to export the model's prunable layers into a file. The pruning rates in the exported list are all set to 0.5 by default. Below is an example of such a YAML file, followed by a sketch of how the utility can be called.
```yaml
prune_ratios:
  module.layer1.0.conv1.0.0.weight: 0.6
  module.layer2.0.conv2.0.3.weight: 0.8
```
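As mentioned above, a minimal sketch of calling the utility inside the training script follows; `build_model()` is a placeholder for however the script actually constructs the model:

```python
from co_lib.utils import export_pruning_layer

model = build_model()  # placeholder: however the training script constructs the AI model

# Write all prunable layers (each with the default rate of 0.5) to a YAML file.
# Edit the rates in the file, then set "prune_layers" in xgen.json to the file's path.
export_pruning_layer(model, 'prunable_layers.yaml')
```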
- To use ADMM-based pruning, the `customization` mode has to be used, and two runs of XGen are needed: the first run must set `prune_method` to `admm`, and the second run to `magnitude`. The two runs must use the same `prune_rate` and `prune_layers`. A sketch of the two runs' settings follows this list.
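A hypothetical sketch of the `pruning` sections of the two configurations; the rate is a placeholder, and only `prune_method` (plus the ADMM-specific parameters) differs between the runs:

```
# Run 1: ADMM training drives the selected weights toward small magnitudes
"pruning": {
    "prune_method": "admm",
    "prune_rate": 0.5,
    "prune_layers": null,
    "admm_lr": 0.01,
    "admm_update_epoch": 5,
    "admm_rho": 0.001
}

# Run 2: magnitude-based pruning zeroes those weights; same prune_rate and prune_layers
"pruning": {
    "prune_method": "magnitude",
    "prune_rate": 0.5,
    "prune_layers": null
}
```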
Configure Scaling
IMPORTANT: To use scaling for a custom AI model, the user must add a search variable into the AI model and configure the scaling in the configuration file. Without that, `scaling` would not function and would yield only the original model.
- In the `scaling` mode, XGen has a generic design: it changes the value of the `searching_variable` while assessing the quality of the AI model. So, in principle, the user may express any aspect of the AI model with the `searching_variable` and then use `scaling` to search for the best value. A typical use of this feature is to express, in the training script, the widths or depths of the layers of the AI model as expressions of the `searching_variable`; XGen will then automatically scale those layers down (or up) towards the objectives. For instance, in the training script, the user may set the first three layers' widths to `100*w`, `200*w`, and `300*w`. If, in the configuration file, `w` is set as the `searching_variable` and the search range is `[0.5, 1]` with a step size of 0.1, XGen will investigate six variants of the AI model with different widths. Users can also use this feature to change other aspects of the AI model. Note that the `searching_variable` must be a variable in the `training_script_args` section of the configuration file. Below is an example showing how scaling is configured in UNet:
```
# Inside the configuration file of UNet (config.json),
# nch_ker is set as the searching variable.
"searching_variable": "nch_ker",
```
```python
# Inside the model construction file of UNet, `nch_ker` is used as a variable to control
# the width of each layer. Notice that different factors are used with `nch_ker` at
# different layers.
self.enc1_1 = CNR2d(1 * self.nch_in, 1 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.enc1_2 = CNR2d(1 * self.nch_ker, 1 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.pool1 = Pooling2d(pool=2, type='max')
self.enc2_1 = CNR2d(1 * self.nch_ker, 2 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.enc2_2 = CNR2d(2 * self.nch_ker, 2 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.pool2 = Pooling2d(pool=2, type='max')
self.enc3_1 = CNR2d(2 * self.nch_ker, 4 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.enc3_2 = CNR2d(4 * self.nch_ker, 4 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.pool3 = Pooling2d(pool=2, type='max')
self.enc4_1 = CNR2d(4 * self.nch_ker, 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.enc4_2 = CNR2d(8 * self.nch_ker, 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
self.pool4 = Pooling2d(pool=2, type='max')
self.enc5_1 = CNR2d(8 * self.nch_ker, 2 * 8 * self.nch_ker, kernel_size=3, stride=1, norm=self.norm, relu=0.0)
```
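Correspondingly, the `scaling` section of the configuration might look as follows; the range and granularity values here are illustrative placeholders, not UNet's actual settings:

```
"scaling": {
    "searching_variable": "nch_ker",
    "searching_range": [32, 64],
    "searching_granularity": 8
}
```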
- Model accuracy is sensitive to the shapes of the stem (i.e., input) layer and the output head layer, so when setting up for scaling, try to avoid scaling those layers down much. There are various online resources on how to best do AI model scaling.
- In the `customization` mode, XGen does not do scaling; the scaling section in the configuration file is ignored.
Configure Distillation
Distillation is a way to improve the accuracy of a pruned or scaled model.
To use distillation, users just need to answer `Yes` when asked in the XGen interactive interface. In the XGen configuration file (xgen.json), several options in the `user_requirements` section and a separate `distillation` section control the behavior of distillation, as shown in the example configuration in the Model Configuration for XGen part. Please see the comments in that example for explanations of those options.
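For orientation, below is a hypothetical sketch of the distillation-related entries in `user_requirements` when a custom teacher model is used; the model name and path are placeholders:

```
"user_requirements": {
    ...
    "use_distillation": true,
    "use_default_distillation_model": false,
    "teacher_model": "regnety_160",                       # teacher model name
    "teacher_path": "/path/to/teacher_model_weights.pth"  # teacher model weights
}
```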
Configure PaS
PaS (Pruning as Search) is a novel method of structured pruning that searches through a model’s computational graph to find a smaller and more optimal model.
To use the PaS algorithm, the user must select the `PaS` option when asked in the XGen interactive interface, or select the `customization` mode with the parameter `pas` set in the configuration file.
The main PaS configurations include `pas_ratio`, which indicates the degree of compression desired for the model, and `pas_finetune_epoch`, which specifies the number of training epochs for fine-tuning the final model. See the example configuration above for the other parameters.
Initially, a larger PaS-generated model is trained to identify more suitable weights, and then, using the fine-tune epochs specified by the user, the previously-generated model is pruned to the desired size.
Notably, this method cannot adequately handle dynamic computational graphs and requires the user’s model to have a static graph, with no conditional statements in the forward function, among other requirements.
In experimental comparisons, the use of `torch.batchnorm` may significantly impact the accuracy. If using the PaS algorithm, it is recommended to replace `torch.batchnorm` with `nn.BatchNorm`.
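As an illustration of that replacement, the sketch below contrasts the functional batch-norm form with a module-based `nn.BatchNorm2d`; the 2D variant is an assumption, so pick the variant matching the feature dimensionality of the model:

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        # Module-based batch norm: its parameters appear as nodes in the
        # computational graph that PaS searches through.
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        # Instead of the functional form, e.g.
        #   x = torch.nn.functional.batch_norm(x, mean, var, training=self.training)
        # use the module:
        return torch.relu(self.bn(self.conv(x)))
```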
Hints
Here are some hints on using the various options provided in the XGen configuration.
- The `pretrained_model_path` entry holds the location of the pretrained model, if there is any. The format of the pretrained model can be one of the following:
  - a result saved by `torch.save()`, such as `torch.save({'net': net.state_dict(), 'acc': acc, 'epoch': epoch}, file)` or `torch.save(net.state_dict(), file)`;
  - a pickle file created by, for instance, `pickle.dump(quant_sim_model, file)`.
- An optimization done by XGen (`pruning` or `scaling`) involves the training of multiple AI models and may hence take hours or days to complete. Usually, fewer trainings are needed if "Latency" rather than "Quality score" is chosen as the primary interest.
- If the original model weights were obtained by distillation (or transfer learning) from some other networks, the `pruning` mode is often a better option than `scaling`.
- Pruning is applicable to a layer only if the sizes of the kernels in that layer are multiples of 8 (e.g., a 4x4 kernel, which has 16 weights).
- When users are not sure which configuration works best, they can run XGen multiple times with a different configuration each time.
- The `pruning` and `scaling` modes both involve an iterative search process that finds the appropriate pruning rates or scaling variable values. The `customization` mode does not involve a search process; XGen simply trains the model based on the provided configuration, and if `prune_method` is not null, XGen conducts a one-time pruning of the AI model before training it.
- XGen supports data-parallel training on multiple GPUs in a machine. To use that support, the user just needs to enable parallel training in the training script of the AI model. This version of XGen does not support training one model on multiple machines.
Example
As an example, the following table shows the differences between the default configuration of R2+1d
model and modified configurations that achieve better speed-quality tradeoffs. The modified configurations use the customization
mode, including two runs of XGen. The first run uses the ADMM method to identify the weights that are good to prune and make them small in magnitude, and the second run uses magnitude-based pruning to set those weights to zero and retrain the model.
Parameter | Default configuration | Modified configurations |
---|---|---|
mode | pruning | customization |
prune_method | magnitude | admm (run-1), magnitude (run-2) |
common_train_epochs | 35 | 50 (run-1), 130 (run-2) |
learning_rate | 0.005 | 0.0005 |
admm_update_epoch | 5 | 10 |
admm_rho | 0.001 | 0.0001 |
prune_layers | null | {module.layer1.0.conv1.0.0.weight: 0.8, module.layer1.0.conv1.0.3.weight: 0.8, module.layer1.0.conv2.0.0.weight: 0.8, module.layer1.0.conv2.0.3.weight: 0.8, module.layer1.1.conv1.0.0.weight: 0.8, module.layer1.1.conv1.0.3.weight: 0.8, module.layer1.1.conv2.0.0.weight: 0.8, module.layer1.1.conv2.0.3.weight: 0.8, module.layer2.0.conv1.0.0.weight: 0.7, module.layer2.0.conv1.0.3.weight: 0.7, module.layer2.0.conv2.0.0.weight: 0.7, module.layer2.0.conv2.0.3.weight: 0.7, module.layer2.1.conv1.0.0.weight: 0.7, module.layer2.1.conv1.0.3.weight: 0.7, module.layer2.1.conv2.0.0.weight: 0.7, module.layer2.1.conv2.0.3.weight: 0.7} |
The `customization` mode leads to both a higher accuracy and a lower latency on three devices.
Custom AI
If users have their own model to optimize, custom AI of XGen can be the way to go. But before using this option, users are recommended to first consider common AI, as custom AI involves more effort on the user's end.
To use custom AI, the user must already have a workable training script of their own AI model. The script must be written in PyTorch. (Support of other front ends---such as TensorFlow, Keras---will be added in the future.)
The usage of custom AI is similar to common AI, except that the user picks the "Your own model" option when selecting the base model in the XGen interface. The rest of the procedure is self-explanatory. There are, however, four steps to go through before XGen can start working on a custom AI model:

- Checking the model applicability;
- Preparing the running environment;
- Preparing the training script (an example is included inside XGen, at `/root/examples` in the XGen container);
- Preparing the configuration file.
Step I: Checking Model Applicability
Although XGen has a broad coverage of DNN operators, some AI models may still contain operators beyond XGen's coverage and cause XGen to fail. This first step is to run such a check.
- Export the AI model to an ONNX file.

  The user may use the PyTorch function `torch.onnx.export()` to export the model to an ONNX file. There are four requirements for making the call:
  - the second argument (the model input) must have 1 as the batch size, i.e., the first dimension of the tensor must be 1;
  - the `do_constant_folding` argument must be `True`;
  - the `opset_version` argument must be 11 or 9;
  - the `dynamic_axes` argument must not be set.

  Here is an example:
```python
torch.onnx.export(
    model,                    # model being run
    x,                        # model input
    onnx_save_path,           # where to save the model
    do_constant_folding=True,
    input_names=['input'],    # the model's input names (an arbitrary string)
    output_names=['output'],  # the model's output names (an arbitrary string)
    opset_version=11          # XGen supports 11 or 9
)
```
Sometimes the user's training script may already have a customized function to export the ONNX format of the model. In that case, the user should use that function instead of the default PyTorch function.
- Run the check.

  Run `XGen onnx-latency-benchmark` on the generated ONNX file to test its support status. If the benchmark succeeds, the ONNX file is supported; otherwise, it is not. The supported operators are listed in Appendix I. An example invocation is shown below.
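For example, assuming the exported files sit in `./exported_models` (a placeholder path; the flags follow the onnx-latency-benchmark usage described under Using the Compiler and Testing Components Only below):

```bash
XGen onnx-latency-benchmark --model-path ./exported_models --output-path ./benchmark_results
```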
Step II: Prepare Running Environment
Because XGen trains the AI model inside its Docker container, users need to install into the container the dependencies required for training the AI model. It is best to ensure that the training script runs properly in the container before moving to the next step.
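For instance, assuming the model code ships a requirements file (the paths below are placeholders), the dependencies can be installed and the script smoke-tested inside the container as follows:

```bash
pip install -r /path/to/your_model/requirements.txt
# verify the training script runs before adapting it for XGen
python /path/to/your_model/train.py --epochs 1
```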
Step III: Prepare Training Scripts
To allow XGen to optimize a custom AI model, the training script of the model has to be revised so that XGen can interact with the training script during the optimization process. This part explains the needed changes.
Descriptions of the APIs used below can be found in Appendix III.
- Rename the main training script to `train_script_main.py`, and ensure that (i) at the end of the script, the script invokes the main training function; (ii) the main training function is named `training_main` (XGen currently requires this name; the requirement will become obsolete in the next version); and (iii) the renamed script `train_script_main.py` resides in the top-level folder of the custom AI model.

```python
if __name__ == '__main__':
    training_main()
```
- Import the XGen modules by adding the following lines to the beginning of the training script:

```python
from co_lib import Co_Lib as CL
from xgen_tools import *
```
- Initialize XGen, and connect the control variables in the training script with the internal variables inside XGen so that XGen can control the training process. Some of the internal variables of XGen to be connected are listed below; for the full list, please see the `training_script_args` field in the XGen config JSON file (e.g., `/root/Projects/object-detection-yolox/yolox_xgen/yolox_config/xgen.json`).

```
common_train_epochs: controls the number of training epochs
learning_rate: controls the learning rate
batch_size: controls the batch size
device_num: the number of GPUs to use
num_classes: the number of classes if the model is a classification model
train_data_path: path of the training data
eval_data_path: path of the evaluation data
train_label_path: path of the training labels
eval_label_path: path of the evaluation labels
pretrain_model_weights_path: path of the pretrained model weights
```
XGen sets these internal variable values based on the values it attains during the interactions with the user. So for the training script to follow these controls, the user would need to ensure that the corresponding variables inside the training script take their values from these XGen internal variables.
XGen offers an API `xgen_init(user_args, args_ai=None, map=COCOPIE_MAP)` to facilitate the process. The call, shown below, returns two values: the first (`originalArgs` in the example) holds the original arguments of the training script, and the second (`args_ai`) holds the internal variables of XGen.

```python
# user_args represents a user variable holding arguments in the training_main function
# the returned args_ai is the same as the argument `args_ai`
# the return type of originalArgs is identical to the type of user_args (e.g., argparse.Namespace)
originalArgs, args_ai = xgen_init(user_args, args_ai=None, map=COCOPIE_MAP)
```
Specifically, the variables that XGen actually controls during the training process are in the `args_ai['training_script_args']` field. For convenient reference, the user may add this line:

```python
xgen_control_args = args_ai['training_script_args']
```
Note that the call to `xgen_init` should be placed after the parsing of arguments in the original training script, where `user_args` should be replaced with the variable in the original training script that holds the argument-parsing results (of type `argparse.Namespace` or `dict`).

The user can then connect the variables in `originalArgs` with the corresponding variables in `args_ai`. For instance, if the original training script uses a `batchSize` variable for the batch size of training, the following code connects it with the internal variable of XGen:

```python
originalArgs.batchSize = xgen_control_args['batch_size']
```
If the original training script organizes the control variables of training in other ways, the modifications just need to ensure that the connection is done properly. For example, the training script of Yolox passes the training parameters into the training script through a Python file (`config.py`) and then uses a variable `exp` of type ABCMeta to hold them. The corresponding code changes need to connect the fields in `exp` with the corresponding internal variables of XGen:

```python
# XGen: invoke xgen_init
# If args is not provided, it can be set as an empty dictionary {}.
# If args is {}, then args will be consistent with the args_ai generated by initialization in xgen_init
user_args, args_ai = xgen_init(args, args_ai=None)
# XGen: the variables that XGen actually controls during training are in args_ai['training_script_args'],
# which is a dictionary, so it can be used in the following ways.
# Note: the args_ai['training_script_args'] field is mapped from the 'training_script_args' field in the XGen config JSON
xgen_control_args = args_ai['training_script_args']
# XGen: connect variables with XGen internal variables
exp.max_epoch = xgen_control_args['common_train_epochs']
exp.data_dir = xgen_control_args['train_data_path']
```
It is worth mentioning that it is not necessary to connect every internal variable of XGen. If the training script has no use for some of the controls, those variables do not need to be connected with the user's variables. Normally, at a minimum, `common_train_epochs` should be connected with the user's variable.

For convenience, XGen provides a built-in mechanism for forming the connection. Note that this mechanism applies only if the original variables are directly passed into the training function through arguments. Rather than writing those assignment statements, the user may use a dictionary COCOPIE_MAP, as illustrated below: the variables in the original training script are the keys, and the XGen internal variables are the values. Put this map at the beginning of the training script. Then, after the call to `user_args, args_ai = xgen_init(args, args_ai=None)`, user_args will contain all the original variables in the COCOPIE_MAP, all connected with the internal variables of XGen.

```python
from xgen_tools import XgenArgs

COCOPIE_MAP = {
    'your_epochs_variable': 'common_train_epochs',       # assign `common_train_epochs` in the XGen configuration to the script's epochs variable
    'your_lr': 'learning_rate',                          # assign `learning_rate` to `your_lr`
    'your_batch_size': 'batch_size',                     # assign `batch_size` to `your_batch_size`
    'your_train_data_path': 'train_data_path',           # assign `train_data_path` to `your_train_data_path`
    'your_val_data_path': 'eval_data_path',              # assign `eval_data_path` to `your_val_data_path`
    'your_train_label_path': 'train_label_path',         # assign `train_label_path` to `your_train_label_path`
    'your_eval_label_path': 'eval_label_path',           # assign `eval_label_path` to `your_eval_label_path`
    'your_pretrain_model_weights_path': 'pretrain_model_weights_path',  # assign `pretrain_model_weights_path` to `your_pretrain_model_weights_path`
}
```
After the connection is done, the user would need to ensure that the rest of the training script uses those control variables to control the training process.
- Insert the following lines after the line where the AI model architecture is assigned to a variable (`AImodel` in the example). They should replace the original model-weight-loading code in the script. `xgen_load` loads the model weights from the file specified by the `pretrained_model_path` entry in the XGen configuration file; if that entry is `null`, the model is initialized by the default model initialization in the training script.

```python
# AImodel represents a variable holding the model architecture
xgen_load(AImodel, args_ai=args_ai)
```
- Insert the following lines after the DNN model is loaded but before the training starts. `AImodel`, `dataLoader`, and `AIoptimizer` represent variables in the training script: `AImodel` (a `torch.nn.Module`) holds the DNN model, `dataLoader` (a `torch.utils.data.DataLoader`) is the training data loader, and `AIoptimizer` (a `torch.optim` optimizer) is the optimizer of the training process.

```python
cl_model = CL.init(args=args_ai, model=AImodel, optimizer=AIoptimizer, data_loader=dataLoader)
if cl_model:
    AImodel = cl_model
```
- Insert the following line at the beginning of each training epoch (inside the training epoch loop), where `epochCount` is the counter of epochs:

```python
CL.before_each_train_epoch(epoch=epochCount)
```
- Insert the following line after the changes to the learning rate in the training loop, where `epochCount` is the counter of epochs. Skip this step if the training loop makes no changes to the learning rate.

```python
CL.after_scheduler_step(epoch=epochCount)
```
- Insert the following line after the loss computation, where `loss` is the loss tensor calculated by the loss function in the training script. This line allows XGen to adjust the loss for its optimization purposes.

```python
loss = CL.update_loss(loss)
```
- Inside the training loop, insert the following line right after an evaluation of the current model, where `AImodel` and `epochCount` are as explained earlier, and `evaluationResult` represents a variable for the evaluation result, such as the accuracy, the average PSNR (for super resolution), and so on. The evaluation can be invoked every one or more epochs. If there are no evaluations in the training loop, the user needs to add them.

```python
xgen_record(args_ai, AImodel, evaluationResult, epoch=epochCount)
```
- Notes:
  - If the training script uses EMA, DataParallel, or DistributedDataParallel, the actual model is in the `module` field of the model (i.e., `AImodel.module`), in which case the line should be `xgen_record(args_ai, AImodel.module, evaluationResult, epoch=epochCount)`.
  - Sometimes the training script already has a customized function to export the model in ONNX format. In that case, the user should use that customized function to export the model to an ONNX file and pass the ONNX file path to xgen_record as follows: `xgen_record(args_ai, AImodel, evaluationResult, epoch=epochCount, onnx_file_path=the_onnx_file_path)`.
- At the end of the `training_main` function, insert the following line. If the training script has no evaluation function, use `-float('inf')` for `evaluationResult`.

```python
xgen_record(args_ai, AImodel, evaluationResult, epoch=-1)
```
- If the user wants to use the "Scaling" mode of XGen to compress the model, the user needs to modify the training script of the AI model to give XGen control of the scale (e.g., width, depth) of the AI model. Please see the Configure Scaling part for details.
- XGen has a built-in order check that inspects the training script to identify obvious API call errors. The check is enabled by default. To disable it, add the following lines to the training script:

```python
from xgen_tools import helper
helper.ENABLE_HELPER = False
```
Step IV: Prepare Config file
The next step is to prepare a configuration file which determines how XGen applies optimizations to the model. Please see Model Configuration for details.
Example Documents
You can refer to our documentation on YOLOX model modification to understand this process: YOLOX example.
Example Code
Following is an example of a training script after being revised for XGen. The changed parts are surrounded by `xgen_tools START` and `xgen_tools END` comments.
```python
# xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
from co_lib import Co_Lib as CL
from xgen_tools import *
COCOPIE_MAP = {'train_epochs': 'common_train_epochs'}
# xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

def training_main():
    originalArgs = ...  # parsing users' parameters (including "train_epochs" in this example)

    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    user_args, args_ai = xgen_init(originalArgs, map=COCOPIE_MAP)
    # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    # define the data loaders
    trainDataLoader = torch.utils.data.DataLoader(...)
    testDataLoader = torch.utils.data.DataLoader(...)
    # define the DNN model
    model = ...
    # define the loss function
    criterion = ...
    # define the optimizer
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=user_args.learning_rate,
        betas=(0.9, 0.999),
        eps=1e-08,
        weight_decay=user_args.decay_rate
    )
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.7)

    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    # load the model
    xgen_load(model, args_ai=args_ai)
    cl_model = CL.init(args=args_ai, model=model, optimizer=optimizer, data_loader=trainDataLoader)
    if cl_model:
        model = cl_model
    # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    for epochCount in range(0, user_args.train_epochs):
        # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        CL.before_each_train_epoch(epoch=epochCount)
        # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        scheduler.step()
        # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        CL.after_scheduler_step(epoch=epochCount)
        # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
        for batch_id, data in enumerate(trainDataLoader):
            model.train()
            inputs, labels = data
            optimizer.zero_grad()
            output = model(inputs)
            loss = criterion(...)  # regular loss, e.g., cross-entropy, MSE, ...
            # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            loss = CL.update_loss(loss)
            # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            loss.backward()
            optimizer.step()
        if epochCount % user_args.eval_epochs == 0:
            accuracy = eval(model)
            # save the model
            save_path = 'path_name.pth'
            state = {
                'epoch': epochCount,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
            }
            # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
            # if users want to save the intermediate models and log info (optional)
            xgen_record(args_ai, model, accuracy, epoch=epochCount)
            # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

    accuracy = eval(model)  # needed only if no evaluation happened inside the training loop
    # xgen_tools START @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    xgen_record(args_ai, model, accuracy, epoch=-1)
    # xgen_tools END @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

if __name__ == '__main__':
    # don't forget this line; it ensures the main training function starts
    training_main()
```
Customized Operation
It is common that a user wants to include a new model in XGen and needs some customized operations, wants to control the pruning rate or pruning method, or just wants to run the model without XGen's optimizations. The `customized operation` mode serves these needs. Before starting, please make sure the model has already been correctly adapted to XGen, as described in Advanced Common AI.

Here is how to use the `customized operation` mode.
- Choose `DigitNet (MNIST)`, for example.

```
Please choose the base model (and the default dataset) to start with:
> Image Classification 0: DigitNet (MNIST)
  Image classification I: EfficientNet (high resolution, default: ImageNet)
  Image classification II: ResNet (high resolution, default: ImageNet)
  Image classification III: MobileNet (high resolution, default: ImageNet)
  Image classification IV: ViT (default: ImageNet (high resolution))
  Segmentation: UNet (ISBI-2012)
  Object detection: YOLOv6 (CoCo2017)
  Object detection: YOLOX (CoCo2017)
  Object detection: YOLOv8 (CoCo2017)
  Video classification: R2+1d (UCF101)
  Natural language processing: MobileBERT (Squad1.1)
  Your own model
```
- In the mode choice, choose the `Customized operation` mode.

```
What do you want to do? (Pruning recommended):
  Compatibility test
  Pruning
  Scaling
  PaS
> Customized operation
```
- Answer the remaining questions.

```
Which GPU(s) do you want to use:
> [*] GPU 0: NVIDIA GeForce RTX 3090 (UUID: GPU-e75cbd31-7281-3c03-xxxx-xxxxxxx)
  [ ] GPU 1: NVIDIA GeForce RTX 3090 (UUID: GPU-647eedd7-0986-f7aa-xxxx-xxxxxxx)
Press <space> or <tab> for multi-selection, <up> and <down> or letter key <j> and <k> to move, <enter> to accept.
What is the batch size per GPU: 128
What is the learning rate: 0.01
How many epochs: 3
XGen pretrained model absolute path (optional):
No value detected, confirm to continue?
> Yes
  No
Your workplace for this session (if you don't input a path, XGen will make one for you):
This is your default workplace for this session: /root/output/DigitNet_MNIST/20221219121928
```
In the output, you will see only one stage in XGen, because XGen does not perform any optimizing operations such as pruning or scaling in the `customized operation` mode; the `latency` and `accuracy` that XGen measures are those of the original model. In other words, to learn a model's latency or accuracy before it is optimized by XGen, use the `customized operation` mode.
Another use case is controlling the pruning rate or the pruning method, which the `customized operation` mode also supports with a little more work: simply edit the `pruning` field of the `xgen.json` config file in the model project. For `DigitNet`, the file is located at `~/Projects/classification-digitnet/xgen.json`. For example, change the `pruning` field to the following:
{
...
"pruning": {
"prune_method": "magnitude",
"prune_layers": null,
"prune_weight_ranking": "intra-layer",
"prune_rate": 0.1,
"admm_update_epoch": 5,
"admm_rho": 0.001,
"admm_sparsity_type": "block_punched",
"admm_lr": 0.01,
"admm_block": "(8,4)",
"prune_threshold": -1.0
},
...
}
There are many choices for the value of the `pruning` field; please refer to the `pruning` field in Model Configuration for XGen for more details. Now, when XGen runs in the `customized operation` mode, it applies `magnitude`-based pruning at a pruning rate of `0.1`.
Other Usage
At the center of XGen are three components: the AI model optimizer for model compression and optimization, the compiler for code optimization and generation, and the model testing component for measuring the speed of the models. The usage of XGen described so far automatically invokes all of these components, but they can also be used separately. One benefit of this flexibility is that even if no devices are connected, XGen can still be used to optimize an AI model via the AI model optimizer (with some limitations); another is that XGen can skip the model optimization step and quickly generate executable code for a model to test it out or profile its execution.
Model Optimization Alone Without Running on Devices
This option is for the case where the user wants to find a model with a certain accuracy and minimum latency but does not have, or does not want to use, any connected device. The usage of XGen is the same as in the default case, except that in the step of platform selection, the user selects `Other Device`, or in the step of device selection, selects no device.
In this case, XGen does not run the AI models on the physical devices to measure the speed. It instead uses a simple oracle to output a pseudo-latency. The oracle only ensures that the pseudo-latencies it gives out follow the rule that the smaller the AI model is, the faster it runs. The pseudo-latency values can be far away from the real latencies. But because the rule it follows is roughly consistent with actual observations, despite the imprecision of latency, XGen can still find satisfying AI models when the users' requirement is to meet an accuracy while minimizing the latency. (The accuracy it gets is still correct.) If the users' requirement is to find a model with a certain latency, this option is not applicable.
Using the Compiler and Testing Components Only
This usage allows the compiler and the testing components to be invoked without the AI optimizer. It can come in handy when users have collected one or more AI models and would like to measure their speeds (in a batch).
Users can do that through the following command:
XGen onnx-latency-benchmark --model-path [onnx-model-path] --output-path [output-path] --devices device_id1 device_id2 ... --verbose
Note: The standalone `onnx-latency-benchmark` command (without `XGen`) is deprecated. Please use `XGen onnx-latency-benchmark` instead.
The script will measure the speed of every ONNX model under the onnx-model-path and output a single CSV file in the output path. The CSV file reports the inference latency of each model along with some other detailed information, such as the place where the code generated by the compiler resides (`output_dir`), the number of parameters in the model (`params`), and the amount of computation in the model (`MACs`).
The parameter `devices` is optional; if you don't specify it, you will enter an interactive interface to select the devices.
The parameter `verbose` is also optional; if you specify it, the script will print out detailed information.
XGen onnx-latency-benchmark --benchmark --model-path [onnx-model-path] --output-path [output-path] --devices device_id1 device_id2 ... --verbose
The parameter `benchmark` is optional. If specified, the tool will record the running time of the model and the detailed running information of each operator, as in the following example output:
I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359] Sort by Run Order
I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359] | Op Type | Avg(ms) | % | GMACPS | MACs(Dense) | Stride | Pad | Filter Shape | Output Shape | Dilation | name |
I deepvan/run/statistics.cc:359] ---------------------------------------------------------------------------------------------------------------------------------------------------------------
I deepvan/run/statistics.cc:359] | BufferTransform | 0.046 | 0.873 | 0.000 | 0.00000 | | | | [1,224,224,3] | | deepvan_node_input.1_mem_type_2 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.076 | 1.427 | 53.477 | 4.06426 | [2,2] | [2,2] | [12,3,3,3] | [1,112,112,12] | [1,1] | Clip_1 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.071 | 1.329 | 19.135 | 1.35475 | [1,1] | [2,2] | [1,12,3,3] | [1,112,112,12] | [1,1] | Clip_3 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.037 | 0.690 | 49.152 | 1.80634 | [1,1] | [0,0] | [12,12,1,1] | [1,112,112,12] | [1,1] | Conv_4 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.029 | 0.540 | 0.000 | 0.00000 | | | | [1,112,112,12] | | Add_5 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.071 | 1.331 | 76.432 | 5.41901 | [1,1] | [0,0] | [36,12,1,1] | [1,112,112,36] | [1,1] | Clip_7 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.106 | 1.999 | 9.541 | 1.01606 | [2,2] | [2,2] | [1,36,3,3] | [1,56,56,36] | [1,1] | Clip_9 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.037 | 0.697 | 48.623 | 1.80634 | [1,1] | [0,0] | [16,36,1,1] | [1,56,56,16] | [1,1] | Conv_10 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.035 | 0.652 | 57.757 | 2.00704 | [1,1] | [0,0] | [40,16,1,1] | [1,56,56,40] | [1,1] | Clip_12 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.072 | 1.361 | 15.572 | 1.12896 | [1,1] | [2,2] | [1,40,3,3] | [1,56,56,40] | [1,1] | Clip_14 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.037 | 0.701 | 53.736 | 2.00704 | [1,1] | [0,0] | [16,40,1,1] | [1,56,56,16] | [1,1] | Conv_15 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.017 | 0.316 | 0.000 | 0.00000 | | | | [1,56,56,16] | | Add_16 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.046 | 0.871 | 90.836 | 4.21478 | [1,1] | [0,0] | [84,16,1,1] | [1,56,56,84] | [1,1] | Clip_18 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.077 | 1.454 | 7.653 | 0.59270 | [2,2] | [2,2] | [1,84,3,3] | [1,28,28,84] | [1,1] | Clip_20 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.057 | 1.075 | 32.209 | 1.84397 | [1,1] | [0,0] | [28,84,1,1] | [1,28,28,28] | [1,1] | Conv_21 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.028 | 0.532 | 55.751 | 1.58054 | [1,1] | [0,0] | [72,28,1,1] | [1,28,28,72] | [1,1] | Clip_23 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.041 | 0.775 | 12.301 | 0.50803 | [1,1] | [2,2] | [1,72,3,3] | [1,28,28,72] | [1,1] | Clip_25 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.053 | 0.992 | 29.906 | 1.58054 | [1,1] | [0,0] | [28,72,1,1] | [1,28,28,28] | [1,1] | Conv_26 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.011 | 0.207 | 0.000 | 0.00000 | | | | [1,28,28,28] | | Add_27 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.028 | 0.533 | 58.745 | 1.66835 | [1,1] | [0,0] | [76,28,1,1] | [1,28,28,76] | [1,1] | Clip_29 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.041 | 0.775 | 12.984 | 0.53626 | [1,1] | [2,2] | [1,76,3,3] | [1,28,28,76] | [1,1] | Clip_31 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.056 | 1.052 | 29.765 | 1.66835 | [1,1] | [0,0] | [28,76,1,1] | [1,28,28,28] | [1,1] | Conv_32 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.011 | 0.206 | 0.000 | 0.00000 | | | | [1,28,28,28] | | Add_33 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.047 | 0.890 | 81.510 | 3.86355 | [1,1] | [0,0] | [176,28,1,1] | [1,28,28,176] | [1,1] | Clip_35 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.044 | 0.819 | 7.121 | 0.31046 | [2,2] | [2,2] | [1,176,3,3] | [1,14,14,176] | [1,1] | Clip_37 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.059 | 1.105 | 32.825 | 1.93178 | [1,1] | [0,0] | [56,176,1,1] | [1,14,14,56] | [1,1] | Conv_38 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.036 | 0.681 | 48.379 | 1.75616 | [1,1] | [0,0] | [160,56,1,1] | [1,14,14,160] | [1,1] | Clip_40 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.027 | 0.510 | 10.396 | 0.28224 | [1,1] | [2,2] | [1,160,3,3] | [1,14,14,160] | [1,1] | Clip_42 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.053 | 0.998 | 33.042 | 1.75616 | [1,1] | [0,0] | [56,160,1,1] | [1,14,14,56] | [1,1] | Conv_43 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.009 | 0.169 | 0.000 | 0.00000 | | | | [1,14,14,56] | | Add_44 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.034 | 0.635 | 58.452 | 1.97568 | [1,1] | [0,0] | [180,56,1,1] | [1,14,14,180] | [1,1] | Clip_46 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.030 | 0.556 | 10.727 | 0.31752 | [1,1] | [2,2] | [1,180,3,3] | [1,14,14,180] | [1,1] | Clip_48 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.061 | 1.140 | 32.548 | 1.97568 | [1,1] | [0,0] | [56,180,1,1] | [1,14,14,56] | [1,1] | Conv_49 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.009 | 0.173 | 0.000 | 0.00000 | | | | [1,14,14,56] | | Add_50 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.036 | 0.681 | 58.135 | 2.10739 | [1,1] | [0,0] | [192,56,1,1] | [1,14,14,192] | [1,1] | Clip_52 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.032 | 0.596 | 10.667 | 0.33869 | [1,1] | [2,2] | [1,192,3,3] | [1,14,14,192] | [1,1] | Clip_54 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.065 | 1.211 | 32.673 | 2.10739 | [1,1] | [0,0] | [56,192,1,1] | [1,14,14,56] | [1,1] | Conv_55 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.009 | 0.166 | 0.000 | 0.00000 | | | | [1,14,14,56] | | Add_56 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.044 | 0.835 | 77.942 | 3.46842 | [1,1] | [0,0] | [316,56,1,1] | [1,14,14,316] | [1,1] | Clip_58 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.041 | 0.776 | 13.481 | 0.55742 | [1,1] | [2,2] | [1,316,3,3] | [1,14,14,316] | [1,1] | Clip_60 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.097 | 1.817 | 46.068 | 4.45939 | [1,1] | [0,0] | [72,316,1,1] | [1,14,14,72] | [1,1] | Conv_61 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.048 | 0.908 | 75.887 | 3.66912 | [1,1] | [0,0] | [260,72,1,1] | [1,14,14,260] | [1,1] | Clip_63 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.037 | 0.692 | 12.446 | 0.45864 | [1,1] | [2,2] | [1,260,3,3] | [1,14,14,260] | [1,1] | Clip_65 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.082 | 1.533 | 44.937 | 3.66912 | [1,1] | [0,0] | [72,260,1,1] | [1,14,14,72] | [1,1] | Conv_66 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.010 | 0.184 | 0.000 | 0.00000 | | | | [1,14,14,72] | | Add_67 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.049 | 0.915 | 82.211 | 4.00781 | [1,1] | [0,0] | [284,72,1,1] | [1,14,14,284] | [1,1] | Clip_69 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.036 | 0.679 | 13.858 | 0.50098 | [1,1] | [2,2] | [1,284,3,3] | [1,14,14,284] | [1,1] | Clip_71 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.088 | 1.649 | 45.621 | 4.00781 | [1,1] | [0,0] | [72,284,1,1] | [1,14,14,72] | [1,1] | Conv_72 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.010 | 0.185 | 0.000 | 0.00000 | | | | [1,14,14,72] | | Add_73 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.086 | 1.606 | 107.551 | 9.20102 | [1,1] | [0,0] | [652,72,1,1] | [1,14,14,652] | [1,1] | Clip_75 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.049 | 0.920 | 5.868 | 0.28753 | [2,2] | [2,2] | [1,652,3,3] | [1,7,7,652] | [1,1] | Clip_77 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.161 | 3.025 | 38.064 | 6.13402 | [1,1] | [0,0] | [192,652,1,1] | [1,7,7,192] | [1,1] | Conv_78 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.100 | 1.882 | 76.578 | 7.67693 | [1,1] | [0,0] | [816,192,1,1] | [1,7,7,816] | [1,1] | Clip_80 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.042 | 0.783 | 8.630 | 0.35986 | [1,1] | [2,2] | [1,816,3,3] | [1,7,7,816] | [1,1] | Clip_82 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.202 | 3.798 | 37.948 | 7.67693 | [1,1] | [0,0] | [192,816,1,1] | [1,7,7,192] | [1,1] | Conv_83 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.008 | 0.149 | 0.000 | 0.00000 | | | | [1,7,7,192] | | Add_84 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.102 | 1.905 | 81.196 | 8.24141 | [1,1] | [0,0] | [876,192,1,1] | [1,7,7,876] | [1,1] | Clip_86 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.044 | 0.824 | 8.800 | 0.38632 | [1,1] | [2,2] | [1,876,3,3] | [1,7,7,876] | [1,1] | Clip_88 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.211 | 3.959 | 39.077 | 8.24141 | [1,1] | [0,0] | [192,876,1,1] | [1,7,7,192] | [1,1] | Conv_89 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.009 | 0.162 | 0.000 | 0.00000 | | | | [1,7,7,192] | | Add_90 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.108 | 2.031 | 98.428 | 10.64986 | [1,1] | [0,0] | [1132,192,1,1] | [1,7,7,1132] | [1,1] | Clip_92 |
I deepvan/run/statistics.cc:359] | DepthwiseConv2d | 0.061 | 1.142 | 8.204 | 0.49921 | [1,1] | [2,2] | [1,1132,3,3] | [1,7,7,1132] | [1,1] | Clip_94 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.297 | 5.579 | 61.216 | 18.19350 | [1,1] | [0,0] | [328,1132,1,1] | [1,7,7,328] | [1,1] | Conv_95 |
I deepvan/run/statistics.cc:359] | Conv2D | 0.207 | 3.895 | 110.606 | 22.95082 | [1,1] | [0,0] | [1428,328,1,1] | [1,7,7,1428] | [1,1] | Clip_97 |
I deepvan/run/statistics.cc:359] | Pooling | 0.030 | 0.573 | 0.000 | 0.00000 | [7,7] | [0,0] | [7,7] | [1,1,1,1428] | | AveragePool_99 |
I deepvan/run/statistics.cc:359] | FullyConnected | 0.692 | 12.998 | 2.063 | 1.42800 | | | [1000,1428,1,1] | [1,1,1,1000] | | Conv_100 |
I deepvan/run/statistics.cc:359] | BufferTransform | 0.011 | 0.216 | 0.000 | 0.00000 | | | | [1,1000,1,1] | | deepvan_node_469_mem_type_0 |
I deepvan/run/statistics.cc:359] | Reshape | 0.005 | 0.096 | 0.000 | 0.00000 | | | | [1,1000] | | Reshape_101 |
I deepvan/run/statistics.cc:359] | FullyConnected | 0.694 | 13.027 | 2.058 | 1.42800 | | | [1000,1428,1,1] | [1,1,1,1000] | | Conv_102 |
I deepvan/run/statistics.cc:359] | BufferTransform | 0.012 | 0.223 | 0.000 | 0.00000 | | | | [1,1000,1,1] | | deepvan_node_476_mem_type_0 |
I deepvan/run/statistics.cc:359] | Reshape | 0.005 | 0.095 | 0.000 | 0.00000 | | | | [1,1000] | | Reshape_103 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.006 | 0.108 | 0.000 | 0.00000 | | | | [1,1000] | | Add_104 |
I deepvan/run/statistics.cc:359] | Eltwise | 0.006 | 0.112 | 0.000 | 0.00000 | | | | [1,1000] | | Div_105 |
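The per-operator table makes hotspots easy to spot; in the run above, for instance, the two `FullyConnected` operators dominate at roughly 13% of the time each. If the output was redirected to a file (say `bench.log`, a hypothetical name), one quick way to rank the operators is:
# Hypothetical helper: print the five slowest operators from a saved log
grep 'statistics.cc' bench.log | awk -F'|' 'NF>5 {print $3, $2}' | sort -rn | head -5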
Hint: Users would need to first put the ONNX model files into the XGen container. There are two approaches:
- They may put the ONNX model files under the `Projects/` folder of XGen on the host machine. Because that folder is mapped to the `/root/Projects` folder inside the XGen container, the files will be directly accessible to XGen.
- They may use `docker cp`. Example: `docker cp ./onnxModels xgen_<version>_<username>:/root/speedTests/`. The `onnxModels` folder will be copied into `/root/speedTests/` inside the XGen container.
Using XGen in Non-Interactive Mode
XGen can be used in a non-interactive mode. The user needs to provide a task-description file: a JSON config file that contains all the information needed to run XGen. The command to use in this mode is as follows:
XGen non-interactive --config-file /path/to/config.json --platform [android|ios|simulation] --devices device_id_a device_id_b ...
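For example (the config path and device ID below are placeholders):
# Hypothetical invocation: run the task described in my_task.json on one Android device
XGen non-interactive --config-file /root/Projects/my_task.json --platform android --devices 0a1b2c3d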
The argument `--platform` is optional; if you don't specify it, the script will use the `device_platform` field of the config file.
The argument `--devices` takes one or more device UUIDs. It is optional in general, but required if the `--platform` argument is `android` or `ios`.
The config file also has `device_platform` and `devices` fields corresponding to the `--platform` and `--devices` arguments on the command line. The command-line values override those in the config file.
The non-interactive mode of XGen
allows the use of the models included in XGen (Common AI) or customized AI models, with default or customized datasets.
XGen includes some sample config files (in /root/noninteractive_config_sample
) for users' reference.
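A reasonable workflow is to copy a sample and edit the copy, e.g. (the target file name is illustrative):
# List the shipped samples, then start from a copy of one of them
ls /root/noninteractive_config_sample
cp /root/noninteractive_config_sample/<some_sample>.json /root/Projects/my_task.json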
The following is an example config file for `EfficientNet` on the ImageNet dataset:
{
"description": "this is a sample of common ai config file in non-interactive mode",
"device_platform_options": [
"android",
"ios",
"simulation"
],
"device_platform": "simulation",
"devices": [],
"xgen_model_options": {
"description": "different platforms support different models, 'simulation' platform supports all models'",
"android_support": [
"DigitNet_MNIST",
"EfficientNet_ImageNet",
"ResNet_ImageNet",
"MobileNet_ImageNet",
"ViT_ImageNet",
"YOLOv6_CoCo2017",
"UNet_ISBI-2012",
"R2+1d_UCF101",
"MobileBERT_Squad1.1"
],
"ios_support": [
"DigitNet_MNIST",
"EfficientNet_ImageNet",
"MobileNet_ImageNet",
"YOLOv6_CoCo2017",
"UNet_ISBI-2012",
"MobileBERT_Squad1.1"
]
},
"select_model": "EfficientNet_ImageNet",
"use_default_dataset": false,
"custom_dataset_config": {
"description": "if use_default_dataset == false, config the section, section title is the same as your select_model",
"dataset": {
"train_data_path": {
"required": true,
"value": "/data/jenkins_common_ai/data/imagenet_subsets/train"
},
"train_label_path": {
"required": false,
"value": ""
},
"eval_data_path": {
"required": true,
"value": "/data/jenkins_common_ai/data/imagenet_subsets/val"
},
"eval_label_path": {
"required": false,
"value": ""
},
"customized": {
"description": "if using customized distillation, please make sure num_classes as the same as this section",
"num_classes": {
"type": "int",
"required": true,
"path": 50
}
}
}
},
"use_distillation": false,
"use_default_distillation_config": true,
"custom_distillation_config": {
"teacher_model": {
"description": "if using customized dataset, please make sure num_classes as the same as this section",
"required": true,
"name": "efficientnet_b4",
"file_path": "/data/jenkins_common_ai/teacher_model/efficientnet_b4_best.pth.tar",
"num_classes": 50
}
},
"finetune_teacher_model_batch_size": 64,
"xgen_mode_options": {
"description": "different platforms support different xgen mode, 'simulation' platform supports all models'",
"android_support": [
"compatible_testing",
"scaling",
"pas",
"customization"
],
"ios_support": [
"compatible_testing",
"scaling",
"pas",
"customization"
]
},
"xgen_mode_recommended": "scaling",
"xgen_mode": "scaling",
"xgen_gpus": "0",
"xgen_learning_rate": 0.048,
"xgen_batch_size": 192,
"xgen_epochs": 1,
"pas_finetune_epoch": 200,
"primary_interest_options": [
"latency",
"quality_score"
],
"primary_interest": "latency",
"latency": "21ms",
"quality_score": ">=73",
"resume_workplace_path": "",
"custom_workplace_path": "",
"pretrained_model_path": "",
"express_path_config": {
"express_mode_description": "In the case of using default datasets and custom datasets, different operation modes support different express path optimization methods. The specific options are shown in the \"express_mode_options\" field.",
"express_mode_options": {
"compatible_testing": {
"default_dataset": {
},
"customized_dataset": {
}
},
"pruning": {
"default_dataset": {
"Express Optimization (Quickest, likely no training needed)": 0,
"Extensive Optimization (Taking several rounds of training time)": 2
},
"customized_dataset": {
"Fine Optimization (Taking about one round of training time)": 1,
"Extensive Optimization (Taking several rounds of training time)": 2
}
},
"scaling": {
"default_dataset": {
"Express Optimization (Quickest, likely no training needed)": 0,
"Extensive Optimization (Taking several rounds of training time)": 2
},
"customized_dataset": {
"Fine Optimization (Taking about one round of training time)": 1,
"Extensive Optimization (Taking several rounds of training time)": 2
}
},
"pas": {
"default_dataset": {
"Express Optimization (Quickest, likely no training needed)": 0,
"Extensive Optimization (Taking several rounds of training time)": 2
},
"customized_dataset": {
"Fine Optimization (Taking about one round of training time)": 1,
"Extensive Optimization (Taking several rounds of training time)": 2
}
},
"customization": {
"default_dataset": {
"Express Optimization (Quickest, likely no training needed)": 0,
"Extensive Optimization (Taking several rounds of training time)": 2
},
"customized_dataset": {
"Fine Optimization (Taking about one round of training time)": 1,
"Extensive Optimization (Taking several rounds of training time)": 2
}
}
},
"express_mode": 0
}
}
The following is an example config file for running `XGen` on custom models (Custom AI):
{
"description": "this is a sample of customized ai config file in non-interactive mode",
"device_platform_options": [
"android",
"ios",
"simulation"
],
"device_platform": "simulation",
"devices": [],
"xgen_config_file": "/path/to/xgen.json",
"training_script_folder": "/path/to/training/script/folder",
"xgen_mode_options": {
"description": "different platforms support different xgen mode, 'simulation' platform supports all models'",
"android_support": [
"compatible_testing",
"pruning",
"scaling",
"pas",
"customization"
],
"ios_support": [
"compatible_testing",
"scaling",
"pas",
"customization"
]
},
"xgen_mode": "scaling",
"xgen_gpus": "0,1",
"xgen_learning_rate": 0.01,
"xgen_batch_size": 128,
"xgen_epochs": 3,
"pas_finetune_epoch": 200,
"primary_interest_options": [
"latency",
"quality_score"
],
"primary_interest": "latency",
"latency": "30ms",
"quality_score": ">=85",
"resume_workplace_path": "",
"custom_workplace_path": "",
"pretrained_model_path": "",
"use_distillation": false,
"use_default_distillation_config": false,
"custom_distillation_config": {
"teacher_model": {
"description": "if using customized dataset, please make sure num_classes as the same as this section",
"required": true,
"name": "EfficientNet_ImageNet",
"file_path": "/data/jenkins_common_ai/data/teacher_model/efficientnet-b0.pth",
"num_classes": 50
}
},
"finetune_teacher_model_batch_size": 16,
"express_path_config": {
"express_mode_description": "In the case of using default datasets and custom datasets, different operation modes support different express path optimization methods. The specific options are shown in the \"express_mode_options\" field.",
"express_mode_options": {
"compatible_testing": {
"default_dataset": {
},
"customized_dataset": {
}
},
"pruning": {
"default_dataset": {
},
"customized_dataset": {
}
},
"scaling": {
"default_dataset": {
},
"customized_dataset": {
}
},
"pas": {
"default_dataset": {
},
"customized_dataset": {
}
},
"customization": {
"default_dataset": {
},
"customized_dataset": {
}
}
},
"express_mode": 0
}
}
Parameter descriptions for the task-description file
description
A free-form description of the task.
device_platform_options
A list of the platforms that the user can choose from when filling in the field `device_platform`. There are currently three options: `android`, `ios`, and `simulation`.
device_platform
Users can pick one platform from the list in device_platform_options
for this field.
devices
A list of IDs of the devices that the user wants to use. The user can run the command `xgen_devices` to see the IDs of all connected devices. If `simulation` is chosen as the platform, this field is ignored.
xgen_config_file
The path of the `xgen.json` file. It specifies the model and dataset to use, as in the interactive mode.
training_script_folder
The path of the training script folder. The training script in the folder should be named `train_script_main.py`.
xgen_model_options
A list of AI models that the user may choose from to fill the next field select_model
. This list depends on the device_platform
field as some models may work well on only some platforms. The user can refer to the sub-fields android_support
and ios_support
to find the supported models.
select_model
Pick one AI model in the list of xgen_model_options
.
use_default_dataset
Set it to `true` to use the default dataset for the selected model; otherwise, set it to `false` and configure `custom_dataset_config` with the information of the custom dataset.
custom_dataset_config
If the user wants to use a custom dataset, set the field `use_default_dataset` to `false` and put the dataset details in this field (a sample is in the `/root/noninteractive_config_sample` folder).
There are five fields in the `dataset` sub-section: `train_data_path`, `train_label_path`, `eval_data_path`, `eval_label_path`, and `customized`. The users can set their own dataset paths in the path fields; the `required` field within each is only a hint. `customized` holds user-defined parameters. Note that for each such parameter, the `type` field must be set, and the `path` field must be set to the value of the parameter.
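For instance, the `customized` sub-field in the sample config above declares `num_classes` this way (note that the parameter's value goes in the `path` field):
"customized": {
  "num_classes": { "type": "int", "required": true, "path": 50 }
}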
use_distillation
If the user wants to use distillation, set it to `true`; otherwise, set it to `false`. (So far, only the `EfficientNet_ImageNet`, `ResNet_ImageNet`, and `MobileNet_ImageNet` models support distillation.)
use_default_distillation_config
If the user wants to use the default distillation config, set it to `true`; otherwise, set it to `false` and configure `custom_distillation_config` with the information for the custom distillation.
custom_distillation_config
If the users want to use custom distillation, they need to set the field `use_default_distillation_config` to `false` and put the distillation details (the teacher model's details) in this field (sample config files are in the `/root/noninteractive_config_sample` folder).
xgen_mode_options
This is a list of XGen modes that the user can choose from to fill the field xgen_mode
.
It depends on the device_platform
field. The user can refer to the sub-fields android_support
and ios_support
to find the modes that can work on the platform selected in the device_platform
field.
xgen_mode_recommended
It is the recommended mode to use for AI model compression (pruning
or scaling
). Note, this field is meaningful only for the Android platform because at this point only scaling
mode applies to iOS devices.
xgen_mode
Pick one xgen mode from xgen_mode_options
to use in this run.
xgen_gpus
It is the list of GPUs to use, given as a comma-separated string of GPU indices. Running `nvidia-smi -L` lists all the GPUs on your machine. The examples above use the first GPU (`"0"`) and the first two GPUs (`"0,1"`), respectively.
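For instance, on a machine with two GPUs, `nvidia-smi -L` prints something like the following (the model names will differ), and setting `xgen_gpus` to `"0,1"` would use both:
nvidia-smi -L
# example output:
# GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-...)
# GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-...)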
xgen_batch_size
It is the per-GPU batch size to use. It is an integer.
xgen_learning_rate
It is the learning rate to use at the beginning of training. It is a floating-point number.
xgen_epochs
It is the number of epochs to use in training. It is an integer.
primary_interest_options
It is a list of interests a user can choose from to fill in primary_interest
. For now, XGen supports latency
and quality_score
only.
primary_interest
Pick one primary interest from primary_interest_options
.
latency
It is the target latency. It is a string (e.g., "30ms").
quality_score
It is the quality score that you want to use. It is a string (e.g., ">=85").
resume_workplace_path
If the user wants to resume previous work, set the path of that work in this field; leave it empty otherwise. If `resume_workplace_path` is filled, `XGen` will check whether `select_model`, `xgen_mode`, and the other parameters are the same as in the previous work; if not, it will raise an error.
custom_workplace_path
It is the workplace path to use. It is a string (e.g., "/path/to/output"); if the path does not exist, `XGen` will create it. As in the interactive mode, it is optional; if it is not specified, XGen will create a unique folder and inform the user on the screen. (If `resume_workplace_path` is filled, `custom_workplace_path` will be ignored.) If the path contains training data, an error is raised.
pretrained_model_path
It is the pretrained model path to use. It is a string (e.g., "/path/to/pretrained_model_path"). As in the interactive mode, it is optional.
finetune_teacher_model_batch_size
It is the batch size to use in finetuning the teacher model. It is an integer.
express_mode_options
It is a list of express modes that the user can choose from to fill the field `express_mode`. The available options depend on the chosen `xgen_mode` and on whether the default or a customized dataset is used; each option maps to the numeric value to put in `express_mode` (see the sample config files above).
express_mode
Pick one express mode from express_mode_options
to use in this run.
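Putting it together, a minimal non-interactive session might look like the sketch below; the file name is a placeholder, and validating the JSON first avoids a late failure:
# Hypothetical end-to-end sketch
python3 -m json.tool /root/Projects/my_task.json > /dev/null   # sanity-check the JSON syntax
XGen non-interactive --config-file /root/Projects/my_task.json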