Chapter 2. Requirements, Installation, Upgrade, Uninstallation
Overview
In previous versions, physical devices such as phones needed to be connected to the host machine where XGen was running. The workflow involved training with XGen and then executing tasks on devices connected to the local machine. After task completion, the process returned to XGen for further training, and this cycle repeated.
Practical experience revealed several drawbacks. First, only a limited number of physical test devices could be connected to a single machine, and those devices could not be used for other tests at the same time. Second, the approach was inconvenient, particularly for users whose training machines were located in server rooms or data centers, where connecting physical devices directly to the machines is difficult or impossible.
To address these issues, the current version introduces the concept of a "Device Lab," which abstracts and decouples the process of conducting test tasks on physical devices within the XGen workflow. Essentially, the process involves XGen sending a task to a physical device, the device executing the task, and then sending the results back to XGen, enabling the continuous workflow.
The Device Lab consists of two parts: a controller and one or more agents. The controller acts as a dispatcher or router, while the agent serves as a docking station for physical devices, which are connected to the agent.
Agents are connected to the controller, and XGen is also connected to the controller. Communication between the three occurs through RPC.
The main benefit of this approach is a substantial decoupling between XGen and the target devices that run the tasks. The physical devices visible to XGen can come from multiple agents in different locations. For instance, one agent in Beijing might have 10 physical devices such as phones, while another agent in Boston has 20. XGen does not need to know how the devices are connected or where they are located. The XGen user only chooses the target devices on which XGen should run tasks. XGen sends the tasks to the controller, which distributes them to the corresponding agents, and the agents forward the tasks to the physical devices. After a task completes, the physical device sends the results back to its agent, which sends them to the controller. The controller forwards the results to XGen, and the process continues.
The result is a loosely coupled distributed system. Apart from the controller (only one is needed), everything can be scaled horizontally: agents, XGen instances, and physical devices can be added as needed, and each can be deployed independently. For example, agents could be deployed in phone data centers, XGen in a computing center, and the controller in the cloud. This flexibility allows deployment and scaling according to specific requirements. Everything can also be deployed on a single machine, similar to the previous version of XGen.
In brief, Device Lab offers benchmark services on physical devices for XGen. XGen submits benchmark jobs to the Device Lab Controller, which dispatches each job to the corresponding Device Lab Agent. The Agent is responsible for running the job on the device. The benchmark results are sent back to the Controller; after gathering the results from all agents, the Controller sends them to XGen. The included Device Lab Dashboard is a web-based tool for monitoring benchmark jobs, in effect a user-friendly interface to the Device Lab Controller.
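The job flow described above can be sketched as a small in-process simulation. This is only an illustration under stated assumptions: the class and method names (Agent, Controller, register, submit, run) are hypothetical, the result strings are placeholders, and the real components communicate over RPC rather than direct method calls.

```python
class Agent:
    """Hypothetical stand-in for a Device Lab Agent with attached devices."""

    def __init__(self, name, devices):
        self.name = name
        self.devices = set(devices)

    def run(self, device, job):
        # A real agent would forward the job to the physical device.
        # Here a placeholder result string is returned instead.
        return f"{job} finished on {device} via {self.name}"


class Controller:
    """Hypothetical dispatcher: routes each job to the agent hosting the device."""

    def __init__(self):
        self.agents = []

    def register(self, agent):
        self.agents.append(agent)

    def visible_devices(self):
        # XGen sees one flat device list, regardless of which agent hosts a device.
        return sorted(d for a in self.agents for d in a.devices)

    def submit(self, job, target_devices):
        results = {}
        for device in target_devices:
            agent = next((a for a in self.agents if device in a.devices), None)
            if agent is None:
                raise LookupError(f"no agent hosts device {device!r}")
            results[device] = agent.run(device, job)
        # Results go back to the caller only after every target has reported.
        return results
```

For example, registering Agent("beijing", ["phone-1", "phone-2"]) and Agent("boston", ["phone-3"]) and then calling submit("benchmark-42", ["phone-1", "phone-3"]) returns one result per selected device, without the caller needing to know which agent hosts which phone.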
As a result, the installation process of XGen differs slightly from the previous version. For more information about Device Lab, please refer to Device Lab Introduction. To make sure everything works properly, please follow the instructions below to install XGen and Device Lab.
Device Lab should be installed and configured before XGen. The three parts should be installed and configured in this order: Controller (with Dashboard), Device Agent, and XGen. That is, XGen should be installed only after the Controller and Agent are fully set up.
If there is only one host machine for both XGen and Device Lab, the All-in-One installation script is recommended. This case is described below as Scenario 3.
Installation scenarios
If deployment on multiple machines is required, the machines hosting XGen, the Controller, and the Agent should preferably all be on the same network. If they are not, please install the Controller on a machine with a fixed public IP address, so that the machines running XGen and the Agent can communicate by connecting to the Controller machine. Ports 80, 19000, 28000, and 55672 must also be open, because communication between XGen, the Controller, and the Agent relies on them.
If the installation is being done on a single machine, the above steps are not necessary.
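Before a multi-machine installation, it can help to confirm that the Controller machine is reachable on these ports. The following is a minimal sketch using only the Python standard library. The function names are hypothetical and not part of XGen, and the check only verifies that a TCP connection can be opened, not that the expected service is listening.

```python
import socket

# Ports named in this section for XGen / Controller / Agent communication.
REQUIRED_PORTS = (80, 19000, 28000, 55672)


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_ports(host, ports=REQUIRED_PORTS):
    """Map each port to whether it accepted a TCP connection."""
    return {port: port_is_open(host, port) for port in ports}
```

Calling check_ports with the address of the Controller machine from the XGen or Agent machine returns a dictionary such as {80: True, 19000: False, ...}, which points at the ports that still need to be opened.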
Scenario 1: Multiple Machines
If there are multiple machines to deploy, please note that there should be only ONE Device Lab Controller, which acts as the central node.
Scenario 2: Two Machines
If there are two machines to deploy, XGen should be deployed on the machine with a GPU, and the Device Lab Controller should be deployed on the machine that has an external IP address visible to other machines (of the scenarios below, scenario A is recommended).
Scenario 3: One Machine
If there is only ONE machine to deploy, the user can install XGen, the Device Lab Controller, and the Device Lab Agent on the same machine. This machine must be equipped with a GPU.
Services and Container Introduction
Whether the XGen, Controller, and Agent services are installed on separate machines or on a single machine, they ultimately run as 4 Docker containers (other containers provide basic services such as Redis).
The Docker container corresponding to XGen is named xgen_<version>_<username>, which is the core component for user interaction.
The Controller service is supported by two Docker containers: xgen_controller and xgen_dashboard. As their names suggest, xgen_dashboard mainly provides web functionality, while the main logic is centralized within xgen_controller.
The Docker container for the Agent service is named xgen_agent.
For these core containers, namely xgen_<version>_<username>, xgen_controller, and xgen_agent, convenience commands are provided for starting or entering the containers.
To enter a container, use the command run_<name>. For instance, to enter the controller container, execute the command run_controller. If the xgen_controller container does not exist on the machine at that point, the command first creates and starts the container before entering it. If the xgen_controller container already exists on the machine, running run_controller simply enters the container, equivalent to the Docker command:
docker exec -it xgen_controller bash
The run_agent command follows the same logic as run_controller.
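The create-or-enter behavior described above can be sketched as follows. This is not the actual run_controller or run_agent script. The function names are hypothetical, and the creation step is left as a caller-supplied callback because this chapter does not document how the containers are created. The only real commands used are docker ps and docker exec.

```python
import subprocess


def container_exists(name, list_names=None):
    """Return True if a container with exactly this name exists (running or stopped)."""
    if list_names is None:
        # `docker ps -a --format '{{.Names}}'` prints one container name per line.
        out = subprocess.run(
            ["docker", "ps", "-a", "--format", "{{.Names}}"],
            capture_output=True, text=True, check=True,
        ).stdout
        list_names = out.splitlines()
    return name in list_names


def enter_command(name):
    """Command line for opening an interactive shell in the container."""
    return ["docker", "exec", "-it", name, "bash"]


def run_container(name, create, list_names=None):
    """Create the container first if it is missing, then return the enter command."""
    if not container_exists(name, list_names):
        create(name)  # hypothetical: whatever creates and starts the container
    return enter_command(name)
```

Passing list_names explicitly makes the decision logic easy to exercise without Docker installed; leaving it unset queries the local Docker daemon.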
However, run_xgen is slightly different because it supports multiple concurrent users, each running an independent container. Therefore, the --name option needs to be added, for example:
run_xgen --name bar
Exiting works the same way for all of the containers: run the exit command inside the container to leave it.
exit
Inside these containers, dedicated commands are available for checking on the running services. xgenctl status shows the current status of the services within the container. If issues arise, xgenctl restart restarts the services.
Both the XGen and Agent containers provide the xgen_devices command, but the parameters differ slightly. The differences can be viewed with the --help parameter.
Installation
Based on the information provided above, you should now have an understanding of your deployment scenario.
If you need to install XGen and Device Lab on a single machine, please refer to the deployment documentation for a single machine.
If XGen and Device Lab's Controller and Agent are being installed on separate machines, please consult the deployment documentation for multiple machines.