CN-121996434-A - Distributed cluster construction method, distributed inference method and resource scheduler


Abstract

The application provides a distributed cluster construction method, a distributed inference method and a resource scheduler, which can be applied in the technical field of computers. The distributed cluster construction method comprises: in response to a submit operation for a system orchestration file, binding virtual nodes created based on the system orchestration file to physical nodes equipped with hardware accelerators; invoking a plurality of the physical nodes to pull a target inference image from an image repository and to start a respective container instance on each physical node using the target inference image, wherein the target inference image is obtained by packaging a development kit of the hardware accelerators of the physical nodes together with a distributed inference framework; invoking accelerator plug-ins of the physical nodes to allocate resources of the hardware accelerators on the physical nodes to the container instances; and constructing the distributed cluster from the plurality of container instances.

Inventors

SUN JIANQIANG
WANG ZHONGQIN
PAN HUIWEN

Assignees

Beijing Kecheng Technology Development Co., Ltd. (北京可橙科技发展有限公司)

Dates

Publication Date: 20260508
Application Date: 20260409

Claims (10)

1. A distributed cluster construction method, applied to a resource scheduler, comprising: in response to a submit operation for a system orchestration file, binding virtual nodes created based on the system orchestration file to physical nodes equipped with hardware accelerators, the system orchestration file indicating a desired build state of a distributed cluster; invoking a plurality of the physical nodes to pull a target inference image from an image repository and to start a respective container instance on each physical node using the target inference image, wherein the target inference image is obtained by packaging a development kit of the hardware accelerator of the physical nodes together with a distributed inference framework; invoking an accelerator plug-in of each physical node to allocate resources of the hardware accelerator on the physical node to the container instance; and constructing the distributed cluster from the plurality of container instances.
2. The method of claim 1, wherein binding the virtual nodes created based on the system orchestration file to the physical nodes equipped with hardware accelerators in response to the submit operation for the system orchestration file comprises: parsing the system orchestration file to obtain master node configuration information and working node configuration information; creating a virtual master node and a plurality of virtual working nodes based on the master node configuration information and the working node configuration information; invoking the virtual master node to load a plurality of model fragments of an inference model onto the virtual working nodes, respectively, based on the system orchestration file and a loading rule, so that during task inference the plurality of model fragments are inferred in parallel on the plurality of virtual working nodes; and binding the virtual master node and each of the plurality of virtual working nodes to a physical node equipped with a hardware accelerator.
3. The method of claim 2, wherein the system orchestration file comprises hardware accelerator resources of the virtual working nodes, a number of model fragments, and a number of initial model service instances, the number of model fragments characterizing the degree of parallel splitting within a single initial model service instance; and wherein loading the plurality of model fragments onto the virtual working nodes, respectively, based on the system orchestration file and the loading rule comprises: determining a total resource demand according to the number of initial model service instances and the number of model fragments; determining target virtual working nodes according to the total resource demand and the hardware accelerator resources of the virtual working nodes; and loading the model fragments onto the target virtual working nodes that satisfy the loading rule, respectively, wherein the loading rule comprises loading the model fragments of a same initial model service instance onto different hardware accelerators of a same target virtual working node, the hardware accelerator resources being larger than the resources required by the model fragments.
4. The method of claim 3, wherein the system orchestration file further comprises a preset number range of working nodes; and wherein determining the target virtual working nodes according to the total resource demand and the hardware accelerator resources of the virtual working nodes comprises: determining an initial number of virtual working nodes according to the total resource demand and the hardware accelerator resources of the virtual working nodes; when the initial number is within the preset number range, determining the target virtual working nodes from the plurality of virtual working nodes based on the initial number; when the initial number is greater than the upper limit of the preset number range, increasing the configured number of hardware accelerators of the virtual working nodes or reducing the number of initial model service instances, and determining the target virtual working nodes from the plurality of virtual working nodes based on the initial number; and when the initial number is smaller than the lower limit of the preset number range, determining the target virtual working nodes from the plurality of virtual working nodes based on the lower limit.
5. The method of claim 1, further comprising: replacing a device identifier in a configuration environment with a hardware accelerator identifier to obtain a replaced configuration environment; and driving the distributed cluster to identify the hardware accelerator based on the replaced configuration environment.
6. A distributed inference method, applied to a distributed cluster, the method comprising: in response to at least one inference request, distributing the at least one target task indicated by each inference request to obtain the respective target model fragments of the at least one target task; performing virtual node allocation for the target model fragments of the at least one target task, respectively, to obtain at least one target virtual node for each target task; and invoking the container instance in the distributed cluster that corresponds to the target virtual node, and performing inference on the target task using the hardware accelerator resources of the container instance to obtain an inference result; wherein the distributed cluster is constructed according to the method of any one of claims 1 to 5.
7. The method of claim 6, wherein performing virtual node allocation for the target model fragments of the at least one target task, respectively, to obtain the at least one target virtual node for each target task comprises: determining a respective target virtual node for each of at least one target model fragment from a preset mapping relation, the preset mapping relation representing the loading relation between model fragments and target virtual working nodes; and obtaining the at least one target virtual node of each target task according to the target model fragments of the target task and the target virtual node of each target model fragment.
8. The method of claim 6, further comprising: embedding a collection interface based on a transmission protocol in the inference process of the target task; collecting, through the collection interface, inference performance and a fluctuation degree of the number of tasks, wherein the inference performance comprises at least one of task inference performance and resource utilization performance of the hardware accelerator; adjusting the number of inference model fragments of the target task according to the fluctuation degree of the number of tasks and a preset fluctuation threshold; and adjusting the number of target virtual working nodes according to the inference performance and a preset performance threshold.
9. The method of claim 6, further comprising: during the inference process of the target task, monitoring state changes and inference logs of the target virtual node to obtain a detection result; and restarting the inference process of the target task when the detection result is abnormal.
10. A resource scheduler, comprising: one or more processors; and a memory for storing one or more computer programs, characterized in that the one or more processors execute the one or more computer programs to implement the steps of the method according to any one of claims 1 to 9.
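
The node-count determination recited in claims 3 and 4 can be illustrated with a minimal Python sketch. It is not the applicant's implementation. It assumes that each model fragment occupies exactly one hardware accelerator, that the number of accelerators per working node is a multiple of the number of fragments per instance (so that no service instance straddles two nodes), and that an overflow above the upper limit is resolved by raising the accelerator count per node rather than by reducing the instance count. All names (`OrchestrationSpec`, `plan_workers`, and so on) are hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class OrchestrationSpec:
    """Hypothetical subset of the fields of a system orchestration file."""
    accelerators_per_worker: int  # accelerator resources of one virtual working node
    fragments_per_instance: int   # number of model fragments (parallel split degree)
    initial_instances: int        # number of initial model service instances
    min_workers: int              # lower limit of the preset number range
    max_workers: int              # upper limit of the preset number range


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


def total_demand(spec: OrchestrationSpec) -> int:
    """Total accelerators needed, assuming one accelerator per fragment."""
    return spec.initial_instances * spec.fragments_per_instance


def initial_worker_count(spec: OrchestrationSpec) -> int:
    """Initial number of working nodes derived from demand and per-node resources."""
    if spec.accelerators_per_worker % spec.fragments_per_instance != 0:
        raise ValueError("accelerators per worker must be a multiple of fragments per instance")
    return ceil_div(total_demand(spec), spec.accelerators_per_worker)


def plan_workers(spec: OrchestrationSpec) -> tuple[int, OrchestrationSpec]:
    """Return the number of target working nodes and the (possibly adjusted) spec."""
    count = initial_worker_count(spec)
    if count < spec.min_workers:
        # Below the range: fall back to the lower limit.
        return spec.min_workers, spec
    if count > spec.max_workers:
        # Above the range: raise the per-node accelerator count (assumed policy),
        # rounded up to a whole number of instances per node, then recompute.
        needed = ceil_div(total_demand(spec), spec.max_workers)
        frags = spec.fragments_per_instance
        needed = ceil_div(needed, frags) * frags
        spec = replace(spec, accelerators_per_worker=needed)
        count = initial_worker_count(spec)
    return count, spec


if __name__ == "__main__":
    workers, adjusted = plan_workers(OrchestrationSpec(4, 4, 10, 1, 4))
    print(workers, adjusted.accelerators_per_worker)
```

Reducing the number of initial model service instances, the other option named in claim 4, would be an equally valid way to bring the count back within the range; the sketch picks one branch only to stay short.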

Description

Distributed cluster construction method, distributed inference method and resource scheduler

Technical Field

The application relates to the technical field of computers, and in particular to a distributed cluster construction method, a distributed inference method and a resource scheduler.

Background

A container orchestration platform can provide functions such as automatic deployment and automatic scaling. In related large-model inference methods, after the container orchestration platform performs resource orchestration, a distributed cluster carries out distributed task scheduling and service inference. However, when a distributed cluster is deployed on a container orchestration platform for large-model inference, it is difficult for the distributed cluster to schedule the hardware accelerator resources on the container orchestration platform.

Disclosure of Invention

In view of the above problems, the present application provides a distributed cluster construction method, a distributed inference method, and a resource scheduler.
According to a first aspect of the application, a distributed cluster construction method is provided, comprising: in response to a submit operation for a system orchestration file, binding virtual nodes created based on the system orchestration file to physical nodes equipped with hardware accelerators; invoking a plurality of the physical nodes to pull a target inference image from an image repository and to start a respective container instance on each physical node using the target inference image, wherein the target inference image is obtained by packaging a development kit of the hardware accelerators of the physical nodes together with a distributed inference framework; invoking accelerator plug-ins of the physical nodes to allocate resources of the hardware accelerators on the physical nodes to the container instances; and constructing the distributed cluster from the plurality of container instances.

According to the embodiment of the application, the development kit of the hardware accelerator and the distributed inference framework are packaged to obtain a target inference image adapted to the hardware accelerator, and a plurality of physical nodes in the container orchestration platform are invoked to each pull the target inference image and start a corresponding container instance. The accelerator plug-in is deployed in the resource scheduler of the container orchestration platform, which enables automatic discovery and registration of the hardware accelerators at the level of the container orchestration platform. The accelerator plug-in is then invoked to allocate the hardware accelerator resources on the physical nodes to the container instances, forming the computing resource pool of the distributed cluster. With the container instances serving as the carriers that use the hardware accelerator resources, the distributed cluster is able to discover and schedule the hardware accelerators.
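
The construction flow described above can be sketched as a small in-memory simulation. The sketch below is illustrative only: it does not call any real container platform, image pulling and container start-up are simulated by creating a record, and every name in it (`PhysicalNode`, `AcceleratorPlugin`, `build_cluster`, the image name) is a hypothetical stand-in rather than an interface defined by the application.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalNode:
    """A physical node and the accelerator IDs its plug-in has discovered."""
    name: str
    free_devices: list[str]


@dataclass
class ContainerInstance:
    """A container started from the target inference image on one node."""
    node: str
    image: str
    devices: list[str] = field(default_factory=list)


class AcceleratorPlugin:
    """Hypothetical stand-in for the accelerator plug-in of one physical node."""

    def __init__(self, node: PhysicalNode) -> None:
        self.node = node

    def allocate(self, count: int) -> list[str]:
        """Hand out `count` free accelerators exclusively to one container."""
        if count > len(self.node.free_devices):
            raise RuntimeError(f"node {self.node.name} has too few free accelerators")
        granted = self.node.free_devices[:count]
        del self.node.free_devices[:count]
        return granted


def build_cluster(nodes: list[PhysicalNode], image: str,
                  devices_per_container: int) -> list[ContainerInstance]:
    """Start one container per node and attach accelerators to it."""
    cluster = []
    for node in nodes:
        # Simulated: pull the target inference image and start a container.
        instance = ContainerInstance(node=node.name, image=image)
        # The plug-in allocates the node's accelerator resources to the container.
        instance.devices = AcceleratorPlugin(node).allocate(devices_per_container)
        cluster.append(instance)
    # The set of container instances forms the cluster's computing resource pool.
    return cluster


if __name__ == "__main__":
    nodes = [PhysicalNode("node-1", ["a0", "a1"]),
             PhysicalNode("node-2", ["b0", "b1", "b2"])]
    for inst in build_cluster(nodes, "registry.example/inference:latest", 2):
        print(inst.node, inst.devices)
```

Keeping allocation inside the plug-in object mirrors the division of labor in the text: the scheduler decides which nodes start containers, while the plug-in alone decides which accelerators each container receives.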
According to the embodiment of the application, binding the virtual nodes created based on the system orchestration file to the physical nodes equipped with hardware accelerators in response to the submit operation for the system orchestration file comprises: parsing the system orchestration file to obtain master node configuration information and working node configuration information; creating a virtual master node and a plurality of virtual working nodes based on the master node configuration information and the working node configuration information; invoking the virtual master node to load a plurality of model fragments of an inference model onto the virtual working nodes, respectively, based on the system orchestration file and a loading rule, so that during task inference the plurality of model fragments are inferred in parallel on the plurality of virtual working nodes; and binding the virtual master node and the plurality of virtual working nodes, respectively, to physical nodes equipped with hardware accelerators.

According to the embodiment of the application, because the virtual master node does not occupy accelerator resources, it can be scheduled to any idle physical node, whereas each virtual working node is bound, through the device plug-in and according to its resource demand, to a specific physical node equipped with a hardware accelerator, where the container instance of the virtual working node is created. This opens up the complete link from logical virtual nodes to physical hardware resources and ensures the supply of computing power and the performance isolation of distributed inference tasks.

According to the embodiment of the application, a system orchestration file comprises hardware accelerator resources of a virtual working node, a model fragment number and an initial m