CN-118869816-B - Scheduling method, device, system, medium, product and equipment for model reasoning

Abstract

The application discloses a scheduling method, device, system, medium, product and equipment for model reasoning. The method comprises: receiving a first scheduling request message from a reasoning request end, wherein the first scheduling request message indicates at least one region and the reasoning request quantity of each region; determining a first target server from the servers to be scheduled based on the first scheduling request message; and returning a first scheduling response message to the reasoning request end, wherein the first scheduling response message carries first information of the first target server. The scheduling end can thus select a suitable first target server to complete the reasoning task according to the reasoning request end's requirements on regions and reasoning request quantities. Because the reasoning request end needs to send the first scheduling request message only once to complete batch scheduling of reasoning tasks for one or more regions, scheduling delay in regional batch scheduling scenarios is reduced.

Inventors

ZHU MINGWEI
BU ZHONGGUI
FENG ZHENG
LIU LEI
ZHOU WEI
WU QIAN
ZHAI ZHENHUI

Assignees

China Mobile Group Design Institute Co., Ltd. (中国移动通信集团设计院有限公司)
China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)

Dates

Publication Date: 20260508
Application Date: 20240628

Claims (17)

1. A scheduling method of model reasoning, applied to a scheduling end, the method comprising: receiving a first scheduling request message from a reasoning request end, wherein the first scheduling request message is used for indicating at least one region and the reasoning request quantity of each region; determining a first target server from the servers to be scheduled based on the first scheduling request message; returning a first scheduling response message to the reasoning request end, wherein the first scheduling response message carries first information of the first target server; receiving a second scheduling request message from the reasoning request end, wherein the second scheduling request message carries second parameter information of the reasoning request end; determining a second target server from the servers to be scheduled based on the second parameter information; and returning a second scheduling response message to the reasoning request end, wherein the second scheduling response message carries second information of the second target server and is used for instructing the reasoning request end to judge whether the second information meets the requirement indicated by the second scheduling request message and to feed back a second confirmation scheduling message when the second information meets the requirement; wherein the second parameter information comprises the geographic position of the reasoning request end, and determining the second target server from the servers to be scheduled based on the second parameter information comprises: searching the servers to be scheduled for corresponding first affinity scheduling servers based on the geographic position; when the number of first affinity scheduling servers is greater than 1, screening the first affinity scheduling servers with a load balancing algorithm to obtain a second affinity scheduling server as the second target server; and when the number of first affinity scheduling servers is equal to 1, taking the first affinity scheduling server as the second target server.
2. The scheduling method of model reasoning of claim 1, wherein determining the first target server from the servers to be scheduled based on the first scheduling request message comprises: determining, from the servers to be scheduled, first batch scheduling servers corresponding to each of the at least one region; screening out, based on the reasoning request quantity, second batch scheduling servers corresponding to each region from the first batch scheduling servers corresponding to that region; and determining all the second batch scheduling servers as the first target server.
3. The scheduling method of model reasoning of claim 2, wherein the first scheduling request message carries at least one regional parameter combination in one-to-one correspondence with the at least one region, each regional parameter combination comprises first parameter information of the associated region and a reasoning request quantity, information of the servers to be scheduled is stored in a preconfigured KV database, and determining, from the servers to be scheduled, the first batch scheduling servers corresponding to each of the at least one region comprises: constructing an array corresponding to each regional parameter combination; and for each regional parameter combination, querying the KV database based on the array corresponding to that regional parameter combination, so as to determine all servers to be scheduled that meet the requirements indicated by the regional parameter combination as the first batch scheduling servers corresponding to the region associated with that regional parameter combination.
4. The scheduling method of model reasoning of claim 1, wherein the first scheduling response message is used for instructing the reasoning request end to judge whether the first information meets the requirement indicated by the first scheduling request message, and to feed back a first confirmation scheduling message when the first information meets the requirement.
5. The scheduling method of model reasoning of claim 4, wherein the method further comprises: after the first target server is determined, setting the state of the first target server to a temporarily non-selectable state; and in response to the first confirmation scheduling message, changing the state of the first target server to a scheduled state and sending a first scheduling success response message to the reasoning request end, wherein the first scheduling success response message is used for instructing the reasoning request end to initiate, to the first target server, a first reasoning request corresponding to the first scheduling request message, and the first reasoning request carries a reasoning task to be borne by the first target server.
6. The scheduling method of model reasoning of claim 1, wherein the second parameter information further comprises equipment type and/or room information.
7. The scheduling method of model reasoning of claim 1, wherein the method further comprises: after the second target server is determined, setting the state of the second target server to a temporarily non-selectable state; and in response to the second confirmation scheduling message, changing the state of the second target server to a scheduled state and sending a second scheduling success response message to the reasoning request end, wherein the second scheduling success response message is used for instructing the reasoning request end to initiate, to the second target server, a second reasoning request corresponding to the second scheduling request message, and the second reasoning request carries a reasoning task to be borne by the second target server.
8. The scheduling method of model reasoning of claim 1, wherein the first scheduling response message is further used for instructing the reasoning request end to cache the address information of the first target server locally, after determining that the first information meets the requirement indicated by the first scheduling request message, for use in a subsequent first reasoning task corresponding to the first scheduling request message; and the second scheduling response message is further used for instructing the reasoning request end to cache the address information of the second target server locally, after determining that the second information meets the requirement indicated by the second scheduling request message, for use in a subsequent second reasoning task corresponding to the second scheduling request message.
9. The scheduling method of model reasoning of claim 8, wherein the method further comprises: receiving a subscription message sent by the reasoning request end, wherein the subscription message is used for subscribing to the state of at least one server, and the at least one server is at least one server among the first target server and/or the second target server; in response to the subscription message, configuring a corresponding load threshold for each of the at least one server; and when the load reported by any one of the at least one server exceeds the corresponding load threshold, feeding back a load state change message of that server to the reasoning request end, wherein the load state change message is used for instructing the reasoning request end to delete the locally cached address information of that server.
10. The scheduling method of model reasoning of claim 9, wherein: when the address information of any server is deleted, the first reasoning task to be borne by that server is stopped and converted into corresponding scheduling request information to be added to the next first scheduling request message sent; and/or, in the case where any server is required to bear the subsequent second reasoning task, the load state change message is further used for instructing the reasoning request end, when deleting the address information of that server, to stop the second reasoning task to be borne by that server and to convert it into corresponding scheduling request information to be added to the next second scheduling request message sent.
11. A scheduling method of model reasoning, applied to a reasoning request end, the method comprising: sending a first scheduling request message to a scheduling end, wherein the first scheduling request message is used for indicating at least one region and the reasoning request quantity of each region; receiving a first scheduling response message returned by the scheduling end, wherein the first scheduling response message carries first information of a first target server, and the first target server is determined by the scheduling end from the servers to be scheduled based on the first scheduling request message; sending a second scheduling request message to the scheduling end, wherein the second scheduling request message carries second parameter information of the reasoning request end; receiving a second scheduling response message returned by the scheduling end, wherein the second scheduling response message carries second information of a second target server; and judging whether the second information meets the requirement indicated by the second scheduling request message, and feeding back a second confirmation scheduling message when the second information meets the requirement; wherein the second parameter information comprises the geographic position of the reasoning request end, and the second target server is determined from the servers to be scheduled based on the second parameter information as follows: corresponding first affinity scheduling servers are searched for among the servers to be scheduled based on the geographic position; when the number of first affinity scheduling servers is greater than 1, the first affinity scheduling servers are screened with a load balancing algorithm to obtain a second affinity scheduling server as the second target server; and when the number of first affinity scheduling servers is equal to 1, the first affinity scheduling server is taken as the second target server.
12. A scheduling device for model reasoning, applied to a scheduling end, the device comprising: a first scheduling request message receiving module, used for receiving a first scheduling request message from a reasoning request end, wherein the first scheduling request message is used for indicating at least one region and the reasoning request quantity of each region; a first target server determining module, used for determining a first target server from the servers to be scheduled based on the first scheduling request message; and a first scheduling response message sending module, used for returning a first scheduling response message to the reasoning request end, wherein the first scheduling response message carries first information of the first target server; the device being further configured for: receiving a second scheduling request message from the reasoning request end, wherein the second scheduling request message carries second parameter information of the reasoning request end; determining a second target server from the servers to be scheduled based on the second parameter information; and returning a second scheduling response message to the reasoning request end, wherein the second scheduling response message carries second information of the second target server and is used for instructing the reasoning request end to judge whether the second information meets the requirement indicated by the second scheduling request message and to feed back a second confirmation scheduling message when the second information meets the requirement; wherein the second parameter information comprises the geographic position of the reasoning request end, and determining the second target server from the servers to be scheduled based on the second parameter information comprises: searching the servers to be scheduled for corresponding first affinity scheduling servers based on the geographic position; when the number of first affinity scheduling servers is greater than 1, screening the first affinity scheduling servers with a load balancing algorithm to obtain a second affinity scheduling server as the second target server; and when the number of first affinity scheduling servers is equal to 1, taking the first affinity scheduling server as the second target server.
13. A scheduling device for model reasoning, applied to a reasoning request end, the device comprising: a first scheduling request message sending module, used for sending a first scheduling request message to a scheduling end, wherein the first scheduling request message is used for indicating at least one region and the reasoning request quantity of each region; and a first scheduling response message receiving module, used for receiving a first scheduling response message returned by the scheduling end, wherein the first scheduling response message carries first information of a first target server, and the first target server is determined by the scheduling end from the servers to be scheduled based on the first scheduling request message; the device being further configured for: sending a second scheduling request message to the scheduling end, wherein the second scheduling request message carries second parameter information of the reasoning request end; receiving a second scheduling response message returned by the scheduling end, wherein the second scheduling response message carries second information of a second target server; and judging whether the second information meets the requirement indicated by the second scheduling request message, and feeding back a second confirmation scheduling message when the second information meets the requirement; wherein the second parameter information comprises the geographic position of the reasoning request end, and the second target server is determined from the servers to be scheduled based on the second parameter information as follows: corresponding first affinity scheduling servers are searched for among the servers to be scheduled based on the geographic position; when the number of first affinity scheduling servers is greater than 1, the first affinity scheduling servers are screened with a load balancing algorithm to obtain a second affinity scheduling server as the second target server; and when the number of first affinity scheduling servers is equal to 1, the first affinity scheduling server is taken as the second target server.
14. A scheduling system for model reasoning, comprising: a reasoning request end configured to perform the method of claim 11 or provided with the device of claim 13; and a scheduling end configured to perform the method of any one of claims 1 to 10 or provided with the device of claim 12.
15. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any of claims 1-11.
16. A computer program product comprising computer instructions which, when executed by a processor, implement the method of any of claims 1-11.
17. A computer device comprising a processor, a memory and a computer program stored in the memory and configured to be executed by the processor, the processor implementing the method of any of claims 1-11 when the computer program is executed.
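
Illustrative sketch (not part of the claims). The affinity scheduling branch of claims 1 and 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names Server, State, select_affinity_server and confirm are hypothetical, the geographic position is reduced to a plain label, and "least loaded" stands in for the load balancing algorithm, which the claims do not specify.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    AVAILABLE = "available"
    TEMP_UNSELECTABLE = "temporarily non-selectable"
    SCHEDULED = "scheduled"


@dataclass
class Server:
    address: str
    location: str   # assumption: geographic position as a plain label
    load: float     # assumption: current load, lower means less busy
    state: State = State.AVAILABLE


def select_affinity_server(servers: List[Server], location: str) -> Optional[Server]:
    """Pick the second target server for a requester at `location`."""
    # First affinity scheduling servers: selectable servers at the same location.
    first_affinity = [
        s for s in servers
        if s.location == location and s.state is State.AVAILABLE
    ]
    if not first_affinity:
        return None
    if len(first_affinity) == 1:
        # Exactly one candidate: it becomes the target directly.
        target = first_affinity[0]
    else:
        # More than one candidate: apply load balancing.
        # Assumption: least-loaded selection stands in for the algorithm.
        target = min(first_affinity, key=lambda s: s.load)
    # Reserve the server until the requester confirms.
    target.state = State.TEMP_UNSELECTABLE
    return target


def confirm(target: Server) -> None:
    """Handle the second confirmation scheduling message."""
    if target.state is not State.TEMP_UNSELECTABLE:
        raise ValueError("server was not reserved for this request")
    target.state = State.SCHEDULED
```

Marking the chosen server as temporarily non-selectable before the confirmation arrives keeps a concurrent scheduling request from picking the same server while the requester is still checking the returned information.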

Description

Scheduling method, device, system, medium, product and equipment for model reasoning

Technical Field

The present application relates to the field of communications technologies, and in particular to a scheduling method, device, system, medium, product and equipment for model reasoning.

Background

In the prior art, deployment schemes for model reasoning generally concentrate AI (artificial intelligence) requests on a single reasoning server cluster, which acts as the reasoning executor for scheduling and computation. This scheme suits the centralized, cloud-based computing resources typical of AI reasoning scenarios in the Internet industry. However, as industries such as communication networks undergo intelligent transformation and upgrading, a new AI reasoning deployment scenario has emerged: reasoning requesters and reasoning executors are distributed across regions, and some AI reasoning requesters have requirements on the region where the reasoning executor is located (for example, the executor must be deployed close to the requester). Existing scheduling schemes have difficulty meeting the deployment requirements of this new scenario.

Disclosure of Invention

In order to solve the above technical problems, the embodiments of the application provide a scheduling method, device, system, medium, product and equipment for model reasoning.
An embodiment of the application provides a scheduling method of model reasoning, applied to a scheduling end, comprising: receiving a first scheduling request message from a reasoning request end, wherein the first scheduling request message is used for indicating at least one region and the reasoning request quantity of each region; determining a first target server from the servers to be scheduled based on the first scheduling request message; and returning a first scheduling response message to the reasoning request end, wherein the first scheduling response message carries first information of the first target server.

Further, determining the first target server from the servers to be scheduled based on the first scheduling request message comprises: determining, from the servers to be scheduled, first batch scheduling servers corresponding to each of the at least one region; screening out, based on the reasoning request quantity, second batch scheduling servers corresponding to each region from the first batch scheduling servers corresponding to that region; and determining all the second batch scheduling servers as the first target server.
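
The two-stage batch screening described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the text does not say how the reasoning request quantity drives the screening, so the sketch assumes each server advertises a hypothetical `capacity` (number of reasoning requests it can still accept) and greedily picks servers until the requested quantity is covered. The names `Server` and `schedule_batch` are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Server:
    address: str
    region: str
    capacity: int   # assumption: reasoning requests the server can still accept


def schedule_batch(servers: List[Server], request: Dict[str, int]) -> Dict[str, List[Server]]:
    """Map each requested region to its second batch scheduling servers.

    `request` maps a region to its reasoning request quantity, so one
    message covers every region at once.
    """
    result: Dict[str, List[Server]] = {}
    for region, quantity in request.items():
        # Stage 1: first batch = every server to be scheduled in the region.
        first_batch = [s for s in servers if s.region == region]
        # Stage 2: screen by quantity (assumed greedy, largest capacity first).
        second_batch: List[Server] = []
        remaining = quantity
        for s in sorted(first_batch, key=lambda s: s.capacity, reverse=True):
            if remaining <= 0:
                break
            second_batch.append(s)
            remaining -= s.capacity
        # If capacity runs out, the region gets a partial (possibly empty) list.
        result[region] = second_batch
    return result
```

All second batch servers across the regions together form the first target server set returned in the single first scheduling response message.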
Further, the first scheduling request message carries at least one regional parameter combination in one-to-one correspondence with the at least one region, each regional parameter combination comprises first parameter information of the associated region and a reasoning request quantity, information of the servers to be scheduled is stored in a preconfigured KV database, and determining, from the servers to be scheduled, the first batch scheduling servers corresponding to each of the at least one region comprises: constructing an array corresponding to each regional parameter combination; and for each regional parameter combination, querying the KV database based on the array corresponding to that regional parameter combination, so as to determine all servers to be scheduled that meet the requirements indicated by the regional parameter combination as the first batch scheduling servers corresponding to the region associated with that regional parameter combination.

Further, the first scheduling response message is used for instructing the reasoning request end to judge whether the first information meets the requirement indicated by the first scheduling request message, and to feed back a first confirmation scheduling message when the first information meets the requirement.
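
The KV database lookup described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: an in-memory dict stands in for the preconfigured KV database, the "array" built from a regional parameter combination is modeled as a tuple used as the key, and the fields `region` and `device_type` are hypothetical examples of first parameter information, which the text does not enumerate.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, ...]

# Assumption: an in-memory dict stands in for the preconfigured KV database.
# Each key is the array built from the parameters; each value lists the
# addresses of the servers to be scheduled that match those parameters.
kv_store: Dict[Key, List[str]] = {
    ("region-a", "gpu"): ["10.0.0.1", "10.0.0.2"],
    ("region-b", "gpu"): ["10.0.1.1"],
}


def build_key(combination: dict) -> Key:
    """Construct the array (here a tuple) for one regional parameter combination."""
    # A fixed field order makes equal combinations produce equal keys.
    return (combination["region"], combination["device_type"])


def first_batch_servers(combinations: List[dict]) -> Dict[Key, List[str]]:
    """Query the KV store once per regional parameter combination."""
    result: Dict[Key, List[str]] = {}
    for combination in combinations:
        key = build_key(combination)
        # A missing key means no server meets the indicated requirements.
        result[key] = list(kv_store.get(key, []))
    return result
```

Keying the store by the full parameter array turns each regional lookup into a single exact-match read, which is one plausible reason for pairing an array with a KV database here.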
Further, the method further comprises: after the first target server is determined, setting the state of the first target server to a temporarily non-selectable state; and in response to the first confirmation scheduling message, changing the state of the first target server to a scheduled state and sending a first scheduling success response message to the reasoning request end, wherein the first scheduling success response message is used for instructing the reasoning request end to initiate, to the first target server, a first reasoning request corresponding to the first scheduling request message, and the first reasoning request carries a reasoning task to be borne by the first target server.

Further, the method further comprises: receiving a second scheduling request message from the reasoning request end, wherein the second scheduling request