
CN-122019066-A - Scheduling method, scheduling device, electronic equipment, storage medium and program product


Abstract

The application discloses a scheduling method, a scheduling device, electronic equipment, a storage medium and a program product, applied to a scheduling system. The method comprises: obtaining a target request, wherein the target request comprises input information to be input to a preset model; predicting, according to the input information, index parameters corresponding to each of a plurality of preset instance scheduling schemes, wherein the index parameters corresponding to an instance scheduling scheme represent the performance of scheduling instances to execute a target task according to that instance scheduling scheme; determining a target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme; and completing the target task by scheduling instances according to the target scheduling scheme. In this way, the scheduling system can adaptively adopt an instance scheduling scheme suited to the target request during inference, overcoming the limitation that a scheduling system supports only one instance scheduling scheme, so that the target request is inferred efficiently and the performance of the whole system is improved.

Inventors

  • ZHU BO
  • Zeng Fengzheng
  • CHEN JIE

Assignees

  • Chengdu Huawei Technologies Co., Ltd. (成都华为技术有限公司)

Dates

Publication Date
20260512
Application Date
20241031

Claims (17)

  1. A scheduling method, applied to a scheduling system, the method comprising: acquiring a target request, wherein the target request is used for requesting execution of a target task by using a preset model, and the target request comprises input information to be input to the preset model; predicting, according to the input information, index parameters corresponding to each of a plurality of preset instance scheduling schemes, wherein each instance scheduling scheme represents the number of tasks and the task phases that an instance of the preset model can execute in a target time period; determining a target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme; and completing the target task by scheduling the instance according to the target scheduling scheme.
  2. The scheduling method according to claim 1, wherein the index parameters include a composite delay value and/or a composite throughput, the composite delay value representing a total delay value of the task phases of the target task, and the composite throughput representing a total throughput of the task phases of the target task.
  3. The scheduling method according to claim 2, wherein determining the target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme comprises: determining, from the plurality of preset instance scheduling schemes, the instance scheduling scheme with the minimum composite delay value or the maximum composite throughput as the target scheduling scheme.
  4. The scheduling method according to any one of claims 1 to 3, wherein predicting, according to the input information, the index parameters corresponding to each of the plurality of preset instance scheduling schemes comprises: acquiring related information of a historical task; predicting a target information length according to the input information and the related information of the historical task, wherein the target information length is the length of the output information corresponding to the target task; and predicting the index parameters corresponding to each instance scheduling scheme according to the length of the input information, the target information length and target performance parameters, wherein the target performance parameters represent the performance of the instance in executing the target task.
  5. The scheduling method according to claim 4, wherein before predicting the index parameters corresponding to each instance scheduling scheme according to the length of the input information, the target information length and the target performance parameters, the method further comprises: estimating the target performance parameters according to the related information of the historical task.
  6. The scheduling method according to claim 4 or 5, wherein the index parameters comprise a composite delay value representing a total delay value of the task phases of the target task; the target performance parameters include a performance parameter corresponding to each task phase of the target task; and predicting the index parameters corresponding to each instance scheduling scheme according to the length of the input information, the target information length and the target performance parameters comprises: for each instance scheduling scheme, predicting a phase delay value of each task phase of the target task according to the length of the input information, the target information length and the performance parameter corresponding to each task phase of the target task; and obtaining the composite delay value corresponding to the instance scheduling scheme according to the phase delay value of each task phase of the target task.
  7. The scheduling method of claim 6, wherein the task phases include a prefill phase and a decode phase; the target performance parameters include a first performance parameter and a second performance parameter, wherein the first performance parameter characterizes an average delay of the instance when executing the prefill phase of the target task, and the second performance parameter characterizes an average delay of the instance when executing the decode phase of the target task; the phase delay value of the prefill phase of the target task is proportional to the first performance parameter, and the phase delay value of the decode phase of the target task is proportional to the second performance parameter.
  8. The scheduling method of claim 7, wherein the instance scheduling schemes comprise a first instance scheduling scheme, the first instance scheduling scheme indicating that an instance performs the prefill phase or the decode phase of the same task in the target time period; under the first instance scheduling scheme, the phase delay value of the prefill phase of the target task is a first P delay value, and the phase delay value of the decode phase is a first D delay value; the first P delay value, the length of the input information and the first performance parameter conform to the following relation: [formula missing], where α is a first adjustment coefficient, N_p is the length of the input information, and T_p is the first performance parameter; the first D delay value, the target information length and the second performance parameter conform to the following relation: [formula missing], where T_wait is the waiting scheduling delay, T_d is the second performance parameter, and the remaining symbols denote the first D delay value and the target information length.
  9. The scheduling method according to claim 7 or 8, wherein the instance scheduling schemes include a second instance scheduling scheme, the second instance scheduling scheme representing that one instance performs the prefill phase and the decode phase of different tasks in the target time period; under the second instance scheduling scheme, the phase delay value of the prefill phase of the target task is a second P delay value, and the phase delay value of the decode phase is a second D delay value; the second P delay value, the length of the input information and the first performance parameter conform to the following relation: [formula missing], where β is a second adjustment coefficient, T_p is the first performance parameter, and the remaining symbols denote the second P delay value and the length of the block prefill, the latter being determined according to the length of the input information and a preset block size; the second D delay value, the target information length and the second performance parameter conform to the following relation: [formula missing], where T_d is the second performance parameter, and the remaining symbols denote the second D delay value and the target information length.
  10. The scheduling method according to any one of claims 7-9, wherein the instance scheduling schemes comprise a third instance scheduling scheme, the third instance scheduling scheme representing that a plurality of instances respectively perform the prefill phase and the decode phase of different tasks in the target time period; under the third instance scheduling scheme, the phase delay value of the prefill phase of the target task is a third P delay value, and the phase delay value of the decode phase is a third D delay value; the third P delay value, the length of the input information and the first performance parameter conform to the following relation: [formula missing], where γ is a third adjustment coefficient, N_p is the length of the input information, T_p is the first performance parameter, and the remaining symbols denote the third P delay value and the KV multiplexing length of the input information; the third D delay value, the target information length and the second performance parameter conform to the following relation: [formula missing], where T_wait is the waiting scheduling delay, T_d is the second performance parameter, and the remaining symbols denote the third D delay value, the KV transmission delay (determined based on the related information of the historical task), the target information length, and the length of the block prefill (determined according to the length of the input information and the preset block size).
  11. The scheduling method according to any one of claims 1-10, wherein completing the target task by scheduling the instance according to the target scheduling scheme comprises: predicting a target hyperparameter corresponding to the target scheduling scheme; configuring an instance of the preset model according to the target scheduling scheme and the target hyperparameter to obtain a target instance, wherein the target instance is used for executing the target task according to the target scheduling scheme based on the target hyperparameter; and executing the target task by scheduling the target instance.
  12. The scheduling method according to claim 11, wherein predicting the target hyperparameter corresponding to the target scheduling scheme comprises: acquiring a current system load of the scheduling system and related information of a historical task; predicting a target information length according to the input information and the related information of the historical task, wherein the target information length is the length of the output information corresponding to the target task; and predicting the target hyperparameter corresponding to the target scheduling scheme according to the length of the input information, the target information length and the current system load.
  13. The scheduling method according to any one of claims 4, 5, 10 and 12, wherein the related information of the historical task comprises the lengths of the historical input information and the historical output information corresponding to the historical task, the time distribution of the lengths of the historical input information and the historical output information, historical performance parameters and historical key-value cache distribution information, wherein the historical performance parameters represent the performance of the instance in executing the historical task.
  14. A scheduling apparatus for use in a scheduling system, the apparatus comprising: an acquisition module, configured to acquire a target request, wherein the target request is used for requesting execution of a target task by using a preset model, and the target request comprises input information to be input to the preset model; a prediction module, configured to predict, according to the input information, index parameters corresponding to each of a plurality of preset instance scheduling schemes, wherein each instance scheduling scheme represents the number of tasks and the task phases that an instance of the preset model can execute in a target time period; a determining module, configured to determine a target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme; and a scheduling module, configured to complete the target task by scheduling the instance according to the target scheduling scheme.
  15. An electronic device comprising a processor and a memory for storing instructions executable by the processor; the processor being configured to, when executing the instructions, cause the electronic device to implement the scheduling method of any one of claims 1-13.
  16. A storage medium having stored thereon computer program instructions which, when executed by an electronic device, cause the electronic device to implement the scheduling method of any one of claims 1-13.
  17. A program product comprising a computer readable storage medium storing a computer program which, when executed by at least one processor, causes the at least one processor to perform the scheduling method of any one of claims 1-13.
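
Claims 6 and 7 describe the composite delay value as being obtained from per-phase delay values, each proportional to an average per-phase performance parameter. The formulas of claims 8 to 10 are not reproduced in this text, so the following Python sketch is only an illustration under stated assumptions and not the claimed relations: it treats T_p and T_d as per-token averages, takes the composite delay as the sum of the two phase delays, and uses hypothetical names (`StagePerf`, `prefill_delay`, `decode_delay`, `composite_delay`) with a generic `coeff` standing in for the adjustment coefficients.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StagePerf:
    """Per-phase performance parameters (assumed unit: seconds per token)."""
    t_prefill: float  # T_p: average prefill delay, assumed per input token
    t_decode: float   # T_d: average decode delay, assumed per output token


def prefill_delay(n_input: int, perf: StagePerf, coeff: float = 1.0) -> float:
    # Assumed linear form, proportional to T_p as claim 7 requires.
    return coeff * n_input * perf.t_prefill


def decode_delay(n_output: int, perf: StagePerf, t_wait: float = 0.0) -> float:
    # Assumed form: waiting delay plus a term proportional to T_d.
    return t_wait + n_output * perf.t_decode


def composite_delay(n_input: int, n_output: int, perf: StagePerf,
                    coeff: float = 1.0, t_wait: float = 0.0) -> float:
    # Composite delay taken as the sum of the phase delays (an assumption).
    return prefill_delay(n_input, perf, coeff) + decode_delay(n_output, perf, t_wait)


if __name__ == "__main__":
    perf = StagePerf(t_prefill=0.001, t_decode=0.02)
    print(composite_delay(100, 50, perf, coeff=1.0, t_wait=0.5))
```

In this sketch, `n_output` plays the role of the predicted target information length; how that length and the performance parameters are estimated from historical tasks is left out.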

Description

Scheduling method, scheduling device, electronic equipment, storage medium and program product

Technical Field

The present application relates to the field of computer technologies, and in particular to a scheduling method, a scheduling device, electronic equipment, a storage medium and a program product.

Background

With the continuous development and popularization of deep learning technology, efficient large-model inference has become an important basis for applying large models across many industries. In the related art, a model inference scheduling system generally supports only one instance scheduling scheme during inference, so a large model can execute tasks only by scheduling instances according to that single scheme. It is therefore difficult to adaptively adopt a scheduling scheme suited to the target request while inferring a task.

Disclosure of Invention

The embodiments of the application aim to provide a scheduling method, a scheduling device, electronic equipment, a storage medium and a program product that can adaptively adopt an instance scheduling scheme suited to the target request while executing the target task, thereby improving processing efficiency.
To achieve the above object, the embodiments of the present application provide the following solution. The method comprises: obtaining a target request, wherein the target request is used for requesting execution of a target task by using a preset model and comprises input information to be input to the preset model; predicting, according to the input information, index parameters corresponding to each of a plurality of preset instance scheduling schemes, wherein each instance scheduling scheme represents the number of tasks and the task phases that an instance of the preset model can execute in a target time period, and the index parameters corresponding to an instance scheduling scheme represent the performance of scheduling the instance to execute the target task according to that scheme; determining a target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme; and completing the target task by scheduling the instance according to the target scheduling scheme.

The scheduling system can be applied to large-model inference scenarios. It can support a plurality of preset instance scheduling schemes, such as a first, a second and a third instance scheduling scheme. In other words, the scheduling system integrates multiple instance scheduling schemes and can handle a variety of target requests (for example, input information of different lengths) without relying on several independent systems or tools to process different target requests separately, which gives it good generality, adaptability and flexibility.
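
The obtain, predict and determine steps described above can be sketched in Python as follows. This is a minimal illustration, not the application's implementation: the predictor type, the toy linear predictors with made-up coefficients, and the names `select_scheme` and `handle_request` are all hypothetical, and the selection rule shown is the minimum predicted delay.

```python
from typing import Callable, Dict, List

# Hypothetical predictor type: input length (tokens) -> predicted composite delay (s).
Predictor = Callable[[int], float]

# Toy predictors with made-up coefficients, one per preset scheme (illustrative only).
EXAMPLE_PREDICTORS: Dict[str, Predictor] = {
    "first": lambda n: 0.002 * n + 1.0,
    "second": lambda n: 0.003 * n + 0.2,
    "third": lambda n: 0.001 * n + 2.0,
}


def select_scheme(input_length: int, predictors: Dict[str, Predictor]) -> str:
    """Return the name of the scheme with the smallest predicted delay."""
    if not predictors:
        raise ValueError("at least one preset scheme is required")
    predicted = {name: predict(input_length) for name, predict in predictors.items()}
    return min(predicted, key=predicted.get)


def handle_request(input_tokens: List[str], predictors: Dict[str, Predictor]) -> str:
    # Steps: obtain the request, predict per scheme, pick the target scheme.
    # Dispatching to real model instances is out of scope; the chosen name is returned.
    return select_scheme(len(input_tokens), predictors)


if __name__ == "__main__":
    print(handle_request(["tok"] * 100, EXAMPLE_PREDICTORS))
    print(handle_request(["tok"] * 2000, EXAMPLE_PREDICTORS))
```

With these toy predictors the chosen scheme changes with the input length, which is the behaviour the text attributes to a system that integrates several instance scheduling schemes instead of fixing one.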
Based on the above, the performance of each instance scheduling scheme when executing the target request can be determined according to the index parameters corresponding to each of the plurality of preset instance scheduling schemes; the target scheduling scheme is then determined according to those index parameters, and the target task is completed by scheduling instances according to the target scheduling scheme. The system can therefore adaptively adopt an instance scheduling scheme suited to the target request during inference, overcoming the limitation in the related art that a scheduling system supports only one instance scheduling scheme, so that the target request is inferred efficiently and the performance of the whole system is improved.

In one possible implementation, the index parameters include a composite delay value representing a total delay value of the task phases of the target task and/or a composite throughput representing a total throughput of the task phases of the target task. Because the index parameters include the composite delay value and/or the composite throughput, they reflect more fully the performance of scheduling instances to execute the target task according to an instance scheduling scheme.

In one possible implementation, determining the target scheduling scheme from the plurality of preset instance scheduling schemes according to the index parameters corresponding to each instance scheduling scheme includes determining, among the plurality of preset instance scheduling schemes, the instance scheduling scheme with the minimum composite delay value or the maximum composite throughput as the target scheduling scheme. And determining an instance scheduling scheme with the min