CN-122021857-A - Parallel strategy searching method and device


Abstract

Embodiments of the application disclose a parallel strategy searching method and device for improving the flexibility of parallel strategy searching. In the method, a parallel strategy search system obtains a plurality of parallel strategies, each of which indicates a parallel mode for training an AI model. The system provides an input interface for obtaining input content, the input content indicating at least one strategy screening rule for screening the plurality of parallel strategies. The system screens the plurality of parallel strategies based on the at least one strategy screening rule to determine a plurality of candidate parallel strategies among them, and evaluates the candidate parallel strategies to determine a target parallel strategy, the target parallel strategy being the parallel strategy adopted for training the AI model.

Inventors

Qin Shanfu
GU YINGJIE
ZHENG YI
WANG ZHEFENG

Assignees

Huawei Cloud Computing Technologies Co., Ltd. (华为云计算技术有限公司)

Dates

Publication Date: 20260512
Application Date: 20241111

Claims (20)

1. A search method for parallel policies, applied to a cloud platform for managing an infrastructure running cloud computing services, the infrastructure including at least one cloud data center and being used for training an artificial intelligence (AI) model, the method comprising: acquiring a plurality of parallel policies, wherein each parallel policy in the plurality of parallel policies is used for indicating a parallel mode for training the AI model; providing an input interface for obtaining input content, the input content being for indicating at least one policy screening rule; screening the plurality of parallel policies based on the at least one policy screening rule, and determining a plurality of candidate parallel policies in the plurality of parallel policies; and evaluating the plurality of candidate parallel policies to determine a target parallel policy, wherein the target parallel policy is a parallel policy adopted for training the AI model.
2. The method of claim 1, wherein the at least one policy screening rule includes one or more of: a parallelization setting rule for indicating a setting rule of a single parallel mode in the parallel policy; a parallelization policy relation setting rule for indicating a setting rule of a relation between a plurality of parallel modes in the parallel policy; and a model segmentation rule for indicating a setting rule of a model segmentation point.
3. The method of claim 1 or 2, wherein the screening the plurality of parallel policies based on the at least one policy screening rule comprises: when a first parallel policy of the plurality of parallel policies meets the constraint condition in the at least one policy screening rule, determining the first parallel policy as a candidate parallel policy; or when a second parallel policy of the plurality of parallel policies does not meet the constraint condition in the at least one policy screening rule, skipping the second parallel policy.
4. The method of any one of claims 1 to 3, wherein the input content comprises one or more of a configuration file, a rule code, and graphical front end configuration information.
5. The method of claim 4, wherein the input content includes the graphical front end configuration information, the method further comprising: providing a Graphical User Interface (GUI) for displaying the graphical front end configuration information including one or more of rule Identification (ID), rule description and rule content.
6. The method of claim 5, wherein the method further comprises: receiving a rule processing request, wherein the rule processing request comprises a processing request triggered by a user modifying the input content in the GUI; and performing, based on the rule processing request, one or more of the following operations on the at least one policy screening rule: adding a policy screening rule, modifying a policy screening rule, and deleting a policy screening rule.
7. The method of any one of claims 1 to 6, wherein evaluating the plurality of candidate parallel policies to determine a target parallel policy comprises: evaluating the plurality of candidate parallel policies based on a cost model to determine the target parallel policy, wherein the cost model is used for predicting the performance of the candidate parallel policies, and the predicted performance of the target parallel policy is higher than that of every other candidate parallel policy in the plurality of candidate parallel policies.
8. The method of claim 7, wherein model parameters of the cost model are determined based on one or more of system information, task information, and history information.
9. A parallel policy search apparatus, comprising: an acquisition unit configured to acquire a plurality of parallel policies, each of the plurality of parallel policies being used to indicate a parallel mode for training an AI model, the acquisition unit being further configured to provide an input interface, wherein the input interface is used for acquiring input content, and the input content is used for indicating at least one policy screening rule; and a processing unit configured to screen the plurality of parallel policies based on the at least one policy screening rule and determine a plurality of candidate parallel policies in the plurality of parallel policies, the processing unit being further configured to evaluate the plurality of candidate parallel policies and determine a target parallel policy, wherein the target parallel policy is a parallel policy adopted for training the AI model.
10. The apparatus of claim 9, wherein the at least one policy screening rule comprises one or more of a parallelization setting rule, a parallelization policy relation setting rule, and a model segmentation rule, wherein the parallelization setting rule is used for indicating a setting rule of a single parallel mode in the parallel policy, the parallelization policy relation setting rule is used for indicating a setting rule of a relation among multiple parallel modes in the parallel policy, and the model segmentation rule is used for indicating a setting rule of a model segmentation point.
11. The apparatus according to claim 9 or 10, wherein the processing unit is further configured to: when a first parallel policy of the plurality of parallel policies meets the constraint condition in the at least one policy screening rule, determine the first parallel policy as a candidate parallel policy; or when a second parallel policy of the plurality of parallel policies does not meet the constraint condition in the at least one policy screening rule, skip the second parallel policy.
12. The apparatus of any one of claims 9 to 11, wherein the input content comprises one or more of a configuration file, a rule code, and graphical front end configuration information.
13. The apparatus of claim 12, wherein the input content includes the graphical front end configuration information, and the acquisition unit is further configured to: provide a Graphical User Interface (GUI) for displaying the graphical front end configuration information including one or more of rule Identification (ID), rule description and rule content.
14. The apparatus of claim 13, wherein the acquisition unit is further configured to: receive a rule processing request, wherein the rule processing request comprises a processing request triggered by a user modifying the input content in the GUI; and the processing unit is further configured to perform, based on the rule processing request, one or more of the following operations on the at least one policy screening rule: adding a policy screening rule, modifying a policy screening rule, and deleting a policy screening rule.
15. The apparatus according to any one of claims 9 to 14, wherein the processing unit is specifically configured to: evaluate the plurality of candidate parallel policies based on a cost model to determine the target parallel policy, wherein the cost model is used for predicting the performance of the candidate parallel policies, and the predicted performance of the target parallel policy is higher than that of every other candidate parallel policy in the plurality of candidate parallel policies.
16. The apparatus of claim 15, wherein model parameters of the cost model are determined based on one or more of system information, task information, and history information.
17. A computing device comprising a processor coupled with a memory, the memory being configured to store instructions that, when executed by the processor, cause the computing device to perform the method of any of claims 1 to 8.
18. A cluster of computing devices comprising at least one computing device, the computing device comprising a processor coupled with a memory, the memory being configured to store instructions that, when executed by the processor, cause the cluster of computing devices to perform the method of any of claims 1 to 8.
19. A computer readable storage medium having instructions stored thereon, which when executed, cause a computer to perform the method of any of claims 1 to 8.
20. A computer program product comprising instructions which, when executed, cause a computer to carry out the method of any one of claims 1 to 8.
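
For illustration only, the flow recited in claims 1, 2, 3 and 7 (acquire parallel policies, screen them against user-supplied rules, then pick the candidate with the best predicted performance) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the `ParallelStrategy` encoding, the three example rules, the 24-layer model, and the `toy_cost` coefficients are all hypothetical and were chosen so that the result can be checked by hand.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List


@dataclass(frozen=True)
class ParallelStrategy:
    """Hypothetical encoding: one degree per parallel mode."""
    dp: int  # data-parallel degree
    tp: int  # tensor-parallel degree
    pp: int  # pipeline-parallel degree (number of pipeline stages)


# A screening rule is modelled as a predicate over a strategy (assumption).
Rule = Callable[[ParallelStrategy], bool]


def enumerate_strategies(num_devices: int) -> List[ParallelStrategy]:
    """Acquire all (dp, tp, pp) triples whose product equals the device count."""
    degrees = [d for d in range(1, num_devices + 1) if num_devices % d == 0]
    return [
        ParallelStrategy(dp, tp, pp)
        for dp, tp, pp in product(degrees, repeat=3)
        if dp * tp * pp == num_devices
    ]


NUM_LAYERS = 24  # hypothetical model depth

# Rules a user might supply through the input interface (all hypothetical).
USER_RULES: List[Rule] = [
    lambda s: 2 <= s.tp <= 4,          # parallelization setting rule (single mode)
    lambda s: s.dp >= s.tp,            # relation rule between two modes
    lambda s: NUM_LAYERS % s.pp == 0,  # segmentation rule: stages split layers evenly
]


def screen(strategies: List[ParallelStrategy], rules: List[Rule]) -> List[ParallelStrategy]:
    """Keep a strategy only if it meets every rule; otherwise skip it."""
    candidates = []
    for s in strategies:
        if all(rule(s) for rule in rules):
            candidates.append(s)
    return candidates


def toy_cost(s: ParallelStrategy) -> float:
    """Made-up stand-in for a cost model: lower cost means higher predicted performance."""
    return 1.0 / s.dp + 0.05 * s.tp + 0.1 * s.pp


def select_target(candidates: List[ParallelStrategy],
                  cost: Callable[[ParallelStrategy], float]) -> ParallelStrategy:
    """Return the candidate with the lowest predicted cost."""
    return min(candidates, key=cost)


candidates = screen(enumerate_strategies(16), USER_RULES)
target = select_target(candidates, toy_cost)
print(len(candidates), target)  # 4 ParallelStrategy(dp=8, tp=2, pp=1)
```

Because the rules are plain predicates held in a list, adding, modifying, or deleting a rule changes only `USER_RULES`; the screening and evaluation code stays untouched, which is the flexibility the input interface is meant to provide.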

Description

Parallel strategy searching method and device

Technical Field

Embodiments of the application relate to the field of cloud computing, and in particular to a parallel strategy searching method and device.

Background

In recent years, with the development of artificial intelligence technology, large models readily exceed the trillion-parameter scale. The traditional single-machine, single-card mode can no longer meet the requirements of training such oversized models, so models must be trained in a distributed manner on a single machine with multiple cards, or even on multiple machines with multiple cards, to improve large-model training performance.

At present, to make better use of the computing power and video memory of computing devices when training a large model in a distributed manner, parallel strategies such as data parallelism, tensor parallelism, pipeline parallelism, and recomputation need to be applied appropriately. In current cloud-service-based large-model training schemes, the training platform usually trains the large model in an automatic parallel mode: given a model and the training cluster devices, the system automatically searches for an optimal or near-optimal parallel strategy for training the large model.

In current automatic parallel schemes, the system needs to search a large number of parallel strategy options in order to select a globally optimal parallel strategy. However, when automatic parallelism is offered to users as a cloud service, the search system can only rely on hard-coded rules built into the cloud service to search for parallel strategies. Once the structural parameters of the model or the number or type of device clusters change, the search is still performed using the same fixed hard-coded rules. The parallel strategy search is therefore inflexible, and a globally optimal parallel strategy cannot be obtained.

Disclosure of Invention

Embodiments of the application provide a parallel strategy searching method in which the parallel strategy search system provides an input interface through which a user can input strategy screening rules. The strategy screening rules can thus be adjusted flexibly and used to search for parallel strategies, which improves the flexibility of the parallel strategy search. Embodiments of the application also provide a parallel strategy searching device, a computing device, a computing device cluster, a computer readable storage medium, and a computer program product corresponding to the parallel strategy searching method.

In a first aspect, an embodiment of the present application provides a method for searching parallel strategies. The method may be performed by a computing device, or by a component of the computing device such as a processor, a chip, or a chip system, or may be implemented by a logic module or software that can implement all or part of the functions of the computing device.

The method provided in the first aspect includes the following. The parallel strategy search system obtains a plurality of parallel strategies, each of which indicates a parallel mode for training an AI model. The search system provides an input interface for obtaining input content, the input content indicating at least one strategy screening rule for screening the plurality of parallel strategies. The search system screens the plurality of parallel strategies based on the at least one strategy screening rule to determine a plurality of candidate parallel strategies among them. The search system then evaluates the plurality of candidate parallel strategies to determine a target parallel strategy, the target parallel strategy being the parallel strategy adopted for training the AI model.

In the embodiments of the application, a user can flexibly set strategy screening rules through the input interface provided by the parallel strategy search system. The parallel strategies for large-model training are screened based on those rules, and the target parallel strategy is then determined through evaluation.

In one possible implementation, the at least one strategy screening rule includes one or more of a parallelization setting rule, a parallelization strategy relation setting rule, and a model segmentation rule. The parallelization setting rule indicates a setting rule for a single parallel mode in the parallel strategy, the parallelization strategy relation setting rule indicates a setting rule for the relation among multiple parallel modes in the parallel strategy, and the model segmentation rule indicates a setting rule for the model segmentation points. The strategy screening rules in the embodiments of the application include multiple types of rules; the various types of strategy screening rules c