CN-114444709-B - Hyperparameter optimization method, apparatus, and computing device

Abstract

An embodiment of the present application provides a hyperparameter optimization method, a hyperparameter optimization apparatus, and a computing device. The method comprises: determining a first hyperparameter instance corresponding to the hyperparameter of a data processing system; determining first effect evaluation information corresponding to the first hyperparameter instance based on a first training result of the first hyperparameter instance in the data processing system; obtaining a plurality of historical hyperparameter instances and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances; and selecting, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a target hyperparameter instance with the highest effect evaluation according to the first effect evaluation information and the historical effect evaluation information. The embodiment of the application improves the efficiency of hyperparameter optimization.
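
The selection step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the claimed implementation: the names `Evaluated`, `select_target`, and `toy_evaluate` are hypothetical, a single numeric score stands in for the effect evaluation information, and "higher is better" is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A hyperparameter instance: one concrete value per sub-hyperparameter.
Instance = Dict[str, float]


@dataclass
class Evaluated:
    instance: Instance
    score: float  # stand-in for effect evaluation information; higher is better (assumption)


def select_target(
    first: Instance,
    evaluate: Callable[[Instance], float],
    history: List[Evaluated],
) -> Evaluated:
    """Evaluate the first instance, then pick the best among it and the history."""
    current = Evaluated(first, evaluate(first))
    return max([current, *history], key=lambda e: e.score)


def toy_evaluate(instance: Instance) -> float:
    # Hypothetical stand-in for training: the score peaks at learning_rate = 0.1.
    return 1.0 - abs(instance["learning_rate"] - 0.1)


history = [
    Evaluated({"learning_rate": 0.5, "hidden_layers": 2}, 0.6),
    Evaluated({"learning_rate": 0.3, "hidden_layers": 3}, 0.8),
]
target = select_target({"learning_rate": 0.1, "hidden_layers": 4}, toy_evaluate, history)
print(target.instance, target.score)
```

In this sketch the new instance wins because its toy score exceeds both historical scores; a worse new instance would leave the best historical instance as the target.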

Inventors

XIE MIAO
LIU CHUNCHEN

Assignees

Alibaba Group Holding Limited (阿里巴巴集团控股有限公司)

Dates

Publication Date: 20260508
Application Date: 20201103

Claims (20)

1. A hyperparameter optimization method, comprising: determining a first hyperparameter instance currently corresponding to a hyperparameter of a data processing system; determining first effect evaluation information corresponding to the first hyperparameter instance based on a first training result of the first hyperparameter instance in the data processing system, wherein the first effect evaluation information is used for evaluating the training effect of the first hyperparameter instance in a data processing model, and the data processing model is constructed based on the first hyperparameter instance; acquiring historical effect evaluation information respectively corresponding to a plurality of historical hyperparameter instances; and selecting, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a target hyperparameter instance with the highest effect evaluation according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances; wherein the determining, based on the first training result of the first hyperparameter instance in the data processing system, first effect evaluation information corresponding to the first hyperparameter instance comprises: determining a training target of the data processing system and the first training result generated by the first hyperparameter instance when the data processing system is trained; determining a first causal relationship between the first hyperparameter instance and the training target based on the first training result; and determining the first effect evaluation information corresponding to the first hyperparameter instance according to the first causal relationship.
2. The method of claim 1, wherein determining the first effect evaluation information corresponding to the first hyperparameter instance according to the first causal relationship comprises: determining first effect information according to a degree of matching between the first causal relationship and the first hyperparameter instance; estimating, according to the first causal relationship, first evaluation information corresponding to the generation of the first training result by the first hyperparameter instance; and determining the first effect evaluation information corresponding to the first hyperparameter instance based on the first effect information and/or the first evaluation information.
3. The method of claim 2, wherein the first causal relationship comprises a first causal network, and wherein determining the first effect information according to the degree of matching between the first causal relationship and the first hyperparameter instance comprises: determining a first causal model corresponding to the first causal network based on a causal network discovery algorithm; calculating a first data matching value between the first hyperparameter instance and the first causal model; and acquiring the first effect information according to the first data matching value.
4. The method of claim 2, wherein the first causal relationship comprises a first causal network, and wherein the estimating, according to the first causal relationship, first evaluation information corresponding to the generation of the first training result by the first hyperparameter instance comprises: determining a first effect model corresponding to the first causal network based on a causal effect estimation algorithm; inputting the first hyperparameter instance into the first effect model and calculating a first effect score; and acquiring the first evaluation information according to the first effect score.
5. The method of claim 1, further comprising: generating path interpretation information corresponding to the selection process of the target hyperparameter instance according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information corresponding to the plurality of historical hyperparameter instances.
6. The method of claim 5, further comprising: determining a training target of the data processing system and a first training result generated by the first hyperparameter instance when the data processing system is trained; determining a first causal relationship between the first hyperparameter instance and the training target based on the first training result; and generating implicit effect interpretation information between the hyperparameter instance and the training target of the data processing system according to the first causal relationship.
7. The method of claim 6, wherein the first causal relationship comprises a first causal network, the method further comprising: determining a first causal model corresponding to the first causal network based on a causal network discovery algorithm; determining a first effect model corresponding to the first causal network based on a causal effect estimation algorithm; inputting the first hyperparameter instance into the first effect model and calculating a first effect score; and generating, according to the first causal model and the first effect score, decision interpretation information for selecting the target hyperparameter instance as the hyperparameter instance with the highest effect evaluation.
8. The method of claim 1, wherein determining the first effect evaluation information corresponding to the first hyperparameter instance based on the training result of the first hyperparameter instance in the data processing system comprises: if the first hyperparameter instance meets a training condition, determining the first effect evaluation information corresponding to the first hyperparameter instance based on the training result of the first hyperparameter instance in the data processing system.
9. The method of claim 8, further comprising: if the first hyperparameter instance does not meet the training condition, randomly generating, for the first hyperparameter instance, first effect evaluation information whose evaluated effect is lower than an effect threshold.
10. The method of claim 1, wherein the determining the training target of the data processing system and the first training result generated by the first hyperparameter instance when the data processing system is trained comprises: constructing a data processing model corresponding to the data processing system by using the first hyperparameter instance; determining the training target of the data processing system; acquiring a plurality of training groups, wherein each training group comprises a plurality of training data; sequentially training according to the training target to obtain sub-training results respectively generated by the plurality of training groups in the data processing model; and determining the first training result generated by the first hyperparameter instance in the data processing system based on the sub-training results respectively corresponding to the plurality of training groups.
11. The method of claim 1, wherein determining the first hyperparameter instance currently corresponding to the hyperparameter of the data processing system comprises: determining third effect evaluation information corresponding to a third hyperparameter instance, the third hyperparameter instance being the last-obtained one of the plurality of historical hyperparameter instances; and obtaining the first hyperparameter instance currently corresponding to the hyperparameter of the data processing system based on the third hyperparameter instance and the third effect evaluation information corresponding to the third hyperparameter instance, in combination with a preset parameter update algorithm.
12. The method of claim 1, wherein determining the first hyperparameter instance currently corresponding to the hyperparameter of the data processing system comprises: detecting a parameter optimization request triggered by a first user for the hyperparameter of the data processing system; and in response to the parameter optimization request, determining the first hyperparameter instance corresponding to the hyperparameter of the data processing system.
13. The method of claim 12, further comprising: sending the target hyperparameter instance to a first user device of the first user, so that the first user device outputs the target hyperparameter instance to the first user.
14. The method of claim 1, wherein selecting the target hyperparameter instance with the highest effect evaluation from the first hyperparameter instance and the plurality of historical hyperparameter instances according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances comprises: selecting, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a plurality of candidate hyperparameter instances meeting an effect evaluation condition according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances; displaying the plurality of candidate hyperparameter instances to a second user, so that the second user selects the target hyperparameter instance from the plurality of candidate hyperparameter instances; and acquiring the target hyperparameter instance selected by the second user.
15. The method of claim 1, further comprising, prior to determining the first hyperparameter instance currently corresponding to the hyperparameter of the data processing system: extracting the hyperparameter and the training target corresponding to the data processing system based on preset scene information.
16. The method of claim 15, further comprising: selecting, based on the preset scene information, a data processing system matching the scene information from a plurality of candidate learning algorithms.
17. The method of claim 15, further comprising: detecting scene information input by a third user.
18. The method of claim 16, wherein selecting, based on the preset scene information, a data processing system matching the scene information from the plurality of candidate learning algorithms comprises: displaying the plurality of candidate learning algorithms to a third user; and detecting the data processing system that the third user selects from the candidate learning algorithms and that matches the scene information.
19. A hyperparameter optimization apparatus, comprising: an instance determining module, configured to determine a first hyperparameter instance currently corresponding to a hyperparameter of a data processing system; an effect evaluation module, configured to determine a training target of the data processing system and a first training result generated by the first hyperparameter instance when the data processing system is trained, to determine a first causal relationship between the first hyperparameter instance and the training target based on the first training result, and to determine first effect evaluation information corresponding to the first hyperparameter instance according to the first causal relationship, wherein the first effect evaluation information is used for evaluating the training effect of the first hyperparameter instance in a data processing model, and the data processing model is constructed based on the first hyperparameter instance; a history acquisition module, configured to acquire a plurality of historical hyperparameter instances and historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances; and an instance selection module, configured to select, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a target hyperparameter instance with the highest effect evaluation according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances.
20. A computing device, comprising a storage component and a processing component, the storage component storing one or more computer instructions, the one or more computer instructions being invoked by the processing component to perform the hyperparameter optimization method of any one of claims 1-18.
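
As an informal illustration of claims 8 to 10, the Python sketch below gates training on a training condition, trains on several training groups in turn, and combines the sub-training results. Everything concrete in it is an assumption made for illustration: the threshold value, the condition in `meets_training_condition`, the use of a mean to combine sub-results, and the toy function `toy_train_group` are hypothetical and are not taken from the claims.

```python
import random
from statistics import mean
from typing import Callable, Dict, Sequence

Instance = Dict[str, float]

EFFECT_THRESHOLD = 0.2  # assumed value; the claims do not specify a threshold


def meets_training_condition(instance: Instance) -> bool:
    # Hypothetical condition: the learning rate must lie in (0, 1].
    return 0.0 < instance["learning_rate"] <= 1.0


def train_on_groups(
    instance: Instance,
    groups: Sequence[Sequence[float]],
    train_group: Callable[[Instance, Sequence[float]], float],
) -> float:
    """Train on each group in turn and combine the sub-results (mean is an assumption)."""
    sub_results = [train_group(instance, group) for group in groups]
    return mean(sub_results)


def evaluate(
    instance: Instance,
    groups: Sequence[Sequence[float]],
    train_group: Callable[[Instance, Sequence[float]], float],
    rng: random.Random,
) -> float:
    if not meets_training_condition(instance):
        # Claim 9: no training; draw a random score below the effect threshold.
        return rng.random() * EFFECT_THRESHOLD
    # Claims 8 and 10: train group by group and aggregate the sub-training results.
    return train_on_groups(instance, groups, train_group)


def toy_train_group(instance: Instance, group: Sequence[float]) -> float:
    # Hypothetical stand-in for one training run on one training group.
    return instance["learning_rate"] * sum(group)


rng = random.Random(0)
groups = [[1.0, 1.0], [2.0, 2.0]]
good = evaluate({"learning_rate": 0.5}, groups, toy_train_group, rng)
bad = evaluate({"learning_rate": -1.0}, groups, toy_train_group, rng)
print(good, bad)
```

Giving an untrainable instance a low random score keeps it in the comparison with the historical instances without letting it be selected over instances that were actually trained.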

Description

Hyperparameter optimization method, apparatus, and computing device

Technical Field

The present application relates to the field of electronic devices, and in particular to a hyperparameter optimization method, a hyperparameter optimization apparatus, and a computing device.

Background

A hyperparameter is a parameter whose value is set for the construction of a data processing model before the model starts its learning process, as opposed to the parameter data of model parameters obtained through training. For example, the number of hidden layers of a deep network in a machine learning model and the learning rate of the model are both hyperparameters. The hyperparameter of a data processing model generally comprises a plurality of sub-hyperparameters. Once a value has been set for each sub-hyperparameter, the values together form a parameter instance of the hyperparameter, and the choice of this instance has a broad influence on the learning effect of the data processing model. In general, selecting the hyperparameter instance that gives the best training effect improves the performance and effectiveness of the data processing model.

In the prior art, a hyperparameter instance is usually determined manually by the trainer, based on experience with the model, before the data processing model is trained. After a training result is obtained, the instance is adjusted according to that result to improve the training effect of the model. However, manually adjusting hyperparameter instances is inefficient and imprecise.
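
The relationship between sub-hyperparameters and a hyperparameter instance can be made concrete with a short Python sketch. The search space and the helper `enumerate_instances` are hypothetical and serve only to show that an instance is one value per sub-hyperparameter.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical search space: each sub-hyperparameter maps to its candidate values.
search_space: Dict[str, List[float]] = {
    "hidden_layers": [2, 4],
    "learning_rate": [0.01, 0.1],
}


def enumerate_instances(space: Dict[str, List[float]]) -> Iterator[Dict[str, float]]:
    """Yield every hyperparameter instance, i.e. one value per sub-hyperparameter."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


instances = list(enumerate_instances(search_space))
print(len(instances))
print(instances[0])
```

Even this two-by-two space yields four instances, and the count grows multiplicatively with each added sub-hyperparameter, which is one reason manual tuning scales poorly.
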
Disclosure of Invention

In view of the above, embodiments of the present application provide a hyperparameter optimization method, a hyperparameter optimization apparatus, and a computing device, which address the technical problem in the prior art that manually adjusting the hyperparameters of a data processing model is inefficient.

In a first aspect, an embodiment of the present application provides a hyperparameter optimization method, comprising: determining a first hyperparameter instance currently corresponding to a hyperparameter of a data processing system; determining first effect evaluation information corresponding to the first hyperparameter instance based on a first training result of the first hyperparameter instance in the data processing system; acquiring historical effect evaluation information respectively corresponding to a plurality of historical hyperparameter instances; and selecting, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a target hyperparameter instance with the highest effect evaluation according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances.

In a second aspect, an embodiment of the present application provides a hyperparameter optimization apparatus, comprising: an instance determining module, configured to determine a first hyperparameter instance currently corresponding to a hyperparameter of a data processing system; an effect evaluation module, configured to determine first effect evaluation information corresponding to the first hyperparameter instance based on a first training result of the first hyperparameter instance in the data processing system; a history acquisition module, configured to acquire a plurality of historical hyperparameter instances and historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances; and an instance selection module, configured to select, from the first hyperparameter instance and the plurality of historical hyperparameter instances, a target hyperparameter instance with the highest effect evaluation according to the first effect evaluation information corresponding to the first hyperparameter instance and the historical effect evaluation information respectively corresponding to the plurality of historical hyperparameter instances.

In a third aspect, an embodiment of the present application provides a computing device, comprising a storage component and a processing component, wherein the storage component stores one or more computer instructions, the one or more computer instructions are invoked by the processing component, and the processing component is configured to perform: determining a first hyperparameter instance currently corresponding to a hyperparameter of a data processing system, determining first effect evaluation information corresponding to the first hyperparameter instance based on a first training result of the first hyperparameter instance in the data processing system, obtaining a plurality of historical super-p