CN-121998024-A - Model generalization capability improvement method, device, equipment and readable storage medium
Abstract
A method, apparatus, device and readable storage medium for improving the generalization capability of a model. By collecting user preference feedback, model optimization can directly draw on the local knowledge and intent of users in the field, lowering the data threshold and privacy risk without acquiring sensitive raw scene data or large volumes of standard labels. A reward model is constructed from all preference feedback, distilling scattered, subjective user preferences into a stable, computable scoring function that provides a quantitative criterion for automatic optimization. During reinforcement learning, the policy model continuously receives feedback from the reward model (which represents the user preference) and gradually adjusts its internal parameters to generate output more in line with user expectations. The resulting optimized artificial intelligence model shows markedly improved task compliance and scene adaptation in the specific application environment, i.e., the model generalization capability is improved.
Inventors
- Deng Chen
- Shen Li
- Yu Wenhai
- Xu Ruiwan
- Xu Anran
- Chen Haotian
Assignees
- 烽火通信科技股份有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260126
Claims (10)
- 1. A model generalization capability improvement method, characterized by comprising the following steps: acquiring a user's preference feedback on prediction results output by a basic artificial intelligence model based on an input, the preference feedback indicating which output the user considers superior as the desired model output for that input, where an index takes values from 1 to n and n is a preset value; constructing a reward model according to all preference feedback, the reward model being used for scoring the output result of a policy model; and taking the basic artificial intelligence model as an initialization model of the policy model and, in combination with the reward model, training the policy model with a reinforcement learning algorithm to obtain an optimized artificial intelligence model.
- 2. The model generalization capability improvement method of claim 1, wherein constructing a reward model according to all preference feedback comprises: obtaining training data based on the preference feedback, wherein each training sample comprises the input, the preferred output and the non-preferred output; and training the reward model using all the training data, wherein the training goal of the reward model is, for each training sample, to make its score for the preferred output higher than its score for the non-preferred output.
- 3. The model generalization capability improvement method of claim 2, wherein the loss function employed to train the reward model is a cross-entropy loss function.
- 4. The model generalization capability improvement method of claim 1, wherein the reinforcement learning algorithm is a proximal policy optimization algorithm.
- 5. The model generalization capability improvement method of claim 1, wherein the basic artificial intelligence model is deployed in a target application scenario, the target application scenario being a network planning, network optimization or network operation and maintenance scenario in an optical communication network.
- 6. The model generalization capability improvement method of claim 1, wherein before acquiring the user's preference feedback on the prediction results output by the basic artificial intelligence model based on the input, the method further comprises: pre-training an initial model to obtain a pre-trained model; and performing supervised fine-tuning on the pre-trained model using labeled data to obtain the basic artificial intelligence model.
- 7. The model generalization capability improvement method of claim 1, further comprising, after obtaining the optimized artificial intelligence model: iteratively updating the reward model based on newly acquired user preference feedback; in combination with the updated reward model, iteratively training the optimized artificial intelligence model with a reinforcement learning algorithm; evaluating the iteratively trained model on an independent test set to verify its generalization capability; and deploying the evaluated model to the actual application scenario, continuously monitoring its performance, and acquiring new user feedback for subsequent optimization.
- 8. A model generalization capability improvement apparatus, characterized in that the apparatus comprises: an acquisition module for acquiring a user's preference feedback on prediction results output by a basic artificial intelligence model based on an input, the preference feedback indicating which output the user considers superior as the desired model output for that input, where an index takes values from 1 to n and n is a preset value; a construction module for constructing a reward model according to all preference feedback, the reward model being used for scoring the output result of a policy model; and an optimization module for taking the basic artificial intelligence model as an initialization model of the policy model and, in combination with the reward model, training the policy model with a reinforcement learning algorithm to obtain an optimized artificial intelligence model.
- 9. A model generalization capability improvement device, comprising a processor, a memory, and a model generalization capability improvement program stored on the memory and executable by the processor, wherein the model generalization capability improvement program, when executed by the processor, implements the steps of the model generalization capability improvement method of any one of claims 1 to 7.
- 10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a model generalization capability improvement program, wherein the model generalization capability improvement program, when executed by a processor, implements the steps of the model generalization capability improvement method of any one of claims 1 to 7.
Description
Model generalization capability improvement method, device, equipment and readable storage medium

Technical Field

The application relates to the technical field of artificial intelligence, and in particular to a method, an apparatus and a device for improving the generalization capability of a model, and a computer-readable storage medium.

Background

With the deep application of artificial intelligence technology in complex fields such as optical communication network planning, optimization, and operation and maintenance, pre-trained general-purpose models face serious generalization challenges. Although these general-purpose models are typically trained on broadly covering benchmark datasets and perform well on generic tasks, their performance tends to degrade significantly when they are deployed directly to a specific live network or customer scenario whose environment differs markedly from the training conditions. This is because the data distribution, business objectives and constraints of the target application scenario differ from those of the training data, which reduces model performance. In the prior art, this problem is generally addressed by retraining or fine-tuning the model on target scene data. However, this approach has significant drawbacks. First, it requires collecting large amounts of annotated data for the target scene, which is costly and raises data privacy and security concerns. Second, the process usually requires manual intervention by algorithm specialists and cannot be carried out independently and quickly by field users (e.g., network engineers), making it difficult to support large-scale, personalized model deployment. Finally, traditional fine-tuning focuses on fitting the model to limited scene data, which may compromise the model's inherent generalization ability to unseen situations within the scene and lead to overfitting.

Disclosure of Invention

The application provides a method, an apparatus, a device and a computer-readable storage medium for improving the generalization capability of a model, which can solve the technical problem in the prior art that a general-purpose model has insufficient generalization capability in optical communication network application scenarios. In a first aspect, an embodiment of the application provides a model generalization capability improvement method, comprising: acquiring a user's preference feedback on prediction results output by a basic artificial intelligence model based on an input, the preference feedback indicating which output the user considers superior as the desired model output for that input, where an index takes values from 1 to n and n is a preset value; constructing a reward model according to all preference feedback, the reward model being used for scoring the output result of a policy model; and taking the basic artificial intelligence model as an initialization model of the policy model and, in combination with the reward model, training the policy model with a reinforcement learning algorithm to obtain an optimized artificial intelligence model. With reference to the first aspect, in one implementation, constructing the reward model according to all preference feedback comprises: obtaining training data based on the preference feedback, wherein each training sample comprises the input, the preferred output and the non-preferred output; and training the reward model using all the training data, wherein the training goal of the reward model is, for each training sample, to make its score for the preferred output higher than its score for the non-preferred output.
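For concreteness, the following is a minimal sketch of the pairwise reward-model training objective described above (score the user-preferred output higher than the non-preferred one, trained with a cross-entropy style loss). It is illustrative only: the name `reward_model`, its (input, output) scoring interface and the tensor shapes are assumptions made for this sketch, not the patent's implementation.

```python
import torch.nn.functional as F

def pairwise_reward_loss(reward_model, x, y_preferred, y_rejected):
    """Bradley-Terry style cross-entropy loss over a preference pair.

    Pushes the reward model's score for the user-preferred output above its
    score for the rejected output on the same input x.
    """
    score_w = reward_model(x, y_preferred)   # scalar score for the preferred output
    score_l = reward_model(x, y_rejected)    # scalar score for the rejected output
    # -log sigmoid(score_w - score_l) is minimized when score_w >> score_l,
    # i.e. exactly the "preferred output scores higher" training goal.
    return -F.logsigmoid(score_w - score_l).mean()
```

In practice this loss would be averaged over all collected preference samples, so that the fitted scoring function summarizes the users' scattered, subjective judgments.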
With reference to the first aspect, in one implementation, the loss function employed to train the reward model is a cross-entropy loss function. With reference to the first aspect, in one implementation, the reinforcement learning algorithm is a proximal policy optimization algorithm. With reference to the first aspect, in one implementation, the basic artificial intelligence model is deployed in a target application scenario, where the target application scenario is a network planning, network optimization or network operation and maintenance scenario in an optical communication network. With reference to the first aspect, in one implementation, before acquiring the user's preference feedback on the prediction results output by the basic artificial intelligence model based on the input, the method further comprises: pre-training an initial model to obtain a pre-trained model; and performing supervised fine-tuning on the pre-trained model using labeled data to obtain the basic artificial intelligence model. With reference to the first aspect, in one implementation, after obtaining the optimized artificial intelligence model, the method further comprises: iteratively updating the reward model based on newly acquired user preference feedback; in combination with the updated reward model, iteratively training the optimized artificial intelligence model with a reinforcement learning algorithm; and evaluating the iteratively trained model on an independent test set to verify its generalization capability.
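As a further illustration, below is a minimal sketch of one clipped proximal policy optimization (PPO) update with the reward model in the loop. It is a hypothetical example: `policy.log_prob`, the pre-sampled `old_logprobs`, and the simple mean baseline are assumptions made for brevity, not the patent's implementation, which would typically also include a KL regularizer against the initial model and a learned value function.

```python
import torch

def ppo_policy_update(policy, optimizer, inputs, outputs, old_logprobs, rewards,
                      clip_eps=0.2):
    """One clipped-surrogate (PPO-style) update of the policy model.

    `rewards` are the reward model's scores for the sampled outputs, so the
    policy is nudged toward outputs the users prefer.
    """
    new_logprobs = policy.log_prob(inputs, outputs)          # log pi_theta(output | input)
    ratio = torch.exp(new_logprobs - old_logprobs)           # importance-sampling ratio
    advantages = rewards - rewards.mean()                    # crude mean baseline
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    loss = -torch.min(unclipped, clipped).mean()             # PPO clipped objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Repeating this update with freshly sampled outputs, scored by the (periodically re-trained) reward model, corresponds to the iterative optimization loop described above.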