CN-121981783-A - Instant advertisement strategy generation method based on transient gradient injection and entropy guided search
Abstract
The application relates to the technical field of advertisement delivery, and in particular to an instant advertisement strategy generation method based on transient gradient injection and entropy-guided search. The method comprises: performing manifold space mapping on the user-side features, context features and candidate advertisement features in an advertisement request to obtain an initial state vector representing the current advertisement request state; loading a generative world model, introducing a policy network, initializing low-rank adapter parameters in the policy network, and constructing a shadow simulation field for the current advertisement request; executing an extremum-oriented Monte Carlo tree search in the shadow simulation field to generate a trajectory set corresponding to the current advertisement request; constructing an entropy-guided objective function and executing a transient gradient update on the low-rank adapter parameters to obtain an updated policy network; performing final forward inference on the initial state vector with the updated policy network to generate an advertisement strategy; and resetting the low-rank adapter parameters after the strategy is executed. The method and device can make advertisement strategy generation more targeted to the individual request.
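The per-request "update, infer, reset" cycle described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the application: the toy linear softmax policy, the sizes `d`, `n_actions` and `rank`, the function names, and the simplified objective (a weighted log-probability of one favoured action, standing in for the entropy-guided objective and the search that would supply it). The base weights `W` stay frozen; only the low-rank matrices `A` and `B` change, and they are discarded after the request.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_actions, rank = 8, 4, 2               # toy sizes, chosen only for illustration

W = rng.normal(size=(n_actions, d))        # frozen base weights of a toy linear policy

def init_adapter():
    # B starts at zero, so a freshly initialized adapter leaves the base policy unchanged.
    return {"A": rng.normal(scale=0.1, size=(rank, d)),
            "B": np.zeros((n_actions, rank))}

def policy(s, adapter):
    # Softmax over the logits of the base layer plus the low-rank bypass B @ A.
    logits = (W + adapter["B"] @ adapter["A"]) @ s
    z = np.exp(logits - logits.max())
    return z / z.sum()

def transient_step(s, action, weight, adapter, lr=0.1):
    # One gradient-ascent step on weight * log pi(action | s); only A and B change.
    p = policy(s, adapter)
    g = -p
    g[action] += 1.0                       # d log pi(a|s) / d logits = onehot(a) - p
    g *= weight
    grad_B = np.outer(g, adapter["A"] @ s)
    grad_A = np.outer(adapter["B"].T @ g, s)
    return {"A": adapter["A"] + lr * grad_A, "B": adapter["B"] + lr * grad_B}

s0 = rng.normal(size=d)                    # stand-in for the initial state vector
initial = init_adapter()
base_probs = policy(s0, initial)
target = int(np.argmin(base_probs))        # pretend the search favoured this action

adapter = initial
for _ in range(3):                         # a few transient update steps
    adapter = transient_step(s0, target, 1.0, adapter)
adapted_probs = policy(s0, adapter)        # final forward inference for this request

adapter = initial                          # reset: discard the per-request update
reset_probs = policy(s0, adapter)
```

Because `transient_step` returns new arrays instead of mutating its input, restoring the initial adapter is enough to return the policy to its pre-request behaviour.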
Inventors
- SONG JIAJUN
- WU RUIMING
- GONG SHIZHU
Assignees
- 苏州品物智能科技有限责任公司
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-04-02
Claims (10)
- 1. An instant advertisement strategy generation method based on transient gradient injection and entropy-guided search, characterized by comprising the following steps: acquiring an advertisement request, and performing manifold space mapping on the user-side features, context features and candidate advertisement features in the advertisement request to obtain an initial state vector representing the current advertisement request state; loading a generative world model, introducing a policy network, initializing low-rank adapter parameters in the policy network, and constructing a shadow simulation field for the current advertisement request; performing an extremum-oriented Monte Carlo tree search in the shadow simulation field based on the initial state vector to generate a trajectory set corresponding to the current advertisement request; constructing an entropy-guided objective function based on the trajectory set, and executing a transient gradient update on the low-rank adapter parameters to obtain an updated policy network for the current advertisement request; and performing final forward inference on the initial state vector with the updated policy network to generate an advertisement strategy, and resetting the low-rank adapter parameters after the advertisement strategy is executed.
- 2. The instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to claim 1, wherein performing manifold space mapping on the user-side features, context features and candidate advertisement features in the advertisement request to obtain the initial state vector representing the current advertisement request state comprises: extracting the user-side features $x_u$, the context features $x_c$ and the candidate advertisement features $x_a$ from the advertisement request, and jointly representing the extracted features; providing a pre-trained feature compression network and performing manifold space mapping on the jointly represented features, wherein the feature compression network adopts a multi-layer perceptron structure and maps an input high-dimensional sparse feature vector to a low-dimensional dense vector representation; the mapping function of the feature compression network is expressed as $s_0 = f_{\psi}(X)$, where $f_{\psi}$ denotes the feature compression network; $\psi$ denotes the network parameters of the feature compression network; $X$ denotes the input feature set formed by the user-side features, context features and candidate advertisement features; $s_0$ denotes the initial state vector of the current advertisement request in the low-dimensional manifold space; and $s_0 \in \mathbb{R}^{d}$, where $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^{d}$ denotes the $d$-dimensional real vector space, and $d$ denotes the vector dimension of the low-dimensional manifold space.
- 3. The instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to claim 1, wherein loading the generative world model, introducing the policy network, initializing the low-rank adapter parameters in the policy network, and constructing the shadow simulation field for the current advertisement request comprises: loading a generative world model $M$, whose input is the current state vector $s_t$ and the action to be executed $a_t$, and whose output is the predicted state at the next time step $\hat{s}_{t+1}$ and the predicted reward $\hat{r}_t$, expressed as $(\hat{s}_{t+1}, \hat{r}_t) = M(s_t, a_t)$; introducing a policy network $\pi_{\theta}$ with main parameters $\theta$ and initializing transient adaptation parameters in the policy network, wherein a low-rank adapter is mounted as a bypass on the key linear layers of the policy network, and the parameters of the low-rank adapter, denoted $\phi$, serve as transient adaptation parameters that can be updated locally while the current advertisement request is processed and are reset after the request ends; the low-rank adapter comprises two matrices $A$ and $B$, where $A \in \mathbb{R}^{d \times r}$ and $B \in \mathbb{R}^{r \times d}$; $\mathbb{R}$ denotes the set of real numbers, $\mathbb{R}^{d \times r}$ denotes the space of matrices consisting of $d \times r$ real numbers, and $\mathbb{R}^{r \times d}$ denotes the space of matrices consisting of $r \times d$ real numbers; $d$ denotes the dimension of the state vector and $r$ denotes the rank of the low-rank adapter; the policy output after introducing the low-rank adapter is expressed as $\pi_{\theta,\phi}(s) = \pi_{\theta}(s) + \Delta_{\phi}(s)$, where $\pi_{\theta}(s)$ denotes the base policy output generated by the main parameters $\theta$ alone, $s$ denotes the input state vector, and $\Delta_{\phi}(s)$ denotes the local correction applied by the low-rank adapter to the base policy output; the loaded generative world model $M$ and the initialized transient adaptation parameters $\phi$, together with the policy network, form the shadow simulation field.
- 4. The instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to claim 3, wherein performing the extremum-oriented Monte Carlo tree search in the shadow simulation field based on the initial state vector to generate the trajectory set corresponding to the current advertisement request comprises: taking the initial state vector $s_0$ as the root node of the search tree and executing $N_{\mathrm{sim}}$ simulation cycles, wherein each simulation cycle starts from the current state vector $s_t$, determines an action $a_t$ from the action distribution that the policy network generates for the state vector $s_t$, and inputs the state vector $s_t$ and the action $a_t$ into the generative world model $M$ to obtain the corresponding predicted state vector $\hat{s}_{t+1}$ and predicted reward $\hat{r}_t$; the predicted state vector $\hat{s}_{t+1}$ is taken as the current state of the next rollout step, and action selection and state rollout continue until a preset maximum rollout depth $D$ is reached; for any node $n$, the child-node action $a^{*}$ is selected as $a^{*} = \arg\max_{a} U(n, a)$, where $\arg\max_{a}$ denotes selecting, among all candidate actions, the action that maximizes the node selection function $U(n, a)$, and the node selection function is formed from the following quantities: $Q_{\max}(n, a)$, the maximum reward value observed after selecting action $a$ at node $n$ in the historical simulations; $c$, the search coefficient; $P(a \mid s_n)$, the prior probability of action $a$ output by the policy network in the state corresponding to node $n$; $N(n, a)$, the number of visits to action $a$ under node $n$; and $\sum_{b} N(n, b)$, the total number of visits over all candidate actions under node $n$, where $b$ is the candidate action index under node $n$; after the $N_{\mathrm{sim}}$ simulation cycles are completed, the state, action and predicted reward sequences formed in each simulation cycle are collected to form the trajectory set.
- 5. The instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to claim 4, wherein constructing the entropy-guided objective function based on the trajectory set comprises: constructing, based on the trajectory set $\mathcal{T}$, an entropy-guided objective function $J(\phi)$ formed from the following quantities: $\phi$, the parameters to be updated, namely the low-rank adapter parameters; $\tau$, a trajectory sampled from the trajectory set $\mathcal{T}$; $\mathbb{E}_{\tau \sim \mathcal{T}}[\cdot]$, the expectation over samples from the trajectory set $\mathcal{T}$; $w(\tau)$, the weight corresponding to trajectory $\tau$; $\log \pi_{\phi}(a_t \mid s_t)$, the log-probability that the policy network with parameters $\phi$ selects action $a_t$ in state $s_t$; $\beta$, the entropy adjustment coefficient; and $H(\pi_{\phi})$, the information entropy of the action distribution output by the policy network; the trajectory weight $w(\tau)$ is determined by an exponential weighting function formed from: $\exp(\cdot)$, the exponential function; $R(\tau)$, the cumulative reward of trajectory $\tau$; $\bar{R}$, the mean of the cumulative rewards of all trajectories in the trajectory set $\mathcal{T}$; and $\kappa$, the reward scale factor; the cumulative reward $R(\tau)$ is obtained by accumulating the predicted rewards of each step in the trajectory, namely $R(\tau) = \sum_{t=1}^{T} \hat{r}_t$, where $\hat{r}_t$ denotes the predicted reward at step $t$ of the trajectory, $T$ denotes the number of rollout steps in the trajectory, and $D$ denotes the maximum rollout depth of a single trajectory.
- 6. The instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to claim 5, wherein executing the transient gradient update on the low-rank adapter parameters to obtain the policy network for the current advertisement request comprises: after the entropy-guided objective function $J(\phi)$ is constructed, computing the gradient $g$ of the objective function with respect to the low-rank adapter parameters $\phi$, expressed as $g = \nabla_{\phi} J(\phi)$, where $\nabla_{\phi}$ denotes taking partial derivatives with respect to the parameters $\phi$; after the gradient $g$ is obtained, using a stochastic gradient descent optimizer or an Adam optimizer to execute a $K$-step update of the low-rank adapter parameters $\phi$, the updated low-rank adapter parameters being denoted $\phi'$; the update process computes $\phi'$ from $\phi$, the gradient $g$ and $\eta$, where $\phi$ denotes the low-rank adapter parameters before the update and $\eta$ denotes the learning rate.
- 7. The method of claim 1, wherein performing final forward inference on the initial state vector with the updated policy network to generate the advertisement strategy, and resetting the low-rank adapter parameters after the advertisement strategy is executed, comprises: after receiving the initial state vector, the updated policy network outputs a probability distribution over the candidate actions in the current state; based on the probability distribution, the action with the highest probability among all candidate actions is selected as the target action, and the action content corresponding to the target action is parsed into an advertisement strategy, yielding the final advertisement strategy for the current advertisement request; after the final advertisement strategy is obtained, it is sent to an advertisement transaction system for execution, completing the strategy generation and delivery decision for the advertisement request; and after the advertisement strategy is executed, the low-rank adapter parameters are reset so that they are restored to their initial state.
- 8. An instant advertisement strategy generation system based on transient gradient injection and entropy-guided search, comprising: a request acquisition module, configured to acquire an advertisement request and perform manifold space mapping on the user-side features, context features and candidate advertisement features in the advertisement request to obtain an initial state vector representing the current advertisement request state; a simulation construction module, configured to load a generative world model, introduce a policy network, initialize low-rank adapter parameters in the policy network, and construct a shadow simulation field for the current advertisement request; a trajectory generation module, configured to perform an extremum-oriented Monte Carlo tree search in the shadow simulation field based on the initial state vector to generate a trajectory set corresponding to the current advertisement request; a network updating module, configured to construct an entropy-guided objective function based on the trajectory set and execute a transient gradient update on the low-rank adapter parameters to obtain an updated policy network for the current advertisement request; and a strategy generation module, configured to perform final forward inference on the initial state vector with the updated policy network to generate an advertisement strategy, and to reset the low-rank adapter parameters after the advertisement strategy is executed.
- 9. An electronic device, characterized in that the device comprises a processor and a memory, wherein the memory stores a program, and the program is loaded and executed by the processor to implement the instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to any one of claims 1 to 7.
- 10. A computer-readable storage medium, wherein a program is stored in the storage medium, and the program, when executed by a processor, implements the instant advertisement strategy generation method based on transient gradient injection and entropy-guided search according to any one of claims 1 to 7.
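Claims 4 and 5 name the quantities that enter the node selection rule and the trajectory weights (maximum observed reward, search coefficient, prior probability, visit counts, cumulative reward, mean reward, reward scale factor), but the exact expressions are not reproduced in this text. The sketch below therefore uses assumed forms, chosen only because they are common in the tree-search literature: a PUCT-style score in which the usual mean value is replaced by the maximum observed reward, and a weight of the form exp((R - mean R) / scale). The function names and default values are hypothetical.

```python
import math

def select_action(q_max, prior, visits, c=1.5):
    # Assumed PUCT-style score: max observed reward plus a prior-weighted
    # exploration bonus that shrinks as an action's visit count grows.
    total = sum(visits)
    scores = [q + c * p * math.sqrt(total) / (1 + n)
              for q, p, n in zip(q_max, prior, visits)]
    return max(range(len(scores)), key=scores.__getitem__)

def cumulative_reward(step_rewards):
    # Cumulative reward of one trajectory: plain sum of its per-step predicted rewards.
    return sum(step_rewards)

def trajectory_weights(returns, scale=1.0):
    # Assumed exponential weighting: trajectories above the mean return get weight > 1.
    mean_r = sum(returns) / len(returns)
    return [math.exp((r - mean_r) / scale) for r in returns]

# Made-up statistics for one node with three candidate actions.
best = select_action(q_max=[0.2, 0.9, 0.4], prior=[0.5, 0.3, 0.2], visits=[10, 3, 0])

# Made-up per-step predicted rewards for three simulated trajectories.
returns = [cumulative_reward(r) for r in ([0.5, 0.5], [1.0, 1.0], [2.0, 1.0])]
weights = trajectory_weights(returns)
```

With these numbers the never-visited third action wins the selection, because its exploration bonus outweighs the higher observed reward of the second action; a smaller `c` would shift the choice toward the action with the best observed reward.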
Description
Instant advertisement strategy generation method based on transient gradient injection and entropy-guided search
Technical Field
The application relates to the technical field of advertisement delivery, and in particular to an instant advertisement strategy generation method based on transient gradient injection and entropy-guided search.
Background
In internet advertising, the platform typically needs to generate an advertisement strategy for the current request based on user characteristics, the access scenario, candidate advertisement materials and business constraint information, in order to determine specific decisions such as advertisement presentation, bid control or material selection. As advertisement scenarios become more fine-grained and user behavior more complex, advertisement strategy generation must not only meet the latency requirements of online response but also fit the user state and context of the current request as closely as possible, so as to improve the actual effect of advertisement delivery.
Existing advertisement strategy generation generally combines offline training with online inference. Specifically, the system first builds training data from historical advertisement interaction logs and trains a click-through-rate estimation model, a conversion-rate estimation model or a joint decision model offline. In the online stage the model parameters are kept fixed: after a new advertisement request is received, the user-side features, context features and candidate advertisement features are extracted and fed to the trained model for forward computation, which produces the corresponding advertisement strategy result.
Because the historical training data in practice is concentrated on head active users, who interact frequently and have relatively rich behavioral features, the offline-trained model mainly fits the existing historical distribution, and the online stage generates the real-time strategy using only the capability formed during offline training. On closer analysis, this approach has a limitation. Actual online requests also include long-tail users, who appear less often than head users and have rare feature combinations. The requests of such users, for example minority-language users, users with atypical browsing paths, or potentially high-value users, tend to fall outside the distribution of the historical training data. In this case, because the offline-trained model was fitted to the existing distribution and its parameters remain unchanged in the online stage, it tends to output relatively conservative, averaged predictions. The strategy generation result therefore struggles to reflect the distinctive features that may be present in the current request, underestimates potential high-value conversions, and misses decision opportunities with higher returns. How to make advertisement strategy generation more targeted for long-tail user scenarios that deviate from the main distribution of the historical training data, within the processing of a single advertisement request, has therefore become a problem to be solved in this field.
Disclosure of Invention
The application provides an instant advertisement strategy generation method based on transient gradient injection and entropy-guided search, which can make advertisement strategy generation more targeted for long-tail user scenarios that deviate from the main distribution of the historical training data, within the processing of a single advertisement request. The application provides the following technical solutions.
In a first aspect, the application provides an instant advertisement strategy generation method based on transient gradient injection and entropy-guided search, the method comprising: acquiring an advertisement request, and performing manifold space mapping on the user-side features, context features and candidate advertisement features in the advertisement request to obtain an initial state vector representing the current advertisement request state; loading a generative world model, introducing a policy network, initializing low-rank adapter parameters in the policy network, and constructing a shadow simulation field for the current advertisement request; performing an extremum-oriented Monte Carlo tree search in the shadow simulation field based on the initial state vector to generate a trajectory set correspondin