CN-121981776-A - CTR (click-through rate) estimation method and device based on large model, electronic equipment and storage medium
Abstract
The invention discloses a large-model-based CTR estimation method, device, electronic equipment and storage medium, relating to the technical fields of computer technology and large models. The method comprises: obtaining original data of an object to be estimated; processing the original data as a table sequence, generating a corresponding feature vector for each column of data in the table sequence, and generating a mask mark for each missing data position; performing feature encoding on the generated feature vectors to obtain encoded feature embeddings; combining the encoded feature embeddings and the mask marks into an input sequence, and inputting the input sequence into a preset large model structure; sequentially processing the input sequence through four Transformer processing layers of the large model structure to obtain a target feature vector; and performing CTR estimation based on the target feature vector. The scheme improves the accuracy of CTR estimation.
Inventors
- YUAN CHANG
Assignees
- 深圳新度博望科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251202
Claims (10)
- 1. A large-model-based CTR estimation method, characterized by comprising the following steps: obtaining original data of an object to be estimated, wherein the original data comprises user data, context data, advertisement data and title data; processing the original data as a table sequence, generating a corresponding feature vector for each column of data in the table sequence, and generating a mask mark for each missing data position; performing feature encoding on the generated feature vectors to obtain encoded feature embeddings; combining the encoded feature embeddings and the mask marks into an input sequence, and inputting the input sequence into a preset large model structure, wherein the large model structure comprises four Transformer processing layers connected in sequence; sequentially processing the input sequence through the four Transformer processing layers to perform intra-domain missing-value completion, cross-domain feature crossing, global refinement correction and global residual refinement, so as to obtain a target feature vector; and performing CTR estimation based on the target feature vector.
- 2. The method of claim 1, wherein generating a corresponding feature vector for each column of data in the table sequence and generating a mask mark for each missing data position comprises: for column data of numerical type, performing bucketing (binning) and generating a numerical feature vector through a linear layer mapping; for column data of categorical type, generating a category embedding vector by table lookup; and for missing column data, generating a corresponding mask mark with a value of zero or one to indicate the missing state of the data.
- 3. The method of claim 1, wherein performing feature encoding on the generated feature vectors comprises: processing the feature vector corresponding to each data field in the user data, the context data, the advertisement data and the title data separately and independently through a fully connected layer and an activation function to adjust feature weights, so as to obtain processed feature vectors; constraining the processed feature vectors with a center loss function so that feature vectors belonging to the same user or the same commodity gather toward their respective centers; and updating the parameters of the fully connected layer and the weight parameters of the feature encoding process through the gradient back-propagation path of the CTR estimation task.
- 4. The method of claim 1, wherein sequentially processing the input sequence through the four Transformer processing layers to perform intra-domain missing-value completion, cross-domain feature crossing, global refinement correction and global residual refinement, so as to obtain a target feature vector, comprises: at a first processing layer, receiving the input sequence, and performing a first completion of the missing fields within each domain according to the existing observed fields by using a domain self-aggregation mechanism, to obtain a first output sequence; at a second processing layer, receiving the first output sequence, establishing interaction relationships among the feature representations corresponding to the user data, the context data, the advertisement data and the title data contained in the first output sequence by using an attention mechanism, and extracting cross-domain crossing signals, to obtain a second output sequence; at a third processing layer, receiving the second output sequence, and performing a secondary correction of the remaining missing content by using the completed fields in the second output sequence, to obtain a third output sequence; and at a fourth processing layer, receiving the third output sequence, extracting a final high-order feature representation from the third output sequence through a global residual connection, and taking the final high-order feature representation as the target feature vector.
- 5. The method of claim 4, wherein the second processing layer extracts high-level interaction information of users, contexts and advertisements through a causal attention mechanism when extracting cross-domain crossing signals.
- 6. The method of claim 4, further comprising multi-task information generation, comprising: inputting the target feature vector output by the fourth processing layer to a preset price prediction module to generate estimated bid information, wherein the estimated bid information is used for predicting the price of the advertisement in real-time bidding; and inputting the target feature vector output by the fourth processing layer to a preset tag classification module to generate classification tag information, wherein the classification tag information is used for determining the media attributes, user tags or advertisement categories corresponding to the original data.
- 7. The method of claim 4, wherein the third processing layer is further configured to generate a match score and a correlation score based on the second output sequence, the generated match score being used to characterize a degree of match between the user data and the advertisement data, the correlation score being used to characterize a degree of semantic correlation between different data domain features.
- 8. A large-model-based CTR estimation apparatus, comprising: an acquisition module, configured to obtain original data of an object to be estimated, wherein the original data comprises user data, context data, advertisement data and title data; a first processing module, configured to process the original data as a table sequence, generate a corresponding feature vector for each column of data in the table sequence, and generate a mask mark for each missing data position; an encoding module, configured to perform feature encoding on the generated feature vectors to obtain encoded feature embeddings; a second processing module, configured to combine the encoded feature embeddings and the mask marks into an input sequence, and input the input sequence into a preset large model structure, wherein the large model structure comprises four Transformer processing layers connected in sequence; a third processing module, configured to sequentially process the input sequence through the four Transformer processing layers to perform intra-domain missing-value completion, cross-domain feature crossing, global refinement correction and global residual extraction, so as to obtain a target feature vector; and an estimation module, configured to perform CTR estimation based on the target feature vector.
- 9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the large model-based CTR estimation method according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements the large model-based CTR estimation method according to any one of claims 1 to 7.
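The four processing stages recited in claims 1 and 4 can be illustrated with a toy sketch. This is not the patented model: the claims give no formulas, so everything here is an assumption made for illustration, including the function names (`attend`, `four_stage`, `ctr`), the use of plain dot-product attention with no learned weights, the choice to represent a missing field as an all-zero token, and mean pooling before a logistic output. The causal attention of claim 5 is not modeled.

```python
import math

def attend(tokens, allowed):
    """Dot-product self-attention with q = k = v = tokens (no learned weights).
    allowed(i, j) decides whether token i may look at token j."""
    dim = len(tokens[0])
    out = []
    for i, query in enumerate(tokens):
        keys = [j for j in range(len(tokens)) if allowed(i, j)]
        if not keys:  # nothing visible: leave the token unchanged
            out.append(list(query))
            continue
        scores = [sum(a * b for a, b in zip(query, tokens[j])) / math.sqrt(dim)
                  for j in keys]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * tokens[j][c] for w, j in zip(weights, keys)) / total
                    for c in range(dim)])
    return out

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def four_stage(tokens, domain, missing):
    n, dim = len(tokens), len(tokens[0])
    # Stage 1 (assumed): fill each missing token from observed tokens of its own domain.
    h1 = attend(tokens, lambda i, j: domain[i] == domain[j] and not missing[j])
    s1 = [h1[i] if missing[i] else tokens[i] for i in range(n)]
    # Stage 2 (assumed): unrestricted attention across domains, with a residual connection.
    h2 = attend(s1, lambda i, j: True)
    s2 = [add(a, b) for a, b in zip(s1, h2)]
    # Stage 3 (assumed): second-pass correction of the originally missing tokens only.
    h3 = attend(s2, lambda i, j: i != j)
    s3 = [h3[i] if missing[i] else s2[i] for i in range(n)]
    # Stage 4 (assumed): global residual from the input, then mean pooling.
    s4 = [add(a, b) for a, b in zip(s3, tokens)]
    target = [sum(t[c] for t in s4) / n for c in range(dim)]
    return s1, target

def ctr(target, weight, bias):
    """Logistic head on the pooled target vector (hypothetical weights)."""
    z = sum(w * x for w, x in zip(weight, target)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Three user fields (one missing) and one advertisement field, 2-dimensional tokens.
tokens = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [1.0, 1.0]]
domain = ["user", "user", "user", "ad"]
missing = [False, False, True, False]
first_output, target = four_stage(tokens, domain, missing)
probability = ctr(target, weight=[0.3, -0.2], bias=0.0)
```

Because the missing token is all zeros, its attention scores in stage 1 are equal, so it is filled with the average of the observed user tokens. A missing field whose domain has no observed fields is left untouched by stage 1 and only receives a value in stage 3, which loosely mirrors the "secondary correction" of claim 4.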
Description
CTR (click-through rate) estimation method and device based on large model, electronic equipment and storage medium
Technical Field
The present invention relates to the fields of computer technology and large model technology, and in particular to a large-model-based CTR estimation method, device, electronic equipment and storage medium.
Background
CTR (Click-Through Rate) estimation is a core technology affecting the advertisement Return On Investment (ROI) in an online advertisement delivery system, and its accuracy directly determines the effectiveness of the advertisement bidding strategy and the platform profit. The prior art generally adopts deep-learning-based CTR estimation models (such as Wide & Deep and DeepFM), which rely on massive positive sample data and complete structured feature input to converge stably. However, a new DSP (Demand-Side Platform), a core technology platform in the field of advertisement delivery, faces the dual challenges of extremely sparse positive samples and a high missing rate of user data (such as age and gender). Under these conditions the CTR estimation model tends to depend more heavily on sparse ID features. The prior art generally handles missing values with simple strategies such as zero, mean or mode filling, which cannot effectively distinguish a true zero value from missing information, and which introduce significant bias because the filled distribution is inconsistent with the true distribution, making the CTR estimation inaccurate.
Disclosure of Invention
The embodiments of the invention provide a large-model-based CTR estimation method, device, electronic equipment and storage medium, which are used for solving the problem of inaccurate CTR estimation caused by data sparsity and missing features.
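The Background's objection to zero and mean filling can be made concrete with a small numeric illustration. The column below (a hypothetical "clicks in the past 7 days" feature, with `None` standing for a missing value) and all variable names are invented for this sketch.

```python
from statistics import mean, pvariance

# Hypothetical column: 0 is a genuine observation, None is a missing value.
clicks = [0, None, 3, None, 5]

observed = [v for v in clicks if v is not None]

# Zero filling: the true zero and the two missing values become indistinguishable.
zero_filled = [0 if v is None else v for v in clicks]

# Mean filling: the mean is preserved but the spread of the column shrinks.
fill = mean(observed)
mean_filled = [fill if v is None else v for v in clicks]

# An explicit mask keeps the missing state as separate information.
mask = [1 if v is None else 0 for v in clicks]

true_zeros = observed.count(0)
zeros_after_fill = zero_filled.count(0)
variance_before = pvariance(observed)
variance_after = pvariance(mean_filled)
```

After zero filling the column contains three zeros although only one was observed, and after mean filling the variance drops below that of the observed values, which is one way the filled distribution departs from the true one.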
In a first aspect, a large-model-based CTR estimation method is provided, including: obtaining original data of an object to be estimated, wherein the original data comprises user data, context data, advertisement data and title data; processing the original data as a table sequence, generating a corresponding feature vector for each column of data in the table sequence, and generating a mask mark for each missing data position; performing feature encoding on the generated feature vectors to obtain encoded feature embeddings; combining the encoded feature embeddings and the mask marks into an input sequence, and inputting the input sequence into a preset large model structure, wherein the large model structure comprises four Transformer processing layers connected in sequence; sequentially processing the input sequence through the four Transformer processing layers to perform intra-domain missing-value completion, cross-domain feature crossing, global refinement correction and global residual refinement, so as to obtain a target feature vector; and performing CTR estimation based on the target feature vector.
In an embodiment, generating a corresponding feature vector for each column of data in the table sequence and generating a mask mark for each missing data position includes: for column data of numerical type, performing bucketing (binning) and generating a numerical feature vector through a linear layer mapping; for column data of categorical type, generating a category embedding vector by table lookup; and for missing column data, generating a corresponding mask mark with a value of zero or one to indicate the missing state of the data.
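The column-wise feature generation of this embodiment can be sketched as follows. The schema (an `age` column and a `gender` column), the bucket edges, the embedding width, the random initial weights, and the convention that a mask value of 1 means "missing" are all assumptions chosen for illustration; the source only states that the mask is zero or one.

```python
import bisect
import random

EMBED_DIM = 4
_rng = random.Random(0)

def _rand_vec(n):
    return [_rng.uniform(-0.1, 0.1) for _ in range(n)]

# Hypothetical schema: one numerical column and one categorical column.
AGE_EDGES = [18, 25, 35, 50]  # four edges give five buckets
AGE_WEIGHT = [_rand_vec(len(AGE_EDGES) + 1) for _ in range(EMBED_DIM)]  # linear layer
AGE_BIAS = [0.0] * EMBED_DIM
GENDER_TABLE = {g: _rand_vec(EMBED_DIM) for g in ("f", "m")}  # lookup table

def bucketize(value, edges):
    """Index of the bucket that value falls into."""
    return bisect.bisect_right(edges, value)

def numeric_vector(value):
    """Bucketize, one-hot encode, then apply the linear layer."""
    one_hot = [0.0] * (len(AGE_EDGES) + 1)
    one_hot[bucketize(value, AGE_EDGES)] = 1.0
    return [sum(w * x for w, x in zip(row, one_hot)) + b
            for row, b in zip(AGE_WEIGHT, AGE_BIAS)]

def category_vector(value):
    """Embedding by table lookup; an unseen category raises KeyError."""
    return GENDER_TABLE[value]

def encode_row(row):
    """Return one vector per column plus a 0/1 mask (1 = missing, by assumption)."""
    vectors, mask = [], []
    for column, encode in (("age", numeric_vector), ("gender", category_vector)):
        value = row.get(column)
        if value is None:
            vectors.append([0.0] * EMBED_DIM)  # placeholder; the mask carries the signal
            mask.append(1)
        else:
            vectors.append(encode(value))
            mask.append(0)
    return vectors, mask

vectors, mask = encode_row({"age": 30, "gender": None})
```

Testing `value is None` rather than truthiness matters here: a numerical value of 0 is encoded as an ordinary observation and receives mask 0, so a true zero stays distinguishable from a missing field.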
In an embodiment, performing feature encoding on the generated feature vectors includes: processing the feature vector corresponding to each data field in the user data, the context data, the advertisement data and the title data separately and independently through a fully connected layer and an activation function to adjust feature weights, so as to obtain processed feature vectors; constraining the processed feature vectors with a center loss function so that feature vectors belonging to the same user or the same commodity gather toward their respective centers; and updating the parameters of the fully connected layer and the weight parameters of the feature encoding process through the gradient back-propagation path of the CTR estimation task. In an embodiment, sequentially processing the input sequence through the four Transformer processing layers to perform intra-domain missing-value completion, cross-domain feature crossing, global refinement correction and global residual refinement, so as to obtain a target feature vector, includes: at a first processing layer, receiving the input sequence, and performing a first completion of the missing fields within each domain according to the existing observed fields by using a domain self-aggregation mechanism, to obtain a first output sequence; at a second processing layer, receiving the first output sequence, establishing an interac