CN-122022919-A - Advertisement click rate estimation method and system based on deep and ensemble learning
Abstract
The invention provides an advertisement click rate estimation method and system based on deep and ensemble learning, relating to the technical field of data analysis. The method comprises: collecting advertisement delivery logs from multiple sources and processing them uniformly to construct an event chain of exposure, click, and conversion events; distinguishing labeled samples from unlabeled samples based on conversion feedback and observation time; modeling the conversion delay distribution using the labeled samples and using it to constrain the dynamic weights of the unlabeled samples; running delay modeling and click rate prediction cooperatively under an ensemble learning framework; and embedding the framework into a real-time prediction engine, so that a more stable and accurate advertisement click rate prediction result is output within the same calculation link and directly drives delivery control. The method and system can improve the accuracy of advertisement click rate estimation.
Inventors
- CHEN CHENG
- BAO ZHIHUI
Assignees
- 武汉卓尔数字传媒科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20251226
Claims (10)
- 1. An advertisement click rate estimation method based on deep and ensemble learning, characterized by comprising the following steps: performing multi-source data acquisition and preset processing on a delivery log of a platform, so that advertisement exposure events, click events, and conversion events form an event chain on the same time axis; distinguishing, in the event chain, labeled samples for which conversion feedback has been completed from unlabeled samples for which conversion feedback has not yet been generated, based on the relationship between conversion feedback timestamps and an observation cutoff timestamp; constructing a delay distribution estimation model based on the labeled samples, modeling the time interval from the click event to the conversion event, and outputting a delay distribution characterization of conversion feedback for the unlabeled samples in combination with running state features of the advertisement delivery system; introducing the delay distribution characterization as a constraint condition and, when processing the unlabeled samples, applying dynamic weights to the unlabeled samples according to the delay distribution characterization, thereby constructing an advertisement click rate prediction model; forming an ensemble learning framework through cooperation of the delay distribution estimation model and the advertisement click rate prediction model; embedding the ensemble learning framework into a real-time prediction engine, so that the delay distribution characterization, the dynamic weight generation, and the advertisement click rate prediction are completed in the same calculation link, and outputting an advertisement click rate prediction result; and driving an advertisement delivery control operation based on the advertisement click rate prediction result.
- 2. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein constructing the delay distribution estimation model based on the labeled samples, modeling the time interval from the click event to the conversion event, and outputting the delay distribution characterization of conversion feedback for the unlabeled samples in combination with the running state features of the advertisement delivery system specifically comprises: extracting a first click timestamp and a first conversion feedback timestamp corresponding to each labeled sample; determining a delay duration based on the time difference between the first click timestamp and the first conversion feedback timestamp; for each labeled sample, associating the running state features at the occurrence time of the click event, or within the time window to which it belongs, to form a delay training sample set composed of the delay durations and the running state features; performing interval mapping on the delay durations in the delay training sample set to generate a preset set of delay intervals; mapping the delay duration in each delay training sample to its corresponding delay interval, so as to constrain the output space of the delay distribution estimation model; training the delay distribution estimation model on the delay training sample set, so that the model learns the conditional distribution relationship between the running state features and the set of delay intervals and, given the running state features, outputs an occurrence probability for each delay interval; during real-time estimation, extracting the running state features corresponding to a second click timestamp of each unlabeled sample and inputting them into the delay distribution estimation model to obtain an initial delay distribution characterization for the unlabeled sample; and applying a time consistency constraint to the initial delay distribution characterization using the observed duration between the second click timestamp and the current observation cutoff timestamp, suppressing the occurrence probabilities of delay intervals shorter than the observed duration, and normalizing the occurrence probabilities of the remaining delay intervals, so as to generate a delay distribution characterization that satisfies the current observation condition.
- 3. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein introducing the delay distribution characterization as a constraint condition and, when processing the unlabeled samples, applying dynamic weights to the unlabeled samples according to the delay distribution characterization, thereby constructing the advertisement click rate prediction model, specifically comprises: obtaining the delay distribution characterization corresponding to an unlabeled sample, and associating it with a second click timestamp and a current observation cutoff timestamp of the unlabeled sample to determine the time interval range of conversion feedback for the unlabeled sample under the current observation condition; generating a sample confidence for the unlabeled sample based on the probability distribution of the delay distribution characterization within the time interval range, the sample confidence reflecting how credible it is to treat the unlabeled sample as a converted sample or as a non-converted sample under the current observation condition; mapping the sample confidence to a sample weight in one-to-one correspondence with the unlabeled sample; during training of the advertisement click rate prediction model, having the sample weight and the feature representation of the unlabeled sample jointly participate in the parameter update of the model, so that the degree to which the unlabeled sample influences training changes dynamically with the sample confidence; and inputting the dynamically weighted unlabeled samples together with the labeled samples into the advertisement click rate prediction model.
- 4. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein forming the ensemble learning framework through cooperation of the delay distribution estimation model and the advertisement click rate prediction model comprises: according to a preset calculation sequence, first receiving, by the delay distribution estimation model, the running state features of the labeled samples and the unlabeled samples, and outputting a delay distribution characterization for the unlabeled samples based on historical training results; injecting the delay distribution characterization into the advertisement click rate prediction model as constraint information shared across models, so that the advertisement click rate prediction model receives the delay distribution characterization of a sample at the same time as it receives the sample feature representation, and jointly encodes the sample feature representation and the delay distribution characterization within the model, thereby constraining the form in which the unlabeled samples contribute to the loss of the advertisement click rate prediction model; during continuous operation of the ensemble learning framework, periodically updating the labeled samples and the unlabeled samples as real-time input data arrive, and synchronously inputting the updated sample states into the delay distribution estimation model, so that the delay distribution estimation model continuously tracks changes in the arrival rhythm of conversion feedback and generates an updated delay distribution characterization; and immediately passing the updated delay distribution characterization to the advertisement click rate prediction model to readjust the dynamic weights corresponding to the unlabeled samples.
- 5. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein embedding the ensemble learning framework into the real-time prediction engine, so that the delay distribution characterization, the dynamic weight generation, and the advertisement click rate prediction are completed in the same calculation link, and outputting the advertisement click rate prediction result specifically comprises: generating a prediction request instance for each delivery request, and binding a sample identifier, a click timestamp, a current observation cutoff timestamp, and running state features to the prediction request instance; scheduling the prediction request instance to the delay distribution estimation model in the real-time prediction engine according to a preset calculation sequence, so that the delay distribution estimation model outputs, based on the running state features, a delay distribution characterization corresponding to the prediction request instance, and writing the delay distribution characterization, as an intermediate constraint result, into a calculation context bound to the prediction request instance; within the same calculation context, jointly analyzing the intermediate constraint result, the click timestamp, and the observation cutoff timestamp to determine a conversion uncertainty state corresponding to the prediction request instance; generating a dynamic weight corresponding to the prediction request instance based on the conversion uncertainty state; without switching calculation links, inputting the prediction request instance, comprising the sample feature representation, the delay distribution characterization, and the dynamic weight, as a whole into the advertisement click rate prediction model, so that the model outputs an advertisement click rate prediction value using the sample feature information and the delayed-feedback constraint information in the same inference process; and binding the advertisement click rate prediction value to the prediction request instance and outputting it from the real-time prediction engine as the advertisement click rate prediction result.
- 6. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein constructing the delay distribution estimation model based on the labeled samples, modeling the time interval from the click event to the conversion event, and outputting the delay distribution characterization of conversion feedback for the unlabeled samples in combination with the running state features of the advertisement delivery system specifically further comprises: performing an availability evaluation of the delay distribution estimate based on the number of labeled samples and a time distribution within a current time window, wherein the time distribution is the distribution of the time interval between the click event and the conversion event; when the number of labeled samples is detected to be below a preset stability threshold, or the delay duration distribution exhibits a missing long tail, incomplete multimodality, or excessive concentration, extracting a stable delay distribution formed across historical time windows as a prior delay distribution reference; measuring the similarity between the current time window and the historical time window based on the delivery stage features, the strategy change identification features, and the traffic structure features among the running state features; adaptively fusing the delay statistics of the current time window with the prior delay distribution reference according to the similarity to generate a candidate delay distribution characterization; performing stability detection on the variation amplitude of the candidate delay distribution characterization across a plurality of consecutive observation cutoff times, and applying a smoothing correction to the candidate delay distribution characterization when its variation between adjacent time windows exceeds a preset smoothing threshold, so as to form a delay distribution characterization that evolves continuously in the time dimension; generating a delay distribution confidence factor based on the number of labeled samples, the completeness of the delay distribution, and the stability result of the delay distribution characterization; and, when applying the dynamic weights to the unlabeled samples, using the delay distribution characterization and the delay distribution confidence factor together as constraint conditions to adjust the strength of the dynamic weights.
- 7. The advertisement click rate estimation method based on deep and ensemble learning according to claim 1, wherein constructing the delay distribution estimation model based on the labeled samples, modeling the time interval from the click event to the conversion event, and outputting the delay distribution characterization of conversion feedback for the unlabeled samples in combination with the running state features of the advertisement delivery system specifically further comprises: for the labeled samples within the time window corresponding to a current observation cutoff timestamp, determining the availability state of the delay distribution estimate based on the number of labeled samples, the coverage completeness of the click-to-conversion time interval, the delay interval coverage ratio, and the degree to which multimodality of the delay distribution can be identified; when the availability state indicates that the delay distribution estimate does not satisfy a stability condition, extracting a prior delay distribution reference matching the running state features from a stable delay distribution library accumulated across historical time windows, wherein the stable delay distribution library is maintained in layers by advertisement type, delivery position, user activity period, and content life cycle; measuring the similarity of running states between the current time window and the historical time window based on the delivery stage features, the strategy change identification features, and the traffic structure features among the running state features; adaptively fusing the delay statistics formed by the labeled samples in the current time window with the prior delay distribution reference according to the similarity to generate a candidate delay distribution characterization for the current time window, wherein a stable delay distribution formed across historical time windows is extracted as the prior delay distribution reference; performing stability calibration on the candidate delay distribution characterizations output under a plurality of consecutive observation cutoff timestamps by detecting the variation amplitude of the delay distribution between adjacent time windows and applying a temporal smoothing correction to the candidate delay distribution characterization when the variation amplitude exceeds a preset smoothing threshold; generating a delay distribution confidence factor based on the number of labeled samples, the coverage completeness, and the stability calibration result; and using the delay distribution confidence factor and the delay distribution characterization together as constraint conditions to adjust the strength of the dynamic weights.
- 8. An advertisement click rate estimation system based on deep and ensemble learning, characterized in that the system is configured to execute the advertisement click rate estimation method based on deep and ensemble learning according to any one of claims 1-7, the system comprising an acquisition module, a processing module, and an output module, wherein: the acquisition module is configured to perform multi-source data acquisition and preset processing on a delivery log of a platform, so that advertisement exposure events, click events, and conversion events form an event chain on the same time axis; the processing module is configured to distinguish, in the event chain, labeled samples for which conversion feedback has been completed from unlabeled samples for which conversion feedback has not yet been generated, based on the relationship between conversion feedback timestamps and an observation cutoff timestamp; the processing module is configured to construct a delay distribution estimation model based on the labeled samples, model the time interval from the click event to the conversion event, and output a delay distribution characterization of conversion feedback for the unlabeled samples in combination with running state features of the advertisement delivery system; the processing module is configured to introduce the delay distribution characterization as a constraint condition and, when processing the unlabeled samples, apply dynamic weights to the unlabeled samples according to the delay distribution characterization, thereby constructing an advertisement click rate prediction model; the processing module is configured to form an ensemble learning framework through cooperation of the delay distribution estimation model and the advertisement click rate prediction model; the processing module is configured to embed the ensemble learning framework into a real-time prediction engine, so that the delay distribution characterization, the dynamic weight generation, and the advertisement click rate prediction are completed in the same calculation link, and to output an advertisement click rate prediction result; and the output module is configured to drive an advertisement delivery control operation based on the advertisement click rate prediction result.
- 9. An electronic device comprising a processor, a communication bus, a user interface, a network interface, and a memory, the memory for storing instructions, the user interface and the network interface each for communicating with other devices, the communication bus for enabling connection communications between components within the electronic device, the processor for executing instructions stored in the memory to cause the electronic device to perform the method of any of claims 1-7.
- 10. A non-transitory computer readable storage medium storing instructions which, when executed, perform the method of any of claims 1-7.
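The time consistency constraint recited in claim 2 and the confidence-to-weight mapping recited in claim 3 can be illustrated with a short Python sketch. This is only one possible reading, not the claimed implementation: the claims do not fix a weight formula, so the Bayes-style posterior used here, the `prior_rate` parameter, and all function names are assumptions introduced for illustration. The sketch also simplifies by keeping a partially elapsed delay interval whole.

```python
from typing import List, Sequence


def truncate_delay_distribution(
    probs: Sequence[float],
    upper_edges: Sequence[float],
    elapsed: float,
) -> List[float]:
    """Suppress delay intervals that have fully elapsed, then renormalize.

    probs[i] is the predicted probability that the click-to-conversion delay
    falls in interval i, whose upper edge is upper_edges[i] (same time unit
    as elapsed). An interval counts as elapsed when its upper edge <= elapsed.
    """
    kept = [p if edge > elapsed else 0.0 for p, edge in zip(probs, upper_edges)]
    total = sum(kept)
    if total == 0.0:
        # Every interval has elapsed: no probability mass is left to renormalize.
        return [0.0] * len(kept)
    return [p / total for p in kept]


def pending_conversion_probability(
    probs: Sequence[float],
    upper_edges: Sequence[float],
    elapsed: float,
    prior_rate: float,
) -> float:
    """Posterior probability that an unlabeled click will still convert.

    Assumption (not from the claims): prior_rate is an estimate of the overall
    conversion probability, and probs sums to 1. By Bayes' rule,
    P(convert | nothing observed yet) = p*S / ((1 - p) + p*S), where S is the
    delay mass remaining beyond the elapsed time.
    """
    remaining = sum(p for p, edge in zip(probs, upper_edges) if edge > elapsed)
    numerator = prior_rate * remaining
    denominator = (1.0 - prior_rate) + numerator
    return numerator / denominator if denominator > 0.0 else 0.0


def negative_sample_weight(
    probs: Sequence[float],
    upper_edges: Sequence[float],
    elapsed: float,
    prior_rate: float,
) -> float:
    """Weight for treating the unlabeled sample as a non-converted sample."""
    return 1.0 - pending_conversion_probability(probs, upper_edges, elapsed, prior_rate)
```

Under these assumptions, a click observed for only a short time keeps most of its delay mass, so its weight as a negative sample is reduced; as the observed duration grows past the later intervals, the weight approaches 1 and the sample is treated almost like a confirmed non-conversion.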
Description
Advertisement click rate estimation method and system based on deep and ensemble learning
Technical Field
The invention relates to the technical field of data analysis, in particular to an advertisement click rate estimation method and system based on deep and ensemble learning.
Background
On content social platforms and in programmatic advertising scenarios, existing advertisement click rate estimation techniques are generally based on historical delivery logs. They statistically analyze behavior data such as advertisement exposures, clicks, and conversions, and predict the click rate with a machine learning or deep learning model. The prediction model is typically built on user features, advertisement content features, delivery time-period features, and some contextual features, and outputs advertisement click rate estimation results under offline or near-real-time conditions. As data scale has grown and algorithms have improved, some schemes have further introduced ensemble learning structures to improve prediction accuracy, but the overall system still takes observed conversion feedback as the main supervision signal.
However, in actual advertisement delivery, conversion behavior objectively lags behind click behavior. At any observation time, a large number of click samples enter model training or prediction before any conversion feedback has been generated. The prior art generally either treats such samples directly as negative samples or simply ignores their temporal uncertainty, and thus fails to distinguish conversion feedback that has not yet been observed from a genuine absence of conversion. This introduces systematic bias at the sample labeling level. The bias is continuously amplified by the model, particularly in the early stage of delivery, in high-delay conversion scenarios, or when the delivery strategy is adjusted frequently, so that advertisement click rate estimation results become unstable, prediction bias accumulates, and real-time delivery control is degraded.
Disclosure of Invention
The invention provides an advertisement click rate estimation method and system based on deep and ensemble learning, which can improve the accuracy of advertisement click rate estimation.
In a first aspect of the present invention, there is provided an advertisement click rate estimation method based on deep and ensemble learning, the method comprising: performing multi-source data acquisition and preset processing on a delivery log of a platform, so that advertisement exposure events, click events, and conversion events form an event chain on the same time axis; distinguishing, in the event chain, labeled samples for which conversion feedback has been completed from unlabeled samples for which conversion feedback has not yet been generated, based on the relationship between conversion feedback timestamps and an observation cutoff timestamp; constructing a delay distribution estimation model based on the labeled samples, modeling the time interval from the click event to the conversion event, and outputting a delay distribution characterization of conversion feedback for the unlabeled samples in combination with running state features of the advertisement delivery system; introducing the delay distribution characterization as a constraint condition and, when processing the unlabeled samples, applying dynamic weights to the unlabeled samples according to the delay distribution characterization, thereby constructing an advertisement click rate prediction model; forming an ensemble learning framework through cooperation of the delay distribution estimation model and the advertisement click rate prediction model; embedding the ensemble learning framework into a real-time prediction engine, so that the delay distribution characterization, the dynamic weight generation, and the advertisement click rate prediction are completed in the same calculation link, and outputting an advertisement click rate prediction result; and driving an advertisement delivery control operation based on the advertisement click rate prediction result.
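As a rough illustration of the first steps of the method above, the following Python sketch separates labeled from unlabeled samples at an observation cutoff and buckets click-to-conversion delays into intervals. The record layout (`ClickSample`), the function names, and the interval edges are hypothetical. The normalized histogram is only a simplified stand-in for the delay distribution estimation model, which the method conditions on running state features.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class ClickSample:
    """One click in the event chain (hypothetical record layout)."""
    sample_id: str
    click_ts: float                 # click time, in seconds
    conversion_ts: Optional[float]  # conversion time, or None if none is logged


def split_by_cutoff(
    samples: Sequence[ClickSample], cutoff_ts: float
) -> Tuple[List[ClickSample], List[ClickSample]]:
    """Separate labeled from unlabeled samples at an observation cutoff."""
    labeled: List[ClickSample] = []
    unlabeled: List[ClickSample] = []
    for s in samples:
        if s.click_ts > cutoff_ts:
            continue  # the click itself is not yet observable at the cutoff
        if s.conversion_ts is not None and s.conversion_ts <= cutoff_ts:
            labeled.append(s)    # conversion feedback arrived before the cutoff
        else:
            unlabeled.append(s)  # no feedback yet: may convert later or never
    return labeled, unlabeled


def delay_interval_index(delay: float, upper_edges: Sequence[float]) -> int:
    """Index of the first interval whose upper edge is >= delay.

    upper_edges must be sorted ascending; delays beyond the last edge are
    clamped into the last interval.
    """
    return min(bisect_left(upper_edges, delay), len(upper_edges) - 1)


def empirical_delay_distribution(
    labeled: Sequence[ClickSample], upper_edges: Sequence[float]
) -> List[float]:
    """Normalized histogram of click-to-conversion delays over the intervals."""
    counts = [0] * len(upper_edges)
    for s in labeled:
        delay = s.conversion_ts - s.click_ts  # labeled samples always have a conversion
        counts[delay_interval_index(delay, upper_edges)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Note that a sample whose conversion is logged after the cutoff is still unlabeled at that cutoff, which is exactly the case the method distinguishes from a genuine non-conversion.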
In a second aspect of the present invention, an advertisement click rate estimation system based on deep and ensemble learning is provided. The system is configured to perform the advertisement click rate estimation method based on deep and ensemble learning described in any one of the foregoing, and comprises an acquisition module, a processing module, and an output module, wherein: the acquisition module is configured to perform multi-source data acquisition and preset processing on the delivery log of the platform, so that advertisement exposure events, click events, and conversion events form an event chain on the same time axis; The