CN-115345311-B - Data processing method and device for model training, electronic equipment and storage medium

Abstract

The disclosure relates to a data processing method and device for model training, an electronic device, and a storage medium. The method comprises: obtaining a plurality of behavior data samples of a user account, wherein the plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise either all historical behavior data samples of the user account or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic; determining training data and a training label corresponding to each behavior data sample; and performing online training on an online recommendation model using the training data and the training labels, wherein the online recommendation model is a model that has already been trained to meet online prediction requirements, and the trained online recommendation model is used for recommending objects to the user account online. Because the historical behavior data is added to the training samples, neither the model structure nor the input at the inference stage needs to change, which greatly reduces the pressure on the online recommendation system.

Inventors

Liao Yiqiao
Luo Mingnan

Assignees

Beijing Dajia Internet Information Technology Co., Ltd. (北京达佳互联信息技术有限公司)

Dates

Publication Date: 20260512
Application Date: 20210511

Claims (17)

1. A data processing method for model training, comprising: obtaining a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by the user account operating on an associated object, the plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all historical behavior data samples of the user account or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic; determining training data and a training label corresponding to each behavior data sample; and performing online training on an online recommendation model using the training data and the training labels, wherein the online recommendation model is a model that has been trained to meet the online prediction requirement, and the trained online recommendation model is used for recommending objects to the user account online; wherein determining the training data and the training label corresponding to each behavior data sample comprises: if the behavior data sample is an online behavior data sample, generating training data corresponding to the online behavior data sample according to the online behavior data sample, and obtaining an original label from the online behavior data sample as the training label corresponding to the online behavior data sample; and if the behavior data sample is a historical behavior data sample, generating training data corresponding to the historical behavior data sample according to the historical behavior data sample, obtaining a time difference between a timestamp in the historical behavior data sample and the current time, and attenuating the original label according to the time difference through a preset attenuation function to obtain the training label corresponding to the historical behavior data sample, wherein the attenuation function is any one of a linear function, an exponential function and a Gaussian function.
2. The data processing method for model training according to claim 1, wherein the part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic is obtained by performing any one of the following: extracting according to the importance of the objects corresponding to the historical behavior data samples to obtain the part of the historical behavior data samples; or obtaining a target object type of the object corresponding to the online behavior data sample, and extracting the historical behavior data samples under the target object type from all the historical behavior data samples as the part of the historical behavior data samples; or obtaining a first similarity between each historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain the part of the historical behavior data samples; or obtaining a type diversity index of the object types, and extracting according to the type diversity index to obtain the part of the historical behavior data samples.
3. The data processing method for model training according to claim 2, wherein the number of extracted historical behavior data samples is determined according to the training speed of the online recommendation model.
4. The data processing method for model training according to any one of claims 1 to 3, wherein performing online training on the online recommendation model using the training data and the training labels comprises: acquiring a weight corresponding to each behavior data sample; inputting the training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample; determining a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample; and adjusting model parameters of the online recommendation model according to the loss value, and continuing to input the training data corresponding to the next behavior data sample until a training stop condition is reached.
5. The data processing method for model training according to claim 4, wherein, when the behavior data sample is a historical behavior data sample, acquiring the weight corresponding to each behavior data sample comprises: obtaining a time difference between a timestamp in each historical behavior data sample and the current time, and determining the weight corresponding to each historical behavior data sample according to the time difference, wherein the weight is inversely related to the time difference; or determining the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to that historical behavior data sample, wherein the weight is positively correlated with the importance; or obtaining a second similarity between each historical behavior data sample and the online behavior data sample, and determining the weight corresponding to each historical behavior data sample based on the second similarity, wherein the weight is positively correlated with the second similarity; or obtaining a type diversity index of the object types, and determining the weight corresponding to each historical behavior data sample according to the type diversity index; or predicting from each historical behavior data sample through a first deep learning model to obtain the corresponding weight.
6. The data processing method for model training according to claim 1, wherein all the historical behavior data samples are queried from a first mapping table, and the first mapping table is obtained when the online recommendation model is trained offline and is updated in real time along with the online training of the online recommendation model.
7. The data processing method for model training according to claim 1, wherein attenuating the original label in the historical behavior data sample according to the time difference to obtain the training label of the historical behavior data sample comprises: querying, from a second mapping table, the training label corresponding to the time difference of the historical behavior data sample, wherein the second mapping table comprises a correspondence between time differences and training labels; or attenuating the original label according to the time difference through a preset attenuation function to obtain the training label of the historical behavior data sample; or predicting from the historical behavior data sample through a second deep learning model to obtain the training label of the historical behavior data sample.
8. A data processing apparatus for model training, comprising: an acquisition module configured to acquire a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by the user account operating on an associated object, the plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all historical behavior data samples of the user account or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic; a training sample generation module configured to determine training data and a training label corresponding to each behavior data sample; and a model training module configured to perform online training on an online recommendation model using the training data and the training labels, wherein the online recommendation model is a model that has been trained to meet online prediction requirements, and the trained online recommendation model is used for recommending objects to the user account online; wherein the training sample generation module comprises a first training data generation unit, a first label determination unit, a second training data generation unit, an acquisition unit and a second label determination unit; the first training data generation unit is configured to, if the behavior data sample is an online behavior data sample, generate training data corresponding to the online behavior data sample according to the online behavior data sample; the first label determination unit is configured to obtain an original label from the online behavior data sample as the training label corresponding to the online behavior data sample; the second training data generation unit is configured to, if the behavior data sample is a historical behavior data sample, generate training data corresponding to the historical behavior data sample according to the historical behavior data sample; the acquisition unit is configured to obtain a time difference between a timestamp in the historical behavior data sample and the current time; and the second label determination unit is configured to attenuate the original label according to the time difference through a preset attenuation function to obtain the training label corresponding to the historical behavior data sample, wherein the attenuation function is any one of a linear function, an exponential function and a Gaussian function.
9. The data processing apparatus for model training according to claim 8, further comprising a sample extraction module configured to: extract according to the importance of the objects corresponding to the historical behavior data samples to obtain the part of the historical behavior data samples; or obtain a target object type of the object corresponding to the online behavior data sample, and extract the historical behavior data samples under the target object type from all the historical behavior data samples as the part of the historical behavior data samples; or obtain a first similarity between each historical behavior data sample and the online behavior data sample, and extract based on the first similarity to obtain the part of the historical behavior data samples; or obtain a type diversity index of the object types, and extract according to the type diversity index to obtain the part of the historical behavior data samples.
10. The data processing apparatus for model training according to claim 9, wherein the number of extracted historical behavior data samples is determined according to the training speed of the online recommendation model.
11. The data processing apparatus for model training according to any one of claims 8 to 10, wherein the model training module comprises: a weight acquisition unit configured to acquire a weight corresponding to each behavior data sample; a prediction unit configured to input the training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample; a loss value determination unit configured to determine a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample; and a parameter adjustment unit configured to adjust model parameters of the online recommendation model according to the loss value, and to continue inputting the training data corresponding to the next behavior data sample until a training stop condition is reached.
12. The data processing apparatus for model training according to claim 11, wherein, when the behavior data sample is a historical behavior data sample, the weight acquisition unit is configured to: obtain a time difference between a timestamp in each historical behavior data sample and the current time, and determine the weight corresponding to each historical behavior data sample according to the time difference, wherein the weight is inversely related to the time difference; or determine the weight corresponding to each historical behavior data sample according to the importance of the object corresponding to that historical behavior data sample, wherein the weight is positively correlated with the importance; or obtain a second similarity between each historical behavior data sample and the online behavior data sample, and determine the weight corresponding to each historical behavior data sample based on the second similarity, wherein the weight is positively correlated with the second similarity; or obtain a type diversity index of the object types, and determine the weight corresponding to each historical behavior data sample according to the type diversity index; or predict from each historical behavior data sample through a first deep learning model to obtain the corresponding weight.
13. The data processing apparatus for model training according to claim 8, wherein all the historical behavior data samples are queried from a first mapping table, and the first mapping table is obtained when the online recommendation model is trained offline and is updated in real time along with the online training of the online recommendation model.
14. The data processing apparatus for model training according to claim 8, wherein the second label determination unit is configured to: query, from a second mapping table, the training label corresponding to the time difference of the historical behavior data sample, wherein the second mapping table comprises a correspondence between time differences and training labels; or attenuate the original label according to the time difference through a preset attenuation function to obtain the training label of the historical behavior data sample; or predict from the historical behavior data sample through a second deep learning model to obtain the training label of the historical behavior data sample.
15. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the data processing method for model training according to any one of claims 1 to 7.
16. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the data processing method for model training according to any one of claims 1 to 7.
17. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the data processing method for model training according to any one of claims 1 to 7.
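
As an illustration of the label attenuation recited in claim 1, the following is a minimal Python sketch and not the patented implementation. The claims state only that the attenuation function is a linear, exponential or Gaussian function of the time difference; the function names, the use of days as the time unit, the parameters `horizon`, `rate` and `sigma`, and the choice to multiply the original label by the decay factor are all assumptions made here for illustration.

```python
import math

# All parameter values below are hypothetical; time differences are assumed to be in days.

def linear_decay(dt, horizon=30.0):
    """Falls linearly from 1 at dt = 0 to 0 at dt >= horizon."""
    return max(0.0, 1.0 - dt / horizon)

def exponential_decay(dt, rate=0.1):
    """Decays as exp(-rate * dt)."""
    return math.exp(-rate * dt)

def gaussian_decay(dt, sigma=10.0):
    """Decays as exp(-dt^2 / (2 * sigma^2))."""
    return math.exp(-(dt ** 2) / (2.0 * sigma ** 2))

def training_label(original_label, sample_ts, now, is_historical,
                   decay=exponential_decay):
    """Return the training label for one behavior data sample.

    Online samples keep their original label. Historical samples have the
    original label scaled by a decay factor computed from the time difference
    (scaling by multiplication is an assumption of this sketch).
    """
    if not is_historical:
        return float(original_label)
    dt = max(0.0, now - sample_ts)  # time difference to the current time
    return float(original_label) * decay(dt)
```

For example, under these assumed parameters a historical click (original label 1) that is 15 days old would receive a training label of 0.5 with `linear_decay`, while an online sample with the same original label keeps the label 1.0.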

Description

Data processing method and device for model training, electronic equipment and storage medium

Technical Field

The present disclosure relates to the field of internet technologies, and in particular to a data processing method and apparatus for model training, an electronic device, a computer-readable storage medium, and a computer program product.

Background

A recommendation system may recommend objects to clients based on a model-estimated click-through rate (CTR), conversion rate (CVR), and the like. Because a user's historical behavior information reflects the user's points of interest, having the model continuously learn this information during training improves the recommendation accuracy of the recommendation system.

To let the model learn more of the user's historical behavior information, the related art forms an ultra-long behavior sequence from all of the user's historical behavior data and uses that sequence as training data for the model. Accordingly, the ultra-long behavior sequence formed from all of the historical behavior data must also be used as input at inference time.

However, using the ultra-long behavior sequence as model input during both training and inference places extremely high pressure on the online recommendation system and consumes a large amount of memory.

Disclosure of Invention

The present disclosure provides a data processing method and apparatus for model training, an electronic device, a computer-readable storage medium, and a computer program product, so as to at least solve the problem in the related art that the memory consumption of an online recommendation system is relatively large when an ultra-long behavior sequence is used as the input of a model during training and inference.
The technical solution of the present disclosure is as follows. According to a first aspect of embodiments of the present disclosure, there is provided a data processing method for model training, including: obtaining a plurality of behavior data samples of a user account, wherein each behavior data sample is generated by the user account operating on an associated object, the plurality of behavior data samples comprise historical behavior data samples and online behavior data samples, and the historical behavior data samples comprise all historical behavior data samples of the user account or a part of the historical behavior data samples extracted from all the historical behavior data samples based on sample extraction logic; determining training data and a training label corresponding to each behavior data sample; and performing online training on an online recommendation model using the training data and the training labels, wherein the online recommendation model is a model that has been trained to meet the online prediction requirement, and the trained online recommendation model is used for recommending objects to the user account online.
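
The online training step of the first aspect can be pictured with the following minimal Python sketch. It is a toy stand-in under stated assumptions, not the disclosed system: a plain logistic regression replaces the online recommendation model, and the `Sample` class, the `weight` field (defaulting to 1.0, corresponding to the optional per-sample weight recited in claim 4), the learning rate, and the cross-entropy loss are all hypothetical choices made for illustration. The point it shows is that historical and online samples share the same input format, so the model structure is unchanged.

```python
import math
from dataclasses import dataclass

@dataclass
class Sample:
    features: list        # training data derived from one behavior data sample
    label: float          # training label in [0, 1] (may be an attenuated label)
    weight: float = 1.0   # optional per-sample weight (hypothetical default)

def predict(w, x):
    """Toy model: logistic regression score for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

def online_step(w, s, lr=0.1):
    """One gradient step on the weighted cross-entropy loss of a single sample."""
    p = predict(w, s.features)
    # d(loss)/dz for weighted cross-entropy with a soft label: weight * (p - label)
    g = s.weight * (p - s.label)
    return [wi - lr * g * xi for wi, xi in zip(w, s.features)]

def online_train(w, samples, lr=0.1):
    """Feed historical and online samples through the same model, one at a time."""
    for s in samples:
        w = online_step(w, s, lr)
    return w
```

A usage example under the same assumptions: `online_train([0.0, 0.0], [Sample([1.0, 0.0], 1.0), Sample([1.0, 0.0], 0.5, weight=0.5)])` mixes an online sample with a down-weighted historical sample whose label has been attenuated, and returns the updated parameters.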
In one embodiment, the part of the historical behavior data samples extracted from all the historical behavior data samples based on the sample extraction logic is obtained by performing any one of the following: extracting according to the importance of the objects corresponding to the historical behavior data samples to obtain the part of the historical behavior data samples; or obtaining a target object type of the object corresponding to the online behavior data sample, and extracting the historical behavior data samples under the target object type from all the historical behavior data samples as the part of the historical behavior data samples; or obtaining a first similarity between each historical behavior data sample and the online behavior data sample, and extracting based on the first similarity to obtain the part of the historical behavior data samples; or obtaining a type diversity index of the object types, and extracting according to the type diversity index to obtain the part of the historical behavior data samples.

In one embodiment, the number of extracted historical behavior data samples is determined according to the training speed of the online recommendation model.

In one embodiment, performing online training on the online recommendation model using the training data and the training labels includes: acquiring a weight corresponding to each behavior data sample; inputting the training data corresponding to each behavior data sample into the online recommendation model to obtain a prediction result corresponding to each behavior data sample; determining a loss value according to the prediction result, the training label and the weight corresponding to each behavior data sample; and adjusting model parameters of the online recommendation model according to the loss value, and continuing to input the training data corresponding to the next behavior data sample until a training stop condition is reached