CN-121996838-A - Multi-target content recommendation method and device, computer equipment and storage medium


Abstract

The embodiments of the invention disclose a multi-objective content recommendation method and apparatus, a computer device, and a storage medium, and relate to the field of artificial intelligence. The method comprises: collecting and preprocessing user data, multi-modal content data, and scene data; extracting features from the preprocessed data and fusing them into a unified feature representation using a cross-modal attention mechanism; dynamically determining, with a reinforcement learning model and according to real-time feedback, the weights of multiple objectives such as click-through rate, retention rate, and diversity; generating three recommendation subsets based respectively on user-content interaction relationships, unified feature matching, and diversity screening; and fusing the subsets according to the dynamic weights to generate a final recommendation list. The embodiments achieve adaptive collaborative optimization of multiple objectives and deep fusion of multi-dimensional features, improving recommendation accuracy and user experience.

Inventors

SONG YONGXIANG
LU WEI
ZHANG QIFAN
GUO SHANGFENG

Assignees

深圳市酷开网络科技股份有限公司

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-09

Claims (10)

1. A multi-objective content recommendation method, comprising: collecting and preprocessing user data, multi-modal content data, and scene data; extracting features from the preprocessed user data, multi-modal content data, and scene data to obtain a user feature vector, a content feature vector, and a scene feature vector, and fusing the user feature vector, the content feature vector, and the scene feature vector into a unified feature representation through a cross-modal attention mechanism; dynamically determining, based on the current business scenario and real-time feedback data, weights of a plurality of recommendation objectives through a reinforcement learning model, wherein the recommendation objectives comprise at least click-through rate, user retention rate, and content diversity; generating a first recommendation subset based on interaction relationships between users and content, generating a second recommendation subset based on the correlation between the unified feature representation and user interests, and screening, from candidate content, content whose similarity to already-selected content is below a threshold to generate a third recommendation subset; and fusing the first recommendation subset, the second recommendation subset, and the third recommendation subset according to the dynamically determined weights to generate a final recommendation list.
2. The multi-objective content recommendation method according to claim 1, wherein extracting features from the user data to obtain the user feature vector comprises: arranging a historical behavior sequence of a user in chronological order, wherein the historical behavior sequence comprises click, dwell, and favorite operations performed by the user on content; inputting the historical behavior sequence into a Transformer model, and computing dependencies between behaviors in the sequence through a multi-head self-attention mechanism in the Transformer model; and taking the mean of the sequence representation vectors output by the Transformer model, or the vector at the last position, as the user feature vector; extracting features from the multi-modal content data to obtain the content feature vector comprises: for text data, extracting local semantics through a convolutional neural network and/or obtaining contextual semantic embeddings through a pre-trained language model to obtain text features; for image or video frame data, extracting visual features through a convolutional neural network; for audio or time-series video data, extracting temporal variation features through a recurrent neural network; and concatenating or weighted-fusing the extracted text features, visual features, and temporal variation features, and associating corresponding entity information in a preset knowledge graph, to generate the content feature vector; and extracting features from the scene data to obtain the scene feature vector comprises: classifying the scene data into time-dimension data, device-dimension data, and network-environment-dimension data; for the time-dimension data, extracting hour information, day-of-week information, and information on whether the current moment falls on a holiday, mapping the hour information to sine and cosine codes with a 24-hour period, one-hot encoding the day-of-week information, and converting the holiday information into a Boolean code; for the device-dimension data, extracting device type, screen size, and operating system version, one-hot encoding the device type and the operating system version, and dividing the screen size into predefined size intervals and interval-encoding it; for the network-environment-dimension data, extracting network type and bandwidth level and one-hot encoding them; concatenating the time-dimension code, the device-dimension code, and the network-environment-dimension code to form an initial scene code vector; and inputting the initial scene code vector into a fully connected neural network for nonlinear transformation and dimensionality reduction, and outputting the scene feature vector.
3. The multi-objective content recommendation method according to claim 1, wherein dynamically determining the weights of the plurality of recommendation objectives through the reinforcement learning model comprises: using at least one of the user feature vector, the content feature vector, and the scene feature vector, together with the corresponding historical weight values, as the state input of the reinforcement learning model; defining a reward function aimed at comprehensive long-term user value, wherein the reward function is obtained as a weighted sum of real-time metrics of the plurality of recommendation objectives; and updating a Q-value function in the reinforcement learning model using a temporal-difference algorithm according to the current state, the action taken, the immediate reward obtained, and the next state, and outputting adjustment actions for the weights of the plurality of recommendation objectives according to the updated Q-value function, wherein the adjustment actions are executed within a preset weight range.
4. The multi-objective content recommendation method according to claim 3, further comprising: pre-configuring weight adjustment rules associated with specific business scenarios, wherein the specific business scenarios comprise new-user cold start or promotional campaigns; when the current business scenario is detected to match a specific business scenario, first applying a one-time base adjustment to the weights of the plurality of recommendation objectives according to the weight adjustment rules; and taking the base-adjusted weights as the initial state of the reinforcement learning model, the reinforcement learning model then performing online fine-tuning according to subsequent real-time feedback data.
5. The multi-objective content recommendation method according to claim 1, wherein: a heterogeneous graph is constructed with users and content as nodes, information from neighbor nodes is aggregated using a graph neural network to update node representations, and the Top-K content items with the highest similarity are selected as the first recommendation subset according to the inner product or cosine similarity between the updated user node representation and the content node representations; the similarity between candidate content feature vectors and the current user feature vector is computed in a feature space, the feature space is optimized with a contrastive learning loss function so that the feature distance of positive sample pairs decreases and the feature distance of negative sample pairs increases, and the Top-K content items most similar to the user feature vector are selected as the second recommendation subset; and screening, from the candidate content, content whose similarity to already-selected content is below a threshold to generate the third recommendation subset comprises: computing the cosine similarity between each candidate content item and the content already selected into the recommendation list, adding the candidate content item to a candidate pool if its similarity to any content item in the selected list is below a preset threshold, and selecting from the candidate pool, according to a maximum marginal utility algorithm, the content that maximizes the diversity score of the overall recommendation list to form the third recommendation subset.
6. The multi-objective content recommendation method according to claim 5, wherein fusing the first recommendation subset, the second recommendation subset, and the third recommendation subset according to the dynamically determined weights to generate the final recommendation list comprises: applying the dynamically determined click-through-rate weight to the predicted score of each content item in the first recommendation subset; applying the dynamically determined user-retention weight to the predicted score of each content item in the second recommendation subset; applying the dynamically determined content-diversity weight to the predicted score of each content item in the third recommendation subset; and ranking all content items by their weighted final scores and selecting the N highest-ranked content items to generate the final recommendation list, wherein N is a preset positive integer.
7. The multi-objective content recommendation method according to claim 1, further comprising: collecting in real time, through a message queue, the user's interaction behaviors for each content item in the final recommendation list, wherein the interaction behaviors comprise clicks, dwell time, and swipe-to-skip; computing in real time, within a sliding time window and using a stream processing engine, the click-through rate, the average dwell time, and the content-category-based information entropy of the final recommendation list; using the computed click-through rate, average dwell time, and information entropy as the real-time metrics to form training samples; incrementally updating network parameters of the reinforcement learning model with the training samples using an online gradient descent algorithm, with the objective of minimizing a recommendation-effect loss; and, in parallel, using the training samples to update, by incremental learning, the parameters of the graph neural network model used to generate the first recommendation subset, the feature matching model used to generate the second recommendation subset, and the similarity calculation model used to generate the third recommendation subset.
8. The multi-objective content recommendation method according to claim 2, further comprising: when a target is identified as a new user, acquiring attribute information provided by the new user, and when a target is identified as new content, acquiring metadata tags published with the new content; matching the attribute information or the metadata tags against a pre-built knowledge graph to obtain embedding vectors of the corresponding entities in the knowledge graph; for a new user, inputting the behavior sequence of the new user into the Transformer model to extract an initial behavior feature vector, and concatenating the initial behavior feature vector with the embedding vector obtained from the knowledge graph to form an enhanced user feature vector; and for new content, extracting multi-modal raw feature vectors using a convolutional neural network or a recurrent neural network, and concatenating the raw feature vectors with the embedding vector obtained from the knowledge graph to form an enhanced content feature vector.
9. A computer device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the multi-objective content recommendation method according to any of claims 1-7 when executing the computer program.
10. A computer readable storage medium, characterized in that the storage medium stores a computer program, which when executed by a processor, implements the multi-objective content recommendation method according to any one of claims 1-7.
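
As a non-limiting illustration of the time-dimension encoding recited in claim 2, the following minimal Python sketch maps the hour to sine and cosine codes with a 24-hour period, one-hot encodes the day of the week, and stores the holiday flag as a Boolean code. The function name `encode_time_features`, the 0 = Monday weekday convention, and the output layout are assumptions made for this example and are not taken from the specification.

```python
import math


def encode_time_features(hour: int, weekday: int, is_holiday: bool) -> list[float]:
    """Return [sin, cos, 7 one-hot weekday slots, holiday flag] as floats."""
    if not 0 <= weekday <= 6:
        raise ValueError("weekday must be in 0..6 (0 = Monday is an assumed convention)")
    # Sine/cosine code with a 24-hour period, so 23:00 and 00:00 end up close together.
    angle = 2.0 * math.pi * (hour % 24) / 24.0
    hour_code = [math.sin(angle), math.cos(angle)]
    # One-hot code for the day of the week.
    weekday_code = [1.0 if i == weekday else 0.0 for i in range(7)]
    # Boolean holiday code stored as 0.0 / 1.0.
    holiday_code = [1.0 if is_holiday else 0.0]
    return hour_code + weekday_code + holiday_code


# Example call: 06:00 on the third day of the week, on a holiday.
time_code = encode_time_features(hour=6, weekday=2, is_holiday=True)
```

The cyclic code is what keeps late-night and early-morning hours adjacent in feature space, which a plain integer hour would not do. In the claimed method this vector would still be concatenated with the device and network codes and passed through a fully connected network.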
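
As a non-limiting illustration of the temporal-difference update recited in claim 3, the sketch below uses a tabular Q-learning rule. The tabular form, the discrete action set `ACTIONS`, the step size `STEP`, the learning rate `alpha`, the discount `gamma`, the weight range `[0.1, 0.8]`, and the separate reward coefficients are all assumptions chosen for this example; the claims do not fix any of them, and a practical embodiment would more likely use a neural Q-function over the feature-vector state.

```python
from collections import defaultdict

# Hypothetical discrete action set: nudge one objective weight up or down by STEP.
STEP = 0.05
ACTIONS = [(name, sign * STEP)
           for name in ("click", "retention", "diversity")
           for sign in (+1, -1)]


def reward(metrics: dict[str, float], coefficients: dict[str, float]) -> float:
    """Weighted sum of the real-time metrics of the recommendation objectives."""
    return sum(coefficients[k] * metrics[k] for k in coefficients)


def td_update(q, state, action: int, r: float, next_state,
              alpha: float = 0.1, gamma: float = 0.9) -> float:
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in range(len(ACTIONS)))
    td_error = r + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


def greedy_action(q, state) -> int:
    """Index of the adjustment action with the highest Q-value in this state."""
    return max(range(len(ACTIONS)), key=lambda a: q[(state, a)])


def apply_action(weights: dict[str, float], action: int,
                 low: float = 0.1, high: float = 0.8) -> dict[str, float]:
    """Apply the adjustment and clamp the result to the preset weight range."""
    name, delta = ACTIONS[action]
    adjusted = dict(weights)
    adjusted[name] = min(high, max(low, adjusted[name] + delta))
    return adjusted


q_table = defaultdict(float)  # unseen (state, action) pairs start at 0.0
```

Clamping inside `apply_action` is one simple way to honor the requirement that adjustment actions are executed within a preset weight range; whether the weights are also renormalized to sum to one is left open here because the claims do not say.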
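
As a non-limiting illustration of the diversity screening recited in claim 5, the sketch below first builds a candidate pool by cosine-similarity thresholding and then picks items greedily. Two points are assumptions of this example: a candidate enters the pool only if its similarity to every already-selected item is below the threshold, and the greedy rule (pick the candidate whose highest similarity to the current list is lowest) is a simple stand-in for the maximum marginal utility algorithm, whose exact form the claims do not give. The function names are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def diversity_subset(candidates: dict[str, list[float]],
                     selected: list[list[float]],
                     threshold: float, k: int) -> list[str]:
    """Return up to k candidate ids that are dissimilar to the selected content."""
    # Step 1: keep only candidates below the similarity threshold.
    pool = {cid: vec for cid, vec in candidates.items()
            if all(cosine(vec, s) < threshold for s in selected)}
    # Step 2: greedily add the candidate least similar to everything chosen so far.
    chosen: list[str] = []
    reference = list(selected)
    while pool and len(chosen) < k:
        best = min(pool, key=lambda cid: max(
            (cosine(pool[cid], ref) for ref in reference), default=0.0))
        chosen.append(best)
        reference.append(pool.pop(best))
    return chosen
```

Each pick is added to the reference list before the next round, so later picks are also pushed away from earlier picks rather than only from the originally selected content.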
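
As a non-limiting illustration of the weighted fusion recited in claim 6, the sketch below multiplies each subset's predicted scores by the weight of its objective and keeps the N highest-scoring items. The claims do not state how an item appearing in more than one subset is handled; summing its weighted scores, and breaking ties by item id, are assumptions of this example, as are the function name `fuse_subsets` and the dictionary layout.

```python
def fuse_subsets(subsets: dict[str, dict[str, float]],
                 weights: dict[str, float], n: int) -> list[str]:
    """Rank content by weighted score and return the ids of the top n items.

    subsets maps an objective name to {content_id: predicted_score};
    weights maps the same objective names to the dynamically determined weights.
    """
    final_scores: dict[str, float] = {}
    for objective, scored_items in subsets.items():
        weight = weights[objective]
        for content_id, score in scored_items.items():
            # Assumption: an item found in several subsets accumulates its weighted scores.
            final_scores[content_id] = final_scores.get(content_id, 0.0) + weight * score
    # Sort by descending score; ties are broken by content id for determinism.
    ranked = sorted(final_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [content_id for content_id, _ in ranked[:n]]


# Example call with three small subsets and illustrative weights.
top_items = fuse_subsets(
    {"click": {"a": 0.9, "b": 0.5},
     "retention": {"b": 0.8, "c": 0.6},
     "diversity": {"d": 1.0}},
    {"click": 0.5, "retention": 0.3, "diversity": 0.2},
    n=3,
)
```

In this example item `b` appears in both the first and second subsets, so its two weighted contributions add up and it overtakes item `a`, whose single click-through score is higher.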
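
As a non-limiting illustration of the real-time statistics recited in claim 7, the sketch below keeps interaction events in a sliding time window and reports click-through rate, average dwell time, and the information entropy of the content categories. It is an in-process stand-in only: the claimed message queue and stream processing engine are not modelled, events are assumed to arrive in timestamp order, and the class name `SlidingWindowMetrics`, the event fields, and the use of base-2 entropy are assumptions of this example.

```python
import math
from collections import Counter, deque


class SlidingWindowMetrics:
    """Keeps the events of the last `window_seconds` and summarizes them."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = deque()  # items: (timestamp, clicked, dwell_seconds, category)

    def add(self, timestamp: float, clicked: bool, dwell_seconds: float, category: str) -> None:
        self.events.append((timestamp, clicked, dwell_seconds, category))
        # Evict events that have fallen out of the window (assumes ordered timestamps).
        while self.events and self.events[0][0] <= timestamp - self.window:
            self.events.popleft()

    def snapshot(self) -> dict[str, float]:
        n = len(self.events)
        if n == 0:
            return {"ctr": 0.0, "avg_dwell": 0.0, "entropy": 0.0}
        clicks = sum(1 for event in self.events if event[1])
        avg_dwell = sum(event[2] for event in self.events) / n
        counts = Counter(event[3] for event in self.events)
        # Shannon entropy (in bits) of the category distribution in the window.
        entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
        return {"ctr": clicks / n, "avg_dwell": avg_dwell, "entropy": entropy}
```

A snapshot taken at the end of each window could then serve as one training sample for the incremental updates described in the claim; higher entropy indicates that the recommended categories are spread more evenly.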

Description

Multi-target content recommendation method and device, computer equipment and storage medium

Technical Field

The present invention relates to the field of artificial intelligence, and in particular to a multi-objective content recommendation method and apparatus, a computer device, and a storage medium.

Background

With the explosive growth of internet content, recommendation systems have become a core tool for efficiently connecting users with massive amounts of content. However, existing content recommendation algorithms still have many limitations in practice. Most algorithms focus on optimizing a single business objective such as click-through rate or conversion rate. This single-objective optimization easily homogenizes the recommended results and forms an "information cocoon", which narrows user interests over time and in turn reduces long-term user retention. To alleviate this problem, some research has turned to multi-objective optimization, trying to balance multiple metrics such as click-through rate, user retention, and content diversity at the same time. However, in these multi-objective recommendation systems the weight of each objective is usually preset from manual experience or configured statically, and is difficult to adjust flexibly according to users' real-time interaction behavior or differentiated application scenarios, such as the new-user cold-start stage or a specific promotion, which leaves the overall system insufficiently adaptive.

Furthermore, whether single-objective or multi-objective, these algorithms often mine content features and user features inadequately. Multi-modal information on the content side, such as text, video, and tags, and behavior sequences and deep interest preferences on the user side cannot be effectively fused and exploited, and the resulting single-dimensional features make it difficult for the algorithm to truly support multi-objective collaborative optimization. In addition, optimization iterations of existing schemes rely heavily on offline historical data for evaluation and lack a feedback closed loop linked in real time to online user behavior, so algorithm updates lag behind dynamic changes in user needs and cannot respond quickly to the current situation.

How to construct a recommendation system that effectively coordinates multiple business objectives, dynamically adapts to changes in scenarios and users, makes full use of multi-dimensional information, and responds to feedback in a timely manner, so as to improve recommendation accuracy and user experience, has become a technical problem to be solved.

Disclosure of Invention

The embodiments of the present invention provide a multi-objective content recommendation method and apparatus, a computer device, and a storage medium, aiming to solve the technical problem of how to construct a recommendation system that effectively coordinates multiple business objectives, dynamically adapts to changes in scenarios and users, makes full use of multi-dimensional information, and responds to feedback in a timely manner, so as to improve recommendation accuracy and user experience.
In a first aspect, an embodiment of the present invention provides a multi-objective content recommendation method, comprising: collecting and preprocessing user data, multi-modal content data, and scene data; extracting features from the preprocessed user data, multi-modal content data, and scene data to obtain a user feature vector, a content feature vector, and a scene feature vector, and fusing the user feature vector, the content feature vector, and the scene feature vector into a unified feature representation through a cross-modal attention mechanism; dynamically determining, based on the current business scenario and real-time feedback data, weights of a plurality of recommendation objectives through a reinforcement learning model, wherein the recommendation objectives comprise at least click-through rate, user retention rate, and content diversity; generating a first recommendation subset based on interaction relationships between users and content, generating a second recommendation subset based on the correlation between the unified feature representation and user interests, and screening, from candidate content, content whose similarity to already-selected content is below a threshold to generate a third recommendation subset; and fusing the first recommendation subset, the second recommendation subset, and the third recommendation subset according to the dynamically determined weights to generate a final recommendation list. In a second aspect, an embodiment of the present invention further provides a multi-objective content recommendation apparatus, which comprises units for executing the above method. In a third aspect, an emb