CN-121350346-B - Personalized recommendation method based on multi-modal feature denoising self-adaptive fusion
Abstract
The application discloses a personalized recommendation method based on multi-modal feature denoising and adaptive fusion, applicable to the technical field of graph neural networks. The method comprises the following steps: performing wavelet-transform frequency-domain denoising on the multi-modal original features to obtain denoised multi-modal features, and fusing the denoised multi-modal features to obtain frequency fusion features; pruning the user-item interaction graph based on node sensitivity to obtain a normalized adjacency matrix; constructing collaborative representations of users and items according to the normalized adjacency matrix; constructing modal representations of users and items according to the denoised multi-modal features; generating a final fused representation of user behavior patterns and modal features according to the frequency fusion features, the collaborative representations of users and items, and the modal representations of users and items; and performing item recommendation prediction according to the final fused representation.
Inventors
- HE CHAOBO
- PENG FEIYU
Assignees
- South China Normal University (华南师范大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20250929
Claims (8)
- 1. A personalized recommendation method based on multi-modal feature denoising and adaptive fusion, characterized by comprising the following steps: constructing multi-modal original features and a user-item interaction graph, wherein the multi-modal original features comprise original features of items under different modalities, and the different modalities comprise a visual modality or a text modality; performing wavelet-transform frequency-domain denoising on the multi-modal original features to obtain denoised multi-modal features, and fusing the denoised multi-modal features to obtain frequency fusion features; pruning the user-item interaction graph based on node sensitivity to obtain a normalized adjacency matrix; constructing collaborative representations of users and items according to the normalized adjacency matrix; constructing modal representations of users and items according to the denoised multi-modal features; generating a final fused representation of user behavior patterns and modal features according to the frequency fusion features, the collaborative representations of users and items, and the modal representations of users and items; and performing item recommendation prediction according to the final fused representation. The step of performing wavelet-transform frequency-domain denoising on the multi-modal original features to obtain denoised multi-modal features, and fusing the denoised multi-modal features to obtain frequency fusion features, comprises: spatially projecting the multi-modal original features to obtain multi-modal projection features; performing a discrete wavelet transform on the multi-modal projection features to obtain the low-frequency and high-frequency components of each modality; calculating cross-modal frequency-aware similarity weights from the low-frequency and high-frequency components; and performing frequency-domain fusion of the low-frequency and high-frequency components according to the frequency-aware similarity weights to obtain the frequency fusion features. The frequency fusion feature is calculated by the following formula [formula omitted in source]; in the formula, the quantities denote, respectively: the frequency fusion feature; the discrete wavelet transform; the frequency-aware similarity weight of the low-frequency components; the frequency-aware similarity weight of the high-frequency components; the low-frequency approximation coefficients of the visual modality; the low-frequency approximation coefficients of the text modality; the denoised high-frequency detail coefficients of the visual modality; the denoised high-frequency detail coefficients of the text modality; the decomposition level of the discrete wavelet transform; and the basis function of the discrete wavelet transform.
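The denoise-and-fuse pipeline of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented formula: a single-level Haar transform stands in for the unspecified wavelet basis and level, soft-thresholding for the unspecified denoising rule, and fixed example weights for the learned frequency-aware similarity weights.

```python
import math

def haar_dwt(x):
    """Single-level Haar DWT: split a feature vector into
    low-frequency (approximation) and high-frequency (detail) parts."""
    lo = [(x[i] + x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    hi = [(x[i] - x[i + 1]) / math.sqrt(2) for i in range(0, len(x) - 1, 2)]
    return lo, hi

def haar_idwt(lo, hi):
    """Inverse single-level Haar DWT."""
    x = []
    for a, d in zip(lo, hi):
        x.append((a + d) / math.sqrt(2))
        x.append((a - d) / math.sqrt(2))
    return x

def soft_threshold(coeffs, tau):
    """Shrink high-frequency detail coefficients toward zero (denoising)."""
    return [math.copysign(max(abs(c) - tau, 0.0), c) for c in coeffs]

def fuse(visual, text, w_lo=0.6, w_hi=0.4, tau=0.1):
    """Denoise each modality in the frequency domain, then fuse the
    low- and high-frequency bands with (example) similarity weights."""
    lo_v, hi_v = haar_dwt(visual)
    lo_t, hi_t = haar_dwt(text)
    hi_v = soft_threshold(hi_v, tau)  # denoise only the detail bands
    hi_t = soft_threshold(hi_t, tau)
    lo = [w_lo * (a + b) for a, b in zip(lo_v, lo_t)]
    hi = [w_hi * (a + b) for a, b in zip(hi_v, hi_t)]
    return haar_idwt(lo, hi)

fused = fuse([0.9, 0.8, 0.1, 0.2], [0.7, 0.6, 0.3, 0.4])
print(len(fused))  # same dimensionality as the inputs: 4
```

The Haar pair is lossless on its own, so all denoising comes from the thresholding step; a deeper wavelet or learned weights can be substituted without changing the structure.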
- 2. The method of claim 1, wherein the frequency-aware similarity weights are calculated by the following formula [formula omitted in source]; in the formula, the quantities denote, respectively: the frequency-aware similarity weight; the low-frequency components; the high-frequency components; and a nonlinear activation function.
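One plausible reading of such a weight (a sketch only, since the claimed formula is not reproduced in this excerpt): score each frequency band by the cross-modal cosine similarity of its components, pass the score through a nonlinear activation, and normalize the two band weights to sum to one.

```python
import math

def cosine(a, b):
    """Cosine similarity between two coefficient vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def frequency_weights(lo_v, lo_t, hi_v, hi_t):
    """Example frequency-aware weights: a nonlinearity applied to the
    cross-modal similarity of each band, normalized across bands."""
    s_lo = sigmoid(cosine(lo_v, lo_t))
    s_hi = sigmoid(cosine(hi_v, hi_t))
    total = s_lo + s_hi
    return s_lo / total, s_hi / total

w_lo, w_hi = frequency_weights([1.0, 0.9], [0.9, 1.0], [0.1, -0.2], [0.2, 0.1])
print(round(w_lo + w_hi, 6))  # weights are normalized: 1.0
```

Here the low-frequency band, where the two modalities agree more, receives the larger weight, which matches the intent of frequency-aware fusion.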
- 3. The method of claim 1, wherein pruning the user-item interaction graph based on node sensitivity to obtain a normalized adjacency matrix comprises: obtaining the degree of each node in the user-item interaction graph; calculating sampling weights according to the node degrees; obtaining the number of edges of the user-item interaction graph and a preset pruning ratio; determining the number of sampled edges according to the number of edges and the preset pruning ratio; performing a pruning operation on the user-item interaction graph according to the normalized sampling weights to obtain a target edge set of the determined size; constructing a pruned adjacency matrix from the target edge set; and normalizing the pruned adjacency matrix to obtain the normalized adjacency matrix.
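The pruning steps of claim 3 can be sketched as follows. The assumptions are flagged in comments: the exact sensitivity-based sampling weight is not given in this excerpt, so an inverse-degree weight (favoring edges that touch low-degree, i.e. more sensitive, nodes) is used as a stand-in, and edges are kept greedily by weight rather than sampled stochastically, for determinism.

```python
def prune_edges(edges, num_nodes, prune_ratio):
    """Keep a fraction of edges, preferring those incident to
    low-degree nodes (an assumed notion of 'node sensitivity')."""
    deg = [0] * num_nodes
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    # Assumed sampling weight: inverse product of endpoint degrees,
    # normalized so the weights sum to 1.
    weights = [1.0 / (deg[u] * deg[v]) for u, v in edges]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Number of edges to keep, from the preset pruning ratio.
    keep = max(1, int(len(edges) * (1.0 - prune_ratio)))
    ranked = sorted(zip(weights, edges), reverse=True)
    return [e for _, e in ranked[:keep]]

# Bipartite user-item edges over 5 nodes (users 0-1, items 2-4).
edges = [(0, 2), (0, 3), (0, 4), (1, 2)]
kept = prune_edges(edges, 5, prune_ratio=0.5)
print(len(kept))  # 2 edges survive a 50% pruning ratio
```

The edge (1, 2) survives because both endpoints have low degree; edges around the high-degree user 0 are pruned first.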
- 4. A method according to claim 3, wherein the normalized adjacency matrix is given by the following expression [formula omitted in source]; in the formula, the quantities denote, respectively: the normalized adjacency matrix; the pruned adjacency matrix; and the degree matrix of the pruned adjacency matrix.
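A sketch of the symmetric normalization commonly used in GCN-based recommenders, assumed here because the exact expression is not legible in this excerpt: each entry A[i][j] is divided by sqrt(deg(i) * deg(j)), i.e. D^(-1/2) A D^(-1/2) with D the degree matrix.

```python
import math

def normalize_adjacency(A):
    """Symmetric normalization D^(-1/2) A D^(-1/2) of an adjacency
    matrix given as a dense list of lists."""
    n = len(A)
    deg = [sum(row) for row in A]  # degree of each node
    inv_sqrt = [1.0 / math.sqrt(d) if d > 0 else 0.0 for d in deg]
    return [[A[i][j] * inv_sqrt[i] * inv_sqrt[j] for j in range(n)]
            for i in range(n)]

# A pruned 3-node graph: node 0 linked to nodes 1 and 2.
A = [[0, 1, 1],
     [1, 0, 0],
     [1, 0, 0]]
A_hat = normalize_adjacency(A)
print(round(A_hat[0][1], 4))  # 1 / sqrt(2 * 1) ≈ 0.7071
```

Isolated nodes (degree 0) are mapped to zero rows rather than causing a division by zero.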
- 5. The method of claim 1, wherein constructing the collaborative representations of users and items according to the normalized adjacency matrix comprises: initializing an ID embedding matrix; constructing a feature propagation rule according to the normalized adjacency matrix; and aggregating the features of all layers of the user-item interaction graph according to the initialized ID embedding matrix and the feature propagation rule to obtain the collaborative representations of users and items.
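The layer-wise propagation and aggregation of claim 5 can be sketched in the style of LightGCN; the concrete rule E^(l+1) = Â E^(l) with a mean over all layers is an assumption, since the claim does not fix the propagation rule.

```python
def matmul(A, B):
    """Dense matrix product over lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def propagate(A_hat, E0, num_layers=2):
    """Propagate ID embeddings E0 over the normalized adjacency A_hat
    and average all layers (LightGCN-style aggregation, assumed)."""
    layers = [E0]
    E = E0
    for _ in range(num_layers):
        E = matmul(A_hat, E)  # one round of neighborhood smoothing
        layers.append(E)
    n, d = len(E0), len(E0[0])
    return [[sum(L[i][j] for L in layers) / len(layers) for j in range(d)]
            for i in range(n)]

# Toy graph: two mutually linked nodes with 2-dim ID embeddings.
A_hat = [[0.0, 1.0], [1.0, 0.0]]
E0 = [[1.0, 0.0], [0.0, 1.0]]
E = propagate(A_hat, E0, num_layers=2)
print(E[0])
```

On this toy graph each propagation step swaps the two embeddings, so the layer average blends a node's own ID embedding with its neighbor's.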
- 6. The method of claim 1, wherein constructing the modal representations of users and items according to the denoised multi-modal features comprises: calculating the cosine similarity between items in each modality; constructing a sparsified similarity matrix according to the inter-item cosine similarities; normalizing the sparsified similarity matrix to obtain a normalized similarity matrix; constructing item modal representations according to the normalized similarity matrix and the denoised multi-modal features; constructing user modal representations from the item modal representations; and concatenating the item modal representations and the user modal representations to obtain the modal representations of users and items.
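A sketch of the item-item semantic graph of claim 6, assuming top-k sparsification and row normalization (both standard choices; the patent's exact sparsification and normalization scheme is not given in this excerpt):

```python
import math

def cosine(a, b):
    """Cosine similarity between two item feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a)) or 1.0
    nb = math.sqrt(sum(x * x for x in b)) or 1.0
    return dot / (na * nb)

def item_similarity_graph(feats, k=1):
    """Keep the top-k most similar neighbors per item (sparsification),
    then row-normalize so each row sums to 1."""
    n = len(feats)
    S = [[0.0] * n for _ in range(n)]
    for i in range(n):
        sims = [(cosine(feats[i], feats[j]), j) for j in range(n) if j != i]
        for s, j in sorted(sims, reverse=True)[:k]:
            S[i][j] = s
        row = sum(S[i]) or 1.0
        S[i] = [v / row for v in S[i]]
    return S

# Three items; the first two have similar (e.g. visual) features.
feats = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
S = item_similarity_graph(feats, k=1)
print(S[0].index(max(S[0])))  # item 0's retained neighbor: item 1
```

Multiplying this normalized matrix by the denoised feature matrix then yields item modal representations smoothed over semantically similar items.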
- 7. The method of claim 1, wherein generating the final fused representation of user behavior patterns and modal features according to the frequency fusion features, the collaborative representations of users and items, and the modal representations of users and items comprises: injecting the frequency fusion features into the collaborative representations of users and items to obtain injected collaborative representations; computing cross-modal features from the injected collaborative representations and the modal representations of users and items; calculating adaptive weights according to the injected collaborative representations, the modal representations of users and items, and a dynamic parameter, wherein the dynamic parameter is adjusted according to the reliability of the collaborative signal; and computing the final fused representation of user behavior patterns and modal features according to the adaptive weights, the modal representations of users and items, and the collaborative representations of users and items.
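The adaptive fusion of claim 7 can be sketched as a gated combination; the gate below (a sigmoid of the agreement between the two representations, scaled by a reliability-driven temperature) is an assumed stand-in for the claimed dynamic-parameter weighting.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def adaptive_fuse(collab, modal, reliability=1.0):
    """Fuse collaborative and modal representations with an adaptive
    gate; `reliability` plays the role of the dynamic parameter,
    sharpening the gate when the collaborative signal is trustworthy."""
    agreement = sum(c * m for c, m in zip(collab, modal))
    w = sigmoid(reliability * agreement)  # adaptive weight in (0, 1)
    return [w * c + (1.0 - w) * m for c, m in zip(collab, modal)]

# Orthogonal toy representations: the gate stays at 0.5 and the
# fusion splits evenly between the two sources.
fused = adaptive_fuse([1.0, 0.0], [0.0, 1.0], reliability=2.0)
print(fused)
```

When the two representations agree (positive inner product), the gate shifts toward the collaborative side; when they conflict, the modal side dominates.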
- 8. The method of claim 1, wherein performing item recommendation prediction according to the final fused representation comprises: calculating a final user representation and a final item representation from the collaborative representations of users and items and the final fused representation, respectively; determining a recommendation prediction score as the inner product of the final user representation and the final item representation; and recommending items according to the recommendation prediction scores.
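The scoring step of claim 8 is standard inner-product ranking; a minimal sketch:

```python
def recommend(user_repr, item_reprs, top_k=2):
    """Score every item by inner product with the user representation
    and return the indices of the top_k highest-scoring items."""
    scores = [sum(u * v for u, v in zip(user_repr, item))
              for item in item_reprs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]

user = [0.8, 0.2]
items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
print(recommend(user, items, top_k=2))  # [0, 2]
```

Items 0 and 2 score 0.8 and 0.5 against this user, so they are returned ahead of item 1 (score 0.2).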
Description
Personalized recommendation method based on multi-modal feature denoising self-adaptive fusion

Technical Field

The application relates to the technical field of graph neural networks, and in particular to a personalized recommendation method based on multi-modal feature denoising and adaptive fusion.

Background

In the related art, recommendation systems have been widely used in fields such as electronic commerce, social media, and content distribution. Current recommendation methods rely on users' historical behavior and item attributes, realizing personalized recommendation through collaborative filtering or content-based approaches, but their precision and coverage are limited under data sparsity. Multimodal Recommender Systems (MRSs) introduce feature information from multiple modalities such as images and text, which helps mine item appearance, semantic descriptions, usage scenarios, and the like, and can characterize user preferences in depth, thereby improving the accuracy and diversity of recommendation results. Multi-modal recommendation methods based on Graph Convolutional Networks (GCNs) can effectively model high-order associations between users and items and capture complex interaction structure information. However, current GCN-based recommendation methods introduce noise pollution during training, and their feature fusion is mostly based on static or linear fusion strategies, so the recommendation results they produce may differ greatly from users' actual demands. In summary, the technical problems in the related art remain to be addressed.
Disclosure of Invention

The embodiments of the application mainly aim to provide a personalized recommendation method based on multi-modal feature denoising and adaptive fusion, which can effectively meet the personalized demand for recommended content in related fields and reduce the difference between recommendation results and actual demands. To this end, one aspect of the embodiments of the present application provides a personalized recommendation method based on multi-modal feature denoising and adaptive fusion, the method comprising the following steps: constructing multi-modal original features and a user-item interaction graph, wherein the multi-modal original features comprise original features of items under different modalities, and the different modalities comprise a visual modality or a text modality; performing wavelet-transform frequency-domain denoising on the multi-modal original features to obtain denoised multi-modal features, and fusing the denoised multi-modal features to obtain frequency fusion features; pruning the user-item interaction graph based on node sensitivity to obtain a normalized adjacency matrix; constructing collaborative representations of users and items according to the normalized adjacency matrix; constructing modal representations of users and items according to the denoised multi-modal features; generating a final fused representation of user behavior patterns and modal features according to the frequency fusion features, the collaborative representations of users and items, and the modal representations of users and items; and performing item recommendation prediction according to the final fused representation.
In some embodiments, performing wavelet-transform frequency-domain denoising on the multi-modal original features to obtain denoised multi-modal features, and fusing the denoised multi-modal features to obtain frequency fusion features, comprises: spatially projecting the multi-modal original features to obtain multi-modal projection features; performing a discrete wavelet transform on the multi-modal projection features to obtain the low-frequency and high-frequency components of each modality; calculating cross-modal frequency-aware similarity weights from the low-frequency and high-frequency components; and performing frequency-domain fusion of the low-frequency and high-frequency components according to the frequency-aware similarity weights to obtain the frequency fusion features. In some embodiments, the frequency fusion feature is calculated by the following formula [formula omitted in source]; in the formula, the quantities denote, respectively: the frequency fusion feature; the discrete wavelet transform; the frequency-aware similarity weight of the low-frequency components; the frequency-aware similarity weight of the high-frequency components; the low-frequency approximation coefficients of the visual modality; the low-frequency approximation coefficients of the text modality; Re