CN-116150480-B - User personalized demand prediction method integrating multi-mode comment information

CN116150480BCN 116150480 BCN116150480 BCN 116150480BCN-116150480-B

Abstract

The invention discloses a user personalized demand prediction method integrating multi-mode comment information, which comprises the steps of 1, constructing a user-commodity purchase relation network based on a user purchase history, 2, constructing the attribute of edges in a network diagram by utilizing multi-mode comment content published by a user, 3, excavating topic distribution of the user on two modes of a text and an image by utilizing a multi-mode variation self-encoder, 4, modeling interaction information between the user and the commodity and semantic information of the multi-mode comment content by utilizing a graph attention network, and 5, selecting a proper loss function to train and optimize a model. The invention combines the variation self-encoder and the graphic annotation meaning network, can fully and comprehensively mine preference information of the user on the commodity from data of two modes, namely a text mode and an image mode, and can accurately characterize the user and the commodity, thereby achieving a more accurate prediction effect of personalized demands of the user.

Inventors

JIANG YUANCHUN
ZHOU FAN
QIAN YANG
LIU YEZHENG
YUAN KUN
CHAI YIDONG

Assignees

合肥工业大学

Dates

Publication Date: 20260505
Application Date: 20230105

Claims (3)

1. A user personalized demand prediction method integrating multi-mode comment information is characterized by comprising the following steps: step1, constructing a data set; step 1.1, utilizing a directed graph Characterizing a purchase relationship network between a user and an article, wherein the user and the article are directed graphs Is provided with a plurality of nodes, each node being connected with a node, Representing a set of all nodes, if a user node Purchasing commodity node Representing the user node And commodity node There are edges between, and is noted as , Representing a set of all edges, and edges Corresponding to a attention coefficient And topic distribution And connect the user node And commodity node Mutually serving as neighbor nodes; step 1.2, acquiring user node For commodity node The comment text of the (2) is divided into words and the words are removed to obtain a word list of the comment text, which is recorded as , wherein, Representing word lists The first of (3) A word; A total number of words representing the word list; Computing user nodes For commodity node Word frequency characterization vectors for comment text over all words in a dictionary , wherein, Representing comment text in dictionary The number of times that a word appears, Representing the number of all non-repeated words in comment texts of all user nodes on commodity nodes; User node For commodity node Length of comment text of (a) After filling with zero vector to reach length N, inputting into BERT model for processing to obtain text initial characterization vector, and recording as , wherein, Representing text initial token vectors The first of (3) Characterization of individual words; Step 1.3, obtaining user node For commodity node After the comment image data of the number (1) is subjected to pixel unification processing, an initial characterization vector of the comment image is obtained and is recorded as , wherein, Representing comment images In the first place The characteristic value in the dimension is used for the feature, Representing the dimension of the initial feature vector; step 2, constructing a multi-mode variation self-encoder network, which comprises a mode encoding module, a mode fusion module and a mode decoding module; Step 2.1, constructing a mode coding module, which comprises a text coding network and an image coding network, wherein the text coding network comprises a Bi-LSTM network and a first full-connection layer, and the image coding network comprises a pretrained VGG-19 network and a second full-connection layer; step 2.1.1 For the current time step, the text initial characterization vector The first of (3) Characterization of individual words Inputting into the Bi-LSTM network to obtain forgetting gate states by using the formulas (1) - (5) Status of input door Output door status Currently the first Cell state for individual time steps And at present the first Unit outputs of individual time steps : (1) (2) (3) (4) (5) In the formulas (1) to (5), Representing the multiplication by element, Is a sigmoid activation function that is activated, Is a function of the hyperbolic tangent, And Two weight coefficients respectively representing the forgetting gate, A bias vector representing a forgetting gate, And Respectively representing two weight coefficients of the input gate, Representing the bias vector of the input gate, And Respectively representing two weight coefficients of the output gate, Representing the offset vector of the output gate, And Two weight coefficients representing the cell units respectively, The offset vector representing the cell unit is shown, Represent the first The state of the cell for a single time step, Represent the first The unit outputs of the time steps, when n=1, let 、 ; Step 2.1.2, outputting the text hidden characteristic by the first full connection layer by using the method (6) ; (6) In the formula (6), the amino acid sequence of the compound, A weight matrix representing the first fully connected layer; represent the first Outputting the unit of each time step; Step 2.1.3 comment image Using Pre-trained VGG-19 network Characterizing to obtain image feature vectors Thereby utilizing the formula (7) to carry out image characteristic vector Coding to obtain hidden image features ; (7) In the formula (7), the amino acid sequence of the compound, A weight matrix representing a second fully connected layer; step 2.2, the mode fusion module calculates a multi-mode sharing representation after text and image fusion by using the mode fusion module (8) ; (8) In the formula (8), the amino acid sequence of the compound, , Representing a normal distribution of the logic, Is the mean of a normal distribution of logic, wherein, Representing user nodes For commodity node Is in the first multi-modal comment content The average value over the individual subjects is calculated, Is the variance of the normal distribution of the logic, wherein, Representing user nodes For commodity node Is in the first multi-modal comment content The variance over the individual subjects is such that, Is a random variable subject to normal distribution with a mean value of 0 and a variance of 1; step 2.3, constructing a mode decoding module which comprises a text decoder network and an image decoder network; The text decoder network pair Performing decoding reconstruction to obtain a text reconstruction feature vector , wherein, Representing text reconstruction feature vectors Middle (f) Characterization after individual word reconstruction; The image decoder network pair Performing decoding reconstruction to obtain an image reconstruction feature vector , wherein, Representing comment images In the first place Maintaining the reconstructed characteristic value; step 3, the processing of the graph attention network: Step 3.1, defining the update times as d, and initializing d=1; Step 3.1.1, calculating the user node of the d-1 st update using (9) Features of (2) ; (9) In the formula (9), the amino acid sequence of the compound, Representing user nodes Is a set of all the neighboring commodity nodes of (c), Representing user nodes The number of all neighboring commodity nodes; Step 3.1.2, calculating the commodity node updated for the d-1 th time by using the method (10) Features of (2) ; (10) In the formula (10), the amino acid sequence of the compound, Representing commodity nodes Is a set of all the neighbor user nodes of (c), Representing commodity nodes Is the number of all neighbor user nodes; Step 3.2, calculating the user node of the d-1 st update using equation (11) Neighboring commodity node The kth head attention coefficient in between ; (11) In the formula (7), the amino acid sequence of the compound, The activation function is represented as a function of the activation, And Is a matrix of two parameters to be learned, User node representing the d-1 st update Neighbor commodity node is divided in neighbor commodity node set Some other neighbor commodity node Is characterized in that, Representing a concatenation operation on the vectors of the vector, Representing a matrix transpose operation; Step 3.3, calculating the user node of the d-th update using equation (12) and equation (13), respectively Feature vectors of (a) And commodity node Features of (2) ; (12) (13) In the formulae (12) - (13), Representing nodes Sum node Edge between In the first place Probability on individual topics; And is also provided with , Representing the number of topics; step 4, constructing a user personalized demand prediction model formed by a multi-mode variation self-encoder network and a drawing meaning network, and training and optimizing; Step 4.1, constructing a loss function of the graph attention network at the d-1 th update using (14) : (14) In the formula (14), the amino acid sequence of the compound, Representing user nodes Is a set of non-neighboring nodes of (a), Representing user nodes Non-neighbor node set of (a) The number of intermediate nodes; Representing user nodes Non-neighbor node set of (a) Is a function of the expectations of all the nodes in the network, User node with table updated at d time Non-neighbor node set of (a) Any node Is characterized by (2); constructing reconstruction loss of comment image data by using (15) : (15) In the formula (15), the amino acid sequence of the compound, Representing a collection of edges Is the expectation of all sides in (a); Reconstruction loss of text data using (16) construction : (16) In the formula (16), the amino acid sequence of the compound, Representing an initial token vector The first of (3) Characterization of individual words Whether or not to be the first in the dictionary Characterization of individual words, if so, then 1, Otherwise, let Is 0; constructing KL loss between true and variational profiles using (17) : (17) Constructing an overall loss function of the user-personalized demand prediction model at the d-th update using (18) : (18) Step 5.2, training the user personalized demand prediction model by utilizing an Adam algorithm, and calculating the integral loss function When the number of updates d reaches the maximum d max or the overall loss function And stopping training when the commodity personalized demand is converged, so that an optimal demand prediction model is obtained and is used for predicting the commodity personalized demand of any user.
2. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the user personalization demand prediction method of claim 1, the processor being configured to execute the program stored in the memory.
3. A computer readable storage medium having a computer program stored thereon, characterized in that the computer program when run by a processor performs the steps of the user personalization demand prediction method of claim 1.

Description

User personalized demand prediction method integrating multi-mode comment information Technical Field The invention relates to the fields of user preference mining and user personalized demand prediction, in particular to a user personalized demand prediction method integrating multi-mode comment information. Background With the development of the internet and electronic commerce platforms, online shopping has become a mainstream shopping method, and users can post multi-modal comment contents (e.g., text, images, etc.). The multi-modal comment content expresses cross-modal semantic information, has information independent of each mode, and contains rich user preference information. The online shopping causes lack of face-to-face communication between merchants and users, and the user preference and demand information is difficult to capture accurately, so it is necessary to organize massive user purchase history data and multi-mode comment data to comprehensively mine the user preference, so as to accurately complete the prediction of the personalized demands of the users. In the existing user preference identification and demand prediction field, the traditional method omits joint modeling of user comment text and comment image data so that multi-modal and fine-grained preference characterization of users cannot be obtained, and in addition, the traditional method cannot jointly model purchase histories of users and multi-modal comment content so that fine-grained preference of users cannot be accurately identified and explanation of purchasing motivations of users is lacking. Disclosure of Invention The invention provides a user personalized demand prediction method integrating multi-mode comment information to overcome the defects of the prior art, so that joint modeling of user comment text and comment image data can be realized through a variation self-encoder model, user multi-mode preference distribution is learned, interaction records of user commodities and interaction reasons of the user commodities (namely, user multi-mode preference distribution) are jointly modeled by combining a graph attention network, and further the representation capacity of the multi-mode data on user preferences can be improved, and granularity of user preference identification and accuracy of personalized demand prediction can be refined. In order to achieve the aim of the invention, the invention adopts the following technical scheme: The invention discloses a user personalized demand prediction method integrating multi-mode comment information, which is characterized by comprising the following steps: step1, constructing a data set; step 1.1, a directed graph G= (V, M) is utilized to represent a purchase relation network between a user and goods, wherein the user and the goods are all nodes in the directed graph G, V represents a set of all nodes, if a user node i purchases a goods node j, the user node i and the goods node j are provided with edges, the edges are marked as e ij epsilon M, M represents a set of all edges, the edge e ij corresponds to an attention coefficient alpha ij and a topic distribution lambda ij, and the user node i and the goods node j are mutually used as neighbor nodes; step 1.2, obtaining comment text of the commodity node j by the user node i, and obtaining a word list of the comment text after word segmentation and word deactivation removal processing, wherein the word list is recorded as Wherein, the N t represents the total number of words of the word list; calculating word frequency representation vectors of comment texts of user node i on commodity node j on all words in dictionary Wherein, the The number of occurrence of the W-th word of the comment text in the dictionary is represented, and W represents the number of all unrepeated words in the comment text of all user nodes to commodity nodes; filling the length N t of comment text of commodity node j by user node i with zero vector to reach length N, inputting into BERT model, processing to obtain text initial characterization vector, and recording as Wherein, the Representing a representation of an nth word in the text initial representation vector T ij; Step 1.3, obtaining comment image data of commodity node j by user node i, performing pixel unification processing, obtaining an initial characterization vector of the comment image, and marking the initial characterization vector as Wherein, the Representing the feature value of the comment image P ij in the S-th dimension, S representing the dimension of the initial feature vector; step 2, constructing a multi-mode variation self-encoder network, which comprises a mode encoding module, a mode fusion module and a mode decoding module; Step 2.1, constructing a mode coding module, which comprises a text coding network and an image coding network, wherein the text coding network comprises a Bi-LSTM network and a first full-connection layer, and the image coding network comprises a