
CN-116644226-B - Transformer-based item-behavior cross-sequence recommendation method

CN 116644226 B

Abstract

The invention discloses a Transformer-based item-behavior cross-sequence recommendation method, which relates to the fields of artificial intelligence and recommendation systems and comprises the following steps: 1) item-behavior sequence modeling, which obtains a user-interaction item sequence and a user-interaction behavior sequence; 2) Transformer-based sequence recommendation modeling, which converts one-hot sparse items into dense embedded representation vectors through embedding lookup, computes the recommendation probability of each item from the sequence-level embedded representation, and recommends the item with the highest probability to the user; 3) encoder-based sequence representation modeling, which models the two sequences as a single sequence and learns the embedded representation of that sequence; and 4) self-attention-based item-behavior cross-sequence fusion modeling. The invention provides important guidance for multi-sequence recommendation system construction, sequence information modeling, multi-sequence fusion, and related problems.

Inventors

  • XIANG XIAOHONG
  • DENG XIN
  • Zhao Wangshanyin
  • ZHANG FUYUAN
  • CHEN ZEYU
  • LI RONGPENG

Assignees

  • Chongqing University of Posts and Telecommunications (重庆邮电大学)

Dates

Publication Date
2026-05-08
Application Date
2023-04-19

Claims (6)

  1. A Transformer-based item-behavior cross-sequence recommendation method, characterized by comprising the following steps: 1) an item-behavior sequence modeling step: acquire a user-interaction item sequence and a user-interaction behavior sequence, and model the two sequences; 2) a Transformer-based sequence recommendation modeling step: convert one-hot sparse items into dense embedded representation vectors through embedding lookup, learn the embedded representation of the sequence using the encoder structure of the Transformer to obtain a sequence-level embedded representation of the user's preference, compute the recommendation probability of each item from the sequence-level embedding using a normalized exponential function, and recommend the item with the highest recommendation probability to the user; 3) an encoder-based sequence representation modeling step: model the two sequences as a single sequence and learn the embedded representation of that sequence, fusing the element-level embeddings produced by the user's multiple interaction sequences into a single-sequence embedded representation through a self-attention mechanism, and learning the sequence-level embedded representation with a feedforward neural network; 4) a self-attention-based item-behavior cross-sequence fusion modeling step: fuse the user's multiple interaction sequences into a single sequence, using the self-attention Query, Key and Value vectors to fuse the element-level embedded representations of the sequences into the element-level embedded representation of the single sequence (see the pipeline sketch following the claims).
     The item-behavior sequence modeling of step 1) comprises acquiring a user-interaction item sequence and a user-interaction behavior sequence and modeling the two sequences, specifically: converting the discrete user-interaction items and user-interaction behaviors into corresponding dense embedded representation vectors through embedding lookup, computing the similarity between the sequence embedding vector and the embedding vector of each item with a similarity function, computing the recommendation probability of each item with a Softmax function, and recommending the item with the highest recommendation probability to the user.
     Converting the discrete user-interaction items and user-interaction behaviors into corresponding dense embedded representation vectors through embedding lookup specifically comprises: first define the sequence set $S = \{s_1, s_2, \ldots, s_{|S|}\}$, where $S$ is the set containing all sequences, $s_i$ is the $i$-th sequence in $S$, and $|S|$ is the number of sequences in the set; each sequence $s_i = (v_i, b_i)$, where $v_i$ is the user-interaction item sequence of the $i$-th sequence and $b_i$ is the user-interaction behavior sequence of the $i$-th sequence. For each item sequence $v_i = [v_{i,1}, v_{i,2}, \ldots, v_{i,n_i}]$, $v_{i,j}$ is the $j$-th item of the $i$-th sequence, $n_i$ is the length of the $i$-th item sequence, and $v_{i,j} \in V$, where $V$ is the set of all items. For each user behavior sequence $b_i = [b_{i,1}, b_{i,2}, \ldots, b_{i,m_i}]$, $b_{i,j}$ is the $j$-th user behavior of the $i$-th sequence and $m_i$ is the length of the $i$-th user interaction behavior sequence; obviously $m_i = n_i$ and $b_{i,j} \in B$, where $B$ is the set of all user behaviors.
     Looking up the embedded representations of the user-interaction item sequence $v_i$ and the user-interaction behavior sequence $b_i$ is expressed by formula (12): $E_v = \mathrm{Embed}_V(v_i) \in \mathbb{R}^{n_i \times d_v}, \; E_b = \mathrm{Embed}_B(b_i) \in \mathbb{R}^{n_i \times d_b}$ (12), where $E_v$ is the embedded representation of the user-interaction item sequence $v_i$, $n_i$ is the sequence length, $d_v$ is the dimension of the item embedding vector, and $\mathrm{Embed}_V$ obtains the embedded representation of each numbered item by lookup from the item embedding matrix $M_V \in \mathbb{R}^{|V| \times d_v}$; $E_b$ is the embedded representation of the user-interaction behavior sequence $b_i$, $d_b$ is the dimension of the behavior embedding vector, and $\mathrm{Embed}_B$ obtains the embedded representation of each numbered user behavior by lookup from the behavior embedding matrix $M_B \in \mathbb{R}^{|B| \times d_b}$.
     Next, an encoder built from $L$ cross-sequence coding layers produces the embedded representation vectors of all elements of the sequence, as shown in formula (13): $H = \mathrm{Encoder}(E_v, E_b) = [H^{(1)}; H^{(2)}; \ldots; H^{(L)}]$ (13), where $H$ is the output of the cross-sequence encoder, formed by splicing the outputs of the $L$ cross-sequence coding layers. Then the last ($n_i$-th) element embedding of the $L$-th layer is taken out as the representation vector of the entire sequence, as shown in formula (14): $h_s = \mathrm{Take}(H^{(L)}, n_i)$ (14), where $h_s$ is the embedded representation of the entire sequence and $\mathrm{Take}(\cdot)$ denotes the vector-extraction operation. Then a similarity function $\mathrm{Sim}$ computes the similarity score between the sequence embedding vector $h_s$ and the embedding of every item, and a normalized exponential function $\mathrm{Softmax}$ yields the recommendation probability of each item, as shown in formula (15): $\hat{y} = \mathrm{Softmax}(\mathrm{Sim}(h_s, M_V))$ (15), where $\hat{y}$ is a vector of size $|V|$ whose values lie in the interval $(0, 1)$. The similarity function uses the inner product of formula (16) to quickly compute the similarity between vectors: $\mathrm{Sim}(h_s, M_V) = h_s M_V^{\top}$ (16).
  2. The Transformer-based item-behavior cross-sequence recommendation method of claim 1, wherein the cross-sequence encoder is composed of a multi-head cross-sequence fusion module, a feedforward neural network, random deactivation (dropout), an activation function, and residual connections. First, the embedded representation $E_v$ of the item sequence and the embedded representation $E_b$ of the user interaction behaviors are input into the multi-head cross-sequence fusion module, as shown in formula (19): $F = \mathrm{MHCSF}(E_v, E_b)$ (19), where $F$ is the embedded representation of the sequence fusing the user-interaction item sequence and the user-interaction behavior sequence, and $\mathrm{MHCSF}$ is the multi-head cross-sequence fusion function. Then the fused sequence embedding $F$ is residually connected with the item-sequence embedding $E_v$ and layer-normalized, as shown in formula (20): $F' = \mathrm{LayerNorm}(F + E_v)$ (20), where $F'$ is the layer-normalized fusion sequence and $\mathrm{LayerNorm}$ is the layer normalization function. The layer-normalized fusion sequence $F'$ is then input into the feedforward neural network, as shown in formula (21): $F'' = \mathrm{Activate}(F' W_1 + b_1)$ (21), where $F''$ is the fusion sequence after the feedforward neural network, $\mathrm{Activate}$ is the activation function, $W_1$ is a weight matrix, and $b_1$ is an offset vector. Finally, the fusion sequence $F''$ is passed through a dense connection and random deactivation, then residually connected with the layer-normalized fusion sequence $F'$ and layer-normalized again, as shown in formula (22): $O = \mathrm{LayerNorm}(F' + \mathrm{Dropout}(F'' W_2 + b_2))$ (22), where $O$ is the layer-normalized fusion sequence, $\mathrm{LayerNorm}$ is the layer normalization function, $\mathrm{Dropout}$ is the random-deactivation function, $W_2$ is a weight matrix, and $b_2$ is an offset vector.
  3. The Transformer-based item-behavior cross-sequence recommendation method of claim 2, wherein the multi-head cross-sequence self-attention fusion function $\mathrm{MHCSF}$ fuses the embedded representation $E_v$ of the user-interaction item sequence and the embedded representation $E_b$ of the user-interaction behavior sequence into the Query vector, Key vector and Value vector of a multi-head self-attention mechanism; attention scores are computed from the Query and Key vectors, and the Value vectors are weighted and summed with the attention scores to obtain a new embedded representation. On this Transformer-style processing the fusion modes are built, and the fusion of the Query and Key vectors is divided into two types: fusion before the linear transformation and fusion after the linear transformation.
  4. The Transformer-based item-behavior cross-sequence recommendation method according to claim 3, wherein fusion before the linear transformation performs sequence fusion before the two sequences undergo their linear transformations, yielding the embedded representation of the fused sequence; the fused embedding is then linearly transformed and matrix-multiplied to obtain the attention matrix, the attention scores are obtained through a Softmax function, and finally the scores are multiplied with the linearly transformed embedded representation of the user-interaction item sequence to obtain the final embedded representation.
  5. The Transformer-based item-behavior cross-sequence recommendation method of claim 4, wherein fusion before the linear transformation comprises additive fusion, multiplicative fusion and concatenation fusion: additive fusion adds the sequences to be fused element-wise to obtain the fused embedded representation; multiplicative fusion multiplies corresponding elements of the sequences element-wise to fuse the embedded representation of the user-interaction item sequence with that of the user-interaction behavior sequence; concatenation fusion fuses the sequences by concatenating their embedded representations (the three modes are illustrated in the sketch following the claims).
  6. The Transformer-based item-behavior cross-sequence recommendation method of claim 4, wherein fusion after the linear transformation comprises double-QK fusion, double-K fusion, double-QK attention fusion, and triple-QK fusion.
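For concreteness, the following is a minimal, hypothetical PyTorch sketch of the claim-1 pipeline: embedding lookup (formula (12)), an encoder standing in for the cross-sequence encoder (formula (13)), last-position pooling (formula (14)), and inner-product scoring with Softmax (formulas (15)-(16)). All names (ItemBehaviorRecommender, d_model, n_layers) are illustrative assumptions, and a standard Transformer encoder over additively fused sequences stands in for the patent's cross-sequence encoder.

import torch
import torch.nn as nn

class ItemBehaviorRecommender(nn.Module):
    def __init__(self, num_items, num_behaviors, d_model=64, n_layers=2, n_heads=2):
        super().__init__()
        self.item_emb = nn.Embedding(num_items, d_model)          # item embedding matrix M_V
        self.behavior_emb = nn.Embedding(num_behaviors, d_model)  # behavior embedding matrix M_B
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)     # stand-in for the cross-sequence encoder

    def forward(self, items, behaviors):
        # Formula (12): one-hot ids -> dense embeddings via lookup
        e_v = self.item_emb(items)          # (batch, n, d)
        e_b = self.behavior_emb(behaviors)  # (batch, n, d)
        # Simplification: fuse the two sequences additively before encoding;
        # the patent's multi-head cross-sequence fusion is sketched separately below.
        h = self.encoder(e_v + e_b)         # cf. formula (13)
        h_s = h[:, -1, :]                   # cf. formula (14): last element as sequence embedding
        # Formulas (15)-(16): inner-product similarity with every item, then Softmax
        scores = h_s @ self.item_emb.weight.T
        return torch.softmax(scores, dim=-1)

# Usage: recommend the highest-probability item for a toy sequence.
model = ItemBehaviorRecommender(num_items=1000, num_behaviors=4)
items = torch.randint(0, 1000, (1, 10))
behaviors = torch.randint(0, 4, (1, 10))
probs = model(items, behaviors)
print(probs.argmax(dim=-1))  # id of the recommended item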
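Next, a sketch of one cross-sequence coding layer following formulas (19)-(22) of claim 2, under the assumption that the multi-head cross-sequence fusion is realized with a standard multi-head attention over additively pre-fused inputs (one of the claim-5 variants); class and parameter names are illustrative, not from the patent.

import torch
import torch.nn as nn

class CrossSequenceLayer(nn.Module):
    def __init__(self, d_model=64, n_heads=2, d_ff=128, p_drop=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.w1 = nn.Linear(d_model, d_ff)   # W_1, b_1 of formula (21)
        self.w2 = nn.Linear(d_ff, d_model)   # W_2, b_2 of formula (22)
        self.drop = nn.Dropout(p_drop)       # "random deactivation"

    def forward(self, e_v, e_b):
        # cf. formula (19): fuse the item and behavior sequences inside attention;
        # additive fusion of the inputs is one pre-linear-transformation variant.
        fused = e_v + e_b
        f, _ = self.attn(fused, fused, e_v)            # Query/Key from fused seq, Value from items
        f1 = self.norm1(f + e_v)                       # formula (20): residual with E_v + LayerNorm
        f2 = torch.relu(self.w1(f1))                   # formula (21): feedforward + activation
        return self.norm2(f1 + self.drop(self.w2(f2)))  # formula (22)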
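Finally, the three pre-linear-transformation fusion modes of claim 5 reduce to one-line tensor operations; the snippet below illustrates them on toy tensors. The shapes are assumptions, and a real model would project the concatenated variant back to dimension d.

import torch

e_v = torch.randn(1, 10, 64)  # item-sequence embeddings (batch, seq_len, d)
e_b = torch.randn(1, 10, 64)  # behavior-sequence embeddings

additive = e_v + e_b                      # element-wise addition
multiplicative = e_v * e_b                # element-wise (Hadamard) product
concatenated = torch.cat([e_v, e_b], -1)  # concatenation fusion: (1, 10, 128)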

Description

Transformer-based item-behavior cross-sequence recommendation method

Technical Field

The invention relates to recommendation systems and deep-learning methods, in particular to the fields of feedforward neural networks, Transformers and deep learning.

Background

Advantages such as excellent performance and rich modeling modes have revived deep learning and brought breakthrough progress to many fields. The learning task, the loss function and the optimization method are three important basic concepts of deep learning.

Learning tasks in deep learning can be classified into classification tasks and regression tasks according to the type of label. When the label takes a finite number of discrete values with no explicit numerical meaning between them, the task is a classification task; when the label is continuous-valued or carries an explicit numerical meaning, the task is a regression task.

A loss function maps random events to non-negative real numbers in order to characterize their risk. The choice of loss function is typically tied to the learning task: the deep-learning model is optimized indirectly by minimizing the loss function, and different learning tasks call for different loss functions that describe the optimization objective of the model as accurately as possible. In deep learning, given independently and identically distributed training samples $(X, y)$ and a deep-learning model, the loss function can be defined as $L(w) = D(p(y \mid X), y)$, which measures the difference between the probability distribution output by the model and the observation, where $D$ is a function measuring that difference, $w$ denotes the model parameters, and $p(y \mid X)$ is the probability of obtaining the observation $y$ given the sample features $X$.

For regression problems there are two common loss functions: the $L_1$ loss $L_1(\hat{y}, y) = |\hat{y} - y|$ and the $L_2$ loss $L_2(\hat{y}, y) = (\hat{y} - y)^2$; both measure, in different ways, the distance between the estimate $\hat{y}$ and the observation $y$. The deep-learning model gradually minimizes its loss value through an optimization algorithm, driving it to a local or global minimum.

For classification problems, the 0-1 loss is a common accuracy metric, as shown in equation (1):

$L_{0\text{-}1}(\hat{y}, y) = \begin{cases} 0, & \hat{y} = y \\ 1, & \hat{y} \neq y \end{cases}$ (1)

where $\hat{y}$ is the estimate and $y$ the observed value. However, because discontinuous functions are hard to optimize in deep learning, a common approach is to replace the 0-1 loss with a surrogate loss function, e.g. the cross-entropy loss $L_{CE}(\hat{y}, y) = -\sum_c y_c \log \hat{y}_c$. Because of its smoothness and unbiasedness, the cross-entropy loss is often chosen to compute the model error and optimize the model in deep-learning classification tasks. Once the loss function of the model is determined, the model parameters are updated continuously by an optimizer so that the model keeps moving in the direction that reduces the loss; updating stops when the model converges at some minimum, yielding the model used for online inference or offline testing.
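As a small illustration of the surrogate-loss point above, the snippet below (toy values, not from the patent) compares the discontinuous 0-1 loss of equation (1) with the smooth cross-entropy surrogate on one prediction.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, 0.5, -1.0]])  # model scores for 3 classes
target = torch.tensor([0])                 # observed label y

zero_one = (logits.argmax(dim=-1) != target).float()  # equation (1): 0 here, prediction is correct
cross_entropy = F.cross_entropy(logits, target)       # smooth, differentiable surrogate
print(zero_one.item(), cross_entropy.item())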
A feedforward neural network (Feedforward Neural Network, FNN) is composed of an input layer, hidden layers and an output layer. FIG. 1 is a schematic diagram of a feedforward neural network with two hidden layers. Each node in a hidden layer is called a neuron; the neurons within a layer are independent of each other, while any two neurons in adjacent layers are connected, each connection consisting of a linear transformation and a nonlinear activation. The linear transformation recombines the input data according to a weight matrix, as shown in equation (2):

$x_l = W_f x + b_f$ (2)

where $x_l$ denotes the linearly recombined data, $x$ the input data, $W_f$ the weight matrix of the feedforward neural network, and $b_f$ the offset vector. The nonlinear activation provides a nonlinear transformation of the input data, as shown in equation (3):

$x_a = \mathrm{Activate}_f(x_l)$ (3)

where $x_a$ denotes the activated output, $\mathrm{Activate}_f$ the nonlinear activation function, and $x_l$ the linearly recombined data. Common activation functions include $\mathrm{ReLU}(x_l) = \max(x_l, 0)$ and $\mathrm{LeakyReLU}(x_l) = \max(\alpha x_l, x_l)$ with a small slope $\alpha$, among others. As shown in FIG. 2(a), when the input value is greater than 0 the derivative of the ReLU function is constant at 1; otherwise both the output value and the derivative are 0. Therefore, when the input value is smaller than 0, no gradient is produced during back propagation and the corresponding parameters receive no update.
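A minimal sketch of one hidden-layer step of equations (2)-(3), with illustrative shapes:

import torch
import torch.nn as nn

linear = nn.Linear(8, 16)        # holds W_f and b_f
x = torch.randn(4, 8)            # a batch of input vectors
x_l = linear(x)                  # equation (2): linear recombination
x_a = torch.relu(x_l)            # equation (3): ReLU activation, max(x_l, 0)
leaky = nn.functional.leaky_relu(x_l, negative_slope=0.01)  # LeakyReLU alternative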