CN-122022949-A - Commodity recommendation method based on federated deep reinforcement learning
Abstract
The invention discloses a commodity recommendation method based on federated deep reinforcement learning, and relates to the technical field of distributed computing. The method constructs a commodity recommendation framework based on a federated deep reinforcement learning model, the framework comprising a cloud center and a plurality of local models, each local model storing and corresponding to its own local user data. A commodity rating matrix and commodity rating vectors are first constructed from the local user data; the commodity rating vectors are input into the local model, which outputs predicted ratings as single-step actions; single-step action rewards are computed from the error between the predicted ratings and the true ratings, and a global action reward is obtained from the historical information; the local model parameters are updated by combining the global action reward; the cloud center aggregates the locally updated parameters, generates global model parameters by combining the global action rewards, and issues the global model parameters to each local model to complete model training. Finally, an efficient, privacy-preserving commodity recommendation service is realized through the trained local models.
Inventors
- WANG PENG
- WANG XUN
- WU TONG
Assignees
- ZHEJIANG GONGSHANG UNIVERSITY (浙江工商大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260127
Claims (9)
- 1. A commodity recommendation method based on federated deep reinforcement learning, characterized by comprising the following steps: obtaining local user data, wherein the local user data comprises a user ID, a commodity ID and the user's commodity rating; constructing a commodity recommendation framework based on a federated deep reinforcement learning model, wherein the commodity recommendation framework comprises a cloud center and a plurality of sub-content service centers, each sub-content service center stores local user data, and each sub-content service center corresponds to a local federated deep reinforcement learning model, recorded as a local model, the local model adopting a DNN; constructing a commodity rating matrix from the local user data stored in each sub-content service center, and determining commodity rating vectors of each sub-content service center from the commodity rating matrix; training the local models with the commodity rating vectors, wherein the training comprises inputting the commodity rating vectors into the local model, outputting the absolute error between the predicted commodity rating corresponding to each commodity rating vector and the true commodity rating, recording the absolute error as the single-step action reward, summing the single-step action rewards of all commodity rating vectors in each local model and taking the average to obtain the global action reward of each local model, collecting the global action rewards of all local models through the cloud center, and performing a weighted average with the global action rewards to obtain global network parameters; inputting a target user ID and a target commodity ID into the commodity recommendation framework with the trained local models, outputting the predicted commodity rating of the target user for the target commodity, and obtaining a commodity recommendation result.
- 2. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein determining the commodity rating vector of each sub-content service center from the commodity rating matrix comprises: constructing a first commodity rating vector from the ratings of each user's nearest-neighbor users in the commodity rating matrix, constructing a second commodity rating vector from the ratings of each commodity's nearest-neighbor commodities in the commodity rating matrix, and combining the first commodity rating vector with the second commodity rating vector to obtain the commodity rating vector, wherein the first commodity rating vector reflects the interest similarity among users, the second commodity rating vector reflects the attribute similarity among commodities, and the combined commodity rating vector comprehensively represents the collaborative characteristics of the user-commodity pair (a code sketch of this construction appears after the claims).
- 3. The commodity recommendation method based on federated deep reinforcement learning according to claim 2, wherein, when the rating of a nearest-neighbor user or a nearest-neighbor commodity is missing, the rating is completed as follows: if the user has not rated the target commodity, the rating of a rated nearest-neighbor commodity of that user is used as a substitute; if the commodity has not been rated by the target user, the rating of a rated nearest-neighbor commodity of that commodity is used as a substitute.
- 4. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein the local federated deep reinforcement learning model comprises a state space, an action space and a dual-reward mechanism; the state space is composed of the commodity rating vectors and represents the collaborative characteristics of user-commodity pairs, the action space is the set of predicted commodity rating values, i.e., each action corresponds to one rating value, the dual-reward mechanism comprises the single-step action reward and the global action reward, and the local federated deep reinforcement learning model gradually optimizes its rating prediction strategy through the state-action-reward interaction process (a code sketch of the dual-reward computation appears after the claims).
- 5. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein the local federated deep reinforcement learning model corresponding to each sub-content service center adopts a dual-network structure comprising a main network and a target network, and the network parameters of the local model are updated according to the single-step action reward, specifically comprising: the main network determines the main-network action value function from the commodity rating vector and the single-step action reward of the current training period, with the formula $$Q_{\pi_t^k}(s_t^k, a_t^k) = Q(s_t^k, a_t^k; \theta) + \alpha\left[r_t^k + \gamma \max_{a} Q(s_{t+1}^k, a; \theta) - Q(s_t^k, a_t^k; \theta)\right]$$ where $Q_{\pi_t^k}(s_t^k, a_t^k)$ is the main-network action value of action $a_t^k$ in state $s_t^k$; $r_t^k$ is the single-step action reward obtained by executing action $a_t^k$ under commodity rating vector $s_t^k$; $Q(\cdot,\cdot;\theta)$ is the action value function fitted by the DNN with network parameters $\theta$; $\alpha$ is the learning rate; $\gamma$ is a compromise factor that weighs the importance of different steps of the current policy within a training period; $\pi_t^k$ is the policy of the $k$-th sub-content service center at step $t$ of the current training period, $a_t^k$ is the corresponding action, $s_t^k$ is the commodity rating vector at step $t$, recorded as the state, and $s_{t+1}^k$ is the state at step $t+1$; the target network determines the target-network action value from the commodity rating vector of the next step and the single-step action reward of the current step, with the formula $$y_t^k = r_t^k + \gamma \max_{a_{t+1}^k} Q'(s_{t+1}^k, a_{t+1}^k; \theta^-)$$ where $Q'$ is the action value function of the target network, $\theta^-$ is the target-network parameter, and $a_{t+1}^k$ is the action at step $t+1$; a loss function is defined from the main-network action value function and the target-network action value function, with the formula $$L(\theta) = \mathbb{E}\left[\left(y_t^k - Q(s_t^k, a_t^k; \theta)\right)^2\right]$$ where $L(\theta)$ is the loss function and $\mathbb{E}$ denotes the expected value; the parameters of the local model main network are updated by gradient descent according to the derivative of the loss function, with the formula $$\theta \leftarrow \theta - \eta \nabla_\theta L(\theta)$$ where $\eta$ is the learning rate that determines the size of the main-network parameter update and $\nabla_\theta L(\theta)$ is the derivative of the loss function; the parameters of the target network are periodically synchronized with the parameters of the main network (a code sketch of this update appears after the claims).
- 6. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein the global network parameters are determined as follows: the cloud center calculates the weight of the global action reward of each sub-content service center in the global parameter aggregation; according to these weights, the locally updated parameters of the sub-content service centers are aggregated by weighted averaging to obtain the global network parameters, with the formula $$\theta_{\mathrm{global}} = \sum_{k \in \mathcal{K}} w_k\, \theta_k$$ where $\theta_{\mathrm{global}}$ is the global network parameter determined by the cloud center; $\theta_k$ is the locally updated parameter of sub-content service center $k$, whose global action reward is $R_k$; $K$ is the total number of sub-content service centers; $w_k$ is the global action reward weight of sub-content service center $k$, calculated by an attention mechanism; and $\mathcal{K}$ is the set of sub-content service centers, with $|\mathcal{K}| = K$ (a code sketch of this aggregation appears after the claims).
- 7. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein the local data set of each sub-content service center is randomly shuffled before each training period of that sub-content service center's local model begins.
- 8. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein the local model of the sub-content service center uses an experience replay mechanism during training, comprising: storing the state, action, reward and next state of each step of the local model in a replay buffer as historical experience data; and, when the replay buffer is full, randomly sampling historical experience data to update the local model parameters (a code sketch of the replay and exploration mechanisms appears after the claims).
- 9. The commodity recommendation method based on federated deep reinforcement learning according to claim 1, wherein, during training, the local model is controlled to select either a random action or the optimal action according to a preset exploration probability threshold.
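The following is a minimal sketch of the commodity rating vector construction described in claims 2 and 3, assuming a dense NumPy rating matrix with 0 marking missing ratings, cosine similarity for finding nearest neighbors, and a hypothetical neighborhood size `n_neighbors`; none of these specifics are fixed by the claims, and the missing-rating fallback is a simplified stand-in for the substitution rule of claim 3.

```python
import numpy as np

def nearest_neighbors(sim_row, self_idx, k):
    """Indices of the k most similar rows/columns, excluding the element itself."""
    order = np.argsort(-sim_row)
    return [i for i in order if i != self_idx][:k]

def rating_vector(R, u, i, n_neighbors=5):
    """Combined commodity rating vector for user u and commodity i.

    R: (num_users, num_items) rating matrix, 0 = missing.
    Returns the concatenation of the first (user-side) and second
    (item-side) rating vectors described in claim 2.
    """
    # Cosine similarities between users and between commodities.
    user_sim = R @ R.T / (np.linalg.norm(R, axis=1, keepdims=True)
                          * np.linalg.norm(R, axis=1) + 1e-8)
    item_sim = R.T @ R / (np.linalg.norm(R, axis=0, keepdims=True).T
                          * np.linalg.norm(R, axis=0) + 1e-8)

    # First vector: ratings of commodity i by u's nearest-neighbor users;
    # if a neighbor user has not rated i, fall back to one of that user's
    # rated commodities (claim 3 substitution, simplified).
    first = []
    for v in nearest_neighbors(user_sim[u], u, n_neighbors):
        r = R[v, i]
        if r == 0:
            rated = np.flatnonzero(R[v])
            r = R[v, rated[0]] if rated.size else 0.0
        first.append(r)

    # Second vector: u's ratings of i's nearest-neighbor commodities;
    # if u has not rated a neighbor commodity, fall back to one of u's
    # rated commodities (claim 3 substitution, simplified).
    second = []
    for j in nearest_neighbors(item_sim[i], i, n_neighbors):
        r = R[u, j]
        if r == 0:
            rated = np.flatnonzero(R[u])
            r = R[u, rated[0]] if rated.size else 0.0
        second.append(r)

    return np.concatenate([first, second]).astype(np.float32)
```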
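A minimal sketch of the dual-reward mechanism of claims 1 and 4: the single-step action reward is the absolute error between the predicted and the true rating, and the global action reward of a local model is the average of its single-step rewards; function names are illustrative only.

```python
def single_step_reward(predicted_rating: float, true_rating: float) -> float:
    """Single-step action reward: the absolute error between the predicted
    and the true commodity rating, as stated in claim 1. (A practical
    variant might negate or invert this so smaller errors are rewarded
    more; the claim only names the absolute error.)"""
    return abs(predicted_rating - true_rating)

def global_action_reward(single_step_rewards: list[float]) -> float:
    """Global action reward of one local model: the mean of the single-step
    rewards over all commodity rating vectors processed in the period."""
    return sum(single_step_rewards) / len(single_step_rewards)
```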
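A minimal PyTorch sketch of the dual-network update of claim 5, assuming a small fully connected DNN, a discrete set of candidate rating actions, and hypothetical hyperparameters (hidden size, discount factor, synchronization period); the claim does not fix any of these choices.

```python
import copy
import torch
import torch.nn as nn

class RatingDQN(nn.Module):
    """DNN that maps a commodity rating vector (state) to one Q-value per
    candidate rating (action), as in claims 4 and 5."""
    def __init__(self, state_dim: int, num_ratings: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_ratings),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

def dqn_update(main_net, target_net, optimizer, batch, gamma=0.9):
    """One gradient-descent step on L(theta) = E[(y - Q(s, a; theta))^2]."""
    states, actions, rewards, next_states = batch  # pre-built tensors
    q_sa = main_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Target value y = r + gamma * max_a' Q'(s', a'; theta^-).
        y = rewards + gamma * target_net(next_states).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # theta <- theta - eta * grad L(theta)
    return loss.item()

def sync_target(main_net, target_net):
    """Periodically copy the main-network parameters into the target network."""
    target_net.load_state_dict(copy.deepcopy(main_net.state_dict()))
```

A main network and its target copy would be created once per sub-content service center, with the optimizer (for example `torch.optim.SGD`, matching the plain gradient-descent update of claim 5) holding the main-network parameters.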
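A minimal sketch of the cloud-side aggregation of claim 6, assuming the attention mechanism reduces to a softmax over the collected global action rewards and that local parameters are exchanged as flat NumPy vectors; both are simplifying assumptions, since the claim only requires reward-derived weights obtained by an attention mechanism.

```python
import numpy as np

def aggregate_global_parameters(local_params, global_rewards):
    """Cloud-center aggregation: theta_global = sum_k w_k * theta_k.

    local_params  : list of K flat parameter vectors, one per sub-content
                    service center (assumed representation).
    global_rewards: list of K global action rewards R_k.
    The weight w_k is computed here as a softmax over the rewards, a
    simple stand-in for the attention mechanism named in claim 6.
    """
    rewards = np.asarray(global_rewards, dtype=np.float64)
    weights = np.exp(rewards - rewards.max())
    weights /= weights.sum()                 # w_k, sums to 1
    stacked = np.stack(local_params)         # shape (K, num_params)
    return (weights[:, None] * stacked).sum(axis=0)
```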
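A minimal sketch of the experience replay mechanism of claim 8 and the exploration rule of claim 9, assuming a fixed-capacity buffer and a constant exploration probability `epsilon`; the claims leave the capacity, sampling batch size and threshold schedule open.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores (state, action, reward, next_state) tuples as historical
    experience data (claim 8)."""
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def is_full(self) -> bool:
        return len(self.buffer) == self.buffer.maxlen

    def sample(self, batch_size: int):
        """Randomly sample historical experience for a parameter update."""
        return random.sample(self.buffer, batch_size)

def select_action(q_values, epsilon: float) -> int:
    """Epsilon-greedy rule of claim 9: with probability epsilon take a
    random action, otherwise the action with the highest Q-value."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return int(max(range(len(q_values)), key=lambda a: q_values[a]))
```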
Description
Commodity recommendation method based on federated deep reinforcement learning
Technical Field
The application relates to the technical field of distributed computing, and in particular to a commodity recommendation method based on federated deep reinforcement learning.
Background
A personalized commodity recommendation method identifies a user's preference for commodities by analyzing the user's historical interaction data, so as to provide commodity suggestions customized for the user and ensure that the user receives relevant and attractive recommendations. In the prior art, existing commodity recommendation methods based on federated learning alleviate the sparsity of the basic rating data (specifically, user-commodity ratings, i.e., each user's rating of each commodity) by introducing additional auxiliary information such as the users' social relationships and commodity attributes, thereby ensuring the accuracy of commodity rating prediction. However, in a recommendation scenario where only the basic rating data are available, the rating prediction performance of the prior art cannot be guaranteed and its accuracy drops noticeably.
Disclosure of Invention
In view of the above, it is desirable to provide a commodity recommendation method based on federated deep reinforcement learning. The invention adopts the following technical scheme. The invention provides a commodity recommendation method based on federated deep reinforcement learning, which comprises the following steps: obtaining local user data, wherein the local user data comprises a user ID, a commodity ID and the user's commodity rating; constructing a commodity recommendation framework based on a federated deep reinforcement learning model, wherein the commodity recommendation framework comprises a cloud center and a plurality of sub-content service centers, each sub-content service center stores local user data, and each sub-content service center corresponds to a local federated deep reinforcement learning model, recorded as a local model, the local model adopting a DNN; constructing a commodity rating matrix from the local user data stored in each sub-content service center, and determining commodity rating vectors of each sub-content service center from the commodity rating matrix; training the local models with the commodity rating vectors, wherein the training comprises inputting the commodity rating vectors into the local model, outputting the absolute error between the predicted commodity rating corresponding to each commodity rating vector and the true commodity rating, recording the absolute error as the single-step action reward, summing the single-step action rewards of all commodity rating vectors in each local model and taking the average to obtain the global action reward of each local model, collecting the global action rewards of all local models through the cloud center, and performing a weighted average with the global action rewards to obtain global network parameters; inputting a target user ID and a target commodity ID into the commodity recommendation framework with the trained local models, outputting the predicted commodity rating of the target user for the target commodity, and obtaining a commodity recommendation result (a minimal prediction sketch follows below).
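To make the final prediction step concrete, here is a minimal sketch of serving a recommendation with a trained local model, reusing the hypothetical `rating_vector` and `RatingDQN` helpers sketched after the claims and assuming the discrete actions map to rating values through a simple `action_to_rating` list; none of these names appear in the patent.

```python
import torch

def predict_rating(model, R, user_id, item_id, action_to_rating):
    """Predicted rating of `user_id` for `item_id` with a trained local model."""
    state = torch.from_numpy(rating_vector(R, user_id, item_id)).unsqueeze(0)
    with torch.no_grad():
        action = int(model(state).argmax(dim=1).item())  # best action = best rating bin
    return action_to_rating[action]

def recommend(model, R, user_id, action_to_rating, top_n=10):
    """Rank the user's unrated commodities by predicted rating."""
    unrated = [i for i in range(R.shape[1]) if R[user_id, i] == 0]
    scored = [(i, predict_rating(model, R, user_id, i, action_to_rating))
              for i in unrated]
    return sorted(scored, key=lambda x: x[1], reverse=True)[:top_n]
```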
Preferably, determining the commodity rating vector of each sub-content service center from the commodity rating matrix specifically includes: constructing a first commodity rating vector from the ratings of each user's nearest-neighbor users in the commodity rating matrix, constructing a second commodity rating vector from the ratings of each commodity's nearest-neighbor commodities in the commodity rating matrix, and combining the first commodity rating vector with the second commodity rating vector to obtain the commodity rating vector, wherein the first commodity rating vector reflects the interest similarity among users, the second commodity rating vector reflects the attribute similarity among commodities, and the combined commodity rating vector comprehensively represents the collaborative characteristics of the user-commodity pair. Preferably, when the rating of a nearest-neighbor user or a nearest-neighbor commodity for the target commodity or by the target user is missing, the rating is completed as follows: if the user has not rated the target commodity, the rating of a rated nearest-neighbor commodity of that user is used as a substitute; if the commodity has not been rated by the target user, the rating of a rated nearest-neighbor commodity is used as a substitute. Preferably, the local federated deep reinforcement learning model includes a state space, an action space and a dual-reward mechanism; the state space is composed of the commodity rating vectors and represents the collaborative characteristics of user-commodity pairs.