
CN-115481316-B - Multi-model knowledge-fusion distillation recommendation model


Abstract

The invention discloses a multi-model fusion knowledge distillation recommendation model comprising an ensemble learning module and a student module. The ensemble learning module uses a fully connected layer to perform weighted voting on the predictions of a DeepFM model, a DIN model, and an MMDIN model to obtain the final prediction, with the voting weights adaptively adjusted by gradient descent; the student module adopts a shallow DIN structure and uses soft labels to guide the convergence of the student model. The invention adds the fully connected layer, integrates the advantages of the three deep learning models (DeepFM, DIN, and MMDIN), and updates the parameters of the fully connected layer by gradient descent. The parameters of the model are initialized and their range of variation is limited, so that the ensemble model converges faster and better and simulates the voting scenario of each model.
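Below is a minimal sketch of the ensemble voting idea described in the abstract, assuming PyTorch. The fully connected "voting" layer, its equal-vote initialization, the sigmoid stand-in for the patent's new excitation function, and the weight-clamping range are illustrative assumptions; the bodies of the patent's formulas (1) and (2) are not reproduced in this text, so the sketch does not implement them exactly.

```python
# A hedged sketch, not the patent's exact method: a fully connected layer
# that "votes" over the scalar predictions of three pre-trained teachers
# (DeepFM, DIN, MMDIN), with its weights trained by gradient descent and
# clamped to a prescribed range after each step.
import torch
import torch.nn as nn

class VotingEnsemble(nn.Module):
    def __init__(self, n_models: int = 3, w_min: float = 0.0, w_max: float = 1.0):
        super().__init__()
        # One weight per sub-model prediction; initialized to equal votes.
        self.fc = nn.Linear(n_models, 1, bias=False)
        nn.init.constant_(self.fc.weight, 1.0 / n_models)
        self.w_min, self.w_max = w_min, w_max

    def forward(self, preds: torch.Tensor) -> torch.Tensor:
        # preds: (batch, n_models) stacked outputs of the three sub-models.
        # Sigmoid is an assumed stand-in for the patent's excitation
        # function, which likewise maps the output into [0, 1].
        return torch.sigmoid(self.fc(preds)).squeeze(-1)

    @torch.no_grad()
    def constrain(self) -> None:
        # Keep the voting weights inside the prescribed range after each
        # gradient-descent step, mimicking the patent's parameter limits.
        self.fc.weight.clamp_(self.w_min, self.w_max)

# Usage: after loss.backward() and optimizer.step(), call ensemble.constrain().
```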

Inventors

  • LI SHAOBO
  • YANG MINGBAO
  • ZHOU PENG
  • WANG KUN
  • ZHANG QIANFU
  • ZHANG JUNXING

Assignees

  • Guizhou University (贵州大学)

Dates

Publication Date
2026-05-08
Application Date
2022-09-01

Claims (3)

  1. A multi-model fusion knowledge distillation recommendation model, characterized by comprising an ensemble learning module and a student module, wherein the ensemble learning module uses a fully connected layer to perform weighted voting on the predictions of a DeepFM model, a DIN model, and an MMDIN model to obtain the final prediction, the voting weights are adaptively adjusted by gradient descent, the input of the ensemble learning module together with its predicted value is passed to the student model for training, and the student module adopts a shallow DIN structure and uses soft labels to guide the convergence of the student model; the DIN model introduces an attention mechanism: the original sparse input features and non-numerical data are encoded into dense feature vectors, the current movie is combined by outer product with each of the five movies most recently rated by the user, the results are concatenated and passed through PReLU and sigmoid activations to obtain the similarity between the current movie and each recently rated movie, and each recently rated movie is weighted by its similarity and sum-pooled to obtain the user's most recent interest point (a hedged sketch of this pooling follows the claims); the MMDIN model predicts the user's rating by introducing image features of the items on the basis of DIN and adds a multi-head mechanism so that the model can extract features along different dimensions, being divided into a multi-modal module, an attention-mechanism module, and a multi-layer neural-network module, where the multi-modal module is responsible for extracting image color features; the parameters of the fully connected layer are initialized in a designated manner, a constraint is applied, and a rate of change is set to prescribe how fast the parameters may vary, the parameter calculation being shown in formula (1), where V represents the current parameter value, P the previous parameter value, R the rate of change, V_min the minimum value defined for the parameter, and V_max the maximum value defined for the parameter; meanwhile, a new excitation function is designed, whose calculation is shown in formula (2), where x is the input, y the output, b an initial factor, and k a proportional adjustment coefficient; when the output of each sub-model is defined on [0, 1], the new excitation function makes the output range of the ensemble model [0, 1], namely the range of the final output score.
  2. The multi-model fusion knowledge distillation recommendation model according to claim 1, wherein the loss function of the student module's knowledge distillation model is designed as shown in formula (3): Lsum = L1·α + L2·(1 − α) (3), where Lsum represents the total loss, L1 the loss between the soft values and the model's predictions, L2 the loss between the true values and the predictions, and α is the distillation coefficient.
  3. The model of claim 1, wherein the gradient-descent parameter update formula is: θ = θ − α(h(x) − y)·x (4), where θ represents the parameter to be updated, α the learning rate (a hyperparameter), h(x) the output of the last neuron, y the true value, and x the input.
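The sketch referenced in claim 1 follows: a hedged PyTorch illustration of the DIN-style interest pooling, in which the candidate movie interacts with the five most recently rated movies via outer products, a PReLU-plus-sigmoid network scores the similarity, and the recent items are similarity-weighted and sum-pooled. All shapes and layer widths are assumptions, not values from the patent.

```python
# A hedged sketch of the interest pooling described in claim 1. Only the
# overall flow (outer interaction with the 5 recent items, PReLU + sigmoid
# similarity, weighted sum pooling) follows the claim text; embedding size
# and hidden width are illustrative assumptions.
import torch
import torch.nn as nn

class InterestPooling(nn.Module):
    def __init__(self, dim: int = 16):
        super().__init__()
        # Scores one (recent item, candidate item) interaction at a time.
        self.att = nn.Sequential(
            nn.Linear(dim * dim, 32), nn.PReLU(),
            nn.Linear(32, 1), nn.Sigmoid(),
        )

    def forward(self, recent: torch.Tensor, candidate: torch.Tensor) -> torch.Tensor:
        # recent: (batch, 5, dim) embeddings of the 5 recently rated movies;
        # candidate: (batch, dim) embedding of the current movie.
        b, n, d = recent.shape
        # Outer product of each recent item with the candidate: (batch, 5, d, d).
        outer = recent.unsqueeze(-1) * candidate.unsqueeze(1).unsqueeze(-2)
        sim = self.att(outer.reshape(b, n, d * d))   # similarity: (batch, 5, 1)
        return (sim * recent).sum(dim=1)             # weighted sum pooling
```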

Description

Multi-model knowledge-fusion distillation recommendation model

Technical Field

The invention belongs to the technical field of recommendation-model optimization and relates to a multi-model fusion knowledge distillation recommendation model.

Background

As society has moved into the information age, people produce large amounts of information every moment and also spend considerable time browsing internet information. How to enable users to find the information that interests them within this mass of internet information has become an important subject of current research. Likewise, in the aerospace-equipment manufacturing industry there is the problem of finding better upstream service providers among a mass of services. The aviation-equipment manufacturing cloud-service platform attracts a large number of manufacturing service merchants, and enabling users to obtain the information they need from a large number of manufacturing cloud services has become important research content for platform developers; recommendation algorithms are an important way to solve this problem. To help users better obtain the information of interest to them, experts and scholars have proposed various methods, from machine learning to deep learning. However, research on integrating recommendation models with complementary advantages to enhance recommendation performance remains scarce, and several difficulties persist: the parameter count of an integrated recommendation model is large; the prediction performance of a single recommendation model is poor; deep-learning recommendation models are difficult to integrate; integration methods lack parameter self-adaptation; the integrated model is too large, its storage footprint is excessive, and its inference speed is too slow; and the excitation function of the integrated model cannot simulate the model-voting scenario while compressing the input and output to between 0 and 1, so an excitation function must be specially designed for this simulation.

Disclosure of the Invention

The invention aims to solve the technical problem of providing a multi-model fusion knowledge distillation recommendation model that addresses the above problems in the prior art. The technical scheme is as follows: the multi-model fusion knowledge distillation recommendation model comprises an ensemble learning module and a student module, wherein the ensemble learning module performs weighted voting on the predictions of a DeepFM model, a DIN model, and an MMDIN model using a fully connected layer to obtain the final prediction, the voting weights are adaptively adjusted by gradient descent, the input of the ensemble learning module together with its predicted value is passed to the student model for training, the student module adopts a shallow DIN structure, and soft labels are used to guide the convergence of the student model.
The parameters of the fully connected layer are initialized in a designated manner, a constraint is applied, and a rate of change is set to prescribe how fast the parameters may vary; the parameter calculation is shown in formula (1), where V represents the current parameter value, P the previous parameter value, R the rate of change, V_min the minimum value defined for the parameter, and V_max the maximum value defined for the parameter. Meanwhile, a new excitation function is designed, whose calculation is shown in formula (2), where x is the input, y the output, b an initial factor, and k a proportional adjustment coefficient; when the output of each sub-model is defined on [0, 1], the new excitation function makes the output range of the ensemble model [0, 1], namely the range of the final output score. The loss function of the student module's knowledge distillation model is designed as shown in formula (3):

Lsum = L1·α + L2·(1 − α) (3)

where Lsum represents the total loss, L1 the loss between the soft values and the model's predictions, L2 the loss between the true values and the predictions, and α is the distillation coefficient. The gradient-descent parameter update formula is:

θ = θ − α(h(x) − y)·x (4)

where θ represents the parameter to be updated, α the learning rate (a hyperparameter that must be set and tuned manually), h(x) the output of the last neuron (an expression the model learns by fitting the data), y the true value, and x the input. Compared with the prior art, the invention has the beneficial effects that a fully connected layer is added, the advantages of the three deep learning models DeepFM, DIN, and MMDIN are integrated, and gradient descent is used to update the parameters of the fully connected layer. The parameters of the model are initialized and their range of variation is limited, so that the ensemble model converges faster and better and simulates the voting scenario of each model.
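For concreteness, below is a minimal sketch of formulas (3) and (4), assuming mean-squared-error losses and scalar ratings; note that the patent uses α both as the distillation coefficient in formula (3) and as the learning rate in formula (4), so the sketch gives them distinct names.

```python
# A minimal sketch of formulas (3) and (4) from the description. The choice
# of MSE for L1 and L2 is an assumption; the patent only names the losses.
import torch
import torch.nn.functional as F

def distillation_loss(student_pred, teacher_soft, target, distill_alpha=0.5):
    # Formula (3): Lsum = alpha * L1 + (1 - alpha) * L2, where L1 compares
    # the student with the teacher's soft labels and L2 with the true values.
    l1 = F.mse_loss(student_pred, teacher_soft)
    l2 = F.mse_loss(student_pred, target)
    return distill_alpha * l1 + (1.0 - distill_alpha) * l2

def sgd_step(theta, h_x, y, x, lr=0.01):
    # Formula (4): theta <- theta - lr * (h(x) - y) * x, the plain
    # gradient-descent update for a single linear neuron.
    return theta - lr * (h_x - y) * x
```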