CN-122019998-A - Missing value completion prediction method based on Tucker tensor decomposition
Abstract
A missing-value completion prediction method based on Tucker tensor decomposition relates to the field of research on and application of missing values in recycling evaluation results for second-hand electronic products. The method comprises the steps of 1) removing abnormal data based on a box plot, 2) removing feature attributes with a high missing rate, 3) searching, based on mutual information, for feature attributes highly correlated with the recycling evaluation result value, and 4) constructing a Tucker tensor decomposition completion model on a training set and verifying the accuracy of the model on a test set, finally realizing completion prediction of the missing recycling evaluation result values of second-hand electronic products. The specific construction and prediction process comprises four sub-steps: initializing a sparse tensor and the factor matrices, computing the completion prediction tensor, computing the difference loss between the real tensor and the prediction tensor and descending along the gradient, and repeating the latter three sub-steps until convergence. After model training is completed, the invention can be directly provided to relevant practitioners for use, thereby realizing efficient recycling.
Inventors
- SU XING
- WANG AIJUN
- DU YONGPING
- HAN HONGGUI
Assignees
- Beijing University of Technology (北京工业大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-29
Claims (6)
- 1. A missing-value completion prediction method based on Tucker tensor decomposition, characterized by comprising the following steps: first, removing abnormal data based on a box plot; second, removing feature attributes with a high missing rate; third, searching, based on mutual information, for feature attributes highly correlated with the recycling evaluation result value; fourth, constructing a Tucker tensor decomposition completion prediction model on training data and verifying the accuracy of the model on a test set; finally, achieving completion prediction of the missing values in the recycling evaluation results of second-hand electronic products.
- 2. The method for missing-value completion prediction based on Tucker tensor decomposition according to claim 1, characterized in that the first step adopts the box-plot data-rejection method introduced by Tukey: the maximum value (Max), minimum value (Min), lower quartile (QL) and upper quartile (QU) of the box plot are calculated, from which the inter-quartile range IQR = QU − QL is obtained; the lower limit line takes the value max{QL − 1.5·IQR, Min} and the upper limit line takes the value min{QU + 1.5·IQR, Max}, so that abnormal data above the upper limit line or below the lower limit line can be rejected.
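The Tukey box-plot rejection rule of claim 2 can be sketched as follows (a minimal NumPy illustration; the function name and the use of the 25th/75th percentiles as quartile estimators are assumptions, not specified by the patent):

```python
import numpy as np

def remove_outliers_iqr(values):
    """Remove outliers with Tukey's box-plot rule:
    lower fence = max{QL - 1.5*IQR, Min}, upper fence = min{QU + 1.5*IQR, Max}."""
    values = np.asarray(values, dtype=float)
    ql, qu = np.percentile(values, [25, 75])     # lower / upper quartiles
    iqr = qu - ql                                # inter-quartile range
    lower = max(ql - 1.5 * iqr, values.min())   # lower limit line
    upper = min(qu + 1.5 * iqr, values.max())   # upper limit line
    return values[(values >= lower) & (values <= upper)]
```

Data above the upper limit line or below the lower limit line are simply dropped before the remaining preprocessing steps.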
- 3. The method for missing-value completion prediction based on Tucker tensor decomposition according to claim 1, wherein step two eliminates those original samples whose feature-attribute missing rate exceeds 50%.
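The 50% missing-rate filter of claim 3 might look like this (a hedged sketch; the patent does not specify how missing attribute values are encoded, so NaN is assumed):

```python
import numpy as np

def drop_sparse_samples(X, max_missing=0.5):
    """Drop samples (rows) whose fraction of missing (NaN) feature
    attributes exceeds the threshold (50% in claim 3)."""
    X = np.asarray(X, dtype=float)
    missing_rate = np.isnan(X).mean(axis=1)  # per-sample missing fraction
    return X[missing_rate <= max_missing]
```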
- 4. The method of claim 1, wherein in the third step, before formally constructing the Tucker tensor model, a mutual-information feature-attribute screening strategy is adopted; the strategy comprehensively considers the correlation between each feature attribute and the recycling evaluation result value in order to select the feature attributes that contribute most to the model. Mutual information (MI) is a measure of the statistical dependence between two random variables; the mutual information between Z and T is described as follows: I(Z; T) = Σ_z Σ_t p(z, t) · log( p(z, t) / (p(z) · p(t)) ), wherein p(z) is the marginal distribution of the random variable Z, p(t) is the marginal distribution of the random variable T, p(z, t) is the joint distribution of (Z, T), and I(Z; T) represents the mutual-information value between the two random variables Z and T. In practical terms, the random variable Z represents a feature attribute of the second-hand electronic product, and the random variable T represents the recycling evaluation result value of the second-hand electronic product to be completion-predicted. The value range of the mutual information is mapped to between 0 and 1 to obtain the more representative normalized mutual information (NMI), and judgment is carried out according to the value of the final normalized mutual information: NMI(Z; T) = 2 · I(Z; T) / (H(Z) + H(T)), wherein H(Z) represents the entropy of the random variable Z, H(T) represents the entropy of the random variable T, I(Z; T) represents the mutual-information value between the two random variables Z and T, and NMI(Z; T) represents the normalized mutual-information value between them. Finally, the 5 feature attributes with the highest normalized mutual-information values are retained to construct the Tucker decomposition completion tensor model.
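The MI/NMI screening of claim 4 can be illustrated for discrete variables as follows (a sketch; the patent does not fix the logarithm base or the exact normalization, so natural logarithms and normalization by the mean entropy are assumptions):

```python
import numpy as np
from collections import Counter

def entropy(labels):
    """Shannon entropy H of a discrete variable, in nats."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))

def mutual_information(z, t):
    """I(Z;T) = sum_{z,t} p(z,t) * log( p(z,t) / (p(z) p(t)) )."""
    n = len(z)
    joint, pz, pt = Counter(zip(z, t)), Counter(z), Counter(t)
    mi = 0.0
    for (zi, ti), c in joint.items():
        p_zt = c / n
        # p(z,t) / (p(z) p(t)) simplifies to c*n / (count_z * count_t)
        mi += p_zt * np.log(p_zt * n * n / (pz[zi] * pt[ti]))
    return mi

def normalized_mi(z, t):
    """NMI in [0, 1]; normalizing by the mean entropy is one common choice."""
    hz, ht = entropy(z), entropy(t)
    if hz == 0.0 or ht == 0.0:
        return 0.0
    return 2.0 * mutual_information(z, t) / (hz + ht)
```

Ranking all candidate feature attributes by `normalized_mi` against the evaluation result value and keeping the top 5 reproduces the screening step.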
- 5. The method of claim 1, wherein the fourth step constructs a Tucker decomposition completion tensor model based on training data, with the test set verifying the accuracy of the model. Before the fourth step is formally carried out, the definitions describing Tucker decomposition completion are introduced. Tucker decomposition factorizes a tensor into the product of a core tensor and one factor matrix per dimension; the Tucker decomposition of an N-order tensor X is as follows: X ≈ G ×₁ A⁽¹⁾ ×₂ A⁽²⁾ … ×_N A⁽ᴺ⁾, wherein G is called the core tensor, A⁽ⁿ⁾ is called a factor matrix, and ×_n represents the mode product of a tensor in the n-th dimension. After the first three steps, the 5 feature attributes with the highest normalized mutual-information values are retained, namely ID lock, function options, color, storage capacity and memory. The tensor constructed from the training data set is defined as the sparse tensor X ∈ ℝ^(I₁×I₂×I₃×I₄×I₅), wherein ℝ represents the set of real numbers, I₁ represents the length of the dimension interacting with the ID-lock feature attribute, I₂ the length of the dimension interacting with the function-option feature attribute, I₃ the length of the dimension interacting with the color feature attribute, I₄ the length of the dimension interacting with the storage-capacity feature attribute, and I₅ the length of the dimension interacting with the memory feature attribute. The core tensor G ∈ ℝ^(J₁×J₂×J₃×J₄×J₅) represents the interaction core of the 5 feature attribute dimensions. The factor matrix A⁽¹⁾ ∈ ℝ^(I₁×J₁) represents the ID-lock feature attribute, wherein I₁ is the number of ID-lock feature attribute values and J₁ is the dimension length interacting with the core tensor G, kept identical to the corresponding dimension of G and thus taking the value J₁ = 3; the factor matrices A⁽²⁾ ∈ ℝ^(I₂×J₂), A⁽³⁾ ∈ ℝ^(I₃×J₃), A⁽⁴⁾ ∈ ℝ^(I₄×J₄) and A⁽⁵⁾ ∈ ℝ^(I₅×J₅) analogously represent the function-option, color, storage-capacity and memory feature attributes, with the numbers of respective attribute values as row counts and J₂ = J₃ = J₄ = J₅ = 3. The error tolerance ε is defined to represent the gap between successive values of the loss function Loss; the learning rate η of gradient descent and the contribution weight λ of the regularization penalty term are defined; X̂ represents the complete tensor after the missing evaluation result values are completion-predicted through Tucker tensor decomposition; Ω represents the set of positions of non-zero elements in the tensor, so that x_Ω denotes the non-zero elements of the sparse tensor X and x̂_Ω denotes the elements of the completed tensor X̂ at those positions; ×_n represents the mode product of a tensor in the n-th dimension; a⁽ⁿ⁾ᵢ represents the i-th row vector of the n-th feature attribute factor matrix; and ⊗ represents the Kronecker product of matrices. On the basis of these definitions, a Tucker tensor decomposition completion prediction model is constructed, the accuracy of the model is verified on the test set, and completion prediction of the missing recycling evaluation result values of second-hand electronic products is finally realized. The specific construction and prediction flow is simplified into four sub-steps: step 1, initialize a 5-dimensional Tucker sparse tensor X based on the existing training data set, and according to the initialized sparse tensor initialize a 5-dimensional core tensor G and the corresponding factor matrices A⁽¹⁾…A⁽⁵⁾ of the 5 dimensions; step 2, construct the 5-dimensional Tucker tensor X̂ after missing-value completion prediction of the recycling evaluation results, based on the gradient-descent iterative algorithm of Tucker tensor decomposition completion; step 3, calculate and compare the difference loss between the predicted tensor X̂ and the true tensor X; step 4, update and adjust the core tensor G and the factor matrices according to the difference value based on the gradient-descent update algorithm; repeat steps 2, 3 and 4 until convergence.
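A single completed entry of step 2, i.e. the contraction of the core tensor with one factor-matrix row per dimension, can be sketched as follows (a NumPy illustration; the function name and `einsum` formulation are assumptions):

```python
import numpy as np

def predict_entry(G, factors, idx):
    """Predict one tensor entry x̂ = G ×1 a_i^(1) ×2 a_j^(2) ... ×5 a_m^(5):
    contract the 5-order core tensor with one row vector per factor matrix."""
    rows = [A[i] for A, i in zip(factors, idx)]  # one row per dimension
    return float(np.einsum('abcde,a,b,c,d,e->', G, *rows))
```

Evaluating `predict_entry` at every position in Ω gives the elements x̂_Ω compared against the observed values x_Ω in step 3.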
- 6. The method for missing-value completion prediction based on Tucker tensor decomposition according to claim 5, wherein the gradient-descent-based update algorithm is specifically as follows. First, the sparse Tucker tensor X constructed from the training set is input; the grid element values inside the tensor represent evaluation result values of second-hand electronic products. A minimum error value ε is then input and used as the condition for stopping the final gradient-descent iteration, i.e. convergence is reached when the difference loss between the output tensor X̂ after missing-value completion and the true sparse Tucker tensor X stabilizes. Step 1 randomly initializes, according to the definition of the Tucker tensor decomposition formula, a 5-order core tensor G representing the core interaction relation of the 5 feature attribute dimensions of the second-hand electronic product, and then initializes the 5 factor matrices A⁽¹⁾, A⁽²⁾, A⁽³⁾, A⁽⁴⁾, A⁽⁵⁾, whose internal values come from the basic attribute values of the 5 feature attribute dimensions. Step 2 sets the hyper-parameter λ, representing the degree of suppression of the regularization term in the loss function of the gradient-descent iterative computation of the final error. Step 3 sets the learning rate η, the step length of the Tucker decomposition completion learning formula during gradient descent; the larger η is, the faster gradient descent approaches the convergence minimum, but the minimum may simultaneously be skipped so that convergence fails. Steps 4 to 16 constitute one complete epoch iteration; after each complete iteration, the difference between the previous loss and the current loss is calculated, and when it is less than the minimum error value ε the gradient descent is considered stabilized and convergence is reached. Steps 5–13 traverse the training-set sample data of the sparse Tucker tensor X and back-propagate based on the gradient form of the Tucker decomposition formula to update the core tensor parameters and the factor matrix parameters, wherein ×_n represents the mode product of a tensor in the n-th dimension and ⊗ represents the Kronecker product of matrices. The step-6 formula, based on the Tucker tensor decomposition definition, takes the product of the core tensor G with the corresponding factor-matrix row vectors in each of the 5 dimensions to obtain the new completed element x̂. The step-7 formula derives from the Tucker decomposition formula a gradient update for the ID-lock feature attribute matrix A⁽¹⁾, back-propagating the error between the completion-predicted element and the real element value to update the corresponding row of A⁽¹⁾. The step-8 formula likewise updates the corresponding row of the function-option feature attribute matrix A⁽²⁾; the step-9 formula updates the corresponding row of the color feature attribute matrix A⁽³⁾; the step-10 formula updates the corresponding row of the storage-capacity feature attribute matrix A⁽⁴⁾; the step-11 formula updates the corresponding row of the memory feature attribute matrix A⁽⁵⁾. The step-12 formula updates the entire core tensor G by back-propagating the error between the completion-predicted element and the real element value, the assignment symbol representing an update-replacement operation. Step 13 indicates that every non-zero element of the sparse tensor has been traversed, and step 14 updates and replaces the previous loss with the current loss. Step 15 calculates the loss function: Loss = Σ_{(i₁,…,i₅)∈Ω} ( x_{i₁…i₅} − x̂_{i₁…i₅} )² + λ ( ‖G‖² + ‖A⁽¹⁾‖² + ‖A⁽²⁾‖² + ‖A⁽³⁾‖² + ‖A⁽⁴⁾‖² + ‖A⁽⁵⁾‖² ), wherein X is the sparse Tucker tensor constructed from the training set, G is the latest core tensor parameter after training, the first term is the least-squares result computed over all non-zero elements and the corresponding elements after Tucker decomposition completion, A⁽¹⁾…A⁽⁵⁾ are the trained and updated feature-attribute parameter matrices, ‖G‖² is the L2 regularization term of the core tensor, and ‖A⁽¹⁾‖²…‖A⁽⁵⁾‖² are the L2 regularization terms of the ID-lock, function-option, color, storage-capacity and memory feature attribute factor matrices respectively. Step 16 closes one complete epoch iteration, i.e. one iteration traversal of the complete training data, and step 17 represents obtaining the completed tensor X̂.
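The gradient-descent loop of claim 6 can be sketched end to end as follows (an illustrative NumPy implementation under assumed hyper-parameter values and random initialization, not the patent's exact formulas; sequential per-element SGD updates are assumed):

```python
import numpy as np

def _loss(G, A, observed, lam):
    """Squared error over observed entries plus L2 regularization (step 15)."""
    sq = sum((x - np.einsum('abcde,a,b,c,d,e->', G,
                            *[A[n][i[n]] for n in range(5)])) ** 2
             for i, x in observed.items())
    reg = lam * (np.sum(G ** 2) + sum(np.sum(M ** 2) for M in A))
    return float(sq + reg)

def tucker_complete(observed, shape, rank=3, lr=0.01, lam=0.01,
                    eps=1e-6, max_epochs=200, seed=0):
    """SGD sketch of the claim-6 loop: `observed` maps 5-index tuples to
    known evaluation values; returns core tensor, factors, loss history."""
    rng = np.random.default_rng(seed)
    G = rng.normal(scale=0.3, size=(rank,) * 5)                 # step 1: core tensor
    A = [rng.normal(scale=0.3, size=(n, rank)) for n in shape]  # step 1: factor matrices
    losses = [_loss(G, A, observed, lam)]
    letters = 'abcde'
    for _ in range(max_epochs):                                 # steps 4-16: one epoch
        for idx, x in observed.items():                         # steps 5-13: traverse Ω
            rows = [A[n][idx[n]].copy() for n in range(5)]
            xhat = np.einsum('abcde,a,b,c,d,e->', G, *rows)     # step 6: predicted entry
            e = xhat - x                                        # prediction error
            for n in range(5):                                  # steps 7-11: factor rows
                sub = (letters + ',' +
                       ','.join(letters[m] for m in range(5) if m != n) +
                       '->' + letters[n])
                g = np.einsum(sub, G, *[rows[m] for m in range(5) if m != n])
                A[n][idx[n]] -= lr * (e * g + lam * A[n][idx[n]])
            outer = np.einsum('a,b,c,d,e->abcde', *rows)        # step 12: core tensor
            G -= lr * (e * outer + lam * G)
        losses.append(_loss(G, A, observed, lam))               # steps 14-15
        if abs(losses[-2] - losses[-1]) < eps:                  # convergence check
            break
    return G, A, losses
```

Once converged, any missing entry is completed by contracting `G` with the relevant factor-matrix rows, exactly as in the per-element prediction of step 6.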
Description
Missing value completion prediction method based on Tucker tensor decomposition
Technical Field
The invention relates to the field of research on and application of missing values in recycling evaluation results for second-hand electronic products, in particular to a missing-value completion prediction method based on Tucker tensor decomposition.
Background
With the rapid development of technology, people replace their electronic products faster and faster, and a large number of second-hand electronic products are generated worldwide. However, only 20% of this electronic waste is recorded and recycled. These second-hand electronics contain many reusable rare heavy metals and even some strategic resources. If improperly recycled, however, they may release large amounts of harmful substances that affect the surrounding soil, groundwater and air, and thereby human health. At the same time, second-hand electronic products also have high recovery value: increasing the recovery rate of second-hand electronic equipment can significantly improve both economic and environmental value. Therefore, a method for recycling evaluation and prediction of second-hand electronic products is urgently needed, so as to realize more efficient recycling. In recent years, in order to realize recycling evaluation prediction of second-hand electronic products, researchers in the relevant evaluation-prediction field have proposed mathematical-modeling approaches that model the association between various characteristic factors and the evaluation prediction result, so that higher evaluation-prediction accuracy can be achieved. One smartphone evaluation model uses the fuzzy comprehensive evaluation method, improving the accuracy and practicability of evaluation through objective weighting and weighted-average operators.
Others analyze the behavior of consumers and recyclers based on game theory, establishing a comprehensive static-information game-theoretic model and a corresponding assessment prediction model. In addition, another group of scholars has proposed predictive assessment models based on time series. For example, some scholars have proposed a method fusing a convolutional neural network (CNN), a bidirectional long short-term memory network (BiLSTM) and an attention mechanism (AM) to predict the resulting value of the stock market. Others construct a prediction model combining variational mode decomposition (VMD) with LSTM for predicting various non-ferrous metals, wherein VMD is used to decompose the original sequence into several relatively stationary subsequences, and LSTM is then used to mine the inherent dependency relations within the subsequences and learn the variation trend of the result. Still other researchers implement evaluation predictions for related fields using classical machine-learning models. However, when these methods face the complex evaluation-result-to-feature relations and sparse data in the second-hand electronics field, they lack a sufficiently stable effect to predict the evaluation result accurately. The performance of RBF models, for example, is affected by kernel-function selection and parameter settings, and in second-hand electronics evaluation prediction it may be difficult to determine the optimal parameter combination due to a lack of sufficient domain knowledge.
LSTM is an excellent time-series model that can learn the temporal characteristics of predicted result values in order to predict future evaluation result values; however, second-hand electronic product data are relatively sparse, and for some specific models or categories in particular, a large amount of continuous data is usually required to capture the dynamic characteristics of the time series, making LSTM difficult to apply to the prediction of missing evaluation result values for second-hand electronic products. As neural-network models, MLP and BPNN fit nonlinear relations well, but recycling-related data sets for second-hand electronic products are lacking, and these methods are difficult to apply to such sparse-data scenarios. In summary, existing methods for predicting missing evaluation result values have the following problems: 1. Traditional mathematical-modeling approaches establish the relation between the feature attributes of second-hand electronic products and the evaluation result value through complicated steps, suffer from low evaluation-prediction accuracy, and are difficult to extend. 2. Machine-learning method models and related deep-learning methods through a larg