CN-121658877-B - Internet of things multi-view feature selection method and system based on similarity matrix fusion

Abstract

The invention discloses an Internet of Things multi-view feature selection method and system based on similarity matrix fusion, and relates to the technical field of machine learning. The method first concatenates the incomplete data of each view column-wise into a wide table and constructs a complete matrix for each view through sampling-and-filling imputation. The data of each view are then clustered multiple times to construct the corresponding similarity matrices. All similarity matrices are stacked along the third dimension into a tensor, and adaptive weighted fusion under a low-rank constraint is used to learn a unified common similarity matrix. Eigenvalue decomposition is performed on the common similarity matrix, and the eigenvectors corresponding to positive eigenvalues are extracted to form pseudo-labels. High-confidence samples are screened with a self-paced learning strategy, the feature selection coefficients and self-paced weights of each view are jointly optimized, and importance scores computed from the features of each view yield the optimal feature subset. Through similarity-matrix tensor fusion, an adaptive low-rank constraint, and jointly optimized self-paced learning, the invention achieves feature selection on incomplete multi-view data from the industrial Internet of Things.

Inventors

  • SHI YIFAN
  • ZENG HAIXIN
  • ZENG HUANQIANG
  • ZHU JIANQING
  • GONG XINRONG
  • XIANG WENJIE
  • CAI LEI
  • LIN QI
  • ZHENG HUIJIE

Assignees

  • Huaqiao University (华侨大学)

Dates

Publication Date
2026-05-08
Application Date
2026-02-06

Claims (2)

  1. A multi-view feature selection method for the Internet of Things based on similarity matrix fusion, characterized by comprising the following steps:
     S1. concatenating the incomplete data of each industrial Internet of Things view column-wise to obtain a wide-table data matrix, and performing imputation on the wide-table data matrix to construct the data matrix of each industrial Internet of Things view;
     S2. clustering the data matrix of each industrial Internet of Things view multiple times, and constructing a similarity matrix for each view;
     S3. stacking the similarity matrices of all views into a third-order tensor, applying a tensor learning constraint to capture the cross-view global low-rank structure, and introducing adaptive view weights to perform weighted fusion of that structure, obtaining a common similarity matrix;
     S4. performing eigenvalue decomposition on the common similarity matrix to extract a pseudo-label matrix, and using self-paced learning to guide the construction of a multi-view feature selection model on the incomplete industrial Internet of Things data;
     S5. computing the importance score of each view's features based on the multi-view feature selection model, sorting all features in descending order of score, and selecting the preset number of top-ranked features as the final feature subset.
     S1 is specifically as follows. The incomplete data of each industrial Internet of Things view are concatenated column-wise into a wide table as follows: [formula not shown]; [formula not shown]; where the symbols denote, in order: the multi-view data wide table; the overall dimension of the views; the dimension of a single view v; the superposition of the multi-view dimensions; and v, a single view.
     The data matrix of each industrial Internet of Things view is constructed by imputation on the multi-view data wide table as follows: [formula not shown]; where the symbol denotes the data matrix of each industrial Internet of Things view after imputation.
     The data matrix of each industrial Internet of Things view is clustered multiple times according to: [formula not shown]; the repeated clustering yields k clustering results, from which the similarity matrix is constructed as follows: [formula not shown]; where the symbols denote, in order: the matrix of each view; U, a clustering result; and the transpose.
     The common similarity matrix is computed as follows: [formula not shown]; where the symbols denote, in order: the tensor formed by stacking the similarity matrix of each view along the third dimension; the uniformly learned common similarity matrix; the weighting parameters; and the Frobenius norm.
     Eigenvalue decomposition is performed on the common similarity matrix to extract the pseudo-label matrix as follows: [formula not shown]; [formula not shown]; [formula not shown]; where the symbols denote, in order: the pseudo-label matrix; the largest eigenvalue; the c-th largest eigenvalue; the operator that builds a diagonal matrix from the elements in brackets; the maximum operator; the matrix of the first c eigenvectors; and the square root of the diagonal matrix.
     Self-paced learning guides the construction of the multi-view feature selection model on the incomplete industrial Internet of Things data as follows: [formula not shown]; [formula not shown]; where the symbols denote, in order: the self-paced learning proportion parameter of each Internet of Things view; the self-paced initial parameter; the self-paced maximum parameter; the self-paced matrix; the feature selection matrix; the pseudo-label matrix; the self-paced parameter of each data point; the data selected by self-paced learning; and the constraint conditions.
     Sorting all features in descending order of score and selecting the preset number of top-ranked features as the final feature subset specifically comprises: computing the importance score of every feature in each Internet of Things view: [formula not shown]; where the symbol denotes the j-th row of the feature selection matrix of the v-th Internet of Things view; aggregating the feature importance scores of all Internet of Things views into T2: [formula not shown]; and sorting the scores of all features in T2 in descending order and selecting the m top-ranked features as the final feature subset: [formula not shown]; where m is the preset number of features to select.
  2. A multi-view feature selection system for the Internet of Things based on similarity matrix fusion, characterized by comprising:
     an imputation module, used for concatenating the incomplete data of each industrial Internet of Things view column-wise to obtain a wide-table data matrix, and performing imputation on the wide-table data matrix to construct the data matrix of each industrial Internet of Things view;
     a clustering module, used for clustering the data matrix of each industrial Internet of Things view multiple times and constructing a similarity matrix for each view;
     a weighted fusion module, used for stacking the similarity matrices of all views into a third-order tensor, applying a tensor learning constraint to capture the cross-view global low-rank structure, and introducing adaptive view weights to perform weighted fusion of that structure, obtaining a common similarity matrix;
     a self-paced guidance module, used for performing eigenvalue decomposition on the common similarity matrix to extract a pseudo-label matrix, and using self-paced learning to guide the construction of a multi-view feature selection model on the incomplete industrial Internet of Things data;
     a feature selection module, used for computing the importance score of each view's features based on the multi-view feature selection model, sorting all features in descending order of score, and selecting the preset number of top-ranked features as the final feature subset.
     S1 is specifically as follows. The incomplete data of each industrial Internet of Things view are concatenated column-wise into a wide table as follows: [formula not shown]; [formula not shown]; where the symbols denote, in order: the multi-view data wide table; the overall dimension of the views; the dimension of a single view v; the superposition of the multi-view dimensions; and v, a single view.
     The data matrix of each industrial Internet of Things view is constructed by imputation on the multi-view data wide table as follows: [formula not shown]; where the symbol denotes the data matrix of each industrial Internet of Things view after imputation.
     The data matrix of each industrial Internet of Things view is clustered multiple times according to: [formula not shown]; the repeated clustering yields k clustering results, from which the similarity matrix is constructed as follows: [formula not shown]; where the symbols denote, in order: the matrix of each view; U, a clustering result; and the transpose.
     The common similarity matrix is computed as follows: [formula not shown]; where the symbols denote, in order: the tensor formed by stacking the similarity matrix of each view along the third dimension; the uniformly learned common similarity matrix; the weighting parameters; and the Frobenius norm.
     Eigenvalue decomposition is performed on the common similarity matrix to extract the pseudo-label matrix as follows: [formula not shown]; [formula not shown]; [formula not shown]; where the symbols denote, in order: the pseudo-label matrix; the largest eigenvalue; the c-th largest eigenvalue; the operator that builds a diagonal matrix from the elements in brackets; the maximum operator; the matrix of the first c eigenvectors; and the square root of the diagonal matrix.
     Self-paced learning guides the construction of the multi-view feature selection model on the incomplete industrial Internet of Things data as follows: [formula not shown]; [formula not shown]; where the symbols denote, in order: the self-paced learning proportion parameter of each Internet of Things view; the self-paced initial parameter; the self-paced maximum parameter; the self-paced matrix; the feature selection matrix; the pseudo-label matrix; the self-paced parameter of each data point; the data selected by self-paced learning; and the constraint conditions.
     Sorting all features in descending order of score and selecting the preset number of top-ranked features as the final feature subset specifically comprises: computing the importance score of every feature in each Internet of Things view: [formula not shown]; where the symbol denotes the j-th row of the feature selection matrix of the v-th Internet of Things view; aggregating the feature importance scores of all Internet of Things views into T2: [formula not shown]; and sorting the scores of all features in T2 in descending order and selecting the m top-ranked features as the final feature subset: [formula not shown]; where m is the preset number of features to select.
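
The formulas for steps S2, S4 and S5 are not reproduced in the claim text, so the Python sketch below illustrates those steps only under stated assumptions. It assumes that a view's similarity matrix is the average of U Uᵀ over the k clustering results (a co-association matrix), that the pseudo-label matrix is the matrix of the first c eigenvectors multiplied by the square root of the diagonal matrix of the c largest eigenvalues clipped at zero, and that a feature's importance score is the l2 norm of its row in the view's feature selection matrix. The function names, the choice of norm, and the toy inputs are hypothetical and are not taken from the patent. Step S1, the tensor low-rank fusion of S3, and the self-paced optimization are not covered.

```python
import numpy as np


def coassociation_similarity(labelings):
    """Assumed form of S2: average U @ U.T over k clusterings of one view.

    labelings: (k, n) integer cluster labels, one row per clustering run.
    """
    labelings = np.asarray(labelings)
    k, n = labelings.shape
    S = np.zeros((n, n))
    for labels in labelings:
        # One-hot cluster indicator matrix U of shape (n, number_of_clusters).
        U = np.eye(labels.max() + 1)[labels]
        S += U @ U.T
    return S / k


def pseudo_labels(S, c):
    """Assumed form of S4: first c eigenvectors scaled by sqrt(max(eigenvalue, 0))."""
    S = (S + S.T) / 2.0                       # enforce symmetry before eigh
    eigvals, eigvecs = np.linalg.eigh(S)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:c]     # indices of the c largest eigenvalues
    lam = np.maximum(eigvals[order], 0.0)     # clip negative eigenvalues to zero
    return eigvecs[:, order] * np.sqrt(lam)   # scales each eigenvector column


def rank_features(W_views, m):
    """Assumed form of S5: score = l2 norm of row j of view v's selection matrix.

    Returns the m top-scoring features as (view_index, row_index) pairs.
    """
    scored = [
        (np.linalg.norm(W[j]), v, j)
        for v, W in enumerate(W_views)
        for j in range(W.shape[0])
    ]
    scored.sort(key=lambda t: -t[0])          # descending by score
    return [(v, j) for _, v, j in scored[:m]]


# Toy usage: three clustering runs over four samples, then two small views.
S = coassociation_similarity([[0, 0, 1, 1], [0, 0, 1, 1], [0, 1, 1, 1]])
Y = pseudo_labels(S, c=2)
W_views = [
    np.array([[3.0, 4.0], [0.0, 1.0]]),
    np.array([[0.0, 0.0], [6.0, 8.0], [1.0, 0.0]]),
]
top = rank_features(W_views, m=2)
```

In this sketch the co-association matrix is positive semidefinite by construction, so clipping at zero only removes numerical noise; in the claimed method the common similarity matrix comes from the weighted low-rank fusion in S3 rather than from a single view.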

Description

Internet of Things multi-view feature selection method and system based on similarity matrix fusion

Technical Field

The invention relates to the technical field of machine learning, and in particular to an Internet of Things multi-view feature selection method and system based on similarity matrix fusion.

Background

With the rapid development of industrial Internet of Things technology, large numbers of sensors, monitoring devices, and intelligent terminals are deployed in industrial production environments, generating massive multi-source heterogeneous data. These data often come from different types of acquisition devices (e.g., temperature sensors, vibration sensors, image collectors, sound monitors), forming a typical multi-view data structure and providing a rich source of information for industrial equipment condition monitoring, fault diagnosis, and production optimization. In practical industrial scenarios, however, sensor failures, network transmission interruptions, unsynchronized data acquisition, and similar causes often leave multi-view data incomplete, that is, the data of some views are missing. This incompleteness poses a serious challenge to conventional multi-view learning methods and directly affects the accuracy and reliability of feature selection. Multi-view feature selection is an important means of reducing data dimensionality and extracting key features; it can effectively remove redundant information and improve the efficiency and performance of subsequent machine learning tasks.
However, existing multi-view feature selection methods have the following shortcomings. First, most existing methods assume that the data of all views are fully available and cannot effectively handle the missing data that is common in the industrial Internet of Things, so their performance drops markedly in practice. Second, when fusing multi-view information, traditional methods often weight all views equally or process each view independently, ignoring the complementarity and consistency between views and failing to fully mine the latent association structure of multi-view data. Third, existing similarity matrix construction methods mostly use a fixed similarity measure, lack adaptive learning capability, and struggle to accurately describe the complex relationships among samples in industrial data. Fourth, when processing incomplete multi-view data, existing methods lack an effective learning strategy for balancing the learning of samples of different difficulty, are easily disturbed by noisy and missing data, and lose robustness and generalization capability as a result. A multi-view feature selection method is therefore needed that can effectively handle incomplete industrial Internet of Things data, fully fuse multi-view similarity information, and learn adaptively.

Disclosure of Invention

To solve the above problems, the invention provides an Internet of Things multi-view feature selection method and system based on similarity matrix fusion, which achieves robust, consistent, and interpretable screening of key features in incomplete industrial Internet of Things data through multi-view similarity fusion, tensor low-rank modeling, self-paced learning guidance, and joint feature scoring.
In one aspect, the Internet of Things multi-view feature selection method based on similarity matrix fusion comprises the following steps:
S1. concatenating the incomplete data of each industrial Internet of Things view column-wise to obtain a wide-table data matrix, and performing imputation on the wide-table data matrix to construct the data matrix of each industrial Internet of Things view;
S2. clustering the data matrix of each industrial Internet of Things view multiple times, and constructing a similarity matrix for each view;
S3. stacking the similarity matrices of all views into a third-order tensor, applying a tensor learning constraint to capture the cross-view global low-rank structure, and introducing adaptive view weights to perform weighted fusion of that structure, obtaining a common similarity matrix;
S4. performing eigenvalue decomposition on the common similarity matrix to extract a pseudo-label matrix, and using self-paced learning to guide the construction of a multi-view feature selection model on the incomplete industrial Internet of Things data;
S5. computing the importance score of each view's features based on the multi-view feature selection model, sorting all features in descending order of score, and selecting the preset number of top-ranked features as the final feature subset.
Further, S1 is specifically as follows: incomplete data of e