CN-122020135-A - Online multi-tag feature selection method for tag missing stream feature scene
Abstract
The invention discloses an online multi-label feature selection method for streaming-feature scenarios with missing labels, and belongs to the technical field of feature selection. The method targets multi-label data in which features arrive dynamically and some labels are missing. It first recovers the missing labels in the multi-label decision system to obtain a complete label set, then constructs the fuzzy similarity under the label set using a fuzzy label co-occurrence similarity. When a new feature arrives, the fuzzy similarity under that feature is constructed with a Gaussian kernel function, and the fuzzy relative correlation between the feature and the complete label set is calculated. If the fuzzy relative correlation is below a threshold, the feature is discarded; otherwise the fuzzy redundancy between the feature and the already selected features is analyzed, and the feature is retained, used to replace a selected feature, or discarded according to the redundancy comparison. The method achieves efficient online feature selection when the complete feature space is unknown in advance and the labels are incomplete; its advantages are strong handling of uncertainty, adaptation to dynamic data environments, and effective reduction of feature dimensionality.
Inventors
- XU DUO
- DAI JIANHUA
Assignees
- Hunan Normal University (湖南师范大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260414
Claims (7)
- 1. An online multi-label feature selection method for a streaming-feature scenario with missing labels, characterized in that: Input: ① a multi-label decision system with missing labels, comprising the universe of discourse U, a set of dynamically arriving features, a label set with missing values, a value domain, and a mapping from U and the feature and label sets to the value domain; ② a first threshold and a second threshold. Output: the selected feature set. The method specifically comprises the following steps: S101, recovering the missing labels in the multi-label decision system to obtain a complete label set; S102, calculating, by means of the fuzzy label co-occurrence similarity, the fuzzy similarity matrix among the samples of the universe U under the complete label set, and constructing the fuzzy similarity classes under L from this fuzzy similarity; S103, initializing the selected feature set as the empty set; S104, when a new feature arrives, constructing with a Gaussian kernel function the similarity matrix among the samples of the universe U under the new feature, and constructing the fuzzy similarity classes under the new feature from this fuzzy similarity; S105, calculating the fuzzy relative correlation between the new feature and the complete label set; if it is below the corresponding threshold, permanently discarding the new feature and returning to step S104 to process the next arriving feature; otherwise proceeding to the next step; S106, for each feature in the currently selected feature set, calculating respectively the fuzzy redundancy of the new feature given that selected feature and the fuzzy redundancy of that selected feature given the new feature; S107, performing pairwise redundancy analysis between the new feature and each feature in the currently selected feature set: if the redundancy comparison satisfies the discard condition, discarding the new feature and returning to step S104 to process the next arriving feature; if it satisfies the replacement condition, replacing the selected feature with the new feature and proceeding to the next step; S108, returning to step S104 until no new feature arrives; S109, outputting the final selected feature set.
- 2. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 1, wherein step S101 comprises: S201, for each sample in the multi-label decision system with missing labels, traversing its label vector and identifying all labels marked as missing, which are recorded as the missing label set of the sample; S202, for each missing label of the sample, generating the label consistency ratio between the sample and every other sample in the system on all labels other than that missing label; S203, based on the label consistency ratios of step S202, generating a probability estimate that the sample takes the value 1 on the missing label; S204, recovering the missing label according to the probability estimate of step S203: if the estimate satisfies the recovery condition, the label value is restored to 1, otherwise to 0.
- 3. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 2, wherein the label consistency ratio in step S202 is generated by formula (1), in which the indicator function returns 1 if its condition is true and 0 otherwise; and in step S203 the probability estimate that the sample takes the value 1 on the label is generated by formula (2).
- 4. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 1, wherein the fuzzy label co-occurrence similarity in step S102 is obtained by formula (3), whose arguments are samples in the universe of discourse; and the fuzzy similarity class under L in step S102 is obtained by formula (4), in which n is the total number of samples.
- 5. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 1, wherein the Gaussian kernel function in step S104 is obtained by formula (5), which is defined on samples in the universe of discourse and uses the Euclidean distance between two samples and the standard deviation; and the fuzzy similarity class under the new feature in step S104 is obtained by formula (6), in which n is the total number of samples.
- 6. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 1, wherein the fuzzy relative correlation between the new feature and the complete label set in step S105 is generated by formula (7), in which the fuzzy mutual information and the fuzzy entropy are generated by formula (8) and formula (9), respectively; in these formulas n is the total number of samples, and the quantities are computed from the fuzzy similarity class under the new feature, the fuzzy similarity class under L, and the cardinality of fuzzy similarity classes.
- 7. The online multi-label feature selection method for a streaming-feature scenario with missing labels according to claim 1, wherein the two fuzzy redundancy values of step S106 are generated by formula (10) and formula (11), respectively, in which the fuzzy conditional mutual information terms are generated by formula (12) and formula (13); in these formulas n is the total number of samples, and the quantities are computed from the three fuzzy similarity classes involved and the cardinality of fuzzy similarity classes.
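The control flow of steps S103 to S109 in claim 1 can be sketched as follows. This is a minimal illustration, not the claimed implementation: the conditions of steps S105 and S107 are not legible in this text, so the rules used here (discard below `alpha`; discard or replace when the two redundancy values differ by more than `beta`) are assumptions. The names `online_select`, `relevance`, `redundancy`, `alpha` and `beta` are hypothetical, and the two measures are passed in as plain functions.

```python
from typing import Callable, Iterable, List, Sequence

Feature = Sequence[float]  # one feature column: its value on every sample


def online_select(
    stream: Iterable[Feature],
    relevance: Callable[[Feature], float],
    redundancy: Callable[[Feature, Feature], float],
    alpha: float,
    beta: float,
) -> List[Feature]:
    """Process features one at a time and return the selected set.

    relevance(f)     : stand-in for the fuzzy relative correlation with the labels
    redundancy(a, b) : stand-in for the fuzzy redundancy of a given b
    alpha, beta      : stand-ins for the first and second thresholds (assumed roles)
    """
    selected: List[Feature] = []                  # S103: start from the empty set
    for f_new in stream:                          # S104: a new feature arrives
        if relevance(f_new) < alpha:              # S105: relevance gate
            continue                              # permanently discard f_new
        discarded = False
        replaced = False
        for j, f_old in enumerate(selected):      # S106: pairwise redundancy
            red_new = redundancy(f_new, f_old)
            red_old = redundancy(f_old, f_new)
            if red_new - red_old > beta:          # S107 (assumed rule): f_new is
                discarded = True                  # the more redundant one
                break
            if red_old - red_new > beta:          # S107 (assumed rule): f_old is
                selected[j] = f_new               # the more redundant one
                replaced = True
                break
        if not discarded and not replaced:
            selected.append(f_new)                # retain f_new alongside the rest
    return selected                               # S109: final selected set
```

Because each feature is examined once and only against the currently selected set, the loop never needs the complete feature space, which is the property the claims rely on.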
Description
Online multi-tag feature selection method for tag missing stream feature scene
Technical Field
The invention relates to a feature selection method, in particular to an online multi-label streaming feature selection method for label-missing scenarios, which is particularly suited to multi-label data environments in which features arrive dynamically and labels are missing.
Background
With the advent of the big data era, data dimensionality continues to climb, putting more pressure on data mining and machine learning tasks. Multi-label feature selection is an important preprocessing step for data mining, pattern recognition and machine learning: it removes redundant and irrelevant attributes from a large number of features, reducing the data dimensionality and improving algorithm efficiency. In practical applications, the feature space often cannot be known in advance; features arrive sequentially as streaming features, that is, features are generated dynamically while the instance set stays fixed. To address this challenge, multi-label streaming feature selection methods have been developed. Compared with traditional multi-label feature selection methods, they select feature subsets online, in real time, from a dynamically arriving feature stream, so that models can adapt flexibly to a changing feature set.
The acquisition of multi-label data is often constrained. Because of high labeling costs and similar limitations, missing labels are very common, resulting in incomplete multi-label data. Conventional multi-label streaming feature selection methods cannot directly handle missing labels. More importantly, in a streaming feature scenario the complete feature space is unknown, so many conventional approaches that rely on complete feature space information for label completion or recovery no longer apply.
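Claim 2 (steps S201 to S204) recovers missing labels from label information alone, without any feature values. A minimal sketch of that idea is given here. Formulas (1) and (2) are not reproduced in this text, so the consistency ratio (share of agreeing labels), the probability estimate (consistency-weighted average) and the 0.5 cut-off are assumptions, and the names `consistency` and `recover_labels` are hypothetical.

```python
from typing import List, Optional, Sequence

Label = Optional[int]  # 1, 0, or None for a missing entry


def consistency(a: Sequence[Label], b: Sequence[Label], skip: int) -> float:
    """Share of labels, other than `skip`, observed in both rows on which they agree."""
    pairs = [
        (x, y)
        for k, (x, y) in enumerate(zip(a, b))
        if k != skip and x is not None and y is not None
    ]
    if not pairs:
        return 0.0
    return sum(1 for x, y in pairs if x == y) / len(pairs)


def recover_labels(Y: Sequence[Sequence[Label]], threshold: float = 0.5) -> List[List[int]]:
    """Return a completed copy of the label matrix Y (rows are samples)."""
    recovered = [list(row) for row in Y]
    for i, row in enumerate(Y):
        for k, value in enumerate(row):
            if value is not None:                # S201: only missing entries
                continue
            num = den = 0.0
            for j, other in enumerate(Y):
                if j == i or other[k] is None:
                    continue
                w = consistency(row, other, k)   # S202: agreement on other labels
                num += w * other[k]
                den += w
            p = num / den if den > 0 else 0.0    # S203: assumed estimate of P(label = 1)
            recovered[i][k] = 1 if p >= threshold else 0  # S204: assumed cut-off
    return recovered
```

Weights are computed from the original matrix `Y`, not from already recovered entries, so the result does not depend on the order in which missing labels are visited.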
Thus, the art lacks a method that can effectively perform multi-label streaming feature selection when labels are missing.
In the complex setting of multi-label learning, handling data uncertainty effectively is critical to feature learning. Fuzzy rough set theory provides a mathematical framework for handling uncertainty and quantifying the fuzzy relations between features and labels. The fuzzy similarity relation measures the degree of similarity between samples, and the fuzzy similarity classes form a fuzzy partition of the universe of discourse; together they are an important basis of feature selection methods based on fuzzy rough sets. However, real-world multi-label data is often sparse: most labels take negative values, while positive labels are rare but often carry stronger discriminative information. In this setting, traditional similarity measures based on label consistency have difficulty capturing and exploiting the key information conveyed by positive labels.
Disclosure of Invention
In view of the defects of the prior art, the invention provides an online multi-label feature selection method for streaming-feature scenarios with missing labels. The method aims to handle the combined situation of missing label data and dynamically arriving features, and to achieve dynamic feature dimensionality reduction.
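The fuzzy similarity relations and the information measures built on them can be illustrated with a short sketch. The patent's formulas (3) to (9) are not reproduced in this text, so everything below is an assumption: the Gaussian kernel is written in its common form, the label similarity is a Jaccard overlap of positive labels chosen only to show how co-occurrence of positives differs from plain label agreement, and the entropy and mutual information follow the forms commonly used in the fuzzy rough set literature. All function names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]  # row i is the fuzzy similarity class of sample i


def gaussian_similarity(values: Sequence[float], sigma: float = 1.0) -> Matrix:
    """R[i][j] = exp(-(v_i - v_j)^2 / (2 * sigma^2)) for one feature column."""
    return [
        [math.exp(-((a - b) ** 2) / (2.0 * sigma ** 2)) for b in values]
        for a in values
    ]


def cooccurrence_similarity(labels: Sequence[Sequence[int]]) -> Matrix:
    """Assumed stand-in: Jaccard overlap of the positive labels of two samples."""
    sim: Matrix = []
    for a in labels:
        row = []
        for b in labels:
            both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
            either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
            row.append(both / either if either else 1.0)  # no positives: identical
        sim.append(row)
    return sim


def fuzzy_entropy(R: Matrix) -> float:
    """H(R) = -(1/n) * sum_i log2(|[x_i]_R| / n), with |.| the sum of memberships."""
    n = len(R)
    return -sum(math.log2(sum(row) / n) for row in R) / n


def fuzzy_mutual_information(R: Matrix, S: Matrix) -> float:
    """I(R;S) = -(1/n) * sum_i log2(|[x_i]_R| * |[x_i]_S| / (n * |[x_i]_R ∩ [x_i]_S|))."""
    n = len(R)
    total = 0.0
    for r_i, s_i in zip(R, S):
        inter = sum(min(a, b) for a, b in zip(r_i, s_i))  # min as fuzzy intersection
        total += math.log2(sum(r_i) * sum(s_i) / (n * inter))
    return -total / n
```

Both similarity functions return 1 on the diagonal, so every intersection has cardinality of at least 1 and the logarithms are always defined. Under the Jaccard stand-in, two samples that share only negative labels are not counted as similar, which reflects the Background's point that positive labels carry the discriminative information.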
In order to achieve the above object, the present invention adopts the following technical scheme: the invention provides an online multi-label feature selection method for streaming-feature scenarios with missing labels, which specifically comprises the following steps:
Step one, inputting a multi-label decision system with missing labels, comprising the universe of discourse U, a set of dynamically arriving features, a label set with missing values, a value domain, and a mapping from U and the feature and label sets to the value domain, together with a first threshold and a second threshold;
Step two, recovering the missing label data in the multi-label decision system to generate a complete label set;
Step three, constructing, by means of the fuzzy label co-occurrence similarity, the fuzzy similarity matrix among the samples of the universe U under the recovered complete label set, and constructing the fuzzy similarity classes under L from this fuzzy similarity;
Step four, initializing the selected feature set as the empty set;
Step five, when a new feature arrives, constructing with a Gaussian kernel function the similarity matrix among the samples of the universe U under the new feature, and constructing the fuzzy similarity classes under the new feature from this fuzzy similarity;
Step six, calculating the fuzzy relative correlation between the new feature and the complete label set; if it is below the corresponding threshold, permanently discarding the new feature and returning to step five to process the next arriving feature; otherwise proceeding to the next step;
Step seven, for each feature in the currently selected feature set, respectively calculating the fuzzy r