CN-121980260-A - Topology prototype enhancement method for traffic field class imbalance and few-sample scene
Abstract
The invention discloses a topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain, and belongs to the field of machine learning and data processing. The method addresses the unstable decision boundaries and the poor generalization and robustness of traditional machine learning models when traffic data labels are scarce and the class distribution is imbalanced. It comprises the following steps: first, obtaining a representative point set of the traffic data that preserves its local and global topological structure, based on topological sub-sampling and topological resampling; second, screening topologically stable traffic data prototypes through persistent homology, topological distance and decision-boundary variability analysis; and finally, generating pseudo-dimensions or performing high-dimensional topological embedding based on the traffic data topology prototypes to construct enhanced features for training and inference of downstream neural network models and traditional machine learning models. By explicitly integrating the topological structure information of traffic data, the invention effectively reduces the variability of decision boundaries and improves the stability and accuracy of the model in imbalanced and few-sample scenarios.
Inventors
- SHI YUNYANG
- WEI LE
- PENG ZEHUA
- CHEN HAOWEN
- ZHOU ZHEN
- CHEN QIMING
- SONG ZHE
- HONG QI
- FANG WEI
- LIU ZHIYUAN
- Kuang Pengyu
- WANG ENLIN
Assignees
- Jiangnan University (江南大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20260108
Claims (10)
- 1. A topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain, characterized by comprising the following steps: Step 1, obtaining a representative point set of traffic data that preserves local and global topological structure, based on topological sub-sampling and topological resampling; Step 2, screening topologically stable traffic data prototypes through persistent homology, topological distance and boundary topological variability analysis; and Step 3, generating pseudo-dimensions or performing high-dimensional topological embedding based on the traffic data topology prototypes to construct enhanced features, which are used for training and inference of downstream neural network models and traditional machine learning models.
- 2. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 1, wherein Step 1 comprises the following sub-steps: Step 1.1, for the traffic data, performing topological sub-sampling on the original point cloud using a feature-lattice algorithm; and Step 1.2, performing topological resampling according to the class distribution and local topological sparsity, achieving class balance while preserving topological consistency.
- 3. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 1, wherein Step 2 comprises the following sub-steps: Step 2.1, extracting persistent homology features from the constructed topological complex and generating a persistence diagram; and Step 2.2, combining two key indicators, the point-cloud distance and decision-boundary variability, to screen representative samples that remain topologically stable under multi-scale perturbation as topological prototypes.
- 4. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 1, wherein Step 3 comprises the following sub-steps: Step 3.1, extracting multi-scale topological features from the topological prototypes based on persistent homology, and introducing the topological clustering results into an extended feature space as pseudo-dimensions; Step 3.2, constructing a high-dimensional topological embedding based on triangulation and deformation-gradient optimization to obtain an enhanced representation that preserves topological consistency; and Step 3.3, inputting the enhanced features produced by the pseudo-dimensions or the high-dimensional embedding into downstream neural network models and traditional machine learning models, so as to improve the robustness and separability of classification or anomaly detection tasks.
- 5. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 2, wherein Step 1.1 comprises the following sub-steps: Step 1.1.1, dividing the sampling space into m-dimensional hypercubes (m-cubes), where the side length $\ell$ of each m-cube is related to the sub-sampling ratio $r$ by $\ell = \left( \frac{\prod_{k=1}^{D} (x_k^{\max} - x_k^{\min})}{r\,n} \right)^{1/D}$, in which $x_k^{\max}$ and $x_k^{\min}$ denote the upper and lower bounds of the data values in the $k$-th dimension, $D$ denotes the spatial dimension and $n$ denotes the total number of samples; Step 1.1.2, based on the intersection of each m-cube $C_i$ with the point cloud $X$, selecting one sample point $x_i \in C_i \cap X$, the points selected from all m-cubes forming a new point cloud $X'$.
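The lattice sub-sampling of Steps 1.1.1 and 1.1.2 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: it assumes the side length is chosen so that the bounding box splits into roughly r·n cubes, and it keeps the member closest to each occupied cube's centre, a selection rule the claim does not specify. The function names are hypothetical.

```python
import math
from collections import defaultdict


def cube_side_length(points, ratio):
    """Side length l such that the bounding box holds about ratio * n cubes (assumed relation)."""
    n, dim = len(points), len(points[0])
    volume = 1.0
    for k in range(dim):
        lo = min(p[k] for p in points)
        hi = max(p[k] for p in points)
        volume *= max(hi - lo, 1e-12)  # guard against a flat (zero-width) dimension
    return (volume / (ratio * n)) ** (1.0 / dim)


def lattice_subsample(points, ratio):
    """Keep one representative point per occupied m-cube (hypothetical helper)."""
    side = cube_side_length(points, ratio)
    dim = len(points[0])
    mins = [min(p[k] for p in points) for k in range(dim)]
    cells = defaultdict(list)
    for p in points:
        # integer lattice coordinates of the cube containing p
        key = tuple(int((p[k] - mins[k]) // side) for k in range(dim))
        cells[key].append(p)
    kept = []
    for key, members in cells.items():
        center = tuple(mins[k] + (key[k] + 0.5) * side for k in range(dim))
        # assumption: the member nearest the cube centre represents the whole cube
        kept.append(min(members, key=lambda p: math.dist(p, center)))
    return kept
```

Because every input point falls in exactly one cube, the output never contains duplicates, and smaller ratios give larger cubes and therefore fewer retained points.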
- 6. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 5, wherein Step 1.2 comprises the following sub-steps: Step 1.2.1, identifying sparse regions of minority classes and dense regions of majority classes according to the distribution of the samples of each class and the sparsity of the samples within their local neighborhoods; Step 1.2.2, for each batch, performing local upsampling in sparse regions and moderate downsampling in dense regions, while preserving the integrity and connectivity of adjacent structures during the sampling operation.
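One possible reading of Steps 1.2.1 and 1.2.2 is sketched below. It assumes local sparsity is the mean distance to the k nearest same-class neighbours, that every class is balanced to the mean class size, and that new minority points are interpolated along existing nearest-neighbour edges (a SMOTE-style rule chosen here for illustration). The claim does not fix these choices; the helper names and the `k` and `seed` parameters are hypothetical.

```python
import math
import random
from collections import Counter


def sparsity_scores(points, k):
    """Mean distance from each point to its k nearest neighbours (larger means sparser)."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(d[:k]) / max(1, min(k, len(d))))
    return scores


def topology_resample(points, labels, k=3, seed=0):
    """Balance classes to the mean class size (assumption): upsample sparse regions of small
    classes along k-NN edges and drop the densest points of large classes."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = round(len(labels) / len(counts))
    out_pts, out_lbl = [], []
    for c in counts:
        cls = [p for p, y in zip(points, labels) if y == c]
        scores = sparsity_scores(cls, k)
        dense_first = sorted(range(len(cls)), key=lambda i: scores[i])
        if len(cls) > target:
            # moderate downsampling: remove the densest points so sparse (often boundary) points survive
            new = [cls[i] for i in sorted(dense_first[len(cls) - target:])]
        else:
            new = list(cls)
            sparse_first = dense_first[::-1]
            step = 0
            while len(new) < target and len(cls) > 1:
                i = sparse_first[step % len(cls)]
                nbrs = sorted((j for j in range(len(cls)) if j != i),
                              key=lambda j: math.dist(cls[i], cls[j]))[:k]
                j = rng.choice(nbrs)
                lam = rng.random()
                # the synthetic point lies on the segment between two neighbouring same-class points
                new.append(tuple(a + lam * (b - a) for a, b in zip(cls[i], cls[j])))
                step += 1
        out_pts.extend(new)
        out_lbl.extend([c] * len(new))
    return out_pts, out_lbl
```

Interpolating only between nearby same-class points keeps synthetic samples inside the existing local structure, which is one simple way to respect the connectivity requirement of Step 1.2.2.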
- 7. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 3, wherein Step 2.1 comprises the following sub-steps: Step 2.1.1, constructing VR (Vietoris-Rips) complexes at different scales $\varepsilon$ according to the following formula, to describe the topology of individual traffic-participant data at both the local and the global level: for the node set $X_y$ carrying traffic-behavior label $y$, the VR complex is denoted $\mathrm{VR}_\varepsilon(X_y) = \{\sigma \subseteq X_y : d(x_i, x_j) \le \varepsilon \text{ for all } x_i, x_j \in \sigma\}$, where $X_y$ is the vertex set and $d$ is a metric defined on $X_y$; that is, a simplex $\sigma$ belongs to $\mathrm{VR}_\varepsilon(X_y)$ if and only if $d(x_i, x_j) \le \varepsilon$ for all $x_i, x_j \in \sigma$; and Step 2.1.2, computing the persistent homology of the VR complexes, extracting the topological structure features of the traffic data by computing homology groups of different dimensions over the filtration, and generating persistence diagrams of the multi-scale topological change.
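A minimal sketch of Step 2.1 follows. The first function enumerates the simplices of VR_ε(X_y) under the d(x_i, x_j) ≤ ε rule; the second computes only the 0-dimensional persistence pairs of the VR filtration, using union-find over edges sorted by length. Higher-dimensional homology groups would normally be computed with a dedicated library such as GUDHI or Ripser; the function names here are illustrative.

```python
import itertools
import math


def vietoris_rips(points, eps, max_dim=2):
    """All simplices (index tuples) whose pairwise distances are <= eps, up to dimension max_dim."""
    def close(i, j):
        return math.dist(points[i], points[j]) <= eps

    n = len(points)
    simplices = [(i,) for i in range(n)]  # every vertex is a 0-simplex
    for size in range(2, max_dim + 2):
        for combo in itertools.combinations(range(n), size):
            if all(close(i, j) for i, j in itertools.combinations(combo, 2)):
                simplices.append(combo)
    return simplices


def h0_persistence(points):
    """0-dimensional (birth, death) pairs of the VR filtration.

    Every point is born at scale 0; a connected component dies at the edge length that
    merges it into another component (Kruskal-style union-find).
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in itertools.combinations(range(n), 2))
    pairs = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            pairs.append((0.0, length))
    pairs.append((0.0, math.inf))  # the last surviving component never dies
    return pairs
```

For three collinear points at x = 0, 1 and 5, the components merge at scales 1 and 4, so the H0 diagram contains (0, 1), (0, 4) and one essential class (0, ∞).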
- 8. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 7, wherein Step 2.2 comprises the following sub-steps: Step 2.2.1, computing the Wasserstein distance between the persistence diagram of the original point cloud and that of the sub-sampled point cloud, which compares the birth-death pairs of the topological features of the two point clouds across scales, and retaining sub-sampled point clouds whose Wasserstein distance is smaller than a preset threshold as candidate prototypes; Step 2.2.2, evaluating the sensitivity of the decision boundary to perturbations of the data distribution based on decision-boundary variability. For a probability space and its probability density function, given a decision function $f$, the decision-boundary manifold is the level set $\mathcal{B}_f = \{ z \in \mathcal{Z} : p_{Z \mid f = y_1}(z) = p_{Z \mid f = y_2}(z),\ y_1 \ne y_2 \}$, where $z$ denotes a sample point in the feature space $\mathcal{Z}$ of the samples and $p_{Z \mid f = y}(z)$ denotes the conditional probability density of the random variable $Z$ taking the value $z$ given that the output class of classifier $f$ is $y$. For the training set $X$, let $S_r \subseteq X$ be the sub-sample set selected according to the sub-sampling ratio $r$ (as in Step 1.1.1), satisfying $|S_r| = r\,|X|$, and let the parameters be re-fitted by the learning algorithm $\mathcal{A}$, i.e. $\theta = \mathcal{A}(X)$ and $\theta_r = \mathcal{A}(S_r)$. The data decision-boundary variability is defined as $V(r) = \min_{S_r \subseteq X,\ |S_r| = r|X|} \mathbb{E}_{(x, y) \sim D}\left[ \mathbb{I}\left( f_{\theta}(x) \ne f_{\theta_r}(x) \right) \right]$, where $D$ denotes the data distribution, $\mathbb{E}_{(x, y) \sim D}$ denotes the expectation over sample pairs $(x, y)$ drawn from $D$, $f_{\theta}$ and $f_{\theta_r}$ denote the classifiers with parameters $\theta$ and $\theta_r$, $\mathbb{I}$ is the indicator function (1 when the condition in parentheses holds, 0 otherwise), and the minimum is taken over all sub-sampled subsets satisfying the sub-sampling ratio $r$; the ratio $r^{*} = \arg\min_{r} V(r)$ is the sub-sampling ratio at which the decision boundary has the minimum expected change under perturbation of the training data.
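The screening in Step 2.2 can be sketched as below, assuming finite persistence diagrams given as (birth, death) pairs. The 2-Wasserstein distance is solved as an assignment problem in which each point may also be matched to the diagonal, and decision-boundary variability is estimated empirically as the rate at which the full-data and sub-sample classifiers disagree on the same samples. The threshold, the function names and the use of SciPy's `linear_sum_assignment` are assumptions for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_pd(dgm_a, dgm_b, p=2):
    """p-Wasserstein distance between two finite persistence diagrams (L-infinity ground metric).

    Any point may be matched to the diagonal instead; the L-infinity distance from
    (birth, death) to the diagonal is (death - birth) / 2.
    """
    a = np.asarray(dgm_a, dtype=float).reshape(-1, 2)
    b = np.asarray(dgm_b, dtype=float).reshape(-1, 2)
    n, m = len(a), len(b)
    cost = np.zeros((n + m, n + m))
    if n and m:
        # point-to-point costs ||a_i - b_j||_inf ** p
        cost[:n, :m] = np.abs(a[:, None, :] - b[None, :, :]).max(axis=2) ** p
    cost[:n, m:] = (((a[:, 1] - a[:, 0]) / 2) ** p)[:, None]  # a_i matched to the diagonal
    cost[n:, :m] = (((b[:, 1] - b[:, 0]) / 2) ** p)[None, :]  # diagonal matched to b_j
    rows, cols = linear_sum_assignment(cost)  # diagonal-to-diagonal block stays 0
    return float(cost[rows, cols].sum() ** (1.0 / p))


def select_candidates(original_dgm, subsample_dgms, threshold, p=2):
    """Indices of sub-sampled clouds whose diagram is closer than `threshold` to the original."""
    return [i for i, dgm in enumerate(subsample_dgms)
            if wasserstein_pd(original_dgm, dgm, p) < threshold]


def disagreement_rate(pred_full, pred_sub):
    """Empirical estimate of E[I(f_theta(x) != f_theta_r(x))] from predictions on the same samples."""
    return float(np.mean(np.asarray(pred_full) != np.asarray(pred_sub)))


def best_ratio(rates_by_ratio):
    """r* = argmin over r of the minimum disagreement across subsets; maps r -> list of rates."""
    return min(rates_by_ratio, key=lambda r: min(rates_by_ratio[r]))
```

Points with infinite death must be removed (or capped) before calling `wasserstein_pd`, since the sketch only handles finite diagrams.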
- 9. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 4, wherein Step 3.1 comprises the following sub-steps: Step 3.1.1, for each traffic participant $x_i$ in the traffic-participant data set $X = \{x_1, \dots, x_n\}$, constructing the sequence of VR complexes under different scale parameters $\varepsilon$, i.e. the VR filtration, and computing its persistence diagram $\mathrm{PD}(x_i)$ as the topological feature of the node; Step 3.1.2, for each $x_i$ and every $x_j$ in its local neighborhood $\mathcal{N}(x_i)$, measuring the similarity of the persistence diagrams $\mathrm{PD}(x_i)$ and $\mathrm{PD}(x_j)$ with the 2-Wasserstein distance. In addition, the representation of node $x_i$ in the $d$-dimensional topological embedding space is defined as $z_i$, and its persistence diagram $\mathrm{PD}(z_i)$ is constructed from this representation; $z_j$ and $\mathrm{PD}(z_j)$ are defined in the same way. The Wasserstein distance between the two persistence diagrams is measured through the $L_\infty$ norm under bijections $\eta$ between them (points may be matched to the diagonal), $W_p(\mathrm{PD}(z_i), \mathrm{PD}(z_j)) = \left( \inf_{\eta} \sum_{u \in \mathrm{PD}(z_i)} \lVert u - \eta(u) \rVert_\infty^{p} \right)^{1/p}$ with $p = 2$; Step 3.1.3, constructing a complete distance graph $G$ from the 2-Wasserstein distances with adjacency matrix $A = (a_{ij})$, where $a_{ij} = 1$ if $w_{ij} \le \tau$ and $a_{ij} = 0$ otherwise, $w_{ij}$ is the 2-Wasserstein distance between the persistence diagrams of the $i$-th and $j$-th samples, and the parameter $\tau$ is determined by the elbow method or cross-validation; the connected components of $G$ give the topological clusters, and point sets with similar topological features are mapped onto the same plane of a three-dimensional space, so that the original data are unfolded into a structure that is easier to separate.
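Step 3.1.3 can be sketched as follows, assuming the pairwise 2-Wasserstein distances w_ij have already been computed and that an edge joins two samples when w_ij ≤ τ. Connected components are found by breadth-first search, and each cluster id is appended to the sample's features as a pseudo-dimension. The threshold rule and the function names are assumptions, not the claimed implementation.

```python
from collections import deque


def topological_clusters(w, tau):
    """Connected components of the graph whose edges satisfy w[i][j] <= tau (assumed rule)."""
    n = len(w)
    adj = [[j for j in range(n) if j != i and w[i][j] <= tau] for i in range(n)]
    label = [-1] * n
    current = 0
    for s in range(n):
        if label[s] != -1:
            continue  # already assigned to an earlier component
        label[s] = current
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if label[v] == -1:
                    label[v] = current
                    queue.append(v)
        current += 1
    return label


def add_pseudo_dimension(features, labels):
    """Append each sample's topological-cluster id as an extra (pseudo) feature dimension."""
    return [list(x) + [float(c)] for x, c in zip(features, labels)]
```

Raising τ merges clusters and lowering it splits them, which is why the claim selects it by an elbow criterion or cross-validation rather than fixing it in advance.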
- 10. The topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain according to claim 9, wherein Step 3.2 comprises the following sub-steps: Step 3.2.1, introducing a triangulation-based quantization technique to express the topological structure of the point cloud in discrete form, specifically: the original point cloud is represented topologically by a triangulated space formed by a set of non-overlapping simplices; before and after the high-dimensional embedding, the vertices of a triangle are denoted $v_1, v_2, v_3$ and $\tilde{v}_1, \tilde{v}_2, \tilde{v}_3$, respectively, and are connected by edge vectors collected in the edge matrices $V = [v_2 - v_1,\ v_3 - v_1]$ and $\tilde{V} = [\tilde{v}_2 - \tilde{v}_1,\ \tilde{v}_3 - \tilde{v}_1]$, which encapsulate local topological information, where $(i_1, i_2, i_3)$ is the index vector of the triangle's vertices; Step 3.2.2, based on the triangulation, defining the deformation gradient $F$ and the embedding transformation $T$ of each triangle, where $F$ maps the original edge matrix to the embedded one ($F V = \tilde{V}$) and is obtained through the QR decomposition $V = QR$ as $F = \tilde{V} R^{-1} Q^{\top}$; the transformation is decomposed into a scaling part and a rotational part, with $Q$ a unitary matrix; Step 3.2.3, defining an objective function that minimizes the difference between the high-dimensional embedding transformation $T$ and the deformation gradient $F$, $E_{\mathrm{def}} = \sum_{j} \lVert T_j - F_j \rVert_F^2$, subject to the condition that the original dimensions remain unchanged after high-dimensional embedding; Step 3.2.4, adding a regularization term in two parts: a first term $R_{\mathrm{emb}}$ that regularizes the embedding and a second term $R_{\mathrm{topo}}$ that regularizes the topology, where the embedding regularization term $R_{\mathrm{emb}}$ is computed over the $N$ vertices and $\mathcal{N}_K(i)$ is the neighborhood of vertex $i$ formed by its $K$ nearest neighbors. The final objective function is $\mathcal{L} = E_{\mathrm{def}} + \lambda_1 R_{\mathrm{emb}} + \lambda_2 R_{\mathrm{topo}}$, where the weight coefficients $\lambda_1$ and $\lambda_2$ balance the relative importance of each term in the overall objective; their values can be set empirically or determined by cross-validation according to the specific task requirements.
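The deformation-gradient computation of Step 3.2.2 and the weighted objective of Step 3.2.4 can be sketched with NumPy. This assumes F is defined by F V = Ṽ on each triangle's edge matrices and is recovered through the reduced QR decomposition V = QR as F = Ṽ R⁻¹ Qᵀ; the regularizer values are passed in as plain numbers because their exact forms are not specified in this sketch. Function names and default weights are hypothetical.

```python
import numpy as np


def edge_matrix(verts, tri):
    """Edge vectors (v2 - v1, v3 - v1) of triangle `tri` stacked as columns."""
    i, j, k = tri
    return np.column_stack([verts[j] - verts[i], verts[k] - verts[i]])


def deformation_gradient(v_orig, v_emb, tri):
    """F with F @ V = V_tilde, computed via the reduced QR decomposition V = Q R."""
    V = edge_matrix(v_orig, tri)      # D x 2, original space
    Vt = edge_matrix(v_emb, tri)      # D' x 2, embedded space
    Q, R = np.linalg.qr(V)            # Q: orthonormal columns (D x 2), R: 2 x 2 upper triangular
    return Vt @ np.linalg.solve(R, Q.T)  # V_tilde R^{-1} Q^T, shape D' x D


def total_objective(transforms, gradients, r_emb, r_topo, lam1=1.0, lam2=1.0):
    """Sum of squared Frobenius gaps ||T_j - F_j||_F^2 plus the two weighted regularizers."""
    fit = sum(np.linalg.norm(T - F, "fro") ** 2 for T, F in zip(transforms, gradients))
    return fit + lam1 * r_emb + lam2 * r_topo
```

The identity F V = Ṽ R⁻¹ Qᵀ Q R = Ṽ holds whenever the triangle is non-degenerate (V has full column rank), because Qᵀ Q is the identity for the reduced factorization.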
Description
Technical Field
The invention belongs to the field of machine learning and data processing, and particularly relates to a topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain.
Background
With the rapid development of artificial intelligence in the traffic domain, machine learning models are increasingly applied to decision tasks in high-risk scenarios, such as anomaly detection on vehicle trajectory data. However, real-world traffic data commonly suffer from scarce labels and extremely imbalanced classes, particularly for relatively rare but high-impact events such as illegal lane changes and abnormal parking. Misclassifying a minority-class sample can have serious safety consequences, so stably identifying minority classes under limited labeling has become an important challenge. Existing methods generally mitigate class imbalance through resampling, synthetic sample generation or feature-space dimensionality reduction. However, these methods act mainly on the distribution or geometric representation of the data and cannot explicitly preserve its topological structure; in particular, sparse regions near the decision boundary easily make models unstable, which limits the generalization of neural network models and traditional machine learning models in few-sample scenarios. In recent years, topological data analysis (TDA) has provided tools for describing the multi-scale structure of data, such as the Vietoris-Rips complex and persistent homology, which can characterize topological features such as connectivity and holes.
On this basis, the invention provides a topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain, which preserves data structure information more stably and improves the recognition performance of the model in key regions. Verification on several real data sets, including vehicle trajectories, shows that the method outperforms conventional techniques under few-sample and imbalanced conditions.
Disclosure of Invention
In view of the problems in the prior art, the invention provides a topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain, which solves the problems of unstable decision boundaries and poor generalization and robustness of traditional machine learning models when traffic data labels are scarce and the class distribution is imbalanced. To solve the above technical problems, the invention adopts the following technical solution: a topology prototype enhancement method for class-imbalanced and few-sample scenarios in the traffic domain, comprising the following steps: Step 1, obtaining a representative point set of traffic data that preserves local and global topological structure, based on topological sub-sampling and topological resampling; Step 2, screening topologically stable traffic data prototypes through persistent homology, topological distance and boundary topological variability analysis; and Step 3, generating pseudo-dimensions or performing high-dimensional topological embedding based on the traffic data topology prototypes to construct enhanced features, which are used for training and inference of downstream neural network models and traditional machine learning models.
Further, Step 1 comprises the following sub-steps: Step 1.1, for the traffic data, performing topological sub-sampling on the original point cloud using a feature-lattice algorithm; and Step 1.2, performing topological resampling according to the class distribution and local topological sparsity, achieving class balance while preserving topological consistency. Further, Step 2 comprises the following sub-steps: Step 2.1, extracting persistent homology features from the constructed topological complex and generating a persistence diagram; and Step 2.2, combining two key indicators, the point-cloud distance and decision-boundary variability, to screen representative samples that remain topologically stable under multi-scale perturbation as topological prototypes. Further, Step 3 comprises the following sub-steps: Step 3.1, extracting multi-scale topological features from the topological prototypes based on persistent homology, and introducing the topological clustering results into an extended feature space as pseudo-dimensions; Step 3.2, constructing a high-dimensional topological embedding based on triangulation and deformation-gradient optimization to obtain an enhanced representation that preserves topological consistency; and Step 3.3, inputting the enhanced features generated by the pseudo-dimensions or the high-dimensional embedding into a downstream neural network model and a tra