CN-121982586-A - Unmanned aerial vehicle hyperspectral vegetation species classification method and system based on deep learning

CN121982586A

Abstract

The application discloses an unmanned aerial vehicle hyperspectral vegetation species classification method based on deep learning, together with a corresponding system, device, and medium. The method comprises the steps of obtaining original hyperspectral images, LiDAR point clouds and flight attitude time-series data synchronously collected by an unmanned aerial vehicle; dynamically correcting the hyperspectral images through an attitude-adaptive spectral feature extraction network, suppressing spectral distortion caused by attitude disturbance, and extracting hyperspectral depth feature maps; constructing a three-dimensional spatial graph structure based on the LiDAR point clouds to obtain point cloud feature maps; combining the two feature maps into a cross-modal heterogeneous graph; carrying out node-level feature fusion by using a cross-attention mechanism with fused spatial constraints; carrying out time-series alignment processing; and finally inputting the result into a classification network to output vegetation species classification results. The method effectively solves the problems of spectral pseudo-variation and multi-source spatiotemporal asynchrony found in traditional approaches, and remarkably improves classification precision and model generalization capability.

Inventors

  • Zhang Fucun
  • Sun Yongwang
  • Li Hongying
  • Liu Yan
  • Xu Shanshan
  • Yan Luqing

Assignees

  • 青海理工学院 (Qinghai Institute of Technology)

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. An unmanned aerial vehicle hyperspectral vegetation species classification method based on deep learning, characterized by comprising the following steps: acquiring an original hyperspectral image, an original LiDAR point cloud and flight attitude time-series data synchronously collected by an unmanned aerial vehicle; inputting the original hyperspectral image and the flight attitude data at the corresponding moment into an attitude-adaptive spectral feature extraction network, and processing them through an attitude-aware dynamic correction mechanism embedded in the network to obtain a hyperspectral depth feature map; constructing a spatial graph structure based on the original LiDAR point cloud to obtain a LiDAR point cloud feature map with three-dimensional structural features; combining the hyperspectral depth feature map with the LiDAR point cloud feature map to obtain a cross-modal heterogeneous graph, wherein the cross-modal heterogeneous graph comprises hyperspectral image nodes and LiDAR point cloud nodes; fusing the graph nodes of the cross-modal heterogeneous graph by using a cross-attention mechanism to obtain a fused cross-modal feature map; performing time-series alignment processing on the fused cross-modal feature map to obtain an aligned and fused cross-modal feature map; and inputting the aligned and fused cross-modal feature map into a classification network to obtain vegetation species classification results.
  2. The method according to claim 1, wherein fusing the graph nodes of the cross-modal heterogeneous graph by using a cross-attention mechanism to obtain a fused cross-modal feature map includes: extracting features of the hyperspectral image nodes to obtain feature vectors; generating a key vector set and a value vector set for the LiDAR point cloud nodes through linear transformation; performing dot-product similarity calculation between the feature vectors and the key vector set to obtain feature similarity scores; combining the feature similarity scores with the corresponding spatial distance constraints and normalizing them to obtain attention weights; performing weighted aggregation of the attention weights and the value vector set to generate a corresponding three-dimensional-structure context feature vector for each hyperspectral image node; and obtaining the fused cross-modal feature map from the three-dimensional-structure context feature vectors according to the spatial position relation of each node.
  3. The method of claim 2, wherein combining the feature similarity score with the corresponding spatial distance constraint and normalizing it to obtain the attention weight comprises: acquiring the two-dimensional image coordinates of the hyperspectral image nodes and the two-dimensional coordinates of the LiDAR point cloud nodes projected onto the horizontal plane; calculating the Euclidean distance between the two sets of coordinates to obtain a spatial offset; inputting the spatial offset into a learnable negative correlation function to calculate a spatial constraint penalty term; adding the dot-product similarity result to the corresponding spatial constraint penalty term to obtain a corrected attention score; and carrying out Softmax normalization on the corrected attention score to obtain the attention weight.
  4. The method according to claim 3, wherein the learnable negative correlation function is defined in terms of the spatial offset, wherein S(Δd) is the spatial constraint penalty term, Δd is the spatial offset, and β is a positive learnable parameter optimized by back propagation during model training.
  5. The method according to claim 1, wherein inputting the hyperspectral image and the flight attitude data at the corresponding moment into the attitude-adaptive spectral feature extraction network and processing them through the attitude-aware dynamic correction mechanism embedded in the network to obtain the hyperspectral depth feature map includes: preprocessing the original hyperspectral image to obtain a preprocessed hyperspectral image cube; extracting, from the flight attitude time-series data, the attitude angle data matching the imaging moment of the hyperspectral image cube to obtain an attitude feature vector; inputting the preprocessed hyperspectral image cube into a three-dimensional convolutional network to extract initial spatial and spectral features, obtaining an initial feature map; inputting the initial feature map and the attitude feature vector into a dynamic feature correction module to generate a spatial deformation field related to the current attitude; correcting, based on the spatial deformation field, the spatial distortion of the initial feature map caused by flight disturbance to obtain a geometrically corrected feature map; carrying out channel attention weighting on the geometrically corrected feature map based on the attitude feature vector to obtain a weighted geometrically corrected feature map; and suppressing the pseudo-spectral-variation channels of the weighted geometrically corrected feature map caused by imaging angle changes to obtain the hyperspectral depth feature map.
  6. The method of claim 5, wherein inputting the initial feature map and the attitude feature vector into a dynamic feature correction module to generate a spatial deformation field associated with the current attitude comprises: performing global feature coding on the initial feature map to obtain a global feature vector; fusing the global feature vector and the attitude feature vector to obtain a fused feature vector; inputting the fused feature vector into a multi-layer perceptron network for nonlinear mapping to obtain two-dimensional offset parameters; and adding the two-dimensional offset parameters to the sampling grid of a standard convolution kernel to form adaptive sampling positions, wherein the field formed by the adaptive sampling positions is the spatial deformation field.
  7. The method of claim 1, wherein constructing a spatial graph structure based on the original LiDAR point cloud to obtain a LiDAR point cloud feature map with three-dimensional structural features comprises: segmenting the vegetation point cloud from the original LiDAR point cloud to obtain vegetation canopy points; carrying out regularized preprocessing on the vegetation canopy points to obtain preprocessed vegetation canopy points; extracting structural features of the preprocessed vegetation canopy points to obtain a point-level feature vector set; and establishing, based on the point-level feature vector set, a three-dimensional graph structure with LiDAR point cloud data points as nodes to obtain the LiDAR point cloud feature map.
  8. An unmanned aerial vehicle hyperspectral vegetation species classification system based on deep learning, characterized in that the system comprises: a data acquisition module for acquiring an original hyperspectral image, an original LiDAR point cloud and flight attitude time-series data synchronously collected by an unmanned aerial vehicle; a feature extraction module for inputting the hyperspectral image and the flight attitude data at the corresponding moment into an attitude-adaptive spectral feature extraction network and processing them through an attitude-aware dynamic correction mechanism embedded in the network to obtain a hyperspectral depth feature map; a feature map construction module for constructing a spatial graph structure based on the original LiDAR point cloud to obtain a LiDAR point cloud feature map with three-dimensional structural features; a cross-modal heterogeneous graph construction module for combining the hyperspectral depth feature map and the LiDAR point cloud feature map to obtain a cross-modal heterogeneous graph comprising two types of nodes, namely hyperspectral image nodes and LiDAR point cloud nodes; a cross-modal feature fusion module for fusing the graph nodes of the cross-modal heterogeneous graph by using a cross-attention mechanism to obtain a fused cross-modal feature map; a time-series alignment processing module for performing time-series alignment processing on the fused cross-modal feature map to obtain an aligned and fused cross-modal feature map; and a classification result output module for inputting the aligned and fused cross-modal feature map into a classification network to obtain vegetation species classification results.
  9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 7 when executing the computer program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 7.
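The spatially constrained cross-attention of claims 2–4 can be sketched as follows. This is a minimal NumPy illustration, not the patented implementation: the patent does not reproduce the formula for the learnable negative correlation function, so the additive penalty form S(Δd) = −β·Δd used here, the scaling by √d, and all function and parameter names are assumptions for illustration only.

```python
import numpy as np

def softmax(x):
    """Numerically stable Softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def spatially_constrained_cross_attention(q, K, V, d_offsets, beta=0.5):
    """One hyperspectral query node attending over LiDAR point cloud nodes.

    q: (d,) feature vector of a hyperspectral image node
    K, V: (n, d) key/value sets from LiDAR nodes (the linear
        transformations of claim 2 are assumed already applied)
    d_offsets: (n,) Euclidean distances between the node's 2-D image
        coordinates and each LiDAR node projected onto the horizontal plane
    beta: positive learnable parameter (fixed here; trained by back
        propagation in the patent). The form -beta * d is an assumption.
    """
    scores = K @ q / np.sqrt(len(q))     # dot-product similarity (claim 2)
    penalty = -beta * d_offsets          # assumed negative-correlation S(Δd)
    weights = softmax(scores + penalty)  # corrected score -> Softmax (claim 3)
    return weights @ V                   # weighted aggregation: context vector
```

With identical similarity scores, the penalty alone decides the weights, so the spatially nearest LiDAR node contributes most to the aggregated context vector, which matches the intent of the spatial distance constraint.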

Description

Unmanned aerial vehicle hyperspectral vegetation species classification method and system based on deep learning

Technical Field

The invention relates to the field of unmanned aerial vehicle hyperspectral image processing, and in particular to a deep-learning-based unmanned aerial vehicle hyperspectral vegetation species classification method and system.

Background

With the development of unmanned aerial vehicle multi-modal remote sensing technology, the fusion of hyperspectral images and laser radar (LiDAR) data has become an important means of fine vegetation classification. Hyperspectral images contain rich spectral information, while LiDAR point clouds provide accurate three-dimensional structural information. Conventional techniques typically employ a serial "first align, then classify" approach: the LiDAR point cloud and the hyperspectral image are first spatiotemporally registered through hardware synchronization or complex post-processing, and the registered data are then input into a classification model. However, this conventional approach has significant limitations. First, inherent operational differences between the sensors lead to serious spatiotemporal asynchrony, and late-stage "hard alignment" tends to introduce errors in complex areas such as vegetation edges. Second, disturbances of the unmanned aerial vehicle's flight attitude cause spectral distortion, and traditional geometric correction cannot eliminate the "pseudo-variation" coupled with the real spectral characteristics, which reduces the model's generalization capability.
Disclosure of Invention

Based on the above, it is necessary to provide a method, a system, a computer device and a computer-readable storage medium for deep-learning-based unmanned aerial vehicle hyperspectral vegetation species classification, which can fuse the hyperspectral images, LiDAR point clouds and flight attitude data of the unmanned aerial vehicle, eliminate spectral distortion through attitude-adaptive correction, solve the spatiotemporal asynchrony of multi-source data by means of cross-attention fusion on a cross-modal heterogeneous graph together with time-series alignment, and improve the classification precision and generalization capability for vegetation species.

In a first aspect, the application provides a deep-learning-based unmanned aerial vehicle hyperspectral vegetation species classification method, comprising the following steps: acquiring an original hyperspectral image, an original LiDAR point cloud and flight attitude time-series data synchronously collected by an unmanned aerial vehicle; inputting the hyperspectral image and the flight attitude data at the corresponding moment into an attitude-adaptive spectral feature extraction network, and processing them through an attitude-aware dynamic correction mechanism embedded in the network to obtain a hyperspectral depth feature map; constructing a spatial graph structure based on the original LiDAR point cloud to obtain a LiDAR point cloud feature map with three-dimensional structural features; combining the hyperspectral depth feature map with the LiDAR point cloud feature map to obtain a cross-modal heterogeneous graph, wherein the cross-modal heterogeneous graph comprises hyperspectral image nodes and LiDAR point cloud nodes; fusing the graph nodes of the cross-modal heterogeneous graph by using a cross-attention mechanism to obtain a fused cross-modal feature map; performing time-series alignment processing on the fused cross-modal feature map to obtain an aligned and fused cross-modal feature map; and inputting the aligned and fused cross-modal feature map into a classification network to obtain vegetation species classification results.

In one embodiment, fusing the graph nodes of the cross-modal heterogeneous graph by using a cross-attention mechanism to obtain a fused cross-modal feature map includes: extracting features of the hyperspectral image nodes to obtain feature vectors; generating a key vector set and a value vector set for the LiDAR point cloud nodes through linear transformation; performing dot-product similarity calculation between the feature vectors and the key vector set to obtain feature similarity scores; combining the feature similarity scores with the corresponding spatial distance constraints and normalizing them to obtain attention weights; performing weighted aggregation of the attention weights and the value vector set to generate a corresponding three-dimensional-structure context feature vector for each hyperspectral image node; and, based on the three-dimensional-structure context feature vectors, obtaining the fused cross-modal feature map according to the spatial position relation of each node. On the basis of the above embodiment
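The attitude-driven spatial deformation field of claim 6 follows the pattern of deformable sampling: global average pooling of the initial feature map, fusion with the attitude vector, an MLP producing 2-D offsets, and addition of those offsets to a standard convolution sampling grid. The NumPy sketch below illustrates only this data flow; the layer sizes, the two-layer ReLU MLP, and the 3×3 kernel are assumptions, not details from the patent.

```python
import numpy as np

def pose_deformation_field(feature_map, pose_vec, W1, b1, W2, b2, kernel=3):
    """Sketch of claim 6: derive a spatial deformation field from the
    current flight attitude. Shapes and the two-layer MLP are assumed.

    feature_map: (C, H, W) initial feature map from the 3-D convolution
    pose_vec: (p,) attitude feature vector (e.g. roll/pitch/yaw features)
    W1, b1, W2, b2: MLP weights; W2 maps to 2 * kernel * kernel offsets
    """
    g = feature_map.mean(axis=(1, 2))        # global feature coding (GAP)
    fused = np.concatenate([g, pose_vec])    # fuse global + attitude features
    h = np.maximum(W1 @ fused + b1, 0.0)     # MLP hidden layer (ReLU)
    offsets = (W2 @ h + b2).reshape(kernel * kernel, 2)  # 2-D offset params
    # standard sampling grid of a kernel x kernel convolution
    r = np.arange(kernel) - kernel // 2
    grid = np.stack(np.meshgrid(r, r, indexing="ij"), axis=-1).reshape(-1, 2)
    return grid + offsets                    # adaptive sampling positions
```

When the MLP output is zero the field collapses to the standard grid, i.e. no correction is applied; nonzero attitude-dependent offsets shift the sampling positions to compensate for the geometric distortion caused by flight disturbance.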