CN-121999384-A - Improved TransUNet land coverage classification method based on feature fusion and attention enhancement

CN121999384A

Abstract

The invention discloses an improved TransUNet land cover classification method based on feature fusion and attention enhancement, belonging to the technical field of semantic segmentation of land cover in remote sensing images. The method comprises: processing and producing the model dataset; improving and training the TransUNet model; running ablation experiments on the model data and modules; evaluating the model by overall pixel accuracy, mean intersection over union, and mean F1 score; comparing model accuracy through transfer-learning training on a high-resolution image dataset; and evaluating the effectiveness and reliability of MFE-TransUNet and producing a land cover classification map. A multi-feature squeeze-and-excitation fusion module achieves effective fusion of the different features, and a multi-feature cross-attention fusion module enables information interaction and mutual learning among the features, so that a finer land cover classification result is finally obtained.

Inventors

  • SHEN WENJUAN
  • LIU ZIYAN
  • CHEN XIUMEI
  • CAO JINHONG
  • CHANG GUANGYI
  • WANG JINHUI
  • JIN YUXI

Assignees

  • Nanjing Forestry University (南京林业大学)

Dates

Publication Date
2026-05-08
Application Date
2026-01-26

Claims (8)

  1. An improved TransUNet land cover classification method based on feature fusion and attention enhancement, characterized by comprising the following steps: Step 1, model dataset processing and production, namely collecting high-resolution remote sensing data, preprocessing the remote sensing data, calculating the normalized difference vegetation index and normalized difference water index, and producing semantic labels by the maximum likelihood method and visual interpretation; Step 2, improving the TransUNet model and training it; Step 3, performing ablation experiments on the model data and modules, and evaluating the model by overall pixel accuracy, mean intersection over union, and mean F1 score; Step 4, comparing model accuracy through transfer-learning training on a high-resolution image dataset; and Step 5, evaluating the effectiveness and reliability of MFE-TransUNet and producing a land cover classification map.
  2. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 1, wherein step 1 is implemented as follows: Step 1.1, data acquisition and preprocessing of the acquired data, namely acquiring Gaofen high-resolution data of the study area for past years from the China Centre for Resources Satellite Data and Application, calculating the normalized difference vegetation index and normalized difference water index for each remote sensing scene of the study area, and mosaicking the multiple high-resolution scenes covering the study area; Step 1.2, classifying the preprocessed data by the maximum likelihood method; Step 1.3, performing visual interpretation on the classification result to correct misclassifications, and producing the model semantic labels from the corrected result; and Step 1.4, tiling and storing the remote sensing images, semantic labels, normalized difference vegetation index and normalized difference water index, and splitting all produced data into datasets.
  3. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 1, wherein step 2 is implemented as follows: Step 2.1, expanding the original ResNet backbone from a single branch to three branches that extract the original RGB image features, the NDVI vegetation features and the NDWI water-body features, respectively; Step 2.2, designing a three-branch Transformer encoder, with independent Query, Key and Value projection layers and multi-head self-attention computation for the RGB, NDVI and NDWI features, respectively; Step 2.3, realizing bidirectional information exchange among the three features through a multi-feature cross-attention fusion module; and Step 2.4, constructing a hierarchical fusion strategy that controls the fusion timing at different depth levels, achieving effective multi-feature information integration while maintaining feature independence.
  4. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 3, wherein step 2.1 is implemented as follows: given the feature maps F_rgb, F_ndvi and F_ndwi of the three branches at each spatial scale, the multi-feature squeeze-and-excitation fusion (MSEF) module first performs channel-level recalibration through global average pooling and fully connected layers, specifically: F̂_rgb = σ(W₂(W₁·GAP(F_rgb))) ⊗ F_rgb; F̂_ndvi = σ(W₂(W₁·GAP(F_ndvi))) ⊗ F_ndvi; F̂_ndwi = σ(W₂(W₁·GAP(F_ndwi))) ⊗ F_ndwi; where F̂_rgb denotes the RGB feature map after channel-level recalibration by the MSEF module; F̂_ndvi denotes the NDVI feature map after channel-level recalibration by the MSEF module; F̂_ndwi denotes the NDWI feature map after channel-level recalibration by the MSEF module; σ denotes the Sigmoid function, GAP denotes global average pooling, ⊗ denotes element-wise multiplication, W₁ denotes the weight matrix of the first fully connected layer, and W₂ denotes the weight matrix of the second fully connected layer; the recalibrated features are then fused through adaptive weights, specifically: F_fused = (α·F̂_rgb + β·F̂_ndvi + γ·F̂_ndwi)/(α + β + γ); where F_fused denotes the three feature streams after adaptive weighted fusion; α, β and γ denote the learnable weight parameters, and α + β + γ denotes the sum of the three learnable weight parameters.
  5. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 3, wherein step 2.2 is implemented as follows: for self-attention enhancement, a self-attention layer independently extracts the global context information of each feature stream, computed as: z'^n_rgb = MSA(LN(z^{n-1}_rgb)) + z^{n-1}_rgb; z'^n_ndvi = MSA(LN(z^{n-1}_ndvi)) + z^{n-1}_ndvi; z'^n_ndwi = MSA(LN(z^{n-1}_ndwi)) + z^{n-1}_ndwi; z^n_rgb = MLP(LN(z'^n_rgb)) + z'^n_rgb; z^n_ndvi = MLP(LN(z'^n_ndvi)) + z'^n_ndvi; z^n_ndwi = MLP(LN(z'^n_ndwi)) + z'^n_ndwi; where z'^n_rgb, z'^n_ndvi and z'^n_ndwi denote the intermediate outputs of the RGB, NDVI and NDWI feature streams after multi-head self-attention in the n-th self-attention layer; z^n_rgb, z^n_ndvi and z^n_ndwi denote the final outputs of the three streams after complete processing by the n-th self-attention layer; z^{n-1}_rgb, z^{n-1}_ndvi and z^{n-1}_ndwi denote the feature sequences after processing by layer n-1, used as inputs to the n-th self-attention layer; LN denotes layer normalization, MSA denotes multi-head self-attention, and MLP denotes a multi-layer perceptron.
  6. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 3, wherein step 2.3 is implemented as follows: for multi-feature mutual enhancement, the multi-feature cross-attention fusion (MCAF) module realizes bidirectional information exchange among the RGB, NDVI and NDWI feature streams; letting y^n_rgb, y^n_ndvi and y^n_ndwi denote the cross-attention outputs, the fusion output with residual connections can be written as: ŷ^n_rgb = y^n_rgb + z^{n-1}_rgb; ŷ^n_ndvi = y^n_ndvi + z^{n-1}_ndvi; ŷ^n_ndwi = y^n_ndwi + z^{n-1}_ndwi; where y^n_rgb, y^n_ndvi and y^n_ndwi denote the outputs of the RGB, NDVI and NDWI feature streams after cross-attention processing in the n-th MCAF layer; ŷ^n_rgb, ŷ^n_ndvi and ŷ^n_ndwi denote the results of residually connecting each MCAF output with its corresponding input; z^n_rgb, z^n_ndvi and z^n_ndwi denote the final outputs of the three streams after complete processing by the n-th MCAF layer; z^{n-1}_rgb, z^{n-1}_ndvi and z^{n-1}_ndwi denote the outputs after cross-attention processing by layer n-1, used as inputs to the n-th MCAF layer; and LN denotes layer normalization.
  7. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 3, wherein the internal processing of the multi-feature cross-attention fusion module is as follows: first, given the inputs z_rgb, z_ndvi and z_ndwi, each feature stream generates query, key and value matrices by linear projection; then the self-attention (SA) information of each of the three features and the cross-attention (CA) information among them are computed simultaneously, and mutual information exchange among the feature streams is realized by the cross-attention mechanism.
  8. The improved TransUNet land cover classification method based on feature fusion and attention enhancement as set forth in claim 3, wherein step 2.4 is implemented as follows: fused feature enhancement, specifically: F_fusion = MLP(Concat(z^N_rgb, z^N_ndvi, z^N_ndwi)); where Concat denotes concatenation along the channel dimension; z^N_rgb denotes the final output of the RGB feature stream after processing by N Transformer layers; z^N_ndvi denotes the final output of the NDVI feature stream after processing by N Transformer layers; z^N_ndwi denotes the final output of the NDWI feature stream after processing by N Transformer layers; the MLP projects the dimension from 3D to D; F_fusion denotes the fused feature sequence; and MLP denotes a multi-layer perceptron.
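For reference, the normalized difference vegetation index and normalized difference water index used in step 1 of claim 1 are standard band-ratio indices. A minimal sketch follows; the function names and the small epsilon guard against division by zero are illustrative assumptions, not specified by the claims:

```python
def ndvi(nir, red, eps=1e-12):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red + eps)

def ndwi(green, nir, eps=1e-12):
    """Normalized Difference Water Index (McFeeters form): (Green - NIR) / (Green + NIR)."""
    return (green - nir) / (green + nir + eps)
```

Applied per pixel to the corresponding bands of each scene, these yield the NDVI and NDWI layers that feed the second and third branches of the model.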
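The channel recalibration and adaptive weighted fusion of claim 4 can be sketched as follows. This is a single-sample NumPy illustration under stated assumptions: channel-first (C, H, W) layout, two fully connected layers W1 and W2 with no intermediate activation beyond what the claim names, and scalar learnable weights; the patent's actual implementation may differ:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def channel_recalibrate(F, W1, W2):
    """SE-style recalibration: GAP -> FC(W1) -> FC(W2) -> sigmoid channel gate."""
    # F: (C, H, W); W1: (C_r, C); W2: (C, C_r)
    s = F.mean(axis=(1, 2))            # global average pooling over space -> (C,)
    gate = sigmoid(W2 @ (W1 @ s))      # per-channel gate in (0, 1), shape (C,)
    return F * gate[:, None, None]     # element-wise multiplication (broadcast)

def adaptive_fuse(F_rgb, F_ndvi, F_ndwi, alpha, beta, gamma):
    """Adaptive weighted fusion, normalized by the sum of the learnable weights."""
    return (alpha * F_rgb + beta * F_ndvi + gamma * F_ndwi) / (alpha + beta + gamma)
```

With identical inputs the fusion reduces to the identity regardless of the weights, which is a quick sanity check on the normalization.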
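The per-stream self-attention layer of claim 5 (pre-norm block: multi-head self-attention with a residual connection, then an MLP with a residual connection) can be sketched for one feature stream. Single-head attention and an MLP passed as a callable are simplifying assumptions for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(z, Wq, Wk, Wv):
    # z: (T, D) token sequence of one feature stream (RGB, NDVI, or NDWI)
    q, k, v = z @ Wq, z @ Wk, z @ Wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v

def branch_layer(z, Wq, Wk, Wv, mlp):
    """One layer: z' = MSA(LN(z)) + z, then z_out = MLP(LN(z')) + z'."""
    z_mid = self_attention(layer_norm(z), Wq, Wk, Wv) + z
    return mlp(layer_norm(z_mid)) + z_mid
```

The same layer is applied independently to each of the three streams, matching the six equations of claim 5 (an intermediate and a final output per stream).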
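The bidirectional exchange of claims 6-7 and the fusion head of claim 8 can be sketched together. Having each stream's queries attend to the concatenated tokens of the other two streams is one plausible reading of the cross-attention exchange; the claims do not fix this detail, and shared projections, single-head attention, and a one-layer projection standing in for the MLP are further simplifications:

```python
import numpy as np

def cross_attention(z_q, z_kv, Wq, Wk, Wv):
    """Queries from one stream; keys/values from the other streams' tokens."""
    q, k, v = z_q @ Wq, z_kv @ Wk, z_kv @ Wv
    s = q @ k.T / np.sqrt(k.shape[-1])
    e = np.exp(s - s.max(axis=-1, keepdims=True))
    return (e / e.sum(axis=-1, keepdims=True)) @ v

def mcaf_layer(z_rgb, z_ndvi, z_ndwi, params):
    """One MCAF exchange: each stream attends to the other two, with a residual."""
    def step(z_q, others):
        z_kv = np.concatenate(others, axis=0)   # tokens of the other two streams
        return cross_attention(z_q, z_kv, *params) + z_q  # residual connection
    return (step(z_rgb, (z_ndvi, z_ndwi)),
            step(z_ndvi, (z_rgb, z_ndwi)),
            step(z_ndwi, (z_rgb, z_ndvi)))

def fusion_head(z_rgb, z_ndvi, z_ndwi, W_proj):
    """Claim 8: concatenate along channels, then project 3D -> D."""
    fused = np.concatenate([z_rgb, z_ndvi, z_ndwi], axis=-1)  # (T, 3D)
    return fused @ W_proj                                      # (T, D)
```

Stacking N such exchange layers and applying the fusion head to their final outputs mirrors the hierarchical fusion strategy of step 2.4.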

Description

Improved TransUNet land cover classification method based on feature fusion and attention enhancement

Technical Field

The invention belongs to the technical field of semantic segmentation of land cover in remote sensing images, and particularly relates to an improved TransUNet land cover classification method based on feature fusion and attention enhancement.

Background

Using deep learning semantic segmentation models for land cover mapping and dynamic monitoring of high-resolution remote sensing images is of great significance for the sustainable management of urban forests and for climate benefits. Optimizing land cover classification methods to obtain more accurate land cover data has important social significance and practical value for the sustainable development of a region. High-resolution remote sensing images contain more detailed land cover information, and land cover classification methods based on them have been widely studied. At the same time, however, high-resolution imagery has a large data volume, is easily affected by the phenomena of "same object, different spectra" and "different objects, same spectrum", and is more difficult to process. In a high-resolution remote sensing image, the same land cover type may exhibit different spectral curves under the influence of topography, imaging conditions and other factors, while different land cover types may exhibit the same spectral curve. For example, forest spectra differ between sunlit and shaded slopes, a "same object, different spectra" phenomenon caused by topography. This problem often leads to misclassification of land cover types, which is detrimental to the sustainable management of a region. It is therefore necessary to optimize existing land cover classification methods and improve classification accuracy.
With the development of remote sensing technology, land cover classification methods have also evolved, encompassing visual interpretation, supervised classification and unsupervised classification. The rapid development of big data and artificial intelligence has further advanced deep learning classification techniques in land cover classification. Visual interpretation analyzes images using the characteristics of ground objects in remote sensing imagery, and completes land cover classification by interpreting the features in the images. However, this method depends on the expertise of the interpreter, the accuracy of the classification result often varies from person to person, classification efficiency is relatively low, and the method struggles when data volumes are very large or complex. Pixel-based methods take each pixel of the remote sensing image as the basic classification unit and judge the pixel category from the reflectance or radiance of different bands by comparing spectral information; common methods include endmember decomposition, the pixel dichotomy method and spectral information classification. These methods have low interpretation efficiency, are mainly suited to medium- and low-resolution remote sensing images, and have difficulty handling the spectral characteristics of complex ground objects and the semantic information of scenes in high-resolution imagery. The basic unit of object-oriented classification is not the pixel but polygonal objects; on the basis of object segmentation, the distribution characteristics of the objects are computed mathematically to extract land cover category information.
Object-oriented methods take polygonal objects rather than single pixels as the classification unit, can make comprehensive use of more visual characteristics of objects such as geometric form and texture structure, and can effectively avoid the salt-and-pepper noise common to pixel-based methods. However, many problems remain if only an object-oriented method is relied upon: for example, segmentation parameters must be selected through repeated experiments, segmenting objects of different scales is difficult, and the data processing capability of the method is still relatively weak. Common machine learning methods include support vector machines, random forests and decision trees, which reduce labor costs and improve processing speed. However, how to make full use of the ground object information contained in remote sensing images to obtain more refined land cover data remains a problem to be solved. Compared with traditional classification methods, the land cover classifica