CN-122024049-A - Precise recognition method for homeland landscape

Abstract

The invention relates to the technical field of image processing and homeland landscape classification, and discloses a precise recognition method for a homeland landscape. The method comprises: acquiring multi-modal remote sensing images and constructing image samples; calculating the pixel-level prediction confidence of the image samples and generating an ignore mask; combining the pixel-level prediction confidence and the ignore mask to construct a training data set; constructing a semantic segmentation model based on a MobileNetV2 backbone network and a DeepLabV3+ decoder; embedding a convolutional block attention structure in the inverted residual structure of the backbone network; establishing a low-level feature transmission path between the shallow feature output end of the backbone network and the input end of the decoder; iteratively training the semantic segmentation model with the training data set; and inputting a homeland landscape image to be recognized into the trained semantic segmentation model and outputting a classification result image corresponding to the classification labels. The focus labeling method optimizes and enhances the data, and, by improving the quality and diversity of the training set, strengthens the generalization capability, small-scale feature learning capability and classification accuracy of the MobileNet model on complex scenes.

Inventors

ZHOU ZHIYONG
JIN XINGXING

Assignees

湖州学院

Dates

Publication Date: 20260512
Application Date: 20260123

Claims (10)

1. A precise recognition method for a homeland landscape, characterized by comprising the following steps: S1, acquiring multi-modal remote sensing images and constructing image samples containing nature, water system, farmland and aggregation classification labels; calculating the pixel-level prediction confidence of the image samples, screening out fuzzy regions using a set high threshold and a set low threshold, and generating an ignore mask; and constructing a training data set by combining the image samples, the classification labels and the ignore mask; S2, constructing a semantic segmentation model based on a MobileNetV2 backbone network and a DeepLabV3+ decoder; embedding, after the depthwise separable convolution layer in the inverted residual structure of the backbone network, a convolutional block attention structure formed by a channel attention unit and a spatial attention unit connected in series; and establishing a low-level feature transmission path between the shallow feature output end of the backbone network and the input end of the decoder; S3, iteratively training the semantic segmentation model with the training data set, wherein during training a prediction error is calculated with a mixed loss function formed by weighting Focal Loss and Dice Loss, the loss weight of the fuzzy regions is set to zero using the ignore mask, and the model parameters are updated through back propagation; S4, inputting the homeland landscape image to be recognized into the trained semantic segmentation model, and outputting a classification result image corresponding to the classification labels.
2. The precise recognition method for a homeland landscape of claim 1, wherein S1 specifically comprises: S101, acquiring orthographic satellite images and low-altitude unmanned aerial vehicle images of a target area as multi-modal remote sensing image data; S102, performing radiometric correction and geometric correction on the multi-modal remote sensing image data, and generating image samples of uniform size by sliding-window cropping; S103, establishing a classification system comprising nature, water system, farmland and aggregation, and manually annotating part of the image samples to generate corresponding label maps; S104, predicting the image samples with a pre-trained model to obtain a pixel-level prediction probability distribution, and calculating the prediction confidence of each pixel point, wherein the prediction confidence $C_i$ is calculated by obtaining the probability distribution vector $P_i = (p_{i,1}, \dots, p_{i,K})$, output by the pre-trained model, of pixel point $i$ belonging to each of the $K$ ground object categories, and taking the maximum value in the probability distribution vector as the prediction confidence of the pixel point, according to the formula $C_i = \max_{1 \le k \le K} p_{i,k}$; S105, generating an ignore mask of the same size as the image sample based on the prediction confidence and the set high and low thresholds; S106, associating and combining the image samples, the label maps and the ignore masks to construct a training data set, wherein the training data set comprises an original image set storing the image samples, a segmentation label set storing the label maps and a mask set storing the ignore masks, and the mask set is used for performing a Hadamard product operation with the segmentation label set to remove noisy gradients when the loss function is calculated during model training.
3. The precise recognition method for a homeland landscape of claim 2, wherein S105 specifically comprises: setting a high confidence threshold $T_{\text{high}}$ and a low confidence threshold $T_{\text{low}}$, and traversing each pixel point $i$ of the image sample; if the prediction confidence satisfies $C_i \ge T_{\text{high}}$ or $C_i \le T_{\text{low}}$, setting the value of the corresponding position in the ignore mask to 1 and marking it as a valid region that participates in training; if the prediction confidence satisfies $T_{\text{low}} < C_i < T_{\text{high}}$, setting the value of the corresponding position in the ignore mask to 0 and marking it as a fuzzy region to be ignored.
4. The precise recognition method for a homeland landscape of claim 2, wherein S2 specifically comprises: S201, adopting the MobileNetV2 network as the backbone network for feature extraction, and selecting the DeepLabV3+ network as the decoder for feature recovery and multi-scale fusion, so as to construct the basic network architecture; S202, embedding a convolutional block attention structure after the depthwise separable convolution layer in the inverted residual structure of the MobileNetV2 backbone network; S203, configuring the convolutional block attention structure, which consists of a channel attention unit and a spatial attention unit connected in series, so that an input feature map is processed by the channel attention unit and then by the spatial attention unit; S204, establishing a low-level feature transmission path, extracting the shallow high-resolution features of the MobileNetV2 backbone network, and concatenating and fusing them with the high-level semantic features output by the DeepLabV3+ decoder.
5. The precise recognition method for a homeland landscape of claim 4, wherein, in S203, the channel attention unit is calculated by: performing global average pooling and global maximum pooling on the input feature map to generate two one-dimensional vectors, inputting the two one-dimensional vectors separately into a shared multi-layer perceptron, adding the processed outputs element by element, and generating a channel attention map through a Sigmoid activation function; and the spatial attention unit is calculated by: performing average pooling and maximum pooling along the channel dimension on the feature map processed by the channel attention unit to generate two two-dimensional feature maps, concatenating the two two-dimensional feature maps, applying a convolution layer with a 7×7 kernel, generating a spatial attention map through a Sigmoid activation function, and multiplying the spatial attention map element by element with the feature map processed by the channel attention unit to obtain the final attention-enhanced features.
6. The precise recognition method for a homeland landscape of claim 4, wherein, in S204, the concatenation and fusion specifically comprises: extracting the output of the inverted residual module with a downsampling factor of 4 in the MobileNetV2 backbone network as the low-level features, and reducing the dimension of the low-level features with a 1×1 convolution; up-sampling by a factor of 4 the high-level semantic features output by the atrous spatial pyramid pooling module of the DeepLabV3+ decoder; and concatenating the dimension-reduced low-level features and the up-sampled high-level semantic features along the channel dimension, inputting the concatenated features into a 3×3 convolution layer for fusion, and up-sampling the result to the same resolution as the input image.
7. The precise recognition method for a homeland landscape of claim 4, wherein S3 specifically comprises: S301, initializing the parameters of the semantic segmentation model by setting the transfer learning weights and the weights of the newly added structures, wherein the parameter initialization specifically loads MobileNetV2 model weights pre-trained on the ImageNet data set as the initial parameters of the backbone network of the semantic segmentation model; S302, setting the training hyperparameters, which comprise the batch size, the number of training epochs, the initial learning rate and the learning rate adjustment strategy, wherein the learning rate adjustment strategy is a cosine annealing strategy that dynamically adjusts the learning rate during training; S303, inputting the image samples of the training data set into the model for forward propagation, and outputting a prediction probability map; S304, based on the prediction probability map and the label map, sequentially calculating the Focal Loss, the Dice Loss and the original mixed loss value, and filtering the original mixed loss value with the ignore mask to obtain a final effective loss value; S305, performing back propagation based on the final effective loss value, and using an automatic differentiation mechanism to calculate gradients and update the model parameters; S306, evaluating the performance of the model on the validation set after each training epoch, and, when the mean intersection over union (mIoU) of the current epoch is better than the best historical record, saving the current model parameters as the trained semantic segmentation model.
8. The precise recognition method for a homeland landscape of claim 7, wherein, in S304, the final effective loss value is calculated as follows: step one, calculating the Focal Loss $L_{\text{Focal},i}$ for each pixel point according to the formula $L_{\text{Focal},i} = -\alpha (1 - p_i)^{\gamma} \log(p_i)$, where $p_i$ is the model's predicted probability for the i-th pixel point, $\gamma$ is the focusing parameter, and $\alpha$ is the balance factor; step two, calculating the Dice Loss $L_{\text{Dice},i}$ for each pixel point from $p_i$ and $y_i$, where $y_i$ is the true label value of the i-th pixel and $\epsilon$ is a smoothing coefficient; step three, forming the weighted sum of $L_{\text{Focal},i}$ and $L_{\text{Dice},i}$ to obtain the original mixed loss value $L_{\text{mix},i} = \lambda_1 L_{\text{Focal},i} + \lambda_2 L_{\text{Dice},i}$, where $\lambda_1$ and $\lambda_2$ are weight coefficients; step four, calculating the final effective loss value with the ignore mask according to the formula $L_{\text{final}} = \frac{\sum_{i=1}^{N} M_i L_{\text{mix},i}}{\sum_{i=1}^{N} M_i + \delta}$, where $N$ is the total number of pixels, $M_i$ is the ignore mask value corresponding to the i-th pixel, and $\delta$ is a minimal value that prevents a zero denominator; when the pixel is located in the confident region, $M_i = 1$, and when the pixel is located in the fuzzy region, $M_i = 0$.
9. A precise recognition apparatus for a homeland landscape based on the precise recognition method for a homeland landscape of any one of claims 1 to 8, the apparatus comprising: an acquisition module, configured to acquire multi-modal remote sensing images and construct image samples containing nature, water system, farmland and aggregation classification labels, calculate the pixel-level prediction confidence of the image samples, screen out fuzzy regions using a set high threshold and a set low threshold and generate an ignore mask, and construct a training data set by combining the image samples, the classification labels and the ignore mask; a construction module, configured to construct a semantic segmentation model based on a MobileNetV2 backbone network and a DeepLabV3+ decoder, embed, after the depthwise separable convolution layer in the inverted residual structure of the backbone network, a convolutional block attention structure formed by a channel attention unit and a spatial attention unit connected in series, and establish a low-level feature transmission path between the shallow feature output end of the backbone network and the input end of the decoder; a training module, configured to iteratively train the semantic segmentation model with the training data set, calculate a prediction error during training with a mixed loss function formed by weighting Focal Loss and Dice Loss, set the loss weight of the fuzzy regions to zero using the ignore mask, and update the model parameters through back propagation; and a recognition module, configured to input the homeland landscape image to be recognized into the trained semantic segmentation model and output a classification result image corresponding to the classification labels.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 8 when the computer program is executed.
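
The ignore (neglect) mask of claims 2 and 3 and the masked loss of claim 8 can be illustrated with a minimal pure-Python sketch. It is an illustration under stated assumptions, not the applicant's implementation. The function names, the threshold values 0.8 and 0.3, the default values of alpha, gamma, eps, the two weights and delta, and the reduction of the loss to a binary single-channel case are all hypothetical. In particular, the per-pixel Dice expression is an assumed form, because the formula is not reproduced in the text above.

```python
import math


def confidence_and_mask(probs, t_high=0.8, t_low=0.3):
    """Per-pixel confidence (maximum class probability) and ignore mask.

    probs is an H x W x K nested list of class probabilities.
    t_high and t_low are hypothetical threshold values.
    """
    conf = [[max(pixel) for pixel in row] for row in probs]
    # 1 = pixel takes part in training (confidence >= t_high or <= t_low),
    # 0 = fuzzy pixel strictly between the two thresholds (assumed bounds).
    mask = [[1 if (c >= t_high or c <= t_low) else 0 for c in row]
            for row in conf]
    return conf, mask


def hybrid_pixel_loss(p, y, alpha=0.25, gamma=2.0, eps=1.0,
                      w_focal=0.5, w_dice=0.5):
    """Weighted Focal + Dice loss for one pixel (binary simplification).

    p is the predicted foreground probability, y the true label (0 or 1).
    """
    p = min(max(p, 1e-7), 1.0 - 1e-7)   # keep log() finite
    p_t = p if y == 1 else 1.0 - p      # probability given to the true class
    focal = -alpha * (1.0 - p_t) ** gamma * math.log(p_t)
    # Assumed per-pixel Dice form; the source formula is not available.
    dice = 1.0 - (2.0 * p * y + eps) / (p + y + eps)
    return w_focal * focal + w_dice * dice


def masked_mean_loss(losses, mask, delta=1e-8):
    """Average the per-pixel losses over pixels whose mask value is 1."""
    kept = sum(m * l for l, m in zip(losses, mask))
    return kept / (sum(mask) + delta)


# One image row with three pixels and four classes.
probs = [[[0.90, 0.05, 0.03, 0.02],
          [0.40, 0.30, 0.20, 0.10],
          [0.25, 0.25, 0.25, 0.25]]]
conf, mask = confidence_and_mask(probs)
print(mask)  # [[1, 0, 1]]

preds, labels = [0.9, 0.6, 0.2], [1, 1, 0]
losses = [hybrid_pixel_loss(p, y) for p, y in zip(preds, labels)]
print(masked_mean_loss(losses, mask[0]))
```

Because the mask multiplies each per-pixel loss before averaging, pixels in the fuzzy region contribute neither loss nor gradient, which is the stated purpose of setting their loss weight to zero.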

Description

Precise recognition method for homeland landscape

Technical Field

The invention relates to the technical field of image processing and homeland landscape classification, in particular to a precise recognition method for a homeland landscape.

Background

The precise recognition and classification of homeland landscapes is an important basis for regional ecological protection, resource surveys and dynamic monitoring. Traditional homeland landscape recognition relies mainly on manual visual interpretation, conventional remote sensing technology and Geographic Information Systems (GIS). Although these means can classify ground features to a certain extent, they generally suffer from low operating efficiency, strong subjectivity caused by over-reliance on manual experience, insufficient resolution, and high cost when applied to large-scale landscape surveys. With the development of artificial intelligence technology, deep learning algorithms represented by the Convolutional Neural Network (CNN), such as the MobileNet and DeepLab series, have gradually replaced traditional machine learning methods and are widely applied to image classification and object detection, and models such as SegNet and U-Net have also shown application potential in land use classification. However, existing deep learning methods still have significant technical limitations when dealing with complex homeland landscapes (such as traditional irrigation areas) that have the composite "nature-water conservancy-farmland-settlement" character.
First, the generalization capability of general-purpose semantic segmentation models is insufficient: they lack the ability to analyze complex landscape units in fine detail (such as water-farmland transition zones, fallow cultivated fields and barren land), and they easily lose the features or blur the boundaries of fine linear ground objects (such as irrigation channels). Second, the prior art focuses on pixel-level recognition of individual ground features and ignores the graphical logic of spatial coupling between landscape elements, so that recognition results frequently show topological breaks or isolated noise points. In addition, the prior art generally depends on a single sensor data source, has difficulty fusing multi-temporal and multi-modal information, and struggles to balance a lightweight model against recognition stability, so it cannot meet the practical need for rapid and accurate monitoring of large-scale homeland landscapes.

Disclosure of Invention

The invention provides a precise recognition method for a homeland landscape. Aimed at the image recognition requirements of complex homeland landscapes, it optimizes and enhances the data by a focus labeling method and, by improving the quality and diversity of the training set, strengthens the generalization capability, small-scale feature learning capability and classification accuracy of the MobileNet model on complex scenes.
The invention provides a precise recognition method for a homeland landscape, which comprises the following steps: S1, acquiring multi-modal remote sensing images and constructing image samples containing nature, water system, farmland and aggregation classification labels; calculating the pixel-level prediction confidence of the image samples, screening out fuzzy regions using a set high threshold and a set low threshold, and generating an ignore mask; and constructing a training data set by combining the image samples, the classification labels and the ignore mask; S2, constructing a semantic segmentation model based on a MobileNetV2 backbone network and a DeepLabV3+ decoder; embedding, after the depthwise separable convolution layer in the inverted residual structure of the backbone network, a convolutional block attention structure formed by a channel attention unit and a spatial attention unit connected in series; and establishing a low-level feature transmission path between the shallow feature output end of the backbone network and the input end of the decoder; S3, iteratively training the semantic segmentation model with the training data set, wherein during training a prediction error is calculated with a mixed loss function formed by weighting Focal Loss and Dice Loss, the loss weight of the fuzzy regions is set to zero using the ignore mask, and the model parameters are updated through back propagation; S4, inputting the homeland landscape image to be recognized into the trained semantic segmentation model, and outputting a classification result image corresponding to the classification labels. Further, the step S1 specifically includes: S101, acquiring orthographic satellite images and low-altitude unmanned aerial vehicle images of a target area as multi-modal remote sensing image data; S102, performing radiometric correction and geometric correction on