CN-121999376-A - Deep learning-based forestry complex scene element remote sensing intelligent interpretation method


Abstract

The invention provides a deep learning-based remote sensing intelligent interpretation method for forestry complex scene elements, in the technical field of forestry remote sensing monitoring. The method connects an input end, a backbone structure OOP, a semi-supervised dual-branch decoding structure and an output end in sequence: the input end feeds the backbone structure OOP, the OOP feeds the semi-supervised dual-branch decoding structure, and the decoding structure feeds the output end. The OOP backbone, a local parallel Transformer, comprises a PTB module, a Transformer module and a PRM module. The decoding part of the semi-supervised dual-branch network consists of three parts: two branches that do not share parameters, namely a Hard branch and a Soft branch, plus a Fuse part. Both branches adopt the UNet++ decoding structure but use different loss functions: Hard_loss secures precision (the correct-detection rate) and Soft_loss secures recall. The Fuse part fuses the two branches so that precision and recall are balanced.

Inventors

Xiao qianhui
CHEN JINPENG
Xiao Zhouhong
CHEN SHENGLAN
XU LEI
ZENG MINGYU
LIU YU
LIU WEI
HE PENG
LIU ZIWEI
HUANG XIN
YU YANG

Assignees

国家林业和草原局中南调查规划院 (Central South Academy of Inventory and Planning, National Forestry and Grassland Administration)

Dates

Publication Date: 2026-05-08
Application Date: 2025-10-22

Claims (4)

1. A deep learning-based remote sensing intelligent interpretation method for forestry complex scene elements, characterized by comprising an input end, a backbone structure OOP, a semi-supervised dual-branch decoding structure and an output end, wherein the input end is output to the backbone structure OOP, the backbone structure OOP is output to the semi-supervised dual-branch decoding structure, and the semi-supervised dual-branch decoding structure is output to the output end; the method comprises:
1) Designing the backbone structure OOP: following the idea of a local parallel Transformer, a single-stage multi-scale parallel attention backbone network OOP (the OOP backbone structure of the local parallel Transformer) is provided, comprising a PTB module, a Transformer module and a PRM module; wherein the PTB module is a parallel attention module that can be expressed as follows:
where $\alpha$ and $\beta$ are adjustable hyperparameters of the model, defaulting to 0.9 and 0.1 respectively, and $m$ is the number of scale stages; the Transformer module adopts the window attention module of Swin and can be expressed as follows:
where $Q$, $K$ and $V$ denote the query, key and value respectively; the PRM module is a pyramid convolution module, which can be expressed as:
$[x_1, x_2, \dots, x_n] = \mathrm{PRM}(x)$
2) Designing the semi-supervised dual-branch decoding structure: a semi-supervised dual-branch decoding network is designed whose decoding part consists of three parts, namely a Hard branch and a Soft branch that do not share parameters, and a Fuse part; the two branches each adopt the UNet++ decoding structure and use different loss functions, wherein Hard_loss secures precision and Soft_loss secures recall, and the Fuse part fuses the two branches so that precision and recall are balanced.
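The window attention that the Transformer module borrows from Swin computes, inside each non-overlapping M×M window, $\mathrm{Attention}(Q,K,V)=\mathrm{SoftMax}(QK^{T}/\sqrt{d}+B)V$, where $d$ is the key dimension and $B$ is a relative position bias. The Python sketch below illustrates only the locality of that computation and is not the patent's module: the function names are hypothetical, it assumes $Q = K = V$ = the raw token (no learned projections), and it omits the bias $B$, multi-head splitting and shifted windows.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def window_attention(grid, window):
    """Self-attention restricted to non-overlapping window x window tiles.

    grid: H x W nested list of d-dimensional token vectors (lists of floats).
    Simplification (assumption): Q = K = V = token, no projections, no bias B.
    """
    h, w = len(grid), len(grid[0])
    assert h % window == 0 and w % window == 0, "H and W must be divisible by window"
    d = len(grid[0][0])
    out = [[None] * w for _ in range(h)]
    for top in range(0, h, window):
        for left in range(0, w, window):
            # Gather the positions and tokens belonging to one window.
            coords = [(r, c) for r in range(top, top + window)
                      for c in range(left, left + window)]
            tokens = [grid[r][c] for r, c in coords]
            for (r, c), q in zip(coords, tokens):
                # Scaled dot-product scores against every key in the same window only.
                scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                          for k in tokens]
                weights = softmax(scores)
                # Output token = attention-weighted sum of the window's values.
                out[r][c] = [sum(wt * v[j] for wt, v in zip(weights, tokens))
                             for j in range(d)]
    return out
```

Because each window is processed independently, changing a token in one window leaves the outputs of every other window unchanged; in Swin, cross-window interaction comes from shifted windows, which this sketch leaves out.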
2. The method of claim 1, wherein the Soft branch principle is as follows. For $X \in \mathbb{R}^n$ and $Y \in \mathbb{R}^m$, let the mapping $F: X \to Y$ be the network model. Because the model is constructed to be continuous, for every $\varepsilon > 0$ there exists $\delta > 0$ such that if $c - \delta < x < c + \delta$, then $F(c) - \varepsilon < F(x) < F(c) + \varepsilon$. Equivalently, $|x_1 - x_2| < a\ (a \to 0)$ implies $|F(x_1) - F(x_2)| < r\ (r \to 0)$. A missed or false label means that $x_1$ and $x_2$ are very close (the two images are highly similar) while their labels $y_1$ and $y_2$ disagree, and loss optimization then meets the following contradiction:
$\mathrm{Loss}(x_1) = L(F(x_1), y_1)$
$\mathrm{Loss}(x_2) = L(F(x_2), y_2)$
Driving both losses toward zero requires $F(x_1) \to y_1$ and $F(x_2) \to y_2$, whereas the continuity of $F$ keeps $F(x_1)$ and $F(x_2)$ within an error $E$ of each other, so with $y_1 \neq y_2$ the two requirements conflict; here $L$ is a general loss function such as cross-entropy or L1. A loss function $L$ that is compatible with both labels at the same time would resolve this contradiction, so a Soft-Loss is constructed as follows:
where $T$ is the ground truth and $P$ is the prediction. $SL_1$ ensures that all patches are found and largely relieves the contradiction under missed and false labels. $SL_2$ is a multi-window loss: the Multi-Window Dice Loss (MWDL), denoted $\text{dice-loss}_m$ in the formula above, is designed for small targets that Dice loss misses and handles the extraction of very small targets in remote sensing images. Because $SL_1$ is far easier to optimize than $SL_2$, the predicted patches first expand and then contract over the course of training.
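The contradiction can be made concrete with a small numerical check. Assume, for illustration only, that $L$ is binary cross-entropy, that continuity has pushed $F(x_1)$ and $F(x_2)$ to the same probability $p$, and that the conflicting labels are $y_1 = 1$ and $y_2 = 0$. The total loss is then $-\ln p - \ln(1-p)$, which is minimized at $p = 0.5$ with value $2\ln 2 \approx 1.386$, so it can never reach zero. The helper names below are hypothetical.

```python
import math

def bce(p, y):
    """Binary cross-entropy of one predicted probability p against label y."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def paired_loss(p):
    """Total loss when two near-identical inputs get the same prediction p
    (as continuity of F forces) but carry conflicting labels y1 = 1, y2 = 0."""
    return bce(p, 1) + bce(p, 0)

# Scan predictions in (0, 1): the best achievable total loss stays above zero.
grid = [i / 1000 for i in range(1, 1000)]
best_p = min(grid, key=paired_loss)
print(best_p, round(paired_loss(best_p), 4))  # prints: 0.5 1.3863
```

This positive floor is the contradiction described above; a loss that tolerates both labels, as the Soft-Loss is said to do, is intended to remove it.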
3. The method of claim 1, wherein the Hard branch principle is as follows. The Hard branch is responsible for localization, which improves the precision of the model. Hard-Loss consists of MWDL and CE-Loss (cross-entropy). MWDL relieves the imbalance between positive and negative samples and the missed extraction of small targets, but it can introduce some false extractions, which occur more readily in remote sensing images; MWDL alone therefore does not give good results and must be combined with CE-Loss. Hard-Loss can be expressed by the following formula:
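The Hard-Loss formula is not reproduced here, and no exact definition of MWDL is given, so the sketch below is a hypothetical reading only. It takes MWDL as the mean Dice loss over non-overlapping windows at several sizes (2 and 4 pixels, an arbitrary choice), counting only windows that contain label or prediction mass, and Hard-Loss as MWDL plus pixel-wise binary cross-entropy with an assumed weight of 1. Names such as `mwdl` and `hard_loss` are illustrative, and only the Python standard library is used.

```python
import math

EPS = 1e-6  # smoothing / clamping constant (assumption)

def dice_loss(pred, target):
    """Soft Dice loss on flat lists of probabilities and {0, 1} labels."""
    inter = sum(p * t for p, t in zip(pred, target))
    return 1.0 - (2.0 * inter + EPS) / (sum(pred) + sum(target) + EPS)

def windows(mask, size):
    """Yield flattened non-overlapping size x size tiles of a 2D list."""
    h, w = len(mask), len(mask[0])
    for top in range(0, h, size):
        for left in range(0, w, size):
            yield [mask[r][c] for r in range(top, min(top + size, h))
                   for c in range(left, min(left + size, w))]

def mwdl(pred, target, sizes=(2, 4)):
    """Hypothetical Multi-Window Dice Loss: mean Dice loss over all windows
    (at several sizes) that contain any prediction or label mass."""
    losses = []
    for size in sizes:
        for p_win, t_win in zip(windows(pred, size), windows(target, size)):
            if sum(p_win) + sum(t_win) > 0:  # skip windows with nothing in them
                losses.append(dice_loss(p_win, t_win))
    return sum(losses) / len(losses) if losses else 0.0

def bce_loss(pred, target):
    """Mean pixel-wise binary cross-entropy (the CE term), with clamped inputs."""
    flat = [(min(max(p, EPS), 1 - EPS), t)
            for prow, trow in zip(pred, target) for p, t in zip(prow, trow)]
    return -sum(t * math.log(p) + (1 - t) * math.log(1 - p) for p, t in flat) / len(flat)

def hard_loss(pred, target, weight=1.0):
    """Hard-Loss sketch: MWDL + weight * CE (equal weighting is an assumption)."""
    return mwdl(pred, target) + weight * bce_loss(pred, target)
```

On an 8×8 mask holding a 4×4 object and a single-pixel object, a prediction that misses only the tiny object has a global Dice loss of about 0.03 but an MWDL of about 0.29, because the window containing the missed pixel counts as much as each window covering the large object.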
4. The method of claim 1, wherein the Fuse branch principle is as follows. The Soft-Loss branch and the Hard-Loss branch are used in fusion: targets located by the Hard branch are queried on the Soft branch. Because the two branches do not share parameters, this secures good precision while preserving patch connectivity, and querying changed patches on the Soft branch secures recall.
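One plausible reading of "querying Hard-branch targets on the Soft branch" is sketched below under stated assumptions: both branch outputs are reduced to binary masks, the Soft mask is split into 4-connected regions, and every region containing at least one Hard-branch pixel is kept whole. The function names and the 4-connectivity choice are hypothetical; the fusion rule is not specified at this level of detail.

```python
from collections import deque

def connected_components(mask):
    """Return the 4-connected foreground regions of a binary 2D list as pixel sets."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, comp = deque([(r, c)]), set()
                while queue:  # breadth-first flood fill of one region
                    y, x = queue.popleft()
                    comp.add((y, x))
                    for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(comp)
    return comps

def fuse(hard_mask, soft_mask):
    """Keep each Soft-branch region that contains at least one Hard-branch hit."""
    h, w = len(soft_mask), len(soft_mask[0])
    fused = [[0] * w for _ in range(h)]
    for comp in connected_components(soft_mask):
        if any(hard_mask[y][x] for y, x in comp):  # Hard branch confirms this region
            for y, x in comp:
                fused[y][x] = 1
    return fused
```

In this reading the Hard branch decides which regions exist (precision) and the Soft branch supplies their full, connected extent (connectivity and recall); Hard-branch pixels that fall outside every Soft region are dropped, a design choice one might revisit.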

Description

Deep learning-based forestry complex scene element remote sensing intelligent interpretation method

Technical Field

The invention belongs to the technical field of forestry remote sensing monitoring, and particularly relates to a deep learning-based remote sensing intelligent interpretation method for forestry complex scene elements.

Background

Forestry resource change detection is the most typical complex scene application in forestry remote sensing. Current deep learning-based change detection models fall into two main categories: Siamese (twin) models such as ChangeNet and DASNet, and semantic segmentation models such as DeepLab, U-Net and Swin. Overall, Siamese models are slightly more accurate than semantic segmentation models but far less efficient, so semantic segmentation models are usually preferred in production tasks over massive data. Because change detection is complex and specialized, it suffers from extremely imbalanced samples, missed labels, seasonal pseudo-changes of ground features, image color cast, positional offset and similar issues. These inevitably cause 1) training that is difficult to converge, 2) disconnected patches, and 3) many pseudo-changes, which make precision and recall hard to balance; moreover, 4) current mainstream models cannot achieve precision, recall and patch connectivity at the same time. The invention addresses three core problems of remote sensing interpretation in forestry complex scenes: 1) the high preparation cost and limited quality of manual samples; 2) insufficient intelligence caused by the drop in model accuracy in complex scenes; and 3) the low engineering maturity of intelligent remote sensing, which makes it difficult to meet business production requirements.
Disclosure of Invention

The invention aims to provide a deep learning-based remote sensing intelligent interpretation method for forestry complex scene elements that solves the above problems in the prior art. It is especially suitable for identifying and detecting typical elements such as forest land change, trees infected by pine wilt disease (pine wood nematode) and rural green space, and overcomes the strong subjectivity, low efficiency and insufficient automation of traditional visual interpretation. The invention designs the backbone structure OOP, which combines the advantages of CNNs and Transformers; its local parallel module learns semantic information across levels and accelerates network convergence in the early stage of training. Combining the two lets the model capture both local features and long-range dependencies, and according to the invention its multi-class segmentation accuracy on remote sensing images exceeds that of current mainstream backbones such as Swin, ViTAE, MiT, EfficientNet and HorNet.
The technical scheme of the invention provides a deep learning-based remote sensing intelligent interpretation method for forestry complex scene elements, comprising an input end, a backbone structure OOP, a semi-supervised dual-branch decoding structure and an output end, wherein the input end is output to the backbone structure OOP, the backbone structure OOP is output to the semi-supervised dual-branch decoding structure, and the semi-supervised dual-branch decoding structure is output to the output end.
1) Designing the backbone structure OOP: following the idea of a local parallel Transformer, a single-stage multi-scale parallel attention backbone network OOP (the OOP backbone structure of the local parallel Transformer) is provided, comprising a PTB module, a Transformer module and a PRM module; wherein the PTB module is a parallel attention module that can be expressed as follows:
where $\alpha$ and $\beta$ are adjustable hyperparameters of the model, defaulting to 0.9 and 0.1 respectively, and $m$ is the number of scale stages; the Transformer module adopts the window attention module of Swin and can be expressed as follows:
where $Q$, $K$ and $V$ denote the query, key and value respectively; the PRM module is a pyramid convolution module, which can be expressed as:
$[x_1, x_2, \dots, x_n] = \mathrm{PRM}(x)$
2) Designing the semi-supervised dual-branch decoding structure: a semi-supervised dual-branch decoding network is designed whose decoding part consists of three parts, namely a Hard branch and a Soft branch that do not share parameters, and a Fuse part; the two branches each adopt the UNet++ decoding structure and use different loss functions, wherein Hard_loss secures precision and Soft_loss secures recall, and the Fuse part fuses the two branches so that precision and recall are balanced. The Soft branching principle is that For the followingProv