CN-121999397-A - Multi-scale leaf instance segmentation method and system for complex forest stands
Abstract
The invention belongs to the technical field of image processing and discloses a multi-scale leaf instance segmentation method and system for complex forest stands. The method comprises the following steps: constructing a leaf segmentation dataset for forestry scenes; generating an initial multi-scale feature pyramid through multi-scale feature extraction; recalibrating the features of the pyramid along the channel dimension with a squeeze-and-excitation (SE) attention mechanism to obtain a channel-enhanced feature pyramid; performing multi-scale feature fusion with BiFPN to generate an enhanced multi-scale feature pyramid; applying spatial-dimension recalibration with a spatial attention sub-module to obtain a final refined feature pyramid; and adopting a cascade region convolutional neural network as the detection head to obtain the localization and pixel-level segmentation result of each detected leaf through iterative refinement. The invention achieves adaptive fusion of multi-scale leaf features and targeted suppression of complex background noise, thereby enabling pixel-level instance discrimination in dense and severely occluded scenes.
Inventors
- Xu Sheng
- Ge Tianxiao
- Chen Zhulin
- Bi Changwei
- Dai Size
- Zhang Kai
Assignees
- Nanjing Forestry University (南京林业大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-03-05
Claims (7)
- 1. A multi-scale leaf instance segmentation method for a complex forest stand, characterized by comprising the following steps: acquiring image data of a target forest by unmanned aerial vehicle (UAV) remote sensing, and constructing a leaf segmentation dataset for forestry scenes; based on the leaf segmentation dataset, performing multi-scale feature extraction with a modified Swin Transformer backbone network to generate an initial multi-scale feature pyramid, suited to multi-scale leaf feature analysis, consisting of four levels of feature maps; recalibrating the features of the initial multi-scale feature pyramid along the channel dimension using a squeeze-and-excitation attention mechanism to obtain a channel-enhanced feature pyramid; performing multi-scale feature fusion on the channel-enhanced feature pyramid with an improved bidirectional weighted feature pyramid network (BiFPN) to generate an enhanced multi-scale feature pyramid fusing information across all scales; applying spatial-dimension recalibration to each scale feature of the BiFPN output with the spatial attention sub-module of a convolutional block attention module to obtain a final refined feature pyramid; and, based on the refined feature pyramid, adopting a cascade region convolutional neural network as the detection head to obtain the localization and pixel-level segmentation result of each detected leaf through iterative refinement.
- 2. The method according to claim 1, characterized in that the improved Swin Transformer backbone network adopts a window-based self-attention mechanism comprising window-based multi-head self-attention (W-MSA), shifted-window multi-head self-attention (SW-MSA), and a downsampling layer inserted between the W-MSA and the SW-MSA; the switch between W-MSA and SW-MSA is implemented by a mask matrix.
- 3. The method according to claim 1, characterized in that recalibrating the features along the channel dimension using a squeeze-and-excitation attention mechanism to obtain the channel-enhanced feature pyramid comprises: for each feature map in the initial multi-scale feature pyramid, compressing the spatial information of each channel by a global average pooling operation to generate a channel descriptor; passing the channel descriptors through a two-layer fully connected network with a bottleneck structure to learn nonlinear interactions among channels, obtaining a channel weight vector; and multiplying the channel weight vector with the original feature map channel by channel to obtain a calibrated feature map; recalibrating every level in this manner yields the channel-enhanced feature pyramid.
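The squeeze-and-excitation recalibration in claim 3 can be sketched in a few lines. This is a minimal pure-Python illustration, not the patent's implementation: the list-of-lists tensor layout and the weight matrices `w1`/`w2` (normally learned during training) are assumptions for the example.

```python
import math

def squeeze_excite(fmap, w1, w2):
    """SE-style channel recalibration of a C x H x W feature map.

    fmap: list of C channels, each an H x W list of lists.
    w1:   bottleneck FC weights, shape (C // r) x C (illustrative values).
    w2:   expansion FC weights, shape C x (C // r).
    """
    C = len(fmap)
    # Squeeze: global average pooling compresses each channel to one descriptor.
    desc = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]
    # Excitation: bottleneck FC + ReLU, then expansion FC + sigmoid
    # gives one weight in (0, 1) per channel.
    hidden = [max(0.0, sum(w * d for w, d in zip(row, desc))) for row in w1]
    weights = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
               for row in w2]
    # Scale: multiply each channel of the original map by its learned weight.
    return [[[v * weights[c] for v in row] for row in fmap[c]] for c in range(C)]
```

Applying this per level of the pyramid, as the claim describes, yields the channel-enhanced pyramid; channels whose descriptors excite the gate toward 1 pass through nearly unchanged, while the rest are attenuated.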
- 4. The method according to claim 1, characterized in that the improved bidirectional weighted feature pyramid network BiFPN fuses features of different resolutions by constructing a bidirectional information propagation graph with bidirectional cross-scale information flow and learnable adaptive weights, thereby capturing fine vein textures and macroscopic leaf shapes simultaneously; the update of the bidirectional propagation graph is divided into a top-down path and a bottom-up path, wherein in the top-down path each intermediate feature merges the current level's input with the higher-level feature to generate a higher-resolution feature, and in the bottom-up path each output feature further merges the intermediate feature of the top-down path with the lower-resolution feature of the adjacent level; the merging is performed by a learnable weighted fusion function, and BiFPN iterates through cross-layer connections.
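The claim's fusion formula was lost in translation; a common choice for BiFPN's learnable weighted fusion, sketched here under that assumption, is the fast normalized fusion rule, where non-negative learnable weights are normalized to sum to roughly one before combining same-resolution inputs.

```python
def fast_normalized_fusion(features, weights, eps=1e-4):
    """Fuse same-resolution feature vectors with learnable adaptive weights.

    features: list of input feature vectors (already resized to one resolution).
    weights:  one learnable scalar per input (illustrative values here).
    """
    w = [max(0.0, wi) for wi in weights]   # ReLU keeps each weight non-negative
    total = sum(w) + eps                   # eps avoids division by zero
    return [sum(wi * f[i] for wi, f in zip(w, features)) / total
            for i in range(len(features[0]))]
```

With equal weights this reduces to averaging; during training the weights adapt so that, for example, the fine-resolution input dominates where vein texture matters.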
- 5. The method according to claim 1, characterized in that recalibrating the spatial dimensions of each scale feature of the BiFPN output with the spatial attention sub-module of the convolutional block attention module comprises: first, applying global max pooling and global average pooling along the channel dimension to each scale feature of the multi-scale feature pyramid to obtain two spatial description maps; then concatenating the two maps along the channel dimension and integrating the information through a standard convolutional layer to generate a two-dimensional spatial importance map; and finally multiplying the spatial importance map element-wise with the input feature map to obtain the spatially refined feature map.
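The spatial recalibration of claim 5 can be sketched as follows. This is an illustrative pure-Python version: the kernel size is left as a parameter (CBAM typically uses 7x7; the patent does not specify), the kernel values would normally be learned, and the list-based tensor layout is an assumption for the example.

```python
import math

def spatial_attention(fmap, kernel):
    """CBAM-style spatial recalibration of a C x H x W feature map.

    kernel: k x k list of [w_max, w_avg] pairs, the weights of the conv layer
            applied to the concatenated (max, avg) maps (illustrative values).
    """
    C, H, W = len(fmap), len(fmap[0]), len(fmap[0][0])
    # Channel-wise max and average pooling -> two H x W spatial description maps.
    mx = [[max(fmap[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    av = [[sum(fmap[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    k, pad = len(kernel), len(kernel) // 2
    def conv_at(i, j):
        # Convolve both pooled maps (zero padding), then squash with a sigmoid.
        s = 0.0
        for di in range(k):
            for dj in range(k):
                ii, jj = i + di - pad, j + dj - pad
                if 0 <= ii < H and 0 <= jj < W:
                    s += kernel[di][dj][0] * mx[ii][jj] + kernel[di][dj][1] * av[ii][jj]
        return 1.0 / (1.0 + math.exp(-s))
    # Two-dimensional spatial importance map, multiplied element-wise
    # with every channel of the input.
    gate = [[conv_at(i, j) for j in range(W)] for i in range(H)]
    return [[[fmap[c][i][j] * gate[i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]
```

The gate is shared across channels, which is what makes this a purely spatial recalibration, complementary to the channel-wise SE step of claim 3.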
- 6. The method according to claim 1, characterized in that obtaining the localization and pixel-level segmentation result of each detected leaf with the cascade region convolutional neural network as the detection head comprises: generating an initial proposal box set over the refined feature pyramid with a region proposal network (RPN); subsequently, each cascade stage takes the proposal boxes output by the preceding stage as input and performs finer detection and segmentation, wherein each stage applies a learnable detection head function with its own parameters and extracts a fixed-size feature region corresponding to each proposal box from the feature pyramid; each detection head executes three subtasks in parallel, namely bounding-box regression, object classification and mask prediction, with each stage trained under a progressively increasing IoU threshold; the bounding boxes output by the final stage, combined with the masks bilinearly upsampled to the original image resolution, constitute the localization and segmentation result of each detected leaf instance.
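The progressive IoU-threshold stepping in claim 6 can be illustrated in isolation. This sketch shows only the threshold schedule against a ground-truth box, not the learned per-stage box regression; the threshold values (0.5, 0.6, 0.7) follow the common Cascade R-CNN schedule and are an assumption, as the patent does not state them.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def cascade_filter(proposals, gt, thresholds=(0.5, 0.6, 0.7)):
    """Pass proposals through cascade stages with rising IoU thresholds.

    Each stage keeps only proposals whose IoU with the ground-truth box `gt`
    meets that stage's threshold, so later stages see ever-cleaner boxes.
    Returns the surviving boxes and the count remaining after each stage.
    """
    surviving, history = list(proposals), []
    for t in thresholds:
        surviving = [p for p in surviving if iou(p, gt) >= t]
        history.append(len(surviving))
    return surviving, history
```

In the full cascade, each stage also regresses its inputs toward the target before the next, stricter stage, which is why proposals improve rather than merely being filtered; the stepping shown here is the mechanism that makes each stage specialize in higher-quality boxes.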
- 7. A multi-scale leaf instance segmentation system for a complex forest stand, applying the method of any one of claims 1-6 and comprising an image data acquisition unit, a multi-scale feature extraction unit, a calibration unit, a fusion unit, a recalibration unit and an optimization unit; the image data acquisition unit is used for acquiring image data of a target forest by UAV remote sensing and constructing a leaf segmentation dataset for forestry scenes; the multi-scale feature extraction unit is used for performing multi-scale feature extraction with a modified Swin Transformer backbone network based on the leaf segmentation dataset to generate an initial multi-scale feature pyramid, suited to multi-scale leaf feature analysis, consisting of four levels of feature maps; the calibration unit is used for recalibrating the features of the initial multi-scale feature pyramid along the channel dimension using a squeeze-and-excitation attention mechanism to obtain a channel-enhanced feature pyramid; the fusion unit is used for performing multi-scale feature fusion on the channel-enhanced feature pyramid with an improved bidirectional weighted feature pyramid network BiFPN to generate an enhanced multi-scale feature pyramid fusing information across all scales; the recalibration unit is used for applying spatial-dimension recalibration to each scale feature of the BiFPN output with the spatial attention sub-module of the convolutional block attention module to obtain a final refined feature pyramid; and the optimization unit is used for adopting, based on the refined feature pyramid, a cascade region convolutional neural network as the detection head to obtain the localization and pixel-level segmentation result of each detected leaf through iterative refinement.
Description
Multi-scale leaf instance segmentation method and system for complex forest stands
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a multi-scale leaf instance segmentation method and system for complex forest stands.
Background
Forests are the core of the terrestrial ecosystem, and their health and productivity directly determine carbon sink capacity, biodiversity protection and timber resource security. Accurate management and sustainable operation of forest resources depend on efficient and accurate monitoring and assessment of forest growth. Leaves serve as the main organs of photosynthesis, transpiration and response to environmental stress in trees, and their quantity, morphology, color and physiological state are key phenotypic traits reflecting the health, productivity and environmental adaptability of individuals and populations. Realizing high-throughput, automated phenotypic analysis of tree leaves is therefore of great significance for accelerating the breeding of superior tree species, implementing precision irrigation, and providing early warning of biotic and abiotic stress. Traditional leaf trait measurement relies mainly on manual sampling and laboratory analysis, which is inefficient, destructive and subject to large subjective error, and can hardly achieve spatio-temporally continuous monitoring at the canopy scale. In recent years, the spread of UAV remote sensing and ground-based proximal sensing has made it possible to rapidly acquire high-resolution stand image data. However, automatically and accurately segmenting each leaf instance from these complex natural-scene images, and further extracting quantitative traits, remains a central challenge in the field.
These difficulties stem from the inherent complexity of the natural stand environment: structural complexity, in which leaves overlap and occlude each other severely and boundaries are blurred; morphological variability, in which leaf shape, size and orientation are highly heterogeneous within and across individuals and species; and environmental interference, in which illumination changes, shadows and complex backgrounds (soil, branches, weeds) strongly disturb image features. Existing general-purpose image segmentation models (such as Mask R-CNN and the YOLO series), and models trained on standard datasets (such as COCO), often suffer from missed instances, boundary mis-segmentation and inability to separate touching leaves when confronted with real-world forestry scenarios, and their precision and robustness are insufficient. Although deep learning has revolutionized computer vision, and the Transformer architecture in particular has shown clear advantages in global context modeling, there is a significant gap in applying it directly to forestry leaf analysis. First, domain-specific data are lacking: there is no large-scale, finely labeled instance segmentation dataset of natural-scene stand leaves, and existing datasets mostly cover indoor environments or simple backgrounds. Second, model architectures lack targeted optimization: existing models do not effectively integrate dedicated modules for the core forestry problems of leaf multi-scale variation, heavy occlusion and background clutter. Finally, there is a gap from "segmentation" to "traits": most research stops at improving segmentation accuracy and fails to build an automated analysis pipeline from pixel-level classification results to usable agronomic parameters (such as stress indices).
Developing a complete technical scheme spanning dedicated data construction, customized model design and practical phenotypic analysis has therefore become an urgent need for moving forestry phenomics from research to application and realizing intelligent, precise forestry decision-making. The present invention addresses this need with an innovative, systematic solution.
Disclosure of Invention
The invention aims to solve problems in the prior art such as insufficient precision, limited practicality and fragmented workflows in leaf phenotypic analysis under complex natural stand environments, and provides a multi-scale leaf instance segmentation method and system for complex forest stands. To achieve the above purpose, the present invention provides the following technical solution: a multi-scale leaf instance segmentation method for complex forest stands, comprising the following steps: acquiring image data of a target forest by UAV remote sensing, and constructing a leaf segmentation dataset for forestry scenes; based on the leaf segmentation dataset, performing multi-scale feature extraction by utilizing a modified Swin