CN-122023784-A - Corn disease detection method based on light-weight Transformer


Abstract

The invention discloses a corn disease detection method based on a lightweight Transformer, relating to the technical field of intelligent agriculture and plant protection. The method comprises: inputting a corn image into a feature extraction module and generating a multi-level feature representation through multi-level feature extraction; performing multi-scale feature fusion through a feature pyramid network to generate fused multi-scale features; applying attention processing to the feature maps through a dynamic detection head to generate classification scores and bounding box predictions for each resolution level; calculating the training loss with a weighted bounding box loss function that applies a weight coefficient higher than the standard weight to bounding boxes whose area is below a preset threshold, and updating the model parameters through back propagation to generate a trained detection model; and applying non-maximum suppression to the output bounding box predictions to generate the detected positions and classification results of corn diseases. The method achieves high-precision, rapid detection of corn diseases.

Inventors

CHEN MEIMEI
LI YUNFAN
ZHAO JIAMAN
YUE MENGYAO
TAO LUJING

Assignees

Jilin University (吉林大学)

Dates

Publication Date: 20260512
Application Date: 20260414

Claims (10)

1. A corn disease detection method based on a lightweight Transformer, characterized by comprising the following steps: S1, inputting a corn image into a feature extraction module that uses a Swin Transformer as the backbone network, and generating a multi-level feature representation through multi-level feature extraction; S2, based on the multi-level feature representation, performing multi-scale feature fusion through a feature pyramid network to generate fused multi-scale features; S3, based on the fused multi-scale features, applying attention processing to the feature map of each resolution level through a dynamic detection head to generate classification scores and bounding box predictions for each resolution level; S4, based on the classification scores and bounding box predictions, calculating the training loss with a weighted bounding box loss function that applies a weight coefficient higher than the standard weight to bounding boxes whose area is below a preset threshold, and updating the model parameters through back propagation to generate a trained detection model; S5, inputting the corn image to be detected into the trained detection model for forward propagation, and applying non-maximum suppression to the output bounding box predictions to generate the detected positions and category results of corn diseases.
2. The lightweight Transformer-based corn disease detection method according to claim 1, wherein step S1 comprises: S11, splitting the input corn image into non-overlapping image patches, flattening each patch, and mapping the flattened patches to an initial channel number through a linear projection layer to generate an initial feature representation; and S12, inputting the initial feature representation into a plurality of stacked SwinTransformerBlock modules, which extract features by alternating between regular window attention and shifted window attention, performing stage-by-stage downsampling through PatchMerging layers, and constructing a multi-scale feature hierarchy in which the channel number doubles and the spatial resolution halves at each stage, to generate the multi-level feature representation.
3. The lightweight Transformer-based corn disease detection method according to claim 1, wherein step S2 comprises: S21, receiving the multi-level feature representation, which comprises a plurality of feature levels in which the channel number doubles and the spatial resolution halves at each stage, and performing channel unification and scale alignment on the features of all levels through a neck module of the feature pyramid network to generate a multi-scale feature list with a unified channel number, wherein the multi-scale feature list comprises high-resolution features from the shallow network and low-resolution features from the deep network; S22, in the top-down path of the feature pyramid network, performing a content-aware upsampling operation on the high-resolution feature map, generating an adaptive recombination kernel through kernel prediction, and performing weighted recombination on the high-resolution feature map based on the adaptive recombination kernel to generate an upsampled high-resolution feature map; and S23, fusing the upsampled high-resolution feature map with the low-resolution feature map of the corresponding level, and generating the fused multi-scale features through stage-by-stage fusion.
4. The lightweight Transformer-based corn disease detection method of claim 2, wherein the PatchMerging layers in step S12 further comprise a multi-scale feature enhancement process comprising: S121, performing, by the PatchMerging layer, 2×2 window merging and channel concatenation on the input features to generate intermediate features with 4C channels and a spatial resolution of H/2×W/2, and inputting the intermediate features into a multi-scale feature enhancement module, wherein C represents the number of channels, H represents the height of the feature map, and W represents the width of the feature map; S122, performing, by the multi-scale feature enhancement module, feature extraction on the intermediate feature tensor with three parallel convolution branches with kernel sizes of 3×3, 5×5 and 7×7, each capturing feature information over a different receptive field, to generate three feature maps; S123, fusing the channels of the three feature maps through a 1×1 convolution to generate a fused feature representation; S124, inputting the fused feature representation into a squeeze-and-excitation channel attention module, enhancing the feature response of small-target diseases through channel-level weighting to generate an enhanced feature representation, reducing the channel number of the enhanced feature representation from 4C to 2C through a dimension-reduction mapping layer to generate part of the enhanced multi-level feature representation, and passing it to the subsequent SwinTransformerBlock.
5. The lightweight Transformer-based corn disease detection method of claim 4, wherein the SwinTransformerBlock in step S12 further comprises a cross-scale self-attention process after the attention calculation, the cross-scale self-attention process being applied to every other block in the stacked SwinTransformerBlock sequence and comprising: S125, receiving the output features of the window attention (W-MSA) or shifted window attention (SW-MSA) calculation in the current SwinTransformerBlock, and inputting them into a cross-scale self-attention module; S126, computing, by the cross-scale self-attention module, attention features for two different window sizes in parallel, capturing context information within local windows and across windows respectively, to generate two attention feature maps; S127, inputting the two attention feature maps into a fusion layer to integrate the cross-scale features and generate cross-scale fused features; S128, inputting the cross-scale fused features into a squeeze-and-excitation channel attention module for channel-level weighting to generate enhanced cross-scale attention features, which are passed to the next layer as the output of the current SwinTransformerBlock.
6. The lightweight Transformer-based corn disease detection method according to claim 3, wherein step S22 comprises: S221, performing a channel-compression convolution on the high-resolution feature map to generate compressed features for kernel prediction; S222, generating an adaptive recombination kernel from the compressed features through a convolution layer, normalizing the adaptive recombination kernel, and reshaping it into a kernel tensor corresponding to the spatial positions of the feature map; S223, performing preliminary upsampling on the high-resolution feature map to generate a preliminary upsampled feature map; S224, performing position-by-position weighted recombination on the preliminary upsampled feature map based on the normalized adaptive recombination kernel to generate a content-aware upsampled feature map; and S225, performing a post-processing convolution on the content-aware upsampled feature map to generate an upsampled feature map matching the size of the low-resolution feature map from the deep network.
7. The lightweight Transformer-based corn disease detection method according to claim 1, wherein step S3 comprises: S31, receiving the fused multi-scale feature list, which comprises a plurality of feature maps with different resolutions, and independently executing a three-stage attention chain of scale attention, spatial attention and task attention on the feature map of each resolution level in the list; S32, applying a scale attention mechanism to the feature map of each resolution level, and dynamically adjusting the response weights of features at different resolutions to generate a scale-weighted feature map; S33, based on the scale-weighted feature map, applying spatial weighting through a spatial attention layer with a 3×3 convolution to generate a spatially refined feature map; S34, based on the spatially refined feature map, compressing the spatial dimensions through global average pooling, and generating a task-level feature representation by passing the result through two 1×1 convolution layers and a ReLU activation function in sequence; and S35, based on the task-level feature representation, generating the classification scores and bounding box predictions of each resolution level through a classification head and a regression head respectively.
8. The lightweight Transformer-based corn disease detection method according to claim 1, wherein step S4 comprises: S41, calculating the area value of each predicted bounding box in the current training batch; S42, comparing the area value of each ground-truth bounding box with a preset area threshold, marking ground-truth bounding boxes whose area value is smaller than the area threshold as small target bounding boxes, and marking ground-truth bounding boxes whose area value is greater than or equal to the area threshold as standard weight bounding boxes; S43, multiplying the regression loss of predicted bounding boxes whose ground-truth box is marked as a small target bounding box by a preset weight coefficient to generate a weighted bounding box regression loss, and keeping the weight coefficient at 1.0 for the regression loss of predicted bounding boxes whose ground-truth box is marked as a standard weight bounding box to generate a standard bounding box regression loss; S44, summing the weighted bounding box regression loss and the standard bounding box regression loss to generate the overall weighted bounding box regression loss, which together with the classification loss forms the overall training loss; and S45, performing back propagation based on the overall training loss, updating the model parameters, and repeating S41 to S45 until the model converges, to generate the trained detection model.
9. The lightweight Transformer-based corn disease detection method according to claim 1, wherein step S5 comprises: S51, decoding the bounding box predictions of all resolution levels and converting them into bounding box coordinates in the image coordinate system to generate a candidate detection box set; S52, sorting the candidate detection box set in descending order of classification score, selecting the detection box with the highest classification score as the reference box, executing non-maximum suppression by removing overlapping detection boxes whose intersection over union (IoU) with the reference box exceeds a preset IoU threshold, and repeating step S52 until all candidate detection boxes have been processed, to generate a final detection box set; and S53, based on the final detection box set, outputting the disease category label, bounding box coordinates and confidence score of each detection box, completing the detection and localization of corn diseases.
10. The lightweight Transformer-based corn disease detection method according to claim 8, wherein the preset area threshold is 32×32 pixels and the preset weight coefficient is 2.0.
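
The area-based weighting of claims 8 and 10 can be illustrated with a minimal Python sketch. The threshold (32×32 pixels), the small-target weight (2.0) and the standard weight (1.0) come from the claims; the plain L1 base regression loss, the (x1, y1, x2, y2) box format and all function names are assumptions made for illustration only, since the claims do not name a base loss or a box encoding.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels

AREA_THRESHOLD = 32 * 32   # preset area threshold from claim 10
SMALL_BOX_WEIGHT = 2.0     # preset weight coefficient from claim 10
STANDARD_WEIGHT = 1.0      # standard weight from claim 8, step S43


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes have area 0."""
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)


def l1_box_loss(pred: Box, target: Box) -> float:
    """Assumed base regression loss (plain L1); not specified by the claims."""
    return sum(abs(p - t) for p, t in zip(pred, target))


def weighted_box_loss(preds: List[Box], targets: List[Box]) -> float:
    """Sum per-box regression losses, up-weighting small ground-truth boxes."""
    total = 0.0
    for pred, target in zip(preds, targets):
        # S42: classify by the area of the ground-truth box.
        is_small = box_area(target) < AREA_THRESHOLD
        weight = SMALL_BOX_WEIGHT if is_small else STANDARD_WEIGHT
        # S43 and S44: weight each loss term, then sum.
        total += weight * l1_box_loss(pred, target)
    return total


# One small lesion (20x20 px) and one large lesion (100x100 px).
targets = [(0, 0, 20, 20), (0, 0, 100, 100)]
preds = [(1, 0, 20, 20), (0, 0, 100, 102)]
print(weighted_box_loss(preds, targets))  # 2.0 * 1 + 1.0 * 2 = 4.0
```

In this example the small box contributes twice its raw error, so a one-pixel error on the small lesion counts as much as a two-pixel error on the large one. In a full training loop this term would be added to the classification loss as described in step S44.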

Description

Corn disease detection method based on a lightweight Transformer

Technical Field

The invention relates to the technical field of intelligent agriculture and plant protection, in particular to a corn disease detection method based on a lightweight Transformer.

Background

Corn is an important grain crop and feed source in China, and the diseases and insect pests that occur during its growth (such as rust, large spot disease and aphids) seriously affect yield and quality. Existing corn pest and disease control scenarios mainly face the following technical problems:

Corn leaf lesions are usually small targets (with an area smaller than 32×32 pixels). CNN models are limited in global feature modeling and struggle to capture the long-range dependencies of small lesions. The standard Vision Transformer has global modeling capability, but its computational cost is high and its parameters are highly redundant, making it difficult to deploy on compute-limited edge devices such as unmanned aerial vehicles. Under complex farmland backgrounds (illumination changes and occlusion), existing models easily lose small-target features, resulting in insufficient detection accuracy.
The Feature Pyramid Network (FPN) uses standard interpolation upsampling, so the detail of small targets is difficult to retain fully when multi-scale features are fused, and the feature expression of small targets degrades during multi-scale fusion. Meanwhile, a standard detection head adapts poorly to targets of different scales, making it difficult to specifically enhance small-target detection. In addition, small bounding box samples are effectively ignored by standard loss functions during training because of their small pixel contribution, so model optimization is biased toward large targets and small-target detection accuracy is further weakened. Therefore, there is an urgent need for a corn disease detection method based on a lightweight Transformer.

Disclosure of Invention

The invention provides a corn disease detection method based on a lightweight Transformer to solve the problems in the prior art described above.
To achieve the above purpose, the invention provides the following technical solution: a corn disease detection method based on a lightweight Transformer, comprising the following steps: S1, inputting a corn image into a feature extraction module that uses a Swin Transformer as the backbone network, and generating a multi-level feature representation through multi-level feature extraction; S2, based on the multi-level feature representation, performing multi-scale feature fusion through a feature pyramid network to generate fused multi-scale features; S3, based on the fused multi-scale features, applying attention processing to the feature map of each resolution level through a dynamic detection head to generate classification scores and bounding box predictions for each resolution level; S4, based on the classification scores and bounding box predictions, calculating the training loss with a weighted bounding box loss function that applies a weight coefficient higher than the standard weight to bounding boxes whose area is below a preset threshold, and updating the model parameters through back propagation to generate a trained detection model; S5, inputting the corn image to be detected into the trained detection model for forward propagation, and applying non-maximum suppression to the output bounding box predictions to generate the detected positions and category results of corn diseases.
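
The non-maximum suppression used in step S5 can be sketched in Python as follows. This is a minimal, class-agnostic greedy version written for illustration; the (x1, y1, x2, y2) box format, the function names and the default IoU threshold of 0.5 are assumptions, since the text only refers to a preset threshold.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed format: (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS; returns the indices of kept boxes, highest score first.

    iou_threshold is a hypothetical default standing in for the
    preset threshold mentioned in the text.
    """
    # Sort candidate indices by descending classification score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        ref = order.pop(0)  # highest-scoring remaining box is the reference
        keep.append(ref)
        # Drop every remaining box whose IoU with the reference exceeds the threshold.
        order = [i for i in order if iou(boxes[ref], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # [0, 2]: box 1 overlaps box 0 with IoU 81/119
```

A production detector would typically run this per disease category and on batched tensors; the list-based loop here only makes the select-and-suppress order explicit.
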
Further, step S1 includes: S11, splitting the input corn image into non-overlapping image patches, flattening each patch, and mapping the flattened patches to an initial channel number through a linear projection layer to generate an initial feature representation; and S12, inputting the initial feature representation into a plurality of stacked SwinTransformerBlock modules, which extract features by alternating between regular window attention and shifted window attention, performing stage-by-stage downsampling through PatchMerging layers, and constructing a multi-scale feature hierarchy in which the channel number doubles and the spatial resolution halves at each stage, to generate the multi-level feature representation.
Further, step S2 includes: S21, receiving the multi-level feature representation, which comprises a plurality of feature levels in which the channel number doubles and the spatial resolution halves at each stage, and performing channel unification and scale alignment on the features of all levels through a neck module of the feature pyramid network to generate a multi-scale feature list with the unif