CN-121982646-A - Tomato leaf disease detection method by combining multi-scale feature enhancement and cross-layer features
Abstract
A tomato leaf disease detection method with multi-scale feature enhancement and cross-layer feature fusion takes a tomato leaf image as input. It constructs a feature extraction module based on a cross-scale multi-head self-attention mechanism and an efficient multi-scale feature enhancement convolution. Cross-scale attention interaction models high- and low-resolution features jointly, and the multi-scale convolution strengthens the representation of micro-lesions and low-contrast textures. The method further builds a cross-scale, cross-layer collaborative feature fusion network that dynamically aggregates semantic information from different levels and calibrates it for consistency during feature fusion. Finally, a shape-adaptive multi-fusion bounding box regression loss function is introduced. It makes localization regression less sensitive to irregular lesion shapes and accelerates model convergence, enabling high-precision tomato leaf disease detection with a low miss rate.
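The cross-scale attention interaction summarized above, and set out step by step in claim 3, can be traced with the following NumPy sketch. It is a hedged illustration under stated assumptions, not the patented module. The helper names (`conv1x1`, `upsample2x`, `cross_scale_attention`) are hypothetical. The 1×1 convolutions are modelled as bias-free per-pixel matrix products, and the 2× up-sampling is assumed to be nearest-neighbour. Following the claim's wording literally, the query-branch input must have half the spatial size of the low-resolution map so that the up-sampled result matches it. No 1/sqrt(d) scaling is applied because the claim states none, and the final "concatenation and dimension reduction" is modelled as a plain reshape.

```python
import numpy as np


def softmax(a, axis=-1):
    """Numerically stable softmax along `axis`."""
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)


def conv1x1(x, w):
    """1x1 convolution without bias: x (B, C_in, H, W), w (C_out, C_in)."""
    return np.einsum("bchw,oc->bohw", x, w)


def upsample2x(x):
    """Nearest-neighbour 2x up-sampling of a (B, C, H, W) array (assumed mode)."""
    return x.repeat(2, axis=2).repeat(2, axis=3)


def cross_scale_attention(f_high, f_low, wq, wk, wv, heads):
    """Query from the up-sampled f_high branch, Key/Value from f_low.

    Returns a (B, C, H_l, W_l) array on the low-resolution grid.
    """
    q = conv1x1(upsample2x(f_high), wq)  # Query: (B, C, H_l, W_l)
    k = conv1x1(f_low, wk)               # Key:   (B, C, H_l, W_l)
    v = conv1x1(f_low, wv)               # Value: (B, C, H_l, W_l)
    assert q.shape == k.shape, "up-sampled query branch must match the low-res grid"
    b, c, h, w = q.shape
    assert c % heads == 0, "C must be divisible by the number of heads"
    d, seq = c // heads, h * w
    # Multi-head reshape: (B, C, H, W) -> (B, heads, d, L) with L = H * W
    qh, kh, vh = (t.reshape(b, heads, d, seq) for t in (q, k, v))
    # Scores Q^T K: (B, heads, L, d) @ (B, heads, d, L) -> (B, heads, L, L)
    scores = np.swapaxes(qh, -1, -2) @ kh
    weights = softmax(scores, axis=-1)  # each row sums to 1
    # V times transposed weights: (B, heads, d, L) @ (B, heads, L, L) -> (B, heads, d, L)
    out = vh @ np.swapaxes(weights, -1, -2)
    # Concatenate heads along channels and fold L back into (H_l, W_l)
    return out.reshape(b, c, h, w)
```

Zeroing the Key projection makes every score equal, so each output position collapses to the spatial mean of the Value features. This is a quick check that the Value features are multiplied by the transposed weight matrix in the right order.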
Inventors
- XU HONGYAN
- SONG YATAO
- FENG YONG
- YANG CHAO
Assignees
- 辽宁大学 (Liaoning University)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2026-01-30
Claims (4)
- 1. A tomato leaf disease detection method with multi-scale feature enhancement and cross-layer feature fusion, characterized by comprising the following steps: Step 1, constructing a feature extraction module based on multi-scale feature enhancement. An efficient multi-scale feature enhancement convolution (EMSFEConv) extracts and fuses features at different scales simultaneously and jointly expresses local detail features and context information, specifically strengthening the response to tiny lesions and low-contrast texture features. Step 2, constructing a cross-scale multi-head self-attention mechanism. An adaptive weighting mechanism is introduced in the feature fusion stage that connects the backbone network and the detection head. The mechanism receives high-resolution and low-resolution features, introduces a self-attention module through a cross-layer semantic calibration path, and dynamically fuses the up-sampled deep semantic features with shallow detail features. Step 3, constructing a shape-adaptive bounding box regression loss. The scale characteristics and morphological information of the target are modeled jointly on the basis of the candidate bounding boxes predicted by the network. Multidimensional constraints are introduced into bounding box regression to measure deviations in box position and size comprehensively, which reduces the sensitivity of localization regression to irregular lesions. The network is trained with this loss function, and the final tomato leaf disease localization and detection result is output.
- 2. The method for detecting tomato leaf diseases by combining multi-scale feature enhancement and cross-layer feature fusion according to claim 1, characterized in that in Step 1 the feature extraction module based on multi-scale feature enhancement is constructed as follows. The input feature map is $X \in \mathbb{R}^{N \times C \times H \times W}$, where $N$ is the batch size, $C$ is the number of input channels, $H$ is the height of the feature map, and $W$ is the width of the feature map. First, channel grouping is performed on the input feature map to enable targeted multi-scale feature extraction. The channel dimension is split into $G$ mutually independent feature groups according to a preset group number $G$, each group having $C_g = C/G$ channels. The grouped features $X_g$ are obtained by a dimension rearrangement operation: $X_g = \mathrm{Rearrange}(X) \in \mathbb{R}^{N \times C_g \times H \times W \times G}$. Here the last dimension of $X_g$ is the grouping dimension, and $\mathrm{Rearrange}$ denotes a dimension rearrangement whose core is splitting the total channel number $C$ into the per-group channel number $C_g$ times the number of groups $G$. The multi-scale convolution stage then assigns a dedicated kernel size $k_i$ to each feature group and convolves the corresponding grouped feature map. The convolution output of a single group is $Y_i = \mathrm{Conv}_{k_i \times k_i}(X_g^{(i)})$, $i = 1, \dots, G$, where $X_g^{(i)}$ is the $i$-th slice of $X_g$ along the grouping dimension. After all groups are convolved, the group outputs are stacked in group order into a single tensor $Y = \mathrm{Stack}(Y_1, Y_2, \dots, Y_G) \in \mathbb{R}^{N \times C_g \times H \times W \times G}$, where $Y_i$ is the convolution output of a single group and the last dimension is the stacking dimension. For compatibility with the subsequent network structure, the stacked multi-scale features are regrouped into the original channel dimension by a dimension rearrangement, giving the restored feature map $\tilde{X} = \mathrm{Rearrange}(Y) \in \mathbb{R}^{N \times C \times H \times W}$. Finally, a 1×1 convolution is applied to the regrouped feature map $\tilde{X}$ to fuse features along the channel dimension and integrate the information from different scales: $F_{\mathrm{out}} = \mathrm{Conv}_{1 \times 1}(\tilde{X})$. Here $F_{\mathrm{out}}$ is the output feature of the multi-scale feature enhancement module and is used for subsequent cross-scale feature fusion and lesion detection.
- 3. The method for detecting tomato leaf diseases by combining multi-scale feature enhancement and cross-layer feature fusion according to claim 1, characterized in that in Step 2 the cross-scale multi-head self-attention mechanism is constructed as follows. A cross-scale multi-head self-attention module takes high-resolution and low-resolution features as inputs and models adaptive associations between features of different scales. The feature map obtained in Step 1 is downsampled at different levels of the backbone network. Mid-to-shallow features serve as high-resolution features, and deep features serve as low-resolution features. The inputs are $F_h \in \mathbb{R}^{B \times C_h \times H_h \times W_h}$ and $F_l \in \mathbb{R}^{B \times C_l \times H_l \times W_l}$. Here $F_h$ is the high-resolution feature map, $B$ is the batch size, $C_h$ is its number of input channels, and $W_h$ and $H_h$ are its width and height. Likewise, $F_l$ is the low-resolution feature map, $C_l$ is its number of input channels, and $W_l$ and $H_l$ are its width and height. First, Query, Key, and Value projections are computed. The high-resolution feature $F_h$ is up-sampled by a factor of 2 to match the size of the low-resolution feature and is then mapped by a 1×1 convolution to obtain the Query feature $Q = \mathrm{Conv}_{1 \times 1}(\mathrm{Up}_{2\times}(F_h))$, where $Q$ and $F_l$ have the same spatial size. The low-resolution feature $F_l$ is mapped by separate 1×1 convolutions to generate the Key feature $K = \mathrm{Conv}_{1 \times 1}(F_l)$ and the Value feature $V = \mathrm{Conv}_{1 \times 1}(F_l)$, where $Q, K, V \in \mathbb{R}^{B \times C \times H_l \times W_l}$. Next, Query, Key, and Value are reshaped into multi-head form by splitting the channel number $C$ into $h$ attention heads: $Q_h, K_h, V_h \in \mathbb{R}^{B \times h \times d \times L}$. Here $h$ is the number of attention heads, $d = C/h$ is the number of channels per head, and $L = H_l \times W_l$ is the length of the flattened spatial sequence. The cross-scale attention weights are then computed. $Q_h$ is transposed and multiplied by $K_h$ to obtain the attention score matrix $A = Q_h^{\top} K_h$, where $Q_h^{\top}$ denotes $Q_h$ with its last two dimensions transposed (size $B \times h \times L \times d$), so that $A \in \mathbb{R}^{B \times h \times L \times L}$. The score matrix $A$ is normalized over the sequence dimension $L$ to obtain attention weights $P = \mathrm{Softmax}(A)$ that form a probability distribution, each row of $P$ summing to 1. The multi-head Value feature $V_h$ is multiplied by the transposed weight matrix, which provides the dimension matching, to obtain the multi-head attention output $O_h = V_h P^{\top} \in \mathbb{R}^{B \times h \times d \times L}$. Finally, the heads of $O_h$ are concatenated along the channel dimension, reduced, and reshaped to the original channel number and spatial size to obtain the final output feature $F_{\mathrm{out}} = \mathrm{Reshape}(\mathrm{Concat}(O_h)) \in \mathbb{R}^{B \times C \times H_l \times W_l}$. This output has the same spatial size as the low-resolution feature and is used for bounding box regression and lesion detection in the subsequent detection head.
- 4. The method for detecting tomato leaf diseases by combining multi-scale feature enhancement and cross-layer feature fusion according to claim 1, characterized in that in Step 3 the shape-adaptive bounding box regression loss is constructed as follows. First, the feature map $F_{\mathrm{out}}$ output by Step 2 is taken as input, and the detection head makes a preliminary prediction of the lesion box to obtain the predicted box $B^{p}$. At the same time, the ground-truth lesion box $B^{gt}$ is obtained. The shape-adaptive bounding box regression loss is then constructed. It consists of a shape-aware Wise-ShapeIoU loss term and a distribution-based DFL regression term. The Wise-ShapeIoU loss describes how well the predicted and ground-truth boxes match in position, scale, and contour shape: $L_{\mathrm{WS}} = \frac{1}{N_{\mathrm{pos}}} \sum_{i} w_i \, L_{\mathrm{WIoU}}(B^{p}_i, B^{gt}_i)$. Here $L_{\mathrm{WIoU}}$ is the WiseIoU loss function, with ltype='ShapeIoU' specified to enable shape awareness and inner_iou=True to strengthen matching to the lesion contour. $w_i$ is the target score weight used to focus on high-quality positive samples, and $N_{\mathrm{pos}}$ is the total number of positive samples, used to normalize the loss. $B^{p}_i$ is the predicted lesion bounding box, and $B^{gt}_i$ is the ground-truth lesion bounding box. In addition, the DFL loss models each coordinate component of the bounding box as a distribution: $L_{\mathrm{DFL}} = \sum_{i=1}^{4} \mathrm{CE}(\mathcal{P}_i, \mathcal{T}_i)$. Here $\mathcal{P}_i$ is the distribution encoding of the $i$-th coordinate of the predicted box, $\mathcal{T}_i$ is the distribution encoding of the $i$-th coordinate of the ground-truth box, $[0, R]$ is the interval over which the distribution is defined, and $\mathrm{CE}$ is the cross-entropy loss. The total bounding box loss is $L_{\mathrm{box}} = L_{\mathrm{WS}} + L_{\mathrm{DFL}}$. The network is trained by back-propagation with this loss function, and accurate localization and detection results for tomato leaf diseases are finally output.
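The two loss ingredients of claim 4 can be illustrated in plain Python. This is a simplified sketch under explicit assumptions, not the patented loss. Plain IoU stands in for the WiseIoU loss with ltype='ShapeIoU' and inner_iou=True; its dynamic focusing, shape, and inner-box terms are not reproduced. Boxes are assumed to be (x1, y1, x2, y2) corner tuples. The DFL term uses the standard two-bin soft target from Generalized Focal Loss, whereas the claim only says it is a cross-entropy between predicted and target distribution encodings. Averaging the DFL term over positives is also an assumption. All function names are hypothetical.

```python
import math


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def iou_term(preds, gts, weights):
    """Stand-in for the Wise-ShapeIoU term: sum_i w_i * (1 - IoU_i) / N_pos."""
    n_pos = len(preds)
    return sum(w * (1.0 - iou(p, g)) for p, g, w in zip(preds, gts, weights)) / n_pos


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    e = [math.exp(z - m) for z in logits]
    s = sum(e)
    return [v / s for v in e]


def dfl(logits, y):
    """DFL for one coordinate: logits over bins 0..R, continuous target y in [0, R)."""
    assert 0 <= y < len(logits) - 1, "target must lie inside the distribution interval"
    p = softmax(logits)
    yl = math.floor(y)
    yr = yl + 1
    # Cross-entropy against a two-bin target: weight (yr - y) on yl, (y - yl) on yr
    return -((yr - y) * math.log(p[yl]) + (y - yl) * math.log(p[yr]))


def box_loss(preds, gts, weights, coord_logits, coord_targets):
    """Total loss: IoU term plus DFL summed over 4 coordinates, averaged over positives."""
    n_pos = len(preds)
    l_dfl = sum(
        sum(dfl(lg, t) for lg, t in zip(logits4, targets4))
        for logits4, targets4 in zip(coord_logits, coord_targets)
    ) / n_pos
    return iou_term(preds, gts, weights) + l_dfl
```

DFL rewards a predicted distribution that concentrates its mass on the two integer bins bracketing the true coordinate. For example, logits peaked at bins 1 and 2 give a smaller loss for a target of 1.5 than logits peaked at bins 0 and 3.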
Description
Tomato leaf disease detection method by combining multi-scale feature enhancement and cross-layer features
Technical Field
The invention relates to intelligent agricultural detection and crop disease identification technology. In particular, it relates to a tomato leaf disease detection method with multi-scale feature enhancement and cross-layer feature fusion, suitable for automatic detection of tomato leaf diseases and precision plant protection in scenarios such as protected (facility) agriculture and open-field cultivation.
Background
Tomato is an important cash crop in China, and timely, accurate detection of leaf diseases is key to ensuring yield and quality. In the field, lesions vary widely in shape and size and are affected by factors such as leaf curling and wrinkling. Traditional manual inspection is inefficient and highly subjective and can hardly meet the real-time monitoring needs of large-scale agricultural production, so efficient, intelligent automatic detection technology is urgently needed. Early detection of tomato leaf diseases relied on traditional machine learning methods. These methods combine hand-crafted shallow visual features such as color, texture, and shape with models such as Support Vector Machines (SVM) and Random Forests (RF). However, they generalize poorly and struggle to cope with the diversity of lesions and their coupling with the leaf background in complex field environments. Their iteration cost is also high: whenever a new disease type is added or a new kind of interference must be handled, features must be redesigned and a large number of samples labeled, which is inefficient.
With the development of deep learning, methods based on Convolutional Neural Networks (CNNs) have become mainstream because they learn deep visual features automatically without hand-crafted features. Early models such as AlexNet and VGG were followed by object detection models such as Fast R-CNN and the YOLO series, which perform lesion localization and category identification simultaneously. The YOLO series is preferred in field scenarios for its efficiency and real-time performance. However, existing deep learning methods still have clear shortcomings. Features of micro-lesions 1 to 2 mm in size are insufficiently extracted, which leads to a high miss rate. Conventional bounding box regression adapts poorly to irregular, spreading lesions, which causes localization deviation. Robustness to interference such as leaf deformation is also insufficient. As a result, detection accuracy in real field scenes is low, and the practical requirements of high precision, low miss rate, and strong robustness are hard to meet.
Disclosure of Invention
The invention provides a tomato leaf disease detection method with multi-scale feature enhancement and cross-layer feature fusion. Through multi-scale feature encoding, cross-scale and cross-layer collaborative feature fusion, and optimization with a shape-adaptive bounding box regression loss function, the method achieves accurate localization and rapid detection of tomato lesion targets against complex backgrounds.
The invention is realized by the following technical scheme:
A tomato leaf disease detection method with multi-scale feature enhancement and cross-layer feature fusion comprises the following steps:
First step: constructing a feature extraction module based on multi-scale feature enhancement
This step performs multi-scale feature encoding of the input tomato leaf image.
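The multi-scale encoding performed in this step, as detailed in claim 2 (channel grouping, per-group convolution, stacking, regrouping, and 1×1 fusion), can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The function names `emsfe_conv` and `conv2d_same` are hypothetical. The sketch assumes stride 1 with zero "same" padding and odd kernel sizes, so the spatial size is preserved, and assumes that each group convolution maps its C/G channels to C/G channels. Normalization, activation, and the concrete kernel sizes are not taken from the patent. The channel split follows the "per-group channels × number of groups" ordering described in the claim, so the group index varies fastest.

```python
import numpy as np


def conv2d_same(x, w):
    """Naive stride-1 'same' convolution (cross-correlation, as in CNNs).

    x: (N, C_in, H, W); w: (C_out, C_in, k, k) with odd k -> (N, C_out, H, W).
    """
    k = w.shape[-1]
    assert k % 2 == 1, "odd kernel size assumed"
    p = k // 2
    xp = np.pad(x, ((0, 0), (0, 0), (p, p), (p, p)))  # zero padding
    n, _, h, wd = x.shape
    out = np.zeros((n, w.shape[0], h, wd))
    for i in range(k):
        for j in range(k):
            patch = xp[:, :, i:i + h, j:j + wd]  # shifted view: (N, C_in, H, W)
            out += np.einsum("nchw,oc->nohw", patch, w[:, :, i, j])
    return out


def emsfe_conv(x, kernels, group_weights, fuse_weight):
    """Grouped multi-scale convolution followed by a 1x1 channel fusion.

    x: (N, C, H, W); kernels: G odd kernel sizes, one per group;
    group_weights[i]: (C/G, C/G, k_i, k_i); fuse_weight: (C_out, C).
    """
    n, c, h, w = x.shape
    g = len(kernels)
    assert c % g == 0, "C must be divisible by the number of groups G"
    cg = c // g
    for kw, k in zip(group_weights, kernels):
        assert kw.shape == (cg, cg, k, k), "one kernel of size k_i per group"
    # Rearrange: N (C_g G) H W -> N C_g H W G (grouping dimension last)
    xg = x.reshape(n, cg, g, h, w).transpose(0, 1, 3, 4, 2)
    # Convolve each group with its own kernel size
    ys = [conv2d_same(xg[..., i], group_weights[i]) for i in range(g)]
    # Stack the group outputs in group order: (N, C_g, H, W, G)
    y = np.stack(ys, axis=-1)
    # Rearrange back: N C_g H W G -> N (C_g G) H W
    x_rec = y.transpose(0, 1, 4, 2, 3).reshape(n, c, h, w)
    # 1x1 convolution = per-pixel linear mix of channels
    return np.einsum("nchw,oc->nohw", x_rec, fuse_weight)
```

Passing delta (identity) kernels and an identity fusion matrix returns the input unchanged, which confirms that the two rearrangements are exact inverses. Non-trivial kernels, for example sizes 1, 3, 5, and 7 (an illustrative choice), give each group a different receptive field while the spatial resolution stays fixed.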
In a natural growing environment, tomato leaf lesions differ markedly in scale, have irregular boundaries and low local texture contrast, and are easily disturbed by vein structures and illumination changes. To address this, the invention introduces a feature extraction module based on multi-scale feature enhancement into the backbone network (Backbone) of the object detection model to improve the saliency of lesion regions. The module enhances the conventional convolutional feature extraction unit in its overall topology. While keeping the spatial resolution of the feature map unchanged, it models local texture information under different receptive fields in parallel through a multi-scale feature enhancement structure. It then expresses the multi-scale features collaboratively along the channel dimension and integrates the multi-scale responses adaptively in the feature fusion stage. This effectively strengthens the feature response to tiny lesions and low-contrast disease textures. The specific implementation process is as follows:
1.1 Input feature definition
The input feature map is set as $X \in \mathbb{R}^{N \times C \times H \times W}$, wherein N is the batch size, C is the numbe