CN-122024055-A - Automatic identification method for crop key feature points
Abstract
The invention relates to an automatic identification method for key feature points of crops and belongs to the technical field of agricultural intelligent perception and image recognition. In the data preprocessing stage, the method applies an image enhancement strategy and a normalization strategy to crop images; it then constructs a key feature point identification model, comprising an improved encoder-decoder and an output layer, and jointly optimizes the output predicted heat map with a mixed loss function combining mean square error and binary cross entropy. The invention realizes automatic identification and localization of key feature points in crop images, outputs the coordinates and confidence of each key point, and generates a corresponding high-resolution heat map, providing high-precision visual input for downstream applications such as crop growth monitoring, phenotype measurement and automatic picking.
Inventors
- Pan Yongting
- Zuo Xiaoqing
- Zhao Kang
- Zhang Yongzhe
- Wang Lizhi
- Zhu Daming
- Song Weiwei
- Chen Guoping
Assignees
- Kunming University of Science and Technology (昆明理工大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-29
Claims (8)
- 1. An automatic identification method for crop key feature points, characterized by comprising the following steps: S1, in a data preprocessing stage, performing image enhancement and normalization on crop images using an image enhancement strategy and a normalization strategy; S2, constructing a key feature point identification model comprising an improved encoder-decoder and an output layer. The improved encoder extracts multi-level semantic features through double-layer convolution and batch normalization, and fuses shallow detail information with deep abstract features through skip connections. A feature enhancement mechanism combining channel attention and spatial attention is introduced in the encoding stage; it adaptively adjusts the weights of the different feature channels and the spatial response distribution of key point regions in the feature map, highlighting key point regions and suppressing irrelevant background features, so as to finely characterize the complex textures and structures of crops. A multi-scale dilated (atrous) convolution module is designed at the network bottleneck layer; convolution operations with different dilation rates fuse features over multiple receptive fields, preserving local texture detail while also capturing global structural information. An edge perception module is introduced in the decoding stage: it extracts image gradient information with a fixed Sobel edge operator and applies key constraints and reinforcement to the boundary regions of the predicted heat map through a feature re-weighting mechanism. S3, jointly optimizing the output predicted heat map with a mixed loss function combining mean square error and binary cross entropy.
- 2. The method for automatically identifying key feature points of crops according to claim 1, wherein S1 comprises: S11, crop image reading and standardization: image scaling uniformly resizes the original crop image to a fixed size to ensure consistent model input, and the pixel values of each channel are normalized so that their distribution is better suited to deep network training; S12, key point Gaussian heat map generation: for each key point, a two-dimensional Gaussian distribution is generated over every pixel position to obtain the ground-truth Gaussian heat map; S13, channel conversion into tensor data to obtain the original image tensor.
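The Gaussian heat-map generation of step S12 can be sketched with the standard library alone; the map size and `sigma` below are illustrative choices, not values from the patent:

```python
import math

def gaussian_heatmap(height, width, cx, cy, sigma=2.0):
    """Generate a 2-D Gaussian heat map peaking at key point (cx, cy).

    Each pixel (x, y) receives exp(-((x-cx)^2 + (y-cy)^2) / (2*sigma^2)),
    so the value is 1.0 at the key point and decays smoothly toward 0.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

# One heat map per key point; the ground-truth target is the stack of these maps.
hm = gaussian_heatmap(64, 64, cx=20, cy=30, sigma=2.0)
```

In practice one such map is generated per key point and stacked along the channel dimension to form the regression target.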
- 3. The method for automatically identifying key feature points of crops according to claim 1, wherein in S2, the objective of the constructed key feature point identification model is to learn a mapping f: I → Ĥ, so that the predicted heat map Ĥ = f(I) is as close as possible to the true Gaussian heat map H, where I is the original image tensor.
- 4. The method for automatically identifying key feature points of crops according to claim 1, wherein in S2, the internal processing flow of the improved encoder comprises: 1) In the first stage, the encoder takes the original image tensor as input, processes it with a double convolution block, and feeds the output tensor to an attention module combining a channel attention mechanism and a spatial attention mechanism. The channel attention mechanism includes: channel compression, where global average pooling and global max pooling compress the information of each channel into two vectors; channel excitation, where the two vectors are non-linearly mapped by a two-layer MLP with shared weights; channel attention map, where the two mapped results are added and normalized with a sigmoid to obtain per-channel weights; and channel recalibration, where the channel weights are broadcast over the spatial dimensions and the input features are weighted channel by channel. The spatial attention mechanism, built on top of the channel attention mechanism, generates attention over the spatial dimensions and includes: channel aggregation, where channel-wise average pooling and max pooling are computed; spatial convolution aggregation, where the two single-channel maps are concatenated; generation of the spatial attention map with a 7×7 convolution layer and a sigmoid; and spatial recalibration, where the spatial attention map is broadcast over the channel dimension. After the spatial attention mechanism CBAM, the stage-one output is downsampled with 2×2 max pooling. 2) The second stage of the encoder is processed identically to the first stage, except that its input is the downsampled result of the previous step. 3) The third stage of the encoder applies a first convolution block to the output of the second stage, then a second convolution block to the result of the first. 4) Multi-scale fusion is performed by the multi-scale dilated convolution module MSF: the dilated convolution branches enhance the model's feature capacity under different receptive fields; the output of the third encoder stage is processed by each branch, the branch outputs are concatenated along the channel dimension, normalized and activated, and finally mapped back to the original channel count with a 1×1 convolution.
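The channel-attention path of the CBAM block described above can be sketched as follows. Pure-Python lists stand in for real tensors, and the MLP weights are toy values for illustration only (in the model they are learned parameters):

```python
import math

def channel_attention(feature, reduction=2):
    """CBAM-style channel attention on a feature map shaped [C][H][W].

    Squeeze each channel with global average and global max pooling,
    pass both vectors through a shared two-layer MLP, add the results,
    and normalize with a sigmoid to obtain per-channel weights.
    """
    C = len(feature)
    # Channel compression: two C-dimensional descriptor vectors.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(max(row) for row in ch) for ch in feature]

    # Shared two-layer MLP with a reduction bottleneck (toy weights).
    Cr = max(1, C // reduction)
    W0 = [[0.5 if i == j % Cr else 0.1 for j in range(C)] for i in range(Cr)]
    W1 = [[0.5 if j == i % Cr else 0.1 for j in range(Cr)] for i in range(C)]

    def mlp(v):
        h = [max(0.0, sum(W0[i][j] * v[j] for j in range(C)))  # ReLU
             for i in range(Cr)]
        return [sum(W1[i][j] * h[j] for j in range(Cr)) for i in range(C)]

    # Add the two excitations, then sigmoid-normalize to (0, 1).
    weights = [1 / (1 + math.exp(-(a + m))) for a, m in zip(mlp(avg), mlp(mx))]

    # Channel recalibration: broadcast each weight over its spatial positions.
    return [[[w * v for v in row] for row in ch] for ch, w in zip(feature, weights)]
```

The spatial-attention half follows the same pattern, pooling along the channel axis instead and mixing the two maps with a 7×7 convolution.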
- 5. The method for automatically identifying key feature points of crops according to claim 1, wherein in S2, the internal processing flow of the improved encoder specifically comprises: 1) Encoder first stage, with the original image tensor X as input. (1) Double convolution DoubleConv: first convolution block Z1 = ReLU(BN(W1 * X + b1)); second convolution block F1 = ReLU(BN(W2 * Z1 + b2)); where W1 and W2 are the weights, and b1 and b2 the bias values, of the first and second convolution layers of the first stage. The output F1 is fed to the attention module. (2) Attention mechanism processing: a. Channel attention mechanism. Channel compression: the information of each channel c is compressed into two vectors by global average pooling and global max pooling, s_avg(c) = (1/(H·W)) Σ_{i,j} F1(c,i,j) and s_max(c) = max_{i,j} F1(c,i,j), where H·W is the total number of pixel positions. Channel excitation: both vectors are non-linearly mapped by a two-layer MLP with shared weights, e_avg = W_b · ReLU(W_a · s_avg + b_a) + b_b and e_max = W_b · ReLU(W_a · s_max + b_a) + b_b, where W_a and W_b are weights and b_a and b_b are biases. Channel attention map: the two mapped results are added and sigmoid-normalized to obtain the channel weights, M_c = σ(e_avg + e_max), with M_c(c) ∈ (0, 1). Channel recalibration: the channel weights M_c are broadcast over the spatial dimensions and the input features are weighted channel by channel, F1' = M_c ⊗ F1. b. Spatial attention mechanism, built on top of the channel attention mechanism, generating attention over the spatial dimensions. Channel aggregation: average pooling and max pooling are computed along the channel dimension C, A_avg(i,j) = (1/C) Σ_c F1'(c,i,j) and A_max(i,j) = max_c F1'(c,i,j). Spatial convolution aggregation: the two single-channel maps are concatenated, A = [A_avg; A_max]. The spatial attention map is generated with a 7×7 convolution layer and a sigmoid, M_s = σ(W_s * A + b_s), where W_s is the convolution kernel weight matrix of the 7×7 convolution layer and b_s is a bias term that corrects the convolution output. Spatial recalibration: the spatial attention map is broadcast over the channel dimension, F1'' = M_s ⊗ F1'. After the spatial attention mechanism CBAM, F1'' is the output of the first layer. (3) Downsampling: 2×2 max pooling, P1 = MaxPool_{2×2}(F1''). 2) Encoder second stage, with the result P1 of the previous step as input: double convolution, channel and spatial attention processing, and 2×2 max-pooling downsampling are applied exactly as in the first stage, using the weights and bias values of the first and second convolution layers of the second stage, yielding the downsampled output P2. 3) Encoder third stage, with P2 as input. (1) Double convolution: first convolution block Z3 = ReLU(BN(W5 * P2 + b5)); second convolution block F3 = ReLU(BN(W6 * Z3 + b6)); where W5 and W6 are the weights, and b5 and b6 the bias values, of the first and second convolution layers of the third stage. 4) Multi-scale fusion by the multi-scale dilated convolution module MSF, whose dilated convolution branches enhance the model's feature capacity under different receptive fields: branch k computes B_k = ReLU(BN(Conv_{3×3, d_k}(F3))), where the padding p_k of the convolution kernel and the dilation rate d_k determine the spacing of the sampling points. Fusion and projection: the branch outputs are concatenated along the channel dimension, normalized and activated, M = ReLU(BN([B_1; B_2; B_3])); finally a 1×1 convolution maps back to the original channel count, F_msf = Conv_{1×1}(M).
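The effect of the MSF module's differing dilation rates can be illustrated in one dimension; the kernel, input signal and dilation rates below are illustrative, not the patent's values:

```python
def dilated_conv1d(signal, kernel, dilation):
    """1-D dilated ('atrous') convolution with zero padding.

    A kernel of size k with dilation d samples inputs d steps apart,
    covering an effective receptive field of d*(k-1)+1 without adding
    parameters -- the idea behind the multi-scale MSF branches.
    """
    k = len(kernel)
    pad = dilation * (k - 1) // 2            # 'same' padding for odd k
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j * dilation] for j in range(k))
            for i in range(len(signal))]

x = [0, 0, 0, 1, 0, 0, 0]                    # unit impulse
# Same kernel, growing dilation: each branch "sees" a wider context.
b1 = dilated_conv1d(x, [1, 1, 1], dilation=1)
b2 = dilated_conv1d(x, [1, 1, 1], dilation=2)
fused = [v1 + v2 for v1, v2 in zip(b1, b2)]  # channel fusion, sketched as a sum
```

The impulse response of `b2` spreads over a width of d·(k−1)+1 = 5 positions, showing how a larger dilation rate enlarges the receptive field at constant parameter count.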
- 6. The method for automatically identifying key feature points of crops according to claim 1, wherein in S2, the internal flow of the decoder stage comprises: 1) Decoder first stage: (1) Upsampling: the encoder output is bilinearly upsampled to double the feature scale; (2) Skip-link attention: the attention output of encoder layer 2 is acquired, and useful detail features are screened through a skip-link attention module; (3) Concatenation and fusion: the upsampled features and the skip features are concatenated along the channel dimension; (4) Multi-perception module: the fused features are passed through the multi-perception module to enhance deep semantics and detail information, where the multi-perception module comprises a double convolution module, a spatial attention module and an edge perception module. 2) Decoder second stage: (1) Upsampling: the output of the first decoder stage is upsampled to restore the original resolution; (2) Skip-link attention: the attention output of the corresponding encoder layer is acquired, and useful detail features are screened through a skip-link attention module; (3) Concatenation and fusion: the upsampled features and the skip features are concatenated along the channel dimension; (4) Multi-perception module: the fused features are passed through the multi-perception module to enhance deep semantics and detail information.
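The bilinear upsampling at the start of each decoder stage can be sketched as follows; the align-corners convention (image corners map to corners) is an illustrative assumption, since the patent does not specify the interpolation variant's details:

```python
def bilinear_upsample2x(feat):
    """Double the spatial size of a 2-D feature map with bilinear
    interpolation (align-corners convention)."""
    h, w = len(feat), len(feat[0])
    H, W = 2 * h, 2 * w
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # Map each output pixel back to fractional input coordinates.
            y = i * (h - 1) / (H - 1)
            x = j * (w - 1) / (W - 1)
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # Weighted average of the four surrounding input pixels.
            out[i][j] = (feat[y0][x0] * (1 - dy) * (1 - dx)
                         + feat[y0][x1] * (1 - dy) * dx
                         + feat[y1][x0] * dy * (1 - dx)
                         + feat[y1][x1] * dy * dx)
    return out

up = bilinear_upsample2x([[0.0, 1.0], [2.0, 3.0]])  # 2x2 -> 4x4
```

In the decoder, the upsampled map is then concatenated with the attention-screened skip features before the multi-perception module.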
- 7. The method for automatically identifying key feature points of crops according to claim 1, wherein in S2, the output layer applies a 1×1 convolution and sigmoid activation to the output of the second decoder stage to generate the key point heat map prediction; and step S3 comprises: at the model optimization level, adopting a mixed loss function combining mean square error MSE and binary cross entropy BCE, realizing a double constraint on the key point probability distribution and the heat map intensity.
- 8. The method for automatically identifying key feature points of crops according to claim 1, wherein step S3 specifically comprises the following steps. The mixed loss function involves three loss forms: the mean square error MSE, the binary cross entropy BCE, and their weighted mixture. (1) Mean square error loss L_MSE measures the pixel-by-pixel difference between the predicted heat map and the true heat map, defined as L_MSE = (1/N) Σ_n ||Ĥ_n − H_n||², where N is the number of samples, Ĥ_n is the predicted heat map of the n-th sample and H_n is the corresponding true heat map. (2) Binary cross entropy loss L_BCE balances the difference in the probability of each pixel belonging to a key point (label 1) or the background (label 0), defined as L_BCE = −(1/(N·H·W)) Σ_n Σ_{i,j} [y_n(i,j) log ŷ_n(i,j) + (1 − y_n(i,j)) log(1 − ŷ_n(i,j))], where N is the number of training samples, H and W are the height and width of the heat map, y_n(i,j) is the true label value at pixel (i,j) of the n-th sample, and ŷ_n(i,j) is the prediction probability there. (3) Weighted mixed loss: the training phase employs the weighted mixture L = α·L_MSE + (1 − α)·L_BCE, where α is the balance coefficient.
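The weighted mixed loss of claim 8 can be sketched over flattened heat maps. The value `alpha=0.5` and the clamping constant `eps` are illustrative assumptions, since the patent leaves the balance coefficient α symbolic:

```python
import math

def mixed_loss(pred, true, alpha=0.5, eps=1e-7):
    """Weighted mix of MSE and BCE over flattened heat-map pixels.

    L = alpha * MSE + (1 - alpha) * BCE, with predictions clamped to
    (eps, 1 - eps) so the log terms in BCE stay finite.
    """
    n = len(pred)
    mse = sum((p - t) ** 2 for p, t in zip(pred, true)) / n
    bce = 0.0
    for p, t in zip(pred, true):
        p = min(max(p, eps), 1 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1 - p)
    bce /= n
    return alpha * mse + (1 - alpha) * bce

loss = mixed_loss([0.9, 0.1, 0.8], [1.0, 0.0, 1.0], alpha=0.5)
```

The MSE term constrains heat-map intensity while the BCE term constrains the per-pixel key-point/background probability, matching the double constraint described in claim 7.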
Description
Automatic identification method for crop key feature points

Technical Field

The invention relates to an automatic identification method for key feature points of crops and belongs to the technical field of agricultural intelligent perception and image recognition.

Background

Under the combined pressure of global population growth, frequent extreme weather and agricultural labor shortages, precision agriculture and intelligent agriculture have become key technical paths for safeguarding global food security. To achieve intelligent, automated and unmanned agriculture, the core prerequisite is that machines must be able to "see" the growth status of crops like an experienced agricultural expert. In agricultural breeding, indicators such as plant morphological structure, internode length, leaf area and fruit position are key data for phenotype research. Automatic identification of feature points enables efficient extraction of structural parameters, provides breeders with quantifiable and traceable morphological data, and markedly improves breeding screening efficiency. It also supports precise operation of agricultural robots: operations such as automatic picking, spraying and pruning require accurate identification of target parts, and high-precision key point identification can provide centimeter-level positioning information for the robot, ensuring operational safety and stability in complex environments. It further promotes agricultural informatization and intelligent monitoring: in intelligent greenhouses and field planting, crop key point identification enables automatic growth monitoring, early disease detection and yield prediction, providing a scientific basis for agricultural management decisions.
It likewise advances remote sensing and digital-twin agriculture: based on key point identification results, a three-dimensional crop structure model can be reconstructed, enabling digital-twin simulation and dynamic growth prediction at farmland scale. Therefore, research on high-precision, robust and generalizable methods for automatically identifying crop key feature points has significant scientific meaning and application value for advancing intelligent agricultural production and precision breeding technology. The invention provides an automatic identification method for crop key feature points, whose core significance is to endow a machine vision system with "expert-level" perception. Crop key feature points, such as plant growth points, branching points, flower buds, calyx points (the connection points between fruit stalks and fruit) and grain spike points, are the most basic and critical biological indicators for quantifying crop phenotype, determining development progress, assessing health and predicting final yield. 1. Current state of research and problems: crop feature point identification currently relies mainly on two technical routes: traditional image processing methods, and end-to-end detection models based on deep learning. 1. Limitations of traditional image processing methods. Early research mostly adopted hand-crafted feature algorithms such as edge detection, color threshold segmentation and template matching. These methods are unstable under illumination changes, complex backgrounds and differences in crop posture, and struggle to cope with phenotype variation across crop types and growth stages. 2. Rapid development of deep learning methods. With the popularization of convolutional neural networks (CNN) and structures such as U-Net and HRNet, deep learning models have become the dominant means of feature point detection.
They achieve stronger feature expression through end-to-end training, but still suffer from the following problems: (1) complex model structures with large parameter counts lead to slow inference; (2) in images with dense features or severe occlusion, the predicted heat maps exhibit blurring and misalignment; (3) the attention mechanism lacks adaptivity and cannot fully distinguish key regions from background noise; (4) edge information is under-utilized, reducing the localization accuracy of feature points. 3. Insufficient cross-scene robustness. Current algorithms have limited ability to transfer between different crops, especially under different lighting, viewing angles or camera conditions. How to realize a universal, robust and lightweight feature point detection model remains a key challenge in the agricultural vision field. 2. Challenges and difficulties faced: on the basis of a comprehensive analysis of the characteristics of agricultural vision tasks, current crop key point identification still faces the following main challenges: (1) Complex backgro