
CN-121982775-A - Grassland pasture beef cattle behavior identification method based on four-foot robot vision

CN121982775A

Abstract

The invention relates to the technical field of image processing and discloses a grassland pasture beef cattle behavior recognition method based on quadruped robot vision. The method first receives grassland pasture images acquired by image acquisition equipment arranged on a quadruped robot; a trained cattle behavior recognition model then performs feature extraction and cattle behavior detection on the grassland pasture images to obtain a behavior recognition result, and a movement instruction is generated from the recognition result to control the quadruped robot's movement. The trained cattle behavior recognition model is obtained by improving a Yolo v s model, which comprises a backbone network, a neck network, and a detection head connected in sequence: the backbone network comprises a StarNet network, an SPPF module, and a C2PSA module; the neck network replaces the feature pyramid structure with a multi-scale feature focusing extraction network; and the detection head is replaced with an adaptive decomposition and alignment detection head. Multiple beef cattle can thus be detected simultaneously, improving the accuracy of cattle behavior recognition on grassland pastures.

Inventors

  • WEI PEIGANG
  • SUN WEI
  • CAO PANPAN
  • KONG FANTAO
  • MA NAN
  • WANG YI
  • ZHANG SONGXUE

Assignees

  • Agricultural Information Institute, Chinese Academy of Agricultural Sciences (中国农业科学院农业信息研究所)

Dates

Publication Date
2026-05-05
Application Date
2026-01-29

Claims (8)

  1. A grassland pasture beef cattle behavior identification method based on quadruped robot vision, characterized by comprising the following steps: receiving a grassland pasture image acquired by image acquisition equipment, wherein the image acquisition equipment is arranged on a quadruped robot and the grassland pasture image comprises at least one cow; performing feature extraction and cattle behavior detection on the grassland pasture image through a trained cattle behavior recognition model to obtain a cattle behavior recognition result, and generating a movement instruction according to the behavior recognition result, wherein the movement instruction is used for instructing the quadruped robot to move; wherein the trained cattle behavior recognition model is obtained by improving a Yolo v s model, the Yolo v s model comprises a backbone network, a neck network, and a detection head which are sequentially connected, the backbone network comprises a StarNet network, an SPPF module, and a C2PSA module which are sequentially connected, the neck network replaces a feature pyramid structure with a multi-scale feature focusing extraction network, and the detection head is replaced with an adaptive decomposition and alignment detection head; and wherein the StarNet network is used for fusing multi-channel feature information through element-by-element multiplication and mapping it to a new feature subspace, the multi-scale feature focusing extraction network is used for focusing the context information of the multi-scale feature maps output by the backbone network, the adaptive decomposition and alignment detection head is used for decoupling detection tasks, aligning feature information, and performing behavior classification and positioning of the cattle in the grassland pasture image, and the multi-scale feature maps comprise at least a high-resolution feature map, a medium-resolution feature map, and a low-resolution feature map.
  2. The method according to claim 1, wherein the backbone network is specifically configured to perform the following operations: receiving the grassland pasture image through the StarNet network, and performing downsampling and multi-level feature extraction on the grassland pasture image; receiving the feature map output by the StarNet network through the SPPF module, and performing a multi-scale pooling operation on it to obtain a pooled feature map; and performing feature enhancement on the pooled feature map through the C2PSA module using a spatial pyramid slice attention mechanism to obtain the low-resolution feature map.
  3. The method of claim 2, wherein the StarNet network comprises a first convolution layer and a plurality of levels, the plurality of levels comprising a first level, a second level, a third level, and a fourth level, wherein the first level, the second level, and the fourth level each comprise a second convolution layer and a star module, the third level comprises a second convolution layer and three star modules in series, and the StarNet network is configured to: perform a convolution operation on the grassland pasture image through the first convolution layer to obtain an initial feature map; sequentially extract spatial-dimension features from the initial feature map through the first level and the second level to obtain the high-resolution feature map; perform multi-level feature extraction on the high-resolution feature map through the third level to obtain the medium-resolution feature map; and extract spatial-dimension features from the medium-resolution feature map through the fourth level and output the corresponding feature map to the SPPF module.
  4. The method of claim 3, wherein the star module comprises two depthwise-separable convolution layers, three linear transformation layers, a star operation layer, a first residual connection layer, two normalization layers, and two activation function layers, and the star module is configured to: perform lightweight spatial feature extraction on the input features of the star module through the first depthwise-separable convolution layer to obtain a first spatial feature map; normalize the first spatial feature map through a first normalization layer and introduce a nonlinear response through a first activation function layer to obtain a first feature map; split the first feature map into two paths input respectively to a first linear transformation layer and a second linear transformation layer, which respectively perform channel-dimension linear mappings on the normalized and activated feature map; multiply the feature maps output by the first and second linear transformation layers element by element through the star operation layer, so as to fuse multi-channel feature information and map it to a new feature subspace, obtaining an enhanced feature map, and perform a channel-dimension linear mapping on the enhanced feature map through a third linear transformation layer; perform channel normalization on the features output by the third linear transformation layer through a second normalization layer and introduce a nonlinear response through a second activation function layer to obtain a second feature map; perform lightweight spatial feature extraction on the second feature map through the second depthwise-separable convolution layer to obtain a second spatial feature map; and add the second spatial feature map to the input features of the star module through the first residual connection layer to obtain the output features of the star module (a hedged code sketch of this module is given after the claims).
  5. The method of claim 1, wherein the multi-scale feature focusing extraction network comprises a downsampling layer, an upsampling layer, a plurality of third convolution layers, a splicing layer, a plurality of parallel second depthwise-separable convolution layers, an identity layer, a point-wise convolution layer, and a second residual connection layer, and is specifically configured to perform the following steps: receive the high-resolution feature map, the medium-resolution feature map, and the low-resolution feature map; perform downsampling on the low-resolution feature map through the downsampling layer, perform a convolution operation on the medium-resolution feature map through a first of the third convolution layers, and perform upsampling and a convolution operation on the high-resolution feature map through the upsampling layer and a second of the third convolution layers; splice the processed high-resolution, medium-resolution, and low-resolution feature maps in the channel dimension through the splicing layer to obtain a spliced feature map; perform depthwise-separable convolution operations on the spliced feature map through the plurality of parallel second depthwise-separable convolution layers to obtain a plurality of branch feature maps, and retain the spliced feature map through the identity layer; add the plurality of branch feature maps and the retained spliced feature map element by element, and perform a channel-dimension linear transformation on the result through the point-wise convolution layer to obtain a fused feature response map; and add the fused feature response map to the spliced feature map through the second residual connection layer to obtain the output features of the multi-scale feature focusing extraction network (a hedged sketch of this network is also given after the claims).
  6. The method of claim 1, wherein the adaptive decomposition and alignment detection head comprises a shared feature extraction module and an improved task alignment head, and performs the following steps: performing feature extraction on the input features of the adaptive decomposition and alignment detection head through the shared feature extraction module to obtain a shared feature map; and mapping the shared feature map to a classification feature subspace and a regression feature subspace through the improved task alignment head, classifying the behaviors of the cattle in the classification feature subspace by combining a spatial attention mechanism with a probability modulation strategy, and, in the regression feature subspace, driving a deformable convolution through spatial offsets and weight masks to locate the cattle.
  7. The method of claim 6, wherein the improved task alignment head comprises a task decoupling module, a classification branch, a regression branch, and a feature fusion layer, the classification branch comprises a classification probability convolution layer and a third depthwise-separable convolution layer, and the regression branch comprises an offset and mask prediction module; the improved task alignment head specifically performs the following operations: decomposing the shared feature map into an initial classification feature map and an initial regression feature map through the task decoupling module; classifying the initial classification feature map through the classification probability convolution layer and the third depthwise-separable convolution layer to obtain a target classification feature map; performing convolution prediction on the initial regression feature map through the offset and mask prediction module to generate spatial offsets and weight masks and drive a deformable convolution, obtaining a target regression feature map; and integrating the target classification feature map and the target regression feature map through the feature fusion layer to obtain the behavior recognition result.
  8. The method according to claim 1, further comprising: obtaining a sample image set of a grassland pasture, wherein the sample image set comprises target cattle behavior categories and target bounding boxes; iteratively training the cattle behavior recognition model on the sample image set to obtain predicted categories and predicted bounding boxes; and determining a first loss value between the target cattle behavior categories and the predicted categories through a binary cross-entropy loss function, and a second loss value between the target bounding boxes and the predicted bounding boxes through an Inner-MPDIoU loss function, until the first loss value and the second loss value converge, so as to obtain the trained cattle behavior recognition model (a hedged sketch of such a loss is given after the claims).
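
The star module of claims 3 and 4 is the backbone's core building block. Below is a minimal PyTorch sketch of one plausible reading of claim 4; the 7x7 depthwise kernels, the four-fold channel expansion, and the BatchNorm/GELU choices are assumptions, since the patent fixes only the layer types and their order.

```python
import torch
import torch.nn as nn

class StarModule(nn.Module):
    """Hedged sketch of the star module in claim 4 (layer sizes assumed)."""

    def __init__(self, channels: int, expansion: int = 4):
        super().__init__()
        hidden = channels * expansion
        # First depthwise convolution: lightweight spatial feature extraction.
        self.dw1 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)
        self.norm1 = nn.BatchNorm2d(channels)
        self.act1 = nn.GELU()
        # First and second linear transformation layers (1x1 convs),
        # applied to the same input along two parallel paths.
        self.fc1 = nn.Conv2d(channels, hidden, 1)
        self.fc2 = nn.Conv2d(channels, hidden, 1)
        # Third linear transformation layer maps the fused features back.
        self.fc3 = nn.Conv2d(hidden, channels, 1)
        self.norm2 = nn.BatchNorm2d(channels)
        self.act2 = nn.GELU()
        # Second depthwise convolution before the residual addition.
        self.dw2 = nn.Conv2d(channels, channels, 7, padding=3, groups=channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x
        x = self.act1(self.norm1(self.dw1(x)))
        # Star operation: element-wise product of the two linear branches,
        # fusing multi-channel information into a new feature subspace.
        x = self.fc1(x) * self.fc2(x)
        x = self.act2(self.norm2(self.fc3(x)))
        x = self.dw2(x)
        return x + identity  # first residual connection layer
```

The element-wise product of the two linear branches is what distinguishes this from an ordinary bottleneck block: the channels interact multiplicatively rather than additively, which the patent describes as mapping the fused features to a new feature subspace.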
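Claim 5's multi-scale feature focusing extraction network can be sketched as follows. The translated claim is ambiguous about which maps are up- or downsampled, so this sketch makes the standard choice of aligning all three maps at the medium scale (downsampling the high-resolution map and upsampling the low-resolution one); the branch count, kernel sizes, and channel widths are likewise assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleFeatureFocus(nn.Module):
    """Hedged sketch of the multi-scale feature focusing extraction network
    (claim 5), aligning the three input maps at the medium scale."""

    def __init__(self, c_hi: int, c_mid: int, c_lo: int, c_out: int):
        super().__init__()
        self.down = nn.Conv2d(c_hi, c_out, 3, stride=2, padding=1)  # high-res -> medium scale
        self.conv_mid = nn.Conv2d(c_mid, c_out, 1)                  # one of the "third" conv layers
        self.up = nn.Upsample(scale_factor=2, mode="nearest")       # low-res -> medium scale
        self.conv_lo = nn.Conv2d(c_lo, c_out, 1)                    # another "third" conv layer
        c_cat = c_out * 3
        # Parallel depthwise branches with different receptive fields; an
        # identity path keeps the spliced map unchanged alongside them.
        self.branches = nn.ModuleList(
            nn.Conv2d(c_cat, c_cat, k, padding=k // 2, groups=c_cat) for k in (3, 5, 7)
        )
        self.pw = nn.Conv2d(c_cat, c_cat, 1)  # point-wise channel transform

    def forward(self, hi, mid, lo):
        # Splice the scale-aligned maps in the channel dimension.
        spliced = torch.cat(
            [self.down(hi), self.conv_mid(mid), self.conv_lo(self.up(lo))], dim=1
        )
        fused = spliced.clone()              # identity layer retains the spliced map
        for branch in self.branches:
            fused = fused + branch(spliced)  # element-wise addition of branch outputs
        fused = self.pw(fused)               # fused feature response map
        return fused + spliced               # second residual connection
```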
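Claim 8 names an Inner-MPDIoU bounding-box loss without giving its formula. The sketch below combines the two published ingredients the name suggests: MPDIoU's corner-distance penalties normalised by the squared image diagonal, and Inner-IoU's ratio-scaled auxiliary boxes in place of the plain IoU term. The 0.7 ratio and the exact combination are assumptions, not the patent's disclosure.

```python
import torch

def inner_mpdiou_loss(pred, target, img_w, img_h, ratio: float = 0.7):
    """pred, target: (N, 4) boxes as (x1, y1, x2, y2) in image pixels."""
    # Inner-IoU: shrink both boxes about their centres by `ratio` and
    # compute the IoU of the resulting auxiliary boxes.
    def inner(box):
        cx, cy = (box[:, 0] + box[:, 2]) / 2, (box[:, 1] + box[:, 3]) / 2
        w = (box[:, 2] - box[:, 0]) * ratio
        h = (box[:, 3] - box[:, 1]) * ratio
        return torch.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], dim=1)

    p, t = inner(pred), inner(target)
    lt = torch.maximum(p[:, :2], t[:, :2])
    rb = torch.minimum(p[:, 2:], t[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (p[:, 2] - p[:, 0]) * (p[:, 3] - p[:, 1])
    area_t = (t[:, 2] - t[:, 0]) * (t[:, 3] - t[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # MPDIoU penalties: squared distances between matching corners of the
    # original boxes, normalised by the squared image diagonal.
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    diag2 = img_w ** 2 + img_h ** 2
    return (1 - (iou - d1 / diag2 - d2 / diag2)).mean()
```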

Description

Grassland pasture beef cattle behavior identification method based on four-foot robot vision

Technical Field

The invention relates to the technical field of target detection, and in particular to a grassland pasture beef cattle behavior identification method based on quadruped robot vision.

Background

In the grassland pasture farming environment, timely intervention in the health and welfare of beef cattle is important for preventing disease transmission, improving treatment success rates, and improving the production performance of the herd. Cattle behavior, including eating, resting, moving, and licking, is a visible reflection of health state. For example, frequent licking by a cow may be associated with calving, thermoregulation, or ectoparasites, and prolonged lying may reflect a risk of disease or injury. Behavior recognition therefore helps managers monitor herd health and assess oestrus state, enabling early disease warning. At present, wearable devices such as smart collars, smart leg rings, and electronic ear tags can collect movement and physiological data of a cow, such as activity level, rumination, and body temperature, using sensors such as accelerometers, gyroscopes, and temperature sensors, and identify cattle behaviors from the collected data; however, one wearable device can only monitor the behavior of a single individual at a time.

Disclosure of the Invention

In view of the above, an embodiment of the invention provides a grassland pasture beef cattle behavior recognition method based on quadruped robot vision, which can detect multiple beef cattle simultaneously through the quadruped robot.
The technical scheme of the embodiment of the invention is as follows. An embodiment of the invention provides a grassland pasture beef cattle behavior identification method based on quadruped robot vision, comprising the following steps: receiving a grassland pasture image acquired by image acquisition equipment, wherein the image acquisition equipment is arranged on a quadruped robot and the grassland pasture image comprises at least one cow; and performing feature extraction and cattle behavior detection on the grassland pasture image through a trained cattle behavior recognition model to obtain a cattle behavior recognition result, and generating a movement instruction according to the behavior recognition result, wherein the movement instruction is used for instructing the quadruped robot to move.

The trained cattle behavior recognition model is obtained by improving a Yolo v s model. The Yolo v s model comprises a backbone network, a neck network, and a detection head which are sequentially connected; the backbone network comprises a StarNet network, an SPPF module, and a C2PSA module which are sequentially connected; the neck network replaces the feature pyramid structure with a multi-scale feature focusing extraction network; and the detection head is replaced with an adaptive decomposition and alignment detection head. The StarNet network is used for fusing multi-channel feature information through element-by-element multiplication and mapping it to a new feature subspace; the multi-scale feature focusing extraction network is used for focusing the context information of the multi-scale feature maps output by the backbone network; the adaptive decomposition and alignment detection head is used for decoupling detection tasks, aligning feature information, and performing behavior classification and positioning of the cattle in the grassland pasture image; and the multi-scale feature maps comprise at least a high-resolution feature map, a medium-resolution feature map, and a low-resolution feature map.

In some embodiments, the backbone network is specifically configured to perform the following operations: receiving the grassland pasture image through the StarNet network, and performing downsampling and multi-level feature extraction on it; receiving the feature map output by the StarNet network through the SPPF module, and performing a multi-scale pooling operation on it to obtain a pooled feature map; and performing feature enhancement on the pooled feature map through the C2PSA module using a spatial pyramid slice attention mechanism to obtain the low-resolution feature map.

In some embodiments, the StarNet network comprises a first convolution layer and a plurality of levels, the plurality of levels comprising a first level, a second level, a third level, and a fourth level, wherein the first level, the second level, and the fourth level each comprise a second convolution layer and a star module, and the third level comprises a second convolution layer and three star modules in series.
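
To make the backbone's data flow concrete, the following is a hedged wiring sketch in PyTorch. The channel widths, strides, and 640x640 input are illustrative assumptions; SPPF follows its standard published definition (three chained 5x5 max-pools whose outputs are concatenated and fused); StarModule refers to the sketch given after the claims; and the C2PSA module is stubbed with nn.Identity because the patent does not disclose its internals.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Standard SPPF: three chained max-pools, concatenated and fused."""

    def __init__(self, c_in: int, c_out: int, k: int = 5):
        super().__init__()
        c_mid = c_in // 2
        self.cv1 = nn.Conv2d(c_in, c_mid, 1)
        self.pool = nn.MaxPool2d(k, stride=1, padding=k // 2)
        self.cv2 = nn.Conv2d(c_mid * 4, c_out, 1)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))

class Backbone(nn.Module):
    """Hedged wiring of the StarNet -> SPPF -> C2PSA backbone; all channel
    widths and strides are assumptions. Uses StarModule from the earlier
    sketch."""

    def __init__(self):
        super().__init__()
        down = lambda ci, co: nn.Conv2d(ci, co, 3, stride=2, padding=1)  # 2x downsample
        self.stem = down(3, 32)  # "first convolution layer" of claim 3
        # Levels 1, 2 and 4: one downsampling conv plus one star module each;
        # level 3: one downsampling conv plus three star modules in series.
        self.stage1 = nn.Sequential(down(32, 64), StarModule(64))
        self.stage2 = nn.Sequential(down(64, 128), StarModule(128))
        self.stage3 = nn.Sequential(down(128, 256), StarModule(256),
                                    StarModule(256), StarModule(256))
        self.stage4 = nn.Sequential(down(256, 512), StarModule(512))
        self.sppf = SPPF(512, 512)
        self.c2psa = nn.Identity()  # placeholder: patent applies pyramid slice attention here

    def forward(self, x):
        x = self.stage1(self.stem(x))
        p3 = self.stage2(x)                            # high-resolution feature map
        p4 = self.stage3(p3)                           # medium-resolution feature map
        p5 = self.c2psa(self.sppf(self.stage4(p4)))    # low-resolution feature map
        return p3, p4, p5

# Example: p3, p4, p5 = Backbone()(torch.randn(1, 3, 640, 640))
# yields 80x80 high-, 40x40 medium-, and 20x20 low-resolution maps.
```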