CN-122024326-A - Wild animal target detection method and computing equipment based on lightweight structure

Abstract

The invention provides a wild animal target detection method and computing equipment based on a lightweight structure. The method is executed on an edge computing device and comprises: obtaining image data collected by a field infrared camera; preprocessing the image data; and performing target detection on the preprocessed image data in real time using a pre-trained target detection model, and outputting a detection result comprising a target category, predicted bounding box coordinates and a prediction confidence. The target detection model is based on the YOLO11 model and comprises a backbone network, a neck network and a detection head; a sensitivity perception attention module is integrated at the output end of the backbone network and applies channel-level weighting to the input feature map. The technical scheme addresses the following problems: the image features of wild animal targets cannot be extracted effectively; with a traditional loss function, secondary features dominate the gradient updates; and the model has too many parameters and too heavy a computational load to meet the real-time detection requirements of edge devices.

Inventors

Deng Shuchong
Chen Aibin

Assignees

Central South University of Forestry and Technology (中南林业科技大学)

Dates

Publication Date: 20260512
Application Date: 20260211

Claims (10)

1. A wild animal target detection method based on a lightweight structure, the method being performed on an edge computing device, the method comprising: acquiring image data collected by a field infrared camera; preprocessing the image data; and performing target detection on the preprocessed image data in real time using a pre-trained target detection model, and outputting a detection result comprising a target category, predicted bounding box coordinates and a prediction confidence; wherein the target detection model is based on a YOLO11 model and comprises a backbone network, a neck network and a detection head; a sensitivity perception attention module is integrated at the output end of the backbone network and is used for channel-level weighting of an input feature map; and the sensitivity perception attention module calculates the minimum energy value of a single neuron using an adaptive regularization coefficient based on channel variance, maps the inverse of the minimum energy value to an attention weight through a Sigmoid activation function, and weights the input feature map using the attention weight.
2. The method of claim 1, wherein the attention weight is obtained by the following formula: $\tilde{X} = \mathrm{sigmoid}(1/E) \odot X$, where $X$ is the input feature map, $\tilde{X}$ is the attention-weighted feature map, $E$ is the energy matrix of all $e_t^*$, $\odot$ denotes multiplication of corresponding elements, $\lambda$ is the adaptive regularization coefficient, $\lambda_0$ is the predetermined base regularization parameter, $\sigma^2$ is the variance of the neurons in the channel, $e_t^*$ is the calculated minimum energy value, $t$ is the value of the target neuron, and $\mu$ is the mean of the neurons in the channel (the expressions for $e_t^*$ and $\lambda$ are not reproduced in this text).
3. The method of claim 1, wherein the neck network is a SlimNeck lightweight structure, the SlimNeck lightweight structure comprising feature fusion paths based on GSConv convolution and VoV-GSCSP modules for preserving multi-scale semantic information while reducing computation.
4. A method according to claim 3, wherein the GSConv convolution operation in the SlimNeck lightweight structure divides the input channels into two parts, applies a standard convolution and a depthwise separable convolution respectively, and fuses the outputs through channel concatenation and shuffling operations.
5. A method according to claim 3, wherein the VoV-GSCSP module adopts a single-path dense connection structure comprising a plurality of GSBottleneck units connected in series, in parallel with a direct-connection branch that retains the original input, and the features of the two paths are finally concatenated and compressed by a 1×1 convolution for output.
6. The method of claim 1, wherein the detection head uses an ASFL (adaptive statistical focal loss) function.
7. The method of claim 6, wherein the ASFL adaptive statistical focal loss function uses statistical properties of the current batch to construct an adaptive decision threshold, the quantities involved being: the mean predicted probability of the current batch, $\bar{p} = \frac{1}{N}\sum_{i=1}^{N} p_i$; the predicted probability $p_i$ of the $i$-th sample in the batch; the total number of samples $N$ in the batch; the sample focusing parameter; the base focusing hyperparameter; the adjustment coefficient; the class balancing parameter; the predicted probability of a sample; and the final loss value (the formulas combining these quantities are not reproduced in this text).
8. The method of claim 1, wherein the edge computing device comprises an infrared triggered camera deployed in the field, an embedded AI box, or a low power consumption mobile terminal.
9. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-8.
10. A computing device, comprising: a processor; and a memory storing a computer program which, when executed by the processor, implements the method according to any of claims 1-8.
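
The sensitivity perception attention module described in claims 1 and 2 can be illustrated with a minimal sketch. The claims state only that a minimum energy value is computed per neuron with a variance-based adaptive regularization coefficient, and that its inverse is passed through a Sigmoid to obtain the weight. The closed-form energy below (borrowed from SimAM-style attention), the rule lam = base_lambda * (1 + var), and all function and parameter names are assumptions made for illustration, not formulas taken from the patent. Plain Python on a single channel is used for readability; a deployed model would use a tensor library.

```python
import math
from statistics import fmean, pvariance


def sigmoid(z):
    """Logistic function that maps inverse energy to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def attention_weights(channel, base_lambda=1e-4):
    """Return one attention weight per neuron of a single channel (2-D list)."""
    values = [v for row in channel for v in row]
    mu = fmean(values)            # mean of the neurons in the channel
    var = pvariance(values, mu)   # variance of the neurons (population form)
    # ASSUMPTION: the regularization coefficient grows with channel variance.
    lam = base_lambda * (1.0 + var)
    weights = []
    for row in channel:
        w_row = []
        for t in row:
            # ASSUMPTION: SimAM-style closed-form minimum energy of neuron t.
            energy = 4.0 * (var + lam) / ((t - mu) ** 2 + 2.0 * var + 2.0 * lam)
            w_row.append(sigmoid(1.0 / energy))  # inverse energy -> weight
        weights.append(w_row)
    return weights


def apply_attention(channel, base_lambda=1e-4):
    """Multiply each activation by its attention weight, element by element."""
    weights = attention_weights(channel, base_lambda)
    return [[t * w for t, w in zip(row, w_row)]
            for row, w_row in zip(channel, weights)]


# A flat background with one strongly activated neuron in the centre.
channel = [[1.0, 1.0, 1.0],
           [1.0, 5.0, 1.0],
           [1.0, 1.0, 1.0]]
weights = attention_weights(channel)
weighted = apply_attention(channel)
```

Under these assumptions, a neuron that deviates strongly from the channel mean has a lower energy, so its inverse energy and therefore its attention weight are larger than those of the background neurons.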

Description

Wild animal target detection method and computing equipment based on lightweight structure

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a wild animal target detection method and computing equipment based on a lightweight structure.

Background

With the strengthening of global biodiversity protection, infrared camera technology has been widely deployed in nature reserves as a non-invasive survey method. The technology operates in all weather, is well concealed and disturbs animals little, and it can capture images of rare and nocturnal wild animals over long periods without human attendance. Wild animal monitoring has moved rapidly from traditional field surveys into a "big data" era, and massive amounts of image data have accumulated.

However, the explosive growth of data also poses a significant challenge for data processing. Faced with millions of images returned from field monitoring, traditional manual screening and labeling has two shortcomings. On the one hand, because of the complexity of the field environment, the proportion of empty shots triggered by factors such as wind and changes in light and shadow is extremely high, which wastes a great deal of labor. On the other hand, manual review is time-consuming, laborious and costly, and visual fatigue from long working hours easily causes missed or wrong judgments, so the monitoring data lag seriously. Automatic wild animal target detection using advanced computer vision technology is therefore of great significance for reducing labor costs and improving the efficiency of biodiversity surveys.
At present, single-stage target detection algorithms represented by the YOLO (You Only Look Once) series are widely used in fields such as industrial defect detection, autonomous driving and face recognition, owing to their end-to-end inference and good real-time performance. However, problems remain when a general YOLO model is migrated directly to a wild animal monitoring scene in the field. First, the computing power and memory of field equipment are limited, and the model is too heavy for edge devices, resulting in high latency and high power consumption that make real-time field monitoring difficult. Second, the complex field environment is noisy, so the target features of wild animals are masked by secondary background features. Third, the learning direction of a traditional model is dominated by the large number of easy samples, which limits detection accuracy on the important targets. A technical scheme is therefore needed that can effectively extract the image features of wild animal targets in the field while meeting the real-time detection requirements of edge devices.

Disclosure of Invention

The invention aims to provide a wild animal target detection method and computing equipment based on a lightweight structure. To address the difficulty of feature extraction caused by complex field backgrounds and strong environmental noise, an attention module is provided. A lightweight neck structure and an adaptive statistical focal loss function are introduced to address the limited computing power of field edge computing devices, the excessive computational load of traditional models, and the insufficient learning of important features caused by the extreme imbalance between hard and easy samples in wild animal data.
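
The imbalance between hard and easy samples mentioned above, and the idea of a focal loss whose focusing strength depends on batch statistics, can be sketched as follows. This is only an illustration: the patent text available here does not give the loss formula, so the rule that raises the focusing exponent for samples whose predicted probability exceeds the batch mean, the function name adaptive_focal_loss, and the parameters alpha, gamma0 and beta are all assumptions layered on the standard focal loss form.

```python
import math


def adaptive_focal_loss(probs, alpha=0.25, gamma0=2.0, beta=1.0, eps=1e-7):
    """Return (mean loss, per-sample losses) for one batch.

    probs holds the predicted probability of the true class for each sample.
    """
    n = len(probs)
    batch_mean = sum(probs) / n   # batch statistic used as the adaptive threshold
    losses = []
    for p in probs:
        p = min(max(p, eps), 1.0 - eps)   # keep log() finite
        # ASSUMPTION: samples easier than the batch mean get a larger focusing
        # exponent, so they are down-weighted more; harder samples keep gamma0.
        gamma = gamma0 + beta * max(0.0, p - batch_mean)
        # Standard focal loss form with the per-sample exponent.
        losses.append(-alpha * (1.0 - p) ** gamma * math.log(p))
    return sum(losses) / n, losses


# Three easy samples and one hard sample in the same batch.
batch = [0.95, 0.9, 0.9, 0.3]
mean_loss, per_sample = adaptive_focal_loss(batch)
_, plain = adaptive_focal_loss(batch, beta=0.0)   # beta=0 gives plain focal loss
```

With beta set to 0 the sketch reduces to an ordinary focal loss, which makes it easy to compare how much extra suppression the easy samples receive under the assumed rule while the hard sample's loss stays unchanged.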
According to an aspect of the present invention, there is provided a wild animal target detection method based on a lightweight structure, the method being performed on an edge computing device and comprising: acquiring image data collected by a field infrared camera; preprocessing the image data; and performing target detection on the preprocessed image data in real time using a pre-trained target detection model, and outputting a detection result comprising a target category, predicted bounding box coordinates and a prediction confidence; wherein the target detection model is based on a YOLO11 model and comprises a backbone network, a neck network and a detection head; a sensitivity perception attention module is integrated at the output end of the backbone network and is used for channel-level weighting of an input feature map; and the sensitivity perception attention module calculates the minimum energy value of a single neuron using an adaptive regularization coefficient based on channel variance, maps the inverse of the minimum energy value to an attention weight through a Sigmoid activation function, and weights the input feature map using the attention weight. According to so