CN-121982627-A - Lightweight human shape detection method, device, medium and product based on improved RTDETR


Abstract

The invention provides a lightweight humanoid detection method, device, medium and product based on an improved RTDETR. The method comprises: acquiring a data set for humanoid detection and dividing it into a training set, a validation set and a test set; constructing an improved RTDETR model in which an ES Block replaces the Basic Block of the original RTDETR feature extraction network and NWD Loss is used as the loss function; training the improved RTDETR model with the training set, validating it with the validation set and testing it with the test set to obtain a trained improved RTDETR model; and inputting an image to be detected into the trained model for inference to obtain a humanoid detection result for that image. This technical solution improves humanoid detection accuracy, including the detection of small humanoid targets, in complex security monitoring scenes.
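The abstract names NWD Loss but gives no formula. As a minimal sketch, assuming the Normalized Gaussian Wasserstein Distance as commonly defined for tiny-object detection: each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4), the squared 2-Wasserstein distance between two such Gaussians reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors, and NWD = exp(-sqrt(W²)/C). The normalizing constant C = 12.8 and the helper names `nwd` and `nwd_loss` are hypothetical; the patent does not state C or whether NWD is combined with other box losses.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a 2D Gaussian N((cx, cy), diag(w^2/4, h^2/4)).
    ASSUMPTION: c=12.8 is a placeholder; the constant is dataset-dependent.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = (ax - bx) ** 2 + (ay - by) ** 2 + ((aw - bw) / 2) ** 2 + ((ah - bh) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c=12.8):
    """Box regression loss: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(pred, target, c)


# Two 8x8 boxes whose centers are 10 px apart do not overlap (IoU = 0),
# but the NWD loss still distinguishes "near" from "far".
print(nwd_loss((50, 50, 8, 8), (50, 50, 8, 8)))  # 0.0
print(nwd_loss((50, 50, 8, 8), (60, 50, 8, 8)))
print(nwd_loss((50, 50, 8, 8), (80, 50, 8, 8)))
```

This property is what makes NWD attractive for small humanoid targets: an IoU-based loss takes the same value for every non-overlapping box pair, whereas the NWD loss keeps growing with center distance.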

Inventors

GUO LIPENG
HUANG JINHU
HUANG WUSONG

Assignees

厦门星纵物联科技有限公司 (Xiamen Milesight IoT Co., Ltd.)

Dates

Publication Date: 20260505
Application Date: 20251219

Claims (8)

1. A lightweight humanoid detection method based on an improved RTDETR, comprising: acquiring a data set for humanoid detection, and dividing the data set into a training set, a validation set and a test set; constructing an improved RTDETR model, wherein the improved RTDETR model comprises an improved RTDETR feature extraction network in which an ES Block replaces the Basic Block of the RTDETR feature extraction network before improvement, and the ES Block comprises an EfficientViT Block and a SimAM attention module connected in sequence; using NWD Loss as the loss function of the improved RTDETR model; training the improved RTDETR model with the training set, validating it with the validation set and testing it with the test set to obtain a trained improved RTDETR model; and inputting an image to be detected into the trained improved RTDETR model for inference to obtain a humanoid detection result of the image to be detected.
2. The lightweight humanoid detection method of claim 1, wherein the improved RTDETR model comprises a Conv module, a MaxPool layer and a plurality of ES Blocks, and the step of inputting the image to be detected into the trained improved RTDETR model for inference to obtain the humanoid detection result of the image to be detected comprises: after the image to be detected passes through the Conv module and the MaxPool layer, a first feature map is output; after the first feature map is input into the plurality of ES Blocks, a feature map P is obtained; and the feature map P is input into a SimAM attention module, which calculates the similarity between each pixel in the feature map P and its surrounding pixels, dynamically adjusts the weight corresponding to each pixel accordingly, and outputs a multi-scale feature map.
3. The lightweight humanoid detection method of claim 2, wherein during inference the improved RTDETR model further performs the following: the improved RTDETR feature extraction network inputs the multi-scale feature map it extracts into a hybrid encoder comprising an AIFI module and a CCFM module connected in sequence; the multi-scale feature map undergoes intra-scale feature interaction in the AIFI module, and the resulting feature map is input into the CCFM module for cross-scale feature fusion; after uncertainty-minimal query selection, the coordinates and confidences of the targets are decoded, and the features of the top N targets ranked by confidence are selected to initialize the object queries; and the decoder outputs the predicted class confidences and bounding boxes.
4. The lightweight humanoid detection method of claim 1, wherein the step of acquiring a data set for humanoid detection comprises: acquiring the data set from open-source data and security monitoring scene data, the data set containing people in different scenes; annotating the coordinates and categories of the people with a preset tool and converting the annotations into the COCO data format; and dividing the data set into a training set, a validation set and a test set according to a ratio.
5. The lightweight humanoid detection method of claim 4, further comprising: adding motion blur, color changes and/or noise to the training data used for training the improved RTDETR model, as data augmentation.
6. An electronic device comprising a memory and a processor, the memory storing at least one program, the at least one program being executed by the processor to implement the steps of the lightweight humanoid detection method of any one of claims 1 to 5.
7. A computer readable storage medium, characterized in that at least one program is stored in the storage medium, the at least one program being executed by a processor to implement the steps of the lightweight humanoid detection method of any one of claims 1 to 5.
8. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the lightweight humanoid detection method as claimed in any one of claims 1 to 5.
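Claims 1 and 2 rely on a SimAM attention module but do not give its equations. The NumPy sketch below follows the published parameter-free SimAM formulation, which weights each pixel by how strongly it deviates from the other pixels of its channel (measured through the channel's spatial mean and variance); it is an assumption that the patent uses this standard form. The function name `simam`, the omission of the batch dimension and the regularizer `e_lambda=1e-4` (the default in the SimAM reference code) are illustrative choices.

```python
import numpy as np


def simam(x, e_lambda=1e-4):
    """Parameter-free SimAM attention on one feature map of shape (C, H, W)."""
    _, h, w = x.shape
    n = h * w - 1
    mean = x.mean(axis=(1, 2), keepdims=True)
    d = (x - mean) ** 2                                # squared deviation of each pixel
    v = d.sum(axis=(1, 2), keepdims=True) / n          # per-channel variance estimate
    e_inv = d / (4 * (v + e_lambda)) + 0.5             # inverse of each pixel's minimal energy
    weights = 1.0 / (1.0 + np.exp(-e_inv))             # sigmoid -> per-pixel weight in (0, 1)
    return x * weights


# The outlier pixel receives a weight close to 1, so its value stays near 5.0;
# pixels that match their channel statistics get weights near sigmoid(0.5).
fmap = np.zeros((1, 4, 4))
fmap[0, 1, 1] = 5.0
print(simam(fmap)[0, 1, 1])
```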
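Claim 3 states that, after uncertainty-minimal query selection, the features of the top N targets ranked by confidence initialize the object queries. The NumPy sketch below shows only that inference-time top-N step; the uncertainty term used during training is not modeled, and the array shapes, the sigmoid scoring and the name `select_top_queries` are assumptions for illustration.

```python
import numpy as np


def select_top_queries(enc_features, class_logits, box_preds, num_queries):
    """Select the encoder tokens with the highest class confidence as initial queries.

    enc_features: (L, D) flattened multi-scale encoder features
    class_logits: (L, K) per-token class logits
    box_preds:    (L, 4) per-token boxes (cx, cy, w, h)
    """
    scores = 1.0 / (1.0 + np.exp(-class_logits))      # sigmoid confidence per class
    conf = scores.max(axis=1)                          # best-class confidence per token
    top_idx = np.argsort(-conf)[:num_queries]          # top-N tokens, most confident first
    return enc_features[top_idx], box_preds[top_idx], conf[top_idx]


feats = np.arange(18, dtype=float).reshape(6, 3)
logits = np.array([[0, -1], [3, 0], [-2, -2], [1, 5], [0.5, 0], [2, 2]], dtype=float)
boxes = np.random.default_rng(0).random((6, 4))
q_feats, q_boxes, q_conf = select_top_queries(feats, logits, boxes, num_queries=3)
print(q_conf)
```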
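Claim 4 converts person annotations to the COCO format and divides the data set by proportion without naming the ratio. A minimal Python sketch, assuming corner-format source boxes, a single "person" category with id 1 and a hypothetical 8:1:1 split; the helper names `to_coco_annotation` and `split_ids` are illustrative only.

```python
import random


def to_coco_annotation(ann_id, image_id, x1, y1, x2, y2, category_id=1):
    """Convert a corner-format person box into a COCO-style annotation dict.

    COCO stores boxes as [x_min, y_min, width, height] in absolute pixels.
    ASSUMPTION: category_id=1 stands for "person" in a single-class data set.
    """
    w, h = x2 - x1, y2 - y1
    return {"id": ann_id, "image_id": image_id, "category_id": category_id,
            "bbox": [x1, y1, w, h], "area": w * h, "iscrowd": 0}


def split_ids(image_ids, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle image ids reproducibly and split them into train/val/test."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * ratios[0])
    n_val = int(len(ids) * ratios[1])
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


train_ids, val_ids, test_ids = split_ids(range(100))
print(len(train_ids), len(val_ids), len(test_ids))  # 80 10 10
```

Splitting by image id rather than by annotation keeps all people from one frame in the same subset, which avoids leaking near-identical content between training and test data.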
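Claim 5 adds dynamic (motion) blur, color changes and/or noise to the training data. A NumPy sketch of such an augmentation on an (H, W, 3) uint8 image, assuming a horizontal box-filter blur with an odd kernel size, per-channel color gains and additive Gaussian noise, each applied with probability 0.5; all parameter values and function names are hypothetical, since the patent does not specify them.

```python
import numpy as np


def motion_blur(img, k=5):
    """Horizontal motion blur: mean of a k-pixel window centred on each pixel (k odd)."""
    pad = k // 2
    padded = np.pad(img.astype(np.float64), ((0, 0), (pad, pad), (0, 0)), mode="edge")
    width = img.shape[1]
    return sum(padded[:, i:i + width, :] for i in range(k)) / k


def color_jitter(img, rng, max_gain=0.2):
    """Scale each color channel by a random gain in [1 - max_gain, 1 + max_gain)."""
    gains = rng.uniform(1.0 - max_gain, 1.0 + max_gain, size=(1, 1, img.shape[2]))
    return img * gains


def gaussian_noise(img, rng, sigma=5.0):
    """Add zero-mean Gaussian noise with standard deviation sigma."""
    return img + rng.normal(0.0, sigma, size=img.shape)


def augment(img, rng, p=0.5):
    """Apply each augmentation independently with probability p; return uint8."""
    out = img.astype(np.float64)
    if rng.random() < p:
        out = motion_blur(out)
    if rng.random() < p:
        out = color_jitter(out, rng)
    if rng.random() < p:
        out = gaussian_noise(out, rng)
    return np.clip(out, 0, 255).astype(np.uint8)


rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(64, 64, 3)).astype(np.uint8)
print(augment(frame, rng).shape)  # (64, 64, 3)
```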

Description

Lightweight human shape detection method, device, medium and product based on improved RTDETR

Technical Field

The invention relates to the technical field of humanoid detection, and in particular to an improved RTDETR-based lightweight humanoid detection method, device, medium and product.

Background

Humanoid detection is an important technology in the field of security monitoring. By accurately detecting and intelligently tracking human figures, it helps discover abnormal behavior in time, trigger alarms and push anomaly information to users, so that abnormal situations can be responded to quickly. With the development of deep learning and the continuous enrichment of data sets, humanoid detection technology has advanced significantly. In practical applications, however, as scenes become more complex and diverse, the features of humanoid targets are easily affected by factors such as weather, illumination, occlusion and distance, which seriously degrades algorithm performance. In severe weather, the video picture is disturbed by rain, snow, fog and the like, reducing the visibility of human bodies; in infrared, low-light and reflective environments, the picture is disturbed by noise, flying insects and the like, blurring human body edges; and in occlusion and small-target scenes, human target features are severely lost. All of these lead to false detections and missed detections. In particular, the computing power and memory of the AI chip in a security monitoring system are limited, which strictly constrains the parameter count and computational cost of the humanoid detection model and poses a serious challenge to algorithm performance.
The RTDETR model uses the self-attention mechanism of the Transformer, which allows the algorithm to capture global context information and to predict bounding boxes and class labels directly from images, yielding a real-time end-to-end object detector. Compared with mainstream YOLO detectors, RTDETR requires no additional anchor boxes or complex post-processing steps and shows better adaptability and generalization under complex weather, illumination, occlusion and similar conditions. However, in security monitoring scenes, multi-scale humanoid targets, complex and changeable weather and scenes, and the limited resources of low-power edge devices still pose major challenges for the practical engineering deployment of the algorithm. It is therefore particularly important in the security monitoring field to provide a lightweight humanoid detection method and system suitable for low-power edge devices that improve humanoid detection accuracy in complex scenes, including the detection of small humanoid targets.

Disclosure of Invention

Embodiments of the invention provide an improved RTDETR-based lightweight human shape detection method, device, medium and product for improving human shape detection accuracy, including the detection of small human-shape targets, in complex security monitoring scenes.
To achieve the above object, in one aspect, an improved RTDETR-based lightweight humanoid detection method is provided, comprising: acquiring a data set for humanoid detection, and dividing the data set into a training set, a validation set and a test set; constructing an improved RTDETR model, wherein the improved RTDETR model comprises an improved RTDETR feature extraction network in which an ES Block replaces the Basic Block of the RTDETR feature extraction network before improvement, and the ES Block comprises an EfficientViT Block and a SimAM attention module connected in sequence; using NWD Loss as the loss function of the improved RTDETR model; training the improved RTDETR model with the training set, validating it with the validation set and testing it with the test set to obtain a trained improved RTDETR model; and inputting an image to be detected into the trained improved RTDETR model for inference to obtain a humanoid detection result of the image to be detected. Preferably, the improved RTDETR model comprises a Conv module, a MaxPool layer and a plurality of ES Blocks, and the step of inputting the image to be detected into the trained improved RTDETR model for inference to obtain the humanoid detection result of the image to be detected comprises: after the image to be detected passes through the Conv module and the MaxPool layer, a first feature map is output; after the first feature map is input into the plurality of ES Blocks, a feature map P is obtained; and the feature map P is input into a SimAM attention module, which dynamically adjusts the weight information corresponding to each pixel