
CN-117058517-B - Safety helmet detection method, device and medium based on YOLOv5 optimization model

CN117058517B

Abstract

The application relates to a safety helmet detection method, device and medium based on a YOLOv5 optimization model. The method comprises: acquiring safety helmet detection images; inputting the safety helmet detection images as training samples into the YOLOv5 optimization model for training to obtain a safety helmet detection model, wherein the YOLOv5 optimization model comprises a trunk structure, a neck structure and a head structure, the trunk structure comprises a plurality of CBS modules and a plurality of SwinT modules, and each SwinT module is used for extracting hierarchical features from a second feature map output after the safety helmet detection images are processed by each CBS module; and inputting employee images collected in a working scene into the safety helmet detection model to obtain safety helmet detection results. The method solves the problem that the existing YOLOv5 algorithm in the related art has low accuracy in small target detection, and improves the accuracy of detecting whether employees wear safety helmets in power-related work scenes.

Inventors

  • LIN XIANG
  • FANG JIAN
  • TIAN YAN
  • ZHANG MIN
  • YANG FAN

Assignees

  • Guangzhou Power Supply Bureau of Guangdong Power Grid Co., Ltd. (广东电网有限责任公司广州供电局)

Dates

Publication Date
2026-05-08
Application Date
2023-07-31

Claims (7)

  1. A safety helmet detection method based on a YOLOv5 optimization model, characterized in that the method comprises: acquiring a safety helmet detection image; inputting the safety helmet detection image as a training sample into the YOLOv5 optimization model for training to obtain a safety helmet detection model, wherein the YOLOv5 optimization model comprises a trunk structure, a neck structure and a head structure which are sequentially connected; the trunk structure is used for performing feature extraction on the safety helmet detection image to obtain a first feature map, and comprises a plurality of CBS modules and a plurality of SwinT modules, each SwinT module being used for performing hierarchical feature extraction on a second feature map output after the safety helmet detection image is processed by each CBS module; the neck structure is used for performing multi-scale feature fusion on the first feature map to obtain a third feature map; and the head structure is used for performing prediction on the basis of the third feature map to obtain a detection result; wherein: the plurality of SwinT modules comprise a first SwinT module, a second SwinT module, a third SwinT module and a fourth SwinT module; the trunk structure further comprises an SE module connected to the output end of the third SwinT module, the SE module being specifically configured to perform a global average pooling operation on a sixth feature map output after the safety helmet detection image is processed by the third SwinT module to output a one-dimensional vector, calculate a weight value through an excitation layer composed of two fully connected layers based on the one-dimensional vector, and multiply the weight value by the pixel values of the sixth feature map to obtain an output result; the trunk structure further comprises a convolutional block attention module connected to the output end of the fourth SwinT module, the convolutional block attention module comprising a channel attention module and a spatial attention module, wherein the channel attention module is used for adaptively correcting a seventh feature map output after the safety helmet detection image is processed by the fourth SwinT module to generate an eighth feature map, and the spatial attention module is used for correcting the eighth feature map to output a ninth feature map, in the following calculation mode: W_C = σ(MLP(MaxPool(F)) + MLP(AvgPool(F))) = σ(MLP(F_max^C) + MLP(F_avg^C)), F' = W_C ⊗ F; W_S = σ(f([MaxPool(F'); AvgPool(F')])), F″ = W_S ⊗ F'; wherein ⊗ denotes element-wise multiplication, F represents the seventh feature map, F' represents the eighth feature map, F″ represents the ninth feature map, F_max^C is the channel maximum pooling feature map, F_avg^C is the channel average pooling feature map, MLP is the multi-layer perceptron, MaxPool() is the maximum pooling function, AvgPool() is the average pooling function, σ is the Sigmoid activation function, f() is a standard convolution layer, and W_C and W_S are the channel attention weight and the spatial attention weight, respectively; and inputting employee images acquired in a working scene into the safety helmet detection model to obtain a safety helmet detection result.
  2. The method of claim 1, wherein each SwinT module comprises a normalization layer, a window-based multi-head self-attention layer, a shifted-window-based multi-head self-attention layer, and a multi-layer perceptron, and the SwinT module is configured such that: the second feature map output after the safety helmet detection image is processed by the CBS module passes through the normalization layer, the window-based multi-head self-attention layer, the normalization layer and the multi-layer perceptron to obtain a fourth feature map; and the fourth feature map passes through the normalization layer, the shifted-window-based multi-head self-attention layer, the normalization layer and the multi-layer perceptron to obtain a fifth feature map.
  3. The method of claim 1, wherein inputting the safety helmet detection image as a training sample into the YOLOv5 optimization model for training to obtain the safety helmet detection model comprises: performing data enhancement processing on the safety helmet detection image by using a mosaic method; and inputting the data-enhanced safety helmet detection image into the YOLOv5 optimization model for training to obtain the safety helmet detection model.
  4. The method according to claim 3, wherein inputting the safety helmet detection image as a training sample into the YOLOv5 optimization model for training to obtain the safety helmet detection model further comprises: inputting the data-enhanced safety helmet detection image into the YOLOv5 optimization model for training to obtain a training result; and adjusting model parameters based on the training result to perform iterative optimization so as to obtain the safety helmet detection model.
  5. A safety helmet detection device based on a YOLOv5 optimization model, characterized in that the device comprises: an acquisition module, used for acquiring a safety helmet detection image; a training module, used for inputting the safety helmet detection image as a training sample into the YOLOv5 optimization model for training to obtain a safety helmet detection model, wherein the YOLOv5 optimization model comprises a trunk structure, a neck structure and a head structure which are sequentially connected; the trunk structure is used for performing feature extraction on the safety helmet detection image to obtain a first feature map, and comprises a plurality of CBS modules and a plurality of SwinT modules, each SwinT module being used for performing hierarchical feature extraction on a second feature map output after the safety helmet detection image is processed by each CBS module; the neck structure is used for performing multi-scale feature fusion on the first feature map to obtain a third feature map; and the head structure is used for performing prediction on the basis of the third feature map to obtain a detection result; wherein: the plurality of SwinT modules comprise a first SwinT module, a second SwinT module, a third SwinT module and a fourth SwinT module; the trunk structure further comprises an SE module connected to the output end of the third SwinT module, the SE module being specifically configured to perform a global average pooling operation on a sixth feature map output after the safety helmet detection image is processed by the third SwinT module to output a one-dimensional vector, calculate a weight value through an excitation layer composed of two fully connected layers based on the one-dimensional vector, and multiply the weight value by the pixel values of the sixth feature map to obtain an output result; the trunk structure further comprises a convolutional block attention module connected to the output end of the fourth SwinT module, the convolutional block attention module comprising a channel attention module and a spatial attention module, wherein the channel attention module is used for adaptively correcting a seventh feature map output after the safety helmet detection image is processed by the fourth SwinT module to generate an eighth feature map, and the spatial attention module is used for correcting the eighth feature map to output a ninth feature map, in the following calculation mode: W_C = σ(MLP(MaxPool(F)) + MLP(AvgPool(F))) = σ(MLP(F_max^C) + MLP(F_avg^C)), F' = W_C ⊗ F; W_S = σ(f([MaxPool(F'); AvgPool(F')])), F″ = W_S ⊗ F'; wherein ⊗ denotes element-wise multiplication, F represents the seventh feature map, F' represents the eighth feature map, F″ represents the ninth feature map, F_max^C is the channel maximum pooling feature map, F_avg^C is the channel average pooling feature map, MLP is the multi-layer perceptron, MaxPool() is the maximum pooling function, AvgPool() is the average pooling function, σ is the Sigmoid activation function, f() is a standard convolution layer, and W_C and W_S are the channel attention weight and the spatial attention weight, respectively; and a detection module, used for inputting employee images acquired in a working scene into the safety helmet detection model to obtain a safety helmet detection result.
  6. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 4 when executing the computer program.
  7. A computer readable storage medium, on which a computer program is stored, which computer program, when being executed by a processor, implements the method of any one of claims 1 to 4.
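The channel and spatial attention computation recited in claims 1 and 5 (W_C from pooled channel statistics through a shared MLP, then W_S from pooled spatial maps through a convolution) can be sketched numerically. The following is a minimal NumPy illustration, not the patented implementation: all weights (W1, W2, k_max, k_avg) are hypothetical placeholders, and the spatial branch sums two single-channel convolutions, which is linearly equivalent to one convolution over the concatenated pooled maps.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    # naive 'same'-padded 2-D convolution of a single-channel map x with kernel k
    kh, kw = k.shape
    xp = np.pad(x, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

def cbam(F, W1, W2, k_max, k_avg):
    """(C, H, W) seventh feature map F -> eighth map F' -> ninth map F''."""
    C = F.shape[0]
    # channel attention: W_C = sigmoid(MLP(F_max^C) + MLP(F_avg^C))
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)               # two-layer perceptron with ReLU
    f_max = F.reshape(C, -1).max(axis=1)                       # channel maximum pooling
    f_avg = F.reshape(C, -1).mean(axis=1)                      # channel average pooling
    F1 = sigmoid(mlp(f_max) + mlp(f_avg))[:, None, None] * F   # F' = W_C (x) F
    # spatial attention: W_S = sigmoid(f([MaxPool(F'); AvgPool(F')]))
    w_s = sigmoid(conv2d_same(F1.max(axis=0), k_max) + conv2d_same(F1.mean(axis=0), k_avg))
    return w_s[None, :, :] * F1                                # F'' = W_S (x) F'
```

Because both W_C and W_S pass through a Sigmoid, every element of the ninth feature map is the corresponding element of the seventh feature map scaled by factors in (0, 1), which is the "adaptive correction" the claims describe.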

Description

Safety helmet detection method, device and medium based on YOLOv5 optimization model

Technical Field

The application relates to the field of target detection, and in particular to a safety helmet detection method, device and medium based on a YOLOv5 optimization model.

Background

At present, with the rapid development of society, the demand for electric power is increasingly urgent and power-related work is becoming more and more common, so safety issues deserve correspondingly greater attention. The safety helmet, as protective equipment for the head, is indispensable safety equipment in electric power work. However, because worksite surveillance cameras are usually mounted at a high position, a worker's safety helmet appears as a relatively small target in the image. In addition, working-environment factors such as weather, illumination and personnel density place higher demands on the target detection algorithm used. Small target detection has long been an important research topic in the field of computer vision. Existing object detection methods fall mainly into two types: regression-based one-stage algorithms, and two-stage algorithms based on candidate regions; the former offer higher real-time performance, while the latter achieve slightly higher precision at a lower speed. YOLO (You Only Look Once) is a one-stage algorithm. YOLOv5, published by Ultralytics in 2020, is an excellent version of the YOLO series: compared with other versions it can be applied to a wider range of fields and is more flexible, but its detection capability for small targets is not outstanding and its accuracy is not high. Aiming at the problem of low accuracy of small target detection based on the YOLOv5 algorithm in the related art, no effective solution has yet been proposed.

Disclosure of Invention

Based on the foregoing, it is necessary to provide a safety helmet detection method, device and medium based on a YOLOv5 optimization model.
In a first aspect, an embodiment of the present application provides a safety helmet detection method based on a YOLOv5 optimization model, where the method includes: acquiring a safety helmet detection image; inputting the safety helmet detection image as a training sample into the YOLOv5 optimization model for training to obtain a safety helmet detection model, wherein the YOLOv5 optimization model comprises a trunk structure, a neck structure and a head structure which are sequentially connected; the trunk structure is used for performing feature extraction on the safety helmet detection image to obtain a first feature map, and comprises a plurality of CBS modules and a plurality of SwinT modules, each SwinT module being used for performing hierarchical feature extraction on a second feature map output after the safety helmet detection image is processed by each CBS module; the neck structure is used for performing multi-scale feature fusion on the first feature map to obtain a third feature map; and the head structure is used for performing prediction on the basis of the third feature map to obtain a detection result; and inputting employee images acquired in a working scene into the safety helmet detection model to obtain a safety helmet detection result.
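The hierarchical feature extraction performed by a SwinT module (normalization, window-based self-attention, normalization, multi-layer perceptron, then the same sequence with a shifted window) can be sketched as follows. This is a deliberately simplified NumPy illustration under assumed shapes: single-head attention, no relative position bias, no attention masking after the cyclic shift, and a single matrix standing in for the multi-layer perceptron; the projection matrices Wq, Wk, Wv, Wmlp are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    # normalization layer over the channel dimension (no learned scale/shift here)
    return (x - x.mean(axis=-1, keepdims=True)) / np.sqrt(x.var(axis=-1, keepdims=True) + eps)

def window_self_attention(x, Wq, Wk, Wv, win=2):
    # self-attention computed independently inside each non-overlapping win x win window
    H, W, C = x.shape
    out = np.zeros_like(x)
    for i in range(0, H, win):
        for j in range(0, W, win):
            t = x[i:i + win, j:j + win].reshape(-1, C)   # window -> token sequence
            q, k, v = t @ Wq, t @ Wk, t @ Wv
            out[i:i + win, j:j + win] = (softmax(q @ k.T / np.sqrt(C)) @ v).reshape(win, win, C)
    return out

def swin_block(x, Wq, Wk, Wv, Wmlp, win=2, shift=0):
    # LN -> (shifted-)window attention -> residual, then LN -> MLP -> residual
    xs = np.roll(x, (-shift, -shift), axis=(0, 1))       # cyclic shift for the shifted window
    y = xs + window_self_attention(layer_norm(xs), Wq, Wk, Wv, win)
    y = y + layer_norm(y) @ Wmlp                         # single-matrix stand-in for the MLP
    return np.roll(y, (shift, shift), axis=(0, 1))
```

Running swin_block first with shift=0 and then with shift=win//2 mirrors the fourth-feature-map / fifth-feature-map sequence described above: the shift lets information flow between windows that the first pass kept separate.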
In one embodiment, each SwinT module includes a normalization layer, a window-based multi-head self-attention layer, a shifted-window-based multi-head self-attention layer, and a multi-layer perceptron, and the SwinT module is specifically configured such that: the second feature map output after the safety helmet detection image is processed by the CBS module passes through the normalization layer, the window-based multi-head self-attention layer, the normalization layer and the multi-layer perceptron to obtain a fourth feature map; and the fourth feature map passes through the normalization layer, the shifted-window-based multi-head self-attention layer, the normalization layer and the multi-layer perceptron to obtain a fifth feature map. In one embodiment, the plurality of SwinT modules includes a first SwinT module, a second SwinT module, a third SwinT module and a fourth SwinT module, and the trunk structure further includes an SE module connected to the output end of the third SwinT module, where the SE module is specifically configured to: perform a global average pooling operation on the sixth feature map output after the safety helmet detection image is processed by the third SwinT module, and output a one-dimensional vector; calculate a weight value through an excitation layer consisting of two fully connected layers based on the one-dimensional vector; and multiply the weight value by the pixel values of the sixth feature map to obtain an output result. In one embodiment, the trunk structure further includes a convolution block attention module, where the convolution block
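The SE module's three steps described above (global average pooling to a one-dimensional vector, an excitation layer of two fully connected layers, and multiplication back onto the pixel values) reduce to a few array operations. A minimal NumPy sketch under assumed weight shapes, where W1 and W2 are hypothetical fully connected weights with a channel-reduction ratio of 2:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_module(F, W1, W2):
    """Squeeze-and-Excitation over a (C, H, W) feature map (the sixth feature map)."""
    C = F.shape[0]
    z = F.reshape(C, -1).mean(axis=1)            # squeeze: global average pooling -> 1-D vector
    w = sigmoid(W2 @ np.maximum(W1 @ z, 0.0))    # excitation: two FC layers (ReLU, then Sigmoid)
    return w[:, None, None] * F                  # scale: multiply channel weights by pixel values
```

Because the excitation ends in a Sigmoid, each channel of the output is the input channel scaled by a weight in (0, 1), which is how the module re-weights informative channels before the feature map continues through the trunk.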