
CN-121982683-A - Method for detecting wearing of passenger safety belt in commercial vehicle based on video stream

CN 121982683 A

Abstract

The invention discloses a method for detecting the wearing of a passenger safety belt in a commercial vehicle based on video stream time sequence feature fusion. In the target detection stage, an improved YOLOv model processes single-frame video images, introducing geometric-constraint-based DIoU-NMS in place of traditional NMS; using the double constraint of overlapping area and center point distance, it accurately locates dense crowds and extracts the upper-body region of interest (ROI) of each passenger without relying on temporal information. In the classification and discrimination stage, video stream time sequence information is introduced to construct a classification model based on time sequence feature fusion: the ROIs of consecutive frames are input into an improved EfficientNetV2 model to extract features, and the wearing state of the safety belt is judged comprehensively by combining time sequence context information. The invention not only solves the problem of spatial occlusion through DIoU-NMS, but also eliminates the classification jitter caused by single-frame image blurring or abrupt posture change by exploiting time sequence consistency, significantly improving detection robustness and accuracy in complex dynamic scenes at low computational cost.

Inventors

  • SUN LIBO
  • TANG JIACHENG
  • ZHANG ZEYUN
  • QIN WENHU

Assignees

  • Southeast University (东南大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-14

Claims (8)

  1. A method for detecting the wearing of a passenger safety belt in a commercial vehicle based on video stream, characterized by combining a target detection algorithm based on geometric constraint with an image classification strategy that introduces time sequence information, so as to solve the problems of target loss caused by dense occlusion in a commercial vehicle and single-frame classification jitter, the method comprising the following steps:
     step S1, acquiring continuous video stream data of a commercial vehicle over different time periods and under different illumination and road conditions using the vehicle's on-board camera device, extracting images from the continuous video stream data by a video frame extraction technique, constructing, through manual sorting and labeling, a passenger target detection dataset containing dense crowd scenes and a safety belt wearing detection dataset containing time-sequence-related information, and dividing the datasets into a training set, a validation set and a test set;
     step S2, training a YOLOv passenger detection model incorporating the geometric-constraint-based distance intersection over union non-maximum suppression (DIoU-NMS) post-processing technique using the passenger target detection dataset constructed in step S1, and obtaining the optimal weight parameters of the passenger detection model through the double constraint of overlapping area and center point distance;
     step S3, inputting the continuous video stream monitored in real time inside the commercial vehicle into the passenger detection model obtained in step S2, detecting passengers in video frames frame by frame, and extracting an upper-body region of interest (ROI) image sequence of each passenger from the continuous video frames according to the coordinates of high-confidence detection boxes;
     step S4, training the improved EfficientNetV2 deep convolutional neural network model using the safety belt wearing detection dataset constructed in step S1, and obtaining, through transfer learning, the optimal weight parameters of an image classification model capable of extracting fine features, to serve as a safety belt wearing feature extractor; and
     step S5, inputting the continuous multi-frame passenger upper-body ROI images extracted in step S3 into the EfficientNetV2 model trained in step S4 to extract feature vectors, introducing a video stream time sequence analysis strategy, and performing weighted fusion and comprehensive judgment of the safety belt wearing state by combining the time sequence context information of consecutive frames, so as to output the safety belt wearing detection result.
  2. The method for detecting the wearing of a passenger safety belt in a vehicle based on video stream according to claim 1, wherein the process of constructing the datasets in step S1 comprises:
     S1-1, installing a wide-angle high-definition camera at the top of the commercial vehicle, recording the in-vehicle scene in real time after the vehicle is started using a Python script, covering different illumination conditions in daytime and at night, sorting and storing the recordings as in-vehicle passenger video data, screening representative in-vehicle videos, and acquiring passenger target detection dataset images at fixed intervals using a video frame extraction script;
     S1-2, constructing the passenger detection dataset, namely marking a rectangular box around the passenger upper-body area in each picture using the LabelImg tool, setting the label to Passenger, and generating a corresponding txt-format label file whose file name is consistent with the picture name, to obtain the passenger target detection dataset;
     S1-3, constructing the safety belt wearing image classification dataset, namely preparing a plurality of continuous video frame sequences containing the passenger upper-body area, ensuring coverage of both worn and unworn safety belt samples under different illumination and clothing backgrounds, extracting the frame sequences from the original video at equal intervals and storing them named as 'frame number.jpg', manually screening and classifying the frame sequences into 'SeatBelt' and 'No SeatBelt', and finally recording the folder path, sampling start frame and classification result of each sequence in a designated TXT index file, to obtain the safety belt wearing image classification dataset containing time sequence information, which is divided into a training set, a test set and a validation set in the proportion 7:2:1.
  3. The method for detecting the wearing of a vehicle passenger safety belt based on video stream according to claim 1, wherein in step S2, training is performed on a YOLOv network integrated with the distance intersection over union non-maximum suppression (DIoU-NMS) post-processing technique using the passenger detection dataset constructed in step S1; DIoU-NMS optimizes the target box screening mechanism by means of both the overlapping area and the center point distance between candidate boxes, and the optimal weight parameters of the passenger detection model are finally obtained through 100 rounds of iterative training.
  4. The video stream-based in-vehicle occupant seat belt wear detection method according to claim 1, wherein the distance intersection over union non-maximum suppression DIoU-NMS calculates, for each detected target box b_i and every other box b_j, the distance intersection ratio DIoU as shown in formula (1):

     DIoU = IoU − ρ²(b_i, b_j) / c²   (1)

     wherein IoU is the intersection over union of the two boxes; ρ(b_i, b_j) is the Euclidean distance between the center points of the two prediction boxes; and c is the diagonal length of the minimum enclosing region that can cover both boxes. Under DIoU-NMS, when the overlapping area of two boxes is large, it is further judged how far apart their center points are: if the IoU of two boxes is high but their center points are far apart, DIoU-NMS tends to regard them as two overlapping but distinct targets and preserves both; if the IoU is high and the center points nearly coincide, the boxes are judged redundant and the lower-scoring box is removed according to formula (2):

     s_i = s_i, if DIoU(M, b_i) < ε;   s_i = 0, if DIoU(M, b_i) ≥ ε   (2)

     wherein s_i is the score of candidate box b_i, M is the currently highest-scoring box, and ε is a set threshold. If the IoU of two boxes is large but their center points are far apart, the value calculated by formula (1) is smaller than ε and formula (2) preserves the score of the box, indicating two separate individuals at different physical locations; whereas if the IoU is large and the center points are extremely close, formula (1) yields a value greater than ε, the box is judged a duplicate detection of the same target, and its score is reset.
  5. The method for detecting the wearing of a passenger safety belt in a commercial vehicle based on video stream according to claim 1, wherein in step S3, passenger target detection and region extraction are carried out using the YOLOv-based passenger detection model: in-vehicle monitoring video frames decoded in real time are taken as input and passenger upper-body ROI images are produced as output; multiscale features of the image are extracted using the CSPDarknet backbone network and PANet of YOLOv to generate a prediction tensor comprising passenger confidence, class probability and bounding box position coordinates; in the post-processing stage, candidate boxes are screened by the DIoU-NMS algorithm, which effectively distinguishes occluded passenger targets with close spatial positions by jointly computing the IoU values and the Euclidean distance between the center points of candidate boxes, thereby ensuring accurate passenger counting in dense scenes; finally, the passenger upper-body region is cropped according to the screened bounding box coordinates (c_x, c_y, w, h) and uniformly scaled to a fixed size as the input to the subsequent safety belt wearing state classification model.
  6. The method for detecting the wearing of a seat belt by a vehicle passenger based on video stream according to claim 1, wherein the training of the improved EfficientNetV2 in step S4 specifically comprises:
     S4-1, inputting the safety belt wearing image classification dataset of step S1 into the improved EfficientNetV2 feature extraction backbone, and outputting a high-dimensional classification feature vector carrying spatial detail and temporal context;
     S4-2, inputting the high-dimensional classification feature vector output in step S4-1 into a multi-layer perceptron (MLP) classification head at the end of the model, and outputting, through fully connected layer mapping and Softmax activation, the classification probability of whether the passenger wears a safety belt in the current frame;
     S4-3, in the training stage, computing the error between the prediction result and the ground-truth label using a cross entropy loss function, adjusting the weight parameters of the T-FusedConv modules and the MLP layers with the Adam optimizer based on error back propagation, evaluating accuracy and recall on the validation set after each iteration round, and retaining the model parameters that perform best on the time sequence classification task.
  7. The method for detecting the wearing of a vehicle occupant safety belt based on video stream according to claim 6, wherein the improvement of EfficientNetV2 in step S4-1 is to replace the convolution modules in the backbone network with T-FusedConv modules that fuse timing information; the T-FusedConv module first realizes time sequence interaction through a TSM block: the channel dimension of the input feature map is divided into two parts by a split operation, 1/8 of the channel features are extracted into a feature cache path and stored in a feature cache pool, while the remaining 7/8 of the channel features are kept unchanged; when the current frame is processed, the 7/8 features of the current frame are spliced with the 1/8 features of the previous frame extracted from the feature cache pool by a concatenation operation, as shown in formula (3):

     F_t^out = Concat(F_{t-1}^{1/8}, F_t^{7/8})   (3)

     wherein F_t^out denotes the feature map output at the current time t after time sequence interaction and fusion; Concat(·) denotes the splice operation performed along the channel dimension; F_t^{7/8} denotes the 7/8 channel features of the current time t retained without timing shift; and F_{t-1}^{1/8} denotes the 1/8 channel features of the previous frame at time t−1 cached in the feature cache pool.

     The fused temporal features then enter the spatial feature extraction path and pass in turn through a 3×3 convolution layer, an SE attention module and a 1×1 pointwise convolution layer; the SE module adaptively rescales the feature channel responses by modeling the dependency relationships among channels, a process expressed as formula (4):

     F_out = F ⊗ σ(F_ex(z, W))   (4)

     wherein F_out denotes the output feature map after adaptive recalibration by the SE module; ⊗ denotes the channel-by-channel weighted scaling operation; F denotes the feature map input to the module; σ denotes the Sigmoid activation function; F_ex(·) denotes the excitation operation used to fit the nonlinear relationships between channels; z denotes the channel statistics containing global information obtained after the squeeze (compression) operation on F; and W denotes the set of weight parameters of the fully connected layers in the excitation operation.

     Finally, the feature branch processed by convolution and attention is residually connected with the original input branch, and a high-dimensional classification feature vector carrying spatial detail and time sequence context is output.
  8. The method for detecting the wearing of a vehicle occupant safety belt based on video stream according to claim 6, wherein in step S4-3, the maximum number of training iterations (epochs) is 100, the input picture size is 224 × 224, the number of pictures input to the model per batch is 32, and the model learning rate is 0.0001; the weight parameters are adjusted using the cross entropy loss function calculated as formula (5):

     L = −(1/N) Σ_{i=1}^{N} [ y_i log(p_i) + (1 − y_i) log(1 − p_i) ]   (5)

     wherein y_i is the real label and p_i is the probability predicted by the model.
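The 7:2:1 train/test/validation split of claim 2 (step S1-3) can be sketched as follows; the function name and the fixed seed are illustrative, not part of the patent:

```python
import random

def split_dataset(sequence_ids, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle sequence identifiers (e.g. folder paths from the TXT index
    file) and split them 7:2:1 into train / test / validation subsets."""
    ids = list(sequence_ids)
    random.Random(seed).shuffle(ids)  # deterministic shuffle for reproducibility
    n_train = int(len(ids) * ratios[0])
    n_test = int(len(ids) * ratios[1])
    train = ids[:n_train]
    test = ids[n_train:n_train + n_test]
    val = ids[n_train + n_test:]
    return train, test, val
```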
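The ROI extraction of claim 5, which crops the passenger upper body from a screened (c_x, c_y, w, h) box and scales it to a fixed size, might look like the following minimal sketch. Normalized YOLO-style box coordinates and nearest-neighbour resizing are assumptions; a real pipeline would use cv2.resize:

```python
import numpy as np

def crop_upper_body_roi(frame, box, out_size=(224, 224)):
    """Crop a (c_x, c_y, w, h) box (normalized coordinates) from a frame and
    resize it with nearest-neighbour sampling to the classifier input size."""
    h_img, w_img = frame.shape[:2]
    cx, cy, w, h = box
    # Convert center/size box to clipped pixel corners
    x1 = max(0, int((cx - w / 2) * w_img))
    y1 = max(0, int((cy - h / 2) * h_img))
    x2 = min(w_img, int((cx + w / 2) * w_img))
    y2 = min(h_img, int((cy + h / 2) * h_img))
    roi = frame[y1:y2, x1:x2]
    # Nearest-neighbour resize to a fixed size
    ys = (np.arange(out_size[0]) * roi.shape[0] / out_size[0]).astype(int)
    xs = (np.arange(out_size[1]) * roi.shape[1] / out_size[1]).astype(int)
    return roi[ys][:, xs]
```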
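For binary labels, the cross entropy loss of claim 8, formula (5), reduces to the following; the function name and the clipping epsilon are illustrative:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """L = -(1/N) * sum(y*log(p) + (1-y)*log(1-p)), as in formula (5)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```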
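Claim 1, step S5, calls for weighted fusion of per-frame classification results over consecutive frames. The patent does not specify the weighting scheme; one plausible sketch uses exponentially decaying weights so that a single blurred or misclassified frame cannot flip the decision (the decay factor and threshold are assumptions):

```python
import numpy as np

def fuse_seatbelt_scores(frame_probs, decay=0.8, threshold=0.5):
    """Hypothetical temporal fusion: combine per-frame 'belt worn'
    probabilities, giving the newest frame the largest weight."""
    probs = np.asarray(frame_probs, dtype=float)
    n = len(probs)
    weights = decay ** np.arange(n - 1, -1, -1)  # oldest frame weighted least
    fused = float(np.dot(weights, probs) / weights.sum())
    return fused, fused >= threshold
```

Note how one low-confidence frame among confident ones does not change the fused verdict, which mirrors the jitter suppression claimed for the time sequence analysis strategy.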

Description

Method for detecting wearing of passenger safety belt in commercial vehicle based on video stream

Technical Field

The invention relates to the technical fields of computer vision, target detection, image classification and intelligent traffic safety monitoring, and in particular to a method for detecting the wearing of a passenger safety belt in a commercial vehicle based on video stream time sequence feature fusion. The method aims to solve the occlusion and blurring problems of complex in-vehicle environments through computer vision and deep learning techniques, and to accurately monitor and judge the wearing state of passenger safety belts, so as to improve the running safety and intelligent management level of commercial vehicles.

Background

Commercial vehicles are an important carrier of road transportation, and their operational safety is directly related to public safety and social stability. In traffic accidents the safety belt is known as a "life belt": it can effectively prevent passengers from being thrown out of the vehicle or suffering secondary collisions with objects inside it during emergency braking or crashes. With the rapid development of Intelligent Transportation Systems (ITS) and Internet of Vehicles technology, commercial vehicle supervision is gradually transforming in digital, visual and intelligent directions. Accurately monitoring the wearing condition of passenger safety belts can significantly improve the level of driving safety management, reduce the casualty rate of traffic accidents, and meet increasingly strict traffic regulation requirements.
The wearing state of the safety belt is an important index of in-vehicle safety monitoring. In particular, during long-distance driving, problems such as passengers unfastening the belt midway, wearing it falsely, or wearing states that are hard to identify due to clothing occlusion directly affect the effectiveness of a monitoring system. The traditional manual spot-check method is inefficient, cannot cover whole road sections, and easily distracts the driver or provokes objections from passengers. With the expansion of commercial vehicle operations and rising safety standards, it is important to establish a scientific, real-time and non-intrusive automatic safety belt wearing detection method so as to better realize intelligent safety management of drivers and passengers. In recent years, deep learning techniques have made remarkable progress in computer vision, exhibiting powerful performance in particular on object detection and image classification tasks. However, three major problems remain in commercial vehicle scenes: (1) in the crowded interior of a commercial vehicle, passengers seriously occlude each other and sit in close spatial proximity, so the non-maximum suppression (NMS) of traditional algorithms easily deletes valid targets by mistake, and detection accuracy is relatively low; (2) simple static-image detection techniques lack the time sequence association information between frames of a video stream and are affected by illumination change or vehicle jolting, so detection results easily flicker or are misjudged (for example, clothing texture mistaken for a safety belt); and (3) existing high-precision models are computationally huge and difficult to run in real time on vehicle-mounted terminal devices with limited computing power, preventing wide deployment on vehicles in actual operation.
To remedy these defects of the prior art, the invention provides a method for detecting the wearing of a passenger safety belt in a commercial vehicle based on video stream time sequence feature fusion. The method first uses a YOLOv model whose post-processing is improved by introducing geometric-constraint-based DIoU-NMS (distance intersection over union non-maximum suppression) to accurately detect and locate dense passengers in the vehicle. By combining the double constraint of overlapping area and center point distance, DIoU-NMS can effectively distinguish passenger targets at close spatial positions and extract the upper-body region of interest (ROI) of each passenger. Next, the invention innovatively constructs an improved EfficientNetV2 classification network that incorporates a Temporal Shift Module (TSM). The model takes a continuous multi-frame passenger ROI image sequence as input and utilizes the TSM to realize interaction of time-dimension information with zero parameter increase, so that the dynamic characteristics and time sequence consistency of safety belt wearing are effectively captured. The invention improves the YOLOv target detection algorithm and cascades it with a time sequence feature fusion classification network, solving the problem of dense shielding.
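The DIoU-NMS screening described above (formulas (1) and (2) of claim 4) can be sketched as follows; the threshold value and the greedy loop structure are illustrative:

```python
import numpy as np

def diou(box_a, box_b):
    """Distance-IoU between two boxes given as (x1, y1, x2, y2):
    DIoU = IoU - rho^2 / c^2 (formula (1))."""
    # Intersection and union areas
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    iou = inter / (area_a + area_b - inter)
    # Squared Euclidean distance between the two center points (rho^2)
    cxa, cya = (box_a[0] + box_a[2]) / 2, (box_a[1] + box_a[3]) / 2
    cxb, cyb = (box_b[0] + box_b[2]) / 2, (box_b[1] + box_b[3]) / 2
    rho2 = (cxa - cxb) ** 2 + (cya - cyb) ** 2
    # Squared diagonal of the minimum enclosing box (c^2)
    ex1, ey1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    ex2, ey2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c2 = (ex2 - ex1) ** 2 + (ey2 - ey1) ** 2
    return iou - rho2 / c2

def diou_nms(boxes, scores, eps=0.5):
    """Greedy NMS: keep the highest-scoring box, suppress the others whose
    DIoU with it exceeds the threshold eps (formula (2))."""
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        i = order[0]
        keep.append(int(i))
        remaining = [j for j in order[1:] if diou(boxes[i], boxes[j]) <= eps]
        order = np.array(remaining, dtype=int)
    return keep
```

Boxes that overlap heavily but have distant centers get a lower DIoU than their plain IoU, so they survive the threshold — the mechanism the patent relies on to keep occluded passengers.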
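The TSM channel splice of formula (3) — caching 1/8 of the channels per frame and concatenating them with the current frame's remaining 7/8 — can be sketched as below. The buffer class and the zero-padding of the very first frame are assumptions; a production version would operate on framework tensors inside the T-FusedConv module:

```python
import numpy as np

class TemporalShiftBuffer:
    """Minimal sketch of the TSM splice: the output feature map is
    Concat(previous frame's 1/8 channels, current frame's 7/8 channels)."""
    def __init__(self, shift_ratio=1 / 8):
        self.shift_ratio = shift_ratio
        self.cache = None  # feature cache pool holding last frame's channels

    def __call__(self, feat):
        # feat: (C, H, W) feature map of the current frame
        c = feat.shape[0]
        n_shift = max(1, int(c * self.shift_ratio))
        shifted, kept = feat[:n_shift], feat[n_shift:]
        # Zero-pad the cached slot for the first frame of a sequence
        prev = np.zeros_like(shifted) if self.cache is None else self.cache
        self.cache = shifted  # store current 1/8 channels for the next frame
        return np.concatenate([prev, kept], axis=0)
```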