
CN-120689845-B - Driving distraction detection model and method


Abstract

The invention belongs to the technical field of automobile driving safety and particularly discloses a driving distraction detection model comprising an image acquisition module, an image processing module, a feature vector extraction module, and a feature vector dimension-reduction output module. The image acquisition module acquires RGB images; the image processing module processes the RGB images with the Sobel operator to obtain edge images; the feature vector extraction module performs convolution operations on the edge images to obtain feature vectors; and the dimension-reduction output module reduces the feature vector dimension and outputs the prediction result. The corresponding detection method converts RGB images containing the driver's posture into edge images, so background noise is low and only the task-relevant posture of the driver is retained, providing strong support for faster training and inference. Convolution layers with a 6×6 kernel are used, ensuring that a larger receptive field captures global information while reducing computation and the risk of overfitting.
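The Sobel edge-processing step described above can be sketched as follows. The patent does not give an implementation, so the kernels, the 'valid' cropping, and the gradient-magnitude formula here are standard Sobel choices, not taken from the patent.

```python
import numpy as np

def sobel_edges(gray):
    """Sobel edge magnitude for a 2-D grayscale array (H, W).

    Illustrative sketch: standard 3x3 Sobel kernels, 'valid' borders,
    edge strength = sqrt(gx^2 + gy^2).
    """
    kx = np.array([[-1, 0, 1],
                   [-2, 0, 2],
                   [-1, 0, 1]], dtype=float)  # horizontal-gradient kernel
    ky = kx.T                                  # vertical-gradient kernel
    h, w = gray.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            patch = gray[i:i + 3, j:j + 3]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)  # gradient-magnitude edge image
```

Applied to a driver image, strong responses survive only at object contours, which is why the patent argues the edge image suppresses background noise while keeping the driver's posture.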

Inventors

  • WANG ZHEN
  • SHENG XING
  • ZHANG JUNZHE
  • DUAN ZONGTAO
  • CAO JIANRONG

Assignees

  • Chang'an University (长安大学)

Dates

Publication Date
2026-05-12
Application Date
2025-04-29

Claims (8)

  1. A driving distraction detection system, comprising: an image acquisition module for acquiring RGB images containing the driver's posture; an image processing module for performing edge processing on the RGB image based on the Sobel operator to obtain an edge image; a feature vector extraction module comprising a plurality of feature extraction units, each of which in turn performs convolution operations on the edge image to obtain a feature vector; a feature vector dimension-reduction output module for reducing the feature vector dimension and outputting a prediction result; and a multi-scale feature enhancement module connected to the feature vector extraction module. The multi-scale feature enhancement module comprises a plurality of different branch convolution layers that produce a convolution feature vector and multi-scale depthwise-separable convolution feature vectors, which are fused to obtain an enhanced feature vector; the fusion connects the convolution feature vector and the multi-scale depthwise-separable convolution feature vectors by residual connection. The module comprises a first, a second, and a third branch convolution layer: the first branch is a convolution with kernel size 1×1, the second branch consists of a depthwise convolution with kernel size 3×3 followed by a pointwise convolution with kernel size 1×1, and the third branch consists of a depthwise convolution with kernel size 5×5 followed by a pointwise convolution with kernel size 1×1. Letting x1′, x2′, and x3′ be the feature vectors output by the first, second, and third branch convolution layers respectively, the enhanced feature vector x is: x = x1′ ⊕ (x2′ ⊗ x3′), where ⊗ and ⊕ denote element-wise multiplication and element-wise addition, respectively.
  2. The driving distraction detection system according to claim 1, wherein each of said feature extraction units at least performs normalization and average pooling of said edge image.
  3. The driving distraction detection system according to claim 1, wherein the feature extraction units comprise a first, a second, a third, and a fourth feature extraction unit arranged in sequence; each of the four units comprises a convolution layer, a normalization layer, an excitation layer, and an average pooling layer arranged in sequence; from the first to the fourth unit the number of convolution kernels in the convolution layer increases successively, and the excitation layer uses the ReLU activation function.
  4. The driving distraction detection system according to claim 3, wherein the first feature extraction unit has 32 convolution kernels of size 6×6; the second has 64 kernels of size 6×6; the third has 128 kernels of size 6×6; and the fourth has 256 kernels of size 6×6.
  5. The driving distraction detection system according to claim 4, wherein the feature vector dimension-reduction output module comprises a pooling layer and a fully connected layer connected in sequence to the multi-scale feature enhancement module, which perform adaptive max pooling and a fully connected operation on the enhanced feature vector to reduce the feature vector dimension from 256 to n, where n is the number of detection categories to be predicted.
  6. A driving distraction detection method applying the driving distraction detection system according to any one of claims 1 to 5, comprising the steps of: S1, acquiring an RGB image containing the driver's posture; S2, performing edge processing on the RGB image with the Sobel operator to obtain an edge image; S3, feeding the edge image through the feature extraction units in sequence to obtain a feature vector; and S4, performing adaptive max pooling and a fully connected operation on the acquired feature information, then outputting the type of distracted driving.
  7. The driving distraction detection method according to claim 6, wherein obtaining the feature vector in step S3 specifically comprises: performing one convolution operation on the edge image with 32 kernels of size 6×6, followed by batch normalization, ReLU activation, and average pooling; then one convolution operation with 64 kernels of size 6×6, followed by batch normalization, ReLU activation, and average pooling; then one convolution operation with 128 kernels of size 6×6, followed by batch normalization, ReLU activation, and average pooling; and finally one convolution operation with 256 kernels of size 6×6, followed by batch normalization, ReLU activation, and average pooling, yielding the feature vector.
  8. The driving distraction detection method according to claim 7, wherein the feature vector obtained in step S3 is fed into the multi-scale feature enhancement module, which fuses the feature vector extracted by the first branch with those obtained from the second and third branches: element-wise multiplication of the second- and third-branch feature vectors yields an intermediate feature vector, which is then added element-wise to the first-branch feature vector, so that the captured motion features of the driver's hands and body are used to classify the distraction behavior.
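The fusion rule stated in claims 1 and 8 can be sketched numerically as follows. This is an illustration of the arithmetic only; the branch convolutions that would produce x1′, x2′, and x3′ are omitted, and the example vectors are invented.

```python
import numpy as np

def fuse(x1, x2, x3):
    """Enhanced feature vector per claims 1 and 8: x = x1 ⊕ (x2 ⊗ x3).

    The second- and third-branch outputs are multiplied element-wise
    into an intermediate vector, which is added element-wise to the
    first-branch output (the residual connection).
    """
    return x1 + x2 * x3

# Illustrative branch outputs (same shape, as required for element-wise ops)
x1 = np.array([1.0, 2.0])   # first branch (1x1 conv output)
x2 = np.array([3.0, 4.0])   # second branch (3x3 depthwise + 1x1 pointwise)
x3 = np.array([5.0, 6.0])   # third branch (5x5 depthwise + 1x1 pointwise)
x = fuse(x1, x2, x3)        # -> [16.0, 26.0]
```

Note the multiplicative term lets the two depthwise-separable branches gate each other, while the additive first branch preserves the original features.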

Description

Driving distraction detection model and method

Technical Field

The invention belongs to the technical field of automobile driving safety and particularly relates to a driving distraction detection model and method.

Background

According to the 2023 Global Status Report on Road Safety issued by the World Health Organization (WHO), the annual number of road traffic deaths has declined somewhat, but 1.19 million people still died in 2021 alone, and road traffic remains the leading killer of children and young adults aged 5-29. Traffic accident statistics from various countries show that the most common and harmful driver behaviors are speeding, driving under the influence of alcohol or psychoactive drugs, and distracted driving, with accidents caused by distracted driving accounting for a large share. The National Highway Traffic Safety Administration (NHTSA) defines distracted driving as a particular type of inattention in which the driver's attention is diverted from the driving task to another activity. Distracted driving causes countless casualties and economic losses every year, yet many of these accidents could be avoided entirely if driving anomalies were detected in time and the driver reminded appropriately. Many automobiles are therefore now equipped with advanced driver assistance systems (ADAS), such as lane departure warning (LDW) and forward collision warning (FCW). A currently popular approach is to monitor driver behavior from RGB images; such contactless devices avoid interfering with the driver through human factors. With the development of computer vision, video-based methods process data ever faster and are rapidly becoming a viable scheme for automated driving.
This approach can also be applied to other tasks in intelligent transportation systems (ITS), such as fatigue and driving-attention detection, driver skeleton detection, and detection of children left in the vehicle. With the rapid development of deep learning and computer vision, many researchers have focused on different types of distracted driving detection (DDD). In recent years, deep learning methods have been widely applied to problems such as image classification, detection, and segmentation, and offer better performance and accuracy than traditional algorithms. Much research has been devoted to deep convolutional methods, and many high-performing models have been proposed, such as LeNet, AlexNet, VGG and VGG19, GoogLeNet, and ResNet. Although they all achieve good results, these network models are too large and have excessive parameter counts, making them unsuitable for real-time processing on embedded systems.

Disclosure of Invention

The invention aims to overcome the above defects of the prior art and provides a driving distraction detection model and method. In a first aspect, the invention provides a driving distraction detection model comprising: an image acquisition module for acquiring RGB images containing the driver's posture; an image processing module for performing edge processing on the RGB image based on the Sobel operator to obtain an edge image; a feature vector extraction module comprising a plurality of feature extraction units, each of which in turn performs convolution operations on the edge image to obtain a feature vector; and a feature vector dimension-reduction output module for reducing the feature vector dimension and outputting a prediction result.
Further, each feature extraction unit at least performs normalization and average pooling of the edge image. The feature extraction units comprise a first, a second, a third, and a fourth feature extraction unit arranged in sequence; each of the four units comprises a convolution layer, a normalization layer, an excitation layer, and an average pooling layer arranged in sequence. From the first to the fourth unit, the number of convolution kernels in the convolution layer increases successively, and the excitation layer uses the ReLU activation function. The first feature extraction unit has 32 convolution kernels of size 6×6; the second has 64 kernels of size 6×6; the third has 128 kernels, and the convo
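The channel progression of the four feature extraction units (32, 64, 128, 256 kernels of size 6×6, each followed by pooling) can be traced with a small shape calculation. The patent does not state the convolution stride, padding, or pooling window, so the stride-1 'valid' convolution and 2×2 pooling assumed here, along with the 224×224 input size, are illustrative only.

```python
def stage_shapes(h, w, units=(32, 64, 128, 256), k=6, pool=2):
    """Feature-map shape (channels, H, W) after each extraction unit.

    Assumptions (not from the patent): stride-1 'valid' convolution,
    which shrinks each spatial dimension by k-1, followed by
    non-overlapping 2x2 average pooling, which halves it.
    """
    shapes = []
    for channels in units:
        h = (h - k + 1) // pool  # conv then pool along height
        w = (w - k + 1) // pool  # conv then pool along width
        shapes.append((channels, h, w))
    return shapes

# Example: a hypothetical 224x224 edge image
print(stage_shapes(224, 224))
# successive maps shrink spatially while channels grow to 256,
# the dimension the output module then reduces to n classes
```

This shows why the final adaptive max pooling and fully connected layer operate on a 256-channel representation, matching the 256-to-n reduction described for the output module.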