CN-116994223-B - Target detection method, model, early warning method, vehicle-mounted equipment and storage medium


Abstract

Embodiments of the application provide a target detection method, a target detection model, an intelligent auxiliary driving early warning method, vehicle-mounted equipment and a computer-readable storage medium. In the target detection method, an input layer acquires an infrared image to be identified. A backbone network connected with the input layer extracts multi-channel features from the infrared image using grouped convolution, learns channel weights for the multi-channel features using a channel attention mechanism, and weights the multi-channel features according to the corresponding channel weights. A plurality of head networks are respectively connected with the backbone network; each head network uses channel aliasing to fuse the multi-channel features output by the backbone network across the different groups, and determines the categories and positions of targets of different sizes in the infrared image. A plurality of output layers, connected in one-to-one correspondence with the head networks, output target detection results containing the category and position of each target.
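The two feature operations named in the abstract can be illustrated with a minimal sketch. It assumes that the channel attention mechanism follows the usual squeeze-and-excitation pattern (global average pooling, two fully connected layers, sigmoid) and that "channel aliasing" corresponds to a channel shuffle across groups; neither reading is confirmed by the text. The function names and the weight matrices `w1` and `w2` are hypothetical stand-ins for learned parameters.

```python
import math


def se_channel_weights(features, w1, w2):
    """Squeeze-and-excitation style channel weights (assumed form).

    features: list of channels, each a 2D list (H x W) of floats.
    w1: hidden x channels matrix, w2: channels x hidden matrix.
    Both matrices are hypothetical stand-ins for learned parameters.
    """
    # Squeeze: global average pooling gives one scalar per channel.
    squeezed = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in features]
    # Excitation: fully connected layer + ReLU ...
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    # ... then fully connected layer + sigmoid, one weight per channel.
    return [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w2
    ]


def apply_channel_weights(features, weights):
    """Scale every value of each channel by that channel's weight."""
    return [[[v * wt for v in row] for row in ch] for ch, wt in zip(features, weights)]


def channel_shuffle(channels, groups):
    """Interleave channels across groups.

    Equivalent to reshaping to (groups, n), transposing to (n, groups)
    and flattening, so each output group sees channels from every input group.
    """
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]


# Six channel indices in two groups: [0, 1, 2] and [3, 4, 5].
print(channel_shuffle(list(range(6)), groups=2))  # [0, 3, 1, 4, 2, 5]
```

In this sketch the shuffle only reorders channels; the actual fusion would come from the grouped convolution that follows it, which is omitted here.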

Inventors

LI GANGQIANG
QIN HUA
XU ZHAOFEI

Assignees

烟台艾睿光电科技有限公司

Dates

Publication Date: 20260508
Application Date: 20220424

Claims (12)

1. A target detection method, comprising: acquiring, by an input layer, an infrared image to be identified; extracting, by a backbone network connected with the input layer, multi-channel features of the infrared image to be identified using grouped convolution, learning channel weights of the multi-channel features using a channel attention mechanism, and weighting the multi-channel features according to the corresponding channel weights, wherein a first GhostSENeck structure is formed from a first ordinary-convolution Ghost module and a first depthwise-separable-convolution Ghost module having a first number of convolution kernels, with a channel attention mechanism module SENet added between the first ordinary-convolution Ghost module and the first depthwise-separable-convolution Ghost module; a second GhostSENeck structure is formed from a second ordinary-convolution Ghost module and a second depthwise-separable-convolution Ghost module having a second number of convolution kernels, with a channel attention mechanism module SENet added between the second ordinary-convolution Ghost module and the second depthwise-separable-convolution Ghost module; and the backbone network is constructed from the first GhostSENeck structure and the second GhostSENeck structure; performing, by each of a plurality of head networks respectively connected with the backbone network, feature fusion on the multi-channel features output by the backbone network according to different groups using channel aliasing, to determine the categories and positions of targets of different sizes in the infrared image to be identified; and outputting, by a plurality of output layers connected in one-to-one correspondence with the plurality of head networks, target detection results containing the category and position corresponding to each target.
2. The method of claim 1, wherein the plurality of head networks being respectively connected with the backbone network, each head network performing feature fusion on the multi-channel features output by the backbone network according to different groups using channel aliasing, and determining the categories and positions of targets of different sizes in the infrared image to be identified comprises: a first head network is connected with an output channel of a first-size feature matrix in the backbone network, performs feature fusion on the multi-channel features contained in the first-size feature matrix according to different groups using channel aliasing, and determines the category and position of a target of a first size in the infrared image to be identified; a second head network is connected with an output channel of a second-size feature matrix in the backbone network, performs feature fusion on the multi-channel features contained in the second-size feature matrix according to different groups using channel aliasing, and determines the category and position of a target of a second size in the infrared image to be identified; a third head network is connected with an output channel of a third-size feature matrix in the backbone network, performs feature fusion on the multi-channel features contained in the third-size feature matrix according to different groups using channel aliasing, and determines the category and position of a target of a third size in the infrared image to be identified; wherein the dimensions of the first-size feature matrix, the second-size feature matrix and the third-size feature matrix decrease in sequence, and the first size, the second size and the third size increase in sequence.
3. The target detection method according to claim 1, wherein before the input layer acquires the infrared image to be identified, the method further comprises: constructing an initial target detection network, wherein the initial target detection network comprises an input layer, a backbone network connected with the input layer, a plurality of head networks connected with the backbone network, and output layers respectively connected with the head networks; acquiring a sample infrared image set, wherein the sample infrared image set comprises positive sample images and negative sample images carrying target object category and position labels; and iteratively training the initial target detection network on the sample infrared image set until convergence, to obtain a trained target detection model.
4. The method of claim 3, wherein acquiring the sample infrared image set comprises: selecting at least one sample infrared image containing a designated target object category, and segmenting the image region containing the designated target object category from the at least one sample infrared image to serve as a target sample; combining the target sample with each of the other sample infrared images in the sample infrared image set to form corresponding new sample infrared images; and augmenting the sample infrared image set with the new sample infrared images.
5. An intelligent auxiliary driving early warning method, characterized by comprising the following steps: collecting an infrared image corresponding to a target driving scene; performing target detection on the infrared image by the target detection method according to any one of claims 1 to 4 to obtain the target detection result; determining the current speed of each target object and the current distance to each target object according to the categories, sizes and positions of the different target objects in the target detection result; and giving an early warning prompt on the collision risk of each target object.
6. The intelligent auxiliary driving early warning method according to claim 5, wherein determining the current speed of each target object and the current distance to each target object according to the categories, sizes and positions of the different target objects in the target detection result comprises: calculating the current distance to each target object according to the categories, sizes and positions of the different target objects in the target detection result, in combination with the relation between the size and the distance of each target object; and calculating the current speed of each target object according to the changes in size and position of the different target objects between the target detection results corresponding to adjacent moments.
7. The intelligent auxiliary driving early warning method according to claim 5, wherein after collecting the infrared image corresponding to the target driving scene, the method further comprises: normalizing the infrared image, wherein the normalizing comprises subtracting the mean value from each pixel value in the infrared image and dividing the result by the variance.
8. The intelligent auxiliary driving early warning method according to claim 5, wherein giving the early warning prompt on the collision risk of each target object comprises: determining a collision risk value of each target object; and/or, when there is a target object whose collision risk value is higher than a threshold, controlling anti-collision early warning equipment to issue an alarm prompt.
9. A target detection model, characterized by comprising an input layer, a backbone network connected with the input layer, a plurality of head networks connected with the backbone network, and a plurality of output layers connected in one-to-one correspondence with the head networks, wherein: the input layer is used for acquiring an infrared image to be identified; the backbone network is used for extracting multi-channel features of the infrared image to be identified using grouped convolution, learning channel weights of the multi-channel features using a channel attention mechanism, and weighting the multi-channel features according to the corresponding channel weights, wherein a first GhostSENeck structure is formed from a first ordinary-convolution Ghost module and a first depthwise-separable-convolution Ghost module having a first number of convolution kernels, with a channel attention mechanism module SENet added between the first ordinary-convolution Ghost module and the first depthwise-separable-convolution Ghost module; a second GhostSENeck structure is formed from a second ordinary-convolution Ghost module and a second depthwise-separable-convolution Ghost module having a second number of convolution kernels, with a channel attention mechanism module SENet added between the second ordinary-convolution Ghost module and the second depthwise-separable-convolution Ghost module; and the backbone network is constructed from the first GhostSENeck structure and the second GhostSENeck structure; each head network is used for performing feature fusion on the multi-channel features output by the backbone network according to different groups using channel aliasing, and determining the categories and positions of targets of different sizes in the infrared image to be identified; and the plurality of output layers are used for outputting target detection results containing the categories and positions corresponding to the targets.
10. Vehicle-mounted equipment, characterized by comprising an image acquisition device, a memory, a processor and a display device, wherein: the image acquisition device is used for collecting an infrared image corresponding to a target driving scene; the memory is used for storing the target detection method according to any one of claims 1 to 4; the processor is used for performing target detection on the infrared image by executing the target detection method stored in the memory to obtain a target detection result, determining the current speed of each target object and the current distance to each target object according to the categories, sizes and positions of the different target objects in the target detection result, and giving an early warning prompt on the collision risk of each target object; and the display device is used for displaying the early warning prompt.
11. The vehicle-mounted equipment of claim 10, further comprising anti-collision early warning equipment for issuing an alarm signal based on the early warning prompt of the processor.
12. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the target detection method according to any one of claims 1 to 4 or the intelligent auxiliary driving early warning method according to any one of claims 5 to 8.
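The early warning steps of claims 6 to 8 can be illustrated with a minimal sketch. The claims do not give the relation between target size and distance, so the sketch assumes a pinhole-camera relation with a typical real-world height per category. The `Detection` type, the height table, the focal length parameter, the inverse time-to-collision risk value and the default threshold are all hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    category: str
    box_height_px: float  # height of the detected bounding box in pixels


# Hypothetical typical real-world heights in metres, per category.
ASSUMED_HEIGHT_M = {"pedestrian": 1.7, "vehicle": 1.5}


def normalize(pixels):
    """Subtract the mean and divide by the variance, as claim 7 words it."""
    n = len(pixels)
    mean = sum(pixels) / n
    var = sum((p - mean) ** 2 for p in pixels) / n
    if var == 0:
        # A constant image has no spread; return zeros instead of dividing by 0.
        return [0.0 for _ in pixels]
    return [(p - mean) / var for p in pixels]


def estimate_distance_m(det, focal_length_px):
    """Assumed pinhole relation: distance = focal length * real height / pixel height."""
    return focal_length_px * ASSUMED_HEIGHT_M[det.category] / det.box_height_px


def closing_speed_mps(prev_distance_m, curr_distance_m, dt_s):
    """Speed from distances at adjacent moments; positive when approaching."""
    return (prev_distance_m - curr_distance_m) / dt_s


def collision_risk(distance_m, speed_mps):
    """Hypothetical risk value: inverse time-to-collision, 0 when not approaching."""
    if speed_mps <= 0 or distance_m <= 0:
        return 0.0
    return speed_mps / distance_m


def should_alarm(risk, threshold=0.5):
    """Alarm when the risk value exceeds the (hypothetical) threshold."""
    return risk > threshold


# Usage: the same pedestrian detected in two frames 0.5 s apart.
prev = estimate_distance_m(Detection("pedestrian", 85.0), focal_length_px=1000.0)
curr = estimate_distance_m(Detection("pedestrian", 100.0), focal_length_px=1000.0)
speed = closing_speed_mps(prev, curr, dt_s=0.5)
print(should_alarm(collision_risk(curr, speed)))
```

Estimating distance from apparent size in this way needs only the camera and the detector output, which fits the stated aim of avoiding dedicated ranging sensors; the accuracy depends on how well the assumed heights match the real targets.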

Description

Target detection method, model, early warning method, vehicle-mounted equipment and storage medium

Technical Field

The application relates to the technical field of intelligent driving, and in particular to a target detection method, an intelligent auxiliary driving early warning method, a target detection model, vehicle-mounted equipment and a computer-readable storage medium.

Background

With the continuous development of the driver-assistance industry, demand for the related technologies is increasingly urgent. For safe driving, detecting obstacles and giving collision warnings while the vehicle is moving is particularly important. Obstacle warning and collision avoidance require target detection on the obstacles, supplemented by distance and speed measurement, to determine whether there is a risk of collision with them.

Target detection is currently applied most widely to visible-light images, and deep-learning-based target detection algorithms are mostly designed around the characteristics of visible light. Infrared images differ markedly from visible-light images, and current implementations of automotive collision avoidance have several shortcomings, mainly the following:

1. Deep-learning-based target detection is developing rapidly and its accuracy is gradually approaching human level, but in many cases the accuracy gains come with a demand for substantial computing power. Some higher-accuracy detection algorithms over-emphasize accuracy and neglect real-time performance, which makes them difficult to put into real production use in industry. In addition, the detection algorithms that can be put into production have not been adapted to multiple platforms.

2. Infrared images carry fewer features, mainly contour and gray-level features, and lack the color and rich detail of visible light, so infrared-based target detection is more challenging and the network needs a stronger ability to describe and mine features. Compared with visible-light images, infrared images have a lower mean gray level and a more concentrated gray-level distribution, which shows visually as weaker contrast. This concentration of gray levels places higher demands on the network's ability to discriminate targets.

3. To implement collision warning for automobiles, distance information usually has to be obtained from ranging sensors such as lidar and millimeter-wave radar; these sensors are expensive and significantly increase development cost.

Disclosure of Invention

To solve the existing technical problems, the application provides a target detection method, an intelligent auxiliary driving early warning method, a target detection model, vehicle-mounted equipment and a computer-readable storage medium, which are highly adaptable, can accurately detect multiple targets of different sizes such as pedestrians and vehicles, and can give early warning of possible collision risks.
To achieve the above object, the technical solutions of the embodiments of the application are as follows:

A target detection method, in which an input layer acquires an infrared image to be identified; a backbone network connected with the input layer extracts multi-channel features of the infrared image to be identified using grouped convolution, learns channel weights of the multi-channel features using a channel attention mechanism, and weights the multi-channel features according to the corresponding channel weights; a plurality of head networks are respectively connected with the backbone network, and each head network performs feature fusion on the multi-channel features output by the backbone network according to different groups using channel aliasing, to determine the categories and positions of targets of different sizes in the infrared image to be identified; and a plurality of output layers, connected in one-to-one correspondence with the plurality of head networks, output target detection results containing the categories and positions of the targets.

The plurality of head networks being respectively connected with the backbone network, each head network performing feature fusion on the multi-channel features output by the backbone network according to different groups using channel aliasing, and determining the categories and positions of targets of different sizes in the infrared image to be ident