CN-115876198-B - Target detection and early warning method, device, system and medium based on data fusion
Abstract
The application provides a target detection and early warning method, device, system and medium based on data fusion. Point cloud data and an infrared image of a target detection object are acquired synchronously and fused to obtain point cloud image fusion data, and target detection is then performed on the point cloud image fusion data with a three-dimensional neural network model to obtain category information and three-dimensional information of the target detection object. The target detection method provided by the application can achieve accurate three-dimensional detection of the target detection object in various application scenes, depends relatively little on the perception performance of the infrared acquisition equipment and the laser radar, has a certain capability of correcting false detections and missed detections of targets, involves a relatively small amount of data processing, and places low computing power requirements on the target detection equipment.
Inventors
- LI CHUNLIU
Assignees
- 烟台艾睿光电科技有限公司 (Yantai IRay Technology Co., Ltd.)
Dates
- Publication Date: 20260508
- Application Date: 20221128
Claims (10)
- 1. A target detection method based on data fusion, characterized by comprising the following steps: synchronously acquiring point cloud data and an infrared image of a target detection object; fusing the point cloud data and the infrared image to obtain point cloud image fusion data, wherein the fusing comprises the following steps: performing feature extraction on the infrared image through a two-dimensional neural network model to obtain infrared feature data of each pixel point in the infrared image, and determining the infrared feature data corresponding to each point cloud according to the coordinate mapping relation between each point cloud in the point cloud data and each pixel point in the infrared image and the infrared feature data of each pixel point in the infrared image; fusing according to the point cloud feature data of each point cloud and the corresponding infrared feature data to obtain the point cloud image fusion data corresponding to each point cloud, which comprises: determining, according to the point cloud feature data of each point cloud, a neighborhood centered on the pixel point corresponding to that point cloud; carrying out a weighted average of the infrared feature data of the pixel points in each neighborhood of the infrared image to obtain weighted-average infrared feature data corresponding to each point cloud; and fusing the point cloud feature data, the infrared feature data and the weighted-average infrared feature data corresponding to each point cloud to obtain the point cloud image fusion data corresponding to each point cloud (a code sketch of this fusion follows the claims); and performing target detection on the point cloud image fusion data through a three-dimensional neural network model to obtain category information and three-dimensional information of the target detection object, wherein the three-dimensional information includes at least one of size information, position information, distance information and movement direction information of the target detection object.
- 2. The method according to claim 1, wherein performing feature extraction and recognition on the point cloud image fusion data through the three-dimensional neural network model to obtain the category information and the three-dimensional information of the target detection object comprises: performing voxelization on the point cloud image fusion data to obtain a plurality of columnar voxels; performing feature extraction and mapping on the plurality of columnar voxels to obtain voxel features of the point cloud image fusion data, and mapping the voxel features to a bird's-eye view to obtain a bird's-eye-view feature map corresponding to the point cloud image fusion data, wherein the voxel features comprise point cloud three-dimensional coordinate data, the infrared feature data, the weighted-average infrared feature data, geometric center data and geometric center offset data corresponding to the point cloud image fusion data; and inputting the bird's-eye-view feature map into the three-dimensional neural network model for feature extraction and recognition to obtain the category information and the three-dimensional information of the target detection object (a voxelization sketch follows the claims).
- 3. The target detection method according to claim 1, wherein, before performing feature extraction and recognition on the point cloud image fusion data through the three-dimensional neural network model to obtain the category information and the three-dimensional information of the target detection object, the target detection method further comprises: constructing a training sample data set based on sample point cloud image fusion data carrying labeling information, wherein the labeling information comprises category labeling information and three-dimensional labeling frame information of the target detection object, and the three-dimensional labeling frame information comprises center point information, length, width and height information, and yaw angle information of the three-dimensional labeling frame; and iteratively training the three-dimensional neural network model with the training sample data set until a preset training condition is reached.
- 4. The method of claim 3, wherein constructing a training sample data set based on sample point cloud image fusion data carrying labeling information comprises: labeling the sample point cloud data corresponding to the sample point cloud image fusion data to obtain the three-dimensional labeling frame information; projecting the three-dimensional labeling frame corresponding to the sample point cloud data into the sample infrared image according to the mapping relation between the sample infrared image corresponding to the sample point cloud image fusion data and the sample point cloud data; and, when the area of the projection of the three-dimensional labeling frame into the sample infrared image meets a preset condition, taking the corresponding sample point cloud image fusion data carrying the three-dimensional labeling frame information as a training sample to form the training sample data set (a projection sketch follows the claims).
- 5. The method according to claim 4, wherein, after labeling the sample point cloud data corresponding to the sample point cloud image fusion data to obtain the three-dimensional labeling frame information and before performing feature extraction on the infrared image through the two-dimensional neural network model to obtain the infrared feature data of each pixel point in the infrared image, the method further comprises: determining two-dimensional labeling frame information corresponding to the target detection object in the sample infrared image according to the three-dimensional labeling frame information, wherein the two-dimensional labeling frame information comprises center point information and width and height information of the two-dimensional labeling frame; and training the two-dimensional neural network model with the sample infrared image carrying the two-dimensional labeling frame information; and wherein performing feature extraction on the infrared image through the two-dimensional neural network model to obtain the infrared feature data of each pixel point in the infrared image comprises: performing feature extraction on the infrared image through a feature extraction network in the trained two-dimensional neural network model to obtain the infrared feature data of each pixel point in the infrared image.
- 6. An intelligent auxiliary driving early warning method, characterized by comprising the following steps: acquiring an infrared image of a target detection object in a target driving scene collected by an infrared acquisition device, and synchronously acquiring point cloud data of the target detection object in the target driving scene collected by a laser radar, wherein the infrared acquisition device and the laser radar are both arranged on a driving body; performing target detection according to the point cloud data and the infrared image by the target detection method according to any one of claims 1 to 5 to obtain the three-dimensional information of the target detection object; and judging, according to the three-dimensional information, whether a collision risk exists between the driving body and the target detection object, and issuing a corresponding prompt.
- 7. The intelligent auxiliary driving early warning method according to claim 6, wherein the three-dimensional information includes distance information and movement direction information, the distance information includes a lateral distance and a longitudinal distance between the target detection object and the driving body, and judging, according to the three-dimensional information, whether a collision risk exists between the driving body and the target detection object and issuing a corresponding prompt comprises (a decision sketch follows the claims): when the movement direction information indicates same-direction movement, if the lateral distance is smaller than a first lateral threshold and the longitudinal distance is smaller than a first longitudinal threshold, judging that a collision risk exists between the driving body and the target detection object and generating early warning prompt information; and/or, when the movement direction information indicates same-direction movement, if the lateral distance is larger than the first lateral threshold and smaller than a second lateral threshold, the lateral distance decreases as the driving time elapses, and the longitudinal distance is smaller than a second longitudinal threshold, judging that a collision risk exists between the driving body and the target detection object and generating early warning prompt information; and/or, when the movement direction information indicates opposite-direction movement, if the lateral distance is smaller than a third lateral threshold, the longitudinal distance is smaller than a third longitudinal threshold, and the longitudinal distance decreases as the driving time elapses, judging that a collision risk exists between the driving body and the target detection object and generating early warning prompt information.
- 8. A target detection device, comprising a memory and a processor, wherein the memory stores a computer program executable by the processor, and the computer program, when executed by the processor, implements the target detection method according to any one of claims 1 to 5.
- 9. An intelligent auxiliary driving early warning system, characterized by comprising an infrared acquisition device, a laser radar, a processor and an alarm device; the infrared acquisition device is used for collecting an infrared image of a target detection object; the laser radar is used for collecting point cloud data of the target detection object; the processor, when executing a computer program, implements the intelligent auxiliary driving early warning method according to claim 6 or 7; and the alarm device is used for giving an alarm according to the prompt information generated by the processor.
- 10. A computer-readable storage medium, characterized in that a computer program is stored thereon which, when executed by a processor, implements the target detection method according to any one of claims 1 to 5 or the intelligent auxiliary driving early warning method according to claim 6 or 7.
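Claim 1's fusion step can be illustrated with a short sketch. The following Python/NumPy code is a minimal, hypothetical illustration, not the patent's implementation: the function name, the 3x4 projection matrix P, the feature-map shapes and the neighborhood radius are all assumptions. It projects each LiDAR point into the infrared image via a calibrated point-to-pixel mapping, samples the per-pixel infrared feature vector, averages infrared features over a neighborhood around the projected pixel, and concatenates point features, sampled features and averaged features into per-point fusion data.

```python
import numpy as np

def fuse_point_cloud_with_infrared(points, point_feats, ir_feat_map, P, radius=2):
    """Minimal sketch of the fusion in claim 1; shapes and names are assumed.

    points      : (N, 3) LiDAR points.
    point_feats : (N, Cp) point cloud feature data for each point.
    ir_feat_map : (H, W, Ci) per-pixel infrared feature data from the 2D model.
    P           : (3, 4) calibrated point-to-pixel projection matrix (assumed).
    radius      : half-width of the square neighborhood around each pixel.
    """
    H, W, Ci = ir_feat_map.shape
    homo = np.hstack([points, np.ones((len(points), 1))])  # (N, 4) homogeneous
    uvw = homo @ P.T                                       # (N, 3) image plane
    fused = []
    for i in range(len(points)):
        if uvw[i, 2] <= 0:
            continue                       # point behind the image plane
        u = int(uvw[i, 0] / uvw[i, 2])
        v = int(uvw[i, 1] / uvw[i, 2])
        if not (0 <= u < W and 0 <= v < H):
            continue                       # point outside the infrared image
        ir_feat = ir_feat_map[v, u]        # infrared feature of the mapped pixel
        # Weighted average over the neighborhood centered on that pixel
        # (uniform weights here; the claim leaves the weighting scheme open).
        v0, v1 = max(0, v - radius), min(H, v + radius + 1)
        u0, u1 = max(0, u - radius), min(W, u + radius + 1)
        avg_feat = ir_feat_map[v0:v1, u0:u1].reshape(-1, Ci).mean(axis=0)
        # Fuse: concatenate point features, pixel features, averaged features.
        fused.append(np.concatenate([point_feats[i], ir_feat, avg_feat]))
    return np.asarray(fused)
```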
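Claim 2's columnar voxels and bird's-eye-view mapping resemble the well-known pillar-based encoding. The sketch below is again only illustrative: the grid ranges, pillar size and the use of simple max pooling in place of a learned per-pillar encoder are assumptions. It groups fused points into vertical pillars, decorates each point with the pillar's geometric center and the offset from it (two of the voxel features claim 2 lists), pools one feature vector per pillar, and scatters the pooled features into a 2D BEV grid.

```python
import numpy as np

def pillarize_to_bev(fused_points, xyz, x_range=(0.0, 69.12),
                     y_range=(-39.68, 39.68), pillar=0.16):
    """Sketch of claim 2's columnar voxels -> BEV feature map (values assumed).

    fused_points : (N, C) per-point fusion data from the previous step.
    xyz          : (N, 3) three-dimensional coordinates of the same points.
    """
    nx = int((x_range[1] - x_range[0]) / pillar)
    ny = int((y_range[1] - y_range[0]) / pillar)
    ix = ((xyz[:, 0] - x_range[0]) / pillar).astype(int)
    iy = ((xyz[:, 1] - y_range[0]) / pillar).astype(int)
    valid = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)

    C = fused_points.shape[1]
    bev = np.zeros((ny, nx, C + 6), dtype=np.float32)
    # One pass per occupied pillar: decorate points with the pillar's
    # geometric center and each point's offset from it, then pool.
    for key in np.unique(iy[valid] * nx + ix[valid]):
        mask = valid & (iy * nx + ix == key)
        center = xyz[mask].mean(axis=0)              # geometric center data
        offsets = xyz[mask] - center                 # geometric center offsets
        feats = np.hstack([fused_points[mask],
                           np.tile(center, (mask.sum(), 1)), offsets])
        bev[key // nx, key % nx] = feats.max(axis=0) # stand-in for a learned encoder
    return bev  # (ny, nx, C+6) bird's-eye-view feature map
```

The BEV map returned here would then be fed to the three-dimensional detection network described in claim 2.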
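Claims 4 and 5 project the three-dimensional labeling frame into the sample infrared image, keep the sample only if the projected area meets a preset condition, and derive a two-dimensional labeling frame from the projection. The sketch below is a hedged illustration: the corner parameterization, the projection matrix P and the area threshold min_area are assumptions, not values from the patent.

```python
import numpy as np

def box3d_corners(center, lwh, yaw):
    """Eight corners of a 3D labeling frame (center/size/yaw as in claim 3)."""
    l, w, h = lwh
    x = np.array([ 1,  1, -1, -1,  1,  1, -1, -1]) * l / 2
    y = np.array([ 1, -1, -1,  1,  1, -1, -1,  1]) * w / 2
    z = np.array([-1, -1, -1, -1,  1,  1,  1,  1]) * h / 2
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])  # yaw rotation
    return np.stack([x, y, z], axis=1) @ rot.T + center  # (8, 3)

def project_box_and_filter(center, lwh, yaw, P, img_hw, min_area=100.0):
    """Project a 3D frame into the infrared image (claim 4) and derive the
    2D labeling frame used to train the 2D network (claim 5)."""
    corners = box3d_corners(center, lwh, yaw)
    homo = np.hstack([corners, np.ones((8, 1))]) @ P.T   # (8, 3) projected
    u = homo[:, 0] / homo[:, 2]
    v = homo[:, 1] / homo[:, 2]
    H, W = img_hw
    u, v = u.clip(0, W - 1), v.clip(0, H - 1)
    bw, bh = u.max() - u.min(), v.max() - v.min()
    if bw * bh < min_area:        # the "preset condition" on projected area
        return None               # discard: not used as a training sample
    # 2D frame as center point plus width/height, as claim 5 specifies.
    return ((u.min() + u.max()) / 2, (v.min() + v.max()) / 2, bw, bh)
```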
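Claim 7's threshold logic maps directly onto a small decision function. The sketch below encodes the three "and/or" branches; the threshold values and the way "distance decreasing over driving time" is detected (comparing against the previous frame) are assumptions for illustration only.

```python
def collision_risk(lateral, longitudinal, direction, prev=None,
                   lat_thr=(1.0, 2.5, 1.5), lon_thr=(30.0, 50.0, 60.0)):
    """Sketch of claim 7's warning rules (all threshold values assumed).

    lateral/longitudinal : current distances to the target, in meters.
    direction            : "same" or "opposite" relative motion.
    prev                 : (lateral, longitudinal) from the previous frame,
                           used to tell whether a distance is shrinking.
    """
    lat1, lat2, lat3 = lat_thr
    lon1, lon2, lon3 = lon_thr
    lat_shrinking = prev is not None and lateral < prev[0]
    lon_shrinking = prev is not None and longitudinal < prev[1]

    if direction == "same":
        # Branch 1: target already inside both first thresholds.
        if lateral < lat1 and longitudinal < lon1:
            return True
        # Branch 2: target between lateral thresholds and cutting in.
        if lat1 < lateral < lat2 and lat_shrinking and longitudinal < lon2:
            return True
    elif direction == "opposite":
        # Branch 3: oncoming target closing in longitudinally.
        if lateral < lat3 and longitudinal < lon3 and lon_shrinking:
            return True
    return False
```

Returning True would correspond to generating the early warning prompt information that claim 9's alarm device consumes.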
Description
Target detection and early warning method, device, system and medium based on data fusion

Technical Field

The application relates to the technical field of intelligent driving, and in particular to a target detection and early warning method, device, system and medium based on data fusion.

Background

With the rapid development of the autonomous driving industry, the requirements on the related technologies are becoming increasingly urgent. An autonomous driving vehicle system needs to identify not only the categories of targets and obstacles but also their positions, sizes, orientations and distances, which provides an important guarantee for safe and reliable autonomous driving. Common 2D (two-dimensional) object detection cannot provide all the information required for perception; it only provides the position of the object in the two-dimensional image and the confidence of the corresponding category. 3D (three-dimensional) object detection, by contrast, incorporates the depth information of the object and can provide its position, size, orientation and other spatial scene information. 3D target detection is an important task of the autonomous driving perception subsystem, and the reliability of its results provides an important guarantee for the back-end decision and planning layers of autonomous driving. 3D target detection aims to give an autonomous vehicle the capability of detecting targets such as vehicles, pedestrians and obstacles from multi-sensor data such as laser radar, cameras and millimeter-wave radar, so as to guarantee the driving safety of the autonomous vehicle.

At present, 3D target detection technology is developing rapidly and, according to the sensors and input information used, is mainly divided into three types: point cloud 3D target detection using a laser radar, monocular or stereoscopic image 3D target detection using a visible light camera, and multi-modal fusion 3D detection using a laser radar and visible light images. Monocular or stereoscopic image 3D target detection with a camera is low in cost, but a monocular camera cannot provide accurate depth information, and monocular ranging based on similar triangles is easily affected by the size of the target object. Binocular 3D target detection can generate a disparity map from the binocular images, but it is easily affected by the environment, and the distance obtained by binocular vision has a certain error compared with a laser radar. In point cloud 3D target detection with a laser radar, although the laser radar can provide accurate distance sensing and three-dimensional information, the point cloud is sparse and lacks color information, so distant small targets are extremely prone to false detection and missed detection. Multi-modal fusion 3D detection with a laser radar and images can make better use of the 3D perception of the laser radar and the rich semantic information of visible light, but the visible light camera and the laser radar sensor have weak anti-interference capability and adapt poorly to severe weather such as smoke, snow, rain, fog and haze, and thus can hardly meet the safety and reliability requirements of an autonomous driving perception system.
Therefore, research on multi-modal fusion 3D target detection technology is particularly important for safe and reliable autonomous driving perception. At present, multi-modal fusion 3D target detection is mainly divided into two major categories. The first is soft-association fusion, in which attention is used to integrate the representations of images and point clouds and to learn the relations between the features of multiple sensors; however, its demands on data volume are high, and hundreds of millions of samples are needed to achieve a good effect. The second is hard-association fusion detection based on calibrated extrinsic parameters, where the extrinsic parameters can be obtained by calibrating the multi-modal data with a specific calibration board or by a target-free method. Further, hard association is mainly divided into data-layer fusion and decision-layer fusion. Decision-layer fusion, also called target-level fusion, fuses the prediction results of the vision and point cloud modalities, but it depends strongly on the perception performance of each sensor and has a low capability of correcting false detections and missed detections of targets. Fusion at the data layer can retain richer raw data, but it places higher requirements on the time synchronization and spatial registration of the camera and the point cloud, involves a larger amount of data processing, and places higher requirements on the platform.

Disclosure of Invention

In order to solve the technical