
CN-116189171-B - Three-dimensional target detection method and related device

CN116189171B

Abstract

A three-dimensional target detection method and a related device comprise the steps of: obtaining a training data set; expanding the preprocessed image data and 3D point cloud data; dividing the whole three-dimensional space into equal three-dimensional voxel grids; constructing a three-dimensional target detection network for the features of the voxel grids and the image data; constructing an omission searching module and an omission predicting module; and carrying out 3D target detection with the trained model. The invention detects targets missed in three-dimensional target detection: exploiting the higher accuracy of two-dimensional target detection, it finds the targets the three-dimensional detector failed to detect and subjects them to a second detection stage, which improves the accuracy of three-dimensional target detection and makes the detection of distant targets more comprehensive and accurate. Dedicated training on virtual point clouds of the missed targets overcomes the problem that the three-dimensional features of some targets cannot be captured, so that occluded or truncated targets can also be detected.
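The omission-search idea summarized above, using the more accurate 2D detections to flag objects the 3D network missed, can be sketched as a simple IoU match between 2D boxes and 3D detections projected into the image. This is an illustrative reconstruction, not the patented algorithm; the `(x1, y1, x2, y2)` box format and the 0.3 IoU threshold are assumptions.

```python
def iou_2d(a, b):
    """IoU of two axis-aligned image boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def find_missed(boxes_2d, projected_3d, thresh=0.3):
    """2D detections with no matching projected 3D detection are treated
    as targets missed by the 3D network and passed to second-stage detection."""
    return [b for b in boxes_2d
            if all(iou_2d(b, p) < thresh for p in projected_3d)]
```

For example, `find_missed([(0, 0, 10, 10), (50, 50, 60, 60)], [(1, 1, 10, 10)])` flags the second box as a missed target, since no projected 3D box overlaps it.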

Inventors

  • Ai Lingmei
  • Xie Zhuoyu
  • Yao Ruoxia

Assignees

  • Shaanxi Normal University (陕西师范大学)

Dates

Publication Date
2026-05-05
Application Date
2023-02-23

Claims (8)

  1. A three-dimensional object detection method, comprising: acquiring a training data set and preprocessing it; expanding the preprocessed image data and 3D point cloud data; dividing the whole three-dimensional space into equal three-dimensional voxel grids, placing the 3D point cloud into the voxel grids according to coordinate positions, and encoding the features of each voxel grid as the average of the point cloud features inside the grid; constructing a three-dimensional target detection network for the features of the voxel grids and the image data; constructing an omission searching module and an omission predicting module; training the omission searching module and, after obtaining the trained parameters, taking the point cloud and images of the 3D target detection data set as the input of each channel of the neural network in the three-dimensional target detection network and inputting them into the three-dimensional target detection network model for training; and carrying out 3D target detection with the trained three-dimensional target detection network model, which predicts the objects in each sample one by one on the verification set and the test set until all samples are predicted, completing the 3D target detection test; wherein the omission searching module and the omission predicting module are built as follows: the omission searching module sequentially comprises an adaptive average pooling layer, 1 convolution layer, a max pooling layer, 4 residual convolution groups, an activation layer, an average pooling layer, a feature pyramid network (FPN) and an omission searching algorithm, wherein the adaptive average pooling layer adjusts the image size, the convolution layer, the max pooling layer and the residual convolution groups extract 2D features from RGB images, the activation layer adds nonlinear factors, the feature pyramid network predicts targets from the 2D features, and the omission searching algorithm searches for objects missed by the three-dimensional target detection network; the omission predicting module contains two prediction networks: a. a fine prediction module, which extracts fine features for prediction and sequentially comprises a point cloud projection module, 3 upsampling layers, 2 interpolation layers, a feature compression layer, 6 2D convolution layers, 3 2D deconvolution layers, a feature splicing layer and a final 2D convolution layer, wherein the projection module screens the point clouds of missed objects, the upsampling and interpolation layers add features to sparse point clouds, the feature compression layer compresses the 3D features into 2D features, the 6 2D convolution layers extract the 2D features, the 3 2D deconvolution layers align the features, the feature splicing layer splices the features, and the final 2D convolution layer realizes target detection; b. a virtual point cloud prediction module, which sequentially comprises a virtual point cloud generation algorithm, an orientation prediction algorithm and a sliding-window-based target detection network, wherein the orientation prediction algorithm predicts the orientation of a 3D object from a 2D detection result, the virtual point cloud generation algorithm generates a conical (frustum) point cloud of a missed object, and the sliding-window-based target detection network realizes target detection; the three-dimensional scene range of the virtual point cloud generation algorithm is x ∈ [0, 70], y ∈ [-40, 40], z ∈ [-3, 3]; 140 × 160 × 6 = 134400 virtual points are first generated, then all virtual points are screened by projection to obtain the frustum point clouds of the missed objects; the sliding-window-based target detection network realizes target detection with a sliding window whose length, width and height are [3.9, 1.6, 1.56] and whose direction is determined by the orientation prediction algorithm.
  2. The three-dimensional object detection method according to claim 1, wherein a training data set is acquired and preprocessed as follows: the 2D object detection dataset and the 3D object detection dataset of the public KITTI dataset are downloaded, wherein the 2D object detection dataset comprises a training set of 200 samples, and the 3D object detection dataset comprises left RGB images, point clouds, camera parameters and labels and is divided into a training set and a test set; the preprocessing operations comprise deleting the redundant width of the point cloud scene, limiting the whole scene range to fixed values, and randomly flipping the scene along the X axis, randomly rotating it and randomly scaling it.
  3. The three-dimensional object detection method according to claim 1, wherein the preprocessed image data and 3D point cloud data are expanded as follows: the expansion methods for RGB images comprise horizontal flipping, angle transformation, brightness/contrast/color transformation, image blurring and sharpening, Gaussian noise addition and random curling; the expansion methods for the 3D point cloud comprise random world flipping, random world rotation, random world scaling and random image sliding.
  4. The three-dimensional object detection method according to claim 1, wherein the three-dimensional target detection network is constructed as follows: its structure sequentially comprises a batch normalization layer, an activation layer, a 4-layer 3D sparse convolutional neural network, a feature compression layer and a region proposal network (RPN), wherein the normalization layer and the activation layer normalize and activate the voxel features, the 3D sparse convolution layers extract 3D features, and the RPN realizes target detection.
  5. The three-dimensional object detection method according to claim 1, wherein the network model is trained as follows: first, the feature pyramid network (FPN) in the omission searching module is trained by inputting the KITTI 2D target detection dataset; after the trained parameters are obtained, the point clouds and RGB images of the KITTI 3D target detection dataset are input into the whole network model for training, and the trained network model is verified to reach an average accuracy that meets the requirements; the omission predicting module comprises a network for fine detection and a network for virtual point cloud detection: in fine detection, as the network goes deeper, the point cloud of an object is continuously upsampled to enrich its 3D features, and in the virtual point cloud detection network, a sliding window with fixed orientation slides with a fixed step length to accurately locate the target of interest.
  6. A three-dimensional object detection system, comprising: a data acquisition module for acquiring a training data set and preprocessing it; a data expansion module for expanding the preprocessed image data and 3D point cloud data; a voxel grid establishing module for dividing the whole three-dimensional space into equal three-dimensional voxel grids and placing the 3D point cloud into the voxel grids according to coordinate positions, wherein the features of each voxel grid are encoded as the average of the point cloud features inside the grid; a three-dimensional target detection network construction module for constructing a three-dimensional target detection network for the features of the voxel grids and the image data; a training module for training the omission searching module and, after obtaining the trained parameters, taking the point cloud and images of the 3D target detection data set as the input of each channel of the neural network in the three-dimensional target detection network and inputting them into the three-dimensional target detection network model for training; and a detection module for carrying out 3D target detection with the trained three-dimensional target detection network model, which predicts the objects in each sample one by one on the verification set and the test set until all samples are predicted, completing the 3D target detection test; wherein the omission searching module and the omission predicting module are built as follows: the omission searching module sequentially comprises an adaptive average pooling layer, 1 convolution layer, a max pooling layer, 4 residual convolution groups, an activation layer, an average pooling layer, a feature pyramid network (FPN) and an omission searching algorithm, wherein the adaptive average pooling layer adjusts the image size, the convolution layer, the max pooling layer and the residual convolution groups extract 2D features from RGB images, the activation layer adds nonlinear factors, the feature pyramid network predicts targets from the 2D features, and the omission searching algorithm searches for objects missed by the three-dimensional target detection network; the omission predicting module contains two prediction networks: a. a fine prediction module, which extracts fine features for prediction and sequentially comprises a point cloud projection module, 3 upsampling layers, 2 interpolation layers, a feature compression layer, 6 2D convolution layers, 3 2D deconvolution layers, a feature splicing layer and a final 2D convolution layer, wherein the projection module screens the point clouds of missed objects, the upsampling and interpolation layers add features to sparse point clouds, the feature compression layer compresses the 3D features into 2D features, the 6 2D convolution layers extract the 2D features, the 3 2D deconvolution layers align the features, the feature splicing layer splices the features, and the final 2D convolution layer realizes target detection; b. a virtual point cloud prediction module, which sequentially comprises a virtual point cloud generation algorithm, an orientation prediction algorithm and a sliding-window-based target detection network, wherein the orientation prediction algorithm predicts the orientation of a 3D object from a 2D detection result, the virtual point cloud generation algorithm generates a conical (frustum) point cloud of a missed object, and the sliding-window-based target detection network realizes target detection; the three-dimensional scene range of the virtual point cloud generation algorithm is x ∈ [0, 70], y ∈ [-40, 40], z ∈ [-3, 3]; 140 × 160 × 6 = 134400 virtual points are first generated, then all virtual points are screened by projection to obtain the frustum point clouds of the missed objects; the sliding-window-based target detection network realizes target detection with a sliding window whose length, width and height are [3.9, 1.6, 1.56] and whose direction is determined by the orientation prediction algorithm.
  7. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the three-dimensional object detection method according to any one of claims 1 to 5 when executing the computer program.
  8. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the three-dimensional object detection method according to any one of claims 1 to 5.
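As a worked check of the virtual-point grid arithmetic in claims 1 and 6: sampling the scene range x ∈ [0, 70], y ∈ [-40, 40], z ∈ [-3, 3] at steps of 0.5 m in x and y and 1 m in z (assumed step sizes; the claims fix only the 140 × 160 × 6 count) yields exactly 134400 virtual points.

```python
import numpy as np

# Scene range from the claims; step sizes are an assumption consistent
# with the claimed 140 x 160 x 6 grid.
xs = np.arange(0.0, 70.0, 0.5)    # 140 samples along x
ys = np.arange(-40.0, 40.0, 0.5)  # 160 samples along y
zs = np.arange(-3.0, 3.0, 1.0)    # 6 samples along z

# Cartesian product of the three axes -> one (x, y, z) row per virtual point.
grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)
virtual_points = grid.reshape(-1, 3)  # shape (134400, 3)
```

These candidate points would then be screened by projection into the 2D box of a missed object to form its frustum point cloud.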

Description

Three-dimensional target detection method and related device

Technical Field

The invention belongs to the technical field of computer vision and pattern recognition, and particularly relates to a three-dimensional target detection method and a related device.

Background

Three-dimensional object detection (3D Object Detection) is the basis for environment perception and behavior decision-making by unmanned vehicles and robots. It detects the surrounding scene information, namely the related information (position, direction, size and the like) of the surrounding environment (roads, vehicles, pedestrians and the like); the main sources of perception information are cameras and Light Detection and Ranging (LiDAR) sensors. There are two main technical routes for three-dimensional target detection: point-based target detection methods and voxel-based target detection methods. Voxel-based three-dimensional object detection overcomes the sampling feature loss and low operation efficiency of point-based methods, but some objects that are clearly visible in the RGB image have almost no point cloud within their labels because of distance or occlusion. In this case, both point-based and voxel-based methods have difficulty predicting object positions, because they can extract almost no features from the point cloud. Because effective 3D features cannot be extracted from the point cloud when detecting occluded or truncated 3D objects, these objects are often missed by the 3D object detection network or predicted with excessive error.

Disclosure of the Invention

The invention aims to provide a three-dimensional target detection method and a related device to solve the problem that detection omissions are caused by the failure to effectively extract the 3D features of an occluded or truncated 3D object.
In order to achieve the above purpose, the present invention adopts the following technical scheme. In a first aspect, the present invention provides a three-dimensional object detection method, including: acquiring a training data set and preprocessing it; expanding the preprocessed image data and 3D point cloud data; dividing the whole three-dimensional space into equal three-dimensional voxel grids, placing the 3D point cloud into the voxel grids according to coordinate positions, and encoding the features of each voxel grid as the average of the point cloud features inside the grid; constructing a three-dimensional target detection network for the features of the voxel grids and the image data; constructing an omission searching module and an omission predicting module; training the omission searching module and, after obtaining the trained parameters, taking the point cloud and the images of the 3D target detection data set as the input of each channel of the neural network and inputting them into the whole network model for training; and carrying out 3D target detection with the trained model, which predicts the objects in each sample one by one on the verification set and the test set until all samples are predicted, completing the 3D target detection test.
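The voxel-encoding step above, where each grid cell is encoded as the mean of the point cloud features it contains, can be sketched with NumPy. This is a minimal illustration; the grid shape, voxel size and feature layout below are assumptions, not values fixed by the patent.

```python
import numpy as np

def voxelize_mean(points, features, voxel_size, range_min, grid_shape):
    """Encode each voxel as the mean of the point features falling inside it."""
    # Map each point's coordinates to an integer 3D voxel index.
    idx3 = ((points - range_min) / voxel_size).astype(np.int64)
    inside = np.all((idx3 >= 0) & (idx3 < grid_shape), axis=1)  # drop out-of-range points
    idx3, feats = idx3[inside], features[inside]
    flat = np.ravel_multi_index(idx3.T, grid_shape)             # 3D index -> flat voxel id
    n_vox = int(np.prod(grid_shape))
    sums = np.zeros((n_vox, features.shape[1]))
    counts = np.zeros(n_vox)
    np.add.at(sums, flat, feats)                                # accumulate per-voxel sums
    np.add.at(counts, flat, 1)
    occupied = counts > 0
    sums[occupied] /= counts[occupied, None]                    # sums -> means
    return sums.reshape(*grid_shape, -1), occupied.reshape(grid_shape)
```

For example, two points with feature values 1 and 3 that fall into the same voxel produce a voxel feature of 2, while empty voxels stay at zero.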
Optionally, a training data set is acquired and preprocessed as follows: the 2D object detection dataset and the 3D object detection dataset of the public KITTI dataset are downloaded, wherein the 2D object detection dataset comprises a training set of 200 samples, and the 3D object detection dataset comprises left RGB images, point clouds, camera parameters and labels and is divided into a training set and a test set; the preprocessing operations comprise deleting the redundant width of the point cloud scene, limiting the whole scene range to fixed values, and randomly flipping the scene along the X axis, randomly rotating it and randomly scaling it. Optionally, the preprocessed image data and 3D point cloud data are expanded as follows: the expansion methods for RGB images comprise horizontal flipping, angle transformation, brightness/contrast/color transformation, image blurring and sharpening, Gaussian noise addition and random curling; the expansion methods for the 3D point cloud comprise random world flipping, random world rotation, random world scaling and random image sliding. Optionally, a three-dimensional object detection network is constructed: its structure sequentially comprises a batch normalization layer, an activation layer, a 4-layer 3D sparse convolutional neural network, a feature compression layer and a region proposal network (RPN), wherein the normalization layer and the activation layer normalize and activate the voxel features, the 3D sparse convolution layers extract 3D features, and the RPN realizes target detection. Optionally, building a