CN-116824320-B - Multi-sensor feature fusion method, system, electronic equipment and storage medium
Abstract
Embodiments of the invention provide a multi-sensor feature fusion method, system, electronic device and storage medium, relating to the technical field of radar-vision fusion. The method comprises: obtaining image BEV (bird's-eye-view) features from image data; obtaining initial point cloud BEV features from point cloud data; converting the initial point cloud BEV features into the coordinate system of the image BEV features to obtain first point cloud BEV features; determining, in that coordinate system, the image BEV features that match the first point cloud BEV features; fusing the first point cloud BEV features with the matched image BEV features to obtain fused BEV features; and inputting the fused BEV features into a target detection head for target detection. The method improves the precision of the fused features during multi-sensor feature fusion and thereby improves the accuracy of target detection.
Inventors
- QIAN XIN
- QIAN SHAOHUA
Assignees
- 重庆长安汽车股份有限公司 (Chongqing Changan Automobile Co., Ltd.)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2023-06-28
Claims (9)
- 1. A multi-sensor feature fusion method, the method comprising: acquiring image BEV features from image data; acquiring initial point cloud BEV features from point cloud data; converting the initial point cloud BEV features into the coordinate system of the image BEV features to obtain first point cloud BEV features; determining, in the coordinate system, the image BEV features that match the first point cloud BEV features, and fusing the first point cloud BEV features with the matched image BEV features to obtain fused BEV features; and inputting the fused BEV features into a target detection head for target detection; wherein determining the matched image BEV features and performing the feature fusion comprises: searching the image BEV features through a search-matching part of a neighborhood search network to obtain, by matching, the image BEV features corresponding to the first point cloud BEV features in the coordinate system; and fusing the first point cloud BEV features with the matched image BEV features through a feature fusion part of the neighborhood search network to obtain the fused BEV features; wherein the searching and matching comprises: performing feature addition on the first point cloud BEV feature and the image BEV feature through the search-matching part, and judging whether the association degree between the BEV features after feature addition exceeds a threshold; if the association degree exceeds the threshold, obtaining a matching pair; if the association degree does not exceed the threshold, performing iterative learning to determine an offset of the first point cloud BEV feature; determining an offset first point cloud BEV feature from the first point cloud BEV feature and the offset; performing feature addition on the offset first point cloud BEV feature and the image BEV feature until the association degree is judged to exceed the threshold, thereby obtaining the matching pair; and determining the image BEV feature in the matching pair to be the image BEV feature that matches, in the coordinate system, the first point cloud BEV feature in the matching pair.
- 2. The multi-sensor feature fusion method of claim 1, wherein fusing the first point cloud BEV feature with the matched image BEV feature through the feature fusion part of the neighborhood search network to obtain the fused BEV feature comprises: sequentially inputting the feature-addition result of the matching pair into a 3×3 convolution kernel, an encoder and a decoder through the feature fusion part to obtain a first output; fusing the first output with the output of the 3×3 convolution kernel to obtain a second output; inputting the second output into a 1×1 convolution kernel for dimension reduction to obtain a third output; screening the third output through a discriminator to obtain a fourth output; and fusing the fourth output with the output of the 3×3 convolution kernel to obtain the fused BEV feature.
- 3. The multi-sensor feature fusion method of claim 1, wherein acquiring image BEV features from image data comprises: acquiring a plurality of pieces of image data captured by a surround-view camera of a vehicle; extracting and enhancing 2D features of the image data through the backbone network Dual-Swin-Tiny and an FPN network in an improved LSS algorithm to obtain multi-view 2D feature maps; and projecting the multi-view 2D feature maps into the vehicle coordinate system of the vehicle to obtain the image BEV features, wherein the vehicle coordinate system is the coordinate system of the image BEV features.
- 4. The multi-sensor feature fusion method of claim 1, wherein acquiring initial point cloud BEV features from point cloud data comprises: acquiring the point cloud data captured by a point cloud device of the vehicle; and converting the point cloud data into the initial point cloud BEV features by a parameterized supervoxel method with learned initial points.
- 5. The method of claim 1, wherein the target detection head comprises at least one of a CenterPoint detection head, a PointPillars detection head, and a TransFusion-L detection head.
- 6. A multi-sensor feature fusion system, the system comprising: an image feature extraction module for acquiring image BEV features from image data; a point cloud feature extraction module for acquiring initial point cloud BEV features from point cloud data; a feature conversion module for converting the initial point cloud BEV features into the coordinate system of the image BEV features to obtain first point cloud BEV features; a feature fusion module for determining, in the coordinate system, the image BEV features that match the first point cloud BEV features, and fusing the first point cloud BEV features with the matched image BEV features to obtain fused BEV features; and a detection head module for inputting the fused BEV features into a target detection head for target detection; wherein the feature fusion module comprises: a feature matching sub-module for searching the image BEV features through a search-matching part of a neighborhood search network to obtain, by matching, the image BEV features corresponding to the first point cloud BEV features in the coordinate system; and a feature fusion sub-module for fusing the first point cloud BEV features with the matched image BEV features through a feature fusion part of the neighborhood search network to obtain the fused BEV features; wherein the feature matching sub-module comprises: a feature addition sub-module for performing feature addition on the first point cloud BEV feature and the image BEV feature through the search-matching part, and judging whether the association degree between the BEV features after feature addition exceeds a threshold; a first threshold determining sub-module for obtaining a matching pair if the association degree exceeds the threshold; a second threshold determining sub-module for performing iterative learning if the association degree does not exceed the threshold, and determining an offset of the first point cloud BEV feature; an offset sub-module for determining an offset first point cloud BEV feature from the first point cloud BEV feature and the offset; a judging sub-module for performing feature addition on the offset first point cloud BEV feature and the image BEV feature until the association degree is judged to exceed the threshold, thereby obtaining the matching pair; and a BEV feature matching sub-module for determining the image BEV feature in the matching pair to be the image BEV feature that matches, in the coordinate system, the first point cloud BEV feature in the matching pair.
- 7. The multi-sensor feature fusion system of claim 6, wherein the feature fusion sub-module comprises: a first output sub-module for sequentially inputting the feature-addition result of the matching pair into a 3×3 convolution kernel, an encoder and a decoder through the feature fusion part to obtain a first output; a second output sub-module for fusing the first output with the output of the 3×3 convolution kernel to obtain a second output; a third output sub-module for inputting the second output into a 1×1 convolution kernel for dimension reduction to obtain a third output; a fourth output sub-module for screening the third output through a discriminator to obtain a fourth output; and a BEV feature fusion sub-module for fusing the fourth output with the output of the 3×3 convolution kernel to obtain the fused BEV feature.
- 8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the multi-sensor feature fusion method of any one of claims 1 to 5.
- 9. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the multi-sensor feature fusion method of any one of claims 1 to 5.
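The iterative search-and-match loop of claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented network: the claim does not fix the feature representation, the "association degree" measure, or the "iterative learning" rule, so this sketch assumes plain feature vectors, cosine similarity as the association degree, and an offset update that nudges the point cloud feature toward the best-matching image BEV feature.

```python
# Minimal sketch of the claim-1 search-and-match loop.
# Assumptions (not fixed by the claim): features are plain vectors, the
# "association degree" is cosine similarity, and "iterative learning"
# moves the offset toward the best-matching image BEV feature.

def cosine(a, b):
    """Association degree between two feature vectors (an assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

def search_match(pc_feat, img_feats, threshold=0.99, step=0.5, max_iter=100):
    """Offset pc_feat until its association with some image BEV feature
    exceeds the threshold; return (matched index, offset pc feature)."""
    offset = [0.0] * len(pc_feat)
    shifted, best = list(pc_feat), 0
    for _ in range(max_iter):
        shifted = [p + o for p, o in zip(pc_feat, offset)]
        scores = [cosine(shifted, f) for f in img_feats]
        best = max(range(len(img_feats)), key=scores.__getitem__)
        if scores[best] > threshold:          # matching pair found
            return best, shifted
        # "iterative learning": move the offset toward the best candidate
        target = img_feats[best]
        offset = [o + step * (t - s) for o, s, t in zip(offset, shifted, target)]
    return best, shifted                      # best effort after max_iter
```

For example, matching the point cloud feature `[1.0, 1.0]` against a single image feature `[0.0, 1.0]` requires several offset updates before the association degree crosses the threshold and the matching pair is returned.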
Description
Multi-sensor feature fusion method, system, electronic equipment and storage medium

Technical Field
Embodiments of the invention relate to the technical field of radar-vision fusion, and in particular to a multi-sensor feature fusion method, system, electronic device and storage medium.

Background
Environmental perception is an important module of autonomous driving. It covers tasks including, but not limited to, 2D/3D object detection, semantic segmentation, depth completion and prediction, all of which rely on raw data collected from the environment by sensors mounted on the vehicle. Sensors commonly used for environmental perception include cameras, lidar and millimeter-wave radar, each with its own advantages and disadvantages. To give a vehicle all-weather detection capability and support functions such as road obstacle detection, lane line detection, target detection, speed measurement and distance measurement, single-modality perception has reached a bottleneck, and most industrial and academic researchers have turned to multi-modal fusion. Current multi-sensor fusion perception methods include early-fusion algorithms that mainly perform data fusion or feature fusion across lidar, camera and millimeter-wave radar. In such early-fusion schemes, however, the vision sensor and the radar sensor operate at different working frequencies, time alignment during feature fusion is difficult, and when the target moves too fast, feature fusion produces large errors. How to improve the precision of the fused features during multi-sensor feature fusion, and thereby the accuracy of target detection, is therefore the technical problem addressed by the invention.
Disclosure of Invention
Embodiments of the invention provide a multi-sensor feature fusion method, system, electronic device and storage medium for improving the precision of the fused features during multi-sensor feature fusion. An embodiment of the invention provides a multi-sensor feature fusion method comprising: acquiring image BEV features from image data; acquiring initial point cloud BEV features from point cloud data; converting the initial point cloud BEV features into the coordinate system of the image BEV features to obtain first point cloud BEV features; determining, in the coordinate system, the image BEV features that match the first point cloud BEV features, and fusing the first point cloud BEV features with the matched image BEV features to obtain fused BEV features; and inputting the fused BEV features into a target detection head for target detection. Optionally, determining the matched image BEV features and performing the feature fusion comprises: searching the image BEV features through a search-matching part of a neighborhood search network to obtain, by matching, the image BEV features corresponding to the first point cloud BEV features in the coordinate system; and fusing the first point cloud BEV features with the matched image BEV features through a feature fusion part of the neighborhood search network to obtain the fused BEV features.
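The wiring of the feature fusion part mentioned above (detailed in claim 2) can be shown as a dataflow sketch. Each stage is passed in as a stand-in callable, because the actual 3×3 and 1×1 convolutions, encoder, decoder, and discriminator are learned modules the patent does not fully specify; treating the "feature fusion" of two intermediate outputs as element-wise addition is likewise an assumption.

```python
# Dataflow of the claim-2 feature-fusion part. Each stage is a
# caller-supplied callable; only the wiring the claim specifies is shown.
# Assumption: fusing two intermediate outputs is modeled as addition.

def fuse_bev(pair_sum, conv3x3, encoder, decoder, conv1x1, discriminator,
             add=lambda a, b: a + b):
    c = conv3x3(pair_sum)          # 3x3 convolution on the feature-addition result
    first = decoder(encoder(c))    # encoder -> decoder: first output
    second = add(first, c)         # fuse first output with the 3x3 conv output
    third = conv1x1(second)        # 1x1 convolution reduces dimension: third output
    fourth = discriminator(third)  # discriminator screens the third output
    return add(fourth, c)          # fused BEV feature
```

With numeric stand-ins (doubling for the 3×3 convolution, halving for the 1×1 convolution, and near-identity functions elsewhere), the call traces the five intermediate outputs of claim 2 in order.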
Optionally, searching the image BEV features through the search-matching part of the neighborhood search network to obtain the matched image BEV features corresponding to the first point cloud BEV features in the coordinate system comprises: performing feature addition on the first point cloud BEV feature and the image BEV feature through the search-matching part, and judging whether the association degree between the BEV features after feature addition exceeds a threshold; if the association degree exceeds the threshold, obtaining a matching pair; if the association degree does not exceed the threshold, performing iterative learning to determine an offset of the first point cloud BEV feature; determining an offset first point cloud BEV feature from the first point cloud BEV feature and the offset; performing feature addition on the offset first point cloud BEV feature and the image BEV feature until the association degree is judged to exceed the threshold, thereby obtaining the matching pair; and determining the image BEV feature in the matching pair to be the image BEV feature that matches, in the coordinate system, the first point cloud BEV feature in the matching pair. Optionally, the feature fusion portion of the neighborhood search network performs feature fusion on the first point cloud BEV feature and the matc