CN-115512319-B - Multimodal collaborative detection method and system based on a heterogeneous graph network
Abstract
The invention provides a multimodal collaborative detection method and system based on a heterogeneous graph network, belonging to the technical field of target detection, comprising the following steps: each agent extracts BEV (bird's-eye-view) features from its point cloud and images, respectively; the multimodal BEV features of a plurality of agents are generated and transmitted to a central vehicle; the multimodal BEV features of the agents are fused at the node level and the semantic level by a heterogeneous graph method to obtain new cooperative features; and the central vehicle performs target detection based on the new cooperative features to obtain a final detection result. The invention adopts a multimodal single-stage fusion detection model whose detection accuracy is markedly superior to that of a single-modality single-stage detection model; through the feature fusion of the heterogeneous collaboration graph, the perception field of view of a single vehicle is greatly enlarged and the perception information is enriched, so that the collaborative perception performance is improved.
Inventors
- Zhang Hui
- Li Yadong
- Cao Yuanzhouhan
- Han Yushan
- Jin Yi
- Chen Naiyue
Assignees
- Beijing Jiaotong University (北京交通大学)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2022-09-15
Claims (7)
- 1. A multimodal collaborative detection method based on a heterogeneous graph network, characterized by comprising the following steps: an agent extracts BEV features from its point cloud and its images, respectively; a plurality of agents transmit the BEV features respectively extracted from point clouds and images to a central vehicle; the multimodal BEV features of the multiple agents are fused at the node level and the semantic level by a heterogeneous graph method to obtain new cooperative features, which comprises: setting meta-paths of the heterogeneous collaboration graph for the different agents; performing attention-mechanism feature fusion at the node level within each meta-path according to the meta-paths; performing attention-mechanism feature fusion at the semantic level between different meta-paths; and taking the features output by the heterogeneous collaboration graph as the fused features of the central vehicle; wherein performing attention-mechanism feature fusion at the node level within each meta-path comprises: inputting the multimodal features of the multi-source heterogeneous agents into the heterogeneous graph and first performing node-level attention fusion within each meta-path; for one meta-path $\phi$, a type-specific transformation matrix $M_\phi$ is first designed, and the feature $h_i$ of node $i$ is transformed as $h_i' = M_\phi \cdot h_i$, where $h_i'$ is the projected feature; after converting the features, self-attention is used to compute the weights between node features within the meta-path: given a node pair $(i, j)$ within meta-path $\phi$, the node-level attention $e_{ij}^{\phi}$ learns the importance of node $j$ to node $i$, computed by self-attention as $e_{ij}^{\phi} = \mathrm{att}_{node}(h_i', h_j'; \phi)$, wherein $\mathrm{att}_{node}$ denotes the node-level self-attention operation; the structural information of the graph is injected into the model so that a weight is computed only for the current edges, and the regularized weight is obtained through softmax: $\alpha_{ij}^{\phi} = \mathrm{softmax}_j(e_{ij}^{\phi}) = \dfrac{\exp\!\left(\sigma\!\left(a_\phi^{\mathrm T}[\,h_i' \,\|\, h_j'\,]\right)\right)}{\sum_{k \in N_i^{\phi}} \exp\!\left(\sigma\!\left(a_\phi^{\mathrm T}[\,h_i' \,\|\, h_k'\,]\right)\right)}$, wherein $\|$ is the splicing (concatenation) operation, $a_\phi$ is the attention vector of meta-path $\phi$, and $\sigma$ is an activation function; with the regularized weights obtained, the neighbor features are aggregated by the following formula: $z_i^{\phi} = \sigma\bigl(\sum_{j \in N_i^{\phi}} \alpha_{ij}^{\phi} \cdot h_j'\bigr)$, wherein $z_i^{\phi}$ is the feature of node $i$ learned by node-level fusion on the current meta-path $\phi$; performing attention-mechanism feature fusion at the semantic level between different meta-paths comprises: since different nodes in the heterogeneous graph contain multiple semantics, feature fusion within a single meta-path can only consider the semantics of the current path; to consider the multiple semantics of the nodes more comprehensively, multiple meta-paths are fused; given the node-level fused output features, the weight between meta-paths is calculated with the following formula: $w_{\phi} = \frac{1}{|V|}\sum_{i \in V} q^{\mathrm T} \cdot \tanh\!\left(W \cdot z_i^{\phi} + b\right)$, wherein $q$, $W$ and $b$, like $a_\phi$, are parameters of the self-attention operation; and the central vehicle performs target detection based on the new cooperative features to obtain a final detection result.
- 2. The multimodal collaborative detection method based on a heterogeneous graph network according to claim 1, wherein the agent extracting BEV features from the point cloud and the images, respectively, comprises: taking the bird's-eye view as the common representation into which both modalities are converted; assuming the scene shares objects of several categories, with n autonomous vehicles and m road-side infrastructures, for the point cloud data of each agent i (i = 1, 2, 3, ..., n+m), extracting the three-dimensional point cloud data with a point cloud feature extractor and converting them into a two-dimensional bird's-eye-view feature; obtaining the two-dimensional bird's-eye-view features generated from the multiple images by projection with an image feature extractor; and dividing the point cloud range into grids to generate the anchor boxes of single-stage target detection for the final region extraction.
- 3. The multimodal collaborative detection method based on a heterogeneous graph network according to claim 2, wherein the plurality of agents transmitting the generated multimodal BEV features to the central vehicle comprises: reducing the transmission bandwidth by feature compression; the feature of each agent is compressed into a lower-dimensional feature before transmission, and the compressed feature is decoded at the central-vehicle end to obtain a feature consistent with the original feature size.
- 4. A multimodal collaborative detection system based on a heterogeneous graph network, based on the method of any one of claims 1-3, comprising: an extraction module, used for each agent to extract BEV features from its point cloud and images, respectively; a transmission module, used for a plurality of agents to transmit the BEV features extracted from point clouds and images to the central vehicle; a fusion module, used for fusing the multimodal BEV features of the multiple agents at the node level and the semantic level by the heterogeneous graph method to obtain new cooperative features; and a detection module, used for the central vehicle to perform target detection based on the new cooperative features to obtain a final detection result.
- 5. A non-transitory computer-readable storage medium storing computer instructions which, when executed by a processor, implement the multimodal collaborative detection method based on a heterogeneous graph network according to any one of claims 1-3.
- 6. A computer program product comprising a computer program which, when run on one or more processors, implements the multimodal collaborative detection method based on a heterogeneous graph network according to any one of claims 1-3.
- 7. An electronic device comprising a processor, a memory and a computer program, wherein the processor is connected to the memory, the computer program is stored in the memory, and when the electronic device runs, the processor executes the computer program stored in the memory to cause the electronic device to execute instructions implementing the multimodal collaborative detection method based on a heterogeneous graph network according to any one of claims 1-3.
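The node-level and semantic-level attention fusion recited in claim 1 follows the pattern of heterogeneous graph attention. The following is an illustrative, non-authoritative sketch only: all variable names, tensor shapes, and the LeakyReLU/tanh activation choices are assumptions of this sketch, not the patent's own notation or implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def node_level_attention(H, M_phi, a_phi, adj):
    """Node-level attention fusion within one meta-path (HAN-style sketch).

    H: (N, d_in) agent features; M_phi: (d_in, d) type-specific projection;
    a_phi: (2*d,) attention vector; adj: (N, N) 0/1 edge mask of the meta-path."""
    Hp = H @ M_phi                                  # h'_i = M_phi . h_i
    N, d = Hp.shape
    # e_ij = LeakyReLU(a_phi^T [h'_i || h'_j]) for every ordered node pair
    left = np.repeat(Hp, N, axis=0)                 # (N*N, d), row i*N+j -> i
    right = np.tile(Hp, (N, 1))                     # (N*N, d), row i*N+j -> j
    e = (np.concatenate([left, right], axis=1) @ a_phi).reshape(N, N)
    e = np.where(e > 0, e, 0.2 * e)                 # LeakyReLU activation
    e = np.where(adj > 0, e, -1e9)                  # weights only on current edges
    alpha = softmax(e, axis=1)                      # regularize via softmax
    return np.tanh(alpha @ Hp)                      # aggregate neighbor features

def semantic_level_attention(Zs, W, b, q):
    """Semantic-level fusion across meta-paths.

    Zs: list of (N, d) node-level outputs; W: (d, ds); b: (ds,); q: (ds,)."""
    # w_phi = (1/|V|) * sum_i q^T tanh(W z_i^phi + b)
    w = np.array([np.mean(np.tanh(Z @ W + b) @ q) for Z in Zs])
    beta = softmax(w)                               # one weight per meta-path
    Z = sum(bp * Zp for bp, Zp in zip(beta, Zs))    # weighted sum of paths
    return Z, beta

# toy example: 4 agents, two meta-paths (e.g. point-cloud and image features)
N, d_in, d, ds = 4, 8, 6, 5
H = rng.normal(size=(N, d_in))
adj = np.ones((N, N))                               # fully connected meta-path
Z_pc = node_level_attention(H, rng.normal(size=(d_in, d)),
                            rng.normal(size=2 * d), adj)
Z_img = node_level_attention(H, rng.normal(size=(d_in, d)),
                             rng.normal(size=2 * d), adj)
Z, beta = semantic_level_attention([Z_pc, Z_img], rng.normal(size=(d, ds)),
                                   rng.normal(size=ds), rng.normal(size=ds))
```

In this sketch the edge mask plays the role of "injecting the structural information of the graph": attention logits on non-edges are driven to a large negative value so the softmax assigns them effectively zero weight.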
Description
Multimodal collaborative detection method and system based on a heterogeneous graph network Technical Field The invention relates to the technical field of target detection, and in particular to a multimodal collaborative detection method and system based on a heterogeneous graph network. Background Target detection is an important research direction in the field of autonomous driving vision, where vehicles are also known as agents in autonomous driving scenes. Traditional target detection is single-agent target detection based on vehicle-mounted sensors. However, because targets are occluded and vehicle-mounted sensors have limitations, blind zones exist in single-vehicle detection, and a good detection result often cannot be obtained. To address the challenges of single-vehicle target detection, collaborative target detection has emerged. Collaborative target detection is a detection method based on multi-agent information fusion, realized by adding a multi-agent collaboration module to a traditional target detection framework. In an autonomous driving scene, multiple autonomous vehicles and road infrastructures are present on the road and are equipped with sensors such as lidar and RGB cameras; the blind zone of one vehicle may lie in the detection area of other agents, and by transmitting the target information observed by other vehicles and by the infrastructure to the central vehicle, the central vehicle can obtain a more comprehensive field of view and thus complete more accurate target detection. Collaborative target detection can be classified into vehicle-vehicle collaboration and vehicle-road collaboration according to the types of agents participating in the collaboration.
The solution ideas of vehicle-vehicle collaboration and vehicle-road collaboration are roughly the same; the difference is that in vehicle-road collaboration the agents carry different sensor types, so the problem of heterogeneous data sources must be considered. In addition, real-time application of the vehicle-vehicle collaboration method cannot be guaranteed because of unpredictable vehicle dynamics, while the view provided by the infrastructure, whose position is fixed, is not necessarily usable by an autonomous vehicle. To make up for the deficiencies of the two single tasks in real scenes, combining the two is more beneficial for providing a complete view to the autonomous vehicle. Collaborative target detection methods can be discussed in terms of both the collaboration stage and the fusion strategy. The collaboration stage refers to the stage of target detection into which the collaboration module is inserted; according to the collaboration stage, collaborative target detection methods can be divided into three types: data-level collaboration, feature-level collaboration and decision-level collaboration. Data-level collaboration fuses the agents' raw observation data, feature-level collaboration fuses the agents' data features, and decision-level collaboration fuses the agents' final detection data. The fusion strategy refers to the specific fusion computation of the collaboration module and can be divided into simple fusion, feature-based fusion and graph-based fusion.
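The simple-fusion family named above can be illustrated with a minimal sketch, contrasting the three elementary strategies on multi-agent BEV feature maps. The function names and tensor shapes here are hypothetical, chosen only for illustration and not taken from the patent:

```python
import numpy as np

def fuse_mean(feats):
    # simple fusion: element-wise average of the agents' BEV feature maps
    return np.mean(feats, axis=0)

def fuse_max(feats):
    # simple fusion: element-wise maximum across agents
    return np.max(feats, axis=0)

def fuse_concat(feats):
    # simple fusion: channel-wise concatenation (channel count grows with agents)
    return np.concatenate(feats, axis=0)

# three agents, each holding a hypothetical (C=4, H=8, W=8) BEV feature map
feats = np.stack([np.random.default_rng(i).normal(size=(4, 8, 8))
                  for i in range(3)])
fused_mean = fuse_mean(feats)          # shape stays (4, 8, 8)
fused_max = fuse_max(feats)            # shape stays (4, 8, 8)
fused_cat = fuse_concat(list(feats))   # channels grow to (12, 8, 8)
```

Mean and max fusion keep the feature size fixed regardless of how many agents participate, while concatenation preserves every agent's features at the cost of a channel count that scales with the number of agents; graph-based fusion, as adopted by the invention, instead learns per-agent weights.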
Simple fusion adopts strategies such as averaging, maximization and concatenation; feature-based fusion selects the vehicles with the highest relevance; graph-based fusion constructs the multi-vehicle collaboration process as a graph and fuses the information of multiple agents through graph learning. Data-level collaboration transfers raw data and brings excessive bandwidth pressure, while the detection results of decision-level collaboration lose some target information, so to keep the balance between accuracy and bandwidth, the invention chooses a target detection method based on feature-level collaboration. Depending on the sensor type, perception tasks can be divided into image-based perception, point-cloud-based perception and multimodal perception. RGB images feature clear semantics and dense pixels but suffer from a small field of view and target occlusion, while lidar features wide coverage but sparse point clouds. 3D target detection for a single vehicle generally adopts multimodal data as model input to make up for the deficiency of single-modality perception. However, the existing collaborative perception methods are all based on point clouds, and a multimodal fusion method has not yet been considered. Most existing collaborative perception methods are aimed at vehicle-vehicle collaboration tasks and adopt feature-level fusion to achieve the balance of bandwidth and accuracy, wherein a method based on graph learning and attention