DE-102025100001-A1 - Collaborative perception system for generating a cooperative perceptual map from a bird's-eye view

Abstract

A collaborative perception system for generating a cooperative bird's-eye view perception map based on bird's-eye view perception data collected from a plurality of vehicles comprises one or more central computers wirelessly connected to one or more controllers on each of the plurality of vehicles in an environment. The one or more central computers execute instructions to reconstruct lost features, generating a corresponding repaired feature map for each of the plurality of vehicles, and to compute an initial cross-attention map and a temporal attention map. The one or more central computers fuse the temporal attention map and the initial cross-attention map to produce a fused bird's-eye view attention map and generate the cooperative bird's-eye view perception map based on this fused bird's-eye view attention map.

Inventors

  • Ruiyang Zhu
  • Shuqing Zeng
  • Fan Bai
  • Zhuoqing Morley Mao

Assignees

  • GM Global Technology Operations LLC
  • THE REGENTS OF THE UNIVERSITY OF MICHIGAN

Dates

Publication Date
2026-05-13
Application Date
2025-01-01
Priority Date
2024-11-13

Claims (10)

  1. Collaborative perception system for generating a cooperative bird's-eye view perception map based on bird's-eye view perception data collected from a plurality of vehicles, wherein the collaborative perception system comprises: one or more central computers wirelessly connected to one or more controllers on each of the plurality of vehicles located in an environment, wherein the one or more central computers execute instructions to: receive an individual bird's-eye view feature map from each of the plurality of vehicles; perform a lost feature reconstruction to reconstruct one or more lost feature indices within the individual bird's-eye view feature map for each of the plurality of vehicles in order to generate a plurality of corresponding repaired feature maps for each of the plurality of vehicles; address spatial misalignments within a first individual bird's-eye view feature map from an ego vehicle based on the plurality of corresponding repaired feature maps for each of the plurality of vehicles in order to generate an initial cross-attention map, wherein the first individual bird's-eye view feature map from the ego vehicle is based on a current time step; compute a temporal attention map by transforming a second individual bird's-eye view feature map from the ego vehicle, based on a previous time step, from the previous time step to the current time step based on a difference between a first ego vehicle pose and a second ego vehicle pose to generate a temporally aligned bird's-eye view feature map, and then performing deformable attention on the temporally aligned bird's-eye view feature map and the first individual bird's-eye view feature map; fuse the temporal attention map and the initial cross-attention map to generate a fused bird's-eye view attention map; and generate the cooperative bird's-eye view perception map based on the fused bird's-eye view attention map.
  2. Collaborative perception system according to Claim 1, wherein the one or more central computers contain a masked auto-encoder network with an encoder and a decoder.
  3. Collaborative perception system according to Claim 2, wherein the one or more central computers execute instructions to: decompose each of the individual bird's-eye view feature maps into a plurality of patches, each patch being sized to contain one or more feature vectors of the individual bird's-eye view feature map.
  4. Collaborative perception system according to Claim 3, wherein the one or more central computers execute instructions to: learn, by means of the encoder of the masked auto-encoder network, characteristics of undamaged patches that are part of the individual bird's-eye view feature map and that omit the one or more lost feature indices; and restore, by means of the decoder of the masked auto-encoder network, remaining patches of the individual bird's-eye view feature map that contain the one or more lost feature indices, based on the characteristics of the undamaged patches learned by means of the encoder, in order to produce the corresponding repaired feature map for each of the plurality of vehicles.
  5. Collaborative perception system according to Claim 3, wherein the size of each patch is based on the level of detail required by the collaborative perception system and the amount of computing power available from the one or more central computers.
  6. Collaborative perception system according to Claim 1, wherein the one or more central computers determine the initial cross-attention map by: comparing each feature vector located within the first individual bird's-eye view feature map with a predefined number of equivalent individual feature vectors located within each of the plurality of corresponding repaired feature maps for each of the plurality of vehicles to determine an attention weight; and computing a unique cross-attention map corresponding to each of the predefined number of equivalent individual feature vectors, each individual feature vector of each unique cross-attention map representing a unique attention weight.
  7. Collaborative perception system according to Claim 6, wherein the attention weight represents a similarity between a given feature vector located within the first individual bird's-eye view feature map and an equivalent individual feature vector located within a corresponding repaired feature map.
  8. Collaborative perception system according to Claim 6, wherein the one or more central computers determine the initial cross-attention map by: comparing the attention weights corresponding to each feature vector across each of the unique cross-attention maps at each specific position within the unique cross-attention maps to determine a maximum attention weight; and assigning the attention weight of the feature vector with the maximum attention weight to the feature vector within the initial cross-attention map at the same specific position.
  9. Collaborative perception system according to Claim 1, wherein the one or more controllers of the plurality of vehicles are wirelessly connected to one another over a vehicle-to-everything (V2X) communication network.
  10. Collaborative perception system according to Claim 6, wherein the one or more central computers fuse the temporal attention map and the initial cross-attention map to produce the fused bird's-eye view attention map by: comparing attention weights corresponding to each feature vector within the initial cross-attention map with a corresponding feature vector located at the same specific position within the temporal attention map to determine a maximum attention weight; and assigning the attention weight of the feature vector with the maximum attention weight to the feature vector within the fused bird's-eye view attention map that has the same specific position (an illustrative sketch of this maximum-based fusion follows the claims).
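
The maximum-based selection recited in Claims 8 and 10 lends itself to a short illustration. The following Python sketch is a minimal, assumption-laden example, not the patented implementation: it assumes each attention map is an H x W grid of scalar attention weights, and the function and variable names (fuse_attention_maps, unique_cross_maps, temporal_map) are hypothetical.

    import numpy as np

    def fuse_attention_maps(unique_cross_maps, temporal_map):
        """Per-position maximum fusion of attention maps (cf. Claims 8 and 10)."""
        # Claim 8: collapse the stack of unique cross-attention maps (one per
        # repaired feature map that the ego features were compared against)
        # by keeping the maximum attention weight at each grid position.
        initial_cross = np.max(np.stack(unique_cross_maps, axis=0), axis=0)
        # Claim 10: fuse with the temporal attention map in the same way.
        fused = np.maximum(initial_cross, temporal_map)
        return initial_cross, fused

    # Toy usage: three collaborating vehicles, a 4 x 4 attention grid.
    rng = np.random.default_rng(0)
    cross_maps = [rng.random((4, 4)) for _ in range(3)]
    temporal_map = rng.random((4, 4))
    initial_cross, fused = fuse_attention_maps(cross_maps, temporal_map)

Taking the position-wise maximum lets whichever source, a collaborator's repaired view or the ego vehicle's own temporally aligned history, contribute its strongest evidence for each cell of the fused map.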

Description

Introduction

The present disclosure relates to a collaborative perception system for generating a cooperative bird's-eye view perception map based on bird's-eye view perception data collected from a plurality of vehicles.

An autonomous vehicle performs various tasks, including but not limited to perception, localization, mapping, path planning, decision-making, and motion control. For example, an autonomous vehicle may have perception sensors to collect data about its surroundings. However, objects in the environment may sometimes not be seen or detected by the sensing capabilities of an autonomous vehicle for various reasons. One approach to mitigating these problems involves sharing partial perception data between multiple vehicles over a wireless network to generate a map. However, several challenges can arise when attempting to merge or fuse perception data to create such a map. Specifically, the perception data shared between vehicles may exhibit significant misalignment due to localization and synchronization errors. Furthermore, perception data loss can occur for a variety of reasons, including but not limited to unreliable or lossy networks, channel noise, packet collisions, malicious hacking, and environmental interference, further exacerbating the problems encountered when attempting to fuse the perception data. For example, lossy communication, which is sometimes encountered in a vehicle-to-vehicle (V2V) network, leads to the loss of network packets. Thus, although current perception systems fulfill their intended purpose, there is a need for an improved approach to sharing perception data between vehicles.

Summary

According to several aspects, a collaborative perception system for generating a cooperative bird's-eye view perception map based on bird's-eye view perception data collected from a plurality of vehicles is disclosed. The collaborative perception system comprises one or more central computers wirelessly connected to one or more controllers on each of the plurality of vehicles in an environment. The one or more central computers execute instructions to receive an individual bird's-eye view feature map from each of the plurality of vehicles and to perform lost feature reconstruction in order to reconstruct one or more lost feature indices within the individual bird's-eye view feature map for each of the plurality of vehicles, thereby generating a plurality of corresponding repaired feature maps for each of the plurality of vehicles.

The one or more central computers address spatial misalignments within a first individual bird's-eye view feature map from an ego vehicle based on the plurality of corresponding repaired feature maps for each of the plurality of vehicles to generate an initial cross-attention map, wherein the first individual bird's-eye view feature map from the ego vehicle is based on a current time step. The one or more central computers compute a temporal attention map by transforming a second individual bird's-eye view feature map, based on a previous time step, from the previous time step to the current time step based on a difference between a first ego vehicle pose and a second ego vehicle pose to generate a temporally aligned bird's-eye view feature map, and then performing deformable attention on the temporally aligned bird's-eye view feature map and the first individual bird's-eye view feature map.
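
To make the temporal alignment step concrete, the following Python sketch warps the previous time step's bird's-eye view feature map into the current ego frame. It is an illustrative assumption, not the disclosed implementation: it assumes planar (x, y, yaw) ego poses, a square BEV grid centred on the ego vehicle, and nearest-neighbour resampling, and it omits the subsequent deformable attention step; all names (align_previous_bev, cell_size) are hypothetical.

    import numpy as np

    def align_previous_bev(prev_bev, pose_prev, pose_curr, cell_size=0.5):
        """Resample the previous BEV feature map into the current ego frame.

        prev_bev:  (H, W, C) feature grid centred on the ego vehicle at t-1.
        pose_*:    (x, y, yaw) world pose of the ego vehicle at t-1 and t.
        Nearest-neighbour lookup; cells with no valid source stay zero.
        """
        H, W, _ = prev_bev.shape
        aligned = np.zeros_like(prev_bev)

        # Relative pose taking points from the current ego frame to the t-1 frame.
        dyaw = pose_curr[2] - pose_prev[2]
        c, s = np.cos(pose_prev[2]), np.sin(pose_prev[2])
        wx, wy = pose_curr[0] - pose_prev[0], pose_curr[1] - pose_prev[1]
        tx, ty = c * wx + s * wy, -s * wx + c * wy  # translation in the t-1 frame

        rows, cols = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        # Metric coordinates of each cell centre in the current ego frame.
        x = (cols - W / 2) * cell_size
        y = (rows - H / 2) * cell_size
        # The same points expressed in the t-1 ego frame.
        xp = np.cos(dyaw) * x - np.sin(dyaw) * y + tx
        yp = np.sin(dyaw) * x + np.cos(dyaw) * y + ty
        # Convert back to source grid indices and copy valid cells.
        src_c = np.round(xp / cell_size + W / 2).astype(int)
        src_r = np.round(yp / cell_size + H / 2).astype(int)
        valid = (src_r >= 0) & (src_r < H) & (src_c >= 0) & (src_c < W)
        aligned[rows[valid], cols[valid]] = prev_bev[src_r[valid], src_c[valid]]
        return aligned

With the previous map expressed in the current frame, the system can then attend between the aligned history and the current feature map; the disclosure uses deformable attention for that step.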
The one or more central computers fuse the temporal attention map and the initial cross-attention map to create a fused bird's-eye view attention map, and generate the cooperative bird's-eye view perception map based on the fused bird's-eye view attention map.

In another aspect, the one or more central computers contain a masked auto-encoder network with an encoder and a decoder. In yet another aspect, the one or more central computers execute instructions to decompose each of the individual bird's-eye view feature maps into a plurality of patches, each patch being sized to contain one or more feature vectors of the individual bird's-eye view feature map. In one aspect, the one or more central computers execute instructions to learn, using the encoder of the masked auto-encoder network, characteristics of undamaged patches that are part of the individual bird's-eye view feature map and that omit the one or more lost feature indices, and, using the decoder of the masked auto-encoder network, to restore the remaining patches of the individual bird's-eye view feature map that contain the one or more lost feature indices, based on the characteristics of the undamaged patches learned by the encoder, in order to generate the corresponding repaired feature map for each of the plurality of vehicles. In another aspect, the size of each patch is based on the level of detail required by the collaborative perception system and the amount of computing power available from the one or more central computers.
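
The masked auto-encoder aspect described above can be sketched compactly. The PyTorch module below is a minimal hypothetical example (the class name BEVPatchMAE and all parameters are invented for illustration): the encoder consumes only the undamaged patches, and the decoder restores the patches whose feature indices were lost, mirroring the encoder/decoder roles described in the summary.

    import torch
    import torch.nn as nn

    class BEVPatchMAE(nn.Module):
        """Minimal masked auto-encoder over BEV feature-map patches."""

        def __init__(self, patch_dim, embed_dim=128, n_heads=4):
            super().__init__()
            self.embed = nn.Linear(patch_dim, embed_dim)
            self.encoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True),
                num_layers=2)
            self.mask_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
            self.decoder = nn.TransformerEncoder(
                nn.TransformerEncoderLayer(embed_dim, n_heads, batch_first=True),
                num_layers=1)
            self.head = nn.Linear(embed_dim, patch_dim)

        def forward(self, patches, lost_mask):
            # patches: (N, patch_dim) for one vehicle; lost_mask: (N,) bool,
            # True where a patch's feature indices were lost in transit.
            tokens = self.embed(patches).unsqueeze(0)      # (1, N, E)
            encoded = self.encoder(tokens[:, ~lost_mask])  # undamaged patches only
            # Rebuild the full sequence: encoded tokens where data survived,
            # a learned mask token where packets were lost.
            full = self.mask_token.repeat(1, patches.shape[0], 1)
            full[:, ~lost_mask] = encoded
            decoded = self.head(self.decoder(full)).squeeze(0)
            # Keep received patches; take reconstructions only for lost ones.
            return torch.where(lost_mask.unsqueeze(-1), decoded, patches)

    # Toy usage: 64 patches of dimension 32, the first 10 lost to packet drops.
    mae = BEVPatchMAE(patch_dim=32)
    patches = torch.randn(64, 32)
    lost = torch.zeros(64, dtype=torch.bool)
    lost[:10] = True
    repaired = mae(patches, lost)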