CN-121999464-A - Vehicle-road collaborative sensing method and system based on feature quantization compression and semantic segmentation guidance

CN121999464A

Abstract

The invention discloses a vehicle-road collaborative perception method and system based on feature quantization compression and semantic segmentation guidance, belonging to the technical field of intelligent traffic systems. Multi-scale features are extracted at the roadside; shallow high-resolution features are fed into a feature compression branch that performs dimension reduction and low-bit non-uniform quantization coding adapted to the vehicle-road communication link, generating compressed features, while deep high-semantic features are fed into a semantic segmentation branch that generates a scene semantic mask covering vehicle, pedestrian, and road categories. The vehicle-mounted terminal decompresses the received compressed features to obtain decompressed features, performs pixel-level dynamic weighted fusion of the decompressed features and the semantic mask to generate enhanced roadside perception features, fuses these with the perception features acquired at the vehicle end, performs 3D target detection, and outputs accurate detection results. The invention effectively reduces the volume of vehicle-road communication data, improves perception robustness for occluded and beyond-line-of-sight targets, and achieves an engineering balance between communication efficiency and detection accuracy.

Inventors

  • PENG JIANKUN
  • YAN XINYA
  • ZHOU YANG
  • WANG LEI
  • YU SHUANGZHI

Assignees

  • Southeast University (东南大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-25

Claims (10)

  1. A vehicle-road collaborative sensing method based on feature quantization compression and semantic segmentation guidance, characterized by comprising the following steps: a roadside high-definition perception terminal collects traffic scene images, and a vehicle-road collaborative customized residual feature extraction network, pre-trained on a general traffic scene annotation dataset, extracts and structurally reconstructs multi-scale features; shallow high-resolution features among the multi-scale features are fed into a feature compression branch that performs dimension reduction and vehicle-road communication-adaptive low-bit non-uniform quantization coding to generate compressed features, while deep high-semantic features are fed into a semantic segmentation branch to generate a scene semantic mask covering vehicle, pedestrian, and road categories; the compressed features and the semantic mask are transmitted to the vehicle-mounted terminal over a V2X communication link, and the vehicle-mounted terminal sequentially performs inverse quantization and hardware-adaptive decompression on the received compressed features to obtain reconstructed decompressed features; at the vehicle-mounted terminal, pixel-level dynamic weighted fusion of the decompressed features and the semantic mask is carried out by an adaptive gated fusion unit comprising a double convolution layer and a Sigmoid weight generation unit, generating enhanced roadside perception features; and the enhanced roadside perception features are concatenated along the channel dimension with the perception features acquired by the vehicle-end local front-view camera to generate fused features, 3D target detection is performed on the fused features by a vehicle-end BEV-specific decoding network, and an accurate detection result containing the target category, three-dimensional spatial coordinates, and heading angle is finally output.
  2. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein feeding the shallow high-resolution features among the multi-scale features into the feature compression branch to perform dimension reduction and vehicle-road communication-adaptive low-bit non-uniform quantization coding to generate compressed features, and feeding the deep high-semantic features into the semantic segmentation branch to generate a scene semantic mask covering vehicle, pedestrian, and road categories, comprises the steps of: in the feature compression branch, a roadside heterogeneous-scene channel compression module integrating standard convolution, a batch normalization layer, and a nonlinear activation unit, combined with a hierarchical spatial compression module stacked from multiple groups of standard convolutions, performs bandwidth-adaptive precise dimension reduction on the shallow high-resolution features, and a non-uniform quantization mechanism based on global feature distribution statistics generates low-redundancy, high-efficiency compressed features; in the semantic segmentation branch, multi-scale context modeling is performed on the deep high-semantic features by a multi-receptive-field dilated (atrous) convolution pooling fusion module integrating a 1×1 convolution, three groups of dilated convolutions with different dilation rates, and global average pooling; semantic-spatial information fusion with the shallow detail features is completed through a cross-layer feature aggregation mechanism, generating a high-precision semantic mask at the same resolution as the original image.
  3. The vehicle-road collaborative sensing method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein the processing procedure of the feature compression branch sequentially comprises channel dimension reduction, spatial hierarchical downsampling, and 8-bit quantization coding. An integrated convolution module combining a 3×3 convolution layer, a batch normalization layer, and a ReLU activation layer serves as the channel compressor, proportionally reducing the number of channels of the shallow high-resolution features according to an engineering channel reduction ratio adapted to the vehicle-road collaborative communication link, while keeping the spatial resolution of the feature map unchanged during compression. Multiple groups of standard convolution modules are stacked to form a hierarchical spatial compressor, in which the stride and padding parameters of each group match the spatial downsampling requirement and the numbers of input and output channels of each group remain equal; stepwise hierarchical downsampling is applied to the channel-reduced features according to an engineering spatial downsampling ratio balancing perception accuracy and transmission efficiency. The 8-bit quantization coding adopts a non-uniform quantization mechanism to encode the spatially compressed floating-point features into a low-bit integer format adapted to vehicle-road communication, combined with a value-range cut-off mechanism during quantization to ensure transmission stability.
  4. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 3, wherein encoding the spatially compressed floating-point features into a low-bit integer format adapted to vehicle-road communication by the non-uniform quantization mechanism specifically comprises: computing statistics of the global values of the feature map to determine its distribution interval, and calculating a scaling factor S and an offset Z based on the ranges of the floating-point and fixed-point numbers, according to: S = (R_max − R_min) / (Q_max − Q_min); Z = Q_min − round(R_min / S); quantizing the original floating-point features as: Q = clip(round(R / S + Z), Q_min, Q_max); wherein R_max represents the maximum value over all positions of the feature map, R_min represents the minimum value over all positions of the feature map, Q_max and Q_min represent the upper and lower bounds of the low-bit integer range, round represents the rounding operation, clip represents the value-range cut-off, R represents the original floating-point number, and Q represents the quantized fixed-point number.
  5. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein the processing procedure of the semantic segmentation branch comprises: performing context modeling on the deep features P5 through an atrous spatial pyramid pooling (ASPP) module to obtain enhanced deep feature representations under different receptive fields; upsampling the enhanced deep feature representation by 4× and splicing it with the shallow features P3 along the channel dimension, the spliced features integrating the spatial detail information of the shallow features and the semantic association information of the deep features through a 3×3 convolution block; and restoring the feature map integrated by the 3×3 convolution block to the original image resolution by an upsampling operation, generating a high-resolution semantic mask.
  6. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein the inverse quantization and hardware-adaptive decompression module processing comprises: the inverse quantization: after the vehicle end receives the compressed features, they are restored to a floating-point representation by the inverse quantization operation: R̂ = S × (Q − Z); wherein R̂ represents the recovered floating-point number, Q represents the quantized fixed-point number, Z represents the offset, and S represents the scaling factor; the decompression: after inverse quantization, transposed convolution replaces the convolution operation used in compression to realize upsampling in the spatial decompression stage, and the channel decompression stage uses a convolution block with the same structure as the compressor, adjusting only the numbers of input and output channels.
  7. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein the processing procedure of the adaptive gated fusion unit comprises: aligning and concatenating along the channel dimension the decompressed features f_inf and the semantic mask f_mask, each an input of batch shape (B, C_inf/mask, H, W) adapted to the computing power of the vehicle-mounted terminal, to obtain a concatenated feature f_cat, wherein B is the batch size of the engineering deployment, C_inf/mask is the number of feature/mask channels, and H and W are the spatial height and width matched to the roadside camera resolution; inputting the concatenated feature f_cat into a gating network to generate a gating logit map of shape (B, 1, H, W); the gating network comprises two standard convolution layers that successively reduce the channel count down to a single channel, with a ReLU activation function between the two layers introducing nonlinearity to enhance the representational capability of the model; activating the gating logit map with a Sigmoid function to generate a gating weight map gate, in which the value at each pixel-level coordinate over the spatial height H and width W lies in (0, 1); and fusing the decompressed feature f_inf and the semantic mask f_mask based on the gating weight map gate to generate the enhanced roadside feature F_fused, according to: F_fused = Conv(gate ⊙ f_inf + (1 − gate) ⊙ f_mask); where Conv denotes a 3×3 convolution operation and ⊙ denotes pixel-wise multiplication.
  8. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 1, wherein the 3D target detection adopts a decoder network based on a bird's-eye view, outputs detection results comprising the categories, 3D positions, and orientation angles of traffic participants, and optimizes specific layer parameters during detection using a multi-task loss function.
  9. The vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance according to claim 8, wherein the multi-task loss function comprises a traffic-target classification loss L_cls, a 3D bounding-box regression loss L_bbox, and a target orientation classification loss L_dir, calculated as: L_loss = (1/N)(λ1 · L_cls + λ2 · L_bbox + λ3 · L_dir); wherein L_loss is the multi-task comprehensive loss function, N is the number of valid positive samples, and λ1, λ2, λ3 are engineering balance coefficients for the three loss types adapted to the vehicle-road collaborative 3D detection scene. The classification loss L_cls adopts a hard-sample-focusing classification loss function: L_cls = −α(1 − P_t)^γ · log(P_t); wherein P_t is the class probability of the target prediction box, and α and γ are balance parameters for positive/negative samples and hard samples adapted by the loss function to the traffic target detection scene. The 3D bounding-box regression loss L_bbox adopts a gradient-smooth bounding-box regression loss function: L_bbox = Σ_{b ∈ {x, y, z, l, w, h, θ}} SmoothL1(Δb); wherein Δb is the deviation between the predicted and true values of each 3D bounding-box parameter, x, y, z represent the three-dimensional center position of the target, l, w, h represent the length, width, and height of the target respectively, θ represents the orientation angle of the target, and SmoothL1 represents the gradient-smooth bounding-box regression loss function. The orientation classification loss L_dir adopts a binary cross-entropy loss function: L_dir = −[p · log(p̂) + (1 − p) · log(1 − p̂)]; wherein p is the binary label of the target's true orientation and p̂ is the predicted probability of the forward orientation.
  10. A vehicle-road collaborative perception system based on feature quantization compression and semantic segmentation guidance, characterized by comprising: a roadside perception module for acquiring traffic scene image data through roadside perception equipment, extracting multi-scale features with a vehicle-road collaborative customized residual feature extraction network, and feeding shallow high-resolution features into a feature compression branch and deep high-semantic features into a semantic segmentation branch; a dual-branch coding module for performing channel-spatial dual-dimension collaborative compression and vehicle-road communication-adaptive low-bit quantization on the shallow high-resolution features in the feature compression branch to generate low-redundancy compact features, and completing context modeling of the deep features in the semantic segmentation branch through a multi-receptive-field dilated convolution pooling fusion module; a cross-domain transmission module for stably transmitting the compressed features and the semantic mask to the vehicle end over a V2X communication link; and a vehicle-end fusion detection module for completing inverse quantization and decompression of the compressed features at the vehicle end, realizing dynamic weighted fusion of the decompressed features and the semantic mask through a gated fusion module to generate enhanced roadside features, cascading the enhanced roadside features with the vehicle end's local perception features, completing 3D target detection through a dedicated decoding network, and outputting the result.
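The channel reduction, spatial downsampling, and 8-bit coding of claim 3 together determine the payload sent over the V2X link. As a rough illustration not taken from the patent, the following Python sketch estimates the transmitted size of a float32 feature map under hypothetical reduction ratios; the function name and all ratios are assumptions.

```python
def compressed_size(c, h, w, channel_ratio=4, spatial_ratio=2, bits=8):
    """Estimate the payload after channel reduction, spatial downsampling,
    and low-bit quantization, relative to a float32 (32-bit) feature map.
    channel_ratio / spatial_ratio are illustrative engineering choices,
    not values specified by the patent."""
    orig_bits = c * h * w * 32
    comp_bits = (c // channel_ratio) * (h // spatial_ratio) * (w // spatial_ratio) * bits
    return comp_bits, orig_bits / comp_bits
```

For a 64-channel 128×128 map, a 4× channel reduction, 2× spatial downsampling, and 8-bit coding would shrink the payload by a factor of 64 under these assumed ratios.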
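The quantization of claims 3-4 and the inverse quantization of claim 6 form a round trip: scale and offset are derived from the global min-max statistics of the feature map, values are encoded as clipped low-bit integers, and the vehicle end recovers approximate floating-point values. A minimal pure-Python sketch, assuming an unsigned 8-bit transport range [0, 255] and the standard affine formulation (the patent labels the mechanism "non-uniform" but gives only S, Z, and the rounding step):

```python
Q_MIN, Q_MAX = 0, 255  # assumed unsigned 8-bit fixed-point range

def quantize(features):
    """Encode floating-point features as clipped 8-bit integers.
    Returns the codes plus the scaling factor S and offset Z."""
    r_min, r_max = min(features), max(features)       # global feature statistics
    scale = (r_max - r_min) / (Q_MAX - Q_MIN) or 1.0  # S; guard against a flat map
    zero = round(Q_MIN - r_min / scale)               # Z
    q = [min(Q_MAX, max(Q_MIN, round(r / scale + zero))) for r in features]
    return q, scale, zero

def dequantize(q, scale, zero):
    """Vehicle-end recovery: R_hat = S * (Q - Z)."""
    return [scale * (qi - zero) for qi in q]
```

The value-range cut-off of claim 3 appears here as the min/max clipping inside `quantize`; reconstruction error is bounded by roughly half a quantization step.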
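Claim 5 upsamples the ASPP output 4× before splicing it with the shallow features P3. The patent does not name the interpolation method, so the sketch below uses nearest-neighbour replication purely as a stand-in to show the shape change; a real implementation would likely use bilinear interpolation.

```python
def upsample_nn(fmap, factor=4):
    """Nearest-neighbour upsampling of a 2-D map (a list of rows) by `factor`.
    Stands in for claim 5's 4x upsampling before the cross-layer
    channel concatenation; the interpolation method is an assumption."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]  # repeat along width
        out.extend([wide] * factor)                     # repeat along height
    return out
```

After this step, the deep map matches P3's spatial size, so the two can be concatenated along the channel dimension and passed through the 3×3 convolution block.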
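The pixel-level weighting at the heart of claim 7's gated fusion unit can be sketched in a few lines. This is a simplification under stated assumptions: the two convolution layers that produce the gating logits and the final 3×3 convolution are omitted, and flat lists stand in for (H×W) maps.

```python
import math

def sigmoid(x):
    """Squash a gating logit into the (0, 1) weight range."""
    return 1.0 / (1.0 + math.exp(-x))

def gated_fuse(f_inf, f_mask, gate_logits):
    """Per-pixel dynamic weighting: fused = g * f_inf + (1 - g) * f_mask.
    gate_logits would come from the claim's two-convolution gating network,
    which is omitted here (assumption: we receive its logits directly)."""
    out = []
    for a, b, logit in zip(f_inf, f_mask, gate_logits):
        g = sigmoid(logit)  # gate weight in (0, 1)
        out.append(g * a + (1.0 - g) * b)
    return out
```

A zero logit weights the decompressed feature and the mask equally; a large positive logit lets the decompressed roadside feature dominate that pixel.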
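The three terms of claim 9's multi-task loss (focal-style classification, smooth-L1 box regression over the seven box parameters, and binary cross-entropy for orientation) can be written out directly. The balance coefficients and the α, γ defaults below are illustrative assumptions, not the patent's tuned values.

```python
import math

def focal_loss(p_t, alpha=0.25, gamma=2.0):
    """Hard-sample-focusing classification loss: -alpha*(1-p_t)^gamma*log(p_t).
    alpha/gamma defaults are illustrative, not from the patent."""
    return -alpha * (1.0 - p_t) ** gamma * math.log(p_t)

def smooth_l1(d):
    """Gradient-smooth regression term for one box-parameter deviation d."""
    d = abs(d)
    return 0.5 * d * d if d < 1.0 else d - 0.5

def bce(p, p_hat):
    """Binary cross-entropy for the orientation head."""
    return -(p * math.log(p_hat) + (1.0 - p) * math.log(1.0 - p_hat))

def multitask_loss(p_t, box_devs, p, p_hat, n_pos, l1=1.0, l2=2.0, l3=0.2):
    """L_loss = (1/N) * (l1*L_cls + l2*L_bbox + l3*L_dir).
    box_devs holds the deviations over (x, y, z, l, w, h, theta);
    the lambda coefficients here are placeholders."""
    l_cls = focal_loss(p_t)
    l_bbox = sum(smooth_l1(d) for d in box_devs)
    l_dir = bce(p, p_hat)
    return (l1 * l_cls + l2 * l_bbox + l3 * l_dir) / max(n_pos, 1)
```

Note how the focal term shrinks rapidly as the predicted probability of a sample approaches 1, which is what concentrates training on hard samples.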

Description

Vehicle-road collaborative sensing method and system based on feature quantization compression and semantic segmentation guidance

Technical Field

The invention relates to a vehicle-road collaborative awareness method and system based on feature quantization compression and semantic segmentation guidance, and belongs to the technical field of intelligent traffic systems.

Background

With the gradual popularization and application of automatic driving technology, cooperative interaction between vehicle and infrastructure (V2I) has become a key technical path for extending vehicle perception capability and enhancing the overall efficiency of intelligent transportation systems. By collaboratively processing the multi-source heterogeneous data acquired by the front-mounted visual sensor at the vehicle end and the high-resolution visual equipment at the roadside, V2I technology can markedly enlarge the vehicle's sensing range and fill the perception blind areas that a single vehicle-mounted sensor leaves due to viewing-angle limitations, thereby effectively improving the accuracy of three-dimensional target detection in complex urban traffic scenes (such as intersections and busy arterial roads).
At present, engineering schemes in the field of vehicle-road collaborative 3D target perception fall mainly into two categories. The first is multi-source information fusion based on lidar point clouds, which improves perception accuracy for distant and occluded targets through precise point-cloud registration and stitching; however, it suffers from the engineering drawbacks of high hardware purchase, operation, and maintenance costs and the absence of semantic information in the acquired point clouds, making it difficult to meet large-scale road-network deployment requirements. The second is fusion based on visual perception features, which completes joint representation modeling of multi-view visual features by means of attention mechanisms or graph neural networks; however, it faces the technical bottleneck of converting two-dimensional visual information into three-dimensional spatial information, and its perception accuracy degrades markedly in real traffic scenes with vehicle occlusion or complex lighting and shadows. Meanwhile, existing vehicle-road collaborative awareness schemes are generally not custom-adapted to the heterogeneous characteristics of fixed wide-angle roadside perception equipment and mobile narrow-angle vehicle-mounted perception equipment, and the temporal asynchrony and viewpoint differences between roadside and vehicle-end data significantly degrade the overall stability of the feature fusion process. In addition, the bandwidth constraint of the vehicle-road communication link is a core bottleneck for large-scale engineering deployment of V2X technology: directly transmitting high-dimensional raw visual features easily causes engineering-level faults such as link congestion and system response lag, making it difficult to meet the real-time requirements of vehicle-mounted safety systems.
Most existing schemes adopt scene-level or instance-level compression. The former transmits global semantic information and offers broad environmental coverage, but carries large information redundancy and poor interpretability; the latter focuses on key targets and achieves high communication efficiency, but easily loses context and geometric relationships, making it difficult to maintain accuracy in complex scenes. Balancing detection accuracy and representational completeness under controllable communication cost therefore remains an open challenge.

Disclosure of Invention

The invention aims to solve the bottleneck problem that existing vehicle-road collaborative perception systems struggle to balance communication efficiency against semantic information completeness, and creatively proposes an efficient perception strategy integrating feature quantization reduction and semantic segmentation guidance. To solve the above technical problems, the invention is realized by the following technical scheme. In a first aspect, the invention discloses a vehicle-road collaborative awareness method based on feature quantization compression and semantic segmentation guidance, comprising the following steps: the roadside high-definition perception terminal collects traffic scene images; a vehicle-road collaborative customized residual feature extraction network, pre-trained on a general traffic scene annotation dataset, extracts and structurally reconstructs multi-scale features; shallow high-resolution features among the multi-scale features are fed into a feature compression branch to perform dimension reduction and vehicle-road communication-adaptive low-bit non-uniform quantization coding