CN-122017792-A - Laser radar three-dimensional target detection method based on coarse-to-fine network


Abstract

The invention provides a laser radar three-dimensional target detection method based on a coarse-to-fine network. The method first constructs a coarse positioning network based on a Transformer network structure to locate the interval of the effective echo pulse signal in one-dimensional time, and then designs a fine positioning network based on an encoder-decoder structure to perform pulse peak positioning with higher temporal resolution on the high signal-to-noise-ratio signal interval extracted by the coarse positioning network. The coarse positioning network removes, to the greatest extent, noise signal segments unrelated to the effective echo pulse, thereby reducing the influence of noise on the fine positioning network and improving both the signal-to-noise ratio of the echo data and the three-dimensional target detection accuracy of the system.

Inventors

GUO SHANGWEI
LIU XIAODONG
FENG LUMING
LU KUNFENG
DU HUAJUN
HUANG ZHIWEI
JIA CHENHUI

Assignees

Beijing Aerospace Automatic Control Institute (北京航天自动控制研究所)

Dates

Publication Date: 20260512
Application Date: 20251231

Claims (20)

1. A laser radar three-dimensional target detection method based on a coarse-to-fine network, characterized by comprising the following steps: (1) designing a coarse positioning network based on a Transformer; (2) designing a loss function for the coarse positioning network; (3) designing a fine positioning network based on an encoder-decoder structure; (4) designing a loss function for the fine positioning network; and (5) designing a coarse-to-fine laser radar three-dimensional target detection network.
2. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 1, wherein the coarse positioning network mainly comprises a local feature embedding module and a Transformer-based feature embedding module, wherein the local feature embedding module extracts local detail features of the captured time-of-arrival data matrix, and the Transformer-based feature embedding module extracts non-local spatio-temporal features.
3. The method of claim 2, wherein the local feature embedding module comprises a 3D convolution layer, a max pooling layer that downsamples the time channel by a factor of 2, and $N_C$ basic feature embedding modules, of which the last $N_C - 1$ modules each downsample the time channel by a factor of 2; in each basic feature embedding module, 3D convolution kernels are used to extract local features from the time-of-arrival data matrix; and the local feature embedding module processes the input tensor and outputs high-dimensional embedded local features, where $C_e$ denotes the dimension of the embedded features.
4. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 2, wherein the Transformer-based feature embedding module processes the high-dimensional embedded local features output by the local feature embedding module and outputs an effective-interval positioning probability tensor; a 2D spatial downsampling layer first processes the input high-dimensional local features $F_e$ to obtain local features $F_{e0}$ whose spatial resolution is reduced by a factor of 8; the second stage of the module then processes the local features $F_{e0}$ through four Transformer layers to extract their non-local spatio-temporal features; each Transformer layer, which is based on the self-attention mechanism, first processes its input $F_{in}$ to obtain a normalized attention matrix, wherein the query matrix $Q$, the key matrix $K$ and the value matrix $V$ are each obtained by a linear transformation of $F_{in}$.
5. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 4, wherein the linear transformation is given by $Q = F_{in} W_q$, $K = F_{in} W_k$ and $V = F_{in} W_v$, wherein $W_q$, $W_k$ and $W_v$ are each a learnable linear transformation layer, $N$ denotes the total number of features over all temporal and spatial positions, and $C_e$ denotes the feature embedding dimension; the attention matrix is then obtained by matrix multiplication of the query matrix $Q$ and the transposed key matrix $K^T$, that is, $\tilde{A} = Q K^T$, wherein $\tilde{A}$ is the attention matrix.
6. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 5, wherein, to ensure numerical stability, the attention matrix is normalized, the normalization following the formula: where $\alpha_{i,j}$ denotes the normalized weight in the attention matrix.
7. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 6, wherein the Transformer layer then performs feature aggregation using the normalized attention matrix $A$ and the value matrix $V$ to obtain the self-attention features $F_{sa}$, wherein the $i$-th row vector of $F_{sa}$ is the weighted sum $\sum_{k} \alpha_{i,k} v_k$ of the row vectors $v_k$ of $V$; the coefficient $\alpha_{i,k}$ of the normalized attention weight matrix $A$ represents the importance of the feature vector at the $k$-th position to the feature vector at the $i$-th position; and because the value matrix $V$ contains the feature vectors of all temporal and spatial positions, the self-attention mechanism has a global receptive field and can extract non-local spatial and temporal correlation features.
8. The method of claim 7, wherein, finally, the self-attention features $F_{sa}$ are processed by a feed-forward layer and added to the input features $F_{in}$ through a residual connection to generate the output features of the Transformer layer, according to the formula $F_{out} = \mathrm{LBR}(F_{sa}) + F_{in}$, wherein LBR denotes the feed-forward layer; the four Transformer layers sequentially output four features $F_{e1}$, $F_{e2}$, $F_{e3}$ and $F_{e4}$ of different semantic levels, all of the same dimensions; in the transition from $F_{e1}$ to $F_{e4}$, the proportion of local information gradually decreases while the proportion of non-local spatio-temporal features increases; the features of the four semantic levels are processed by a multi-level feature fusion up-sampling module to generate the effective-interval positioning probability tensor; and in this tensor, the element at pixel $(i, j)$ and index $k$ denotes the probability that the effective echo pulse in the echo signal of the $(i, j)$ pixel channel lies in the interval $[(k-1)T_C, kT_C]$.
9. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to any one of claims 1-8, wherein the ground-truth classification label $t_C$ of the coarse positioning network is obtained by the following formula: wherein $R$ denotes the ground-truth depth of the target area, $\Delta t$ denotes the time resolution, $c$ denotes the speed of light, and $N_c$ denotes the number of downsampling layers along the time dimension in the feature extraction process of the local feature embedding module; this parameter determines the number of classification categories $K$ of the coarse interval positioning network.
10. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 9, wherein, during training of the coarse positioning network, a cross-entropy loss function is used to measure the difference between the distribution predicted by the model and the ground-truth label; specifically, the ground-truth label $t_C$ is first converted into a one-hot label by a one-hot encoding strategy, and the cross-entropy loss between the effective-interval positioning probability predicted by the coarse positioning network and the one-hot ground-truth label is then calculated by the following formula:
11. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 10, wherein, in order to guarantee the spatial smoothness of the classification result, the following total variation loss is introduced: in the calculation of the total variation loss, because the argmax function is not differentiable, a soft-argmax function is adopted to generate $C_{i,j}$, a differentiable approximation of the class with the maximum probability; finally, the optimization objective of the coarse positioning network is the weighted sum of the cross-entropy loss and the total variation loss, given by the following formula:
12. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 11, wherein, using the classification result of the coarse positioning network, a cropping (interception) operation extracts the effective echo pulse signal segments from the noisy time-of-arrival data matrix to construct an effective time-of-arrival data matrix; for the $(i, j)$-th pixel, if the positioning interval class predicted by the coarse positioning network is $k$, the signal data in the time interval from $(k-1)T_c$ to $kT_c$ is cropped from the $(i, j)$-th pixel of $P_r$ to construct the effective time-of-arrival data matrix; the size of the effective time-of-arrival data matrix is therefore $(H, W, T_c)$, and, compared with the original time-of-arrival data matrix $P_r$, the effective time-of-arrival data matrix in theory contains a higher proportion of echo pulse data.
13. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to any one of claims 1-12, wherein a fine positioning network is constructed to locate the peak position of the echo pulse in the time-of-arrival data matrix and thereby determine the time of flight of the pulse, so as to achieve three-dimensional detection.
14. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 13, wherein the fine positioning network is mainly composed of an encoder responsible for deep feature extraction and a decoder responsible for decoding the high-dimensional embedded features; the network takes the effective time-of-arrival data matrix as input and outputs a peak positioning probability tensor; and in the output tensor, the element at pixel $(i, j)$ and index $k$ denotes the probability that the echo pulse peak at the $(i, j)$ pixel location is at time point $k$.
15. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 14, wherein the encoder mainly adopts a lightweight residual connection structure for deep feature extraction and is structurally similar to the local feature embedding module of the coarse positioning network, and encodes the input tensor into high-dimensional embedded features; the decoder is constructed by alternately stacking 3D deconvolution layers and ReLU activation layers, and decodes and upsamples the high-dimensional embedded features $F_e$ output by the encoder to output the predicted peak positioning probability tensor; and in this tensor, the element at pixel $(i, j)$ and index $k$ denotes the probability that the echo pulse peak at the $(i, j)$ pixel location is at time point $k$.
16. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 15, wherein a position encoding module and a temporal attention module are introduced between the convolution layers of the fine positioning network so that an accurate echo signal arrival time is output; the position encoding module passes the position information predicted by the coarse positioning network to the fine positioning network, so that the fine positioning network can identify the change in echo pulse position caused by the cropping operation; and the temporal attention module guides the fine positioning network to attend preferentially to effective echo pulse information, thereby further improving the feature extraction and peak positioning accuracy of the laser radar.
17. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 16, wherein the input of the position encoding module is the classification result output by the coarse positioning network, given by the following formula: wherein the classification probability output by the coarse positioning network is a probability tensor, and $F_{CR}$ is the classification result of size $(H, W)$; the position encoding module comprises a 3D convolution layer with a kernel size of (3, 1) and a stride of (1, 1), which embeds the classification result $F_{CR}$ into the high-dimensional feature space of the fine positioning network; the position encoding module outputs a position encoding feature $F_{PE}$ of size $(C, H, W, 1)$, wherein $C$ denotes the dimension of the high-dimensional feature space; the position encoding feature $F_{PE}$ is fused by direct addition with the input features of each convolution layer in the encoder and the decoder, so as to pass on the position change information caused by the cropping operation; because the cropping operation is performed on the basis of the interval positioning result $F_{CR}$ output by the coarse positioning network, the positioning result $F_{CR}$ reflects the change in feature distribution along the time channel caused by the cropping operation; and because the position encoding module receives the positioning result $F_{CR}$ as input, the change in feature distribution along the time dimension caused by the cropping operation can be associated with the input feature distribution of the high-dimensional feature space, which ensures that the extracted features remain accurately spatially correlated.
18. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to claim 17, wherein a temporal attention module is further constructed to guide the fine positioning network to pay more attention to the feature information of the effective echo pulse; the temporal attention module first reshapes $F_{in}$ of size $(c, h, w, t)$ into a tensor of size $(c, t, h, w)$, and obtains a tensor $F_4$ of size $(c, t, h, w)$ through three 2D convolution layers; it then obtains a temporal attention weight tensor $F_{score}$ by applying a sigmoid normalization operation to $F_4$ along the time dimension, and reshapes $F_{score}$ to size $(c, h, w, t)$; finally, the output feature of the temporal attention module is obtained by element-wise multiplication of $F_{score}$ and $F_{in}$, that is, $F_{out} = F_{score} \cdot F_{in}$, wherein the size of the output feature $F_{out}$ is $(c, h, w, t)$, identical to the size of the input tensor $F_{in}$; the module performs deep feature extraction on the 1D time-domain signals through 2D convolution kernels of size (1, …), which act as a 1D convolution operation, and at the same time obtains the attention weight $F_{score}$ for each feature channel based on the distribution of its time-domain signal.
19. The method according to any one of claims 1-17, wherein the ground-truth classification label $t_F$ of the fine positioning network is obtained by the following formula: wherein the probability tensor denotes the effective-interval probability predicted by the coarse positioning network, $R$ denotes the ground-truth depth of the target area, $\Delta t$ is the time interval size, $c$ denotes the speed of light, and $N_c$ denotes the number of downsampling layers along the time dimension in the feature extraction process of the local feature embedding module; during training of the fine positioning network, a cross-entropy loss function is used to measure the difference between the distribution predicted by the model and the ground-truth label; the ground truth $t_F$ is re-encoded into a one-hot label by a one-hot encoding strategy, and the cross-entropy loss between the output peak positioning probability and the one-hot ground-truth label is calculated by the following formula: in order to guarantee the spatial smoothness of the classification result of the fine positioning network, the following total variation loss is introduced: finally, the optimization objective of the fine positioning network is the weighted sum of the cross-entropy loss and the total variation loss, given by the following formula:
20. The laser radar three-dimensional target detection method based on the coarse-to-fine network according to any one of claims 1 to 18, wherein a coarse-to-fine cascaded neural network for laser radar three-dimensional target detection is constructed by connecting the coarse positioning neural network and the fine positioning neural network in series, and the processing of the echo data of the target area received by the laser radar follows the following formula: first, the coarse positioning network $\mathrm{Coarse}(\cdot \mid \theta_C)$ predicts the time interval of the effective echo pulse from the input signal $P_r$, its output denoting the probability that the effective echo pulse of the $(i, j)$-th pixel channel is located in the $k$-th time interval; then, according to the positioning probability output by the coarse positioning network, the cropping operation $\mathrm{Crop}(\cdot)$ extracts the time interval of the effective echo pulse from the original signal $P_r$ to construct effective echo pulse data with a smaller proportion of noise and a higher signal-to-noise ratio; further, the fine positioning network $\mathrm{Fine}(\cdot \mid \theta_F)$ performs peak positioning with higher temporal resolution on the effective echo pulse data and outputs the predicted echo pulse arrival time, thereby completing the conversion from the noisy signal $P_r$ to a noise-free signal and realizing denoising of the laser radar echo data; and by applying the $\mathrm{Argmax}(\cdot)$ maximum-index positioning operation to the prediction result, the laser radar system accurately determines the arrival time of the echo pulse, so as to realize accurate three-dimensional target detection.
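The self-attention computation recited in claims 4 to 8 can be illustrated with a minimal pure-Python sketch. It is not the patented implementation: the normalization is assumed to be a row-wise softmax (the claim text does not reproduce the normalization formula), the LBR feed-forward layer is replaced by an identity stand-in, and the helper names `matmul`, `transpose`, `softmax_rows` and `self_attention_layer` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax (assumed normalization); subtracting the row max avoids overflow."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def self_attention_layer(f_in, w_q, w_k, w_v, feed_forward=lambda x: x):
    """One layer: F_out = LBR(A V) + F_in, with A the normalized form of Q K^T.

    f_in has N rows (all space-time positions) and C_e columns.
    feed_forward is an identity stand-in for the LBR layer.
    """
    q, k, v = matmul(f_in, w_q), matmul(f_in, w_k), matmul(f_in, w_v)
    attention = softmax_rows(matmul(q, transpose(k)))  # N x N, each row sums to 1
    f_sa = matmul(attention, v)  # row i is sum_k alpha[i][k] * v[k]
    lbr = feed_forward(f_sa)
    f_out = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(lbr, f_in)]  # residual
    return f_out, attention


# Toy input: N = 3 positions, C_e = 2 channels, identity projections.
identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [[0.0, 0.0], [0.0, 0.0]]
f_in = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
f_out, attention = self_attention_layer(f_in, identity, identity, identity)
# With zero query/key projections every position attends uniformly to all others.
f_out_uniform, attention_uniform = self_attention_layer(f_in, zeros, zeros, identity)
```

Because every row of the attention matrix mixes the value vectors of all N positions, each output row depends on the whole space-time volume, which is the global receptive field that claim 7 attributes to the mechanism.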

Description

Laser radar three-dimensional target detection method based on coarse-to-fine network

Technical Field

The invention belongs to the field of target identification and detection, and particularly relates to a laser radar three-dimensional target detection method based on a coarse-to-fine network.

Background

In a laser radar (LiDAR) imaging system, a laser emits pulses of a particular wavelength to illuminate a target region. The echo signals reflected by the target region are received by an area-array photodetector, which converts the optical signals into electrical signals. The system obtains the time of flight of the laser pulse by analyzing the time difference between the echo pulse electrical signal and the emitted pulse, and finally calculates the target distance from the speed of light. Compared with visible-light imaging and microwave radar detection, laser radar can detect the reflectivity and the three-dimensional structure of a target region at the same time. It offers high three-dimensional target detection accuracy, fast dynamic response and strong resistance to interference, and has broad application prospects in fields such as remote sensing measurement, identification of concealed and camouflaged targets, and precision guidance on the military battlefield. The laser radar system obtains the time difference between laser pulse emission and reception by analyzing the echo signal data, and from it measures the distance of the target scene to complete three-dimensional detection of the target.
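The time-of-flight ranging described above reduces to R = c·t/2, since the pulse covers the target distance twice. A minimal sketch follows; the function names are hypothetical, and the bin-based helper assumes that time bin 0 starts at the emission instant and that the start of the bin is taken as the arrival time.

```python
# Speed of light in vacuum, in metres per second (exact SI value).
SPEED_OF_LIGHT = 299_792_458.0


def range_from_time_of_flight(tof_seconds: float) -> float:
    """Target distance in metres for a round-trip pulse time of flight."""
    # The pulse travels to the target and back, hence the factor 1/2.
    return SPEED_OF_LIGHT * tof_seconds / 2.0


def range_from_time_bin(bin_index: int, bin_width_seconds: float) -> float:
    """Distance for an echo detected in a given time bin (hypothetical helper).

    Assumes bin 0 starts at the emission instant and uses the bin start time.
    """
    return range_from_time_of_flight(bin_index * bin_width_seconds)


# A 1 microsecond round trip corresponds to roughly 149.9 m.
distance_m = range_from_time_of_flight(1e-6)
```

Under these assumptions a time resolution of 1 ns corresponds to a range resolution of about 0.15 m per bin, which is why locating the echo peak to within a single bin matters for detection accuracy.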
When detecting long-distance targets or the weak echo signals of small targets, the signal-to-noise ratio of the echo signals reflected by the target is extremely low because of environmental background noise and system hardware noise, and the signal waveform deviates from its ideal distribution. The detection system therefore has difficulty determining the arrival time of the echo pulse, which poses a great challenge to subsequent accurate three-dimensional target detection. In order to achieve accurate three-dimensional detection of long-distance weak targets by a laser radar, the invention provides a laser radar three-dimensional target detection method based on a coarse-to-fine network. The method first builds a coarse positioning network based on a Transformer network structure to locate the interval of the effective echo pulse signal in one-dimensional time, and then designs a fine positioning network based on an encoder-decoder structure to perform pulse peak positioning with higher temporal resolution on the high signal-to-noise-ratio signal interval extracted by the coarse positioning network. The coarse positioning network removes, to the greatest extent, noise signal segments unrelated to the effective echo pulse, thereby reducing the influence of noise on the fine positioning network and improving both the signal-to-noise ratio of the echo data and the three-dimensional target detection accuracy of the system.

Disclosure of Invention

In order to achieve accurate three-dimensional detection of long-distance weak targets by a laser radar and to reduce the interference of invalid noise with the feature fitting process of the neural network, the invention provides a laser radar three-dimensional target detection method based on a coarse-to-fine network.
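The coarse-to-fine idea can be sketched for a single pixel as follows. This is an illustration under stated assumptions rather than the patented networks: the coarse class and the fine peak index are stand-ins for network outputs, the class index is taken as 1-based so that class k covers the interval from (k-1)·T_c to k·T_c, and the names `crop_effective_window`, `argmax`, `global_peak_bin` and the toy histogram are hypothetical.

```python
def crop_effective_window(signal, coarse_class, window_len):
    """Return the samples of one pixel's time series inside the coarse interval.

    coarse_class is 1-based: class k covers samples (k-1)*window_len up to
    k*window_len - 1.
    """
    start = (coarse_class - 1) * window_len
    return signal[start:start + window_len]


def argmax(values):
    """Index of the largest value (the first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def global_peak_bin(coarse_class, fine_index, window_len):
    """Map a peak index inside the cropped window back to the full time axis."""
    return (coarse_class - 1) * window_len + fine_index


T_C = 4  # hypothetical window length in samples
# Toy photon-count histogram for one pixel: 16 time bins, echo peak at bin 10.
signal = [0, 1, 0, 0, 1, 0, 0, 1, 0, 2, 9, 3, 0, 0, 1, 0]

coarse_class = 3  # stand-in for the coarse positioning network's prediction
window = crop_effective_window(signal, coarse_class, T_C)  # [0, 2, 9, 3]
fine_index = argmax(window)  # stand-in for the fine network's argmax: 2
peak_bin = global_peak_bin(coarse_class, fine_index, T_C)  # 8 + 2 = 10

# Share of the counts that belong to the peak bin, before and after cropping.
peak_share_full = signal[peak_bin] / sum(signal)
peak_share_cropped = window[fine_index] / sum(window)
```

In this toy histogram the peak bin holds 9 of 18 counts over the full time axis but 9 of 14 counts inside the cropped window, which illustrates how discarding out-of-interval bins raises the proportion of echo data passed to the fine stage.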
The method first builds a coarse positioning network based on a Transformer network structure, and uses the global feature extraction capability of the Transformer to mine the non-local spatio-temporal features of the time-of-arrival data matrix, so as to accurately locate the interval of the effective echo pulse signal in the one-dimensional time domain. Then, by cropping the located echo pulse interval, an effective time-of-arrival data matrix is constructed, which eliminates the interference of invalid noise in the time channel and improves the signal-to-noise ratio of the echo pulse. A fine positioning network based on an encoder-decoder structure is further designed to perform pulse peak positioning with higher temporal resolution on the high signal-to-noise-ratio effective time-of-arrival data matrix extracted by the coarse positioning network. In this coarse-to-fine process of locating the arrival time of the echo pulse peak, the coarse positioning network removes, to the greatest extent, noise signal segments unrelated to the effective echo pulse, thereby reducing the influence of noise on the feature fitting process of the fine positioning network and improving its echo pulse peak positioning accuracy. Finally, the predicted output of the coarse positioning network is converted into three-dimensional scene information with low range resolution through the laser pulse time-of-flight formula, the predicted output of the fine positioning ne