CN-116844132-B - Millimeter wave radar radio frequency image target detection method based on semantic mask
Abstract
The invention discloses a semantic-mask-based method for detecting targets in millimeter wave radar radio frequency images. The method comprises: performing a fast Fourier transform on radar signals to obtain radio frequency images and using them as training data; constructing a neural network; forward-propagating the training data set through the neural network and obtaining two predicted label values through a main decoder and an auxiliary decoder respectively; determining a main loss function and an auxiliary loss function from the ground-truth label data and the two predicted label values; obtaining a final loss function as the weighted sum of the main loss function and the auxiliary loss function; back-propagating the final loss function to update the parameters of the neural network, yielding a trained neural network; and predicting on and evaluating the prediction data set with the trained neural network. The auxiliary semantic loss specifically strengthens the extraction of semantic features from the radio frequency image, which yields higher detection accuracy.
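The weighted combination of the two decoder losses can be illustrated with a minimal sketch. It assumes binary cross-entropy for both terms (the loss named in claim 3) and that both decoders are compared against the same ground-truth labels; the function names and the weights `w_main` and `w_aux` are hypothetical, since the source does not give the weighting values.

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def final_loss(main_pred, aux_pred, target, w_main=1.0, w_aux=0.5):
    """Weighted sum of the main-decoder and auxiliary-decoder losses.

    w_main and w_aux are illustrative values, not taken from the source.
    """
    return w_main * bce_loss(main_pred, target) + w_aux * bce_loss(aux_pred, target)

# Flattened toy confidence values for four pixels (made up for illustration).
target = [1.0, 0.0, 0.0, 1.0]
main_pred = [0.9, 0.1, 0.2, 0.8]
aux_pred = [0.6, 0.4, 0.5, 0.7]
loss = final_loss(main_pred, aux_pred, target)
```

In a real training setup the same sum would be formed on tensors so that back-propagation sends gradients through both decoders into the shared encoder.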
Inventors
- LIU JUN
- WU YUZHI
Assignees
- University of Science and Technology of China (中国科学技术大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20230620
Claims (5)
- 1. A millimeter wave radar radio frequency image target detection method based on semantic mask, used for automatic driving, characterized in that the method comprises the following steps: performing a fast Fourier transform on radar signals to obtain radio frequency images, and taking the radio frequency images as training data; constructing a neural network; forward-propagating the training data set through the neural network, and obtaining two predicted label values through a main decoder and an auxiliary decoder respectively; determining a main loss function and an auxiliary loss function respectively from the ground-truth label data and the two predicted label values; obtaining a final loss function as a weighted sum of the main loss function and the auxiliary loss function; back-propagating the final loss function to update the parameters of the neural network, obtaining a trained neural network; and predicting on and evaluating the prediction data set with the trained neural network; wherein constructing the neural network specifically comprises sequentially constructing, in the neural network, a 3D embedding layer module, a space-time fusion module, a window Transformer module, a semantic mask module, a 3D convolution downsampling module, a main decoder Transformer module, a 3D expansion module and an auxiliary decoder detection head; wherein the 3D embedding layer module is constructed as follows: the input radio frequency image sequence is expressed as […], wherein the input sizes are respectively […]; two successive sets of 3D convolutions with convolution kernels of size […] and strides of […] and […] respectively are applied, with Group Normalization (GN) and a ReLU activation function after each 3D convolution layer, and the final result is […], in which the dimensions are […]; […], […]; the window Transformer module with space-time fusion is constructed as follows: the feature tensor […] passes through Layer Normalization (LN) and is then input to PATCHSHIFT for space-time fusion to obtain […]; then, according to the set window […], the dimensions are split into […], the result is input into a multi-head attention layer, and the dimensions are merged to obtain […]; the result then passes through a residual structure and is sent in turn into a normalization layer and a multi-layer perceptron (MLP) to obtain a new feature tensor […], while the Query, Key and Value are simultaneously output as the encoder outputs; in the window Transformer module with space-time fusion, the window is set as […], the input feature tensor is […], […] is obtained through space-time fusion, […] is obtained through dimension splitting, […] is obtained after the multi-head self-attention layer and dimension transformation, and finally a new feature tensor […] is obtained through normalization and the MLP; […], […]; the semantic mask module is constructed as follows: the first three dimensions […] of the feature tensor […] are merged, and the result is input into two types of linear layers respectively to obtain […], […] and […]; Q is then dimension-transformed into […] and directly output as the semantic mask feature; Q, K and V are input into the self-attention layer as the Query, Key and Value respectively to interact, the result then passes through linear layer 2 and is multiplied by a trainable parameter λ, and the final output feature […] is obtained through a residual structure and dimension transformation; the 3D convolution downsampling module is constructed as follows: the input feature tensor […] passes through a 3D convolution layer with input dimension […], output dimension […], convolution kernel size […], stride […] and padding […], then through a BN layer and a ReLU layer, then through a 3D convolution layer with input dimension […], output dimension […], convolution kernel size […], stride […] and padding […], and then through a GN layer and a ReLU layer to obtain the final output […]; […], […]; the main decoder Transformer module is constructed as follows: its inputs are the feature tensor […] and the received outputs […] and […] of the same-layer encoder; according to the corresponding set window […], […] is dimension-split into […], input into a multi-head self-attention layer, and the dimensions are merged to obtain […]; […] is taken as the Query, the received same-layer encoder outputs […] and […] are input into the multi-head attention layer as the Key and Value, and the dimensions are merged to obtain […]; the weighted sum of […] and […] is added to the input […], and the result is sent in turn into a normalization layer and a multi-layer perceptron (MLP) to obtain a new feature tensor […].
- 2. The method for detecting a millimeter wave radar radio frequency image target based on semantic mask according to claim 1, wherein performing the fast Fourier transform on the radar signals to obtain the radio frequency images used as training data specifically comprises: applying a fast Fourier transform (FFT) to the radar signal to estimate the range of the reflections; then using a low-pass filter (LPF) to remove high-frequency noise across all chirps in each frame, at a rate of 30 FPS; and, after the LPF, applying a second FFT to the signal across the different receiver antennas to estimate the azimuth angle of the reflections, thereby obtaining the final radio frequency image.
- 3. The method for detecting the millimeter wave radar radio frequency image target based on the semantic mask according to claim 2, wherein determining the main loss function and the auxiliary loss function respectively from the ground-truth label data and the two predicted label values specifically comprises: applying a binary cross-entropy loss (BCELoss) to the predicted label value output by the main decoder and the ground-truth label data (Ground Truth) provided by the data set to determine the result of the main loss function […]; applying a binary cross-entropy loss (BCELoss) to the predicted label value output by the auxiliary decoder and the ground-truth label data to determine the result of the auxiliary loss function […]; and taking the weighted sum of the main loss function result […] and the auxiliary loss function result […] as the final loss function.
- 4. The method for detecting millimeter wave radar radio frequency image targets based on semantic masks according to claim 3, wherein back-propagating the final loss function through the parameters of the neural network to obtain the trained neural network specifically comprises: defining the maximum number of training epochs […], the training batch size (batchsize) […], the total number of training samples […], and the number of batches trained in each epoch […]; optimizing all parameters of the network with an Adam optimizer, with the initial learning rate set to […]; reading in one batch of training data, feeding it to the network for forward propagation, computing the loss function, and back-propagating; judging whether the total number of training steps […] is divisible by the number of batches trained in one epoch […], and if so, reading the training data from the beginning next time, and if not, continuing to read the following training data next time; and judging whether the total number of training steps […] is equal to […], and if so, ending the training to obtain the trained neural network, and otherwise continuing to train the neural network.
- 5. The method for detecting the millimeter wave radar radio frequency image target based on the semantic mask according to claim 4, wherein predicting on and evaluating the prediction data set with the trained neural network specifically comprises: obtaining a set of confidence maps (ConfMaps) from the trained neural network, whose coordinates are range and azimuth angle and in which each channel represents a specific class label, for example class 1, class 2 and class 3 denoting pedestrians, bicycles and automobiles respectively, the value at a pixel in the channel of class cls representing the probability that an object of class cls appears at that range-azimuth position; then performing non-maximum suppression under a set object location similarity (OLS) threshold; judging under a set confidence threshold whether each pixel is a positive or a negative sample, a pixel above the confidence threshold being judged a positive sample and a pixel below it a negative sample; and finally calculating the average precision (AP), where cls is a natural number larger than 1.
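The preprocessing recited in claim 2 (range FFT, low-pass filtering across chirps, azimuth FFT) can be sketched as follows. This is a simplified illustration, not the patented implementation: the data layout `frame[chirp][antenna][sample]`, the use of plain averaging as the low-pass filter, the naive DFT standing in for an FFT, and the synthetic single-target signal are all assumptions. The 30 FPS frame rate and the mapping from azimuth bin to physical angle are not modeled.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (O(n^2)); stands in for an FFT routine."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def rf_image(frame):
    """frame[chirp][antenna][sample] -> magnitude image[range_bin][azimuth_bin]."""
    n_chirps, n_ant = len(frame), len(frame[0])
    # 1) Range FFT along the fast-time samples of every chirp and antenna.
    rng = [[dft(frame[c][a]) for a in range(n_ant)] for c in range(n_chirps)]
    n_rng = len(rng[0][0])
    # 2) Low-pass filter across chirps; averaging is an assumed, simplest-case LPF.
    lpf = [[sum(rng[c][a][r] for c in range(n_chirps)) / n_chirps
            for a in range(n_ant)] for r in range(n_rng)]
    # 3) Azimuth FFT across the receiver antennas for each range bin.
    return [[abs(v) for v in dft(lpf[r])] for r in range(n_rng)]

# Synthetic frame: one target at a known range bin and azimuth bin, plus
# noise that alternates sign from chirp to chirp (removed by the averaging).
N_CHIRPS, N_ANT, N_SAMPLES = 4, 8, 16
RANGE_BIN, AZIMUTH_BIN = 5, 2
frame = [[[cmath.exp(2j * cmath.pi * (RANGE_BIN * n / N_SAMPLES + AZIMUTH_BIN * a / N_ANT))
           + 0.5 * (-1) ** c
           for n in range(N_SAMPLES)]
          for a in range(N_ANT)]
         for c in range(N_CHIRPS)]

image = rf_image(frame)
peak = max(((r, az) for r in range(N_SAMPLES) for az in range(N_ANT)),
           key=lambda idx: image[idx[0]][idx[1]])
```

The brightest pixel of `image` falls at the range and azimuth bins of the synthetic target, which is the sense in which the radio frequency image encodes target position.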
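The post-processing recited in claim 5 (confidence thresholding followed by non-maximum suppression under an OLS threshold) can be sketched as below. The source does not give the OLS formula, so the Gaussian form used here, the `scale` and `kappa` parameters, the threshold values, the use of pixel indices as positions, and the choice to suppress across all classes are assumptions made for illustration. Class channels are 0-indexed in the code, and the final average precision computation is not shown.

```python
import math

def ols(p, q, scale, kappa):
    """Gaussian similarity of two (range, azimuth) positions; assumed form."""
    d2 = (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2
    return math.exp(-d2 / (2.0 * (scale * kappa) ** 2))

def detect(confmap, conf_thr=0.3, ols_thr=0.5, scale=1.0, kappa=1.5):
    """confmap[cls][r][a] -> list of (cls, r, a, score) kept after NMS."""
    # Pixels at or above the confidence threshold are positive candidates.
    cands = [(score, cls, r, a)
             for cls, plane in enumerate(confmap)
             for r, row in enumerate(plane)
             for a, score in enumerate(row)
             if score >= conf_thr]
    cands.sort(reverse=True)  # highest confidence first
    kept = []
    for score, cls, r, a in cands:
        # Suppress a candidate that is too similar to an already kept detection.
        if all(ols((r, a), (kr, ka), scale, kappa) < ols_thr for _, kr, ka, _ in kept):
            kept.append((cls, r, a, score))
    return kept

# Two classes on a 4x4 range-azimuth grid (values are made up for illustration).
confmap = [[[0.0] * 4 for _ in range(4)] for _ in range(2)]
confmap[0][1][1] = 0.9   # strong class-0 peak
confmap[0][1][2] = 0.6   # neighbouring pixel of the same object
confmap[1][3][3] = 0.7   # separate class-1 object
confmap[1][0][3] = 0.2   # below the confidence threshold

detections = detect(confmap)
```

Here the neighbouring 0.6 pixel is suppressed by the 0.9 peak, the 0.2 pixel is rejected as a negative sample, and the distant class-1 peak survives as a second detection.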
Description
Millimeter wave radar radio frequency image target detection method based on semantic mask
Technical Field
The invention belongs to the fields of automatic driving and deep learning, and particularly relates to a semantic-mask-based millimeter wave radar radio frequency image target detection method for automatic driving.
Background
As the cost of sensors such as cameras, millimeter wave radars and lidars has kept falling, and deep learning and its supporting computing hardware have developed rapidly in recent years, automatic driving technology has steadily advanced and reached the market. The automatic driving task can be divided into several subtasks, such as target detection, target tracking and decision making. Target detection aims to determine the classes and positions of the objects near the vehicle and provides information for subsequent tracking and decision making, so it is a vital task. Among these sensors, the millimeter wave radar offers high-precision range and speed measurement and strong robustness to bad weather; in addition, in dense scenes, only the millimeter wave radar can sense vehicles and pedestrians ahead in the lane, and even those further ahead. Target detection methods designed around millimeter wave radar are therefore receiving increasing attention from both academia and industry.
Among the current mainstream sensors, the camera understands the semantics of a visual scene well, but under adverse conditions such as weak or strong illumination or bad weather, images can be under- or over-exposed, blurred or occluded, which degrades target detection performance. In addition, lidar-based target detection methods need relatively dense lidar point clouds to obtain semantic information, and the equipment is expensive and the computational cost is high.
In contrast, millimeter wave radars are more reliable in most environments and cost less, which makes them suitable for deployment in automatic driving tasks. Frequency Modulated Continuous Wave (FMCW) radar operates in the millimeter wave (MMW) band (30-300 GHz), at frequencies below visible light, and has two characteristics: 1) millimeter waves penetrate fog, smoke and dust well, and 2) the large bandwidth and high frequency give FMCW radar strong ranging capability. Typically, FMCW radar data has two representations, namely the radio frequency image and the millimeter wave radar point cloud. Because peak detection is applied, the millimeter wave radar point cloud is much sparser than a lidar point cloud and does not carry enough information to complete the target detection task. The radio frequency image, however, retains rich Doppler and object motion information, and can therefore support understanding the semantics of a target. It is therefore of practical value to design a target detection algorithm that can process radio frequency image information.
Disclosure of Invention
In view of the above, the main objective of the present invention is to provide a method for detecting a millimeter wave radar radio frequency image target based on semantic masks.
In order to achieve the above objective, the technical scheme of the invention is realized as follows:
The embodiment of the invention provides a millimeter wave radar radio frequency image target detection method based on semantic masks, which comprises the following steps: performing a fast Fourier transform on radar signals to obtain radio frequency images, and taking the radio frequency images as training data; constructing a neural network; forward-propagating the training data set through the neural network, and obtaining two predicted label values through a main decoder and an auxiliary decoder respectively; determining a main loss function and an auxiliary loss function respectively from the ground-truth label data and the two predicted label values; obtaining a final loss function as a weighted sum of the main loss function and the auxiliary loss function; back-propagating the final loss function to update the parameters of the neural network, obtaining a trained neural network; and predicting on and evaluating the prediction data set with the trained neural network.
In the above scheme, performing the fast Fourier transform on the radar signals to obtain the radio frequency images used as training data specifically comprises: applying a fast Fourier transform (FFT) to the radar signal to estimate the range of the reflections; then using a low-pass filter (LPF) to remove high-frequency noise across all chirps in each frame, at a rate of 30 FPS; and, after the LPF, applying a second FFT to the signal across the different receiver antennas to estimate the azimuth angle of the reflections, thereby obtaining the final radio frequency image.
In the above scheme, the neural network is constructed, specifically,