CN-116679262-B - Airspace main lobe interference suppression method based on reinforcement learning
Abstract
The invention provides a spatial-domain main lobe interference suppression method based on reinforcement learning. The method comprises: sampling the target echo signal, noise and interference signal to obtain the radar array antenna received-signal matrix; selecting the radar as the reinforcement-learning agent and determining the agent's state space and action space; designing the reward signals required for reinforcement learning based on the agent set in step S2 and its state space and action space; training the agent with the DQN algorithm, based on the received-signal matrix and the reinforcement-learning model constructed in steps S2 and S3, to obtain a prediction of the interference angle; and performing blocking-matrix processing based on the predicted interference angle to cancel the main lobe interference signal and complete main lobe anti-interference. When the interference angle is fixed, the invention locks onto it quickly; when the interference angle changes, it converges rapidly to the new angle, so that the output signal-to-interference-plus-noise ratio of the system meets subsequent processing requirements.
Inventors
- GUO SHANHONG
- GAO JIAN
- HAN YANG
- DONG XIANG
- WANG JUN
- FANG WEI
- SHENG WEIXING
Assignees
- Nanjing University of Science and Technology (南京理工大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230529
Claims (3)
- 1. An airspace main lobe interference suppression method based on reinforcement learning, characterized by comprising the following steps: S1, sampling the target echo signal, noise and interference signal to obtain the radar array antenna received-signal matrix; S2, selecting the radar as the agent for reinforcement learning and determining the agent's state space S and action space A, wherein the state space S is the set of all possible states s of the agent, and the state s at time t is expressed as s_t = [θ, SINR, NUM], where SINR is the signal-to-interference-plus-noise ratio, NUM is the number of mismatches, and θ is the current blocking angle; the SINR is estimated from the amplitude Y of the cell under test, the number M of reference cells, and the total amplitude Z of all reference cells; S3, designing the reward signals required for reinforcement learning based on the agent set in step S2 and its state space S and action space A; S4, training the agent with the DQN algorithm, based on the received-signal matrix obtained in step S1 and the reinforcement-learning model constructed in steps S2 and S3, to obtain a prediction of the interference angle; S5, performing blocking-matrix processing based on the angle obtained in S4 and cancelling the main lobe interference signal to complete main lobe anti-interference.
- 2. The reinforcement-learning-based airspace main lobe interference suppression method according to claim 1, wherein the action space is the set of actions the radar agent may take and comprises 4 optional actions representing angle adjustments of different directions and different accuracies: shifting the current angle left by φ degrees, shifting it right by φ degrees, shifting it left by a smaller step, and shifting it right by a smaller step, where φ is the maximum change of the radar agent's output angle per step.
- 3. The reinforcement-learning-based airspace main lobe interference suppression method according to claim 1, wherein the radar agent obtains a reward signal after interacting with the environment, and the reward signals fall into three categories, namely a "success" reward, a "punishment" reward and a conventional reward, where SINR′ is the desired signal-to-interference-plus-noise ratio when the system succeeds in resisting interference, SINR is the current BMP output signal-to-interference-plus-noise ratio, a > 0 is the "success" reward signal, b < 0 is the "punishment" reward signal, and α is a coefficient.
Description
Airspace main lobe interference suppression method based on reinforcement learning
Technical Field
The invention belongs to the field of radar anti-interference, and particularly relates to an airspace main lobe interference suppression method based on reinforcement learning.
Background
The complex electromagnetic environment and rapidly developing radar detection technology make the countermeasure between radar and target ever more intense and complex. Radar anti-interference measures fall mainly into the active, passive and intelligent anti-interference directions. Common anti-interference methods suppress interference from the aspects of the radar system, working mode, working frequency, transmitted waveform, polarization characteristics, spatial (airspace) angle, signal-processing algorithm, multi-sensor coordination, and the like. The blocking-matrix preprocessing (BMP) method performs passive anti-interference in the spatial domain: it first estimates the main lobe interference angle, then designs a blocking matrix and applies blocking-matrix preprocessing to the received data to suppress the main lobe interference, and finally performs adaptive beamforming on the preprocessed data to suppress side lobe interference. The interference null formed by blocking-matrix preprocessing affects only a small angular range, making it an effective main lobe interference suppression method. Traditional blocking-matrix preprocessing estimates the interference angle with the ES-DOA method, using interference-plus-noise data collected while no target signal is present, so as to avoid the influence of the signal. This is difficult to achieve in practice, where the received data are typically a mixture of interference and target signal.
Disclosure of Invention
The invention provides a spatial-domain main lobe interference suppression method based on reinforcement learning.
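The blocking-matrix idea above can be sketched numerically. The following is a minimal illustration, not the patent's implementation: for a half-wavelength uniform linear array, a standard (N−1)×N blocking matrix places a null on an assumed interference angle; the array size, angle and snapshot count are arbitrary choices for the demonstration.

```python
import numpy as np

def blocking_matrix(n_elems, theta_j, d_over_lambda=0.5):
    """(N-1) x N blocking matrix that nulls a plane wave arriving
    from angle theta_j (radians) on a uniform linear array."""
    mu = 2 * np.pi * d_over_lambda * np.sin(theta_j)
    B = np.zeros((n_elems - 1, n_elems), dtype=complex)
    for i in range(n_elems - 1):
        B[i, i] = 1.0                    # current element
        B[i, i + 1] = -np.exp(-1j * mu)  # phase-aligned neighbour
    return B

# Demo: jamming from 5 degrees on an 8-element half-wavelength ULA.
N, K = 8, 64
theta = np.deg2rad(5.0)
steer = np.exp(1j * 2 * np.pi * 0.5 * np.arange(N) * np.sin(theta))
jam = np.outer(steer, np.exp(1j * 2 * np.pi * np.random.rand(K)))  # K snapshots
residual = blocking_matrix(N, theta) @ jam
# residual is ~0: the interference from theta is cancelled
```

Each row of the matrix subtracts one element's signal from its phase-shifted neighbour, so any plane wave from the blocked angle cancels exactly; this is why the accuracy of the estimated interference angle drives the achievable suppression.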
The technical scheme of the invention is an airspace main lobe interference suppression method based on reinforcement learning, comprising the following steps: S1, sampling the target echo signal, noise and interference signal to obtain the radar array antenna received-signal matrix; S2, selecting the radar as the agent for reinforcement learning and determining the agent's state space S and action space A; S3, designing the reward signals required for reinforcement learning based on the agent set in step S2 and its state space S and action space A; S4, training the agent with the DQN algorithm, based on the received-signal matrix obtained in step S1 and the reinforcement-learning model constructed in steps S2 and S3, to obtain a prediction of the interference angle; S5, performing blocking-matrix processing based on the angle obtained in S4 and cancelling the main lobe interference signal to complete main lobe anti-interference. Preferably, the state space S of the agent is the set of all possible states s of the agent, the state s at time t being denoted s_t: s_t = [θ, SINR, NUM], where SINR is the signal-to-interference-plus-noise ratio, NUM is the number of mismatches, and θ is the current blocking angle. Preferably, the SINR is estimated from the amplitude Y of the cell under test, the number M of reference cells, and the total amplitude Z of all reference cells. Preferably, the action space is the set of actions taken by the radar agent and includes 4 optional actions representing angle adjustments of different directions and different accuracies, where φ is the maximum change of the radar agent's output angle per step. Preferably, the radar agent obtains a reward signal after interacting with the environment.
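As a sketch of the agent model above: the maximum step φ, the fine-step fraction, and the exact SINR-estimation formula below are assumptions, since the patent's formulas are not reproduced in this text; only the state layout [θ, SINR, NUM], the four-action structure, and the Y/M/Z quantities come from the source.

```python
PHI = 2.0         # assumed maximum per-step angle change, in degrees
FINE = 0.1 * PHI  # assumed finer step (the fraction is not given in the text)

# Four actions: coarse left/right and fine left/right angle adjustments.
ACTIONS = [-PHI, +PHI, -FINE, +FINE]

def estimate_sinr(y, z, m):
    """CFAR-style SINR estimate: cell-under-test amplitude over the mean
    reference-cell amplitude. The form Y / (Z / M) is an assumption; the
    patent defines the estimate from Y, Z and M only."""
    return y / (z / m)

def step(state, action_idx):
    """Apply one of the four angle-adjustment actions to the state
    [theta, SINR, NUM]; theta is the current blocking angle."""
    theta, sinr, num = state
    return [theta + ACTIONS[action_idx], sinr, num]
```

The coarse actions let the agent track a rapidly changing interference angle, while the fine actions refine the blocking angle once the estimate is close.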
The reward signals fall into three categories, namely a "success" reward, a "penalty" reward and a conventional reward, where a > 0 is the "success" reward signal, b < 0 is the "penalty" reward signal, SINR′ is the desired signal-to-interference-plus-noise ratio when the system succeeds in resisting interference, SINR is the current BMP output signal-to-interference-plus-noise ratio, and α is a coefficient. Preferably, based on the radar array antenna received-signal matrix obtained in step S1 and the reinforcement-learning model constructed in steps S2 and S3, the specific process of training the agent with the DQN algorithm is as follows: S41, initializing the system, namely randomly initializing the training-network parameters and the agent's initial state, wherein the radar array antenna shape, the total capacity of the experience replay pool, the minimum training batch size, the learning rate, the return discount coefficient
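The three-tier reward described above can be illustrated as follows. The thresholds, the mismatch limit, and the conventional-reward shape α·(SINR − SINR′) are assumptions consistent with, but not reproduced from, the patent text; only the a > 0 / b < 0 / α structure is given in the source.

```python
def reward(sinr, sinr_target, num, max_mismatch=5,
           a=10.0, b=-10.0, alpha=0.1):
    """Three-tier reward signal: 'success' (a > 0) when the BMP output
    SINR reaches the desired SINR', 'punishment' (b < 0) when the number
    of mismatches NUM grows too large, and a conventional shaping reward
    otherwise. All numeric values here are placeholder assumptions."""
    if sinr >= sinr_target:
        return a                         # "success" reward
    if num > max_mismatch:
        return b                         # "punishment" reward
    return alpha * (sinr - sinr_target)  # conventional shaping reward
```

The conventional term gives the DQN a dense gradient toward higher output SINR between the sparse success and punishment events, which is a common shaping choice in DQN reward design.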