KR-20260063062-A - Array antenna system using technology for defect detection and beam pattern compensation based on PPO reinforcement learning

Abstract

An array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology according to one embodiment of the present invention includes: a data collection unit that collects signal data by monitoring the output signals of arrayed antenna modules; and a processor that identifies faulty modules among the arrayed antenna modules by utilizing the collected signal data as input data for a Deep Neural Network (DNN)-based Proximal Policy Optimization (PPO) algorithm. By doing so, the beam pattern can be compensated to maintain communication quality and improve the reliability and availability of the system.

Inventors

남승구
최세환
홍석연
이보영

Assignees

Korea Electronics Technology Institute (한국전자기술연구원)

Dates

Publication Date: 20260507
Application Date: 20241030

Claims (12)

An array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology, comprising: a data collection unit that collects signal data by monitoring the output signals of arrayed antenna modules; and a processor that identifies a faulty module among the arrayed antenna modules by using the collected signal data as input data for a Deep Neural Network (DNN)-based Proximal Policy Optimization (PPO) algorithm.
The array antenna system of claim 1, wherein the processor, when a faulty module is identified, uses the PPO algorithm to calculate, based on the identification result, the optimal phase and optimal gain of each antenna module reflecting the faulty module, and controls the antenna modules to compensate the beam pattern by applying the calculation results.
The array antenna system of claim 2, wherein the processor performs a training process of the PPO algorithm using current signal characteristics extracted from the signal data, in order to identify faulty modules and calculate optimal phase and amplitude control values.
The array antenna system of claim 3, wherein the PPO algorithm applies the current signal characteristics as the state of the environment, applies the optimal phase and amplitude control values of each antenna module as the action taken by the reinforcement learning agent, sets a communication quality indicator as the reward, and performs the training process while updating the policy in a direction that maximizes the reward.
The array antenna system of claim 4, wherein, in the PPO algorithm, the state of the environment (S_t) at time t is the Received Signal Strength Indicator (RSSI), the state of the environment (S_t+1) at time t+1 is the Signal-to-Noise Ratio (SNR), and, when the phase and amplitude adjustment values of each antenna module are applied as the action taken by the reinforcement learning agent, at least one indicator among Error Vector Magnitude (EVM), Bit Error Rate (BER), and data transmission rate is set as the reward.
The array antenna system of claim 5, wherein the processor continuously updates the trained PPO algorithm to respond to new environments and failure situations.
The array antenna system of claim 6, wherein the processor updates the PPO algorithm by converting the optimal phase and amplitude control values of each antenna module calculated by the trained PPO algorithm into rewards.
The array antenna system of claim 5, wherein the processor, when two or more indicators among Error Vector Magnitude (EVM), Bit Error Rate (BER), and data transmission rate are set as rewards for the PPO algorithm, assigns equal weight to each indicator set as a reward and updates the policy of the PPO algorithm in a direction that maximizes each indicator.
The array antenna system of claim 5, wherein the processor, when two or more indicators among Error Vector Magnitude (EVM), Bit Error Rate (BER), and data transmission rate are set as rewards for the PPO algorithm, assigns a different weight to each indicator set as a reward, updates the policy of the PPO algorithm in a direction that maximizes the indicator with the largest weight, compensates the beam pattern using the updated PPO algorithm, compares the signal characteristics of the output signal of the antenna module whose beam pattern has been compensated with those of the output signal of an antenna module in a normal state, and, if the error for any indicator is greater than or equal to a preset threshold, updates the policy of the PPO algorithm again in a direction that maximizes the indicator with the second-largest weight.
An array antenna control method using PPO reinforcement learning-based fault detection and beam pattern compensation technology, comprising: collecting, by a data collection unit, signal data by monitoring the output signals of arrayed antenna modules; and identifying, by a processor, a faulty module among the arrayed antenna modules by using the collected signal data as input data for a Deep Neural Network (DNN)-based Proximal Policy Optimization (PPO) algorithm.
An array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology, comprising: a data collection unit that collects signal data by monitoring the output signals of arrayed antenna modules; and a processor that uses the collected signal data as training data for a Deep Neural Network (DNN)-based Proximal Policy Optimization (PPO) algorithm to train the PPO algorithm for faulty module identification and beam pattern compensation, applies newly collected signal data as input data to the trained PPO algorithm to identify a faulty module among the arrayed antenna modules, and compensates the beam pattern by applying the identification result.
An array antenna control method using PPO reinforcement learning-based fault detection and beam pattern compensation technology, comprising: collecting, by a data collection unit, signal data by monitoring the output signals of arrayed antenna modules; training, by a processor, a Deep Neural Network (DNN)-based Proximal Policy Optimization (PPO) algorithm for faulty module identification and beam pattern compensation by using the collected signal data as training data; identifying, by the processor, a faulty module among the arrayed antenna modules by applying newly collected signal data as input data to the trained PPO algorithm; and performing, by the processor, a beam pattern compensation operation by applying the identification result.
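
Illustrative note (not part of the claims): the claims above set EVM, BER, and data transmission rate as rewards with equal or different weights, but they do not state how the indicators are scaled or combined. The following minimal Python sketch shows one possible reading. The names `LinkMetrics`, `weighted_reward`, and `ppo_clipped_term`, the normalization constants, and the choice to invert EVM and BER (so that a lower error yields a higher score) are all assumptions made for illustration. `ppo_clipped_term` is the standard clipped surrogate term of PPO for a single sample, which the source does not spell out.

```python
import math
from dataclasses import dataclass


@dataclass
class LinkMetrics:
    """Hypothetical container for the communication quality indicators."""
    evm_percent: float  # Error Vector Magnitude, lower is better
    ber: float          # Bit Error Rate, lower is better
    rate_mbps: float    # data transmission rate, higher is better


def weighted_reward(m, weights=(1.0, 1.0, 1.0), max_rate_mbps=1000.0):
    """Combine the indicators into one scalar reward in [0, 1].

    The scaling below is an assumption; weights must not all be zero.
    """
    w_evm, w_ber, w_rate = weights
    # Map every indicator to a score where 1.0 is best and 0.0 is worst.
    evm_score = max(0.0, 1.0 - m.evm_percent / 100.0)
    # BER is scored on a log scale: 1e-9 or lower maps to 1.0, 1.0 maps to 0.0.
    ber_score = min(1.0, max(0.0, -math.log10(max(m.ber, 1e-9)) / 9.0))
    rate_score = min(1.0, max(0.0, m.rate_mbps / max_rate_mbps))
    total = w_evm + w_ber + w_rate
    return (w_evm * evm_score + w_ber * ber_score + w_rate * rate_score) / total


def ppo_clipped_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate term for one sample.

    ratio is new_policy_prob / old_policy_prob for the action taken.
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


healthy = LinkMetrics(evm_percent=2.0, ber=1e-6, rate_mbps=800.0)
degraded = LinkMetrics(evm_percent=12.0, ber=1e-3, rate_mbps=300.0)

# Equal weights, then a weighting that prioritizes the data rate.
print(weighted_reward(healthy), weighted_reward(degraded))
print(weighted_reward(degraded, weights=(0.2, 0.2, 0.6)))
```

Under this assumed scaling, a link with lower EVM and BER and a higher data rate receives a larger reward, so maximizing the reward pushes the policy toward phase and amplitude settings that restore link quality. The clipping in `ppo_clipped_term` limits how far a single update can move the policy away from the one that collected the data.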

Description

Array antenna system using technology for defect detection and beam pattern compensation based on PPO reinforcement learning

The present invention relates to an array antenna system, and more specifically, to an array antenna system that detects failures of some antenna elements in real time and maintains communication quality by compensating beam patterns using Proximal Policy Optimization (PPO), a reinforcement learning algorithm.

FIG. 1a is a diagram illustrating a beam pattern being emitted in a desired direction in a conventional array antenna system, and FIG. 1b is a diagram illustrating a beam pattern distorted by a faulty antenna element (antenna module) in a conventional array antenna system. In an array antenna system in which multiple antenna modules are arranged, as exemplified in FIG. 1, if one or more of the antenna modules fail, the main lobe of the beam deviates from the desired direction or the side lobes increase, which degrades communication quality. Accordingly, there is a need for methods that detect failures of some antenna elements in an array antenna system in real time and compensate the beam pattern based on the detection results.

FIG. 1 is a drawing illustrating a beam pattern being emitted in a desired direction in a conventional array antenna system and a beam pattern distorted by a faulty antenna element (antenna module) in a conventional array antenna system.
FIG. 2 is a drawing provided to describe the configuration of an array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology according to an embodiment of the present invention.
FIG. 3 is a drawing provided to describe in more detail the configuration of the processor illustrated in FIG. 2.
FIG. 4 is a diagram provided to explain the training (pre-train) process of the PPO algorithm through the PPO algorithm training unit illustrated in FIG. 3.
FIG. 5 is a diagram illustrating the result of performing a beam pattern compensation operation using a PPO algorithm trained through a system according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating a beam pattern compensation result using a PPO algorithm trained through a system according to an embodiment of the present invention.
FIG. 7 is a flowchart provided to describe an array antenna control method using PPO reinforcement learning-based fault detection and beam pattern compensation technology according to one embodiment of the present invention.

The present invention will be described in more detail below with reference to the drawings. To explain the invention clearly, parts unrelated to the description have been omitted from the drawings, and the width, length, thickness, and other dimensions of the components may be exaggerated for convenience.

FIG. 2 is a diagram provided to describe the configuration of an array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology according to one embodiment of the present invention. The array antenna system using PPO reinforcement learning-based fault detection and beam pattern compensation technology according to the present embodiment (hereinafter referred to as the 'array antenna system') is provided to detect failures of some antenna elements in the array antenna system in real time and to compensate beam patterns using Proximal Policy Optimization (PPO), a reinforcement learning algorithm. To this end, the array antenna system may include a data collection unit (100), a processor (200), and a storage unit (300), as exemplified in FIG. 2.
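
The beam pattern distortion that this system is meant to detect can be illustrated numerically. The following minimal Python sketch computes the array factor of a uniform linear array and compares the peak side lobe level of a healthy array with that of an array in which one module emits no signal. The array geometry (8 modules, half-wavelength spacing, broadside beam), the modeling of a failed module as a zero weight, and the function names `array_factor` and `peak_sidelobe_db` are assumptions for illustration and are not taken from the embodiment.

```python
import cmath
import math


def array_factor(weights, theta_deg, spacing_wl=0.5):
    """Array-factor magnitude of a linear array at angle theta from broadside.

    weights holds one complex excitation (gain and phase) per antenna module.
    """
    psi = 2.0 * math.pi * spacing_wl * math.sin(math.radians(theta_deg))
    return abs(sum(w * cmath.exp(1j * n * psi) for n, w in enumerate(weights)))


def peak_sidelobe_db(weights, step_deg=0.1):
    """Peak side lobe level in dB relative to the main-lobe peak."""
    count = int(round(180.0 / step_deg)) + 1
    mags = [array_factor(weights, -90.0 + i * step_deg) for i in range(count)]
    # Interior local maxima of the sampled pattern are treated as lobes.
    lobes = sorted(
        (mags[i] for i in range(1, count - 1)
         if mags[i] > mags[i - 1] and mags[i] >= mags[i + 1]),
        reverse=True,
    )
    # lobes[0] is the main lobe, lobes[1] the strongest side lobe.
    return 20.0 * math.log10(lobes[1] / lobes[0])


healthy = [1.0] * 8      # 8 modules, uniform gain, beam at broadside
faulty = list(healthy)
faulty[2] = 0.0          # module 2 modeled as emitting no signal

print(f"healthy peak side lobe: {peak_sidelobe_db(healthy):.1f} dB")
print(f"faulty peak side lobe:  {peak_sidelobe_db(faulty):.1f} dB")
```

In this model the failed module raises the strongest side lobe relative to the main lobe, which is the kind of degradation described for FIG. 1b. A compensation step would then search for new gain and phase values for the remaining modules that bring the pattern back toward the healthy one.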
The data collection unit (100) collects signal data by monitoring the output signals of the arrayed antenna modules. The storage unit (300) stores the programs and data necessary for the operation of the processor (200). The processor (200) detects failures of some antenna elements in the array antenna system in real time and handles the processing needed to compensate the beam pattern using the PPO algorithm (model). Specifically, the processor (200) can use the signal data as training data for a Deep Neural Network (DNN)-based PPO algorithm to perform a training (learning) process for the PPO algorithm, use the signal data as input data for the trained PPO algorithm to identify faulty modules among the arrayed antenna modules, and apply the identification result to compensate the beam pattern. For example, when a faulty module is identified, the processor (200) can use the PPO algorithm to calculate, based on the identification result, the optimal phase and optimal amplitude control value (gain) of each antenna module reflecting the faulty module, and apply the calculation result to control the antenna modules so that the