CN-121995904-A - Collaborative tracking control method based on distributed reinforcement learning and related device
Abstract
The invention belongs to the technical field of collaborative tracking and discloses a collaborative tracking control method based on distributed reinforcement learning, together with a related device. The collaborative tracking control system based on distributed reinforcement learning is used in a distributed system formed by networking a plurality of photoelectric theodolites. The distributed system comprises a plurality of agents serving as independent tracking nodes, each agent representing one photoelectric theodolite node and comprising a computing processing unit connected with a servo control unit. The collaborative tracking control system comprises an agent reinforcement learning decision layer, an instruction processing layer, and a servo control execution layer: the agent reinforcement learning decision layer is deployed in the computing processing unit; the instruction processing layer is deployed in the computing processing unit and connected with the agent reinforcement learning decision layer; and the servo control execution layer is deployed in the servo control unit and connected with the instruction processing layer.
Inventors
- Xue Haoqi
- Xie Meilin
- Wang Fan
- Xing Runqiang
- Cheng Xiawen
- Zhang Yiming
Assignees
- Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences (中国科学院西安光学精密机械研究所)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-03-18
Claims (10)
- 1. A collaborative tracking control system based on distributed reinforcement learning, used in a distributed system formed by networking a plurality of photoelectric theodolites, characterized by comprising a plurality of agents serving as independent tracking nodes, wherein each agent represents one photoelectric theodolite node and comprises a computing processing unit connected with a servo control unit, and the collaborative tracking control system comprises: an agent reinforcement learning decision layer, deployed in the computing processing unit and used for generating a multi-dimensional action vector based on the agent's local observation information, communication information received from adjacent agents, and a historical action sequence, wherein the multi-dimensional action vector comprises a role setting, a communication action, and azimuth angle and pitch angle setpoints of the agent; an instruction processing layer, deployed in the computing processing unit, connected with the agent reinforcement learning decision layer, and used for receiving the multi-dimensional action vector, sequentially performing a safety check and smoothing on the azimuth angle and pitch angle setpoints, and outputting a target angle value; and a servo control execution layer, deployed in the servo control unit, connected with the instruction processing layer, and used for receiving the target angle value and driving the servo control unit to track the target based on the target angle value.
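The four-component action vector of claim 1 mixes discrete decisions (role, communication action) with continuous angle setpoints. A minimal sketch of one way to represent it and to route the angle components on to the instruction processing layer (all field and function names here are illustrative, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class ActionVector:
    """Four-component action emitted by one agent's decision layer
    (field names are illustrative, not from the patent text)."""
    role: int            # e.g. 0 = primary tracker, 1 = supporting observer
    comm_action: int     # index of the message to broadcast to neighbours
    azimuth_set: float   # commanded azimuth angle, degrees
    pitch_set: float     # commanded pitch angle, degrees

def split_action(a: ActionVector):
    """Keep the discrete parts locally; hand the angle setpoints
    to the instruction processing layer."""
    return (a.role, a.comm_action), (a.azimuth_set, a.pitch_set)

act = ActionVector(role=0, comm_action=2, azimuth_set=135.0, pitch_set=42.5)
discrete, setpoints = split_action(act)
```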
- 2. A distributed system formed by networking a plurality of photoelectric theodolites, characterized by comprising the collaborative tracking control system according to claim 1, wherein the collaborative tracking control system is in communication connection with the plurality of photoelectric theodolites and is used for performing collaborative tracking control on them.
- 3. A collaborative tracking control method based on distributed reinforcement learning, applied to the collaborative tracking control system based on distributed reinforcement learning according to claim 1, characterized by comprising the steps of: S1, each agent acquires its own local observation information and historical action sequence, receives communication information shared by adjacent agents, and constructs a joint observation state based on the local observation information, the historical action sequence, and the communication information; S2, generating a multi-dimensional action vector based on the joint observation state through the agent reinforcement learning decision layer; S3, receiving the multi-dimensional action vector through the instruction processing layer, sequentially performing a safety check and smoothing on the azimuth angle setpoint and the pitch angle setpoint, and outputting a target angle value; and S4, driving the agent to track the target through the servo control execution layer according to the target angle value, and executing S1-S4 again until the tracking task is finished.
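The S1-S4 loop of claim 3 can be sketched as a single per-agent step function. This is a minimal illustration under assumed choices (clamping as the safety check, a first-order filter as the smoothing, the last two action components as the angle setpoints); the patent does not fix these details:

```python
def clamp(x, lo, hi):
    return max(lo, min(hi, x))

def tracking_step(local_obs, history, neighbour_msgs, policy, prev_target,
                  az_limits=(0.0, 360.0), el_limits=(0.0, 90.0), alpha=0.3):
    """One pass through S1-S4 for a single agent (helper names illustrative)."""
    # S1: joint observation = local observation + action history + messages
    joint = list(local_obs) + list(history) + [v for m in neighbour_msgs for v in m]
    # S2: the learned policy maps the joint state to an action vector;
    # here the last two components are the azimuth / pitch setpoints
    action = policy(joint)
    az_raw, el_raw = action[-2], action[-1]
    # S3: safety check (limit clamping), then first-order smoothing
    az = clamp(az_raw, *az_limits)
    el = clamp(el_raw, *el_limits)
    prev_az, prev_el = prev_target
    target = (prev_az + alpha * (az - prev_az), prev_el + alpha * (el - prev_el))
    # S4: the servo layer would now drive toward `target`; the loop repeats
    return target

# usage with a dummy policy that commands out-of-range angles
policy = lambda s: [0, 1, 400.0, -5.0]
target = tracking_step([0.1], [0.0], [[0.2]], policy, prev_target=(100.0, 10.0))
```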
- 4. The collaborative tracking control method based on distributed reinforcement learning according to claim 3, wherein the local observation information includes the agent's current azimuth angle and pitch angle values, the miss distance of the target, whether the target is in the field of view, and the relative position between the target and the agent.
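The claim-4 quantities can be packed into a flat feature vector for the decision layer. The ordering, normalization, and a two-component miss distance are illustrative choices, not specified by the patent:

```python
def local_observation(azimuth, pitch, miss_x, miss_y, in_fov, rel_pos):
    """Pack the claim-4 quantities into a flat feature vector.
    Ordering and scaling are illustrative choices."""
    return [
        azimuth / 360.0,          # current azimuth angle, normalised
        pitch / 90.0,             # current pitch angle, normalised
        miss_x, miss_y,           # miss-distance components of the target
        1.0 if in_fov else 0.0,   # target-in-field-of-view flag
        *rel_pos,                 # target position relative to this node
    ]

obs = local_observation(180.0, 45.0, 0.01, -0.02, True, (1.0, 2.0, 3.0))
```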
- 5. The collaborative tracking control method based on distributed reinforcement learning according to claim 3, wherein generating a multi-dimensional action vector based on the joint observation state through the agent reinforcement learning decision layer specifically comprises: inputting the joint observation state into a local Actor network through the agent reinforcement learning decision layer, the local Actor network outputting a normalized multi-dimensional action vector according to the current policy mapping.
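Claim 5 only states that the local Actor outputs a normalized action vector. A minimal sketch with a tiny tanh-squashed MLP (layer sizes, initialization, and the denormalization helper are all assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

class Actor:
    """Tiny MLP policy: joint observation -> normalised action in (-1, 1).
    Network shape and tanh squashing are illustrative choices."""
    def __init__(self, obs_dim, act_dim, hidden=32):
        self.w1 = rng.standard_normal((obs_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, act_dim)) * 0.1
        self.b2 = np.zeros(act_dim)

    def __call__(self, obs):
        h = np.tanh(obs @ self.w1 + self.b1)
        return np.tanh(h @ self.w2 + self.b2)   # each component in (-1, 1)

def denormalise(a, lo, hi):
    """Map one normalised component back to a physical range, e.g. degrees."""
    return lo + (a + 1.0) * 0.5 * (hi - lo)

actor = Actor(obs_dim=6, act_dim=4)
action = actor(np.zeros(6))
```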
- 6. The collaborative tracking control method based on distributed reinforcement learning according to claim 3, wherein the training step of the agent reinforcement learning decision layer specifically comprises: in a simulation environment, each agent executes S1-S4 to generate experience data; centralized training is carried out on the Actor network of each agent based on the experience data, using a centralized Critic network and a team reward, to obtain trained policy network parameters; and the trained policy network parameters are deployed to each agent to complete training of the agent reinforcement learning decision layer.
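The centralized-training, decentralized-execution scheme of claim 6 hinges on a centralized Critic driven by a shared team reward. A minimal sketch of one TD(0) update for a linear centralized value function (a linear critic and these hyperparameters are simplifying assumptions; the patent does not fix the Critic's form):

```python
import numpy as np

def critic_td_update(w, joint_obs, joint_obs_next, team_reward,
                     gamma=0.99, lr=1e-2):
    """One TD(0) update of a linear centralised critic V(s) = w . s.
    All agents share the scalar team reward, which is what couples their
    individually-executed policies during training."""
    v = w @ joint_obs
    v_next = w @ joint_obs_next
    td_error = team_reward + gamma * v_next - v
    return w + lr * td_error * joint_obs, td_error

# one update starting from a zero-initialised critic
w0 = np.zeros(3)
s, s_next = np.array([1.0, 0.0, 2.0]), np.array([0.0, 1.0, 0.0])
w1, td = critic_td_update(w0, s, s_next, team_reward=1.0)
```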
- 7. The collaborative tracking control method based on distributed reinforcement learning according to claim 3, wherein the smoothing specifically comprises: filtering the azimuth angle and pitch angle setpoints after the safety check.
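Claim 7 leaves the filter unspecified; a first-order low-pass filter is one common choice, applied after the safety check so that step changes in commanded angles do not excite the servo. The time constant here is an illustrative assumption:

```python
class SetpointFilter:
    """First-order low-pass filter for an angle setpoint stream
    (alpha is an illustrative smoothing factor, not from the patent)."""
    def __init__(self, alpha=0.5):
        self.alpha = alpha
        self.state = None

    def __call__(self, x):
        # first sample initialises the state; later samples are blended in
        if self.state is None:
            self.state = x
        else:
            self.state += self.alpha * (x - self.state)
        return self.state

f = SetpointFilter(alpha=0.5)
outputs = [f(10.0), f(20.0), f(20.0)]
```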
- 8. The collaborative tracking control method based on distributed reinforcement learning according to claim 3, wherein driving the agent to track the target through the servo control execution layer according to the target angle value specifically comprises: generating a position deviation from the target angle value through the servo control execution layer, generating a speed command based on the position deviation, and driving the servo control unit to track the target using the speed command and the actual angular speed.
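The outer position loop of claim 8 can be sketched as: position deviation in, speed command out, with the measured angular rate used for damping. Gains, the rate-damping term, and the slew limit are illustrative assumptions; the patent only names the deviation-to-speed-command structure:

```python
def servo_speed_command(target_angle, actual_angle, actual_rate,
                        kp=4.0, kd=0.5, rate_limit=30.0):
    """Outer position loop: position deviation -> speed command
    (gains and limit are illustrative, in deg and deg/s)."""
    error = target_angle - actual_angle            # position deviation
    cmd = kp * error - kd * actual_rate            # raw speed command
    return max(-rate_limit, min(rate_limit, cmd))  # slew-rate limit

cmd = servo_speed_command(10.0, 8.0, 1.0)      # small deviation
cmd_sat = servo_speed_command(100.0, 0.0, 0.0) # large deviation, saturates
```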
- 9. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 3-8 when executing the computer program.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 3-8.
Description
Collaborative tracking control method based on distributed reinforcement learning and related device

Technical Field
The invention relates to the technical field of collaborative tracking, and in particular to a collaborative tracking control method based on distributed reinforcement learning and a related device.

Background
Photoelectric theodolites are key devices in modern target range measurement, space target monitoring, astronomical observation, and related fields; they perform angle measurement and continuous tracking of moving targets through photoelectric sensors. As targets become faster and more maneuverable, a single theodolite struggles to independently maintain continuous, stable tracking of a high-speed maneuvering target, owing to limitations such as a small physical field of view and tracking blind zones. To break through the performance bottleneck of a single photoelectric theodolite, research has turned to networking multiple photoelectric theodolites into a distributed system. The spatially distributed theodolites are interconnected over a network to achieve coordination, forming a perception community that provides field-of-view complementarity, enlarges the observation range, and avoids blind zones. Using the angle intersection method, high-precision estimation of the target's spatial position can be achieved. In task allocation, such systems support multi-target tracking, dynamic role switching, task reconstruction, and the like. Whereas a single-node system is paralyzed by a fault, a multi-node networked system has stronger fault tolerance and can adjust local strategies under limited communication and abrupt target changes.
In applications such as intelligent traffic monitoring and aerospace target recognition, multi-sensor network systems show overall performance superior to single-point observation systems. However, currently mainstream multi-theodolite collaborative tracking systems mostly adopt a centralized control architecture. Such an architecture generally provides a central node (such as a central server or main control station) responsible for aggregating the data acquired by all theodolite nodes, such as target miss distance and angle information, running a fusion algorithm to generate a global situation, uniformly calculating the expected tracking angles of all nodes, and issuing control instructions to each theodolite for execution. This "central computing, peripheral execution" mode, while convenient for management and global optimization, has inherent drawbacks. First, the central node is the "heart" of the overall system: once it fails, suffers an attack, or loses its communication link, the entire tracking network breaks down, so there is a serious single-point-of-failure risk, and its survivability in a contested battlefield environment is a concern. Second, such an architecture lacks real-time adaptability. In complex dynamic scenes, such as severe target maneuvers or brief signal loss caused by cloud and fog occlusion, the central node's command generation period is long and its strategy is rigid, making it difficult to respond to local changes in time, leading to lost targets or disjointed cooperation. In addition, each node passively executes instructions, lacks independent decision-making capability, and cannot make rapid adjustments based on local observation information, which restricts the overall robustness and flexibility of the system.
In summary, how to improve the adaptability of a multi-photoelectric-theodolite system to complex scenes and the autonomy of its collaborative tracking, while eliminating the single-point-of-failure risk of a central node, has become a technical problem to be solved.

Disclosure of Invention
The invention aims to provide a collaborative tracking control method based on distributed reinforcement learning and a related device, which solve the problems in the prior art by improving the adaptability of a multi-photoelectric-theodolite system to complex scenes and the autonomy of its collaborative tracking while eliminating the single-point-of-failure risk of a central node. To achieve the above purpose, the technical scheme adopted by the invention is as follows. In a first aspect, the present invention provides a collaborative tracking control system based on distributed reinforcement learning, for a distributed system formed by networking a plurality of photoelectric theodolites, where the distributed system includes a plurality of agents serving as independent tracking nodes, each agent represents one photoelectric theodolite node, and each agent includes a computing processing unit connected with a servo control unit