CN-121983256-A - Air-ground integrated medical rescue command and decision support method

CN121983256A

Abstract

The application discloses an air-ground integrated medical rescue command and decision support method and system. The method comprises the following steps: constructing and running a high-fidelity rescue simulation environment; obtaining, in a multi-agent decision center, a global rescue situation state and the local observations of each rescue agent; generating, through a trained hybrid decision engine and based on the global rescue situation state and the local observations, a policy for each type of rescue agent and outputting a joint action instruction; issuing the joint action instruction to the corresponding rescue agent in the high-fidelity rescue simulation environment for execution, driving an environment state transition and obtaining a composite reward comprising a team reward and an individual reward; and iteratively optimizing the policies in the multi-agent decision center based on the environment state transition and the composite reward. The application can dynamically adjust policies, takes the overall benefit of the team into account, and the final policy tends toward a global optimum.

Inventors

  • HUANG YUHONG
  • YANG LINA
  • HUANG JINYAO
  • LI XINLE
  • DAI JUN
  • ZHANG CHUNGANG

Assignees

  • 中船海神医疗科技有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-29

Claims (10)

  1. An air-ground integrated medical rescue command and decision support method, characterized by being based on a centralized training, distributed execution (CTDE) architecture and realized through interaction between a multi-agent decision center and a high-fidelity rescue simulation environment, the method comprising the following steps: constructing and operating the high-fidelity rescue simulation environment, wherein the environment simulates a rescue scene comprising casualty events, rescue agents, geographic space and dynamic uncertainty, and generates a global rescue situation state; acquiring, in the multi-agent decision center, the global rescue situation state and the local observation of each rescue agent; generating, by a trained hybrid decision engine and based on the global rescue situation state and the local observations, a policy for each type of rescue agent and outputting a joint action instruction, wherein the hybrid decision engine integrates a multi-agent reinforcement learning module and a spatio-temporal optimization algorithm module; issuing the joint action instruction to the corresponding rescue agent in the high-fidelity rescue simulation environment for execution, driving an environment state transition, and obtaining a composite reward comprising a team reward and an individual reward; and iteratively optimizing the policies in the multi-agent decision center based on the environment state transition and the composite reward.
  2. The method of claim 1, wherein the centralized training, distributed execution (CTDE) architecture comprises: a centralized evaluator network, which uses the global rescue situation state and the actions of all rescue agents during the training phase to estimate the value of a joint action; and distributed executor networks, each corresponding to a type of rescue agent or to an individual rescue agent, for outputting action policies according to the corresponding local observations; wherein the parameter updates of the executor networks are guided by policy gradients provided by the evaluator network.
  3. The method according to claim 1 or 2, wherein generating, by the trained hybrid decision engine and based on the global rescue situation state and the local observation of each rescue agent, a policy for each type of rescue agent and outputting a joint action instruction comprises: allocating suitable rescue agents to a new rescue task through the policy output by the multi-agent reinforcement learning module; for a rescue agent allocated a specific rescue task, invoking the spatio-temporal optimization algorithm module to compute an optimized path from its current position to the task target point based on the instantaneous state of the current environment; and taking key node information of the optimized path as prior knowledge, inputting it into the policy network of the corresponding rescue agent, and generating the final executable action instruction.
  4. The method of claim 1, wherein the composite reward comprises: a global team reward, whereby all participating rescue agents receive a positive reward when a rescue task is successfully completed and a negative reward when the task fails; an individual efficiency reward, whereby rewards or penalties are given according to a rescue agent's movement efficiency or task execution progress; a cooperation reward, whereby an additional positive reward is given when two or more rescue agents complete a preset cooperative action pattern; and a constraint penalty, whereby penalties are given for actions that violate preset operating rules or physical constraints.
  5. The method according to claim 2, characterized in that the executor network and/or the evaluator network is a deep neural network whose input layer is designed as an encoding module for the global rescue situation state and/or the local observations, the encoding module fusing multi-source heterogeneous data into a unified feature vector.
  6. An air-ground integrated intelligent medical rescue collaborative decision-making system for implementing the method of any one of claims 1-5, comprising: a high-fidelity rescue simulation environment module, for simulating and generating a dynamic rescue scene, the physical interactions of rescue agents, and a global state; a multi-agent decision center module, comprising a CTDE-architecture-based hybrid decision engine for receiving states and observations and for computing and outputting joint action instructions; and a training and optimization module, for managing the interaction between the multi-agent decision center module and the high-fidelity rescue simulation environment module, collecting experience data, and updating the parameters of the hybrid decision engine.
  7. The system of claim 6, wherein the high-fidelity rescue simulation environment module comprises: a scene generation unit, for configuring or randomly generating the positions, types and quantities of casualty events; a physical simulation unit, for simulating the kinematics, dynamics and environment interactions of the rescue agents; an uncertainty injection unit, for introducing communication delays, equipment failures or sudden environmental changes into the simulation; and an evaluation unit, for computing key performance indicators of rescue response time, success rate and resource utilization.
  8. The system of claim 6, wherein the hybrid decision engine comprises a multi-agent reinforcement learning module based on the multi-agent proximal policy optimization (MAPPO) algorithm or the multi-agent deep deterministic policy gradient (MADDPG) algorithm.
  9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 5 when executing the program.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 5.
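The composite reward of claim 4 can be illustrated with a minimal sketch. All field names, weights and magnitudes below are illustrative assumptions, not values from the patent.

```python
def composite_reward(events, team_weight=1.0):
    """Combine the four reward components described in claim 4.

    `events` is a hypothetical per-step event record; every field name
    and coefficient here is an illustrative assumption.
    """
    r = 0.0
    # Global team reward: positive on task success, negative on failure.
    if events.get("task_succeeded"):
        r += team_weight * 10.0
    elif events.get("task_failed"):
        r -= team_weight * 10.0
    # Individual efficiency reward: scaled by this agent's progress this step.
    r += 0.1 * events.get("progress", 0.0)
    # Cooperation reward: bonus when a preset multi-agent pattern completes,
    # e.g. UAV reconnaissance handed off to an ambulance.
    if events.get("cooperative_pattern"):
        r += 5.0
    # Constraint penalty: violations of operating rules or physical limits.
    r -= 2.0 * events.get("violations", 0)
    return r
```

In this sketch a successful task with some progress and one rule violation would net a positive but reduced reward, matching the claim's intent that team success dominates while misbehavior is discouraged.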

Description

Air-ground integrated medical rescue command and decision support method

Technical Field

The invention relates to the intersection of intelligent emergency response and intelligent medical technology, and in particular to an air-ground integrated medical rescue collaborative decision system based on multi-agent reinforcement learning (MARL) and a spatio-temporal optimization algorithm, and to its core algorithm implementation.

Background

The main technical bottlenecks of existing medical rescue systems are as follows: the data formats and communication protocols of heterogeneous units such as unmanned aerial vehicles (UAVs), ambulances and hospitals differ, and real-time fusion and representation capability under a unified spatio-temporal frame is lacking; existing scheduling systems are mostly based on rules or simple heuristic algorithms and cannot adapt to the highly dynamic, multi-objective, strongly coupled decision environments of sudden rescue scenarios; an effective task-level cooperation mechanism between air and ground rescue units is lacking, making it difficult to realize efficient cooperative patterns such as "UAV lead reconnaissance with precise ambulance hand-off" or "UAV emergency material delivery with ambulance en-route treatment"; and traditional optimization algorithms rely on accurate mathematical models and complete-information assumptions, making it difficult to handle the uncertainty and partial observability found in reality.

Disclosure of the Invention

The technical problem to be solved by the invention is mainly how to overcome the main technical problems of the existing medical rescue system described above.
In a first aspect, an embodiment of the present invention provides an air-ground integrated medical rescue command and decision support method, based on a centralized training, distributed execution (CTDE) architecture and implemented through interaction between a multi-agent decision center and a high-fidelity rescue simulation environment, comprising the following steps: constructing and operating the high-fidelity rescue simulation environment, wherein the environment simulates a rescue scene comprising casualty events, rescue agents, geographic space and dynamic uncertainty, and generates a global rescue situation state; acquiring, in the multi-agent decision center, the global rescue situation state and the local observation of each rescue agent; generating, by a trained hybrid decision engine and based on the global rescue situation state and the local observations, a policy for each type of rescue agent and outputting a joint action instruction, wherein the hybrid decision engine integrates a multi-agent reinforcement learning module and a spatio-temporal optimization algorithm module; issuing the joint action instruction to the corresponding rescue agent in the high-fidelity rescue simulation environment for execution, driving an environment state transition, and obtaining a composite reward comprising a team reward and an individual reward; and iteratively optimizing the policies in the multi-agent decision center based on the environment state transition and the composite reward.
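The interaction loop of the first aspect can be sketched as follows. The toy environment, the linear executor policy, the evaluator's scoring function and the update rule are all illustrative stand-ins for the patent's high-fidelity simulation and trained networks, not its actual implementation.

```python
# Minimal CTDE-style interaction loop: distributed executors act from
# local observations, a centralized evaluator scores the joint action
# using the global state, and its score guides parameter updates.

class RescueSimEnv:
    """Toy stand-in for the high-fidelity rescue simulation environment."""
    def __init__(self, n_agents=2):
        self.n_agents = n_agents
        self.state = [0.0] * n_agents          # global rescue situation state

    def observe(self, i):
        return self.state[i]                   # local observation of agent i

    def step(self, joint_action):
        # Drive the environment state transition with the joint action.
        for i, a in enumerate(joint_action):
            self.state[i] += a
        team_reward = 1.0 if sum(self.state) > self.n_agents else 0.0
        individual = [0.1 * a for a in joint_action]
        return self.state, team_reward, individual

def executor_policy(theta, obs):
    """Distributed executor: chooses an action from local observation only."""
    return 1.0 if obs * theta[0] + theta[1] >= 0 else 0.0

def centralized_value(global_state, joint_action):
    """Centralized evaluator: scores the joint action using the global state."""
    return sum(global_state) + sum(joint_action)

env = RescueSimEnv()
thetas = [[0.5, 0.1] for _ in range(env.n_agents)]
for _ in range(10):                            # iterative policy optimization
    obs = [env.observe(i) for i in range(env.n_agents)]
    joint = [executor_policy(t, o) for t, o in zip(thetas, obs)]
    state, team_r, ind_r = env.step(joint)
    value = centralized_value(state, joint)
    # Stub update: executor parameters nudged by the evaluator's score,
    # standing in for the policy gradients of claim 2.
    for t, r in zip(thetas, ind_r):
        t[1] += 0.01 * (team_r + r) * value
```

The essential CTDE property shown here is the asymmetry of information: `executor_policy` sees only its own observation, while `centralized_value` sees the full state and joint action during training.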
In a specific embodiment of the invention, the CTDE architecture comprises a centralized evaluator network, which uses the global rescue situation state and the actions of all rescue agents during the training phase to estimate the value of the joint action, and distributed executor networks, each corresponding to a type of rescue agent or to an individual rescue agent, for outputting action policies according to the corresponding local observations; the parameter updates of the executor networks are guided by policy gradients provided by the evaluator network. In a specific embodiment of the present invention, generating, by the trained hybrid decision engine and based on the global rescue situation state and the local observation of each rescue agent, a policy for each type of rescue agent and outputting a joint action instruction comprises: allocating suitable rescue agents to a new rescue task through the policy output by the multi-agent reinforcement learning module; for a rescue agent allocated a specific rescue task, invoking the spatio-temporal optimization algorithm module to compute an optimized path from its current position to the task target point based on the instantaneous state of the current environment; and taking key node information of the optimized path as prior knowledge, inputting it into the policy network of the corresponding rescue agent, and generating the final executable action instruction. In a specific em
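The path computation step assigned to the spatio-temporal optimization module could, for example, be realized with grid-based A* search. The patent does not name a specific algorithm; this sketch and its obstacle-grid representation are assumptions for illustration only.

```python
import heapq

def astar_path(grid, start, goal):
    """Grid A* sketch for a spatio-temporal optimization module.

    `grid[r][c] == 1` marks an obstacle; the heuristic is Manhattan
    distance. The returned node sequence plays the role of the "key
    node information" fed to an agent's policy network as prior
    knowledge. Illustrative stand-in, not the patent's algorithm.
    """
    rows, cols = len(grid), len(grid[0])
    h = lambda p: abs(p[0] - goal[0]) + abs(p[1] - goal[1])
    open_set = [(h(start), 0, start, [start])]  # (f, g, node, path)
    seen = set()
    while open_set:
        _, g, node, path = heapq.heappop(open_set)
        if node == goal:
            return path                  # waypoints usable as prior knowledge
        if node in seen:
            continue
        seen.add(node)
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                heapq.heappush(open_set, (g + 1 + h((nr, nc)), g + 1,
                                          (nr, nc), path + [(nr, nc)]))
    return None                          # no feasible path
```

A true spatio-temporal variant would additionally index nodes by time step so that moving obstacles and communication delays can be modeled; the purely spatial version above keeps the sketch short.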