CN-121981196-A - PINN adaptive sampling method and device based on multi-agent reinforcement learning
Abstract
The invention discloses a PINN adaptive sampling method and device based on multi-agent reinforcement learning, belonging to the technical field of artificial intelligence and scientific computing. The method constructs a multi-agent sampling environment and, by introducing a fixed time anchor mechanism, concentrates the agents on collaborative exploration of the spatial domain under a frozen time slice. A composite reward mechanism is designed, comprising a local change reward based on the predicted solution, a region coverage reward, and a redundancy elimination penalty, so that the agents autonomously identify and locate complex regions where the solution changes sharply. A sampling policy is trained with a multi-agent proximal policy optimization algorithm, an adaptive sampling point set is generated and fused with the initial point set, and the PINN is then fine-tuned on the fused set. The method addresses the difficulty that fixed sampling strategies have in capturing critical regions when existing physics-informed neural networks solve partial differential equations with strong nonlinearity or local high-gradient features, improves the solving accuracy and convergence speed of the model on complex dynamical systems, and effectively reduces the sampling cost.
Inventors
- He Peng
- Yao Ruoxia
- Bi Shizheng
- Su Liangxiao
Assignees
- Shaanxi Normal University (陕西师范大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2026-01-28
Claims (10)
- 1. A PINN adaptive sampling method based on multi-agent reinforcement learning, characterized by comprising the following steps: constructing a multi-agent sampling environment, modeling the sampling process as a Markov decision process, and introducing a fixed time anchor mechanism when setting up the sampling environment so that the agents execute actions only in the spatial dimension; controlling a plurality of agents to execute actions in the configured sampling environment, computing rewards based on the local change characteristics of the predicted solution of a pre-trained PINN at the fixed time anchor, and combining these with region rewards and redundancy penalties to obtain the total reward of each agent; updating the policy network and value network of the agents with a multi-agent reinforcement learning algorithm to obtain a trained adaptive sampling policy; and generating an adaptive sampling point set with the trained adaptive sampling policy, fusing it with the initial sampling point set to form a mixed sampling point set, and fine-tuning the PINN with the mixed sampling point set to obtain a target PINN model for solving the target partial differential equation.
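The following is a minimal Python sketch of the overall pipeline in claim 1, written as an orchestration function whose callables stand in for the steps detailed in claims 2-8; every name and parameter here is an assumption for illustration, not terminology from the patent.

```python
from typing import Any, Callable, Sequence, Tuple

def adaptive_pinn_pipeline(
    pretrain: Callable[[], Any],                      # returns a pre-trained PINN (claim 2)
    build_env: Callable[[Any], Any],                  # multi-agent sampling env around the PINN (claim 4)
    train_policy: Callable[[Any], Tuple[Any, list]],  # MAPPO training, returns (policy, visit log) (claims 5-6)
    select_points: Callable[[list], Sequence],        # screen visited points by reward (claim 7)
    initial_points: Sequence,                         # initial sampling point set (claim 3)
    finetune: Callable[[Any, Sequence], Any],         # fine-tune on the mixed set (claim 8)
) -> Any:
    """Orchestrates the four stages of claim 1; every callable is a stand-in."""
    pinn = pretrain()
    env = build_env(pinn)
    policy, visit_log = train_policy(env)
    adaptive_points = select_points(visit_log)
    mixed_points = list(initial_points) + list(adaptive_points)  # fuse the two point sets
    return finetune(pinn, mixed_points)
```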
- 2. The multi-agent reinforcement learning based PINN adaptive sampling method of claim 1, wherein obtaining the pre-trained PINN comprises: acquiring the target partial differential equation, determining the space-time solving domain, the time interval, the initial condition and the boundary condition, and constructing an expression of the equation operator that can be used for automatic differentiation; and generating an initial sampling point set within the solving domain, constructing a physics-informed neural network, and pre-training the PINN with the initial sampling point set to obtain the pre-trained PINN.
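A minimal PyTorch sketch of claim 2: a small fully connected PINN whose equation operator is built with automatic differentiation and pre-trained on the initial point set. The sine-Gordon-type equation u_tt - u_xx + sin(u) = 0 is only an example of the nonlinear wave equations mentioned in the background, and the initial/boundary-condition terms are omitted for brevity; all names are illustrative.

```python
import torch
import torch.nn as nn

class PINN(nn.Module):
    """Small fully connected network u_theta(x, t)."""
    def __init__(self, width: int = 64, depth: int = 4):
        super().__init__()
        layers, in_dim = [], 2
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.Tanh()]
            in_dim = width
        layers += [nn.Linear(in_dim, 1)]
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([x, t], dim=-1))

def pde_residual(model: PINN, x: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Residual of an example nonlinear wave equation u_tt - u_xx + sin(u) = 0,
    expressed through automatic differentiation as required by claim 2."""
    x = x.requires_grad_(True)
    t = t.requires_grad_(True)
    u = model(x, t)
    grads = lambda y, v: torch.autograd.grad(y, v, torch.ones_like(y), create_graph=True)[0]
    u_t, u_x = grads(u, t), grads(u, x)
    u_tt, u_xx = grads(u_t, t), grads(u_x, x)
    return u_tt - u_xx + torch.sin(u)

def pretrain(model: PINN, x0: torch.Tensor, t0: torch.Tensor, steps: int = 2000) -> PINN:
    """Pre-train on the initial point set by minimising the mean squared residual."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = pde_residual(model, x0, t0).pow(2).mean()
        loss.backward()
        opt.step()
    return model
```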
- 3. The multi-agent reinforcement learning based PINN adaptive sampling method of claim 2, wherein generating the initial sampling point set within the solving domain comprises: drawing the initial sampling point set from the space-time solving domain determined in step one, where the sampling mode is random sampling, uniform grid sampling, or Latin hypercube sampling.
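An illustrative sketch of the three sampling modes listed in claim 3 over a rectangular space-time solving domain; the domain bounds and point counts are placeholders.

```python
import numpy as np
from scipy.stats import qmc

def initial_points(n: int, x_range=(-1.0, 1.0), t_range=(0.0, 1.0), mode: str = "lhs") -> np.ndarray:
    """Return an (n, 2) array of (x, t) collocation points inside the solving domain."""
    lo = np.array([x_range[0], t_range[0]])
    hi = np.array([x_range[1], t_range[1]])
    if mode == "random":                        # uniform random sampling
        return lo + (hi - lo) * np.random.rand(n, 2)
    if mode == "uniform":                       # uniform grid; n is rounded down to a square number
        k = int(np.sqrt(n))
        xs = np.linspace(*x_range, k)
        ts = np.linspace(*t_range, k)
        return np.stack(np.meshgrid(xs, ts), axis=-1).reshape(-1, 2)
    sample = qmc.LatinHypercube(d=2).random(n)  # Latin hypercube sampling (default)
    return qmc.scale(sample, lo, hi)
```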
- 4. The multi-agent reinforcement learning based PINN adaptive sampling method of claim 1, wherein constructing a multi-agent sampling environment, modeling the sampling process as a Markov decision process, introducing a fixed time anchor mechanism when setting the sampling environment, and having agents execute actions only in the spatial dimension comprises: at the beginning of each sampling round, randomly drawing a time value from the time interval and keeping it unchanged throughout the round; randomly assigning each agent an initial coordinate in the spatial domain and placing the agents together in the same environment to execute the policy learning task; dividing the spatial domain into a number of sub-regions according to a set number of partitions and assigning a region identifier to each agent; and defining the state space of an agent as its current spatial position together with the identifier of the spatial sub-region to which it belongs, wherein the action space of the agent contains only movements along the spatial dimension.
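A compact sketch of the sampling environment of claim 4: one time anchor frozen per episode, spatial-only actions, and a state made of the current position plus a sub-region identifier. The 1D spatial domain, the discrete left/stay/right action set, and the rule assigning each agent the sub-region it starts in are assumptions for illustration.

```python
import numpy as np

class SamplingEnv:
    """Multi-agent sampling environment with a fixed time anchor per episode (claim 4)."""
    def __init__(self, n_agents=8, n_regions=4, x_range=(-1.0, 1.0), t_range=(0.0, 1.0), step=0.05):
        self.n_agents, self.n_regions = n_agents, n_regions
        self.x_lo, self.x_hi = x_range
        self.t_lo, self.t_hi = t_range
        self.step = step

    def reset(self):
        # Freeze one time value for the whole episode (the fixed time anchor).
        self.t_anchor = np.random.uniform(self.t_lo, self.t_hi)
        # Random initial spatial coordinates for every agent.
        self.x = np.random.uniform(self.x_lo, self.x_hi, size=self.n_agents)
        # Region identifier: index of the sub-region the agent starts in (an assumed assignment rule).
        self.region = self._region_of(self.x)
        return self._states()

    def _region_of(self, x):
        frac = (x - self.x_lo) / (self.x_hi - self.x_lo)
        return np.minimum((frac * self.n_regions).astype(int), self.n_regions - 1)

    def _states(self):
        # State = (current spatial position, region identifier); the time anchor itself is never acted upon.
        return np.stack([self.x, self.region.astype(float)], axis=1)

    def step_agents(self, actions):
        # Actions move agents only along the spatial dimension: -1 (left), 0 (stay), +1 (right).
        self.x = np.clip(self.x + self.step * np.asarray(actions), self.x_lo, self.x_hi)
        return self._states()
```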
- 5. The multi-agent reinforcement learning based PINN adaptive sampling method according to claim 1, wherein the total reward of an agent is calculated as the sum of a local change reward, a region reward and a redundancy penalty, where the local change reward is calculated from the local differential amplitude of the predicted solution, the region reward is calculated from the distance between the agent and the center of its region, and the redundancy penalty is calculated from the distances between agents; the local change reward is calculated as follows: when the agent moves from its current position to a new position, the differential amplitude of the predicted solution of the pre-trained PINN at the fixed time anchor is computed between the two positions, and a positive reward is given when this amplitude is larger than a preset threshold, otherwise the reward is 0; the region reward and the redundancy penalty are calculated as follows: the region reward decreases with the distance from the agent's current position to the center of its region, controlled by a region reward coefficient and a decay factor, while the redundancy penalty depends on the distance between the current agent and its nearest neighbouring agent, controlled by a penalty coefficient and an interaction radius.
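A sketch of the composite reward of claim 5. The claim fixes only the dependencies (prediction change versus a threshold, distance to the region center with a reward coefficient and decay factor, distance to the nearest neighbour with a penalty coefficient and interaction radius); the exponential decay and the linear penalty inside the interaction radius below are assumed functional forms.

```python
import numpy as np

def total_reward(u_before, u_after, dist_to_center, dist_to_nearest,
                 change_threshold=0.05, region_coef=0.5, decay=2.0,
                 penalty_coef=0.5, interaction_radius=0.1):
    """Composite reward of claim 5; the exact functional forms here are assumptions."""
    # Local change reward: positive when the PINN prediction at the frozen time anchor
    # changes by more than a preset threshold between the old and new positions.
    delta = abs(u_after - u_before)
    r_local = delta if delta > change_threshold else 0.0
    # Region reward: decays with the distance to the centre of the agent's assigned sub-region.
    r_region = region_coef * np.exp(-decay * dist_to_center)
    # Redundancy penalty: applied when the nearest neighbouring agent lies inside the interaction radius.
    r_red = -penalty_coef * max(0.0, 1.0 - dist_to_nearest / interaction_radius)
    return r_local + r_region + r_red
```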
- 6. The multi-agent reinforcement learning based PINN adaptive sampling method according to claim 1, wherein updating the agents' policy network and value network with the multi-agent reinforcement learning algorithm to obtain a trained adaptive sampling policy comprises: a plurality of agents share a unified policy network $\pi_\theta$, whose update goal is to maximize the expected return; the policy update adopts the clipped objective of proximal policy optimization (PPO), of the form $L^{\mathrm{CLIP}}(\theta)=\mathbb{E}_t\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right]$, where $t$ is the time step, $\mathbb{E}_t$ is the expectation over time steps, $r_t(\theta)$ is the probability ratio between the new and old policies, $\hat{A}_t$ is the advantage function, $\theta$ denotes the policy network parameters and $\epsilon$ is the clipping hyperparameter; a value network $V_\phi$ is constructed and the advantage function is calculated as $\hat{A}_t=\sum_{k\ge 0}\gamma^{k}r_{t+k}-V_\phi(s_t)$, where $\gamma$ is the discount factor, $k$ counts the discounted steps and $t$ is the time step; the value network is trained by minimizing the loss $L^{V}(\phi)=\mathbb{E}_t\left[\left(V_\phi(s_t)-\hat{R}_t\right)^{2}\right]$, where $\hat{R}_t$ is the actual return; in each round of policy iteration, the policy network and the value network are updated simultaneously by optimizing the overall objective $L(\theta,\phi)=-L^{\mathrm{CLIP}}(\theta)+c_1 L^{V}(\phi)-c_2 S[\pi_\theta]$ with a gradient descent algorithm, where $S[\pi_\theta]$ is the policy entropy term and the weight coefficients $c_1, c_2$ are both larger than 0; after multiple rounds of updating, the agents learn a stable adaptive sampling policy; the agents share the policy network and evaluate the global state through a centralized value network, and the update objective of the policy network adopts the clipped-ratio form above, in which $r_t(\theta)$ is the probability ratio and $\hat{A}_t$ the advantage function.
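A PyTorch sketch of the combined PPO objective of claim 6 (clipped policy term, squared value error against the actual returns, and an entropy bonus), to be minimised by gradient descent; the rollout tensors are assumed to have been collected with the shared policy, and the weight coefficients are illustrative defaults.

```python
import torch

def ppo_losses(new_logp, old_logp, advantages, values, returns,
               clip_eps=0.2, value_coef=0.5, entropy_coef=0.01, entropy=None):
    """Combined PPO objective of claim 6, to be minimised by gradient descent."""
    ratio = torch.exp(new_logp - old_logp)                        # probability ratio r_t(theta)
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)  # clipped ratio
    policy_loss = -torch.min(ratio * advantages, clipped * advantages).mean()
    value_loss = (values - returns).pow(2).mean()                 # squared error to the actual returns
    entropy_bonus = entropy.mean() if entropy is not None else torch.tensor(0.0)
    return policy_loss + value_coef * value_loss - entropy_coef * entropy_bonus
```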
- 7. The multi-agent reinforcement learning based PINN adaptive sampling method of claim 1, wherein generating an adaptive sampling point set with the trained adaptive sampling strategy and fusing it with the initial sampling point set to form a mixed sampling point set comprises: recording the positions visited by the agents during exploration and screening out the points whose individual rewards satisfy a preset condition as the adaptive sampling point set $\mathcal{D}_{\mathrm{adapt}}$; and merging $\mathcal{D}_{\mathrm{adapt}}$ with the uniformly distributed initial sampling point set $\mathcal{D}_{0}$ to obtain the mixed sampling point set $\mathcal{D}_{\mathrm{mix}}=\mathcal{D}_{0}\cup\mathcal{D}_{\mathrm{adapt}}$.
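A sketch of the screening-and-fusion step of claim 7; the reward threshold is an assumed form of the preset condition.

```python
import numpy as np

def fuse_point_sets(visited_points, visited_rewards, initial_points, reward_threshold=0.1):
    """Keep visited positions whose individual reward exceeds a threshold, then merge with the initial set."""
    visited_points = np.asarray(visited_points, dtype=float)
    visited_rewards = np.asarray(visited_rewards, dtype=float)
    adaptive = visited_points[visited_rewards > reward_threshold]   # screened adaptive points
    mixed = np.concatenate([np.asarray(initial_points, dtype=float), adaptive], axis=0)
    return np.unique(mixed, axis=0)                                 # drop exact duplicates from repeated visits
```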
- 8. The multi-agent reinforcement learning based PINN adaptive sampling method of claim 1, wherein the mixed-sampling training loss function of the PINN is $\mathcal{L}=\mathcal{L}_{\mathrm{res}}+\mathcal{L}_{\mathrm{ic}}+\mathcal{L}_{\mathrm{bc}}$, where $\mathcal{L}_{\mathrm{res}}$ is the residual loss of the physical equation over the mixed sampling point set, $\mathcal{L}_{\mathrm{ic}}$ is the initial condition loss and $\mathcal{L}_{\mathrm{bc}}$ is the boundary condition loss.
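Written out in a conventional mean-squared PINN form (an assumption, since the claim does not specify the exact terms or weights), the mixed-sampling loss of claim 8 reads:

```latex
\mathcal{L}_{\mathrm{mix}}(\theta)
  = \underbrace{\frac{1}{|\mathcal{D}_{\mathrm{mix}}|}\sum_{(x,t)\in\mathcal{D}_{\mathrm{mix}}}
      \big\lVert \mathcal{N}[u_\theta](x,t) \big\rVert^2}_{\mathcal{L}_{\mathrm{res}}}
  + \underbrace{\frac{1}{|\mathcal{D}_{\mathrm{ic}}|}\sum_{x\in\mathcal{D}_{\mathrm{ic}}}
      \big\lVert u_\theta(x,0) - u_0(x) \big\rVert^2}_{\mathcal{L}_{\mathrm{ic}}}
  + \underbrace{\frac{1}{|\mathcal{D}_{\mathrm{bc}}|}\sum_{(x,t)\in\mathcal{D}_{\mathrm{bc}}}
      \big\lVert \mathcal{B}[u_\theta](x,t) \big\rVert^2}_{\mathcal{L}_{\mathrm{bc}}}
```

Here $\mathcal{N}[\cdot]$ denotes the differential operator of the target equation, $\mathcal{B}[\cdot]$ the boundary operator and $u_0$ the initial condition.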
- 9. A computer device comprising a processor and a memory, the memory being configured to store a computer executable program, and the processor being configured to read part or all of the computer executable program from the memory and, when executing it, to implement the multi-agent reinforcement learning based PINN adaptive sampling method of any one of claims 1-7.
- 10. A computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and when the computer program is executed by a processor, the multi-agent reinforcement learning based PINN adaptive sampling method according to any one of claims 1 to 7 is implemented.
Description
PINN adaptive sampling method and device based on multi-agent reinforcement learning
Technical Field
The invention belongs to the technical fields of scientific computing, numerical solution of partial differential equations and artificial intelligence, and in particular relates to a physics-informed neural network (Physics-Informed Neural Networks, PINNs) adaptive sampling method and device based on multi-agent reinforcement learning.
Background
Partial differential equations (PDEs) are important mathematical tools for describing continuously varying phenomena in nature and in engineering systems. In recent years, with the rapid development of scientific computing technology, deep-learning-based methods for solving partial differential equations have gradually become a research hotspot. Physics-informed neural networks have attracted wide attention in fields such as electromagnetic field simulation, structural dynamics, fluid dynamics, quantum physics and biomolecular dynamics by virtue of their mesh-free nature and strong generalization ability. PINNs embed the governing partial differential equation, the initial conditions and the boundary conditions into the loss function of the neural network in residual form, so that during training the network simultaneously fits the data and satisfies the physical constraints, giving the approach good theoretical consistency and practical flexibility. However, existing PINNs methods generally rely on a fixed set of sampling points generated in advance within the computational domain. For partial differential equations with strong nonlinearity, drastic local variation or multi-scale behaviour (such as nonlinear wave equations, reaction-diffusion equations and singularly perturbed equations), uniform sampling has difficulty capturing the complex local structure of the solution, which makes network training hard and noticeably degrades the solving accuracy. In practical engineering applications, nonlinear wave equations appear widely in scenarios such as superconducting device simulation, wave propagation in nonlinear media, crystal dislocation dynamics and large-amplitude structural vibration. These equations often contain significant nonlinear terms and locally steep regions, which places higher demands on the accuracy and stability of the solving method, demands that a fixed sampling strategy struggles to meet. To improve solving accuracy, residual-driven adaptive sampling methods have been proposed, such as Residual-based Adaptive Refinement (RAR) and Residual-based Adaptive Distribution (RAD). These methods select new sampling points by evaluating the residual of the current PINNs solution, thereby densifying the regions with larger errors. Although residual-driven methods improve sampling efficiency to some extent, they have drawbacks: first, the PDE residual must be evaluated on a large number of candidate points, which incurs a heavy computational cost; second, they are essentially local greedy strategies that can only passively refine according to the residual information of the current model, are prone to falling into local optima and lack global exploration capability. Furthermore, in high-dimensional problems or strongly nonlinear dynamical equations, the residual structure itself may be corrupted by noise, causing the sampling strategy to fail or become unstable.
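As context for the residual-driven methods described above, a minimal sketch of an RAR-style refinement loop: evaluate the PDE residual of the current model on a large pool of random candidate points and keep those with the largest residual. The residual_fn signature, the candidate count and the rectangular domain are assumptions for illustration.

```python
import numpy as np

def residual_refine(residual_fn, domain_lo, domain_hi, n_candidates=10000, n_new=100):
    """Residual-driven refinement in the spirit of RAR: densify where the current residual is largest."""
    lo, hi = np.asarray(domain_lo, dtype=float), np.asarray(domain_hi, dtype=float)
    candidates = lo + (hi - lo) * np.random.rand(n_candidates, lo.size)
    residuals = np.abs(residual_fn(candidates))    # |PDE residual| of the current PINN at each candidate
    top = np.argsort(residuals)[-n_new:]           # greedy choice: keep the largest-residual points
    return candidates[top]
```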
These drawbacks of conventional PINNs sampling methods make it difficult for the prior art to solve complex partial differential equations efficiently and with high accuracy. In practical engineering settings, such as superconducting Josephson junction models, dynamic response analysis of large mechanical structures and simulation of nonlinear material behaviour, the partial differential equations tend to exhibit strong nonlinearity and severe local variation, and existing sampling strategies have difficulty guaranteeing solving efficiency, accuracy and stability at the same time. A new sampling method with active exploration capability is therefore needed, one that can adaptively find key regions and effectively improve the solving accuracy of PINNs.
Disclosure of Invention
The invention aims to solve the problems of existing physics-informed neural networks (PINNs) when solving complex partial differential equations (PDEs): fixed distribution of sampling points, difficulty in adaptively identifying high-error regions, insufficient global exploration capability and high residual-computation overhead. In particular, for dynamics governing equations of engineering importance such as nonlinear wave equations, the solution usually contains obvious spatial non-uniformity, local high-gradient regions or multi-scale structure, and traditional uniform sampling methods cannot capture these regions effectively.