CN-121999970-A - Dynamic rehabilitation physiotherapy system based on reinforcement learning


Abstract

The invention discloses a dynamic rehabilitation physiotherapy system based on reinforcement learning, which relates to the technical field of medical rehabilitation equipment. The system comprises a physiological signal acquisition and processing module, an action generation module, a waveform generation module, a model optimization module, and a closed-loop feedback and termination module. The physiological signal acquisition and processing module acquires multi-modal data and processes it into standardized physiological signal data. The action generation module constructs a state vector from the current waveform parameters and the physiological signal data and derives an action vector from it. The waveform generation module updates the corresponding waveform parameters according to the action vector and combines them into a combined waveform. The model optimization module obtains experience tuples from reward values calculated by a multi-objective reward function and uses them to update the DDPG (Deep Deterministic Policy Gradient) model. The closed-loop feedback and termination module cyclically executes the steps from physiological signal acquisition to model optimization at a preset period and performs the corresponding stopping operation when a stop condition is detected. The invention solves the problem of poor adaptability of prior-art rehabilitation physiotherapy instruments.

Inventors

TAO YAQIN
WANG SHANCHENG
XIONG ZICHEN

Assignees

Nanchang Yaoguang Technology Co., Ltd. (南昌耀光科技有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2025-12-03

Claims (10)

1. A reinforcement learning-based dynamic rehabilitation physiotherapy system, the system comprising: a physiological signal acquisition and processing module, configured to acquire multi-modal data through distributed multi-modal sensors and process the multi-modal data to obtain standardized physiological signal data; an action generation module, configured to construct a state vector from the current waveform parameters and the physiological signal data and to obtain an action vector from the state vector based on a DDPG model; a waveform generation module, configured to update the corresponding waveform parameters according to the action vector and then combine the waveform parameters to obtain a combined waveform, the combined waveform being used for rehabilitation physiotherapy; a model optimization module, configured to obtain experience tuples for updating the DDPG model according to reward values calculated by a multi-objective reward function; and a closed-loop feedback and termination module, configured to cyclically execute the steps from physiological signal acquisition to model optimization at a preset period and to execute a corresponding stopping operation when a stop condition is detected to be met.
2. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 1, further comprising: an initialization module, configured to set initial waveform parameters and load pre-trained DDPG model weights, wherein the initial waveform parameters comprise at least a fundamental frequency, an amplitude and a waveform mixing ratio.
3. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 2, wherein the step of acquiring multi-modal data through the distributed multi-modal sensors and processing the multi-modal data to obtain standardized physiological signal data comprises: collecting skin impedance by the four-electrode method; acquiring electromyographic signals through surface electromyography sensors and extracting their frequency-domain energy; estimating a pain score through touch-screen input or a heart rate variability algorithm; and applying band-pass filtering and normalization to the skin impedance, the frequency-domain energy and the pain score to generate the standardized physiological signal data.
4. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 3, wherein the step of constructing a state vector from the current waveform parameters and the physiological signal data and obtaining the action vector from the state vector based on the DDPG model comprises: integrating the current waveform parameters and the physiological signals into the corresponding state vector; and outputting, by the Actor network of the DDPG model, a normalized action vector according to the state vector, then performing physical-quantity mapping to obtain the action vector, which is used to perform the parameter adjustment action; wherein the state vector consists of the frequency, the amplitude and the mixing ratio of the current waveform parameters, the impedance value, the myoelectric energy and the pain score, and the action vector consists of the frequency adjustment step size, the amplitude adjustment step size and the mixing-ratio adjustment step size.
5. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 4, wherein the step of updating the corresponding waveform parameters according to the action vector and combining them to obtain a combined waveform for rehabilitation physiotherapy comprises: generating the combined waveform from a square wave and an exponential wave using the adjusted frequency, the adjusted amplitude and the adjusted mixing ratio of the waveform parameters, the mixing ratio setting the proportion between the square wave and the exponential wave.
6. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 5, wherein the multi-objective reward function is expressed in terms of the myoelectric energy decrease value, measured relative to the myoelectric energy in the first period after the start of treatment, the pain score and the energy consumption proportion.
7. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 6, wherein the step of outputting the normalized action vector and performing physical-quantity mapping to obtain the action vector comprises mapping the normalized action vector output by the Actor network onto the action vector expressed in physical quantities.
8. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 3, wherein the step of obtaining experience tuples for updating the DDPG model comprises: storing the experience tuples in a replay buffer; randomly sampling a preset number of experiences every preset period and updating the Critic network by minimizing the TD error; updating the Actor network by policy gradient ascent, $\nabla_{\theta^{\mu}} J \approx \mathbb{E}\left[\nabla_{a} Q(s,a)\big|_{a=\mu(s)}\,\nabla_{\theta^{\mu}} \mu(s)\right]$; and softly updating the target networks, $\theta' \leftarrow \tau\theta + (1-\tau)\theta'$; wherein $\mathbb{E}[\cdot]$ is the mathematical expectation, $\theta^{\mu}$ denotes the parameters of the current Actor network, $Q(s,a)$ is the output of the current Critic network, $\nabla_{a}$ is the partial derivative with respect to the action, $\mu(s)$ is the Actor output for state $s$, $\tau$ is the soft-update coefficient, $\theta'$ denotes the parameters of the target network and $\theta$ denotes the parameters of the current network.
9. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 1, further comprising: a fuse module, configured to trip when the pain score is detected to exceed a preset threshold, immediately stopping waveform output and raising an alarm.
10. The reinforcement learning-based dynamic rehabilitation physiotherapy system according to claim 1, wherein the stop condition comprises: the physiotherapy time exceeding a preset duration; all elements of the pain score queue exceeding a preset value; and the emergency stop button being detected as activated; and the stopping operation comprises: gradually ramping the waveform amplitude down to 0 V; saving the parameter adjustment sequence, the physiological signal time series and the reward value sequence; and uploading the parameter adjustment sequence, the physiological signal time series and the reward values to a cloud model training pool.
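The state construction, action mapping and waveform combination of claims 4, 5 and 7 can be illustrated with a minimal sketch. It assumes that the Actor outputs a tanh-bounded vector in [-1, 1]^3 that is scaled by maximum step sizes, that the updated parameters are clipped to safe bounds, and that the combined waveform is a linear mix amp · (mix · square + (1 − mix) · exponential). The patent text does not give these values or the exact formula, so MAX_STEP, BOUNDS, the decay constant and all function names are hypothetical.

```python
import math
from dataclasses import dataclass

# Hypothetical limits; not specified in the patent text.
MAX_STEP = (5.0, 2.0, 0.1)                        # max |step| for (Hz, V, mix)
BOUNDS = ((1.0, 200.0), (0.0, 30.0), (0.0, 1.0))  # safe (freq, amp, mix) ranges


@dataclass
class WaveParams:
    freq: float  # Hz
    amp: float   # V
    mix: float   # weight of the square-wave component, in [0, 1]


def build_state(p, impedance, emg_energy, pain):
    """State vector: (frequency, amplitude, mixing ratio, impedance, EMG energy, pain)."""
    return (p.freq, p.amp, p.mix, impedance, emg_energy, pain)


def map_action(norm_action):
    """Scale a normalized Actor output in [-1, 1]^3 to physical step sizes."""
    return tuple(max(-1.0, min(1.0, a)) * m for a, m in zip(norm_action, MAX_STEP))


def apply_action(p, step):
    """Add the steps to the current parameters, then clip to the safe bounds."""
    raw = (p.freq + step[0], p.amp + step[1], p.mix + step[2])
    return WaveParams(*(min(max(v, lo), hi) for v, (lo, hi) in zip(raw, BOUNDS)))


def combined_waveform(p, t, decay=5.0):
    """Assumed linear mix: amp * (mix * square + (1 - mix) * exponential)."""
    phase = (t * p.freq) % 1.0                # position within the current period
    square = 1.0 if phase < 0.5 else -1.0     # bipolar square wave
    exponential = math.exp(-decay * phase)    # restarts at 1 each period, then decays
    return p.amp * (p.mix * square + (1.0 - p.mix) * exponential)
```

Clipping after the update keeps every parameter inside its (hypothetical) safe range even when the Actor output saturates, which matters for a device applied to a patient.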
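The reward expression of claim 6 is not legible in this text. One plausible reading, sketched below, is a weighted sum that rewards the relative drop in myoelectric energy against the first treatment period and penalizes the pain score and the energy consumption proportion. The weights, the use of a relative (rather than absolute) drop and the function name are assumptions.

```python
def reward(e_first, e_now, pain, energy_ratio, weights=(1.0, 0.5, 0.2)):
    """Hypothetical multi-objective reward for one control period.

    e_first:      myoelectric energy in the first period after treatment starts
    e_now:        myoelectric energy in the current period
    pain:         normalized pain score in [0, 1]
    energy_ratio: energy consumption proportion in [0, 1]
    weights:      illustrative (w_emg, w_pain, w_energy), not from the patent
    """
    w_emg, w_pain, w_energy = weights
    # Relative decrease in EMG energy; positive when muscle activity has dropped.
    emg_drop = (e_first - e_now) / e_first if e_first > 0 else 0.0
    return w_emg * emg_drop - w_pain * pain - w_energy * energy_ratio
```

With these weights, a 40% EMG drop at pain 0.2 and energy ratio 0.5 yields a reward of about 0.2.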
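The replay and target-network mechanics of claim 8 can be sketched without a neural-network library. The block below shows a fixed-capacity replay buffer with uniform random sampling and the soft update θ' ← τθ + (1 − τ)θ' applied element-wise to flat parameter lists. The Critic's TD-error minimization and the Actor's gradient step are omitted, and the capacity, τ value and names are illustrative assumptions.

```python
import random
from collections import deque, namedtuple

# One transition (s, a, r, s'); field names are illustrative.
Experience = namedtuple("Experience", "state action reward next_state")


class ReplayBuffer:
    """Fixed-capacity experience replay; the oldest tuples are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def store(self, exp):
        self.buffer.append(exp)

    def sample(self, batch_size):
        # Uniform random mini-batch without replacement.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def soft_update(target, current, tau=0.005):
    """theta' <- tau * theta + (1 - tau) * theta', element-wise over flat lists."""
    return [tau * c + (1.0 - tau) * t for t, c in zip(target, current)]
```

A small τ makes the target networks trail the current networks slowly, which stabilizes the TD targets used to train the Critic.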
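The fuse of claim 9 and the stop conditions of claim 10 reduce to simple checks evaluated every control period. The sketch below treats each stop condition as sufficient on its own, keeps pain scores in a fixed-length queue that must be full before it can trigger a stop, and ramps the amplitude linearly to 0 V. The queue length, thresholds, ramp shape and function names are hypothetical.

```python
from collections import deque


def should_stop(elapsed_s, max_duration_s, pain_queue: deque, pain_limit, estop_pressed):
    """True if any stop condition of claim 10 holds (each treated as sufficient)."""
    queue_full = pain_queue.maxlen is not None and len(pain_queue) == pain_queue.maxlen
    sustained_pain = queue_full and all(p > pain_limit for p in pain_queue)
    return elapsed_s > max_duration_s or sustained_pain or estop_pressed


def fuse_tripped(pain, fuse_threshold):
    """True when the claim 9 fuse should trip (stop output at once and alarm)."""
    return pain > fuse_threshold


def ramp_down(amplitude, steps=10):
    """Hypothetical linear ramp from `amplitude` to 0 V over `steps` periods."""
    return [amplitude * (steps - i) / steps for i in range(1, steps + 1)]
```

Keeping the fuse separate from the stop conditions mirrors claims 9 and 10: the fuse cuts output immediately, whereas a normal stop ramps the amplitude down first.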

Description

Dynamic rehabilitation physiotherapy system based on reinforcement learning

Technical Field

The invention relates to the technical field of medical rehabilitation equipment, and in particular to a dynamic rehabilitation physiotherapy system based on reinforcement learning.

Background

In the field of rehabilitation physiotherapy, the prior art has several limitations. For example, the waveform parameters of a rehabilitation physiotherapy instrument usually rely on static presets and cannot be adjusted dynamically to the patient's real-time physiological state and individual differences, so the therapeutic effect fluctuates and patient comfort is poor.

Disclosure of Invention

In view of the above, the present invention aims to provide a dynamic rehabilitation physiotherapy system based on reinforcement learning, so as to solve the problem of poor adaptability of prior-art physiotherapy systems.

In one aspect, the present invention provides a reinforcement learning-based dynamic rehabilitation physiotherapy system, the system comprising: a physiological signal acquisition and processing module, configured to acquire multi-modal data through distributed multi-modal sensors and process the multi-modal data to obtain standardized physiological signal data; an action generation module, configured to construct a state vector from the current waveform parameters and the physiological signal data and to obtain an action vector from the state vector based on a DDPG (Deep Deterministic Policy Gradient) model; a waveform generation module, configured to update the corresponding waveform parameters according to the action vector and then combine the waveform parameters to obtain a combined waveform, the combined waveform being used for rehabilitation physiotherapy; a model optimization module, configured to obtain experience tuples for updating the DDPG model according to reward values calculated by a multi-objective reward function; and a closed-loop feedback and termination module, configured to cyclically execute the steps from physiological signal acquisition to model optimization at a preset period and to execute a corresponding stopping operation when a stop condition is detected to be met.

Further, the dynamic rehabilitation physiotherapy system based on reinforcement learning further comprises an initialization module, configured to set initial waveform parameters and load pre-trained DDPG model weights, wherein the initial waveform parameters comprise at least a fundamental frequency, an amplitude and a waveform mixing ratio.

Further, in the reinforcement learning-based dynamic rehabilitation physiotherapy system, the step of acquiring multi-modal data through the distributed multi-modal sensors and processing the multi-modal data to obtain standardized physiological signal data includes: collecting skin impedance by the four-electrode method; acquiring electromyographic signals through surface electromyography sensors and extracting their frequency-domain energy; estimating a pain score through touch-screen input or a heart rate variability algorithm; and applying band-pass filtering and normalization to the skin impedance, the frequency-domain energy and the pain score to generate the standardized physiological signal data.
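A minimal sketch of the preprocessing step described above, assuming a sampled EMG window and illustrative normalization ranges. Here the band-pass step is approximated by keeping only the DFT bins inside the pass band (a frequency-domain stand-in for a time-domain filter), and each quantity is min-max normalized to [0, 1]. The function names, the 20 to 450 Hz band and the impedance and pain ranges are hypothetical, not taken from the patent.

```python
import cmath
import math

# Hypothetical normalization ranges; the patent does not specify them.
IMPEDANCE_RANGE_OHM = (100.0, 10_000.0)
PAIN_RANGE = (0.0, 10.0)       # e.g. a 0-10 self-reported scale
EMG_BAND_HZ = (20.0, 450.0)    # assumed surface-EMG pass band


def band_energy(samples, fs, f_lo, f_hi):
    """Energy of `samples` in [f_lo, f_hi] Hz via a naive one-sided DFT (O(n^2))."""
    n = len(samples)
    energy = 0.0
    for k in range(n // 2 + 1):
        if f_lo <= k * fs / n <= f_hi:
            x_k = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                      for t, s in enumerate(samples))
            energy += abs(x_k) ** 2
    return energy / n


def min_max(value, lo, hi):
    """Clip `value` to [lo, hi] and rescale it to [0, 1]."""
    value = min(max(value, lo), hi)
    return (value - lo) / (hi - lo)


def standardize(impedance_ohm, emg_window, fs, pain, emg_energy_max):
    """Return normalized (impedance, EMG energy, pain), each in [0, 1]."""
    energy = band_energy(emg_window, fs, *EMG_BAND_HZ)
    return (min_max(impedance_ohm, *IMPEDANCE_RANGE_OHM),
            min_max(energy, 0.0, emg_energy_max),
            min_max(pain, *PAIN_RANGE))
```

A real device would more likely run a streaming IIR or FFT-based filter; the naive DFT is used only to keep the sketch dependency-free.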
Further, in the reinforcement learning-based dynamic rehabilitation physiotherapy system, the step of constructing a state vector from the current waveform parameters and the physiological signal data and obtaining the action vector from the state vector based on the DDPG model includes: integrating the current waveform parameters and the physiological signals into the corresponding state vector, whose components are the frequency, the amplitude and the mixing ratio of the current waveform parameters, the impedance value, the myoelectric energy and the pain score; and outputting, by the Actor network of the DDPG model, a normalized action vector according to the state vector, then performing physical-quantity mapping to obtain the action vector used for parameter adjustment, whose components are the frequency adjustment step size, the amplitude adjustment step size and the mixing-ratio adjustment step size. Further, in the reinforcement learning-based dynamic rehabilitation physiotherapy system, the step of updating the corresponding waveform parameters according to the action vector and then combining them to obtain a combined waveform for rehabilitation physiotherapy includes: generating the combined waveform from a square wave and an exponential wave at the adjusted frequency and the adjusted amplitude, where the weighting between the two component waves is the mixing propo