CN-122005101-A - Intelligent surgical robot control method based on reinforcement learning

CN 122005101 A

Abstract

The invention discloses an intelligent surgical robot control method based on reinforcement learning, comprising the following steps: S1, obtaining medical image data of the area to be operated on and preprocessing it to generate a standardized medical image; S2, building a dynamic graph convolutional neural network model based on the standardized medical image; S3, building a surgical neural network model and outputting continuous control actions; S4, optimizing the control parameters of the surgical neural network model with a lightning search algorithm; S5, performing multi-round strategy training in a constructed surgical simulation environment; S6, deploying the optimal surgical neural network model into an intelligent surgical robot system; and S7, recording the robot's execution tracks, tissue deformation data and operation process parameters after surgery is completed. The method combines medical image processing, a dynamic graph convolutional neural network model and a lightning search algorithm to achieve intelligent surgical robot control based on reinforcement learning.

Inventors

  • He Zengsong
  • He Yixuan

Assignees

  • 小管家(苏州)健康科技有限公司

Dates

Publication Date
2026-05-12
Application Date
2026-03-17

Claims (10)

  1. An intelligent surgical robot control method based on reinforcement learning, characterized by comprising the following steps: S1, acquiring medical image data of an area to be operated on, and preprocessing it to generate a standardized medical image; S2, constructing a dynamic graph convolutional neural network model based on the standardized medical image, and modeling the intraoperative target area as a dynamic graph structure; S3, constructing a control framework based on reinforcement learning, taking the structural characteristics of the dynamic graph as state input, building a surgical neural network model and outputting continuous control actions; S4, optimizing the control parameters of the surgical neural network model by adopting a lightning search algorithm, and configuring an electric field induction unit and a guided moving path for each lightning body, wherein each lightning body represents a group of control strategy parameter configurations, the performance score of each group of parameter configurations during strategy training is calculated through a centralized evaluation mechanism, and an optimal strategy parameter set is generated; S5, performing multi-round strategy training in a constructed surgical simulation environment, optimizing each group of parameter configurations of the surgical neural network model using the optimal strategy parameter set, and generating an optimal neural network model; S6, deploying the optimal surgical neural network model into an intelligent surgical robot system, continuously collecting intraoperative sensing data, and updating the dynamic graph structure; S7, recording the execution track, tissue deformation data and operation process parameters of the robot after the operation is finished, and feeding the performance evaluation result back to the reinforcement-learning-based control framework for iteration.
  2. The intelligent surgical robot control method based on reinforcement learning of claim 1, wherein the medical image data specifically comprises pre-operative MRI images, CT scan images, ultrasound images, and endoscopic images.
  3. The intelligent surgical robot control method based on reinforcement learning of claim 1, wherein the control strategy parameter configuration specifically comprises strategy network initial weights, learning rates and discount factors.
  4. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S2 specifically includes: S21, inputting the standardized medical image into an image segmentation network, automatically identifying key intraoperative anatomical structure areas, and labeling structure categories and boundary contours; S22, constructing an initial graph structure comprising nodes and edges based on the positional relations and relative spatial distances of the anatomical structures in the standardized medical image, wherein the nodes represent anatomical structure units and the edges represent the connection relations between structures; S23, acquiring the task phase sequence predefined by the intraoperative workflow, encoding the current surgical phase into a phase label, and embedding the phase label into an additional attribute vector of each node; S24, setting the length of a time-series sliding window and the update step of the graph structure, extracting graph structure snapshots at a fixed time step during the operation, generating a time frame sequence from the changes in node states and edge weights between successive snapshots, and constructing an initial dynamic graph structure with time-series dependence; S25, applying the initial dynamic graph structure to an attention-enhanced convolution processing unit, dynamically adjusting edge weights and executing graph convolution operations based on node feature similarity, phase label weights and structural position relations; S26, outputting the dynamic graph structure after the attention-enhanced convolution processing.
  5. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S3 specifically includes: S31, extracting dynamic graph structure information, including node characteristics, edge weight information, phase attributes and time frame sequences; S32, taking the extracted dynamic graph structure information as state input, inputting it into a state coding unit, uniformly encoding the topological relations and time-series characteristics among nodes, generating a state representation for decision making, and setting it as a state vector; S33, constructing an action generation unit based on a neural network, wherein the action generation unit receives the state vector as input and establishes a surgical neural network model consisting of an input layer, a plurality of nonlinear calculation layers and an output layer; S34, constructing a continuous control action instruction vector in the output layer based on the state vector and the internal weight matrix:

     a* = argmin_{a ∈ 𝒜} Σ_{t=1}^{T} ‖s_t − β_t ⊗ (W_t · M · a)‖² / (λ · Σ_{j=1}^{n} ‖a_j‖₃ + ε)

     wherein a* represents the continuous control action vector; a represents the action candidate vector to be optimized; 𝒜 represents the action candidate space; T represents the total number of time steps; s_t represents the state vector at the t-th time step; W_t represents the output-layer weight matrix corresponding to the t-th time step; β_t represents the attention weight tensor at the t-th time step; M represents the linear transformation matrix mapping the action vector into the state space; ⊗ represents the element-wise tensor product; ‖·‖² represents the squared Euclidean distance; λ represents the regularization coefficient; ‖a_j‖₃ represents the third-order norm of the j-th action component; n represents the dimension of the action vector; j represents the index of the j-th component in the control action vector a; t represents the time step; and ε represents a small positive constant preventing the denominator from being zero; S35, constructing a value evaluation unit that evaluates the expected return of the control action generated by the action generation unit under the state vector and predicts the return trend under future states; S36, the action generation unit and the value evaluation unit together forming the reinforcement learning control framework, receiving performance feedback indexes, and updating and optimizing the neural network structure and parameters.
  6. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S4 specifically includes: S41, constructing an initial potential field mapping structure and randomly distributing a plurality of lightning bodies in the parameter space, wherein each lightning body corresponds to a group of control parameters of the surgical neural network model; S42, configuring an electric field induction unit for each lightning body, calculating a local electric field intensity vector based on the parameter distance and potential difference between adjacent lightning bodies, constructing a voltage gradient guidance tensor, and recording the directional migration tendency of each lightning body; S43, introducing a perturbation-reconstruction mechanism, applying a small-range random perturbation to the current position of each lightning body, constructing a perturbation feedback map in combination with the historical potential track, and dynamically adjusting the next moving path; S44, initializing the surgical neural network model with the control parameter set corresponding to each lightning body, running a fixed number of rounds of strategy training in the same simulation environment, and recording the performance indexes of each group of parameters in four dimensions: control precision, action smoothness, convergence rate and tissue feedback response; S45, constructing a centralized evaluation mechanism, jointly inputting the performance indexes and control action vectors into an evaluation convergence unit, and calculating the composite performance score of each lightning body:

     F = Σ_{t=1}^{T} Σ_{i=1}^{n} w_{i,t} · (a_{i,t} − r_{i,t})²

     wherein F represents the composite performance score corresponding to the control parameters; a_{i,t} represents the i-th component of the continuous control action vector at the t-th time step; n represents the dimension of the control action vector; T represents the total number of time steps; w_{i,t} represents the control stability weight of the i-th control quantity at the t-th time step; r_{i,t} represents the target action reference value of the i-th control quantity of the lightning body at the t-th time step; i represents the i-th dimension of the control action vector; and t represents the scoring time step index; S46, marking the control parameter set corresponding to the lightning body with the minimum composite performance score as the optimal strategy parameter set.
  7. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S5 specifically includes: S51, constructing a 3D-modeled surgical simulation environment based on MRI data, and integrating a real-time physiological feedback simulation unit and an interference injection unit in the environment; S52, loading the optimal strategy parameter set into the surgical neural network model, initializing the weight tensor, bias tensor and time-step decision weight mapping table, and setting an environment disturbance recording unit for tracking the influence of environmental fluctuations during training on the control strategy; S53, executing a multi-batch reinforcement training process in the surgical simulation environment, collecting the continuous state transition sequences, environment disturbance response tracks and control action output tracks generated by the robot's executed actions, and constructing sequence batch tensors in time-stamped form; S54, dynamically re-weighting the training sample batches according to the composite performance score, and performing a cross-time-step residual alignment update on the sequence batch tensor:

     θ' = θ − η · Σ_{t=2}^{T} Σ_{i=1}^{n} F · (a_{i,t} − r_{i,t−1}) · (s_{i,t} − s_{i,t−1}) / (F + ε)

     wherein θ represents the trainable parameter tensor of the surgical neural network model in the current round; θ' represents the parameter tensor obtained after the update; η represents the learning rate coefficient; T represents the total number of time steps; n represents the dimension of the control action vector; F represents the composite performance score corresponding to the control parameters; a_{i,t} represents the actual output value of the i-th control action at the t-th time step; r_{i,t−1} represents the target reference value of the i-th control action at the (t−1)-th time step; s_{i,t} and s_{i,t−1} represent the i-th component of the state vector of the dynamic graph structure at the t-th and (t−1)-th time steps respectively; ε represents a small positive constant preventing the denominator from being zero; and i represents the i-th dimension of the control action vector; S55, introducing a parameter sparsity self-adjustment mechanism during training, and performing round-by-round gated compression of the low-amplitude units in the updated surgical neural network parameter tensor; S56, constructing a final evaluation index system based on three dimensions (action performance stability, model structure sparsity and interference response sensitivity) during training, ranking by the evaluation indexes, and outputting the optimal neural network model.
  8. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S6 specifically includes: S61, deploying the optimal neural network model into the control decision unit of the intelligent surgical robot; S62, loading the dynamic graph convolutional neural network structure and a state sensing unit in the surgical robot, and establishing data input and output connections with the control decision unit; S63, initializing the robot's end-effector position, joint states, graph structure buffer and environment sensor interfaces before the operation starts; S64, acquiring the visual images, force feedback, instrument contact positions and tissue deformation data generated in real time during the operation, and inputting these data to the state sensing unit at a fixed frequency to update the dynamic graph structure.
  9. The intelligent surgical robot control method based on reinforcement learning of claim 8, wherein S64 specifically comprises: S641, converting the acquired data into graph structure input by using the state sensing unit, wherein the graph structure input comprises node construction, edge construction, spatial connection relation construction and phase attribute construction, and a dynamic graph state representation is generated at each time step; S642, inputting the dynamic graph structure of the current time step into the control decision unit as the state vector input of the surgical neural network model; S643, outputting, by the surgical neural network model, the continuous control action command vector of the current time step, comprising multi-degree-of-freedom action values and corresponding control amplitudes; S644, transmitting the control action command vector to the robot's low-level actuator interface to drive the end-effector mechanism to perform operations including position movement, posture adjustment and clamping force control; S645, receiving actuator feedback signals and sensor monitoring values, and recording the current state and the result of the action; S646, repeatedly executing steps S641 to S645 throughout the operation, forming a real-time state-action cycle and realizing an intelligent control execution process driven by the dynamic graph structure.
  10. The intelligent surgical robot control method based on reinforcement learning according to claim 1, wherein the step S7 specifically includes: S71, after the operation is finished, extracting the complete execution track from the robot, including the displacement path, attitude angle changes and time-stamp sequence of the end effector; S72, extracting the tissue deformation data acquired during the operation, including the degree of tissue deformation, instrument contact point tracks and force feedback records; S73, recording the operation process parameters, including control action amplitude sequences, response delays, interference error values and state hopping frequencies, and constructing a multi-dimensional operation data set; S74, carrying out unified format processing on all the data recorded in steps S71 to S73 to construct a performance evaluation data matrix; S75, setting a multi-index evaluation system comprising control precision, tissue protection, energy consumption level and action stability, and performing weighted aggregation on the performance evaluation data matrix to generate a performance evaluation result; and S76, feeding the performance evaluation result back to the reinforcement-learning-based control framework for iteration.
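The dynamic-graph construction of claim 4 (S21-S26) can be sketched as follows. The claim does not fix node features, the distance threshold, or the exact attention weighting, so the choices below (inverse-distance edge weights, exponential similarity attention) are illustrative assumptions only:

```python
import numpy as np

def build_graph(positions, features, phase_label, dist_thresh=2.0):
    """Build one graph snapshot: nodes are anatomical structure units (S22),
    edges connect structures closer than dist_thresh, and the current
    surgical phase label is appended to every node's attribute vector (S23)."""
    n = len(positions)
    phase = np.full((n, 1), phase_label, dtype=float)
    nodes = np.hstack([features, phase])              # node attrs + phase label
    adj = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d = np.linalg.norm(positions[i] - positions[j])
            if d < dist_thresh:
                adj[i, j] = adj[j, i] = 1.0 / (1.0 + d)   # closer -> heavier edge
    return nodes, adj

def attention_conv(nodes, adj):
    """One attention-enhanced graph-convolution step (S25): edge weights are
    rescaled by node-feature similarity, normalized per node, then features
    are aggregated from neighbours."""
    sim = nodes @ nodes.T                             # feature similarity
    att = adj * np.exp(sim)                           # boost similar neighbours
    att = att / (att.sum(axis=1, keepdims=True) + 1e-8)
    return att @ nodes                                # aggregated node features
```

A sequence of such snapshots taken at a fixed time step forms the time frame sequence of S24; the time-series dependence comes from comparing node states and edge weights between successive snapshots.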
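The claim-5 action construction (S34) selects, among candidate action vectors, the one minimizing a time-summed state-matching cost with a third-order-norm regularizer. A minimal sketch, assuming an enumerable candidate set and treating the attention tensor as element-wise weights (the claim leaves both open):

```python
import numpy as np

def select_action(candidates, states, W, beta, M, lam=0.1, eps=1e-8):
    """Score every candidate action a with the claim-5 objective sketch:
    sum_t ||s_t - beta_t * (W_t @ M @ a)||^2 / (lam * sum_j |a_j|^3 + eps),
    and return the minimizer a* (the continuous control action vector)."""
    best, best_cost = None, np.inf
    for a in candidates:
        penalty = lam * np.sum(np.abs(a) ** 3) + eps   # third-order norm term
        cost = 0.0
        for s_t, W_t, b_t in zip(states, W, beta):
            resid = s_t - b_t * (W_t @ (M @ a))        # element-wise attention
            cost += resid @ resid                      # squared Euclidean dist
        cost /= penalty
        if cost < best_cost:
            best, best_cost = a, cost
    return best, best_cost
```

In a continuous action space the same objective would be minimized by gradient descent rather than enumeration; the enumeration here only makes the scoring rule concrete.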
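The lightning search of claim 6 (S41-S46) is a population search: each "lightning body" is a parameter set, bodies drift under electric-field guidance with random perturbation, and the body with the minimum composite score F wins. The sketch below stubs the policy rollout (a real implementation would run strategy training per body, S44) and uses a simple move-toward-best rule as a stand-in for the voltage gradient guidance; step sizes are assumptions:

```python
import numpy as np

def composite_score(params, w, r):
    """Claim-6 score sketch: F = sum over t, i of w[t,i] * (a[t,i] - r[t,i])^2.
    The rollout a is stubbed as the parameters broadcast over time steps."""
    a = np.tile(params, (w.shape[0], 1))     # stand-in for a policy rollout
    return float(np.sum(w * (a - r) ** 2))

def lightning_search(score_fn, dim, n_bodies=8, iters=30, seed=0):
    """Population search (S41-S46): bodies drift toward the current best
    (electric-field guidance, S42) plus a small random perturbation
    (perturbation-reconstruction, S43); minimum-score body is optimal (S46)."""
    rng = np.random.default_rng(seed)
    bodies = rng.uniform(-1, 1, size=(n_bodies, dim))
    for _ in range(iters):
        scores = np.array([score_fn(b) for b in bodies])
        best = bodies[scores.argmin()].copy()
        # field-guided step toward the best body plus random perturbation
        bodies += 0.3 * (best - bodies) + 0.05 * rng.normal(size=bodies.shape)
    scores = np.array([score_fn(b) for b in bodies])
    return bodies[scores.argmin()], scores.min()

T, n = 4, 3
w = np.ones((T, n))                           # uniform stability weights
r = np.full((T, n), 0.5)                      # target action reference values
best_params, best_score = lightning_search(
    lambda p: composite_score(p, w, r), dim=n)
```

With the quadratic score above the population contracts toward the reference value 0.5 per dimension, which is the behavior S46 relies on when it marks the minimum-score body as the optimal strategy parameter set.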
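The cross-time-step residual alignment update of claim 7 (S54) can be sketched directly from its formula. Reading the double sum as collapsing to a scalar step applied to every parameter is the simplest interpretation of the translated claim, and is an assumption:

```python
import numpy as np

def residual_alignment_update(theta, a, r, s, F, eta=0.01, eps=1e-8):
    """Claim-7 update sketch:
    theta' = theta - eta * sum over consecutive time-step pairs and dims i of
             F * (a[t,i] - r[t-1,i]) * (s[t,i] - s[t-1,i]) / (F + eps),
    where a is the actual control output, r the target reference, s the
    dynamic-graph state vector, F the composite performance score, and eps
    a small positive constant preventing a zero denominator."""
    T, n = a.shape
    step = 0.0
    for t in range(1, T):                     # consecutive time-step pairs
        for i in range(n):
            step += F * (a[t, i] - r[t - 1, i]) * (s[t, i] - s[t - 1, i]) / (F + eps)
    return theta - eta * step
```

Note that the step vanishes when the action matches the previous target or the graph state stops changing, so the update aligns action residuals with state motion across time steps, as the claim's name suggests.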
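The postoperative evaluation of claim 10 (S74-S75) reduces to a weighted aggregation of the performance evaluation data matrix. The patent does not fix the index weights, so the values below are illustrative assumptions:

```python
import numpy as np

# Illustrative weights for the claim-10 multi-index evaluation system (S75):
# control precision, tissue protection, energy consumption level, action
# stability. The actual weighting is not specified in the claim.
WEIGHTS = np.array([0.4, 0.3, 0.1, 0.2])

def evaluate(perf_matrix, weights=WEIGHTS):
    """Weighted aggregation of the performance evaluation data matrix (S74):
    each row is one operation, each column one normalized index in [0, 1];
    the result is the per-operation performance evaluation score."""
    return perf_matrix @ weights

scores = evaluate(np.array([[0.9, 0.8, 0.6, 0.85]]))
```

The resulting scores are what S76 feeds back to the reinforcement-learning control framework for the next iteration.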

Description

Intelligent surgical robot control method based on reinforcement learning

Technical Field

The invention relates to the technical field of intelligent surgical control, and in particular to an intelligent surgical robot control method based on reinforcement learning.

Background

With the rapid development of modern medical technology and artificial intelligence, surgical robotic systems have become an important tool for high-precision surgical procedures. Compared with traditional operating modes, surgical robots offer strong operating stability, high control precision and good repeatability, and are widely applied in high-risk fields such as urological, cardiothoracic and neurological surgery. Existing surgical robots depend mainly on remote operation by doctors: tasks such as anatomical structure identification, instrument guidance, and tissue cutting and suturing are completed through a mechanical arm, which greatly improves the safety and success rate of surgery. In practical application, however, robot operation is mainly controlled through human-machine interaction; the degree of intelligence is limited, and the robots lack deep understanding of the surgical scene and autonomous decision-making capability, which limits their adaptability and operational flexibility in complex environments. In the prior art, medical image processing has become one of the main means by which surgical robotic systems perceive the external environment. Modeling and analyzing pre-operative and intraoperative tissue structures with MRI, CT and ultrasound imaging can provide a key basis for surgical path planning and operational decisions.
However, most existing methods perform three-dimensional reconstruction and anatomical recognition based only on static images or fixed templates, and struggle to fully reflect tissue deformation, instrument interference and real-time changes during surgery. In addition, traditional image processing relies mainly on convolutional neural networks for feature extraction and cannot effectively capture the spatial topological relations and temporal features among tissues, which limits its performance in dynamic, complex surgical scenes. Meanwhile, existing robot control strategies are mainly based on preset path planning or on action prediction trained by supervised learning. These methods are effective under fixed workflows and standard scenarios, but in actual surgery, factors such as individual patient differences, changes in tissue physiological state and sudden interference make the surgical environment highly uncertain. Lacking real-time environment modeling and strategy self-adaptation mechanisms, existing control methods struggle to meet the dual demands of complex surgical tasks for fine manipulation and flexible response; control accuracy easily degrades, actions become unstable, and the operation may even fail. In recent years, reinforcement learning, as an important technique for solving agent decision and action selection problems, has been widely applied in fields such as autonomous driving, robot navigation and intelligent gaming, and has gradually been introduced into medical robot control. Through the reinforcement learning framework, an agent continuously adjusts its strategy while interacting with the environment to obtain higher returns, thereby realizing autonomous optimization of complex tasks.
However, applying conventional reinforcement learning methods in medical surgical scenes still faces many challenges, such as the high dimensionality of the state space, sparse feedback signals, the difficulty of obtaining training samples and weak strategy transfer capability. In particular, how to construct a robust, interpretable reinforcement learning control model with real-time feedback capability in a high-risk, high-precision surgical environment remains a bottleneck in current research. In addition, most existing reinforcement learning methods model discrete state and action spaces and have difficulty meeting the requirement for continuous control in surgery. For example, surgical robots must simultaneously control actuators with multiple degrees of freedom in actual operation, including position adjustment, clamping force and angle control, and the control parameters in each dimension need to be fine-tuned in continuous space. Traditional methods easily fall into local optima in high-dimensional continuous action spaces, making global strategy optimization difficult while also failing to guarantee the stability and efficiency of the training process. In the aspect of strategy optimization, the exist