CN-121998845-A - Medical image tool enhancement processing method based on reflexive thinking reinforcement learning

Abstract

The invention discloses a medical image tool enhancement processing method based on reflexive reinforcement learning. The method comprises: constructing a multimodal reasoning model for joint vision-language reasoning; performing supervised fine-tuning on the model with a cold-start data set annotated with tool call trajectories, so that the model acquires basic structured reasoning capability and standard tool-use behavior; constructing reflexive training samples by collecting the early erroneous reasoning trajectories and the later correct reasoning trajectories generated for the same input at different training checkpoints, and performing reflexive fine-tuning on the model; and constructing a reward function and performing tool-enhanced reinforcement learning training based on a reinforcement learning data set containing actual tool invocation scenarios. After receiving a medical image to be processed and a corresponding natural language question, the trained model generates step-by-step natural language thinking, tool call instructions and external tool observation results, and outputs a medical image analysis conclusion, realizing fine-grained and interpretable medical visual reasoning.

Inventors

LI WENJIE
WANG LEI
LI YUEQI
ZHANG YUJIE
ZHOU YUXIN
WANG HANYU

Assignees

上海交通大学医学院附属瑞金医院

Dates

Publication Date: 20260508
Application Date: 20251205

Claims (10)

1. A medical image tool enhancement processing method based on reflexive reinforcement learning, characterized by comprising the following steps: constructing a multimodal reasoning model, wherein the multimodal reasoning model comprises an image encoder, a text encoder, a reasoning module, a tool interface module and an observation encoding unit; performing cold-start supervised fine-tuning on the multimodal reasoning model based on a cold-start data set annotated with tool call trajectories; during the cold-start supervised fine-tuning, collecting the early erroneous trajectory and the later correct trajectory generated for the same input at different training checkpoints, constructing trajectory pairs that exhibit reflexive characteristics as reflexive training samples, and performing reflexive fine-tuning, based on the reflexive training samples, on the multimodal reasoning model obtained from the cold-start supervised fine-tuning; after the reflexive fine-tuning is completed, performing tool-enhanced reinforcement learning training on the multimodal reasoning model based on a reinforcement learning data set to obtain a trained multimodal reasoning model; acquiring a medical image to be processed and a corresponding natural language question; and inputting the medical image to be processed and the corresponding natural language question into the trained multimodal reasoning model to obtain a processing output comprising the natural language thinking generated by the model, the tool call instructions, the observation information returned by the external tools and the final medical image analysis result.
2. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 1, wherein the image encoder is a deep convolutional neural network that processes medical images to extract medical image features; the text encoder performs word-level and sentence-level semantic encoding of the natural language question to obtain corresponding text semantic features; the reasoning module is an autoregressive sequence generation network based on a large language model and is used for generating a reasoning path based on the medical image features, the text semantic features and the observation features; the tool interface module is an instruction parsing and tool scheduling unit and is used for parsing the tool call instructions generated by the reasoning module and issuing call requests to an external image analysis tool; and the observation encoding unit is an observation processing network and is used for receiving the observation information returned by the external image analysis tool and encoding it into observation features that can be read by the reasoning module.
3. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 2, wherein the reasoning path $\tau$ is expressed as: $$\tau = \{(T_n, C_n, O_n)\}_{n=1}^{N}$$ wherein $T_n$ is the natural language thinking content generated at step $n$, $C_n$ is the tool call instruction generated at step $n$, and $O_n$ is the observation feature corresponding to the observation result returned by the external image analysis tool according to the tool call instruction; at step $n$ of reasoning, the reasoning module autonomously decides, based on the currently input medical image $I$, the question $Q$ and the reasoning history of the previous $n-1$ steps, to output the next natural language thinking $T_n$ and tool call instruction $C_n$, expressed as: $$(T_n, C_n) = F(I, Q, \tau_{<n})$$ wherein $\tau_{<n}$ represents the reasoning history of the first $n-1$ steps and $F$ represents the mapping function of the reasoning module; the reasoning module progressively updates the reasoning path through a thinking, tool call, observation loop mechanism until, at the termination step $N$, a determined answer is generated or a preset upper limit on the context length or on the number of interaction rounds is reached.
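The thinking, tool call, observation loop of claim 3 can be illustrated with a minimal Python sketch. It is not the patented implementation: `Step`, `run_reasoning_loop`, `toy_policy`, `toy_tools` and the `max_rounds` default are hypothetical names and values chosen for illustration, and the policy is assumed to return a `(thought, tool_call, answer)` triple.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Step:
    thought: str                 # natural language thinking of this step
    tool_call: Optional[dict]    # tool call instruction, None on the answer step
    observation: Optional[str]   # encoded observation returned by the tool


def run_reasoning_loop(policy, tools, image, question, max_rounds=8):
    """Repeat think -> tool call -> observe until an answer or the round limit."""
    history = []
    for _ in range(max_rounds):
        # The policy sees the image, the question and the whole history so far.
        thought, tool_call, answer = policy(image, question, history)
        if answer is not None:
            history.append(Step(thought, None, None))
            return answer, history
        # Dispatch the call to the named external tool and record what it returns.
        observation = tools[tool_call["name"]](image, **tool_call["args"])
        history.append(Step(thought, tool_call, observation))
    return None, history  # round limit reached without a determined answer


def toy_policy(image, question, history):
    # Hypothetical stand-in for the reasoning module: zoom once, then answer.
    if not history:
        call = {"name": "zoom", "args": {"box": (0, 0, 2, 2)}}
        return "I should zoom in on the lesion.", call, None
    return "The zoomed region shows a nodule.", None, "nodule"


toy_tools = {"zoom": lambda image, box: f"zoomed{box}"}
answer, trace = run_reasoning_loop(toy_policy, toy_tools, "img", "What is shown?")
```

The context-length limit mentioned in the claim is omitted here; only the interaction-round limit is modelled.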
4. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 2, wherein the external image analysis tool comprises a first image processing tool, a second image processing tool and a third image processing tool; the first image processing tool is an image segmentation tool based on position prompts, and is used for receiving as input a target-region bounding box provided by the multimodal reasoning model and outputting a fine segmentation mask of the target object; the second image processing tool is an image segmentation tool based on semantic description, and is used for receiving an image and a text prompt as input and outputting a segmentation mask of the corresponding target region, so that the target region can be located without the model explicitly predicting a bounding box; and the third image processing tool is a local image magnification tool, and is used for receiving an image and a bounding box or a segmentation mask as input, outputting an enlarged image of the corresponding region when the input is a bounding box, and, when the input is a segmentation mask, outputting the enlarged image with the target contour explicitly drawn in it to highlight the morphological characteristics of the lesion.
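As a rough illustration of the bounding-box branch of the third tool, the sketch below crops a region and enlarges it by nearest-neighbour repetition. The function name `zoom_region`, the `(x0, y0, x1, y1)` box convention, the list-of-rows image representation and the `scale` parameter are all assumptions; the claim does not specify how the magnification is performed.

```python
def zoom_region(image, box, scale=2):
    """Crop box=(x0, y0, x1, y1) from a 2D image and enlarge it.

    image is a list of pixel rows; x1 and y1 are exclusive. Enlargement uses
    nearest-neighbour repetition, which is an assumption of this sketch.
    """
    x0, y0, x1, y1 = box
    crop = [row[x0:x1] for row in image[y0:y1]]
    enlarged = []
    for row in crop:
        # Repeat every pixel `scale` times horizontally ...
        wide = [pixel for pixel in row for _ in range(scale)]
        # ... and every widened row `scale` times vertically.
        enlarged.extend(list(wide) for _ in range(scale))
    return enlarged


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
zoomed = zoom_region(image, box=(1, 0, 3, 2), scale=2)
```

The mask branch, which additionally draws the target contour, is not covered here.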
5. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 1, wherein the cold-start data set comprises a plurality of samples, each sample comprising a medical image to be processed, a corresponding natural language question, a ground-truth answer and a reasoning path annotated with a tool call trajectory, wherein the tool call trajectory annotation records the natural language thinking, the tool call instruction and the observation result returned by the corresponding tool for each reasoning step.
6. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 1, wherein performing cold-start supervised fine-tuning on the multimodal reasoning model based on a cold-start data set annotated with tool call trajectories specifically comprises: for each sample in the cold-start data set, inputting the medical image, the natural language question and the corresponding annotated reasoning path $\tau^{*}$ of the sample into the multimodal reasoning model, comparing the reasoning path $\tau$ generated by the model with the annotated path $\tau^{*}$ step by step, and calculating a cold-start supervised loss function from the natural language thinking $T_n$ and tool call instruction $C_n$ generated at each step and the annotated contents $T_n^{*}$ and $C_n^{*}$, the formula being: $$L_{cold} = -\mathbb{E}_{(I,Q,A,\tau^{*})\sim D_{cold}}\left[\frac{1}{M}\sum_{n=1}^{N}\log \pi_{\theta}\left(T_n, C_n \mid I, Q, \tau_{<n}\right)\right],\qquad M = \sum_{n=1}^{N}\left(|T_n| + |C_n|\right)$$ wherein $T_n$ is the natural language thinking content generated at step $n$; $C_n$ is the tool call instruction generated at step $n$; $O_n$ is the observation result returned for the corresponding tool call instruction in the annotated path; $M$ represents the total number of tokens of natural language thinking and tool call instructions in the reasoning path generated by the model; $|T_n|$ represents the number of tokens of the natural language thinking $T_n$ at step $n$; $|C_n|$ represents the number of tokens of the tool call instruction $C_n$ at step $n$; $N$ represents the total number of steps of the reasoning path; $\mathbb{E}$ represents the expectation over all samples in the cold-start data set $D_{cold}$; $\pi_{\theta}(T_n, C_n \mid I, Q, \tau_{<n})$ represents the probability that the model generates the natural language thinking $T_n$ and the tool call instruction $C_n$ at step $n$, computed autoregressively by the reasoning module of the multimodal reasoning model from the medical image features, the text semantic features and the observation features of the previous $n-1$ steps; $\tau_{<n}$ represents the reasoning history generated by the model before step $n$; $A$ is the ground-truth answer of the sample; and $L_{cold}$ is the cold-start supervised loss function; the cold-start supervised fine-tuning of the multimodal reasoning model is realized by minimizing the cold-start supervised loss function $L_{cold}$.
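The token normalisation described in claim 6 can be illustrated for a single annotated path. This is a minimal sketch that assumes the per-token log-probabilities have already been computed by the model and that observation tokens are excluded by the caller; `cold_start_loss` and the example numbers are hypothetical.

```python
import math


def cold_start_loss(step_token_logprobs):
    """Token-normalised negative log-likelihood of one reasoning path.

    step_token_logprobs holds one list per reasoning step with the
    log-probabilities assigned to the thinking and tool-call tokens of that
    step. Tool observations are assumed to be left out by the caller.
    """
    total_tokens = sum(len(step) for step in step_token_logprobs)
    if total_tokens == 0:
        raise ValueError("path has no supervised tokens")
    total_logprob = sum(sum(step) for step in step_token_logprobs)
    # Dividing by the token count keeps long and short paths comparable.
    return -total_logprob / total_tokens


# Two steps: the first has two supervised tokens, the second has one.
path = [[math.log(0.5), math.log(0.5)], [math.log(0.25)]]
loss = cold_start_loss(path)
```

Averaging this quantity over the samples of the cold-start data set gives an empirical estimate of the expectation in the claim.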
7. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 1, wherein collecting, during the cold-start supervised fine-tuning, the early erroneous trajectory and the later correct trajectory generated for the same input at different training checkpoints, constructing trajectory pairs that exhibit reflexive characteristics as reflexive training samples, and performing reflexive fine-tuning, based on the reflexive training samples, on the multimodal reasoning model obtained from the cold-start supervised fine-tuning specifically comprises: during training, for each sample in the cold-start data set, recording the reasoning paths generated by the multimodal reasoning model at an early training checkpoint and at a later training checkpoint respectively, wherein the early training checkpoint is a training state in which the model has not fully converged, and the later training checkpoint is a state in which the model has been trained to convergence or near convergence; denoting the reasoning path generated at the early training checkpoint as: $$\tau^{e} = \{(T^{e}_{n}, C^{e}_{n}, O^{e}_{n})\}_{n=1}^{N_e}$$ wherein $T^{e}_{n}$, $C^{e}_{n}$ and $O^{e}_{n}$ respectively represent the natural language thinking content, tool call instruction and observation features generated at step $n$ at the early training checkpoint, and $N_e$ represents the total number of steps of the early reasoning path; denoting the reasoning path generated at the later training checkpoint as: $$\tau^{l} = \{(T^{l}_{n}, C^{l}_{n}, O^{l}_{n})\}_{n=1}^{N_l}$$ wherein $T^{l}_{n}$, $C^{l}_{n}$ and $O^{l}_{n}$ respectively represent the natural language thinking content, tool call instruction and observation features generated at step $n$ at the later training checkpoint, and $N_l$ represents the total number of steps of the later reasoning path; judging whether there exist step indices $j$ and $k$ such that the following holds: $$C^{e}_{j} \neq C^{l}_{k} \;\land\; \mathbb{1}^{e} = 0 \;\land\; \mathbb{1}^{l} = 1$$ wherein $C^{e}_{j}$ is the tool call instruction generated at step $j$ of the early reasoning path, $C^{l}_{k}$ is the tool call instruction generated at step $k$ of the later reasoning path, $\mathbb{1}^{e}$ is an indicator function of whether the final answer of the early reasoning path is correct, a value of 0 indicating an error and a value of 1 indicating a correct answer, and $\mathbb{1}^{l}$ is an indicator function of whether the final answer of the later reasoning path is correct, a value of 0 indicating an error and a value of 1 indicating a correct answer; if such indices exist, aligning and integrating the $\tau^{e}$ and $\tau^{l}$ that satisfy the condition to form a reflexive training sample $\tau^{r}$; and inputting the reflexive training sample $\tau^{r}$ into the multimodal reasoning model obtained from the cold-start supervised fine-tuning for training, and minimizing the cold-start supervised loss function $L_{cold}$ to realize the reflexive fine-tuning of the multimodal reasoning model.
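A possible way to screen checkpoint pairs under claim 7 is sketched below. It assumes a strict reading in which the early answer must be wrong and the later answer correct, and it simplifies the search over step indices by comparing steps at the same position; `find_corrections` and the string form of the tool calls are hypothetical.

```python
def find_corrections(early_calls, late_calls, early_correct, late_correct):
    """Return the step indices whose tool call changed, or None if the pair
    does not qualify as an early-wrong / later-correct reflexive pair.

    Steps are compared position by position (an assumption of this sketch);
    steps beyond the shorter path are ignored by zip().
    """
    if early_correct or not late_correct:
        return None
    return [n for n, (early, late) in enumerate(zip(early_calls, late_calls))
            if early != late]


early = ["zoom(box=A)", "segment(text='liver')"]
late = ["zoom(box=B)", "segment(text='liver')"]
changed = find_corrections(early, late, early_correct=False, late_correct=True)
rejected = find_corrections(early, late, early_correct=True, late_correct=True)
```

A qualifying pair with an empty index list means the answer improved without any position-aligned tool call changing, a case a fuller implementation would need to handle with a real alignment between steps.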
8. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 7, wherein aligning and integrating the $\tau^{e}$ and $\tau^{l}$ that satisfy the condition to form a reflexive training sample $\tau^{r}$ specifically comprises: for each early erroneous step $j$, searching the later reasoning path for the corresponding modification at step $k$, wherein a modification means that a corresponding step exists in the later path satisfying $C^{e}_{j} \neq C^{l}_{k}$, or that the final answer of the early path is wrong and the final answer of the later path is correct; combining the natural language thinking $T^{e}_{j}$ generated at the early step with the correct tool call instruction $C^{l}_{k}$ of the later step to form a new reflexive step, and incorporating the observation features $O^{l}_{k}$ corresponding to the later step into the reflexive step; for steps of the early path that are unmodified and whose corresponding final answer is correct, retaining the natural language thinking, tool call instruction and observation features of those steps and adding them directly to the reflexive training sample; and splicing all reflexive steps and unmodified correct steps in the temporal order of the original reasoning path to obtain the complete reflexive training sample.
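The splicing in claim 8 can be sketched as follows, assuming early and later steps are aligned by position and that the set of corrected indices is already known. `build_reflexive_sample` and the tuple layout `(thought, tool_call, observation)` are illustrative choices, not the patent's data format.

```python
def build_reflexive_sample(early, late, corrected):
    """Splice an early and a later path into one reflexive training sample.

    early and late are lists of (thought, tool_call, observation) tuples that
    are assumed to be aligned by position; corrected is the set of step
    indices whose early tool call was fixed in the later path.
    """
    sample = []
    for n, (thought, call, observation) in enumerate(early):
        if n in corrected:
            # Keep the early thinking, take the corrected call and its observation.
            _, late_call, late_observation = late[n]
            sample.append((thought, late_call, late_observation))
        else:
            # Unmodified steps are copied over unchanged.
            sample.append((thought, call, observation))
    return sample  # steps stay in the original order of the early path


early_path = [("look at the upper lobe", "zoom(box=A)", "obs_A"),
              ("segment the organ", "segment(text='liver')", "mask_1")]
late_path = [("look at the lower lobe", "zoom(box=B)", "obs_B"),
             ("segment the organ", "segment(text='liver')", "mask_1")]
reflexive = build_reflexive_sample(early_path, late_path, corrected={0})
```

Keeping the early thinking next to the corrected call is what lets the sample show a revision of an earlier plan rather than a clean trajectory.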
9. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 1, wherein performing tool-enhanced reinforcement learning training on the multimodal reasoning model based on the reinforcement learning data set to obtain a trained multimodal reasoning model specifically comprises: for each sample in the reinforcement learning data set, comprising a medical image $I$, a corresponding natural language question $Q$ and a ground-truth answer $A$, generating a plurality of candidate reasoning trajectories $\{\tau_i\}_{i=1}^{G}$ under the current policy parameters $\theta$ of the multimodal reasoning model, with $\tau_i = \{(T_{i,n}, C_{i,n}, O_{i,n})\}_{n=1}^{N_i}$, wherein $T_{i,n}$, $C_{i,n}$ and $O_{i,n}$ respectively represent the natural language thinking content, tool call instruction and observation features generated at step $n$ of the $i$-th candidate reasoning trajectory, and $N_i$ is the total number of steps of the $i$-th candidate reasoning trajectory; during the generation of each reasoning trajectory, the multimodal reasoning model alternately generates the natural language thinking $T_{i,n}$ and the tool call instruction $C_{i,n}$ according to the thinking, tool call, observation loop mechanism, and infers the next output based on the current image, the question and the existing trajectory $\tau_{i,<n}$, expressed as $(T_{i,n}, C_{i,n}) \sim \pi_{\theta}(\cdot \mid I, Q, \tau_{i,<n})$; calculating a composite reward $R_i$ for each trajectory, expressed as: $$R_i = R_{acc} + R_{fmt} + R_{tool}$$ wherein $R_{acc}$ is the accuracy reward, used for judging whether the answer output by the model matches the ground-truth answer $A$, and, for a segmentation mask task, scored in graded segments according to the IoU between the predicted mask and the ground-truth mask; $R_{fmt}$ is the format reward, used for ensuring that the structure of the reasoning trajectory follows the expected order, specifically by checking whether the output trajectory contains the natural language thinking, tool call and final answer marks in sequence and deducting points for marks that are missing or out of order; and $R_{tool}$ is the tool-use reward, used for encouraging the model to call external tools where reasonable, the reward being given only when the model produces a correct answer and calls an external tool at least once during reasoning, and being positively correlated with the number of tool calls and with the contribution of the tool calls to task completion; and calculating a reinforcement learning loss function based on the composite reward $R_i$ of each trajectory, and updating the policy parameters of the multimodal reasoning model by minimizing the reinforcement learning loss function to obtain the trained multimodal reasoning model.
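The three reward terms of claim 9 might be combined as in the sketch below. Every numeric choice is an assumption made for illustration: the IoU thresholds, the 0.5 penalty, the call cap, the bonus and the unit weights are not stated in the claim, and the "contribution of the tool call" factor is left out. All function names and the marker strings are hypothetical.

```python
def mask_iou(pred, target):
    """IoU of two binary masks given as equally sized lists of 0/1 rows."""
    inter = union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    return inter / union if union else 1.0


def graded_iou_reward(iou, thresholds=((0.75, 1.0), (0.5, 0.5))):
    """Piecewise accuracy reward for segmentation; thresholds are assumed."""
    for minimum, reward in thresholds:
        if iou >= minimum:
            return reward
    return 0.0


def format_reward(markers, expected=("think", "tool_call", "answer"), penalty=0.5):
    """Start from 1.0 and deduct for each marker that is missing or out of order."""
    score, last = 1.0, -1
    for name in expected:
        position = markers.index(name) if name in markers else None
        if position is None or position < last:
            score -= penalty
        else:
            last = position
    return max(score, 0.0)


def tool_reward(correct, n_calls, cap=4, bonus=0.5):
    """Paid only for correct answers with at least one call; grows with calls."""
    if not correct or n_calls < 1:
        return 0.0
    return bonus * min(n_calls, cap) / cap


def composite_reward(accuracy, markers, n_calls, w_fmt=1.0, w_tool=1.0):
    """Sum of accuracy, format and tool-use rewards (weights are hypothetical)."""
    correct = accuracy > 0
    return accuracy + w_fmt * format_reward(markers) + w_tool * tool_reward(correct, n_calls)


iou = mask_iou([[1, 1], [0, 0]], [[1, 0], [0, 0]])
reward = composite_reward(1.0, ["think", "tool_call", "answer"], n_calls=2)
```

Capping the number of rewarded calls is one way to keep the tool term from encouraging redundant calls; the claim itself only requires a positive correlation.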
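10. The medical image tool enhancement processing method based on reflexive reinforcement learning according to claim 9, wherein the reinforcement learning loss function is: $$L_{RL}(\theta) = -\mathbb{E}_{(I,Q,A)\sim D_{RL},\ \{\tau_i\}_{i=1}^{G}\sim\pi_{\theta_{old}}}\left[\frac{1}{G}\sum_{i=1}^{G}\frac{1}{M_i}\sum_{n=1}^{M_i}\min\left(r_{i,n}(\theta)\,\hat{A}_i,\ \mathrm{clip}\left(r_{i,n}(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_i\right)\right]$$ with $$r_{i,n}(\theta) = \frac{\pi_{\theta}(a_{i,n}\mid I, Q, \tau_{i,<n})}{\pi_{\theta_{old}}(a_{i,n}\mid I, Q, \tau_{i,<n})},\qquad \hat{A}_i = \frac{R_i-\mu_R}{\sigma_R}$$ wherein $L_{RL}$ is the reinforcement learning loss function; $\mathbb{E}$ represents the expectation over all samples in the reinforcement learning data set $D_{RL}$ and over the $G$ candidate reasoning trajectories generated under the old policy $\pi_{\theta_{old}}$; $I$ represents the medical image input of the sample; $Q$ represents the natural language question input of the sample; $A$ represents the ground-truth answer of the sample; $M_i$ represents the total number of tokens in the $i$-th reasoning trajectory excluding the observations returned by tools; $r_{i,n}(\theta)$ represents the ratio of the probability of the $n$-th action $a_{i,n}$ of the $i$-th trajectory under the current policy parameters $\theta$ to the probability of the same action under the old policy $\pi_{\theta_{old}}$; $\tau_{i,<n}$ represents the history of all actions generated and observations received in the $i$-th trajectory before step $n$; $\hat{A}_i$ represents the advantage function of the $i$-th trajectory; $R_i$ represents the composite reward of the $i$-th trajectory; $\mu_R$ and $\sigma_R$ respectively represent the mean and standard deviation of the composite rewards of the $G$ candidate reasoning trajectories; $\mathrm{clip}(\cdot)$ is the clipping function; and $\epsilon$ is the clipping parameter.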
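The group-normalised advantage and clipped ratio of claim 10 can be checked numerically with a small sketch. It assumes the probability ratios are already available per action, uses the population standard deviation (the claim does not say which estimator is meant), and adds a small `eps` to avoid division by zero; the function names and the default `clip_eps=0.2` are hypothetical.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """Normalise each trajectory reward by the mean and std of its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std, an assumption here
    return [(reward - mean) / (std + eps) for reward in rewards]


def clipped_term(ratio, advantage, clip_eps=0.2):
    """min(r * A, clip(r, 1 - eps, 1 + eps) * A) for a single action."""
    clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
    return min(ratio * advantage, clipped * advantage)


def rl_loss(ratios_per_track, rewards, clip_eps=0.2):
    """Negative mean over trajectories of the per-action mean clipped objective."""
    advantages = group_advantages(rewards)
    per_track = []
    for ratios, advantage in zip(ratios_per_track, advantages):
        terms = [clipped_term(ratio, advantage, clip_eps) for ratio in ratios]
        per_track.append(sum(terms) / len(terms))
    return -sum(per_track) / len(per_track)


# Two candidate trajectories with one action each.
rewards = [1.0, 0.0]
ratios = [[1.5], [0.5]]
loss = rl_loss(ratios, rewards)
```

In this example the first ratio is clipped down to 1.2 and the second up to 0.8, so neither trajectory can move the policy further than the clipping range allows.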

Description

Medical image tool enhancement processing method based on reflexive thinking reinforcement learning

Technical Field

The invention belongs to the technical field of medical image processing, and particularly relates to a medical image tool enhancement processing method based on reflexive reinforcement learning.

Background

Medical image analysis is an important link in clinical diagnosis, treatment planning and disease monitoring. With the development of deep learning, technologies such as the convolutional neural network (CNN) and the vision transformer (ViT) have been widely applied to medical image feature extraction and segmentation tasks. Meanwhile, natural language processing technology based on large language models (LLMs) shows strong reasoning capability in multimodal tasks, making joint reasoning over image and text information possible.

For example, in the prior art, Chinese patent application CN110751627A discloses an image processing method, apparatus, computer device and storage medium. The method comprises obtaining a medical image and inputting it into a preset neural network model to obtain a processing result of the medical image at the image level and a processing result at the pixel level, the preset neural network model processing the medical image at at least two different functional levels.

However, existing automatic analysis methods for medical images still have the following problems:

1. Limited single-model capability: traditional methods generally rely on a single model to directly predict image features or diagnostic results, lack the ability to flexibly combine various external tools (such as image segmentation, magnification and morphological analysis tools), and have difficulty handling complex, multi-step medical reasoning tasks.

2. Non-adaptive tool-use strategy: when using external tools, existing multimodal reasoning models depend on a static or predefined tool call sequence and cannot adjust the strategy autonomously according to intermediate reasoning results, which easily leads to redundant or erroneous tool calls.

3. Insufficient error correction capability: in medical tasks, models can produce erroneous reasoning even with tool assistance. Existing multimodal reasoning models generally lack a reflexive, self-correcting mechanism and have difficulty actively correcting errors according to tool output or reasoning history, so the accuracy of the final result is low.

4. Inadequate application of reinforcement learning: although reinforcement learning is used to optimize strategies in traditional natural language processing and image tasks, in medical image multimodal reasoning scenarios the joint optimization of tool call sequences, external observation information and final answers still lacks efficient methods. Existing methods give only sparse rewards on the final answer and ignore tool-use efficiency and the optimization of the structure of the reasoning process.

Therefore, a multimodal reasoning method that combines reflexive thinking with tool enhancement is needed, so that the model can autonomously call external tools in medical image analysis and continuously correct errors through a reflexive mechanism, improving the accuracy and reliability of diagnosis and analysis, while a reinforcement learning optimization strategy improves tool call efficiency and reasoning quality.

Disclosure of Invention

The invention aims to overcome the defects of the prior art and provide a medical image tool enhancement processing method based on reflexive reinforcement learning.
The aim of the invention can be achieved by the following technical scheme: the invention provides a medical image tool enhancement processing method based on reflexive reinforcement learning, which comprises the following steps: constructing a multimodal reasoning model, wherein the multimodal reasoning model comprises an image encoder, a text encoder, a reasoning module, a tool interface module and an observation encoding unit; performing cold-start supervised fine-tuning on the multimodal reasoning model based on a cold-start data set annotated with tool call trajectories; during the cold-start supervised fine-tuning, collecting the early erroneous trajectory and the later correct trajectory generated for the same input at different training checkpoints, constructing trajectory pairs that exhibit reflexive characteristics as reflexive training samples, and performing reflexive fine-tuning, based on the reflexive training samples, on the multimodal reasoning model obtained from the cold-start supervised fine-tuning; after the reflexive fine-tuning is completed, performing tool-enhanced reinforcement learning training on the multimodal reasoning model based on the reinforcement learning data set to obtain a traine