CN-121483482-B - Chest radiography report generation and lesion localization method and system based on reinforcement learning
Abstract
The invention discloses a chest radiography report generation and lesion localization method and system based on reinforcement learning, belonging to the intersection of artificial intelligence and medical image analysis. The method performs supervised fine-tuning of a base model using multi-source chest radiography images and query texts, optimizes the resulting policy model under the GRPO reinforcement learning framework with a token-entropy-guided indicator function introduced into the objective function, and finally uses the optimized policy model to generate, end to end and in parallel, the chest radiography report text and the lesion bounding-box coordinates. The method achieves visual localization of lesions while improving the quality of the generated chest radiography report, and its integrated output format better fits the actual clinical workflow.
Inventors
- Sun Gefan
- Cai Yichi
- Qiu Xuesi
- Yu Yunlong
- Wang Chao
Assignees
- Zhejiang University (浙江大学)
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-01-06
Claims (7)
- 1. A chest radiography report generation and lesion localization method based on reinforcement learning, characterized by comprising the following steps: S1, taking a general-purpose vision-language model with region-aware capability as a base model, and performing supervised fine-tuning on the base model by using chest radiography images integrated from different sources together with pre-constructed query texts, with the augmented annotation data corresponding to the chest radiography images serving as the supervision signal; S2, taking the supervised fine-tuned base model as the policy model in a GRPO framework and performing reinforcement learning optimization on the policy model: during optimization, guiding the policy model through a composite prompt to generate a chest radiography report text from a chest radiography image, forming a complete text sequence for policy-model optimization from the chest radiography report text and the bounding-box coordinate text output by the policy model, and, after computing an entropy value for each token in the chest radiography report text, binary-classifying the tokens by comparing their entropy values against a preset uncertainty threshold and assigning each token class a different indicator-function value for computing the reinforcement learning objective function; and updating the policy-model parameters by maximizing the reinforcement learning objective function until the policy model converges or a preset upper limit on the number of optimization steps is reached; S3, taking the optimized policy model as the final model, inputting the chest image to be processed into the final model, and having the final model output, end to end and in parallel, a chest report text and the lesion bounding-box coordinate text for the chest image; wherein in S2 the composite prompt is formed by splicing a localization query, the bounding-box coordinate text, and a report-generation query, the bounding-box coordinate text being generated by the policy model from the input chest image and the localization query; in S2, the policy model generates the chest radiography report texts corresponding to all chest radiography images in a training batch; for each token generated at each time step of the report texts, the token's entropy value is computed from the probability distribution output by the policy model; the entropy values of all tokens in the training batch are sorted in descending order, and the entropy value of the token at a preset rank in the sorted order is taken as the uncertainty threshold, wherein the rank is a preset hyperparameter; a token whose entropy value is greater than or equal to the uncertainty threshold is a high-entropy token and its indicator-function value is set to 1, and a token whose entropy value is smaller than the uncertainty threshold is a low-entropy token and its indicator-function value is set to 0; in S2, the reinforcement learning objective function is the average per-token loss of a group in the GRPO framework, obtained by dividing the overall loss of the group by the accumulated sequence length of the group, wherein the overall loss is the sum of the image-level losses of all chest images in the group, the accumulated sequence length is the sum of the lengths of the complete text sequences corresponding to all images in the group, and the image-level loss of each chest image is the sum of two parts: the first part is the GRPO loss computed on the tokens of the bounding-box coordinate text corresponding to the chest image, and the second part is the conditional GRPO loss computed on the tokens of the remaining text corresponding to the chest image, the remaining text being the complete text sequence of the chest image with the bounding-box coordinate text removed, and the conditional GRPO loss being the product of the GRPO loss and the indicator-function value corresponding to each token.
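The token-entropy gating described in claim 1 can be sketched as follows. This is an illustrative reconstruction, not the patented implementation: the tensor shapes, the fraction `rho` used to pick the threshold rank (the claim only states a preset rank hyperparameter, whose symbol is not given in this translation), and the use of a softmax over logits are all assumptions.

```python
import torch
import torch.nn.functional as F

def entropy_indicator(logits, rho=0.2):
    """Per-token binary indicator gated by a batch-level entropy rank.

    logits: (num_tokens, vocab_size) policy-model logits for every report
            token generated in the training batch (flattened across images).
    rho:    assumed hyperparameter -- fraction of tokens treated as
            high-entropy; the claim specifies only a preset threshold rank.
    """
    log_p = F.log_softmax(logits, dim=-1)
    # Shannon entropy of each token's output distribution: H = -sum p*log p.
    entropy = -(log_p.exp() * log_p).sum(dim=-1)              # (num_tokens,)
    # Sort entropies in descending order; the value at the preset rank is
    # the uncertainty threshold described in the claim.
    k = max(1, int(rho * entropy.numel()))
    threshold = torch.sort(entropy, descending=True).values[k - 1]
    # Indicator: 1 for high-entropy (uncertain) tokens, 0 for low-entropy.
    return (entropy >= threshold).float(), threshold
```

Under this reading, report tokens whose indicator is 1 keep their gradient in the conditional GRPO loss, while low-entropy report tokens contribute nothing; bounding-box tokens bypass the gate entirely.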
- 2. The reinforcement learning-based chest radiography report generation and lesion localization method of claim 1, wherein in S1 the pre-trained general-purpose vision-language model Qwen2.5-VL-3B is used as the base model.
- 3. The reinforcement learning-based chest radiography report generation and lesion localization method of claim 1, wherein in S1 the chest radiography images and their annotation data used in the supervised fine-tuning of the base model are drawn from five common datasets, namely MIMIC-CXR, VinDr, MS-CXR, PadChest-GR, and CheXpert, and the annotation data includes radiological reports, disease bounding boxes, anatomical bounding boxes, and disease classification labels.
- 4. A reinforcement learning based chest radiography report generation and lesion localization system, comprising: a supervised fine-tuning module for taking a general-purpose vision-language model with region-aware capability as a base model and performing supervised fine-tuning on the base model by using chest radiography images integrated from different sources together with pre-constructed query texts, with the augmented annotation data corresponding to the chest radiography images serving as the supervision signal; an optimization module for taking the supervised fine-tuned base model as the policy model in a GRPO framework and performing reinforcement learning optimization on the policy model: during optimization, the policy model is guided through a composite prompt to generate a chest radiography report text from a chest radiography image; a complete text sequence used for policy-model optimization is formed from the chest radiography report text and the bounding-box coordinate text output by the policy model; after the entropy value of each token in the chest radiography report text is computed, the tokens are binary-classified by comparing their entropy values against a preset uncertainty threshold, each token class is assigned a different indicator-function value for computing the reinforcement learning objective function, and the policy-model parameters are updated by maximizing the reinforcement learning objective function until the policy model converges or a preset upper limit on the number of optimization steps is reached; a result generation module for taking the optimized policy model as the final model, inputting the chest image to be processed into the final model, and having the final model output, end to end and in parallel, a chest report text and the lesion bounding-box coordinate text for the chest image; wherein in the optimization module the composite prompt is formed by splicing a localization query, the bounding-box coordinate text, and a report-generation query, the bounding-box coordinate text being generated by the policy model from the input chest image and the localization query; in the optimization module, the policy model generates the chest radiography report text corresponding to each chest radiography image in a training batch; for each token generated at each time step of the report texts, the token's entropy value is computed from the probability distribution output by the policy model; the entropy values of all tokens in the training batch are sorted in descending order, and the entropy value of the token at a preset rank in the sorted order is taken as the uncertainty threshold, wherein the rank is a preset hyperparameter; a token whose entropy value is greater than or equal to the uncertainty threshold is a high-entropy token and its indicator-function value is set to 1, and a token whose entropy value is smaller than the uncertainty threshold is a low-entropy token and its indicator-function value is set to 0; in the optimization module, the reinforcement learning objective function is the average per-token loss of a group in the GRPO framework, obtained by dividing the overall loss of the group by the accumulated sequence length of the group, wherein the overall loss is the sum of the image-level losses of all chest images in the group, the accumulated sequence length is the sum of the lengths of the complete text sequences corresponding to all images in the group, and the image-level loss of each chest image is the sum of two parts: the first part is the GRPO loss computed on the tokens of the bounding-box coordinate text corresponding to the chest image, and the second part is the conditional GRPO loss computed on the tokens of the remaining text corresponding to the chest image, the remaining text being the complete text sequence of the chest image with the bounding-box coordinate text removed, and the conditional GRPO loss being the product of the GRPO loss and the indicator-function value corresponding to each token.
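The group-level objective described in claims 1 and 4 can be assembled schematically as below. This is a sketch under stated assumptions: the per-token GRPO loss is taken as given (its clipped-ratio-times-advantage form is not reproduced here), and the function and argument names (`group_objective`, `bbox_mask`, `indicator`) are illustrative, not from the patent.

```python
import torch

def group_objective(grpo_loss, bbox_mask, indicator):
    """Schematic group-level objective from the claims.

    grpo_loss: (total_tokens,) per-token GRPO loss over the concatenated
               complete sequences (bounding-box coordinate text + report
               text) of every image in the group.
    bbox_mask: (total_tokens,) 1.0 for bounding-box coordinate tokens,
               0.0 for report tokens.
    indicator: (total_tokens,) entropy-guided indicator values
               (1 = high entropy); only consulted for report tokens.
    """
    # Part 1: plain GRPO loss on bounding-box coordinate tokens.
    bbox_part = grpo_loss * bbox_mask
    # Part 2: conditional GRPO loss on the remaining (report) tokens,
    # gated by the indicator so low-entropy tokens contribute zero.
    report_part = grpo_loss * (1.0 - bbox_mask) * indicator
    # Overall group loss, then the average per-token loss: overall loss
    # divided by the accumulated length of all complete sequences.
    overall = (bbox_part + report_part).sum()
    return overall / grpo_loss.numel()
```

On this reading, the division by the accumulated sequence length (rather than per-sequence averaging) weights every token in the group equally regardless of which image's sequence it belongs to.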
- 5. A computer program product comprising a computer program/instructions which, when executed by a processor, implements the reinforcement learning based chest radiography report generation and lesion localization method of any one of claims 1-3.
- 6. A computer-readable storage medium, wherein a computer program is stored on the storage medium, and when the computer program is executed by a processor, the reinforcement learning based chest radiography report generation and lesion localization method of any one of claims 1-3 is implemented.
- 7. An electronic device comprising a memory and a processor, wherein the memory is used for storing a computer program, and the processor is configured to implement the reinforcement learning based chest radiography report generation and lesion localization method of any one of claims 1-3 when executing the computer program.
Description
Chest radiography report generation and lesion localization method and system based on reinforcement learning

Technical Field

The invention belongs to the intersection of artificial intelligence and medical image analysis, and particularly relates to a chest radiography report generation and lesion localization method and system based on reinforcement learning.

Background

Chest X-ray (CXR) is the most common, economical, and efficient medical imaging examination, and a first-line tool for diagnosing chest diseases such as pneumonia, tuberculosis, and lung cancer. Recent years have seen significant progress in artificial-intelligence chest Vision-Language Models (VLMs), whose core goal is to simulate the radiologist workflow: generating coherent diagnostic reports and accurately localizing the pathological findings that support the diagnosis. Currently, the dominant approach to developing such multi-task CXR VLMs is large-scale Supervised Fine-Tuning (SFT), which trains the model on datasets annotated by many experts so that it initially acquires report generation and lesion localization capabilities. However, this approach has an obvious limitation: performance gains are severely constrained by the availability of high-quality, large-scale expert-annotated data. Since annotation requires highly specialized radiologists, acquiring more data to further improve model performance is extremely difficult and expensive. To break through the performance bottleneck of SFT, Reinforcement Learning (RL), as an effective post-training technique, has shown great potential for general-purpose language and vision-language models; however, its use in multi-task medical VLMs remains under-explored.
One key challenge is that existing RL algorithms (e.g., Proximal Policy Optimization, PPO, and Group Relative Policy Optimization, GRPO) are typically designed for a single task objective. When applied to a model that must learn several different tasks simultaneously, such as report generation and bounding-box prediction, optimizing one task tends to degrade the others, i.e., a performance trade-off problem. For example, in the prior art, if a standard GRPO framework optimizes only the quality of the generated report text, the model's lesion localization accuracy may decrease; conversely, if only localization accuracy is optimized, the clinical accuracy and fluency of the generated reports may suffer. This negative inter-task interference is a major hurdle when applying RL to multi-task medical VLMs, because clinical deployment requires the model to remain strong on all core capabilities rather than excel on a single task at the expense of the others.

Disclosure of Invention

The invention aims to solve the technical problem that existing reinforcement-learning-based post-training methods for chest radiography vision-language models suffer from unbalanced multi-task performance: with a standard single-objective optimization algorithm, optimizing the report generation task may harm lesion localization accuracy, while optimizing the localization task may reduce report quality, so the model cannot achieve synergistic and balanced improvement of diagnostic interpretation and spatial localization without adding large amounts of expensive annotated data. To solve this problem, the invention provides a chest radiography report generation and lesion localization method and system based on reinforcement learning.
To achieve the above purpose, the present invention adopts the following technical scheme. In a first aspect, the present invention provides a chest radiography report generation and lesion localization method based on reinforcement learning, comprising the steps of: S1, taking a general-purpose vision-language model with region-aware capability as a base model, performing supervised fine-tuning on the base model by using chest radiography images integrated from different sources together with pre-constructed query texts, with the augmented annotation data corresponding to the chest radiography images serving as the supervision signal; S2, taking the supervised fine-tuned base model as the policy model in a GRPO framework and performing reinforcement learning optimization on the policy model: during optimization, guiding the policy model through a composite prompt to generate a chest radiography report text from a chest radiography image, forming a complete text sequence for policy-model optimization from the chest radiography report text and the bounding-box coordinate text output by the policy model, and, after computing an entropy value for each token in the chest radiography report text, performing tw