CN-121983336-A - Chain-type reasoning training and generation method and system for a large medical diagnosis model

Abstract

The invention discloses a chain-type reasoning training and generation method and system for a large medical diagnosis model, relating to the intersection of artificial intelligence and intelligent healthcare. The training method comprises a first stage and a second stage. In the first stage, a knowledge-enhanced chain-type medical diagnosis data set is constructed, and the large model is trained by supervised fine-tuning to learn the chain-type diagnostic pattern. In the second stage, the model is used as the reinforcement learning policy model and its parameters are updated by a policy optimization algorithm, finally yielding a chain-type reasoning model that has both structural consistency and diagnostic accuracy. The system comprises an inference training module, a local knowledge base module, an inference control module and an interaction generation module. The beneficial effects are as follows: supervised learning forces the model to internalize the hypothesis-localization-reflection-decision chain-type reasoning structure, so that its diagnostic process is presented completely in a structured, step-by-step form; and the trajectory-level composite reward function together with the group relative policy optimization algorithm drives the model's reasoning content toward higher diagnostic accuracy and more reasonable knowledge citation.

Inventors

WANG HONGWEI
PAN YANLING

Assignees

Zhejiang University (浙江大学)

Dates

Publication Date: 2026-05-05
Application Date: 2026-01-19

Claims (16)

1. A chain-type reasoning training method for a large medical diagnosis model, characterized by comprising a first stage and a second stage. The first-stage training steps comprise: S1, acquiring medical knowledge data and constructing a training data set, wherein each medical knowledge data sample comprises a medical image, a clinical question and an expert-annotated multi-link chain-type reasoning trajectory, the medical knowledge data comes from a fused knowledge base, and the fused knowledge base fuses a general medical knowledge base with a local knowledge base storing hospital historical diagnosis records; S2, performing supervised fine-tuning on a multimodal large model using the data set, taking the output of each link as the generation context of the next link, and taking maximization of the probability that the model generates the complete chain trajectory as the training objective, so that the model masters the chain-type reasoning structure. The second-stage training steps comprise: S3, using the model trained in the first stage as the policy model for reinforcement learning, and defining a complete chain-type reasoning trajectory generated by the model for one input as one complete action; S4, designing a trajectory-level composite reward function, wherein the composite reward function comprises at least a structural consistency reward and a diagnostic accuracy reward; and S5, updating the parameters of the policy model by a policy optimization algorithm based on the composite reward function, so that the model's reasoning content evolves toward higher diagnostic accuracy while the chain structure is preserved, thereby obtaining a chain-type reasoning model.
2. The chain-type reasoning training method for a large medical diagnosis model according to claim 1, wherein the chain-type reasoning trajectory is predefined as four reasoning links connected in series in sequence, comprising: a hypothesis link for generating a preliminary diagnostic hypothesis; a localization link for outputting a key-region localization instruction based on the hypothesis; a reflection link for analyzing the localized region in combination with medical knowledge so as to correct the hypothesis; and a decision link for outputting a final diagnostic conclusion; wherein the training data of the reflection link invokes related knowledge from the fused knowledge base and contains explicit medical knowledge annotations.
3. The method of claim 2, wherein the medical knowledge annotations are associated with the training data in the form of structured text segments, the content of which is selected from at least one of: descriptions of anatomical spatial relationships, imaging diagnostic criteria for specific diseases, explanations of pathophysiological mechanisms, comparative descriptions of normal and abnormal imaging features, and summaries of local historical case diagnosis records similar to the current case.
4. The method of claim 2, wherein the hospital historical diagnosis records stored in the local knowledge base include at least one of imaging report text, diagnostic conclusions, treatment follow-up information, and typical case features annotated by departments or physicians; the records of the local knowledge base are stored in vectorized form and support fast retrieval based on semantic similarity.
5. The chain-type reasoning training method for a large medical diagnosis model according to claim 2 or 3, wherein in step S1 the four links of the chain-type reasoning trajectory have a strict, irreversible sequential dependency, and the output function of each link is constrained in advance, the specific dependencies and constraints being as follows: the hypothesis link is constrained to output only a differential diagnosis list; the output of the localization link is constrained to be anatomically related to the current hypothesis; the reflection link is constrained to cite relevant medical knowledge or historical case evidence from the fused knowledge base in order to check the consistency of the localization evidence; and the decision link is constrained to reach a single definite conclusion based on the information from all preceding links.
6. The method according to claim 1, wherein in step S2 the autoregressive language modeling loss function used for the supervised fine-tuning computes the loss over the generated tokens of all links in the chain trajectory without distinction, so as to ensure that the model learns the generation pattern and logical connection of every link equally.
7. The method according to claim 1, wherein in step S4 the structural consistency reward is computed by a lightweight discriminator model trained to distinguish well-formed trajectories that conform to the order of the chain links from malformed trajectories whose link order is disordered; and the diagnostic accuracy reward is a binary reward or a continuous reward based on diagnostic confidence.
8. The method of claim 1, wherein the composite reward function further comprises a knowledge citation rationality reward for evaluating the logical relevance between the fused knowledge base information cited in the reflection link and the current localized region and diagnostic hypothesis.
9. The chain-type reasoning training method for a large medical diagnosis model according to claim 1, wherein in step S5 the policy optimization algorithm is a group relative policy optimization method, which specifically comprises: for the same input data, generating a group of multiple chain-type reasoning trajectories by parallel sampling from the current policy model; computing the total reward of each trajectory; computing the relative advantage score of each trajectory using the average reward of the group as the baseline; and updating the policy model parameters by gradient ascent according to the relative advantage scores, so as to increase the generation probability of high-reward trajectories.
10. The chain-type reasoning training method for a large medical diagnosis model according to claim 1, wherein in the second-stage training steps, parameter-efficient reinforcement learning training is performed by freezing the parameters of the visual encoder and the text encoder and updating only part of the parameters of the large language model.
11. A chain-type reasoning generation method for a large medical diagnosis model, characterized by comprising the following steps: S21, receiving a medical image to be diagnosed and a clinical question; S22, retrieving the local knowledge base to obtain historical diagnosis records associated with the current case as context; S23, inputting the historical diagnosis records associated with the current case into a chain-type reasoning model obtained by the method of any one of claims 1-10; S24, generating, by the chain-type reasoning model, a complete reasoning text report conforming to the hypothesis-localization-reflection-decision chain structure; wherein the complete reasoning text report comprises the outputs of the intermediate reasoning links and the final diagnostic conclusion, and the discussion in the reflection link incorporates cited information from the historical diagnosis records.
12. The generation method of claim 11, further comprising: allowing a user to intervene in at least one preset reasoning link while the model is generating the complete reasoning text report; wherein the intervention comprises at least one of: confirming, correcting or supplementing the region automatically localized by the model in the localization link; and/or questioning the knowledge or reasoning logic cited by the model in the reflection link, or providing additional evidence; and, based on the user's intervention feedback, the model adjusts and rewrites the reasoning content of subsequent links in real time, finally generating a diagnosis report that incorporates the human-machine interaction information.
13. A medical chain-type diagnosis large model system, comprising: an inference training module for implementing the chain-type reasoning training method of any one of claims 1 to 10, so as to train a medical diagnosis model with chain-type reasoning capability; a local knowledge base module for storing, managing and retrieving hospital historical diagnosis records in vectorized form, and providing localized knowledge support for the training module and downstream inference; and an inference control module for loading and managing the medical diagnosis model obtained by the training module and, by the reasoning generation method of any one of claims 11 to 12, controlling the model to execute reasoning logic in accordance with the hypothesis-localization-reflection-decision chain structure.
14. The medical chain-type diagnosis large model system of claim 13, further comprising: an interaction generation module for allowing a user to intervene in at least one preset reasoning link while the model is generating the complete reasoning text report; wherein the intervention comprises at least one of: confirming, correcting or supplementing the region automatically localized by the model in the localization link; and/or questioning the knowledge or reasoning logic cited by the model in the reflection link, or providing additional evidence; and, based on the user's intervention feedback, the model adjusts and rewrites the reasoning content of subsequent links in real time, finally generating a diagnosis report that incorporates the human-machine interaction information.
15. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor implements the method of any one of claims 1 to 12 when executing the program.
16. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method of any one of claims 1 to 12.
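
Illustrative sketch (not part of the claims): the trajectory-level composite reward of step S4 and the group-relative advantage of claim 9 can be pictured with the minimal Python below. It rests on several assumptions that do not come from the patent text. The function names, the reward weights and the example diagnoses are hypothetical. The structural consistency reward is replaced by a simple order check, whereas claim 7 describes a trained lightweight discriminator. The optional standard-deviation normalization is common in group relative policy optimization implementations but is not stated in the claims. The gradient-ascent parameter update itself is omitted.

```python
from statistics import mean, pstdev

# Chain order from claim 2; the English link names are this sketch's own labels.
LINK_ORDER = ["hypothesis", "localization", "reflection", "decision"]


def structure_reward(links):
    """Rule-based stand-in (assumption) for the discriminator of claim 7:
    1.0 if the links appear exactly in chain order, otherwise 0.0."""
    return 1.0 if list(links) == LINK_ORDER else 0.0


def composite_reward(links, predicted, truth, w_struct=0.5, w_acc=1.0):
    """Trajectory-level reward: weighted sum of the structural consistency
    reward and a binary diagnostic accuracy reward. Weights are hypothetical."""
    accuracy = 1.0 if predicted == truth else 0.0
    return w_struct * structure_reward(links) + w_acc * accuracy


def group_relative_advantages(rewards, normalize=False, eps=1e-8):
    """Advantage of each trajectory relative to the group mean baseline.
    normalize=True additionally divides by the group standard deviation,
    which is an assumption of this sketch, not something the claims state."""
    baseline = mean(rewards)
    advantages = [r - baseline for r in rewards]
    if normalize:
        std = pstdev(rewards)
        advantages = [a / (std + eps) for a in advantages]
    return advantages


if __name__ == "__main__":
    truth = "pneumonia"  # hypothetical reference diagnosis
    shuffled = ["localization", "hypothesis", "reflection", "decision"]
    # Four trajectories sampled for the same input: (link order, final diagnosis).
    group = [
        (LINK_ORDER, "pneumonia"),
        (LINK_ORDER, "effusion"),
        (shuffled, "pneumonia"),
        (shuffled, "effusion"),
    ]
    rewards = [composite_reward(links, pred, truth) for links, pred in group]
    for reward, adv in zip(rewards, group_relative_advantages(rewards)):
        print(f"reward={reward:.2f} advantage={adv:+.2f}")
```

In this sketch, trajectories that score above the group average receive a positive advantage and would have their generation probability increased by the policy update, while those below the average receive a negative advantage.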

Description

Chain-type reasoning training and generation method and system for a large medical diagnosis model

Technical Field

The invention relates to the technical field at the intersection of artificial intelligence and intelligent healthcare, and in particular to a chain-type reasoning training and generation method and system for a large medical diagnosis model.

Background

With the deepening penetration of artificial intelligence technology into the medical field, medical image aided diagnosis based on vision-language models has made remarkable progress and has become one of the core research directions in intelligent healthcare. This technology generally adopts an end-to-end deep neural network architecture, takes medical images (such as X-rays, CT and MRI) and clinical text descriptions as joint input, and directly outputs a diagnostic conclusion or a structured report. It provides clinicians with a rapid reference and has potential value in improving diagnostic efficiency and relieving the strain on medical resources.

However, the existing mainstream methods still have several prominent technical defects in practical clinical application, which severely restrict their credibility and practical value:

First, the reasoning process is not interpretable (the black-box problem). The mapping between input and output in existing end-to-end models is complex, and the internal decision logic is highly abstract and cannot be intuitively understood or traced by human physicians. When the model gives an erroneous diagnosis, it is difficult for clinicians to locate the source of the error (such as misjudged image features or omitted key clinical information). This uncertainty makes it difficult for the model to gain broad trust in serious medical scenarios such as disease diagnosis and treatment plan reference, which greatly hinders practical deployment of the technology.
Second, structured clinical thinking is lacking. A clinician's diagnostic process is a chained, logical reasoning process that follows the sequence of finding abnormalities, proposing hypotheses, seeking evidence, verifying or excluding, and making a final decision, with clear procedural and logical properties. The training objective of existing end-to-end models focuses only on the correctness of the final diagnostic answer and does not guide the model to build structured reasoning capability that conforms to the rules of medical cognition. The model's way of thinking therefore differs from that of human experts, the diagnostic conclusions it generates often lack complete logical support, and they are difficult to align with physicians' clinical thinking.

Third, generalization to noise and rare cases is weak. Because constraints and guidance on intermediate reasoning steps are lacking, existing models tend to learn superficial statistical correlations in the training data rather than the deep causal logic behind disease diagnosis. When facing complex scenarios such as poor image quality, vague symptom descriptions or multiple comorbid diseases, or when handling long-tail cases that account for a very small proportion of the training data (such as rare diseases and special complications), model performance degrades markedly, diagnostic accuracy drops sharply, and the model may produce hallucinated output that appears plausible but is in fact wrong, bringing risk to clinical diagnosis.

Fourth, knowledge fusion capability is insufficient, making it difficult to adapt to local diagnosis and treatment practice.
The training of general-purpose medical large models depends on public data sets. Their knowledge system is general, but it cannot effectively integrate a specific medical institution's unique diagnosis and treatment experience, historical case library, local distribution of common diseases, or individualized diagnostic norms. As a result, the diagnostic recommendations generated by the model often deviate from clinical practice in a particular hospital, lack pertinence and practicality, and can hardly meet the individualized needs of different medical institutions.

To improve the interpretability of models, optimization schemes such as visual attention heat maps based on gradient saliency maps have emerged in the prior art, which attempt to identify by visual means the image regions the model attends to. However, such methods are in essence post-hoc, passive auxiliary tools that do not actively and structurally model the reasoning process during model training. They can only indicate where the model is looking, but cannot answer deeper core questions such as why the model attends to that region and what logical chain leads from the attended region to the diagnostic conclusion, and the problems of opaque reasoning process and logic i