CN-121747910-B - Training method of clinical auxiliary diagnosis model
Abstract
The invention relates to a training method of a clinical auxiliary diagnosis model, belongs to the technical field of medical information processing, and addresses the prior-art problem that inconsistency between small and large models degrades the accuracy of diagnostic decisions. The method comprises: obtaining medical texts of a plurality of patients to construct a sample set; constructing a text compression model and a large medical diagnosis model, and pre-training each of them on the sample set; fixing the parameters of the large medical diagnosis model, taking the compressed text output by the text compression model as the input of the large medical diagnosis model, and optimizing the text compression model based on the diagnosis task loss of the large medical diagnosis model; and fine-tuning the large medical diagnosis model based on its diagnosis task loss to obtain a trained clinical auxiliary diagnosis model. The method improves diagnostic accuracy in complex clinical scenarios.
Inventors
- LIU XIAOQING
- LIU WEIHUA
- WU LEI
- CHEN XIAOMEI
Assignees
- 重庆壹永科技集团有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2026-02-13
Claims (10)
- 1. A training method of a clinical auxiliary diagnosis model, characterized by comprising the following steps: acquiring medical texts of a plurality of patients to construct a sample set; constructing a text compression model and a large medical diagnosis model, and pre-training each of them on the sample set; fixing the parameters of the large medical diagnosis model, taking the compressed text output by the text compression model as the input of the large medical diagnosis model, and performing adaptation optimization on the text compression model based on the diagnosis task loss of the large medical diagnosis model; fixing the parameters of the text compression model, taking the compressed text output by the text compression model as the input of the large medical diagnosis model, and fine-tuning the large medical diagnosis model based on its diagnosis task loss to obtain a trained clinical auxiliary diagnosis model; wherein the text compression model comprises: an input layer for preprocessing an input text; a shared encoder for extracting features from the preprocessed text to obtain shared features; a compression task head for decoding the shared features to predict compressed text; and a key information extraction head for decoding the shared features to predict key field information; wherein the preprocessing performed by the input layer comprises adaptive blocking of the input text, and the input layer comprises: a block division unit for dividing the input text into blocks based on similarity; and an overlap adjustment unit for adjusting the blocks based on multi-factor fusion to determine the overlap regions between adjacent blocks, thereby obtaining a final block division result; wherein the pre-training loss of the text compression model is calculated by a formula [not reproduced in the source text] whose terms represent, respectively, the compression loss of the text compression model, the key information prediction loss of the text compression model, and two weight coefficients; and wherein the large medical diagnosis model is pre-trained based on dynamic prompt optimization and key field weight optimization.
- 2. The training method of a clinical auxiliary diagnosis model according to claim 1, wherein the overlap adjustment unit determines the overlap region between adjacent blocks based on multi-factor fusion by a formula [not reproduced in the source text] whose terms represent, respectively: the size of the overlap region between the i-th block and the (i+1)-th block determined based on semantic units; the size of the overlap region between the i-th block and the (i+1)-th block determined based on block length; the size of the overlap region between the i-th block and the (i+1)-th block determined based on entity density; three weight coefficients; and the finally determined length of the overlap region between the i-th block and the (i+1)-th block.
- 3. The training method of a clinical auxiliary diagnosis model according to claim 1, wherein the shared encoder performs feature extraction on the blocks to obtain the shared features by: extracting features from each block to obtain the basic features of each block; for the overlap region of adjacent blocks, fusing the basic features corresponding to the overlap region based on bidirectional attention to obtain the fused features of the overlap region; and splicing, in sequence, the basic features of the non-overlapping region of each block and the fused features of the overlap regions of adjacent blocks to form the shared features.
- 4. The training method of a clinical auxiliary diagnosis model according to claim 3, wherein fusing the basic features corresponding to the overlap region based on bidirectional attention to obtain the fused features of the overlap region comprises: projecting the basic features of the overlap regions of two adjacent blocks into query, key and value spaces respectively; computing bidirectional cross-attention using multi-head attention based on the projected query, key and value matrices; and performing position-weighted fusion on the bidirectional cross-attention to obtain the fused features of the overlap region.
- 5. The training method of a clinical auxiliary diagnosis model according to claim 1, wherein the diagnosis task loss of the large medical diagnosis model is calculated by a formula [not reproduced in the source text] whose terms represent, respectively: the true token at the t-th position; the input sequence of the sample; the token sequence that the large medical diagnosis model has generated before position t; the parameters of the large medical diagnosis model; the probability distribution predicted by the large medical diagnosis model; the total length T of the output sequence; the parameters of the text compression model; the 2-norm of a matrix; and the weight coefficient.
- 6. The training method of a clinical auxiliary diagnosis model according to claim 1, wherein pre-training the large medical diagnosis model based on dynamic prompt optimization and key field weight optimization comprises: constructing a base prompt template and reserving dynamic insertion points in it; constructing an error knowledge base; during each training round, adjusting the parameters of the large medical diagnosis model based on the reasoning loss and the key field extraction loss; and after each training round, constructing dynamic insertion information from the key fields extracted by the large medical diagnosis model and the error knowledge base, and inserting the dynamic insertion information at the dynamic insertion points to obtain the prompt template for the next training round.
- 7. The training method of a clinical auxiliary diagnosis model according to claim 6, wherein the key field extraction loss is calculated by a formula [not reproduced in the source text] whose terms represent, respectively: the true value of the i-th key field; the i-th key field extracted by the large medical diagnosis model; the number of key fields; the weight of the i-th key field in the t-th training round; and a loss function.
- 8. The training method of a clinical auxiliary diagnosis model according to claim 7, wherein the weight of the i-th key field is calculated by a formula [not reproduced in the source text] whose terms represent, respectively: the weight of the i-th key field in the (t-1)-th training round; the attenuation factor; and the normalized error rate of the i-th key field in the (t-1)-th round.
- 9. The training method of a clinical auxiliary diagnosis model according to claim 1, wherein the block division unit adaptively divides the input text into blocks by: S11, extracting the sentence sequence of the input text and obtaining an embedding vector for each sentence; S12, setting the current sequence number to one, setting the block sequence number of the first sentence to the current sequence number, and taking the second sentence in the sentence sequence as the current sentence; S13, calculating the similarity between the current sentence and the block corresponding to the current sequence number; if the similarity is greater than a similarity threshold and the combined length of that block and the current sentence does not exceed a length threshold, setting the block sequence number of the current sentence to the current sequence number; otherwise, adding one to the current sequence number and setting the block sequence number of the current sentence to the current sequence number; S14, if a next sentence exists, taking the next sentence as the current sentence and returning to step S13; otherwise, ending the blocking.
- 10. The training method of a clinical auxiliary diagnosis model according to claim 2, wherein the size of the overlap region determined based on block length is calculated by a formula [not reproduced in the source text] whose terms represent, respectively: the base proportion; the length of the i-th block; the length of the (i+1)-th block; the minimum overlap length; and the maximum overlap length.
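Steps S11 to S14 of claim 9 can be illustrated with a minimal Python sketch. It is not the patented implementation and rests on several assumptions the claims leave open: sentence embeddings are supplied by the caller (no embedding model is named), a block is represented by the mean of its sentence embeddings, similarity is cosine similarity, and length is counted in characters. The function names and the default thresholds are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_vector(vectors):
    """Element-wise mean; an assumed way to represent a block of sentences."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def assign_blocks(sentences, embeddings, sim_threshold=0.6, max_len=200):
    """Return a 1-based block number for every sentence, following S11 to S14."""
    if not sentences:
        return []
    current = 1                    # S12: the current sequence number starts at one
    block_ids = [current]          # S12: the first sentence belongs to block 1
    block_vecs = [embeddings[0]]   # embeddings of the sentences in the current block
    block_len = len(sentences[0])  # block length in characters (assumption)
    # S14: visit the remaining sentences one by one
    for sent, emb in zip(sentences[1:], embeddings[1:]):
        # S13: compare the current sentence with the current block
        sim = cosine(emb, mean_vector(block_vecs))
        if sim > sim_threshold and block_len + len(sent) <= max_len:
            block_vecs.append(emb)         # sentence joins the current block
            block_len += len(sent)
        else:
            current += 1                   # sentence opens a new block
            block_vecs = [emb]
            block_len = len(sent)
        block_ids.append(current)
    return block_ids

# Hypothetical two-dimensional embeddings, chosen only to show the control flow.
example_sentences = [
    "Patient reports chest pain.",
    "Pain radiates to the left arm.",
    "Family history of diabetes.",
]
example_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
example_blocks = assign_blocks(example_sentences, example_embeddings)
print(example_blocks)
```

With these toy inputs the first two sentences share a block and the third opens a new one. Lowering `max_len` below the combined length of the first two sentences forces a split even though they are similar, which is the role of the length threshold in S13.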
Description
Training method of clinical auxiliary diagnosis model
Technical Field
The invention relates to the technical field of medical information processing, and in particular to a training method of a clinical auxiliary diagnosis model.
Background
Clinical decision support systems are a core component of modern intelligent medicine. Their goal is to provide doctors with decision support, such as diagnosis suggestions and treatment plan recommendations, by analyzing patient medical information. With the development of artificial intelligence technologies such as reinforcement learning and deep learning, clinical decision support systems based on natural language processing have gradually evolved from traditional rule systems into data-driven deep learning models. Current technical routes fall into two major categories. The first is the specialized small-model route, which trains lightweight models (such as a BiLSTM-CRF-based entity recognition model or a CNN-based classification model) for a specific department or disease. Such models usually have millions to tens of millions of parameters and are light to deploy and fast at inference. The second is the general large-model route, which adopts pre-trained large language models (such as medical fine-tuned versions of the GPT series, LLaMA and others) and realizes functions such as clinical question answering and diagnostic reasoning through instruction fine-tuning. Such models have billions of parameters or more and have strong semantic understanding and logical reasoning capabilities.
In the prior art, an end-to-end model is trained directly by simply connecting a small model and a large model in series. Because the two models have different training objectives and cooperate poorly, errors of the small model are passed on and amplified in the series architecture and cannot be corrected through the downstream task loss, which degrades the accuracy of diagnostic decisions.
Disclosure of Invention
In view of the above analysis, the present invention aims to provide a training method of a clinical auxiliary diagnosis model, to solve the problem that poor cooperation between existing small and large models degrades the accuracy of diagnostic decisions.
In one aspect, an embodiment of the invention provides a training method of a clinical auxiliary diagnosis model, comprising the following steps:
acquiring medical texts of a plurality of patients to construct a sample set;
constructing a text compression model and a large medical diagnosis model, and pre-training each of them on the sample set;
fixing the parameters of the large medical diagnosis model, taking the compressed text output by the text compression model as the input of the large medical diagnosis model, and performing adaptation optimization on the text compression model based on the diagnosis task loss of the large medical diagnosis model;
and fixing the parameters of the text compression model, taking the compressed text output by the text compression model as the input of the large medical diagnosis model, and fine-tuning the large medical diagnosis model based on its diagnosis task loss to obtain a trained clinical auxiliary diagnosis model.
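The two freezing stages in the steps above can be illustrated with a deliberately tiny Python sketch. It is a toy under stated assumptions, not the patented method: each model is reduced to a single scalar parameter (`a` for the text compression model, `b` for the large medical diagnosis model), the pipeline output is `b * (a * x)`, the diagnosis task loss is replaced by a mean squared error, and pre-training is omitted. All names, the data and the learning rate are hypothetical; a real system would use a deep learning framework and freeze parameter groups there.

```python
def mean_task_loss(a, b, data):
    """Mean squared error of the toy pipeline: compress z = a * x, then predict b * z."""
    return sum((b * a * x - y) ** 2 for x, y in data) / len(data)

def adapt_compressor(a, b, data, lr=0.01, steps=200):
    """Stage 1: b (diagnosis model) stays frozen; the task loss updates only a."""
    for _ in range(steps):
        # Gradient of the task loss with respect to a, with b treated as a constant.
        grad_a = sum(2 * (b * a * x - y) * b * x for x, y in data) / len(data)
        a -= lr * grad_a
    return a

def finetune_diagnoser(a, b, data, lr=0.01, steps=200):
    """Stage 2: a (compression model) stays frozen; the task loss updates only b."""
    for _ in range(steps):
        # Gradient of the same task loss with respect to b, with a treated as a constant.
        grad_b = sum(2 * (b * a * x - y) * a * x for x, y in data) / len(data)
        b -= lr * grad_b
    return b

# Toy data with targets y = 2 * x, so the pipeline is exact whenever a * b == 2.
toy_data = [(1.0, 2.0), (2.0, 4.0)]
a0, b0 = 0.5, 0.5                            # stand-ins for the pre-trained parameters
a1 = adapt_compressor(a0, b0, toy_data)      # stage 1: b0 is never modified
b1 = finetune_diagnoser(a1, b0, toy_data)    # stage 2: a1 is never modified

print(mean_task_loss(a0, b0, toy_data))      # loss before either stage
print(mean_task_loss(a1, b0, toy_data))      # loss after adapting the compressor
print(mean_task_loss(a1, b1, toy_data))      # loss after fine-tuning the diagnoser
```

The point of the sketch is the order of operations: the downstream task loss drives both stages, but each stage receives the other model's parameter as a fixed input, so the upstream model is first shaped by what the downstream model needs before the downstream model is tuned to the compressed input it will actually see.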
As a further improvement of the above method, the text compression model comprises: an input layer for preprocessing an input text; a shared encoder for extracting features from the preprocessed text to obtain shared features; a compression task head for decoding the shared features to predict compressed text; and a key information extraction head for decoding the shared features to predict key field information.
As a further improvement of the above method, the preprocessing performed by the input layer comprises adaptive blocking of the input text, and the input layer comprises: a block division unit for dividing the input text into blocks based on similarity; and an overlap adjustment unit for adjusting the blocks based on multi-factor fusion to determine the overlap regions between adjacent blocks, thereby obtaining a final block division result.
As a further improvement of the above method, the overlap adjustment unit determines the overlap region between adjacent blocks based on multi-factor fusion by a formula [not reproduced in the source text] whose terms represent, respectively: the size of the overlap region between the i-th block and the (i+1)-th block determined based on semantic units; the size of the overlap region between the i-th block and the (i+1)-th block determined based on block length; the size of the overlap region between the i-th block and the (i+1)-th block determined based on entity density; three weight coefficients; and the overlap region length of the i-