CN-122000081-A - Method for constructing hepatitis B prevention and control large language model

Abstract

The invention relates to a method for constructing a hepatitis B prevention and control large language model. By integrating evidence-based medical knowledge on hepatitis B prevention and control with an integrated system that combines retrieval-augmented generation and fine-tuning, the method realizes an intelligent response mechanism that automatically switches between a health education mode and a clinical decision support mode based on user role attributes, and provides a medical evidence tracing function, so as to improve the accuracy and verifiability of hepatitis B prevention and control information, support offline deployment on edge computing devices, and prolong the life cycle of the model. The invention addresses problems of general large models in the field of hepatitis B prevention and control medicine, such as inaccurate knowledge, a single service mode, and untraceable conclusions; improves the accuracy, personalization, and clinical credibility of hepatitis B prevention and control information; and provides reliable evidence-based support for the full-cycle management of people infected with hepatitis B.

Inventors

  • LI CAIXIA
  • ZHAO YANGGUO
  • YUAN GENGYANG
  • YANG XINRUI
  • FU XIA

Assignees

  • The Eighth Affiliated Hospital, Sun Yat-sen University (Futian, Shenzhen) (中山大学附属第八医院(深圳福田))

Dates

Publication Date
20260508
Application Date
20260127

Claims (10)

  1. A method for constructing a hepatitis B prevention and control large language model, characterized by comprising the following steps:
     S1, constructing a hepatitis B evidence-based medical knowledge base: extracting hepatitis B prevention and control text content from clinical practice guidelines and medical research literature, preprocessing and semantically slicing the extracted text content, generating text feature vectors, and constructing a retrievable vector index database;
     S2, fine-tuning a general model into a specialized model: based on the hepatitis B evidence-based medical knowledge base, fine-tuning the parameters of a general pre-trained language model using a low-rank adaptation technique, and training the model on a hepatitis B diagnosis and treatment dialogue corpus so that it internalizes hepatitis B professional knowledge and clinical reasoning capability, thereby forming a hepatitis B prevention and control specialized model;
     S3, constructing an integrated system combining retrieval-augmented generation with the fine-tuned model, the system comprising: a query analysis unit for parsing user input and generating a search query; a knowledge retrieval unit for retrieving relevant medical evidence from the vector index database; an evidence fusion unit for fusing the retrieval results with the internal knowledge of the hepatitis B prevention and control specialized model; and a content generation unit for generating response content based on the fused knowledge;
     S4, a role-adaptive response mechanism: based on the integrated system, automatically switching between a patient health education mode and a clinical decision support mode according to the user identity attribute, and generating corresponding response content;
     S5, medical evidence tracing: embedding an evidence tracing system in the content generation unit so that the generated content is associated with the original guideline source, the evidence grade, and the recommendation strength, thereby constructing a verifiable question-evidence-conclusion chain.
  2. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein the preprocessing and semantic slicing of the extracted text content in step S1 specifically comprises: identifying and removing non-text content while retaining the chapter structure of the medical text and the reference relations to figures and tables; determining the boundaries of professional terms by applying syntactic dependency analysis, and generating semantic slices in combination with an evaluation of medical concept integrity; and labeling each semantic slice with a source document identifier, version information, and an evidence grade, thereby establishing traceable medical knowledge units.
  3. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein generating text feature vectors and constructing a retrievable vector index database in step S1 specifically comprises: converting the semantic slices into fixed-dimension feature vectors using a general semantic representation model, and normalizing the vectors; and constructing a multi-layer index structure based on an approximate nearest neighbor search algorithm, and establishing a mapping between the semantic slices and the feature vectors, so as to achieve efficient retrieval based on medical semantic similarity.
  4. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein the parameter fine-tuning using a low-rank adaptation technique in step S2 specifically comprises: inserting low-rank decomposition matrices into the self-attention layers and feedforward network layers of the general pre-trained language model, and setting parameter update constraints so that model updates concentrate on professional terms and semantic expressions related to hepatitis B diagnosis and treatment, thereby maintaining general language capability while enhancing adaptability to the professional field.
  5. The method for constructing a hepatitis B prevention and control large language model according to claim 1 or 4, wherein training the model on the hepatitis B diagnosis and treatment dialogue corpus in step S2 specifically comprises: collecting real question-and-answer records between clinicians and patients, and structurally labeling them by question type, evidence source, and recommendation strength; and adopting a progressive training strategy in which basic knowledge coverage training is performed first, followed by complex clinical reasoning training, so that the model internalizes the logical structure and decision paths of the hepatitis B diagnosis and treatment guidelines.
  6. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein fusing the retrieval results with the internal knowledge of the hepatitis B prevention and control specialized model in step S3 specifically comprises: calculating a relevance weight between the retrieved medical evidence and the user query, evaluating the source authority of the evidence, and applying credibility weighting to the multi-source evidence; and performing hierarchical fusion of the weighted external evidence and the internal knowledge of the model, preferentially adopting external knowledge with a high evidence grade to correct the internal knowledge representation of the model.
  7. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein step S4 specifically comprises: S41, receiving user identity information, or automatically identifying the user role type from user interaction behavior; S42, when a patient role is identified, activating a full-cycle health education mode and determining the prevention and control stage according to the individual characteristics of the patient, generating personalized guidance on preventive screening and vaccination for primary prevention and control, treatment plan and follow-up monitoring suggestions for secondary prevention and control, and a liver cancer risk assessment and surveillance plan for tertiary prevention and control; and when a medical staff role is identified, activating an evidence-based decision support mode, analyzing clinical case characteristics, generating treatment indication judgments and regimen matching suggestions based on the latest guidelines, and providing precise treatment adaptation strategies for special populations as well as decision support for treatment response monitoring and regimen adjustment.
  8. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein step S5 specifically comprises: S51, automatically labeling each medical conclusion in the content generation unit with the corresponding guideline source identifier, associating the medical conclusion with the corresponding evidence grade and recommendation strength, constructing a semantic mapping index from questions to medical evidence, and embedding interactive evidence chain markers in the response content; and S52, in response to a user's triggering operation on a traceability marker, dynamically displaying the original evidence text and the basis for the recommendation, and generating a visualized question-evidence-conclusion association path, so that the medical conclusion is verifiable and traceable.
  9. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein the integrated system further comprises: a conflict detection and resolution unit for identifying contradictions between new and old guidelines in the evidence fusion stage, and performing knowledge fusion using a time priority principle and an evidence grade weighting strategy, so as to ensure that the output content conforms to the latest medical guidelines while keeping historical knowledge traceable.
  10. The method for constructing a hepatitis B prevention and control large language model according to claim 1, wherein the method further comprises: S6, knowledge distillation and lightweight model deployment: using the hepatitis B prevention and control specialized model as a teacher model and a lightweight model as a student model, transferring the clinical reasoning capability of the teacher model to the student model through knowledge distillation, and deploying the lightweight student model on edge computing devices to support decision support and periodic knowledge updating in an offline environment.
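
Illustrative sketch (not part of the claims): the retrieval and credibility weighting described in claims 3 and 6 might look roughly like the following. This is a minimal sketch under stated assumptions. The `Slice` record, the grade labels "A"/"B"/"C", and the weights in `GRADE_WEIGHT` are hypothetical, and an exhaustive cosine-similarity scan stands in for the multi-layer approximate nearest neighbor index, which the claims do not specify in detail.

```python
import math
from dataclasses import dataclass


@dataclass
class Slice:
    """One semantic slice with traceability labels (field names are hypothetical)."""
    text: str
    source_id: str       # source document identifier
    version: str         # guideline version information
    evidence_grade: str  # assumed labels: "A", "B", "C"
    vector: list         # fixed-dimension feature vector


# Assumed credibility weights per evidence grade (illustrative values only).
GRADE_WEIGHT = {"A": 1.0, "B": 0.8, "C": 0.6}


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def retrieve(query_vec, slices, top_k=2):
    """Rank slices by cosine similarity multiplied by the evidence-grade weight.

    Exhaustive search is used here in place of an approximate nearest
    neighbor index; it returns (score, slice) pairs, best first.
    """
    q = normalize(query_vec)
    scored = []
    for s in slices:
        similarity = sum(a * b for a, b in zip(q, normalize(s.vector)))
        scored.append((similarity * GRADE_WEIGHT[s.evidence_grade], s))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Because every returned slice carries its source identifier, version, and evidence grade, the ranking step keeps the labels that a later tracing step would need. Multiplying similarity by a grade weight is only one possible reading of "credibility weighting"; other combinations are equally consistent with the claim wording.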

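Illustrative sketch (not part of the claims): the role-adaptive mode switch of claim 7 and the evidence markers of claim 8 could be modeled as below. This is a sketch under assumptions, not the applicant's implementation. The role strings "patient" and "clinician", the `Evidence` and `Conclusion` records, and the bracketed numeric marker format are all hypothetical; the claims state only that a mode is selected from the user role and that each conclusion is linked to its guideline source, evidence grade, and recommendation strength.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    HEALTH_EDUCATION = "patient health education"
    DECISION_SUPPORT = "clinical decision support"


@dataclass(frozen=True)
class Evidence:
    source_id: str  # hypothetical guideline source identifier
    grade: str      # evidence grade label
    strength: str   # recommendation strength label
    excerpt: str    # original guideline text shown on request


@dataclass(frozen=True)
class Conclusion:
    text: str
    evidence: Evidence


def select_mode(role):
    """Map a user role to a response mode; the role names are assumed."""
    if role == "patient":
        return Mode.HEALTH_EDUCATION
    if role == "clinician":
        return Mode.DECISION_SUPPORT
    raise ValueError(f"unknown role: {role!r}")


def render(conclusions, mode):
    """Render each conclusion followed by a numbered evidence marker."""
    lines = [f"[mode: {mode.value}]"]
    for i, c in enumerate(conclusions, start=1):
        lines.append(f"{c.text} [{i}]")
    return "\n".join(lines)


def trace(conclusions, marker):
    """Resolve a numbered marker back to the evidence behind that conclusion."""
    if not 1 <= marker <= len(conclusions):
        raise IndexError(f"no conclusion for marker {marker}")
    ev = conclusions[marker - 1].evidence
    return f"{ev.source_id} | grade {ev.grade} | strength {ev.strength} | {ev.excerpt}"
```

Calling `render` produces the response text with markers, and `trace` plays the part of the user triggering a marker to see the original evidence. Keeping the evidence attached to each conclusion, rather than looking it up afterwards, is one simple way to keep the question-evidence-conclusion link intact.
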
Description

Method for constructing hepatitis B prevention and control large language model

Technical Field

The invention relates to the technical fields of medical information processing and natural language processing, and in particular to a method for constructing a hepatitis B prevention and control large language model.

Background

Hepatitis B virus (HBV) infection, referred to here as hepatitis B, remains a global public health problem. China is the country with the highest burden of hepatitis B, accounting for about one third of the world's hepatitis B infections and deaths. Chinese statistical yearbook data indicate that the number of people infected with hepatitis B was as high as 86 million in 2022, and hepatitis B still ranks first in reported cases among the legally notifiable infectious diseases in China. Full-cycle management of hepatitis B, from preventive screening through diagnosis and treatment to liver cancer surveillance, requires high-quality health education, consultation, and clinical decision support.

At present, hepatitis B prevention and control relies mainly on two types of technical modes.

The traditional medical consultation and decision mode centers on medical staff, who provide full-process health guidance and clinical decisions based on clinical guidelines (such as the chronic hepatitis B prevention and control guidelines and the primary liver cancer diagnosis and treatment guidelines) and clinical experience. However, this mode is constrained by practical bottlenecks such as a shortage of liver disease specialists and limited clinical consultation time. It struggles to cover the comprehensive needs of the huge infected population and cannot effectively reduce diagnosis and treatment deviations caused by differences in individual experience, resulting in low service efficiency and insufficient standardization.

The digital tool assisted mode comprises patient-facing mobile applications, such as hepatitis B prevention and control apps and mini programs, and simple decision support tools for medical staff. Although these tools provide basic health management functions, their responses are mostly fixed answers preset by the program. They cover only basic questions in common scenarios such as transmission routes and vaccination, lack the ability to respond based on a patient's individual characteristics (such as age, disease stage, complications, and family history of liver cancer), and cannot provide real-time guidance as authoritative guidelines are updated and the patient's condition changes.

In recent years, general large language models have been tried for medical information queries by virtue of their natural language processing capability, but they have not been specifically optimized for the hepatitis B prevention and control field. This prior art has two core defects. First, knowledge accuracy is seriously insufficient: general models show substantial deviations in understanding professional content such as hepatitis B screening criteria, antiviral treatment indications, and liver cancer surveillance frequency, with accuracy below 70%, and they easily generate misleading information. Second, authoritative evidence support and traceability are lacking: the outputs are not bound to authoritative hepatitis B prevention and control guidelines, are not labeled with the corresponding guideline sources, chapters, and evidence grades, and offer no tracing path, which markedly increases the uncertainty and risk of clinical application.

Therefore, a method is needed for constructing a hepatitis B prevention and control large language model that integrates evidence-based medical knowledge and offers both accuracy and timeliness, so as to overcome the shortcomings of the prior art in depth of professional knowledge, personalized adaptation, and clinical traceability.

Disclosure of Invention

In view of the above, the invention aims to provide a method for constructing a hepatitis B prevention and control large language model, so as to solve the key problems that general large language models in the prior art have insufficient knowledge accuracy in the hepatitis B professional field, lack evidence-based medical support, cannot distinguish the needs of different user roles, and cannot trace medical conclusions.

The technical scheme adopted to solve these technical problems is as follows:

The invention provides a method for constructing a hepatitis B prevention and control large language model, comprising the following steps:

S1, constructing a hepatitis B evidence-based medical knowledge base: extracting hepatitis B prevention and control text content from clinical practice guidelines and medical research literature, preprocessing and semantically slicing the extracted text content, generating text feature vectors a