
CN-121999955-A - Method and system for generating a radiology report based on lesion-guided masking and knowledge-graph enhancement

CN121999955A

Abstract

The invention discloses a radiology report generation method and system based on lesion-guided masking and knowledge-graph enhancement, belonging to the technical field of medical artificial intelligence. The method comprises the steps of: preprocessing chest medical images and extracting local features; constructing a lesion knowledge graph containing organ-lesion-disease triples; obtaining candidate organs and their mask images through a pre-trained lesion detection model; generating lesion-enhanced knowledge features for the candidate organs by retrieving related knowledge from the knowledge graph; inputting the organ masks and the original images into a mask image attention layer to obtain mask-enhanced image features; adaptively weighting and fusing the two types of features with a multi-branch gated cross-modal fusion module to obtain a knowledge-mask-enhanced fused representation; and finally generating the radiology report through a Transformer encoder-decoder. The invention can achieve organ-granularity image-text alignment, effectively reduce hallucination in the generated report, and improve the clinical consistency and interpretability of the report.
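The mask image attention layer described above can be illustrated with a minimal NumPy sketch. This is an assumed simplification, not the patent's actual implementation: the organ mask is applied as an additive bias on the attention logits so that patches inside the masked organ region receive more attention mass, making lesion-surrounding areas more prominent in the output features.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def mask_image_attention(patch_feats, organ_mask, bias=2.0):
    """Self-attention over image patches with an additive logit bias on
    patches inside the organ mask (a hypothetical simplification of the
    patent's mask image attention layer).

    patch_feats: (N, d) patch feature matrix
    organ_mask:  (N,) binary vector, 1 for patches inside the organ mask
    """
    d = patch_feats.shape[-1]
    logits = patch_feats @ patch_feats.T / np.sqrt(d)  # (N, N) similarity
    logits = logits + bias * organ_mask[None, :]       # boost masked keys
    attn = softmax(logits, axis=-1)
    return attn @ patch_feats                          # mask-enhanced features
```

With a nonzero mask the output differs from plain self-attention, reflecting the claim that the enhanced features emphasize the regions surrounding lesions.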

Inventors

  • Zhang Yijia
  • Song Xiangkang
  • Liu Zhi

Assignees

  • Dalian Maritime University (大连海事大学)

Dates

Publication Date
2026-05-08
Application Date
2025-12-30

Claims (8)

  1. A method for generating a radiology report based on lesion-guided masking and knowledge-graph enhancement, comprising the steps of: acquiring a chest medical image and the radiology report text corresponding to it; extracting organ-lesion-disease triples from the radiology report text with a large language model and encoding them to form a lesion knowledge graph; taking the chest medical image as input to a lesion detection model, which extracts hierarchical visual features, outputs a probability value for each lesion through a classification head, screens out candidate lesion categories by thresholding, and finally determines candidate organs from the organ-position priors of the candidate lesions; inputting the organ mask image and the original image into a mask image attention layer to obtain mask-enhanced image features, where the mask image attention layer takes the original chest image features and the organ mask image features as inputs and outputs a set of mask-enhanced image feature representations in which the regions surrounding lesions are more prominent than in the original image features; performing gated dynamic weighted fusion of the mask-enhanced image features and the lesion-enhanced knowledge features to obtain knowledge-mask-enhanced fused features; and inputting the fused features into a Transformer encoder-decoder framework to generate the radiology report.
  2. The method of claim 1, wherein the chest medical image is preprocessed to extract local image features and obtain a vectorized representation of the image, and the radiology report text is preprocessed to obtain training report text for supervising the report generation model during training.
  3. The method of claim 1, wherein preprocessing the chest medical image to extract local image features and obtain a vectorized representation of the image comprises: dividing the chest X-ray image into a plurality of image patches and extracting local image features with the pre-trained visual encoder RAD-DINO, thereby obtaining the vectorized representation of the image.
  4. The method of claim 1, wherein obtaining a list of candidate lesions and organs based on the pre-trained lesion detection model and generating a corresponding organ mask image comprises: performing preliminary detection on the image with the pre-trained lesion detection model torchxrayvision to obtain the candidate lesion and organ list, and generating the organ mask image.
  5. The method of claim 1, wherein inputting the candidate lesion mask with the original image into the mask image attention layer to obtain mask-enhanced image features comprises: retrieving prior medical knowledge according to the organ information of the candidate lesions and obtaining lesion-enhanced knowledge features through a linear mapping layer; and inputting the candidate lesion mask image and the original image together into the mask image attention layer to obtain the mask-enhanced image features.
  6. The method of claim 1, wherein gated dynamic weighted fusion of the mask-enhanced image features and the lesion-enhanced knowledge features to obtain the knowledge-mask-enhanced fused features comprises: aligning the lesion-enhanced knowledge features with the mask-enhanced image features in the spatial dimension; generating channel weights for the two feature paths with a gating network and summing the features according to those weights; and restoring the weighted features to sequence form to obtain the fused feature representation.
  7. The method of claim 1, wherein inputting the fused representation into a Transformer encoder-decoder framework to generate the radiology report comprises: feeding the fused representation to the Transformer encoder-decoder to generate the radiology report word by word; and optimizing the difference between the generated report and the reference report with a cross-entropy loss to ensure the fluency of the generated language.
  8. A radiology report generation system based on lesion-guided masking and knowledge-graph enhancement, comprising: an image preprocessing module for acquiring chest medical images and their corresponding radiology report texts and preprocessing them; a knowledge-graph encoding module for extracting organ-lesion-disease triples from the radiology report text with a large language model and encoding them to form a lesion knowledge graph; a preliminary lesion detection module for obtaining a candidate lesion and organ list based on a pre-trained lesion detection model with a classification head and generating a corresponding organ mask image; a multi-source information enhancement module for retrieving triples related to the candidate lesions from the knowledge graph to generate lesion-enhanced knowledge features, and for inputting the candidate lesion mask and the original image into a mask image attention layer to obtain mask-enhanced image features, where the mask image attention layer takes the original chest image features and the organ mask image features as inputs; a multi-branch gated cross-modal fusion module for performing gated dynamic weighted fusion of the mask-enhanced image features and the lesion-enhanced knowledge features to obtain knowledge-mask-enhanced fused features; and a decoding and training module for inputting the fused representation into a Transformer encoder-decoder framework to generate the radiology report.
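The gated dynamic weighted fusion of claim 6 can be sketched as a per-channel convex combination of the two feature paths. The sketch below is a simplified NumPy illustration under stated assumptions: `Wg` and `bg` stand in for the gating network's learned parameters, and the two inputs are assumed already aligned in the spatial dimension.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(img_feats, know_feats, Wg, bg):
    """Channel-wise gated fusion of mask-enhanced image features and
    lesion-enhanced knowledge features (a simplified reading of claim 6).

    img_feats, know_feats: (N, C) spatially aligned feature sequences
    Wg: (2C, C), bg: (C,) -- placeholder gate parameters, not trained weights
    """
    assert img_feats.shape == know_feats.shape       # aligned in space
    z = np.concatenate([img_feats, know_feats], axis=-1)
    g = sigmoid(z @ Wg + bg)                         # per-channel weights in (0, 1)
    return g * img_feats + (1.0 - g) * know_feats    # weighted sum per channel
```

Because each gate value lies in (0, 1), every fused element is a convex combination of the corresponding image and knowledge channels, which is what lets the module adaptively balance the two modalities channel by channel.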

Description

Method and system for generating a radiology report based on lesion-guided masking and knowledge-graph enhancement. Technical Field: The invention belongs to the field of medical artificial intelligence, relates to automatic radiology report generation technology, and particularly relates to a radiology report generation method and system based on lesion-guided masking and knowledge-graph enhancement. Background: Medical imaging plays a vital role in clinical diagnosis; in radiology in particular, scanning technologies such as X-ray, CT and MRI provide the key visual basis for disease detection. However, interpreting medical images and writing diagnostic reports require extensive expertise and clinical experience, are time- and labor-consuming, and place a heavy workload on radiologists. In this context, automated radiology report generation (RRG) technology has emerged as a potential solution to improve diagnostic efficiency and relieve physicians' working pressure.
In chest imaging applications, organizing structural information by organ dimension (such as lung fields, cardiac shadow, mediastinum, and bony thorax) naturally fits the section framework of clinical reports and facilitates fine-grained lesion localization and evidence referencing based on anatomical parts. Meanwhile, by constructing an organ-lesion-disease knowledge graph, prior medical knowledge can be introduced into the system in a retrievable, computable form: the semantic ambiguity caused by synonymous and negated expressions can be unified, and the shortage of training samples for long-tail and rare diseases can be compensated through the constraint mechanism of entities and relations, thereby improving the completeness and interpretability of the generated conclusions. In recent years, deep-learning-based radiology report generation methods have advanced significantly, with Transformer-based models becoming the mainstream technical route. These methods typically extract global features of a medical image through a visual encoder and use an attention mechanism to convert image features into textual descriptions, enabling end-to-end report generation.
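The retrievable, computable form of the organ-lesion-disease knowledge graph mentioned above can be illustrated with a toy triple store. The entity names below are invented examples for illustration, not taken from the patent; retrieval by candidate organ is what would feed the lesion-enhanced knowledge features.

```python
# Toy organ-lesion-disease knowledge graph stored as triples.
# Entity names are hypothetical examples, not the patent's data.
TRIPLES = [
    ("lung field", "patchy opacity", "pneumonia"),
    ("lung field", "nodule", "lung tumor"),
    ("cardiac shadow", "enlargement", "cardiomegaly"),
    ("pleura", "effusion", "pleural effusion"),
]

def retrieve(organ):
    """Return all (lesion, disease) pairs linked to a candidate organ."""
    return [(lesion, disease)
            for (o, lesion, disease) in TRIPLES
            if o == organ]
```

In the described pipeline, the retrieved pairs would then be encoded (e.g. through a linear mapping layer, per claim 5) to produce the lesion-enhanced knowledge features.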
However, the existing methods still have the following technical problems: (1) Vision-text deviation in lesion attention: most existing models process the whole medical image globally and lack explicit attention mechanisms over key anatomical structures and lesion areas, leading to alignment errors between visual features and textual descriptions, so that the generated report struggles to accurately reflect clinically important abnormalities in the image. (2) Current methods mainly decode from embedded image features; the generated content often appears templated, personalized expression for a specific condition is difficult to provide, and effective modeling of organ-level semantic associations is lacking. (3) When handling multi-modal inputs such as image features, lesion mask information, and medical knowledge triples, prior methods generally adopt coarse-grained fusion strategies such as simple feature concatenation or channel-wise addition, which fail to account for the heterogeneity and complementarity of the different modalities along the channel dimension; the model therefore cannot fully exploit the discriminative characteristics of each modality, affecting the semantic completeness and clinical applicability of the generated report. Therefore, aiming at the problems of vision-text alignment deviation, insufficient organ-level knowledge fusion, and coarse multi-modal fusion, a radiology report generation method with accurate organ-granularity image-text alignment, deep fusion of medical knowledge graphs, and adaptive multi-modal fusion capability is needed to solve the problems of semantic deviation, templated expression, and insufficient clinical consistency in the prior art.
Disclosure of the Invention: In view of the above, the invention provides a method and system for generating a radiology report based on lesion-guided masking and knowledge-graph enhancement, so as to realize automatic analysis and description of chest medical images and generate a radiology report with clinical consistency, semantic accuracy, and strong interpretability. The technical scheme adopted by the invention is as follows: in one aspect, the invention provides a method for generating a radiology report based on lesion-guided masking and knowledge-graph enhancement, comprising the steps of: acquiring a chest medical image and the radiology report text corresponding to the chest medical image