CN-121983220-A - Method for generating craniocerebral CT report and computer program product
Abstract
The application discloses a craniocerebral CT report generation method and a computer program product. A pre-trained report generation model performs visual encoding on a three-dimensional craniocerebral CT image to obtain a preliminary feature map; the preliminary feature map is globally aggregated and processed by an adapter to obtain a global encoding vector; a multi-branch network outputs disease category information and lesion localization information based on the global encoding vector; a probability mask matching the spatial dimensions of the preliminary feature map is generated from the lesion localization information and used to spatially weight the preliminary feature map, yielding a mask feature map; mask learning encoding of the mask feature map yields lesion-reinforced features; the global encoding vector is projected and fused with the lesion-reinforced features to obtain fusion features; and the fusion features, together with preset prompt words, are processed to output a report text sequence. Through localization-guided mask weighting and feature fusion, the method strengthens lesion-related representations, improves the coverage and consistency of reports on key diagnostic elements, and reduces interference from irrelevant background.
Inventors
- CHEN JIAJUN
- YANG BAOGUANG
- DAI SHICHEN
- CHEN MAODONG
- CHENG DALONG
- WU ZIHAO
- LI CHUANFU
- YIN BAOCAI
- WEI SI
- LU XIAOLIANG
- HE ZHIYANG
Assignees
- 安徽影联云享医疗科技有限公司
- 科大讯飞华南人工智能研究院(广州)有限公司
- 讯飞医疗科技股份有限公司
- 科大讯飞股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260120
Claims (10)
- 1. A method for generating a craniocerebral CT report, wherein the method uses a pre-trained report generation model to process a three-dimensional craniocerebral CT image, the method comprising: performing visual encoding on the three-dimensional craniocerebral CT image to obtain a preliminary feature map corresponding to the three-dimensional craniocerebral CT image; performing global aggregation on the preliminary feature map and processing it through an adapter to obtain a global encoding vector; generating, via a multi-branch network, an auxiliary output characterizing diagnostic elements based on the global encoding vector, the auxiliary output including at least disease category information and lesion localization information; generating a probability mask according to the lesion localization information, and spatially weighting the preliminary feature map with the probability mask to obtain a mask feature map, wherein the probability mask is a weight map matching the spatial dimensions of the preliminary feature map; performing mask learning encoding on the mask feature map to obtain lesion-reinforced features; performing feature projection on the global encoding vector to obtain projection features, and fusing the projection features with the lesion-reinforced features to obtain fusion features; and processing the fusion features and preset prompt words to obtain a report text sequence corresponding to the three-dimensional craniocerebral CT image.
- 2. The method of claim 1, wherein the training process of the pre-trained report generation model comprises: acquiring a plurality of training samples, wherein each training sample comprises a training three-dimensional craniocerebral CT image and a real report text corresponding to it; extracting a disease category label from the real report text; performing segmentation prediction on the training three-dimensional craniocerebral CT image with a pre-trained nnU-Net model to obtain a pseudo-label, wherein the pseudo-label serves as supervision information for the lesion localization information; inputting the training three-dimensional craniocerebral CT image into a report generation model to be trained to obtain a report text sequence, disease category information and lesion localization information corresponding to the training sample; determining a generation loss based on the real report text and the report text sequence corresponding to the training sample; determining a classification loss based on the disease category label and the disease category information corresponding to the training sample; determining a segmentation loss based on the pseudo-label and the lesion localization information corresponding to the training sample, wherein the segmentation loss comprises a Dice loss and a cross-entropy loss; and updating parameters of the report generation model to be trained based on a weighted sum of the generation loss, the classification loss and the segmentation loss to obtain the trained report generation model.
- 3. The method of claim 1, wherein generating a probability mask from the lesion localization information and spatially weighting the preliminary feature map with the probability mask to obtain a mask feature map comprises: thresholding the lesion localization information to obtain a binary mask; performing two-dimensional Gaussian filtering on the connected regions of the binary mask to obtain the probability mask; and multiplying the probability mask with the preliminary feature map element by element to obtain the mask feature map.
- 4. The method of claim 1, wherein fusing the projection features and the lesion-reinforced features to obtain fusion features comprises: fusing the projection features and the lesion-reinforced features with a cross-attention mechanism, taking the projection features as query vectors and the lesion-reinforced features as key vectors and value vectors, to obtain the fusion features.
- 5. The method of claim 1, wherein processing the fusion features and the preset prompt words to obtain the report text sequence corresponding to the three-dimensional craniocerebral CT image comprises: inputting the fusion features, in the form of a visual prefix, into a language model together with the preset prompt words, and outputting the report text sequence, wherein the preset prompt words comprise instruction words instructing the language model to generate a diagnostic report.
- 6. The method of claim 1, wherein the multi-branch network comprises a disease classification branch and a lesion localization branch; and the generating, via a multi-branch network, an auxiliary output characterizing diagnostic elements based on the global encoding vector comprises: processing the global encoding vector with the disease classification branch and outputting a probability distribution over disease categories as the disease category information; and processing the global encoding vector with the lesion localization branch and outputting a localization map corresponding to the space of the three-dimensional craniocerebral CT image as the lesion localization information.
- 7. The method of claim 6, wherein the lesion localization information includes disease-area channels and positive-anatomical-site channels.
- 8. The method of claim 1, wherein performing mask learning encoding on the mask feature map to obtain lesion-reinforced features comprises: extracting features from the mask feature map to obtain intermediate features; and aggregating context information over the intermediate features, outputting lesion-reinforced feature vectors as the lesion-reinforced features.
- 9. The method of claim 1, wherein performing global aggregation on the preliminary feature map and processing it through an adapter to obtain a global encoding vector comprises: performing global average pooling on the preliminary feature map to obtain a global feature vector; and inputting the global feature vector into an adapter for feature mapping to obtain the global encoding vector.
- 10. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any one of claims 1 to 9.
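The multi-task training objective of claim 2 can be sketched as follows. The loss weights and the use of binary cross entropy for the pixel-wise term are illustrative assumptions; the claim only specifies that the segmentation loss combines Dice and cross-entropy terms and that the three losses are combined by a weighted sum:

```python
import numpy as np

def dice_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-6) -> float:
    """Soft Dice loss between predicted lesion probabilities and a pseudo-label mask."""
    inter = (pred * target).sum()
    return 1.0 - (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def bce_loss(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7) -> float:
    """Pixel-wise binary cross entropy (one choice of cross-entropy term)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-(target * np.log(p) + (1 - target) * np.log(1 - p)).mean())

def total_loss(gen_loss: float, cls_loss: float,
               pred_mask: np.ndarray, pseudo_mask: np.ndarray,
               w_gen: float = 1.0, w_cls: float = 0.5, w_seg: float = 0.5) -> float:
    """Weighted sum of generation, classification and segmentation losses;
    the weights w_gen/w_cls/w_seg are illustrative, not values from the patent."""
    seg = dice_loss(pred_mask, pseudo_mask) + bce_loss(pred_mask, pseudo_mask)
    return w_gen * gen_loss + w_cls * cls_loss + w_seg * seg
```

In training, `gen_loss` and `cls_loss` would come from the language-model decoding and the disease classification branch respectively, and `pseudo_mask` from the nnU-Net pseudo-labels.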
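The probability-mask weighting of claim 3 can be sketched as below. For simplicity the Gaussian smoothing is applied to the whole binary mask rather than per connected region, and the threshold, sigma and kernel radius are illustrative assumptions:

```python
import numpy as np

def gaussian_smooth_2d(img: np.ndarray, sigma: float = 1.5, radius: int = 3) -> np.ndarray:
    """Separable 2-D Gaussian filtering built from two 1-D convolutions."""
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    k /= k.sum()
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def spatially_weight(feature_map: np.ndarray, localization: np.ndarray,
                     thr: float = 0.5, sigma: float = 1.5) -> np.ndarray:
    """Claim 3 sketch: threshold the localization map into a binary mask,
    Gaussian-smooth it into a soft probability mask, and multiply the
    preliminary feature map by it element-wise (broadcast over channels).
    feature_map: (H, W, C); localization: (H, W) scores in [0, 1]."""
    binary = (localization > thr).astype(np.float32)              # binary mask
    prob_mask = np.clip(gaussian_smooth_2d(binary, sigma), 0.0, 1.0)
    return feature_map * prob_mask[..., None]                     # mask feature map
```

The smoothing gives lesion-adjacent positions soft, spatially decaying weights instead of a hard 0/1 cut, which is what makes the mask a "probability mask".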
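The cross-attention fusion of claim 4, with projection features as queries and lesion-reinforced features as keys and values, corresponds to standard scaled dot-product attention. This minimal single-head sketch omits the learned query/key/value projection matrices a real implementation would include:

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention_fuse(proj_feats: np.ndarray, lesion_feats: np.ndarray) -> np.ndarray:
    """Claim 4 sketch: scaled dot-product cross attention with the projection
    features as queries and the lesion-reinforced features as keys and values.
    proj_feats: (Tq, D); lesion_feats: (Tk, D) -> fusion features (Tq, D)."""
    d = proj_feats.shape[-1]
    scores = proj_feats @ lesion_feats.T / np.sqrt(d)   # (Tq, Tk) similarities
    attn = softmax(scores, axis=-1)                     # attention weights
    return attn @ lesion_feats                          # weighted sum of values
```

Each fused row is a convex combination of the lesion-reinforced feature vectors, so global context is re-expressed in terms of the lesion-focused representation.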
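The global aggregation and adapter step of claim 9 reduces, in its simplest form, to global average pooling followed by a feature mapping. The adapter here is a single affine layer, an illustrative assumption, since the patent does not fix its architecture:

```python
import numpy as np

def global_encode(feature_map: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Claim 9 sketch: global average pooling over the spatial axes of the
    preliminary feature map, then an adapter mapping (one affine layer).
    feature_map: (H, W, C); W: (C, D); b: (D,) -> global encoding vector (D,)."""
    g = feature_map.mean(axis=(0, 1))   # global average pooling -> (C,)
    return g @ W + b                    # adapter feature mapping -> (D,)
```

In the full model this global encoding vector feeds both the multi-branch auxiliary network (claim 6) and the feature projection used for fusion (claim 1).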
Description
Method for generating craniocerebral CT report and computer program product
Technical Field
The application relates to the intersection of medical artificial intelligence and computer vision, and in particular to a craniocerebral CT report generation method and a computer program product.
Background
Craniocerebral CT examination is widely used for screening, diagnosis and efficacy evaluation of nervous system diseases owing to its fast imaging, its sensitivity to emergencies such as hemorrhage and space-occupying lesions, and its broad clinical coverage. In clinical practice, imaging physicians typically review the images and write structured or semi-structured imaging reports that describe key image signs, give diagnostic impressions, and make follow-up recommendations. As the volume of imaging examinations keeps growing, automatically generating craniocerebral CT reports by means of artificial intelligence, so as to improve report-writing efficiency, reduce workload and improve report consistency, has become an important research direction in intelligent medical image analysis. Most existing automatic craniocerebral CT report generation techniques adopt the deep-learning "visual encoding and text decoding" paradigm: a convolutional neural network, a vision Transformer or a similar model encodes the image into features, and a language model autoregressively generates the report text from the encoded features.
However, craniocerebral CT is three-dimensional data with a huge amount of image information, while the lesion regions truly relevant to diagnosis are typically small in volume and sparsely, discretely distributed. When supervised only by the generation-task loss, the model is easily distracted by the large amount of normal-tissue information and struggles to stably capture lesion details, which in turn causes problems such as missed key signs, vague lesion localization, biased diagnostic conclusions and hallucinated text. In addition, report generation requires not only coherent overall semantics but also accurate depiction of diagnostic elements such as disease types and positive anatomical sites, while pixel-level annotations such as lesion segmentations are expensive to acquire, so model training struggles to obtain sufficiently strong localization supervision, further limiting the reliability and clinical usability of generated reports. Therefore, in automatic craniocerebral CT report generation, the following problems urgently need to be solved: lesion regions that are small and sparse in three-dimensional data make it hard for the model to focus effectively; supervision from the generation task alone easily leads to missed key signs and inaccurate expression of diagnostic elements; and the high cost of accurate lesion annotation leaves localization supervision insufficient.
Disclosure of Invention
The application provides a method and a computer program product for generating a craniocerebral CT report, aiming to solve the prior-art problems that, in automatic craniocerebral CT report generation, lesion regions are small and sparse in three-dimensional data so that the model is hard to focus effectively, relying only on generation-task supervision easily causes missed key signs and inaccurate expression of diagnostic elements, and the high cost of accurate lesion annotation leaves localization supervision insufficient. In a first aspect, a method for generating a craniocerebral CT report uses a pre-trained report generation model to process a three-dimensional craniocerebral CT image, the method comprising: performing visual encoding on the three-dimensional craniocerebral CT image to obtain a preliminary feature map corresponding to the three-dimensional craniocerebral CT image; performing global aggregation on the preliminary feature map and processing it through an adapter to obtain a global encoding vector; generating, via a multi-branch network, an auxiliary output characterizing diagnostic elements based on the global encoding vector, the auxiliary output including at least disease category information and lesion localization information; generating a probability mask according to the lesion localization information, and spatially weighting the preliminary feature map with the probability mask to obtain a mask feature map, wherein the probability mask is a weight map matching the spatial dimensions of the preliminary feature map; performing mask learning encoding on the mask feature map to obtain lesion-reinforced features; performing feature projection on the global encoding vector to obtain projection features, and fusing the projection features with the lesion-reinforced f