
CN-122023291-A - Method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, and electronic device

CN122023291A

Abstract

The invention relates to a method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, and to an electronic device. The prediction model integrates a feature extraction backbone network, a retinal layering attention module, a macular hole localization and enhancement module, and a multi-modal fusion module. The method extracts multi-scale features from the preoperative OCT image through the backbone network; uses the retinal layering attention module, guided by a layering mask generated by a segmentation network, to focus the model on key structures such as the ellipsoid zone (EZ) and the external limiting membrane (ELM); uses the hole localization and enhancement module to adaptively strengthen features of the macular hole region according to the localization result of a target detection network; and finally uses the multi-modal fusion module to deeply fuse the processed image features with clinical parameters through an attention mechanism and output an accurate postoperative vision prediction. The invention achieves automatic, high-precision, personalized prediction and overcomes the strong subjectivity, low efficiency, and insufficient information fusion of traditional methods.

Inventors

  • ZHANG HONGBING
  • ZHENG JIANGBIN
  • YANG ZHAO
  • CHENG LINA
  • XUE YANYAN

Assignees

  • Shaanxi Institute of Ophthalmology (陕西省眼科研究所)
  • Northwestern Polytechnical University (西北工业大学)

Dates

Publication Date
2026-05-12
Application Date
2026-01-15

Claims (10)

  1. A method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, characterized by comprising the following steps: acquiring a preoperative optical coherence tomography (OCT) image and clinical parameters of a patient; and inputting the preoperative OCT image and the clinical parameters into a trained prediction model to obtain a postoperative vision prediction result; wherein the prediction model processes the preoperative OCT image and the clinical parameters by: extracting multi-scale image features from the preoperative OCT image using a feature extraction backbone network; using a retinal layering attention module to weight-adjust the multi-scale image features based on a layering mask generated by a retinal segmentation network, so as to focus on specific retinal structure regions; using a hole localization and enhancement module to localize the macular hole region based on a target detection network and adaptively enhance hole-region features within the multi-scale image features; using a multi-modal fusion module to deeply fuse the weight-adjusted and adaptively enhanced image features with the embedding-encoded clinical parameters to generate fused features; and outputting the postoperative vision prediction result through a regressor based on the fused features.
  2. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, wherein the step of weight-adjusting the multi-scale image features based on a layering mask generated by a retinal segmentation network using the retinal layering attention module comprises: acquiring a layering mask generated by the retinal segmentation network from the preoperative OCT image, wherein the layering mask comprises probability maps of key retinal layers, namely the ellipsoid zone and the external limiting membrane; inputting the layering mask, together with the image features output by the feature extraction backbone network, into an attention sub-network to generate a spatial weight map; introducing a learnable background suppression parameter and a learnable retina enhancement parameter to apply differential suppression and enhancement to the background region and the key retinal structure regions based on the spatial weight map; and weighting the image features output by the feature extraction backbone network with the differentially suppressed and enhanced spatial weight map to achieve feature reconstruction.
  3. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 2, wherein the step of using the hole localization and enhancement module to localize the macular hole region based on a target detection network and adaptively enhance the hole-region features within the multi-scale image features comprises: acquiring the coordinates of a hole bounding box output by the target detection network from the preoperative OCT image; cropping hole-region features from the image features output by the feature extraction backbone network via an ROI alignment operation according to the bounding-box coordinates; processing the hole-region features with a parameterized convolutional network to generate an attention weight map of the hole region; and restoring the attention weight map of the hole region to the full feature-map size by interpolation, and controlling the fusion strength between the attention weight map and the image features output by the feature extraction backbone network through a learnable global enhancement factor, thereby achieving feature enhancement of the hole region.
  4. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 3, wherein the step of deeply fusing the weight-adjusted and adaptively enhanced image features with the embedding-encoded clinical parameters using the multi-modal fusion module comprises: converting the clinical parameters into high-dimensional feature vectors through an embedding layer; performing dimension alignment between the high-dimensional feature vectors and the weight-adjusted, adaptively enhanced deep image features; and performing bidirectional interaction and fusion between the dimension-aligned high-dimensional feature vectors and the deep image features using a multi-head self-attention mechanism, and outputting fused features.
  5. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, wherein the trained prediction model is obtained by a pre-training and fine-tuning strategy comprising: pre-training the feature extraction backbone network on a generic image recognition dataset; and performing end-to-end fine-tuning of the whole prediction model on a medical dataset containing preoperative OCT images, clinical parameters, and corresponding postoperative vision labels of macular hole patients.
  6. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, further comprising, before inputting the preoperative OCT image and the clinical parameters into the trained prediction model, a data preprocessing step including one or more of reflection padding, center cropping, and random affine transformation.
  7. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, wherein the retinal segmentation network is a U-Net3+ network.
  8. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, wherein the target detection network is a YOLOv network.
  9. The method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion according to claim 1, wherein the feature extraction backbone network is a ConvNeXt network.
  10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion of any one of claims 1 to 9.
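The processing pipeline recited in claims 1 to 4 — mask-guided layered attention, detection-guided hole-region enhancement, and fusion of image features with clinical parameters feeding a regressor — can be illustrated with a minimal pure-Python sketch. All function names, fixed numeric weights, and the toy single-channel feature maps below are illustrative assumptions for exposition, not values or APIs disclosed in the patent; a real implementation would learn these parameters jointly in a deep learning framework.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layered_attention(feat, mask, bg_suppress=0.3, ret_enhance=1.5):
    """Claim 2 sketch: weight image features by a retinal-layer probability mask.

    feat, mask: H x W lists of floats; mask holds per-pixel probabilities of
    key retinal layers (EZ / ELM). bg_suppress and ret_enhance stand in for
    the learnable background-suppression and retina-enhancement parameters.
    """
    out = []
    for frow, mrow in zip(feat, mask):
        row = []
        for f, m in zip(frow, mrow):
            w = sigmoid(m)  # spatial weight (stand-in for the attention sub-network)
            # Differential enhancement of retinal structures vs. suppression of background.
            w = ret_enhance * m * w + bg_suppress * (1.0 - m) * w
            row.append(f * w)
        out.append(row)
    return out


def enhance_hole_region(feat, box, gain=0.5):
    """Claim 3 sketch: boost features inside the detected hole bounding box.

    box = (x0, y0, x1, y1) from the target detection network; gain stands in
    for the learnable global enhancement factor controlling fusion strength.
    """
    x0, y0, x1, y1 = box
    out = [row[:] for row in feat]
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] *= (1.0 + gain)  # full-size attention weight applied to the ROI
    return out


def fuse_and_predict(img_feat, clinical, w_img=0.02, w_clin=(-0.01, 0.005), bias=0.4):
    """Claims 1 and 4 sketch: pool the enhanced image features, combine them
    with (embedded) clinical parameters, and apply a linear regressor to
    output a predicted postoperative visual acuity. Weights are arbitrary."""
    pooled = sum(sum(r) for r in img_feat) / (len(img_feat) * len(img_feat[0]))
    return bias + w_img * pooled + sum(w * c for w, c in zip(w_clin, clinical))
```

In this sketch the learnable scalars are fixed constants and the clinical embedding is reduced to a linear map; in the described model they would be trained end to end together with the backbone, segmentation, and detection networks.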

Description

Method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, and electronic device

Technical Field

The invention relates to the technical field of medical image processing and artificial intelligence, and in particular to a method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, and an electronic device.

Background

A macular hole (MH) is a serious blinding fundus disease. Although surgical treatment achieves a high anatomical closure rate, the degree of postoperative vision recovery varies significantly. Accurate prediction of postoperative vision is critical for doctor-patient communication, surgical decision making, and management of patient expectations. Currently, postoperative vision prediction relies mainly on physicians manually measuring and analyzing morphological parameters in preoperative OCT images, such as the minimum hole diameter, the base diameter, the macular hole index (MHI), and the integrity of the ellipsoid zone (EZ) and the external limiting membrane (ELM). However, this approach has obvious limitations. First, manual measurement is inefficient, introduces subjective bias and inter-observer variability, and cannot meet the demands of large-scale clinical application. Second, preset morphological parameters cannot comprehensively and deeply describe the complex textures and structural features in OCT images that are related to prognosis. Finally, this approach struggles to effectively integrate a patient's clinical parameters (such as age and disease duration) to achieve personalized, comprehensive prediction. With the development of artificial intelligence, studies have attempted to make predictions using machine learning or deep learning models.
These prior-art solutions suffer from one or more of the following drawbacks. Feature extraction relies on manual work: most machine learning models still take manually measured parameters as input, so end-to-end automatic feature learning cannot be realized. Focus on key regions is insufficient: existing deep learning models lack a dedicated attention mechanism for the lesion region (the macular hole) and the key retinal structures (the EZ and ELM layers) when processing OCT images, making it difficult for the model to attend, against a complex background, to the fine features most important for prognosis. Multi-modal information fusion is shallow: deep fusion of OCT image features and clinical parameters at the feature level is not achieved; usually there is only simple feature concatenation or decision-level fusion, which cannot fully exploit the complementarity and correlation between multi-source information. Model generalization is insufficient: because training data are limited in scale and uneven in quality, models are prone to overfitting, and prediction performance drops markedly when applied to data collected by different devices and different medical institutions. Thus, there is a great need in the art for a new solution that can automatically, precisely, and individually predict vision after macular hole surgery.

Disclosure of Invention

The invention aims to provide a method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, so as to solve the problems described in the background.
In order to achieve the above purpose, the present invention provides the following technical solution: a method for predicting postoperative vision after macular hole surgery based on multi-modal feature fusion, comprising the following steps: acquiring a preoperative optical coherence tomography (OCT) image and clinical parameters of a patient; and inputting the preoperative OCT image and the clinical parameters into a trained prediction model to obtain a postoperative vision prediction result; wherein the prediction model processes the preoperative OCT image and the clinical parameters by: extracting multi-scale image features from the preoperative OCT image using a feature extraction backbone network; using a retinal layering attention module to weight-adjust the multi-scale image features based on a layering mask generated by a retinal segmentation network, so as to focus on specific retinal structure regions; using a hole localization and enhancement module to localize the macular hole region based on a target detection network and adaptively enhance hole-region features within the multi-scale image features; using a multi-modal fusion module to deeply fuse the weight-adjusted and adaptively enhanced image features with the embedding-encoded clinical parameters to generate fused features; and outputting the postoperative vision prediction result through a regressor based on the fused features. Optionally, the step of using the retinal layering attention module
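The bidirectional interaction-and-fusion step of the multi-modal fusion module — multi-head self-attention over dimension-aligned image and clinical tokens — can be sketched in a simplified single-head, identity-projection form. The simplification (no learned Q/K/V matrices, one head, two toy 2-dimensional tokens) is an illustrative assumption for exposition, not the patent's actual parameterization.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [v / s for v in exps]


def attention_fuse(tokens, d):
    """Single-head scaled dot-product self-attention over a token list,
    e.g. [image_feature_token, clinical_parameter_token], both already
    aligned to dimension d. Identity Q/K/V projections for brevity; the
    described module would use learned projections and multiple heads.
    """
    out = []
    for q in tokens:
        # Scaled dot-product scores of this query against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in tokens]
        attn = softmax(scores)
        # Weighted sum of value vectors: each token attends to all tokens,
        # giving the bidirectional image <-> clinical interaction.
        out.append([sum(attn[j] * tokens[j][i] for j in range(len(tokens)))
                    for i in range(d)])
    return out
```

Each output token is a convex combination of all input tokens, so the fused image token carries clinical information and vice versa; the fused tokens would then be pooled and passed to the regressor.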