CN-121983289-A - Multi-mode image fusion nasolacrimal duct obstructive disease diagnosis and grading system
Abstract
The application relates to the technical field of medical image processing and discloses a multi-modal image fusion diagnosis and grading system for nasolacrimal duct obstructive disease. The system acquires head CT data, performs spatial standardization, and generates osseous-channel and soft-tissue-channel input tensors based on dual-window parameters. It constructs a dual-stream attention interaction network in which the osseous feature extraction branch generates a spatial attention mask and a topology-guided attention unit performs weighted fusion of the soft tissue pathological features. A semi-supervised model training module applies a consistency constraint strategy to unlabeled samples and updates the network parameters by minimizing a composite loss function, after which the system outputs disease nature classification and obstruction degree grading results. The application effectively combines the osseous anatomical structure with soft tissue pathological information, exploits unlabeled data to improve model performance, and achieves high-accuracy automatic diagnosis and grading of nasolacrimal duct obstruction.
Inventors
- BAI FANG
- WANG LIHUA
- ZHANG ZHISHENG
Assignees
- 高碑店市通远科技有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260209
Claims (10)
- 1. A nasolacrimal duct obstructive disease diagnosis and grading system with multi-modal image fusion, the system comprising: a data acquisition and multi-window preprocessing module configured to acquire computed tomography data of the head of a subject, perform spatial standardization, and generate an osseous channel input tensor and a soft tissue channel input tensor based on preset dual-window parameters; a dual-stream attention interaction network construction module configured to construct a dual-stream attention interaction network comprising an osseous feature extraction branch, a soft tissue feature extraction branch, and a topology-guided attention unit, wherein the osseous feature extraction branch and the soft tissue feature extraction branch extract osseous topological features and soft tissue pathological features from the osseous channel input tensor and the soft tissue channel input tensor, respectively, and the topology-guided attention unit generates a spatial attention mask based on the osseous topological features and performs weighted fusion of the soft tissue pathological features to generate a fused feature map; a semi-supervised model training module configured to construct a mixed data set, apply a consistency constraint strategy to unlabeled samples, and update the network parameters of the dual-stream attention interaction network by minimizing a composite loss function comprising a supervised loss and an unsupervised consistency loss, thereby obtaining a trained dual-stream attention interaction network; and an intelligent diagnosis and grading output module configured to input the osseous channel input tensor and the soft tissue channel input tensor of a patient to be diagnosed into the trained dual-stream attention interaction network and output disease nature classification and obstruction degree grading results.
- 2. The nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the data acquisition and multi-window preprocessing module generates the osseous and soft tissue channel input tensors as follows: for the osseous channel input tensor, a bone window lower limit is subtracted from each voxel value of the computed tomography data, the difference is divided by the bone window width, and the result is clipped and mapped to the range 0 to 1, thereby suppressing the soft tissue background and highlighting the osseous anatomical structure, wherein the bone window lower limit equals the bone window level minus half of the bone window width; for the soft tissue channel input tensor, a soft tissue window lower limit is subtracted from each voxel value of the computed tomography data, the difference is divided by the soft tissue window width, and the result is clipped and mapped to the range 0 to 1, thereby stretching the gray-scale contrast of the lacrimal mucosa and the intraluminal filling, wherein the soft tissue window lower limit equals the soft tissue window level minus half of the soft tissue window width; and the generated osseous channel input tensor and soft tissue channel input tensor are combined into parallel dual-channel input data.
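The dual-window mapping described above can be sketched as follows. The window settings used here are illustrative (typical bone and soft-tissue CT windows in Hounsfield units); the patent does not fix numeric values.

```python
def window_normalize(hu, level, width):
    # Lower limit = window level minus half of the window width (per claim 2).
    lower = level - width / 2.0
    x = (hu - lower) / width
    # Clip the mapped value to the [0, 1] interval.
    return max(0.0, min(1.0, x))

# Hypothetical windows (not from the patent): bone W/L = 2000/500 HU,
# soft tissue W/L = 400/40 HU, applied to a few sample voxel values.
bone_channel = [window_normalize(v, level=500, width=2000) for v in (-200, 500, 1800)]
soft_channel = [window_normalize(v, level=40, width=400) for v in (-200, 40, 300)]
```

A real system applies the same per-voxel mapping to the whole 3-D volume twice and stacks the two results as parallel input channels.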
- 3. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein generating the spatial attention mask based on the osseous topological features using the topology-guided attention unit and performing weighted fusion of the soft tissue pathological features to generate the fused feature map specifically comprises: compressing the multi-channel osseous topological features output by the osseous feature extraction branch into a single-channel feature map using a three-dimensional convolution layer; mapping the single-channel feature map into a spatial attention mask with values between 0 and 1 using a Sigmoid activation function, wherein the spatial attention mask characterizes the probability that each spatial position belongs to the osseous lacrimal passage structure; performing element-wise multiplication of the spatial attention mask with the soft tissue pathological features output by the soft tissue feature extraction branch to obtain a weighted feature map; and performing an element-wise residual addition of the weighted feature map and the original soft tissue pathological features to obtain the fused feature map.
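The mask-weight-residual sequence of claim 3 can be illustrated on a toy 1-D feature list (a real implementation operates on multi-channel 3-D tensors after the convolution step, which is omitted here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def topology_guided_fusion(bone_logits, soft_feats):
    # Spatial attention mask in (0, 1): response strength of each position
    # to the osseous lacrimal passage structure.
    mask = [sigmoid(b) for b in bone_logits]
    # Element-wise multiplication with the soft tissue pathological features.
    weighted = [m * f for m, f in zip(mask, soft_feats)]
    # Residual connection: element-wise addition of the original features.
    fused = [w + f for w, f in zip(weighted, soft_feats)]
    return mask, fused

mask, fused = topology_guided_fusion([0.0, 100.0], [2.0, 3.0])
```

Note the residual path guarantees that a zero mask suppresses, but never erases, the soft tissue features.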
- 4. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the intelligent diagnosis and grading output module further comprises a fully connected classification layer configured to: perform a global average pooling operation on the fused feature map, compressing the three-dimensional feature tensor into a one-dimensional feature vector; input the one-dimensional feature vector into a multi-layer perceptron network comprising two independent groups of output nodes, wherein the first group of output nodes outputs classification probabilities for primary obstruction and secondary obstruction, and the second group of output nodes outputs grading probabilities for patency, mild stenosis, and severe obstruction; and select the category with the highest probability in each group as the final structured diagnosis result.
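The two-headed output of claim 4 amounts to an independent softmax-and-argmax per group. A minimal sketch (the logits here are hypothetical; the MLP and pooling that produce them are omitted):

```python
import math

NATURE_CLASSES = ["primary obstruction", "secondary obstruction"]
GRADE_CLASSES = ["patency", "mild stenosis", "severe obstruction"]

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def diagnose(nature_logits, grade_logits):
    # Each output group is decided independently by taking the class
    # with the highest probability.
    nature_probs = softmax(nature_logits)
    grade_probs = softmax(grade_logits)
    return (NATURE_CLASSES[nature_probs.index(max(nature_probs))],
            GRADE_CLASSES[grade_probs.index(max(grade_probs))])
```

Splitting the head into two groups lets the nature classification and the degree grading be trained and read out without competing in a single softmax.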
- 5. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the consistency constraint strategy in the semi-supervised model training module specifically comprises: applying a weak augmentation operation and a strong augmentation operation, respectively, to the same unlabeled sample; inputting the weakly augmented sample into the network to obtain a prediction distribution, and taking the category corresponding to that distribution as a pseudo-label when its highest confidence exceeds a preset threshold; inputting the strongly augmented sample into the network to obtain a strong-augmentation prediction distribution; and constraining the network's prediction on the strongly augmented sample to be consistent with the pseudo-label by computing the cross-entropy loss between the strong-augmentation prediction distribution and the pseudo-label as the unsupervised consistency loss.
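The pseudo-labeling rule of claim 5 can be sketched as below; the confidence threshold of 0.95 is a hypothetical value (the patent only requires "a preset threshold"):

```python
import math

def consistency_loss(weak_probs, strong_probs, threshold=0.95):
    # Pseudo-label only when the weak-augmentation prediction is confident
    # enough; otherwise the sample contributes no unsupervised loss.
    confidence = max(weak_probs)
    if confidence < threshold:
        return None
    pseudo_label = weak_probs.index(confidence)
    # Cross-entropy against the one-hot pseudo-label reduces to
    # -log p(pseudo_label) under the strong-augmentation distribution.
    return -math.log(strong_probs[pseudo_label])
```

Low-confidence unlabeled samples are thus filtered out, which keeps unreliable pseudo-labels from corrupting training.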
- 6. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 5, wherein the strong augmentation operation specifically comprises: injecting additive Gaussian noise into the image to simulate quantum noise interference under low-dose scanning conditions; randomly selecting a rectangular region in the three-dimensional space of the image and setting the pixel values within it to zero, to simulate local absence or occlusion of anatomical structures; and applying a non-linear gamma transformation to the image pixel values, to simulate contrast differences caused by the reconstruction algorithms of different devices.
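The three strong augmentations of claim 6, sketched on a 1-D pixel list in [0, 1] (a real system acts on a 3-D volume with a randomly placed cutout box; all parameter values here are illustrative, not from the patent):

```python
import random

def strong_augment(pixels, gamma=1.5, noise_sigma=0.05, cut_span=(2, 4), seed=0):
    rng = random.Random(seed)
    # 1. Non-linear gamma transform: simulates contrast differences
    #    between reconstruction algorithms (pixels assumed in [0, 1]).
    out = [p ** gamma for p in pixels]
    # 2. Additive Gaussian noise: simulates quantum noise in low-dose scans.
    out = [p + rng.gauss(0.0, noise_sigma) for p in out]
    # 3. Cutout: zero a region (fixed here for clarity, random in practice)
    #    to simulate local absence or occlusion of anatomy.
    lo, hi = cut_span
    for i in range(lo, hi):
        out[i] = 0.0
    return out
```

In the FixMatch-style scheme of claim 5, these harsher perturbations are applied only to the branch whose prediction must match the pseudo-label.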
- 7. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the semi-supervised model training module is configured with a dynamic weighting mechanism when computing the composite loss function: the composite loss function is a weighted sum of the supervised loss on labeled samples and the unsupervised consistency loss on unlabeled samples; during training, a dynamic weight coefficient that gradually increases with the number of training epochs adjusts the proportion of the unsupervised consistency loss, assigning it a low weight in the early stage of training and a higher weight in the later stage, so that the distribution of the unlabeled data can be used to refine the model's decision boundary.
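One common realization of such a ramp-up is a Gaussian (sigmoid-shaped) schedule; the patent does not fix the schedule, so the form and constants below are assumptions:

```python
import math

def unsupervised_weight(epoch, total_epochs, w_max=1.0):
    # Ramp-up: near zero early in training, approaching w_max at the end.
    # The exp(-5 * (1 - t)^2) shape is a common choice, not mandated here.
    t = min(max(epoch / float(total_epochs), 0.0), 1.0)
    return w_max * math.exp(-5.0 * (1.0 - t) ** 2)

def composite_loss(supervised_loss, consistency_loss, epoch, total_epochs):
    # Weighted sum of supervised and unsupervised terms per claim 7.
    return supervised_loss + unsupervised_weight(epoch, total_epochs) * consistency_loss
```

Keeping the unsupervised weight small early prevents noisy pseudo-labels from dominating before the supervised signal has shaped the features.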
- 8. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the intelligent diagnosis and grading output module is further configured with a visual heat-map generation unit for: extracting the spatial attention mask generated by the topology-guided attention unit; upsampling the spatial attention mask to the same spatial resolution as the original computed tomography data using a bilinear interpolation algorithm; and mapping the upsampled mask to a pseudo-color heat map that is translucently superimposed on the gray-scale image of the soft tissue channel using alpha blending, displaying the intraluminal region of the nasolacrimal duct attended to by the model.
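The alpha-blending step can be sketched per pixel as follows (the opacity value is a hypothetical choice; color-mapping and upsampling are omitted):

```python
def alpha_blend(gray, heat_rgb, alpha=0.4):
    # Treat the grayscale value as an (r, g, b) triple and blend the
    # pseudo-color heat value over it with opacity alpha.
    base = (gray, gray, gray)
    return tuple(alpha * h + (1.0 - alpha) * b for h, b in zip(heat_rgb, base))

# A mid-gray soft-tissue pixel under a fully red heat value at 50% opacity.
pixel = alpha_blend(0.5, (1.0, 0.0, 0.0), alpha=0.5)
```

Because the overlay is translucent, the underlying soft tissue window remains readable beneath the highlighted lacrimal passage region.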
- 9. The multi-modal image fusion nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, further comprising an expert collaborative verification mechanism configured to: screen diagnosis results according to the confidence score output by the model, and mark cases with confidence below a preset threshold as cases to be verified; provide an interactive interface that receives, for a case to be verified, a diagnosis category corrected by a physician or an adjusted heat-map attention region; store physician-corrected cases as difficult samples in an incremental training database; and, when fine-tuning the network on the incremental training database, assign difficult samples a higher loss weight than ordinary samples.
- 10. The nasolacrimal duct obstructive disease diagnosis and grading system of claim 1, wherein the spatial standardization in the data acquisition and multi-window preprocessing module specifically comprises: registering the head computed tomography data to a standard anatomical space using a rigid registration algorithm, with the line from the anterior commissure to the posterior commissure selected as the reference for rotation correction; and cropping a voxel block containing the bilateral lacrimal systems, centered on the midpoint of the line connecting the bilateral medial canthi, and resampling the cropped voxel block to a uniform spatial resolution, thereby eliminating slice thickness and resolution differences caused by different acquisition devices.
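The resampling step can be illustrated on a 1-D intensity profile (real systems resample 3-D volumes, typically with trilinear interpolation; nearest-neighbour is used here only to keep the sketch short):

```python
def resample_1d(profile, src_spacing, dst_spacing):
    # Nearest-neighbour resampling of a 1-D intensity profile from a
    # source voxel spacing to a uniform destination spacing (in mm).
    length_mm = len(profile) * src_spacing
    n_out = int(round(length_mm / dst_spacing))
    return [profile[min(int(i * dst_spacing / src_spacing), len(profile) - 1)]
            for i in range(n_out)]
```

Downsampling a 1 mm profile to 2 mm halves the sample count; upsampling to 0.5 mm doubles it, so scanners with different slice thicknesses yield inputs on the same grid.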
Description
Multi-mode image fusion nasolacrimal duct obstructive disease diagnosis and grading system
Technical Field
The invention relates to the technical field of medical image processing, and in particular to a nasolacrimal duct obstructive disease diagnosis and grading system with multi-modal image fusion.
Background
Nasolacrimal duct obstruction is a common lacrimal passage disease in ophthalmology. It can cause symptoms such as epiphora and acute or chronic dacryocystitis, and seriously affects patients' quality of life. In clinical diagnosis and treatment, computed tomography, in particular lacrimal CT imaging, is an important imaging basis for judging the location, nature, and degree of the obstruction. However, the nasolacrimal duct region is anatomically complex, with a large density difference between the bony passage and the internal soft tissues (e.g., mucosa, obstructing material, contrast agent). When processing such image data with a deep learning model, the prior art typically uses an image at a single window level setting as input, or simply normalizes the CT values. This approach struggles to combine the positioning information of the osseous structure with the texture details of soft tissue lesions. If the emphasis is on a bone window showing the bony structure, the contrast of the soft tissue in the lumen is too low to distinguish mucosal thickening from fluid filling; if the emphasis is on a soft tissue window showing the soft tissue, the surrounding high-density bone can generate artifacts or mask anatomical boundaries, so critical diagnostic information is lost at the feature extraction stage. In addition, the nasolacrimal duct occupies a small space and is adjacent to complex structures such as the maxillary sinus, turbinates, and orbits.
Most current automated diagnostic algorithms adopt a general convolutional neural network architecture and lack an attention mechanism for the lumen topology. During feature extraction, the network treats all regions equally and is easily interfered with by surrounding irrelevant anatomical structures or noise, making the grading of stenosis degree inaccurate. Meanwhile, accurate labeling of medical images requires a great deal of expert effort; existing methods rely on fully supervised learning, and when labeled data are scarce, the generalization ability and robustness of the model are often insufficient for practical clinical application.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a multi-modal image fusion diagnosis and grading system for nasolacrimal duct obstructive disease. It addresses the problems in the prior art, when processing nasolacrimal duct CT images, of low diagnostic accuracy and imprecise grading caused by the difficulty of simultaneously capturing bone structure positioning and soft tissue pathological features, susceptibility to interference from the surrounding complex anatomical environment, and the limitation of high labeling cost. To achieve the above purpose, the invention is realized by the following technical scheme. The invention provides a nasolacrimal duct obstructive disease diagnosis and grading system with multi-modal image fusion. The system comprises a data acquisition and multi-window preprocessing module, a dual-stream attention interaction network construction module, a semi-supervised model training module, and an intelligent diagnosis and grading output module. In the data acquisition and multi-window preprocessing stage, the system acquires computed tomography data of the subject's head.
To eliminate the slice thickness and resolution differences introduced by different acquisition devices, the data are first registered to a standard anatomical space by a rigid registration algorithm, with the line from the anterior commissure to the posterior commissure selected as the reference for rotation correction; a voxel block containing the bilateral lacrimal systems is then cropped, centered on the midpoint of the line connecting the bilateral medial canthi, and resampled to a uniform spatial resolution. On this basis, the system generates an osseous channel input tensor and a soft tissue channel input tensor from preset dual-window parameters. The tensor generation logic is as follows: for the osseous channel input tensor, the bone window lower limit is subtracted from each voxel value and the difference is divided by the bone window width, with the result clipped and mapped to the interval 0 to 1, suppressing the soft tissue background and highlighting the osseous anatomical structure; for the soft tissue channel input tensor, the soft tissue window lower limit is subtracted from each voxel value and the difference is divided by the soft tissue window width, with the result clipped and mapped to the interval 0 to 1, stretching the gray-scale contrast of the lacrimal mucosa and the intraluminal filling.