CN-121481964-B - Lesion segmentation method, system, equipment and medium based on ear-nose-throat endoscope image
Abstract
The invention relates to a lesion segmentation method, system, device, and medium based on ear-nose-throat endoscope images. Dual-spectrum images are acquired synchronously through time-sequence triggering, addressing the occlusion of anatomical structures caused by flowing mucus. A dynamic mucus displacement field, modeled from pixel gradients, eliminates the misclassification of static lesions versus dynamic secretions that affects traditional segmentation methods. A deformable convolution layer corrects the spatial offset between white-light and narrow-band images, overcoming the multi-modal feature mismatch caused by optical scattering. Finally, real-time surgical navigation marks and a clinical treatment scheme are generated synchronously from the topological attributes of the lesion probability map, closing the loop from image analysis to diagnosis-and-treatment decision-making. The method integrates mucus interference suppression, accurate cross-modal registration, and real-time diagnostic assistance, and significantly improves both lesion recognition accuracy and clinical workflow efficiency for endoscopic imaging.
Inventors
- REN LIHUA
- YOU CHUNYAN
- ZHANG MINGMING
Assignees
- 河北省中医院(河北中医药大学第一附属医院、河北省青少年儿童脊柱侧弯防控中心)
Dates
- Publication Date: 2026-05-08
- Application Date: 2025-11-07
Claims (9)
- 1. A lesion segmentation method based on ear-nose-throat endoscope images, characterized by comprising the following steps: S1, synchronously acquiring the time-sequence trigger signal of the endoscope light source, and capturing time-aligned dual-spectrum images by alternately driving a white light source and a narrow-band filter device, to generate a time-aligned white-light image sequence and a narrow-band imaging image; S2, performing mucus displacement modeling on the white-light image sequence, and calculating the displacement vectors of dynamic mucus from the pixel gradient changes of adjacent frames, to generate a mucus motion vector field; S3, performing mask restoration on the mucus motion vector field and the white-light image sequence, segmenting the mucus-covered area by a displacement amplitude threshold, and reconstructing the occluded anatomical-structure pixels, to generate an optimized white-light image free of mucus interference; S4, performing cross-modal feature registration on the optimized white-light image and the narrow-band imaging image, extracting a white-light feature map and a narrow-band feature map respectively, and correcting the spatial offset between the feature maps through a deformable convolution layer, to generate a registered multi-modal feature map; S5, performing lesion segmentation on the multi-modal feature map, generating a lesion probability map through feature-channel concatenation and a U-shaped decoding network, and outputting a real-time navigation mark overlaid on the endoscope video based on the spatial coordinates and area attributes of the connected domains in the lesion probability map; wherein the real-time navigation mark indicates the anatomical location of the lesion and the grading of its malignancy probability; and wherein the mask restoration on the mucus motion vector field and the white-light image sequence comprises the following steps: S31, performing amplitude-adaptive segmentation on the mucus motion vector field, and generating a dynamic threshold mask boundary related to the adhesion strength of the mucus by calculating the spatio-temporal accumulation of motion vectors in a local neighborhood; S32, performing anatomical-structure reconstruction on the area covered by the dynamic threshold mask boundary, and recovering the continuity of the mucosal gland pattern in the occluded area through a neighborhood healthy-tissue texture propagation algorithm, to generate a preliminarily reconstructed white-light image; and S33, performing edge-consistency optimization on the reconstructed white-light image, eliminating step artifacts in texture transition regions through multi-scale gradient fusion, to generate an optimized white-light image with continuous anatomical structure.
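The displacement-amplitude masking of S31 and the texture-propagation repair of S32 can be illustrated with a minimal NumPy sketch. The box-filter window, the adaptive threshold `mean + k * std`, and the diffusion-style fill below are illustrative assumptions, not parameters disclosed by the patent:

```python
import numpy as np

def mucus_mask(flow, win=5, k=1.5):
    """Segment the mucus-covered area from a motion vector field (H, W, 2).

    The 'spatio-temporal accumulation' of S31 is approximated by a
    box-filtered displacement magnitude; the threshold adapts to the
    global statistics of the field (the factor k is a hypothetical choice).
    """
    mag = np.linalg.norm(flow, axis=-1)
    pad = win // 2
    padded = np.pad(mag, pad, mode="edge")
    acc = np.zeros_like(mag)
    for dy in range(win):            # box filter = local motion-energy accumulation
        for dx in range(win):
            acc += padded[dy:dy + mag.shape[0], dx:dx + mag.shape[1]]
    acc /= win * win
    thr = acc.mean() + k * acc.std()  # adaptive displacement-amplitude threshold
    return acc > thr

def propagate_fill(img, mask, iters=50):
    """Reconstruct occluded pixels by iteratively diffusing the mean of
    known 4-neighbours into the masked region -- a minimal stand-in for the
    patent's 'healthy tissue texture propagation' (np.roll wraps at the
    border; acceptable for this interior-hole sketch)."""
    out = img.astype(np.float64).copy()
    out[mask] = 0.0
    known = ~mask
    for _ in range(iters):
        up    = np.roll(out, -1, axis=0); kup    = np.roll(known, -1, axis=0)
        down  = np.roll(out,  1, axis=0); kdown  = np.roll(known,  1, axis=0)
        left  = np.roll(out, -1, axis=1); kleft  = np.roll(known, -1, axis=1)
        right = np.roll(out,  1, axis=1); kright = np.roll(known,  1, axis=1)
        s = up * kup + down * kdown + left * kleft + right * kright
        n = kup.astype(np.float64) + kdown + kleft + kright  # count of known neighbours
        fill = (n > 0) & mask
        out[fill] = s[fill] / n[fill]
        known = known | fill
    return out
```

The multi-scale gradient fusion of S33 (seam smoothing at the mask boundary) is omitted here; the fill alone already restores piecewise-smooth continuity.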
- 2. The method according to claim 1, wherein S1 comprises: S11, performing light-source time-sharing trigger processing on the control signal of the endoscope light source, and generating a time-sharing imaging control instruction without spectral crosstalk by alternately switching the driving-current phases of the white light source and the narrow-band filter; S12, inputting the time-sharing imaging control instruction into the image-sensor driving circuit of the endoscope imaging system, performing frame synchronization on the raw photoelectric-conversion data captured by the sensor, and generating a dual-spectrum image with aligned exposure timing by compensating the initial time offset between white-light and narrow-band exposures; and S13, performing motion-artifact detection on the dual-spectrum image, removing distorted frames caused by organ tremor through cross-correlation analysis of adjacent frames of the white-light channel and the narrow-band channel, and generating a time-aligned white-light image sequence and narrow-band imaging image.
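The motion-artifact rejection of S13 rests on cross-correlation between adjacent frames. A minimal sketch, assuming a zero-mean normalized cross-correlation score and a hypothetical cut-off `thr`:

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalized cross-correlation between two frames (in [-1, 1])."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def drop_distorted(frames, thr=0.9):
    """Keep a frame only if it correlates well with the last kept frame;
    a low NCC is read as organ-tremor distortion (thr is an illustrative
    cut-off, not taken from the patent)."""
    kept = [frames[0]]
    for f in frames[1:]:
        if ncc(kept[-1], f) >= thr:
            kept.append(f)
    return kept
```

Comparing against the last kept frame rather than the immediate predecessor avoids chaining through an already-rejected frame.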
- 3. The method according to claim 1, wherein S2 comprises: S21, performing mucus rheological-characteristic analysis on the white-light image sequence, and calculating the constitutive-equation parameters of mucus shear stress and strain rate from the non-rigid transformation of pixel gradient fields between adjacent frames, to generate mucus viscoelasticity parameters; S22, performing displacement-field modeling on the viscoelasticity parameters, solving the displacement-vector equation under viscous fluid-dynamics constraints, substituting the viscoelasticity parameters into a preset Oldroyd-B model to calculate the stress-tensor distribution of the viscoelastic flow, and generating a dynamic mucus displacement field; and S23, performing noise suppression on the dynamic mucus displacement field, separating anatomical-structure motion from mucus-flow components through low-pass filtering, filtering out abnormal pulsation signals whose frequency exceeds a physiological-motion threshold, and generating a denoised mucus motion vector field.
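The noise-suppression step S23 separates slow anatomical motion from fast mucus pulsation by low-pass filtering. A simple FFT-based sketch for a one-dimensional displacement trace, where the cut-off frequency is an illustrative stand-in for the patent's "physiological motion threshold":

```python
import numpy as np

def lowpass_displacement(signal, fs, f_cut):
    """Zero all FFT bins above a physiological cut-off frequency.

    signal : 1-D displacement trace sampled at fs Hz
    f_cut  : cut-off in Hz; components above it are treated as abnormal
             pulsation and removed.
    """
    spec = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spec[freqs > f_cut] = 0.0            # suppress supra-physiological pulsation
    return np.fft.irfft(spec, n=len(signal))
```

In the full 2-D vector-field setting the same filter would be applied per pixel along the time axis; the 1-D form keeps the idea visible.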
- 4. The method according to claim 1, wherein S4 comprises: S41, extracting multi-resolution features from the optimized white-light image, decomposing it with a convolution pyramid to generate multi-scale feature responses covering macroscopic anatomical structure down to microscopic texture, and generating a cross-scale white-light feature map; S42, performing frequency-domain feature enhancement on the narrow-band imaging image, enhancing the frequency-domain response of the vessel morphology corresponding to the 415 nm and 540 nm bands and of the gland openings on the mucosal surface through band-pass filtering, to generate an enhanced narrow-band feature map; and S43, performing spatial-offset correction on the white-light feature map and the enhanced narrow-band feature map, learning the feature deformation parameters between the white-light and narrow-band features through a deformable convolution kernel, to generate a registered multi-modal feature map.
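The band-pass enhancement of S42 can be sketched in the spatial-frequency domain. The radial band limits and gain below are hypothetical; the mapping from the 415 nm and 540 nm optical bands to spatial-frequency bands is not specified by the patent and is not attempted here:

```python
import numpy as np

def bandpass_enhance(img, r_lo, r_hi, gain=2.0):
    """Amplify a radial band of spatial frequencies in a 2-D image,
    a stand-in for enhancing vessel-scale structure in the narrow-band
    channel (r_lo, r_hi in cycles/pixel; gain is an assumed parameter)."""
    H, W = img.shape
    F = np.fft.fft2(img)
    fy = np.fft.fftfreq(H)[:, None]
    fx = np.fft.fftfreq(W)[None, :]
    r = np.sqrt(fx ** 2 + fy ** 2)            # radial spatial frequency
    band = (r >= r_lo) & (r <= r_hi)
    F = np.where(band, gain * F, F)           # boost only the chosen band
    return np.real(np.fft.ifft2(F))
```

Because the band mask is symmetric in frequency, the inverse transform is real up to floating-point error.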
- 5. The method of claim 4, wherein the feature deformation parameters are calculated by the formulas: Δp = f_offset(F_nb); F_def(p) = Σ_{p_k ∈ R} Σ_q G(q, p + p_k + Δp_k) · F_wl(q); wherein F_def is the feature deformation parameter (the deformed feature response), p is the position coordinate, R is the 3 × 3 sampling grid, G(·, ·) is the bilinear interpolation weight, F_nb is the narrow-band feature from which the offsets are predicted, Δp_k is the learnable offset for the k-th grid position, and F_wl is the white-light feature.
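The bilinear sampling G(q, ·) and the summation over the 3 × 3 grid R can be sketched as a plain forward pass. The offsets, which a deformable convolution would learn from the narrow-band feature, are passed in here as ordinary numbers:

```python
import numpy as np

def bilinear(F, y, x):
    """Bilinear interpolation of feature map F at a real-valued location
    (y, x), clamped to the image border -- this plays the role of G(q, p)."""
    H, W = F.shape
    y = min(max(y, 0.0), H - 1.0)
    x = min(max(x, 0.0), W - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    wy, wx = y - y0, x - x0
    return ((1 - wy) * (1 - wx) * F[y0, x0] + (1 - wy) * wx * F[y0, x1]
            + wy * (1 - wx) * F[y1, x0] + wy * wx * F[y1, x1])

def deformable_sample(F_wl, p, offsets):
    """Sum F_wl over the 3x3 grid R shifted by offsets dp_k -- the inner
    sampling of the claim-5 formula (unit kernel weights for clarity)."""
    R = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
    total = 0.0
    for (dy, dx), (oy, ox) in zip(R, offsets):
        total += bilinear(F_wl, p[0] + dy + oy, p[1] + dx + ox)
    return total
```

With all offsets at zero this degenerates to an ordinary 3 × 3 box sum, which makes the role of the learned Δp_k easy to see.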
- 6. The method according to any one of claims 1 to 5, wherein S5 comprises: S51, performing feature-channel concatenation on the multi-modal feature maps, generating a fused feature tensor by concatenating the white-light and narrow-band feature maps along the channel axis, inputting the fused feature tensor into a U-shaped decoding network for up-sampling and feature fusion, and generating a lesion probability map; S52, performing connected-domain attribute analysis on the lesion probability map, calculating the centroid coordinates and pixel area of each connected domain by scanning the probability map for connected domains above a set threshold, to generate quantified attributes of lesion location and size; and S53, generating a real-time navigation mark based on the quantified attributes, mapping the lesion centroid coordinates to the endoscope video coordinate system through three-dimensional projection transformation, and generating the real-time navigation mark for the endoscope video.
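The connected-domain scan of S52 — threshold the probability map, label 4-connected domains, and report centroid and pixel area — can be sketched as:

```python
import numpy as np
from collections import deque

def lesion_components(prob, thr=0.5):
    """Scan the lesion probability map for 4-connected domains above thr
    and return one (centroid_y, centroid_x, pixel_area) tuple per domain,
    in row-major discovery order."""
    H, W = prob.shape
    fg = prob > thr
    seen = np.zeros((H, W), dtype=bool)
    out = []
    for sy in range(H):
        for sx in range(W):
            if fg[sy, sx] and not seen[sy, sx]:
                q = deque([(sy, sx)])       # BFS flood fill of one domain
                seen[sy, sx] = True
                ys, xs = [], []
                while q:
                    y, x = q.popleft()
                    ys.append(y)
                    xs.append(x)
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < H and 0 <= nx < W and fg[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                out.append((float(np.mean(ys)), float(np.mean(xs)), len(ys)))
    return out
```

The centroid and area are exactly the "quantified attributes" S53 would project into the video coordinate system.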
- 7. A lesion segmentation system based on ear-nose-throat endoscope images, the system comprising: a dual-spectrum image synchronous acquisition module, configured to synchronously acquire the time-sequence trigger signal of the endoscope light source and capture time-aligned dual-spectrum images by alternately driving a white light source and a narrow-band filter device, generating a time-aligned white-light image sequence and a narrow-band imaging image; a mucus displacement modeling module, configured to perform mucus displacement modeling on the white-light image sequence and calculate the displacement vectors of dynamic mucus from the pixel gradient changes of adjacent frames, generating a mucus motion vector field; a mucus mask restoration and image optimization module, configured to perform mask restoration on the mucus motion vector field and the white-light image sequence, segment the mucus-covered area by a displacement amplitude threshold, and reconstruct the occluded anatomical-structure pixels, generating an optimized white-light image free of mucus interference; a cross-modal feature registration module, configured to perform cross-modal feature registration on the optimized white-light image and the narrow-band imaging image, extract a white-light feature map and a narrow-band feature map respectively, and correct the spatial offset between the feature maps through a deformable convolution layer, generating a registered multi-modal feature map; and a lesion segmentation and diagnosis scheme generation module, configured to perform lesion segmentation on the multi-modal feature map, generate a lesion probability map through feature-channel concatenation and a U-shaped decoding network, and output a real-time navigation mark overlaid on the endoscope video based on the spatial coordinates and area attributes of the connected domains in the lesion probability map; wherein the real-time navigation mark indicates the anatomical location of the lesion and the grading of its malignancy probability; and wherein the mask restoration comprises: performing amplitude-adaptive segmentation on the mucus motion vector field, and generating a dynamic threshold mask boundary related to the mucus adhesion strength by calculating the spatio-temporal accumulation of motion vectors in a local neighborhood; performing anatomical-structure reconstruction on the area covered by the dynamic threshold mask boundary, and recovering the continuity of the mucosal gland pattern in the occluded area through a neighborhood healthy-tissue texture propagation algorithm, to generate a preliminarily reconstructed white-light image; and performing edge-consistency optimization on the reconstructed white-light image, eliminating step artifacts in texture transition regions through multi-scale gradient fusion, to generate an optimized white-light image with continuous anatomical structure.
- 8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the method of any one of claims 1 to 6 when executing the computer program.
- 9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method of any one of claims 1 to 6.
Description
Lesion segmentation method, system, equipment and medium based on ear-nose-throat endoscope image

Technical Field

The invention relates to the technical field of medical image processing, in particular to a lesion segmentation method, system, device, and medium based on ear-nose-throat endoscope images.

Background

Ear-nose-throat endoscopic imaging is a key means of clinical diagnosis, providing doctors with visual information about the mucosal surface and vascular structure through the cooperative imaging of white light and a narrow-band spectrum. The traditional workflow relies on doctors reading real-time endoscope images with the naked eye, combining the anatomical-structure information of white-light imaging with the vessel-enhancement characteristics of narrow-band imaging to locate lesions and judge their nature. However, the complex physiological environment of the ear-nose-throat cavity, in particular dynamic mucus coverage, organ peristalsis, and the spatial mismatch of multi-modal images, severely restricts diagnostic accuracy and the reliability of real-time surgical assistance.
In the prior art, endoscope image processing faces three bottlenecks. First, the fluidity and adhesiveness of mucus in the cavity partially or completely occludes anatomical structures in white-light images; traditional threshold segmentation cannot distinguish dynamic mucus from static lesions, so the lesion miss rate rises markedly. Second, differences in optical scattering characteristics cause sub-pixel spatial offsets between white-light and narrow-band images; existing registration algorithms (such as affine transformation) struggle to compensate for the nonlinear misalignment caused by mucosal deformation, so multi-modal feature fusion fails. Third, lesion segmentation, surgical navigation, and treatment-scheme generation run as independent systems; doctors must label lesion positions manually and consult guidelines, which delays intra-operative decisions and introduces subjective error.

Disclosure of Invention

In view of the above, the invention aims to provide a lesion segmentation method, system, device, and medium based on ear-nose-throat endoscope images that simultaneously address mucus interference elimination, accurate multi-modal registration, and a real-time diagnosis-and-treatment closed loop.
The invention adopts the following scheme. In a first aspect, the invention provides a lesion segmentation method based on ear-nose-throat endoscope images, comprising the steps of: S1, synchronously acquiring the time-sequence trigger signal of the endoscope light source, and capturing time-aligned dual-spectrum images by alternately driving a white light source and a narrow-band filter device, to generate a time-aligned white-light image sequence and a narrow-band imaging image; S2, performing mucus displacement modeling on the white-light image sequence, and calculating the displacement vectors of dynamic mucus from the pixel gradient changes of adjacent frames, to generate a mucus motion vector field; S3, performing mask restoration on the mucus motion vector field and the white-light image sequence, segmenting the mucus-covered area by a displacement amplitude threshold, and reconstructing the occluded anatomical-structure pixels, to generate an optimized white-light image free of mucus interference; S4, performing cross-modal feature registration on the optimized white-light image and the narrow-band imaging image, extracting a white-light feature map and a narrow-band feature map respectively, and correcting the spatial offset between the feature maps through a deformable convolution layer, to generate a registered multi-modal feature map; S5, performing lesion segmentation on the multi-modal feature map, generating a lesion probability map through feature-channel concatenation and a U-shaped decoding network, and outputting, based on the spatial coordinates and area attributes of the connected domains in the lesion probability map, a real-time navigation mark overlaid on the endoscope video and a treatment decision scheme matched with clinical guidelines. The real-time navigation mark indicates the anatomical location of the lesion and the grading of its malignancy probability, and the treatment decision scheme indicates the surgical or pharmaceutical intervention matched with the lesion type and size.

In one embodiment, the lesion segmentation method based on ear-nose-throat endoscope images provided by the invention specifically comprises the following steps: S11, performing light-source time-sharing trigger processing on the control signal of the endoscope light source, and generating a time-sharing imaging control instruction without spectral crosstalk