CN-121837302-B - Tooth image segmentation method combined with SAM2 algorithm
Abstract
A tooth image segmentation method combined with the SAM2 algorithm comprises the following steps: acquiring the dental image sequence to be segmented and applying standardized preprocessing; inputting the preprocessed images into a deep learning model whose core encoder is built on SAM2 and augmented with a dual-branch feature enhancement structure; performing encoding and local feature enhancement through the encoder; decoding the encoded and locally enhanced features, fusing them across scales, restoring the spatial resolution of the image, and generating a segmentation mask, thereby completing construction of the model; inputting a data set into the model for supervised training; and using the trained model to segment the dental images under test.
Inventors
- HU YUNHUA
- LIAO JUNYI
- AN JIAN
- ZHANG ZHENG
Assignees
- Southwest Petroleum University (西南石油大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-03-13
Claims (7)
- 1. A dental image segmentation method combined with the SAM2 algorithm, comprising the steps of: step S1, acquiring a dental image sequence to be segmented and applying standardized preprocessing; S2, inputting the preprocessed image into a deep learning model and performing encoding and local feature enhancement through an encoder; S3, decoding the encoded and locally enhanced features, fusing them across scales, restoring the spatial resolution of the image, and generating a segmentation mask, thereby completing construction of the model; S4, inputting a data set into the model for supervised training; S5, using the trained model to segment the dental image under test. The core encoder of the deep learning model in step S2 is constructed based on SAM2, and a TSA module and an HCPM module are arranged in parallel branches at the tail end of the encoder. The TSA module enhances the spatial perception and long-range dependency modeling capacity of the model: by introducing position codes, the spatial coordinate information of each feature point is explicitly embedded, so that the geometric distribution and relative positions of the teeth in the jawbone can be accurately understood. Its core calculation is the scaled dot-product attention: Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V, wherein Q, K, and V denote the query, key, and value matrices, respectively; sqrt(d_k) denotes a scaling factor; Attention() denotes the attention mechanism function; softmax() denotes the normalization function; and K^T denotes the transpose of the key matrix. The HCPM module comprises a depth-wise convolution and a standard convolution arranged in parallel to extract features of different receptive fields, wherein the depth-wise convolution captures local details such as enamel texture and tooth-root morphology, and the standard convolution integrates features of the alveolar-bone density distribution area. The core calculation of the HCPM module is: F_std = Conv(X); F_dil = DConv(X); F_fuse = F_std + F_dil; A = sigma(PWConv(PWConv_v(F_fuse) + PWConv_h(F_fuse))); Y = A ⊙ F_fuse, wherein F_std denotes the features after standard-convolution processing; Conv() a standard convolution operation; F_dil the features after dilated-convolution processing; DConv() a dilated convolution operation; F_fuse the fusion features obtained by adding the standard- and dilated-convolution features; X an input feature; A the attention weight generated by the BAM submodule; sigma a sigmoid activation function; PWConv a point-by-point convolution; PWConv_v a point-by-point convolution with vertical compression; PWConv_h a point-by-point convolution with horizontal compression; ⊙ element-wise weighting; and Y the HCPM output features.
- 2. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 1, wherein the dental image is a cone-beam computed tomography (CBCT) image.
- 3. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 1, wherein the decoding and up-sampling stage of the deep learning model in step S2 is further provided with an AMFB module, the AMFB module comprising two parts, AFFM and CMFF, arranged in sequence.
- 4. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 3, wherein AFFM receives feature inputs of different scales and performs weighted fusion on them to obtain fused features, the core calculation being: F = alpha ⊙ F_high + beta ⊙ F_low, wherein alpha and beta respectively denote the adaptive weights that AFFM learns for the high-level and low-level features; ⊙ denotes element-by-element weighting; and F_high and F_low respectively denote the high-level semantic features and the low-level detail features.
- 5. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 4, wherein CMFF is arranged in AMFB by being repeatedly stacked three times for multi-scale convolution and attention enhancement of the fused features produced by AFFM, the core calculation of CMFF being: F_0 = AFFM(·); F_i = MSCB_i(F'_{i-1}); W_i = BAM(F_i); F'_i = W_i ⊙ F_i, for i = 1, 2, 3; Y = F'_3, wherein F_0 denotes the output features after AFFM fusion (with F'_0 = F_0); AFFM() denotes the adaptive feature fusion module; F_i denotes the features output by the i-th MSCB; MSCB_i() denotes the i-th multi-scale convolution block; i denotes the stacking sequence number; W_i denotes the spatial attention weights generated by the BAM inside CMFF; F'_i denotes the i-th attention-weighted feature; and Y denotes the AMFB output features.
- 6. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 3, wherein the feature input received by the AMFB module is processed by the AFFM and CMFF parts in sequence and added, and the expressive capability of the features is further improved by channel grouping and a shuffle operation; the operating principle of this process is: F_s = Shuffle(Conv(X) + DConv(X)), wherein X denotes an input feature; Conv() denotes a standard convolution operation; DConv() denotes a dilated convolution operation; F_s denotes the features after the shuffle operation; and Shuffle() denotes a shuffle operation used to enhance inter-channel interaction.
- 7. The method for segmenting a dental image combined with the SAM2 algorithm according to claim 1, wherein the loss function of the supervised training in step S4 is a weighted combination loss function mul_loss composed of the four loss functions BCELoss, IoULoss, DiceLoss, and BoundaryLoss: L_mul = lambda_1·L_BCE + lambda_2·L_IoU + lambda_3·L_Dice + lambda_4·L_Boundary, wherein L_mul denotes the weighted combination loss function mul_loss; L_BCE the loss function BCELoss; L_IoU the loss function IoULoss; L_Dice the loss function DiceLoss; L_Boundary the loss function BoundaryLoss; and lambda_1 through lambda_4 the combination weights. The loss function BCELoss measures the classification error between the pixel probabilities output by the model and the real labels: L_BCE = -(1/N) Σ_i [ y_i·log(p_i) + (1 - y_i)·log(1 - p_i) ], wherein N denotes the total number of pixels, p_i the predicted probability of pixel i, and y_i its real label. The loss function IoULoss controls the degree of overlap of the predicted region and the real region, encouraging the network to match the real label in overall segmentation shape: L_IoU = 1 - (Σ_i p_i·y_i) / (Σ_i p_i + Σ_i y_i - Σ_i p_i·y_i). The loss function DiceLoss reflects the region-overlap condition: L_Dice = 1 - (2·Σ_i p_i·y_i + epsilon) / (Σ_i p_i + Σ_i y_i + epsilon), wherein epsilon denotes a smoothing term preventing division-by-zero errors. The loss function BoundaryLoss calculates the boundary-gradient difference between the prediction and the real label: L_Boundary = (1/N) Σ_i | ∇S(P)_i - ∇S(G)_i |, wherein ∇S denotes the Sobel gradient operator, and ∇S(P) and ∇S(G) denote the boundary gradients of the model-predicted labels and of the real labels, respectively.
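As a rough illustration of the adaptive weighted fusion described in claim 4, the NumPy sketch below fuses a high-level and a low-level feature map with weights alpha and beta that sum to one. The rule used here to produce the weights (a softmax over per-branch mean responses) is an assumption for illustration only; the patent states only that AFFM learns them, and a real implementation would learn per-element weight maps.

```python
import numpy as np

def affm_fuse(f_high, f_low):
    """Weighted fusion F = alpha*F_high + beta*F_low with alpha + beta = 1.
    The weight-generation rule is hypothetical (softmax over pooled
    per-branch statistics); the patent only says the weights are learned."""
    logits = np.array([f_high.mean(), f_low.mean()])
    exp = np.exp(logits - logits.max())          # numerically stable softmax
    alpha, beta = exp / exp.sum()
    return alpha * f_high + beta * f_low, (alpha, beta)

# Toy inputs: high-level semantic features and low-level detail features.
f_high = np.ones((4, 8, 8))
f_low = np.zeros((4, 8, 8))
fused, (alpha, beta) = affm_fuse(f_high, f_low)
```

The fused map keeps the input shape, and the two weights always form a convex combination, so neither branch can be amplified beyond its input scale.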
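The three-fold stacking of multi-scale convolution and attention weighting in claim 5 can be sketched as below. The `mscb` and `bam_weights` functions are toy stand-ins (box filters of several widths, and a sigmoid over the channel-mean response); the patent does not specify the internals of MSCB or BAM, only the stack-then-weight pattern.

```python
import numpy as np

def mscb(x, kernel_sizes=(1, 3, 5)):
    """Toy multi-scale convolution block: average of 1-D box filters of
    several widths along the last axis (a stand-in for real multi-scale convs)."""
    outs = []
    for k in kernel_sizes:
        kernel = np.ones(k) / k
        outs.append(np.apply_along_axis(
            lambda v: np.convolve(v, kernel, mode="same"), -1, x))
    return np.mean(outs, axis=0)

def bam_weights(x):
    """Toy spatial attention: sigmoid of the channel-mean response."""
    return 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))

def cmff(f0, repeats=3):
    """Stack (MSCB -> attention weighting) three times, as claim 5 describes."""
    f = f0
    for _ in range(repeats):
        fi = mscb(f)                 # F_i = MSCB_i(F'_{i-1})
        f = bam_weights(fi) * fi     # F'_i = W_i (elementwise) F_i
    return f

f0 = np.random.default_rng(1).standard_normal((4, 32))  # fused AFFM output
y = cmff(f0)
```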
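The channel grouping and shuffle operation named in claim 6 is commonly implemented as the ShuffleNet-style reshape–transpose–reshape shown below; the patent does not spell out the implementation, so this is an assumed but standard realization.

```python
import numpy as np

def channel_shuffle(x, groups):
    """Channel shuffle: split C channels into `groups`, transpose the group
    axis with the per-group channel axis, and flatten back, so that channels
    from different groups are interleaved (as in ShuffleNet)."""
    c, h, w = x.shape
    assert c % groups == 0, "channel count must divide evenly into groups"
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))

# 8 channels in 2 groups: channel order becomes [0, 4, 1, 5, 2, 6, 3, 7].
x = np.arange(8 * 2 * 2, dtype=float).reshape(8, 2, 2)
shuffled = channel_shuffle(x, groups=2)
```

The operation is a pure permutation: no values change, only their channel positions, which is what lets subsequent grouped convolutions see information from every group.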
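The four-term combined loss of claim 7 can be sketched in NumPy as follows. The equal combination weights are purely illustrative (the patent leaves them unspecified), and `boundary_loss` uses simple finite-difference gradients via `np.gradient` as a stand-in for the Sobel operator the claim names.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    """Pixel-wise binary cross-entropy between probabilities p and labels y."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def iou_loss(p, y, eps=1e-7):
    """1 - soft intersection-over-union of prediction and ground truth."""
    inter = np.sum(p * y)
    union = np.sum(p) + np.sum(y) - inter
    return 1.0 - inter / (union + eps)

def dice_loss(p, y, smooth=1.0):
    """1 - soft Dice coefficient; `smooth` prevents division by zero."""
    inter = np.sum(p * y)
    return 1.0 - (2.0 * inter + smooth) / (np.sum(p) + np.sum(y) + smooth)

def boundary_loss(p, y):
    """Mean absolute difference of gradient magnitudes (finite differences
    here; the patent specifies a Sobel operator)."""
    gp = np.abs(np.gradient(p)[0]) + np.abs(np.gradient(p)[1])
    gy = np.abs(np.gradient(y)[0]) + np.abs(np.gradient(y)[1])
    return np.mean(np.abs(gp - gy))

def mul_loss(p, y, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted combination of the four terms; equal weights are illustrative."""
    return (w[0] * bce_loss(p, y) + w[1] * iou_loss(p, y)
            + w[2] * dice_loss(p, y) + w[3] * boundary_loss(p, y))

# Toy check: a near-perfect prediction scores lower than a flat 0.5 map.
y = np.zeros((8, 8)); y[2:6, 2:6] = 1.0
p_good = np.clip(y, 0.05, 0.95)
p_bad = np.full((8, 8), 0.5)
```

Because each term is smallest when prediction matches the label, the combined loss ranks the near-perfect mask well below the uninformative one.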
Description
Tooth image segmentation method combined with SAM2 algorithm

Technical Field

The invention relates to the technical field of image processing, and in particular to a tooth image segmentation method combined with a SAM2 algorithm.

Background

In the digital diagnosis and treatment of stomatology, accurate segmentation of teeth and alveolar bone is a key step in planning orthodontic treatment, implants, and maxillofacial surgery. CBCT is widely used because of its high spatial resolution, but existing segmentation techniques face significant challenges in processing such images. First, complex anatomical structures and blurred boundaries are the main pain points: the gray values of the alveolar bone and the tooth root in a CBCT image are extremely close and the periodontal-ligament gap is extremely fine, so the prior art cannot easily distinguish the blurred boundary; as a result, the tooth root and the bone are adhered in the segmentation result, or the fine bone structure of the alveolar crest is lost when segmenting a tooth. Second, long-range dependency modeling and spatial perception are insufficient: teeth follow specific anatomical arrangement rules and a single tooth spans a large space from crown to root apex, yet the traditional convolutional neural network is limited by its local receptive field and struggles to capture long-range geometric dependencies, so the segmentation of slender tooth roots is often broken, or overlapping areas of adjacent teeth cannot be correctly separated. In addition, existing large models adapt poorly to medical features: although SAM and similar models have strong generalization capability, they lack targeted optimization for dental anatomical features; applied directly, they often produce rough edge details that cannot meet the precision required in the clinic.
Disclosure of Invention

In view of this, the invention provides a tooth image segmentation method combined with the SAM2 algorithm, which trains an end-to-end deep learning model (Dento SAM, DE-SAM) for accurate segmentation of teeth and alveolar bone. The core encoder is constructed based on SAM2 (Segment Anything Model 2) and effectively captures the global semantic information and local structural details in dental images through multi-layer feature extraction. At the end of the encoder, the model introduces a dual-branch feature enhancement structure: on one hand, a TSA (Token Spatial Attention) module explicitly models spatial geometric relations and long-range dependencies through position coding and correlation, enhancing the overall perception of complex tooth morphology and alveolar-bone boundaries; on the other hand, an HCPM (Hybrid Convolutional Perception Module) combines multi-scale convolution, dilated convolution, and batch normalization operations and fuses a BAM (Bilinear Attention Module) attention mechanism to achieve efficient aggregation of local detail enhancement and context information, highlighting tooth-root microstructure and variations in alveolar-bone density. In the decoding and up-sampling stage, the model further adaptively fuses the multi-layer features of the SAM encoder with up-sampling features of corresponding scales through an AMFB (Adaptive Multi-Scale Fusion Block), dynamically balancing the contributions of features at different scales and realizing collaborative modeling of global semantics and local details, thereby improving the precision and boundary consistency of tooth and alveolar-bone segmentation; tooth image segmentation is executed based on this model.
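The attention computation at the heart of the TSA module, with spatial positions explicitly embedded before attention, can be sketched as below. All names are illustrative NumPy code, not code from the patent, and the sinusoidal position code is an assumed choice; the description says only that positions are "explicitly embedded".

```python
import numpy as np

def sinusoidal_positions(n_tokens, dim):
    """Standard sinusoidal position code (an assumed concrete choice)."""
    pos = np.arange(n_tokens)[:, None]
    i = np.arange(dim)[None, :]
    angle = pos / np.power(10000.0, (2 * (i // 2)) / dim)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V, weights

# Toy example: 16 flattened feature tokens of dimension 32.
rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 32))
tokens = tokens + sinusoidal_positions(16, 32)      # embed spatial position
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
```

Each attention row is a probability distribution over all tokens, which is what gives the module its long-range reach: every output token can draw on every other token regardless of spatial distance.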
In order to solve at least one of the above technical problems, the technical scheme provided by the invention is a tooth image segmentation method combined with a SAM2 algorithm, comprising the following steps: step S1, acquiring a dental image sequence to be segmented and applying standardized preprocessing; S2, inputting the preprocessed image into a deep learning model and performing encoding and local feature enhancement through an encoder; S3, decoding the encoded and locally enhanced features, fusing them across scales, restoring the spatial resolution of the image, and generating a segmentation mask, thereby completing construction of the model; S4, inputting a data set into the model for supervised training; S5, using the trained model to segment the dental image under test. The core encoder of the deep learning model in step S2 is constructed based on SAM2, and a TSA module and an HCPM module are arranged in parallel branches at the tail end of the encoder. The TSA module enhances the spatial perception and long-range dependency modeling capacity of the model, and the spatial coordinate information of each feature point is explicitly embedded by i