CN-121999202-A - Tooth detection system, method and storage medium
Abstract
The invention discloses a tooth detection system, a tooth detection method and a storage medium. The detection system comprises a feature extraction module, a feature projection module, a hybrid encoder module and a Transformer decoder module, connected in sequence. The feature projection module is connected to the feature extraction module. The hybrid encoder module is connected to the feature projection module, performs enhancement processing on the multi-scale features of unified dimension, and outputs enhanced features. The Transformer decoder module is connected to the hybrid encoder module, receives a fixed number of object queries, and outputs a detection result set of teeth through iterative refinement. The invention improves the detail expression and geometric consistency of small teeth, provides favorable low-level feature support for relying on global context to resolve local ambiguity in blurred samples, and, targeting the anisotropic structure of teeth arranged mainly along the dental arch in the horizontal direction, makes the queries for adjacent teeth more separable and their localization more stable.
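The separability argument for horizontally adjacent teeth can be illustrated with a small Python sketch. It is not the patented implementation: the sinusoidal form of the encoding, the frequency schedule, the function names and the fixed scale values (which stand in for the learnable horizontal and vertical factors) are all assumptions made for illustration.

```python
import math

def anisotropic_encoding(x, y, scale_x, scale_y, num_freqs=4):
    """Encode a reference point (x, y) in [0, 1] with per-axis scaling.

    scale_x and scale_y stand in for the learnable horizontal and vertical
    scale factors; the sinusoidal form is an assumption for illustration.
    """
    enc = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        enc += [math.sin(freq * scale_x * x), math.cos(freq * scale_x * x)]
        enc += [math.sin(freq * scale_y * y), math.cos(freq * scale_y * y)]
    return enc

def l2_distance(a, b):
    """Euclidean distance between two encodings."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

# Two horizontally adjacent reference points (e.g. neighbouring teeth).
left, right = (0.50, 0.40), (0.52, 0.40)
# Two vertically offset reference points, for comparison.
upper, lower = (0.50, 0.40), (0.50, 0.42)

# Isotropic baseline: same scale on both axes.
iso_h = l2_distance(anisotropic_encoding(*left, 1.0, 1.0),
                    anisotropic_encoding(*right, 1.0, 1.0))
iso_v = l2_distance(anisotropic_encoding(*upper, 1.0, 1.0),
                    anisotropic_encoding(*lower, 1.0, 1.0))

# Anisotropic variant: larger (hypothetical) horizontal scale only.
aniso_h = l2_distance(anisotropic_encoding(*left, 2.0, 1.0),
                      anisotropic_encoding(*right, 2.0, 1.0))
aniso_v = l2_distance(anisotropic_encoding(*upper, 2.0, 1.0),
                      anisotropic_encoding(*lower, 2.0, 1.0))

print(f"horizontal separation: isotropic={iso_h:.3f}, anisotropic={aniso_h:.3f}")
print(f"vertical separation:   isotropic={iso_v:.3f}, anisotropic={aniso_v:.3f}")
```

With these illustrative values, raising only the horizontal scale increases the encoding distance between horizontally adjacent points while leaving the vertical separation unchanged, which is the intuition behind treating the two axes differently.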
Inventors
- ZHU JIANGPING
- ZHANG CHEN
- HUANG RUIJIE
- WU PEI
- PENG YIRAN
Assignees
- 成都苍岷科技有限公司
Dates
- Publication Date
- 20260508
- Application Date
- 20260127
Claims (10)
- 1. A dental target detection system comprising, in order: a feature extraction module for preprocessing a two-dimensional oral cavity image and then extracting a multi-scale feature map, the multi-scale feature map comprising at least a first feature map with a first resolution, a second feature map with a second resolution and a third feature map with a third resolution, wherein the first resolution is higher than the second resolution and the second resolution is higher than the third resolution; a feature projection module, connected to the feature extraction module, for projecting the multi-scale feature map to a unified hidden dimension to obtain multi-scale features of unified dimension; a hybrid encoder module, connected to the feature projection module, for performing enhancement processing on the multi-scale features of unified dimension and outputting enhanced features, the hybrid encoder module comprising a geometry-aware high-resolution attention enhancement module and a quality-adaptive cross-scale interaction module connected in sequence, wherein the geometry-aware high-resolution attention enhancement module enhances the first-resolution features by combining a polar-coordinate geometric prior with a high-resolution attention path, thereby strengthening the detail expression of small teeth and the consistency of the dental arch structure, and the quality-adaptive cross-scale interaction module enhances the second-resolution and third-resolution features through quality-estimation-guided adaptive feature fusion and cross-scale global context interaction, suppressing noise and aggregating context information to resolve local ambiguity; and a Transformer decoder module, connected to the hybrid encoder module, for receiving a fixed number of object queries and outputting a detection result set of teeth through iterative refinement, the Transformer decoder module integrating a discriminant anisotropic decoding (DAD) module that improves the separability and localization stability of adjacent-tooth queries through learnable anisotropic position encoding and a query enhancement mechanism.
- 2. The dental target detection system of claim 1, wherein the geometry-aware high-resolution attention enhancement module comprises: a geometric prior path for calculating the radial distance and angle of each feature-map position relative to the image center to form a polar-coordinate geometric prior, and for concatenating the polar-coordinate geometric prior with the original features and the Cartesian coordinates and processing the result to obtain geometry-enhanced features; a high-resolution attention path for sequentially applying channel attention and spatial attention to the first-resolution features to obtain semantically enhanced features; and a fusion unit for fusing the geometry-enhanced features with the semantically enhanced features and outputting the final enhanced high-resolution features.
- 3. The dental target detection system of claim 1, wherein the quality-adaptive cross-scale interaction module comprises: a quality perception module that analyzes global statistics of the input features through a quality estimator to generate a weight vector, and uses the weight vector to perform weighted fusion of the outputs of a parallel denoising branch, contrast balancing branch and detail preserving branch to obtain quality-enhanced features; a global token interaction unit that extracts global tokens from the features of each scale and performs cross-scale information exchange through a multi-head self-attention mechanism to obtain updated global tokens containing cross-scale context; and a local-global fusion unit that maps the updated global tokens to global guidance features, fuses the global guidance features with the corresponding local features through a scale-aware gating mechanism, and outputs the final enhanced features.
- 4. The dental target detection system of claim 3, wherein the quality estimator analyzes global statistics of the input features to generate the weight vector by performing global average pooling (GAP) and/or global max pooling (GMP) on the multi-scale features $E$ to obtain a global description vector $g$, and outputting the weight vector $w \in \mathbb{R}^{3}$: $g = \mathrm{Concat}[\mathrm{GAP}(E), \mathrm{GMP}(E)]$; $w = \mathrm{Softmax}(\mathrm{MLP}(g))$, wherein Softmax ensures that the three branch weights are non-negative and sum to 1.
- 5. The dental target detection system of claim 3, wherein the weighted fusion of the outputs of the parallel denoising branch, contrast balancing branch and detail preserving branch by means of the weight vector is specifically $E_q = w_{dn} \cdot f_{dn}(E) + w_{bal} \cdot f_{bal}(E) + w_{dt} \cdot f_{dt}(E)$, wherein $w_{dn}$, $w_{bal}$ and $w_{dt}$ are output by the quality estimator, $f_{dn}$ is the denoising branch, $f_{bal}$ is the contrast balancing branch and $f_{dt}$ is the detail preserving branch.
- 6. The dental target detection system of claim 1, wherein the DAD module is integrated in every layer or in some layers of the Transformer decoder and comprises: an anisotropic position encoding unit that applies learnable scale factors, different in the horizontal and vertical directions, to the reference point associated with each object query, generates an anisotropic position encoding adapted to the horizontally dominant structure of the dental arch, and fuses it with the query content features through content-position gating; a query channel attention unit that recalibrates the channel responses of the query features after the self-attention operation to enhance channel discrimination; a multi-scale deformable cross-attention unit that uses the updated queries to sample and aggregate relevant information from the multi-scale features output by the encoder; and a gated feedforward network that incorporates a gating mechanism into the feedforward network to filter noisy information flow.
- 7. The dental target detection system of claim 1, further comprising an output module disposed at each layer or at the final layer of the decoder, the output module comprising a classification head and a bounding box regression head, the classification head outputting the confidence that each query belongs to each tooth class, and the regression head outputting bounding box parameters and iteratively updating the reference points, to form a detection set $Y = \{(b_i, c_i)\}_{i=1}^{N_q}$, where $b_i$ denotes the i-th predicted box, $c_i$ the i-th predicted class information, and $N_q$ the number of predictions.
- 8. The dental target detection system of claim 1, further comprising a feature pyramid fusion module coupled between the hybrid encoder module and the Transformer decoder module for performing cross-layer fusion of the enhanced multi-scale features output by the hybrid encoder module, generating fused multi-scale features, and inputting the fused multi-scale features to the Transformer decoder module.
- 9. A detection method based on the dental target detection system of any of claims 1-7, comprising the steps of: S1, based on a multi-scale feature map extracted from a two-dimensional oral cavity image, projecting the multi-scale feature map to a unified hidden dimension; S2, inputting the multi-scale features of unified dimension into the hybrid encoder for processing and outputting enhanced features, including: S2a, enhancing the first-resolution features using the geometry-aware high-resolution attention enhancement module; S2b, enhancing the second-resolution and third-resolution features using the quality-adaptive cross-scale interaction module; S5, sending the enhanced features output by the encoder to the Transformer decoder integrated with the discriminant anisotropic decoding module, which iteratively refines a fixed number of object queries and directly outputs a detection result set containing tooth bounding boxes and classes without non-maximum suppression post-processing.
- 10. An electronic storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the dental object detection method of claim 9.
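The quality estimator and weighted fusion described in claims 4 and 5 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the features are represented as a list of channels with flattened spatial positions, the MLP is a single hidden layer with random weights, and the three branch functions (`f_dn`, `f_bal`, `f_dt`) are simple placeholders (neighbour averaging, min-max rescaling, identity) standing in for the denoising, contrast balancing and detail preserving branches, whose internals the claims do not specify.

```python
import math
import random

def gap(feat):
    """Global average pooling: one mean per channel."""
    return [sum(ch) / len(ch) for ch in feat]

def gmp(feat):
    """Global max pooling: one maximum per channel."""
    return [max(ch) for ch in feat]

def softmax(z):
    """Numerically stable softmax; outputs are non-negative and sum to 1."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]

def linear(x, weight, bias):
    """Dense layer: weight is a list of rows, one per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]

def quality_weights(feat, w1, b1, w2, b2):
    """w = Softmax(MLP(g)) with g = Concat[GAP(E), GMP(E)]."""
    g = gap(feat) + gmp(feat)
    hidden = [max(0.0, h) for h in linear(g, w1, b1)]  # ReLU (assumed)
    return softmax(linear(hidden, w2, b2))

def f_dn(feat):
    """Placeholder denoising branch: 3-tap neighbour average per channel."""
    out = []
    for ch in feat:
        n = len(ch)
        out.append([(ch[max(i - 1, 0)] + ch[i] + ch[min(i + 1, n - 1)]) / 3.0
                    for i in range(n)])
    return out

def f_bal(feat):
    """Placeholder contrast balancing branch: min-max rescale per channel."""
    out = []
    for ch in feat:
        lo, hi = min(ch), max(ch)
        span = (hi - lo) or 1.0
        out.append([(v - lo) / span for v in ch])
    return out

def f_dt(feat):
    """Placeholder detail preserving branch: identity."""
    return [list(ch) for ch in feat]

def fuse(feat, w):
    """E_q = w_dn * f_dn(E) + w_bal * f_bal(E) + w_dt * f_dt(E)."""
    outs = [f_dn(feat), f_bal(feat), f_dt(feat)]
    return [[sum(wk * o[c][i] for wk, o in zip(w, outs))
             for i in range(len(feat[c]))]
            for c in range(len(feat))]

# Toy input: 4 channels, 6 flattened positions, random MLP parameters.
random.seed(0)
channels, positions, hidden_dim = 4, 6, 8
E = [[random.random() for _ in range(positions)] for _ in range(channels)]
w1 = [[random.uniform(-1, 1) for _ in range(2 * channels)] for _ in range(hidden_dim)]
b1 = [0.0] * hidden_dim
w2 = [[random.uniform(-1, 1) for _ in range(hidden_dim)] for _ in range(3)]
b2 = [0.0] * 3

w = quality_weights(E, w1, b1, w2, b2)
E_q = fuse(E, w)
print("branch weights (dn, bal, dt):", [round(v, 3) for v in w])
```

Because the weights come from a softmax, the fused output is a convex combination of the three branch outputs, so a degraded input can shift emphasis toward denoising without discarding the detail branch entirely.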
Description
Tooth detection system, method and storage medium
Technical Field
The invention relates to the field of oral image detection, and in particular to a tooth detection system, a tooth detection method and a storage medium.
Background
Tooth detection and numbering are important basic tasks in digital oral analysis and can be used for orthodontic treatment planning, computer-aided diagnosis and the like. With the spread of digital acquisition equipment, multi-source data such as panoramic X-rays, intraoral camera images and mobile phone photos appear in clinical and follow-up settings, but real acquisitions are often degraded by blurring, low contrast, uneven illumination, metal artifacts and the like, which reduces the reliability of automatic detection. At the same time, teeth are typically small targets, densely arranged and irregularly distributed along the arch curve, with boundaries that are close to or even overlap one another, which makes approaches relying only on local texture or boundaries more prone to error.
In the prior art, early CNN two-stage detectors, represented by the Faster R-CNN family, generally generate candidate regions and then perform classification and bounding box regression on them, which gives them strong region-level discrimination. A common practice in early tooth detection and enumeration work was to migrate this framework directly to oral images to perform tooth localization and class prediction.
The problem with this approach is that real oral images are often degraded by blurring, artifacts and the like, while tooth targets are dense and close together. Candidate regions are therefore more easily confused between adjacent targets and more sensitive to interference from background structures, leading to false detections, missed detections and unstable localization.
YOLO single-stage detectors are based on end-to-end regression, typically using multi-scale feature pyramids and dense prediction to achieve higher speed, and are widely used in engineering practice. For tooth detection, a common practice is likewise to migrate the YOLO series and its improved variants directly to oral images. The problems are largely similar to those of the two-stage methods: under low-quality input (blur, artifacts, low contrast), local texture and boundary cues are weakened and closely adjacent teeth interfere with one another more readily, resulting in fluctuating box localization, difficulty separating adjacent targets and insufficient suppression of background structures.
Transformer end-to-end set prediction detectors (DETR and its variants) model detection as "set prediction" and capture long-range dependencies through object queries and global attention mechanisms. At inference they can generally reduce the dependence on traditional NMS, and they make stronger use of global context. However, in the specific scenario of "low-quality degradation + dense small targets + arrangement along the dental arch curve", problems such as inconsistent predictions and insufficient separability of adjacent-tooth queries can still occur.
Thus, there is a need for a robust tooth detection scheme that can simultaneously cope with image quality degradation, small-target detail preservation and dental arch geometry modeling.
Disclosure of Invention
In order to solve the above problems, the present invention provides a tooth detection system, method and storage medium that, for the first time, propose and implement a progressive processing architecture of "detail and geometry enhancement (GAHR)" → "quality and context enhancement (QACI)" → "discriminant and anisotropic decoding (DAD)". GAHR strengthens the local detail and spatial structure of tooth features and provides better low-level features for QACI, but may be sensitive to global quality degradation. QACI uses global context to overcome quality degradation, compensating for this possible weakness of GAHR, and, building on the low-level features supplied by GAHR, provides DAD with a more robust and information-rich feature source. The accurate decoding capability of DAD in turn validates and drives GAHR and QACI to produce enhanced features with richer local detail and spatial structure and greater resistance to quality degradation. The three thus complement one another closely, yielding synergistic benefits from local geometric modeling, globally adaptive context aggregation and anisotropic object decoding.
The invention is realized by the following technical scheme:
A dental target detection system comprising, in order: The