
CN-122023485-A - Automatic registration method and system for augmented reality-oriented oral surgery endoscope and CBCT (cone-beam computed tomography) images based on deep learning

CN-122023485-A

Abstract

The invention discloses a deep-learning-based automatic registration method and system for an augmented reality-oriented oral surgery endoscope and CBCT images, and relates to the technical field of augmented reality surgical navigation and image fusion for oral surgery. Cross-modal multi-level features are extracted by a dual-branch neural network, and fully automatic, high-precision registration without manual marking is realized by combining an improved SuperPoint network, a Transformer architecture, a GAT model and a multi-level optimization algorithm, providing a stable pose constraint for augmented reality registration. The registration translation and rotation errors reach sub-millimeter and sub-degree levels respectively, with strong robustness against interference. The preoperative CBCT three-dimensional model and key anatomical structures can be registered and overlaid in augmented reality within the intraoperative endoscope view, remarkably improving the visualization of anatomical structures and the nerve protection capability in oral surgery such as wisdom tooth extraction, and the method can be further extended to other endoscopic augmented reality surgical scenes such as laparoscopy and arthroscopy.

Inventors

  • XU XIANGLIANG
  • JIANG JUNQI

Assignees

  • 北京大学口腔医学院 (Peking University School of Stomatology)

Dates

Publication Date
2026-05-12
Application Date
2026-02-02

Claims (10)

  1. An automatic registration method for an augmented reality-oriented oral surgery endoscope and CBCT images based on deep learning, characterized by comprising the following steps: S1, data preprocessing and data set construction: performing three-dimensional reconstruction on preoperative CBCT scan data to generate a three-dimensional mesh model containing the target anatomical structure, extracting three-dimensional anatomical reference points, performing image sharpness optimization on endoscope video frames acquired during the operation, and cropping a region of interest (ROI) containing the surgical field; S2, cross-modal feature detection and matching: inputting the endoscope real-time image processed in S1 and a rendered reference image, generated by rendering the CBCT three-dimensional model at the current pose for augmented reality registration, into a pre-trained deep learning registration model, respectively performing multi-scale feature extraction, feature point detection and descriptor generation, and obtaining initial feature matching point pairs between the two sets of images through a cross-modal feature matching network; S3, geometric verification and screening: jointly discriminating the spatial geometric relationship and the semantic relationship of the initial feature matching point pairs obtained in step S2 using a graph attention network (GAT), and screening out robust matching point pairs conforming to geometric consistency; S4, pose estimation and optimization: estimating a 6-degree-of-freedom pose using an improved random sample consensus (RANSAC) algorithm and the EPnP algorithm based on the robust matching point pairs screened in S3 to obtain an initial registration pose, and further performing joint nonlinear optimization on the poses and three-dimensional points of the current frame and historical frames using bundle adjustment combined with sliding-window-based factor graph optimization to output an optimized high-precision registration pose; and S5, inputting the optimized pose obtained in step S4 into an augmented reality rendering engine, performing augmented reality registration and rendering of the CBCT three-dimensional model and key anatomical structures in a coordinate system consistent with the imaging model of the endoscope camera, overlaying them on the intraoperative endoscope video in the form of contour lines, semi-transparent rendering, labels and safety boundaries, outputting the result to an augmented reality display terminal to realize intraoperative augmented reality navigation, and repeating steps S2 to S4 for subsequent video frames to maintain the temporal consistency of the augmented reality registration, triggering an adaptive correction flow when a matching abnormality is detected.
  2. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 1, wherein the three-dimensional mesh model containing the target anatomical structure in S1 is a three-dimensional mesh model comprising the alveolar bone, teeth and mandible; the three-dimensional anatomical reference points are bone-surface key points, comprising alveolar ridge crests, tooth root apices and points along the course of the mandibular canal; the image sharpness optimization adopts a dark channel prior dehazing algorithm; and the surgical field includes the mandibular third molar and alveolar process.
  3. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 2, wherein the number of clinical cases contained in the data set in S1 is more than 300, and the training set, validation set and test set are divided in a ratio of 80%:10%:10%.
  4. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 3, wherein the deep learning registration model in S2 comprises: a dual-branch feature extraction network, which adopts a convolutional neural network with a feature pyramid network (FPN) structure to perform multi-scale feature extraction on the endoscope image and the CBCT rendered image respectively; an improved feature point detection and descriptor generation network, which is based on an improved SuperPoint network structure, introduces a spatial group-wise enhancement (SGE) module and a global attention mechanism (GAM), and outputs feature-point heatmaps and high-dimensional descriptors in parallel; and a cross-modal feature matching network, which adopts a Transformer architecture with 6 layers and 8 heads per layer and realizes global matching between endoscope image features and CBCT rendered image features through self-attention and cross-attention mechanisms.
  5. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 4, wherein in S3 the specific process of geometric verification using the graph attention network GAT is: constructing the initial matching point pairs as a graph structure, in which the nodes are matching point pairs and the edges represent the spatial adjacency between point pairs; the GAT model comprehensively considers the semantic similarity of the node descriptors, the spatial geometric constraints, and the stability of multi-point matching within a neighborhood, scores the confidence of each match, and screens out the matches above a set threshold as robust matching point pairs.
  6. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 5, wherein the improved RANSAC algorithm in S4 adopts a sampling strategy with adaptive weighting based on the geometric scores of the matching points and dynamically adjusts the inlier judgment threshold; the bundle adjustment and factor graph optimization are performed within a sliding window, simultaneously optimizing the camera poses of the multi-frame images in the window and the coordinates of the three-dimensional landmark points, so as to maintain temporal pose consistency and suppress accumulated drift.
  7. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 6, wherein the parameter thresholds for pose initialization are set to a pixel error of less than 2, a matching score greater than 0.7, and a maximum number of dynamic iterations of 1000 steps.
  8. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 7, wherein the adaptive correction flow in S5 comprises: when it is detected that the number of matching points is below a threshold, the reprojection error exceeds a threshold, or the pose trajectory changes abruptly, the system automatically lowers the confidence threshold of feature matching, backtracks to a historically reliable pose for re-initialization, or triggers local/global secondary optimization.
  9. The automatic registration method for the deep-learning-based augmented reality-oriented oral surgery endoscope and CBCT images according to claim 8, wherein the number of high-confidence key points detected in the first endoscope frame ranges from 100 to 200, the reprojection error of the initial registration result is within 2 pixels, and the spatial score threshold is greater than 0.7.
  10. An automatic registration system for the augmented reality-oriented oral surgery endoscope and CBCT images based on deep learning, characterized by being used for realizing the above automatic registration method, and comprising a data preprocessing and data set construction module, a cross-modal feature detection and matching module, a geometric verification and screening module, a pose estimation and optimization module, and an augmented reality rendering and interactive display module, which are connected in sequence; the data preprocessing and data set construction module is used for performing three-dimensional reconstruction on the preoperative CBCT scan data, generating a three-dimensional mesh model containing the target anatomical structure and extracting three-dimensional anatomical reference points; performing image sharpness optimization on endoscope video frames acquired during the operation and cropping a region of interest (ROI) containing the surgical field; and annotating the correspondence between the endoscope images of clinical cases and the CBCT three-dimensional model and dividing the training set, validation set and test set; the cross-modal feature detection and matching module is used for inputting the endoscope real-time image processed by the data preprocessing and data set construction module and a rendered reference image, generated by rendering the CBCT three-dimensional model at the current pose for augmented reality registration, into a pre-trained deep learning registration model, respectively extracting multi-scale features, detecting feature points and generating descriptors, and obtaining initial feature matching point pairs between the two sets of images through a cross-modal feature matching network; the geometric verification and screening module is used for jointly discriminating the spatial geometric relationship and the semantic relationship of the initial feature matching point pairs obtained by the cross-modal feature detection and matching module using the graph attention network GAT, and screening out robust matching point pairs conforming to geometric consistency; the pose estimation and optimization module is used for performing 6-degree-of-freedom pose estimation using an improved random sample consensus (RANSAC) algorithm and the EPnP algorithm based on the robust matching point pairs screened by the geometric verification and screening module to obtain an initial registration pose, and further performing joint nonlinear optimization on the poses and three-dimensional points of the current frame and historical frames using bundle adjustment combined with sliding-window-based factor graph optimization to output an optimized high-precision registration pose; and the augmented reality rendering and interactive display module is used for using the optimized pose obtained by the pose estimation and optimization module for augmented reality registration and rendering, overlaying the CBCT three-dimensional model and key anatomical structures on the endoscope video in the form of contour lines, semi-transparent rendering, labels or safety boundaries, outputting the image to an augmented reality display terminal to realize intraoperative augmented reality navigation, reusing the cross-modal feature detection and matching module through the pose estimation and optimization module for subsequent video frames to continuously update the augmented reality registration pose, and triggering an adaptive correction flow when a matching abnormality is detected.
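The matching step of claim 2/S2 produces descriptor correspondences between the endoscope image and the CBCT rendered image. The patent uses a Transformer-based matcher; as a minimal illustrative sketch only (not the patented architecture), mutual-nearest-neighbour matching of descriptors with a cosine-similarity cutoff shows the shape of the input/output: two descriptor sets in, a list of index pairs out. The function name and the 0.7 cutoff default (borrowed from the matching-score threshold in claim 7) are assumptions.

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b, score_thresh=0.7):
    """Toy stand-in for the patent's Transformer-based cross-modal matcher:
    match L2-normalized descriptors by mutual nearest neighbour and keep
    pairs whose cosine similarity exceeds score_thresh (cf. the >0.7
    matching-score threshold in claim 7)."""
    desc_a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    desc_b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)
    sim = desc_a @ desc_b.T                      # cosine similarity matrix
    nn_ab = sim.argmax(axis=1)                   # best match A -> B per row
    nn_ba = sim.argmax(axis=0)                   # best match B -> A per column
    return [(i, int(j)) for i, j in enumerate(nn_ab)
            if nn_ba[j] == i and sim[i, j] > score_thresh]
```

A real system would replace this with learned attention-based matching; the mutual-consistency check here plays the same role of suppressing one-sided, low-confidence correspondences before geometric verification.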
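Inside the RANSAC + EPnP pose estimation of claim 1/S4, each pose hypothesis is scored by projecting the CBCT-model 3-D points into the endoscope image and counting correspondences whose reprojection error stays under the pixel threshold of claim 7. The sketch below shows only that inlier test under a standard pinhole model; the function name and test data are illustrative assumptions, not the patent's implementation.

```python
import numpy as np

def reprojection_inliers(K, R, t, pts3d, pts2d, px_thresh=2.0):
    """Project 3-D model points with a candidate pose (R, t) and camera
    intrinsics K, and flag correspondences whose reprojection error is
    below px_thresh pixels (cf. the <2 px threshold in claim 7). This is
    the inlier test evaluated for each RANSAC pose hypothesis."""
    cam = R @ pts3d.T + t.reshape(3, 1)          # points in camera frame, 3xN
    proj = K @ cam                               # homogeneous pixel coords
    proj = proj[:2] / proj[2]                    # perspective division, 2xN
    err = np.linalg.norm(proj.T - pts2d, axis=1) # per-point pixel error
    return err < px_thresh, err
```

In practice this step is typically delegated to a PnP solver such as OpenCV's `solvePnPRansac` with the EPnP flag; the explicit version above makes the threshold's meaning concrete.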
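The adaptive correction flow of claim 8 is a decision rule over three tracking-health signals: match count, reprojection error, and pose-trajectory discontinuity. A minimal sketch of that dispatch logic is given below; the default thresholds and the jump metric are illustrative assumptions (claims 7 and 9 fix only the 2 px error bound, the >0.7 score, and the 100-200 first-frame keypoint range), and the returned action names are hypothetical labels.

```python
def correction_action(n_matches, reproj_err, pose_jump,
                      min_matches=100, err_thresh=2.0, jump_thresh=5.0):
    """Illustrative version of the adaptive correction flow in claim 8.
    Checks the three failure triggers in order of severity and returns
    the recovery step a tracker might take; thresholds are assumptions."""
    if pose_jump > jump_thresh:
        # Abrupt trajectory change: backtrack to a historically reliable
        # pose and re-initialize, as described in claim 8.
        return "reinitialize_from_history"
    if n_matches < min_matches:
        # Too few matches: relax the feature-matching confidence threshold.
        return "lower_match_threshold"
    if reproj_err > err_thresh:
        # Pose drifting: trigger local/global secondary optimization.
        return "local_reoptimization"
    return "ok"
```

Ordering the checks by severity reflects the claim's intent: a trajectory discontinuity invalidates the current pose outright, whereas a match shortfall or moderate drift can be handled without full re-initialization.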

Description

Automatic registration method and system for augmented reality-oriented oral surgery endoscope and CBCT (cone-beam computed tomography) images based on deep learning

Technical Field

The invention relates to the technical field of augmented reality surgical navigation and image fusion for oral surgery, in particular to a deep-learning-based automatic registration method and system for an endoscope and CBCT images for augmented reality oral surgery.

Background

In mandibular wisdom tooth extraction, the operator must achieve precise spatial localization of anatomical structures such as the teeth, alveolar bone and inferior alveolar nerve. Because of the narrow real-time field of view during the operation and the inherent difficulty of registering it with the preoperative CBCT reconstruction, existing practice mostly depends on the operator's spatial imagination and rich surgical experience. For complex impacted teeth whose roots lie close to the mandibular canal, studies in recent years have shown that an endoscope can enlarge the surgeon's visual range to a certain extent and allow the inferior alveolar nerve to be observed under the endoscope; however, the two-dimensional endoscope images and the three-dimensional surgical situation lack synchronous, automatic association, which hampers the spatial identification of key tissues such as the inferior alveolar nerve and brings potential risk of nerve injury.
Therefore, an automatic registration method and system for an augmented reality-oriented oral surgery endoscope and CBCT images based on deep learning, which does not depend on manual marking and performs fully automatic matching and registration based on image features, realizing intraoperative visual navigation with fusion of the endoscope and the CBCT model, is a technical problem urgently to be solved by those skilled in the art.

Disclosure of Invention

In view of the above, the invention provides a deep-learning-based automatic registration method and system for an endoscope and CBCT images for augmented reality oral surgery, which requires no manual marking, is based on multi-modal image features, outputs a high-precision pose meeting the requirements of augmented reality registration, realizes real-time augmented reality overlay and interactive display of the two-dimensional endoscope image and the preoperative CBCT three-dimensional model in an augmented reality rendering engine, and remarkably improves the visualization of anatomical structures and the nerve protection capability during impacted wisdom tooth extraction.
In order to achieve the above purpose, the present invention adopts the following technical scheme: an automatic registration method for the augmented reality-oriented oral surgery endoscope and CBCT images based on deep learning, comprising the following steps: S1, data preprocessing and data set construction: performing three-dimensional reconstruction on preoperative CBCT scan data to generate a three-dimensional mesh model containing the target anatomical structure, extracting three-dimensional anatomical reference points, performing image sharpness optimization on endoscope video frames acquired during the operation, and cropping a region of interest (ROI) containing the surgical field; S2, cross-modal feature detection and matching: inputting the endoscope real-time image processed in S1 and a rendered reference image, generated by rendering the CBCT three-dimensional model at the current pose for augmented reality registration, into a pre-trained deep learning registration model, respectively performing multi-scale feature extraction, feature point detection and descriptor generation, and obtaining initial feature matching point pairs between the two sets of images through a cross-modal feature matching network; S3, geometric verification and screening: jointly discriminating the spatial geometric relationship and the semantic relationship of the initial feature matching point pairs obtained in step S2 using a graph attention network (GAT), and screening out robust matching point pairs conforming to geometric consistency; S4, pose estimation and optimization: estimating a 6-degree-of-freedom pose using an improved random sample consensus (RANSAC) algorithm and the EPnP algorithm based on the robust matching point pairs screened in S3 to obtain an initial registration pose, and further performing joint nonlinear optimization on the poses and three-dimensional points of the current frame and historical frames using bundle adjustment combined with factor graph optimization based on a sliding window