CN-122023473-A - Non-rigid registration method and system based on multi-modal facial feature point cloud
Abstract
The invention discloses a non-rigid registration method and system based on a multi-modal facial feature point cloud. The method comprises the following steps: collecting a three-dimensional point cloud of the patient's face and a head CT image; segmenting the facial point cloud from the head CT image using a U-Net network optimized with an attention mechanism; and applying a combined strategy of PCA-CPD rigid coarse registration and non-rigid CPD fine registration. Specifically, principal component analysis (PCA) is used to obtain rigid initial transformation parameters; the rigid CPD algorithm is initialized with these parameters and performs coarse registration, aligning the spatial orientations and scales of the two facial point clouds to obtain a registered source point cloud; the target point cloud and the registered source point cloud are then input into the non-rigid CPD algorithm, which controls the small residual deformation of the coarsely registered point cloud. By adopting this technical scheme, accurate correspondence between the preoperative model and the real space is realized using multi-modal three-dimensional facial information and a non-rigid registration algorithm.
Inventors
- HE JINGYUAN
- ZHU HUIYI
- CHEN PENG
- DENG YONGBING
- GUO SONGTAO
- WANG YI
Assignees
- 重庆大学 (Chongqing University)
- 重庆市急救医疗中心 (Chongqing Emergency Medical Center)
Dates
- Publication Date
- 20260512
- Application Date
- 20250911
- Priority Date
- 20250908
Claims (10)
- 1. A non-rigid registration method based on a multi-modal facial feature point cloud, comprising the steps of: under visible light conditions, collecting a three-dimensional point cloud of the patient's face with a structured light camera to serve as the target point cloud, and scanning a head CT image of the patient; segmenting the facial point cloud from the head CT image, using a U-Net network optimized with an attention mechanism, to serve as the source point cloud; based on the three-dimensional point cloud of the patient's face and the segmented facial point cloud, applying a combined strategy of PCA-CPD rigid coarse registration and non-rigid CPD fine registration, specifically: obtaining rigid initial transformation parameters by principal component analysis (PCA); initializing the rigid CPD algorithm with the rigid initial transformation parameters and performing coarse registration, aligning the spatial orientations and scales of the two facial point clouds to obtain a registered source point cloud; and inputting the target point cloud and the registered source point cloud into the non-rigid CPD algorithm, applying to the coarsely registered source point cloud the optimal non-rigid transformation computed by the non-rigid CPD algorithm, thereby aligning the small deformation differences between the coarsely registered source point cloud and the target point cloud.
- 2. The non-rigid registration method based on a multi-modal facial feature point cloud of claim 1, wherein the method for collecting the three-dimensional point cloud of the patient's face with a structured light camera under visible light conditions is: controlling the imaging distance, namely fixing the camera position during shooting, keeping the distance between the face and the camera at a reference distance of 350 mm with a measuring tool, and keeping the eyes level with the camera lens; adopting a multi-shot strategy, namely taking multiple frontal shots of each patient and selecting the point cloud with the fewest holes and interference points; adjusting camera parameters, namely increasing the single-shot exposure intensity during capture and stitching two 3D point clouds taken at different exposure intensities; during shooting, adjusting the overexposure filtering threshold to eliminate flying points and errors; and using voxel downsampling to limit the captured point cloud to about 10K points, and cropping obvious background interference points outside the face with the MeshLab tool.
- 3. The non-rigid registration method based on a multi-modal facial feature point cloud as claimed in claim 1, wherein the method for segmenting the facial point cloud from the head CT image using the 3D Attention U-Net network optimized with an attention mechanism is: the attention gate (AG) module of the 3D Attention U-Net receives the feature map x_l from the encoder of the 3D Attention U-Net network and the context information g from a higher layer, and the AG computes the attention coefficient α with an additive attention mechanism, weighting the input features to highlight key regions: α = σ( ψᵀ ReLU( W_x x_l + W_g g + b_g ) + b_ψ ), x̂_l = α · x_l, where W_x, W_g, ψ are 1×1×1 convolutions, b_g, b_ψ are the biases of the corresponding convolutions, and σ is the Sigmoid function; adjusting the window width of the patient's head CT image and inputting it into the 3D Attention U-Net network to obtain a segmented binary voxel image, taking the coordinates of each voxel labeled 1 as a point, and converting the binary image into a point cloud; in the XYZ space of the point cloud, with the negative Y axis as the frontal direction of the face, traversing the XOZ plane and taking the point with the minimum Y coordinate as a face point; and performing data preprocessing by voxel downsampling and clipping, the processed point cloud being used directly for registration.
- 4. The non-rigid registration method based on a multi-modal facial feature point cloud as claimed in claim 1, wherein the method for segmenting the facial point cloud from the head CT image using the 2.5D Attention U-Net network optimized with an attention mechanism is: reading head CT image data and its labels from a head CT image dataset in nii.gz format; the number of CT image slices fed into the 2.5D Attention U-Net network each time is a hyper-parameter: a given head CT slice and the 2e slices adjacent to it are fed into the 2.5D Attention U-Net network, with the label data of that slice used as the label; expanding the input data along the slice dimension by copying the first e slices and the last e slices outwards, expanding to (e+112+e, 512), where e is the number of adjacent slices on each side; sampling the data to a uniform size (e+112+e, 256) and adjusting the window level and width to (350, 1600); the 2.5D Attention U-Net network outputs a binary image of the segmented voxels; the coordinates of voxels labeled 1 are taken as points, converting the binary image into a point cloud; and performing data preprocessing by voxel downsampling and clipping to obtain a facial point cloud meeting the registration requirement.
- 5. The non-rigid registration method based on a multi-modal facial feature point cloud as claimed in claim 1, wherein the method for obtaining the rigid initial transformation parameters using principal component analysis (PCA) is: let the source point cloud be Y ∈ R^(M×3) with M points and the target point cloud be X ∈ R^(N×3) with N points; center the point clouds: x̃_i = x_i − μ_X, ỹ_i = y_i − μ_Y, where x_i, y_i denote the i-th point of the target and source point clouds respectively, μ_X, μ_Y denote their centroids, and x̃_i, ỹ_i denote the centered target and source points respectively; calculate the covariance matrices: C_X = (1/N) Σ_i x̃_i x̃_iᵀ, C_Y = (1/M) Σ_i ỹ_i ỹ_iᵀ, where C_X, C_Y are the covariance matrices of the target and source point clouds respectively; perform eigendecomposition of C_X, C_Y: C_X = U_X Λ_X U_Xᵀ, C_Y = U_Y Λ_Y U_Yᵀ, where Λ_X = diag(γ_X,1, γ_X,2, γ_X,3) and Λ_Y = diag(γ_Y,1, γ_Y,2, γ_Y,3) denote the eigenvalues of X and Y, U_X, U_Y denote the corresponding eigenvectors, and ᵀ denotes the transpose; rearrange the eigenvectors in descending order of eigenvalue and take the three largest components as the principal components: U_Xmain = [U_X,a, U_X,b, U_X,c], U_Ymain = [U_Y,a, U_Y,b, U_Y,c], where U_Xmain is the principal component matrix formed by the three eigenvectors of the target point cloud X with the largest eigenvalues, the eigenvalues corresponding to its components [U_X,a, U_X,b, U_X,c] being [γ_X,a, γ_X,b, γ_X,c] s.t. γ_X,a ≥ γ_X,b ≥ γ_X,c, and U_Ymain is the principal component matrix formed by the three largest eigenvectors of the source point cloud Y, the eigenvalues corresponding to its components [U_Y,a, U_Y,b, U_Y,c] being [γ_Y,a, γ_Y,b, γ_Y,c] s.t. γ_Y,a ≥ γ_Y,b ≥ γ_Y,c; align the principal components so that the eigenvector directions are consistent; calculate the initial rotation matrix R_init = U_Xmain U_Ymainᵀ; and calculate the initial translation vector t_init = μ_X − R_init μ_Y.
- 6. The non-rigid registration method based on a multi-modal facial feature point cloud as claimed in claim 1, wherein the rigid CPD algorithm is initialized with the rigid initial transformation parameters and coarse registration is performed, aligning the spatial orientations and scales of the two facial point clouds to obtain the registered source point cloud τ_rigid(Y), with the following specific steps: let the source point cloud be Y with M points and the target point cloud be X with N points; the mixture model p(x) of the CPD algorithm is expressed as p(x) = ω·(1/N) + (1−ω) Σ_{m=1..M} (1/M) p(x|m), where ω is a weight measuring the proportion of outliers and noise, and p(x|m) is a Gaussian distribution: p(x|m) = (2πσ²)^(−3/2) exp(−‖x − y_m‖²/(2σ²)), where σ² is the variance, x a point of the target point cloud and y_m the m-th point of the source point cloud; the objective function of the rigid CPD algorithm, finally established from the expectation of the log-likelihood function, is Q(θ, σ²) = (1/(2σ²)) Σ_{n=1..N} Σ_{m=1..M} p_old(m|x_n) ‖x_n − τ_rigid(y_m, θ)‖² + (3N_p/2) log σ², where τ_rigid(y_m, θ) transforms the source point with the rigid transformation parameters θ_rigid, N_p = Σ_n Σ_m p_old(m|x_n) ≤ N with equality if and only if ω = 0, x_n is the n-th point of the target point cloud, N_p is the number of valid target points, and p_old(m|x_n) is the posterior probability computed with the old parameter values: p_old(m|x_n) = exp(−‖x_n − τ(y_m)‖²/(2σ²)) / ( Σ_{k=1..M} exp(−‖x_n − τ(y_k)‖²/(2σ²)) + (2πσ²)^(3/2) ωM/((1−ω)N) ), where y_k is the k-th point of the source point cloud; the rigid CPD transformation formula is τ_rigid(Y) = sYR + t, where s is a scaling parameter, R a rotation matrix, and t a translation parameter; initialize the parameters with R = I, t = 0, s = 1, 0 ≤ ω ≤ 1, where I is the identity matrix; calculate the posterior probability p_old(m|x_n), substitute it into the objective function to solve for the transformation parameters R, t, s and σ², and iterate until convergence.
- 7. The non-rigid registration method based on a multi-modal facial feature point cloud as claimed in claim 5, wherein the target point cloud and the registered source point cloud are input into the non-rigid CPD algorithm, and the method for controlling the small deformation of the coarsely registered point cloud using the non-rigid CPD algorithm is as follows: the transformation formula of the non-rigid CPD is τ_nonrigid(Y) = Y + v(Y), where v denotes a displacement field, or deformation field; the non-rigid CPD algorithm regularizes the deformation field, the regularized objective function being Q(v, σ²) = E(v, σ²) + (λ/2) φ(v), where λ is a regularization factor and φ(v) the regularization term; the norm of v in a Hilbert space is defined through the squared derivatives of v up to order K, where K denotes the highest order of derivative included in the norm; in a reproducing kernel Hilbert space, the norm of v is defined as ‖v‖² = ∫ |Ṽ(w)|² / G̃(w) dw, where G is a Gaussian kernel function, Ṽ and G̃ are the Fourier transforms of v and G respectively, and w denotes the spatial frequency; G̃ acts like a low-pass filter; regularization theory defines smoothness as a measure of the "oscillating" behaviour of a function, so the Fourier-domain norm can regularize smoothness through the kernel function; defining an operator L such that φ(v) = ‖Lv‖² and substituting into the objective function, the solution of the Euler-Lagrange equation obtained by the variational method is L̂L v(y) = (1/(λσ²)) Σ_{n=1..N} Σ_{m=1..M} p_old(m|x_n) ( x_n − ( y_m + v(y_m) ) ) δ(y − y_m), where y is a point of the source point cloud, Lv(y) denotes applying the operator L to the deformation field, δ is the Dirac delta function, and L̂ is the adjoint of L; since the Gaussian kernel G is the Green's function of L̂L, integral transformation gives v(y) = Σ_{m=1..M} w_m G(y, y_m), where v(y) denotes the displacement of the source point cloud; thus τ_nonrigid(Y) = Y + v(Y) = Y + GW; initialize the parameters with W = 0, 0 ≤ ω_nonrigid ≤ 1, β, λ > 0; construct G: G(i, j) = exp(−‖y_i − y_j‖²/(2β²)), where y_i is the i-th point and y_j the j-th point of the source point cloud, and β is a hyper-parameter controlling the width of the Gaussian kernel; calculate the posterior probability p_old(m|x_n), substitute it into the objective function to solve for the transformation parameters W and σ², and iterate until convergence.
- 8. A non-rigid registration system based on a multi-modal facial feature point cloud, characterized by comprising a data acquisition unit and a processing unit, wherein the data acquisition unit is used for acquiring a three-dimensional point cloud of the patient's face and a head CT image; the input of the processing unit is connected to the output of the data acquisition unit, and the processing unit performs the method of any one of claims 1 to 7 for non-rigid registration.
- 9. The non-rigid registration system based on a multi-modal facial feature point cloud of claim 8, wherein the processing unit comprises a 3D Attention U-Net network structure, the 3D Attention U-Net network structure comprising: an encoder, the encoder of the 3D Attention U-Net network comprising 4 scale layers, each layer comprising two 3×3 convolution blocks, each convolution block followed by a ReLU activation function, followed by a 2×2 max-pooling layer for downsampling; a decoder, the decoder path mirroring the encoder path and comprising 4 layers, each layer having a 2×2 deconvolution layer with stride 2 for upsampling, followed by two 3×3 convolution blocks, each also followed by a ReLU function; an AG module, which computes additive attention weights for the input encoder features and the higher-layer features using three 1×1×1 convolution blocks, a ReLU function and a Sigmoid function, and is used to suppress irrelevant responses in the input encoder features; and an output layer, where the final network predicts the segmentation map using a 1×1 convolution block and a ReLU function, the number of output channels being the number of segmentation classes.
- 10. The non-rigid registration system based on a multi-modal facial feature point cloud of claim 8, wherein the processing unit comprises a 2.5D Attention U-Net structure, the 2.5D Attention U-Net structure comprising: a data preprocessing module, which first groups the CT image data, dividing the 3D data into 2D data, taking a single slice and several adjacent slices as one 2D input and treating the different slices as different channels of the 2D data; an encoder, the encoder of the 2.5D Attention U-Net comprising 4 scale layers, each scale layer comprising two 3×3 convolution blocks, each convolution block followed by a ReLU activation function, with a 2×2 max-pooling layer for downsampling; a decoder, the decoder path mirroring the encoder path and comprising 4 layers, each layer having a 2×2 deconvolution layer with stride 2 for upsampling, followed by two 3×3 convolution blocks, each convolution block followed by a ReLU function; a skip connection and attention module AG, which splices and fuses the encoder features of each layer with the decoder features through skip connections, wherein the AG module computes additive attention weights for the input encoder features and the higher-layer features using three 1×1 convolution blocks, one ReLU function and one Sigmoid function, and is used to suppress irrelevant responses in the input encoder features; and an output layer, where the final network predicts the segmentation map using a 1×1 convolution block and a ReLU function, the number of output channels being the number of segmentation classes.
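The voxel-downsampling step in claim 2 (thinning the scanned facial cloud to roughly 10K points before registration) can be sketched in NumPy as follows. `voxel_downsample` and the synthetic cloud are illustrative stand-ins, not the patent's implementation, which may use a library routine instead:

```python
import numpy as np

def voxel_downsample(points, voxel_size):
    """Average all points that fall into the same cubic voxel cell."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel index and average each group.
    _, inverse, counts = np.unique(keys, axis=0,
                                   return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)       # unbuffered scatter-add per voxel
    return sums / counts[:, None]

rng = np.random.default_rng(0)
cloud = rng.uniform(0, 100, size=(50_000, 3))   # stand-in for a scanned face cloud
down = voxel_downsample(cloud, voxel_size=2.5)
print(down.shape[0] < cloud.shape[0])            # True: far fewer points remain
```

Choosing the voxel size is how the point budget (about 10K in claim 2) would be controlled; a larger voxel merges more points per cell.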
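For 1×1×1 convolutions, the additive attention gate of claim 3 reduces to per-voxel linear maps over the channel axis. A minimal NumPy sketch follows; the shapes and random weights are illustrative assumptions (the real network learns `Wx`, `Wg`, `psi` and applies them as convolutions over a 3D feature map):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, bg, psi, b_psi):
    """Additive attention: alpha = sigmoid(psi^T ReLU(Wx*x + Wg*g + b_g) + b_psi),
    with voxels flattened to rows and channels as columns."""
    q = np.maximum(x @ Wx + g @ Wg + bg, 0.0)   # ReLU of the summed projections
    alpha = sigmoid(q @ psi + b_psi)            # attention coefficients in (0, 1)
    return alpha * x                            # weight the encoder features

rng = np.random.default_rng(1)
V, C, Ci = 8, 4, 4          # voxels, channels, intermediate channels (illustrative)
x = rng.standard_normal((V, C))   # encoder feature map x_l, flattened
g = rng.standard_normal((V, C))   # higher-layer gating signal
out = attention_gate(x, g,
                     rng.standard_normal((C, Ci)), rng.standard_normal((C, Ci)),
                     np.zeros(Ci), rng.standard_normal((Ci, 1)), 0.0)
print(out.shape)  # (8, 4)
```

Because each coefficient lies in (0, 1), the gate can only attenuate features, which is what "suppressing irrelevant responses" in claims 9 and 10 amounts to.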
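The 2.5D slice grouping of claim 4 (feed one slice plus its 2e neighbours as channels, replicating the first and last e slices at the volume borders) can be sketched as below; the toy volume and `make_25d_input` are illustrative, not the patent's code:

```python
import numpy as np

def make_25d_input(volume, idx, e):
    """Take slice idx plus its 2e neighbours as channels, replicating the
    first/last e slices so border slices still get a full (2e+1)-stack."""
    padded = np.concatenate([volume[:1].repeat(e, axis=0),
                             volume,
                             volume[-1:].repeat(e, axis=0)], axis=0)
    return padded[idx : idx + 2 * e + 1]   # (2e+1, H, W) channel stack

vol = np.arange(5)[:, None, None] * np.ones((1, 4, 4))  # 5 toy slices of 4x4
stack = make_25d_input(vol, idx=0, e=2)
print(stack.shape)      # (5, 4, 4)
print(stack[:, 0, 0])   # [0. 0. 0. 1. 2.] -- first slice replicated at the border
```

The label for the stack would be the segmentation mask of the centre slice only, as the claim specifies.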
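The PCA initialization of claim 5 can be sketched in NumPy as follows. Note one caveat: eigenvectors carry a sign ambiguity, and this sketch only resolves it far enough to guarantee a proper rotation (det R = +1); the patent's "aligning the principal components so that the eigenvector directions are consistent" may involve additional orientation checks:

```python
import numpy as np

def pca_initial_transform(X, Y):
    """Rough rigid initialization: align the principal axes of target X and
    source Y, then translate centroids. R_init = U_Xmain @ U_Ymain^T."""
    mu_X, mu_Y = X.mean(axis=0), Y.mean(axis=0)
    # Eigenvectors of the 3x3 covariance matrices, largest eigenvalue first.
    _, U_X = np.linalg.eigh(np.cov((X - mu_X).T))
    _, U_Y = np.linalg.eigh(np.cov((Y - mu_Y).T))
    U_X, U_Y = U_X[:, ::-1], U_Y[:, ::-1]
    R = U_X @ U_Y.T
    if np.linalg.det(R) < 0:        # reflection -> flip the weakest axis
        U_Y[:, -1] *= -1
        R = U_X @ U_Y.T
    t = mu_X - R @ mu_Y             # t_init = mu_X - R_init mu_Y
    return R, t

rng = np.random.default_rng(2)
Y = rng.standard_normal((200, 3)) * np.array([5.0, 2.0, 1.0])  # anisotropic cloud
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
X = Y @ R_true.T + np.array([10.0, 0.0, -3.0])                 # rotated, shifted copy
R, t = pca_initial_transform(X, Y)
aligned = Y @ R.T + t
print(np.allclose(aligned.mean(axis=0), X.mean(axis=0)))  # centroids coincide
```

This only produces a coarse initial guess; claim 6's rigid CPD is what refines it.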
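Claim 6 paraphrases the standard rigid CPD of Myronenko and Song. A compact NumPy sketch of that EM loop follows (parameter values, iteration count and the synthetic test clouds are illustrative assumptions, not the patent's settings):

```python
import numpy as np

def rigid_cpd(X, Y, w=0.1, n_iter=50):
    """Compact rigid CPD: EM over a GMM whose centroids are the transformed
    source points s*R*y_m + t, with uniform outlier weight w."""
    N, M, D = len(X), len(Y), X.shape[1]
    R, t, s = np.eye(D), np.zeros(D), 1.0
    sigma2 = ((X[:, None] - Y[None]) ** 2).sum() / (D * N * M)
    for _ in range(n_iter):
        TY = s * Y @ R.T + t
        # E-step: P[m, n] = posterior that target point x_n came from y_m.
        d2 = ((X[None] - TY[:, None]) ** 2).sum(-1)
        num = np.exp(-d2 / (2 * sigma2))
        c = (2 * np.pi * sigma2) ** (D / 2) * w * M / ((1 - w) * N)
        P = num / (num.sum(axis=0, keepdims=True) + c)
        # M-step: closed-form rotation via SVD of the cross-covariance.
        Np = P.sum()
        mu_x = P.sum(axis=0) @ X / Np
        mu_y = P.sum(axis=1) @ Y / Np
        Xh, Yh = X - mu_x, Y - mu_y
        A = Xh.T @ P.T @ Yh
        U, _, Vt = np.linalg.svd(A)
        Cm = np.eye(D)
        Cm[-1, -1] = np.linalg.det(U @ Vt)      # enforce det(R) = +1
        R = U @ Cm @ Vt
        s = np.trace(A.T @ R) / (P.sum(axis=1) @ (Yh ** 2).sum(axis=1))
        t = mu_x - s * R @ mu_y
        sigma2 = max((P.sum(axis=0) @ (Xh ** 2).sum(axis=1)
                      - s * np.trace(A.T @ R)) / (Np * D), 1e-10)
    return R, t, s

rng = np.random.default_rng(3)
Y = rng.standard_normal((80, 3))
ang = np.deg2rad(12.0)
R_true = np.array([[np.cos(ang), -np.sin(ang), 0.0],
                   [np.sin(ang),  np.cos(ang), 0.0],
                   [0.0, 0.0, 1.0]])
X = Y @ R_true.T + np.array([0.5, -0.3, 0.2])   # rigidly moved copy of Y
R, t, s = rigid_cpd(X, Y)
err0 = np.linalg.norm(Y - X, axis=1).mean()
err1 = np.linalg.norm(s * Y @ R.T + t - X, axis=1).mean()
print(err1 < err0)   # alignment error drops after registration
```

In the patent's pipeline, R, t, s would be seeded from the PCA result of claim 5 rather than from the identity.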
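Claim 7's non-rigid step likewise follows the standard non-rigid CPD formulation: the deformation is v(Y) = GW with a Gaussian kernel G, and each M-step solves a regularized linear system for W. A NumPy sketch on a toy 2D deformation (beta, lambda, the grid and the sinusoidal bend are all illustrative assumptions):

```python
import numpy as np

def gaussian_kernel(Y, beta):
    """G_ij = exp(-||y_i - y_j||^2 / (2 beta^2)), as constructed in claim 7."""
    d2 = ((Y[:, None] - Y[None]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * beta ** 2))

def nonrigid_cpd(X, Y, beta=1.0, lam=2.0, w=0.1, n_iter=30):
    """Compact non-rigid CPD: tau(Y) = Y + G W, with W found from a
    regularized linear solve at each M-step."""
    N, M, D = len(X), len(Y), X.shape[1]
    G = gaussian_kernel(Y, beta)
    W = np.zeros((M, D))
    sigma2 = ((X[:, None] - Y[None]) ** 2).sum() / (D * N * M)
    for _ in range(n_iter):
        TY = Y + G @ W
        d2 = ((X[None] - TY[:, None]) ** 2).sum(-1)      # (M, N)
        num = np.exp(-d2 / (2 * sigma2))
        c = (2 * np.pi * sigma2) ** (D / 2) * w * M / ((1 - w) * N)
        P = num / (num.sum(axis=0, keepdims=True) + c)   # E-step posteriors
        dP1 = P.sum(axis=1)
        # M-step: (diag(P1) G + lam sigma^2 I) W = P X - diag(P1) Y
        A = dP1[:, None] * G + lam * sigma2 * np.eye(M)
        W = np.linalg.solve(A, P @ X - dP1[:, None] * Y)
        TY = Y + G @ W
        Np = P.sum()
        sigma2 = max((P.sum(axis=0) @ (X ** 2).sum(axis=1)
                      - 2 * ((P @ X) * TY).sum()
                      + dP1 @ (TY ** 2).sum(axis=1)) / (Np * D), 1e-10)
    return TY

# Toy deformation: a flat grid bent by a smooth sinusoidal displacement.
gx, gy = np.meshgrid(np.linspace(0, 4, 10), np.linspace(0, 4, 10))
Y = np.stack([gx.ravel(), gy.ravel()], axis=1)
X = Y + np.stack([np.zeros(len(Y)), 0.3 * np.sin(Y[:, 0])], axis=1)
TY = nonrigid_cpd(X, Y)
err0 = np.linalg.norm(Y - X, axis=1).mean()
err1 = np.linalg.norm(TY - X, axis=1).mean()
print(err1 < err0)   # the recovered deformation reduces the residual
```

The kernel width beta controls how spatially coherent the recovered deformation is, which is how the algorithm keeps the facial deformation "small and smooth" rather than matching points independently.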
Description
Non-rigid registration method and system based on multi-modal facial feature point cloud

Technical Field

The invention belongs to the technical field of mixed reality navigation, and relates to a non-rigid registration method and system based on a multi-modal facial feature point cloud.

Background

Stroke (cerebral apoplexy) is currently one of the main causes of death among residents in China and seriously threatens people's life and health. According to the 2024 report of the National Health Commission, new stroke patients in 2024 exceeded 3.3 million and existing stroke patients exceed 28 million, of which hemorrhagic stroke (cerebral hemorrhage) accounts for about 20% of all strokes; although its incidence is relatively low, it progresses rapidly and has high mortality and disability rates, so its clinical treatment has always been a difficult problem. For cerebral hemorrhage patients with a large bleeding volume, increased intracranial pressure, or deteriorating neurological function, surgical treatment (hematoma puncture and drainage) should be received as soon as possible, which is of great significance for salvaging residual function and improving the prognosis of patients. Rapid and efficient hematoma removal within the golden time window is a technical problem that neurosurgery urgently needs to overcome at the present stage. With the development of digital technologies such as computer graphics, augmented reality and mixed reality, the fusion of medical images with real space is gradually entering clinical application. Mixed reality navigation, as a newer surgical auxiliary means, superimposes holographic projections of the patient's preoperative CT and MRI three-dimensional images onto the patient's anatomy, so that a doctor can complete "see-through" navigated surgery by observing the specific anatomy of a specific site.
Its application prospects in minimally invasive puncture for cerebral hemorrhage are considerable. At present, mixed reality methods align the preoperative image model with the real space of the patient's head using point-based registration: invasive markers are attached to or placed in the skull, and the spatial coordinates of the corresponding marker points are obtained after multiple head CT scans, thereby achieving marker-point registration. Although this method can meet the navigation requirement and ensure registration accuracy, it has several problems in practice: 1) the greater the number of CT scans, the greater the economic burden on the patient, the greater the radiation exposure, and the greater the likelihood of delaying the surgical timing; 2) implanting invasive markers may cause preoperative complications such as scalp lacerations, skull bleeding or local infection; 3) selecting the point positions is difficult and places demands on the accuracy of the doctor's operation and of the registration process. The current point-based registration method therefore carries technical difficulty and risk and is limited in clinical use, so a safer, faster and non-invasive registration method is needed to enable wide application of mixed reality navigation systems in cerebral hemorrhage surgery.

Disclosure of Invention

The invention aims to provide a non-rigid registration method and system based on a multi-modal facial feature point cloud, which can realize accurate correspondence between the preoperative model and the real space without depending on invasive markers, thereby assisting the navigation and positioning of minimally invasive cerebral hemorrhage surgery.
To achieve this aim, the basic scheme of the invention is a non-rigid registration method based on a multi-modal facial feature point cloud, comprising the following steps: under visible light conditions, collecting a three-dimensional point cloud of the patient's face with a structured light camera as the target point cloud, and scanning a head CT image of the patient; segmenting the facial point cloud from the head CT image, using a U-Net network optimized with an attention mechanism, as the source point cloud; based on the three-dimensional point cloud of the patient's face and the segmented facial point cloud, applying a combined strategy of PCA-CPD rigid coarse registration and non-rigid CPD fine registration, specifically: obtaining rigid initial transformation parameters by principal component analysis (PCA); initializing the rigid CPD algorithm with the rigid initial transformation parameters, performing coarse registration, and aligning the spatial orientations and dimensions of the two face point clouds to obta