CN-116993788-B - Multi-frame agent-based multi-modal medical image deformable registration method
Abstract
The invention relates to a multi-frame agent-based multi-modal medical image deformable registration algorithm: a novel end-to-end multi-modal image registration method based on reinforcement learning, trained under the soft actor-critic (SAC) algorithm, which can imitate the step-by-step registration process of a human expert and improves the accuracy of high-dimensional registration actions. Because the multi-modal environment is extremely complex, pixel-level control in three-dimensional space poses a serious challenge; the invention therefore combines reinforcement learning with a planner network, encouraging the artificial agent to explicitly learn more accurate registration actions from already generated state frames. By exploiting the spatio-temporal dimension, the method overcomes the challenges of the multi-modal, high-dimensional continuous action space while avoiding deep neural networks with huge parameter counts; it has strong robustness and generalization capability and can drive the model to warp the moving image in the correct direction.
Inventors
- Hu Jing
- Zheng Peng
- Shuai Zhikun
- Wu Xi
Assignees
- 成都信息工程大学 (Chengdu University of Information Technology)
Dates
- Publication Date: 20260512
- Application Date: 20230616
Claims (1)
- 1. A multi-frame agent-based multi-modal medical image deformable registration method, characterized in that the registration method provides a multi-frame agent framework for three-dimensional multi-modal deformable image registration based on the soft actor-critic (SAC) algorithm; the framework introduces the concept of a planner into the traditional actor-critic framework, where the planner observes several consecutive states at once and generates a low-dimensional plan, which serves as a template for the actor to generate high-dimensional actions and also participates in the critic's evaluation; the high-dimensional actions are predicted from multi-frame state fusion in a fully unsupervised manner, guiding the model to complete both image generation and policy control; the method specifically comprises the following steps:
  Step 1: prepare the image dataset to be registered, comprising structural images T1w and T2w, where T1w serves as the fixed image and T2w as the moving image;
  Step 2: resample all images to be registered to 128 × 128 and normalize them by min-max scaling; set T1w as the fixed image I_f, T2w as the moving image I_m, and the initial state S_{t=0} as the image pair {I_m, I_f};
  Step 3: create a state queue Q of size 3 that stores the last 3 frames of state [S_{t-2}, S_{t-1}, S_t] and is initialized with three copies of S_{t=0}; construct three networks, namely a planner network, an actor network, and a critic network, specifically: the planner network consists of 5 downsampling modules, each comprising two convolution layers and a residual module (itself composed of two convolution layers), with a LeakyReLU activation applied to the output of every convolution layer; the actor network comprises 5 upsampling modules whose structure mirrors that of the planner network, plus an additional output module that takes the output of the 5th upsampling module as input and generates the deformation field; the critic network consists of 5 downsampling modules, with spectral normalization applied to smooth the network gradient, and an output module introduced after the 5th downsampling module; this output module consists of a convolution layer, a LeakyReLU activation layer, and a linear layer, where the tensor output by the convolution layer is flattened before being fed into the linear layer;
  Step 4: input the state queue Q created in Step 3 into the planner network; the planner simultaneously observes the three latest states [S_{t-2}, S_{t-1}, S_t] and downsamples them to the corresponding low-dimensional representations [Z_{t-2}, Z_{t-1}, Z_t]; spatio-temporal information is approximated by the offsets of the low-dimensional representations Z_t, namely [Z_t - Z_{t-1}, Z_{t-1} - Z_{t-2}]; a mean and a variance are computed from the downsampled representations and the offsets to fit a Gaussian distribution, from which a latent plan P_t is randomly sampled, so that the latent plan P_t explicitly accounts for the spatio-temporal characteristics of the registration process;
  Step 5: input the latent plan P_t into the actor network; the feature maps output by each downsampling module of the planner network are concatenated channel-wise into the actor network via skip connections, and a fine high-dimensional deformation field is reconstructed from the planner's detail features together with the low-dimensional latent plan; the deformation field is applied to the moving image I_m through a spatial transformer network to deformably warp it into a predicted image; the mutual-information loss between the predicted image and the fixed image I_f, together with a spatial smoothness term on the deformation field, serves as the unsupervised registration loss; the similarity measures of the image pair {I_m, I_f} and the warped image pair are computed in the deformable registration environment through an unsupervised modality-independent neighbourhood descriptor and taken as the unsupervised feedback reward;
  Step 6: input the current state S_t and the latent plan P_t into the critic network; the critic captures and analyses the abstract features of the current state S_t, concatenates these features with the latent plan P_t along the channel dimension to form fused features, analyses the fused features again through its output module, and outputs the maximum expected reward for executing the decision action in the current state as the reinforcement learning loss; the state and the state queue Q in the registration environment are then updated with the warped image pair;
  Step 7: repeat the above actions until the warped moving image I_m and the fixed image I_f reach an accurately registered state, i.e. when the modality-independent neighbourhood descriptor value falls below the set threshold of 0.006, ending this round of registration;
  Step 8: finally obtain an accurately aligned registered image.
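The multi-frame plan sampling in Steps 3–4 can be illustrated with a minimal NumPy sketch. This is not the patented planner network: `encode` is a hypothetical stand-in (a fixed random projection) for the 5 downsampling modules, and the diagonal Gaussian fit over the concatenated latents and offsets is a simplification of the claimed mean/variance computation.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def encode(state, dim=16):
    # Hypothetical stand-in for the planner's 5 downsampling modules:
    # a fixed random projection of the flattened state to a
    # low-dimensional representation Z.
    flat = state.reshape(-1)
    proj = np.random.default_rng(42).standard_normal((dim, flat.size))
    return proj @ flat / np.sqrt(flat.size)

def sample_plan(queue):
    # Observe the three latest states [S_{t-2}, S_{t-1}, S_t] at once.
    z2, z1, z0 = (encode(s) for s in queue)  # oldest ... newest
    # Approximate spatio-temporal information by the latent offsets
    # [Z_t - Z_{t-1}, Z_{t-1} - Z_{t-2}].
    offsets = [z0 - z1, z1 - z2]
    feats = np.concatenate([z2, z1, z0] + offsets)
    # Fit a (here: scalar) Gaussian from the representations and offsets
    # and randomly sample the latent plan P_t from it.
    mu, sigma = feats.mean(), feats.std() + 1e-6
    return rng.normal(mu, sigma, size=z0.shape)

# State queue Q of size 3, initialized with 3 copies of S_{t=0} = {I_m, I_f}.
S0 = np.zeros((2, 128, 128), dtype=np.float32)
Q = deque([S0, S0, S0], maxlen=3)
P_t = sample_plan(Q)
print(P_t.shape)  # (16,)
```

Appending each new warped state to `Q` (the `deque` drops the oldest frame automatically) reproduces the rolling three-frame window the planner observes at every step.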
Description
Multi-frame agent-based multi-modal medical image deformable registration method
Technical Field
The invention relates to the field of image processing, in particular to a multi-frame agent-based multi-modal medical image deformable registration method.
Background
In the medical imaging field, images of different modalities contain different anatomical information. For example, T1-weighted magnetic resonance (MR) imaging reflects anatomy in a way that matches human intuition, while T2-weighted MR imaging is influenced by the water content of lesion tissue and presents locally bright features, making it convenient and effective to distinguish normal tissue from lesions. Stable multi-modal image registration can fuse information from different modalities so that their advantages complement each other, helping doctors complete disease diagnosis accurately. Medical image registration is the process of mapping an image pair into the same spatial coordinate system by finding the spatial correspondence between the two images. Existing medical image registration methods include traditional feature-based methods and learning-based methods. Traditional registration methods rely on manually extracted features to compute a similarity measure for the image pair, but when facing complex multi-modal images it is difficult to extract effective features. Learning-based methods can automatically capture high-dimensional abstract features, which strengthens the robustness of the registration algorithm; however, such methods typically complete registration in a single one-shot alignment and struggle with images that have large deformations or displacements.
Conventional deformable image registration is formulated as an optimization procedure that promotes alignment by maximizing the similarity measure between the moving and fixed images while penalizing non-smooth distortions of the deformation field. However, this approach is generally computationally intensive and time-consuming, and conventional registration methods therefore often have difficulty obtaining good image features to achieve accurate global alignment. In recent years, the continued development of deep learning has driven advances in the registration field. Deep learning (DL) based methods typically use convolutional neural networks to automatically capture high-dimensional abstract features from the input image pair, overcoming the obstacle of manual feature extraction and further improving registration accuracy; but to further improve the capture of abstract features, deep network structures are often used, which undoubtedly increases the computational overhead. Such methods generally complete registration in a single one-shot alignment, struggle with images with large deformation or displacement, and are computationally complex. Deep reinforcement learning (DRL) based methods treat the registration process as a Markov decision process, so that a DRL agent can explore freely within a predefined action space, accumulate experience through continual trial and error, and finally make decisions quickly for a specific environment to complete high-precision registration; each registration step can thus be presented visually, overcoming the opacity of deep learning methods.
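The optimization described above can be written compactly as the standard deformable-registration objective (the symbols and the smoothness weight \(\lambda\) are ours, not taken from the patent):

\[
\hat{\phi} \;=\; \arg\min_{\phi}\; -\,\mathrm{Sim}\!\left(I_f,\; I_m \circ \phi\right) \;+\; \lambda\,\left\lVert \nabla \phi \right\rVert_2^2 ,
\]

where \(I_f\) is the fixed image, \(I_m \circ \phi\) is the moving image warped by the deformation field \(\phi\), \(\mathrm{Sim}(\cdot,\cdot)\) is the similarity measure being maximized, and the gradient-norm term penalizes non-smooth deformations.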
However, due to the hard-to-train nature and multi-modal complexity of DRL, together with the huge number of parameters, complex spatial mapping relationships, and high-dimensional continuous action space of deformable image registration, applying DRL to deformable image registration remains highly challenging. The drawbacks of the prior art (particularly relevant to the invention) are as follows: existing deep learning methods usually complete registration in a single one-shot alignment, struggle with images with large deformation or displacement, and produce results with poor interpretability; extracting high-dimensional abstract features usually requires deeper and more complex network structures, which increases training instability and places higher demands on computational performance. Because of the hard-to-train nature of reinforcement learning and the complexity of the multi-modal registration environment, existing methods generally handle only multi-modal rigid transformations, and designers of reinforcement learning frameworks tend to ignore the extraction of temporal features. Most deep reinforcement learning methods also need huge memory to store historical experience, so simplifying the registration model and improving registration efficiency becomes a problem to be solved.