CN-116563167-B - Face image reconstruction method, system, device and medium based on self-adaptive texture and frequency domain perception

Abstract

The invention discloses a face image reconstruction method, system, device and medium based on self-adaptive texture and frequency domain perception. A self-adaptive texture perception module performs self-attention calculation on the coarse features, uses the resulting attention matrix to locate the strongly correlated parts of the fine features, and performs fine self-attention calculation on them. A multi-dimensional perception module enhances the expressive capability of the features across space and across channels. In addition, a multi-frequency fusion module based on the wavelet transform fuses the mid- and low-frequency features from the encoder with the high-frequency features from the decoder. The self-adaptive texture perception module enables the model to restore complex regions of the image more finely; the multi-dimensional perception module strengthens important channel information in the features; and the multi-frequency fusion module removes high-frequency noise from the original features while fusing the effective mid- and low-frequency features with the restored high-frequency details. Together, these improve the restoration capability and generalization capability of the model on face images.
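
The coarse-to-fine search described above can be illustrated with a short Python sketch. It is a minimal sketch under stated assumptions, not the patented implementation: the function name `select_fine_regions`, the parameters `coarse_w`, `scale` and `k`, the scoring of each coarse position by the total attention it receives, and the mapping of each coarse position to a square block of fine coordinates are all illustrative choices that do not come from the abstract.

```python
def select_fine_regions(attn, coarse_w, scale, k):
    """Pick the k coarse positions that receive the most attention and
    return the fine-grid coordinates that each of them covers.

    attn     -- square attention matrix, attn[q][j] = weight of key j for query q
    coarse_w -- width of the coarse feature grid (hypothetical parameter)
    scale    -- assumed ratio between fine and coarse resolution
    k        -- number of regions to keep
    """
    n = len(attn)
    # Score each key position by the attention it receives from all queries.
    scores = [sum(attn[q][j] for q in range(n)) for j in range(n)]
    # Indices of the k highest-scoring coarse positions, best first.
    top = sorted(range(n), key=lambda j: scores[j], reverse=True)[:k]
    regions = {}
    for j in top:
        row, col = divmod(j, coarse_w)
        # Assumption: one coarse cell corresponds to a scale x scale fine block.
        regions[j] = [(row * scale + dy, col * scale + dx)
                      for dy in range(scale) for dx in range(scale)]
    return top, regions
```

In a full model, fine self-attention would then be computed only over the returned coordinates, which is what keeps the fine branch cheaper than attending over the whole fine feature map.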

Inventors

SHI JINGANG
LI GUANXIN
WANG JIAYIN
LU LEI
WANG PING

Assignees

西安交通大学

Dates

Publication Date: 20260512
Application Date: 20230526

Claims (9)

1. A face image reconstruction method based on self-adaptive texture and frequency domain perception, characterized by comprising the following steps: obtaining a cropped data set of face images to be reconstructed and dividing it into a training set and a test set; constructing a self-adaptive texture perception module that performs fine feature recovery on complex texture regions of the face; enhancing the features from the cross-space and cross-channel dimensions with a multi-dimensional perception module formed by alternating depthwise convolution and channel attention, to obtain features with multi-dimensional perception; designing a multi-frequency fusion module that extracts the mid- and low-frequency information of the features from the encoder stage based on the wavelet transform and uses the inverse wavelet transform to fuse this information with the high-frequency information of the decoder stage, to obtain fused output features with full frequency domain information; stacking the self-adaptive texture perception module, the multi-dimensional perception module and the multi-frequency fusion module in a U-shaped structure to obtain a face image super-resolution reconstruction model based on self-adaptive texture and frequency domain perception; training the face image super-resolution reconstruction model on the training set to obtain an optimized model; and performing super-resolution reconstruction of the face images in the test set with the optimized model to obtain restored high-resolution face images; wherein the multi-dimensional perception module consists of two branches of alternating depthwise convolution and channel attention, the depthwise convolution aggregates and fuses the features in space, the channel attention evaluates the weight of each channel through a gating mechanism and multiplies each channel feature by its corresponding weight, and the alternating depthwise convolution and channel attention transfer important information across the multiple dimensions.
2. The face image reconstruction method based on self-adaptive texture and frequency domain perception according to claim 1, wherein the training set and the test set are divided either randomly or at a preset ratio, the preset ratio being set manually.
3. The face image reconstruction method based on self-adaptive texture and frequency domain perception according to claim 1, wherein constructing the self-adaptive texture perception module to perform fine feature recovery on complex texture regions of the face specifically comprises: searching the coarse textures for regions with complex texture and performing fine self-attention calculation on those regions, so as to achieve a texture-aware recovery effect; the self-adaptive texture perception module comprises a self-attention branch for coarse features and a self-attention branch for fine features. The self-attention branch for coarse features is calculated by a formula [formula not reproduced] whose symbols denote, respectively: the input feature map; the downsampling and linear projection operations; the activation function; and the texture self-attention search map. The regions with the largest attention values are then taken from the search map and mapped into the fine feature map, and the fine features of the corresponding regions are retrieved to calculate multi-head self-attention by a formula [formula not reproduced] whose symbols denote, respectively: the retrieved fine features; accumulation along the penultimate dimension; selection of the largest values; retrieval of the corresponding regions using the coordinates of those maximum values; the number of regions to be searched; the height and width of the input features; and the multi-head self-attention calculation. The calculation results of the two branches are then combined by a formula [formula not reproduced] in which the remaining symbol denotes an upsampling operation.
4. The face image reconstruction method based on self-adaptive texture and frequency domain perception according to claim 1, wherein extracting the mid- and low-frequency information of the features from the encoder stage based on the wavelet transform, and using the inverse wavelet transform to fuse this information with the high-frequency information of the decoder stage to obtain fused output features with full frequency domain information, comprises: the multi-frequency fusion module applies the wavelet transform to the output features of each encoder stage to extract the corresponding mid- and low-frequency information, and applies the inverse wavelet transform to this information together with the output features of each decoder stage to obtain the fused output features with full frequency domain information. The process is expressed by a formula [formula not reproduced] whose symbols denote, respectively: the output features of the i-th stage encoder, the output features of the decoder, and the output features of the multi-frequency fusion module; the wavelet transform and the inverse wavelet transform; the three mid- and low-frequency features extracted by the wavelet transform; and a 1 × 1 convolution operation.
5. The face image reconstruction method based on self-adaptive texture and frequency domain perception according to claim 1, wherein stacking the self-adaptive texture perception module, the multi-dimensional perception module and the multi-frequency fusion module in the U-shaped structure to obtain the face image super-resolution reconstruction model specifically comprises: the encoder and the decoder are each formed by stacking 4 self-adaptive texture perception modules, 1 self-adaptive texture perception module with a residual connection joins the encoder and the decoder, and the multi-frequency fusion module connects the output features of the encoder and the decoder at the same level.
6. The face image reconstruction method based on self-adaptive texture and frequency domain perception according to claim 1, wherein training the face image super-resolution reconstruction model on the training set to obtain the optimized model specifically comprises: during training, judging whether the loss function of the model is lower than a set threshold, or whether the number of training cycles of the model has reached a maximum value; if the loss function is lower than the set threshold or the number of cycles has reached the maximum value, training is stopped and the optimized model is obtained.
7. A face image reconstruction system based on self-adaptive texture and frequency domain perception, characterized by comprising: a dividing module, which obtains a cropped data set of face images to be reconstructed and divides it into a training set and a test set; a first construction module, which performs fine feature recovery on complex texture regions of the face; a multi-dimensional perception module, which enhances the features from the cross-space and cross-channel dimensions to obtain features with multi-dimensional perception; a multi-frequency fusion module, which extracts the mid- and low-frequency information of the features from the encoder stage based on the wavelet transform and uses the inverse wavelet transform to fuse this information with the high-frequency information of the decoder stage, to obtain fused output features with full frequency domain information; a second construction module, which stacks the self-adaptive texture perception module, the multi-dimensional perception module and the multi-frequency fusion module in a U-shaped structure to obtain a face image super-resolution reconstruction model based on self-adaptive texture and frequency domain perception; a training module, which trains the face image super-resolution reconstruction model on the training set to obtain an optimized model; and a reconstruction module, which performs super-resolution reconstruction of the face images in the test set with the optimized model to obtain restored high-resolution face images; wherein the multi-dimensional perception module enhances the features from the cross-space and cross-channel dimensions to obtain features with multi-dimensional perception, specifically by gathering important information from the spatial dimension and the channel dimension; the multi-dimensional perception module consists of two branches of alternating depthwise convolution and channel attention, the depthwise convolution aggregates and fuses the features in space, the channel attention evaluates the weight of each channel through a gating mechanism and multiplies each channel feature by its corresponding weight, and the alternating depthwise convolution and channel attention transfer important information across the multiple dimensions.
8. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1-6 when the computer program is executed.
9. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any of claims 1-6.
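
The wavelet-based fusion recited in claim 4 can be illustrated with a small Python sketch. It is only a sketch under several assumptions that the claims do not state: a one-level Haar wavelet is used, the three retained mid- and low-frequency features are taken to be the LL, LH and HL sub-bands, the decoder feature is assumed to have the sub-band resolution and is substituted for the HH sub-band, features are single-channel 2D lists, and the 1 × 1 convolution is omitted. The names `haar_dwt2`, `haar_idwt2` and `fuse` are hypothetical.

```python
def haar_dwt2(x):
    """One-level 2D Haar transform of a 2D list with even height and width."""
    h, w = len(x) // 2, len(x[0]) // 2
    ll, lh, hl, hh = ([[0.0] * w for _ in range(h)] for _ in range(4))
    for i in range(h):
        for j in range(w):
            a, b = x[2 * i][2 * j], x[2 * i][2 * j + 1]
            c, d = x[2 * i + 1][2 * j], x[2 * i + 1][2 * j + 1]
            ll[i][j] = (a + b + c + d) / 2  # local average (low frequency)
            lh[i][j] = (a - b + c - d) / 2  # difference between columns
            hl[i][j] = (a + b - c - d) / 2  # difference between rows
            hh[i][j] = (a - b - c + d) / 2  # diagonal detail (high frequency)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuild the full-resolution 2D list."""
    h, w = len(ll), len(ll[0])
    x = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            p, q, r, s = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            x[2 * i][2 * j] = (p + q + r + s) / 2
            x[2 * i][2 * j + 1] = (p - q + r - s) / 2
            x[2 * i + 1][2 * j] = (p + q - r - s) / 2
            x[2 * i + 1][2 * j + 1] = (p - q - r + s) / 2
    return x


def fuse(encoder_feat, decoder_high):
    """Keep the encoder's LL/LH/HL bands and discard its HH band, then
    invert the transform with the decoder feature in the HH slot (assumption)."""
    ll, lh, hl, _ = haar_dwt2(encoder_feat)
    return haar_idwt2(ll, lh, hl, decoder_high)
```

Discarding the encoder's own HH band is one way to read the abstract's statement that the module removes high-frequency noise from the original features; a production model would more likely use a tensor library and a wavelet package rather than nested lists.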

Description

Face image reconstruction method, system, device and medium based on self-adaptive texture and frequency domain perception

Technical Field

The invention belongs to the technical field of artificial intelligence and deep learning, and relates to a face image reconstruction method, system, device and medium based on self-adaptive texture and frequency domain perception.

Background

A deep convolutional neural network is a neural network used for image processing, computer vision and pattern recognition. It is a feedforward neural network that extracts image features using convolution layers, pooling layers and related techniques, thereby accomplishing tasks such as image classification, object detection and face recognition.

Super-resolution reconstruction of face images is an image processing technique that converts low-resolution images into high-resolution images. In applications such as face recognition and video surveillance, a low-resolution image must be converted into a high-resolution image to obtain better image quality and higher accuracy.

Traditional face image super-resolution reconstruction methods mainly comprise interpolation methods and edge-based methods. Interpolation obtains a high-resolution image by interpolating the pixels of a low-resolution image, but it blurs and distorts the image. Edge-based methods reconstruct the low-resolution image from the structural information of the edges in the face image, but they have limitations in some complex cases.

In recent years, face image super-resolution reconstruction methods based on deep learning have become the mainstream. Among them, methods using deep convolutional neural networks are widely used.
The deep convolutional neural network can extract high-level features from a low-resolution image: by stacking a plurality of convolutional blocks with different functions, it extracts shallow-to-deep features from the original image and converts the low-resolution image into a high-resolution image according to the extracted features. However, in some complex cases the convolutional neural network is limited by the size of the convolution kernel and cannot build dependencies on global features well, so the network cannot accurately identify and reconstruct the details and textures of an image, which causes image distortion and blurring.

Owing to the effectiveness of the Transformer model in the field of natural language processing, it has recently been applied to the field of computer vision as well. Compared with the traditional deep convolutional neural network, a Transformer-based model has advantages such as long-range dependence modeling capability, global perception capability, scalability and independence from spatial position; it obtains better performance in many computer vision tasks and brings new possibilities for research and application in the field of computer vision. The Transformer model divides the image into equally sized blocks and then builds global dependencies through a self-attention mechanism, but this makes the model computationally expensive. The Swin Transformer model calculates self-attention within local windows and lets adjacent windows exchange information by shifting the windows, and it achieves excellent results in many computer vision tasks.
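
The cost difference between global and windowed self-attention can be made concrete by counting the query-key pairs each scheme evaluates. The Python sketch below is illustrative only: the function names and the example sizes are assumptions, and it counts attention scores rather than measuring the cost of any particular model.

```python
def global_attention_pairs(height, width):
    """Number of query-key scores when every token attends to every token."""
    tokens = height * width
    return tokens * tokens


def window_attention_pairs(height, width, window):
    """Number of query-key scores when attention is restricted to
    non-overlapping window x window blocks of tokens."""
    if height % window or width % window:
        raise ValueError("height and width must be multiples of window")
    num_windows = (height // window) * (width // window)
    tokens_per_window = window * window
    return num_windows * tokens_per_window * tokens_per_window


# Example: a 64 x 64 token grid with 8 x 8 windows (sizes chosen for illustration).
full = global_attention_pairs(64, 64)
local = window_attention_pairs(64, 64, 8)
print(full, local, full // local)
```

For a grid of N tokens and windows of M × M tokens, global attention evaluates N² scores while windowed attention evaluates N·M², so the saving grows with the image size. The price is that tokens in different windows do not interact directly.
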
However, when applied to face super-resolution reconstruction, the Swin Transformer model cannot model the non-local dependencies in a face image well, and its fixed rectangular windows may contain textures that have no relevance to one another, which produces artifacts in the reconstruction result and degrades the recovery of complex regions of the face.

Disclosure of Invention

The invention aims to solve the problems in the prior art that the Transformer model is computationally expensive, that the Swin Transformer model cannot model non-local dependencies in a face image well, and that a fixed rectangular window may contain textures with no relevance to one another, and provides a face image reconstruction method, system, device and medium based on self-adaptive texture and frequency domain perception.

To achieve this purpose, the invention adopts the following technical scheme:

A face image reconstruction method based on self-adaptive texture and frequency domain perception comprises the following steps: obtaining a cropped data set of face images to be reconstructed and dividing it into a training set and a test set; constructing a self-adaptive texture perception module that performs fine feature recovery on complex texture regions of the face; based on a multi-dimensional perception module formed by alternating depthwise convolution and channel attention, features are enhanced from the dimensions of cross spa