CN-121563848-B - Image restoration method and device, electronic equipment and storage medium
Abstract
The embodiments of the present application provide an image restoration method, an image restoration device, an electronic device and a storage medium, relating to the technical field of image processing. The method comprises: acquiring a target image to be repaired; invoking a pre-trained initial repair network to perform degradation-removal processing on the target image to obtain a degradation-removal result of the target image; and performing repair optimization on the degradation-removal result according to a preset repair optimization mode to obtain an image repair result of the target image. The preset repair optimization mode comprises: processing the degradation-removal result based on a diffusion network, and performing fusion processing through a pre-trained image fusion network by taking, as contents to be fused, a prediction result for a first task and a prediction result for a second task, wherein the first task is used for outputting an optimized prediction result, and the second task is used for outputting a prediction result carrying identity information. The scheme can improve image quality while preserving identity information during image restoration.
Inventors
- LIU LIANG
- YANG YUE
- YAN CONGQUAN
- YANG PENGJU
- XIE DI
Assignees
- HANGZHOU HIKVISION DIGITAL TECHNOLOGY CO., LTD.
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2026-01-21
Claims (16)
- 1. A method of image restoration, the method comprising: acquiring a target image to be repaired; invoking a pre-trained initial repair network to perform degradation-removal processing on the target image to obtain a degradation-removal result of the target image; and performing repair optimization on the degradation-removal result according to a preset repair optimization mode to obtain an image repair result of the target image; wherein the preset repair optimization mode comprises: processing the degradation-removal result based on a pre-trained diffusion network for executing a first task and a second task, and performing fusion processing through a pre-trained image fusion network by taking, as contents to be fused, a prediction result for the first task and a prediction result for the second task obtained when the diffusion network processes the degradation-removal result of the target image; the first task is used for performing quality optimization on the image input into the diffusion network so as to output an optimized prediction result, and the second task is used for outputting a prediction result carrying the identity information in the input image; wherein the training mode of the diffusion network comprises: processing a second sample image through the trained initial repair network to obtain a degradation-removal result of the second sample image; performing, through the diffusion network and according to a preset processing mode, prediction processing for the first task and prediction processing for the second task on the degradation-removal result of the second sample image, so as to obtain a prediction result for the first task and a prediction result for the second task at each time step; calculating a loss of the diffusion network for the first task based on the prediction result for the first task at each time step and the truth image of the second sample image, and calculating a loss of the diffusion network for the second task based on the prediction result for the second task at each time step and the truth image of the second sample image; and judging whether the diffusion network has converged based on the loss of the diffusion network for the first task and the loss of the diffusion network for the second task, and if not, adjusting network parameters of the diffusion network; wherein the preset processing mode comprises: for each time step, constructing the input content required by the first task based on an intermediate result of the time step, learnable parameters set for the diffusion network, the degradation-removal result of the second sample image and a set second reference prompt, and inputting the constructed input content into the diffusion network to execute the first task, wherein the second reference prompt is used for enabling the diffusion network to understand how finely the initial repair network has removed the degradation of the second sample image; and constructing the input content required by the second task for the time step based on the intermediate result of the time step, the learnable parameters set for the diffusion network, a depth map of the degradation-removal result of the second sample image, and the identity feature and face description information of the object in the degradation-removal result of the second sample image, and inputting the constructed input content into the diffusion network to execute the second task.
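As a rough illustration only (not the claimed implementation), the per-time-step input construction for the two tasks in claim 1 can be sketched as simple feature concatenation. All dimensions, array contents, and the function name `build_task_inputs` are invented for the example:

```python
import numpy as np

D = 8  # illustrative feature dimension, an assumption for this sketch

def build_task_inputs(x_t, learnable, dedeg, prompt_emb, depth, identity, face_desc):
    """Concatenate the conditioning signals for the two diffusion tasks."""
    # First task: intermediate result + learnable parameters +
    # degradation-removal result + second reference prompt.
    task1_input = np.concatenate([x_t, learnable, dedeg, prompt_emb])
    # Second task: intermediate result + learnable parameters + depth map of
    # the degradation-removal result + identity feature + face description.
    task2_input = np.concatenate([x_t, learnable, depth, identity, face_desc])
    return task1_input, task2_input

x_t = np.zeros(D); learnable = np.ones(D); dedeg = np.zeros(D)
prompt = np.ones(D); depth = np.zeros(D); ident = np.ones(D); desc = np.zeros(D)
t1, t2 = build_task_inputs(x_t, learnable, dedeg, prompt, depth, ident, desc)
print(t1.shape, t2.shape)  # (32,) (40,)
```

The point of the sketch is only that the first-task input and the second-task input share the intermediate result and learnable parameters but differ in their remaining conditioning signals.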
- 2. The method according to claim 1, wherein performing repair optimization on the degradation-removal result according to the preset repair optimization mode to obtain the image repair result of the target image comprises: taking the first time step as the current time step, constructing the input content for the first task and the input content for the second task of the current time step of the diffusion network based on an intermediate result of the current time step and the degradation-removal result of the target image, and inputting the constructed input content into the diffusion network to obtain a prediction result for the first task and a prediction result for the second task at the current time step; invoking the image fusion network based on the obtained prediction result for the first task and prediction result for the second task at the current time step, so that the image fusion network performs result fusion processing on the two prediction results to obtain a fusion result of the current time step; judging whether the current time step is the last time step, and if so, obtaining the image repair result of the target image based on the fusion result of the current time step; and if not, calculating the intermediate result of the next time step based on the obtained fusion result of the current time step, taking the next time step as the new current time step, and returning to the step of invoking the image fusion network based on the obtained prediction result for the first task and prediction result for the second task at the current time step.
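A minimal toy sketch of the iterative loop in claim 2, with scalar stand-ins for images; `diffusion_step`, `fuse`, and the fixed blending weight are placeholders invented for the illustration, not the patent's networks:

```python
def diffusion_step(x_t, dedeg, t):
    # Stand-in for the two-task diffusion prediction at time step t.
    pred_task1 = 0.5 * (x_t + dedeg)   # "quality-optimized" prediction
    pred_task2 = dedeg                 # "identity-preserving" prediction
    return pred_task1, pred_task2

def fuse(p1, p2, w=0.7):
    # Stand-in for the image fusion network: a fixed weighted blend.
    return w * p1 + (1.0 - w) * p2

def repair_optimize(dedeg, num_steps=4, x0=0.0):
    x_t = x0
    for t in range(num_steps):
        p1, p2 = diffusion_step(x_t, dedeg, t)
        fused = fuse(p1, p2)
        if t == num_steps - 1:
            return fused               # last time step: yield the repair result
        x_t = fused                    # otherwise feed the next time step

result = repair_optimize(dedeg=1.0)
```

With these toy choices the loop converges toward the degradation-removal value, showing only the control flow of the claim: predict, fuse, and either stop or carry the fusion result into the next time step.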
- 3. The method according to claim 1 or 2, wherein the initial repair network is trained based on a first sample image and a truth image of the first sample image, the truth image of any sample image being an image representing the expected repair result of that sample image; the first sample image is an image obtained by performing image degradation processing on a preset initial image, and the truth image of the first sample image is an image obtained by low-pass filtering the image spectrum of the initial image according to a target frequency-domain threshold set for the first sample image and converting the filtered spectrum back into an image; correspondingly, invoking the pre-trained initial repair network to perform degradation-removal processing on the target image to obtain the degradation-removal result of the target image comprises: invoking the initial repair network based on a specified frequency-domain threshold, so as to perform degradation-removal processing on the target image based on the specified frequency-domain threshold to obtain the degradation-removal result of the target image.
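The truth-image construction in claim 3 (low-pass filtering the spectrum at a frequency-domain threshold, then converting back) can be sketched with a plain FFT; the normalized radial mask and the function name `lowpass_truth` are assumptions of this example, not the patent's exact filter:

```python
import numpy as np

def lowpass_truth(image, freq_thresh):
    """Zero spectrum components above a normalized radial frequency threshold."""
    spec = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = h // 2, w // 2
    # Normalized distance from the spectrum center (DC component).
    radius = np.sqrt(((yy - cy) / (h / 2)) ** 2 + ((xx - cx) / (w / 2)) ** 2)
    spec[radius > freq_thresh] = 0.0          # discard high frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(spec)))

img = np.ones((16, 16))
out = lowpass_truth(img, freq_thresh=0.5)     # constant image passes unchanged
```

A larger threshold keeps more of the spectrum, so the retained image energy grows with the threshold, which matches the claim's idea of the threshold controlling how much detail the truth image preserves.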
- 4. The method according to claim 3, wherein the training mode of the initial repair network comprises: inputting the vector features of the first sample image and the target frequency-domain threshold into the initial repair network, so that the initial repair network takes the vector feature of the target frequency-domain threshold as a first reference prompt and performs degradation-removal processing on the first sample image to obtain a degradation-removal result of the first sample image; calculating a loss of the initial repair network based on the degradation-removal result of the first sample image and the truth image of the first sample image; and, in response to the loss of the initial repair network indicating that the initial repair network has not converged, adjusting network parameters of the initial repair network; correspondingly, invoking the initial repair network based on the specified frequency-domain threshold to perform degradation-removal processing on the target image based on the specified frequency-domain threshold to obtain the degradation-removal result of the target image comprises: inputting the vector features of the target image and the specified frequency-domain threshold into the initial repair network, so that the initial repair network takes the vector feature of the specified frequency-domain threshold as the first reference prompt and performs degradation-removal processing on the target image to obtain the degradation-removal result of the target image.
- 5. The method according to claim 4, wherein the initial repair network comprises a generator and a discriminator; inputting the vector features of the first sample image and the target frequency-domain threshold into the initial repair network, so that the initial repair network takes the vector feature of the target frequency-domain threshold as the first reference prompt and performs degradation-removal processing on the first sample image to obtain the degradation-removal result of the first sample image, comprises: inputting the vector features of the first sample image and the target frequency-domain threshold into the generator in the initial repair network, so that the generator takes the vector feature of the target frequency-domain threshold as the first reference prompt and performs degradation-removal processing on the first sample image to obtain the degradation-removal result of the first sample image; the training mode of the initial repair network further comprises: constructing input content of the discriminator in the initial repair network based on the degradation-removal result of the first sample image and the truth image of the first sample image; and inputting the constructed input content into the discriminator, so that the discriminator performs discrimination processing on the received input content to obtain a discrimination result of the first sample image; calculating the loss of the initial repair network based on the degradation-removal result of the first sample image and the truth image of the first sample image comprises: performing loss calculation according to the difference between the degradation-removal result of the first sample image and the truth image of the first sample image to obtain a first type of loss; performing loss calculation based on the discrimination result of the first sample image to obtain a second type of loss; and determining the loss of the initial repair network based on the first type of loss and the second type of loss.
- 6. The method according to claim 5, wherein constructing the input content of the discriminator in the initial repair network based on the degradation-removal result of the first sample image and the truth image of the first sample image comprises: processing the target frequency-domain threshold according to learnable parameters set for the discriminator to obtain a prior feature corresponding to the target frequency-domain threshold; splicing the prior feature corresponding to the target frequency-domain threshold with the facial-region feature of the degradation-removal result of the first sample image to obtain a first spliced feature corresponding to the first sample image, and splicing the prior feature corresponding to the target frequency-domain threshold with the facial-region feature of the truth image of the first sample image to obtain a second spliced feature corresponding to the first sample image; inputting the constructed input content into the discriminator so that the discriminator performs discrimination processing on the received input content to obtain the discrimination result of the first sample image comprises: inputting the first spliced feature corresponding to the first sample image into the discriminator, so that the discriminator performs discrimination processing on the received first spliced feature to obtain a discrimination result of the first sample image for the first spliced feature; and inputting the second spliced feature corresponding to the first sample image into the discriminator, so that the discriminator performs discrimination processing on the received second spliced feature to obtain a discrimination result of the first sample image for the second spliced feature; performing loss calculation based on the discrimination result of the first sample image to obtain the second type of loss comprises: performing loss calculation based on the discrimination result of the first sample image for the first spliced feature and the discrimination result of the first sample image for the second spliced feature to obtain the second type of loss.
- 7. The method according to claim 6, wherein performing loss calculation based on the discrimination result of the first sample image for the first spliced feature and the discrimination result of the first sample image for the second spliced feature to obtain the second type of loss comprises: extracting, from the discrimination result of the first sample image for the first spliced feature, the result content for the prior feature corresponding to the target frequency-domain threshold, and scoring based on the extracted result content to obtain a first discrimination score of the first sample image; extracting, from the discrimination result of the first sample image for the second spliced feature, the result content for the prior feature corresponding to the target frequency-domain threshold, and scoring based on the extracted result content to obtain a second discrimination score of the first sample image; determining an adversarial loss based on the first discrimination score of the first sample image; and determining a discriminator loss based on the first discrimination score and the second discrimination score of the first sample image.
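Claim 7 reduces the two discrimination results to two scores and derives an adversarial loss and a discriminator loss from them. A common formulation for this pair is the hinge GAN loss, used here purely as an assumed illustration (the patent does not specify the loss form), with the scoring step abstracted into plain floats:

```python
def adversarial_loss(fake_score):
    # Generator side: push the discriminator's score on the
    # degradation-removal result (the "fake") upward.
    return -fake_score

def discriminator_loss(fake_score, real_score):
    # Discriminator side: hinge loss over the truth-image (real)
    # score and the degradation-removal-result (fake) score.
    return max(0.0, 1.0 - real_score) + max(0.0, 1.0 + fake_score)

g_loss = adversarial_loss(-0.5)          # first discrimination score only
d_loss = discriminator_loss(-0.5, 2.0)   # both discrimination scores
```

This matches the claim's asymmetry: the adversarial loss depends only on the first discrimination score, while the discriminator loss depends on both.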
- 8. The method according to claim 1, wherein calculating the loss of the diffusion network for the first task based on the prediction result for the first task at each time step and the truth image of the second sample image comprises: for each time step, calculating the loss of the time step for the first task according to the prediction result of the time step for the first task, the intermediate result of the time step, the time parameter of the time step and the latent-space features of the truth image of the second sample image; and determining the loss of the diffusion network for the first task based on the losses of the time steps for the first task; and/or, calculating the loss of the diffusion network for the second task based on the prediction result for the second task at each time step and the truth image of the second sample image comprises: for each time step, calculating the loss of the time step for the second task according to the prediction result of the time step for the second task, the intermediate result of the time step, the time parameter of the time step and the latent-space features of a reference image corresponding to the second sample image; and determining the loss of the diffusion network for the second task based on the losses of the time steps for the second task; wherein the reference image corresponding to the second sample image is obtained as follows: for each frequency-domain threshold among multiple set frequency-domain thresholds, inputting the vector features of the second sample image and the frequency-domain threshold into the trained initial repair network to obtain a degradation-removal result under that frequency-domain threshold; and selecting, from the degradation-removal results under the respective frequency-domain thresholds, an image satisfying a similarity condition with the degradation-removal result of the second sample image and the truth image of the second sample image, thereby obtaining the reference image corresponding to the second sample image.
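The per-time-step loss accumulation in claim 8 can be sketched as follows; the comparison in latent space is reduced here to a plain MSE against a toy "latent feature" vector, and the latent encoder, intermediate result and time parameter are abstracted away, all as assumptions of this illustration:

```python
import numpy as np

def step_loss(pred, target_latent):
    # Per-time-step loss: distance between the step's prediction and the
    # latent-space features of the supervising image (truth or reference).
    return float(np.mean((pred - target_latent) ** 2))

def task_loss(preds_per_step, target_latent):
    # Task loss determined from the losses of all time steps (here: mean).
    losses = [step_loss(p, target_latent) for p in preds_per_step]
    return sum(losses) / len(losses)

truth_latent = np.zeros(4)
preds = [np.full(4, 1.0), np.full(4, 0.5)]   # predictions at two time steps
loss_task1 = task_loss(preds, truth_latent)  # (1.0 + 0.25) / 2
```

For the second task the same computation would run against the latent features of the reference image instead of the truth image, mirroring the claim's parallel structure.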
- 9. The method according to claim 1, wherein the diffusion network comprises a backbone network; the training mode of the diffusion network further comprises: for each time step, obtaining output data produced by the backbone network when processing at least one input content at the time step, wherein the at least one input content is at least one of the input content required by the first task and the input content required by the second task; for the obtained output data, splitting off the partial data corresponding to the learnable parameters of the diffusion network to obtain face output data, and splitting off the partial data corresponding to the intermediate result of the time step to obtain trunk output data; performing linear processing on the obtained face output data to obtain a mean and a variance of the time step, and converting the obtained trunk output data into an intermediate image of the time step; and calculating a loss for the face output according to the obtained mean and variance, and calculating a loss for the backbone network according to the obtained mean, variance and intermediate image; judging whether the diffusion network has converged based on the loss of the diffusion network for the first task and the loss of the diffusion network for the second task comprises: determining whether the diffusion network has converged based on the loss of the diffusion network for the first task, the loss for the second task, and the calculated loss for the backbone network and loss for the face output.
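The output splitting in claim 9 can be sketched as slicing one output vector into a trunk portion and a face portion, with a small linear head turning the face portion into a mean and a variance. The layout (trunk tokens first), the sizes, and the `exp` parameterization of the variance are all assumptions of this example:

```python
import numpy as np

TRUNK, FACE = 6, 4  # illustrative sizes for the two output portions

def split_outputs(output):
    trunk_out = output[:TRUNK]   # portion tied to the intermediate result
    face_out = output[TRUNK:]    # portion tied to the learnable parameters
    return trunk_out, face_out

def face_head(face_out, w_mu, w_var):
    # Linear processing of the face output into a mean and a variance;
    # exp keeps the variance positive.
    mu = float(face_out @ w_mu)
    var = float(np.exp(face_out @ w_var))
    return mu, var

out = np.arange(TRUNK + FACE, dtype=float)
trunk, face = split_outputs(out)
mu, var = face_head(face, np.ones(FACE) / FACE, np.zeros(FACE))
```

The trunk portion would then be converted into the intermediate image and the mean/variance would feed the face-output loss, as the claim describes.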
- 10. The method according to claim 1 or 2, wherein the image fusion network is a reinforcement learning network comprising an actor network and a critic network; the training mode of the image fusion network comprises: acquiring a third sample image; taking the first time step as the current time step, constructing input content of the current time step of the trained diffusion network based on an intermediate result of the current time step and the third sample image, and inputting the constructed input content into the diffusion network to obtain a prediction result of the current time step corresponding to the third sample image; invoking the actor network to generate a fusion strategy of the current time step corresponding to the third sample image based on the prediction result of the current time step corresponding to the third sample image; performing, based on the generated fusion strategy, result fusion processing on the prediction result of the current time step corresponding to the third sample image to obtain a fusion result of the current time step corresponding to the third sample image; decoding a repair image of the current time step corresponding to the third sample image based on the fusion result of the current time step corresponding to the third sample image, and calculating a reward value of the obtained repair image of the current time step; constructing an input condition of the current time step of the critic network based on the fusion strategy of the current time step corresponding to the third sample image, and inputting the constructed input condition into the critic network to obtain a Q value corresponding to the input condition of the current time step; if the current time step is not the last time step, calculating the intermediate result of the next time step based on the fusion result of the current time step corresponding to the third sample image, taking the next time step as the new current time step, and returning to the step of constructing the input content of the current time step of the diffusion network based on the intermediate result of the current time step and the third sample image; or, if the current time step is the last time step, returning to the step of acquiring a third sample image; and, in response to a loss calculation condition being satisfied, calculating a loss of the actor network and a loss of the critic network based on the Q value corresponding to the input condition of each time step and the reward value of the repair image of each time step, and adjusting network parameters of the reinforcement learning network when it is judged, based on the calculated loss of the actor network and loss of the critic network, that the reinforcement learning network has not converged.
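A minimal actor-critic shaped sketch of claim 10: the actor maps the two task predictions to a fusion strategy (here a single blend weight), the fusion applies it, and the critic assigns a Q value to the resulting input condition. Every function here is an invented stand-in; the patent does not disclose these forms:

```python
import math

def actor(pred1, pred2):
    # Fusion strategy: a weight in (0, 1) favoring the first-task prediction
    # when it exceeds the second-task prediction.
    return 1.0 / (1.0 + math.exp(-(pred1 - pred2)))

def fuse(pred1, pred2, w):
    # Result fusion processing under the generated strategy.
    return w * pred1 + (1.0 - w) * pred2

def critic(pred1, pred2, w):
    # Toy Q value: highest when the fusion result lands on a target of 1.0.
    return -(fuse(pred1, pred2, w) - 1.0) ** 2

w = actor(2.0, 0.0)
fused = fuse(2.0, 0.0, w)
q = critic(2.0, 0.0, w)
```

The sketch only mirrors the data flow of the claim: prediction results feed the actor, the actor's strategy drives the fusion, and the strategy plus its context feed the critic.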
- 11. The method according to claim 10, wherein the training mode of the image fusion network further comprises: in response to obtaining the input conditions of two adjacent time steps, constructing a state information set corresponding to a target time step based on the input conditions of the two adjacent time steps, the reward value of the repair image of the earlier time step and the fusion strategy of the earlier time step, and caching the constructed state information set into a preset cache space, wherein the target time step is the earlier of the two adjacent time steps; calculating the loss of the actor network and the loss of the critic network based on the Q value corresponding to the input condition of each time step and the reward value of the repair image of each time step in response to the loss calculation condition being satisfied comprises: selecting a specified number of state information sets from the preset cache space in response to the loss calculation condition being satisfied; calculating the loss of the critic network based on the Q value corresponding to the input condition of the target time step and the reward value of the repair image of the target time step in each selected state information set, together with the Q value corresponding to the input condition of the latter time step in each selected state information set; and calculating the loss of the actor network based on the Q value corresponding to the fusion strategy of the target time step.
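The replay-buffer update in claim 11 fits the standard temporal-difference pattern: cache (Q at target step, reward at target step, Q at the latter step), sample a batch, and square the TD error. The discount factor, the tuple layout and the toy values below are assumptions of this sketch:

```python
import random

GAMMA = 0.99  # assumed discount factor

# Each cached state information set reduced to (q_target_step, reward, q_next_step).
buffer = [
    (0.5, 1.0, 0.4),
    (0.2, 0.0, 0.3),
    (0.9, 1.0, 0.0),
]

def critic_loss(batch):
    # Mean squared TD error over the sampled state information sets:
    # target = reward + GAMMA * Q(latter step); error = target - Q(target step).
    errs = [(r + GAMMA * q_next - q) ** 2 for q, r, q_next in batch]
    return sum(errs) / len(errs)

random.seed(0)
batch = random.sample(buffer, 2)   # "specified number" of cached sets
loss = critic_loss(buffer)
```

The actor loss of the claim (based on the Q value of the target step's fusion strategy) would typically be the negated critic output for the actor's own action, omitted here for brevity.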
- 12. The method according to claim 10, wherein the third sample image is an acquired image requiring image restoration; and/or, constructing the input condition of the current time step of the critic network based on the fusion strategy of the current time step corresponding to the third sample image comprises: constructing the input condition of the current time step of the critic network based on the fusion strategy of the current time step corresponding to the third sample image, the current intermediate result, the prediction result for the first task and the prediction result for the second task of the current time step, the third sample image, and a depth map of the degradation-removal result of the third sample image; and/or, decoding the repair image of the current time step based on the fusion result of the current time step comprises: reversely deriving, based on the fusion result of the current time step, an initial diffusion intermediate result corresponding to the current third sample image, and decoding the repair image of the current time step based on the reversely derived initial diffusion intermediate result.
- 13. The method according to claim 10, wherein calculating the reward value of the obtained repair image of the current time step comprises: determining facial features in the repair image of the current time step corresponding to the third sample image, and determining a first-type reward value based on the similarity between the determined facial features and the facial features in a first specified image, wherein the first specified image is an image that contains the object of the third sample image and requires no repair; invoking a first large model to perform anomaly detection on the repair image of the current time step corresponding to the third sample image, and determining a second-type reward value based on the anomaly detection result output by the first large model; invoking a second large model to perform content-consistency analysis on the repair image of the current time step corresponding to the third sample image and a second specified image, and determining a third-type reward value based on the analysis result of the second large model, wherein the second specified image is the degradation-removal result of the third sample image; and determining the reward value of the obtained repair image of the current time step based on the first-type reward value, the second-type reward value and the third-type reward value.
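The three-part reward of claim 13 can be sketched as a weighted sum; the component scores, the "lower anomaly is better" inversion, and the equal weights are illustrative assumptions, since the patent does not specify how the three reward values are combined:

```python
def total_reward(id_similarity, anomaly_score, consistency, w=(1.0, 1.0, 1.0)):
    r1 = id_similarity        # first-type reward: facial identity similarity
    r2 = 1.0 - anomaly_score  # second-type reward: fewer detected anomalies is better
    r3 = consistency          # third-type reward: content consistency with the
                              # degradation-removal result
    return w[0] * r1 + w[1] * r2 + w[2] * r3

r = total_reward(id_similarity=0.9, anomaly_score=0.1, consistency=0.8)
```

In the claimed pipeline the three inputs would come from a face-feature comparator and the two large models, respectively; here they are plain numbers.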
- 14. An image restoration device, comprising: an acquisition module, configured to acquire a target image to be repaired; an invoking module, configured to invoke a pre-trained initial repair network to perform degradation-removal processing on the target image so as to obtain a degradation-removal result of the target image; and a restoration module, configured to perform repair optimization on the degradation-removal result according to a preset repair optimization mode to obtain an image repair result of the target image; wherein the preset repair optimization mode comprises: processing the degradation-removal result based on a pre-trained diffusion network for executing a first task and a second task, and performing fusion processing through a pre-trained image fusion network by taking, as contents to be fused, a prediction result for the first task and a prediction result for the second task obtained when the diffusion network processes the degradation-removal result of the target image; the first task is used for performing quality optimization on the image input into the diffusion network so as to output an optimized prediction result, and the second task is used for outputting a prediction result carrying the identity information in the input image; wherein the training mode of the diffusion network comprises: processing a second sample image through the trained initial repair network to obtain a degradation-removal result of the second sample image; performing, through the diffusion network and according to a preset processing mode, prediction processing for the first task and prediction processing for the second task on the degradation-removal result of the second sample image, so as to obtain a prediction result for the first task and a prediction result for the second task at each time step; calculating a loss of the diffusion network for the first task based on the prediction result for the first task at each time step and the truth image of the second sample image, and calculating a loss of the diffusion network for the second task based on the prediction result for the second task at each time step and the truth image of the second sample image; and judging whether the diffusion network has converged based on the loss of the diffusion network for the first task and the loss of the diffusion network for the second task, and if not, adjusting network parameters of the diffusion network; wherein the preset processing mode comprises: for each time step, constructing the input content required by the first task based on an intermediate result of the time step, learnable parameters set for the diffusion network, the degradation-removal result of the second sample image and a set second reference prompt, and inputting the constructed input content into the diffusion network to execute the first task, wherein the second reference prompt is used for enabling the diffusion network to understand how finely the initial repair network has removed the degradation of the second sample image; and constructing the input content required by the second task for the time step based on the intermediate result of the time step, the learnable parameters set for the diffusion network, a depth map of the degradation-removal result of the second sample image, and the identity feature and face description information of the object in the degradation-removal result of the second sample image, and inputting the constructed input content into the diffusion network to execute the second task.
- 15. An electronic device, comprising: a memory for storing a computer program; and a processor for implementing the method of any one of claims 1-13 when executing the program stored in the memory.
- 16. A computer-readable storage medium, wherein a computer program is stored in the computer-readable storage medium, and the computer program, when executed by a processor, implements the method of any one of claims 1-13.
Description
Image restoration method and device, electronic equipment and storage medium

Technical Field

The present application relates to the field of image processing technologies, and in particular to an image restoration method, an image restoration device, an electronic device, and a storage medium.

Background

In some scenes, due to factors such as long-distance shooting and equipment limitations, the acquired images generally have a small face proportion and a low overall resolution, so the images need to be repaired in order to improve image quality (such as image definition). Currently, image restoration is generally performed using an image restoration model, which may be trained with a truth image (such as a high-quality image) and a simulation image corresponding to the truth image (for example, a low-quality image obtained by simulated degradation of the truth image): the simulation image serves as the input of the image restoration model, and the corresponding truth image provides supervision. However, the degree of abnormality and the shooting mode of an image to be repaired are not controllable, so an image repaired by the trained image restoration model may exhibit image abnormalities, or its identity information may differ greatly from the identity information in the image to be repaired. Therefore, how to improve image quality while preserving identity information during image restoration is an urgent problem to be solved.

Disclosure of Invention

The embodiments of the present application aim to provide an image restoration method, an image restoration device, an electronic device and a storage medium, so that image quality is improved and identity information is preserved during image restoration.
The specific technical scheme is as follows. In a first aspect, an embodiment of the present application provides an image restoration method, the method comprising: acquiring a target image to be repaired; invoking a pre-trained initial repair network to perform degradation-removal processing on the target image to obtain a degradation-removal result of the target image; and performing repair optimization on the degradation-removal result according to a preset repair optimization mode to obtain an image repair result of the target image. The preset repair optimization mode comprises: processing the degradation-removal result based on a pre-trained diffusion network for executing a first task and a second task, and performing fusion processing through a pre-trained image fusion network by taking, as contents to be fused, a prediction result for the first task and a prediction result for the second task obtained when the diffusion network processes the degradation-removal result of the target image; the first task is used for performing quality optimization on the image input into the diffusion network so as to output an optimized prediction result, and the second task is used for outputting, for the image input into the diffusion network, a prediction result carrying the identity information in the input image.
In a second aspect, an embodiment of the present application provides an image restoration apparatus, comprising: an acquisition module, configured to acquire a target image to be repaired; an invoking module, configured to invoke a pre-trained initial repair network to perform degradation-removal processing on the target image so as to obtain a degradation-removal result of the target image; and a restoration module, configured to perform repair optimization on the degradation-removal result according to a preset repair optimization mode to obtain an image repair result of the target image. The preset repair optimization mode comprises: processing the degradation-removal result based on a pre-trained diffusion network for executing a first task and a second task, and performing fusion processing through a pre-trained image fusion network by taking, as contents to be fused, a prediction result for the first task and a prediction result for the second task obtained when the diffusion network processes the degradation-removal result of the target image; the first task is used for performing quality optimization on the image input into the diffusion network so as to output an optimized prediction result, and the second task is used for outputting, for the image input into the diffusion network, a prediction result carrying the identity information in the input image. In a third aspect, an embodiment of the present application provides an electronic device, comprising: a memory for storing a computer program; and a processor for implementing any one of the above image restoration methods when executing the program stored in the memory.