US-12626328-B2 - Training method and apparatus for image processing network, computer device, and storage medium
Abstract
An image processing method is performed by a computer device, which includes: converting initial image data into super-resolution image data using a trained image processing network, a resolution of the super-resolution image data being greater than or equal to a target resolution; performing image quality enhancement processing on the super-resolution image data using the trained image processing network, to obtain first enhanced image data; when there is a face image in the first enhanced image data, performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data; and performing image sharpening processing on the second enhanced image data using the trained image processing network to obtain sharpened image data.
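The stage ordering described in the abstract can be sketched as follows. This is a minimal illustration only, not the patented implementation: the function name `run_pipeline`, the five stage callables, and the representation of resolution as a (height, width) pair are all assumptions introduced here, and each stage stands in for a sub-network of the trained image processing network.

```python
from typing import Callable, Optional, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # hypothetical face box: (top, left, height, width)


def run_pipeline(
    image: np.ndarray,
    target_hw: Tuple[int, int],
    super_resolve: Callable[[np.ndarray, Tuple[int, int]], np.ndarray],
    enhance_quality: Callable[[np.ndarray], np.ndarray],
    detect_face: Callable[[np.ndarray], Optional[Box]],
    enhance_face: Callable[[np.ndarray, Box], np.ndarray],
    sharpen: Callable[[np.ndarray], np.ndarray],
) -> np.ndarray:
    """Apply the four stages in the order given in the abstract."""
    h, w = image.shape[:2]
    # Stage 1: upscale only when the input is below the target resolution
    # (assumption: "resolution" is compared as height and width).
    if h < target_hw[0] or w < target_hw[1]:
        image = super_resolve(image, target_hw)
    # Stage 2: image quality enhancement -> first enhanced image data.
    first = enhance_quality(image)
    # Stage 3: face enhancement only when a face is detected
    # -> second enhanced image data (unchanged when no face is found).
    box = detect_face(first)
    second = enhance_face(first, box) if box is not None else first
    # Stage 4: sharpening -> sharpened image data.
    return sharpen(second)
```

Passing the stages as callables keeps the sketch independent of any particular network architecture; in the disclosure all stages belong to one jointly trained network.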
Inventors
- Shichang SHI
- Fei Huang
- Chao Hua
- Wei Xiong
- Liang Yang
Assignees
- TENCENT TECHNOLOGY (SHENZHEN) COMPANY LIMITED
Dates
- Publication Date: 20260512
- Application Date: 20230608
- Priority Date: 20211012
Claims (17)
- 1 . An image processing method performed by a computer device, and the method comprising: converting initial image data into super-resolution image data using a trained image processing network, a resolution of the super-resolution image data being greater than or equal to a target resolution; performing image quality enhancement processing on the super-resolution image data using the trained image processing network, to obtain first enhanced image data; when there is a face image in the first enhanced image data, performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data; extracting high-frequency image information in the second enhanced image data using a sharpening network; generating a sharpening mask for the second enhanced image data by the sharpening network; extracting sharpened image information in the second enhanced image data according to the sharpening mask; predicting a first weighted weight for the high-frequency image information, a second weighted weight for the sharpened image information, and a third weighted weight for the second enhanced image data by the sharpening network; and performing a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data according to the first weighted weight, the second weighted weight, and the third weighted weight, to obtain the sharpened image data.
- 2 . The method according to claim 1 , wherein the trained image processing network comprises a super-resolution network, and the converting initial image data into super-resolution image data using a trained image processing network comprises: detecting a resolution of the initial image data; and when the resolution of the initial image data is less than the target resolution, adjusting the resolution of the initial image data to the target resolution using the super-resolution network, to obtain the super-resolution image data.
- 3 . The method according to claim 1 , wherein the trained image processing network comprises a face enhancement network; and the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: performing face detection on the first enhanced image data using the face enhancement network; and when there is a face image in the first enhanced image data, performing face enhancement processing on the face image in the first enhanced image data using the face enhancement network, to obtain the second enhanced image data.
- 4 . The method according to claim 3 , wherein the face enhancement network comprises a face detection network, a face enhancement sub-network, and a face fusion network; the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: cutting out the face image from the first enhanced image data using the face detection network, to obtain a cut-out face image; performing the face enhancement processing on the cut-out face image using the face enhancement sub-network, to obtain an enhanced face image; generating a face fusion mask using the face fusion network; and performing image fusion processing on the first enhanced image data and the enhanced face image according to the face fusion mask, to obtain the second enhanced image data.
- 5 . The method according to claim 1 , wherein the initial image data is any one of a plurality of image frames obtained by segmenting video data; and the method further comprises: generating optimized video data of the video data according to the sharpened image data corresponding to each image frame in the plurality of image frames; and pushing the optimized video data to an application client, to allow the application client to output the optimized video data.
- 6 . The method according to claim 1 , wherein the trained image processing network is trained by: obtaining a sample image pair, the sample image pair comprising low-definition image data and high-definition image data, and the low-definition image data having the same content as the high-definition image data; calling the image processing network to adjust a resolution of the low-definition image data to a target resolution, to obtain sample super-resolution image data, and generating a super-resolution loss function according to the sample super-resolution image data and the high-definition image data; calling the image processing network to perform image quality enhancement processing on the sample super-resolution image data, to obtain first sample enhanced image data, and generating an image quality loss function according to the first sample enhanced image data and the high-definition image data; calling the image processing network to perform face enhancement processing on a face image in the first sample enhanced image data, to obtain a sample enhanced face image, fusing the sample enhanced face image with the first sample enhanced image data, to obtain second sample enhanced image data, and generating a face loss function according to the sample enhanced face image and a face image in the high-definition image data; calling the image processing network to perform image sharpening processing on the second sample enhanced image data, to obtain sample sharpened image data, and generating a sharpening loss function according to the sample sharpened image data and the high-definition image data; and updating a network parameter of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function, to obtain the trained image processing network.
- 7 . A computer device, comprising a memory and a processor, the memory storing computer-readable instructions, and the computer-readable instructions, when being executed by the processor, causing the computer device to perform an image processing method including: converting initial image data into super-resolution image data using a trained image processing network, a resolution of the super-resolution image data being greater than or equal to a target resolution; performing image quality enhancement processing on the super-resolution image data using the trained image processing network, to obtain first enhanced image data; when there is a face image in the first enhanced image data, performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data; extracting high-frequency image information in the second enhanced image data using a sharpening network; generating a sharpening mask for the second enhanced image data by the sharpening network; extracting sharpened image information in the second enhanced image data according to the sharpening mask; predicting a first weighted weight for the high-frequency image information, a second weighted weight for the sharpened image information, and a third weighted weight for the second enhanced image data by the sharpening network; and performing a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data according to the first weighted weight, the second weighted weight, and the third weighted weight, to obtain the sharpened image data.
- 8 . The computer device according to claim 7 , wherein the trained image processing network comprises a super-resolution network, and the converting initial image data into super-resolution image data using a trained image processing network comprises: detecting a resolution of the initial image data; and when the resolution of the initial image data is less than the target resolution, adjusting the resolution of the initial image data to the target resolution using the super-resolution network, to obtain the super-resolution image data.
- 9 . The computer device according to claim 7 , wherein the trained image processing network comprises a face enhancement network; and the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: performing face detection on the first enhanced image data using the face enhancement network; and when there is a face image in the first enhanced image data, performing face enhancement processing on the face image in the first enhanced image data using the face enhancement network, to obtain the second enhanced image data.
- 10 . The computer device according to claim 9 , wherein the face enhancement network comprises a face detection network, a face enhancement sub-network, and a face fusion network; the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: cutting out the face image from the first enhanced image data using the face detection network, to obtain a cut-out face image; performing the face enhancement processing on the cut-out face image using the face enhancement sub-network, to obtain an enhanced face image; generating a face fusion mask using the face fusion network; and performing image fusion processing on the first enhanced image data and the enhanced face image according to the face fusion mask, to obtain the second enhanced image data.
- 11 . The computer device according to claim 7 , wherein the initial image data is any one of a plurality of image frames obtained by segmenting video data; and the method further comprises: generating optimized video data of the video data according to the sharpened image data corresponding to each image frame in the plurality of image frames; and pushing the optimized video data to an application client, to allow the application client to output the optimized video data.
- 12 . The computer device according to claim 7 , wherein the trained image processing network is trained by: obtaining a sample image pair, the sample image pair comprising low-definition image data and high-definition image data, and the low-definition image data having the same content as the high-definition image data; calling the image processing network to adjust a resolution of the low-definition image data to a target resolution, to obtain sample super-resolution image data, and generating a super-resolution loss function according to the sample super-resolution image data and the high-definition image data; calling the image processing network to perform image quality enhancement processing on the sample super-resolution image data, to obtain first sample enhanced image data, and generating an image quality loss function according to the first sample enhanced image data and the high-definition image data; calling the image processing network to perform face enhancement processing on a face image in the first sample enhanced image data, to obtain a sample enhanced face image, fusing the sample enhanced face image with the first sample enhanced image data, to obtain second sample enhanced image data, and generating a face loss function according to the sample enhanced face image and a face image in the high-definition image data; calling the image processing network to perform image sharpening processing on the second sample enhanced image data, to obtain sample sharpened image data, and generating a sharpening loss function according to the sample sharpened image data and the high-definition image data; and updating a network parameter of the image processing network according to the super-resolution loss function, the image quality loss function, the face loss function, and the sharpening loss function, to obtain the trained image processing network.
- 13 . A non-transitory computer-readable storage medium, storing computer-readable instructions, and the computer-readable instructions, when being executed by a processor of a computer device, causing the computer device to perform an image processing method including: converting initial image data into super-resolution image data using a trained image processing network, a resolution of the super-resolution image data being greater than or equal to a target resolution; performing image quality enhancement processing on the super-resolution image data using the trained image processing network, to obtain first enhanced image data; when there is a face image in the first enhanced image data, performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data; extracting high-frequency image information in the second enhanced image data using a sharpening network; generating a sharpening mask for the second enhanced image data by the sharpening network; extracting sharpened image information in the second enhanced image data according to the sharpening mask; predicting a first weighted weight for the high-frequency image information, a second weighted weight for the sharpened image information, and a third weighted weight for the second enhanced image data by the sharpening network; and performing a weighted sum of the high-frequency image information, the sharpened image information, and the second enhanced image data according to the first weighted weight, the second weighted weight, and the third weighted weight, to obtain the sharpened image data.
- 14 . The non-transitory computer-readable storage medium according to claim 13 , wherein the trained image processing network comprises a super-resolution network, and the converting initial image data into super-resolution image data using a trained image processing network comprises: detecting a resolution of the initial image data; and when the resolution of the initial image data is less than the target resolution, adjusting the resolution of the initial image data to the target resolution using the super-resolution network, to obtain the super-resolution image data.
- 15 . The non-transitory computer-readable storage medium according to claim 13 , wherein the trained image processing network comprises a face enhancement network; and the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: performing face detection on the first enhanced image data using the face enhancement network; and when there is a face image in the first enhanced image data, performing face enhancement processing on the face image in the first enhanced image data using the face enhancement network, to obtain the second enhanced image data.
- 16 . The non-transitory computer-readable storage medium according to claim 15 , wherein the face enhancement network comprises a face detection network, a face enhancement sub-network, and a face fusion network; the performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data comprises: cutting out the face image from the first enhanced image data using the face detection network, to obtain a cut-out face image; performing the face enhancement processing on the cut-out face image using the face enhancement sub-network, to obtain an enhanced face image; generating a face fusion mask using the face fusion network; and performing image fusion processing on the first enhanced image data and the enhanced face image according to the face fusion mask, to obtain the second enhanced image data.
- 17 . The non-transitory computer-readable storage medium according to claim 13 , wherein the initial image data is any one of a plurality of image frames obtained by segmenting video data; and the method further comprises: generating optimized video data of the video data according to the sharpened image data corresponding to each image frame in the plurality of image frames; and pushing the optimized video data to an application client, to allow the application client to output the optimized video data.
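The weighted-sum sharpening recited in claims 1, 7, and 13 and the mask-based face fusion recited in claims 4, 10, and 16 can be illustrated with a short sketch. Several simplifications are assumptions made here and are not taken from the claims: the image is a single-channel array, high-frequency information is approximated as the image minus a 3x3 box blur, the sharpening mask is applied by element-wise multiplication, and the mask and the three weights are passed in as plain values, whereas the claims have the sharpening network predict them. All function names are hypothetical.

```python
import numpy as np


def box_blur3(img: np.ndarray) -> np.ndarray:
    """3x3 mean filter with edge replication (stand-in low-pass filter)."""
    h, w = img.shape
    padded = np.pad(img.astype(float), 1, mode="edge")
    acc = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            acc += padded[dy:dy + h, dx:dx + w]
    return acc / 9.0


def weighted_sharpen(enhanced, mask, weights):
    """Weighted sum of high-frequency info, masked info, and the image itself."""
    w_hf, w_sharp, w_img = weights  # first, second, third weights
    high_freq = enhanced - box_blur3(enhanced)  # assumed high-pass residual
    sharpened_info = mask * enhanced            # assumed use of the mask
    return w_hf * high_freq + w_sharp * sharpened_info + w_img * enhanced


def fuse_face(frame, enhanced_face, fusion_mask, top, left):
    """Blend an enhanced face crop back into the frame using a fusion mask."""
    out = frame.astype(float).copy()
    h, w = enhanced_face.shape[:2]
    region = out[top:top + h, left:left + w]
    # Mask value 1 keeps the enhanced face; 0 keeps the original pixels.
    out[top:top + h, left:left + w] = (
        fusion_mask * enhanced_face + (1.0 - fusion_mask) * region
    )
    return out
```

On a flat image the high-frequency term vanishes, so the output depends only on the second and third weights; near an edge the first weight controls how strongly the edge is emphasized.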
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation application of PCT Patent Application No. PCT/CN2022/117789, entitled “TRAINING METHOD AND APPARATUS FOR IMAGE PROCESSING NETWORK, COMPUTER DEVICE, AND STORAGE MEDIUM” filed on Sep. 8, 2022, which claims priority to Chinese Patent Application No. 202111188444.9, entitled “TRAINING METHOD AND APPARATUS FOR IMAGE PROCESSING NETWORK, COMPUTER DEVICE, AND STORAGE MEDIUM” filed with the Chinese Patent Office on Oct. 12, 2021, all of which are incorporated by reference in their entirety.
TECHNICAL FIELD
The present disclosure relates to the technical field of image processing, and in particular to a training method and apparatus for an image processing network, a computer device, and a storage medium.
BACKGROUND
As computer network technologies advance, image optimization has been applied to a growing number of scenes, such as a scene in which a photo of a user needs to be optimized, or image frames in video data need to be optimized. The image optimization can be carried out by training image models. In the related art, a plurality of image models with different optimization tasks are separately trained. Through the plurality of trained image models, an image is processed in a superimposed optimization manner. However, in this case, one image model may have a reverse optimization effect on another one. As a result, the optimization effects of the image models are mutually weakened, thereby reducing the image optimization effects of the trained image models.
SUMMARY
According to another aspect of the present disclosure, an image processing method is performed by a computer device, which includes: converting initial image data into super-resolution image data using a trained image processing network, a resolution of the super-resolution image data being greater than or equal to a target resolution; performing image quality enhancement processing on the super-resolution image data using the trained image processing network, to obtain first enhanced image data; when there is a face image in the first enhanced image data, performing face enhancement on the face image in the first enhanced image data using the trained image processing network to obtain second enhanced image data; and performing image sharpening processing on the second enhanced image data using the trained image processing network to obtain sharpened image data.
According to another aspect of the present disclosure, a computer device is provided, which includes a memory and a processor, the memory storing computer-readable instructions, and the computer-readable instructions, when being executed by the processor, causing the computer device to perform the method according to any one of foregoing aspects of the present disclosure.
According to yet another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided, which stores computer-readable instructions, and the computer-readable instructions, when being executed by a processor of a computer device, causing the computer device to perform the method according to any one of the foregoing aspects of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
To describe the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the following briefly introduces the accompanying drawings for describing the embodiments or the prior art.
Apparently, the accompanying drawings in the following description show merely some embodiments of the present disclosure, and a person of ordinary skill in the art may still derive other drawings from the accompanying drawings without creative efforts.
FIG. 1 is a schematic structural diagram of a network architecture according to an embodiment of the present disclosure.
FIG. 2 is a schematic scenario diagram of network training according to an embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of a training method for an image processing network according to an embodiment of the present disclosure.
FIG. 4 is a schematic structural diagram of an encoder-decoder network according to an embodiment of the present disclosure.
FIG. 5 is a schematic structural diagram of a basic unit according to an embodiment of the present disclosure.
FIG. 6 is a schematic scenario diagram of obtaining loss functions according to the present disclosure.
FIG. 7 is a schematic flowchart of an image processing method according to an embodiment of the present disclosure.
FIG. 8 is a schematic scenario diagram of face optimization according to an embodiment of the present disclosure.
FIG. 9 is a schematic scenario diagram of image optimization according to an embodiment of the present disclosure.
FIG. 10 is a schematic scenario diagram of data pushing according to an embodiment of the present disclosure.
FIG. 11 is a schematic structural diagram of a training apparatus for an image processing network according to an embodiment of th