CN-120219912-B - Training method of image processing model, video image processing method and system
Abstract
The present application relates to the field of image processing, and more particularly to a training method for an image processing model, and a video image processing method and system. The method comprises: extracting a video frame image from a video as an image to be processed; cropping the image to be processed according to a region of interest to obtain a cropped image; and inputting the cropped image into a twin (Siamese) network for processing to obtain a segmented-region image. The training method of the twin network comprises: acquiring a plurality of sample images; cropping each sample image around its target object to obtain a sample cropped image; performing triangulation deformation on the sample cropped image to obtain a corresponding deformed sample image; annotating the sample image and the corresponding deformed sample image with the segmentation object; and inputting the annotated sample cropped images and the corresponding deformed sample images into the twin network respectively, training the twin network to obtain the trained twin network. The method achieves accurate segmentation and processing of the segmented object in the image.
Inventors
- Mai Ruijie
Assignees
- Guangzhou Huya Technology Co., Ltd. (广州虎牙科技有限公司)
Dates
- Publication Date
- 20260512
- Application Date
- 20250228
Claims (9)
- 1. A method of training an image processing model, the method comprising: acquiring a plurality of sample images; cropping each sample image around its target object to obtain a sample cropped image; performing triangulation deformation on the sample cropped image to obtain a corresponding deformed sample image, wherein the triangulation deformation constructs a triangular mesh over a set of pixel points of the sample cropped image, dividing the sample cropped image into a plurality of triangular regions; annotating the sample image and the corresponding deformed sample image with a segmentation object; inputting the annotated sample cropped image into a twin network to obtain a first generated image, and inputting the deformed sample image corresponding to the sample cropped image into the twin network to obtain a corresponding second generated image; performing an inverse deformation operation on the second generated image to obtain an inverse-deformed second generated image; comparing the first generated image with the inverse-deformed second generated image over all pixels to obtain a loss value of the twin network; and optimizing parameters of the twin network according to the loss value to obtain the trained twin network.
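For illustration only (not part of the claims), the triangulation step of claim 1 can be sketched in numpy. This is a minimal sketch under assumed conventions: control points are placed on a regular grid, each grid cell is split into two triangles, and only interior control points are jittered so the image boundary stays fixed. All function names are placeholders, not terms from the patent.

```python
import numpy as np

def make_triangle_mesh(h, w, step):
    """Build grid control points over an h x w image and split each cell into two triangles."""
    ys = np.arange(0, h + 1, step)
    xs = np.arange(0, w + 1, step)
    pts = np.array([(y, x) for y in ys for x in xs], dtype=np.float64)
    nx = len(xs)
    tris = []
    for i in range(len(ys) - 1):
        for j in range(nx - 1):
            a = i * nx + j          # top-left corner of the cell
            b, c, d = a + 1, a + nx, a + nx + 1
            tris.append((a, b, c))  # upper-left triangle
            tris.append((b, d, c))  # lower-right triangle
    return pts, np.array(tris)

def deform_points(pts, h, w, max_shift, rng):
    """Randomly jitter interior control points; boundary points stay fixed."""
    shifted = pts.copy()
    interior = (pts[:, 0] > 0) & (pts[:, 0] < h) & (pts[:, 1] > 0) & (pts[:, 1] < w)
    shifted[interior] += rng.uniform(-max_shift, max_shift, (int(interior.sum()), 2))
    return shifted
```

Moving the mesh vertices and re-rendering each triangle with the affine map between its original and shifted corners yields the deformed sample image; the rendering itself is omitted here for brevity.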
- 2. A method of video image processing, the method comprising: extracting a video frame image from a video as an image to be processed; cropping the image to be processed according to a region of interest to obtain a cropped image; and inputting the cropped image into a twin network for processing to obtain a segmented-region image, wherein the twin network is trained by the training method of claim 1.
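As a non-authoritative sketch of the pipeline in claim 2: crop the frame by the region of interest, then run the segmenter on the crop. The `run_twin_network` function below is a stand-in (a simple threshold), not the claimed trained network; the ROI convention `(top, left, height, width)` is an assumption.

```python
import numpy as np

def crop_by_roi(frame, roi):
    """Crop a frame by roi = (top, left, height, width), clamped to frame bounds."""
    t, l, h, w = roi
    t, l = max(t, 0), max(l, 0)
    return frame[t:t + h, l:l + w]

def run_twin_network(crop):
    """Placeholder for the trained twin network: thresholding as a stand-in segmenter."""
    return (crop > 0.5).astype(np.uint8)

def process_frame(frame, roi):
    """Claim-2 pipeline sketch: crop by ROI, then segment the crop."""
    return run_twin_network(crop_by_roi(frame, roi))
```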
- 3. The video image processing method according to claim 2, wherein the region of interest is stored in a preset database, and the method further comprises: extracting specific position information of the target object in the image to be processed according to the segmented-region image; generating a region of interest of the current video frame image according to the specific position information; and updating the region of interest stored in the database with the region of interest of the current video frame image, the updated region of interest being stored as the region of interest corresponding to the next video frame image.
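The ROI-propagation step above can be illustrated with a small numpy helper: derive a bounding box (plus a margin) from the current segmentation mask and carry it over as the next frame's region of interest. The `(top, left, height, width)` convention and the margin parameter are illustrative assumptions.

```python
import numpy as np

def roi_from_mask(mask, margin):
    """Bounding box of the nonzero mask pixels, padded by `margin` and clamped
    to the image bounds; returned as (top, left, height, width), or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None  # nothing segmented: caller keeps the previous ROI
    t = max(int(ys.min()) - margin, 0)
    l = max(int(xs.min()) - margin, 0)
    b = min(int(ys.max()) + margin + 1, mask.shape[0])
    r = min(int(xs.max()) + margin + 1, mask.shape[1])
    return (t, l, b - t, r - l)
```

Storing this box in place of the previous one means the next frame is cropped around where the target was last seen.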
- 4. The video image processing method according to claim 2, wherein the method further comprises: obtaining, from the segmented-region image corresponding to the previous video frame image, a motion vector for each pixel of the segmented-region image corresponding to the current video frame image by means of a fast optical flow algorithm; and adjusting noise points or incoherent areas in the segmented-region image corresponding to the current video frame image according to the motion vectors, to obtain the post-processed current segmented-region image.
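A minimal sketch of the flow-based clean-up in claim 4, under simplifying assumptions: the flow is given (rounded to integer displacements rather than produced by an actual optical flow algorithm), and "noise" means an isolated current-mask pixel that the flow-warped previous mask rejects. All names are placeholders.

```python
import numpy as np

def warp_mask(prev_mask, flow):
    """Backward-warp the previous frame's mask with a per-pixel (dy, dx) flow field."""
    h, w = prev_mask.shape
    yy, xx = np.mgrid[0:h, 0:w]
    sy = np.clip(yy - np.rint(flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(xx - np.rint(flow[..., 1]).astype(int), 0, w - 1)
    return prev_mask[sy, sx]

def neighbor_count(mask):
    """Number of set pixels among the 8 neighbours of each pixel."""
    h, w = mask.shape
    p = np.pad(mask.astype(int), 1)
    return sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
               for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0))

def temporal_clean(cur_mask, prev_mask, flow):
    """Suppress isolated current-mask pixels that the warped previous mask rejects."""
    warped = warp_mask(prev_mask, flow)
    noise = cur_mask.astype(bool) & ~warped.astype(bool) & (neighbor_count(cur_mask) < 3)
    cleaned = cur_mask.copy()
    cleaned[noise] = 0
    return cleaned
```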
- 5. The video image processing method according to any one of claims 2-4, wherein the method further comprises: expanding the boundary of the segmented-region image by applying a morphological dilation operation; feathering the boundary to obtain a segmented-region image with a softened boundary region; and applying Gaussian blur filtering to the segmented-region image with the softened boundary region, to obtain the final segmented-region image.
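The dilate-then-feather post-processing of claim 5 can be sketched in numpy. Two simplifications are assumed: a 3x3 square structuring element for the dilation, and a repeated box blur standing in for the Gaussian blur (iterated box blurs approximate a Gaussian). The result is a soft alpha matte in [0, 1] rather than a hard mask.

```python
import numpy as np

def dilate(mask, it=1):
    """Binary dilation with a 3x3 square structuring element, repeated `it` times."""
    m = mask.astype(bool)
    h, w = m.shape
    for _ in range(it):
        p = np.pad(m, 1)
        out = np.zeros_like(m)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                out |= p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        m = out
    return m

def box_blur(a, it=2):
    """Repeated 3x3 box blur (a cheap stand-in for Gaussian blur)."""
    h, w = a.shape
    for _ in range(it):
        p = np.pad(a, 1, mode='edge')
        a = sum(p[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9.0
    return a

def feather(mask, dilate_it=1, blur_it=2):
    """Dilate the mask, then blur it into a soft-edged alpha matte."""
    return box_blur(dilate(mask, dilate_it).astype(np.float64), blur_it)
```

In production code a real Gaussian filter (e.g. `cv2.GaussianBlur` or `scipy.ndimage.gaussian_filter`) would replace `box_blur`.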
- 6. A video image processing system, the system comprising: an acquisition module for extracting a video frame image from a video as an image to be processed; a cropping module for cropping the image to be processed according to a region of interest to obtain a cropped image; and a generation module for inputting the cropped image into a twin network for processing to obtain a segmented-region image, wherein the twin network is trained by the training method of claim 1.
- 7. The video image processing system according to claim 6, wherein the region of interest is stored in a preset database, and the system further: extracts specific position information of the target object in the image to be processed according to the segmented-region image; generates a region of interest of the current video frame image according to the specific position information; and updates the region of interest stored in the database with the region of interest of the current video frame image, the updated region of interest being stored as the region of interest corresponding to the next video frame image.
- 8. An electronic device, comprising: a memory for storing one or more computer programs; and a processor, wherein the one or more computer programs, when executed by the processor, implement the training method of an image processing model as claimed in claim 1 or the video image processing method as claimed in any one of claims 2 to 5.
- 9. A computer-readable storage medium storing computer instructions which, when executed, cause a processor to perform the training method of an image processing model according to claim 1 or the video image processing method according to any one of claims 2-5.
Description
Training method of image processing model, video image processing method and system

Technical Field

The present invention relates to the field of image processing, and more particularly to a training method for an image processing model, and a video image processing method and system.

Background

In the field of live video broadcasting, accurate detection and segmentation of a target area is a key technology, in particular for human skin areas: a detected and segmented skin area can be used for face beautification of portrait images, including functions such as freckle removal, acne removal, skin smoothing and skin color adjustment. However, the prior art still has many defects in terms of precision, compatibility, real-time performance and stability. Existing algorithms perform poorly under lighting changes, occlusion and complex backgrounds in a picture, and cannot accurately identify and segment the target area; it is also difficult to meet real-time requirements in practical applications. When segmenting skin areas, the prior art still struggles with different skin types, skin colors and environmental illumination, and large errors in identifying and segmenting the skin area often occur. There is therefore a need for an improved technique for detecting and segmenting target regions.

Disclosure of Invention

The present invention aims to overcome at least one of the shortcomings of the prior art described above, and provides a training method for an image processing model, and a video image processing method and system, for achieving accurate segmentation and processing of segmented objects in an image.
According to a first aspect of the present application, there is provided a training method of an image processing model, the method comprising: acquiring a plurality of sample images; cropping each sample image around its target object to obtain a sample cropped image; performing triangulation deformation on the sample cropped image to obtain a corresponding deformed sample image; annotating the sample image and the corresponding deformed sample image with a segmentation object; and inputting the annotated sample cropped image and the corresponding deformed sample image into a twin network respectively, training the twin network to obtain a trained twin network. By training the twin network with both the sample cropped image and the corresponding deformed sample image, the method tolerates images deformed for uncontrollable reasons, which improves the robustness of the twin network, avoids failures to accurately segment the target area in subsequent videos or pictures caused by deformation, and improves the reliability of video or picture segmentation. Optionally, inputting the annotated sample cropped image and the corresponding deformed sample image into a twin network respectively and training the twin network to obtain a trained twin network comprises: inputting the annotated sample cropped image into the twin network to obtain a first generated image, and inputting the deformed sample image corresponding to the sample cropped image into the twin network to obtain a corresponding second generated image; acquiring a loss value of the twin network from the first generated image and the corresponding second generated image; and optimizing parameters of the twin network according to the loss value to obtain the trained twin network.
It can be understood that the corresponding deformed sample image is obtained by triangulation deformation of the sample image, and a second generated image is generated from the deformed sample image. This simulates the deformation and distortion of the area containing the segmented object that arises in video due to network problems or movement of the target object. Training the twin network with deformed sample images strengthens its ability to process deformed images, so that the area containing the segmented object can be segmented accurately even from a deformed image, enhancing the robustness of the twin network. Optionally, acquiring the loss value of the twin network from the first generated image and the corresponding second generated image comprises: performing an inverse deformation operation on the second generated image to obtain an inverse-deformed second generated image; and comparing the first generated image with the inverse-deformed second generated image over all pixels to obtain the loss value of the twin network. It can be understood that the loss value of the twin network is obtained according to the similarity between the first generated image and the corresponding inverse-deformed second generated image, and the second gen
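The inverse-deformation consistency loss described above can be illustrated with a toy numpy sketch. Assumptions: the "twin network" is reduced to a shared per-pixel sigmoid map, and the deformation is a 90-degree rotation standing in for the triangulation deformation. Because a per-pixel map commutes with any pixel permutation, this toy branch pair is perfectly deformation-consistent and the loss is exactly zero; a real convolutional twin network is not equivariant by construction, and training drives this loss toward zero.

```python
import numpy as np

def forward(img, w, b):
    """Shared toy 'network': per-pixel sigmoid of a linear map (both twin branches use it)."""
    return 1.0 / (1.0 + np.exp(-(w * img + b)))

def deform(img):
    """Stand-in invertible deformation (a 90-degree rotation)."""
    return np.rot90(img)

def inverse_deform(img):
    """Exact inverse of `deform`."""
    return np.rot90(img, -1)

def consistency_loss(img, w, b):
    out1 = forward(img, w, b)                # branch 1: original crop
    out2 = forward(deform(img), w, b)        # branch 2: deformed crop
    out2_back = inverse_deform(out2)         # undo the deformation on the output
    return np.mean((out1 - out2_back) ** 2)  # global per-pixel comparison
```

The loss compares the two branch outputs in the same coordinate frame, so it penalizes only predictions that change under deformation, which is exactly the robustness property the patent attributes to the trained twin network.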