CN-122027846-A - Video element elimination method, device, equipment and medium based on frame selection interaction


Abstract

The application relates to a method, apparatus, device, and medium for eliminating video elements based on frame selection interaction. The method comprises: in response to a frame selection operation by a user in a video preview area, generating a rectangular selection area covering a target elimination element, and acquiring coordinate information of the rectangular selection area in the coordinate system of the original video frame; generating, in a client and based on the coordinate information, a binarized mask image with the same resolution as the original video frame; sending the original video frame and the binarized mask image to a cloud server; restoring, by an image elimination model in the cloud server, the content of the corresponding region of the original video frame based on the binarized mask image to generate elimination result data; and returning the elimination result data to the client and rendering it in the video preview area to replace the original picture. Because the target element region is specified precisely through frame selection, mask defects are reduced, content restoration quality is improved, and operation stability and editing efficiency are improved.
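The coordinate step in the abstract can be illustrated with a minimal sketch. It assumes the preview shows the whole frame scaled to fill the preview area exactly (no letterboxing, zoom, or pan); the names `Rect` and `preview_to_frame` are hypothetical and do not come from the application.

```python
import math
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in frame pixels; right and bottom are exclusive."""
    left: int
    top: int
    right: int
    bottom: int


def preview_to_frame(p0, p1, preview_size, frame_size) -> Rect:
    """Map the two drag corners (preview coordinates) to frame pixel coordinates.

    Assumption: the preview displays the full frame stretched to preview_size.
    """
    pw, ph = preview_size
    fw, fh = frame_size
    sx, sy = fw / pw, fh / ph  # preview-to-frame scale factors

    # Normalise so the drag direction does not matter.
    x0, x1 = sorted((p0[0], p1[0]))
    y0, y1 = sorted((p0[1], p1[1]))

    def clamp(value, upper):
        return max(0, min(upper, value))

    # Round outward so the selection never shrinks, then clamp to the frame.
    return Rect(
        clamp(math.floor(x0 * sx), fw),
        clamp(math.floor(y0 * sy), fh),
        clamp(math.ceil(x1 * sx), fw),
        clamp(math.ceil(y1 * sy), fh),
    )
```

For example, with a 960x540 preview of a 3840x2160 frame, a drag from (200, 100) to (100, 50) maps to `Rect(left=400, top=200, right=800, bottom=400)`. Rounding outward is a design choice of this sketch so that the target element stays fully covered.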

Inventors

LI ZILI
TIAN XIANZHAO
CHEN DAFA
XU HAOPENG
KONG YULU

Assignees

万兴科技集团股份有限公司

Dates

Publication Date: 20260512
Application Date: 20260129

Claims (10)

1. A method for eliminating video elements based on frame selection interaction, comprising: in response to a frame selection operation by a user in a video preview area, generating a rectangular selection area covering a target elimination element, and acquiring coordinate information of the rectangular selection area in an original video frame coordinate system; generating, in a client and based on the coordinate information, a binarized mask image having the same resolution as the original video frame; sending the original video frame and the binarized mask image to a cloud server; performing, by an image elimination model in the cloud server, content restoration on the corresponding region of the original video frame based on the binarized mask image to generate elimination result data; and returning the elimination result data to the client, and rendering and displaying the elimination result data in the video preview area so as to replace the original picture.
2. The method for eliminating video elements based on frame selection interaction according to claim 1, wherein the generating, in the client and based on the coordinate information, a binarized mask image having the same resolution as the original video frame comprises: creating, in the client, an initial matrix with the same resolution as the original video frame, wherein all elements of the initial matrix are initialized to a retain identifier; changing, according to the coordinate information of the rectangular selection area, the values of all elements of the initial matrix within the corresponding rectangular selection area to a to-be-eliminated identifier to obtain a modified matrix; and performing edge smoothing on the modified matrix to generate the binarized mask image.
3. The method for eliminating video elements based on frame selection interaction according to claim 1, wherein the sending the original video frame and the binarized mask image to a cloud server comprises: packaging the original video frame and the binarized mask image according to a preset compression format to obtain packaged image information; and constructing an API request using a preset standard network protocol, and sending the packaged image information to the cloud server based on the API request.
4. The method for eliminating video elements based on frame selection interaction according to claim 3, wherein the performing, by the image elimination model in the cloud server, content restoration on the corresponding region of the original video frame based on the binarized mask image to generate elimination result data comprises: receiving and parsing the packaged image information at the cloud server to obtain the original video frame and the binarized mask image; inputting the original video frame and the binarized mask image into the image elimination model deployed on a GPU cluster; identifying, by the image elimination model, the content to be eliminated in the region of the original video frame that corresponds to the binarized mask image; and performing feature reconstruction and content generation on the content to be eliminated to generate the elimination result data.
5. The method for eliminating video elements based on frame selection interaction according to claim 1, wherein the returning the elimination result data to the client and rendering and displaying the elimination result data in the video preview area so as to replace the original picture comprises: returning the elimination result data to the memory of the client; and replacing all elements of the original video frame within the corresponding rectangular selection area with the elimination result data, and rendering and displaying the result in the video preview area.
6. The method for eliminating video elements based on frame selection interaction according to any one of claims 1 to 5, wherein the generating a rectangular selection area covering a target elimination element in response to a frame selection operation by a user in a video preview area and acquiring coordinate information of the rectangular selection area in the original video frame coordinate system comprises: monitoring operation information in the video preview area; if a frame selection operation by the user in the video preview area is detected, generating, in response to the frame selection operation, a rectangular selection area covering the target elimination element; acquiring the upper-left corner coordinate and the lower-right corner coordinate of the rectangular selection area to obtain initial coordinates; and converting the initial coordinates into absolute pixel coordinates in the original video frame coordinate system to obtain the coordinate information.
7. The method for eliminating video elements based on frame selection interaction according to any one of claims 1 to 5, wherein parameters of the rectangular selection area are stored as a reusable selection area node, and the reusable selection area node is applied to other time points of the same video, or to target elements having the same position and size in different video clips.
8. A video element elimination apparatus based on frame selection interaction, comprising: a frame selection interaction module, configured to generate, in response to a frame selection operation by a user in a video preview area, a rectangular selection area covering a target elimination element, and to acquire coordinate information of the rectangular selection area in an original video frame coordinate system; a mask image generation module, configured to generate, in a client and based on the coordinate information, a binarized mask image having the same resolution as the original video frame; an image data sending module, configured to send the original video frame and the binarized mask image to a cloud server; a result data generation module, configured to perform, by an image elimination model in the cloud server, content restoration on the corresponding region of the original video frame based on the binarized mask image to generate elimination result data; and a result rendering module, configured to return the elimination result data to the client and to render and display the elimination result data in the video preview area so as to replace the original picture.
9. A computer device comprising a memory and a processor, the memory having a computer program stored therein, wherein the processor, when executing the computer program, implements the method for eliminating video elements based on frame selection interaction according to any one of claims 1 to 7.
10. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the method for eliminating video elements based on frame selection interaction according to any one of claims 1 to 7.
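
The mask construction recited in claim 2 (initial matrix, rectangle marking, edge smoothing) can be sketched as follows. This is an illustration under stated assumptions, not the applicant's implementation: the identifier values `KEEP = 0` and `ERASE = 255` are assumed, and "edge smoothing" is interpreted here as a box blur followed by re-thresholding so that the mask stays binary. The claims do not name a specific smoothing algorithm. The function names are hypothetical, and pure Python is used only to keep the sketch self-contained; it is far too slow for real 4K frames.

```python
KEEP, ERASE = 0, 255  # assumed identifier values; the claims do not fix them


def make_mask(width, height, rect):
    """Create a height x width matrix of KEEP and mark rect as ERASE.

    rect is (left, top, right, bottom) in frame pixels, right/bottom exclusive.
    """
    left, top, right, bottom = rect
    mask = [[KEEP] * width for _ in range(height)]
    for y in range(max(0, top), min(height, bottom)):
        for x in range(max(0, left), min(width, right)):
            mask[y][x] = ERASE
    return mask


def smooth_edges(mask, radius=1):
    """Box-blur the mask and re-threshold at 50% so the result is still binary.

    Pixels outside the frame count as KEEP. On a rectangle with radius=1 this
    only rounds off the four corner pixels.
    """
    height, width = len(mask), len(mask[0])
    window_area = (2 * radius + 1) ** 2
    out = [[KEEP] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            hits = 0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if mask[ny][nx] == ERASE:
                        hits += 1
            if 2 * hits >= window_area:
                out[y][x] = ERASE
    return out
```

Calling `smooth_edges(make_mask(8, 6, (2, 1, 6, 5)))` yields the 4x4 rectangle with its four corner pixels reset to `KEEP`; every other pixel is unchanged.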

Description

Video element elimination method, device, equipment and medium based on frame selection interaction

Technical Field

The present application relates to the field of image processing technologies, and in particular to a method, apparatus, device, and medium for eliminating video elements based on frame selection interaction.

Background

In desktop video editing software, a user needs to specify precisely the region of a target element to be eliminated in a video frame in order to achieve high-quality content restoration. The prior art commonly adopts a smart brush interaction: the user manually paints over the outline of the target object in the video preview interface with a mouse, and the client generates an irregular mask image from the painted outline.

With high-resolution video material (for example, 4K or 8K), however, this approach has significant disadvantages. Manual painting can hardly meet pixel-level accuracy requirements. Especially when the target element has a regular geometric shape, the user is highly prone to incomplete edge coverage or rough boundaries, which leaves local holes or breaks in the resulting mask image. Such mask defects directly cause residual artifacts, spurious generated content, or artifacts in the filled region during the subsequent content restoration stage. Moreover, the painting operation depends heavily on the steadiness of the user's hand: a slight shake can paint over the background area, causing background information to be lost, or leave the target element insufficiently covered.

In addition, the existing interaction flow forces a secondary modal window to pop up, which interrupts the continuity of editing, and it provides no real-time progress feedback during processing. The user therefore cannot judge the operation state and often mistakenly concludes that the function has failed, which further reduces operating efficiency and the fluency of the experience.

Disclosure of Invention

The embodiments of the present application aim to provide a method, apparatus, device, and medium for eliminating video elements based on frame selection interaction, which specify the target element region precisely through frame selection, thereby reducing mask defects, improving content restoration quality, and improving operation stability and editing efficiency.

To solve the above technical problems, an embodiment of the present application provides a method for eliminating video elements based on frame selection interaction, comprising: in response to a frame selection operation by a user in a video preview area, generating a rectangular selection area covering a target elimination element, and acquiring coordinate information of the rectangular selection area in an original video frame coordinate system; generating, in a client and based on the coordinate information, a binarized mask image having the same resolution as the original video frame; sending the original video frame and the binarized mask image to a cloud server; performing, by an image elimination model in the cloud server, content restoration on the corresponding region of the original video frame based on the binarized mask image to generate elimination result data; and returning the elimination result data to the client, and rendering and displaying the elimination result data in the video preview area so as to replace the original picture.
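
The sending step of the method can be illustrated with a sketch that packages a frame and its mask and constructs an API request. The choices here are assumptions for illustration only: zlib stands in for the "preset compression format", HTTP POST with a JSON body stands in for the "preset standard network protocol", and the endpoint URL and the function names `pack`, `unpack`, and `build_request` are hypothetical. The request is constructed but never sent.

```python
import base64
import json
import urllib.request
import zlib

# Hypothetical endpoint for illustration only; the source names no URL.
ENDPOINT = "https://example.invalid/v1/eliminate"


def pack(frame_bytes, mask_bytes, width, height):
    """Compress the frame and mask and wrap them in one JSON payload (bytes)."""
    def encode(raw):
        return base64.b64encode(zlib.compress(raw)).decode("ascii")

    body = {
        "width": width,
        "height": height,
        "frame": encode(frame_bytes),
        "mask": encode(mask_bytes),
    }
    return json.dumps(body).encode("utf-8")


def unpack(payload):
    """Server-side inverse of pack(): (frame_bytes, mask_bytes, width, height)."""
    body = json.loads(payload.decode("utf-8"))

    def decode(text):
        return zlib.decompress(base64.b64decode(text))

    return decode(body["frame"]), decode(body["mask"]), body["width"], body["height"]


def build_request(payload):
    """Construct (but do not send) an HTTP POST carrying the packed payload."""
    return urllib.request.Request(
        ENDPOINT,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Sending the width and height alongside the pixel data lets the server check that the frame and the mask have the same resolution before running the model. A production client would more likely use a standard image codec for the frame rather than raw compressed bytes.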

To solve the above technical problems, an embodiment of the present application provides a video element elimination apparatus based on frame selection interaction, comprising: a frame selection interaction module, configured to generate, in response to a frame selection operation by a user in a video preview area, a rectangular selection area covering a target elimination element, and to acquire coordinate information of the rectangular selection area in an original video frame coordinate system; a mask image generation module, configured to generate, in a client and based on the coordinate information, a binarized mask image having the same resolution as the original video frame; an image data sending module, configured to send the original video frame and the binarized mask image to a cloud server; a result data generation module, configured to perform, by an image elimination model in the cloud server, content restoration on the corresponding region of the original video frame based on the binarized mask image to generate elimination result data; and a result rendering module, configured to return the elimination result data to the client and to render and display the elimination result data in the video preview area so as to replace the original picture.

To solve the above technical problems, the present invention further adopts the following technical solution: a computer device comprising one or more processors and a memory, wherein the memory is configured to store one or more programs that enable the one or more processors to implement the video elem