CN-122002031-A - Video processing method and related device

Abstract

The application discloses a video processing method and a related device. Attribute information is acquired for each frame to be decoded in a video code stream to be processed, where the attribute information indicates the frame type of each frame to be decoded. Based on the attribute information, frames to be decoded whose frame type is non-reference frame are identified in the video code stream to be processed and removed, yielding a recombined code stream. Frame extraction is then performed on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames. Because the extracted sampling frames are more likely to be key frames, the number of depended-on frames that must be decoded is reduced, which in turn reduces the total number of extracted sampling frames. When the plurality of sampling frames are decoded to obtain a decoded video, the decoding speed can therefore be greatly improved, and with it the video frame cutting efficiency. In short, the application reduces the total number of extracted sampling frames and improves the decoding speed and the video frame cutting efficiency without affecting the decoding of the sampling frames, so that video frame cutting proceeds smoothly.
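The flow summarized in the abstract can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the `Frame` class, the string labels for frame types, and the helper names `drop_non_reference` and `sample_uniform` are all hypothetical, and the actual decoding step is omitted because it depends on the codec in use.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Frame:
    index: int       # position in the original code stream
    frame_type: str  # hypothetical labels: "key", "reference", "non_reference"


def drop_non_reference(stream: List[Frame]) -> List[Frame]:
    """Build the recombined code stream by removing non-reference frames."""
    return [f for f in stream if f.frame_type != "non_reference"]


def sample_uniform(stream: List[Frame], preset_count: int) -> List[Frame]:
    """Pick up to preset_count frames at even spacing (assumed sampling rule)."""
    if preset_count <= 0 or not stream:
        return []
    if preset_count >= len(stream):
        return list(stream)
    step = len(stream) / preset_count
    return [stream[int(i * step)] for i in range(preset_count)]


# Toy stream: a key frame every 6 frames; other even positions are reference
# frames and odd positions are non-reference frames (made-up layout).
stream = [
    Frame(i, "key" if i % 6 == 0 else ("reference" if i % 2 == 0 else "non_reference"))
    for i in range(24)
]

recombined = drop_non_reference(stream)
samples = sample_uniform(recombined, 4)

key_share_before = sum(f.frame_type == "key" for f in stream) / len(stream)
key_share_after = sum(f.frame_type == "key" for f in recombined) / len(recombined)
print(len(stream), len(recombined), [f.index for f in samples])
print(key_share_before, key_share_after)
```

In this toy layout the share of key frames doubles once the non-reference frames are removed, which is the sense in which sampled frames become "more likely to be key frames".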

Inventors

FENG SHENMING

Assignees

腾讯科技（深圳）有限公司

Dates

Publication Date: 20260508
Application Date: 20241105

Claims (14)

1. A video processing method, the method comprising: obtaining attribute information of each frame to be decoded in a video code stream to be processed, wherein the attribute information indicates the frame type of each frame to be decoded, the frame types comprise key frames and non-key frames, and the non-key frames comprise reference frames and non-reference frames; based on the attribute information, removing frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a recombined code stream; performing frame extraction on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames; and decoding the plurality of sampling frames to obtain a decoded video.
2. The method according to claim 1, wherein the removing, based on the attribute information, frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a recombined code stream comprises: based on the attribute information, removing frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a first intermediate code stream; adjusting, based on decoding dependency relationships among the frames to be decoded, the frame types of the frames to be decoded in the first intermediate code stream to obtain a second intermediate code stream; and when the second intermediate code stream contains a frame to be decoded whose frame type is non-reference frame, taking the second intermediate code stream as the video code stream to be processed and returning to the step of removing, based on the attribute information, frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a first intermediate code stream, until a stop condition is reached, to obtain the recombined code stream.
3. The method according to claim 1, wherein the performing frame extraction on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames comprises: determining the total number of key frames in the recombined code stream; and when the total number of key frames is greater than or equal to the preset frame number, performing frame extraction on the key frames based on the preset frame number to obtain the plurality of sampling frames, wherein the total number of sampling frames is equal to the preset frame number.
4. The method according to claim 3, wherein the method further comprises: when the total number of key frames is smaller than the preset frame number, performing frame extraction on the key frames to obtain first sampling frames of a first sampling number; performing frame extraction on the reference frames based on a second sampling number to obtain a plurality of second sampling frames, wherein the second sampling number is the difference between the preset frame number and the first sampling number; and taking the plurality of first sampling frames and the plurality of second sampling frames as the plurality of sampling frames, wherein the total number of the sampling frames is larger than the preset frame number.
5. The method according to claim 3, wherein the performing frame extraction on the key frames based on the preset frame number to obtain the plurality of sampling frames comprises: uniformly extracting frames from the key frames based on the preset frame number to obtain the plurality of sampling frames.
6. The method according to any one of claims 1-5, further comprising: performing feature extraction on each decoded frame in the decoded video to obtain image features of each decoded frame; calculating a first similarity between the image features of two decoded frames; determining a first redundant frame from the two decoded frames when the first similarity is greater than a first similarity threshold; and removing the first redundant frame from the decoded video to obtain a first de-duplicated video.
7. The method of claim 6, wherein the image features of the two decoded frames are image features corresponding to each of two adjacent decoded frames in the decoded video.
8. The method according to any one of claims 1-5, further comprising: determining gray level distribution information of each decoded frame in the decoded video; determining a second redundant frame from the decoded video based on the gray level distribution information, wherein, in the gray level distribution information of the second redundant frame, the number of pixels whose gray level values fall within a preset gray level range is larger than a preset number; and removing the second redundant frame from the decoded video to obtain a second de-duplicated video.
9. The method according to any one of claims 1-5, further comprising: inputting the decoded video into a de-duplication model, and extracting an embedding vector of each decoded frame in the decoded video; calculating a second similarity between the embedding vectors of two decoded frames; determining a third redundant frame from the two decoded frames when the second similarity is greater than a second similarity threshold; and removing the third redundant frame from the decoded video, and outputting a third de-duplicated video.
10. The method according to any one of claims 1-5, wherein before the obtaining attribute information of each frame to be decoded in the video code stream to be processed, the method further comprises: acquiring a video code stream of a video to be processed; determining a plurality of frames to be decoded from the video code stream according to a preset frame cutting strategy; and constructing the video code stream to be processed based on the plurality of frames to be decoded.
11. A video processing apparatus, the apparatus comprising: an acquisition unit, configured to acquire attribute information of each frame to be decoded in a video code stream to be processed, wherein the attribute information indicates the frame type of each frame to be decoded, the frame types comprise key frames and non-key frames, and the non-key frames comprise reference frames and non-reference frames; a rejecting unit, configured to remove, based on the attribute information, frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a recombined code stream; an extraction unit, configured to perform frame extraction on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames; and a decoding unit, configured to decode the plurality of sampling frames to obtain a decoded video.
12. A computer device, the computer device comprising a processor and a memory, wherein: the memory is used for storing a computer program and transmitting the computer program to the processor; and the processor is configured to perform the method of any one of claims 1-10 according to the computer program.
13. A computer readable storage medium for storing a computer program which, when executed by a computer device, implements the method of any one of claims 1-10.
14. A computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method of any of claims 1-10.
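The iterative removal described in claim 2 can be pictured with a small sketch. Several points here are assumptions made only for illustration: frame types are re-derived from explicit dependency sets (a real code stream would signal them in frame headers), the stop condition is taken to be either "no non-reference frames remain" or a hypothetical round limit `max_rounds`, and the names `Frame`, `non_reference_indices` and `prune` do not come from the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Frame:
    index: int
    is_key: bool
    refs: FrozenSet[int] = frozenset()  # indices of frames this frame depends on


def non_reference_indices(stream: List[Frame]) -> Set[int]:
    """Non-key frames that no frame remaining in the stream depends on."""
    referenced: Set[int] = set()
    for f in stream:
        referenced |= f.refs
    return {f.index for f in stream if not f.is_key and f.index not in referenced}


def prune(stream: List[Frame], max_rounds: int) -> List[Frame]:
    """Repeat: re-derive frame types, then drop the non-reference frames."""
    for _ in range(max_rounds):
        leaves = non_reference_indices(stream)
        if not leaves:  # assumed stop condition: nothing left to remove
            break
        stream = [f for f in stream if f.index not in leaves]
    return stream


# Toy dependency layout (made up): frames 0 and 4 are key frames.
stream = [
    Frame(0, True),
    Frame(1, False, frozenset({0})),
    Frame(2, False, frozenset({0, 1})),
    Frame(3, False, frozenset({2})),
    Frame(4, True),
    Frame(5, False, frozenset({4})),
]

for rounds in (1, 2, 10):
    print(rounds, [f.index for f in prune(stream, rounds)])
```

With one round, only frames 3 and 5 are removed. After that, frame 2 is no longer depended on by anything, so a second round removes it as well. Without a round limit this toy rule eventually leaves only key frames, which is why the choice of stop condition matters.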

Description

Video processing method and related device

Technical Field

The present application relates to the field of video processing, and in particular to a video processing method and a related device.

Background

Video frame cutting extracts a subset of frames from a video. For example, if a video includes 100 frames, video frame cutting can yield a video composed of 50 frames, and various kinds of processing, such as action recognition, can then be performed on the video composed of those 50 frames.

Current video frame cutting technology generally cuts frames uniformly, that is, video frames are extracted from the video code stream at equal time intervals or equal frame-count intervals. This approach does not consider the type of the video frames and treats all video frames equally. However, when an extracted video frame depends on other video frames during decoding, for example when the extracted video frame is a non-key frame, the video frames it depends on must also be decoded. Assuming that n video frames are required, uniform frame cutting requires decoding x frames, where x > n. In other words, too many video frames are decoded. In particular, when the video code stream contains a large number of non-key frames, many of the extracted frames are likely to be non-key frames, so the number of frames x that must finally be decoded is far greater than the required number n. As a result, the overall decoding speed is slow and video frame cutting is inefficient.

Disclosure of Invention

To solve the above technical problem, the application provides a video processing method and a related device that can improve the efficiency of video frame cutting.
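The relationship x > n described in the Background can be illustrated with a toy count. The dependency model below is a deliberate simplification assumed for this sketch only: a key frame opens every group of `gop_size` frames, and each non-key frame needs every earlier frame in its group to be decoded first. The function name `frames_to_decode` and the numbers used are hypothetical.

```python
def frames_to_decode(sample_indices, gop_size):
    """Count distinct frames that must be decoded to obtain the sampled frames.

    Assumed model: frame k depends on all frames from the key frame that
    opens its group (at index k - k % gop_size) up to k itself.
    """
    needed = set()
    for idx in sample_indices:
        group_start = idx - idx % gop_size
        needed.update(range(group_start, idx + 1))
    return len(needed)


total_frames, gop_size, n = 100, 10, 20

# Uniform frame cutting: every 5th frame, regardless of frame type.
uniform = [i * total_frames // n for i in range(n)]
x = frames_to_decode(uniform, gop_size)

# Sampling key frames only: each one decodes on its own.
key_only = list(range(0, total_frames, gop_size))
x_key = frames_to_decode(key_only, gop_size)

print(n, x)                 # required frames vs. frames actually decoded
print(len(key_only), x_key)
```

Under this model, the 20 uniformly sampled frames require 60 frames to be decoded, while 10 sampled key frames require only those 10, which matches the direction of the argument above even though real dependency structures are more varied.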
The embodiments of the application disclose the following technical solutions:

In one aspect, the present application provides a video processing method, the method including: obtaining attribute information of each frame to be decoded in a video code stream to be processed, wherein the attribute information indicates the frame type of each frame to be decoded, the frame types comprise key frames and non-key frames, and the non-key frames comprise reference frames and non-reference frames; based on the attribute information, removing frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a recombined code stream; performing frame extraction on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames; and decoding the plurality of sampling frames to obtain a decoded video.

In another aspect, the present application provides a video processing apparatus, the apparatus comprising: an acquisition unit, configured to acquire attribute information of each frame to be decoded in a video code stream to be processed, wherein the attribute information indicates the frame type of each frame to be decoded, the frame types comprise key frames and non-key frames, and the non-key frames comprise reference frames and non-reference frames; a rejecting unit, configured to remove, based on the attribute information, frames to be decoded whose frame type is non-reference frame from the video code stream to be processed to obtain a recombined code stream; an extraction unit, configured to perform frame extraction on the recombined code stream based on a preset frame number to obtain a plurality of sampling frames; and a decoding unit, configured to decode the plurality of sampling frames to obtain a decoded video.

In yet another aspect, the present application provides a computer device comprising a processor and a memory, wherein the memory is used for storing a computer program and transmitting the computer program to the processor, and the processor is configured to execute the method according to the computer program.

In yet another aspect, the present application provides a computer readable storage medium for storing a computer program which, when executed by a computer device, performs the method.

In yet another aspect, the application provides a computer program product comprising a computer program which, when run on a computer device, causes the computer device to perform the method.

According to the above technical solutions, in the process of obtaining the decoded video through video frame cutting, in order to improve video frame cutting efficiency, attribute information of each frame to be decoded in the video code stream to be processed can be obtained, the attribute information is used for indicating frame types of each frame to be decoded,