CN-122027868-A - Video metadata generation method, device, equipment and readable storage medium
Abstract
The application discloses a video metadata generation method, apparatus, device, and readable storage medium. The method comprises: receiving a target video frame for which metadata is to be generated, and determining the previous video frame of the target video frame as a reference video frame; calculating the perceived difference between the target video frame and the reference video frame; determining the redundancy of the target video frame according to the perceived difference; determining initial metadata corresponding to the target video frame according to the redundancy; and performing infinite impulse response (IIR) temporal filtering on the initial metadata to obtain the target metadata. The application reduces the computational burden of video metadata generation, lowering compute overhead, power consumption, and cost.
Inventors
- JIANG XIAO
Assignees
- 马栏山音视频实验室 (Malanshan Audio & Video Laboratory)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-02-28
Claims (10)
- 1. A method of generating video metadata, comprising: receiving a target video frame for which metadata is to be generated, and determining the previous video frame of the target video frame as a reference video frame; calculating a perceived difference between the target video frame and the reference video frame; determining the redundancy of the target video frame according to the perceived difference; determining initial metadata corresponding to the target video frame according to the redundancy; and performing infinite impulse response temporal filtering on the initial metadata to obtain target metadata.
- 2. The method of generating video metadata according to claim 1, wherein calculating a perceived difference between the target video frame and the reference video frame, and determining the redundancy of the target video frame from the perceived difference, comprises: calculating the sum of absolute errors between the target video frame and the reference video frame; and determining the redundancy of the target video frame according to the sum of absolute errors.
- 3. The video metadata generation method according to claim 1 or 2, wherein determining the redundancy of the target video frame from the perceived difference comprises: acquiring a first perceived difference threshold and a second perceived difference threshold, wherein the first perceived difference threshold is smaller than the second perceived difference threshold; when the perceived difference is less than the first perceived difference threshold, determining that the redundancy of the target video frame is high redundancy; when the perceived difference is greater than or equal to the first perceived difference threshold and less than the second perceived difference threshold, determining that the redundancy of the target video frame is normal redundancy; and when the perceived difference is greater than or equal to the second perceived difference threshold, determining that the redundancy of the target video frame is non-redundant.
- 4. The method of generating video metadata according to claim 3, wherein determining initial metadata corresponding to the target video frame according to the redundancy comprises: when the redundancy of the target video frame is determined to be high redundancy, obtaining the initial metadata corresponding to the target video frame by interpolating between the metadata of the preceding and following frames; when the redundancy of the target video frame is determined to be normal redundancy or non-redundancy, calculating a target scaling ratio; scaling the target video frame according to the target scaling ratio to obtain a proxy frame; and extracting statistical information from the proxy frame to obtain the initial metadata.
- 5. The method of generating video metadata according to claim 4, wherein, when the redundancy of the target video frame is determined to be normal redundancy or non-redundancy, calculating the target scaling ratio comprises: acquiring a preset acceleration ratio or the current computing load; calculating an initial scaling ratio according to the preset acceleration ratio or the current computing load; when the redundancy of the target video frame is determined to be normal redundancy, adjusting the initial scaling ratio downwards to obtain the target scaling ratio; and when the redundancy of the target video frame is determined to be non-redundant, determining the initial scaling ratio to be the target scaling ratio.
- 6. The method of generating video metadata according to claim 1, wherein performing infinite impulse response temporal filtering on the initial metadata comprises: determining a target frame weight corresponding to the target video frame according to the perceived difference; and performing infinite impulse response temporal filtering on the initial metadata according to the target frame weight.
- 7. The method of generating video metadata according to claim 1, further comprising, after performing infinite impulse response temporal filtering on the initial metadata to obtain the target metadata: packaging the target metadata into supplemental enhancement information (SEI) or a metadata track of the video stream.
- 8. A video metadata generation apparatus, comprising: a reference video frame determining module, configured to receive a target video frame for which metadata is to be generated and to determine the previous video frame of the target video frame as a reference video frame; a perceived difference calculating module, configured to calculate a perceived difference between the target video frame and the reference video frame; a redundancy determining module, configured to determine the redundancy of the target video frame according to the perceived difference; an initial metadata determining module, configured to determine initial metadata corresponding to the target video frame according to the redundancy; and a target metadata obtaining module, configured to perform infinite impulse response temporal filtering on the initial metadata to obtain target metadata.
- 9. A video metadata generation device, comprising: a memory for storing a computer program; and a processor for implementing the steps of the video metadata generation method according to any one of claims 1 to 7 when executing the computer program.
- 10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the video metadata generation method according to any of claims 1 to 7.
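The perceived-difference measure of claim 2 is a sum of absolute errors between consecutive frames. A minimal Python sketch, assuming frames are equally sized 2D lists of luma values; normalizing per pixel is an illustrative choice not specified by the claims, added so the measure is resolution-independent:

```python
def sad(frame_a, frame_b):
    """Sum of absolute differences between two equally sized luma
    planes, normalized per pixel (normalization is an assumption;
    the claims only call for the absolute error sum)."""
    total = sum(abs(a - b)
                for row_a, row_b in zip(frame_a, frame_b)
                for a, b in zip(row_a, row_b))
    n_pixels = len(frame_a) * len(frame_a[0])
    return total / n_pixels
```

Identical frames yield a difference of zero, signalling maximal redundancy to the downstream classification step.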
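The two-threshold classification of claim 3 maps the perceived difference to one of three redundancy levels. A sketch under the claim's only constraint (first threshold smaller than the second); the string labels are hypothetical names:

```python
HIGH_REDUNDANCY = "high"
NORMAL_REDUNDANCY = "normal"
NON_REDUNDANT = "non-redundant"

def classify_redundancy(perceived_diff, t1, t2):
    """Three-way redundancy classification per claim 3:
    t1 and t2 are the first and second perceived-difference
    thresholds, with t1 < t2."""
    assert t1 < t2, "claim 3 requires the first threshold to be smaller"
    if perceived_diff < t1:
        return HIGH_REDUNDANCY        # near-identical to reference frame
    if perceived_diff < t2:
        return NORMAL_REDUNDANCY      # moderate change
    return NON_REDUNDANT              # scene change or large motion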
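For normal or non-redundant frames, claim 4 computes statistics on a downscaled proxy frame rather than the full frame. A sketch assuming nearest-neighbour subsampling as the scaling method and the statistics named in the background (maximum luminance, mean luminance, histogram); both choices are illustrative, since the claims leave them unspecified:

```python
def make_proxy_frame(frame, scale):
    """Nearest-neighbour downscale by an integer stride, a stand-in
    for the patent's unspecified scaling method (an assumption)."""
    stride = max(1, round(1 / scale))
    return [row[::stride] for row in frame[::stride]]

def extract_stats(frame, n_bins=4, max_val=256):
    """Frame-level statistics of the kind dynamic metadata carries:
    maximum luminance, mean luminance, and a coarse histogram."""
    pixels = [p for row in frame for p in row]
    hist = [0] * n_bins
    for p in pixels:
        hist[min(p * n_bins // max_val, n_bins - 1)] += 1
    return {"max": max(pixels),
            "mean": sum(pixels) / len(pixels),
            "hist": hist}
```

Because metadata reflects macroscopic statistics, computing them on a proxy frame with a fraction of the pixels changes the result little while cutting per-frame cost roughly by the square of the scaling ratio.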
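Claim 5 derives the target scaling ratio from a preset acceleration ratio (or the current computing load) and then adjusts it downwards for normal-redundancy frames. The mapping from acceleration ratio to initial scale (inverse square root, so pixel count drops by the acceleration factor) and the downward step are illustrative assumptions, not taken from the claims:

```python
import math

def target_scale(redundancy, accel_ratio, downscale_step=0.5):
    """Sketch of claim 5. accel_ratio is the preset acceleration
    ratio; 1/sqrt(accel_ratio) makes the proxy's pixel count
    1/accel_ratio of the original (an assumed mapping)."""
    initial = 1 / math.sqrt(accel_ratio)
    if redundancy == "normal":
        # partly redundant content tolerates a coarser proxy
        return initial * downscale_step
    # non-redundant: keep the initial scaling ratio as the target
    return initial
```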
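The temporal smoothing of claim 6 is a first-order IIR filter whose target-frame weight depends on the perceived difference: large differences (scene changes) let the new metadata through quickly, small ones favour the previous value and suppress flicker. A sketch for scalar metadata; the linear weight mapping and the `diff_scale` parameter are assumptions:

```python
def iir_filter(prev_metadata, initial_metadata, perceived_diff,
               diff_scale=10.0):
    """First-order IIR (exponential) smoothing of a scalar metadata
    value, with the target-frame weight derived from the perceived
    difference as claim 6 requires (the mapping is illustrative)."""
    w = min(1.0, perceived_diff / diff_scale)  # weight in [0, 1]
    return w * initial_metadata + (1 - w) * prev_metadata
```

At `perceived_diff >= diff_scale` the filter passes the new value unchanged; for static content the output decays smoothly toward the running estimate.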
Description
Technical Field
The present application relates to the field of multimedia technologies, and in particular to a method, an apparatus, a device, and a readable storage medium for generating video metadata.
Background
With the popularity of HDR Vivid (high dynamic range) technology, dynamic metadata has become critical to enhancing the video viewing experience. Dynamic metadata requires pixel-level statistical analysis of each image frame (e.g., computing maximum luminance, average luminance, and histogram distribution) to guide the display device in performing accurate tone mapping. Mainstream metadata generation schemes face two main pain points. First, the content change between most adjacent frames in a video is extremely small, so repeating identical statistical computations frame by frame is highly redundant and wastes compute. Second, metadata reflects macroscopic statistical features of the image (such as the overall luminance distribution) and is insensitive to fine local detail, which makes full-precision computation unnecessarily heavy. Furthermore, on terminal-side devices (such as mobile phones and tablet computers), sustained high-load pixel operations shorten battery life and cause power consumption and heat to rise, degrading the user experience. In summary, how to effectively solve the problems of heavy computational load, wasted compute, and high power consumption in current video metadata generation methods is an urgent need of those skilled in the art.
Disclosure of Invention
The application aims to provide a video metadata generation method that reduces the computational burden of video metadata generation, lowers compute overhead and power consumption, and saves cost; the application also aims to provide a video metadata generation apparatus, a device, and a computer-readable storage medium. To solve the above technical problems, the application provides the following technical scheme. A video metadata generation method comprises: receiving a target video frame for which metadata is to be generated, and determining the previous video frame of the target video frame as a reference video frame; calculating a perceived difference between the target video frame and the reference video frame; determining the redundancy of the target video frame according to the perceived difference; determining initial metadata corresponding to the target video frame according to the redundancy; and performing infinite impulse response temporal filtering on the initial metadata to obtain target metadata. In one embodiment of the present application, calculating the perceived difference between the target video frame and the reference video frame and determining the redundancy of the target video frame based on the perceived difference comprises: calculating the sum of absolute errors between the target video frame and the reference video frame; and determining the redundancy of the target video frame according to the sum of absolute errors.
In one embodiment of the present application, determining the redundancy of the target video frame according to the perceived difference includes: acquiring a first perceived difference threshold and a second perceived difference threshold, wherein the first perceived difference threshold is smaller than the second; when the perceived difference is less than the first threshold, determining that the redundancy of the target video frame is high redundancy; when the perceived difference is greater than or equal to the first threshold and less than the second threshold, determining that the redundancy is normal redundancy; and when the perceived difference is greater than or equal to the second threshold, determining that the redundancy is non-redundant. In a specific embodiment of the present application, determining initial metadata corresponding to the target video frame according to the redundancy includes: when the redundancy of the target video frame is determined to be high redundancy, obtaining the initial metadata corresponding to the target video frame by interpolating between the metadata of the preceding and following frames; when the redundancy is determined to be normal redundancy or non-redundancy, calculating a target scaling ratio; scaling the target video frame according to the target scaling ratio to obtain a proxy frame; and extracting statistical information from the proxy frame to obtain the initial metadata. In one embodiment of the present application, when the redundancy of the target video frame is determined to be normal redundancy or non-redundancy, calculating the target scaling ratio includes: acquiring a preset acceleration r