US-12621506-B2 - Content adaptive micro encoding optimization for video
Abstract
In some embodiments, a method analyzes flagged locations from a plurality of locations in an encoding of a video to form a cluster of locations. Draft micro-chunk boundaries for the cluster are determined based on searching for a first start location and a first end location in the encoding. The method searches in a first search range before the first start location and a second search range after the first end location for a second start location in the first search range and a second end location in the second search range. The second start location and the second end location form a micro-chunk. An encoding parameter set is determined for the micro-chunk formed by the second start location and the second end location based on content characteristics of the micro-chunk. The method uses the encoding parameter set to encode the micro-chunk for insertion in the encoding of the video.
Inventors
- Yuanyi XUE
- Roberto Gerson De Albuquerque Azevedo
- Christopher Richard Schroers
- Scott Labrozzi
- Wenhao Zhang
Assignees
- DISNEY ENTERPRISES, INC.
- BEIJING YOJAJA SOFTWARE TECHNOLOGY DEVELOPMENT CO., LTD.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2023-09-25
Claims (20)
- 1 . A method comprising: analyzing flagged frames from a plurality of frames in an encoding of a video to form a cluster of frames, wherein the cluster includes a plurality of flagged frames; determining draft micro-chunk boundaries for the cluster of a first start frame and a first end frame in the encoding that includes the cluster; searching in a first search range before the first start frame and a second search range after the first end frame for a second start frame in the first search range and a second end frame in the second search range, wherein the second start frame and the second end frame form a micro-chunk; analyzing multiple frames based on the second start frame and the second end frame to determine content characteristics for the micro-chunk; determining an encoding parameter set for the micro-chunk formed by the second start frame and the second end frame based on mapping the content characteristics of the micro-chunk to the encoding parameter set; and using the encoding parameter set to encode the plurality of frames in the micro-chunk for insertion in the encoding of the video.
- 2 . The method of claim 1 , wherein the flagged frames are flagged based on a quality metric of the flagged frames meeting a threshold.
- 3 . The method of claim 1 , wherein a quality control process analyzes the encoding of the video based on a quality metric and outputs frames for the flagged frames in the encoding of the video.
- 4 . The method of claim 1 , wherein analyzing the flagged frames comprises: grouping a portion of flagged frames in the cluster when distance between frames of the portion of flagged frames in the encoding of the video meet a threshold.
- 5 . The method of claim 1 , further comprising: generating the first search range and the second search range based on a pre-set distance from the first start frame or the first end frame, or based on a length of the cluster.
- 6 . The method of claim 1 , wherein determining the draft micro-chunk boundaries comprises: determining the draft micro-chunk boundaries based on a minimum micro-chunk duration.
- 7 . The method of claim 1 , wherein searching in the first search range and the second search range comprises: determining a first usage area of video buffer verifier usage that meets a threshold in the first search range, wherein the video buffer verifier usage is determined during the encoding of the video; and determining a second usage area of video buffer verifier usage that meets a threshold in the second search range.
- 8 . The method of claim 7 , wherein the first usage area and the second usage area have video buffer verifier usage that is lower than another area of the first search range or the second search range.
- 9 . The method of claim 1 , wherein searching in the first search range and the second search range comprises: analyzing scene changes to select the second start frame or the second end frame.
- 10 . The method of claim 9 , wherein analyzing scene changes comprises: calculating, for a scene change, an intra-scene distance for a scene based on the scene change and an inter-scene distance between scenes, wherein the intra-scene distance is a length of forward time in the video between the scene change and a next scene change, and the inter-scene distance is an average of distances to two scene change frames; and selecting the scene change as the second start frame or the second end frame based on the intra-scene distance and inter-scene distance.
- 11 . The method of claim 9 , wherein analyzing scene changes comprises: selecting the scene change as the second start frame or the second end frame based on a distance from multiple scene changes that are not within a threshold distance from the scene change.
- 12 . The method of claim 1 , wherein determining the encoding parameter set comprises: analyzing content of the micro-chunk to classify the micro-chunk in a pre-defined category, wherein the encoding parameter set that is associated with the pre-defined category is used for the micro-chunk.
- 13 . The method of claim 1 , wherein determining the encoding parameter set comprises: analyzing content of the micro-chunk to classify the micro-chunk in a pre-defined category that is based on an aspect of a content characteristic from a plurality of aspects of the content characteristic for the micro-chunk, wherein the encoding parameter set that is associated with the pre-defined category is used for the micro-chunk.
- 14 . The method of claim 1 , wherein determining the encoding parameter set comprises: analyzing content of the micro-chunk to classify the micro-chunk in a first pre-defined category; and analyzing content of the micro-chunk to classify the micro-chunk in a second pre-defined category that is based on an aspect of a content characteristic from a plurality of aspects of the content characteristic for the micro-chunk, wherein the encoding parameter set that is associated with the first pre-defined category or the second pre-defined category is used for the micro-chunk.
- 15 . The method of claim 1 , wherein determining the encoding parameter set comprises: analyzing frames of the micro-chunk to classify the micro-chunk in a plurality of pre-defined categories, wherein each of the plurality of pre-defined categories is associated with a pre-defined encoding parameter set; selecting a pre-defined category from the plurality of pre-defined categories; and using the pre-defined encoding parameter set as the encoding parameter set for the micro-chunk.
- 16 . The method of claim 1 , wherein determining the encoding parameter set comprises: using a continuous learning process to select the encoding parameter set, wherein the continuous learning process continually learns the encoding parameter set to use when encoding micro-chunks.
- 17 . The method of claim 16 , wherein the continuous learning process comprises: receiving a state of an encoder that is encoding micro-chunks; and generating a reward that is used to adjust the encoding parameter set.
- 18 . A non-transitory computer-readable storage medium having stored thereon computer executable instructions, which when executed by a computing device, cause the computing device to be operable for: analyzing flagged frames from a plurality of frames in an encoding of a video to form a cluster of frames, wherein the cluster includes a plurality of flagged frames; determining draft micro-chunk boundaries for the cluster of a first start frame and a first end frame in the encoding that includes the cluster; searching in a first search range before the first start frame and a second search range after the first end frame for a second start frame in the first search range and a second end frame in the second search range, wherein the second start frame and the second end frame form a micro-chunk; analyzing multiple frames based on the second start frame and the second end frame to determine content characteristics for the micro-chunk; determining an encoding parameter set for the micro-chunk formed by the second start frame and the second end frame based on mapping the content characteristics of the micro-chunk to the encoding parameter set; and using the encoding parameter set to encode the plurality of frames in the micro-chunk for insertion in the encoding of the video.
- 19 . An apparatus comprising: one or more computer processors; and a computer-readable storage medium comprising instructions for controlling the one or more computer processors to be operable for: analyzing flagged frames from a plurality of frames in an encoding of a video to form a cluster of frames, wherein the cluster includes a plurality of flagged frames; determining draft micro-chunk boundaries for the cluster of a first start frame and a first end frame in the encoding that includes the cluster; searching in a first search range before the first start frame and a second search range after the first end frame for a second start frame in the first search range and a second end frame in the second search range, wherein the second start frame and the second end frame form a micro-chunk; analyzing multiple frames based on the second start frame and the second end frame to determine content characteristics for the micro-chunk; determining an encoding parameter set for the micro-chunk formed by the second start frame and the second end frame based on mapping the content characteristics of the micro-chunk to the encoding parameter set; and using the encoding parameter set to encode the plurality of frames in the micro-chunk for insertion in the encoding of the video.
- 20 . The non-transitory computer-readable storage medium of claim 18 , wherein determining the encoding parameter set comprises: analyzing content of the micro-chunk to classify the micro-chunk in a pre-defined category, wherein the encoding parameter set that is associated with the pre-defined category is used for the micro-chunk.
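The clustering and draft-boundary steps recited in claims 4 and 6 can be sketched in Python. This is a minimal illustration only: the gap threshold, the symmetric widening to a minimum duration, and all function names are assumptions of this sketch, not details taken from the patent.

```python
def cluster_flagged_frames(flagged, gap_threshold):
    """Group sorted flagged frame indices into clusters when neighboring
    flagged frames are within gap_threshold of each other (cf. claim 4)."""
    clusters = []
    for frame in sorted(flagged):
        if clusters and frame - clusters[-1][-1] <= gap_threshold:
            clusters[-1].append(frame)
        else:
            clusters.append([frame])
    return clusters

def draft_boundaries(cluster, min_duration, total_frames):
    """Return (start, end) draft micro-chunk boundaries covering the cluster,
    widened to at least min_duration frames (cf. claim 6). The symmetric
    widening strategy is an assumption, not specified by the patent."""
    start, end = cluster[0], cluster[-1]
    shortfall = min_duration - (end - start + 1)
    if shortfall > 0:
        start = max(0, start - shortfall // 2)
        end = min(total_frames - 1, start + min_duration - 1)
    return start, end
```

For example, flagged frames 10, 12, 14, and 90 with a gap threshold of 5 form two clusters, and the first cluster's draft boundaries widen from [10, 14] to meet a 10-frame minimum duration.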
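Claims 7 and 8 describe choosing boundary areas where video buffering verifier (VBV) usage is low relative to the rest of the search range. A sliding-window sketch, assuming per-frame VBV usage values are available from the first encoding pass; the window size and the use of a windowed average are illustrative assumptions:

```python
def lowest_vbv_window(vbv_usage, search_start, search_end, window=5):
    """Within [search_start, search_end), find the start of the window of
    frames whose average VBV usage is lowest — a candidate area for placing
    a micro-chunk boundary (cf. claims 7-8)."""
    best_start, best_avg = search_start, float("inf")
    for s in range(search_start, search_end - window + 1):
        avg = sum(vbv_usage[s:s + window]) / window
        if avg < best_avg:
            best_start, best_avg = s, avg
    return best_start, best_avg
```

Picking the lowest-usage area gives the encoder headroom at the splice point, which is consistent with claim 8's requirement that the selected areas have lower VBV usage than other areas of the search range.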
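Claim 10 defines an intra-scene distance (the forward distance from a scene change to the next scene change) and an inter-scene distance (the average of the distances to the two neighboring scene-change frames). Those two quantities could be computed as below; combining them with `min` into a single score is an assumption of this sketch, since the patent only states that both distances inform the boundary selection.

```python
def score_boundary_candidates(scene_changes):
    """Score interior scene-change frames as micro-chunk boundary candidates
    using claim 10's intra-scene and inter-scene distances. Higher scores
    favor scene changes well separated from their neighbors."""
    scores = {}
    for i in range(1, len(scene_changes) - 1):
        sc = scene_changes[i]
        # Intra-scene distance: forward distance to the next scene change.
        intra = scene_changes[i + 1] - sc
        # Inter-scene distance: average distance to the two neighbors.
        inter = ((sc - scene_changes[i - 1]) + (scene_changes[i + 1] - sc)) / 2
        scores[sc] = min(intra, inter)  # assumed combination, not from patent
    return scores
```

In a sequence of scene changes at frames 0, 100, 110, and 300, the candidate at frame 110 scores higher than the one at frame 100, since the scene starting at 100 lasts only 10 frames.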
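Claims 12 through 15 map a micro-chunk's content classification to a predefined encoding parameter set (see also FIG. 9). A table-lookup sketch; the category labels and parameter names below are purely illustrative assumptions, not values from the patent:

```python
# Hypothetical mapping from a content-characteristic label to a predefined
# encoding parameter set. Every key and value here is an assumption.
PARAMETER_SETS = {
    "high_motion": {"qp_offset": -2, "bframes": 2, "aq_strength": 1.2},
    "film_grain":  {"qp_offset": -1, "bframes": 3, "aq_strength": 0.8},
    "animation":   {"qp_offset": 0,  "bframes": 4, "aq_strength": 0.6},
}

def select_parameter_set(category, default=None):
    """Look up the predefined encoding parameter set for a classified
    micro-chunk category, falling back to a default set (cf. claims 12-15)."""
    return PARAMETER_SETS.get(category, default)
```

Claims 16 and 17 describe an alternative in which a continuous (reinforcement) learning process replaces this static table, adjusting the parameter set from the encoder's state and a reward signal.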
Description
BACKGROUND

A video delivery system may offer a large number of instances of content (e.g., videos) that can be delivered to client devices. The instances of content are encoded for delivery to clients. The video delivery system may use an encoding pipeline that includes an encoder, which uses a predefined set of encoding parameters to encode the content. That is, the values of the encoding parameters may be the same for all instances of content that are encoded by the encoding pipeline. Typically, the encoding parameters are optimized for the characteristics of the most common content expected to be encoded by the pipeline. For example, the encoding parameters may be based on characteristics of action movies if that is considered the most common content being encoded. However, the video delivery system may have a vast library of content, which may include action movies, animated movies, television shows, nature shows, etc. The problem is that, for some content, or more specifically for some parts of a single instance of content, using the same set of encoding parameters may produce a suboptimal encoding of the instance of content or of portions of it.

BRIEF DESCRIPTION OF THE DRAWINGS

The included drawings are for illustrative purposes and serve only to provide examples of possible structures and operations for the disclosed inventive systems, apparatus, methods, and computer program products. These drawings in no way limit any changes in form and detail that may be made by one skilled in the art without departing from the spirit and scope of the disclosed implementations.

- FIG. 1 depicts a simplified system for a video encoding pipeline according to some embodiments.
- FIG. 2 depicts an example of a video encoding with good frames and flagged frames according to some embodiments.
- FIG. 3 depicts a simplified flow chart of a method for determining draft micro-chunk boundaries according to some embodiments.
- FIG. 4 depicts a second part of determining micro-chunk boundaries according to some embodiments.
- FIG. 5 depicts an example of search ranges for adjusting micro-chunk boundaries according to some embodiments.
- FIG. 6 depicts an example of performing video buffering verifier (VBV) usage-based filtering according to some embodiments.
- FIG. 7 depicts an example of scene change-based filtering according to some embodiments.
- FIG. 8 depicts a first process for selecting an encoding parameter set according to some embodiments.
- FIG. 9 depicts an example of predefined encoding parameter sets for different labels according to some embodiments.
- FIG. 10 depicts an example of a system for implementing a reinforcement learning process according to some embodiments.
- FIG. 11 depicts a video streaming system in communication with multiple client devices via one or more communication networks according to one embodiment.
- FIG. 12 depicts a diagrammatic view of an apparatus for viewing video content and advertisements.

DETAILED DESCRIPTION

Described herein are techniques for a video encoding system. In the following description, for purposes of explanation, numerous examples and specific details are set forth to provide a thorough understanding of some embodiments. Some embodiments as defined by the claims may include some or all of the features in these examples, alone or in combination with other features described below, and may further include modifications and equivalents of the features and concepts described herein.

System Overview

In some embodiments, a system may optimize encoding parameters for identified portions of an encoding for an instance of content. The following uses a video as an example of the instance of content that is being encoded, but the techniques may be applied to other types of content, such as audio. A quality control process may identify problematic parts of the encoding that do not meet a threshold of a quality metric.
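The quality control step described above can be sketched as a simple threshold filter. The choice of quality metric (e.g., a VMAF-like per-frame score) and the below-threshold flagging convention are assumptions of this sketch, since the patent does not name a specific metric:

```python
def flag_frames(quality_scores, threshold):
    """Return indices of frames whose per-frame quality metric falls below
    the threshold; these become the flagged ("bad") frames to cluster."""
    return [i for i, q in enumerate(quality_scores) if q < threshold]
```

For example, per-frame scores of [90, 40, 95, 30] with a threshold of 50 would flag frames 1 and 3.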
These frames may be referred to as flagged frames or “bad” frames within the video. The flagged frames may be in the minority compared to the good frames that meet the quality metric threshold, but prior to the encoding there is no guarantee of where the flagged frames will arise or of how they will be clustered through the video. For example, the flagged frames may last a few seconds, or they may appear for only a fraction of a second. To address the problematic portions of the encoding, one solution may be to re-encode the entire video with different encoding parameters. However, this solution has disadvantages. Because the encoding may include a majority of good frames, re-encoding the frames that were already good may not be needed and wastes computing resources. The quality control process may also need to be re-performed on the entire encoding, which wastes additional computing resources. Further, the re-enco