US-12627867-B2 - Systems and methods for preserving consistent picture quality during live streaming of hybrid content

US12627867B2

Abstract

Systems and methods are provided herein for providing consistent picture quality during live or non-live streaming of hybrid content. For example, a camera may capture a first piece of hybrid content depicting a scene. The scene may include one or more real objects (e.g., a first person conducting an interview) and a piece of content (e.g., a stream of a second person being interviewed) being displayed on a screen (e.g., television screen). A system may receive the first piece of hybrid content and the source of the piece of content being displayed within the scene. The system may then insert the source of the piece of content into the first piece of hybrid content, replacing the depiction of the piece of content, to generate a second piece of hybrid content. The second piece of hybrid content may then be transmitted to one or more devices.

Inventors

  • Tao Chen
  • Ning Xu

Assignees

  • ADEIA GUIDES INC.

Dates

Publication Date
2026-05-12
Application Date
2024-10-01

Claims (20)

  1. A method comprising: generating a first piece of content, wherein the first piece of content comprises a depiction of a second piece of content being displayed on a display; receiving a source video of the second piece of content; determining a position of the display within the first piece of content; synchronizing the source video with the depiction of the second piece of content displayed within the first piece of content; generating a third piece of content by combining the first piece of content and the synchronized source video, wherein the synchronized source video is inserted into the position of the display within the first piece of content replacing the depiction of the second piece of content; and transmitting the third piece of content to one or more devices.
  2. The method of claim 1, wherein the first piece of content is generated by capturing a first scene comprising the display displaying the second piece of content.
  3. The method of claim 1, wherein determining the position of the display comprises detecting the display within the first piece of content using one or more of an object detection algorithm, corner detection, and edge detection.
  4. The method of claim 1, wherein determining the position of the display comprises receiving a first input, wherein the first input indicates the position of the display.
  5. The method of claim 1, further comprising cropping the synchronized source video before the synchronized source video is inserted into the position of the display within the first piece of content.
  6. The method of claim 1, further comprising: identifying a first set of frames from the depiction of the second piece of content displayed within the first piece of content; identifying a second set of frames from the source video; matching one or more frames of the first set of frames with one or more frames of the second set of frames; and synchronizing the source video with the depiction of the second piece of content displayed within the first piece of content using the matched one or more frames of the first set of frames to generate the synchronized source video.
  7. The method of claim 1, wherein a first device comprises the display depicted in the first piece of content, and the source video of the second piece of content is received from the first device.
  8. An apparatus comprising: control circuitry; and at least one memory including computer program code for one or more programs, the at least one memory and the computer program code configured to, with the control circuitry, cause the apparatus to perform at least the following: generate a first piece of content, wherein the first piece of content comprises a depiction of a second piece of content being displayed on a display; receive a source video of the second piece of content; determine a position of the display within the first piece of content; synchronize the source video with the depiction of the second piece of content displayed within the first piece of content; generate a third piece of content by combining the first piece of content and the synchronized source video, wherein the synchronized source video is inserted into the position of the display within the first piece of content replacing the depiction of the second piece of content; and transmit the third piece of content to one or more devices.
  9. The apparatus of claim 8, wherein the first piece of content is generated by capturing a first scene comprising the display displaying the second piece of content.
  10. The apparatus of claim 8, wherein the apparatus is further caused, when determining the position of the display, to detect the display within the first piece of content using one or more of an object detection algorithm, corner detection, and edge detection.
  11. The apparatus of claim 8, wherein the apparatus is further caused, when determining the position of the display, to receive a first input, wherein the first input indicates the position of the display.
  12. The apparatus of claim 8, wherein the apparatus is further caused to crop the synchronized source video before the synchronized source video is inserted into the position of the display within the first piece of content.
  13. The apparatus of claim 8, wherein the apparatus is further caused to: identify a first set of frames from the depiction of the second piece of content displayed within the first piece of content; identify a second set of frames from the source video; match one or more frames of the first set of frames with one or more frames of the second set of frames; and synchronize the source video with the depiction of the second piece of content displayed within the first piece of content using the matched one or more frames of the first set of frames to generate the synchronized source video.
  14. The apparatus of claim 8, wherein the apparatus is further caused to receive the source video of the second piece of content from a first device, wherein the first device comprises the display depicted in the first piece of content.
  15. A non-transitory computer-readable medium having instructions encoded thereon that, when executed by control circuitry, cause the control circuitry to: generate a first piece of content, wherein the first piece of content comprises a depiction of a second piece of content being displayed on a display; receive a source video of the second piece of content; determine a position of the display within the first piece of content; synchronize the source video with the depiction of the second piece of content displayed within the first piece of content; generate a third piece of content by combining the first piece of content and the synchronized source video, wherein the synchronized source video is inserted into the position of the display within the first piece of content replacing the depiction of the second piece of content; and transmit the third piece of content to one or more devices.
  16. The non-transitory computer-readable medium of claim 15, wherein the first piece of content is generated by capturing a first scene comprising the display displaying the second piece of content.
  17. The non-transitory computer-readable medium of claim 15, wherein the control circuitry is further caused, when determining the position of the display, to detect the display within the first piece of content using one or more of an object detection algorithm, corner detection, and edge detection.
  18. The non-transitory computer-readable medium of claim 15, wherein the control circuitry is further caused, when determining the position of the display, to receive a first input, wherein the first input indicates the position of the display.
  19. The non-transitory computer-readable medium of claim 15, wherein the control circuitry is further caused to crop the synchronized source video before the synchronized source video is inserted into the position of the display within the first piece of content.
  20. The non-transitory computer-readable medium of claim 15, wherein the control circuitry is further caused to: identify a first set of frames from the depiction of the second piece of content displayed within the first piece of content; identify a second set of frames from the source video; match one or more frames of the first set of frames with one or more frames of the second set of frames; and synchronize the source video with the depiction of the second piece of content displayed within the first piece of content using the matched one or more frames of the first set of frames to generate the synchronized source video.
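The frame-matching synchronization recited in claims 6, 13, and 20 can be sketched as follows. This is only an illustration of the matching step, assuming NumPy and a crude thumbnail-signature metric of my own choosing; the claims do not prescribe any particular matching metric, and a production system might instead use perceptual hashes or extracted features.

```python
import numpy as np

def find_sync_offset(depiction_frames, source_frames):
    """Match frames of the on-screen depiction against the source video
    to find the playback offset aligning the two (the matching step of
    claims 6, 13, and 20). Frames are grayscale 2-D arrays."""
    def signature(frame):
        # Crude signature: mean intensity over coarse 8x8-pixel blocks.
        h, w = frame.shape[:2]
        trimmed = frame[: h - h % 8, : w - w % 8]
        return trimmed.reshape(h // 8, 8, w // 8, 8).mean(axis=(1, 3))

    sigs_d = [signature(f) for f in depiction_frames]
    sigs_s = [signature(f) for f in source_frames]
    best_offset, best_cost = 0, float("inf")
    for off in range(len(source_frames) - len(depiction_frames) + 1):
        # Sum of mean absolute signature differences at this alignment.
        cost = sum(np.abs(sigs_s[off + i] - d).mean()
                   for i, d in enumerate(sigs_d))
        if cost < best_cost:
            best_offset, best_cost = off, cost
    return best_offset  # source index aligned to the first depiction frame
```

Given the offset, the source video's playback point can be shifted so that it lines up with the depiction before insertion.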

Description

CROSS-REFERENCE TO RELATED APPLICATION

This patent application is a continuation of U.S. patent application Ser. No. 18/213,437, filed Jun. 23, 2023, the disclosure of which is hereby incorporated by reference herein in its entirety.

BACKGROUND

The present disclosure relates to the delivery of content, and in particular to techniques for optimizing the picture quality of the delivered content.

SUMMARY

Streaming content has become increasingly popular, providing a convenient and immersive way to access entertainment, information, and/or educational materials. With an increase in streaming comes an increase in streaming of hybrid content. Hybrid content may refer to content comprising scenes with both real objects and displayed objects. For example, a scene may include a real person giving a seminar, where a graphic is displayed using a computer monitor next to the person. In another example, a scene may include a real person virtually interviewing a second person, where the second person is displayed on a television screen next to the real person. However, traditional streaming technologies often provide less than optimal picture quality of these types of scenes. This stems, in part, from cameras having a lower dynamic range than the human eye. Accordingly, when cameras capture a scene comprising hybrid content, the resulting hybrid content may be overexposed in some areas and/or underexposed in other areas. Traditional capture production in streaming technologies may attempt to combat this problem by manipulating the exposure settings of the cameras. However, adjusting a camera's exposure settings to optimize the brightness of a screen (e.g., computer monitor displaying a graphic) in the scene often results in other portions of the scene being underexposed. Adjusting the camera's exposure settings to optimize the brightness of a real object (e.g., person giving the seminar) in the scene often results in the screen displaying an object being overexposed.
In view of these deficiencies, there exists a need for improved systems and methods for streaming hybrid content with consistent picture quality. Accordingly, techniques are disclosed herein for providing consistent picture quality during live or non-live streaming of hybrid content. For example, a first piece of hybrid content may depict a scene. The scene may include one or more real objects (e.g., a first person conducting an interview) and a piece of content (e.g., a stream of a second person being interviewed) being displayed on a screen (e.g., television screen). The first piece of hybrid content may be captured by one or more cameras and then processed at a first device. The first device may determine the position of the depiction of the television screen in the first piece of hybrid content. For example, the first device may use one or more object detection algorithms, corner detection algorithms, edge detection algorithms, and/or user inputs to identify where the depiction of the television screen is located within the first piece of hybrid content. The first device may also determine a first set of features corresponding to the depiction of the piece of content (e.g., the stream of the second person being interviewed) displayed within the first piece of hybrid content. For example, the first device may perform feature extraction on the depiction of the stream of the second person being interviewed that is displayed on the television screen within the first piece of hybrid content to determine the first set of features. The first device may also receive a source video corresponding to the piece of content depicted in the first piece of hybrid content. For example, the source video of the stream of the second person being interviewed may be transmitted to the first device and may also be transmitted to the television screen depicted in the first piece of hybrid content. 
The first device may determine a second set of features corresponding to the received source video. For example, the first device may perform feature extraction on the source video of the stream of the second person being interviewed to determine the second set of features. The first device may then determine a geometric transformation (e.g., affine transformation) using the first set of features and the second set of features. The first device may then modify the source video using the determined geometric transformation. In some embodiments, the features of the modified source video match the first set of features corresponding to the depiction of the stream of the second person being interviewed within the first piece of hybrid content. The first device may then synchronize the modified source video with the depiction of the piece of content displayed within the first piece of hybrid content. For example, the first device may change the playback speed, frame rate, and/or playback point of the modified source video to match the playback speed, frame rate, and/or playback point of the depiction of the piece of content displayed within the first piece of hybrid content.