US-20260129139-A1 - CONTEXT-AWARE VOICE CONTROL OF LIVE VIDEO PRODUCTION
Abstract
For context-aware voice control of live video production, a stream of speech that is related to a live video production is received, during that live video production. A control output to change an aspect of the live video production is provided based on a trigger element in the stream of speech and also a context of the live video production at a time of receipt of the trigger element in the stream of speech.
Inventors
- Robert CORDLE, III
- Troy English
- Wojciech Marek TRYC
Assignees
- ROSS VIDEO LIMITED
Dates
- Publication Date: 20260507
- Application Date: 20251104
Claims (20)
- 1. Video production control equipment comprising: an interface to receive, during a live video production, a stream of speech that is related to the live video production; a controller, coupled to the interface, to provide a control output to change an aspect of the live video production based on a trigger element in the stream of speech and a context of the live video production at a time of receipt of the trigger element in the stream of speech.
- 2. The video production control equipment of claim 1, wherein the stream of speech comprises an audio input of the live video production.
- 3. (canceled)
- 4. The video production control equipment of claim 1, wherein the controller is configured to monitor the stream of speech for occurrence of any of a plurality of trigger elements in the stream of speech.
- 5. (canceled)
- 6. The video production control equipment of claim 4, further comprising: a memory, coupled to the controller, storing the plurality of trigger elements; one or more interfaces, coupled to the memory, to enable updating of the trigger elements.
- 7. The video production control equipment of claim 1, wherein the controller is configured to match the trigger element to one or more of a plurality of control actions.
- 8. (canceled)
- 9. The video production control equipment of claim 7, further comprising: a memory, coupled to the controller, storing the plurality of control actions; one or more interfaces, coupled to the memory, to enable updating of the control actions.
- 10. The video production control equipment of claim 7, wherein the controller is configured to provide the control output based on relevance of the one or more control actions to the context of the live video production.
- 11. (canceled)
- 12. The video production control equipment of claim 10, wherein the relevance is determined based on configurable relevance determination parameters.
- 13. The video production control equipment of claim 1, wherein the controller is configured to track context of the live video production.
- 14-15. (canceled)
- 16. The video production control equipment of claim 13, further comprising: one or more interfaces to enable updating of the context of the live video production.
- 17-18. (canceled)
- 19. A video production system comprising: the video production control equipment of claim 1; and video production equipment, coupled to the video production control equipment, to provide a live video production output of the live video production.
- 20. A method comprising: receiving, during a live video production, a stream of speech that is related to the live video production; providing a control output to change an aspect of the live video production based on a trigger element in the stream of speech and a context of the live video production at a time of receipt of the trigger element in the stream of speech.
- 21. The method of claim 20, wherein the stream of speech comprises an audio input of the live video production.
- 22. (canceled)
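As an informal illustration of the trigger monitoring and matching recited in claims 4, 6, 7 and 9, the following Python sketch scans a piece of transcribed speech for stored trigger elements and returns the control actions associated with each. It is not taken from the application. It assumes that speech has already been converted to text upstream, and the names `TriggerStore`, `find_triggers`, and the action identifiers such as `graphic:tigers_baseball` are hypothetical.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TriggerStore:
    """Hypothetical in-memory store: trigger phrase -> candidate control actions."""
    actions_by_trigger: dict[str, list[str]] = field(default_factory=dict)

    def update(self, trigger: str, actions: list[str]) -> None:
        # Stand-in for the interfaces that enable updating (claims 6 and 9).
        self.actions_by_trigger[trigger.lower()] = list(actions)


def find_triggers(speech: str, store: TriggerStore) -> list[tuple[str, list[str]]]:
    """Return (trigger, candidate actions) pairs in the order they are spoken."""
    hits = []
    for trigger, actions in store.actions_by_trigger.items():
        # Whole-word, case-insensitive match on the first occurrence only.
        match = re.search(rf"\b{re.escape(trigger)}\b", speech, flags=re.IGNORECASE)
        if match:
            hits.append((match.start(), trigger, actions))
    hits.sort(key=lambda hit: hit[0])
    return [(trigger, actions) for _, trigger, actions in hits]


# Assumed example data; the team and map identifiers are illustrative only.
store = TriggerStore()
store.update("Tigers", ["graphic:tigers_baseball", "graphic:tigers_football"])
store.update("Washington", ["map:washington_state", "map:washington_dc"])

matches = find_triggers("Heavy rain in Washington as the Tigers warm up", store)
for trigger, actions in matches:
    print(trigger, actions)
```

Each trigger here maps to more than one candidate action, so this step alone does not decide what to output; under claim 10, that choice depends on the relevance of each candidate to the production context.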
Description
CROSS-REFERENCE TO RELATED APPLICATION
The present application is related to, and claims the benefit of, U.S. provisional patent application Ser. No. 63/715,968, entitled “CONTEXT-AWARE VOICE CONTROL OF LIVE VIDEO PRODUCTION”, filed on Nov. 4, 2024, the entire contents of which are hereby incorporated by reference.
FIELD
The present disclosure relates generally to equipment and methods for live video production control, and in particular, to context-aware voice control of live video production.
BACKGROUND
Realtime control responsiveness is especially important in media applications such as live video productions. Delays in changing content during a live video broadcast, for example, are quite noticeable when on-air commentary becomes out of sync with video or graphics that are displayed. In a live news broadcast, for example, production operators may need to schedule or anticipate live video production changes to reduce delays between when certain content is needed and when that content is available for output. Pre-scheduling may be effective as long as production flow remains on schedule and there are no unexpected developments, but this is rarely the case in live video production.
In a live football sportscast, for example, it is impossible to predict a team or participant that may score or where (locally or otherwise) other developments that may be of interest may take place. In this example, when focus is to shift to a scoring team or player or to a different location at which developments may be of interest, production staff have to identify, locate, and deploy appropriate content, which takes time and can result in noticeable delay during a live broadcast.
In a manual control scenario, a production crew is responsible for production control, which inherently involves delays as a crew member determines a control action that is to be taken and initiates that action.
To the extent that some level of control automation is available, in the case of ambiguity in an input such as a county name that is used in multiple states, either the ambiguity must be resolved by operator intervention or the ambiguity causes an error by initiating multiple competing actions or not initiating any action, all of which result in delay.
There remains a need for more responsive control of live video production.
SUMMARY
Embodiments disclosed herein may enable realtime, context-aware control of a live video production or production environment, via voice control. In some embodiments, speech is parsed and monitored to identify certain keywords or commands, and a live production is controlled based on not only an identified keyword or command, but also the context of the production when an identified keyword or command was spoken. This type of control can significantly reduce or avoid noticeable delay between a time at which content is needed and a time at which that content can be made available, thereby providing substantial improvements in live video production control and quality.
Context-aware voice control as disclosed herein may facilitate dynamic adjustments in live broadcasts, for example, and/or in other live production scenarios.
A context-aware approach to voice control may be particularly advantageous in managing ambiguous voice commands. Such ambiguity is common in live production scenarios. In fast-paced environments such as sports broadcasting, where rapid transitions and real-time reactions are preferred, voice control systems may struggle to distinguish between commands with similar or overlapping keywords such as “Tigers” referring to different sports teams or “Washington” referring to various geographic locations. By incorporating contextual analysis as disclosed herein, such as active geographic focus, visual content currently displayed, or time-based cues, ambiguities may be effectively resolved without manual intervention.
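One way to picture the contextual resolution described above is the following Python sketch, which scores each candidate action for an ambiguous keyword against the current production context and selects the single most relevant one. It is an illustration only, not the disclosed implementation. The `Candidate` class, the `resolve` function, the tag format (`kind:value`), the weights, and the threshold are all assumptions introduced for the example; the weights stand in loosely for configurable relevance determination parameters. Returning no action when candidates tie is a choice made for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    action: str       # hypothetical control action identifier
    tags: frozenset   # context tags under which the action is relevant


def resolve(candidates, context, weights, threshold=1.0):
    """Return the single most relevant action for this context, or None."""
    def relevance(candidate):
        # Each tag shared with the context adds the weight of its kind,
        # where the kind is the part before the colon ("region", "sport").
        shared = candidate.tags & context
        return sum(weights.get(tag.split(":")[0], 0.0) for tag in shared)

    scored = sorted(((relevance(c), c.action) for c in candidates), reverse=True)
    if not scored or scored[0][0] < threshold:
        return None  # nothing relevant enough: take no action
    if len(scored) > 1 and scored[1][0] == scored[0][0]:
        return None  # still ambiguous: avoid competing actions
    return scored[0][1]


# Assumed example data for the keyword "Washington".
washington = [
    Candidate("map:washington_state", frozenset({"region:pacific_northwest"})),
    Candidate("map:washington_dc", frozenset({"region:mid_atlantic"})),
]
weights = {"region": 1.0, "sport": 2.0}  # assumed tunable parameters

context = {"region:pacific_northwest", "display:weather_map"}
print(resolve(washington, context, weights))                  # map:washington_state
print(resolve(washington, {"display:weather_map"}, weights))  # None
```

With an active geographic focus in the context, the keyword maps to exactly one action and no operator has to step in to resolve the ambiguity.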
This may enable more accurate, instantaneous command execution, and allow production teams to operate smoothly even under unpredictable, high-pressure conditions. Such context-awareness may help ensure that only relevant control actions are triggered, thereby potentially reducing latency and enhancing precision of live video production control.
One aspect of the present disclosure relates to video production control equipment that includes an interface and a controller. The interface is to receive, during a live video production, a stream of speech that is related to the live video production. The controller is coupled to the interface, to provide a control output to change an aspect of the live video production, based on a trigger element in the stream of speech and a context of the live video production at a time of receipt of the trigger element in the stream of speech.
Another aspect of the present disclosure relates to a method that involves: receiving, during a live video production, a stream of speech that is related to the live video production; and providing a control output to ch