CN-120283395-B - Video processing system and video processing method
Abstract
Systems and methods for video processing may be provided that generate 3D streaming data via an API. The 3D streaming session may be delivered to the client device with low latency and high throughput. After the client device is authenticated, a server may be selected to host the packaged 3D streaming engine and initiate the session. Library files may be injected into executable files of the packaged 3D streaming engine to generate a digital representation of the interactive 3D environment. The interactive 3D environment may be rendered via a GPU, encoded, and streamed to the client device. The 3D streaming session may be adjusted via the client device and maintained until terminated at the client device. The number of virtual compute instances of the packaged 3D streaming engine provided by the server may be dynamically adjusted based on metrics and machine-learning predictions for the connected client devices.
Inventors
- GAFNER, Verena
- KOCHE, S.
- ADAMS, Angelique
Assignees
- 蒙基威有限公司
Dates
- Publication Date: 2026-05-08
- Application Date: 2023-09-29
- Priority Date: 2022-09-30
Claims (20)
- 1. A video processing system, the video processing system comprising: a plurality of servers operating in a peer-to-peer communication network via a Web Real-Time Communication (WebRTC) application programming interface (API) to generate 3D streaming data; the plurality of servers being configured to respond to a session handler receiving a request for a 3D streaming session at a first client device by: (i) receiving a context verification regarding the first client device from the session handler in communication with the first client device, the context verification including an Internet Protocol (IP) location of the first client device; (ii) responding to the context verification regarding the first client device by assigning a first server of the plurality of servers, based on the IP location of the first client device, to direct the 3D streaming session at the first client device; (iii) directing a first instance of a packaged 3D streaming engine at the first server to initiate the 3D streaming session at the first client device via the WebRTC API; (iv) injecting a library file of the WebRTC API into an executable file of the packaged 3D streaming engine to generate a digital representation of an interactive 3D environment for the 3D streaming session at the first client device; (v) rendering the digital representation of the interactive 3D environment within a buffer of a graphics processing unit (GPU); (vi) encoding, via an encoder, the rendered representation for transmission of the 3D streaming session to the first client device; (vii) streaming, at the first client device, the encoded representation of the interactive 3D environment; (viii) dynamically adjusting the 3D streaming session of the interactive 3D environment in response to input received from the first client device; and (ix) maintaining the digital representation of the interactive 3D environment of the 3D streaming session by iteratively performing (v) through (viii) until the 3D streaming session is terminated at the first client device.
- 2. The video processing system of claim 1, wherein the first server is configured to establish a predetermined number of virtual compute instances of the 3D streaming session with one or more client devices, and the first server is configured to automatically adjust the number of virtual compute instances provided based on a change in the number of the one or more client devices, wherein at least one of the one or more client devices is the first client device.
- 3. The video processing system of claim 2, wherein at least one of the plurality of servers is configured to respond to a request for a 3D streaming session from the one or more client devices by assigning a server of the plurality of servers, wherein the first server is one of the plurality of servers.
- 4. The video processing system of claim 3, wherein at least one of the plurality of servers performs a computational analysis to determine an appropriate number of 3D streaming sessions for the plurality of client devices that are adjusted based on the predetermined number of virtual compute instances.
- 5. The video processing system of claim 4, wherein the computational analysis includes concurrent users (CCU), session time, daily active users (DAU), monthly active users (MAU), and sessions.
- 6. The video processing system of claim 3, wherein the packaged 3D streaming engine configures the 3D streaming session as a media streaming cluster such that at least one of the plurality of servers performs multicast transmission of data packets containing at least a portion of the digital representation of the interactive 3D environment to the one or more client devices.
- 7. The video processing system of claim 6, wherein dynamically adjusting the 3D streaming session of the interactive 3D environment in response to input received from the first client device further comprises: the packaged 3D streaming engine being configured to capture events associated with the interactive 3D environment, the events detected by one or more listeners at one or more of the one or more client devices; the packaged 3D streaming engine being configured to capture metadata, consumption information, and interaction data during the 3D streaming session at one or more of the one or more client devices; and the packaged 3D streaming engine being configured to respond to the captured events, metadata, consumption information, and interaction data by redrawing the interactive 3D environment at the one or more client devices to maintain the digital representation of the interactive 3D environment.
- 8. The video processing system of claim 7, wherein the packaged 3D streaming engine maintains the digital representation of the interactive 3D environment by: computing composite 3D media data based on the captured events, metadata, consumption information, and interaction data; defining a composite image layout based on attributes derived from the 3D streaming session at the one or more client devices; configuring the 3D streaming session to provide a composite media signal according to the defined composite image layout; and transmitting the composite media signal in data packets to the one or more client devices via a packetizer.
- 9. The video processing system of claim 6, further comprising the packaged 3D streaming engine interfacing with the one or more client devices to facilitate control communication interfaces to enable voice, text, and video transmissions among the one or more client devices in a packet-switched communication system of a peer-to-peer network.
- 10. The video processing system of claim 7, wherein the packaged 3D streaming engine is configured to train an interactive frame prediction model for the digital representation of the interactive 3D environment based on the encoded streaming data of the 3D streaming session and the captured events, the metadata, the interaction data, and the consumption information from the one or more client devices.
- 11. The video processing system of claim 10, wherein the encoder is configured to encode the 3D streaming session by selectively using predicted frames generated based on the trained interactive frame prediction model and transmitting the trained interactive frame prediction model and the encoded streaming data to the first client device and a second client device to create the digital representation of the interactive 3D environment, and wherein the first client device and the second client device are configured to receive the trained interactive frame prediction model and the encoded streaming data, and to decode the encoded streaming data based on the trained interactive frame prediction model to create the digital representation of the interactive 3D environment.
- 12. The video processing system of claim 1, wherein the packaged 3D streaming engine is configured as a software container containing bindings of software components with the configuration files, libraries, and dependencies required for execution, wherein the packaged 3D streaming engine is not a software plug-in.
- 13. The video processing system of claim 12, wherein the software components include at least one of the library file of the WebRTC API, an event manager, a scene manager, a resource manager, a session manager, a physics manager, and an artificial intelligence system, and wherein the WebRTC API includes one or more WebRTC API functions configured to call one or more of the software components.
- 14. The video processing system of claim 12, wherein the packaged 3D streaming engine is deployed as a virtual machine via Secure Sockets Layer (SSL), the virtual machine being configured in the software container; and wherein the input is received through a data channel established using a real-time communication data channel (RTCDataChannel) API.
- 15. A video processing method, the video processing method comprising: configuring a server computer system to respond to a session handler receiving a request for a 3D streaming session at a first client device by: (i) receiving a context verification regarding the first client device from the session handler in communication with the first client device, the context verification including an Internet Protocol (IP) location of the first client device; (ii) directing the 3D streaming session at the first client device in response to the context verification regarding the first client device by assigning a first server of a plurality of servers of the server computer system based on the IP location of the first client device, wherein the plurality of servers operate in a peer-to-peer communication network via a Web Real-Time Communication (WebRTC) application programming interface (API) to generate 3D streaming data; (iii) directing a first instance of a packaged 3D streaming engine at the first server to initiate the 3D streaming session at the first client device via the WebRTC API; (iv) injecting a library file of the WebRTC API into an executable file of the packaged 3D streaming engine to generate a digital representation of an interactive 3D environment for the 3D streaming session at the first client device; (v) rendering the digital representation of the interactive 3D environment within a buffer of a graphics processing unit (GPU); (vi) encoding, via an encoder, the rendered representation for transmission of the 3D streaming session to the first client device; (vii) streaming, at the first client device, the encoded representation of the interactive 3D environment; (viii) dynamically adjusting the 3D streaming session of the interactive 3D environment in response to input received from the first client device; and (ix) maintaining the digital representation of the interactive 3D environment of the 3D streaming session by iteratively performing (v) through (viii) until the 3D streaming session is terminated at the first client device.
- 16. The video processing method of claim 15, further comprising configuring the first server to establish a predetermined number of virtual compute instances of the 3D streaming session with one or more client devices, and configuring the first server to automatically adjust the number of virtual compute instances provided based on a change in the number of the one or more client devices, wherein at least one of the one or more client devices is the first client device.
- 17. The video processing method of claim 16, further comprising configuring at least one of the plurality of servers to respond to a request for a 3D streaming session from the one or more client devices by assigning a server of the plurality of servers to respond to a corresponding request for a 3D streaming session, wherein the first server is one of the plurality of servers.
- 18. The video processing method of claim 17, further comprising configuring at least one of the plurality of servers to perform a computational analysis to determine an appropriate number of 3D streaming sessions for the plurality of client devices adjusted based on the predetermined number of virtual compute instances.
- 19. The video processing method of claim 18, further comprising configuring the computational analysis to include concurrent users (CCU), session time, daily active users (DAU), monthly active users (MAU), and sessions.
- 20. The video processing method of claim 17, further comprising configuring the 3D streaming session as a media streaming cluster such that at least one of the plurality of servers performs multicast transmission of data packets containing at least a portion of the digital representation of the interactive 3D environment to the one or more client devices.
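The iterative render, encode, stream, and adjust loop recited in steps (v) through (ix) of claims 1 and 15 can be illustrated with a minimal Python sketch. All class and function names below are hypothetical stand-ins; the patent does not specify an implementation, and the string "frames" and list "client buffer" merely simulate GPU rendering and network transmission.

```python
# Illustrative sketch of the claim 1 session loop, steps (v)-(ix).
# All names are hypothetical; nothing here is the patented implementation.

class StreamingSession:
    def __init__(self, client):
        self.client = client          # stand-in for the client device
        self.terminated = False
        self.frame_count = 0

    def render(self):                 # (v) render into a GPU buffer
        self.frame_count += 1
        return f"frame-{self.frame_count}"

    def encode(self, frame):          # (vi) encode the rendered representation
        return frame.encode()         # stand-in for a real video encoder

    def stream(self, packet):         # (vii) stream to the client device
        self.client.append(packet)

    def adjust(self, inputs):         # (viii) apply client input to the session
        if inputs and inputs.pop() == "quit":
            self.terminated = True

def run_session(session, inputs):
    # (ix) iterate (v)-(viii) until the client terminates the session
    while not session.terminated:
        frame = session.render()
        packet = session.encode(frame)
        session.stream(packet)
        session.adjust(inputs)
    return session.frame_count

client_buffer = []
frames = run_session(StreamingSession(client_buffer), ["quit", "move", "look"])
```

Each loop iteration delivers one encoded frame and then consumes one queued client input; the session ends when the "quit" input is processed, mirroring termination at the client device.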
Description
Video processing system and video processing method

Related Application

The present application claims the benefit of U.S. Provisional Application No. 63/377,986, filed on September 30, 2022. The entire teachings of the above application are incorporated herein by reference.

Background

Streaming video applications have recently evolved in a new dimension in an attempt to provide a more immersive experience for users. As virtual alternatives to face-to-face interaction continue to evolve, users are given opportunities to connect safely and reliably across long geographic distances.

Disclosure of Invention

Embodiments of the present disclosure provide techniques for streaming interactive video and audio content of interest to a user. The streaming technology provides users with a high-quality, low-latency video and audio streaming experience. Graphics processing unit (GPU) instances are configured to directly host the services that support streaming content, enabling the method to be implemented without middleware or plug-ins. Embodiments of the present invention relate to systems and methods for video processing and for streaming the processed media content to the client devices of these users. In some embodiments, a video processing system includes a plurality of servers operating in a peer-to-peer communication network via a WebRTC application programming interface (API) to generate 3D streaming data. The plurality of servers may respond to a session handler receiving a request for a 3D streaming session at a first client device in part by receiving a context verification for the first client device from the session handler in communication with the first client device, the context verification including an Internet Protocol (IP) location of the first client device.
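The context-verified server assignment described above, where a server is chosen based on the client's IP-derived location, can be sketched as follows. The server regions, coordinates, and distance metric are all invented for illustration; the patent does not specify how an IP location maps to a server.

```python
# Hypothetical sketch: assign the nearest server region to a client
# based on the location derived from the context verification.
# Region names and coordinates are illustrative assumptions.

SERVERS = {
    "eu-west":  (48.8, 2.3),     # (latitude, longitude) per region
    "us-east":  (40.7, -74.0),
    "ap-south": (1.3, 103.8),
}

def assign_server(client_location):
    """Pick the server region closest to the client.

    Uses squared Euclidean distance on (lat, lon) as a crude
    stand-in for a real geolocation or latency metric.
    """
    lat, lon = client_location
    return min(
        SERVERS,
        key=lambda name: (SERVERS[name][0] - lat) ** 2
                       + (SERVERS[name][1] - lon) ** 2,
    )

# A client whose context verification places it near Berlin:
chosen = assign_server((52.5, 13.4))
```

A production system would more plausibly combine geolocation with measured round-trip latency and current server load, but the selection shape, a lookup keyed on the verified client location, is the same.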
The plurality of servers may respond to the context verification regarding the first client device by assigning the first server of the plurality of servers, based on the IP location of the first client device, to direct the 3D streaming session at the first client device. Continuing, the plurality of servers may direct a first instance of the packaged 3D streaming engine at the first server to initiate the 3D streaming session at the first client device via the WebRTC API. The plurality of servers may inject library files of the WebRTC API into executable files of the packaged 3D streaming engine to generate a digital representation of the interactive 3D environment for the 3D streaming session at the first client device. The plurality of servers may then render the digital representation of the interactive 3D environment within a buffer of the GPU, encode the rendered representation for transmission of the 3D streaming session to the first client device, and stream the encoded representation of the interactive 3D environment at the first client device. The plurality of servers may dynamically adjust the 3D streaming session of the interactive 3D environment in response to input received from the first client device. The plurality of servers may maintain the digital representation of the interactive 3D environment of the 3D streaming session by iteratively performing the rendering, encoding, streaming, and adjusting processes described above until the 3D streaming session is terminated at the first client device. The first server may be configured to establish a predetermined number of virtual compute instances (VCIs) of the 3D streaming session with one or more client devices, and the first server may be configured to automatically adjust the number of VCIs provided based on a change in the number of client devices. At least one of the one or more client devices may be the first client device.
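The automatic adjustment of virtual compute instances as the number of connected clients changes could take the following shape. The scaling policy here, one instance per fixed batch of clients with a predetermined floor, is an assumption for illustration only; the patent states only that the number of VCIs is adjusted based on client-count changes and metrics.

```python
# Hypothetical autoscaling sketch: provide enough virtual compute
# instances (VCIs) for the connected clients without dropping below
# a predetermined baseline. Constants and policy are illustrative.
import math

PREDETERMINED_INSTANCES = 2   # baseline number of VCIs (assumed)
CLIENTS_PER_INSTANCE = 4      # assumed capacity of a single VCI

def target_instances(connected_clients):
    """Return how many VCIs the first server should provide."""
    needed = math.ceil(connected_clients / CLIENTS_PER_INSTANCE)
    return max(PREDETERMINED_INSTANCES, needed)

# The instance count follows the client count automatically:
scale_up = target_instances(13)    # many clients -> more instances
scale_floor = target_instances(3)  # few clients  -> baseline holds
```

A metrics-driven variant would replace the raw client count with the CCU, DAU, and MAU figures from the computational analysis, or with a machine-learning prediction of near-term demand, as the abstract suggests.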
At least one of the plurality of servers may be configured to respond to a request for a 3D streaming session from a client device by assigning a server of the plurality of servers to respond to the corresponding request. The first server may be one of the plurality of servers. At least one of the plurality of servers may perform a computational analysis to determine an appropriate number of 3D streaming sessions for the plurality of client devices, adjusted based on the predetermined number of VCIs. The computational analysis may include concurrent users (CCU), session times, daily active users (DAU), monthly active users (MAU), and sessions. The packaged 3D streaming engine may configure the 3D streaming session as a media streaming cluster such that at least one of the plurality of servers performs multicast transmission of data packets containing at least part of the digital representation of the interactive 3D environment to one or more client devices. Dynamically adjusting the 3D streaming session of the interactive 3D environment in response to the input received from the first client device may further include the packaged 3D streaming engine being configured to capture events associated with the int