BR-122025009369-A2 - SYSTEMS AND METHODS FOR PROVIDING PROGRAMMABLE MEDIA COMMUNICATION SERVICES
Abstract
Systems and methods for providing real-time media communication services use a server-resident software application that receives media sources from multiple sending participants, generates a single composite media source that includes those media sources, and sends the composite media source to other computing services for purposes such as recording, redistribution, and/or retransmission to remote computing devices of multiple real-time media communication participants. The composite media source may include supplementary information in addition to the media sources from the live participants. This supplementary information is provided through API-configurable programmatic code that is then executed as a server-resident software application.
Inventors
- ESCODA OSCAR, DIVORRA
- O'CONNELL JOHN, G.
- ROSSI CAIO DE MELLO
- TALUSANI RAJKIRAN
- POH YEE, HUI
- SHOUSHAN ROY, BEN
Assignees
- VONAGE BUSINESS INC
Dates
- Publication Date
- 20260310
- Application Date
- 20220513
- Priority Date
- 20210514
Claims (11)
- 1. Apparatus for providing a composite media stream for publication to a plurality of presenters and participants who are part of a communication session by means of interconnected computing devices (210, 212, 214, 216), characterized by comprising: an API interface unit (330, 1102) configured to receive instructions relating to programmatic control over the rendering and publication of a composite media stream; a rendering engine control unit (1104) configured to control the rendering and publication of the composite media stream based on information received from the API interface unit (330, 1102); a rendering engine unit (1106) configured to process one or more media streams received from the interconnected computing devices (210, 212, 214, 216) of the communication session and supplementary information, and configured to generate the composite media stream; a composite media sending unit configured to cause the composite media stream to be published to the participants of the communication session; wherein the rendering engine control unit (1104) provides a URL to the rendering engine unit (1106) and wherein the rendering engine unit (1106) is configured to use the provided URL to access and execute computer instructions configured to generate the composite media stream.
- 2. Apparatus according to claim 1, characterized in that the rendering engine unit (1106) is further configured to use the provided URL to obtain one or more media streams that will be used to generate the composite media stream.
- 3. Apparatus according to claim 2, characterized in that the rendering engine unit (1106) is also configured to use the provided URL to obtain supplementary information that will be used to generate the composite media stream.
- 4. Apparatus according to claim 1, characterized in that the rendering engine unit (1106) is further configured to output audio and video streams for the composite media stream to a virtual audio device unit (1110) and a virtual display server unit (1108), respectively.
- 5. Apparatus according to claim 4, characterized in that the composite media sending unit reads data from the virtual audio device unit (1110) and the virtual display server unit (1108) and uses the data read to cause the composite media stream to be published to the participants in the communication session.
- 6. Apparatus for providing a composite media stream for publication to a plurality of presenters and participants who are part of a communication session by means of interconnected computing devices (210, 212, 214, 216), characterized by comprising: an API interface (330, 1102) that includes means for receiving instructions related to programmatic control over the rendering and publication of a composite media stream; means for controlling the rendering and publication of the composite media stream based on information received from the API interface unit (330, 1102); means for processing one or more media streams received from the interconnected computing devices (210, 212, 214, 216) of the communication session and supplementary information, and for generating the composite media stream; means to cause the composite media stream to be published to the participants of the communication session; wherein the means to control the rendering and publication of the composite media stream provide a URL to the means to process one or more media streams, and wherein the means to process one or more media streams are configured to use the provided URL to access and execute computer instructions configured to generate the composite media stream.
- 7. Non-transient computer-readable medium characterized by containing instructions that, when executed by one or more processors (810) of a computing device (210, 212, 214, 216), cause the computing device (210, 212, 214, 216) to execute a method comprising: receiving, through an API interface, instructions related to programmatic control over the rendering and publication of a composite media stream; processing one or more media streams received from the interconnected computing devices (210, 212, 214, 216) of the communication session and supplementary information based on the information received through the API interface to generate the composite media stream; and causing the composite media stream to be published to the participants of the communication session; wherein the processing of one or more media streams comprises the use of a provided URL to access and execute computer instructions configured to generate the composite media stream.
- 8. Non-transient, computer-readable medium, according to claim 7, characterized in that the processing step further comprises the use of the provided URL to obtain one or more media streams that will be used to generate the composite media stream.
- 9. Non-transient, computer-readable medium, according to claim 8, characterized in that the processing step further comprises the use of the provided URL to obtain supplementary information that will be used to generate the composite media stream.
- 10. Non-transient computer-readable medium, according to claim 7, characterized in that the method further comprises: outputting an audio stream for the composite media stream to a virtual audio device unit (1110); and outputting a video stream for the composite media stream to a virtual display server unit (1108).
- 11. Non-transient computer-readable medium according to claim 10, characterized in that causing the composite media stream to be published comprises reading data from the virtual audio device unit (1110) and the virtual display server unit (1108) and using the read data to cause the composite media stream to be published to the participants in the communication session.
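By way of non-limiting illustration only, the flow recited in claim 1 (API interface unit, rendering engine control unit, rendering engine unit, and composite media sending unit) may be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: the class names, the `grid_layout` function, the `INSTRUCTIONS_BY_URL` registry, and the `example.invalid` URL are all hypothetical, media streams are represented by their reference labels, and the "computer instructions" reached through the URL are modeled as a local function rather than fetched over a network.

```python
from typing import Callable, Dict, List

# Hypothetical type: instructions that turn media streams plus
# supplementary information into one composite (modeled as a string).
Composer = Callable[[List[str], Dict[str, str]], str]


def grid_layout(streams: List[str], extra: Dict[str, str]) -> str:
    """Hypothetical composition instructions: tile streams, append a caption."""
    tiles = " | ".join(sorted(streams))
    caption = extra.get("caption", "")
    return f"[{tiles}] {caption}".strip()


# Assumption: stands in for "instructions accessible at a URL".
# A real system would load them over the network.
INSTRUCTIONS_BY_URL: Dict[str, Composer] = {
    "https://example.invalid/layouts/grid": grid_layout,
}


class RenderingEngine:
    """Uses the provided URL to access and execute the composition instructions."""

    def render(self, url: str, streams: List[str], extra: Dict[str, str]) -> str:
        composer = INSTRUCTIONS_BY_URL[url]
        return composer(streams, extra)


class RenderingEngineControl:
    """Controls rendering and hands the URL to the rendering engine."""

    def __init__(self, engine: RenderingEngine) -> None:
        self.engine = engine

    def start(self, url: str, streams: List[str], extra: Dict[str, str]) -> str:
        return self.engine.render(url, streams, extra)


class ApiInterface:
    """Receives instructions for programmatic control of the composite."""

    def __init__(self, control: RenderingEngineControl) -> None:
        self.control = control

    def handle(self, instruction: Dict, streams: List[str]) -> str:
        return self.control.start(
            instruction["url"], streams, instruction.get("supplementary", {})
        )


def publish(composite: str, participants: List[str]) -> Dict[str, str]:
    """Composite media sending step: every participant gets the same composite."""
    return {participant: composite for participant in participants}


if __name__ == "__main__":
    api = ApiInterface(RenderingEngineControl(RenderingEngine()))
    composite = api.handle(
        {"url": "https://example.invalid/layouts/grid",
         "supplementary": {"caption": "Q&A"}},
        ["102A", "104A", "106A"],
    )
    print(publish(composite, ["110", "112", "114"]))
```

In this sketch the only coupling between the control unit and the rendering engine is the URL, which mirrors the claim language: changing the composition behavior would amount to supplying a different URL through the API rather than modifying the engine itself.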
Description
REFERENCE TO PRIOR APPLICATIONS
[0001] This is a divisional application of BR 11 2023 023918 4, dated 11/14/2023.
BACKGROUND OF THE INVENTION
[0002] Figures 1A-1C illustrate typical videoconferencing and/or real-time media communication environments in which multiple sending connections (which may be presenters, participants, or any other media source) 102, 104, 106 provide video sources 102A, 104A, 106A to a server 108 that coordinates media transmission. The server 108 could be a multipoint control unit (MCU) that connects different endpoints (e.g., receiving participants and/or sending presenters, or any combination of sending and/or receiving endpoints), providing appropriate media streams to all receiving endpoints. An MCU can perform video mixing, transcoding, and security functions, as well as a variety of other services. Most modern video communication services use a Selective Forwarding Unit (SFU) as the server, where media is selectively forwarded, packet by packet, from sending connections to receiving connections, so there is no need for mixing and/or transcoding. The selection of packets for forwarding is based on a variety of intelligent server strategies and decisions that manage user experience quality, quality of service, operating cost, network usage, etc., across the multiple network connections to the server. SFUs give receiving endpoints complete flexibility to process individual media streams and arrange them as needed, since each media stream is preserved independently (unlike with MCUs).
[0003] As illustrated in Figure 1A, in a typical state-of-the-art real-time media communications environment, the service provides all individual media streams 102A, 104A, 106A to each individual participant and/or terminal 110, 112, 114. This allows each terminal 110, 112, 114 to arrange the media streams from the sending terminals in whatever way is considered most useful or convenient for a receiving participant and/or terminal.
This is common in real-time media communications, such as videoconferencing, or even in near real-time communications, where the media sources (e.g., video) of each presenter and/or sender 102, 104, 106 are viewed essentially immediately by at least one receiving participant and/or terminal. This enables so-called interactive media communications. In a conference setting, it allows participants to provide timely feedback, such as asking questions of the presenting participants and intervening as needed, so that the videoconference resembles a face-to-face meeting. Figures 1B and 1C show how a terminal typically hosts both a sender and a receiver. Any order or combination of senders and receivers in a given terminal application could be considered. Figures 1B and 1C represent examples of a real-time communications system for the specific videoconferencing use case. However, real-time media communications and the scope of this disclosure encompass any other uses, including real-time media communications applications such as, but not limited to, social networking, e-learning, e-health, webinars, etc. Furthermore, they can use any type of communication topology involving any combination of low-delay interactive transmissions and delay-tolerant media transmissions, with diverse topologies of sending and receiving participants: from one or a few sending participants and a very large number of receiving participants (e.g., 1:N or M:N, where N>M and even N>>M), to a very large number of sending participants and a few selected receiving participants (M:N, where M>N and even M>>N), as well as balanced N:N participant sessions, where the number of sending and receiving participants is equal. The sending and receiving participants can be the same participants or distinct participants. The participant terminals may or may not be used by a human user.
A terminal can be just a programmatic application and/or destination service, or any other agent that consumes and/or generates media with its application for any purpose.
[0004] The price of the flexibility and scalability provided by an SFU server 108 is that it must send all media sources from all sending participants 102, 104, 106 to each receiving participant. This increases the processing cost at the terminals, as well as network usage. At the same time, it places a much lower processing load on the server 108 than an MCU would. On the other hand, if flexibility can be forgone, the cost of composing the receiving views can be centralized at a single processing point, as in MCUs (Figure 1C), at a much higher cost on the server but with savings in communication and terminal resources. In addition to the server cost, the problem with MCUs, and with the mixing and transcoding usually embedded in them, is a lack of flexibility, which translates into a limited set of media composition and mixing presets and limited configurability.
[0005] It is common in different real-time media communication systems to have services th