EP-4664855-B1 - METHOD, DEVICE AND NON-TRANSITORY COMPUTER-READABLE STORAGE MEDIUM FOR CONTROLLING REAL TIME VIDEO STREAMING
Inventors
- PETTERSSON, MATTIAS
- JONSSON, AXEL
Dates
- Publication Date: 20260513
- Application Date: 20240610
Claims (15)
- A method for controlling real time video streaming, the method comprising: providing (S802) a dataset structure for transmission of video data, wherein a dataset generated according to the dataset structure in a first configuration comprises encoded video data (306) with a first encoding bitrate and padding data (304) having a first size; generating (S804) first datasets (302) according to the dataset structure in the first configuration, and transmitting the first datasets as a data stream via a communication channel (112); during the transmission of the first datasets, receiving a first metric indicating a level of network congestion of the communication channel, and determining (S806) from the received first metric that a level of network congestion has increased above a threshold; as a result of determining from the received first metric that the level of network congestion has increased above the threshold, adjusting (S808) the dataset structure to a second configuration, wherein a dataset generated according to the adjusted dataset structure comprises encoded video data (312) with a second encoding bitrate and padding data (310) having a second size, wherein the second encoding bitrate is lower than the first encoding bitrate; and generating (S810) second datasets (308) according to the dataset structure in the second configuration and transmitting the second datasets as a data stream via the communication channel, wherein the dataset structure defines that encoded video data is prioritized over padding data during the transmission of the dataset generated according to the dataset structure.
- The method of claim 1, further comprising: in response to determining (S806) that the level of network congestion has increased above the threshold: adjusting (S902) the dataset structure to a third configuration, wherein a dataset generated according to the dataset structure in the third configuration comprises encoded video data (404) with the second encoding bitrate and no padding data; generating (S904) third datasets (402) according to the dataset structure in the third configuration, and transmitting the third datasets as a data stream via the communication channel; during the transmission of the third datasets, receiving a second metric indicating a level of network congestion of the communication channel and determining (S906) from the received second metric that the level of network congestion has decreased below the threshold; and adjusting (S808) the dataset structure to the second configuration.
- The method of claim 1, further comprising: in response to determining (S806) that the level of network congestion has increased above the threshold, instructing (S1002) a video encoder to start encoding of video data with the second encoding bitrate; adjusting (S1004) the dataset structure to a fourth configuration, wherein a dataset (502) generated (S1006) according to the dataset structure in the fourth configuration comprises encoded video data (504) with the first encoding bitrate and no padding data; and in response to receiving (S1008) an indication that the video encoder encodes video data with the second encoding bitrate, adjusting (S808) the dataset structure to the second configuration.
- The method of any one of claims 1-3, further comprising: during the transmission of the second datasets, receiving a third metric indicating a level of network congestion of the communication channel and determining (S812) from the received third metric that the level of network congestion has decreased below the threshold; adjusting (S814) the dataset structure to the first configuration; and generating (S816) fourth datasets (602) according to the dataset structure in the first configuration and transmitting the fourth datasets as a data stream via the communication channel.
- The method of any one of claims 1-4, wherein the second size is less than the first size.
- The method of any one of claims 1-5, wherein the dataset structure defines that encoded video data is positioned before padding data in a dataset generated according to the dataset structure.
- The method of any one of claims 1-6, further comprising determining a size of padding data of a dataset generated according to the dataset structure using at least one of: transmission technology of the communication channel; a size of encoded video data of the dataset generated according to the dataset structure; a measurement indicating a variance of available bandwidth of the communication channel; a measured roundtrip time, RTT, of a signal transmitted on the communication channel; or a user input indicating an importance of the video data.
- The method of claim 7, further comprising: determining the measurement indicating a variance of available bandwidth of the communication channel using historical data identifying a frequency of changes of configurations of the dataset structure.
- The method of claim 7, wherein a comparably larger RTT results in a comparably larger padding data size.
- The method of any one of claims 1-9, wherein the datasets are transmitted over the communication channel using at least one of a TCP protocol and a UDP protocol.
- The method of claim 10, wherein encoded video data of a dataset is transmitted using the TCP protocol and padding data of the dataset is transmitted using the UDP protocol.
- The method of any one of claims 1-11, wherein the metric indicating a level of network congestion of the communication channel defines one of: packet loss rate, jitter, transmission buffer occupancy, a number or frequency of explicit congestion notifications, or latency.
- A non-transitory computer-readable storage medium having stored thereon instructions for implementing the method according to any one of claims 1-12 when executed on a camera having processing capabilities.
- A device for controlling real time video streaming, the device configured for: providing (S802) a dataset structure for transmission of video data, wherein a dataset generated according to the dataset structure in a first configuration comprises encoded video data (306) with a first encoding bitrate and padding data (304) having a first size; generating (S804) first datasets (302) according to the dataset structure in the first configuration, and transmitting the first datasets as a data stream via a communication channel (112); during the transmission of the first datasets, receiving a first metric indicating a level of network congestion of the communication channel, and determining (S806) from the received metric that a level of network congestion has increased above a threshold; as a result of determining from the received first metric that a level of network congestion has increased above a threshold, adjusting (S808) the dataset structure to a second configuration, wherein a dataset generated according to the adjusted dataset structure comprises encoded video data (312) with a second encoding bitrate and padding data (310) having a second size, wherein the second encoding bitrate is lower than the first encoding bitrate; and generating (S810) second datasets (308) according to the dataset structure in the second configuration and transmitting the second datasets as a data stream via the communication channel, wherein the dataset structure defines that the encoded video data is prioritized over padding data during the transmission of the dataset generated according to the dataset structure.
- The device according to claim 14, implemented in a camera capturing the video data.
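The configuration switching recited in claims 1, 4 and 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `DatasetStructure`, `StreamController` and `on_metric`, the bitrate and padding values, and the choice of packet loss rate as the congestion metric are all assumptions made for illustration only. The sketch covers just the first and second configurations and omits the third and fourth configurations of claims 2 and 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetStructure:
    """One configuration of the dataset structure (hypothetical names)."""
    encoding_bitrate_kbps: int
    padding_size_bytes: int

    def build_dataset(self, encoded_video: bytes) -> bytes:
        # Claim 6: encoded video data is positioned before padding data.
        # Zero bytes stand in for padding; the actual content is an assumption.
        return encoded_video + bytes(self.padding_size_bytes)


class StreamController:
    """Switches between two configurations based on a congestion metric."""

    def __init__(self, first: DatasetStructure, second: DatasetStructure,
                 threshold: float) -> None:
        # Claim 1 requires the second encoding bitrate to be lower.
        if second.encoding_bitrate_kbps >= first.encoding_bitrate_kbps:
            raise ValueError("second bitrate must be lower than first bitrate")
        self.first = first
        self.second = second
        self.threshold = threshold
        self.current = first

    def on_metric(self, congestion_metric: float) -> DatasetStructure:
        """Update the active configuration from a received metric."""
        if self.current is self.first and congestion_metric > self.threshold:
            # Congestion rose above the threshold: lower the bitrate (S808).
            self.current = self.second
        elif self.current is self.second and congestion_metric < self.threshold:
            # Congestion fell below the threshold: restore the bitrate (S814).
            self.current = self.first
        return self.current


# Illustrative values only; the source does not specify any numbers.
first = DatasetStructure(encoding_bitrate_kbps=4000, padding_size_bytes=200)
second = DatasetStructure(encoding_bitrate_kbps=2000, padding_size_bytes=100)
controller = StreamController(first, second, threshold=0.02)

for loss_rate in (0.01, 0.05, 0.03, 0.01):
    active = controller.on_metric(loss_rate)
    dataset = active.build_dataset(b"\x00\x01\x02")  # stand-in for encoded video
```

In this sketch the padding shrinks together with the bitrate, as in claim 5. A real sender would also need to instruct the video encoder and to prioritize video over padding on the wire, both of which are outside the scope of the example.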
Description
Technical Field
The present invention relates to techniques for managing network congestion, and in particular to techniques for controlling a bitrate of encoded video data using padding data added to the encoded video data during transmission on the network.
Background
Streaming video data over networks with limited capacity presents a significant challenge. The core issue lies in ensuring that the amount of transmitted data does not exceed what the network can handle. If the data rate surpasses the capacity of the network, network congestion will increase, and once the level of network congestion crosses a certain threshold, packet loss may occur.
To mitigate this risk, the network bandwidth may be estimated, a process known as Bandwidth Estimation (BWE). The estimated BWE value may then be used to regulate the bitrate by a bitrate controller during video encoding. Various methods and protocols have been developed for streaming, each employing different approaches for calculating BWE. Despite these advancements, the risk of data loss persists, particularly if the bitrate controller's adaptation to changing bandwidth is sluggish or if the bandwidth decreases rapidly. For example, BWE may become challenging when multiple streaming devices share the same bandwidth, leading to increased competition and fluctuating available capacity.
One well-known solution to this problem involves using the Transmission Control Protocol (TCP), which ensures reliability by retransmitting potentially dropped packets. However, the reliability of TCP comes at the expense of reduced control and potentially lower transmission rates. On the other hand, datagram-based protocols such as the User Datagram Protocol (UDP) do not suffer from these drawbacks. UDP allows for more efficient transmission rates and greater control over the data flow. However, it inherently carries the original problem of increased risk of data loss due to its lack of retransmission mechanisms.
There is thus a need for improvements in this context.
US 2010/198980 A1 (Skype Limited) describes a method comprising: receiving data at a first node; encoding a first portion of the data at a first bit rate to generate a first encoded data stream; monitoring an indication of the capacity of the channel; transmitting to the second node a padded data stream via the channel, wherein padding bits are added to the first encoded data stream, in dependence on the indication of the capacity of the channel, to generate the padded data stream; determining if transmitting the padded data stream exceeds the capacity of the channel; and encoding a second portion of the data at a higher bit rate than the first bit rate, to generate a second encoded data stream for transmission over the channel, if it is determined that transmitting the padded data stream does not exceed the channel's capacity.
3GPP TS 26.114 V18.4.0 describes a standardized IP Multimedia Subsystem (IMS) telephony service that builds on the IMS capabilities to establish multimedia communications between terminals within and in-between operator networks.
Summary
In view of the above, solving or at least reducing one or several of the drawbacks discussed above would be beneficial, as set forth in the attached independent patent claims. The invention is defined by the subject-matter of the independent claims.
According to a first aspect of the present invention, there is provided a method for controlling real time video streaming, the method comprising: providing a dataset structure for transmission of video data, wherein a dataset generated according to the dataset structure in a first configuration comprises encoded video data with a first encoding bitrate and padding data; generating first datasets according to the dataset structure in the first configuration, and transmitting the first datasets as a data stream via a communication channel; during the transmission of the first datasets, receiving a first metric indicating a level of network congestion of the communication channel, and determining from the received first metric that a level of network congestion has increased above a threshold; adjusting the dataset structure to a second configuration, wherein a dataset generated according to the adjusted dataset structure comprises encoded video data with a second encoding bitrate and padding data, wherein the second encoding bitrate is lower than the first encoding bitrate; and generating second datasets according to the dataset structure in the second configuration and transmitting the second datasets as a data stream via the communication channel.
Advantageously, the present disclosure presents techniques to maintain a high bitrate while managing network congestion. These techniques involve using padding as a buffer to detect congestion early. Initially, the transmitted datasets include encoded video data at a high bitrate, supplemented with padding data. During transmission, if an increase in a level of network