CN-122027861-A - Streaming media self-adaptive bit rate method and device

Abstract

An embodiment of the application provides a streaming media adaptive bit rate method and device. The method comprises: constructing a multi-policy expert dataset, which comprises policy data samples respectively obtained using a plurality of preset adaptive bit rate decision methods, each policy data sample comprising a current environment state, a corresponding bit rate, a reward value, and a next environment state; selecting policy data samples for training from the multi-policy expert dataset; converting the current environment state in the selected policy data samples into a state embedding vector; and inputting the state embedding vector into a pre-constructed agent and training the agent to obtain a trained bit rate policy model. The trained bit rate policy model can then be used to select an adaptive bit rate that satisfies the user's quality of service.

Inventors

HUANG XIAOHONG
LI DANDAN
ZHANG PEI
XIE KUN
OU HAOYANG

Assignees

Beijing University of Posts and Telecommunications (北京邮电大学)

Dates

Publication Date: 2026-05-12
Application Date: 2025-12-30
Priority Date: 2025-10-24

Claims (10)

1. A streaming media adaptive bit rate method, comprising: constructing a multi-policy expert dataset, wherein the multi-policy expert dataset comprises policy data samples respectively obtained using a plurality of preset adaptive bit rate decision methods, and each policy data sample comprises a current environment state, a corresponding bit rate, a corresponding reward value, and a next environment state; selecting policy data samples for training from the multi-policy expert dataset; converting the current environment state in the selected policy data samples into a state embedding vector; and inputting the state embedding vector into a pre-constructed agent and training the agent to obtain a trained bit rate policy model.
2. The method of claim 1, wherein constructing the multi-policy expert dataset comprises: determining a throughput estimate for the next media block in the current environment state based on the harmonic mean of the download rates of a preset number of past media blocks, and selecting, from all available bit rates, the highest bit rate that is less than or equal to the throughput estimate as the bit rate of the next media block; determining the bit rate of the current environment state based on a preset piecewise linear function of the buffer level; predicting, based on a preset system model, the outcome of transmitting a plurality of media blocks at different bit rates within a preset prediction horizon, and determining the bit rate within the prediction horizon by solving an optimization problem that maximizes quality of service performance; determining the bit rate of the current environment state based on a preset buffer-occupancy-based adaptive bit rate method; and determining the bit rate of the current environment state based on a preset reinforcement learning method.
3. The method of claim 2, wherein, at the i-th media block, the throughput estimate $C_{pred}$ for the next media block in the current environment state is calculated according to formula (3), where $k = 5$ and $c_j$ is the throughput of the j-th media block.
4. The method of claim 2, wherein the piecewise linear function of the buffer level is given by formula (4), where $B_i$ and $B_{i-1}$ are the buffer levels when downloading the i-th and (i-1)-th media blocks, $C_{i-1}$ is the throughput actually achieved when downloading the (i-1)-th media block, and $k_i$ is the slope, which is calculated according to formula (5), where $B_{i-2}$ is the buffer level when downloading the (i-2)-th media block and $C_{i-2}$ is the throughput actually achieved when downloading the (i-2)-th media block.
5. The method of claim 2, wherein determining the bit rate of the current environment state based on the preset buffer-occupancy-based adaptive bit rate method comprises: determining a throughput estimate of the currently available network throughput based on the actual download rate at which the first media block was downloaded; calculating, under this throughput estimate, a theoretical buffer level based on the BOLA algorithm; calculating the number of virtual placeholders to be added according to the theoretical buffer level and the current actual buffer level; and, after adding that number of virtual placeholders to the buffer, calculating the bit rate based on the BOLA algorithm.
6. The method of claim 2, wherein solving the optimization problem that maximizes quality of service performance comprises: solving the optimization problem within the prediction horizon according to the current buffer level, the future throughput estimation interval, and the bit rate selected for the previous media block.
7. The method of claim 1, wherein the current environment state comprises time-series state data and scalar state data; and converting the current environment state in the selected policy data samples into a state embedding vector comprises: extracting time-series state features from the time-series state data using a feature extraction layer; extracting scalar state features from the scalar state data using a fully connected layer; mapping the time-series state features and the scalar state features into feature vectors of the same dimension using a linear projection layer; and normalizing the feature vectors of the same dimension using a normalization layer to obtain the state embedding vector.
8. The method of claim 1, wherein the agent comprises a policy network and a value network, the policy network is implemented based on a large language model, and a network head of the policy network comprises a fully connected layer and an activation function for outputting the probabilities of all available bit rates.
9. The method of claim 1, wherein selecting policy data samples for training from the multi-policy expert dataset comprises: classifying the policy data samples according to their session feature types to obtain classified policy data samples; downsampling each class of policy data samples whose number of samples exceeds a preset number to obtain downsampled classes of policy data samples; and forming a balanced dataset from the downsampled classes and the classes that were not downsampled, and randomly selecting a batch of samples from the balanced dataset for training.
10. A streaming media adaptive bit rate apparatus, comprising: a construction module configured to construct a multi-policy expert dataset, wherein the multi-policy expert dataset comprises policy data samples respectively obtained using a plurality of preset adaptive bit rate decision methods, and each policy data sample comprises a current environment state, a corresponding bit rate, a reward value, and a next environment state; a selection module configured to select policy data samples for training from the multi-policy expert dataset; a conversion module configured to convert the current environment state in the selected policy data samples into a state embedding vector; and a training module configured to input the state embedding vector into a pre-constructed agent and train the agent to obtain a trained bit rate policy model.
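
The sample-selection step of claim 9 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the `session_type` field, the class labels `stable` and `fluctuating`, the `max_per_class` threshold, and both function names are hypothetical, since the claims do not specify how session feature types are encoded or what the preset number is.

```python
import random
from collections import defaultdict


def build_balanced_dataset(samples, max_per_class, rng):
    """Group samples by session type and downsample oversized classes.

    `samples` is a list of dicts with a hypothetical "session_type" key.
    Classes with more than `max_per_class` samples are randomly reduced
    to that size; smaller classes are kept unchanged.
    """
    by_class = defaultdict(list)
    for sample in samples:
        by_class[sample["session_type"]].append(sample)

    balanced = []
    for group in by_class.values():
        if len(group) > max_per_class:
            group = rng.sample(group, max_per_class)  # downsample without replacement
        balanced.extend(group)
    return balanced


def sample_batch(balanced, batch_size, rng):
    """Randomly pick one training batch from the balanced dataset."""
    return rng.sample(balanced, min(batch_size, len(balanced)))


# Toy data (assumed): one over-represented class and one small class.
rng = random.Random(0)
samples = (
    [{"session_type": "stable", "id": i} for i in range(100)]
    + [{"session_type": "fluctuating", "id": i} for i in range(8)]
)
balanced = build_balanced_dataset(samples, max_per_class=20, rng=rng)
batch = sample_batch(balanced, batch_size=16, rng=rng)
```

With this toy input, the over-represented class is cut to 20 samples while the 8-sample class is kept whole, so rare session types are not drowned out when batches are drawn at random.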

Description

Streaming media self-adaptive bit rate method and device

Technical Field

The embodiments of the application relate to the technical field of artificial intelligence, and in particular to a streaming media adaptive bit rate method and device.

Background

In HTTP adaptive streaming (HAS), a complete video file is segmented in advance on the server side into a series of short media blocks (chunks), for example 5 to 10 seconds each, and multiple versions at different bit rates (that is, different picture quality and file size) are generated for each media block. During playback, the player uses a streaming media adaptive bit rate algorithm to select the most suitable bit rate for downloading the next media block according to the current network conditions, and the downloaded media blocks are stored in a local buffer to keep playback continuous. In this way, HAS adapts dynamically to changes in network bandwidth: it delivers high-definition video when network conditions are good and switches to lower-definition video when they are poor, which maximizes user quality of service as far as possible and avoids playback stalls.

In traditional streaming media adaptive bit rate methods, the decision logic relies on manually defined fixed rules or models and lacks flexibility, so optimal performance is hard to achieve when the network environment is highly dynamic, non-stationary, or follows complex patterns. Online reinforcement learning algorithms, in turn, search for optimal performance through random exploration and many iterations, which makes training costly and degrades the user's viewing experience while the model is learning.

Disclosure of Invention

In view of the foregoing, an objective of the embodiments of the present application is to provide a streaming media adaptive bit rate method and device.
Based on the above objective, an embodiment of the present application provides a streaming media adaptive bit rate method, including: constructing a multi-policy expert dataset, wherein the multi-policy expert dataset comprises policy data samples respectively obtained using a plurality of preset adaptive bit rate decision methods, and each policy data sample comprises a current environment state, a corresponding bit rate, a corresponding reward value, and a next environment state; selecting policy data samples for training from the multi-policy expert dataset; converting the current environment state in the selected policy data samples into a state embedding vector; and inputting the state embedding vector into a pre-constructed agent and training the agent to obtain a trained bit rate policy model.
Optionally, constructing the multi-policy expert dataset includes: determining a throughput estimate for the next media block in the current environment state based on the harmonic mean of the download rates of a preset number of past media blocks, and selecting, from all available bit rates, the highest bit rate that is less than or equal to the throughput estimate as the bit rate of the next media block; determining the bit rate of the current environment state based on a preset piecewise linear function of the buffer level; predicting, based on a preset system model, the outcome of transmitting a plurality of media blocks at different bit rates within a preset prediction horizon, and determining the bit rate within the prediction horizon by solving an optimization problem that maximizes quality of service performance; determining the bit rate of the current environment state based on a preset buffer-occupancy-based adaptive bit rate method; and determining the bit rate of the current environment state based on a preset reinforcement learning method.
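
The first of these decision methods, the throughput-based rule, can be sketched as below. This is a hedged illustration only: the text names a harmonic mean over a preset number of past media blocks, so the standard harmonic mean is assumed here. The function names, the Mbps units, the example bit rate ladder, and the fallback to the lowest bit rate when no bit rate fits under the estimate are all assumptions that do not come from the source.

```python
def harmonic_mean_throughput(past_throughputs, k=5):
    """Estimate next-block throughput as the harmonic mean of the last k values.

    If fewer than k measurements exist, all available ones are used
    (an assumption; the source does not describe this case).
    """
    recent = past_throughputs[-k:]
    if not recent:
        raise ValueError("at least one throughput measurement is required")
    if any(c <= 0 for c in recent):
        raise ValueError("throughput measurements must be positive")
    return len(recent) / sum(1.0 / c for c in recent)


def select_bitrate(available_bitrates, throughput_estimate):
    """Pick the highest available bit rate not exceeding the estimate.

    Falls back to the lowest bit rate when none fits (assumed behavior).
    """
    feasible = [b for b in available_bitrates if b <= throughput_estimate]
    return max(feasible) if feasible else min(available_bitrates)


# Hypothetical bit rate ladder and measured download rates, both in Mbps.
ladder = [0.3, 0.75, 1.2, 1.85, 2.85, 4.3]
measured = [2.0, 4.0, 4.0]

estimate = harmonic_mean_throughput(measured, k=5)  # 3 / (0.5 + 0.25 + 0.25) = 3.0
chosen = select_bitrate(ladder, estimate)           # highest rung <= 3.0 is 2.85
```

The harmonic mean is dominated by the slowest recent downloads, so a single fast block does not push the estimate, and hence the selected bit rate, up sharply.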
Optionally, at the i-th media block, the throughput estimate $C_{pred}$ for the next media block in the current environment state is calculated according to formula (3), where $k = 5$ and $c_j$ is the throughput of the j-th media block.
Optionally, the piecewise linear function of the buffer level is given by formula (4), where $B_i$ and $B_{i-1}$ are the buffer levels when downloading the i-th and (i-1)-th media blocks, $C_{i-1}$ is the throughput actually achieved when downloading the (i-1)-th media block, and $k_i$ is the slope, which is calculated according to formula (5), where $B_{i-2}$ is the buffer level when downloading the (i-2)-th media block and $C_{i-2}$ is the throughput actually achieved when downloading the (i-2)-th media block.
Optionally, determining the bit rate of the current environment state based on the preset buffer-occupancy-based adaptive bit rate method includes: determining a throughput estimate of the currently available network throughput based on the actual download rate at which the first media block was downloaded; calculating, under this throughput estimate, a theoretical buffer level based on the BOLA algorithm; calculating the number of virtual placeholders to be added according