CN-122027804-A - Video stream real-time compression method based on edge computing
Abstract
The invention discloses a real-time video stream compression method based on edge computing, comprising the following steps: deploying edge computing nodes and acquiring their state information; receiving video frames, preprocessing them, and extracting content features; extracting sampled frames according to a frame extraction period and obtaining semantic responses through lightweight semantic recognition to generate regional semantic features; generating a regional importance index according to a regional importance mapping rule; determining regional coding parameters according to the state information and the regional importance index; generating a layered bitstream and scheduling transmission priority according to the regional importance index; using short-time buffer queues at the edge nodes to retransmit, supplement, or discard base-layer and enhancement-layer data according to the state information; and receiving cloud quality feedback to update the frame extraction period, the mapping rule, and the transmission configuration. By combining adaptive parameter control driven by edge node state information with layered bitstream scheduling, the invention realizes stable video compression and efficient resource utilization under weak-network conditions.
Inventors
- WU XIAOMAO
- YANG YOUKE
- SU QIANG
Assignees
- 深圳极派科技有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260207
Claims (9)
- 1. A real-time video stream compression method based on edge computing, characterized by comprising the following steps: deploying edge computing nodes at the video acquisition end and acquiring state information of the edge computing nodes; receiving original video frames sent by a camera, preprocessing the video frames, and extracting content features based on inter-frame difference, optical flow intensity and texture complexity; extracting sampled video frames according to a preset frame extraction period, inputting the sampled frames into a lightweight semantic recognition network to obtain a semantic response, and propagating the semantic response to adjacent video frames based on a motion field to generate regional semantic features; fusing the content features with the regional semantic features, and obtaining a regional importance index according to a preset regional importance mapping rule; determining quantization step adjustment factors, downsampling modes and reference frame update intervals for different regions according to the state information and the regional importance index to form regional coding parameters; compressing the video frames based on the regional coding parameters, generating a layered bitstream containing base-layer video data and enhancement-layer video data, and scheduling the transmission priority of the base-layer and enhancement-layer video data according to the state information; maintaining a short-time buffer queue at the edge node, and executing retransmission, supplementary transmission or discarding strategies on the base-layer and enhancement-layer video data according to the state information; and receiving video quality feedback information returned by the cloud server, and updating the frame extraction period, the regional importance mapping rule and the transmission configuration of the layered bitstream according to the feedback information.
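The order of the steps claimed above can be traced with a minimal pipeline skeleton. This is an illustrative sketch only: every stage is a stub that records its own name, and all stage names are paraphrases of the claim, not identifiers from the patent.

```python
# Hypothetical skeleton of the claimed processing order. Each stage is a stub
# that appends its name to a trace, so only the data flow is demonstrated.
def make_stage(name):
    def stage(ctx):
        ctx.setdefault("trace", []).append(name)
        return ctx
    return stage

# Stage names paraphrase claim 1; the real method would replace each stub.
PIPELINE = [make_stage(n) for n in (
    "collect_state", "preprocess_and_content_features",
    "semantic_recognition_and_propagation", "fuse_importance_index",
    "derive_region_coding_params", "layered_encode_and_schedule",
    "cache_and_retransmit_policy", "apply_cloud_feedback")]

def run(frame):
    ctx = {"frame": frame}
    for stage in PIPELINE:
        ctx = stage(ctx)
    return ctx

result = run(frame=0)
```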
- 2. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of acquiring state information of the edge computing node comprises: providing a resource monitoring unit and a network monitoring unit in the edge computing node, wherein the resource monitoring unit acquires the CPU load, memory occupancy and encoding queue length at a preset sampling period and records the change rates of the CPU load and the encoding queue length; and the network monitoring unit obtains the uplink network bandwidth, link delay and packet loss rate through probe packet transmission, return packet reception and link traffic statistics, and performs timestamp alignment and formatting on the instantaneous samples of bandwidth, delay and packet loss rate to generate the state information.
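The monitoring units of claim 2 can be sketched as a periodic sampler that records instantaneous values, change rates and timestamps. This is a minimal sketch under assumptions: the actual readers (CPU, memory, queue, network probe) are injected as callables so the example stays self-contained, and all field names are illustrative.

```python
import time

def sample_state(read_cpu, read_mem, read_queue, probe_net, n=3):
    """Collect n timestamped samples; a real monitor would sleep for the
    configured sampling period between iterations."""
    records, prev_cpu, prev_q = [], None, None
    for _ in range(n):
        cpu, mem, qlen = read_cpu(), read_mem(), read_queue()
        bw, delay, loss = probe_net()  # uplink bandwidth, RTT, loss rate
        records.append({
            "ts": time.time(), "cpu": cpu, "mem": mem, "queue": qlen,
            # change rates of CPU load and queue length, per claim 2
            "cpu_rate": 0.0 if prev_cpu is None else cpu - prev_cpu,
            "queue_rate": 0 if prev_q is None else qlen - prev_q,
            "bw_kbps": bw, "delay_ms": delay, "loss": loss,
        })
        prev_cpu, prev_q = cpu, qlen
    return records

# Stub readers standing in for real OS counters and network probes.
samples = sample_state(lambda: 0.5, lambda: 0.4, lambda: 12,
                       lambda: (2000.0, 30.0, 0.01))
```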
- 3. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of extracting content features comprises: continuously receiving original, not-yet-encoded video frames output by a camera through the video input link of an edge node, and reordering out-of-order frames according to their acquisition timestamps; performing resolution standardization on the ordered video frames by bilinear interpolation or nearest-neighbor scaling according to a preset target resolution, and converting the original color format into a preset uniform color space; denoising the converted video frames by combining temporal smoothing filtering with spatial convolution filtering; performing pixel-wise subtraction of successively received adjacent video frames to generate inter-frame difference images; computing the block-level optical flow intensity of each block according to a preset block partition on the basis of the inter-frame difference images; obtaining the texture complexity feature by jointly computing the gray-level gradient distribution, directional gradient statistics and frequency-domain energy distribution of the video frame; and forming a content feature set comprising inter-frame difference features, optical flow intensity features and texture complexity features.
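The three content features of claim 3 can be illustrated on one-dimensional toy "frames". This is a sketch under assumptions: optical flow intensity is approximated by the mean block difference, and texture complexity by a gradient-magnitude sum, which stand in for the patent's fuller gradient and frequency-domain statistics.

```python
# Toy frames are flat lists of pixel intensities; all helpers are illustrative.
def frame_diff(prev, cur):
    """Pixel-wise absolute difference between adjacent frames."""
    return [abs(a - b) for a, b in zip(prev, cur)]

def block_flow_intensity(diff, block=4):
    """Mean difference per block, a crude proxy for block-level flow."""
    return [sum(diff[i:i + block]) / block for i in range(0, len(diff), block)]

def texture_complexity(frame):
    """Sum of local gradient magnitudes, a proxy for texture statistics."""
    return sum(abs(frame[i + 1] - frame[i]) for i in range(len(frame) - 1))

prev = [10, 10, 10, 10, 50, 50, 50, 50]
cur = [10, 12, 10, 10, 60, 55, 50, 50]
d = frame_diff(prev, cur)
flow = block_flow_intensity(d)
tex = texture_complexity(cur)
features = {"diff": d, "flow": flow, "texture": tex}
```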
- 4. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of generating regional semantic features comprises: selecting a target frame as the sampled video frame from the time-ordered sequence of video frames according to the preset frame extraction period, normalizing the target frame, inputting it into the lightweight semantic recognition network, and outputting a semantic response comprising semantic category indexes, category confidences and corresponding pixel region positions; after the semantic response is obtained, partitioning the target frame and several adjacent video frames before and after it into blocks, and computing pixel gradient differences, block matching offsets and initial optical flow vectors between adjacent frames based on the partition to obtain a block-level motion field; mapping each pixel position in the semantic response over time using the displacement vector of the corresponding block in the motion field, thereby propagating the semantic response from the target frame to the corresponding spatial positions of adjacent video frames; and subjecting the propagated responses to pixel-level resampling, block-level consistency checking and timestamp alignment to form regional semantic features covering every video frame.
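The propagation step of claim 4 can be sketched at block level: each block's semantic label follows its motion-field displacement into the neighboring frame. All data structures are illustrative assumptions, including the confidence decay applied on propagation, which the patent does not specify.

```python
# response maps block index -> (label, confidence) on the sampled key frame;
# motion_field maps block index -> displacement in blocks. Both are toy models.
def propagate_semantics(response, motion_field, n_blocks):
    neighbor = {}
    for blk, (label, conf) in response.items():
        dst = blk + motion_field.get(blk, 0)
        if 0 <= dst < n_blocks:  # drop blocks displaced out of the frame
            # Assumed heuristic: confidence decays slightly on propagation.
            neighbor[dst] = (label, conf * 0.95)
    return neighbor

key_frame_resp = {2: ("person", 0.9), 5: ("car", 0.8)}
motion = {2: +1, 5: 0}  # the "person" block moved one block to the right
prop = propagate_semantics(key_frame_resp, motion, n_blocks=8)
```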
- 5. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of generating the regional importance index comprises: partitioning the video frames at block level according to a preset spatial grid over the content features obtained from inter-frame difference, optical flow intensity and texture complexity, each block corresponding to a content feature vector; mapping the regional semantic features to the corresponding block regions in a spatial partition consistent with the grid, and unifying the pixel sampling density of the mapped semantic features to obtain block-level semantic feature vectors; aligning the content feature vector and the semantic feature vector of each block by timestamp, and performing feature concatenation, normalization and weighting-factor weighting on the two kinds of vectors according to a preset fusion structure; and establishing, for each weighted vector, a fusion index keyed by block position, block identifier and frame number, and generating the regional importance index from the fusion indexes.
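The fusion of claim 5 can be sketched as concatenate, normalize, weight, and key by (frame, block). This is a minimal sketch, assuming per-block max normalization and a fixed weight vector; the patent's "preset fusion structure" is not specified, so these choices are illustrative.

```python
def fuse_block(content_vec, semantic_vec, weights):
    """Concatenate the two vectors, normalize, and reduce to one importance."""
    joint = content_vec + semantic_vec                  # feature stitching
    m = max(joint) or 1.0                               # avoid divide-by-zero
    normed = [v / m for v in joint]                     # per-block normalization
    return sum(w * v for w, v in zip(weights, normed))  # weighted reduction

def importance_index(frame_no, blocks, weights):
    """blocks maps block id -> (content_vec, semantic_vec)."""
    return {(frame_no, bid): fuse_block(c, s, weights)
            for bid, (c, s) in blocks.items()}

blocks = {0: ([0.2, 0.4], [0.8]), 1: ([0.1, 0.1], [0.2])}
idx = importance_index(7, blocks, weights=[0.3, 0.3, 0.4])
```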
- 6. The method of claim 1, wherein the step of forming the regional coding parameters comprises: timestamp-aligning the regional importance index and the state information; partitioning the video frame into regions according to a preset spatial block partition scheme, each spatial block corresponding to one importance value in the regional importance index; normalizing the CPU load, memory occupancy, encoding queue length, uplink bandwidth, link delay and packet loss rate in the state information according to preset rules, and establishing a parameter mapping relation from the normalized state vector and the regional importance values; and computing, from the parameter mapping relation, the quantization step adjustment factor, spatial downsampling mode and reference frame update interval of each corresponding region, and combining the three kinds of parameters by region number, timestamp and block sequence number to form the regional coding parameter set.
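One possible parameter mapping for claim 6 is sketched below. The thresholds, the pressure formula, and the ten-frame reference scale are illustrative assumptions; the claim only requires that normalized state and importance jointly determine the three parameter kinds.

```python
def coding_params(state, importance):
    """state values are assumed pre-normalized to 0..1; higher = more load.
    Returns the three region parameters named in claim 6."""
    # Aggregate link/compute pressure; high bandwidth lowers pressure.
    pressure = (state["cpu"] + state["queue"] + state["loss"]
                + (1.0 - state["bw"])) / 4.0
    # Coarser quantization for unimportant regions under pressure.
    qp_factor = 1.0 + pressure * (1.0 - importance)
    # Important regions keep full resolution (assumed 0.6 cutoff).
    downsample = "none" if importance > 0.6 else ("2x" if pressure < 0.5 else "4x")
    # Important regions refresh reference frames more often.
    ref_interval = max(1, int(round(10 * importance)))
    return {"qp_factor": qp_factor, "downsample": downsample,
            "ref_interval": ref_interval}

p = coding_params({"cpu": 0.5, "queue": 0.5, "loss": 0.1, "bw": 0.9},
                  importance=0.8)
```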
- 7. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of transmission priority scheduling comprises: compressing each region of each video frame according to the quantization step adjustment factor, spatial downsampling mode and reference frame update interval in the regional coding parameters to generate corresponding regional compressed data; classifying the regional compressed data into a base-layer data area and an enhancement-layer data area according to a preset layering rule, and arranging the base-layer and enhancement-layer data within the same frame in order of region position index to form the base-layer and enhancement-layer video data of the corresponding frame; organizing the base-layer and enhancement-layer video data of consecutive frames into streams in timestamp order and merging them into a layered bitstream; and deciding the transmission priority of the base-layer and enhancement-layer video data according to the bandwidth, delay and packet loss rate in the state information, and scheduling the transmission of the layered bitstream according to the decision.
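The scheduling decision of claim 7 can be sketched as: base-layer packets always go first in timestamp order, and enhancement-layer packets are appended only when the link is healthy. The health thresholds are assumed values, not from the patent.

```python
def schedule(packets, bw_kbps, loss_rate, loss_thresh=0.05, bw_thresh=1000):
    """Return the transmission order for a mix of base/enhancement packets."""
    base = sorted((p for p in packets if p["layer"] == "base"),
                  key=lambda p: p["ts"])
    enh = sorted((p for p in packets if p["layer"] == "enh"),
                 key=lambda p: p["ts"])
    # Enhancement data is sent only on a healthy link (assumed thresholds).
    healthy = loss_rate <= loss_thresh and bw_kbps >= bw_thresh
    return base + (enh if healthy else [])

pkts = [{"layer": "enh", "ts": 1}, {"layer": "base", "ts": 2},
        {"layer": "base", "ts": 1}]
weak = schedule(pkts, bw_kbps=500, loss_rate=0.1)    # weak link: base only
good = schedule(pkts, bw_kbps=2000, loss_rate=0.01)  # healthy: all layers
```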
- 8. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of maintaining the short-time buffer queue comprises: establishing independent short-time buffer queues for the base-layer and enhancement-layer video data in the edge node, and writing the base-layer and enhancement-layer video data into the corresponding buffer queues by timestamp each time a frame-level layered bitstream is generated; recording, for each buffer entry, its layer, frame number, region number and transmission state flag; applying threshold decisions to the uplink bandwidth, link delay and packet loss rate in the state information, and performing retransmission, supplementary transmission or discarding on buffer entries in the to-be-sent or partially-sent state according to the decisions; and when the bandwidth meets a preset sending condition, sending the buffer entries concurrently in timestamp order, and when the bandwidth is insufficient or the packet loss rate exceeds a preset threshold, discarding enhancement-layer entries and retransmitting base-layer entries according to their transmission state flags.
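The weak-link branch of claim 8 can be sketched as a policy pass over the buffer: enhancement entries are dropped when loss exceeds a threshold, and unsent or partially sent base entries are flagged for retransmission. Entry fields and the loss threshold are illustrative.

```python
def apply_policy(queue, loss_rate, loss_thresh=0.05):
    """One pass over the short-time buffer under the current loss rate."""
    kept = []
    for entry in queue:
        if loss_rate > loss_thresh and entry["layer"] == "enh":
            continue  # weak link: discard enhancement-layer data
        if entry["state"] in ("pending", "partial"):
            # unsent or partially sent entries are retried
            entry = {**entry, "state": "retransmit"}
        kept.append(entry)
    return kept

q = [{"layer": "base", "frame": 1, "state": "pending"},
     {"layer": "enh", "frame": 1, "state": "pending"},
     {"layer": "base", "frame": 2, "state": "sent"}]
out = apply_policy(q, loss_rate=0.2)  # loss above threshold
```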
- 9. The real-time video stream compression method based on edge computing according to claim 1, wherein the step of receiving video quality feedback information returned by the cloud server and updating the frame extraction period, the regional importance mapping rule and the layered bitstream transmission configuration comprises: providing a feedback receiving channel in the edge node and periodically acquiring quality feedback information sent by the cloud server through the channel; parsing the decoded-frame timestamps, reconstructed-frame sequence numbers and corresponding quality evaluation fields in the quality feedback information to generate a feedback record corresponding to the current encoded frame sequence; and then numerically updating the currently used frame extraction period according to the feedback record, replacing parameter items or adjusting weight items in the regional importance mapping rule, and resetting the priority items, streaming order items and data selection items in the transmission configuration of the base-layer and enhancement-layer video data.
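The feedback loop of claim 9 can be sketched as a configuration update driven by average reconstructed quality: poor quality tightens the frame extraction period and raises the semantic weight. The update rules and the quality target are illustrative assumptions.

```python
def update_config(cfg, feedback):
    """feedback is a list of per-frame records with a 'quality' field."""
    avg_q = sum(f["quality"] for f in feedback) / len(feedback)
    new = dict(cfg)
    if avg_q < cfg["quality_target"]:
        # Poor reconstructed quality: sample semantics more often and
        # weight semantic regions more heavily (assumed adjustment rule).
        new["extract_period"] = max(1, cfg["extract_period"] - 1)
        new["semantic_weight"] = min(1.0, cfg["semantic_weight"] + 0.1)
    else:
        # Quality is fine: relax the period to save edge compute.
        new["extract_period"] = cfg["extract_period"] + 1
    return new

cfg = {"extract_period": 5, "semantic_weight": 0.5, "quality_target": 0.8}
new_cfg = update_config(cfg, [{"frame": 1, "quality": 0.6},
                              {"frame": 2, "quality": 0.7}])
```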
Description
Video stream real-time compression method based on edge computing

Technical Field
The invention relates to the technical field of video processing, and in particular to a real-time video stream compression method based on edge computing.

Background
With the rapid development of application scenarios such as video surveillance, intelligent manufacturing, vehicle-mounted vision and security inspection, the number of front-end cameras keeps increasing, video resolution and frame rate keep improving, and the bandwidth occupied by video data on transmission links has grown markedly. In the traditional architecture, cameras usually upload video data directly to a central server or the cloud for encoding, which makes the system heavily dependent on network bandwidth; once the bandwidth is insufficient, the network fluctuates or the link is congested, video stuttering, increased delay or degraded picture quality easily follow. Conventional video compression methods rely on fixed coding parameters or preset rate-control strategies, lack awareness of the real-time network state during encoding, cannot dynamically adjust compression strength as bandwidth changes, and therefore struggle to meet real-time requirements. Meanwhile, most existing methods adopt whole-frame encoding or a globally uniform quantization strategy and cannot treat different regions differently according to content differences in the video, so key regions may lack definition or coding resources may be allocated unreasonably. With the growth of edge computing power, it has become feasible to migrate video preprocessing, content analysis and preliminary compression to edge nodes close to the camera.
However, existing edge-side coding schemes typically apply only coarse-grained compression to the whole picture, lack content-based region subdivision, and fail to combine content features, semantic information and network state for joint coding control, so coding quality cannot be kept stable under limited resources. In addition, video uplinks frequently suffer instantaneous bandwidth fluctuation, packet loss and delay variation; existing schemes lack a buffering and scheduling mechanism matched to layered bitstreams and can hardly guarantee continuous uploading of key frames or key regions in weak-network environments. They also lack a quality feedback mechanism from the cloud, so coding parameters cannot be adaptively corrected according to actual operating conditions. Therefore, how to provide a real-time video stream compression method based on edge computing is a problem to be solved by those skilled in the art.
Disclosure of Invention
The invention provides a real-time video stream compression method based on edge computing. The method generates a regional importance index by fusing content features and semantic features, and determines regional coding parameters by combining the resource state and network state of the edge node, so that the encoding process adapts at fine granularity to content differences and link conditions. By generating base-layer and enhancement-layer bitstreams, scheduling their priority according to the state information, and cooperating with a short-time buffer queue, the method realizes retransmission, supplementary transmission or discarding control of base-layer and enhancement-layer data, thereby maintaining the continuity of video transmission in weak-network environments. At the same time, cloud feedback is used to update the frame extraction period, the regional importance mapping rule and the transmission configuration, giving the edge-side coding strategy adaptive adjustment capability and improving the coding flexibility and resource utilization efficiency of the video stream at the edge.
According to an embodiment of the invention, the real-time video stream compression method based on edge computing comprises the following steps: deploying edge computing nodes at the video acquisition end and acquiring state information of the edge computing nodes; receiving original video frames sent by a camera, preprocessing the video frames, and extracting content features based on inter-frame difference, optical flow intensity and texture complexity; extracting sampled video frames according to a preset frame extraction period, inputting the sampled frames into a lightweight semantic recognition network to obtain a semantic response, and propagating the semantic response to adjacent video frames based on a motion field to generate regional semantic features; fusing the content features with the regional semantic features, and obtaining regional impor