CN-116723354-B - Distributed edge collaborative video analysis method based on multi-agent reinforcement learning
Abstract
The invention discloses a distributed edge collaborative video analysis method based on multi-agent reinforcement learning, in which edge nodes cooperate to jointly learn an optimal strategy for video frame preprocessing, model selection, and request scheduling, so that the overall cost of the system is minimized. The invention models each edge node as an agent: an autonomous entity that makes distributed control decisions by observing its local state. An attention mechanism is adopted to weigh the importance of the information collected from different edge nodes. Performance is verified by deploying a video analysis test platform with multiple edge nodes and carrying out extensive experiments with real-world data sets and experimental settings. Experimental results show that, compared with existing baseline methods, the method significantly improves the overall reward, by 33.6%-86.4%.
Inventors
- GAO GUANYU
- DONG YUQI
Assignees
- Nanjing University of Science and Technology (南京理工大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20230510
Claims (5)
- 1. A distributed edge collaborative video analysis method based on multi-agent reinforcement learning, characterized in that the method is based on a video analysis system with distributed edge computing: a DNN model is deployed on edge nodes located in different geographic positions, each edge node receives video analysis inference requests from its corresponding area, and the overall cost of the system is minimized through collaborative learning; the method comprises the following steps: (1) inference requests from different regions send video frames to the corresponding edge node i (i ∈ N) through the HTTP protocol; (2) when edge node i receives an inference request, the decision engine makes a decision according to the current system state o_i(t) and then applies the control decision a_i(t) to the video frame; the decision includes the selection of the inference node, the selection of the DNN model, and the selection of the resolution; (3) edge node i reduces the video frame to the specified resolution according to the resolution decision, to reduce the transmission delay between edge nodes and the inference delay; (4) edge node i determines, according to the inference-node decision, whether inference on the video frame is performed locally or forwarded to another edge node; (5) if the video frame of the current time slot is inferred at the local edge node, the preprocessed video frame is put into a local inference queue to wait for inference by the DNN model selected on the local edge node; (6) if the local edge node is overloaded, the video frame is dispatched to another edge node: the preprocessed video frame is put into a dispatch queue and forwarded to the other edge node for inference through gRPC. The model of the decision engine is updated and trained as follows. The Actor network of edge node i is denoted π_{θ_i}. For each edge node, the Actor network optimizes the control strategy to increase rewards and is updated by maximizing the following objective: J(θ_i) = (1/M) Σ_t [ min( r_t(θ_i) Â_i(t), clip( r_t(θ_i), 1 − ε, 1 + ε ) Â_i(t) ) + σ H(π_{θ_i}) ], where M is the number of samples; r_t(θ_i) = π_{θ_i}(a_i(t) | o_i(t)) / π_{θ_i^old}(a_i(t) | o_i(t)) is the importance-sampling probability ratio, which allows samples collected under the old parameters θ_i^old to be used for updating the parameters θ_i; Â_i(t) is the discounted advantage of edge node i, calculated by GAE, used to evaluate the quality of the state-action pair (Â_i(t) > 0 indicates a relatively good action, otherwise a relatively poor one); ε is a hyperparameter for controlling the clipping strength, used to prevent the new policy from changing too much relative to the old policy; H(π_{θ_i}) is the policy entropy used to increase exploration, and σ is the coefficient of the policy entropy. The Critic network of edge node i is denoted V_{φ_i}; it is trained according to the loss objective defined by the following equation: L(φ_i) = (1/M) Σ_t max( ( V_{φ_i}(s(t)) − R̂_i(t) )², ( clip( V_{φ_i}(s(t)), V_{φ_i^old}(s(t)) − ε, V_{φ_i^old}(s(t)) + ε ) − R̂_i(t) )² ), where R̂_i(t) is the discounted return of edge node i; s(t) is the global state in the transition from slot t to slot t + 1; V_{φ_i} is the Critic network with parameters φ_i; V_{φ_i^old} is the Critic network with the old parameters φ_i^old; and ε is the hyperparameter that controls the clipping strength.
- 2. The distributed edge collaborative video analysis method based on multi-agent reinforcement learning of claim 1, wherein the decision engine of step (2) is implemented based on a multi-agent reinforcement learning algorithm comprising the following steps: considering a discrete-time system, the local state o_i(t) of each edge node is input into the Actor network π_{θ_i} to obtain the categorical log-probability distribution of each action, and the action a_i(t) of each edge node is then obtained by sampling according to that probability distribution; the categorical log-probability distribution of each action is expressed as: h_i(t) = π_{θ_i}(o_i(t)), where h_i(t) represents the categorical log-probability distribution of edge node i in time slot t, and π_{θ_i} represents the Actor network of edge node i with parameters θ_i, whose input is o_i(t) and which generates the categorical log-probability distribution h_i(t) for each action; the distributed edge computing video analysis system performs video analysis on the inference request according to the control actions, where the action of each edge node comprises deciding whether the edge node performs inference locally or forwards the request to another edge node, selecting the DNN model used for inference, and selecting the resolution used for video frame preprocessing; for edge node i, the control action in time slot t is defined as: a_i(t) = ( e_i(t), m_i(t), r_i(t) ), where e_i(t) ∈ N is the edge node selected for inference and N is the set of edge nodes; m_i(t) ∈ M is the DNN model selected for inference and M is the set of available DNN models; r_i(t) ∈ R is the resolution selected for video frame preprocessing and R is the set of available video resolutions; if the edge node selected for inference is the same as the edge node that receives the inference request, inference is performed locally; otherwise, the request is forwarded to the corresponding other edge node for inference.
- 3. The distributed edge collaborative video analysis method based on multi-agent reinforcement learning of claim 2, wherein, at the end of time slot t, each edge node i calculates its reward u_i(t) during time slot t and the shared reward u(t) of the video analytics system during time slot t; in the method, if an inference request is successfully completed, the reward of the inference request is calculated as a linear combination of accuracy and delay; conversely, if a request is discarded, the reward of the inference request is defined as a weighted penalty value; the reward of the j-th request processed at edge node i during time slot t is defined as: u_{i,j}(t) = a_{i,j}(t) − w · d_{i,j}(t) if q_{i,j}(t) ≤ T, and u_{i,j}(t) = −w · P otherwise; where q_{i,j}(t) is the waiting time in the queue of the j-th request on edge node i during time slot t, T is the video frame drop threshold, a_{i,j}(t) is the accuracy of the j-th request on edge node i during time slot t, d_{i,j}(t) is the total delay of the j-th request on edge node i during time slot t, P is the penalty constant, and w is the penalty weight of the overall delay; the reward u_i(t) of edge node i during time slot t is the sum of the rewards obtained by inference on edge node i during time slot t, expressed as: u_i(t) = Σ_{j=1}^{K_i(t)} u_{i,j}(t), where K_i(t) is the number of inference requests handled by edge node i during time slot t; in order to optimize the overall performance of the cooperation between edge nodes, the method designs the reward function as a shared reward, i.e. the sum of the rewards of all edge nodes, expressed as: u(t) = Σ_{i=1}^{N} u_i(t).
- 4. The distributed edge collaborative video analysis method based on multi-agent reinforcement learning according to claim 3, wherein the training data for the model of the decision engine are collected and processed as follows: at the beginning of time slot t + 1, each edge node i obtains its new local state o_i(t + 1) and the new global state s(t + 1) from the video analysis system; the local state, global state, action, shared reward, new local state, and new global state are stored in an experience buffer as one transition; after each batch of data is collected, the estimated advantage is calculated using GAE and the cumulative discounted reward is calculated along the trajectory τ; the networks are then trained with this batch of data, and the policy objective of the Actor network and the loss objective of the Critic network are optimized by the Adam optimizer.
- 5. The distributed edge collaborative video analysis method based on multi-agent reinforcement learning of claim 2, wherein step (2) considers a discrete-time system in which time is recorded as t = 0, 1, 2, ...; at each time slot t, each edge node i observes its local state, expressed as: o_i(t) = ( λ_i(t), l_i(t), { c_{i,j}(t) }_{j ∈ N}, { b_{i,j}(t) }_{j ∈ N} ), where λ_i(t) is the mean inference-request arrival rate of edge node i over the time slots before slot t, l_i(t) is the local inference queue length of edge node i in time slot t, c_{i,j}(t) is the length of the dispatch queue from edge node i to edge node j, and b_{i,j}(t) is the bandwidth between edge node i and edge node j in time slot t; assuming that each edge node can only observe its local state, the global state of the environment is composed of the local states observed by all edge nodes, and the global state of the environment at time slot t is represented as: s(t) = ( o_1(t), o_2(t), ..., o_N(t) ), where N is the number of edge nodes.
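The per-request, per-node, and shared rewards described in claim 3 can be illustrated with a minimal sketch. The names `accuracy`, `total_delay`, `queue_wait`, `drop_threshold`, `delay_weight`, and `penalty` are illustrative assumptions; the claim fixes only the functional form (accuracy minus weighted delay on success, a weighted penalty on a drop, then sums over requests and nodes).

```python
# Sketch of the reward structure in claim 3 (symbol names are assumptions).

def request_reward(accuracy, total_delay, queue_wait,
                   drop_threshold, delay_weight, penalty):
    """Reward of one inference request: if the queue wait exceeds the
    frame-drop threshold the request is discarded and incurs a weighted
    penalty; otherwise the reward is accuracy minus weighted delay."""
    if queue_wait > drop_threshold:
        return -delay_weight * penalty
    return accuracy - delay_weight * total_delay

def node_reward(requests, drop_threshold, delay_weight, penalty):
    """Sum of request rewards handled by one edge node in one time slot.
    Each request is an (accuracy, total_delay, queue_wait) tuple."""
    return sum(request_reward(acc, d, q, drop_threshold, delay_weight, penalty)
               for acc, d, q in requests)

def shared_reward(per_node_requests, drop_threshold, delay_weight, penalty):
    """Shared (cooperative) reward: the sum over all edge nodes."""
    return sum(node_reward(reqs, drop_threshold, delay_weight, penalty)
               for reqs in per_node_requests)
```

Because every agent receives this same shared reward, each edge node is trained toward the system-wide objective rather than its local throughput alone.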
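The clipped Actor objective and clipped Critic loss in claim 1 can be sketched numerically for a single sample. Plain floats stand in for network outputs here; in the method these come from each node's Actor and Critic networks, and only the clipping coefficient and entropy coefficient follow the claim — everything else is an illustrative assumption.

```python
import math

# Sketch of the per-sample PPO-style objectives in claim 1 (not the
# patented implementation; scalar stand-ins for network outputs).

def clip(x, lo, hi):
    return max(lo, min(hi, x))

def actor_objective(logp_new, logp_old, advantage, entropy, eps, sigma):
    """Clipped surrogate plus entropy bonus (to be maximized)."""
    ratio = math.exp(logp_new - logp_old)   # importance-sampling ratio
    surrogate = min(ratio * advantage,
                    clip(ratio, 1 - eps, 1 + eps) * advantage)
    return surrogate + sigma * entropy

def critic_loss(v_new, v_old, discounted_return, eps):
    """Value loss with the clipped-value variant described in the claim."""
    v_clipped = clip(v_new, v_old - eps, v_old + eps)
    return max((v_new - discounted_return) ** 2,
               (v_clipped - discounted_return) ** 2)
```

The clipping keeps the updated policy (and value estimate) close to the one that generated the batch, which is what stabilizes training when samples collected under the old parameters are reused.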
Description
Distributed edge collaborative video analysis method based on multi-agent reinforcement learning

Technical Field

The invention relates to the technical fields of edge computing, deep reinforcement learning, and computer vision, and in particular to a distributed edge collaborative video analysis method based on multi-agent reinforcement learning.

Background

Video analytics has been widely used in many computer vision-based applications, such as video surveillance, augmented reality, and autonomous driving. Currently, most state-of-the-art video analysis algorithms are implemented using Deep Neural Networks (DNNs). DNN-based video analysis achieves higher accuracy; however, deploying DNN models for video analysis faces many challenges in real-world scenarios. First, DNN models for video analysis are typically composed of hundreds of layers, which can lead to significant inference delay. In addition, video data is voluminous, and transmitting the original video content may incur large bandwidth costs and intolerable transmission delay. To reduce the bandwidth cost and data transfer delay of video analytics applications, the DNN model may be deployed on edge nodes near the user. An edge node can receive video data from the user with little delay, but the processing power of a single edge node is limited. When a large number of inference requests arrive, the edge node may be overloaded beyond its processing capacity, which significantly increases the inference delay of video frames. To guarantee system performance, the video analytics mechanism with edge computing must be designed carefully, as an edge node with insufficient computing power is easily overloaded. Furthermore, the edge nodes are located in different geographical areas, and their workloads are time-varying and unbalanced; some edge nodes may be very lightly loaded while others are overloaded.
Therefore, it is necessary to consider collaboration among multiple edge nodes at different locations to improve the overall performance of the video analytics system. Given these factors, deploying an edge system in practice faces the following challenges. An edge node needs to consider not only its own video frame preprocessing and model selection decisions but also the decisions of other edge nodes in order to maximize overall performance, so the decision problem of the video analysis pipeline becomes more complex. Furthermore, each edge node is an autonomous entity that must cooperate with other edge nodes while making its own decisions for received inference requests, so a distributed decision mechanism is needed to support the cooperation of the edge nodes.

Disclosure of Invention

Aiming at the problems in real deployment scenarios of insufficient edge-node processing capacity, time-varying and unbalanced workloads, and how edge nodes can collaboratively learn to optimize system performance, the invention provides a distributed edge collaborative video analysis method based on multi-agent reinforcement learning.
According to the technical scheme, in the distributed edge collaborative video analysis method based on multi-agent reinforcement learning, a DNN model is deployed on edge nodes located in different geographic positions based on a video analysis system with distributed edge computing, each edge node receives video analysis inference requests from its corresponding area, and the overall cost of the system is minimized through collaborative learning; the method comprises the following steps: (1) the inference requests of different areas send video frames to the corresponding edge node i (i ∈ N) through the HTTP protocol; (2) when edge node i (i ∈ N) receives an inference request, the decision engine makes a decision according to the current system state o_i(t), and a control decision a_i(t) is then applied to the video frame; the decision comprises the selection of the inference node, the selection of the DNN model, and the selection of the resolution; (3) edge node i reduces the video frames to a specified resolution according to the resolution decision, to reduce the transmission delay between edge nodes and the inference delay; (4) edge node i determines whether inference on the video frame is carried out locally or forwarded to another edge node according to the inference-node decision; (5) if the video frame of the current time slot is inferred at the local edge node, the preprocessed video frame is put into a local inference queue to wait for inference by the DNN model selected on the local edge node; (6) if the local edge node is overloaded, the video frame is dispatched to another edge node, and the preprocessed video frame is placed in a dispatch queue awaiting forwarding to the other edge node through gRPC for inference. Further, for a discrete time system in step (2), where time is noted as t = 0, 1, 2, ... At
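The decision step (2) above — sampling the inference node, DNN model, and resolution from per-head categorical distributions — can be sketched as follows. The toy logits are placeholders; in the method they are produced by each node's Actor network from its local state, and the helper names are illustrative assumptions.

```python
import math
import random

# Sketch of step (2): sample one choice per decision head from
# unnormalized log-probabilities (placeholder logits, not the Actor net).

def sample_categorical(logits, rng):
    """Sample an index from a softmax over unnormalized log-probabilities."""
    m = max(logits)                               # subtract max for stability
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [p / total for p in probs]
    r, acc = rng.random(), 0.0
    for idx, p in enumerate(probs):
        acc += p
        if r < acc:
            return idx
    return len(probs) - 1

def decide(node_logits, model_logits, res_logits, rng):
    """One control action a_i(t): (inference node, DNN model, resolution)."""
    return (sample_categorical(node_logits, rng),
            sample_categorical(model_logits, rng),
            sample_categorical(res_logits, rng))
```

If the sampled inference node equals the receiving node, the frame goes to the local inference queue; otherwise it enters the dispatch queue for forwarding over gRPC, matching steps (4)-(6).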