CN-121979683-A - Container video memory budget control and dynamic optimization system and method for ASIC edge devices
Abstract
The invention provides a container video memory budget control and dynamic optimization system and method for ASIC edge devices. The system comprises a video memory statistics and quota interface, a container identification mapping layer, a monitoring and decision engine, a policy engine, an executor, and a monitoring and auditing module. In the method, a container-oriented video memory statistics and quota interface is provided at the driver layer or the runtime library layer; a resident user-mode video memory daemon periodically collects the video memory occupation and pressure indexes of each container and generates optimization actions by combining service priority with threshold rules; when a video memory pressure event is triggered, the executor applies a combination of video memory budget shaping, adaptive adjustment of inference parameters, and cache reclamation to the target container; and the monitoring and auditing module rechecks and audits the effect. The stability and continuity of multi-container concurrent inference are thereby improved within the resource boundary of the edge device.
Inventors
- CEN JIE
- PANG ZHENZHEN
Assignees
- 深圳行胜数字技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260326
Claims (10)
- 1. A container video memory budget control and dynamic optimization system for ASIC edge devices, comprising: a video memory statistics and quota interface, arranged at the input end of the ASIC edge device scheduler and used for registering containers and controlling video memory with the container as the object; a container identification mapping layer, used for mapping the runtime information provided by the container runtime into a container handle recognizable by the driver/runtime library, so that video memory statistics and budget control are performed per container; a monitoring and decision engine, used for periodically collecting the video memory occupation indexes of each container, calculating a video memory pressure index from them, and triggering the policy engine if the video memory pressure index exceeds a video memory pressure threshold; a policy engine, used for defining the video memory pressure threshold and the optimization action execution rules and for outputting an executable optimization action sequence to the executor; an executor, arranged between the policy engine and the video memory statistics and quota interface and used for, according to the optimization action sequence, performing budget shaping, triggering inference runtime cache reclamation, and adaptively adjusting inference parameters; and a monitoring and auditing module, connected to the output ends of the monitoring and decision engine and of the video memory statistics and quota interface respectively, and used for collecting video memory indexes, optimization actions, and effect recheck results and for raising alarms on video memory pressure events and out-of-memory risks.
- 2. The container video memory budget control and dynamic optimization system for ASIC edge devices of claim 1, further comprising a performance map module for performing correlation analysis on the video memory occupation indexes, the video memory pressure index, and the optimization action sequence, so as to provide calibration data for the policy engine.
- 3. The container video memory budget control and dynamic optimization system for ASIC edge devices of claim 2, further comprising a video memory occupancy model building module for generating a video memory occupancy model from container operational data and outputting peak budget suggestions and staged budget suggestions.
- 4. The container video memory budget control and dynamic optimization system for ASIC edge devices of claim 3, further comprising a budget execution interception layer, disposed at the driver interface or at the user-mode video memory allocator, for checking each video memory allocation request of a container against its budget and intercepting the request when it would cause the budget to be exceeded.
- 5. The container video memory budget control and dynamic optimization system for ASIC edge devices of claim 4, wherein, after the budget execution interception layer intercepts a request, an error code carrying the container handle and the request context is returned and the monitoring and decision engine triggers a corresponding degradation action; and when the container budget allows the request but a global constraint would be broken, the monitoring and decision engine contracts the budgets of low-priority containers in priority order so that high-priority containers can continue to allocate.
- 6. A container video memory budget control and dynamic optimization method, implemented on the container video memory budget control and dynamic optimization system for ASIC edge devices according to any one of claims 1 to 5, characterized by comprising the following steps: S1, after a container starts, the monitoring and decision engine collects the video memory occupation indexes of each container at run time, obtains the container handle, and establishes the mapping; S2, the policy engine calculates the initial budget of each container from the total video memory of the device, the safety reserve, and the container priority, and sends it to the executor; S3, the executor issues the initial budgets of all containers to the video memory statistics and quota interface, after which video memory is scheduled through the ASIC edge device scheduler and the hardware resources; S4, at a preset period the monitoring and decision engine collects the video memory occupation, allocation failure count, and key latency of each container, calculates the video memory pressure index, accumulates statistics over a time window, and triggers the policy engine when the video memory pressure index exceeds the video memory pressure threshold or an allocation failure occurs; S5, the policy engine outputs an executable optimization action sequence to the executor, the sequence comprising cache reclamation and inference parameter adjustment items; S6, the executor performs budget shaping according to the action sequence, triggers inference runtime cache reclamation, adaptively adjusts inference parameters, and calculates an effect score; and S7, the monitoring and auditing module reads the optimized video memory occupation indexes and the new video memory pressure index within the recheck window; if the pressure is relieved, the method returns to step S4; if the pressure has not improved or the service level agreement (SLA) is affected, the method rolls back to the previous policy or performs degradation and raises an alarm.
- 7. The container video memory budget control and dynamic optimization method according to claim 6, further comprising a map generation and policy calibration step, in which the indexes and action records of each container are periodically aggregated, the performance map module generates a performance map comprising container-level time series and associated views, and the policy engine calibrates each configured threshold and weight parameter according to the samples in the performance map.
- 8. The container video memory budget control and dynamic optimization method according to claim 7, wherein the policy engine calibrates each configured threshold and weight parameter according to the samples in the performance map as follows: (1) the sample point set S = {M_i, B_i, F_i, L_i, Action, E_i, P_i} of each container in the performance map is taken as input, where i is the serial number of the container, M_i is the video memory occupation of the i-th container, B_i is the video memory budget of the i-th container, F_i is the allocation failure count of the i-th container, L_i is the key latency of the i-th container, Action is the optimization action, E_i is the optimization effect score of the i-th container, and P_i is the video memory pressure index of the i-th container; (2) if a trigger condition proves ineffective or causes jitter, the trigger condition is tightened or the priority of the action is lowered, and every adjustment is bound to its calibration, action record, and calibration effect.
- 9. The container video memory budget control and dynamic optimization method according to claim 6, wherein in step S2 the initial budget is calculated as follows: the safety reserve is first deducted from the total video memory; the available budget is then allocated in proportion to the priority and the historical baseline of container i; and the result is clamped to an interval bounded by the minimum and maximum budgets allowed for container i.
- 10. The container video memory budget control and dynamic optimization method according to claim 6, wherein, during run time, the target budget [...] is automatically adjusted in small steps, without violating the global constraint, according to the difference between the current video memory occupation and the target budget, the allocation failure signal, and the video memory pressure level, the result always being limited to [...]; an optimization effect score [...] is defined for dashboard presentation and policy calibration; the policy engine introduces a hysteresis window and a recheck window into the evaluation of the video memory pressure index and limits budget updates to [...]; and, when global constraint recovery is triggered, low-priority containers are contracted tier by tier according to priority, the global constraint being: [...], where [...] is the number of concurrent containers and [...] is the amount reserved for the driver, the system, and emergency buffering.
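The initial budget calculation of claim 9 can be illustrated with a minimal Python sketch. The claim states only that the available budget is shared "in proportion to" priority and historical baseline; the weight used here (priority multiplied by baseline) is an assumption, and the names `ContainerSpec` and `initial_budgets` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass


@dataclass
class ContainerSpec:
    name: str
    priority: float      # assumed convention: larger value = more important
    baseline_mib: float  # historical video memory baseline
    min_mib: float       # minimum budget allowed for this container
    max_mib: float       # maximum budget allowed for this container


def initial_budgets(total_mib, reserve_mib, specs):
    """Deduct the safety reserve, share the rest by weight, clamp each share."""
    available = total_mib - reserve_mib
    if available <= 0:
        raise ValueError("safety reserve leaves no video memory to allocate")
    # Assumed weight: priority * historical baseline (not fixed by the claim).
    weights = {s.name: s.priority * s.baseline_mib for s in specs}
    weight_sum = sum(weights.values())
    if weight_sum <= 0:
        raise ValueError("weights must sum to a positive value")
    budgets = {}
    for s in specs:
        share = available * weights[s.name] / weight_sum
        # Clamp the proportional share to the container's allowed interval.
        budgets[s.name] = min(max(share, s.min_mib), s.max_mib)
    return budgets


specs = [
    ContainerSpec("detector", priority=3, baseline_mib=2048, min_mib=1024, max_mib=4096),
    ContainerSpec("ocr", priority=1, baseline_mib=1024, min_mib=512, max_mib=2048),
]
budgets = initial_budgets(total_mib=8192, reserve_mib=1024, specs=specs)
print(budgets)
```

Clamping a share up to its minimum can push the sum of budgets above the available amount, so a real implementation would presumably re-check the global constraint after this step.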
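The hysteresis window mentioned in claim 10 can be sketched as a trigger with two thresholds. The pressure index formula is not recoverable from the claims, so the sketch assumes it is simply occupation divided by budget; the class name `PressureTrigger`, the threshold values, and the window length are hypothetical.

```python
from collections import deque


class PressureTrigger:
    """Raise a pressure event only after a full window of high samples,
    and clear it only after a full window of low samples."""

    def __init__(self, high=0.9, low=0.75, window=3):
        if not 0 <= low < high:
            raise ValueError("require 0 <= low < high")
        self.high = high
        self.low = low
        self.samples = deque(maxlen=window)  # sliding window of recent indexes
        self.active = False

    def update(self, used_mib, budget_mib):
        # Assumed pressure index: occupation / budget.
        self.samples.append(used_mib / budget_mib)
        full = len(self.samples) == self.samples.maxlen
        if not self.active and full and min(self.samples) > self.high:
            self.active = True   # every sample in the window is above `high`
        elif self.active and full and max(self.samples) < self.low:
            self.active = False  # every sample in the window is below `low`
        return self.active


trigger = PressureTrigger(high=0.9, low=0.75, window=2)
states = [trigger.update(used, 1000) for used in (950, 960, 800, 700, 700)]
print(states)
```

The gap between the two thresholds keeps the trigger from flapping when occupation hovers near a single threshold, which is the jitter the claims aim to avoid.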
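The priority-ordered contraction described in claims 5 and 10 can be sketched as follows. The global constraint formula itself is missing from the extracted text; the sketch assumes it requires the sum of all container budgets to stay within the total video memory minus the reserve. The function name `contract_budgets` and the rule of never cutting below a per-container minimum are likewise assumptions.

```python
def contract_budgets(budgets, priorities, min_budgets, usable_mib):
    """Shrink budgets, lowest priority first, until their sum fits usable_mib.

    Assumed global constraint: sum(budgets) <= usable_mib, where usable_mib
    is total video memory minus the reserve.
    """
    result = dict(budgets)
    excess = sum(result.values()) - usable_mib
    # Visit containers from lowest to highest priority.
    for name in sorted(result, key=lambda n: priorities[n]):
        if excess <= 0:
            break
        # Never cut a container below its minimum budget.
        cut = max(min(excess, result[name] - min_budgets[name]), 0)
        result[name] -= cut
        excess -= cut
    if excess > 0:
        raise RuntimeError("constraint cannot be met without going below minimums")
    return result


contracted = contract_budgets(
    budgets={"a": 4096, "b": 2048, "c": 2048},
    priorities={"a": 3, "b": 2, "c": 1},
    min_budgets={"a": 2048, "b": 1024, "c": 1536},
    usable_mib=7168,
)
print(contracted)
```

In this example the lowest-priority container is cut to its minimum first, the next tier absorbs the remainder, and the highest-priority budget is left untouched.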
Description
Container video memory budget control and dynamic optimization system and method for ASIC edge devices
Technical Field
The invention relates to the technical field of container video memory optimization, and in particular to a container video memory budget control and dynamic optimization system and method for ASIC edge devices.
Background
With the spread of edge intelligent inference services, multiple AI applications such as image recognition, object detection, structured extraction, and voice analysis often run concurrently on edge devices equipped with an ASIC inference accelerator chip. Such applications are usually deployed as containers to achieve isolation, fast delivery, and easy operation and maintenance. Unlike data center servers, edge devices typically have tighter resource boundaries: accelerator-side video memory/device memory capacity is limited, bandwidth and power are constrained, and a swap partition is either absent or too costly to use. Under concurrent inference, model weights, activation caches, KV caches, operator temporary buffers, and the like can produce peak occupancy in ASIC device memory. Once device memory runs out (OOM, out of memory), the consequences range from failed inference and container restarts in mild cases to driver exceptions or device resets in severe cases; the latter interrupt the other containers on the same machine, creating the operational risk that a single-point OOM affects the whole machine.
For general-purpose architectures such as x86/ARM, container resource isolation relies mainly on cgroups (control groups) to limit CPU and memory, which yields stable isolation. For GPU scenarios, engineering practice can achieve a degree of observability and limiting by relying on a mature video memory management ecosystem (for example memory pools, stream-based allocation, job-level video memory caps, and driver-side statistics interfaces). On ASIC edge devices, however, driver interfaces and runtime library ABIs are often vendor-specific, and video memory allocation is usually hidden inside the inference runtime library. The container layer therefore can hardly tell who is requesting video memory, how much is requested, or whether it is reclaimable, and existing container mechanisms cannot impose a container-level hard boundary on accelerator video memory, so the stability of multi-container concurrent inference is difficult to guarantee.
Disclosure of Invention
The invention provides a container video memory budget control and dynamic optimization system and method for ASIC edge devices, aiming to solve the problems in the prior art that, under multi-container concurrent inference on ASIC edge devices, video memory/device memory is uncontrollable, OOM events are frequent, and service continuity is difficult to guarantee.
The container video memory budget control and dynamic optimization system for ASIC edge devices according to the invention comprises: a video memory statistics and quota interface, arranged at the input end of the ASIC edge device scheduler and used for registering containers and controlling video memory with the container as the object; a container identification mapping layer, used for mapping the runtime information provided by the container runtime into a container handle recognizable by the driver/runtime library, so that video memory statistics and budget control are performed per container; a monitoring and decision engine, used for periodically collecting the video memory occupation indexes of each container, calculating a video memory pressure index from them, and triggering the policy engine if the video memory pressure index exceeds a video memory pressure threshold; a policy engine, used for defining the video memory pressure threshold and the optimization action execution rules and for outputting an executable optimization action sequence to the executor; an executor, arranged between the policy engine and the video memory statistics and quota interface and used for, according to the optimization action sequence, performing budget shaping, triggering inference runtime cache reclamation, and adaptively adjusting inference parameters; and a monitoring and auditing module, connected to the output ends of the monitoring and decision engine and of the video memory statistics and quota interface respectively, and used for collecting video memory indexes, optimization actions, and effect recheck results and for raising alarms on video memory pressure events and out-of-memory risks. Further, the system also comprises a performance map module, which is used for carrying out assoc