CN-121996403-A - GPU memory optimization method and system
Abstract
The invention discloses a GPU memory optimization method and system. Dynamic data flow information and intermediate data generation rules of a task load are acquired to obtain the load's real-time change characteristics. Based on these characteristics, a gradient-descent-based optimization algorithm determines the allocation proportion and reserved space of a memory pool, yielding an adjusted memory pool configuration. Available memory block information is extracted from that configuration, and intermediate data with low access frequency is compressed with a Huffman coding algorithm to obtain compressed intermediate data storage units. If the access frequency of a compressed intermediate data storage unit falls below a preset threshold, the unit is temporarily stored in a low-speed storage area, and memory resources are reallocated according to task priority to obtain an optimized memory allocation scheme. The invention significantly improves memory utilization, reduces access latency, and achieves efficient memory resource management.
Inventors
- GONG ZHI
- HUANG YIN
- ZHANG FULI
- WANG ZHICHEN
- CHEN YUANTAO
- HU JIRONG
- HOU MUZHOU
- TENG MENG
- LIANG WEIFANG
- CHEN JIAO
Assignees
- Hunan University of Information Technology (湖南信息学院)
Dates
- Publication Date
- 2026-05-08
- Application Date
- 2025-11-06
Claims (10)
- 1. A GPU memory optimization method, characterized by comprising the following steps: acquiring dynamic data flow information and intermediate data generation rules of a task load, extracting a memory demand peak and a data access frequency from a task execution sequence, and classifying the task load pattern with a preset convolutional neural network model to obtain real-time change characteristics of the task load; analyzing memory allocation requests and release frequency according to the real-time change characteristics of the task load, and determining the allocation proportion and reserved space of a memory pool with a gradient-descent-based optimization algorithm to obtain an adjusted memory pool configuration; extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for intermediate data, and compressing intermediate data with low access frequency using a Huffman coding algorithm to obtain compressed intermediate data storage units; and, if the access frequency of a compressed intermediate data storage unit is lower than a preset threshold, temporarily storing it in a low-speed storage area, and reallocating memory resources according to task priority with an intelligent scheduling algorithm to obtain an optimized memory allocation scheme.
- 2. The GPU memory optimization method of claim 1, wherein the step of acquiring the dynamic data flow information and intermediate data generation rules of the task load, extracting the memory demand peak and data access frequency from the task execution sequence, and classifying the task load pattern with a preset convolutional neural network model to obtain the real-time change characteristics of the task load comprises: acquiring dynamic data flow information from a task sequence, processing it with a time series analysis tool, and extracting the intermediate data of the task load to obtain the time distribution and data generation rules of task execution; calculating the memory demand peak and data access frequency with a statistical analysis tool according to the time distribution and data generation rules of task execution, and, if the memory demand peak exceeds a preset threshold, marking it as high resource occupation to obtain the resource occupation characteristics of the task load; extracting features of the data access frequency and memory demand peak from the resource occupation characteristics with a convolutional neural network, performing pattern classification on the extracted features, and determining the task load class; and, if the task load class is high-frequency change, continuously tracking the dynamic data flow with a real-time monitoring tool to acquire real-time change characteristics and obtain a dynamic behavior description of the task load.
- 3. The GPU memory optimization method of claim 2, wherein, in the step of acquiring dynamic data stream information from a task sequence, processing it with a time series analysis tool, and extracting the intermediate data of the task load to obtain the time distribution and data generation rules of task execution, the dynamic data stream information is obtained by the following formula: D(t) = Σ_{i=1}^{N} w_i · s_i(t) · e^{−λ(t − t_i)}, wherein D(t) denotes the dynamic data stream information at time t, N denotes the total number of tasks in the task sequence, w_i denotes the weight coefficient of the i-th task, s_i(t) denotes the execution state value of the i-th task at time t, λ denotes the time-decay parameter, and t_i denotes the start time of the i-th task. The intermediate data of the task load is obtained by the following formula: M_j = Σ_{k=1}^{n} (r_k · c_k + m_k + b_k), wherein M_j denotes the intermediate data load of the tasks within the j-th time window, n denotes the number of data samples within the time window, r_k denotes the resource occupancy of the k-th sample point, c_k denotes the computational complexity of the k-th sample point, m_k denotes the memory usage of the k-th sample point, and b_k denotes the bandwidth consumption of the k-th sample point. The data generation rule is obtained by the following formula: f(t) = A · sin(ω·t + θ + φ) + C · e^{−β·t} + B, wherein f denotes the prediction function of the data generation rule, t denotes the time variable of the time series, θ denotes the phase parameter of task execution, A denotes the amplitude coefficient of the periodic fluctuation, ω denotes the frequency parameter of data generation, φ denotes the phase shift, C denotes the coefficient of the exponential decay term, β denotes the decay rate, and B denotes the reference offset.
- 4. The GPU memory optimization method according to claim 3, wherein, in the step of calculating the memory demand peak and data access frequency with a statistical analysis tool according to the time distribution and data generation rules of task execution, marking as high resource occupation if the memory demand peak exceeds a preset threshold, and obtaining the resource occupation characteristics of the task load, the memory demand peak is obtained by the following formula: P = max_{t∈[0,T]} Σ_{i=1}^{N} m_i(t) · a_i(t), wherein P denotes the memory demand peak, T denotes the total task execution time, N denotes the total number of tasks, m_i(t) denotes the memory occupation of the i-th task at time t, and a_i(t) denotes the active-state identifier of the i-th task at time t, which is 1 when the task is active and 0 otherwise. The data access frequency is obtained by the following formula: F = (1/W) · Σ_{j=1}^{E} d_j / τ_j, wherein F denotes the data access frequency, W denotes the statistical time window, E denotes the total number of data access events, d_j denotes the data amount of the j-th access event, and τ_j denotes the duration of the j-th access event. The high resource occupation is marked by the following formula: L = 1 if U / U_th ≥ α, otherwise L = 0, wherein L denotes the resource occupation feature tag, U denotes the current memory usage, U_th denotes the preset memory threshold, and α denotes the resource occupancy judgment coefficient; if the ratio of the current memory usage to the preset memory threshold is greater than or equal to α, the tag is 1, denoting high resource occupation, otherwise it is 0, denoting normal resource occupation.
- 5. The GPU memory optimization method of claim 4, wherein feature extraction is performed on the data access frequency and memory demand peak in the resource occupation characteristics by a convolutional neural network and pattern classification is performed on the extracted features, and, in the step of determining the task load class, the task load class is obtained by the following formula: C* = argmax_c P(c | x), with P(c | x) = exp(w_c · x) / Σ_{k=1}^{K} exp(w_k · x), wherein C* denotes the finally determined task load class, c denotes a candidate class, x denotes the extracted integrated feature vector, P(c | x) denotes the probability that the given features belong to class c, w_c denotes the weight vector corresponding to class c, K denotes the total number of classes, and the denominator sums the numerator term over all K classes.
- 6. The GPU memory optimization method of claim 5, wherein, if the task load class is high-frequency change, a real-time monitoring tool is used to continuously track the dynamic data stream to acquire real-time change characteristics, and, in the step of obtaining the dynamic behavior description of the task load, the dynamic behavior description is obtained by the following formula: B_i = (1/T) · Σ_{t=1}^{T} [ γ · v(t) + (r(t) − μ) / σ ], wherein B_i denotes the dynamic behavior descriptive value of the i-th task, T denotes the total length of the observation period, v(t) denotes the load variation coefficient at time t, γ denotes the variability weight parameter, r(t) denotes the response characteristic value at time t, μ denotes the mean of the response characteristic, and σ denotes the standard deviation of the response characteristic.
- 7. The GPU memory optimization method of claim 1, wherein the step of analyzing the memory allocation requests and release frequency according to the real-time change characteristics of the task load and determining the allocation proportion and reserved space of the memory pool with a gradient-descent-based optimization algorithm to obtain the adjusted memory pool configuration comprises: acquiring time series data of memory allocation requests and periodic data of the release frequency from the real-time change characteristics of the task load, calculating the frequency distribution of memory allocation requests with a time series analysis tool, and extracting the periodic characteristics of the release frequency with a Fourier transform to obtain the dynamic behavior patterns of the memory allocation requests and release frequency; iteratively optimizing an objective function with a gradient-descent-based optimization algorithm according to the dynamic behavior pattern of the release frequency, and determining preliminary configuration parameters of the memory pool, wherein the objective function is obtained by the following formula: J(p, q) = w1 · (p − F̄)² + w2 · (q − S)², wherein p denotes the allocation proportion, q denotes the reserved-space proportion, F̄ denotes the frequency average of the memory allocation requests, S denotes the periodic intensity of the release frequency, and w1 and w2 are weight coefficients; continuously tracking the frequency change of the memory allocation requests and the periodic fluctuation of the release frequency with a real-time monitoring tool to acquire real-time usage state data of the memory pool, and, if the memory usage rate in the real-time usage state data exceeds a preset threshold, marking the memory as high-load and determining the dynamic adjustment requirement of the memory pool configuration; and adjusting the allocation proportion and reserved space of the memory pool with a preset threshold comparison method according to the dynamic adjustment requirement, and, if the duration of the high-load state exceeds a preset threshold, increasing the reserved-space proportion and decreasing the allocation proportion to obtain the adjusted memory pool configuration.
- 8. The GPU memory optimization method of claim 1, wherein the step of extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for the intermediate data, and compressing the intermediate data with low access frequency using a Huffman coding algorithm to obtain compressed intermediate data storage units comprises: extracting memory block information from the adjusted memory pool configuration, scanning the available memory blocks of the memory pool with a memory management tool, obtaining the size and allocation state of each memory block, and generating a memory block information list containing the sizes and states of the memory blocks; analyzing the access frequency of the intermediate data with a statistical tool according to the memory block information list and the data access pattern, and applying a priority ordering algorithm to the intermediate data whose access frequency is below a preset threshold to generate a data compression priority list; compressing the intermediate data in the data compression priority list with a Huffman coding algorithm, generating a coding table by constructing a Huffman tree, and converting the intermediate data into compressed data according to the coding table to obtain a compressed data set; and allocating the compressed data set to compressed storage units with a storage management tool according to the compressed data set and the storage space allocation requirements, and, if the occupancy rate of the compressed storage units exceeds a preset threshold, adjusting the storage space allocation proportion to obtain the compressed storage unit configuration.
- 9. The GPU memory optimization method of claim 1, wherein the step of temporarily storing the compressed intermediate data storage unit in a low-speed storage area if its access frequency is lower than a preset threshold, and reallocating the memory resources according to task priority with an intelligent scheduling algorithm to obtain the optimized memory allocation scheme comprises: obtaining access frequency data from the compressed intermediate data storage units, analyzing the access frequency of each storage unit with a statistics tool, and marking a unit as low-frequency data if its access frequency is below a preset threshold to obtain a low-frequency data set; transferring the storage units marked as low-frequency data to the low-speed storage area with a data migration tool according to the low-frequency data set, and generating a storage allocation record to obtain a low-speed storage allocation record; reallocating memory resources with an intelligent scheduling algorithm according to the low-speed storage allocation record and the task priority list, preferentially allocating the memory required by high-priority tasks, to obtain a preliminary memory allocation scheme; and detecting the memory utilization rate with a memory management tool for the preliminary memory allocation scheme, and, if the memory utilization rate exceeds a preset threshold, adjusting the resource proportion of the low-speed storage area to obtain the optimized memory allocation scheme.
- 10. A GPU memory optimization system for performing the GPU memory optimization method according to any one of claims 1 to 9, the GPU memory optimization system comprising: a real-time change characteristic acquisition module (10) for acquiring dynamic data flow information and intermediate data generation rules of the task load, extracting a memory demand peak and a data access frequency from a task execution sequence, and classifying the task load pattern with a preset convolutional neural network model to obtain the real-time change characteristics of the task load; a memory pool configuration acquisition module (20) for analyzing memory allocation requests and release frequency according to the real-time change characteristics of the task load, and determining the allocation proportion and reserved space of the memory pool with a gradient-descent-based optimization algorithm to obtain the adjusted memory pool configuration; a data storage unit acquisition module (30) for extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for the intermediate data, and compressing the intermediate data with low access frequency using a Huffman coding algorithm to obtain compressed intermediate data storage units; and a memory allocation scheme acquisition module (40) for temporarily storing a compressed intermediate data storage unit in the low-speed storage area if its access frequency is lower than a preset threshold, and reallocating the memory resources according to task priority with the intelligent scheduling algorithm to obtain an optimized memory allocation scheme.
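The Huffman compression step described in claims 1 and 8 (construct a Huffman tree from the intermediate data, derive a coding table, then re-encode the data) can be sketched as follows. This is a minimal illustration in Python that assumes byte-oriented intermediate data; the function names are hypothetical and not taken from the patent.

```python
import heapq
from collections import Counter
from itertools import count

def build_codes(data: bytes) -> dict[int, str]:
    """Build a Huffman code table (byte symbol -> bit string)."""
    freq = Counter(data)
    tie = count()  # tie-breaker so the heap never compares code dicts
    heap = [(f, next(tie), {sym: ""}) for sym, f in freq.items()]
    heapq.heapify(heap)
    if len(heap) == 1:  # degenerate case: only one distinct symbol
        (_, _, codes), = heap
        return {sym: "0" for sym in codes}
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two lowest-frequency subtrees
        f2, _, c2 = heapq.heappop(heap)
        # prefix "0" onto the lighter subtree's codes, "1" onto the heavier's
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def compress(data: bytes) -> tuple[str, dict[int, str]]:
    """Return the encoded bit string and the coding table."""
    codes = build_codes(data)
    return "".join(codes[b] for b in data), codes

# Intermediate data with a skewed byte distribution compresses well:
blob = bytes([0] * 900 + [1] * 80 + [2] * 20)
bits, codes = compress(blob)
print(len(bits), "bits vs", len(blob) * 8, "bits uncompressed")
```

Frequent symbols receive short codes (here the dominant byte gets a 1-bit code), which is why the method reserves Huffman coding for low-access-frequency intermediate data: the one-time encoding cost is paid on data that is rarely touched afterwards.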
Description
GPU memory optimization method and system
Technical Field
The invention relates to the technical field of graphics processors, and in particular discloses a GPU memory optimization method and system.
Background
With the rapid development of artificial intelligence and deep learning, the graphics processor plays an increasingly critical role in high-performance computing, and its memory management efficiency directly affects the performance and resource utilization of computing tasks. Efficient memory management is not only the basis for improving computation speed but also an important guarantee for large-scale model training and inference. However, existing memory management methods have significant drawbacks in dynamics and adaptability. Many schemes rely on static allocation or a simple memory reclamation mechanism and cannot cope with the rapid changes of memory requirements in complex tasks, so memory becomes fragmented or over-allocated and resources are wasted. In addition, when processing intermediate data, existing methods often ignore differences in data characteristics and task priority and cannot effectively balance memory occupancy against computational efficiency. The core challenge is how to achieve efficient memory allocation and data compression in a dynamic task environment while ensuring continuity of the computing process. First, the lack of dynamic memory allocation makes it difficult for the system to adjust the size of the memory pool in real time according to the task load. For example, when training a large neural network, the intermediate data generated by some layers may occupy a large amount of memory for a short time, but existing methods cannot flexibly adjust memory allocation according to the task stage, resulting in low memory efficiency.
Second, the compression and management of intermediate data lack intelligent scheduling. A large amount of intermediate data may be temporarily unused during computation, but existing methods cannot effectively identify and stage such data, so memory resources remain occupied and the parallel execution of additional tasks is limited. For example, in an image processing task, some intermediate feature maps may not need to be accessed immediately in subsequent computations, yet they continue to occupy valuable memory for lack of intelligent scheduling. Therefore, how to design a collaborative management system that achieves efficient memory utilization and computational continuity through adaptive memory allocation, data compression, and intelligent scheduling mechanisms in a dynamic task environment has become a key problem for improving graphics processor performance.
Disclosure of Invention
The invention provides a GPU memory optimization method and system that aim to solve at least one of the defects in the prior art.
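As a concrete illustration of the staging-and-scheduling idea discussed above, the sketch below demotes storage units whose access frequency falls below a threshold to a low-speed area, then grants fast memory greedily in task-priority order. All names, thresholds, and sizes are hypothetical; this is a sketch of the mechanism, not the patent's implementation.

```python
from dataclasses import dataclass

@dataclass
class Unit:
    name: str
    access_freq: float   # accesses per second (illustrative)
    size_mb: int
    in_low_speed: bool = False

def demote_cold_units(units: list[Unit], threshold: float) -> list[Unit]:
    """Mark units accessed less often than the threshold for the
    low-speed storage area, freeing fast GPU memory for active tasks."""
    for u in units:
        if u.access_freq < threshold:
            u.in_low_speed = True
    return [u for u in units if u.in_low_speed]

def reallocate(tasks: list[tuple[str, int, int]], budget_mb: int) -> dict[str, int]:
    """Greedy priority-first allocation: tasks are (name, priority, need_mb);
    higher-priority tasks are served first until the fast-memory budget runs out."""
    plan: dict[str, int] = {}
    for name, _prio, need in sorted(tasks, key=lambda t: -t[1]):
        grant = min(need, budget_mb)
        plan[name] = grant
        budget_mb -= grant
    return plan

units = [Unit("feat_map_a", 0.2, 512), Unit("grad_buf", 9.0, 256)]
cold = demote_cold_units(units, threshold=1.0)
plan = reallocate([("train", 3, 600), ("preview", 1, 300)], budget_mb=800)
print([u.name for u in cold], plan)
```

Here the rarely touched feature map is demoted while the hot gradient buffer stays in fast memory, and the high-priority training task receives its full request before the preview task takes the remainder.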
One aspect of the invention relates to a GPU memory optimization method, comprising the following steps: acquiring dynamic data flow information and intermediate data generation rules of a task load, extracting a memory demand peak and a data access frequency from a task execution sequence, and classifying the task load pattern with a preset convolutional neural network model to obtain real-time change characteristics of the task load; analyzing memory allocation requests and release frequency according to the real-time change characteristics of the task load, and determining the allocation proportion and reserved space of a memory pool with a gradient-descent-based optimization algorithm to obtain an adjusted memory pool configuration; extracting available memory block information from the adjusted memory pool configuration, generating data compression priorities for intermediate data, and compressing intermediate data with low access frequency using a Huffman coding algorithm to obtain compressed intermediate data storage units; and, if the access frequency of a compressed intermediate data storage unit is lower than a preset threshold, temporarily storing it in a low-speed storage area, and reallocating memory resources according to task priority with an intelligent scheduling algorithm to obtain an optimized memory allocation scheme.
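The gradient-descent step of the method above can be sketched as follows. A simple quadratic objective in the allocation proportion p and the reserved-space proportion q is assumed so the minimizer is easy to check; the objective form and all parameter names here are illustrative assumptions, not the patent's exact formulation.

```python
def tune_pool(freq_avg: float, period_strength: float,
              w1: float = 1.0, w2: float = 1.0,
              lr: float = 0.1, steps: int = 200) -> tuple[float, float]:
    """Gradient descent on an illustrative objective
    J(p, q) = w1*(p - freq_avg)**2 + w2*(q - period_strength)**2,
    where p is the allocation proportion and q the reserved-space
    proportion, both clamped to [0, 1]."""
    p, q = 0.5, 0.5  # start from an even split
    for _ in range(steps):
        # analytic gradients of the quadratic objective
        dp = 2 * w1 * (p - freq_avg)
        dq = 2 * w2 * (q - period_strength)
        p = min(1.0, max(0.0, p - lr * dp))
        q = min(1.0, max(0.0, q - lr * dq))
    return p, q

p, q = tune_pool(freq_avg=0.7, period_strength=0.2)
print(round(p, 3), round(q, 3))  # prints: 0.7 0.2
```

The clamping to [0, 1] reflects that both quantities are proportions of the pool; in the method proper, the objective would be re-fit as the monitored allocation-request frequency and release periodicity change.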
Further, acquiring dynamic data flow information and intermediate data generation rules of a task load, extracting a memory demand peak value and a data access frequency from a task execution sequence, classifying task load modes by adopting a preset convolutional neural network model, and acquiring real-time change characteristics of the task load, wherein the steps comprise: Acquiring dynamic data flow information from a task sequence, processing the dynamic data flow information by adopting a time sequence analysis tool, extracting intermedi