
US-12619461-B2 - Graphics processing unit performance analysis method, computer device and storage medium

US 12619461 B2

Abstract

The present application relates to a graphics processing unit (GPU) performance analysis method and a computer device, and a storage medium. The method includes: submitting a GPU task queue generated by a central processing unit (CPU) to a GPU; the GPU task queue including a plurality of GPU tasks sorted by processing start moments, each of the GPU tasks being configured with respective access addresses of a plurality of storage spaces required to be accessed when the GPU task is processed; processing the plurality of GPU tasks through the GPU according to a processing time sequence of each GPU task in the task queue; and acquiring, for a memory access procedure required in a GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure.
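The attribution rule summarized in the abstract can be illustrated with a minimal Python sketch. All names, data structures, and threshold values below are hypothetical; the patent does not prescribe any concrete implementation. The idea shown is only this: each task carries its per-address access intervals, a long memory access points at the GPU, and a long creation gap between consecutive CPU-created tasks points at the CPU.

```python
from dataclasses import dataclass, field

@dataclass
class GpuTask:
    # Hypothetical model of one GPU task and its configured access addresses
    task_id: int
    create_time: float                  # CPU-side creation moment
    access_times: dict = field(default_factory=dict)  # address -> (start, end)

def attribute(tasks, preset_access, preset_creation):
    """Per-task bottleneck attribution: access duration exceeding the preset
    access duration suggests the GPU; waiting duration between consecutive
    task creations exceeding the preset creation duration suggests the CPU."""
    report = {}
    prev_create = None
    for t in sorted(tasks, key=lambda t: t.create_time):
        access = max((end - start for start, end in t.access_times.values()),
                     default=0.0)
        waiting = 0.0 if prev_create is None else t.create_time - prev_create
        prev_create = t.create_time
        if access > preset_access and waiting <= preset_creation:
            report[t.task_id] = "GPU"
        elif access <= preset_access and waiting > preset_creation:
            report[t.task_id] = "CPU"
        else:
            report[t.task_id] = "inconclusive"
    return report
```

A slow memory access with no CPU-side gap is attributed to the GPU; a fast access preceded by a long creation gap is attributed to the CPU.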

Inventors

  • Bojun SHI
  • Zeqin ZHOU

Assignees

  • GLENFLY TECH CO., LTD.

Dates

Publication Date
2026-05-05
Application Date
2023-08-30
Priority Date
2022-12-22

Claims (11)

  1. A graphics processing unit (GPU) performance analysis method, comprising: submitting a GPU task queue generated by a central processing unit (CPU) to a GPU; the GPU task queue comprising a plurality of GPU tasks sorted by processing start moments, each of the GPU tasks being configured with respective access addresses of a plurality of storage spaces required to be accessed when the GPU task is processed; processing the plurality of GPU tasks through the GPU according to a processing time sequence of each GPU task in the task queue; and acquiring, for a memory access procedure required in a GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure; wherein submitting the GPU task queue generated by the CPU to the GPU comprises: creating the plurality of GPU tasks according to a plurality of GPU task creation events, and forming the GPU task queue according to the plurality of GPU tasks; configuring, through a GPU access event, a storage space required to be accessed when each GPU task is processed, an access start moment and an access end moment; and submitting the GPU task queue to the GPU through a GPU task submission event; wherein acquiring, for the memory access procedure required in the GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information comprises: acquiring, for the memory access procedure required in the GPU task processing procedure, an access duration corresponding to access to each access address corresponding to the GPU task; determining, according to the GPU task creation event, a waiting duration between the GPU task currently created by the CPU and a previous GPU task created previously by the CPU; and determining the execution performance reference information of the GPU and/or the CPU for a same GPU task according to the access duration and the waiting duration corresponding to the GPU task.
  2. The method according to claim 1, wherein processing the plurality of GPU tasks through the GPU according to the processing time sequence of each GPU task in the task queue comprises: processing, through the task start event, the GPU task corresponding to a task start event according to the processing time sequence of each GPU task in the task queue, and accessing, through the GPU access event, the storage space required to be accessed in the GPU task processing procedure; and ending the processing procedure of the GPU task through a task end event.
  3. The method according to claim 2, wherein drawing the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in the same flowchart according to the processing time sequence comprises: sorting the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in order of processing time, and selecting an event sorted first as a current event and a target thread where the current event is located; if the current event is the access event, adding a memory access node to a corresponding position of the target thread in the flowchart, connecting the memory access node to a last access node in a target storage space corresponding to the access event, and taking the current memory access node as the last access node in the target storage space; and adding an access address of the target storage space corresponding to the current event to an access event of a last GPU task of the target thread, and selecting the next event as the current event; if the current event is the GPU task creation event, marking a GPU task corresponding to the current event as the last GPU task of the target thread, and selecting the next event as the current event; and if the current event is the task end event, adding, to the flowchart, a rectangular box corresponding to an access time period during which the GPU task corresponding to the current event accesses the target storage space, connecting the rectangular box to the access node of the target storage space, and selecting the next event as the current event.
  4. The method according to claim 1, wherein determining the execution performance reference information of the GPU and/or the CPU for the same GPU task according to the access duration and the waiting duration corresponding to the GPU task comprises: determining the execution performance reference information of the GPU for the same GPU task if the access duration corresponding to the GPU task exceeds a preset access duration and the waiting duration corresponding to the GPU task does not exceed a preset creation duration; and determining the execution performance reference information of the CPU for the same GPU task if the access duration corresponding to the GPU task does not exceed the preset access duration and the waiting duration corresponding to the GPU task exceeds the preset creation duration.
  5. The method according to claim 1, wherein the method further comprises: drawing the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in a same flowchart according to a processing time sequence; a horizontal axis of the flowchart indicating the time sequence, and a vertical axis indicating a thread number of the CPU.
  6. A computer device, comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the following steps: submitting a graphics processing unit (GPU) task queue generated by a central processing unit (CPU) to a GPU; the GPU task queue comprising a plurality of GPU tasks sorted by processing start moments, each of the GPU tasks being configured with respective access addresses of a plurality of storage spaces required to be accessed when the GPU task is processed; processing the plurality of GPU tasks through the GPU according to a processing time sequence of each GPU task in the task queue; and acquiring, for a memory access procedure required in a GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure; wherein submitting the GPU task queue generated by the CPU to the GPU comprises: creating the plurality of GPU tasks according to a plurality of GPU task creation events, and forming the GPU task queue according to the plurality of GPU tasks; configuring, through a GPU access event, a storage space required to be accessed when each GPU task is processed, an access start moment and an access end moment; and submitting the GPU task queue to the GPU through a GPU task submission event; wherein acquiring, for the memory access procedure required in the GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information comprises: acquiring, for the memory access procedure required in the GPU task processing procedure, an access duration corresponding to access to each access address corresponding to the GPU task; determining, according to the GPU task creation event, a waiting duration between the GPU task currently created by the CPU and a previous GPU task created previously by the CPU; and determining the execution performance reference information of the GPU and/or the CPU for a same GPU task according to the access duration and the waiting duration corresponding to the GPU task.
  7. The computer device of claim 6, wherein processing the plurality of GPU tasks through the GPU according to the processing time sequence of each GPU task in the task queue comprises: processing, through the task start event, the GPU task corresponding to a task start event according to the processing time sequence of each GPU task in the task queue, and accessing, through the GPU access event, the storage space required to be accessed in the GPU task processing procedure; and ending the processing procedure of the GPU task through a task end event.
  8. The computer device of claim 7, wherein the processor, when executing the computer program, further implements the following steps: drawing the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in a same flowchart according to a processing time sequence; a horizontal axis of the flowchart indicating the time sequence, and a vertical axis indicating a thread number of the CPU.
  9. The computer device of claim 8, wherein drawing the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in the same flowchart according to the processing time sequence comprises: sorting the GPU task creation event, the access event, the task start event, and the task end event corresponding to each GPU task in order of processing time, and selecting an event sorted first as a current event and a target thread where the current event is located; if the current event is the access event, adding a memory access node to a corresponding position of the target thread in the flowchart, connecting the memory access node to a last access node in a target storage space corresponding to the access event, and taking the current memory access node as the last access node in the target storage space; and adding an access address of the target storage space corresponding to the current event to an access event of a last GPU task of the target thread, and selecting the next event as the current event; if the current event is the GPU task creation event, marking a GPU task corresponding to the current event as the last GPU task of the target thread, and selecting the next event as the current event; and if the current event is the task end event, adding, to the flowchart, a rectangular box corresponding to an access time period during which the GPU task corresponding to the current event accesses the target storage space, connecting the rectangular box to the access node of the target storage space, and selecting the next event as the current event.
  10. The computer device of claim 6, wherein determining the execution performance reference information of the GPU and/or the CPU for the same GPU task according to the access duration and the waiting duration corresponding to the GPU task comprises: determining the execution performance reference information of the GPU for the same GPU task if the access duration corresponding to the GPU task exceeds a preset access duration and the waiting duration corresponding to the GPU task does not exceed a preset creation duration; and determining the execution performance reference information of the CPU for the same GPU task if the access duration corresponding to the GPU task does not exceed the preset access duration and the waiting duration corresponding to the GPU task exceeds the preset creation duration.
  11. A non-transitory computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform steps comprising: submitting a GPU task queue generated by a central processing unit (CPU) to a GPU; the GPU task queue comprising a plurality of GPU tasks sorted by processing start moments, each of the GPU tasks being configured with respective access addresses of a plurality of storage spaces required to be accessed when the GPU task is processed; processing the plurality of GPU tasks through the GPU according to a processing time sequence of each GPU task in the task queue; and acquiring, for a memory access procedure required in a GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure; wherein submitting the GPU task queue generated by the CPU to the GPU comprises: creating the plurality of GPU tasks according to a plurality of GPU task creation events, and forming the GPU task queue according to the plurality of GPU tasks; configuring, through a GPU access event, a storage space required to be accessed when each GPU task is processed, an access start moment and an access end moment; and submitting the GPU task queue to the GPU through a GPU task submission event; wherein acquiring, for the memory access procedure required in the GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information comprises: acquiring, for the memory access procedure required in the GPU task processing procedure, an access duration corresponding to access to each access address corresponding to the GPU task; determining, according to the GPU task creation event, a waiting duration between the GPU task currently created by the CPU and a previous GPU task created previously by the CPU; and determining the execution performance reference information of the GPU and/or the CPU for a same GPU task according to the access duration and the waiting duration corresponding to the GPU task.
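The event-driven drawing loop recited in claims 3 and 9 can be sketched as a small Python illustration (hypothetical names and structures; not the patented implementation): events are sorted by processing time and consumed one by one; an access event adds a node chained to the previous node of the same storage space, a creation event tracks the last task per thread, and an end event adds the rectangular box for the task's access time period.

```python
from collections import namedtuple

# Hypothetical event record: kind is one of "create", "access", "start", "end"
Event = namedtuple("Event", "time kind thread task address")

def build_flowchart(events):
    """Consume events in order of processing time and record the flowchart
    elements as plain data (nodes, edges, boxes) rather than geometry."""
    nodes, edges, boxes = [], [], []
    last_node_in_space = {}     # storage space (address) -> last access node
    last_task_of_thread = {}    # thread number -> most recently created task
    for ev in sorted(events, key=lambda e: e.time):
        if ev.kind == "access":
            node = (ev.thread, ev.time, ev.address)
            nodes.append(node)
            # connect to the last access node of the same storage space
            if ev.address in last_node_in_space:
                edges.append((last_node_in_space[ev.address], node))
            last_node_in_space[ev.address] = node
        elif ev.kind == "create":
            # mark this task as the last GPU task of the target thread
            last_task_of_thread[ev.thread] = ev.task
        elif ev.kind == "end":
            # rectangular box for the access time period of the ending task
            boxes.append((ev.task, ev.time))
    return nodes, edges, boxes
```

Chaining each access node to the previous node of the same storage space is what makes cross-thread contention on one address visible as a single connected sequence in the drawn chart.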

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims all benefits accruing under 35 U.S.C. § 119 to Chinese Patent Application No. 2022116558648, filed on Dec. 22, 2022 in the China National Intellectual Property Administration, the content of which is hereby incorporated by reference.

TECHNICAL FIELD

The present application relates to the field of graphics processing unit (GPU) performance tuning technologies, and in particular, to a GPU performance analysis method and apparatus, a computer device, and a storage medium.

BACKGROUND

Modern desktop operating systems rely extensively on GPUs for acceleration. GPUs are applicable to fields such as 3D graphics acceleration, video encoding and decoding, high-performance parallel computing, and display output. Performance analysis and optimization of a GPU is an important aspect of graphics application development: hardware and software performance bottlenecks are found by analyzing the actual operation of the GPU, and the driver is adjusted and optimized to maximize the performance of the GPU hardware. In conventional technologies, finding the root cause of a GPU performance problem typically requires understanding the overall behavior of the graphics application and the graphics system and optimizing the driver as a whole to avoid the bottlenecks. However, the source code of the graphics application may be unavailable, and the graphics system architectures of different operating systems may differ, making it impossible to analyze the overall behavior of the graphics application and the graphics system or to acquire accurate GPU performance data.
SUMMARY

Based on this, there is a need to provide, with respect to the above technical problems, a GPU performance analysis method and apparatus, a computer device, and a storage medium that can analyze the performance of a GPU from the overall behavior of a central processing unit (CPU) and the GPU without the source code of the graphics application.

In a first aspect, the present application provides a GPU performance analysis method. The method includes: submitting a GPU task queue generated by a CPU to a GPU; the GPU task queue including a plurality of GPU tasks sorted by processing start moments, each of the GPU tasks being configured with respective access addresses of a plurality of storage spaces required to be accessed when the GPU task is processed; processing the plurality of GPU tasks through the GPU according to a processing time sequence of each GPU task in the task queue; and acquiring, for a memory access procedure required in a GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure.

In an embodiment, the submitting the GPU task queue generated by the CPU to the GPU includes: creating the plurality of GPU tasks according to a plurality of GPU task creation events, and forming the GPU task queue according to the plurality of GPU tasks; configuring, through a GPU access event, an access address of a storage space required to be accessed when each GPU task is processed; and submitting the GPU task queue to the GPU through a GPU task submission event.
In an embodiment, the processing the plurality of GPU tasks through the GPU according to the processing time sequence of each GPU task in the task queue includes: processing, through the task start event, the GPU task corresponding to a task start event according to the processing time sequence of each GPU task in the task queue, and accessing, through the GPU access event, the storage space required to be accessed in the GPU task processing procedure; and ending the processing procedure of the GPU task through a task end event.

In an embodiment, the acquiring, for the memory access procedure required in the GPU task processing procedure, execution performance reference information of the GPU and/or the CPU for each GPU task according to access time information generated by the CPU and the GPU for the memory access procedure includes: acquiring, for the memory access procedure required in the GPU task processing procedure, an access duration corresponding to access to each access address corresponding to the GPU task; determining, according to the GPU task creation event, a waiting duration between the GPU task currently created by the CPU and a previous GPU task created previously by the CPU; and determining the execution performance reference information of the GPU and/or the CPU for a same GPU task according to the access duration and the waiting duration corresponding to the GPU task.

In an embodiment, the determining the execution performance reference information of the GPU and/or the CPU for the same GPU task according to the access duration and the waiting duration corresponding to the GPU task includes: determining the execution performance reference infor