CN-122019286-A - GPU testing method and device, electronic equipment and storage medium


Abstract

An embodiment of the invention provides a GPU testing method and device, electronic equipment and a storage medium, and relates to the technical field of artificial intelligence. The method comprises: using a GPU to be tested to perform a preset number of times of iterative training on a preset large model with training data in a fixed order, and taking the model loss of the preset large model in each round of iteration as a target model loss; obtaining a difference characterization value between the sequence formed by the target model losses and a reference sequence, wherein the reference sequence comprises the model losses obtained by using a reference GPU to perform the preset number of times of iterative training on the preset large model with the training data in the fixed order; and generating a test result of the GPU to be tested based on the difference characterization value. The embodiment thus makes it possible to test a GPU, so that technicians can understand its training effect and select a suitable GPU.

Inventors

JIA ZHITONG

Assignees

新华三技术有限公司

Dates

Publication Date: 20260512
Application Date: 20260119

Claims (10)

1. A GPU testing method, the method comprising: performing a preset number of times of iterative training on a preset large model by using a GPU to be tested with training data in a fixed order, to obtain the model loss of the preset large model in each round of iteration as a target model loss; obtaining a difference characterization value between a sequence formed by the target model losses and a reference sequence, wherein the reference sequence comprises the model losses obtained by performing the preset number of times of iterative training on the preset large model by using a reference GPU with the training data in the fixed order; and generating a test result of the GPU to be tested based on the difference characterization value.
2. The method of claim 1, wherein the obtaining a difference characterization value between a sequence formed by the target model losses and a reference sequence comprises: for each round of iteration, calculating the relative deviation between the target model loss of that round and the model loss of that round in the reference sequence, and taking the relative deviations of the rounds of iteration as the difference characterization value between the sequence formed by the target model losses and the reference sequence.
3. The method of claim 2, wherein the relative deviation of each round of iteration is calculated based on the following formula: deviation = |loss1 - loss0| / loss0; wherein deviation is the relative deviation of the round of iteration, loss1 is the target model loss of the round of iteration, and loss0 is the model loss of the round of iteration in the reference sequence.
4. The method according to claim 2, wherein the generating a test result of the GPU to be tested based on the difference characterization value comprises: determining the number of rounds of iteration whose relative deviation is smaller than a preset threshold as the number to be utilized; and generating the test result of the GPU to be tested based on the number to be utilized.
5. The method of claim 4, wherein the generating the test result of the GPU to be tested based on the number to be utilized comprises: determining, according to a preset correspondence between number intervals and test results, the test result corresponding to the number interval to which the number to be utilized belongs as the test result of the GPU to be tested.
6. The method according to any one of claims 1-5, wherein before the performing a preset number of times of iterative training on a preset large model by using a GPU to be tested with training data in a fixed order, the method further comprises: initializing the preset large model by using a preset random seed, wherein the model losses contained in the reference sequence are obtained by training with the preset large model initialized by using the preset random seed; and/or disabling a designated training function of the preset large model, wherein the designated training function is a function that randomly deactivates neurons in the preset large model during training of the preset large model, and the model losses contained in the reference sequence are obtained by training with the designated training function disabled.
7. A GPU testing apparatus, the apparatus comprising: a model training module, configured to perform a preset number of times of iterative training on a preset large model by using a GPU to be tested with training data in a fixed order, to obtain the model loss of the preset large model in each round of iteration as a target model loss; a difference obtaining module, configured to obtain a difference characterization value between a sequence formed by the target model losses and a reference sequence, wherein the reference sequence comprises the model losses obtained by performing the preset number of times of iterative training on the preset large model by using a reference GPU with the training data in the fixed order; and a result generation module, configured to generate a test result of the GPU to be tested based on the difference characterization value.
8. The apparatus of claim 7, wherein the difference obtaining module is specifically configured to: for each round of iteration, calculate the relative deviation between the target model loss of that round and the model loss of that round in the reference sequence, and take the relative deviations of the rounds of iteration as the difference characterization value between the sequence formed by the target model losses and the reference sequence.
9. Electronic equipment, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is configured to store a computer program; and the processor is configured to implement the method steps of any one of claims 1-6 when executing the program stored in the memory.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored therein a computer program which, when executed by a processor, implements the method steps of any of claims 1-6.
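
For illustration only, the counting and interval lookup recited in claims 4 and 5 might be sketched in Python as follows. This is a minimal sketch, not the applicant's implementation: the function names, the threshold of 0.01, the interval boundaries, and the result labels are all assumptions chosen for a hypothetical 5-round run, and the sample deviations are made up.

```python
from typing import Sequence, Tuple

# (low, high, result): an inclusive count interval and its test result.
# Boundaries and labels are hypothetical, sized for a 5-round run.
Interval = Tuple[int, int, str]
INTERVALS: Sequence[Interval] = (
    (0, 1, "fail"),
    (2, 3, "marginal"),
    (4, 5, "pass"),
)

THRESHOLD = 0.01  # assumed preset threshold on the relative deviation


def count_rounds_below(deviations: Sequence[float], threshold: float) -> int:
    """Count rounds whose relative deviation is smaller than the threshold."""
    return sum(1 for d in deviations if d < threshold)


def lookup_result(count: int, intervals: Sequence[Interval]) -> str:
    """Return the test result of the interval that contains the count."""
    for low, high, result in intervals:
        if low <= count <= high:
            return result
    raise ValueError(f"no interval covers count {count}")


# Made-up per-round relative deviations for a 5-round run.
deviations = [0.001, 0.02, 0.004, 0.2, 0.003]
number_to_utilize = count_rounds_below(deviations, THRESHOLD)
result = lookup_result(number_to_utilize, INTERVALS)
print(number_to_utilize, result)  # prints "3 marginal"
```

Keeping the correspondence as a plain table of intervals means the grading policy can be changed without touching the counting logic.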

Description

GPU testing method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of artificial intelligence, and in particular to a GPU testing method and device, electronic equipment and a storage medium.

Background

With the rapid development and widespread use of artificial intelligence, the GPU (Graphics Processing Unit), as the computing foundation of artificial intelligence, has attracted growing attention. Because there are many types of GPU on the market, a suitable type needs to be selected for large model training. At present, when a GPU is selected for a large model training task, attention is mainly paid to performance indicators such as video memory size and floating-point operation rate. However, these indicators often fail to reflect the training effect of the GPU on the large model. The GPU therefore needs to be tested so that technicians can understand its training effect and select a suitable GPU.

Disclosure of Invention

Embodiments of the invention aim to provide a GPU testing method and device, electronic equipment and a storage medium, so as to test a GPU, help technicians understand the training effect of the GPU, and thereby select a suitable GPU.
The specific technical scheme is as follows:

An embodiment of the invention first provides a GPU testing method, which comprises: performing a preset number of times of iterative training on a preset large model by using a GPU to be tested with training data in a fixed order, to obtain the model loss of the preset large model in each round of iteration as a target model loss; obtaining a difference characterization value between a sequence formed by the target model losses and a reference sequence, wherein the reference sequence comprises the model losses obtained by performing the preset number of times of iterative training on the preset large model by using a reference GPU with the training data in the fixed order; and generating a test result of the GPU to be tested based on the difference characterization value.

In an embodiment of the invention, the obtaining a difference characterization value between a sequence formed by the target model losses and a reference sequence comprises: for each round of iteration, calculating the relative deviation between the target model loss of that round and the model loss of that round in the reference sequence, and taking the relative deviations of the rounds of iteration as the difference characterization value between the sequence formed by the target model losses and the reference sequence.

In an embodiment of the invention, the relative deviation of each round of iteration is calculated based on the following formula:

deviation = |loss1 - loss0| / loss0

where deviation is the relative deviation of the round of iteration, loss1 is the target model loss of the round of iteration, and loss0 is the model loss of the round of iteration in the reference sequence.
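
As a worked illustration of the relative deviation formula, the per-round calculation might look like the following Python sketch. The function name `relative_deviations` and the sample loss values are assumptions introduced here, not taken from the source. The sketch also assumes that both sequences cover the same rounds and that every reference loss is nonzero, which the source does not state.

```python
from typing import List, Sequence


def relative_deviations(target_losses: Sequence[float],
                        reference_losses: Sequence[float]) -> List[float]:
    """Return |loss1 - loss0| / loss0 for each round of iteration."""
    if len(target_losses) != len(reference_losses):
        raise ValueError("both sequences must cover the same number of rounds")
    deviations = []
    for loss1, loss0 in zip(target_losses, reference_losses):
        if loss0 == 0:
            # Assumption: the formula is undefined for a zero reference loss.
            raise ValueError("reference loss must be nonzero")
        deviations.append(abs(loss1 - loss0) / loss0)
    return deviations


# Made-up losses for three rounds of iteration.
reference = [2.0, 1.0, 0.5]   # loss0 per round, from the reference GPU
target = [2.1, 1.0, 0.45]     # loss1 per round, from the GPU to be tested
deviations = relative_deviations(target, reference)
print(deviations)  # approximately [0.05, 0.0, 0.1]
```

Because each deviation is normalized by the reference loss of the same round, early rounds with large losses and late rounds with small losses are compared on the same scale.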
In an embodiment of the invention, the generating a test result of the GPU to be tested based on the difference characterization value comprises: determining the number of rounds of iteration whose relative deviation is smaller than a preset threshold as the number to be utilized; and generating the test result of the GPU to be tested based on the number to be utilized.

In an embodiment of the invention, the generating the test result of the GPU to be tested based on the number to be utilized comprises: determining, according to a preset correspondence between number intervals and test results, the test result corresponding to the number interval to which the number to be utilized belongs as the test result of the GPU to be tested.

In an embodiment of the invention, before the performing a preset number of times of iterative training on a preset large model by using a GPU to be tested with training data in a fixed order, the method further comprises: initializing the preset large model by using a preset random seed, wherein the model losses contained in the reference sequence are obtained by training with the preset large model initialized by using the preset random seed; and/or disabling a designated training function of the preset large model, wherein the designated training function is a function that randomly deactivates neurons in the preset large model during training of the preset large model, and the model losses contained in the reference sequence are obtained by training with the designated training function disabled.

An embodiment of the invention also provides a GPU testing device, which comprises: the model training m