CN-122019201-A - Data processing method, device and equipment of GPU server and storage medium

Abstract

The application discloses a data processing method, apparatus, device and storage medium for a GPU server, and relates to the field of data processing. The method comprises: calculating a correlation coefficient between any two data items in the data to be processed; dividing the data to be processed into a plurality of data partitions according to the correlation coefficients; obtaining a computing power allocation ratio for each data partition according to characteristic parameters of that partition; allocating a number of GPU streaming multiprocessors to each data partition according to its computing power allocation ratio; obtaining a parallel computing result for each data partition according to the allocated number of GPU streaming multiprocessors and the computing power allocation ratio; obtaining a final processing result from the parallel computing results of the data partitions; calculating an average relative error between the final processing result and a ground-truth reference result; and, if the average relative error is less than or equal to an error threshold, outputting the final processing result to a target device. The method can improve data processing efficiency.

Inventors

  • WANG CHUNJIE
  • HOU JINGYAN

Assignees

  • 北京神州鲲泰信息技术有限公司

Dates

Publication Date: 2026-05-12
Application Date: 2026-04-15

Claims (10)

  1. A data processing method of a GPU server, the method comprising: acquiring data to be processed; calculating, according to the data to be processed, a correlation coefficient between any two data items in the data to be processed; dividing the data to be processed into a plurality of data partitions according to the correlation coefficient; obtaining, according to characteristic parameters of each data partition, a computing power allocation ratio required by each data partition; allocating a number of GPU streaming multiprocessors to each data partition according to the computing power allocation ratio; obtaining a parallel computing result of each data partition according to the allocated number of GPU streaming multiprocessors and the computing power allocation ratio; obtaining a final processing result according to the parallel computing result of each data partition; calculating an average relative error between the final processing result and a ground-truth reference result; and, if the average relative error is less than or equal to an error threshold, outputting the final processing result to a target device.
  2. The method according to claim 1, wherein the characteristic parameters include a data partition size, a data type, and a computational complexity, and obtaining the computing power allocation ratio required by each data partition according to the characteristic parameters of each data partition includes: determining a weight coefficient corresponding to the data type; and obtaining the computing power allocation ratio required by each data partition according to the data partition size, the weight coefficient, and the computational complexity of that data partition.
  3. The method according to claim 1, wherein the error threshold is obtained according to an error reference value, the computational complexity, and the data partition size.
  4. The method according to claim 1, wherein dividing the data to be processed into a plurality of data partitions according to the correlation coefficient comprises: when the correlation coefficient between any two data items is greater than or equal to a first threshold, placing the two data items in the same data partition; and when the correlation coefficient between any two data items is smaller than the first threshold, placing the two data items in different data partitions.
  5. The method according to claim 1, wherein the method further comprises: if the sum of the numbers of GPU streaming multiprocessors allocated to all data partitions is smaller than the total number of GPU streaming multiprocessors, determining the remaining GPU streaming multiprocessors from that sum and the total number of GPU streaming multiprocessors; and allocating the remaining GPU streaming multiprocessors to the data partition with the highest computing power allocation ratio.
  6. The method according to claim 1, wherein the method further comprises: if the average relative error is greater than the error threshold, performing the parallel computation on each data partition again.
  7. The method according to claim 1, wherein calculating the correlation coefficient between any two data items in the data to be processed comprises: denoising the data to be processed to obtain denoised data; and calculating the correlation coefficient between any two data items in the denoised data.
  8. A data processing apparatus of a GPU server, the apparatus comprising: an acquisition module, configured to acquire data to be processed; a data processing module, configured to calculate, according to the data to be processed, a correlation coefficient between any two data items in the data to be processed, divide the data to be processed into a plurality of data partitions according to the correlation coefficient, obtain a computing power allocation ratio required by each data partition according to characteristic parameters of each data partition, allocate a number of GPU streaming multiprocessors to each data partition according to the computing power allocation ratio, obtain a parallel computing result of each data partition according to the allocated number of GPU streaming multiprocessors and the computing power allocation ratio, obtain a final processing result according to the parallel computing result of each data partition, and calculate an average relative error between the final processing result and a ground-truth reference result; and a judging module, configured to output the final processing result to a target device if the average relative error is less than or equal to an error threshold.
  9. A computing device comprising a memory and a processor, wherein one or more computer programs are stored in the memory, the one or more computer programs comprising instructions which, when executed by the processor, cause the computing device to perform the method of any of claims 1 to 7.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and the computer program is used for performing the method of any of claims 1 to 7.
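
The partitioning and allocation steps of claims 2, 4 and 5 can be illustrated with a minimal Python sketch. It is not the applicant's implementation, and several details are assumptions made only for illustration: the claims do not name a correlation metric (Pearson is used here), do not say how the pairwise rule extends to groups of more than two items (transitive grouping via union-find is used here), and do not give the formula for the allocation ratio (a normalized product of size, type weight and complexity is used here). The names `pearson`, `partition_by_correlation`, `allocate_sms` and `type_weights` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (assumed metric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def partition_by_correlation(items, first_threshold):
    """Group item indices whose pairwise correlation >= first_threshold.

    Assumption: grouping is transitive (union-find); the claims only
    state the pairwise rule.
    """
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if pearson(items[i], items[j]) >= first_threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(items)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def allocate_sms(partitions, total_sms, type_weights):
    """Return (ratios, sm_counts) for a list of partition descriptors.

    Each descriptor is a dict with "size", "dtype" and "complexity".
    Assumption: ratio is proportional to size * weight * complexity.
    """
    scores = [p["size"] * type_weights[p["dtype"]] * p["complexity"]
              for p in partitions]
    total = sum(scores)
    ratios = [s / total for s in scores]
    # Round down, so a very small partition may receive 0 SMs here.
    counts = [int(r * total_sms) for r in ratios]
    leftover = total_sms - sum(counts)
    if leftover > 0:
        # Claim 5: remaining SMs go to the partition with the highest ratio.
        counts[ratios.index(max(ratios))] += leftover
    return ratios, counts
```

For example, with 108 streaming multiprocessors and two partitions whose assumed scores are in the ratio 1:6, rounding down gives 15 and 92 SMs, and the one remaining SM goes to the second partition, for a final allocation of 15 and 93.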

Description

Data processing method, device and equipment of GPU server and storage medium

Technical Field

The present application relates to the field of data processing, and in particular to a data processing method, apparatus, device and storage medium for a GPU server.

Background

With the rapid development of big data technology, data-intensive applications place increasingly high demands on the efficiency, accuracy and resource utilization of data processing. GPU (Graphics Processing Unit) servers can process massive amounts of data efficiently thanks to their massively parallel computing architecture. They are widely used in data processing scenarios such as data computation, data statistics and data integration, and have become one of the key devices supporting big data processing.
In the prior art, the data processing method of a GPU server generally follows a relatively simple flow. The data to be processed is obtained first; the raw data is then divided in a simple, direct manner; GPU streaming multiprocessor resources are allocated to each data partition using a fixed computing power allocation policy; each streaming multiprocessor then performs parallel computation on its data partition; and finally the computation results are integrated and output after a simple verification. Some existing schemes perform preliminary processing on the data or make simple adjustments to the computing power allocation, and some set a fixed error threshold for result verification in an attempt to improve the processing result.
However, existing data processing methods for GPU servers suffer from severe memory access latency during processing, which results in low data processing efficiency.

Disclosure of Invention

The application provides a data processing method, apparatus, device and storage medium for a GPU server, which can improve data processing efficiency.
To achieve the above purpose, the application adopts the following technical solutions.
In a first aspect, the present application provides a data processing method of a GPU server, including: acquiring data to be processed; calculating, according to the data to be processed, a correlation coefficient between any two data items in the data to be processed; dividing the data to be processed into a plurality of data partitions according to the correlation coefficient; obtaining, according to characteristic parameters of each data partition, a computing power allocation ratio required by each data partition; allocating a number of GPU streaming multiprocessors to each data partition according to the computing power allocation ratio; obtaining a parallel computing result of each data partition according to the allocated number of GPU streaming multiprocessors and the computing power allocation ratio; obtaining a final processing result according to the parallel computing result of each data partition; calculating an average relative error between the final processing result and a ground-truth reference result; and, if the average relative error is less than or equal to an error threshold, outputting the final processing result to a target device.
Optionally, the characteristic parameters include a data partition size, a data type, and a computational complexity, and obtaining the computing power allocation ratio required by each data partition according to the characteristic parameters of each data partition includes: determining a weight coefficient corresponding to the data type; and obtaining the computing power allocation ratio required by each data partition according to the data partition size, the weight coefficient, and the computational complexity of that data partition.
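
The final verification step of the first aspect (compute the average relative error, output only if it is within the threshold, otherwise recompute as in claim 6) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the method as implemented by the applicant: the source does not give the error formula, so the usual mean of element-wise relative errors is assumed, and the names `mean_relative_error`, `process_with_check`, `eps` and `max_attempts` are hypothetical. In particular, the bound on retries is an added safeguard; the source does not specify one.

```python
def mean_relative_error(result, reference, eps=1e-12):
    """Mean of |result - reference| / |reference| over all elements.

    eps (an assumption) guards against division by zero when a
    reference value is exactly 0.
    """
    if len(result) != len(reference):
        raise ValueError("result and reference must have the same length")
    total = sum(abs(r - t) / max(abs(t), eps)
                for r, t in zip(result, reference))
    return total / len(reference)


def process_with_check(compute, reference, error_threshold, max_attempts=3):
    """Run compute() until its output passes the error check.

    compute is a zero-argument callable standing in for the parallel
    computation over all partitions. Returns the accepted result, or
    None if no attempt passes within max_attempts (an assumed limit).
    """
    for _ in range(max_attempts):
        result = compute()
        if mean_relative_error(result, reference) <= error_threshold:
            return result  # would be output to the target device
    return None
```

A caller would pass the parallel computation as `compute` and forward the returned result to the target device only when it is not `None`.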
Optionally, the error threshold is obtained according to an error reference value, the computational complexity, and the data partition size.
Optionally, dividing the data to be processed into a plurality of data partitions according to the correlation coefficient includes: when the correlation coefficient between any two data items is greater than or equal to a first threshold, placing the two data items in the same data partition; and when the correlation coefficient between any two data items is smaller than the first threshold, placing the two data items in different data partitions.
Optionally, the method further comprises: if the sum of the numbers of GPU streaming multiprocessors allocated to all data partitions is smaller than the total number of GPU streaming multiprocessors, determining the remaining GPU streaming multiprocessors from that sum and the total number of GPU streaming multiprocessors; and allocating the remaining GPU streaming multiprocessors to the data p