CN-121979752-A - Model performance test method, device, equipment and medium


Abstract

The invention relates to the technical field of security testing and discloses a model performance test method, device, equipment and medium. Pressure data is sent to a code task model, and the GPU load, token processing efficiency and response time fed back by the code task model are received. The pressure data is adjusted dynamically so that hardware resources are used reasonably and the limit performance of the model is explored: the pressure is reduced when the GPU load is too high, a user load capacity limit value is determined when the response time exceeds its threshold, and the maximum processing efficiency is determined when the token processing efficiency reaches its peak. These two key indicators are taken as the test result. Because the pressure data is adjusted through multiple gradients, it more closely resembles the dynamic, irregular fluctuation of concurrency in a real production environment, which improves the accuracy of the model test result.

Inventors

ZHONG MINLING

Assignees

招商局金融科技有限公司

Dates

Publication Date: 20260505
Application Date: 20251230

Claims (10)

1. A model performance testing method, comprising: acquiring pressure data for testing the load of a code task model, and sending the pressure data to a preset code task model; receiving the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data; when the GPU load falls outside a preset interval, performing multiple adjustment on the pressure data to obtain adjusted pressure data, and returning to the step of sending the pressure data to the preset code task model; when the GPU load is within the preset interval and the response time exceeds a preset response threshold, taking the pressure data corresponding to the response time as a user load capacity limit value; performing multiple adjustment on the adjusted pressure data again to obtain readjusted pressure data, and returning to the step of sending the pressure data to the preset code task model; when the GPU load is within the preset interval, the response time exceeds the preset response threshold and the token processing efficiency reaches a preset peak value, taking the peak value as the maximum token processing efficiency of the code task model under a preset condition; and taking the user load capacity limit value and the maximum token processing efficiency as the performance test result.
2. The model performance testing method of claim 1, wherein after receiving the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data, the method further comprises: receiving the network transmission volume fed back by the code task model under the pressure data; acquiring a sending timestamp of sending the pressure data and a receiving timestamp of receiving the pressure data, and calculating the time difference between the sending timestamp and the receiving timestamp; and calculating the ratio of the network transmission volume to the time difference, and taking the ratio as the network throughput under the pressure data.
3. The model performance testing method of claim 1, wherein performing multiple adjustment on the pressure data to obtain adjusted pressure data comprises: judging whether the GPU load exceeds the upper limit of the preset interval; when the GPU load exceeds the upper limit of the interval, reducing the pressure data according to a preset first scale factor, and taking the reduced pressure data as the adjusted pressure data; and when the GPU load does not exceed the upper limit of the interval, amplifying the pressure data according to a preset second scale factor, and taking the amplified pressure data as the adjusted pressure data.
4. The model performance testing method of claim 1, wherein performing multiple adjustment on the pressure data to obtain adjusted pressure data further comprises: normalizing the GPU load, the token processing efficiency and the response time respectively to obtain a normalized load corresponding to the GPU load, a normalized token efficiency corresponding to the token processing efficiency and a normalized response time corresponding to the response time; concatenating the normalized load, the normalized token efficiency and the normalized response time into a multidimensional performance index vector; calculating a deviation vector between the multidimensional performance index vector and a preset index vector; and adjusting the pressure data according to a preset pressure index mapping library and the deviation vector to obtain the adjusted pressure data.
5. The model performance testing method of claim 1, wherein the response time comprises a streaming request response time and a non-streaming request response time, the preset response threshold comprises a streaming request response threshold and a non-streaming request response threshold, and after taking the pressure data corresponding to the response time as the user load capacity limit value when the GPU load is within the preset interval and the response time exceeds the preset response threshold, the method further comprises: when the streaming request response time exceeds the streaming request response threshold, taking the pressure data corresponding to the streaming request response time as the user load capacity limit value under streaming requests; and when the non-streaming request response time exceeds the non-streaming request response threshold, taking the pressure data corresponding to the non-streaming request response time as the user load capacity limit value under non-streaming requests.
6. The model performance testing method of claim 1, wherein after receiving the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data, the method further comprises: determining a load threshold for the GPU load; identifying, among the GPU loads, an abnormal GPU load that exceeds the load threshold; and identifying the pressure data corresponding to the abnormal GPU load, and determining a model critical point according to the corresponding pressure data.
7. The model performance testing method of claim 1, wherein sending the pressure data to the preset code task model comprises: determining a cooling interval for sending the pressure data; and sending the pressure gradient data to the preset code task model one by one at the cooling interval.
8. A model performance testing apparatus, comprising: a pressure data sending module, configured to acquire pressure data for testing the load of a code task model and send the pressure data to a preset code task model; a receiving module, configured to receive the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data; a pressure data multiple adjustment module, configured to perform multiple adjustment on the pressure data when the GPU load falls outside a preset interval to obtain adjusted pressure data, and return to the step of sending the pressure data to the preset code task model; a load capacity limit value acquisition module, configured to take the pressure data corresponding to the response time as a user load capacity limit value when the GPU load is within the preset interval and the response time exceeds a preset response threshold; a pressure data multiple readjustment module, configured to perform multiple adjustment on the adjusted pressure data again to obtain readjusted pressure data, and return to the step of sending the pressure data to the preset code task model; a maximum token processing efficiency acquisition module, configured to take the peak value as the maximum token processing efficiency of the code task model under a preset condition when the GPU load is within the preset interval, the response time exceeds the preset response threshold and the token processing efficiency reaches a preset peak value; and a performance test result acquisition module, configured to take the user load capacity limit value and the maximum token processing efficiency as the performance test result.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the model performance testing method according to any of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the model performance testing method according to any one of claims 1 to 7.
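
As an illustration of the calculations named in claims 2 and 4, the following is a minimal Python sketch, not the patented implementation. The claims do not specify a normalization scheme, units, or the contents of the preset index vector, so min-max normalization, the bounds, and the target values below are all assumptions chosen for the example. The pressure index mapping library of claim 4 is not described in the claims and is therefore left out.

```python
from typing import List, Sequence


def network_throughput(transferred: float, sent_at: float, received_at: float) -> float:
    """Claim 2: network transmission volume divided by the send/receive time difference."""
    elapsed = received_at - sent_at
    if elapsed <= 0:
        raise ValueError("receive timestamp must be later than send timestamp")
    return transferred / elapsed


def min_max_normalize(value: float, low: float, high: float) -> float:
    """Assumed scheme: scale into [0, 1] and clamp. The claims do not name one."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def deviation_vector(observed: Sequence[float], target: Sequence[float]) -> List[float]:
    """Claim 4: element-wise difference between the observed and preset index vectors."""
    if len(observed) != len(target):
        raise ValueError("vectors must have the same length")
    return [o - t for o, t in zip(observed, target)]


# Hypothetical measurements and bounds, for illustration only.
observed = [
    min_max_normalize(92.0, 0.0, 100.0),     # GPU load, percent
    min_max_normalize(1500.0, 0.0, 2000.0),  # token processing efficiency, tokens/s
    min_max_normalize(3.0, 0.0, 10.0),       # response time, seconds
]
target = [0.8, 0.9, 0.2]  # hypothetical preset index vector
deviation = deviation_vector(observed, target)
print(deviation)
print(network_throughput(1000.0, 10.0, 12.0))
```

A positive first component here would indicate a GPU load above the preset target, which under claim 3 corresponds to the case where the pressure data is reduced.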

Description

Model performance test method, device, equipment and medium

Technical Field

The present invention relates to the field of security testing technologies, and in particular to a method, an apparatus, a device, and a medium for testing the performance of a model.

Background

With the large-scale application of code task models in code completion, intelligent question answering, automatic programming and other code task scenarios, vLLM-based inference services have become a core technical architecture for supporting high-concurrency code task requests. Conventional performance test methods for code task models generally run the pressure test with a fixed concurrency pressure gradient, whereas concurrency in a real production environment fluctuates dynamically and irregularly, so the test result cannot accurately reflect the carrying capacity of the production environment. Therefore, to meet the growing performance testing requirements of code task models, the current performance test method needs to be improved to solve the insufficient accuracy of the test results of existing methods.

Disclosure of Invention

The invention provides a method, a device, equipment and a medium for testing model performance, which mainly address the insufficient accuracy of model performance test results.
In a first aspect, a model performance testing method is provided, including: acquiring pressure data for testing the load of a code task model, and sending the pressure data to a preset code task model; receiving the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data; when the GPU load falls outside a preset interval, performing multiple adjustment on the pressure data to obtain adjusted pressure data, and returning to the step of sending the pressure data to the preset code task model; when the GPU load is within the preset interval and the response time exceeds a preset response threshold, taking the pressure data corresponding to the response time as a user load capacity limit value; performing multiple adjustment on the adjusted pressure data again to obtain readjusted pressure data, and returning to the step of sending the pressure data to the preset code task model; when the GPU load is within the preset interval, the response time exceeds the preset response threshold and the token processing efficiency reaches a preset peak value, taking the peak value as the maximum token processing efficiency of the code task model under a preset condition; and taking the user load capacity limit value and the maximum token processing efficiency as the performance test result.
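
The feedback loop of the first aspect can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the disclosed implementation: pressure data is modelled as a single concurrency number, the `probe` callable stands in for sending pressure data and receiving the model's feedback, the "preset peak value" is approximated by the highest token throughput seen inside the GPU interval, and every name, default value and the simulated probe are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Metrics:
    gpu_load: float           # percent
    tokens_per_second: float  # token processing efficiency
    response_time: float      # seconds


def run_pressure_test(
    probe: Callable[[int], Metrics],
    initial_concurrency: int = 8,
    gpu_interval: Tuple[float, float] = (60.0, 90.0),
    response_threshold: float = 1.5,
    shrink: float = 0.8,   # assumed "first scale factor"
    grow: float = 1.5,     # assumed "second scale factor"
    cooldown_seconds: float = 0.0,
    max_rounds: int = 20,
) -> Tuple[Optional[int], float]:
    """Return (user load capacity limit, highest token throughput observed)."""
    low, high = gpu_interval
    concurrency = initial_concurrency
    capacity_limit: Optional[int] = None
    peak_tps = 0.0
    for _ in range(max_rounds):
        metrics = probe(concurrency)
        time.sleep(cooldown_seconds)  # cooling interval between pressure rounds
        if metrics.gpu_load > high:
            # GPU overloaded: scale the pressure down and retry.
            concurrency = max(1, int(concurrency * shrink))
            continue
        if metrics.gpu_load < low:
            # GPU under-used: scale the pressure up and retry.
            concurrency = max(concurrency + 1, int(concurrency * grow))
            continue
        # GPU load is inside the preset interval.
        peak_tps = max(peak_tps, metrics.tokens_per_second)
        if metrics.response_time > response_threshold:
            capacity_limit = concurrency
            break
        concurrency = max(concurrency + 1, int(concurrency * grow))
    return capacity_limit, peak_tps


def simulated_probe(concurrency: int) -> Metrics:
    """Stand-in for a real model endpoint; the formulas are invented."""
    return Metrics(
        gpu_load=min(100.0, 5.0 * concurrency),
        tokens_per_second=min(100.0 * concurrency, 1500.0),
        response_time=0.1 * concurrency,
    )


result = run_pressure_test(simulated_probe)
print(result)
```

The `max_rounds` bound is a design choice of this sketch: with fixed scale factors the loop can oscillate around the interval, so a cap keeps the test finite and returns `None` for the capacity limit if no round ever breaches the response threshold inside the interval.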
In a second aspect, a model performance testing apparatus is provided, comprising: a pressure data sending module, configured to acquire pressure data for testing the load of a code task model and send the pressure data to a preset code task model; a receiving module, configured to receive the GPU load, token processing efficiency and response time fed back by the code task model under the pressure data; a pressure data multiple adjustment module, configured to perform multiple adjustment on the pressure data when the GPU load falls outside a preset interval to obtain adjusted pressure data, and return to the step of sending the pressure data to the preset code task model; a load capacity limit value acquisition module, configured to take the pressure data corresponding to the response time as a user load capacity limit value when the GPU load is within the preset interval and the response time exceeds a preset response threshold; a pressure data multiple readjustment module, configured to perform multiple adjustment on the adjusted pressure data again to obtain readjusted pressure data, and return to the step of sending the pressure data to the preset code task model; a maximum token processing efficiency acquisition module, configured to take the peak value as the maximum token processing efficiency of the code task model under a preset condition when the GPU load is within the preset interval, the response time exceeds the preset response threshold and the token processing efficiency reaches a preset peak value; and a performance test result acquisition module, configured to take the user load capacity limit value and the maximum token processing efficiency as the performance test result.
In a third aspect, a computer device is provided, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the model performance testing method described above.
In a fourth aspect, a computer readable storage medium is provided, the computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the model performance testing method described above.
According to the scheme realized by the model performance testing method, the device, the computer equipment and the storage medium, the standardization and the repeatability of the test ar