
CN-115794557-B - Hardware counter multiplexing estimation implementation method


Abstract

A hardware counter multiplexing estimation implementation method, characterized in that a training set is constructed and a Transformer model is trained in an offline stage, and the estimated data acquired by the hardware counters of performance monitoring software are fitted in an online stage, thereby improving the accuracy of hardware counter multiplexing. The training set is obtained by running the selected applications, monitoring the events of the selected hardware event set in both the hardware counter multiplexing mode and the one-counter-per-event mode of the performance monitoring software, recording the resulting hardware counter values, and then applying, in order, abnormal data screening, first-order difference processing, logarithmic processing and out-of-order (shuffling) processing. The invention uses the time-series analysis capability of the Transformer model to fit the estimated data acquired by the hardware counters, which significantly improves the multiplexing precision of the hardware counters and the accuracy of performance monitoring tools (such as VTune and PAPI).

Inventors

  • LIN XINHUA
  • WANG LIUZHEN
  • WANG YICHAO

Assignees

  • Shanghai Jiao Tong University (上海交通大学)

Dates

Publication Date
2026-05-05
Application Date
2022-12-10

Claims (5)

  1. A hardware counter multiplexing estimation implementation method, characterized in that a training set is constructed and a Transformer model is trained in an offline stage, and the estimated data acquired by the hardware counters of performance monitoring software are fitted in an online stage, thereby improving the multiplexing precision of the hardware counters; the training set is constructed by the following steps: Step 1, selecting computing patterns from Rodinia as common high-performance computing applications, together with a hardware event set to be monitored; Step 2, running the selected application and recording hardware counter values in two modes: ① in the hardware counter multiplexing mode, organizing the hardware event set to be recorded as an array of strings; the main thread runs the application selected in step 1 together with the performance monitoring software, while a child thread parses the string array at fixed time intervals, calls the interface of the performance monitoring software to obtain the count of each corresponding hardware event, and writes the counts in turn to a txt file of recording results; ② in the mode where each hardware counter corresponds to one hardware event, the main thread likewise runs the application selected in step 1 together with the performance monitoring software, while a child thread reads the counts of the hardware events to be monitored at fixed time intervals and writes them to a txt file of recording results; Step 3, repeatedly executing step 2 to obtain the values of multiple hardware counters, and preprocessing them to obtain the training set. The training uses the data collected in the hardware counter multiplexing mode as estimated values and the data collected in the one-counter-per-event mode, each event by its own dedicated hardware counter, as true values, with mean squared error (MSE) as the loss. The fitting records, in the online stage, the currently collected hardware counter sum and compares it with the historical hardware counter sum: when the collected sum is larger than the historical sum, the historical sum is updated; otherwise, the collected data are checked for problems and the method waits for the next input. The collected hardware counts are first-order differenced and then fed into the trained Transformer model to obtain higher-precision hardware count estimates, and hence higher-precision performance monitoring results.
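The online-stage gating in claim 1 (compare the current counter sum with the historical sum, update or wait, then first-order difference and feed the model) can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: it assumes the "hardware counter sum" is the sum of the latest cumulative readings across all monitored events, that the trained model is any callable taking per-interval increments, and the names `OnlineFitter` and `push` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical model type: maps per-interval increments (one row per
# interval, one column per event) to fitted increments.
Model = Callable[[List[List[float]]], List[List[float]]]


class OnlineFitter:
    """Sketch of the online stage: gate on the counter sum, then difference."""

    def __init__(self, model: Model) -> None:
        self.model = model
        self.historical_sum = float("-inf")
        self.snapshots: List[List[float]] = []  # accepted cumulative readings

    def push(self, snapshot: Sequence[float]) -> Optional[List[List[float]]]:
        current_sum = sum(snapshot)
        if current_sum <= self.historical_sum:
            # Cumulative counters should only grow; treat this reading as
            # suspect and wait for the next input (assumed handling).
            return None
        self.historical_sum = current_sum
        self.snapshots.append([float(v) for v in snapshot])
        if len(self.snapshots) < 2:
            return None  # two accepted readings are needed before differencing
        # First-order difference per event between consecutive snapshots.
        increments = [
            [cur - prev for prev, cur in zip(a, b)]
            for a, b in zip(self.snapshots, self.snapshots[1:])
        ]
        return self.model(increments)
```

This sketch re-sends the whole accepted history on every call for simplicity; a practical deployment would more likely pass a fixed-length sliding window to the model.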
  2. The hardware counter multiplexing estimation implementation method according to claim 1, wherein the preprocessing includes abnormal data screening, first-order difference processing, logarithmic processing and out-of-order (shuffling) processing, performed in that order.
  3. The method according to claim 2, wherein the abnormal data screening means: taking Max as the maximum of the hardware count sums, discarding every acquisition whose hardware count sum is less than fifteen percent of Max, so as to remove erroneous data acquired because of abnormal hardware counter overflow; then discarding the last 4% of the acquisition steps; and then discarding the data of the last 4 time steps.
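One reading of the screening rule in claim 3 is sketched below, treating each acquisition as one run's list of per-time-step hardware counts. The thresholds (15%, 4%, 4 steps) come from the claim; the order of the two tail cuts, rounding 4% down to whole steps, and the name `screen_abnormal` are assumptions.

```python
from typing import List, Sequence


def screen_abnormal(
    runs: Sequence[Sequence[float]],
    overflow_ratio: float = 0.15,
    tail_fraction: float = 0.04,
    tail_steps: int = 4,
) -> List[List[float]]:
    """Drop overflow-corrupted acquisitions, then trim the tail of the rest."""
    if not runs:
        return []
    sums = [sum(run) for run in runs]
    max_sum = max(sums)  # "Max" in claim 3
    # Acquisitions whose total is below 15% of Max are treated as the
    # product of an abnormal counter overflow and discarded.
    kept = [list(run) for run, s in zip(runs, sums) if s >= overflow_ratio * max_sum]

    cleaned = []
    for run in kept:
        cut = int(len(run) * tail_fraction)  # last 4% of the steps, rounded down
        trimmed = run[: len(run) - cut]
        trimmed = trimmed[: max(len(trimmed) - tail_steps, 0)]  # then last 4 steps
        cleaned.append(trimmed)
    return cleaned
```

For a 100-step acquisition that survives the overflow check, this interpretation keeps 92 steps (100, minus 4 for the 4% cut, minus 4 more time steps).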
  4. The method for implementing hardware counter multiplexing estimation according to claim 2, wherein the out-of-order arrangement means that the obtained data are randomly shuffled and split in a ratio of 7:1:2 into a training set, a test set and a validation set, respectively, with no element shared between any two sets.
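The 7:1:2 shuffle-and-split of claim 4 can be sketched with the standard library. Slicing one shuffled copy guarantees that no element lands in two sets (assuming the samples themselves are distinct); the optional seed and the name `shuffle_split` are illustrative additions, not part of the claim.

```python
import random
from typing import List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shuffle_split(
    samples: Sequence[T],
    ratios: Tuple[int, int, int] = (7, 1, 2),
    seed: Optional[int] = None,
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then cut into training, test and validation sets (7:1:2)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_test = n * ratios[1] // total
    train = shuffled[:n_train]
    test = shuffled[n_train : n_train + n_test]
    validation = shuffled[n_train + n_test :]  # remainder absorbs rounding
    return train, test, validation
```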
  5. A system for implementing the hardware counter multiplexing estimation implementation method according to any one of claims 1-4, characterized by comprising a data collection unit, a data preprocessing unit, a training unit and a prediction unit, wherein: the data collection unit uses performance monitoring software to collect hardware event counts in either the hardware counter multiplexing mode or the one-counter-per-event mode, obtaining the hardware event counts of one run; the data preprocessing unit performs abnormal data rejection, tail data cleaning, first-order difference and logarithmic processing on the information from the data collection unit to obtain training samples; the training unit uses the training samples to train models, obtaining different trained models; and the prediction unit uses a trained model to fit the data from the data preprocessing unit and inverts the logarithmic transform to obtain higher-precision hardware count estimates.
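The four units of claim 5 can be wired together roughly as below. This is a hypothetical skeleton: the collection unit is an injected callable, the training unit is represented only by the trained model it produces, preprocessing is reduced to difference plus `log1p` (screening omitted), and the prediction unit undoes the logarithm with `expm1` and re-accumulates increments from the first reading. The class and method names are illustrative.

```python
import math
from dataclasses import dataclass
from itertools import accumulate
from typing import Callable, List


@dataclass
class MultiplexingEstimator:
    collect: Callable[[], List[float]]           # data collection unit (hypothetical hook)
    model: Callable[[List[float]], List[float]]  # output of the training unit

    def preprocess(self, cumulative: List[float]) -> List[float]:
        # Data preprocessing unit: first-order difference, then log1p.
        increments = [b - a for a, b in zip(cumulative, cumulative[1:])]
        return [math.log1p(d) for d in increments]

    def predict(self) -> List[float]:
        # Prediction unit: fit in log space, invert the log, re-accumulate.
        cumulative = self.collect()
        fitted = self.model(self.preprocess(cumulative))
        increments = [math.expm1(v) for v in fitted]
        return list(accumulate(increments, initial=cumulative[0]))
```

With an identity model the round trip returns the original cumulative series (up to floating-point error), which is a convenient sanity check before plugging in a trained model.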

Description

Hardware counter multiplexing estimation implementation method

Technical Field

The invention relates to the application of neural networks, and in particular to a hardware counter multiplexing estimation implementation method.

Background

To increase the number of hardware events that can be collected in a single run, commonly used performance analysis tools such as PAPI and Intel VTune provide hardware counter multiplexing. Hardware counter multiplexing consists of two steps: time-division-multiplexed data acquisition, and precision recovery through an estimation algorithm. First, the hardware event data acquired by the hardware counters are read at fixed time intervals; second, the counts of hardware events that were not acquired in a given time slice are filled in by linear interpolation. Through multiplexing, performance monitoring software can monitor more hardware events than there are physical hardware counters, which lets researchers build data models of processor performance and analyze it quantitatively. Hardware counter multiplexing in processor performance monitoring software is therefore of great value for quantitative research into processor performance. However, current multiplexing methods that rely on numerical fitting, such as fixed, linear and nonlinear interpolation, generalize poorly and have low precision. None of these methods takes the temporal ordering of hardware event counts into account; instead, they guess how to interpolate from an assumed probability distribution, whereas hardware event counts do not follow any specific random process.
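The prior-art recovery step described above, in which counts for time slices where an event was not scheduled are filled in by linear interpolation, can be sketched as follows. This is a generic illustration of linear interpolation over time slices, not the exact estimator used by PAPI or VTune; `None` marks an unscheduled slice, holding the nearest value at the edges is an assumption, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence


def interpolate_missing(slices: Sequence[Optional[float]]) -> List[float]:
    """Fill unscheduled time slices (None) by linear interpolation."""
    known = [i for i, v in enumerate(slices) if v is not None]
    if not known:
        raise ValueError("event was never scheduled")
    filled: List[float] = []
    for i, v in enumerate(slices):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:       # before the first observation: hold first value
            filled.append(float(slices[right]))
        elif right is None:    # after the last observation: hold last value
            filled.append(float(slices[left]))
        else:                  # between two observations: straight line
            t = (i - left) / (right - left)
            filled.append(slices[left] + t * (slices[right] - slices[left]))
    return filled
```

Each filled slice depends only on its two nearest observed neighbours, which illustrates the limitation noted above: the estimate ignores the wider temporal pattern of the event counts.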
Interpolation based on a probability distribution gives poor results for some applications and hardware events and cannot meet the requirements of performance modeling.

Disclosure of Invention

To address these shortcomings of the prior art, the invention provides a hardware counter multiplexing estimation implementation method that uses the time-series analysis capability of a Transformer model to fit the estimated data acquired by the hardware counters, thereby significantly improving the multiplexing precision of the hardware counters and the accuracy of performance monitoring tools (such as VTune and PAPI).

The invention is realized by the following technical scheme: the hardware counter multiplexing estimation implementation method constructs a training set and trains a Transformer model in an offline stage, and in an online stage fits the estimated data acquired by the hardware counters of the performance monitoring software, thereby improving the multiplexing precision of the hardware counters.
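Offline training pairs the multiplexed-mode data (estimated values) with the one-counter-per-event data (true values) and scores the model with mean squared error. A minimal sketch of that pairing and loss follows, assuming both series are already preprocessed and time-aligned and that the model consumes fixed-length windows; the window length and the names `make_pairs` and `mse` are assumptions, since the text specifies neither.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[List[float], List[float]]


def make_pairs(estimated: Sequence[float], truth: Sequence[float], window: int) -> List[Pair]:
    """Slide a window over aligned series: (multiplexed input, dedicated target)."""
    if window <= 0:
        raise ValueError("window must be positive")
    n = min(len(estimated), len(truth))
    return [
        (list(estimated[i : i + window]), list(truth[i : i + window]))
        for i in range(n - window + 1)
    ]


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, the loss named in claim 1."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
```

Evaluating `mse` directly on an (estimate, truth) pair gives the error of uncorrected multiplexing, which is the baseline a trained model would need to beat.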
The training set is constructed by the following steps:

Step 1, selecting computing patterns from Rodinia as common high-performance computing applications, together with the hardware event set to be monitored; such applications include, but are not limited to:

Step 2, running the application selected in step 1 and monitoring the events in the hardware event set selected in step 1 with both the hardware counter multiplexing mode and the one-counter-per-event mode of the performance monitoring software, recording the resulting hardware counter values. Specifically: ① in the hardware counter multiplexing mode, the hardware event set to be recorded is organized as an array of strings; the main thread runs the application selected in step 1 together with the performance monitoring software, while a child thread parses the string array at fixed time intervals, calls the interface of the performance monitoring software to obtain the count of each corresponding hardware event, and writes the counts in turn to a txt file of recording results; ② in the mode where each hardware counter corresponds to one hardware event, the main thread likewise runs the application selected in step 1 together with the performance monitoring software, while a child thread reads the counts of the monitored hardware events at the same fixed time intervals and writes them to a txt file of recording results.

Step 3, repeatedly executing step 2 to obtain the values of multiple hardware counters, and preprocessing them to obtain the training set. The preprocessing consists of abnormal data screening, first-order difference processing, logarithmic processing and out-of-order (shuffling) processing, performed in that order.
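The collection procedure of step 2 (application on the main thread, a child thread sampling at fixed intervals and appending counts to a txt file) can be sketched with Python's `threading` module. `read_counts` is a hypothetical stand-in for whatever read interface the performance monitoring software exposes; the text does not name one, so no real PAPI or VTune call is shown. The same list of event names serves both modes; in the one-counter-per-event mode it would presumably hold no more events than there are physical counters.

```python
import threading
from typing import Callable, Dict, Sequence

ReadCounts = Callable[[Sequence[str]], Dict[str, int]]  # hypothetical read hook


def sample_counters(events: Sequence[str], read_counts: ReadCounts,
                    interval_s: float, out_path: str,
                    stop: threading.Event) -> None:
    """Child thread: read every event at a fixed interval, one line per sample."""
    with open(out_path, "w") as out:
        while not stop.is_set():
            counts = read_counts(events)
            out.write(" ".join(str(counts[e]) for e in events) + "\n")
            stop.wait(interval_s)  # sleeps, but wakes early once stop is set


def run_with_sampling(workload: Callable[[], None], events: Sequence[str],
                      read_counts: ReadCounts, interval_s: float,
                      out_path: str) -> None:
    """Main thread runs the application while a child thread records counts."""
    stop = threading.Event()
    sampler = threading.Thread(
        target=sample_counters,
        args=(events, read_counts, interval_s, out_path, stop),
    )
    sampler.start()
    try:
        workload()
    finally:
        stop.set()
        sampler.join()
```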
The abnormal data screening means that, taking Max as the maximum hardware count sum, acquisitions whose hardware count sum is below fifteen percent of Max are discarded, and that the last 4% of the acquisition steps are discarded after the error data collected due to abnormal overflow of the har