CN-121255194-B - Machine learning-based heterogeneous code optimization method for RISC-V and TPU
Abstract
The invention relates to a machine learning-based heterogeneous code optimization method for RISC-V and TPU, belonging to the technical field of RISC-V architecture embedded heterogeneous computing. The method uses a deep learning model comprising an embedding layer and an RNN layer to learn a representation of the code and extract low-dimensional feature vectors, then searches for the best classical machine learning algorithm, i.e. the one with the best test performance among several machine learning methods, and maps the features to a specific output that selects whether a task is allocated to the TPU or the CPU for execution. The method is precisely matched to the characteristics of the RISC-V architecture, breaks through the heterogeneous cooperation bottleneck, ensures classification accuracy, reduces the risk of mis-allocation, lowers development cost, allocates computing resources on demand, and maximizes the performance of the heterogeneous system.
Inventors
- DAI HONGJUN
- LI ZHUOHANG
- LI BING
- ZHAI MINGJIE
Assignees
- Shandong University (山东大学)
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2025-09-09
Claims (8)
- 1. A machine learning-based heterogeneous code optimization method for RISC-V and TPU, characterized in that a deep learning model comprising an embedding layer and an RNN layer is used to learn a representation of the code and extract low-dimensional feature vectors, after which the optimal classical machine learning algorithm is searched for; the method comprises three stages (illustrative code sketches of the preprocessing, the feature extractor and the ensemble are given after the claims): (A) Preprocessing the LLVM-IR code, comprising the following steps: (1) running the SPEC2017 benchmark and OpenCL code on the RISC-V development board SG2042 and on the TPU, and compiling the test-set source code into an IR representation with LLVM; (2) converting the LLVM-IR code into a sequence of tokens according to the following rules: removing blank lines, comments and meaningless code lines; rewriting all vectors, arrays and constants into data corresponding to their types so as to normalize the data; replacing all variables, declarations, function names and strings with placeholders; rewriting the address alignment instruction Alignment into a single token; rewriting each data type into a single token; (3) token atomization, namely creating a dictionary that maps every token to a unique integer and converting all tokens into integer strings, so that each piece of code corresponds to a unique integer string; (B) Training a deep learning model, wherein the deep learning model comprises an embedding layer, an RNN layer and a max-pooling layer, and the RNN layer is a multi-stage RNN model framework: the first layer is a vectorization layer that takes the integer string obtained in the previous step as input and converts each integer into a vector of dimension 64; the second layer is a two-layer neural network that takes the vectors from the previous layer as input, with 32 filters of kernel size 32 and Sigmoid as the activation function; the third layer takes, for each filter, the maximum of all its outputs, yielding a vector of length 32; the final layer is a fully connected layer of size 32x2 with ReLU as the activation function, producing a binary classification result; (C) Training a machine learning model with the built feature extractor: another RNN model with the same structure as the feature extractor is constructed, and the learning experience obtained in the previous step, namely the neural network parameters, is transferred to this RNN model; an auxiliary element is appended to the length-32 vector obtained in the previous step, giving a length-33 vector that is used as input to train the machine learning algorithm; specifically, the vectorization layer and the neural network layer obtained in the previous step serve as the basis of the model, the auxiliary element is appended after the filters, and the resulting length-33 vector is used to train the machine learning model, thereby obtaining the feature extractor; features are extracted from code segments to generate feature vectors; after the one-dimensional feature vectors are obtained, ensemble learning is selected: with XGBoost, AdaBoost-Gaussian and random forest as the machine learning algorithms, the results obtained by the three machine learning algorithms are stacked, and the final classification is completed by AdaBoost voting, yielding the allocation between the TPU and the CPU.
- 2. The machine learning-based heterogeneous code optimization method for RISC-V and TPU according to claim 1, wherein in step (A), the rewrite rule for the address alignment instruction is designed according to the common alignment byte counts of RISC-V, the common alignment byte counts being 8/16/32 bytes.
- 3. The machine learning-based heterogeneous code optimization method for RISC-V and TPU according to claim 1, wherein in step (A), four data types, int, float, double and ptr, are retained when rewriting the data types.
- 4. The machine learning-based heterogeneous code optimization method for RISC-V and TPU according to claim 1, wherein in step (B), the binary classification result determines whether the code is dispatched to the TPU or to the RISC-V CPU for execution; the model is trained on the partitioned data set, and AdaBoost is then used to optimize the training result, namely a plurality of weak classifiers are constructed iteratively and combined into a strong classifier: samples misclassified in the previous round are given higher weight, well-performing weak classifiers are likewise given higher weight, and the final prediction is obtained by weighted voting or weighted summation; specifically, the weight distribution of the training samples is initialized first, each sample usually being given a uniform weight of 1/N, where N is the number of samples; then, in each round of iteration, a weak classifier is trained according to the current weight distribution and its classification error rate is calculated; the weight of the weak classifier is determined from this error rate, a larger error rate giving a lower weight; the sample weights are then updated, with the weights of misclassified samples amplified and the weights of correctly classified samples reduced, and the weights are normalized so that they sum to 1; this is repeated until the number of weak classifiers meets the requirement (a code sketch of one such weight-update round is given after the claims).
- 5. The method for optimized use of heterogeneous codes of RISC-V and TPU based on machine learning according to claim 1, wherein in step (C), the auxiliary element is 0 or 1.
- 6. The machine learning-based heterogeneous code optimization method for RISC-V and TPU according to claim 1, wherein in step (C), XGBoost uses the parameters max_depth=7, learning_rate=0.01, n_estimators=100 and n_jobs=15, and the random forest uses the parameters max_depth=6 and n_estimators=100.
- 7. A computer-readable storage medium having stored thereon a program which, when executed by a processor, performs the steps of the machine learning-based RISC-V and TPU heterogeneous code optimization method as defined in claim 1.
- 8. An electronic device comprising a memory, a processor and a program stored on the memory and executable on the processor, wherein the processor implements the steps of the machine learning-based RISC-V and TPU heterogeneous code optimization method as defined in claim 1 when executing the program.
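A minimal Python sketch of the LLVM-IR preprocessing and token atomization described in step (A) of claim 1; the regular expressions, placeholder names and the tokenize_ir helper are illustrative assumptions rather than the patent's reference implementation.

```python
import re

def normalize_ir_line(line: str) -> str:
    """Normalize one LLVM-IR line following the rewrite rules of step (A):
    drop comments and blank lines, replace identifiers/strings with placeholders,
    and collapse alignment and type annotations into single tokens."""
    line = line.split(";")[0].strip()              # strip comments
    if not line:
        return ""                                  # blank / meaningless line
    line = re.sub(r'"[^"]*"', "<STR>", line)       # string literals -> placeholder
    line = re.sub(r"@[\w.]+", "<GLOBAL>", line)    # globals / function names
    line = re.sub(r"%[\w.]+", "<VAR>", line)       # local variables
    line = re.sub(r"align \d+", "<ALIGN>", line)   # alignment -> one token
    line = re.sub(r"\bi\d+\b", "int", line)        # integer types -> 'int'
    line = re.sub(r"-?\d+\.\d+(e[+-]?\d+)?", "<FLOAT>", line)
    line = re.sub(r"-?\b\d+\b", "<INT>", line)     # constants -> typed placeholder
    return line

def tokenize_ir(ir_text: str, vocab: dict[str, int]) -> list[int]:
    """Token atomization (step A.3): map every token to a unique integer."""
    tokens = []
    for raw in ir_text.splitlines():
        for tok in normalize_ir_line(raw).split():
            tokens.append(vocab.setdefault(tok, len(vocab)))
    return tokens

vocab: dict[str, int] = {}
example = "%1 = load i32, i32* %a, align 8   ; load operand"
print(tokenize_ir(example, vocab))   # each code fragment becomes a unique integer string
```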
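A hedged sketch of the feature extractor from step (B), written against Keras as an assumed framework. The claim fixes only the shapes (64-dimensional vectors per token, 32 filters of kernel size 32 with Sigmoid, a maximum over each filter giving a length-32 vector, and a 32x2 layer with ReLU); the use of Conv1D for the filter layer and the vocabulary and sequence-length constants are interpretations, not the patented network.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 4096   # assumed dictionary size from token atomization
MAX_LEN = 1024      # assumed maximum token-sequence length

def build_feature_extractor() -> models.Model:
    """Embedding -> 32 filters (kernel 32, sigmoid) -> max over each filter
    -> 32x2 head, mirroring the layer shapes given in step (B)."""
    inp = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, 64)(inp)            # each integer -> 64-dim vector
    x = layers.Conv1D(32, kernel_size=32, activation="sigmoid")(x)
    x = layers.GlobalMaxPooling1D()(x)                   # maximum per filter -> length 32
    out = layers.Dense(2, activation="relu")(x)          # 32x2 head; ReLU follows the claim wording
    return models.Model(inp, out)

model = build_feature_extractor()
model.compile(
    optimizer="adam",
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
model.summary()
```

The length-32 output of the pooling layer, plus the auxiliary element of claim 5, forms the length-33 input vector used to train the classical machine learning stage.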
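A sketch of the ensemble stage from step (C) with the hyperparameters of claim 6, assuming scikit-learn and the xgboost package. Reading the "Gaussian" base learner as GaussianNB and fitting the AdaBoost meta-classifier on in-sample stacked predictions are simplifying assumptions; a production stacking setup would use out-of-fold predictions.

```python
import numpy as np
from xgboost import XGBClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

def build_ensemble():
    """Base learners with the hyperparameters listed in claim 6; AdaBoost
    performs the final vote over the stacked base-learner outputs."""
    xgb = XGBClassifier(max_depth=7, learning_rate=0.01, n_estimators=100, n_jobs=15)
    rf = RandomForestClassifier(max_depth=6, n_estimators=100)
    gnb = GaussianNB()                       # assumed reading of the "Gaussian" learner
    meta = AdaBoostClassifier(n_estimators=50)
    return [xgb, rf, gnb], meta

def fit_and_dispatch(X, y):
    """X: length-33 feature vectors (32 learned features + 1 auxiliary element).
    Returns 0/1 labels deciding whether a code segment runs on the RISC-V CPU or the TPU."""
    bases, meta = build_ensemble()
    stacked = np.column_stack([b.fit(X, y).predict_proba(X)[:, 1] for b in bases])
    meta.fit(stacked, y)                     # AdaBoost votes on the stacked results
    return meta.predict(stacked)
```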
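A worked sketch of the AdaBoost sample-weight update described in claim 4: uniform 1/N initialization, a classifier weight that decreases as the weighted error rate grows, amplification of misclassified samples, and renormalization so the weights sum to 1. The alpha formula shown is the standard AdaBoost choice and is an assumption where the claim does not fix an exact expression.

```python
import numpy as np

def adaboost_round(weights, y_true, y_pred):
    """One AdaBoost iteration: weighted error rate, classifier weight alpha,
    boosted weights for misclassified samples, renormalization to sum 1."""
    miss = (y_pred != y_true).astype(float)
    err = np.clip(np.dot(weights, miss) / weights.sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1.0 - err) / err)              # higher error -> lower classifier weight
    weights = weights * np.exp(alpha * (2 * miss - 1))   # amplify misclassified, shrink correct
    return weights / weights.sum(), alpha

# Uniform initialization: each of the N samples starts with weight 1/N.
N = 8
w = np.full(N, 1.0 / N)
y = np.array([0, 1, 1, 0, 1, 0, 0, 1])
pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])
w, alpha = adaboost_round(w, y, pred)
print(w, alpha)
```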
Description
Machine learning-based heterogeneous code optimization method for RISC-V and TPU

Technical Field
The invention relates to a machine learning-based heterogeneous code optimization method for RISC-V and TPU, belonging to the technical field of RISC-V architecture embedded heterogeneous computing.

Background
Heterogeneous systems composed of RISC-V and TPU are attractive because they can deliver potentially enormous performance at lower cost. Achieving this potential is challenging, however, because of programming complexity. Users typically need to identify the portions of code suitable for SIMD parallelization and rewrite them in the language of a particular architecture. Achieving good performance may require extensive rewriting to fit the TPU programming model and to amortize the cost of communicating with a separate device that has its own address space. This programming complexity hinders wider adoption of TPU-based heterogeneous systems. Data conversion is critical to achieving good performance on the TPU. Moreover, although a TPU can improve performance, it is not always superior to a CPU. A technique is therefore needed to determine how tasks are allocated between the TPU and the CPU.

Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a machine learning-based heterogeneous code optimization method for RISC-V and TPU.

Term interpretation:
Heterogeneous systems: in general, parallel computing systems can be divided into two broad categories depending on the type of processors and other resources used. Systems using multiple processors of the same type are referred to as homogeneous systems, while systems using multiple processors of different types are referred to as heterogeneous systems. Current parallel computing systems are mainly heterogeneous systems composed of processors of various sizes, speeds and memory types. The most common systems combine a CPU with dedicated processing units or accelerators, such as Graphics Processing Units (GPUs), Tensor Processing Units (TPUs) and Field Programmable Gate Arrays (FPGAs).
TPU: an abbreviation of Tensor Processing Unit, an application-specific integrated circuit used for machine learning and deep learning computation. The TPU is mainly used to accelerate and optimize the computation of machine learning algorithms, especially those based on tensor operations. These chips are designed to handle machine learning workloads more efficiently than conventional Central Processing Units (CPUs) and Graphics Processing Units (GPUs). The TPU was designed by Google and is used in its data centers to support its search, cloud computing and artificial intelligence services.
Code optimization: making the code run as fast as possible, making the target code as small as possible, or letting the processor run the target code with maximum performance, through means such as selecting optimization options, changing the order of optimization parameters and code mapping.
RNN model: an RNN (Recurrent Neural Network) is a neural network model dedicated to processing sequence data. Unlike conventional feed-forward neural networks, an RNN has a special loop structure that models data along the time dimension by using a hidden state to remember context information in the sequence.
LLVM: LLVM (Low Level Virtual Machine) is an open-source compiler infrastructure for building compilers and related tools. It was originally designed for compilation optimization and code generation, but has since been extended into a complete tool chain supporting multi-language, multi-platform development. The modular design of LLVM makes it a powerful tool for building compilers and development tools. LLVM-IR (LLVM Intermediate Representation) is the intermediate representation of the LLVM compiler framework; it serves as a bridge between the front end and the back end of the compiler and is platform-independent, low-level, strongly typed and in SSA (static single assignment) form. The front end converts source code into LLVM-IR, which is processed by the optimizer and then converted into target machine code.

The technical scheme of the invention is as follows:
A machine learning-based heterogeneous code optimization method for RISC-V and TPU, which uses a deep learning model comprising an embedding layer and an RNN layer to learn a representation of the code and extract low-dimensional feature vectors, then searches for the best classical machine learning algorithm, i.e. the one with the best test performance among several machine learning methods, and maps the features to a specific output that selects whether a task is allocated to the TPU or the CPU for execution.

Previous work was based on feature engineering or end-to-end deep learning models and attempted to solve the