CN-121979531-A - Vectorization method, vectorization device, electronic equipment and storage medium
Abstract
The disclosure provides a vectorization method, a vectorization device, electronic equipment and a storage medium, wherein the vectorization method comprises the steps of obtaining parameter information of each function in a program, judging whether the function meets a specificity condition according to the parameter information of each function, executing specificity operation based on a judgment result, performing parameter alias analysis according to the specificity operation result to obtain an analysis result, wherein the analysis result is used for indicating pointer parameters without alias relation, and executing vectorization without memory check according to the analysis result and the pointer parameters without alias relation.
Inventors
- LU SHIWEI
- ZHUANG YUAN
- WU YUAN
- LI GEN
Assignees
- 飞腾信息技术有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20251229
Claims (10)
- 1. A method of vectorizing, the method comprising: acquiring parameter information of each function in a program; Judging whether the function meets the specificity condition according to the parameter information of each function, and executing the specificity operation based on the judgment result; Performing parameter alias analysis according to the specific operation result to obtain an analysis result, wherein the analysis result is used for indicating pointer parameters without alias relation; and according to the analysis result, vectorizing the memory-free check is carried out for the pointer parameters without the alias relation.
- 2. The method according to claim 1, wherein the method further comprises: Determining a set of call instructions for each function, the set of call instructions comprising at least one call instruction; determining that the at least one call instruction comprises an instruction of indirect call; Determining each called function matched with the instruction containing the indirect call as a candidate function; the candidate function is specialized as a direct call.
- 3. The method according to claim 1, wherein determining whether the function satisfies a specificity condition based on the parameter information of each function, performing a specificity operation based on the determination result, comprises: traversing the call points of the functions according to the parameter information of each function, and executing function specialization on the call points if the call points meet the specialization conditions set by the target compiler; Executing function call point specialization on the call point if the call point meets the function call point specialization condition, and judging whether the call point meets the constant parameter propagation function specialization condition if the call point does not meet the function call point specialization condition; and if the call point meets the condition of the constant parameter propagation function specialization, executing the constant parameter propagation function specialization on the function of the call point.
- 4. A method according to claim 3, wherein determining whether the call site satisfies a condition for function call site specialization comprises: Searching parameters for calculating the cycle execution times of the called function corresponding to the call point according to the parameter information; Determining whether pointer parameters exist in the parameters for calculating the cycle execution times, and if the pointer parameters exist in the parameters for calculating the cycle execution times, tracking the source of the pointer parameters; If the pointer parameter is derived from a global variable, deriving all constant conditions corresponding to the pointer parameter by combining the pointer parameter and the context information, and screening constant conditions meeting a first condition from all the derived constant conditions; combining the constant conditions meeting the first condition to obtain at least one constant combination; determining whether the number of combinations of the at least one constant combination is smaller than a first threshold value, and if the number of combinations is smaller than an upper limit value, determining that the call point meets a function call point specialization condition; The first condition is that the parameter constant is 2 n.
- 5. The method of claim 3 or 4, wherein performing function call site specificity on the call site comprises: A specialized call is generated for each of the at least one constant combination.
- 6. A method according to claim 3, wherein determining whether the call site satisfies a condition for constant parameter propagation function specialization comprises: Determining the specific benefit of the function of the call point, and if the specific benefit is larger than a second threshold value, considering that the call point meets the condition of constant parameter propagation function specification; wherein determining the specific benefit of the function of the call point comprises: Traversing all parameters of the function of the calling point, and if the parameters meet a second condition, adding one to the specific benefit of the function of the calling point; traversing all parameters of the called function corresponding to the call point, and if the parameters meet a fourth condition, adding one to the specific collection of the function of the call point, and if the parameters meet the fourth condition and meet a fifth condition, adding one to the specific collection of the function of the call point; the parameters of the function of the call point are used for calculating the cycle execution times; The third condition is that the constant corresponding to the parameter of the function of the call point is 2 n; the fourth condition is that the parameter of the called function is used for calculating the cycle execution times, the parameter of the called function is derived from the parameter of the call point, and the parameter of the called function can be calculated; the fifth condition is that the result obtained by calculating the parameters of the called function is 2-n.
- 7. The method of claim 1, wherein performing the parameter alias analysis based on the specialized operations results to obtain analysis results comprises: Traversing the parameters of each function in the program, and determining an objective function with the number of pointer parameters being more than or equal to 2; Traversing call points of the target function, marking that the two pointer parameters of the target function have alias relations if the two pointer parameters of the target function are from the same pointer, and/or tracking sources of the pointer parameters at the call points, and determining that the pointer parameters have alias relations if the sources of the pointer parameters cannot be determined; Based on the traversal result, the pointer parameters for which no alias relation exists are marked without aliases.
- 8. A vectorization device, characterized in that it comprises: The acquisition module is used for acquiring the parameter information of each function in the program; The first processing module is used for judging whether the function meets the specificity condition according to the parameter information of each function and executing the specificity operation based on the judgment result; The second processing module is used for carrying out parameter alias analysis according to the specific operation result to obtain an analysis result, wherein the analysis result is used for indicating pointer parameters without alias relation; And the third processing module is used for executing vectorization of memory-free check on the pointer parameters without alias relation according to the analysis result.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the program is executed.
- 10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
Description
Vectorization method, vectorization device, electronic equipment and storage medium Technical Field The disclosure relates to the field of computer technology, and in particular, to a vectorization method, a vectorization device, an electronic device and a storage medium. Background Loop vectorization (Loop Vectorization) is a performance improvement technology widely adopted in a compiling optimization stage, and the core idea is that similar operations on different data elements in a loop body are executed in parallel through SIMD (Single Instruction Multiple Data ) instructions, so that processing of multiple data is completed in one instruction transmission. For example, for an array operation in which additions are made element by element in a loop, vectorization can combine multiple additions into one SIMD instruction, significantly reducing the number of instructions and improving the throughput of the processor. In the loop vectorization process, in order to ensure the correctness of parallel execution, a compiler must analyze whether a dependency relationship exists between all memory accesses in a loop, and particularly when pointer or array accesses are involved in the loop, it must be confirmed that there is no read-after-Write (WAR), write-after-Read (RAW) or write-after-write (WAW) conflict between them, which would affect the execution result. This detection of Memory access security prior to vectorization is known as Memory Check (Memory Check). Memory checking typically introduces runtime checking logic into the vectorized code to dynamically determine whether alias relationships (aliasing) exist for different memory regions in the loop iteration. If the run-time detection result shows that there is no conflict, the program can execute the vectorized version, otherwise, only the conservative scalar version can be executed. However, such memory checking introduces additional branching and condition decisions, affecting performance, and increasing instruction path uncertainty, especially when loop iteration times are small or collision detection is complex, which overhead is not negligible. FIG. 1 is a simple example of memory checking, where the program performs memory checking, entering a SIMD instruction (SIMD instructions) to improve performance if memory is normal (Y), and entering a scalar instruction (scalar instructions) to perform safer processing if memory is abnormal (N). Whether SIMD or scalar instructions, ultimately execute some underlying common instructions (common instructions), which are the basis for all operations, processing such as arithmetic operations, data transfers, etc. Furthermore, for cases where the number of loop executions is unknown or difficult to accurately derive at compile-time, a compiler will typically generate several vectorized versions, i.e., multi-vector versions, of the same loop using different vectorization factors (vectorization factor, VF). The purpose of generating a plurality of VF versions is to select the most suitable version according to the actual iteration times of loops, the data alignment condition and the hardware SIMD width during operation, so that the performance and the correctness are both considered, wherein a larger VF is beneficial to improving the data parallelism but has higher requirements on the iteration times and the alignment, and a smaller VF is more robust under the conditions of short loops or misalignment. Fig. 2 is a simple example of different VF versions, performing a round robin execution count and memory check of vf=16 version, using vf=16 version if both are normal (Y), performing a round robin execution count and memory check of vf=8 if there is a problem (N), using vf=8 version if the check of vf=8 is normal (Y), and using scalar version if the check of vf=8 is also a problem (N). To enable runtime selection, the compiler typically generates a corresponding memory check or other runtime predicate for each vectorized version, and if the predicate fails, rolls back to the scalar version or version of the smaller VF. Although this multi-VF version strategy can improve performance robustness in uncertain environments, it also presents several problems: 1. code volume increase and instruction cache pressure increase; 2. each vectorized version may repeatedly generate memory checks, resulting in runtime overhead accumulation; 3. the selection itself introduces judgment and branching during operation, which weakens the performance gain brought by vectorization. Disclosure of Invention The present disclosure provides a vectorizing method, apparatus, electronic device, and storage medium, so as to at least solve the above technical problems in the prior art. In a first aspect, an embodiment of the present disclosure provides a vectorization method, including: acquiring parameter information of each function in a program; Judging whether the function meets the specificity condition according to the parameter informati