CN-121979678-A - Fortran program parallel optimization method based on intelligent dependency analysis


Abstract

The invention discloses a Fortran program parallel optimization method based on intelligent dependency analysis. A parallel optimization system is first constructed, consisting of a loop extraction module, a loop nesting relation analysis module, a semantic analysis module, an intelligent dependency analysis engine, a variable classification module, and an instruction generation and injection module. The loop extraction module analyzes the nesting relations and variable scopes of loops; the loop nesting relation analysis module determines the nesting relations between loops; the semantic analysis module constructs a line-level data access view; the variable classification module identifies private variables and reduction variables; the intelligent dependency analysis engine performs loop type checking, I/O operation checking, and data dependency checking; and the instruction generation and injection module generates parallelization directives and inserts them into the source code to obtain the parallelized program source code. The invention addresses the problems that existing parallelization methods identify loop dependency relationships with low accuracy and low safety and cannot accurately identify parallelization errors.

Inventors

GONG CHUNYE
ZHANG LE
WANG DAYING
WANG XINYU
JIANG HUANGHUANG
GAO XIANG
FENG ZHIPENG
LIU HAN
RAN CHUNMEI
WANG QINGLIN
WANG HAOYU
AI XIN

Assignees

National University of Defense Technology of the Chinese People's Liberation Army (中国人民解放军国防科技大学)

Dates

Publication Date: 20260505
Application Date: 20260123

Claims (18)

1. A Fortran program parallel optimization method based on intelligent dependency analysis, characterized by comprising the following steps:
First, constructing a Fortran automatic parallel optimization system based on intelligent dependency analysis, the system comprising a loop extraction module, a loop nesting relation analysis module, a semantic analysis module, an intelligent dependency analysis engine, a variable classification module, and an instruction generation and injection module;
Second, the loop extraction module parses the Fortran program source code input by a user, identifies and extracts the loop structures in the Fortran program source code, sends the extracted loop positioning list to the loop nesting relation analysis module, sends the top-level loop block list to the semantic analysis module, and sends the loop metadata list to the intelligent dependency analysis engine and the variable classification module, wherein the loop positioning list stores N loop positioning triples, N being the number of loops in the Fortran program source code, and the loop metadata list stores N loop metadata six-tuples;
Third, the semantic analysis module receives the top-level loop block list from the loop extraction module, performs semantic analysis on the set of source code lines in each top-level loop block in the top-level loop block list, constructs a line-level data access view, and sends the line-level data access view to the intelligent dependency analysis engine and the variable classification module, wherein the line-level data access view comprises a left variable list, a right variable list, a left array list, a right array list, and an I/O line list; the left variable list stores LVL left string tuples, the lvl-th of which is (line number of the lvl-th left string, variable list corresponding to the lvl-th left string); the right variable list stores RVL right string tuples, the rvl-th of which is (line number of the rvl-th right string, variable list corresponding to the rvl-th right string); the left array list stores LAL left array variable tuples, the lal-th of which is (line number of the lal-th left string, array variable element list corresponding to the lal-th left string); the right array list stores RAL right array variable tuples, the ral-th of which is (line number of the ral-th right string, array variable element list corresponding to the ral-th right string); the I/O line list stores IOL elements, the iol-th of which is the line number of the iol-th I/O statement; 1 ≤ lvl ≤ LVL, 1 ≤ rvl ≤ RVL, 1 ≤ lal ≤ LAL, 1 ≤ ral ≤ RAL, and 1 ≤ iol ≤ IOL, where LVL, RVL, LAL, RAL, and IOL are the lengths of the left variable list, the right variable list, the left array list, the right array list, and the I/O line list respectively;
Fourth, the loop nesting relation analysis module receives the loop positioning list from the loop extraction module, analyzes and determines the nesting relations among the loops according to the start line number and end line number of each loop in the loop positioning list, establishes a loop hierarchy, finds for each loop in the loop positioning list the id number of its parent loop, adds the parent loop id number to every triple in the loop positioning list so that the triples become loop nesting relation quadruples, obtains a loop nesting relation list, and sends the loop nesting relation list to the intelligent dependency analysis engine;
Fifth, the variable classification module receives the loop metadata list from the loop extraction module, receives the line-level data access view from the semantic analysis module, classifies the variables of the N list elements in the loop metadata list into private variables and reduction variables, sends the reduction variable list to the intelligent dependency analysis engine, and sends the reduction variable list and the private variable list to the instruction generation and injection module;
Sixth, the intelligent dependency analysis engine receives the loop nesting relation list from the loop nesting relation analysis module, the line-level data access view from the semantic analysis module, the loop metadata list from the loop extraction module, and the reduction variable list from the variable classification module; performs a three-level parallelization check on each loop, namely a loop type check, an I/O operation check, and a data dependency check, wherein a hit in any check vetoes the loop and the subsequent checks are skipped; generates a loop dependency list; and sends the loop dependency list to the instruction generation and injection module, wherein the loop dependency list contains N loop parallelizability quintuples; if any dependency covered by the three-level parallelization check is found in the n-th loop, the n-th loop is marked as non-parallelizable, namely 'False'; otherwise the n-th loop is marked as parallelizable, namely 'True'; the number of parallelizable loops is M, M ≤ N, and 1 ≤ n ≤ N;
Seventh, the instruction generation and injection module receives the loop dependency list from the intelligent dependency analysis engine, receives the private variable list and the reduction variable list from the variable classification module, generates the OpenMP parallelization directive set best suited to the loops in the loop dependency list, injects the OpenMP parallelization directives in the directive set before the start line and after the end line of each of the M parallelizable loops in the Fortran program source code, and generates the parallelized Fortran program source code.
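The three-level check with early veto described in claim 1 can be illustrated with a minimal Python sketch. It is not the patented implementation: the `LoopMeta` layout mirrors the six-tuple of claim 3, while the function name `check_loop`, the rule that only counted DO loops pass the loop type check, and the caller-supplied `has_data_dependency` predicate are assumptions made for illustration.

```python
from typing import Callable, List, NamedTuple, Tuple


class LoopMeta(NamedTuple):
    """Loop metadata six-tuple (field names are illustrative)."""
    loop_id: int
    start: int
    end: int
    index_var: str
    loop_type: str  # "DO" or "WHILE"
    level: int


def check_loop(
    meta: LoopMeta,
    io_lines: List[int],
    has_data_dependency: Callable[[LoopMeta], bool],
) -> Tuple[bool, str]:
    """Return (parallelizable, reason); the first failing check vetoes the loop."""
    # Level 1: loop type check (assumption: DO WHILE loops are rejected).
    if meta.loop_type != "DO":
        return False, "loop type"
    # Level 2: I/O check, any I/O statement line inside the loop span.
    if any(meta.start <= line_no <= meta.end for line_no in io_lines):
        return False, "I/O"
    # Level 3: data dependency check, delegated to a hypothetical predicate.
    if has_data_dependency(meta):
        return False, "data dependency"
    return True, ""
```

Because each level returns as soon as it hits, the later and typically more expensive checks never run for a loop that has already been rejected.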
2. The method is characterized in that the loop extraction module is a Fortran code analysis tool connected to the semantic analysis module and the loop nesting relation analysis module, and is used to parse the Fortran program source code input by a user, identify and extract the loop structures in the Fortran program source code, and analyze the loop nesting relations and variable scopes to obtain a loop positioning list, a loop metadata list, and a top-level loop block list, wherein the loop positioning list stores N loop positioning triples used to locate each loop precisely in the Fortran program source code by id number, start line number, and end line number; the loop metadata list stores N loop metadata six-tuples; and the top-level loop block list stores K top-level loop block quadruples used to restore, block by block, the complete context of each top-level loop and to support subsequent semantic analysis and instruction injection, K ≤ N;
The loop nesting relation analysis module is connected to the loop extraction module and the intelligent dependency analysis engine, receives the loop positioning list from the loop extraction module, analyzes and determines the nesting relations among the loops according to the start line number and end line number of each loop in the loop positioning list, adds the parent loop id number to every triple in the loop positioning list so that the triples become loop nesting relation quadruples, places the loop nesting relation quadruples in the loop nesting relation list, and sends the loop nesting relation list to the intelligent dependency analysis engine;
The semantic analysis module is connected to the loop extraction module, the variable classification module, and the intelligent dependency analysis engine, receives the top-level loop block list from the loop extraction module, and preprocesses the set formed by the source code lines corresponding to the K top-level loop block quadruples in the top-level loop block list to obtain a preprocessed set of Fortran loop source code lines;
The variable classification module is connected to the loop extraction module, the semantic analysis module, the intelligent dependency analysis engine, and the instruction generation and injection module, receives the loop metadata list from the loop extraction module, receives the line-level data access view from the semantic analysis module, and classifies the variables in the loop metadata list according to the line-level data access view; classifying the variables of each parallelizable loop in the loop metadata list yields two kinds of information: classifying the variables of the m-th loop yields the private variables of the m-th loop, which are added as the m-th element to a private variable list, and at the same time yields the m-th reduction variable two-tuple, which is added as the m-th element to a reduction variable list; variables outside the private variable list and the reduction variable list are shared by default and are not listed separately;
The intelligent dependency analysis engine is connected to the loop extraction module, the loop nesting relation analysis module, the semantic analysis module, the variable classification module, and the instruction generation and injection module, receives the loop metadata list from the loop extraction module, receives the loop nesting relation list from the loop nesting relation analysis module, receives the line-level data access view from the semantic analysis module, receives the reduction variable list from the variable classification module, and, according to the line-level data access view, the reduction variable list, and the loop metadata list, analyzes the data dependencies among the N loops in the Fortran program source code input by the user by performing a three-level parallelization check in which a hit in any check vetoes the loop and the subsequent checks are skipped, the three levels being the loop type check, the I/O operation dependency check, and the data dependency check, so as to obtain a loop dependency list;
The instruction generation and injection module is connected to the intelligent dependency analysis engine and the variable classification module, receives the loop dependency list from the intelligent dependency analysis engine, receives the private variable list and the reduction variable list from the variable classification module, generates, according to the private variable list and the reduction variable list, the OpenMP parallelization directive set best suited to each loop marked as parallelizable in the loop dependency list, injects the OpenMP parallelization directives in the directive set before the start line and after the end line of each of the M parallelizable loops in the Fortran program source code, and generates the parallelized Fortran program source code.
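As a hedged illustration of the instruction generation and injection module, the Python sketch below builds a combined `!$omp parallel do` directive from a private variable list and a reduction variable list and places the directive pair around one loop. The function names, the choice of the combined `parallel do` form, and the (operator, variable) order of the reduction two-tuple are assumptions, not details confirmed by the claims.

```python
from typing import List, Tuple


def build_directives(
    private_vars: List[str], reductions: List[Tuple[str, str]]
) -> Tuple[str, str]:
    """Build the opening and closing OpenMP directives for one loop."""
    clauses = []
    if private_vars:
        clauses.append("private(" + ", ".join(private_vars) + ")")
    # Assumed two-tuple layout: (reduction operator, reduction variable).
    for operator, variable in reductions:
        clauses.append(f"reduction({operator}:{variable})")
    begin = " ".join(["!$omp parallel do"] + clauses)
    return begin, "!$omp end parallel do"


def inject(
    source_lines: List[str], start: int, end: int, begin: str, finish: str
) -> List[str]:
    """Insert `begin` before line `start` and `finish` after line `end` (1-based)."""
    return (
        source_lines[: start - 1]
        + [begin]
        + source_lines[start - 1 : end]
        + [finish]
        + source_lines[end:]
    )
```

When several loops are annotated in one file, applying `inject` from the last loop to the first keeps the recorded start and end line numbers of the remaining loops valid.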
3. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 2, wherein the n-th of the N loop positioning triples is (id number of the n-th loop, start line number of the n-th loop, end line number of the n-th loop); the n-th of the N loop metadata six-tuples is (id number of the n-th loop, start line number of the n-th loop, end line number of the n-th loop, loop index variable name of the n-th loop, loop type of the n-th loop, loop level of the n-th loop), wherein the loop type comprises DO loop and DO WHILE loop, and the loop level refers to the relative position and dependency of each loop within the nested loops; the k-th of the K top-level loop block quadruples is (id number of the k-th top-level loop, start line number of the k-th top-level loop, end line number of the k-th top-level loop, loop variable name of the k-th top-level loop); the n-th loop nesting relation quadruple in the loop nesting relation list is (id number of the n-th loop, start line number of the n-th loop, end line number of the n-th loop, parent loop id number of the n-th loop); variables in the mth cycle comprises id numbers of the kth cycle, the mth cycle index, the number of the mth cycle is a reduction operation variable in the order, + [ the values of the mth cycle index, the reduction values in the reduction operation variable, + ], and the reduction values in the reduction operation variable are formed by the reduction values in the mth cycle + ; the OpenMP parallelization directive set comprises directives for parallel region creation, loop parallelization, reduction, and private, wherein each OpenMP directive comprises a directive type, a directive position, directive parameters, and directive content.
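The step from positioning triples to nesting relation quadruples can be sketched in Python as follows. This is an illustrative reading, not the claimed procedure: it takes the parent of a loop to be the smallest loop whose line span strictly encloses it, and it uses -1 as a hypothetical marker for loops that have no parent.

```python
from typing import List, Tuple

Triple = Tuple[int, int, int]          # (loop id, start line, end line)
Quadruple = Tuple[int, int, int, int]  # (loop id, start line, end line, parent id)


def add_parent_ids(positions: List[Triple]) -> List[Quadruple]:
    """Extend each positioning triple with the id of its innermost enclosing loop."""
    result = []
    for loop_id, start, end in positions:
        parent_id = -1      # assumption: -1 marks a top-level loop
        parent_span = None
        for other_id, other_start, other_end in positions:
            encloses = other_start < start and end < other_end
            if other_id != loop_id and encloses:
                span = other_end - other_start
                # Keep the tightest enclosing loop as the parent.
                if parent_span is None or span < parent_span:
                    parent_id, parent_span = other_id, span
        result.append((loop_id, start, end, parent_id))
    return result
```

The quadratic scan is adequate for a sketch; a stack-based pass over loops sorted by start line would give the same parents in near-linear time.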
4. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 1, wherein in the second step the loop extraction module parses the Fortran program source code input by a user and identifies and extracts the loop structures in the Fortran program source code as follows:
Step 2.1, initializing the loop positioning list, the loop metadata list, the top-level loop block list, the loop nesting stack, and the Fortran program source code list to empty, and setting the loop id counter to 0;
Step 2.2, receiving the Fortran program source code file path input by the user, reading the entire contents of the Fortran program source code file line by line, and storing them in the Fortran program source code list, wherein the Fortran program source code list comprises NN elements, each element is one line of source code as a string, and NN is the number of source code lines in the Fortran program source code file;
Step 2.3, traversing the Fortran program source code list line by line, skipping comment lines, identifying and processing DO loop and DO WHILE loop structures, and extracting the basic information and structural features of the loops to obtain the loop positioning list, the loop metadata list, and the top-level loop block list;
Step 2.4, traversing the top-level loop block list in ascending order of the loop id counter value in its quadruples and, according to the start line number and end line number in each element of the top-level loop block list, outputting all lines from the corresponding loop start line to the loop end line in the Fortran program source code list to a new Fortran file dedicated to storing Fortran loops, using the file handle returned by the Python built-in function open, wherein the new Fortran file name is the original Fortran source program file name with a 'loop' suffix added;
Step 2.5, sending the top-level loop block list to the semantic analysis module, sending the loop positioning list to the loop nesting relation analysis module, and sending the loop metadata list to the intelligent dependency analysis engine and the variable classification module.
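Step 2.4 can be sketched in Python as below. The function name `export_top_level_loops` and the sample data are hypothetical, and an in-memory `io.StringIO` buffer stands in for the handle that `open` would return, since the exact form of the 'loop' file name is not spelled out in the claim.

```python
import io
from typing import List, TextIO, Tuple

TopBlock = Tuple[int, int, int, str]  # (loop id, start line, end line, loop variable)


def export_top_level_loops(
    source_lines: List[str], top_blocks: List[TopBlock], handle: TextIO
) -> None:
    """Write the lines of every top-level loop block, in ascending loop id order."""
    for _loop_id, start, end, _index_var in sorted(top_blocks, key=lambda b: b[0]):
        # Line numbers are 1-based and inclusive.
        for line in source_lines[start - 1 : end]:
            handle.write(line + "\n")


source_lines = [
    "program demo",
    "do i = 1, n",
    "  a(i) = b(i)",
    "end do",
    "x = 1",
    "do j = 1, m",
    "  c(j) = 0",
    "end do",
    "end program demo",
]
top_blocks = [(1, 6, 8, "j"), (0, 2, 4, "i")]
buffer = io.StringIO()  # stands in for the handle returned by open(path, "w")
export_top_level_loops(source_lines, top_blocks, buffer)
```

In a file-based run the same function would be called inside `with open(path, "w") as handle:`, with `path` chosen according to the naming rule of step 2.4.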
5. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 4, wherein step 2.3, namely traversing the Fortran program source code list line by line, skipping comment lines, identifying and processing DO loop and DO WHILE loop structures, and extracting the basic information and structural features of the loops to obtain the loop positioning list, the loop metadata list, and the top-level loop block list, comprises the following steps:
Step 2.3.1, initializing the code line number nn = 1;
Step 2.3.2, determining whether the nn-th line of source code is a comment line; if so, the nn-th line is not processed and the method goes directly to step 2.3.8;
Step 2.3.3, converting the characters of the nn-th line to lowercase and removing blanks;
Step 2.3.4, determining, by the startswith method built into Python, whether the nn-th line is a do line, namely a loop start line; if so, going to step 2.3.5; otherwise going directly to step 2.3.7;
Step 2.3.5, calling the match method built into Python to match the nn-th line against a regular expression and determining whether the nn-th line of source code conforms to the starting form 'DO variable = range'; if the match succeeds, extracting the loop variable from the nn-th line of source code, taking the length of the loop nesting stack as the loop level, setting the loop type identifier to 'DO', then pushing (start line number nn, loop variable, loop type identifier, loop level) as a metadata quadruple onto the loop nesting stack to mark entry into a new loop layer, and going to step 2.3.10;
Step 2.3.6, determining whether the nn-th line of source code is a DO WHILE start line by calling the match method built into Python to match the nn-th line against a regular expression and determining whether the nn-th line conforms to the starting form of DO WHILE; if the match succeeds, extracting the condition in the parentheses as the loop condition, taking the length of the loop nesting stack as the loop level, setting the loop type identifier to 'WHILE', then pushing (nn, loop type identifier, loop level) as a metadata triple onto the loop nesting stack to mark entry into a new loop layer, and then going to step 2.3.10; if the regular expression match fails, going to step 2.3.7;
Step 2.3.7, determining, by the startswith method built into Python, whether the nn-th line is an 'END DO' line; if the nn-th line is 'END DO', popping the DO or DO WHILE loop metadata tuple at the top of the loop nesting stack; if the popped tuple is a triple, indicating that the loop is of DO WHILE type, adding (value of the loop id counter, start line number, end line number nn) as a triple to the loop positioning list, adding (value of the loop id counter, start line number, end line number, '', loop type identifier 'WHILE', loop level) as a six-tuple to the loop metadata list, and then going to step 2.3.8; if the popped tuple is a quadruple, adding (value of the loop id counter, start line number, end line number nn) as a triple to the loop positioning list, adding (value of the loop id counter, start line number, end line number, loop index variable name, loop type identifier, loop level) as a six-tuple to the loop metadata list, and then going to step 2.3.8;
Step 2.3.8, if the loop nesting stack is empty after the pop, taking the start line number and end line number in the loop positioning list obtained in step 2.3.7 as the start and end boundaries of the top-level loop, adding (value of the loop id counter, start line number, end line number, loop index variable name) to the top-level loop block list, and going to step 2.3.9;
Step 2.3.9, incrementing the loop id counter by 1;
Step 2.3.10, letting nn = nn + 1; if 1 ≤ nn ≤ NN, going to step 2.3.2; if nn > NN, indicating that all NN lines of source code in the Fortran program source code list have been processed, obtaining the loop positioning list, the loop metadata list, and the top-level loop block list, and going to step 2.4.
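The stack-based traversal of step 2.3 can be condensed into a short Python sketch. It follows the claimed tuple layouts (a quadruple for counted DO loops, a triple for DO WHILE loops) but simplifies the control flow; the regular expressions, the use of `strip` for blank removal, and the acceptance of `enddo` as a closing form are assumptions rather than details from the claim.

```python
import re
from typing import List, Tuple

DO_RE = re.compile(r"do\s+(\w+)\s*=")            # counted loop: "do i = 1, n"
WHILE_RE = re.compile(r"do\s*while\s*\((.*)\)")  # "do while (condition)"


def extract_loops(source_lines: List[str]) -> Tuple[list, list, list]:
    """Return (loop positioning list, loop metadata list, top-level loop block list)."""
    positions, metadata, top_blocks = [], [], []
    stack, loop_id = [], 0
    for nn, raw in enumerate(source_lines, start=1):
        line = raw.strip().lower()
        if line.startswith("!"):  # full-line comment
            continue
        if line.startswith("do"):
            counted = DO_RE.match(line)
            if counted:
                stack.append((nn, counted.group(1), "DO", len(stack)))
                continue
            if WHILE_RE.match(line):
                stack.append((nn, "WHILE", len(stack)))
                continue
        if line.startswith(("end do", "enddo")) and stack:
            entry = stack.pop()
            if len(entry) == 3:  # a triple marks a DO WHILE loop
                start, kind, level = entry
                index_var = ""
            else:
                start, index_var, kind, level = entry
            positions.append((loop_id, start, nn))
            metadata.append((loop_id, start, nn, index_var, kind, level))
            if not stack:  # an empty stack means a top-level loop just closed
                top_blocks.append((loop_id, start, nn, index_var))
            loop_id += 1
    return positions, metadata, top_blocks
```

Because ids are assigned when a loop closes, inner loops receive smaller ids than the loops that enclose them. Labeled DO loops such as `do 10 i = 1, n` are not recognized by this sketch.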
6. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 1, wherein in the third step the semantic analysis module receives the top-level loop block list from the loop extraction module, performs semantic analysis on the set of source code lines in each top-level loop block in the top-level loop block list, and constructs the line-level data access view as follows:
Step 3.1, traversing the top-level loop block list and, according to the start line number and end line number in each element of the top-level loop block list, preprocessing line by line every line of code between the start line number and the end line number to obtain a source loop line list, as follows:
Step 3.1.1, letting the top-level loop block list index k = 1, initializing the source loop line list to empty, and letting the source loop line list index sn = 0;
Step 3.1.2, letting the in-block line index kindex of the k-th top-level loop block equal the start line number in the k-th element of the top-level loop block list;
Step 3.1.3, removing blanks from the kindex-th line of the Fortran source program and converting it to lowercase to obtain the preprocessed kindex-th source loop line string;
Step 3.1.4, adding (line index kindex, preprocessed kindex-th source loop line string) as a source loop line two-tuple to the source loop line list, and letting sn = sn + 1;
Step 3.1.5, letting kindex = kindex + 1; if kindex is less than or equal to the end line number in the k-th element of the top-level loop block list, going to step 3.1.3; if kindex is greater than the end line number in the k-th element of the top-level loop block list, going to step 3.1.6;
Step 3.1.6, if k ≤ K, going to step 3.1.2; otherwise k > K, letting the total number of source loop lines SN = sn, obtaining a source loop line list containing SN source loop line two-tuples, and going to step 3.2;
Step 3.2, removing the end-of-line comments in the source loop line list to obtain the source loop line list with end-of-line comments removed;
Step 3.3, performing deep semantic analysis of variables and arrays on the source loop line list with end-of-line comments removed, and constructing the line-level data access view, as follows:
Step 3.3.1, initializing the source loop line list index sn = 1, and setting the I/O line list, the left variable list, the left array list, the right variable list, and the right array list to empty;
Step 3.3.2, replacing multiple consecutive spaces in the comment-stripped source loop line string of the sn-th element of the source loop line list by using the sub method built into Python, and going to step 3.3.3;
Step 3.3.3, converting the characters in the source loop line string of the sn-th element to lowercase and removing blanks;
Step 3.3.4, if the source loop line of the sn-th element is a line that does not need processing, going to step 3.3.5, wherein lines that do not need processing comprise DO WHILE statements, conditional statements, loop control statements, program control statements, and I/O statements; if the sn-th line is a line that needs processing, going to step 3.3.6;
Step 3.3.5, if the source loop line of the sn-th element is an I/O statement, adding the code line index sn to the I/O line list and going to step 3.4; otherwise, indicating that the source loop line of the sn-th element is a line other than an I/O statement that does not need processing, going directly to step 3.4;
Step 3.3.6, extracting the variable and array information on the left and right sides of the '=' separator of the assignment statement in the source loop line of the sn-th element to obtain the left variable list, the right variable list, the left array list, the right array list, and the I/O line list;
Step 3.4, if sn ≤ SN, going to step 3.3.2; if sn > SN, obtaining the line-level data access view and sending the line-level data access view to the intelligent dependency analysis engine and the variable classification module, wherein the line-level data access view comprises the left variable list, the right variable list, the left array list, the right array list, and the I/O line list.
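A much-reduced Python sketch of step 3.3 is given below to show the shape of the line-level data access view. It is a simplification under stated assumptions: it records only the left variable list, right variable list, and I/O line list, it treats array names and index names alike, and the keyword patterns `IO_RE` and `SKIP_RE` are illustrative stand-ins for the statement categories named in step 3.3.4.

```python
import re
from typing import Dict, List, Tuple

IO_RE = re.compile(r"(read|write|print|open|close)\b")
SKIP_RE = re.compile(r"(do|end|if|else|exit|cycle|return|stop)\b")
# An identifier must not continue a number such as 1.0e-3 or an operator like .and.
IDENT_RE = re.compile(r"(?<![\w.])[a-z_]\w*")


def build_access_view(source_loop_lines: List[Tuple[int, str]]) -> Dict[str, list]:
    """Build a reduced access view from (line number, text) two-tuples."""
    view = {"left_vars": [], "right_vars": [], "io_lines": []}
    for line_no, text in source_loop_lines:
        # Collapse whitespace runs to one space (an assumption), then normalize case.
        text = re.sub(r"\s+", " ", text).strip().lower()
        if IO_RE.match(text):
            view["io_lines"].append(line_no)
            continue
        if SKIP_RE.match(text) or "=" not in text:
            continue
        left, right = text.split("=", 1)
        view["left_vars"].append((line_no, IDENT_RE.findall(left)))
        view["right_vars"].append((line_no, IDENT_RE.findall(right)))
    return view
```

For the line `t = a(i) * 2.0` the sketch records `t` on the left and `a`, `i` on the right; separating array elements from scalar variables is the job of the bracket-pairing procedure in claim 8 and is deliberately left out here.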
7. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 6, wherein in step 3.2 the end-of-line comments are removed from the source loop line list, to obtain the source loop line list with end-of-line comments removed, as follows:
Step 3.2.1, letting the source loop line list index sn = 1;
Step 3.2.2, splitting the source loop line string in the sn-th element of the source loop line list by using the split method built into Python with '!' as the separator to obtain the sn-th new string list;
Step 3.2.3, if the length of the sn-th new string list is greater than 1, replacing the sn-th source loop line string with the first element of the sn-th new string list and going to step 3.2.4; otherwise, the length of the sn-th new string list being less than or equal to 1, going directly to step 3.2.4;
Step 3.2.4, if sn ≤ SN, going to step 3.2.3; if sn > SN, indicating that end-of-line comment processing of the source loop line list is finished, obtaining the source loop line list with end-of-line comments removed, and going to step 3.3.
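The comment-stripping step of claim 7 amounts to a few lines of Python. The sketch below assumes that the separator is the Fortran comment character `!` and that the list holds (line number, text) two-tuples; the function name is hypothetical.

```python
from typing import List, Tuple


def strip_trailing_comments(
    source_loop_lines: List[Tuple[int, str]]
) -> List[Tuple[int, str]]:
    """Drop everything from the first '!' onward in each source loop line."""
    cleaned = []
    for line_no, text in source_loop_lines:
        parts = text.split("!")
        if len(parts) > 1:
            # Keep only the code that precedes the comment marker.
            text = parts[0]
        cleaned.append((line_no, text))
    return cleaned
```

A plain split cannot tell a comment marker from a `!` inside a character literal, so a line such as `print *, 'hi!'` is truncated as well; a production tool would need to track quote state before splitting.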
8. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 6, wherein the method for extracting variables and array information on left and right sides of "=" delimiters of source cyclic row assignment statement of the sn element in step 3.3.6 is: In step 3.3.6.1, if the source circulation line of the sn-th element contains a value symbol "=", dividing the source circulation line of the sn-th element into a left section and a right section by using "=" as a separator to obtain a left character string and a right character string, and making the left character string length be LSN and the right character string length be RSN, turning to step 3.3.6.2, otherwise turning to step 3.4; step 3.3.6.2, performing bracket pairing on the left character string to obtain a bracket position list, wherein the method comprises the following steps: Step 3.3.6.2.1, initializing the left string index lsn to 1, leaving the bracket matching stack and bracket position list empty, and the elements in the bracket position list are triples (the strings between the bracket start index, the bracket end index, and the bracket start index to the end index); Step 3.3.6.2.2, judging whether the character with the index of lsn is '('), pressing lsn into a bracket matching stack, turning to step 3.3.6.2.4, otherwise turning to step 3.3.6.2.3; step 3.3.6.2.3 if the index lsn is' and the bracket matching stack is not empty, popping the top element of the bracket matching stack, taking the sub-string from the top element of the bracket matching stack to the index lsn in the index lsn and the left string as a triplet, and storing the triplet in the bracket position list, turning to step 3.3.6.2.4; Step 3.3.6.2.4, let LSN = LSN +1, if LSN is less than or equal to LSN, go to step 3.3.6.2.2, if LSN is greater than LSN, indicate that a bracket position list is obtained, let the bracket position list length be PL, go to step 3.3.6.3; Step 3.3.6.3 adopts an array 
variable identification method to traverse the bracket position list, filters the function call to identify the array variable, and obtains an array variable element list, wherein the method comprises the following steps: Step 3.3.6.3.1 initializes the bracket position list index pl to 0, the array variable element list is made empty, the first operator list is [ + "," - ","/"], and the first operator list length is made L1; Step 3.3.6.3.2, taking out the first value in the triplet element with the index pi in the bracket position list, namely, the bracket starting index, if the first value is 0, turning to step 3.3.6.3.7, otherwise turning to step 3.3.6.3.3; Step 3.3.6.3.3, if the previous character of the first value is an operator, describing that the first value is not an array variable, turning to step 3.3.6.3.7, otherwise turning to step 3.3.6.3.4; step 3.3.6.3.4, removing spaces from the substrings from the index 1 to the index of the first value in the left character string to obtain a left bracket truncated character string; step 3.3.6.3.5, performing operator segmentation on the left bracket truncated character string by adopting an operator segmentation method to obtain a to-be-matched search list, wherein the method comprises the following steps: Step 3.3.6.3.5.1, initializing a to-be-matched search list to be empty, and enabling an index L1 of an operator list L1 to be 1; Step 3.3.6.3.5.2, if the left bracket cut string contains an operator element with index L1 in the operator list L1, then using the operator element as a segmenter, using a split function to segment the left bracket cut string to obtain an operator segmentation list L, and adding all the elements in the operator segmentation list L into the to-be-matched search list, turning to step 3.3.6.3.5.4; Step 3.3.6.3.5.3, adding left bracket truncated character strings on the left side into a to-be-matched searching list; step 3.3.6.3.5.4, let l1=l1+1, if 1 is less than or equal to l1 and less 
than or equal to L1, go to step 3.3.6.3.5.2; otherwise l1 > L1, the to-be-matched search list has been obtained, go to step 3.3.6.3.6; Step 3.3.6.3.6, searching for the array variable name in the to-be-matched search list by using the findall method built into Python, and obtaining the matched array variable name; Step 3.3.6.3.7, performing a keyword check on the array variable name obtained in step 3.3.6.3.6; if the array variable name is a keyword, turning to step 3.3.6.3.9, otherwise turning to step 3.3.6.3.8; Step 3.3.6.3.8, storing, in the array variable element list, the starting index of the array variable name, the second value of the triplet element with index pl, and the array variable element character string formed by splicing the array variable name with the third value of the triplet element with index pl, wherein the starting index of the array variable name is the first value of the triplet element with index pl minus the length of the variable name; Step 3.3.6.3.9, let pl=pl+1; if pl ≤ PL, go to step 3.3.6.3.2; if pl > PL, the array variable element list has been obtained, go to step 3.3.6.4; Step 3.3.6.4, performing variable extraction on the left character string by adopting a variable extraction method to obtain a variable list with the Fortran built-in functions removed, wherein the variable extraction method comprises the following steps: Step 3.3.6.4.1, initializing the reduced array list, the first variable candidate list, the second variable candidate list and the variable list to be empty, and initializing the second operator list to ["+", "-", "*", "/", "(", ")", ","], the length of the second operator list being L2;
Step 3.3.6.4.2 deals with the indirect memory access problem that the elements in the array variable element list may have, and the method is: Step 3.3.6.4.2.1, if the length of the array variable element list is greater than 0, turning to step 3.3.6.4.2.2, and performing outermost array variable name extraction processing on the elements with array variable elements in the inner layer in the array variable element list, otherwise turning to step 3.3.6.4.4; Step 3.3.6.4.2.2, extracting the outermost layer array variable name of the element with the array variable element in the inner layer in the array variable element list: Step 3.3.6.4.2.2.1 makes the array variable element list index al=1; step 3.3.6.4.2.2.2, initializing an indirect access flag to False, and making the comparison index aj=al+1; Step 3.3.6.4.2.2.3, if the starting position of the ith array variable element in the array variable element list is not less than the starting position of the aj-th array variable element, and the ending position of the ith array variable element is not less than the ending position of the aj-th array variable element, indicating that the ith array variable element is contained in the aj-th array variable element, setting an indirect access flag as True, turning to step 3.3.6.4.2.2.4, otherwise, directly turning to step 3.3.6.4.2.2.4; Step 3.3.6.4.2.2.4 let aj=aj+1, if aj is less than or equal to AL, go to step 3.3.6.4.2.2.3, if aj > AL, go to step 3.3.6.4.2.2.5; If the indirect access flag is False, step 3.3.6.4.2.2.5 adds the variable element of the ith array to the reduced array list to make the length of the reduced array list be SAL, and go to step 3.3.6.4.2.2.6, otherwise go to step 3.3.6.4.2.2.6 directly; Step 3.3.6.4.2.2.6, let al=al+1, if AL is less than or equal to AL, go to step 3.3.6.4.2.2.2, if AL > AL, describe processing the left string to get a reduced array list, go to step 3.3.6.4.3; Step 3.3.6.4.3, obtaining a character string containing variables 
according to the reduced array list, wherein the method comprises the following steps: step 3.3.6.4.3.1 goes to step 3.3.6.4.3.2 if SAL > 0, otherwise sal=0 goes to step 3.3.6.4.4; Step 3.3.6.4.3.2, adding the substrings between the left character string index of 1 and the starting index of the first element bracket in the reduced array list into a first variable candidate list; step 3.3.6.4.3.3 lets the reduced array list index sal=2; step 3.3.6.4.3.4, adding a substring with a starting index of a second value of a previous element of the sal reduced array list element and a finishing index of the first value of the sal reduced array list element in the left string to the first variable candidate list, so that the length of the first variable candidate list is VL1; Step 3.3.6.4.3.5, let sal=sal+1, if SAL is less than or equal to SAL, go to step 3.3.6.4.3.4, if SAL > SAL, show that the reduced array list is processed completely, get the first variable candidate list, go to step 3.3.6.4.3.6; If the length LSN of the left string is greater than the bracket end index of the last element of the reduced array list, step 3.3.6.4.3.6 adds the substring with the start index of the left string being the bracket end index of the last element and the end index being the length of the left string to the first variable candidate list, and goes to step 3.3.6.4.4, otherwise goes directly to step 3.3.6.4.4; Step 3.3.6.4.4 performs operator segmentation on the first variable candidate list by: step 3.3.6.4.4.1, if the length of the first variable candidate list is greater than 0, turning to step 3.3.6.4.4.2, otherwise turning to step 3.3.6.4.4.6; step 3.3.6.4.4.2 makes the first variable candidate list index vl1=1; step 3.3.6.4.4.3, removing the head-tail blank character from the v1 element in the first variable candidate list; Step 3.3.6.4.4.4, performing operator segmentation on the v1 element in the first variable candidate list by adopting the operator segmentation method in 
step 3.3.6.3.5 to obtain a second to-be-matched search list, adding non-blank character string list elements in the second to-be-matched search list into the second variable candidate list one by one after removing the spaces before and after removing the spaces, wherein the length of the second variable candidate list is VL2; Step 3.3.6.4.4.5, let vl1=vl1+1, if vl1 is less than or equal to VL1, go to step 3.3.6.4.4.3, if vl1 > VL1, indicate that the processing of the first variable candidate list is completed, obtain a second variable candidate list, go to step 3.3.6.4.4.6; Step 3.3.6.4.4.6, after the head and tail blank characters of the left character string are removed, performing operator segmentation on the left character string of the source circulation loop of the sn-th element in the source circulation loop list by adopting the operator segmentation method in step 3.3.6.3.5 to obtain an unprocessed third variable candidate list, and adding non-blank character string list elements in the unprocessed third variable candidate list into the third variable candidate list one by one after removing the front and rear blank spaces; Step 3.3.6.4.5, removing constants, repeated elements and built-in functions of the system from the second variable candidate list to obtain a variable list from which the constants, repeated elements and built-in functions are removed; Step 3.3.6.5, writing a row number of the sn element and an array variable element list corresponding to a source circulation "=" left character string of the sn element in the source circulation list into a left array list in a form of a binary group, and writing a row number of the sn element and a variable list corresponding to a source circulation "=" left character string of the sn element into a left variable list in a form of a binary group, wherein the size of the left array list is LAL, and the size of the left variable list is LVL; Step 3.3.6.6, traversing the right character string by adopting 
the array variable identification method in step 3.3.6.3, filtering the function call, and identifying the array variable to obtain an array variable element list corresponding to the sn-th row "=" right character string, wherein the array variable element list is AR in size; performing variable extraction on the character string on the right side of the source circulation loop "=" of the sn-th element by adopting the variable extraction method in step 3.3.6.4 to obtain a variable list of which the built-in function is removed, wherein the variable list of which the size is VR, corresponds to the character string on the right side of the "=" in the source circulation loop of the sn-th element; And 3.3.6.7, recording an array variable element list corresponding to a 'right character string' in a source circulation line number of the sn element and a source circulation line of the sn element in a form of a binary group into a right array list, and recording a variable list corresponding to a line number of the sn element and a 'right character string' in a source circulation line of the sn element except for a built-in function in a form of a binary group into a right variable list, wherein the size of the right array list is RAL, and the size of the right variable list is RVL.
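The array identification and indirect-access filtering of steps 3.3.6.3 and 3.3.6.4.2 can be illustrated with a short Python sketch. It is not the claimed procedure: it swaps the bracket position list and per-operator splitting for a stack-based bracket matcher, and the names `NON_ARRAY_NAMES`, `find_array_refs` and `outermost_refs` are hypothetical. The set of names to filter out is an assumed, illustrative subset of Fortran intrinsics and keywords, and user-defined function calls are not distinguished from arrays here.

```python
import re

# Assumption: illustrative subset of names that are followed by "(" but are not arrays.
NON_ARRAY_NAMES = {"sqrt", "abs", "max", "min", "mod", "exp", "log", "sin", "cos", "if"}

def find_array_refs(expr):
    """Return sorted (name_start, close_index, text) triples for name(...) references."""
    refs, stack = [], []
    for pos, ch in enumerate(expr):
        if ch == "(":
            stack.append(pos)
        elif ch == ")" and stack:
            open_pos = stack.pop()
            # An identifier must sit directly before the opening bracket;
            # a bracket preceded by an operator is plain grouping.
            m = re.search(r"([A-Za-z_]\w*)\s*$", expr[:open_pos])
            if m and m.group(1).lower() not in NON_ARRAY_NAMES:
                refs.append((m.start(1), pos, expr[m.start(1):pos + 1]))
    return sorted(refs)

def outermost_refs(refs):
    """Drop references nested inside another reference (indirect access such as a(idx(i)))."""
    kept = []
    for i, (start, end, text) in enumerate(refs):
        nested = any(
            j != i and o_start <= start and end <= o_end
            for j, (o_start, o_end, _) in enumerate(refs)
        )
        if not nested:
            kept.append((start, end, text))
    return kept

refs = find_array_refs("a(idx(i)) + sqrt(b(j))")
print([r[2] for r in refs])
print([r[2] for r in outermost_refs(refs)])
```

The first call lists every array reference, including the inner index array; the second keeps only the outermost references, which corresponds to the role of the reduced array list.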
9. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 1, wherein in the fourth step, the loop nesting relationship analysis module receives a loop positioning list from the loop extraction module, performs nesting relationship analysis on loops in the loop positioning list, and finds id numbers of loops of a previous stage in the loop positioning list for each loop in the loop positioning list, the method is as follows: Step 4.1, initializing a present unprocessed cycle flag huL to True; Step 4.2, traversing each triplet element in the cyclic positioning list, and adding an integer value as an initial parent cyclic id to each triplet element to form a quadruple, so as to obtain an initialized cyclic nesting relationship list, wherein the integer value is-1000, which indicates that the parent cyclic id is not found; And 4.3, updating parent cycle ids of all elements of the cyclic nested relation list by adopting a cyclic nested relation analysis method on the cyclic nested relation list, and sending the updated cyclic nested relation list to an intelligent dependency analysis engine, wherein the method comprises the following steps of: Step 4.3.1, making huL be False, making the cycle id of the minimum initial line number of the untreated cycle be-1, the cycle id of the maximum end line number of the untreated cycle be-1, the minimum initial line number of the untreated cycle be an initial boundary value 100000000, the maximum end line number of the untreated cycle be an initial boundary value-100, making the nearest cycle distance of the upper boundary be 100000000, the nearest cycle distance of the lower boundary be 100000000, the nearest cycle id of the upper boundary be-100 and the nearest cycle id of the lower boundary be-100; Step 4.3.2, traversing a loop nesting relation list, and determining an untreated loop minimum starting line number loop id, an untreated loop maximum ending line number loop id, an untreated 
loop minimum starting line number and an untreated loop maximum ending line number for loops marked as-1000, i.e. parent loops are not found; Step 4.3.3, if the cycle id of the minimum initial line number of the untreated cycle and the cycle id of the maximum end line number of the untreated cycle are not-1, making huL =true, turning to step 4.3.4, otherwise turning to step 4.3.6; step 4.3.4, traversing a loop nesting relationship list, and determining an upper boundary nearest loop distance, a lower boundary nearest loop distance, an upper boundary nearest loop id and a lower boundary nearest loop id for a currently processed loop; step 4.3.5, updating a parent loop id in the loop nesting relationship list element according to the upper boundary nearest loop id and the lower boundary nearest loop id, wherein the method comprises the following steps: Step 4.3.5.1, if the upper boundary nearest cycle id is-100, indicating that the cycle corresponding to the cycle id of the minimum initial line number of the unprocessed cycle is the topmost cycle, assigning the parent cycle id of the cycle element indexed as the cycle id of the minimum initial line number of the unprocessed cycle in the cycle nesting relation list to be-1, turning to step 4.3.5.2, otherwise, indicating that the upper boundary nearest cycle id is the parent cycle id of the minimum initial line number of the unprocessed cycle, assigning the upper boundary nearest cycle id to the parent cycle id of the cycle element indexed as the cycle id of the minimum initial line number of the unprocessed cycle in the cycle nesting relation list, turning to step 4.3.5.2; Step 4.3.5.2, if the latest cycle id of the lower boundary is-100, indicating that the cycle corresponding to the cycle id of the maximum ending line number of the unprocessed cycle is the topmost cycle, assigning the parent cycle id of the element indexed as the cycle id of the maximum ending line number of the unprocessed cycle in the cycle 
nesting relation list to be -1, turning to step 4.3.6; otherwise, the lower boundary nearest cycle id is the parent cycle id of the cycle having the maximum ending line number among the unprocessed cycles, assigning the lower boundary nearest cycle id to the parent cycle id of the cycle element indexed as the cycle id of the maximum ending line number of the unprocessed cycle in the cycle nesting relation list, and turning to step 4.3.6; Step 4.3.6, if huL is True, the loop nesting relationship list still contains loops whose parent loop has not been found, turning to step 4.3.1; otherwise huL is False, which indicates that the parent loops of all loops in the loop nesting relationship list have been found, turning to step 4.4; Step 4.4, sending the loop nesting relationship list to the intelligent dependency analysis engine.
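The outcome of claim 9, a parent id for every loop with -1 marking a top-level loop, can be sketched as follows. This is a simplified stand-in, not the iterative boundary-scanning procedure of steps 4.3.1 to 4.3.6: it assumes loops are properly nested and picks, for each loop, the tightest strictly enclosing loop. The function name `find_parents` and the `(loop_id, start_line, end_line)` tuple layout are assumptions made for illustration.

```python
def find_parents(loops):
    """Map each loop id to the id of its tightest enclosing loop, or -1 if none.

    loops: list of (loop_id, start_line, end_line) triples (assumed layout).
    """
    parents = {}
    for loop_id, start, end in loops:
        best = None  # (parent_id, parent_start) of the closest enclosing loop so far
        for other_id, o_start, o_end in loops:
            encloses = o_start < start and end < o_end
            if encloses and (best is None or o_start > best[1]):
                best = (other_id, o_start)
        parents[loop_id] = best[0] if best else -1
    return parents

loops = [(0, 1, 20), (1, 3, 10), (2, 5, 8), (3, 12, 18), (4, 30, 40)]
print(find_parents(loops))
```

Among all enclosing loops, the one with the largest start line is the nearest ancestor, which plays the same role as the "upper boundary nearest" distance in the claim.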
10. The intelligent dependency analysis based Fortran program parallel optimization method of claim 9, wherein the method in step 4.3.2 of traversing the loop nesting relationship list and determining, for the loops marked -1000, i.e. the loops whose parent loop has not been found, the unprocessed loop minimum starting line number loop id, the unprocessed loop maximum ending line number loop id, the unprocessed loop minimum starting line number and the unprocessed loop maximum ending line number is as follows: Step 4.3.2.1, let the loop nesting relationship list index n=1; Step 4.3.2.2, if the parent loop id of the nth loop is -1000, turning to step 4.3.2.3, otherwise turning to step 4.3.2.5; Step 4.3.2.3, if the starting line number of the nth loop is smaller than the unprocessed loop minimum starting line number, assigning the starting line number of the nth loop to the unprocessed loop minimum starting line number, assigning the index n to the unprocessed loop minimum starting line number loop id, and turning to step 4.3.2.4; otherwise turning directly to step 4.3.2.4; Step 4.3.2.4, if the ending line number of the nth loop is greater than the unprocessed loop maximum ending line number, assigning the ending line number of the nth loop to the unprocessed loop maximum ending line number, assigning the index n to the unprocessed loop maximum ending line number loop id, and turning to step 4.3.2.5; otherwise turning directly to step 4.3.2.5; Step 4.3.2.5, let n=n+1; if n ≤ N, turning to step 4.3.2.2; if n > N, turning to step 4.3.3.
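A minimal Python sketch of the scan in steps 4.3.2.1 to 4.3.2.5 is given below. The sentinel values (-1000, 100000000, -100, -1) are taken from claim 9; the function name `scan_unprocessed`, the 4-tuple layout `(loop_id, start_line, end_line, parent_id)` and the use of 0-based list indices instead of the 1-based index n are assumptions.

```python
UNPROCESSED = -1000  # parent id sentinel meaning "parent loop not found yet"

def scan_unprocessed(nest_list):
    """Return (min_start_idx, min_start, max_end_idx, max_end) over unprocessed loops.

    nest_list: list of (loop_id, start_line, end_line, parent_id) tuples (assumed layout).
    The indices stay at -1 when no unprocessed loop remains.
    """
    min_start, min_start_idx = 100000000, -1
    max_end, max_end_idx = -100, -1
    for idx, (_loop_id, start, end, parent) in enumerate(nest_list):
        if parent != UNPROCESSED:
            continue  # already has a parent assigned
        if start < min_start:
            min_start, min_start_idx = start, idx
        if end > max_end:
            max_end, max_end_idx = end, idx
    return min_start_idx, min_start, max_end_idx, max_end

nest = [(0, 1, 20, -1), (1, 3, 10, UNPROCESSED), (2, 5, 8, UNPROCESSED), (3, 12, 18, UNPROCESSED)]
print(scan_unprocessed(nest))
```

When both returned indices are -1, every loop has been processed, which corresponds to the exit condition tested in step 4.3.3.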
11. The intelligent dependency analysis based Fortran program parallel optimization method of claim 9, wherein the method in step 4.3.4 of traversing the loop nesting relationship list and determining the upper boundary nearest loop distance, the lower boundary nearest loop distance, the upper boundary nearest loop id and the lower boundary nearest loop id for the currently processed loop is as follows: Step 4.3.4.1, let the index n=1; Step 4.3.4.2, if the parent loop id of the nth loop is not -1000, indicating that the nth loop has been processed, turning to step 4.3.4.3, otherwise turning to step 4.3.4.5; Step 4.3.4.3, if the starting line number of the nth loop is smaller than the unprocessed loop minimum starting line number, the absolute value of the difference between the starting line number of the nth loop and the unprocessed loop minimum starting line number is smaller than the upper boundary nearest loop distance, and the ending line number of the nth loop is larger than the unprocessed loop maximum ending line number, updating the upper boundary nearest loop distance to the absolute value of the difference between the starting line number of the nth loop and the unprocessed loop minimum starting line number, updating the upper boundary nearest loop id to the loop nesting relationship list index n, and turning to step 4.3.4.4; otherwise turning directly to step 4.3.4.4; Step 4.3.4.4, if the ending line number of the nth loop is smaller than the unprocessed loop maximum ending line number, the absolute value of the difference between the ending line number of the nth loop and the unprocessed loop maximum ending line number is smaller than the lower boundary nearest loop distance, and the starting line number of the nth loop is larger than the unprocessed loop minimum starting line number, updating the lower boundary nearest loop distance to the absolute value of the difference between the ending line number of the nth loop and the unprocessed loop maximum ending line number, updating the lower boundary nearest loop id to the loop nesting relationship list index n, and turning to step 4.3.4.5; otherwise turning directly to step 4.3.4.5; Step 4.3.4.5, let n=n+1; if n ≤ N, go to step 4.3.4.2; if n > N, the parent loop id of the currently processed loop has been found, go to step 4.3.5.
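The upper-boundary search of step 4.3.4.3 can be sketched in Python as below. Only that one branch is shown; the lower-boundary branch of step 4.3.4.4 is omitted. The sentinels 100000000 and -100 come from claim 9, while the function name `nearest_upper_boundary`, the tuple layout and the 0-based indices are assumptions.

```python
UNPROCESSED = -1000  # parent id sentinel meaning "parent loop not found yet"

def nearest_upper_boundary(nest_list, min_start, max_end):
    """Find the processed loop that encloses [min_start, max_end] with the closest start line.

    nest_list: list of (loop_id, start_line, end_line, parent_id) tuples (assumed layout).
    Returns (nearest_idx, nearest_distance); nearest_idx is -100 if nothing encloses the range.
    """
    nearest_dist, nearest_idx = 100000000, -100
    for idx, (_loop_id, start, end, parent) in enumerate(nest_list):
        if parent == UNPROCESSED:
            continue  # only loops that already have a parent are candidates
        dist = abs(start - min_start)
        if start < min_start and dist < nearest_dist and end > max_end:
            nearest_dist, nearest_idx = dist, idx
    return nearest_idx, nearest_dist

nest = [(0, 1, 20, -1), (1, 3, 10, 0), (2, 5, 8, UNPROCESSED)]
print(nearest_upper_boundary(nest, 5, 8))
```

A result of -100 for the index means no processed loop encloses the range, which step 4.3.5.1 interprets as a top-level loop.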
12. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 1, wherein the variable classification module receives the circular metadata list from the circular extraction module, receives the line-level data access view from the semantic analysis module, performs variable classification on N list elements in the circular metadata list, and classifies the N list elements into private variables and reduced variables according to the following steps: Step 5.1, initializing a private variable list and a reduction variable list to be empty, and respectively adding N empty lists into the private variable list and the reduction variable list; Step 5.2 let index n=1; step 5.3, the temporary private variable list and the temporary reduction variable list are made to be empty; Step 5.4, adding the nth cycle in the cycle metadata list and the cycle index variable name between the starting line number and the ending line number of the nth cycle into the temporary private variable list; step 5.5, classifying variables of the nth cycle in the cycle metadata list, wherein the method comprises the following steps: Step 5.5.1, let left variable list index lvl=1, let right variable list index rvl =1; Step 5.5.2, if the line number of the lvl-th left variable list element in the left variable list is not between the loop start line number and the loop end line number of the nth list element in the loop metadata list, turning to step 5.5.11, otherwise turning to step 5.5.3; Step 5.5.3, initializing a reduction variable existence flag as False, and initializing a classification left variable as a first element of a lvl-th left variable list element in a left variable list; step 5.5.4, enabling the rvl th right variable list index vr=1 in the right variable list; Step 5.5.5 if the classified left variable is the same as the vr right variable in the rvl th right variable list in the right variable list, it is possible to classify the left variable 
into a reduced variable, turn to 5.5.6, otherwise turn to 5.5.9; step 5.5.6, initializing a reduction operator, wherein the character string on the right of the equal sign is an empty character string, and the reduction mark is False; Step 5.5.7 extracts the equal number right character string of the corresponding row of the classified left variable in the Fortran source program, performs normalization processing, constructs a classified left variable regular pattern with non-word boundary limitation, sequentially performs extremum function structure matching comprising the classified left variable, left binary operation structure matching starting with the classified left variable and an operator, and right binary operation structure matching ending with the operator and the classified left variable on the right character string by using the pattern, and eliminates the situation of compound operators by checking adjacent characters of the operator, and sets a reduction mark as true and extracts a corresponding reduction operator when any matching is successful, and the method is as follows: Step 5.5.7.1, aiming at the line number appointed by the lvl-th element in the left variable list, acquiring a code line corresponding to the lvl-th element in the Fortran source code, positioning the index position of the "=" character in the line by using a find function built in Python, marking the position as eq_pos, extracting a substring on the right side of the equal number of the line appointed by the lvl-th left variable list element in the Fortran source program, removing the head and tail blank characters from the substring, converting the substring into a lowercase form, and finally assigning the substring to the right character string of the equal number; step 5.5.7.2, performing regular escape by using an escape function built in Python and taking the classified left variable as a basis to form a classified left variable mode fragment; step 5.5.7.3 adds "non-word 
boundary" definitions before and after the classification left variable pattern fragment, respectively, constructing a classification left variable boundary regular formula v, v being "(; In step 5.5.7.4, the right character string of the equivalent number is subjected to min or max reduction form judgment, and the method is as follows: Step 5.5.7.4.1 uses a match function built in Python to match the right character string with equal sign using the regular expression "\s (max|min) \s\ ((.+ -.) \s\$)", if matching is successful, the returned object is the max or min object after matching is recorded to be successful, and then the step 5.5.7.4.2 is turned to step 5.5.7.5; Step 5.5.7.4.2, using a search function built in Python, using a classified left variable boundary regular pattern as a pattern string, using a second capturing group in the max or min object as a searched character string, and matching, if matching is successful, turning to step 5.5.7.4.3, otherwise turning to step 5.5.7.5; Step 5.5.7.4.3 sets the reduction flag to True and assigns a first capture group in the max or min object to the reduction operator; step 5.5.7.5, performing binary operation on the right string of the equal sign to reduce left judgment, and judging whether the right string of the equal sign is shaped as 's+ expr', wherein s is a classified left variable, expr is a right expression positioned behind an operator, and the method is as follows: Step 5.5.7.5.1, using a match function built in Python, using a regular expression "\s { v } \s (++ \/])" to match the right character string of the equal sign, if the matching is successful, recording the returned object after the matching is successful as a binary operation left judgment object, turning to step 5.5.7.5.2; Step 5.5.7.5.2, marking the first capturing group of the binary operation left judging object as an operator capturing group, obtaining the ending index of the operator capturing group in the string on the right of the 
equal sign by using the end method of the binary operation left judgment object, marking the ending index as opend, and slicing the right character string of the equal sign to obtain the substring after opend, marked as preStr; Step 5.5.7.5.3, if the operator capture group is "*" and preStr starts with "*", or the operator capture group is "/" and preStr starts with "/", the operator is part of a compound operator, turning to step 5.5.7.6; otherwise turning to step 5.5.7.5.4; Step 5.5.7.5.4, setting the reduction flag to True, assigning the operator capture group to the reduction operator, and turning to step 5.5.7.6; Step 5.5.7.6, performing a binary operation reduction right judgment on the right character string of the equal sign, i.e. judging whether the right character string of the equal sign has the form "expr + s", wherein the method is as follows: Step 5.5.7.6.1, using the search function built into Python to match the right character string of the equal sign against the regular expression "([+\-*/])\s*{v}\s*$"; if the matching is successful, recording the returned object as the binary operation right judgment object and turning to step 5.5.7.6.2; Step 5.5.7.6.2, marking the first capture group of the binary operation right judgment object as the operator capture group, obtaining the starting index of the operator capture group in the right character string of the equal sign by using the start method of the binary operation right judgment object, marking the starting index as opstart, obtaining the character with index opstart-1 in the right character string of the equal sign, and naming the character PRECHARACTER; Step 5.5.7.6.3, if the operator capture group is "*" or "/" and PRECHARACTER is equal to the operator capture group, go to step 5.5.8, otherwise go to step 5.5.7.6.4; Step 5.5.7.6.4, setting the reduction flag to True and assigning the operator capture group to the reduction operator; Step 5.5.8, if the reduction flag is True, adding the
classified left variable and the reduction operator as a 2-tuple to the temporary reduction variable list, and turning to step 5.5.9; otherwise turning directly to step 5.5.9; Step 5.5.9, let vr=vr+1; if vr ≤ VR, go to step 5.5.5; if vr > VR, go to step 5.5.10; Step 5.5.10, adding the classified left variable to the temporary private variable list; Step 5.5.11, let lvl=lvl+1; if lvl ≤ LVL, go to step 5.5.2; if lvl > LVL, the temporary reduction variable list and the temporary private variable list have been obtained, go to step 5.5.12; Step 5.5.12, removing the duplicate elements present in the temporary private variable list; Step 5.5.13, removing elements in the temporary reduction variable list contained in the temporary private variable list; Step 5.5.14, assigning the temporary private variable list to the nth list element of the private variable list, and assigning the temporary reduction variable list to the nth list element of the reduction variable list; Step 5.5.15, let n=n+1; if n ≤ N, turning to step 5.3; if n > N, turning to step 5.5.16; Step 5.5.16, the variable classification module sends the reduction variable list to the intelligent dependency analysis engine, and sends the private variable list and the reduction variable list to the instruction generation and injection module.
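The reduction test of step 5.5.7 can be sketched in Python as follows. The regular expressions are reconstructions, since the expressions printed in the claim are damaged; in particular the boundary pattern built from `(?<!\w)` and `(?!\w)` is an assumption about what the "non-word boundary" definition means. The function name `detect_reduction` is hypothetical, and the three checks are chained so that the first successful one wins, which is a simplification of the step order in the claim.

```python
import re

def detect_reduction(var, rhs):
    """Return the reduction operator if `var = rhs` looks like a reduction, else None."""
    rhs = rhs.strip().lower()
    # Assumed form of the "non-word boundary" pattern around the escaped variable name.
    v = r"(?<!\w)" + re.escape(var.lower()) + r"(?!\w)"

    # 1) Extremum form: max(..., s, ...) or min(..., s, ...)
    m = re.match(r"^\s*(max|min)\s*\((.+)\)\s*$", rhs)
    if m and re.search(v, m.group(2)):
        return m.group(1)

    # 2) Left form: s <op> expr, rejecting compound operators such as "**" and "//"
    m = re.match(r"^\s*" + v + r"\s*([+\-*/])", rhs)
    if m:
        op, rest = m.group(1), rhs[m.end(1):]
        if not (op in "*/" and rest.startswith(op)):
            return op

    # 3) Right form: expr <op> s, again rejecting "**" and "//"
    m = re.search(r"([+\-*/])\s*" + v + r"\s*$", rhs)
    if m:
        op = m.group(1)
        prev = rhs[m.start(1) - 1] if m.start(1) > 0 else ""
        if not (op in "*/" and prev == op):
            return op
    return None

print(detect_reduction("s", "s + a(i)"))
print(detect_reduction("s", "max(s, a(i))"))
print(detect_reduction("s", "s ** 2"))
```

A returned operator such as `+` or `max` is the value that would be placed in an OpenMP-style `reduction` clause; returning `None` for `s ** 2` shows the compound-operator check in action.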
13. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 1, wherein the intelligent dependency analysis engine performs loop type checking, I/O operation checking, data dependency checking for each loop in the sixth step, and the method for generating the loop dependency list is as follows: step 6.1, initializing a cyclic dependency basic data list to be empty, and enabling the initial value of the cyclic dependency list to be equal to a cyclic nesting relation list; step 6.2, adding n empty lists into the circularly dependent basic data list; step 6.3, adding a value 'True' to the end of each list element in the circular dependency list as an initial parallelization mark; Step 6.4, loop type checking, I/O operation checking and data dependency checking are carried out on loops in the loop nesting relation list one by one, and the method comprises the following steps: step 6.4.1, making the cyclic nesting relationship list index n=1; Step 6.4.2, finding out the element with the same cyclic id number as the nth cyclic nesting relation list element in the cyclic nesting relation list in the cyclic metadata list, checking the cyclic type in the cyclic metadata list element corresponding to the nth element in the cyclic nesting relation list, judging that the nth element in the cyclic nesting relation list cannot be parallelized if the cyclic type of the nth element in the cyclic nesting relation list is labeled as while, and assigning the parallelization label of the nth element in the cyclic dependency list as False, turning to step 6.4.9, otherwise, indicating that the nth element in the cyclic nesting relation list may be parallelized, and turning to step 6.4.3; Step 6.4.3, checking the I/O operation of the nth list element in the cyclic nested relation list, if any element in the I/O row list in the row-level data access view is contained between the cyclic starting row number and the cyclic ending row number of the nth 
element in the cyclic nested relation list, judging that the nth list element in the cyclic nested relation list cannot be parallelized due to the fact that the cyclic contains I/O sentences, and assigning the parallelization identification of the nth element in the cyclic dependency list as 'False', turning to step 6.4.9, otherwise, indicating that the cyclic does not contain I/O sentences, and possibly enabling the nth element in the cyclic nested relation list to be parallelized, and turning to step 6.4.4; Step 6.4.4 is that the data dependence judgment is carried out for the nth cycle in the cyclic nested relation list, and a cyclic dependence basic data list is obtained; step 6.4.5, initializing a loop index variable of an nth loop in the loop nesting list as an empty string; Step 6.4.6, finding an element with the same cycle id number as the element of the nth cycle nesting relation list in the cycle nesting list in the cycle metadata list, and assigning the name of the cycle index variable of the element to the cycle index variable of the nth cycle; step 6.4.7, traversing an nth list of the cyclic dependency basic data list, judging whether an nth cycle in the cyclic nesting relation list has single-line data dependency according to row-level cyclic dependency judging data in the cyclic dependency basic data list, namely carrying out line-by-line data dependency detection on a Fortran source program, assigning 1 to a variable dependency mark and a multi-group dependency mark according to the intra-line dependency basic data and inter-line dependency basic data in the cyclic dependency basic data list, judging that an nth list element in the cyclic nesting relation list can not be parallelized according to the values of the variable dependency mark and the multi-group dependency mark, and setting a parallelization mark of the nth cycle in the cyclic dependency list as 'False' if the nth list element in the cyclic nesting relation list can not be parallelized; 
Step 6.4.8, detecting the inter-row data dependency relations among all rows of the nth cycle; if a variable read-write data dependency exists in the nth cycle, setting the parallelization flag of the nth cycle in the cyclic dependency list to 'False', and turning to step 6.4.9; Step 6.4.9, let n=n+1; if n ≤ N, turning to step 6.4.2; if n > N, the three-level parallelization checking of the N loops in the loop metadata list is finished and the cyclic dependency list has been obtained, turning to step 6.5; Step 6.5, sending the cyclic dependency list to the instruction generation and injection module.
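The three-level check of claim 13 (loop type, I/O, data dependency) amounts to a short-circuit filter, sketched below in Python. The function name `check_loops`, the `(loop_id, loop_type, start_line, end_line)` layout and the `has_dependency` callback are assumptions; the callback stands in for the whole of steps 6.4.4 to 6.4.8 and is not implemented here.

```python
def check_loops(loops, io_lines, has_dependency):
    """Return {loop_id: parallelizable} after type, I/O and dependency checks.

    loops: list of (loop_id, loop_type, start_line, end_line) tuples (assumed layout).
    io_lines: line numbers that contain I/O statements.
    has_dependency: callable(loop_id) -> bool, a placeholder for the data dependency analysis.
    """
    flags = {}
    for loop_id, loop_type, start, end in loops:
        if loop_type == "while":
            flags[loop_id] = False  # level 1: while loops are rejected
        elif any(start <= line <= end for line in io_lines):
            flags[loop_id] = False  # level 2: the loop body contains I/O
        else:
            flags[loop_id] = not has_dependency(loop_id)  # level 3: data dependency
    return flags

loops = [(0, "do", 1, 10), (1, "while", 12, 20), (2, "do", 22, 30), (3, "do", 32, 40)]
print(check_loops(loops, io_lines=[25], has_dependency=lambda loop_id: loop_id == 3))
```

Ordering the checks from cheapest to most expensive means the dependency analysis runs only for loops that survive the first two levels.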
14. The intelligent dependency analysis-based Fortran program parallel optimization method of claim 13, wherein the method for performing data dependency determination for the nth cycle in the cyclic nested relationship list in step 6.4.4 to obtain the cyclic dependency basic data list is as follows: step 6.4.4.1, initializing a dependent left variable, a dependent left array as an empty string, and initializing a dependent right variable list and a dependent right array list as an empty list; step 6.4.4.2 makes left variable list index lvl=1, right variable list index rvl =1, left array list index lal =1, and right array list index ral=1; Step 6.4.4.3, if the line number of the lvl-th left variable list element in the left variable list is not between the loop start line number and the loop end line number of the nth list element in the loop nesting relationship list, turning to step 6.4.4.9; Step 6.4.4.4, if the variable list of the lvl-th left variable list element in the left variable list is not empty, assigning the first element in the variable list of the lvl-th left variable list element in the left variable list to the dependent left variable, and turning to step 6.4.4.5; Step 6.4.4.5, assigning rvl th right variable list element in the right variable list to the dependent right variable list; If the variable list of the lal th left array list element in the left array list is not empty, the step 6.4.4.6 assigns the first element in the variable list of the lal th left variable list element in the left variable list to the dependent left array turning step 6.4.4.7, otherwise, the step 6.4.4.7 is directly turned; step 6.4.4.7, assigning the ral right variable list element in the right array list to the dependent right array list; Step 6.4.4.8, adding the line number, the dependent left variable, the dependent left array, the dependent right variable list and the dependent right array list of the lvl-th left variable list element in the left variable list as 
line-level loop dependency determination data to the nth list of the loop dependency base data list, the size of the nth list of the loop dependency base data list being NDL; Step 6.4.4.9, setting lvl=lvl+1, rvl=rvl+1, lal=lal+1, ral=ral+1; if lvl ≤ LVL, going to step 6.4.4.3; if lvl > LVL, the loop dependency base data list has been obtained, going to step 6.4.5.
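As an illustration only, the per-line record construction of steps 6.4.4.1 to 6.4.4.9 might be sketched in Python as follows. The function name, the `(line number, names)` tuple layout, the inclusive line-range test, and the simplification of array entries to plain strings are all assumptions made for this sketch and are not taken from the claims; the sketch also resets the dependent values for every line.

```python
def build_loop_dependency_base_data(start_line, end_line, left_vars,
                                    right_vars, left_arrays, right_arrays):
    """Collect line-level dependency records for one loop (hypothetical helper).

    Each of the four input lists holds (line_number, names) tuples and the
    lists are assumed to be aligned by index, as the shared index update in
    step 6.4.4.9 suggests.
    """
    records = []
    rows = zip(left_vars, right_vars, left_arrays, right_arrays)
    for (line_no, lvars), (_, rvars), (_, larrs), (_, rarrs) in rows:
        # Step 6.4.4.3: skip lines outside the loop body (inclusive range assumed).
        if not (start_line <= line_no <= end_line):
            continue
        # Steps 6.4.4.4 and 6.4.4.6: first written scalar / array, or "".
        dep_left_var = lvars[0] if lvars else ""
        dep_left_arr = larrs[0] if larrs else ""
        # Step 6.4.4.8: one five-element record per source line.
        records.append((line_no, dep_left_var, dep_left_arr,
                        list(rvars), list(rarrs)))
    return records


# Example input: line 1 is "s = s + x", line 2 is "a(i) = a(i-1) + b(i)",
# and line 5 lies outside the loop spanning lines 1 to 2.
left_vars = [(1, ["s"]), (2, []), (5, ["t"])]
right_vars = [(1, ["s", "x"]), (2, ["i"]), (5, [])]
left_arrays = [(1, []), (2, ["a(i)"]), (5, [])]
right_arrays = [(1, []), (2, ["a(i-1)", "b(i)"]), (5, [])]
records = build_loop_dependency_base_data(1, 2, left_vars, right_vars,
                                          left_arrays, right_arrays)
```

Each record mirrors the five values listed in step 6.4.4.8, so later checks can work on one tuple per source line.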
15. The Fortran program parallel optimization method based on intelligent dependency analysis of claim 13, wherein the method in step 6.4.7 for traversing the nth list of the loop dependency base data list and determining, from the line-level loop dependency determination data in that list, whether the nth loop in the loop nesting relation list has a single-line data dependency is as follows: Step 6.4.7.1, setting the index ndl=1 of the nth list of the loop dependency base data list; Step 6.4.7.2, using a single-line data dependency detection method to perform data dependency detection on the ndl-th line of the Fortran source program according to the data in the nth list of the loop dependency base data list, obtaining the values of a variable dependency flag and an array dependency flag, the method being as follows: Step 6.4.7.2.1, initializing the variable dependency flag to 0, the array dependency flag to 0, the left array name index list to empty, the right array name list to empty, and the right array index list to empty; Step 6.4.7.2.2, if the left variable of the ndl-th line of the Fortran source program is not empty and the right variable list of the ndl-th line is not empty, going to step 6.4.7.2.3, otherwise going to step 6.4.7.2.5; Step 6.4.7.2.3, if the nth element of the reduction variable list does not contain the left variable of the ndl-th line of the Fortran source program, going to step 6.4.7.2.4, otherwise going to step 6.4.7.2.5; Step 6.4.7.2.4, if the right variable list of the ndl-th line of the Fortran source program contains the left variable of the ndl-th line, setting the variable dependency flag to 1 and going to step 6.4.7.2.5, otherwise going directly to step 6.4.7.2.5; Step 6.4.7.2.5, if the left array of the ndl-th line of the Fortran source program is not empty and the right array list of the ndl-th line is not empty, going to step 6.4.7.2.6, otherwise going to step 6.4.7.2.15; Step 6.4.7.2.6 
uses an array name and subscript analysis method to parse the array name and subscripts of the left array of the ndl-th line of the Fortran source program, obtaining the left array name index list of the ndl-th line, the method being as follows: Step 6.4.7.2.6.1, initializing the array name index list as an empty list; Step 6.4.7.2.6.2, splitting the left array of the ndl-th line with '(' as the first delimiter to obtain a first array index string, then splitting the first array index string with ')' as the second delimiter to obtain a second array index string, and finally splitting the second array index string with '(' as the delimiter to obtain the array name index string; Step 6.4.7.2.6.3, adding the first element of the array name index string, namely the array name string, to the array name index list, the size of the list being aiL; Step 6.4.7.2.6.4, removing the blanks from the second element and all following elements of the array name index string by using a play function built in Python, and adding these elements to the array name index list; Step 6.4.7.2.7, sequentially adding the elements of the array name index list to the left array name index list; Step 6.4.7.2.8, using the array name and subscript analysis method of step 6.4.7.2.6 to parse the array names and subscripts of the third element of every list element in the right array list of the ndl-th line, obtaining as many right array name index lists as there are elements in the right array list, adding the first element of every right array name index list to the right array name list RArrayNameLst in the order parsed, and adding the second and all following elements of every right array name index list, as one list each, to the right array index list RArrayIndexLst in the order parsed; Step 6.4.7.2.9, if RArrayNameLst contains the first element of the left array name index list, going to step 6.4.7.2.10, otherwise going 
to step 6.4.7.2.15; Step 6.4.7.2.10, initializing the left array loop variable index string list to empty, and initializing the right array loop variable index string list as an empty list; Step 6.4.7.2.11, checking whether any index string element of the left array name index list, from the second element onward, contains the loop index variable of the nth loop; if so, assigning all index string elements containing the loop index variable of the nth loop to the left array loop variable index string list and going to step 6.4.7.2.12; if not, going to step 6.4.7.2.15; Step 6.4.7.2.12, finding the indices ail of all elements of the right array name list that are equal to the first element of the left array name index list, and checking whether the elements of the ail-th index list in the right array index list contain the loop index variable of the nth loop; if so, adding all elements of that index list that contain the loop index variable of the nth loop to the right array loop variable index string list and going to step 6.4.7.2.13; otherwise going to step 6.4.7.3; Step 6.4.7.2.13, if the left array loop variable index string list has an element equal to the loop index variable of the nth loop and only one index element in the left array loop variable index string list contains the loop index variable of the nth loop, parallelization may be possible, going to step 6.4.7.2.14; otherwise setting the array dependency flag to 1 and going to step 6.4.7.3; Step 6.4.7.2.14, checking whether the right array loop variable index string list contains an element not equal to the loop index variable of the nth loop; if so, a true or false data dependence exists, setting the array dependency flag to 1 and going to step 6.4.7.3; if not, going directly to step 6.4.7.3; Step 6.4.7.3, if the variable dependency flag is 1 or the array 
dependency flag is 1, the nth element of the loop nesting relation list cannot be parallelized, setting the parallelization flag of the nth loop in the loop dependency list to 'False' and going to step 6.4.7.4; if neither the variable dependency flag nor the array dependency flag is 1, going directly to step 6.4.7.4; Step 6.4.7.4, setting ndl=ndl+1; if ndl ≤ NDL, going to step 6.4.7.2; if ndl > NDL, the single-line data dependency check for all lines of the nth loop in the loop dependency base data list is complete, going to step 6.4.8.
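The single-line check above can be illustrated with a short Python sketch. It is a simplified reading of steps 6.4.7.2.1 to 6.4.7.2.14, not the claimed implementation: the helper names are hypothetical, subscripts are assumed to be separated by commas, nested references such as `a(f(i,j))` are not handled, Fortran's case insensitivity is ignored, and right-hand arrays are passed as plain strings.

```python
import re


def parse_array_ref(ref):
    """Split 'a(i, j + 1)' into ['a', 'i', 'j+1'] (name first, then subscripts)."""
    name, _, rest = ref.partition("(")
    inner = rest.rsplit(")", 1)[0]
    # Comma-separated subscripts are an assumption of this sketch.
    return [name.strip()] + [s.replace(" ", "") for s in inner.split(",")]


def uses_var(subscript, loop_var):
    """True if the subscript expression mentions loop_var as a whole word."""
    return re.search(rf"\b{re.escape(loop_var)}\b", subscript) is not None


def single_line_dependency(left_var, right_vars, left_array, right_arrays,
                           loop_var, reduction_vars):
    """Return (variable_flag, array_flag) for one source line."""
    # Scalar check: a non-reduction variable that is both written and read.
    var_flag = int(bool(left_var) and left_var not in reduction_vars
                   and left_var in right_vars)
    arr_flag = 0
    if left_array and right_arrays:
        lname, *lsubs = parse_array_ref(left_array)
        l_loop = [s for s in lsubs if uses_var(s, loop_var)]
        for ref in right_arrays:
            rname, *rsubs = parse_array_ref(ref)
            r_loop = [s for s in rsubs if uses_var(s, loop_var)]
            if rname != lname or not l_loop or not r_loop:
                continue  # different array, or no loop-indexed subscript
            if l_loop != [loop_var]:
                arr_flag = 1  # written subscript is not exactly the loop index
            elif any(s != loop_var for s in r_loop):
                arr_flag = 1  # e.g. a(i) written while a(i-1) is read
    return var_flag, arr_flag


# a(i) = a(i-1) + b(i) carries a dependence across iterations of loop i.
flags = single_line_dependency("", ["i"], "a(i)", ["a(i-1)", "b(i)"], "i", [])
```

In this reading a loop stays parallelizable only when the written element is indexed by exactly the loop variable and every read of the same array uses that same subscript.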
16. The Fortran program parallel optimization method based on intelligent dependency analysis of claim 13 or 14, wherein the method in step 6.4.8 for detecting inter-line data dependencies among all lines of the nth loop is as follows: Step 6.4.8.1, initializing the variable dependency flag and the array dependency flag to 0; Step 6.4.8.2, setting the index ndl=1 of the nth list element in the loop dependency base data list; Step 6.4.8.3, if the left variable of the ndl-th line of the loop dependency base data list is not empty, going to step 6.4.8.4, otherwise going to step 6.4.8.7; Step 6.4.8.4, setting the index ndlpre=1 of the lines preceding the ndl-th line among the code lines of the nth loop; Step 6.4.8.5, if the left variable of the ndl-th line is in the right variable list of the ndlpre-th line, a write-after-read data dependence on the variable exists, setting the parallelization flag of the nth loop in the loop dependency list to 'False' and going to step 6.4.9, otherwise going to step 6.4.8.6; Step 6.4.8.6, setting ndlpre=ndlpre+1; if ndlpre < ndl, going to step 6.4.8.5; if ndlpre ≥ ndl, the read-write data dependency check for the ndl-th line of the nth loop is complete, going to step 6.4.8.7; Step 6.4.8.7, setting ndl=ndl+1; if ndl ≤ NDL, going to step 6.4.8.3; if ndl > NDL, the check of all lines of the nth loop for possible read-write data dependencies on variables is complete, going to step 6.4.8.8; Step 6.4.8.8, setting the index ndl=1 of the nth list element in the loop dependency base data list; Step 6.4.8.9, if the left array of the ndl-th line is not empty, going to step 6.4.8.10, otherwise going to step 6.4.8.14; Step 6.4.8.10, setting the index ndll=ndl+1 of the lines following the ndl-th line among the code lines of the nth loop; Step 6.4.8.11, if the right array list of the ndll-th line is not empty, going to step 6.4.8.12, otherwise going to step 6.4.8.14; Step 6.4.8.12, combining -1, the empty string, the left array of the ndl-th line, the empty string list and the right array list of the ndll-th line into the inter-line array dependency base data of the ndl-th and ndll-th lines; Step 6.4.8.13, using an inter-line array dependency detection method to perform inter-line array dependency detection on the ndl-th and ndll-th lines of the Fortran source program according to the inter-line array dependency base data of the ndl-th and ndll-th lines, obtaining the value of the array dependency flag, the method being as follows: Step 6.4.8.13.1, if the left array of the ndl-th line of the Fortran source program is not empty and the right array list of the ndll-th line is not empty, going to step 6.4.8.13.2, otherwise going to step 6.4.8.13.9; Step 6.4.8.13.2, using the array name and subscript analysis method of step 6.4.7.2.6 to parse the array name and subscripts of the left array of the ndl-th line of the Fortran source program, obtaining the array name index list of the left array of the ndl-th line; Step 6.4.8.13.3, using the array name and subscript analysis method of step 6.4.7.2.6 to parse the array names and subscripts of the third element of every list element in the right array list of the ndll-th line of the Fortran source program, obtaining as many right array name index lists as there are elements in the right array list, adding the first element of every right array name index list to the right array name list RArrayNameLst in the order parsed, and adding the second and all following elements of every right array name index list, as one list each, to the right array index list RArrayIndexLst in the order parsed; Step 6.4.8.13.4, if RArrayNameLst contains the first element of the left array name index list, going to step 6.4.8.13.5, otherwise going to step 6.4.8.13.9; Step 6.4.8.13.5, initializing the left array loop variable index string list to empty, and initializing the right array loop variable index string list as an empty list; 
Step 6.4.8.13.6, checking whether any index string element of the left array name index list, from the second element onward, contains the loop index variable of the nth loop; if so, assigning all index string elements containing the loop index variable of the nth loop to the left array loop variable index string list and going to step 6.4.8.13.7; if not, going to step 6.4.8.13.9; Step 6.4.8.13.7, finding the indices ail of all elements of the right array name list that are equal to the first element of the left array name index list, and checking whether the elements of the ail-th index list in the right array index list contain the loop index variable of the nth loop; if so, adding all elements of that index list that contain the loop index variable of the nth loop to the right array loop variable index string list and going to step 6.4.8.13.8; otherwise going to step 6.4.8.14; Step 6.4.8.13.8, if the left array loop variable index string list has an element equal to the loop index variable of the nth loop and only one index element in the left array loop variable index string list contains the loop index variable of the nth loop, parallelization may be possible, going to step 6.4.8.13.9; otherwise setting the array dependency flag to 1 and going to step 6.4.8.14; Step 6.4.8.13.9, checking whether the right array loop variable index string list contains an element not equal to the loop index variable of the nth loop; if so, a true or false data dependence exists, setting the array dependency flag to 1 and going to step 6.4.8.14; if not, going directly to step 6.4.8.14; Step 6.4.8.14, setting ndll=ndll+1; if ndll ≤ NDL, going to step 6.4.8.12; otherwise ndll > NDL, the check for possible read-after-write data dependencies between the ndl-th line and the ndll-th lines is complete, going to step 6.4.8.15; Step 6.4.8.15, setting 
ndl=ndl+1; if ndl ≤ NDL, going to step 6.4.8.10; otherwise ndl > NDL, the check of all lines of the nth loop for possible read-after-write data dependencies is complete, going to step 6.4.8.16; Step 6.4.8.16, setting the index ndll=1 of the nth list element in the loop dependency base data list; Step 6.4.8.17, if the right array list of the ndll-th line is not empty, going to step 6.4.8.18, otherwise going to step 6.4.8.22; Step 6.4.8.18, setting the index ndl=ndll+1 of the lines following the ndll-th line among the code lines of the nth loop; Step 6.4.8.19, if the left array of the ndl-th line is not empty, going to step 6.4.8.20, otherwise going to step 6.4.8.21; Step 6.4.8.20, combining -1, the empty string, the left array of the ndl-th line, the empty string list and the right array list of the ndll-th line into the inter-line array dependency base data of the ndl-th and ndll-th lines, and using the inter-line array dependency detection method of step 6.4.8.13 to perform inter-line array dependency detection on the ndl-th and ndll-th lines according to that base data, obtaining the value of the array dependency flag; if the array dependency flag is 0, going to step 6.4.8.21; if the array dependency flag is 1, going to step 6.4.9; Step 6.4.8.21, setting ndl=ndl+1; if ndl ≤ NDL, going to step 6.4.8.19; if ndl > NDL, the check for possible read-write data dependencies between the ndl-th lines and the ndll-th line is complete, going to step 6.4.8.22; Step 6.4.8.22, setting ndll=ndll+1; if ndll ≤ NDL, going to step 6.4.8.18; if ndll > NDL, the check of all lines of the nth loop for possible write-after-read data dependencies is complete, going to step 6.4.9.
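The scalar part of the inter-line check (steps 6.4.8.2 to 6.4.8.7) could look roughly like the following Python sketch. The function name is hypothetical, and the record layout `(line number, left variable, left array, right variable list, right array list)` is assumed from step 6.4.4.8; the array part of claim 16 is not covered here.

```python
def scalar_write_after_read(records):
    """Return True if some line writes a scalar that an earlier line reads.

    `records` is the per-loop list of five-element line records; index 1 is
    the written scalar ("" if none) and index 3 is the list of scalars read.
    """
    for ndl, record in enumerate(records):
        left_var = record[1]
        if not left_var:
            continue  # step 6.4.8.3: nothing written on this line
        # Steps 6.4.8.4 to 6.4.8.6: compare against every preceding line.
        for previous in records[:ndl]:
            if left_var in previous[3]:
                return True  # step 6.4.8.5: the loop is marked 'False'
    return False


# Line 10 reads x into t, line 11 overwrites x: x is read before it is written.
unsafe = [(10, "t", "", ["x"], []), (11, "x", "", ["t"], [])]
# Line 11 writes y instead, so no earlier line reads the written variable.
safe = [(10, "t", "", ["x"], []), (11, "y", "", ["t"], [])]
```

A temporary that is written first and read afterwards within the same iteration (such as `t` above) is not flagged by this check, which fits its later treatment as a private variable.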
17. The Fortran program parallel optimization method based on intelligent dependency analysis of claim 1, wherein in the seventh step the instruction generation and injection module generates, for the loops marked as parallelizable in the loop dependency list, the most suitable OpenMP parallelization instruction set, injects the OpenMP parallelization instructions of that set at the start line and the end line of the M parallelizable loops in the Fortran program source code, and generates the parallelized Fortran program source code, the method being as follows: Step 7.1, determining from the loop dependency list which loops are to be parallelized at an upper level and setting their parallelization flag to None, the method being as follows: Step 7.1.1, setting the loop dependency list index n=1; Step 7.1.2, if the parent loop id of the nth loop in the loop dependency list is -1, the loop is an outermost loop, going to step 7.1.10; if the parent loop id of the nth loop in the loop dependency list is not -1, going to step 7.1.3; Step 7.1.3, if the parallelization flag of the nth loop in the loop dependency list is True, going to step 7.1.4, otherwise going to step 7.1.10; Step 7.1.4, initializing the upper-level None loop id list to empty; Step 7.1.5, setting the initial parameter loopD to the nth loop element of the loop dependency list; Step 7.1.6, if the parent loop id in loopD is not -1, going to step 7.1.7, otherwise going to step 7.1.9; Step 7.1.7, if the parallelization flag of the loop in the loop dependency list whose loop id equals the parent loop id in loopD is "False", going to step 7.1.9, otherwise going to step 7.1.8; Step 7.1.8, adding the loop id in loopD to the upper-level None loop id list, then taking the loop element in the loop dependency list whose loop id equals the parent loop id in loopD as the initial parameter 
loopD, and going to step 7.1.6; Step 7.1.9, setting to None the parallelization flag of every loop in the loop dependency list whose id is in the upper-level None loop id list, indicating that the loop is parallelized at an upper-level loop, and going to step 7.1.10; Step 7.1.10, setting n=n+1; if n ≤ N, going to step 7.1.2; if n > N, going to step 7.2; Step 7.2, initializing the loop OpenMP instruction statement list to empty; Step 7.3, combining the loop dependency list, the private variable list and the reduction variable list to generate the corresponding OpenMP instruction statements for the parallelizable loops in the loop dependency list, obtaining the loop OpenMP instruction statement list, and going to step 7.4; Step 7.4, inserting the OpenMP instruction statements into the Fortran program source code according to the Fortran program source code list and the loop OpenMP instruction statement list to obtain the parallelized Fortran program source code, the method being as follows: Step 7.4.1, splitting the file name of the Fortran source program into a base name and an extension using Python's splitext function, then concatenating the base name, "_omp" and the extension to obtain the file name of the parallelized Fortran source program; Step 7.4.2, opening the parallelized Fortran program source code file in write mode using Python's built-in open() function, obtaining a file object f for subsequent file operations; Step 7.4.3, initializing the code line number nn=1; Step 7.4.4, if the nn-th line of the Fortran program source code list is the start line of some loop element in the loop dependency list, setting loopid1 to the loop id of that loop element and going to step 7.4.5, otherwise going to step 7.4.8; Step 7.4.5, if loopid1 is not present in any element of the loop OpenMP instruction statement list, no OpenMP instruction statement can be inserted for the loop with 
the loop id loopid1, going to step 7.4.7; otherwise going to step 7.4.6; Step 7.4.6, obtaining the OpenMP start instruction statement from the element containing loopid1 in the loop OpenMP instruction statement list, writing it to the result file using the file object f, and going to step 7.4.7; Step 7.4.7, writing the nn-th line of the Fortran program source code list to the parallelized Fortran program source code using the file object f; Step 7.4.8, if the nn-th line of the Fortran program source code list is the end line of some loop element in the loop dependency list, setting loopid2 to the loop id of that loop element and going to step 7.4.9, otherwise going to step 7.4.12; Step 7.4.9, writing the nn-th line of the Fortran program source code list to the parallelized Fortran program source code using the file object f; Step 7.4.10, if loopid2 is not present in any element of the loop OpenMP instruction statement list, going to step 7.4.12, otherwise going to step 7.4.11; Step 7.4.11, obtaining the OpenMP end instruction statement from the element containing loopid2 in the loop OpenMP instruction statement list, and writing it to the parallelized Fortran program source code file using the file object f; Step 7.4.12, writing the nn-th line of the Fortran program source code list to the parallelized Fortran program source code file using the file object f; Step 7.4.13, setting nn=nn+1; if 1 ≤ nn ≤ NN, going to step 7.4.4; otherwise nn > NN, the insertion of the OpenMP instruction statements of all parallelizable loops into the Fortran program source code is complete, the parallelized Fortran program source code file is obtained, and the method ends.
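The file naming of step 7.4.1 and the line-by-line injection of steps 7.4.3 to 7.4.13 can be sketched in Python as below. This is a minimal illustration under assumptions: the function names and the two dictionary layouts are hypothetical, each source line is emitted exactly once, and the result is returned as a list rather than written through a file object so that the example has no side effects.

```python
import os


def omp_file_name(path):
    """Step 7.4.1: 'solver.f90' becomes 'solver_omp.f90'."""
    base, ext = os.path.splitext(path)
    return base + "_omp" + ext


def inject_directives(source_lines, loops, directives):
    """Return the source lines with OpenMP statements wrapped around loops.

    loops:      {loop_id: (start_line, end_line)}, 1-based line numbers
    directives: {loop_id: (start_statement, end_statement)}; loops without
                an entry are copied unchanged.
    """
    starts = {start: loop_id for loop_id, (start, _) in loops.items()}
    ends = {end: loop_id for loop_id, (_, end) in loops.items()}
    out = []
    for nn, line in enumerate(source_lines, start=1):
        if nn in starts and starts[nn] in directives:
            out.append(directives[starts[nn]][0])  # start statement first
        out.append(line)
        if nn in ends and ends[nn] in directives:
            out.append(directives[ends[nn]][1])  # end statement after loop
    return out


src = ["do i = 1, n", "  c(i) = a(i) + b(i)", "end do"]
result = inject_directives(
    src,
    {0: (1, 3)},
    {0: ("!$omp parallel do private(i)", "!$omp end parallel do")},
)
```

Writing `result` to the name returned by `omp_file_name` with `open(name, "w")` would then correspond to steps 7.4.2 and following.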
18. The Fortran program parallel optimization method based on intelligent dependency analysis of claim 17, wherein the method in step 7.3 for combining the loop dependency list, the private variable list and the reduction variable list to generate the corresponding OpenMP instruction statements for the parallelizable loops in the loop dependency list is as follows: Step 7.3.1, setting the loop dependency list index n=1, setting the OpenMP directive statement start string startOmpStr to "!$omp", and setting the OpenMP directive statement end string endOmpStr to "!$omp end parallel do"; Step 7.3.2, if the parallelization flag of the nth loop in the loop dependency list is "True", going to step 7.3.3, otherwise going to step 7.3.9; Step 7.3.3, if the parent loop id of the nth loop in the loop dependency list is -1, the nth loop is an outermost loop and an OpenMP instruction statement is generated for it directly, going to step 7.3.4, otherwise going to step 7.3.7; Step 7.3.4, concatenating startOmpStr and " parallel do" into the string startOmpStr; Step 7.3.5, using a private variable adding method to add the private variables corresponding to the nth loop of the loop dependency list in the private variable list to a private clause that controls variable scope, the method being as follows: Step 7.3.5.1, concatenating startOmpStr with "private" to form startOmpStr; Step 7.3.5.2, sequentially adding the private variables of the nth loop in the private variable list to the private clause, in the form "private(private variable 1, private variable 2, ...)"; Step 7.3.6, if the list in the reduction variable list corresponding to the nth loop of the loop dependency list is not empty, reduction variables exist in the nth loop, going to step 7.3.7, otherwise going to step 7.3.8; Step 7.3.7, using a reduction variable adding method to add the reduction variables in the 
reduction variable list of the nth loop to a reduction clause that handles cross-thread data reduction, the method being as follows: Step 7.3.7.1, concatenating startOmpStr with "reduction" to form startOmpStr; Step 7.3.7.2, sequentially adding the reduction variables of the nth loop in the reduction variable list to the reduction clause, in the form "reduction(operator 1: reduction variable 1, operator 2: reduction variable 2, ...)"; Step 7.3.8, adding the loop id of the nth loop in the loop dependency list, startOmpStr and endOmpStr as one list to the loop OpenMP instruction statement list; Step 7.3.9, setting n=n+1; if n ≤ N, going to step 7.3.2; if n > N, the instruction statements of the parallelizable loops in the loop dependency list have been generated and the loop OpenMP instruction statement list is obtained, going to step 7.4.
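For illustration, the statement assembly of steps 7.3.1 to 7.3.8 might be sketched in Python as follows. The function name and the `(operator, variable)` pair format are assumptions of this sketch. It also departs from the literal form quoted in step 7.3.7.2: because an OpenMP reduction clause takes a single operator followed by a variable list, the sketch emits one reduction clause per operator.

```python
def build_omp_statements(private_vars, reductions):
    """Return (start_statement, end_statement) for one outermost loop.

    private_vars: list of variable names for the private clause
    reductions:   list of (operator, variable) pairs, e.g. ("+", "s")
    """
    start = "!$omp parallel do"  # steps 7.3.1 and 7.3.4
    if private_vars:
        # Step 7.3.5: private(private variable 1, private variable 2, ...)
        start += " private(" + ", ".join(private_vars) + ")"
    # Step 7.3.7: group variables by operator, keeping first-seen order.
    by_operator = {}
    for operator, variable in reductions:
        by_operator.setdefault(operator, []).append(variable)
    for operator, variables in by_operator.items():
        start += " reduction(" + operator + ":" + ", ".join(variables) + ")"
    end = "!$omp end parallel do"
    return start, end


start_stmt, end_stmt = build_omp_statements(
    ["i", "tmp"], [("+", "s"), ("max", "m"), ("+", "t")]
)
```

The returned pair, together with the loop id, corresponds to the three-element entry that step 7.3.8 appends to the loop OpenMP instruction statement list.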

Description

Fortran program parallel optimization method based on intelligent dependency analysis

Technical Field

The invention belongs to the intersection of High Performance Computing (HPC) and software engineering, and particularly relates to a Fortran program parallel optimization method based on intelligent dependency analysis.

Background

High Performance Computing (HPC) is a core engine driving modern scientific discovery and engineering innovation, and plays an indispensable role in frontier fields such as climate simulation, drug development and aerospace design. The importance of HPC systems keeps growing: for extremely large-scale, complex computing problems in particular, they have become the key to breaking through traditional computing bottlenecks. Faced with the need to process terabyte- and petabyte-scale data, how to drive huge computing clusters efficiently and fully release their peak performance has become one of the most urgent technical challenges today. The Fortran language, by virtue of its inherent advantages and excellent performance in numerical computation, has long been the preferred language for computationally intensive applications such as physical simulation, fluid dynamics and structural analysis, and carries a large number of key scientific and engineering code assets. The architecture of high performance computing systems is currently undergoing profound change, typically characterized by the heterogeneous cooperation of multi-core CPUs and many-core GPUs. While this heterogeneous architecture provides the hardware basis for exceptional performance, it poses an unprecedented parallelization challenge at the software level. Traditionally, parallelizing computing tasks has relied on the developer's deep understanding of the underlying hardware architecture and on extensive manual programming effort. 
One approach is distributed parallelism based on the Message Passing Interface (MPI), which achieves large-scale collaborative computation across nodes through explicit data communication and suits very large cluster environments, but its high development cost and complex communication management place a significant cognitive burden on developers. Another approach is Open Multi-Processing (OpenMP), a parallel technique based on shared memory that optimizes computation-intensive loops on single-node multi-core processors through compiler directives; although it makes programming more convenient, its optimization effect still depends heavily on developers' in-depth analysis of, and manual intervention in, code structure, data access patterns and potential bottlenecks. In heterogeneous computing environments, if parallelization optimization is not implemented effectively, the computing power of the HPC system is left idle and wasted. For example, if the core computational loops in Fortran code cannot be mapped onto parallel units such as multi-core CPUs or GPUs, a large number of processor cores sit waiting, so that the utilization of computing resources is low, the Fortran code runs more slowly, and a great deal of energy is wasted. For legacy Fortran code bases in particular, parallel transformation cannot be accomplished by simply adding directives. Engineers or scientists must analyze the original code at a very fine granularity to identify and resolve the complex data dependencies it contains, such as loop-carried dependencies across iterations, potential reduction operations, and synchronization requirements associated with I/O operations. 
However, this manual parallelization optimization process faces a series of serious technical challenges in practice, which greatly hamper the performance of Fortran programs on modern HPC systems. Complexity of data dependency analysis: data dependency analysis in loops is the core difficulty of parallelization. Accurately identifying read-write dependencies among variables, particularly loop-carried dependencies across iterations, is the key challenge in achieving correct parallelization. For example, in the loop do i=2,n; a(i)=a(i-1)+b(i); end do, the computation of a(i) in the current iteration depends directly on the value of a(i-1) updated in the previous iteration (i-1). This cross-iteration read-write dependency constitutes a typical loop-carried dependency, so the iterations of the loop cannot be executed in parallel directly. In addition, identifying and properly handling reduction operations (e.g., summation, product, maximum, etc.) is also complex, and if their reducibility cannot be accurately determined and an appropriate parallel strategy applied (e.g., the OpenMP reduction mechanism),