CN-121979569-A - Heterogeneous transplanting and parallel optimizing method for multiphase flow numerical simulation software
Abstract
The invention discloses a heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software, and belongs to the technical field of heterogeneous parallel computing in computational fluid dynamics. The method realizes cross-language adaptation by establishing a cross-language call interface, unifying array storage and access modes, adapting the function interface format, and reconstructing the core computing function modules. It completes computing deployment through efficient management of device-side storage space, directed data copying and synchronization, kernel function definition, thread scheduling optimization, result feedback, and parallel communication. It improves grid data interaction efficiency with a grid management scheme based on double-layer boundary expansion, and realizes efficient parallel scheduling and scalability optimization by dividing the computing domain, allocating processes, dividing tasks cooperatively, and formulating a weak-scalability optimization strategy. The invention enables seamless migration and efficient interoperation of the software, improves single-step iteration throughput and speedup, relieves the communication and data transfer bottleneck, and offers good scalability and engineering practicability.
Inventors
- HUANG BIAO
- HONG MING
- CHEN JIE
- WU QIN
- LIU JIAXUAN
Assignees
- Beijing Institute of Technology (北京理工大学)
- Jiangsu University (江苏大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-11-28
Claims (8)
- 1. A heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software, characterized by comprising: establishing a cross-language call interface, unifying array storage and access modes, adapting the function call interface, and modularly reconstructing the core computing functions, thereby realizing cross-language adaptation and computing function reconstruction; performing directed data copying and synchronization, kernel function definition and thread scheduling optimization, and result feedback and parallel communication, thereby realizing deployment and execution of the computing functions; realizing grid expansion and data interaction optimization by a grid management and data interaction optimization method based on double-layer boundary expansion; and realizing parallel scheduling and scalability optimization through computing domain division and process allocation, collaborative task division, and a weak-scalability optimization strategy, so that complex multiphase, multi-field flow numerical simulation runs efficiently and stably on a heterogeneous high-performance computing platform.
- 2. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 1, comprising the following steps: step 1, cross-language adaptation and computing function reconstruction: a cross-language migration method based on mixed Fortran and C programming resolves the compatibility problem and realizes efficient calling and reconstruction of the core computing modules in a heterogeneous environment; step 2, deployment and execution of DCU computing functions: after cross-language adaptation and rewriting of the computing functions are completed, the rewritten C computing functions are ported to the DCU device side for execution, realizing efficient parallel operation of the core multiphase flow computations on the heterogeneous platform; step 3, grid expansion and data interaction optimization: a grid management and data interaction optimization method based on double-layer boundary expansion reduces the number of data transfers between the host side and the device side and the inter-process communication overhead; step 4, parallel scheduling and scalability optimization: a method for constructing a CPU-DCU collaborative hybrid parallel architecture, based on the MPI communication mechanism and weak-scalability verification, improves computational efficiency and scalability in a multi-device environment.
- 3. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 2, wherein in step 1, cross-language adaptation and computing function reconstruction comprises the following sub-steps: step 1.1, establishing a cross-language call interface: an external C function interface is constructed on the Fortran side through interface statements, with a uniform lowercase naming scheme to avoid link errors caused by the compiler automatically appending underscores; consistency of the calling convention between the two languages is achieved by adding the extern "C" modifier on the C side, ensuring that cross-language functions can be identified and linked; step 1.2, unifying array storage and access modes: Fortran stores arrays in column-major order and C stores them in row-major order; multidimensional arrays are uniformly converted to one-dimensional arrays on the C side, and an offset mapping makes a multidimensional index on the Fortran side address the same element as the corresponding one-dimensional index in C, ensuring consistent and correct reads and writes of cross-language data; step 1.3, adapting the function call interface: Fortran distinguishes functions from subroutines, whereas C provides only functions; the corresponding interfaces are declared on the Fortran side as functions or subroutines according to whether a value is returned, and pointer passing reconciles Fortran's pass-by-reference with C's pass-by-value; where composite data structures are involved, structure definitions with identical field order and memory size are established on both the Fortran side and the C side, to ensure the address consistency and data integrity of structure-type parameters during cross-language calls; step 1.4, modular reconstruction of the core computing functions: the original Fortran core computing functions are reconstructed as modular C versions in which the loop logic is separated from the computational body, loop control is retained on the C side, and the specific computational formulas are encapsulated in independent function modules; the Fortran side only needs to pass the necessary input data and index information to the C side, and the C side further wraps the computing functions, through the HIP interface, into kernel functions executable on the DCU, realizing heterogeneous deployment and accelerated execution of the computing core.
- 4. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 2, wherein in step 2, the deployment and execution of DCU computing functions comprises the following sub-steps: step 2.1, managing device-side storage space: in the initialization stage, the host side prepares and preprocesses the input data, and storage space is allocated in DCU device memory through the hipMalloc interface to hold the inputs and intermediate variables required by the computation; hipSetDevice is called at the same time to bind each MPI process one-to-one to its target DCU device, ensuring that device resources in a multi-process environment are independent and consistent with the task allocation; after the computing task finishes, the device memory is released through the hipFree interface, closing the loop of device-side memory management; step 2.2, directed data copying and synchronization between host and device: hipMemcpy performs directed data transfer between the CPU side and the DCU side; before each round of iterative computation, initialized or updated data are copied from the CPU side to DCU device memory, and after the computation completes, the results are copied from the DCU side back to CPU-side storage, ensuring the data consistency required by subsequent communication and analysis; step 2.3, kernel function definition and thread scheduling optimization: the reconstructed computational body is wrapped into __global__ kernel functions executable on the DCU, and the computing logic is further modularized into __device__ functions for reuse and maintenance; the grid cell coordinates are determined from the thread index parameters (threadIdx, blockIdx, blockDim), and the thread block and grid sizes are set according to the overall grid scale, achieving balanced distribution and parallel coverage of the computing tasks; step 2.4, result feedback and MPI parallel communication: after the communication is completed, the updated data are transferred back to the DCU side to provide input for the next round of iterative computation, realizing cyclic execution of the multi-process, multi-device collaborative computation.
- 5. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 2, wherein in step 3, grid expansion and data interaction optimization is performed as follows: during parallel computation, the DCU side must frequently transfer data back to the CPU side to complete boundary updates, and the grid management and data interaction optimization method based on double-layer boundary expansion reduces the number of data transfers between the host side and the device side and the cross-process communication overhead, thereby relieving the data transfer bottleneck; on the basis of the local computing domain of each computing process, two redundant boundary layers are arranged along the boundary direction of the region, wherein the first layer is a conventional inter-process communication boundary that receives data sent by adjacent processes to synchronize boundary conditions; the first-layer boundary data are sent by adjacent processes in the iterative synchronization stage, and the second-layer boundary expansion data are generated directly on the DCU side from the results of the previous iteration, without additional communication; during execution, the expanded computing data region is loaded onto the DCU side in one pass, so that the kernel function covers the computation of both layers of boundary cells in the same round of execution, and redundant computation in the second boundary layer keeps the local boundary data valid over several consecutive iterations.
- 6. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 2, wherein in step 4, parallel scheduling and scalability optimization comprises the following sub-steps: step 4.1, computing domain division and process allocation: following the grid division principle of the finite difference method, the whole computing domain is divided regularly along the Y and Z directions into a plurality of independent sub-regions; in the initialization stage one MPI process is assigned to each sub-region and each MPI process is bound to its own DCU, realizing a distributed parallel computing mode with one-to-one correspondence between processes and devices; step 4.2, CPU-DCU collaborative task division: in the hybrid architecture, the CPU side mainly undertakes control and management functions; the DCU side concentrates on highly parallel execution of the core computing modules and is responsible for the numerical iteration and update of the main physical quantities; the CPU side and the DCU side exchange data over a high-speed PCIe bus, forming an efficient cooperation mechanism of host control and device computation; step 4.3, weak-scalability optimization strategy: when the parallel scale increases, the input file automatically adjusts the grid division and data size according to the number of processes, so that the computational load borne by each process stays approximately the same as in a single-card run and the problem size per device remains constant, so that the computational efficiency and communication load of the whole system are maintained at a high level when scaling to larger sizes, good weak scalability and a linear performance growth trend are achieved, and parallel efficiency with multiple processes and multiple devices is ensured.
- 7. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 4, wherein in step 2.3, neighborhood cell data are cached inside the kernel function in a small local matrix, which reduces frequent accesses to device memory and improves memory access efficiency and overall execution performance.
- 8. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software according to claim 6, wherein in step 4.2, the control and management functions undertaken mainly by the CPU side include computing domain division, MPI process communication scheduling, boundary data synchronization, and output and storage of result data; the DCU side is responsible for numerical iteration and update, and the main physical quantities comprise the velocity field, the pressure field, and the volume fraction.
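The thread-to-cell mapping of step 2.3 in claim 4 can be illustrated with the index arithmetic alone. The following is a minimal sketch in plain C, so that it runs without a DCU; it is not the kernel of the invention. The names `blocks_for`, `global_index`, and `cell_coords` are hypothetical, and a one-dimensional launch with the X index varying fastest is an assumption.

```c
#include <assert.h>

/* Number of thread blocks needed to cover n cells when each block
 * holds block_size threads (ceiling division). */
static unsigned blocks_for(unsigned n, unsigned block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* The flat thread id a kernel would form as
 * blockIdx.x * blockDim.x + threadIdx.x. */
static unsigned global_index(unsigned block_idx, unsigned block_dim,
                             unsigned thread_idx)
{
    return block_idx * block_dim + thread_idx;
}

/* Decompose a flat id into zero-based (i, j, k) on an ni x nj x nk grid,
 * with i varying fastest (assumed layout). Returns 0 for surplus threads
 * in the last block, which must not touch memory. */
static int cell_coords(unsigned id, unsigned ni, unsigned nj, unsigned nk,
                       unsigned *i, unsigned *j, unsigned *k)
{
    if (id >= ni * nj * nk)
        return 0;
    *i = id % ni;
    *j = (id / ni) % nj;
    *k = id / (ni * nj);
    return 1;
}
```

For a 10 x 10 x 10 grid and 256 threads per block, `blocks_for` gives 4 blocks, so 24 threads in the last block fall outside the grid and are rejected by `cell_coords`.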
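One possible reading of the double-layer boundary expansion in claim 5 is the general wide-halo technique: with two boundary layers per side, a three-point stencil can advance two iterations after a single exchange, because the inner boundary layer is recomputed locally. The sketch below shows this in one dimension in plain C. The averaging stencil, the names `smooth_step` and `two_steps_per_exchange`, and the 1-D layout are assumptions for illustration, not the scheme of the invention.

```c
#include <assert.h>
#include <string.h>

#define HALO 2  /* two redundant boundary layers on each side */

/* One 3-point averaging step over u[first..last] (inclusive).
 * tmp is scratch space of the same length as u. */
static void smooth_step(double *u, double *tmp, int n_total,
                        int first, int last)
{
    assert(first >= 1 && last <= n_total - 2);
    for (int i = first; i <= last; ++i)
        tmp[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0;
    memcpy(u + first, tmp + first,
           (size_t)(last - first + 1) * sizeof *u);
}

/* Advance a local block by two steps after a single halo fill.
 * u holds n_owned + 2*HALO values; owned cells sit at
 * indices HALO .. HALO + n_owned - 1. */
static void two_steps_per_exchange(double *u, double *tmp, int n_owned)
{
    int n_total = n_owned + 2 * HALO;
    /* Step 1: update owned cells plus the inner boundary layer
     * (redundant work that replaces one exchange). */
    smooth_step(u, tmp, n_total, HALO - 1, HALO + n_owned);
    /* Step 2: update owned cells only; their neighbours were
     * refreshed locally in step 1. */
    smooth_step(u, tmp, n_total, HALO, HALO + n_owned - 1);
}
```

After the two steps the owned cells match what a computation over the undivided domain would give, while both boundary layers are stale and need to be refreshed before the next pair of iterations.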
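The Y/Z domain division of step 4.1 and the constant per-device problem size of step 4.3 in claim 6 come down to simple bookkeeping, sketched below in plain C. The `SubDomain` type, the function names, the rank ordering (Y index varying fastest), and the requirement that extents divide evenly are all assumptions made for illustration.

```c
#include <assert.h>

/* Block of the global grid owned by one process (X is not split). */
typedef struct {
    int py, pz;             /* position in the process grid      */
    int j0, k0;             /* zero-based global start in Y, Z   */
    int ny_local, nz_local; /* cells owned along Y and Z         */
} SubDomain;

/* Map an MPI-style rank onto a procs_y x procs_z process grid. */
static SubDomain subdomain_for(int rank, int procs_y, int procs_z,
                               int ny, int nz)
{
    assert(rank >= 0 && rank < procs_y * procs_z);
    assert(ny % procs_y == 0 && nz % procs_z == 0);
    SubDomain s;
    s.py = rank % procs_y;
    s.pz = rank / procs_y;
    s.ny_local = ny / procs_y;
    s.nz_local = nz / procs_z;
    s.j0 = s.py * s.ny_local;
    s.k0 = s.pz * s.nz_local;
    return s;
}

/* Weak scaling: the global extent grows with the process count so
 * that each device keeps the same local extent. */
static int weak_scaled_extent(int local_extent, int procs_along_axis)
{
    return local_extent * procs_along_axis;
}
```

With 64 cells per device along each split axis and a 2 x 4 process grid, the global extents become 128 and 256, and every rank still owns a 64 x 64 block in Y and Z.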
Description
Heterogeneous transplanting and parallel optimizing method for multiphase flow numerical simulation software
Technical Field
The invention relates to a heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software, and belongs to the technical field of heterogeneous parallel computing in computational fluid dynamics.
Background
In ship and ocean engineering, aerospace engineering, and hydraulic equipment, the geometric complexity of engineering equipment keeps increasing, and the accuracy required of physical models of the equipment's flow field characteristics rises with it. Studies of flow mechanisms have likewise moved from the analysis of macroscopic phenomena to fine-grained investigation at microscopic scales. As computational fluid dynamics (CFD) advances toward higher accuracy and more complex application scenarios, it faces exponential growth in grid size and continual refinement of complex flow simulation models. Traditional serial computing, or small-scale CPU-based parallel computing, cannot meet the computational efficiency that high-accuracy CFD simulation demands, and even when a CPU-accelerator heterogeneous computing platform is introduced, existing schemes still struggle to support stable operation of large-scale parallel CFD tasks efficiently. It is therefore necessary to break through the key technical bottlenecks of hardware adaptation, data interaction, and parallel scheduling, so that high-performance computing capacity can unlock the application value of CFD simulation and improve the efficiency of engineering equipment development and performance optimization.
MultiPHydro is a domestically developed CFD simulation software package. It builds a numerical simulation method for cavitating flow fields on the finite difference method, and is mainly applied to fine numerical simulation of complex multiphase, multi-field flows. The MultiPHydro source code is written in Fortran. When it is ported to current high-performance heterogeneous parallel computing platforms, the ROCm open computing platform and the HIP programming interface, on which the DCU (a mainstream accelerator) relies, are insufficiently compatible with Fortran, so Fortran code cannot call DCU kernel functions directly to obtain acceleration. Cross-language adaptation must also resolve the intrinsic differences between Fortran and C in array storage order and function calling conventions. These issues directly restrict migration of the software to heterogeneous platforms and make it difficult to exploit the hardware's computing power fully. In addition, the computational logic of the finite difference method requires processes to exchange boundary grid data frequently, and in a DCU deployment the time cost of round-trip data copies and communication operations is extremely high, which directly reduces the overall computational efficiency of the software.
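The storage-order and calling-convention differences named above can be made concrete with a short C sketch. It assumes a three-dimensional field with 1-based Fortran indices and shows an offset mapping of the kind the claims describe, together with a lowercase, pointer-argument entry point guarded for C++ compilers. The names `fidx3` and `scale_field` are hypothetical and do not come from MultiPHydro.

```c
#include <assert.h>
#include <stddef.h>

/* Offset of Fortran element a(i,j,k) (1-based, column-major, extents
 * ni x nj x nk) inside the flat buffer that C receives. */
static size_t fidx3(int i, int j, int k, int ni, int nj, int nk)
{
    assert(i >= 1 && i <= ni && j >= 1 && j <= nj && k >= 1 && k <= nk);
    return (size_t)(i - 1)
         + (size_t)ni * ((size_t)(j - 1) + (size_t)nj * (size_t)(k - 1));
}

#ifdef __cplusplus
extern "C" {  /* keeps the symbol unmangled when built as C++ */
#endif
void scale_field(double *a, const int *ni, const int *nj, const int *nk,
                 const double *factor);
#ifdef __cplusplus
}
#endif

/* Hypothetical routine callable from Fortran: every argument is a
 * pointer because Fortran passes arguments by reference. */
void scale_field(double *a, const int *ni, const int *nj, const int *nk,
                 const double *factor)
{
    for (int k = 1; k <= *nk; ++k)
        for (int j = 1; j <= *nj; ++j)
            for (int i = 1; i <= *ni; ++i)  /* i innermost: contiguous */
                a[fidx3(i, j, k, *ni, *nj, *nk)] *= *factor;
}
```

Because the first Fortran index varies fastest in memory, keeping `i` in the innermost loop walks the buffer contiguously.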
Disclosure of Invention
The invention aims to provide a heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software that realizes efficient and stable operation of complex multiphase, multi-field flow numerical simulation on a heterogeneous high-performance computing platform. It addresses the technical problems caused by the Fortran-based nature of the existing MultiPHydro numerical computing software: difficult adaptation to heterogeneous platforms, inefficient cross-language data interaction, and high data transfer overhead between the host side (CPU) and the device side (DCU). The object of the invention is achieved by the following technical scheme. The heterogeneous migration and parallel optimization method for multiphase flow numerical simulation software of the invention comprises the following steps: step 1, cross-language adaptation and computing function reconstruction: the MultiPHydro numerical computing software is written in Fortran, whereas the DCU-side heterogeneous computing platform relies mainly on C/C++ interfaces; a cross-language migration method based on mixed Fortran and C programming resolves the compatibility problem and realizes efficient calling and reconstruction of the core computing modules in a heterogeneous environment, and specifically comprises the following sub-steps: step 1.1, establishing a cross-language call interface: an external C function interface is constructed on the Fortran side through interface statements, with a uniform lowercase naming scheme to avoid link errors caused by the compiler automatically appending underscores; consistency of the calling convention between the two languages is achieved by adding the extern "C" modifier on the C side, and