CN-121980061-A - GPU-based large grid partitioning data structure and subdivision acceleration method

CN121980061ACN 121980061 ACN121980061 ACN 121980061ACN-121980061-A

Abstract

The invention is suitable for the field of bulk grids, provides a large-scale bulk grid block data structure based on GPU and a subdivision acceleration method, which are mainly used for parallel subdivision and reconstruction of a large-scale bulk grid (volume Mesh), the method can fully utilize the shared memory under the GPU architecture, improves the local computing efficiency, avoids the global video memory overflow problem, and is suitable for engineering simulation, 3D modeling, physical solution, scientific computing and other scenes.

Inventors

HUANG ZHANGJIN
DING XIAOYING
Yuan Zhumin

Assignees

中国科学技术大学

Dates

Publication Date: 20260505
Application Date: 20260407

Claims (9)

1. The large-scale grid block data structure and the subdivision acceleration method based on the GPU are characterized by comprising the following steps: s1, dividing an input body grid into a plurality of sub-blocks, establishing a global-local bidirectional index table of topological elements for each sub-block, identifying sub-block boundaries, performing neighborhood ring expansion to establish k-ring neighborhood cache, adopting an improved sparse matrix form to bidirectionally store the adjacent relationship of data structures with adjacent dimensions in a single sub-block, and loading the sub-block and corresponding k-ring neighborhood data into a shared memory; s2, pre-calculating center point coordinates of the edge, the surface and the body in the shared memory according to subdivision rules, and executing edge point calculation, surface point calculation, body point calculation and old point update in parallel; S3, in the shared memory, topology assembly is carried out after subdivision calculation is completed, and each calculation stage is registered as a CUDA Graph node for batch scheduling; s4, writing back the calculation result in the sub-blocks to the global array according to the local-global index table, carrying out hash deduplication according to the global ID, and integrating the results of all the sub-blocks into the global grid structure to form a new subdivision grid; And S5, carrying out vertex combination on repeated points of the inter-block boundaries, and reconstructing minimum necessary adjacency at the boundaries to restore consistency.
2. The GPU-based large grid partitioning data structure and subdivision acceleration method according to claim 1 are characterized in that in S1, an input grid is divided into a plurality of sub-blocks, specifically, the input grid is divided into a plurality of sub-blocks in a spatially uniform and self-adaptive mode, and clustering calculation is carried out by using the gravity centers of all units of the grid in a K-Means clustering mode.
3. The GPU-based large mesh grid partitioning data structure and subdivision acceleration method of claim 1, wherein the improved sparse matrix employs a prefix and (prefix sum) array and an index value (value) array together to implement sparse matrix expression, the prefix and the index representing the neighboring edges of the vertex index the position of the segments in the index value array, and the indices of all neighboring edges of the vertex i are recorded in the values of the pre [ i ] th through pre [ i+1] th in the index value array.
4. A GPU-based large grid block data structure and subdivision acceleration method as claimed in claim 3, wherein said modified sparse matrix compresses direction information into an array of index values, with negative numbers representing opposite directions, and when the direction of the queried element is opposite to the current element, the corresponding index value is stored as the original index value minus one.
5. The GPU-based large mesh block data structure and subdivision acceleration method of claim 1, wherein the k-ring neighborhood cache stores a copy of read-only data necessary for computation, for boundary validity in computation within a sub-block, and the local index of the k-ring neighborhood element is greater than the local index of the element to which the sub-block actually belongs.
6. The GPU-based large mesh block data structure and subdivision acceleration method of claim 1, wherein the subdivision rule in S2 is a Catmull-Clark subdivision rule, and the edge point calculation, the face point calculation, and the old point update are performed in parallel by using a special calculation rule for the boundary element according to boolean mask information of whether the topology element is located at the global boundary of the mesh.
7. The GPU-based large grid partitioning data structure and subdivision acceleration method as claimed in claim 1, wherein the topology assembly in S3 is specifically: And constructing corresponding point, side and surface sets for each hexahedral unit, numbering the surfaces on the basis, further establishing the numbers of each side and each point according to the adjacent relation, assembling the corresponding body points, the surface points, the side points and the updated vertexes according to the numbers, and constructing one hexahedral unit into eight new hexahedral units.
8. The GPU-based large grid partitioning data structure and subdivision acceleration method of claim 1, wherein when S1 loads data into the shared memory, a segmentation and alignment policy is adopted, each thread bundle is responsible for aligning a segment of data, and synchronization is performed through __ syncthreads () function after loading is completed.
9. The GPU-based large grid partitioning data structure and subdivision acceleration method as claimed in claim 1, wherein the merging of the repetition points of the inter-block boundaries in S5 is performed, in particular by performing a hash and tolerance based vertex merging.

Description

GPU-based large grid partitioning data structure and subdivision acceleration method Technical Field The invention belongs to the field of body grids, and particularly relates to a large body grid block data structure based on a Graphic Processing Unit (GPU) and a subdivision acceleration method. Background The volume grid is a basic data representation form for accurately describing three-dimensional space discrete bodies in the scientific calculation fields of computational fluid mechanics, structural analysis, physical simulation and the like. The method is characterized by the adjacency relationship among vertexes, edges, faces and body units (such as tetrahedrons or hexahedrons), geometric and topological structures are described, and core calculation such as finite element solving, field value interpolation, physical constraint propagation and the like is supported. As model complexity increases, the volumetric mesh typically contains millions or billions of cells (e.g., tetrahedrons or hexahedrons), which become very large in data size. The traditional global subdivision method loads the whole grid to the GPU at one time, and global parallel computing can obtain high parallelism theoretically, but the following problems usually occur in practice: 1. And when the grid scale is overlarge, the global calculation needs to distribute the data structures of all nodes, edges, faces and body units at one time, and the upper limit of the video memory is easily exceeded. 2. The shared memory utilization rate is low, namely the shared memory of the GPU can only be used in blocks, and the global algorithm cannot fully utilize the high-speed characteristic of the shared memory, so that the memory access efficiency is low. 3. The data locality is poor, global computation involves a large number of cross-block data accesses, the access and storage modes are dispersed, the cache hit rate is low, and the throughput performance of the GPU is seriously restricted. 4. And the synchronization and scheduling overhead is high, and in the global calculation model, the dependency and synchronization among threads frequently occur, so that the GPU execution efficiency is reduced. Conventional improvements to these problems include splitting the grid into multiple sub-models for batch upload (but still recording adjacencies, accessing data by global thought), using multiple GPUs or distributed computing, or pre-computing topology at the CPU side. These methods either require complex cross-device communications or are still computationally efficient. GPU still faces the bottleneck of poor locality, large memory occupation and low expandability in a large-scale grid subdivision scene. Disclosure of Invention The invention aims to provide a large grid partitioning data structure based on a GPU and a subdivision acceleration method, and aims to solve the problems in the background technology. The invention is realized in such a way that the large grid partitioning data structure and the subdivision acceleration method based on the GPU comprise the following steps: And (3) determining the granularity of the block and dividing the space, dividing the input body grid into a plurality of sub-blocks according to a space uniform self-adaptive mode, and carrying out clustering calculation by using the gravity centers of all units of the grid by adopting a K-Means clustering mode. The block granularity is determined jointly by the shared memory upper limit of the target GPU, the thread bundle size, and the collaborative loading policy. A global-local bi-directional index table of topology elements therein is established for each block. Performing block boundary identification and neighborhood ring expansion, performing pre-analysis on neighborhood requirements according to a required subdivision rule, scanning each block boundary, marking cross-block data related to subdivision, establishing a k-ring neighborhood cache for the cross-block data, and storing read-only data copies necessary for calculation. After the construction of the local indexes of all topological elements of a single block is completed, adopting an improved sparse matrix form to bidirectionally store the adjacent relations of the data structures adjacent to each other, wherein the improved sparse matrix realizes sparse matrix expression by adopting a prefix and double-array of side index values together, and represents the opposite direction by negative numbers. And (3) carrying out shared memory layout, carrying out a loading and synchronizing protocol in a block, after a thread block is started, copying the block and k-ring neighborhood data of the block from a global memory to the shared memory by cooperation of all threads, and carrying out GPU thread block (block) synchronization once through __ syncthreads () function after loading. The parallel subdivision point calculation and topology assembly steps are as follows: According to the selected subdivision scheme,