CN-122019147-A - FPGA-based dynamic hypergraph hardware accelerator and acceleration method
Abstract
The invention relates to an FPGA-based dynamic hypergraph hardware accelerator and acceleration method. In the method, a central processing unit reads and parses an input pattern hypergraph to obtain a matching plan, sends the matching plan to an FPGA processing unit, and verifies constraints related to higher-order intersections and the unique-mapping condition of vertices. The FPGA processing unit locates and marks candidate results invalidated by an update operation; expands the neighborhood in the data hypergraph with the update edge as an anchor point to construct a local dependency subgraph; performs front pruning and multi-constraint screening based on the matching plan within the local dependency subgraph; performs bitmap intersection operations on the pruned candidate results for constraint verification, discarding candidate results that fail the verification; generates a normalized key from each candidate result that passes verification and performs a quick pre-check on it, achieving de-duplication pre-filtering; and sends the candidate results that pass the de-duplication pre-filtering to the central processing unit.
Inventors
- ZHANG YU
- GUO YULUO
- ZHAO JIN
- WU YONGDI
- WANG ZIXIAO
- JIN HAI
- LIAO XIAOFEI
Assignees
- Huazhong University of Science and Technology (华中科技大学)
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2026-01-07
Claims (10)
- 1. An FPGA-based dynamic hypergraph hardware accelerator, comprising: a central processing unit (101) that reads and parses the input pattern hypergraph to obtain a matching plan and sends the matching plan to an FPGA processing unit (102); and the FPGA processing unit (102), which locates and marks candidate results invalidated by an update operation, expands the neighborhood in the data hypergraph with the update edge as an anchor point to construct a local dependency subgraph, performs front pruning based on the matching plan and multi-constraint screening within the local dependency subgraph to reduce redundant candidates, performs bitmap intersection operations on the pruned candidate results for constraint verification, discards candidate results that fail the constraint verification to achieve early truncation, generates a normalized key from each candidate result that passes the constraint verification and performs a quick pre-check to achieve de-duplication pre-filtering, and sends the candidate results that pass the de-duplication pre-filtering to the central processing unit (101).
- 2. The dynamic hypergraph hardware accelerator of claim 1, wherein the central processing unit (101) comprises: a pattern analysis module (1011) that parses the pattern hypergraph to generate a matching order, degree constraints, intersection constraints, and symmetry-breaking constraints, forming the matching plan and sending it to the FPGA processing unit (102), the matching plan serving as the guiding basis for candidate-result generation and screening; an incremental log management module (1012) that receives dynamic update requests, maintains the hypergraph storage structure under a deletion-before-addition policy, generates incremental log entries, and sends a read-only update view to the FPGA processing unit (102); a complex consistency checking module (1013) that verifies constraints related to higher-order intersections and the unique-mapping condition of vertices, and handles candidate expansion and rollback processing when preset limits are exceeded, to ensure correctness; and a global result index maintenance module (1014) that stores candidate results in the global index table to ensure consistency of the result set in dynamic scenarios.
- 3. The dynamic hypergraph hardware accelerator of claim 1 or 2, wherein the FPGA processing unit (102) comprises: a candidate invalidation processing module (1021) that, according to the incremental log, locates and marks candidate results affected by dynamic changes and adds invalidation marks to the global index table; a dependency subgraph generation module (1022) that uses the updated edge as an anchor point, expands the adjacent structure of the updated edge within a local range, constructs a local dependency subgraph, and narrows the candidate generation range; a candidate generation and pruning module (1023) that performs candidate expansion based on the matching plan and multi-constraint screening within the local dependency subgraph to reduce redundant candidates at the source; an intersection constraint calculation module (1024) that computes pairwise and multi-way intersections of the pruned candidate results for constraint verification and discards candidate results that fail the verification, achieving early truncation; a de-duplication pre-filtering module (1025) that generates a normalized key for each candidate result, performs a quick pre-check on the normalized key, and sends the new candidate results surviving de-duplication pre-filtering to the central processing unit (101) for exact querying; and a result write module (1026) that writes candidate results passing the de-duplication pre-filtering into the global result index table.
- 4. The dynamic hypergraph hardware accelerator according to any one of claims 1-3, wherein the candidate invalidation processing module (1021) in the FPGA processing unit (102) comprises: an input FIFO and operation splitter that receive the incremental log and distinguish operation types; an index finder that locates candidate results containing the updated edge and stores a mapping between hyperedge numbers and candidate-result ID lists; a comparator array and invalidation marker that compare, in parallel, the hyperedge numbers in the incremental log against the candidate index buffer and mark hit candidate results as invalid; a first writer that updates the invalidated candidate state in the global result index table; and an FSM controller that coordinates the operation sequence of the subunits and returns an ACK signal to the central processing unit (101).
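The invalidation step of claim 4 can be illustrated in software terms: an index from hyperedge number to the IDs of candidate results that use that hyperedge, so that one incremental-log entry can mark every affected candidate invalid in a single lookup. This is a minimal sketch under assumed names and data layouts; the patent's hardware uses a comparator array and dedicated index buffers rather than a dictionary.

```python
from collections import defaultdict

class CandidateIndex:
    """Sketch of the hyperedge -> candidate-result mapping in claim 4."""

    def __init__(self):
        self.by_edge = defaultdict(set)   # hyperedge number -> candidate IDs
        self.invalid = set()              # candidates marked stale by updates

    def register(self, cand_id, edge_ids):
        # Record which hyperedges a candidate result depends on.
        for e in edge_ids:
            self.by_edge[e].add(cand_id)

    def apply_log_entry(self, op, edge_id):
        # op is 'add' or 'del'; deleting a hyperedge invalidates every
        # candidate result that contains it (the invalidation-marker step).
        if op == "del":
            hit = self.by_edge.pop(edge_id, set())
            self.invalid |= hit
            return hit
        return set()

idx = CandidateIndex()
idx.register(0, [10, 11])
idx.register(1, [11, 12])
stale = idx.apply_log_entry("del", 11)
print(sorted(stale))  # both candidates depend on hyperedge 11 -> [0, 1]
```

In hardware, the set lookup becomes a parallel compare of the log entry's hyperedge number against the candidate index buffer, with hits routed to the invalidation marker.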
- 5. The dynamic hypergraph hardware accelerator of any one of claims 1-4, wherein the dependency subgraph generation module (1022) comprises: an input FIFO and task parser that parse the newly added or modified update-edge information; a neighborhood retriever that retrieves all adjacent hyperedges sharing vertices with the updated edge, quickly querying, based on an inverted index structure, all hyperedge sets containing the same vertices according to the vertex set of the updated edge; a K-order neighborhood expander that expands the first-order or multi-order neighborhood of the updated edge according to the pattern's maximum dependency order K; a dependency subgraph assembler that combines the updated edge and the adjacent edge sets obtained by its expansion into a local dependency subgraph; and a second writer that stores the local dependency subgraph in a storage module (200) for subsequent pipeline processing.
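The neighborhood retrieval and K-order expansion of claim 5 amount to a bounded breadth-first expansion over hyperedges that share vertices, driven by an inverted index (vertex → hyperedges containing it). The following is an illustrative sketch, not the patent's hardware design; function names and the dictionary representation are assumptions.

```python
from collections import defaultdict

def build_inverted_index(hyperedges):
    """hyperedges: dict edge_id -> set of vertex ids (the inverted index
    structure of claim 5 maps each vertex to the hyperedges containing it)."""
    inv = defaultdict(set)
    for eid, verts in hyperedges.items():
        for v in verts:
            inv[v].add(eid)
    return inv

def dependency_subgraph(hyperedges, inv, anchor_eid, k):
    """Collect hyperedges reachable from the updated (anchor) edge within
    k vertex-sharing hops: the K-order neighborhood expansion step."""
    frontier, seen = {anchor_eid}, {anchor_eid}
    for _ in range(k):
        nxt = set()
        for eid in frontier:
            for v in hyperedges[eid]:
                nxt |= inv[v]          # all hyperedges sharing vertex v
        frontier = nxt - seen
        seen |= frontier
    return seen

H = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {9}}
inv = build_inverted_index(H)
print(sorted(dependency_subgraph(H, inv, 0, 1)))  # edge 1 shares vertex 2
print(sorted(dependency_subgraph(H, inv, 0, 2)))  # edge 2 reached via edge 1
```

Edge 3 shares no vertices with the anchor's neighborhood and is never pulled in, which is exactly the point: incremental computation stays confined to the update's influence range.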
- 6. The dynamic hypergraph hardware accelerator of any one of claims 1-5, wherein the candidate generation and pruning module (1023) comprises: an input parser that parses the vertices and adjacency relationships of the local dependency subgraph; a degree/label checker that performs preliminary screening of candidate results and eliminates candidates that do not meet the constraints; a distinctness checker that checks whether vertices in the mapping table are duplicated, immediately discarding a candidate result if the same vertex is found assigned to two pattern vertices; a symmetry breaker that eliminates equivalent candidate arrangements arising from pattern-hypergraph symmetry; and a candidate expander that generates legal candidate results and outputs them to the intersection constraint calculation module (1024).
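The three screening stages of claim 6 can be sketched as predicates over a partial mapping from pattern vertices to data vertices: a degree check, a distinctness (injectivity) check, and a simple symmetry-breaking rule that imposes an order on interchangeable pattern vertices. All names and the particular symmetry rule are illustrative assumptions, not the patent's exact checks.

```python
def passes_degree(cand, degree, min_deg):
    # Degree/label checker: every mapped data vertex must meet the
    # minimum degree required by the pattern constraint.
    return all(degree[v] >= min_deg for v in cand.values())

def is_injective(cand):
    # Distinctness checker: no data vertex may serve two pattern vertices.
    vals = list(cand.values())
    return len(vals) == len(set(vals))

def breaks_symmetry(cand, sym_pairs):
    # Symmetry breaker: for each interchangeable pattern-vertex pair (a, b),
    # keep only the arrangement with map[a] < map[b].
    return all(cand[a] < cand[b] for a, b in sym_pairs)

degree = {10: 3, 11: 1, 12: 2}
cand = {"u1": 10, "u2": 12}            # pattern vertex -> data vertex
ok = (passes_degree(cand, degree, 2)
      and is_injective(cand)
      and breaks_symmetry(cand, [("u1", "u2")]))
print(ok)  # True: degrees >= 2, mapping injective, and 10 < 12
```

Discarding a candidate the moment any predicate fails is what "reduces redundant candidates at the source" before the more expensive intersection verification runs.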
- 7. The dynamic hypergraph hardware accelerator of any one of claims 1-6, wherein the intersection constraint calculation module (1024) in the FPGA processing unit (102) comprises: a bitmap register that encodes the vertex set of a candidate hyperedge into bitmap form; a logic AND operator that performs a bitwise logical AND on the bitmaps of two or more candidate hyperedges to obtain their intersection result; a popcount counter that counts the number of vertices in the intersection result and compares it against the intersection cardinality in the pattern constraint; and an early truncation logic that immediately terminates the computation and discards the candidate result once it is determined that the current candidate cannot satisfy the constraint.
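Claim 7's datapath can be mirrored in software: vertex sets encoded as bitmaps (Python integers here, bitmap registers in hardware), a bitwise AND for the intersection, a popcount compared against the required cardinality, and early truncation as soon as the running intersection can no longer meet the bound. A minimal sketch under those assumptions:

```python
def to_bitmap(vertices):
    # Bitmap register: set bit v for each vertex v in the hyperedge.
    bm = 0
    for v in vertices:
        bm |= 1 << v
    return bm

def meets_intersection(bitmaps, min_card):
    """AND the candidate hyperedges' bitmaps; since AND can only shrink
    the set, stop as soon as the popcount drops below the constraint
    (the early-truncation logic of claim 7)."""
    acc = ~0  # all-ones accumulator
    for bm in bitmaps:
        acc &= bm
        if bin(acc).count("1") < min_card:
            return False  # early truncation: constraint already violated
    return True

e1 = to_bitmap({1, 2, 3, 4})
e2 = to_bitmap({2, 3, 4})
e3 = to_bitmap({3, 9})
print(meets_intersection([e1, e2], 2))      # intersection {2,3,4}, size 3
print(meets_intersection([e1, e2, e3], 2))  # shrinks to {3}, fails
```

The monotonicity of AND is what makes early truncation sound: once the popcount falls below the required cardinality, no further operand can restore it.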
- 8. The dynamic hypergraph hardware accelerator of any one of claims 1-7, wherein the de-duplication pre-filtering module (1025) in the FPGA processing unit (102) comprises: a normalized key generator that maps a candidate result to a unique key value; a Bloom filter that quickly determines whether a candidate already exists; and a pre-filtering output that sends possibly-new candidate results to the central processing unit (101) for exact querying.
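The pre-filter of claim 8 combines an order-independent normalized key with a probabilistic membership check: "possibly seen" candidates go to the CPU for an exact query, while "definitely new" ones skip it. The hash choice, bit width, and hash count below are illustrative assumptions; the patent leaves those to the hardware implementation.

```python
import hashlib

def normalized_key(mapping):
    """Canonical, order-independent encoding of {pattern v -> data v},
    so equivalent candidates produce identical keys."""
    return ",".join(f"{p}:{d}" for p, d in sorted(mapping.items()))

class BloomFilter:
    def __init__(self, nbits=256, nhashes=3):
        self.nbits, self.nhashes, self.bits = nbits, nhashes, 0

    def _positions(self, key):
        # Derive nhashes bit positions from salted SHA-256 digests.
        for i in range(self.nhashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.nbits

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def maybe_contains(self, key):
        """False => definitely new; True => possibly seen (exact query
        on the CPU side resolves the false positives)."""
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = BloomFilter()
k = normalized_key({"u2": 7, "u1": 3})
bf.add(k)
print(bf.maybe_contains(normalized_key({"u1": 3, "u2": 7})))  # True
```

Because the filter never reports a false negative, every genuinely new candidate still reaches the CPU, which is what keeps the result set exact while reducing global-index accesses.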
- 9. An FPGA-based dynamic hypergraph hardware acceleration method, comprising the following steps: the central processing unit (101) reads and parses the input pattern hypergraph to obtain a matching plan and sends the matching plan to the FPGA processing unit (102); the FPGA processing unit (102) locates and marks candidate results invalidated by an update operation, expands the neighborhood in the data hypergraph with the update edge as an anchor point to construct a local dependency subgraph, performs front pruning based on the matching plan and multi-constraint screening within the local dependency subgraph to reduce redundant candidates, performs bitmap intersection operations on the pruned candidate results for constraint verification, discards candidate results that fail the constraint verification to achieve early truncation, generates a normalized key from each candidate result that passes the constraint verification and performs a quick pre-check to achieve de-duplication pre-filtering, and sends the candidate results that pass the de-duplication pre-filtering to the central processing unit (101).
- 10. An incremental pattern matching hardware accelerator for dynamic hypergraphs, comprising a central processing unit (101) and an FPGA processing unit (102) communicatively connected to the central processing unit (101), wherein the FPGA processing unit (102) comprises: a dependency subgraph generation module (1022) that, when the dynamic hypergraph receives an addition or modification update, constructs a local dependency subgraph using the updated hyperedge as an anchor point through first-order/multi-order neighborhood expansion within a limited neighborhood radius, so as to confine the incremental computation to the update's influence range and avoid recomputation over the full graph; a candidate generation and pruning module (1023) that generates a candidate mapping set and performs pre-pruning according to the pattern matching plan within the local dependency subgraph; an intersection constraint calculation module (1024) that performs hypergraph adjacency and intersection constraint verification on the candidate mapping set, the module comprising a bitmap register, a logic AND operator, a popcount counter, and an early truncation logic, and performing intersection computation and early stopping in a hardware-pipelined manner to reduce the computational cost of failing candidate results; a de-duplication pre-filtering module (1025) comprising a normalized key generator that converts a candidate mapping relationship into an order-independent normalized key, and a Bloom filter that probabilistically pre-screens the normalized key so that only possibly-new candidate results are output to the central processing unit (101) for exact de-duplication, thereby reducing global-index access pressure; and the central processing unit (101) being configured to perform exact query/write on the possibly-new candidate results output via the de-duplication pre-filtering module (1025) and to perform rollback processing when the computational complexity of candidate expansion or intersection constraint verification exceeds a preset threshold.
Description
FPGA-based dynamic hypergraph hardware accelerator and acceleration method
Technical Field
The invention relates to the technical field of hardware accelerators, and in particular to an FPGA-based dynamic hypergraph hardware accelerator and acceleration method.
Background
With the rapid development of big data and artificial intelligence technology, graph-structured data has become a core representation for describing complex relational networks and plays a key role in social network analysis, recommendation systems, traffic prediction, bioinformatics, knowledge graphs, and other fields. Traditional graph computation mainly targets static ordinary graphs (composed of vertices and edges), but relationships among vertices in real scenarios often exhibit higher-order, multi-way characteristics and must be modeled by a hypergraph. A hypergraph takes the hyperedge as its basic unit: a single hyperedge may connect more than two vertices, so it can express group interactions and higher-order semantics more accurately, and it is markedly superior to the traditional graph model in tasks such as community discovery, hyperedge prediction, and hypergraph neural networks (HGNNs). However, hypergraphs in practical applications often evolve dynamically over time: the creation or dissolution of groups on a social platform adds or deletes the corresponding hyperedges, and changes in user behavior in an e-commerce recommendation system cause real-time updates to association relationships. This high-frequency dynamism poses a serious challenge to existing computing frameworks. Current mainstream hypergraph pattern mining and matching methods are implemented on CPUs/GPUs and have four main bottlenecks:
1. High cost of full recomputation: each structural update requires recomputing the entire candidate set and matching relationships, increasing redundant computation and latency.
2. Low memory-access efficiency: higher-order connections make adjacency information complex to store, and irregular memory accesses cause cache misses and a bandwidth bottleneck.
3. Severe candidate redundancy: a large number of invalid candidate results are generated during mining, significantly increasing the burden of intersection computation and consistency verification.
4. Lack of hardware support: existing general-purpose parallel computing frameworks (such as GPUs) mainly optimize regular dense tensor computation; although some improvement schemes attempt optimization through sparse matrix representations (such as incidence matrices), they struggle with the distinctive higher-order associativity and irregular pattern dependence of hypergraph data, so severe access conflicts and computational redundancy arise when processing hypergraph intersection operations, limiting performance.
Existing improvement schemes (such as incremental computation, pruning strategies, and index acceleration) are confined to the software level and can hardly exploit hardware parallelism and pipelining. Especially in dynamic hypergraph scenarios, these methods cannot meet the dual requirements of low-latency updating and high-throughput mining.
Therefore, a software-hardware co-acceleration scheme for dynamic hypergraphs needs to be designed: core processes such as candidate generation, intersection constraint verification, and result maintenance are optimized at the hardware level through the customizable, parallel architecture of the FPGA, and incremental recomputation over local dependency subgraphs is supported to avoid redundant global traversal, thereby significantly improving the real-time efficiency and throughput of dynamic hypergraph pattern mining while guaranteeing computational accuracy. For example, CN115328850A discloses a hardware accelerator for hypergraph processing and a method of operating the same, the hardware accelerator comprising a data loader, an address translator, a task trigger, a processor, and a combiner. The data loader reads hypergraph partition data from off-chip memory in the order of the hypergraph data organization and partitioning under a preset data-centric load-trigger-merge execution model; the address translator deploys the hypergraph data into the processor's private registers and/or buffers according to the priority order of the loaded data and records offset information; the task trigger generates computation tasks from the loaded data and dispatches them to the processor; the processor is used for re