CN-121980060-A - Point-to-point query FPGA acceleration system and method for stream graph data

CN121980060A

Abstract

A point-to-point query FPGA acceleration system and method for stream graph data is disclosed. An input interface unit receives query requests and update data and distributes them by type. A CAG maintenance and storage unit exploits the asymmetry between source and destination vertices to screen high-frequency edges and construct an on-chip contribution-aware graph (CAG). An out-of-order update scheduler decides between deferred processing and immediately triggered recomputation by judging whether a deleted edge hits the current critical path. When recomputation is triggered, a high-speed boundary extraction unit searches the CAG for the latest path length and broadcasts it as a global pruning boundary. A parallel pruning and rollback processing array then judges in parallel whether vertex states fall into a stale interval based on the global pruning boundary; if so, those states are synchronously reset, and incremental iterative computation runs until the result converges and is output. By means of a hardware-level index and rollback mechanism, the invention markedly improves computational accuracy and achieves microsecond-level response under dynamic graph updates.
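To make the scheduling rule in the abstract concrete, the following is a minimal software sketch of the out-of-order update scheduler's decision: a deletion that hits an edge on the current critical path triggers immediate recomputation and flushes the buffered safe updates, while any other update is deferred to a write buffer. All function and variable names here are illustrative, not taken from the patent, and the hardware pipeline is of course not modeled.

```python
def schedule_update(op, critical_path_edges, write_buffer):
    """Return ('trigger', flushed_ops) on a critical-path hit, else ('deferred', None).

    op is ('delete', (u, v)) or ('insert', (u, v));
    critical_path_edges is the set of edges forming the current critical path.
    """
    kind, edge = op
    if kind == "delete" and edge in critical_path_edges:
        flushed = list(write_buffer)   # package and send the buffered safe updates
        write_buffer.clear()
        return "trigger", flushed
    write_buffer.append(op)            # non-critical: safe update, deferred
    return "deferred", None

# Example: critical path A->B->C; deleting (A, B) hits it, other updates do not.
critical = {("A", "B"), ("B", "C")}
buf = []
print(schedule_update(("insert", ("X", "Y")), critical, buf))  # ('deferred', None)
print(schedule_update(("delete", ("X", "Z")), critical, buf))  # ('deferred', None)
print(schedule_update(("delete", ("A", "B")), critical, buf))  # trigger, flushes both
```

The key design point, mirrored from the abstract, is that only critical-path deletions pay the cost of recomputation; everything else accumulates in the buffer until a trigger packages it.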

Inventors

  • ZHANG YU
  • HE ZIRUI
  • FU YUTAO
  • ZHAO JIN
  • LIAO XIAOFEI
  • JIN HAI

Assignees

  • Huazhong University of Science and Technology (华中科技大学)

Dates

Publication Date
2026-05-05
Application Date
2026-02-05

Claims (10)

  1. A point-to-point query FPGA acceleration system for stream graph data, characterized by comprising: an input interface unit (310) for parsing point-to-point query requests and stream graph update data and distributing tasks according to data type; a CAG maintenance and storage unit (220) for screening, based on the asymmetry between the source vertex and the destination vertex, high-frequency access edges with path contribution from the original full graph, and constructing a contribution-aware graph with a subgraph structure, so as to continuously maintain the topology relevant to the current query request during dynamic updates of the graph data; an out-of-order update scheduler (210) for determining, by judging whether a deleted edge hits the current critical path, whether to place the update operation into a buffer queue for deferred processing or to immediately generate a trigger signal that activates a recomputation flow; a high-speed boundary extraction unit (230) which extracts, through a hardware depth-first search pipeline, the current critical path length from the source vertex to the destination vertex in the contribution-aware graph as a global pruning boundary, and broadcasts it to a parallel pruning and rollback processing array (110) when recomputation is triggered; and the parallel pruning and rollback processing array (110), which receives the global pruning boundary, judges in parallel whether the vertex states affected by the graph update fall into a stale interval formed by the compact lower bound of the previous snapshot and the current global pruning boundary, and, if so, synchronously resets the state values of the affected vertices and performs incremental iterative computation until all vertex states converge, then outputs the query result.
  2. The system of claim 1, wherein the CAG maintenance and storage unit (220) comprises: a parallelism-aware tracker (221) which selects, by comparing the numbers of active nodes in the source-vertex direction and the destination-vertex direction, the side with fewer active nodes as the tracking direction, and screens out, on the original full graph, the relevant paths and high-frequency access edges connecting the source vertex and the destination vertex; and a CAG storage area (222) for storing the screened high-frequency access edges so as to construct and dynamically update the topology of the contribution-aware graph.
  3. The system of claim 2, wherein the parallelism-aware tracker (221) screens high-frequency access edges by: calculating, with a hardware adder, the accumulated path weight passing through the current edge and comparing it, with a hardware comparator, against a preset threshold; and, in response to a determination that the accumulated path weight is less than the preset threshold, generating a write-enable signal to write the corresponding edge data into the CAG storage area (222), wherein the preset threshold is set based on a scaling factor of the current global pruning boundary.
  4. The system according to any one of claims 1 to 3, wherein the out-of-order update scheduler (210) comprises: a critical path matching logic circuit (211) which parses the delete instruction, extracts the edge identifier, and compares it, as an index key, against the edge set stored in registers that forms the current critical path; and a write buffer queue (212) which temporarily stores the corresponding non-critical-path update operations when the critical path matching logic circuit (211) outputs a miss signal, so as to resolve read-write conflicts and improve system throughput.
  5. The system of any one of claims 1 to 4, wherein the critical path matching logic circuit (211) in the out-of-order update scheduler (210) is configured to: when it is determined that the deleted edge hits the critical path, immediately generate a high-priority trigger signal to activate the high-speed boundary extraction unit (230) and trigger the write buffer queue (212) to package and send the buffered data; and, when it is determined that the deleted edge misses the critical path, generate a non-critical match signal, marking the delete operation as a safe update and directing the data path to the write buffer queue (212).
  6. The system according to any one of claims 1 to 5, wherein the parallel pruning and rollback processing array (110) comprises a number of homogeneous processing units (110a), each processing unit (110a) comprising: a rollback judgment and comparison circuit (113) comprising two parallel hardware comparators and a logic AND gate, for judging whether the distance state value of a vertex lies in the interval between the compact lower bound of the previous snapshot and the global pruning boundary, and outputting a rollback trigger signal; a state reset circuit (111) comprising a two-to-one multiplexer controlled by the rollback trigger signal, one input of which is connected to a constant-infinity generator and the other to a state update and propagation logic circuit (112); and the state update and propagation logic circuit (112), composed of a hardware adder and a comparator, for calculating the sum of a neighboring vertex's distance state value and the edge weight and, when the update condition is satisfied, outputting the result to the state reset circuit (111), which updates the value of the state register (400) in the processing unit (110a) according to the judgment result.
  7. The system according to any one of claims 1 to 6, wherein each processing unit (110a) in the parallel pruning and rollback processing array (110) is configured to: when the rollback judgment and comparison circuit (113) determines that a vertex satisfies LB ≤ Sp(v) < UB, where LB is the compact lower bound of the previous snapshot, Sp(v) is the distance state value of the vertex, and UB is the current global pruning boundary, have the logic AND gate output an active level that enables the state reset circuit (111) to forcibly reset the vertex state in the state register (400) to infinity, thereby eliminating the computational redundancy caused by stale pruning; and, when the rollback trigger signal is inactive, have the state reset circuit (111) open the default path, allowing the state update and propagation logic circuit (112) to perform normal relaxation operations until the path costs of all vertices in the array converge.
  8. A point-to-point query FPGA acceleration method for stream graph data, characterized by comprising the following steps: parsing point-to-point query requests and stream graph update data, and distributing tasks according to data type; screening, based on the asymmetry between the source vertex and the destination vertex, high-frequency access edges with path contribution from the original full graph, and constructing a contribution-aware graph with a subgraph structure, so as to continuously maintain the topology relevant to the current query request during dynamic updates of the graph data; determining, by judging whether the deleted edge hits the current critical path, whether to place the update operation into a buffer queue for deferred processing or to immediately generate a trigger signal that activates a recomputation flow; when recomputation is triggered, extracting, through a hardware depth-first search pipeline, the current critical path length from the source vertex to the destination vertex in the contribution-aware graph as a global pruning boundary, and broadcasting it to a parallel pruning and rollback processing array (110); and judging in parallel whether the affected vertex states fall into the stale interval and, if so, synchronously resetting the state values of the affected vertices and performing incremental iterative computation until all vertex states converge, then outputting the query result.
  9. The method of claim 8, wherein screening, based on the asymmetry between the source vertex and the destination vertex, high-frequency access edges with path contribution from the original full graph and constructing the contribution-aware graph with a subgraph structure comprises: selecting, by comparing the numbers of active nodes in the source-vertex direction and the destination-vertex direction, the side with fewer active nodes as the tracking direction, and screening out, on the original full graph, the relevant paths and high-frequency access edges connecting the source vertex and the destination vertex; and storing the screened high-frequency access edges to construct and dynamically update the topology of the contribution-aware graph.
  10. The method according to claim 8 or 9, wherein judging whether the deleted edge hits the current critical path comprises: parsing the delete instruction, extracting the edge identifier, and comparing it, as an index key, against the edge set stored in registers that forms the current critical path; when it is determined that the deleted edge hits the critical path, immediately generating a high-priority trigger signal to activate the high-speed boundary extraction unit (230) and triggering the write buffer queue (212) to package and send the buffered data; and, when it is determined that the deleted edge misses the critical path, generating a non-critical match signal, marking the delete operation as a safe update and directing the data path to the write buffer queue (212).
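The stale-interval rollback and relaxation recited in claims 6 and 7 can be illustrated with a minimal software sketch. The patent describes hardware circuits (comparators, an AND gate, a multiplexer feeding a constant-infinity generator); the function below only models the logic: a vertex whose distance state Sp(v) satisfies LB ≤ Sp(v) < UB is reset to infinity, and then normal relaxation runs under the global pruning boundary UB until convergence. All names are illustrative assumptions.

```python
import math

def rollback_and_relax(sp, edges, lb, ub):
    """sp: dict vertex -> distance state; edges: list of (u, v, weight);
    lb: compact lower bound of the previous snapshot; ub: global pruning boundary."""
    # Rollback judgment: per vertex, two comparisons joined by a logical AND.
    for v, d in sp.items():
        if lb <= d < ub:
            sp[v] = math.inf          # state reset: force the stale state to infinity
    # Incremental iterative relaxation, pruned by the global boundary ub.
    changed = True
    while changed:
        changed = False
        for u, v, w in edges:
            cand = sp[u] + w          # hardware adder: neighbor state + edge weight
            if cand < sp[v] and cand < ub:   # update condition with pruning
                sp[v] = cand
                changed = True
    return sp

# Example: vertex "a" (state 6) falls into the stale interval [4, 10) and is
# reset; relaxation then recomputes it from "s" and improves "t" as well.
print(rollback_and_relax({"s": 0.0, "a": 6.0, "t": 11.0},
                         [("s", "a", 2.0), ("a", "t", 3.0)], 4.0, 10.0))
```

The reset-then-relax order matters: without the rollback, vertex "a" would keep a state derived from a pruning decision that the graph update invalidated, which is exactly the accuracy problem the claims address.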

Description

Point-to-point query FPGA acceleration system and method for stream graph data

Technical Field

The invention relates to the technical field of big data processing and high-performance computing, and in particular to a point-to-point query FPGA acceleration system and method for stream graph data.

Background

Graph computation plays a key role in modern data analysis and is widely applied in social network analysis, financial risk control, path planning, recommendation systems, and other fields. Point-to-point (P2P) queries, such as shortest path, widest path, and connectivity analysis, are among the most basic and most frequent operations in graph analysis. As real-time data is generated at ever higher rates, graph data is typically updated dynamically (including the addition and deletion of edges) in a streaming fashion, which requires the computing system to maintain and update analysis results on the dynamic graph in real time. Existing stream graph computing systems generally adopt an incremental processing model to avoid full recomputation, but CPU-based software schemes have significant limitations when processing point-to-point queries. First, in the initial stage of incremental computation, lacking an effective boundary constraint, the system must perform a large number of invalid graph traversals to determine a new pruning threshold, seriously wasting computational resources. Second, when the graph structure changes, pruning conditions based on the previous snapshot can become invalid (stale pruning); if the previous computation state is directly reused without an effective rollback mechanism, the result is inaccurate. In addition, on a general-purpose CPU architecture, maintaining an auxiliary index structure (such as a dependency tree) incurs huge memory overhead and random-access latency, making microsecond-level real-time response difficult to achieve.
Therefore, a dedicated processing architecture is needed that exploits hardware acceleration to achieve fast boundary extraction under stream graph updates, accurate incremental pruning, and guaranteed computational accuracy. CN118484818A discloses a privacy-preserving subgraph matching method for streaming graphs, in which a trusted terminal encrypts the query data to be processed and the edge data in the streaming graph based on a replicated secret sharing mechanism, sends secret shares of the edge data and of the query data to a first terminal, a second terminal, and a third terminal, respectively, and sends to each of the three terminals the plaintext of the edge indices in the subqueries to be matched; the three terminals then determine the subgraph matching result of the query in the streaming graph based on their locally held data. That scheme encrypts query data and edge data with replicated secret sharing and distributes them to multiple terminals to complete subgraph matching cooperatively. However, its secret sharing and multi-terminal cooperation not only fail to address the memory-overhead problem but also introduce additional communication latency and computational overhead contrary to microsecond-level real-time requirements, so it cannot effectively remedy the key defects of the prior art, such as slow boundary extraction, pruning invalidation, and response latency.
Furthermore, although the applicant has studied numerous documents and patents, the text does not exhaustively recite every detail; this does not mean the invention lacks such prior-art features, but rather that the invention may incorporate features of the prior art, and the background art to which the invention pertains remains as described above.

Disclosure of Invention

The technical problem the invention aims to solve is that, when performing point-to-point (P2P) query analysis, existing stream graph computing systems lack a hardware-level index structure and pruning mechanism, so that during dynamic updates of the graph structure (such as adding or deleting edges) they suffer from slow boundary extraction, massive computational redundancy caused by invalid traversal, and inaccurate results caused by invalidated pruning conditions. To solve these technical problems, the invention provides a point-to-point query FPGA acceleration system for stream graph data, which comprises an input interface unit, a CAG maintenance and storage unit, a high-speed boundary extraction unit, an out-of-order update scheduler, and a parallel pruning and rollback processing array.
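As a concrete illustration of the edge-screening rule recited in claim 3, the following sketch models the CAG maintenance unit's admission test: an edge is written into the contribution-aware graph only if the accumulated path weight through it stays below a threshold derived from the current global pruning boundary via a scaling factor. The function and parameter names are assumptions for illustration; the patent implements this with a hardware adder and comparator.

```python
def screen_edge(acc_weight_to_u, edge_weight, ub, scale=1.0):
    """Return True (write-enable) if the edge can still contribute a path
    shorter than scale * ub, the threshold derived from the pruning boundary."""
    accumulated = acc_weight_to_u + edge_weight   # hardware adder
    threshold = scale * ub                        # threshold from the boundary
    return accumulated < threshold                # hardware comparator

# Example with ub = 10: accumulated weight 7 plus edge weight 2 passes
# (9 < 10), while 7 plus 4 is rejected (11 >= 10).
print(screen_edge(7.0, 2.0, 10.0))  # True
print(screen_edge(7.0, 4.0, 10.0))  # False
```

The effect is that the on-chip CAG retains only edges that can plausibly lie on a path competitive with the current critical path, which is what keeps the subgraph small enough for the hardware depth-first search pipeline to traverse quickly.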