
CN-121981034-A - Automatic verification method and system for FPGA network card


Abstract

The invention relates to the technical field of integrated circuit verification and discloses an automatic verification method and system for an FPGA network card. In the method, the FPGA network card records in real time a causal fingerprint sequence representing the logical relationships between events, and calculates a state entropy representing the degree of dynamic change of a module. When a functional error occurs, the upper computer reconstructs a global causal graph from the causal fingerprint sequence and locates root cause candidates in the graph using the abnormal time points indicated by the state entropy. It then performs closed-loop verification of the root cause candidates by executing hypothetical error injection in a simulation environment. Finally, based on the verification conclusion, it generates a monitoring configuration file containing attention points or shielding points and transmits the file to the FPGA network card to dynamically adjust the recording behavior of subsequent causal events. The invention enables rapid and accurate location of the fault root cause, high-credibility verification of the conclusion, and adaptive optimization of the monitoring strategy.

Inventors

  • ZHANG KUN

Assignees

  • 芯超越信息技术(杭州)有限公司

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. An automatic verification method for an FPGA network card, characterized by comprising the following steps:
     S1, recording causal events generated by a plurality of functional modules in the FPGA network card when the FPGA network card operates, and forming a causal fingerprint sequence;
     S2, monitoring a state vector of at least one of the plurality of functional modules, and calculating in real time a state entropy representing the degree of dynamic change of the state vector;
     S3, uploading the causal fingerprint sequence to an upper computer when the FPGA network card has a functional error;
     S4, reconstructing, by the upper computer and based on the causal fingerprint sequence, a global causal graph representing the causal relations of the events before the occurrence of the functional error, and locating at least one root cause candidate according to the state entropy and the global causal graph;
     S5, controlling, by the upper computer, a simulation environment and executing hypothetical error injection aimed at the root cause candidate, so as to verify whether the root cause candidate is a root cause of the functional error; and
     S6, generating, by the upper computer, a monitoring configuration file based on the verification conclusion of the hypothetical error injection, and transmitting the monitoring configuration file to the FPGA network card for dynamically adjusting the recording behavior of subsequent causal events.
  2. The automatic verification method for an FPGA network card according to claim 1, wherein the step S1 comprises: generating a corresponding causal fingerprint for each causal event generated by the plurality of functional modules; wherein the generation of each causal fingerprint incorporates the causal fingerprints of the source events that triggered the causal event, thereby linking a plurality of causal fingerprints to form the causal fingerprint sequence.
  3. The automatic verification method for an FPGA network card according to claim 1, wherein in the step S4, the step of locating at least one root cause candidate according to the state entropy and the global causal graph comprises: associating, in the time domain, the abnormal time points indicated by the state entropy with the causal events in the global causal graph, and marking the entropy-abnormal events; and performing a reverse traversal starting from the event node representing the functional error in the global causal graph, so as to locate the root cause candidate accompanied by an entropy-abnormal event.
  4. The automatic verification method for an FPGA network card according to claim 1, wherein the step S5 comprises: the upper computer extracts, according to the global causal graph, a minimized trigger path that causes the root cause candidate to occur; the upper computer generates a test case based on the minimized trigger path and controls the simulation environment to run the test case; when the simulation reaches the point at which the root cause candidate is about to occur, the upper computer injects a preset disturbance, through a backdoor access interface, into the functional module where the root cause candidate is located; and the upper computer records the simulation causal path generated in the simulation environment after the disturbance injection, and compares the simulation causal path with the original causal path, extracted from the global causal graph, from the root cause candidate to the functional error, so as to complete the verification.
  5. The automatic verification method for an FPGA network card according to claim 1, wherein in the step S6, the step of generating the monitoring configuration file by the upper computer comprises: if the root cause candidate is verified as the root cause, defining the operation codes of all events on the original causal path which cause the root cause as an attention point set; if the root cause candidate is verified as a non-root cause, defining the operation codes of all events on the verified simulation causal path as a shielding point set; wherein the monitoring configuration file includes the attention point set and/or the shielding point set.
  6. The automatic verification method for an FPGA network card according to claim 5, wherein in the step S6, dynamically adjusting the recording behavior of subsequent causal events means: after the FPGA network card receives the monitoring configuration file, the recording behavior for any subsequently generated causal event is adjusted as follows: if the operation code of the causal event belongs to the attention point set, the causal event is recorded in a high-fidelity mode; and if the operation code of the causal event belongs to the shielding point set, the causal event is recorded in a low-fidelity mode or is not recorded.
  7. The automatic verification method for an FPGA network card according to claim 6, wherein recording in the high-fidelity mode comprises including context data associated with the causal event when generating the causal fingerprint of the causal event.
  8. The automatic verification method for an FPGA network card according to claim 1, wherein in the step S2, the step of calculating in real time the state entropy representing the degree of dynamic change of the state vector comprises: calculating the accumulated Hamming distance variation of the state vector within a preset time window to obtain a local activity; and performing a hash operation on the local activity to obtain the state entropy.
  9. The automatic verification method for an FPGA network card according to claim 4, wherein the preset disturbance comprises forcibly flipping a state bit or delaying a handshake signal.
  10. An automatic verification system for an FPGA network card, characterized in that it is used for implementing the automatic verification method for an FPGA network card according to any one of claims 1-9, and comprises an internal hardware module of the FPGA network card and an upper computer; the internal hardware module of the FPGA network card is used for performing online monitoring to generate a causal fingerprint sequence and a state entropy, and for adjusting the online monitoring behavior according to a monitoring configuration file issued by the upper computer; and the upper computer is used for analyzing the monitoring data received from the hardware module, locating and verifying the root cause, and generating the monitoring configuration file based on the verification conclusion.
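
Illustrative sketch (not part of the claims). The chained fingerprints of claim 2, the Hamming-distance state entropy of claim 8, and the reverse traversal of claim 3 can be approximated in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the use of truncated SHA-256 as the hash, the dictionary layout of the causal graph, and the precomputed set of entropy-abnormal events are all hypothetical choices made for illustration; the claims do not specify any of them, and an actual implementation would live in FPGA logic and host software rather than in one script.

```python
from __future__ import annotations

import hashlib
from collections import deque


def make_fingerprint(opcode: int, source_fps: list[str]) -> str:
    """Claim 2 (sketch): a fingerprint folds in the fingerprints of its
    source events, so fingerprints chain into a sequence.
    Assumption: truncated SHA-256 over opcode + source fingerprints."""
    h = hashlib.sha256()
    h.update(opcode.to_bytes(2, "big"))
    for fp in source_fps:
        h.update(bytes.fromhex(fp))
    return h.hexdigest()[:16]


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two state vectors."""
    return bin(a ^ b).count("1")


def local_activity(states: list[int]) -> int:
    """Claim 8 (sketch): accumulated Hamming distance between
    consecutive state-vector samples inside one time window."""
    return sum(hamming(prev, cur) for prev, cur in zip(states, states[1:]))


def state_entropy(states: list[int]) -> str:
    """Claim 8 (sketch): hash of the local activity.
    Assumption: the hash function and its width are illustrative only."""
    activity = local_activity(states)
    return hashlib.sha256(activity.to_bytes(4, "big")).hexdigest()[:8]


def locate_root_candidates(
    sources: dict[str, list[str]],
    error_event: str,
    abnormal: set[str],
) -> list[str]:
    """Claim 3 (sketch): walk backwards from the error node and collect
    every ancestor event that was marked entropy-abnormal.
    Assumption: `sources` maps each event to the events that triggered it,
    and `abnormal` was produced earlier by time-domain association."""
    seen = {error_event}
    queue = deque([error_event])
    candidates: list[str] = []
    while queue:
        node = queue.popleft()
        for src in sources.get(node, []):
            if src in seen:
                continue
            seen.add(src)
            if src in abnormal:
                candidates.append(src)
            queue.append(src)
    return candidates
```

For example, with a hypothetical graph `{"err": ["c"], "c": ["a", "b"]}` in which only event `a` is marked entropy-abnormal, `locate_root_candidates` walks back from `err` through `c` and returns `a` as the single candidate. A breadth-first order is used here so that candidates closer to the error are listed first; the claims do not prescribe a traversal order.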

Description

Automatic verification method and system for FPGA network card

Technical Field

The invention relates to the technical field of integrated circuit verification, and in particular to an automatic verification method and system for an FPGA network card.

Background

Because of their high flexibility and parallel processing capability, Field Programmable Gate Arrays (FPGAs) are widely used in the design and implementation of complex systems-on-a-chip (SoCs) such as high-performance network interface cards (network cards). As the logic scale and functional complexity of FPGA network cards continue to grow, functional errors that are difficult to reproduce may occur during long-term operation or under extreme network loads. Accurate and efficient root cause location for such errors is a key step in ensuring product reliability, and is also a technical challenge facing the FPGA verification field.

Existing FPGA fault diagnosis techniques typically rely on an online logic analyzer or a simulator to capture a large number of signal waveforms or event traces. This approach has inherent limitations. By the time a functional error is finally detected, the initial root cause may have occurred long before, and the analyst has to trace back through massive recorded data that lacks an effective index. This is like looking for a needle in a haystack: it consumes a great deal of engineering time, and the accuracy of the result depends heavily on the personal experience of the engineer. Existing data recording schemes usually capture only the logical order of events and lack any direct measurement of abnormal physical states of a hardware module, so the analysis cannot quickly converge on the initial time window in which the system first became unstable, which reduces location efficiency. In addition, root cause conclusions drawn through data analysis often remain at the speculative level.
The prior art lacks an automated, closed-loop verification mechanism that actively verifies whether a suspected root cause candidate actually leads to the functional error that was ultimately observed. This uncertainty leaves repair decisions without adequate data support and increases the risk of design iterations. Meanwhile, existing FPGA online monitoring strategies are typically static: the signal points to be monitored and the level of detail of the data records are fixed before a verification task starts. Such a static mechanism cannot adaptively adjust subsequent monitoring behavior according to the conclusions of error analysis. As a result, limited on-chip storage may continue to be spent recording events that have already been proven irrelevant to the error, while the monitoring granularity of newly discovered critical event paths cannot be raised automatically. This wastes verification resources and reduces the overall efficiency of long-term verification work.

Disclosure of Invention

The technical problem to be solved by the invention is that, when a functional error occurs while a conventional FPGA network card is running, there is no automatic verification mechanism that can accurately and efficiently locate the root cause of the error and dynamically optimize subsequent monitoring behavior according to the location conclusion.
In order to solve the above technical problem, the invention provides the following technical scheme:

The first aspect of the present invention provides an automatic verification method for an FPGA network card, the method comprising the following steps:
S1, recording causal events generated by a plurality of functional modules in the FPGA network card when the FPGA network card operates, and forming a causal fingerprint sequence;
S2, monitoring a state vector of at least one of the plurality of functional modules, and calculating in real time a state entropy representing the degree of dynamic change of the state vector;
S3, uploading the causal fingerprint sequence and the state entropy to an upper computer when the FPGA network card has a functional error;
S4, reconstructing, by the upper computer and based on the causal fingerprint sequence, a global causal graph representing the causal relations of the events before the occurrence of the functional error, and locating at least one root cause candidate according to the state entropy and the global causal graph;
S5, controlling, by the upper computer, a simulation environment and executing hypothetical error injection aimed at the root cause candidate, so as to verify whether the root cause candidate is the root cause of the functional error; and
S6, generating, by the upper computer, a monitoring configuration file based on the verification conclusion of the hypothetical error injection, and transmitting the monitoring configuration file to the FPGA network card for dynamically adjusting the recording behavior of subsequent causal events.

In a s