
CN-121997854-A - Chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation

CN121997854A

Abstract

The invention discloses a chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation. The method comprises: constructing a circuit base dataset, an operator-sequence dataset, a pre-training dataset, and a reinforcement-learning/logic-synthesis interaction environment; adopting model-agnostic meta-learning to build a bi-level mechanism of outer-loop general-parameter learning and inner-loop adaptation to new circuits, and training general parameters; constructing a collaborative multi-stage reinforcement learning framework with multiple proximal-policy-optimization (PPO) agents; extracting the circuit's structural-functional, optimization-history, and scalar-feature modality raw features and generating a unified circuit feature representation vector through cross-modal fusion; and guiding the multi-PPO agents, in combination with a delayed reward function, to execute full-flow optimization. Through multi-modal feature characterization and multi-stage collaborative optimization, the invention realizes efficient cross-circuit transfer, improves the global balance of circuit performance, power, and area, and solves the problems of poor transferability and fragmented optimization in traditional methods.

Inventors

  • XU QIMIN
  • LIU SHUKAI
  • WANG SIYU
  • CHEN CAILIAN

Assignees

  • Shanghai Jiao Tong University (上海交通大学)

Dates

Publication Date
2026-05-08
Application Date
2026-03-17

Claims (10)

  1. A chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation, characterized by comprising the following steps: S1, constructing a circuit base dataset, an operator-sequence dataset, and a pre-training dataset, and developing a reinforcement-learning/logic-synthesis interaction environment that implements state acquisition, action execution, reward calculation, and constraint checking; S2, based on the interaction environment of step S1, adopting model-agnostic meta-learning to build a bi-level mechanism of outer-loop general-parameter learning and inner-loop adaptation to new circuits, and training to obtain general initialization parameters; S3, based on the interaction environment of step S1, loading the general initialization parameters obtained in step S2, constructing a collaborative multi-stage reinforcement learning framework, building within it a multi-proximal-policy-optimization (multi-PPO) agent framework containing a logic-optimization agent, a technology-mapping agent, and a post-mapping-optimization agent, defining a two-component action space, and having the three agents share the environment state and action history to realize multi-stage collaboration; S4, based on the circuit base dataset and the pre-training dataset, extracting the circuit's structural-functional, optimization-history, and scalar-feature modality raw features, and generating a unified circuit feature representation vector through cross-modal fusion; S5, feeding the feature representation vector into the collaborative multi-stage reinforcement learning framework, guiding the multi-PPO agents, in combination with a delayed reward function, to execute logic optimization, technology mapping, and post-mapping optimization, and outputting optimization parameters specific to the circuit to be optimized, thereby completing circuit optimization.
  2. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 1, wherein the dataset construction in step S1 proceeds as follows: circuit RTL code is obtained from the EPFL and ISCAS85 benchmark suites and compiled into AIG netlists with the ABC tool; the node count, edge count, logic depth, initial LUT count, and initial delay are extracted, and the circuits are divided by function type into arithmetic, control, and storage circuits to build the circuit base dataset; the full logic-synthesis flow is simulated with the ABC tool to collect operator sequences covering the three stages of logic optimization, technology mapping, and post-mapping optimization, each sequence being labeled with its circuit type and its LUT count and delay before and after optimization, to build the operator-sequence dataset; the pre-training dataset is built from the ITC'99, IWLS'05, and OpenCores datasets.
  3. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 1, wherein the state acquisition, action execution, reward calculation, and constraint checking of step S1 are implemented as follows: the state acquisition function obtains the current circuit netlist, scalar features, and historical operator sequence by invoking ABC tool commands; the action execution function converts the basic operator or script fragment output by an agent into an ABC tool command and executes it to update the circuit state; the reward calculation function records in real time the circuit's initial LUT count and the current minimum LUT count; the constraint checking function monitors whether the circuit delay exceeds 1.1 times the delay of the resyn2rs script and, if so, terminates the current optimization episode, where an episode refers to one complete optimization process in which the multi-PPO agents interact with the environment from the initial state until a termination condition is triggered.
  4. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 3, wherein the training process in step S2 comprises: S21, initializing the Actor-Critic base network parameters of proximal policy optimization with Xavier uniform initialization; S22, randomly sampling 4 circuit tasks of different function types from the circuit base dataset, initializing the network with the general initialization parameters θ, interacting with the interaction environment of step S1, and collecting 10 support trajectories, each trajectory containing 25 optimization steps; S23, computing the performance loss L_{T_i}(θ) of each task T_i and updating by gradient descent to obtain temporary parameters θ'_i, with update formula θ'_i = θ − α∇_θ L_{T_i}(θ), where α is the inner-loop learning rate; then computing the meta-loss of the 4 tasks, Σ_i L_{T_i}(θ'_i), and updating the general initialization parameters by gradient descent with update formula θ ← θ − β∇_θ Σ_i L_{T_i}(θ'_i), where β is the outer-loop learning rate; S24, repeating steps S22-S23 until the average LUT reduction achieved by θ on the EPFL benchmark verification circuits fluctuates by less than 1% for 5 consecutive rounds, at which point the general initialization parameters θ are deemed converged; S25, for a new circuit, loading the general initialization parameters θ to initialize the network, collecting 5 fine-tuning trajectories, and performing 1-2 rounds of gradient updates with a fine-tuning learning rate to obtain circuit-specific optimized parameters.
  5. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 4, wherein the performance loss in step S23 is the proximal-policy-optimization surrogate objective with a divergence penalty, L(θ) = −E_t[min(r_t(θ)Â_t, clip(r_t(θ), 1−ε, 1+ε)Â_t)] + c · KL(π_{θ_old} ∥ π_θ), with probability ratio r_t(θ) = π_θ(a_t|s_t)/π_{θ_old}(a_t|s_t), where γ is the discount factor used in estimating the advantage Â_t, θ_old denotes the parameters before the update, and the divergence constraint coefficient c is 0.01.
  6. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 1, wherein, in the multi-PPO agent framework, the shared backbone network is a 2-layer fully-connected network with 128 and 64 neurons per layer and a LeakyReLU activation with negative slope 0.01; the shared backbone receives the circuit feature representation vector output in step S4, extracts high-order circuit features, and provides a unified feature input to the logic-optimization agent, the technology-mapping agent, and the post-mapping-optimization agent.
  7. The method according to claim 6, wherein the logic-optimization agent, the technology-mapping agent, and the post-mapping-optimization agent share a Critic branch with a 1-layer fully-connected structure and output dimension 1, used to output the value estimate of the current state.
  8. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 1, wherein step S4 extracts the circuit's structural-functional, optimization-history, and scalar-feature modality raw features as follows: the AIG netlist is converted into a graph structure by a pre-trained DeepGate model, structural features are extracted by a 1-layer graph convolution layer and functional features by a 2-layer fully-connected layer, and after dimensionality reduction through multi-scale pooling and a 1-layer fully-connected layer, a 128-dimensional structural-functional modality embedding vector is output, obtained by transforming the structural-functional modality raw features; the historical operator sequence is encoded by a pre-trained Mamba encoder containing 2 Mamba blocks, and a 128-dimensional optimization-history modality embedding vector is output through linear projection, state-space convolution, and global average pooling, obtained by transforming the optimization-history modality raw features; the circuit scalar features are normalized and passed through a 2-layer MLP (multilayer perceptron), whose first layer has input dimension 8 and output dimension 32 and whose second layer has input dimension 32 and output dimension 128, with PReLU activations, outputting a 128-dimensional scalar-feature modality embedding vector, obtained by transforming the scalar-feature modality raw features.
  9. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 8, wherein the unified circuit feature representation vector is generated by cross-modal fusion as follows: the structural-functional, optimization-history, and scalar-feature modality embedding vectors are each passed through a linear projection and LayerNorm to obtain normalized modality embedding vectors; scaled dot-product attention is computed with one normalized embedding as the query and the others as keys and values, with scaling dimension d = 128, to output attended features; the attended features are fed into a linear layer with sigmoid activation to obtain a gating vector, from which the circuit feature representation vector is generated by gated combination.
  10. The chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation according to claim 1, wherein the delayed reward function in step S5 is computed as follows: the process reward rule is that, at optimization step t < 25, if the current circuit LUT count is less than the historical best LUT count, a positive process reward r_step is granted, otherwise r_step = 0; the terminal reward rule is that, at optimization step t = 25, if the current circuit delay does not exceed 1.1 times the delay of the resyn2rs script, a terminal reward r_end is computed from the initial LUT count and the minimum LUT count reached during optimization, otherwise r_end = 0; the total reward is the sum of the process and terminal rewards.
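The bi-level training of claim 4 follows the standard MAML inner/outer update pattern. Below is a minimal sketch in which the PPO networks and circuit tasks are replaced by scalar toy tasks with loss L_i(θ) = (θ − c_i)²; `maml`, `alpha`, and `beta` are hypothetical names, and the learning rates are illustrative, not the patent's values.

```python
# Toy first-order MAML on scalar tasks L_i(theta) = (theta - c_i)**2.
# Inner loop (S23): adapt theta to each task to get temporary parameters theta_i.
# Outer loop (S23): descend the meta-loss summed over the adapted parameters.
def maml(tasks, theta=0.0, alpha=0.1, beta=0.05, rounds=200):
    for _ in range(rounds):
        meta_grad = 0.0
        for c in tasks:                          # S22: sample a batch of tasks
            grad = 2.0 * (theta - c)             # inner-loop gradient of L_i(theta)
            theta_i = theta - alpha * grad       # temporary (adapted) parameters
            meta_grad += 2.0 * (theta_i - c)     # gradient of L_i at theta_i
        theta -= beta * meta_grad / len(tasks)   # outer-loop update of theta
    return theta
```

On this toy problem the general parameter converges toward the point that adapts fastest to all tasks (here, the task mean), mirroring how step S24 iterates until the general initialization parameters stabilize.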
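The cross-modal fusion of claim 9 can be sketched as scaled dot-product attention over the three modality embeddings followed by a sigmoid gate. The patent's exact query/key/value assignment and gate formula are not reproduced in the text, so the wiring below (structural embedding as query, gated blend back into the query) is an assumption; `fuse`, `Wg`, and `bg` are hypothetical names.

```python
import numpy as np

def fuse(h_struct, h_hist, h_scalar, Wg, bg, d=128):
    """Assumed fusion sketch: the structural embedding queries all three
    modality embeddings; a sigmoid gate blends the attended mix back in."""
    K = np.stack([h_struct, h_hist, h_scalar])       # 3 x d keys (= values)
    scores = K @ h_struct / np.sqrt(d)               # scaled dot-product scores
    w = np.exp(scores - scores.max())
    w /= w.sum()                                     # softmax attention weights
    attn = w @ K                                     # attended feature, shape (d,)
    g = 1.0 / (1.0 + np.exp(-(Wg @ attn + bg)))      # gating vector via sigmoid
    return g * attn + (1.0 - g) * h_struct           # gated circuit representation
```

In the claimed method the three inputs would be the 128-dimensional LayerNorm-normalized embeddings from claim 8, and `Wg`/`bg` the learned linear-layer parameters preceding the sigmoid.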
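The delayed reward of claim 10 can be sketched as a step reward paid only on new best LUT counts plus a terminal reward gated by the 1.1x delay constraint. The patent gives the reward magnitudes as formulas not reproduced in the text, so the LUT-normalized values below are assumptions; `delayed_reward` and its argument names are hypothetical.

```python
def delayed_reward(lut_now, lut_best, lut_init, lut_min,
                   delay_now, delay_ref, t, T=25):
    """Assumed sketch of the claim-10 reward. delay_ref is the delay of
    the resyn2rs reference script; T is the episode length of 25 steps."""
    # Process reward (t < T): paid only when the new LUT count beats the best so far.
    r_step = (lut_best - lut_now) / lut_init if (t < T and lut_now < lut_best) else 0.0
    # Terminal reward (t == T): paid only if delay stays within 1.1x the reference.
    r_end = 0.0
    if t == T and delay_now <= 1.1 * delay_ref:
        r_end = (lut_init - lut_min) / lut_init
    return r_step + r_end   # total reward = process reward + terminal reward
```

This structure preserves the claim's two gating rules: intermediate improvement is rewarded densely, while the final area gain is credited only when the timing constraint survives.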

Description

Chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation

Technical Field

The invention relates to the technical field of artificial intelligence and electronic design automation, in particular to a chip logic synthesis method based on meta reinforcement learning and multi-modal circuit representation.

Background

Electronic Design Automation (EDA) is a core technology supporting the development of the semiconductor industry. Through an automated tool chain it realizes the whole flow from design intent to mass production of integrated circuits, covering key stages such as system design, RTL design, logic synthesis, and place and route. Logic Synthesis (LS) is the core hub connecting abstract functional description to physical circuit implementation: it automatically converts Verilog/VHDL RTL code written by engineers into a gate-level netlist composed of AND gates, flip-flops, and similar elements, and optimizes the circuit structure under the three constraints of performance, power, and area (PPA); the quality of this optimization directly determines a chip's performance ceiling and mass-production cost. As chip technology enters advanced nodes of 7 nm and below, circuit scale has grown from millions of logic gates to billions and heterogeneous circuit types have become increasingly rich, so traditional logic synthesis methods struggle to meet industry demands; the evolution of the technology shows clear stages and limitations. Early logic synthesis centered on fixed scripts and heuristic algorithms, a rule-driven technical paradigm.
An engineer must write a preset operator-sequence script according to the circuit's function type (e.g., arithmetic circuit, control circuit): an arithmetic circuit focuses on 'rewriting, fan-out optimization, technology mapping', while a control circuit focuses on 'timing repair, redundancy removal, post-mapping optimization'. This approach is stable and highly interpretable and can meet the design requirements of small and medium-scale circuits (node count below 1000), but fixed scripts lack generality for large heterogeneous circuits, struggle to balance multi-objective optimization requirements, and leave the optimization result highly dependent on engineer experience. With the penetration of deep learning into the EDA field, logic synthesis entered a data-driven learning stage, whose core idea is to automatically explore optimal optimization strategies through machine learning algorithms.
Reinforcement Learning (RL) became the mainstream technical path by constructing a Markov decision process of 'circuit state → action → reward'. The DRiLLS scheme of 2020 first introduced deep reinforcement learning (the A2C algorithm) into logic synthesis, reducing gate count in the logic optimization stage and outperforming traditional scripts on EPFL benchmark circuits, but it covers only a single optimization stage and ignores full-flow synergy. The BSBO scheme of 2022 introduced Bayesian optimization, improving exploration efficiency by dynamically modeling the relation between operators and optimization quality; it performs well on small-scale control circuits but is hard to adapt to large-scale complex circuits. The EasySO scheme of 2023 designed a hybrid action space of 'discrete operators + continuous parameters', balancing selection flexibility and tuning precision, yet its optimization precision still needs improvement; its per-circuit adaptation is inefficient and computationally expensive, making it difficult to meet engineering requirements for each new circuit.
To improve sample efficiency and generalization, recent research has further merged 'imitation learning' and 'pre-training' techniques to explore a 'knowledge transfer' path. The AlphaSyn scheme of 2023 incorporated Monte Carlo tree search, guiding exploration with expert scripts and improving sample efficiency threefold over traditional RL, but it easily falls into local optima on large-scale circuit optimization. The PIRLLS scheme of 2025 adopts a two-stage framework of 'pre-training by imitating expert scripts, then RL fine-tuning', shortening optimization time by 60% compared with DRiLLS, but its over-reliance on expert script quality limits generalization on heterogeneous circuits. Therefore, the art needs a chip logic synthesis method with cross-circuit transfer capability, full-flow collaborative optimization capability, deep circuit understanding capability, and global objective orientation, to break