CN-121707004-B - Quantum circuit mapping method and system based on deep reinforcement learning


Abstract

The invention relates to the technical field of quantum computing and discloses a quantum circuit mapping method and system based on deep reinforcement learning. In the method, an original logical quantum circuit is parsed and simplified into a ZX-diagram, and the ZX-diagram is then converted into a quantum circuit dependency graph; this significantly reduces computational complexity, so that mapping tasks for large-scale quantum circuits can be processed in reasonable time. A Markov decision process environment is constructed from the quantum circuit dependency graph and quantum chip parameters, a deep reinforcement learning agent is trained in that environment to obtain an optimal mapping strategy optimization model, and the model is used to output a corresponding optimal qubit mapping scheme for the logical circuit currently to be mapped. The method can thereby adapt to quantum chips with various topologies, improving the generalization capability and adaptability of the mapping scheme.

Inventors

YANG MO
Zhou Tibu
ZHANG YUCONG

Assignees

Sun Yat-sen University (中山大学)

Dates

Publication Date: 20260512
Application Date: 20260211

Claims (7)

1. A quantum circuit mapping method based on deep reinforcement learning, characterized by comprising the following steps:
  obtaining an original logical quantum circuit, parsing and simplifying the original logical quantum circuit into a ZX-diagram, and converting the ZX-diagram into a quantum circuit dependency graph, which comprises:
    mapping each single-qubit gate in the original logical quantum circuit to a single node in the ZX-diagram, mapping each two-qubit gate in the original logical quantum circuit to a node and a first directed edge in the ZX-diagram, and labeling each such node with node parameters to form a parameterized ZX-diagram, wherein the first directed edge represents the entanglement relation between qubits;
    optimizing and simplifying the parameterized ZX-diagram based on the spider fusion, boundary elimination and phase offset rules of the ZX-calculus to obtain a logically equivalent simplified ZX-diagram;
    mapping each node in the simplified ZX-diagram to a node in the quantum circuit dependency graph, and taking the causal dependencies between quantum gates as second directed edges in the quantum circuit dependency graph, so as to construct the quantum circuit dependency graph reflecting the execution-order constraints of the quantum gates;
  obtaining quantum chip parameters, and constructing a Markov decision process environment according to the quantum circuit dependency graph and the quantum chip parameters, which comprises:
    determining a plurality of feature vectors of the current quantum mapping scene according to the quantum circuit dependency graph and the quantum chip parameters, and standardizing and then concatenating the plurality of feature vectors of the current quantum mapping scene to form a state space;
    determining an action space comprising a qubit initial allocation action, a SWAP gate scheduling action and an adjustable coupler control action;
    defining the state transition function as a deterministic transition, and constructing a multi-objective weighted reward function with quantum gate execution fidelity as the reward term and with circuit depth, SWAP gate count and number of adjustable coupler control operations as penalty terms;
    constructing the Markov decision process environment according to the state space, the action space, the state transition function and the multi-objective weighted reward function, and introducing a discount factor;
  wherein the quantum chip parameters comprise static topology parameters, dynamic calibration parameters and a qubit mapping relation; the feature vectors comprise a chip topology feature vector, a quantum circuit dependency graph feature vector, a qubit mapping relation feature vector and a hardware dynamic parameter feature vector; and the step of determining a plurality of feature vectors of the current quantum mapping scene according to the quantum circuit dependency graph and the quantum chip parameters, and standardizing and then concatenating them to form a state space, comprises:
    extracting features of the quantum circuit dependency graph with a graph attention network to obtain the features of all nodes in the quantum circuit dependency graph, and applying global pooling to the features of all nodes to obtain the quantum circuit dependency graph feature vector;
    constructing an adjacency matrix from the static topology parameters, constructing a Laplacian matrix from the adjacency matrix, performing eigendecomposition on the Laplacian matrix, extracting the first 32 dimensions of the Laplacian eigenvectors, and fusing in an adjustable coupler position encoding to obtain the chip topology feature vector;
    normalizing the dynamic calibration parameters and then concatenating them to form the hardware dynamic parameter feature vector;
    constructing a mapping matrix according to the qubit mapping relation, and vectorizing the mapping matrix to obtain the qubit mapping relation feature vector;
    standardizing and then concatenating the chip topology feature vector, the quantum circuit dependency graph feature vector, the qubit mapping relation feature vector and the hardware dynamic parameter feature vector to form the state space;
  training a deep reinforcement learning agent based on the Markov decision process environment to obtain an optimal mapping strategy optimization model;
  obtaining a current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and generating an optimal qubit mapping scheme for the current logical circuit to be mapped by combining the optimal mapping strategy optimization model.
2. The quantum circuit mapping method based on deep reinforcement learning according to claim 1, wherein training the deep reinforcement learning agent based on the Markov decision process environment to obtain an optimal mapping strategy optimization model comprises:
  constructing, based on the PPO algorithm, an agent in which a policy network and a value network are trained jointly, wherein the agent uses a shared backbone structure that combines a graph neural network with a fully connected network and introduces an action mask mechanism;
  based on the PPO algorithm and according to the current state space in the Markov decision process environment, the agent selecting a legal action from the action probability distribution output by the policy network, subject to the action mask mechanism, and executing the legal action;
  after the legal action is executed, updating the state space through the state transition function, calculating the immediate reward through the multi-objective weighted reward function, and calculating the long-term cumulative reward after execution of the legal action by combining the discount factor;
  updating the advantage function value from the long-term cumulative reward through generalized advantage estimation, and, using the updated advantage function value, updating the parameters of the policy network and the value network by minimizing the clipped objective function, until the fluctuation of the mean immediate reward over a preset number of consecutive training iterations converges, and outputting the latest agent as the optimal mapping strategy optimization model.
3. The quantum circuit mapping method based on deep reinforcement learning according to claim 1, wherein obtaining the current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and generating the optimal qubit mapping scheme for the current logical circuit to be mapped by combining the optimal mapping strategy optimization model, comprises:
  obtaining the current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and determining a current quantum circuit dependency graph according to the current logical circuit to be mapped;
  determining the current state space corresponding to the current logical circuit to be mapped according to the current quantum circuit dependency graph and the current quantum chip parameters;
  inputting the current state space into the optimal mapping strategy optimization model, and outputting the optimal action space of the current logical circuit to be mapped;
  decoding and de-duplicating the optimal action space to obtain a simplified action space;
  performing a legality check on the simplified action space, and converting the action space that passes the legality check into a standardized optimal qubit mapping scheme.
4. The quantum circuit mapping method based on deep reinforcement learning according to claim 1, wherein obtaining the current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and generating the optimal qubit mapping scheme for the current logical circuit to be mapped by combining the optimal mapping strategy optimization model, further comprises:
  determining multi-dimensional performance evaluation indexes according to the optimal qubit mapping scheme, wherein the multi-dimensional performance evaluation indexes comprise at least one of circuit fidelity, execution time, constraint satisfaction rate, resource efficiency and a robustness index;
  when any one of the multi-dimensional performance evaluation indexes does not reach its preset index threshold, feeding the multi-dimensional performance evaluation indexes back to the optimal mapping strategy optimization model, adjusting the reward weights of the optimal mapping strategy optimization model, and re-executing the training process of the optimal mapping strategy optimization model until all the multi-dimensional performance evaluation indexes reach their preset index thresholds, whereupon the latest qubit mapping scheme is taken as the updated optimal qubit mapping scheme.
5. A quantum circuit mapping system based on deep reinforcement learning, comprising:
  a dependency graph conversion module, configured to obtain an original logical quantum circuit, parse and simplify the original logical quantum circuit into a ZX-diagram, and convert the ZX-diagram into a quantum circuit dependency graph, wherein obtaining the original logical quantum circuit, parsing and simplifying it into a ZX-diagram, and converting the ZX-diagram into a quantum circuit dependency graph comprises:
    mapping each single-qubit gate in the original logical quantum circuit to a single node in the ZX-diagram, mapping each two-qubit gate in the original logical quantum circuit to a node and a first directed edge in the ZX-diagram, and labeling each such node with node parameters to form a parameterized ZX-diagram, wherein the first directed edge represents the entanglement relation between qubits;
    optimizing and simplifying the parameterized ZX-diagram based on the spider fusion, boundary elimination and phase offset rules of the ZX-calculus to obtain a logically equivalent simplified ZX-diagram;
    mapping each node in the simplified ZX-diagram to a node in the quantum circuit dependency graph, and taking the causal dependencies between quantum gates as second directed edges in the quantum circuit dependency graph, so as to construct the quantum circuit dependency graph reflecting the execution-order constraints of the quantum gates;
  an environment construction module, configured to obtain quantum chip parameters and construct a Markov decision process environment according to the quantum circuit dependency graph and the quantum chip parameters, wherein obtaining the quantum chip parameters and constructing the Markov decision process environment comprises:
    determining a plurality of feature vectors of the current quantum mapping scene according to the quantum circuit dependency graph and the quantum chip parameters, and standardizing and then concatenating the plurality of feature vectors of the current quantum mapping scene to form a state space;
    determining an action space comprising a qubit initial allocation action, a SWAP gate scheduling action and an adjustable coupler control action;
    defining the state transition function as a deterministic transition, and constructing a multi-objective weighted reward function with quantum gate execution fidelity as the reward term and with circuit depth, SWAP gate count and number of adjustable coupler control operations as penalty terms;
    constructing the Markov decision process environment according to the state space, the action space, the state transition function and the multi-objective weighted reward function, and introducing a discount factor;
  wherein the quantum chip parameters comprise static topology parameters, dynamic calibration parameters and a qubit mapping relation; the feature vectors comprise a chip topology feature vector, a quantum circuit dependency graph feature vector, a qubit mapping relation feature vector and a hardware dynamic parameter feature vector; and the step of determining a plurality of feature vectors of the current quantum mapping scene according to the quantum circuit dependency graph and the quantum chip parameters, and standardizing and then concatenating them to form a state space, comprises:
    extracting features of the quantum circuit dependency graph with a graph attention network to obtain the features of all nodes in the quantum circuit dependency graph, and applying global pooling to the features of all nodes to obtain the quantum circuit dependency graph feature vector;
    constructing an adjacency matrix from the static topology parameters, constructing a Laplacian matrix from the adjacency matrix, performing eigendecomposition on the Laplacian matrix, extracting the first 32 dimensions of the Laplacian eigenvectors, and fusing in an adjustable coupler position encoding to obtain the chip topology feature vector;
    normalizing the dynamic calibration parameters and then concatenating them to form the hardware dynamic parameter feature vector;
    constructing a mapping matrix according to the qubit mapping relation, and vectorizing the mapping matrix to obtain the qubit mapping relation feature vector;
    standardizing and then concatenating the chip topology feature vector, the quantum circuit dependency graph feature vector, the qubit mapping relation feature vector and the hardware dynamic parameter feature vector to form the state space;
  an optimization model training module, configured to train a deep reinforcement learning agent based on the Markov decision process environment to obtain an optimal mapping strategy optimization model;
  a mapping scheme optimization module, configured to obtain a current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and to generate an optimal qubit mapping scheme for the current logical circuit to be mapped by combining the optimal mapping strategy optimization model.
6. An electronic device comprising a memory and a processor, wherein the memory stores a computer program that, when executed by the processor, causes the processor to perform the steps of the quantum circuit mapping method based on deep reinforcement learning according to any one of claims 1-4.
7. A computer-readable storage medium having a computer program stored thereon, characterized in that the computer program, when executed, implements the steps of the quantum circuit mapping method based on deep reinforcement learning according to any one of claims 1-4.
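
As an illustration of the dependency-graph step recited in claim 1, the following Python sketch builds execution-order edges from a gate list. It is a minimal sketch under stated assumptions, not the patented implementation: the ZX-diagram simplification is omitted entirely, and the function names `build_dependency_graph` and `front_layer` and the `(name, qubits)` gate format are hypothetical choices made for this example.

```python
def build_dependency_graph(gates):
    """Build a gate dependency graph from a gate list in program order.

    gates: list of (name, qubits) tuples (hypothetical input format).
    Returns a dict mapping each gate index to the sorted list of gate
    indices that directly depend on it (its direct successors).
    """
    last_gate_on_qubit = {}  # qubit -> index of the most recent gate acting on it
    successors = {i: set() for i in range(len(gates))}
    for idx, (_name, qubits) in enumerate(gates):
        for q in qubits:
            prev = last_gate_on_qubit.get(q)
            if prev is not None:
                # The gate at idx must wait for the previous gate on this qubit.
                successors[prev].add(idx)
            last_gate_on_qubit[q] = idx
    return {i: sorted(s) for i, s in successors.items()}


def front_layer(gates, successors):
    """Return the indices of gates that have no predecessor."""
    has_predecessor = {j for succ in successors.values() for j in succ}
    return [i for i in range(len(gates)) if i not in has_predecessor]


# Hypothetical 4-qubit circuit used only to exercise the sketch.
circuit = [("h", (0,)), ("cx", (0, 1)), ("cx", (1, 2)), ("h", (2,)), ("x", (3,))]
dag = build_dependency_graph(circuit)
ready = front_layer(circuit, dag)
```

Here `dag` holds the directed edges that constrain execution order, and `ready` lists the gates that can be scheduled first; a mapper would typically pick SWAP insertions by looking at such a front layer.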

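The claims name the terms of the multi-objective weighted reward function (gate execution fidelity as the reward term; circuit depth, SWAP gate count and adjustable coupler control operations as penalty terms) and a discount factor, but give no weights or functional form. The sketch below assumes a simple linear combination; the weight values, the class `RewardWeights` and the functions `step_reward` and `discounted_return` are hypothetical and serve only to make the structure concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the source does not specify any values.
    fidelity: float = 1.0
    depth: float = 0.1
    swap: float = 0.2
    coupler: float = 0.05


def step_reward(gate_fidelity, depth_increase, swaps_added, coupler_ops,
                w=RewardWeights()):
    """Immediate reward: fidelity term minus weighted penalty terms."""
    return (w.fidelity * gate_fidelity
            - w.depth * depth_increase
            - w.swap * swaps_added
            - w.coupler * coupler_ops)


def discounted_return(rewards, gamma=0.99):
    """Long-term cumulative reward: sum of gamma**t * rewards[t]."""
    total = 0.0
    for r in reversed(rewards):
        total = r + gamma * total
    return total


# One step that executes a gate with fidelity 0.99, adds one layer of depth
# and one SWAP gate, and uses no coupler control operation.
r = step_reward(0.99, depth_increase=1, swaps_added=1, coupler_ops=0)
g = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
```

Under this assumed form, raising a penalty weight steers the agent away from that cost, which is one way to read the reward-weight adjustment described in claim 4.
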
Description

Quantum circuit mapping method and system based on deep reinforcement learning

Technical Field

The invention relates to the technical field of quantum computing, and in particular to a quantum circuit mapping method and system based on deep reinforcement learning.

Background

Quantum circuit mapping is the key step that connects a logical quantum circuit to a physical quantum processor in quantum computing. Its core task is to allocate logical qubits to physical qubits and to optimize the execution timing of quantum gates, so that the logical circuit can be executed on actual physical hardware efficiently and with high fidelity. Mapping performance directly determines the execution efficiency and fidelity of a quantum algorithm on real hardware.

Existing quantum circuit mapping methods have high computational complexity. When processing large-scale quantum circuits (20 or more logical qubits), they struggle to output an effective mapping scheme in reasonable time; they also lack generalization capability and adapt poorly.

Disclosure of Invention

In view of the above, the present invention provides a quantum circuit mapping method and system based on deep reinforcement learning in order to solve the above technical problems.
The first aspect of the invention provides a quantum circuit mapping method based on deep reinforcement learning, which comprises the following steps:
  obtaining an original logical quantum circuit, parsing and simplifying the original logical quantum circuit into a ZX-diagram, and converting the ZX-diagram into a quantum circuit dependency graph;
  obtaining quantum chip parameters, and constructing a Markov decision process environment according to the quantum circuit dependency graph and the quantum chip parameters;
  training a deep reinforcement learning agent based on the Markov decision process environment to obtain an optimal mapping strategy optimization model;
  obtaining a current logical circuit to be mapped and the current quantum chip parameters corresponding to the current logical circuit to be mapped, and generating an optimal qubit mapping scheme for the current logical circuit to be mapped by combining the optimal mapping strategy optimization model.

Preferably, parsing and simplifying the original logical quantum circuit into a ZX-diagram, and converting the ZX-diagram into a quantum circuit dependency graph, comprises:
  mapping each single-qubit gate in the original logical quantum circuit to a single node in the ZX-diagram, mapping each two-qubit gate in the original logical quantum circuit to a node and a first directed edge in the ZX-diagram, and labeling each such node with node parameters to form a parameterized ZX-diagram, wherein the first directed edge represents the entanglement relation between qubits;
  optimizing and simplifying the parameterized ZX-diagram based on the spider fusion, boundary elimination and phase offset rules of the ZX-calculus to obtain a logically equivalent simplified ZX-diagram;
  mapping each node in the simplified ZX-diagram to a node in the quantum circuit dependency graph, and taking the causal dependencies between the quantum gates as second directed edges in the quantum circuit dependency graph, so as to construct the quantum circuit dependency graph reflecting the execution-order constraints of the quantum gates.

Preferably, constructing a Markov decision process environment according to the quantum circuit dependency graph and the quantum chip parameters comprises:
  determining a plurality of feature vectors of the current quantum mapping scene according to the quantum circuit dependency graph and the quantum chip parameters, and standardizing and then concatenating the plurality of feature vectors of the current quantum mapping scene to form a state space;
  determining an action space comprising a qubit initial allocation action, a SWAP gate scheduling action and an adjustable coupler control action;
  defining the state transition function as a deterministic transition, and constructing a multi-objective weighted reward function with quantum gate execution fidelity as the reward term and with circuit depth, SWAP gate count and number of adjustable coupler control operations as penalty terms;
  constructing the Markov decision process environment according to the state space, the action space, the state transition function and the multi-objective weighted reward function, and introducing a discount factor.

Preferably, the quantum chip parameters comprise static topology parameters, dynamic calibration parameters and a qubit mapping relation, and the feature vectors comprise a chip topology feature vector, a quantum circuit dependency graph feature vector, a qubit mapping relation feature vector and a hardware dynamic parameter feature vector; The determining a plurality of feature vector