CN-121983115-A - Gene regulation network prediction method and system based on explicit correlation modeling

Abstract

The invention discloses a gene regulation network prediction method and system based on explicit association modeling. The method comprises: obtaining a gene expression matrix of single-cell RNA sequencing data and an adjacency matrix of a prior regulatory graph constructed from prior knowledge; inputting the gene expression matrix and the adjacency matrix into a graph neural network model, wherein the graph neural network model is configured to explicitly model the links in the prior regulatory graph through an intra-layer message passing space and an inter-layer message passing space to obtain link representations; and predicting, by a classifier operating on the link representations, whether a regulatory relationship exists between each gene pair, so as to infer the gene regulatory network. The architecture of the graph neural network model is determined adaptively from the input data by an automatic architecture search algorithm. The proposed framework explicitly models link representations during message passing and uses a multilayer perceptron (MLP) to infer gene-pair regulatory relationships from the link embeddings, which strengthens the use and organization of the complex connection information in the prior regulatory graph at the source and effectively improves inference accuracy.
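
The final prediction step described in the abstract (a link representation scored by an MLP classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `pool_link` and `mlp_score`, the three pooling modes, the single hidden layer, and all toy weights are hypothetical, and the link representation is formed here simply by pooling two gene embeddings.

```python
import math

def pool_link(h_tf, h_target, mode="hadamard"):
    """Combine two gene embeddings into one link representation.

    The pooling modes are illustrative assumptions; the source only says
    that a pooling operation is chosen from a candidate set.
    """
    if mode == "hadamard":
        return [a * b for a, b in zip(h_tf, h_target)]
    if mode == "sum":
        return [a + b for a, b in zip(h_tf, h_target)]
    if mode == "concat":
        return list(h_tf) + list(h_target)
    raise ValueError(f"unknown pooling mode: {mode}")

def mlp_score(z, W1, b1, w2, b2):
    """One-hidden-layer MLP with ReLU, followed by a sigmoid output.

    Returns a value in (0, 1), read as the probability that a
    regulatory relationship exists for the gene pair.
    """
    hidden = [max(0.0, sum(w * x for w, x in zip(row, z)) + b)
              for row, b in zip(W1, b1)]
    logit = sum(w * h for w, h in zip(w2, hidden)) + b2
    return 1.0 / (1.0 + math.exp(-logit))

# Toy embeddings for a transcription factor and a candidate target gene.
h_tf, h_target = [1.0, 2.0], [3.0, 4.0]
z = pool_link(h_tf, h_target)  # element-wise product of the two embeddings
p = mlp_score(z,
              W1=[[1.0, 0.0], [0.0, 1.0]], b1=[0.0, 0.0],
              w2=[1.0, -1.0], b2=0.0)
print(z, p)
```

In a trained model the weights would be learned from the known transcription factor-target gene labels; the fixed values here only make the arithmetic easy to follow.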

Inventors

  • WANG JIACHENG
  • CHEN YAOJIA
  • HAN HONGBIN
  • ZOU QUAN

Assignees

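  • Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China (电子科技大学长三角研究院(衢州))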

Dates

Publication Date
2026-05-05
Application Date
2025-12-12

Claims (10)

  1. A gene regulation network prediction method based on explicit association modeling, characterized by comprising the following steps: acquiring a gene expression matrix of single-cell RNA sequencing data and constructing an adjacency matrix of a prior regulatory graph based on prior knowledge; inputting the gene expression matrix and the adjacency matrix into a graph neural network model, wherein the graph neural network model is configured to explicitly model links in the prior regulatory graph through an intra-layer message passing space and an inter-layer message passing space to obtain link representations; and predicting, by a classifier based on the link representations, whether a regulatory relationship exists between gene pairs, so as to infer a gene regulatory network; wherein the architecture of the graph neural network model is adaptively determined according to the input data by an automatic architecture search algorithm.
  2. The gene regulation network prediction method based on explicit correlation modeling according to claim 1, wherein the graph neural network model comprises a plurality of message passing network layers and a pooling layer; the message passing network layers and the pooling layer are connected through the inter-layer message passing space; and the intra-layer message passing space is used for data processing inside each message passing network layer.
  3. The gene regulation network prediction method based on explicit correlation modeling according to claim 2, wherein the intra-layer message passing space performs data processing to explicitly model links by: $m_v^{(l)} = \mathrm{AGG}^{(l)}\left(\{ W_r\, h_u^{(l)} : u \in \mathcal{N}(v) \}\right)$ (1) and $h_v^{(l+1)} = \sigma^{(l)}\left(\mathrm{COM}^{(l)}\left(W_{\mathrm{self}}\, h_v^{(l)},\; m_v^{(l)}\right)\right)$ (2); the neighborhood information of node $v$ is aggregated by formula (1), and the representation of node $v$ is updated by formula (2); wherein $\mathrm{AGG}^{(l)}$ denotes the aggregation operation of the $l$-th layer; $W_r$ is a neighborhood-type-specific weight matrix; $v$ denotes the target node, $u$ denotes a neighbor node of node $v$, and $\mathcal{N}(v)$ denotes the set of all neighbor nodes directly connected to node $v$; $r$ belongs to the set $\{\mathrm{self}, \mathrm{neighbor}\}$ and is used to distinguish the connection of the node to itself from the connections to its neighborhood; $W_{\mathrm{self}}$ denotes the weight matrix of the node itself; $h_u^{(l)}$ is the embedded representation of node $u$ at the $l$-th layer; $h_v^{(l)}$ is the embedded representation of node $v$ at the $l$-th layer; $\mathrm{COM}^{(l)}$ denotes the combination function operation of the $l$-th layer; and $\sigma^{(l)}$ denotes the activation function of the $l$-th layer.
  4. The gene regulation network prediction method based on explicit correlation modeling according to claim 3, further comprising a neighborhood aggregation candidate set containing a plurality of candidate aggregation modes, a combination function candidate set containing a plurality of candidate combination modes, and an activation function candidate set containing a plurality of candidate activation modes; wherein the aggregation operation, the combination function operation and the activation function operation of the intra-layer message passing space are selected by the automatic architecture search algorithm from the neighborhood aggregation candidate set, the combination function candidate set and the activation function candidate set, respectively.
  5. The gene regulation network prediction method based on explicit correlation modeling according to claim 2, wherein, in the inter-layer message passing space, the fusion of the output and the input of the preceding message passing network layer is used as the input of the next message passing network layer, and the fusion mode is selected by the automatic architecture search algorithm from an inter-layer connection candidate set containing a plurality of candidate connection modes.
  6. The gene regulation network prediction method based on explicit correlation modeling according to claim 5, wherein, in the inter-layer message passing space, the outputs of all message passing network layers serve as candidate information sources and are aggregated before being input to the pooling layer, and the aggregation mode is selected by the automatic architecture search algorithm from an inter-layer aggregation candidate set containing a plurality of candidate aggregation modes.
  7. The gene regulation network prediction method based on explicit correlation modeling according to claim 2, further comprising a link pooling operation candidate set containing a plurality of candidate pooling modes, wherein the automatic architecture search algorithm is further configured to select the pooling operation of the pooling layer from the link pooling operation candidate set.
  8. The gene regulation network prediction method based on explicit correlation modeling according to claim 2, wherein the automatic architecture search algorithm adaptively determines the architecture of the graph neural network model according to formula (3), in which the optimal graph neural network model operation combination θ is searched from the search space by optimizing the distribution ρβ(θ) parameterized by the structural parameter β; the evaluation function in formula (3) measures the predictive performance of an operation combination θ, together with a set of specific model weights, on the given prior regulatory graph; and the search space covers the operations of the intra-layer message passing space, the inter-layer message passing space, and the pooling layer.
  9. A gene regulation network prediction system based on explicit correlation modeling, comprising: a data acquisition module (1) for acquiring a gene expression matrix of single-cell RNA sequencing data and an adjacency matrix of a prior regulatory graph constructed based on prior knowledge; a model processing module (2) comprising a graph neural network model, for receiving the gene expression matrix and the adjacency matrix and explicitly modeling the links in the prior regulatory graph through an intra-layer message passing space and an inter-layer message passing space to obtain link representations; a prediction module (3) for predicting, by a classifier based on the link representations, whether a regulatory relationship exists between gene pairs, so as to infer a gene regulatory network; and an architecture search module (4) for adaptively determining the architecture of the graph neural network model according to the input data by an automatic architecture search algorithm.
  10. The gene regulation network prediction system based on explicit correlation modeling according to claim 9, wherein the intra-layer message passing space in the model processing module (2) is configured to perform data processing to explicitly model links by: $m_v^{(l)} = \mathrm{AGG}^{(l)}\left(\{ W_r\, h_u^{(l)} : u \in \mathcal{N}(v) \}\right)$ (1) and $h_v^{(l+1)} = \sigma^{(l)}\left(\mathrm{COM}^{(l)}\left(W_{\mathrm{self}}\, h_v^{(l)},\; m_v^{(l)}\right)\right)$ (2); the neighborhood information of node $v$ is aggregated by formula (1), and the representation of node $v$ is updated by formula (2); the aggregation operation $\mathrm{AGG}^{(l)}$, the combination function operation $\mathrm{COM}^{(l)}$ and the activation function operation $\sigma^{(l)}$ of the intra-layer message passing space are configured to be selected from a neighborhood aggregation candidate set, a combination function candidate set, and an activation function candidate set, respectively; the inter-layer message passing space is configured to contain a connection operation selected from an inter-layer connection candidate set and an aggregation operation selected from an inter-layer aggregation candidate set; the connection operation fuses the output and the input of the preceding message passing network layer to serve as the input of the next message passing network layer; the aggregation operation aggregates the outputs of all message passing network layers and inputs the result to the pooling layer; the pooling layer is configured to include a pooling operation selected from a link pooling operation candidate set; and the architecture search module (4) is configured to execute a stochastic differentiable search algorithm based on a specific distribution, searching for the optimal graph neural network model operation combination θ from a search space by optimizing a distribution ρβ(θ) parameterized by a structural parameter β, the search space covering the operations of the intra-layer message passing space, the inter-layer message passing space, and the pooling layer.
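
The intra-layer step of claims 3 and 10 (aggregate neighbor messages, combine them with the node's own transformed embedding, then apply an activation) and the distribution-based selection of operations in claims 8 and 10 can be sketched as follows. This is a hedged illustration, not the patented implementation. The candidate sets (`sum`/`mean`/`max`, `sum`/`concat`, `relu`/`tanh`) are assumptions, since the claims do not enumerate them. The names `message_passing_layer`, `gumbel_softmax` and `sample_architecture` are hypothetical. The Gumbel-softmax relaxation is one common way to realize a stochastic differentiable search and is assumed here rather than taken from the source.

```python
import math
import random

# Illustrative candidate sets; the claims do not list the actual candidates.
AGG = {
    "sum": lambda msgs: [sum(c) for c in zip(*msgs)],
    "mean": lambda msgs: [sum(c) / len(msgs) for c in zip(*msgs)],
    "max": lambda msgs: [max(c) for c in zip(*msgs)],
}
COM = {
    "sum": lambda a, b: [x + y for x, y in zip(a, b)],
    "concat": lambda a, b: a + b,
}
ACT = {
    "relu": lambda x: [max(0.0, t) for t in x],
    "tanh": lambda x: [math.tanh(t) for t in x],
}

def matvec(W, h):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]

def message_passing_layer(h, neighbors, W_nbr, W_self, arch):
    """Aggregate neighbor messages, then combine and activate, per node."""
    out = {}
    for v, h_v in h.items():
        msgs = [matvec(W_nbr, h[u]) for u in neighbors[v]]
        # Isolated nodes receive an all-zero message.
        m_v = AGG[arch["agg"]](msgs) if msgs else [0.0] * len(W_nbr)
        out[v] = ACT[arch["act"]](COM[arch["com"]](matvec(W_self, h_v), m_v))
    return out

def gumbel_softmax(logits, tau, rng):
    """Relaxed one-hot sample from the categorical distribution given by logits."""
    u = [min(max(rng.random(), 1e-12), 1.0 - 1e-12) for _ in logits]
    y = [(l - math.log(-math.log(ui))) / tau for l, ui in zip(logits, u)]
    top = max(y)
    e = [math.exp(t - top) for t in y]
    total = sum(e)
    return [t / total for t in e]

def sample_architecture(beta, tau, rng):
    """Pick one operation per slot; beta holds the structural logits per slot."""
    arch = {}
    for slot, cands in (("agg", AGG), ("com", COM), ("act", ACT)):
        names = sorted(cands)
        probs = gumbel_softmax(beta[slot], tau, rng)
        arch[slot] = names[probs.index(max(probs))]  # hard choice, illustration only
    return arch

# Toy prior graph with three genes and 2-dimensional embeddings.
h = {0: [1.0, 0.0], 1: [0.0, 1.0], 2: [1.0, 1.0]}
neighbors = {0: [1, 2], 1: [0], 2: [0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
out = message_passing_layer(h, neighbors, identity, identity,
                            {"agg": "sum", "com": "sum", "act": "relu"})
arch = sample_architecture({"agg": [0.0] * 3, "com": [0.0] * 2, "act": [0.0] * 2},
                           tau=1.0, rng=random.Random(0))
print(out, arch)
```

The hard argmax in `sample_architecture` keeps the sketch short but blocks gradients. A differentiable search would instead weight the candidate operations by the relaxed sample so that the structural logits can be optimized together with the model weights.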

Description

Gene regulation network prediction method and system based on explicit correlation modeling

Technical Field

The invention belongs to the field of computational bioinformatics, and particularly relates to a gene regulation network prediction method and system based on explicit association modeling.

Background

A gene regulatory network (GRN) reveals the interactions between genes and is key to understanding the core mechanisms of life activities. As single-cell RNA sequencing (scRNA-seq) technology matures, inferring GRNs from scRNA-seq data with computational models has become a research hotspot in bioinformatics. Existing computational methods fall roughly into three categories: unsupervised/self-supervised learning, semi-supervised learning, and supervised learning. Unsupervised or self-supervised methods do not depend on known regulatory relationship labels and infer the GRN by mining the intrinsic patterns of the gene expression data; they have advantages where label information is scarce, but their reconstruction accuracy on real, complex biological data is often poor. Semi-supervised methods use some known labels as guidance, but the performance gain is limited and they face the same accuracy bottleneck. Supervised methods formulate GRN inference as a link prediction task and use known transcription factor-target gene regulatory relationships as training labels to train a model that predicts unknown regulatory interactions.
However, these methods generally treat genes as independent node pairs. Their model designs fail to fully consider and exploit the inherent graph topology of the gene regulatory network itself: they essentially learn statistical correlations between gene pairs and ignore the complex regulatory patterns formed by the interplay of multiple genes in a local neighborhood, so the inferred results deviate in capturing the structural characteristics of real GRNs. In recent years, graph neural networks (GNNs), with their powerful ability to model graph-structured data, have been introduced into GRN inference, giving rise to advanced models such as GENESPIDER and GENELink. These methods model genes and their interactions as a graph and use a GNN to learn gene representations, which improves performance to a certain extent. However, these GNN models still have shortcomings. Existing GNN models generally follow a node-centric paradigm in which information aggregation and representation learning are performed at the node level. The regulatory relationship between genes (i.e., the link) is inferred only implicitly and indirectly, through the similarity of, or operations on, node embeddings; the link itself is not modeled explicitly and directly. As a result, the rich semantics carried by the link information in the prior regulatory graph cannot be fully exploited, which limits the model's ability to understand complex regulatory logic. Furthermore, existing GNN architecture designs rely on expert knowledge and lack adaptability: the inference model is fixed to a specific, manually designed graph neural network architecture. For a given graph learning task, a GNN architecture may perform well on one dataset but cannot be guaranteed to be equally effective on other scRNA-seq datasets with different data distributions.
Because single-cell datasets are so diverse, manually designing and tuning a GNN architecture for each new dataset is a heavy and challenging task, which greatly limits both the adoption of such models and their performance ceiling.

Disclosure of Invention

The invention aims to provide a gene regulation network prediction method and system based on explicit correlation modeling that addresses the above problems in the prior art. To achieve this purpose, the invention adopts the following technical scheme: a gene regulation network prediction method based on explicit association modeling, comprising the following steps: acquiring a gene expression matrix of single-cell RNA sequencing data and constructing an adjacency matrix of a prior regulatory graph based on prior knowledge; inputting the gene expression matrix and the adjacency matrix into a graph neural network model, wherein the graph neural network model is configured to explicitly model the links in the prior regulatory graph through an intra-layer message passing space and an inter-layer message passing space to obtain link representations; predicting, by a classifier based on the link representations, whether a regulatory relationship exists between gene pairs, so as to infer the gene regulatory network; the architecture of the graph neural network model is adaptively determined according to input data thro