CN-121709015-B - MICP functional gene regulation and control network analysis method, device, equipment and medium

Abstract

The invention relates to the technical field of bioinformatics and discloses a method, a device, equipment and a medium for analyzing an MICP functional gene regulation network. The method binarizes the expression data of MICP functional genes and maps the genes into a one-dimensional ordered sequence; encodes each gene state sequence into a quantum state vector; represents the quantum state vector in compressed form using a matrix product state tensor network; calculates quantum mutual information between gene pairs from the quantum state vector to construct a pairwise regulation skeleton network; calculates multivariate quantum mutual information to identify significant multi-gene cooperative or redundant regulation modules; and finally screens reliable relationships through statistical testing, integrates them into a comprehensive regulation network, and performs functional analysis. By combining quantum-state encoding with a tensor network, the method addresses the difficulty traditional methods have in capturing high-order relationships and improves the robustness and analytical depth of network inference.
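The encoding and pairwise quantum mutual information steps can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the text: each amplitude is taken as the square root of the empirical frequency of a bit string, the full state vector is used directly instead of a matrix product state (feasible only for a handful of genes), and all function names are hypothetical. Eigenvalues are obtained with a small Jacobi routine so that the sketch needs only the standard library.

```python
import math
from collections import Counter
from itertools import product


def system_state(samples):
    """Amplitudes for observed bit strings; sqrt(frequency) is an assumed form."""
    counts = Counter(tuple(s) for s in samples)
    total = sum(counts.values())
    return {bits: math.sqrt(c / total) for bits, c in counts.items()}


def reduced_density(state, keep):
    """Reduced density matrix on the gene positions in `keep` (others traced out)."""
    basis = list(product((0, 1), repeat=len(keep)))
    index = {b: i for i, b in enumerate(basis)}
    groups = {}  # traced-out configuration -> {kept configuration: amplitude}
    for bits, amp in state.items():
        kept = tuple(bits[k] for k in keep)
        rest = tuple(b for k, b in enumerate(bits) if k not in keep)
        groups.setdefault(rest, {})[kept] = amp
    rho = [[0.0] * len(basis) for _ in basis]
    for amps in groups.values():
        for a, va in amps.items():
            for b, vb in amps.items():
                rho[index[a]][index[b]] += va * vb
    return rho


def sym_eigenvalues(matrix, sweeps=30):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.hypot(theta, 1.0))
                c = 1.0 / math.hypot(t, 1.0)
                s = t * c
                for k in range(n):  # apply the rotation to columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # apply the rotation to rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]


def entropy(rho):
    """Von Neumann entropy in bits."""
    return -sum(lam * math.log2(lam) for lam in sym_eigenvalues(rho) if lam > 1e-12)


def quantum_mutual_information(state, i, j):
    """I(i:j) = S(rho_i) + S(rho_j) - S(rho_ij)."""
    return (entropy(reduced_density(state, (i,)))
            + entropy(reduced_density(state, (j,)))
            - entropy(reduced_density(state, (i, j))))


# Toy data: genes 0 and 1 always agree, gene 2 varies independently.
samples = [(0, 0, 0), (0, 0, 1), (1, 1, 0), (1, 1, 1)]
state = system_state(samples)
print(quantum_mutual_information(state, 0, 1))
print(quantum_mutual_information(state, 0, 2))
```

For this toy data the first value is close to 2 bits and the second close to 0. The first exceeds the classical maximum of 1 bit for two binary variables because the assumed amplitude encoding produces a pure, entangled state over genes 0 and 1.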

Inventors

  • LAI YONGMING
  • HONG YIQIANG
  • LIU QIAN
  • CHEN XU

Assignees

  • Longyan University (龙岩学院)

Dates

Publication Date
20260508
Application Date
20260210

Claims (6)

  1. A method for analyzing a MICP functional gene regulation network, comprising:
     performing binarization processing on the original expression data of MICP functional genes of a plurality of samples to obtain a binary expression matrix, wherein rows of the binary expression matrix represent genes, columns represent samples, and each matrix element represents the expression state of the corresponding gene in the corresponding sample;
     normalizing the two-dimensional coordinates of each gene and mapping them to a target square area;
     calculating, from the two-dimensional coordinates of each gene, the one-dimensional index value of that gene on a Hilbert space-filling curve;
     sorting all genes in ascending order of index value to obtain a one-dimensional gene sequence, wherein genes at similar positions in the gene feature distribution map remain adjacent in the one-dimensional gene sequence;
     determining a gene binary state sequence of each sample according to the one-dimensional gene sequence, and encoding each gene binary state sequence into a quantum state basis vector;
     counting the frequency of occurrence of each quantum state basis vector $|s\rangle$ across all samples to obtain its probability distribution $P(s)$;
     constructing, based on the probability distribution $P(s)$, a system quantum state vector representing the whole gene system, $|\Psi\rangle = \sum_{s} c_s |s\rangle$, where $|s\rangle$ is a quantum state basis vector and $c_s$ is the probability amplitude of that basis vector;
     decomposing the system quantum state vector into a product sequence of tensors by a matrix product state method, wherein each tensor corresponds to one gene and the bond dimension between adjacent tensors represents the correlation strength between the genes;
     establishing a chain matrix product state structure formed by connecting N tensors in series as the matrix product state representation of the quantum state vector;
     calculating, based on the matrix product state representation, pairwise quantum mutual information between any gene pair and multivariate quantum mutual information among multiple genes;
     constructing a symmetric gene dependency matrix based on the pairwise quantum mutual information among all gene pairs;
     performing a permutation test on each pairwise quantum mutual information value in the symmetric gene dependency matrix to screen out regulatory gene pairs;
     constructing a gene regulation skeleton network based on the regulatory gene pairs, wherein each gene is a network node and the regulatory gene pairs are connected by edges;
     performing a permutation test on all multivariate quantum mutual information values to screen out multivariate regulation relationships;
     integrating the multivariate regulation relationships to identify and form multi-gene regulation modules;
     constructing a comprehensive regulation network based on the gene regulation skeleton network and the multi-gene regulation modules; and
     inputting MICP functional gene expression data of a sample to be tested into the comprehensive regulation network for analysis, so as to identify core genes or functional modules in the network that affect MICP efficiency, and determining targets for strain engineering based on the identified core genes or functional modules.
  2. The MICP functional gene regulation network analysis method of claim 1, wherein performing binarization processing on the original expression data of the MICP functional genes of the plurality of samples to obtain a binary expression matrix comprises: performing dimension reduction on the original expression data based on principal component analysis, including: zero-centering the original expression data and calculating the covariance matrix of the zero-centered matrix; performing eigenvalue decomposition on the covariance matrix and selecting the leading target principal components as a low-dimensional representation; and projecting the data points of each gene in the covariance matrix onto the two-dimensional plane formed by the leading target principal components to obtain the two-dimensional coordinates of each gene.
  3. The MICP functional gene regulation network analysis method of claim 1, wherein determining a gene binary state sequence of each sample according to the one-dimensional gene sequence, and encoding each gene binary state sequence into a quantum state basis vector, comprises: for each sample, obtaining the binary expression state of each gene in the order defined by the one-dimensional gene sequence to form the binary state sequence of the sample; mapping each binary bit in the binary state sequence to the corresponding computational basis state of one qubit, wherein 0 is mapped to $|0\rangle$ and 1 is mapped to $|1\rangle$, so that $|0\rangle$ represents a gene in an inactive state and $|1\rangle$ represents a gene in an active state; and mapping the binary state sequence of each sample to the direct (tensor) product of the computational basis states of the plurality of qubits to obtain the quantum state basis vector of each sample.
  4. A MICP functional gene regulation network analysis device, comprising:
     a binary expression matrix determining module, used for performing binarization processing on the original expression data of MICP functional genes of a plurality of samples to obtain a binary expression matrix, wherein rows of the binary expression matrix represent genes, columns represent samples, and each matrix element represents the expression state of the corresponding gene in the corresponding sample;
     a one-dimensional gene sequence determining module, used for normalizing the two-dimensional coordinates of each gene and mapping them to a target square area, calculating a one-dimensional index value for each gene, and determining the one-dimensional gene sequence;
     a quantum state basis vector determining module, used for determining a gene binary state sequence of each sample according to the one-dimensional gene sequence and encoding each gene binary state sequence into a quantum state basis vector;
     a system quantum state vector construction module, used for counting the frequency of occurrence of each quantum state basis vector $|s\rangle$ across all samples to obtain its probability distribution $P(s)$, and constructing, based on the probability distribution $P(s)$, a system quantum state vector representing the whole gene system, $|\Psi\rangle = \sum_{s} c_s |s\rangle$, where $|s\rangle$ is a quantum state basis vector and $c_s$ is the probability amplitude of that basis vector;
     a matrix product state representation determining module, used for decomposing the system quantum state vector into a product sequence of tensors by a matrix product state method, and establishing a chain matrix product state structure formed by connecting N tensors in series as the matrix product state representation of the quantum state vector, wherein each tensor corresponds to one gene and the bond dimension between adjacent tensors represents the correlation strength between the genes;
     a quantum mutual information determining module, used for calculating, based on the matrix product state representation, pairwise quantum mutual information between any gene pair and multivariate quantum mutual information among multiple genes;
     a gene regulation network construction module, used for constructing a symmetric gene dependency matrix based on the pairwise quantum mutual information among all gene pairs; performing a permutation test on each pairwise quantum mutual information value in the symmetric gene dependency matrix to screen out regulatory gene pairs; constructing a gene regulation skeleton network based on the regulatory gene pairs, wherein each gene is a network node and the regulatory gene pairs are connected by edges; performing a permutation test on all multivariate quantum mutual information values to screen out multivariate regulation relationships; and integrating the multivariate regulation relationships to identify and form multi-gene regulation modules;
     a comprehensive regulation network construction module, used for constructing a comprehensive regulation network based on the gene regulation skeleton network and the multi-gene regulation modules; and
     an analysis and identification module, used for inputting MICP functional gene expression data of a sample to be tested into the comprehensive regulation network for analysis, so as to identify core genes or functional modules in the network that affect MICP efficiency, and determining targets for strain engineering based on the identified core genes or functional modules.
  5. An electronic device comprising a memory and at least one processor, the memory having instructions stored therein, the at least one processor invoking the instructions in the memory to cause the electronic device to perform the steps of the MICP functional gene regulation network analysis method of any one of claims 1-3.
  6. A computer readable storage medium having instructions stored thereon, wherein the instructions, when executed by a processor, implement the steps of the MICP functional gene regulation network analysis method of any one of claims 1-3.
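The gene ordering step of claim 1 (normalize the two-dimensional coordinates into a square, compute a Hilbert curve index, sort by index) can be sketched as follows. This is an illustration under assumptions: the grid resolution `order`, the min-max normalization, the gene labels and the function names are all hypothetical, and the index routine is the widely used iterative coordinate-to-index conversion for the Hilbert curve, not necessarily the one intended by the claims.

```python
def hilbert_index(order, x, y):
    """Index of integer cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or reflect the quadrant so the sub-curve has standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def order_genes(coords, order=4):
    """Sort gene names by Hilbert index of their min-max normalized coordinates."""
    n = 1 << order
    xs = [xy[0] for xy in coords.values()]
    ys = [xy[1] for xy in coords.values()]

    def to_cell(v, lo, hi):
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    keyed = []
    for gene, (x, y) in coords.items():
        cx = to_cell(x, min(xs), max(xs))
        cy = to_cell(y, min(ys), max(ys))
        keyed.append((hilbert_index(order, cx, cy), gene))
    return [gene for _, gene in sorted(keyed)]


# Hypothetical two-dimensional coordinates for four genes.
coords = {"gA": (-1.0, -1.0), "gB": (-0.9, -0.8), "gC": (0.9, 0.1), "gD": (1.0, 1.0)}
print(order_genes(coords, order=2))
```

Genes that fall into the same grid cell receive the same index and are ordered by name here; a finer grid (larger `order`) reduces such ties at the cost of a longer curve.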

Description

MICP functional gene regulation and control network analysis method, device, equipment and medium

Technical Field

The invention belongs to the technical field of bioinformatics, and particularly relates to a method, a device, equipment and a medium for analyzing a MICP functional gene regulation network.

Background

Existing gene regulation network inference methods (such as correlation analysis, mutual information, regression models and Bayesian networks) each have theoretical advantages, but share common shortcomings when processing single-cell and environmental microorganism data. The main problems are that results are unstable because of data noise and sparsity, high-order polygenic synergistic relationships are difficult to capture, causal direction is difficult to identify, and environmental driving effects and spurious correlations are often misjudged as direct regulation. In addition, although mutual information combined with permutation testing can find nonlinear relationships, the estimation bias is large when the sample size is insufficient, and the computational overhead is high. In the MICP research scenario, gene regulation is jointly affected by multiple species, environmental factors and coupled metabolic pathways; existing methods often have difficulty revealing complex high-order regulation patterns, and their results lack reproducibility in real environments. Attempts such as multi-omics integration and deep learning alleviate these problems to some extent, but they depend on prior annotations and are difficult to verify experimentally, limitations that are particularly prominent for non-model microorganisms. In general, the prior art has significant shortcomings in high-order dependency capture, statistical robustness, causal direction determination and environmental adaptability.
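The combination of mutual information and permutation testing mentioned in the Background can be sketched in a few lines. The sketch uses classical plug-in mutual information on binary expression vectors, not the quantum variant; the function names, the number of permutations and the seed are illustrative assumptions.

```python
import math
import random
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    return sum(c / n * math.log2(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def permutation_p_value(x, y, n_perm=1000, seed=0):
    """Observed MI and the share of shuffled-y MI values at least as large."""
    rng = random.Random(seed)
    observed = mutual_information(x, y)
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)  # break any dependence between x and y
        if mutual_information(x, y_perm) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


x = [0, 1] * 10
mi, p = permutation_p_value(x, x)
print(mi, p)
```

Each call evaluates the estimator once per permutation, so cost grows with the number of gene pairs times `n_perm`. The plug-in estimator is also biased upward for small samples, which is the sample-size limitation the Background describes.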
Disclosure of Invention

Aiming at the defects of the prior art, the invention aims to provide a method, a device, equipment and a medium for analyzing a MICP functional gene regulation network.

According to one aspect of the present application, there is disclosed a MICP functional gene regulation network analysis method comprising:

performing binarization processing on the original expression data of MICP functional genes of a plurality of samples to obtain a binary expression matrix, wherein rows of the binary expression matrix represent genes, columns represent samples, and each matrix element represents the expression state of the corresponding gene in the corresponding sample;
mapping the genes in the binary expression matrix into a one-dimensional gene sequence through a space-filling curve according to their two-dimensional feature vectors, wherein genes at similar positions in the gene feature distribution map remain adjacent in the one-dimensional gene sequence;
determining a gene binary state sequence of each sample according to the one-dimensional gene sequence, and encoding each gene binary state sequence into a quantum state basis vector;
constructing a system quantum state vector representing the whole gene system based on all quantum state basis vectors of all samples;
decomposing and compressing the system quantum state vector using a matrix product state tensor network to obtain a matrix product state representation of the quantum state vector;
calculating, based on the matrix product state representation, pairwise quantum mutual information between any gene pair and multivariate quantum mutual information among multiple genes;
constructing a gene regulation skeleton network based on the pairwise quantum mutual information, and identifying multi-gene regulation modules based on the multivariate quantum mutual information;
constructing a comprehensive regulation network based on the gene regulation skeleton network and the multi-gene regulation modules; and
inputting MICP functional gene expression data of a sample to be tested into the comprehensive regulation network for analysis, so as to identify core genes or functional modules in the network that affect MICP efficiency, and determining targets for strain engineering based on the identified core genes or functional modules.

In some embodiments, performing binarization processing on the original expression data of the MICP functional genes of the plurality of samples to obtain a binary expression matrix comprises: performing dimension reduction on the original expression data based on principal component analysis, including: zero-centering the original expression data and calculating the covariance matrix of the zero-centered matrix; performing eigenvalue decomposition on the covariance matrix and selecting the leading target principal components as a low-dimensional representation; the data points of each gene in the covariance matrix are projected into a