CN-121983110-A - Method and system for cell dynamics modeling and gene regulation network construction based on RNA rate
Abstract
The invention discloses a method and a system for cell dynamics modeling and gene regulation network construction based on RNA rate. The method comprises: obtaining single-cell RNA sequencing data and constructing a gene expression matrix from the unspliced abundance and the spliced abundance of genes; constructing a transcription dynamics model based on ordinary differential equations, and modeling the transcription dynamics parameters as a function of the unspliced abundance vector, the spliced abundance vector, a cell embedding vector and a prior gene regulation network; constructing a deep learning model framework, constructing loss functions at the cell level and the gene level, and guiding the deep learning model to fit that function; and extracting an attention matrix from the trained deep learning model to serve as the inferred single-cell gene regulation network. The invention effectively addresses the inaccurate inference and the sensitivity to data noise that result from ignoring gene associations in existing methods.
Inventors
- ZHENG RUIQING
- LI MIN
- MIAO ZIYANG
- SHI XINGYUAN
Assignees
- Central South University (中南大学)
Dates
- Publication Date: 2026-05-05
- Application Date: 2025-12-24
Claims (10)
- 1. A method for cell dynamics modeling and gene regulation network construction based on RNA rate, comprising: acquiring single-cell RNA sequencing data, and constructing a gene expression matrix according to the unspliced RNA abundance and the spliced RNA abundance of each gene; constructing a transcription dynamics model based on ordinary differential equations, and modeling the transcription dynamics parameters as a function of unspliced RNA abundance vectors, cell embedding vectors, and a prior gene regulation network; constructing a deep learning model framework, constructing loss functions at the cell level and the gene level, and guiding the deep learning model to fit said function so as to obtain the transcription dynamics parameters; and extracting an attention matrix from the trained deep learning model to serve as the inferred single-cell gene regulation network.
- 2. The method for cell dynamics modeling and gene regulation network construction according to claim 1, wherein the transcription dynamics model based on ordinary differential equations is expressed as: $\frac{du_{i,g}}{dt} = \alpha_{i,g} - \beta_{i,g}\,u_{i,g}$; $\frac{ds_{i,g}}{dt} = \beta_{i,g}\,u_{i,g} - \gamma_{i,g}\,s_{i,g}$; and the dynamics parameters are modeled by a function $f$ expressed as: $(\boldsymbol{\alpha}_i, \boldsymbol{\beta}_i, \boldsymbol{\gamma}_i) = f(\mathbf{u}_i, \mathbf{s}_i, \mathbf{z}_i, \mathbf{G})$; wherein $u_{i,g}$ and $s_{i,g}$ respectively denote the unspliced RNA abundance and the spliced RNA abundance of gene $g$ in cell $i$; $\alpha_{i,g}$, $\beta_{i,g}$ and $\gamma_{i,g}$ respectively denote the corresponding transcription, splicing and degradation rates; $\boldsymbol{\alpha}_i$, $\boldsymbol{\beta}_i$ and $\boldsymbol{\gamma}_i$ respectively denote the vectors formed by the transcription, splicing and degradation rates of all genes in cell $i$; $\mathbf{u}_i$ and $\mathbf{s}_i$ are the unspliced RNA abundance vector and the spliced RNA abundance vector of cell $i$; $\mathbf{z}_i$ is the cell embedding vector, which preserves cell heterogeneity; and $\mathbf{G}$ denotes a predefined prior gene regulation network, specifically a binary directed adjacency matrix.
- 3. The method for cell dynamics modeling and gene regulation network construction according to claim 1, wherein the constructed deep learning model architecture comprises a transcription encoding module, a regulation network perception module and a dynamics parameter inference module; the transcription encoding module converts the unspliced RNA abundance and the spliced RNA abundance into corresponding vector embeddings using gene-specific learnable weights, and concatenates them with the cell embedding vector to obtain gene embedding vectors; the regulation network perception module adopts a masked multi-head cross-attention mechanism, takes the binary directed adjacency matrix corresponding to the prior gene regulation network as the mask matrix, filters out non-existent regulatory relationships, calculates the gene regulation attention scores of each cell, and updates the gene embedding vectors; the dynamics parameter inference module applies a nonlinear transformation to the updated gene embedding vectors using a multi-layer perceptron, predicts the dynamics parameters, and infers the unspliced RNA abundance and the spliced RNA abundance of each gene in the cell at the next moment.
- 4. The method of claim 3, wherein the transcription encoding module processes the unspliced RNA abundance and the spliced RNA abundance by: [formula omitted]; [formula omitted]; wherein the two results respectively denote the unspliced embedding and the spliced embedding of a gene in a cell; the unspliced embedding and the spliced embedding each use their own gene-specific learnable weight; a transformation matrix is applied to the cell embedding vector, which preserves cell heterogeneity; and the parts are combined by a concatenation operation; the regulation network perception module calculates regulatory relationships and updates the gene embeddings using the following masked multi-head cross-attention mechanism: [formula omitted]; [formula omitted]; wherein the embedding of a target gene in a cell serves as the query vector; the embedding of a potential regulatory gene in the same cell serves as the key vector; cross-attention is computed on the query and key inputs, and the resulting attention score of the potential regulatory gene for the target gene is recorded as a regulatory relationship; the query, key and value each have a learnable weight matrix; a binary mask matrix constructed from the prior gene regulation network filters out regulatory edges absent from the prior gene regulation network when the attention scores are calculated; a further term is the dimension of the learnable weight matrix for the keys; and the two outputs are the gene embeddings updated by the cross-attention mechanism; the dynamics parameter inference module concatenates the updated gene embeddings, inputs them into a multi-layer perceptron, and predicts the dynamics parameters using a Softplus activation function.
- 5. The method of claim 1, wherein the predicted dynamics parameters are used to infer the unspliced RNA abundance and the spliced RNA abundance of each gene in the cell at the next moment, expressed as: $\hat{u}_{i,g} = u_{i,g} + (\alpha_{i,g} - \beta_{i,g}\,u_{i,g})\,\Delta t$; $\hat{s}_{i,g} = s_{i,g} + (\beta_{i,g}\,u_{i,g} - \gamma_{i,g}\,s_{i,g})\,\Delta t$; wherein $\Delta t$ is the time step used for inference; $\alpha_{i,g}$, $\beta_{i,g}$ and $\gamma_{i,g}$ respectively denote the dynamics parameters of gene $g$ in cell $i$, namely the transcription, splicing and degradation rates; $u_{i,g}$ and $s_{i,g}$ respectively denote the unspliced RNA abundance and the spliced RNA abundance of gene $g$ in cell $i$; and $\hat{u}_{i,g}$ and $\hat{s}_{i,g}$ respectively denote the inferred unspliced RNA abundance and spliced RNA abundance of gene $g$ in cell $i$ at the next moment.
- 6. The method for cell dynamics modeling and gene regulation network construction according to claim 1, wherein the contrastive loss function is constructed from triplets at the cell level: [formula omitted]; wherein the loss is the contrastive loss of a cell; the velocity vector of unspliced RNA abundance of the cell is defined from the cell's current unspliced RNA abundance and its unspliced RNA abundance inferred for the next moment; the distance is a weighted distance function based on an attention weight; the attention weight is a vector obtained by averaging the regulatory attention score matrix, with length equal to the number of genes; one reference vector is the average of the unspliced RNA abundance velocity vectors of several neighboring cells of the cell; the other reference vector is the average of the unspliced RNA abundance velocity vectors of several distant cells; and the remaining term is a margin (boundary value).
- 7. The method for cell dynamics modeling and gene regulation network construction according to claim 1, wherein constructing the loss function at the gene level comprises: first, for each cell, using the inferred cell velocity to project the cell embedding vector to a future state, generating a pseudo-cell embedding, and searching for the nearest cell within the local neighborhood of the pseudo-cell embedding: [formula omitted]; wherein the search ranges over the indices of cells in the local neighborhood of the pseudo-cell embedding, and the neighborhood is bounded by a distance threshold parameter; then defining the gene-level loss function as the difference between the predicted values and those of the nearest cell: [formula omitted]; wherein the loss is the gene-level loss function; the velocity vector of unspliced RNA abundance of the cell is defined from the cell's current unspliced RNA abundance and its unspliced RNA abundance inferred for the next moment; the velocity vector of unspliced RNA abundance of the nearest cell serves as the supervision target for the inferred velocity of the cell; the velocity vector of spliced RNA abundance of the cell is defined likewise from the cell's current spliced RNA abundance and its spliced RNA abundance inferred for the next moment; and the remaining term is the velocity vector of spliced RNA abundance of the nearest cell.
- 8. The method of claim 1, further comprising assessing the importance of regulatory genes and the level of activity of a given biological pathway using an attention matrix.
- 9. The method for cell dynamics modeling and gene regulation network construction according to claim 8, wherein evaluating the importance of regulatory genes using the attention matrix comprises: identifying key regulatory genes: for each target gene, ranking the regulatory genes by their attention scores in the attention matrix, selecting a preset proportion of the regulatory genes as key regulators of that target gene, and merging the key regulators into a combined set; calculating importance scores: for each regulatory gene in the set, accumulating its attention scores over all the target genes it regulates, and taking the accumulated result as the final importance score of the regulatory gene; cell-type-specific analysis: performing differential analysis on the importance scores of regulatory genes across all cells, and taking the regulatory genes showing significantly different importance as "cell type-specific regulatory genes"; and wherein evaluating the activity level of a given biological pathway in a cell using the attention matrix comprises: first, identifying the intersection of the gene set of the given pathway with the set of genes in the attention matrix; then dividing the genes in the intersection into a regulatory gene set and a target gene set; then accumulating, over the regulatory genes in the regulatory gene set and the target genes in the target gene set, the corresponding regulatory weights in the attention matrix to calculate the importance score of the given pathway in the cell: [formula omitted]; wherein each accumulated term is the attention weight of a regulatory gene for a target gene.
- 10. A single-cell-level dynamic gene regulation network inference system based on a graph neural network and contrastive learning, comprising a memory and a processor, wherein the memory stores a computer program which, when executed by the processor, causes the processor to implement the method of any one of claims 1-9.
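The forward pass described in claims 2 to 5 and the regulator importance score of claim 9 can be illustrated with a minimal NumPy sketch. This is not the patented implementation. It assumes a single attention head, one linear layer in place of the multi-layer perceptron, untrained random weights, unspliced embeddings as queries with spliced embeddings as keys and values, and the convention prior[t, r] = 1 when regulator r may regulate target t. All function and parameter names (init_params, forward_one_cell, regulator_importance, W_q and so on) are hypothetical.

```python
import numpy as np

def softplus(x):
    # Numerically stable softplus; keeps the predicted rates positive.
    return np.logaddexp(0.0, x)

def masked_attention(q, k, v, mask):
    # mask[t, r] == 1 keeps the edge regulator r -> target t (assumed convention).
    scores = q @ k.T / np.sqrt(k.shape[1])
    scores = np.where(mask > 0, scores, -1e9)
    scores = scores - scores.max(axis=1, keepdims=True)
    w = np.exp(scores) * (mask > 0)
    denom = w.sum(axis=1, keepdims=True)
    # Targets with no regulator in the prior network get an all-zero row.
    w = np.divide(w, denom, out=np.zeros_like(w), where=denom > 0)
    return w @ v, w

def euler_step(u, s, alpha, beta, gamma, dt):
    # One forward-Euler step of du/dt = alpha - beta*u, ds/dt = beta*u - gamma*s.
    return u + (alpha - beta * u) * dt, s + (beta * u - gamma * s) * dt

def init_params(n_genes, cell_dim, d, rng):
    # Random weights standing in for trained parameters (illustration only).
    shapes = {"w_u": (n_genes, d), "w_s": (n_genes, d), "W_cell": (d, cell_dim),
              "W_q": (2 * d, d), "W_k": (2 * d, d), "W_v": (2 * d, d),
              "W_out": (3 * d, 3)}
    return {name: rng.normal(scale=0.5, size=shape) for name, shape in shapes.items()}

def forward_one_cell(u, s, z, prior, p, dt=0.1):
    # u, s: (G,) abundances; z: (C,) cell embedding; prior: (G, G) binary matrix.
    n_genes = u.shape[0]
    cell = np.tile(p["W_cell"] @ z, (n_genes, 1))                  # (G, d)
    # Gene-specific weights lift each scalar abundance to a vector, which is
    # then concatenated with the transformed cell embedding.
    e_u = np.concatenate([u[:, None] * p["w_u"], cell], axis=1)    # (G, 2d)
    e_s = np.concatenate([s[:, None] * p["w_s"], cell], axis=1)    # (G, 2d)
    h, attn = masked_attention(e_u @ p["W_q"], e_s @ p["W_k"], e_s @ p["W_v"], prior)
    # A single linear layer plus softplus stands in for the MLP.
    rates = softplus(np.concatenate([h, e_u], axis=1) @ p["W_out"])  # (G, 3)
    alpha, beta, gamma = rates.T
    u_next, s_next = euler_step(u, s, alpha, beta, gamma, dt)
    return u_next, s_next, attn

def regulator_importance(attn, top_fraction=0.1):
    # attn[t, r]: attention of regulator r for target t.
    n_targets, n_regs = attn.shape
    k = max(1, int(round(top_fraction * n_regs)))
    key_regs = set()
    for t in range(n_targets):
        order = np.argsort(-attn[t], kind="stable")[:k]
        key_regs.update(int(r) for r in order if attn[t, r] > 0)
    scores = np.zeros(n_regs)
    for r in key_regs:
        # Assumption: accumulate over all targets, not only those that selected r.
        scores[r] = attn[:, r].sum()
    return scores
```

In this sketch the attention matrix returned by forward_one_cell plays the role of the per-cell regulatory network, and regulator_importance returns zero for any regulator that was never selected as a key regulator. Whether the accumulation in claim 9 runs over all targets or only over those that selected the regulator is not settled by the text, so the choice made here is an assumption.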
Description
Method and system for cell dynamics modeling and gene regulation network construction based on RNA rate
Technical Field
The invention belongs to the field of bioinformatics and relates to a method and a system for cell dynamics modeling and gene regulation network construction based on RNA rate.
Background
With the rapid development of single-cell sequencing technology, and especially the wide application of single-cell RNA sequencing (scRNA-seq) data, life science research has entered a new era of analyzing biological processes at single-cell resolution. Cells are the fundamental unit of organism structure and function, and their development, differentiation and morphogenesis are precisely controlled by gene regulation networks. However, conventional scRNA-seq technology provides only a static transcriptome snapshot of cells at a specific time point, so it is difficult to directly capture how cell states evolve over time. To reconstruct the dynamic processes behind these static data, cell trajectory inference (or pseudotime analysis) techniques were developed; by constructing a continuous path of cell development, they reveal the key regulatory factors and modules that drive cell fate decisions. This is of great significance for analyzing embryonic development mechanisms and for exploring the onset and progression of diseases (such as cancer and autoimmune diseases).
Although traditional trajectory inference methods reveal, to some extent, the transition relationships between cells, their limitations are increasingly prominent as the biological systems under study grow more complex. Conventional pseudotime methods typically rely on fixed topological manifold assumptions, struggle with multi-origin, multi-lineage complex dynamical systems, and often require the trajectory starting point to be specified manually.
In response to this challenge, the "RNA rate" (RNA velocity) approach was proposed: it predicts the future transcriptional state of cells by constructing a transcription dynamics model from the abundance ratio between precursor mRNA (unspliced RNA) and mature mRNA (spliced RNA). The method can autonomously infer the direction of cell differentiation in an end-to-end manner, and provides a powerful tool for understanding the dynamic coupling between gene regulation and cell differentiation. At present, methods for analyzing the relationship between cell dynamic processes and gene regulation mainly fall into the following categories:
(1) Traditional trajectory inference methods based on manifold learning. These methods map high-dimensional single-cell data to a low-dimensional space through a dimensionality reduction algorithm and construct a continuous path of cell state change based on graph theory or a minimum spanning tree algorithm. Although such methods perform well on simple linear differentiation processes, they essentially construct a static topological structure from the similarity between cells; they cannot capture the true direction of transcription dynamics, rely heavily on prior knowledge to determine the start and end points of differentiation, and have difficulty automatically resolving complex cell fate bifurcations.
(2) RNA rate inference methods based on the gene independence assumption. Such methods (e.g., the early Velocyto and the later scVelo) introduce transcription kinetic equations and extrapolate cell velocity by fitting the splicing and degradation rates of individual genes. However, most of them rely on the "gene independence assumption": each gene is treated as an isolated variable when the kinetic parameters are solved, ignoring the complex expression correlations between genes and the constraints of the regulatory network.
This simplification reduces computational complexity, but it also neglects the interplay between genes, making the inferred flow fields prone to bias in complex biological scenarios.
(3) To explore the key factors driving cell trajectories, existing research generally adopts a two-step strategy: first determining the temporal ordering of cells using trajectory inference or an RNA rate method, and then constructing a gene regulation network in downstream analysis by combining correlation analysis or causal inference methods (such as Granger causality). This strategy separates dynamic modeling from network construction, so regulatory network information cannot be fed back to correct trajectory inference, and it is difficult to dynamically identify the specific regulatory factors that play a critical role at different stages of cell development.
Although a number of analytical tools are currently available for cell dynamics and regulatory networks, many challenges remain in practical use. Firstly, the existing RNA rate model is extremely sensitive to the selection of feature genes, and the core genes really determining the dif