CN-121983136-A - Multi-omics analysis system for single-cell and spatial transcriptome data integration

Abstract

The invention provides a multi-omics analysis system for single-cell and spatial transcriptome data integration and relates to the technical fields of bioinformatics and computational biology. The system comprises an input processing module, a data integration and alignment module, a joint analysis module and a visual output module. A dual-feature anchoring strategy combines differentially expressed genes with conserved gene co-variation relationships. This produces a shared feature set with clear biological meaning, which gives subsequent integration high-confidence constraints. A probabilistic graphical model then performs the spatial alignment. It localizes cell types accurately and also quantifies the uncertainty of their spatial distribution. The mapping therefore overcomes the limitation of traditional hard assignment and better reflects the biological reality of mixed cell states in the tissue microenvironment. The whole workflow stays highly integrated, from data correction, feature anchoring and probabilistic mapping through to spatial communication and co-expression network analysis.
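The probabilistic spatial alignment summarized above can be illustrated with a minimal Python sketch. It is not the patent's implementation. The following are all assumptions made for illustration:

- the Gaussian unary term;
- the Potts-style pairwise term and its weight `beta`;
- synchronous mean-field updates;
- Shannon entropy as the per-spot uncertainty score.

Every function name is hypothetical. Observations are shared-feature expression vectors, one per spot. Cell-type mean profiles and frequencies from the single-cell data act as the prior, and spatial neighbours encourage smooth assignments.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of log-scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def unary_log_likelihood(spot, profile, sigma=1.0):
    """Assumed isotropic Gaussian log-likelihood of a spot's shared-feature
    expression given a cell-type mean profile (constant terms dropped)."""
    return -sum((x - mu) ** 2 for x, mu in zip(spot, profile)) / (2 * sigma ** 2)


def mean_field_crf(spots, profiles, priors, neighbors, beta=1.0, n_iter=10):
    """Mean-field inference for a Potts-style CRF whose nodes are spatial spots.

    spots:     expression vectors over the shared feature set, one per spot
    profiles:  cell-type mean expression vectors derived from single-cell data
    priors:    cell-type frequencies from single-cell data (hidden-state prior)
    neighbors: adjacency list; neighbors[i] holds the spots adjacent to spot i
    beta:      weight of the spatial smoothness term (an assumed parameter)
    Returns an n_spots x n_types probability matrix whose rows sum to 1.
    """
    n_types = len(profiles)
    unary = [[math.log(priors[k]) + unary_log_likelihood(s, profiles[k])
              for k in range(n_types)] for s in spots]
    q = [softmax(u) for u in unary]  # initialise from the unary terms only
    for _ in range(n_iter):
        # Synchronous update: each spot is pulled toward its neighbours' beliefs.
        q = [softmax([u[k] + beta * sum(q[j][k] for j in neighbors[i])
                      for k in range(n_types)])
             for i, u in enumerate(unary)]
    return q


def entropy(p):
    """Shannon entropy (nats) of one spot's type distribution; higher = less certain."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Toy usage: two hypothetical cell types, two shared genes, three spots in a line.
profiles = [[5.0, 0.0], [0.0, 5.0]]
priors = [0.5, 0.5]
spots = [[5.0, 0.0], [2.5, 2.5], [0.0, 5.0]]
neighbors = [[1], [0, 2], [1]]
P = mean_field_crf(spots, profiles, priors, neighbors)
uncertainty = [entropy(row) for row in P]
```

In this toy chain, the middle spot lies exactly between the two reference profiles and between two confidently assigned neighbours. It therefore keeps a 0.5/0.5 split and the largest entropy, which is the kind of uncertainty a hard assignment would discard.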

Inventors

  • WANG LIN
  • LIU XIAOCHUN
  • WANG SHUYU
  • WEI LINGYUN
  • WANG WENZHEN
  • Geng Mengyu
  • JIA NAN
  • BU XINGPENG

Assignees

  • Shanxi Medical University (山西医科大学)

Dates

Publication Date
20260505
Application Date
20260121

Claims (8)

  1. A multi-omics analysis system for single-cell and spatial transcriptome data integration, characterized by comprising: an input processing module, used for receiving single-cell transcriptome data and spatial transcriptome data, performing normalization and cross-platform batch-effect correction on each, and outputting corrected single-cell gene expression data and spatial gene expression data; a data integration and alignment module, used for receiving the single-cell gene expression data and the spatial gene expression data, associating the cell-type information inferred from the single-cell data with the spatial coordinates of the spatial gene expression data by means of a probabilistic graphical model, and outputting a spatial cell-type composition probability matrix; a joint analysis module, used for receiving the spatial cell-type composition probability matrix, performing spatially resolved intercellular communication inference and multi-omics co-expression network analysis, and outputting analysis results; and a visual output module, used for receiving the analysis results and converting them into interactive spatial visualizations and a structured analysis report.
  2. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 1, wherein the input processing module comprises a cross-platform correction unit for identifying a housekeeping gene set shared by the single-cell and spatial transcriptome data and applying a linear-model-based batch-effect correction algorithm so that the gene expression profiles of the two datasets are on a comparable scale.
  3. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 1, wherein the data integration and alignment module comprises a dual-feature anchoring unit configured to perform a first anchoring analysis and a second anchoring analysis to obtain a first feature set and a second feature set, respectively, and then to combine the two to form a shared feature set. The first anchoring analysis identifies cell-type-specific differentially expressed genes from the single-cell gene expression data and spatially differentially expressed genes from the spatial gene expression data, and takes the intersection of the two as the first feature set. The second anchoring analysis screens the single-cell gene expression data for gene pairs whose correlation is stable across cell types, verifies that the correlation is consistent in the spatial gene expression data, and retains the consistent gene pairs as the second feature set.
  4. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 3, wherein the data integration and alignment module further comprises a probabilistic graphical model alignment unit. The unit uses the expression data of the genes in the shared feature set as observations, and the cell-type labels inferred from the single-cell data together with their expression features as hidden-state priors, to construct a conditional random field model with spatial spots as nodes. It then computes, by model inference, the probability that each spatial spot belongs to each cell type, thereby generating the spatial cell-type composition probability matrix.
  5. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 1, wherein the joint analysis module comprises a spatial cell communication inference unit. Based on the spatial cell-type composition probability matrix and the proximity network of spatial spots, the unit assigns cell-type source weights to the ligand and receptor expression levels at each spatial spot, thereby quantifying the strength of ligand-receptor interactions between different cell types within local spatial regions.
  6. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 4, wherein the joint analysis module comprises a multi-omics co-expression analysis unit. The unit combines the single-cell gene expression data and the spatial gene expression data on the basis of the shared feature set and performs weighted gene co-expression network analysis on the combined data. It identifies gene modules conserved across the spatial and cell-type dimensions, and finally associates these modules with their spatial activity distributions and biological functions.
  7. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 1, wherein the visual output module outputs an interactive spatial map capable of displaying, in a linked manner, at least one of the following: the spatial cell-type composition probability, the strength and direction of spatial ligand-receptor interactions, and the active hotspot regions of spatial co-expression modules.
  8. The multi-omics analysis system for single-cell and spatial transcriptome data integration according to claim 1, wherein the visual output module comprises an automated report generation unit. The unit integrates the key results on spatially differentially expressed genes, significant cell interaction pairs, co-expression modules and their functional annotations to automatically generate a structured analysis report.
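The probability-weighted ligand-receptor scoring of claim 5 can be sketched as follows. The weighting scheme here is an assumption:

- the ligand level at a spot is attributed to the sender type in proportion to that type's probability at the spot;
- the receptor level at the same or a neighbouring spot is attributed to the receiver type in the same way;
- the products are averaged over all spot pairs.

The radius-based proximity network and all function names are hypothetical.

```python
import math


def radius_neighbors(coords, radius):
    """Proximity network: for each spot, the other spots within `radius`."""
    return [[j for j, (xj, yj) in enumerate(coords)
             if j != i and math.hypot(xi - xj, yi - yj) <= radius]
            for i, (xi, yi) in enumerate(coords)]


def lr_interaction_score(ligand, receptor, P, neighbors, sender, receiver):
    """Assumed local ligand-receptor score for sender -> receiver cell types.

    ligand, receptor: per-spot expression of the ligand and receptor genes
    P:                spot x cell-type composition probability matrix
    neighbors:        adjacency list of spatially proximal spots
    sender, receiver: column indices of the sending and receiving cell types
    """
    total, n_pairs = 0.0, 0
    for i in range(len(P)):
        lig = P[i][sender] * ligand[i]  # ligand attributed to the sender type
        for j in [i] + neighbors[i]:  # the spot itself plus its neighbours
            rec = P[j][receiver] * receptor[j]  # receptor attributed to the receiver
            total += lig * rec
            n_pairs += 1
    return total / n_pairs if n_pairs else 0.0


# Toy usage: three spots in a line; type 0 expresses the ligand, type 1 the receptor.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
neighbors = radius_neighbors(coords, radius=1.0)
P = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
ligand = [4.0, 1.0, 0.0]
receptor = [0.0, 1.0, 4.0]
forward = lr_interaction_score(ligand, receptor, P, neighbors, sender=0, receiver=1)
reverse = lr_interaction_score(ligand, receptor, P, neighbors, sender=1, receiver=0)
```

The score is not symmetric in `sender` and `receiver`. Comparing `forward` with `reverse` therefore gives a direction for the interaction, which is the kind of quantity claim 7 proposes to display.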

Description

Multi-omics analysis system for single-cell and spatial transcriptome data integration

Technical Field

The invention relates to the technical fields of bioinformatics and computational biology, and in particular to a multi-omics analysis system for integrating single-cell and spatial transcriptome data.

Background

With the rapid development of single-cell sequencing and spatial transcriptomics, life-science research has entered a new stage in which cellular heterogeneity and its spatial organization can be analyzed together. Single-cell transcriptomics provides gene expression profiles at single-cell resolution but loses each cell's spatial position in the native tissue. Spatial transcriptomics preserves the spatial coordinates of gene expression, but its resolution is typically a "spot" containing several cells whose cell-type composition is unknown.

Current methods for integrating the two fall into two main categories. Pairing-based label-transfer methods "map" or "deconvolve" the cell-type labels annotated in single-cell data onto spatial spots. Co-embedding-based methods align both datasets in a shared low-dimensional space. These methods have the following limitations:

(1) Most methods perform hard assignment or simple linear decomposition. They do not fully use the expression patterns and topological structure of the spatial data as constraints, so the spatial continuity of cell states is ignored.

(2) The analysis workflow is fragmented. Integration and downstream analyses (e.g., cell communication inference and co-expression module identification) are usually separate steps that do not form a closed loop, which can cause information loss and inconsistent conclusions.
(3) Feature selection relies on experience. The strategy for selecting the gene features (anchors) on which integration depends is simplistic, usually based only on highly variable genes or marker genes. It cannot exploit the stronger constraint provided by gene-gene relationships that are consistent across the spatial and single-cell dimensions.

The invention therefore provides a multi-omics analysis system for integrating single-cell and spatial transcriptome data to solve these problems in the prior art.

Disclosure of Invention

In view of these problems, the invention aims to provide a multi-omics analysis system for integrating single-cell and spatial transcriptome data. Combining dual-feature anchoring with probabilistic graphical model alignment, the invention achieves high-confidence spatial mapping of cell states. On that basis it performs integrated spatial multi-omics analysis and ultimately generates biological insights that can be verified interactively, thereby solving the problems in the prior art.
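The dual-feature anchoring strategy (set out in detail in claim 3) can be sketched as below. Differential expression testing is assumed to have been run already, so the inputs are plain gene lists. The following are assumptions for illustration:

- the Pearson correlation and the threshold `r_min`;
- the sign-consistency rule;
- the toy gene and cell-type names.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def first_feature_set(sc_type_degs, spatial_degs):
    """Anchor set 1: cell-type DEGs from single-cell data that are also spatial DEGs."""
    return set().union(*sc_type_degs.values()) & set(spatial_degs)


def second_feature_set(sc_by_type, spatial, genes, r_min=0.5):
    """Anchor set 2: gene pairs whose correlation has the same sign and |r| >= r_min
    in every single-cell cell type, and is reproduced in the spatial data.

    sc_by_type: {cell_type: {gene: expression values across that type's cells}}
    spatial:    {gene: expression values across spots}
    """
    kept_pairs = []
    for g1, g2 in combinations(sorted(genes), 2):
        rs = [pearson(expr[g1], expr[g2]) for expr in sc_by_type.values()]
        same_sign = all(r > 0 for r in rs) or all(r < 0 for r in rs)
        if not (same_sign and min(abs(r) for r in rs) >= r_min):
            continue  # correlation is not stable across cell types
        r_sp = pearson(spatial[g1], spatial[g2])
        if abs(r_sp) >= r_min and (r_sp > 0) == (rs[0] > 0):
            kept_pairs.append((g1, g2))  # relationship confirmed in spatial data
    return {g for pair in kept_pairs for g in pair}, kept_pairs


# Toy usage with hypothetical genes A-E and cell types t1, t2.
first = first_feature_set({"t1": ["A", "D"], "t2": ["C"]}, ["A", "C", "E"])
sc_by_type = {
    "t1": {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 0, 1, 0]},
    "t2": {"A": [2, 3, 5, 6], "B": [1, 2, 4, 5], "C": [0, 1, 1, 0]},
}
spatial = {"A": [1, 3, 5], "B": [2, 3, 7], "C": [1, 1, 0]}
second_genes, pairs = second_feature_set(sc_by_type, spatial, ["A", "B", "C"])
shared = first | second_genes  # shared feature set: union of both anchor sets
```

In the toy data, gene C enters only as a differential-expression anchor. Gene B enters only through its stable correlation with A. Taking the union keeps both kinds of evidence.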
To achieve this aim, the invention adopts the following technical solution. The multi-omics analysis system for integrating single-cell and spatial transcriptome data comprises:

The input processing module, which receives single-cell transcriptome data and spatial transcriptome data, performs normalization and cross-platform batch-effect correction on each, and outputs corrected single-cell gene expression data and spatial gene expression data;

The data integration and alignment module, which receives the single-cell and spatial gene expression data. Based on the cell-type information inferred from the single-cell data, it uses a probabilistic graphical model to associate that information with the spatial coordinates of the spatial gene expression data, and outputs a spatial cell-type composition probability matrix;

The joint analysis module, which receives the spatial cell-type composition probability matrix, performs spatially resolved intercellular communication inference and multi-omics co-expression network analysis, and outputs the analysis results;

And the visual output module, which receives the analysis results and converts them into interactive spatial visualizations and a structured analysis report.

The input processing module comprises a cross-platform correction unit. This unit identifies the housekeeping gene set shared by the single-cell and spatial transcriptome data and applies a linear-model-based batch-effect correction algorithm so that the gene expression profiles of the two datasets are on a comparable scale.
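The cross-platform correction unit is only specified as a linear-model-based batch-effect correction anchored on shared housekeeping genes. One simple reading, sketched below under that assumption, works in two steps. First, fit y = a + b*x by ordinary least squares, where x and y are the spatial and single-cell mean expression of the shared housekeeping genes. Then apply the fitted line to every spatial gene. The per-gene means, the library-size log normalization with `scale=1e4`, and all function names are assumptions. Published linear-model methods such as ComBat are considerably more elaborate.

```python
import math


def log_normalize(counts, scale=1e4):
    """Library-size normalization followed by log1p for one cell or spot."""
    total = sum(counts.values())
    return {g: math.log1p(c / total * scale) for g, c in counts.items()}


def gene_means(samples):
    """Mean expression of each gene across samples (dicts of gene -> value)."""
    genes = set().union(*samples)
    return {g: sum(s.get(g, 0.0) for s in samples) / len(samples) for g in genes}


def calibrate_spatial(sc_samples, sp_samples, housekeeping):
    """Rescale spatial expression onto the single-cell scale with a linear model
    y = a + b * x fitted by least squares on shared housekeeping-gene means."""
    sc_mean, sp_mean = gene_means(sc_samples), gene_means(sp_samples)
    shared_hk = sorted(set(housekeeping) & set(sc_mean) & set(sp_mean))
    if len(shared_hk) < 2:
        raise ValueError("need at least two shared housekeeping genes")
    x = [sp_mean[g] for g in shared_hk]  # spatial means (predictor)
    y = [sc_mean[g] for g in shared_hk]  # single-cell means (reference scale)
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("housekeeping-gene means do not vary; slope is undefined")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    corrected = [{g: a + b * v for g, v in s.items()} for s in sp_samples]
    return corrected, shared_hk, (a, b)


# Toy usage on already-normalized values; GENE_X is a hypothetical non-housekeeping gene.
sc = [{"ACTB": 11.0, "GAPDH": 7.0, "RPLP0": 3.0, "GENE_X": 5.0}]
sp = [{"ACTB": 4.0, "GAPDH": 2.0, "RPLP0": 0.0, "GENE_X": 1.0},
      {"ACTB": 5.0, "GAPDH": 3.0, "RPLP0": 1.0, "GENE_X": 3.0}]
corrected, hk_used, (a, b) = calibrate_spatial(sc, sp, ["ACTB", "GAPDH", "RPLP0", "B2M"])
```

ACTB, GAPDH, RPLP0 and B2M are commonly used reference genes. B2M is absent from the toy data, so only the three genes present in both datasets drive the fit.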
The data integration and alignment module comprises a dual-feature anchoring unit, which executes a first anchoring analysis and a second anchoring analysis to obtain a first feature set and a second feature set, respectively, and then combines the first feature set and the second feature set to form a shared feature set, wherein the first anchoring analysis is used for identifying cell type specific differ