CN-121983123-A - Spatial gene expression map prediction method based on pathological images

Abstract

The invention provides a spatial gene expression map prediction method based on pathological images, relating to the technical field of biomedicine. The method comprises: performing high-resolution digital scanning on a pathological image to obtain an original stained image; aggregating the original stained image and a spatial expression matrix to obtain a training data set; training a map prediction model with the training data set to obtain a trained map prediction model; analyzing the original stained image with the trained map prediction model to obtain spatial expression heat maps; and stitching the spatial expression heat maps to obtain a spatial gene expression map prediction result. The method addresses the low computational accuracy of the prior art in high-resolution reconstruction, sparse distribution modeling and spatial consistency optimization.

Inventors

  • LI TAIWEN
  • ZHAO GUILE
  • LU YIWEN
  • CUI HAO
  • DUAN DINGYU
  • LI JING
  • LI CHUNJIE
  • CHEN QIANMING

Assignees

  • Sichuan University (四川大学)

Dates

Publication Date
2026-05-05
Application Date
2026-01-22

Claims (10)

  1. A spatial gene expression map prediction method based on pathological images, characterized by comprising the following steps: S1, performing high-resolution digital scanning on a pathological image to obtain an original stained image; S2, aggregating the original stained image and a spatial expression matrix to obtain a training data set; S3, training a map prediction model with the training data set to obtain a trained map prediction model; S4, analyzing the original stained image with the trained map prediction model to obtain spatial expression heat maps; and S5, post-processing the spatial expression heat maps with a graph-structure optimization module and stitching the processed heat maps to obtain a spatial gene expression map prediction result, thereby completing the prediction of the spatial gene expression map.
  2. The pathological-image-based spatial gene expression map prediction method according to claim 1, wherein S1 comprises: performing high-resolution digital scanning and color normalization on the pathological image, cutting it into fixed-size blocks with a sliding window, and saving the spatial coordinates of each block in the original image; and, based on the blocks of the pathological image, removing background regions with an image edge detection algorithm and retaining tissue regions to obtain the original stained image.
  3. The pathological-image-based spatial gene expression map prediction method according to claim 2, wherein S2 comprises: aggregating the original stained image and the spatial expression matrix so that each image block corresponds to one spatial expression submatrix, and generating a spatial gene expression map corresponding to each block; and obtaining the training data set by structuring the spatial gene expression maps, the image blocks, the spatial position indices and the gene meta-information file.
  4. The pathological-image-based spatial gene expression map prediction method according to claim 1, wherein the map prediction model comprises: an image encoding module, which performs detail extraction and weighted fusion on the training data set to obtain a multi-scale image representation; a gene embedding module, which learns an embedding vector for each gene in the set of training target genes; a dual-channel decoding module, which runs a cross-attention decoder and a similarity decoder as two parallel channels whose outputs are adaptively weighted in a fusion layer, wherein the cross-attention decoding channel unfolds the image features into a feature sequence at pixel-grid level, obtains the multi-head cross-attention output as initial response features, and maps these initial response features back onto the spatial grid through upsampling and spatial rearrangement to obtain an initial response map of each gene in image space, and the similarity decoding channel takes the image feature vector at each spatial position, defines a trainable weight vector and bias for each gene, computes the expression intensity as a feature similarity, and concatenates the values at all positions to obtain a similarity result; and the fusion layer, which performs a weighted fusion of the initial response map and the similarity result to obtain the spatial expression heat map.
  5. The pathological-image-based spatial gene expression map prediction method according to claim 4, wherein the image encoding module comprises: a shifted-window (Swin) Transformer, which models global spatial dependencies in the images of the training data set through local windows and a hierarchical self-attention mechanism to obtain a self-attention result; a convolutional neural network, which applies convolution, normalization and residual connections to the training data set to obtain a local texture extraction result; and a concatenation module, which concatenates the self-attention result and the local texture extraction result along the channel dimension and fuses them by weighting to obtain the multi-scale image representation.
  6. The pathological-image-based spatial gene expression map prediction method according to claim 5, wherein the multi-scale image representation is given by four expressions: [formula]; [formula]; [formula]; [formula]; whose symbols denote, in order of definition: the input pathological image block; the real number field, to which the pixel intensity values or continuous feature values of the pathological image belong; H, the pixel height of the pathological image block in the vertical direction; W, the pixel width of the pathological image block in the horizontal direction; C, the number of channels of the feature map output by the Swin Transformer; H, the spatial size of the feature map in the vertical direction after downsampling by the Swin Transformer network; W, the spatial size of the feature map in the horizontal direction after downsampling by the Swin Transformer network; the global spatial feature representation extracted from the input pathological image block by the Swin Transformer; the feature extraction network based on the Swin Transformer architecture; the local texture feature representation extracted from the input pathological image block by the convolutional neural network; the feature extraction network based on the convolutional neural network architecture; the number of channels of the feature map output by the convolutional neural network; the multi-scale image feature representation obtained by fusing the global spatial features and the local texture features; the feature fusion function, which performs weighted fusion, channel compression and nonlinear mapping on the concatenated features; and the operation that concatenates multiple feature maps along the channel dimension.
  7. The pathological-image-based spatial gene expression map prediction method according to claim 4, wherein the gene embedding is given by two expressions: [formula]; [formula]; the initial response map is given by three expressions: [formula]; [formula]; [formula]; and the similarity result is given by two expressions: [formula]; [formula]; whose symbols denote, in order of definition: the gene embedding matrix; the embedding vector corresponding to the k-th gene; K, the number of training target genes; d, the dimension of the gene embedding vector; the real number field; the feature matrix obtained by rearranging the image features; the feature rearrangement operation; the image features output by the image encoding module; N, the number of spatial positions; C, the number of channels of the image features; the query matrix; the key matrix; the value matrix; the three trainable projection matrices; the initial response features obtained by the cross-attention mechanism; the normalization function; the number of feature channels used in the cross-attention computation; the predicted expression value of the k-th gene at spatial position p obtained through the similarity decoding channel; the nonlinear activation function; the trainable weight vector corresponding to the k-th gene; the image feature vector at spatial position p; the bias term; h, the resolution of the spatial expression map in the vertical direction; and w, the resolution of the spatial expression map in the horizontal direction.
  8. The pathological-image-based spatial gene expression map prediction method according to claim 4, wherein the spatial expression heat map is given by two expressions: [formula]; [formula]; whose symbols denote, in order of definition: the adaptive fusion weight corresponding to the k-th gene; the nonlinear activation function; the embedding vector of the k-th gene; the global average pooling result of the image features; the trainable weight vector; the trainable bias parameter; the final predicted expression value of the k-th gene at spatial position p; the predicted expression value obtained from the cross-attention decoding channel; and the predicted expression value obtained from the similarity decoding channel.
  9. The pathological-image-based spatial gene expression map prediction method according to claim 1, wherein the processed spatial expression heat map is given by three expressions: [formula]; [formula]; [formula]; whose symbols denote, in order of definition: the spatial graph structure; the set of graph nodes; the set of graph edges; the graph Laplacian matrix; the degree matrix; the adjacency matrix; the spatial expression vector of the k-th gene after graph-structure smoothing; the identity matrix; the smoothing coefficient; and the initial spatial expression vector corresponding to the k-th gene.
  10. The pathological-image-based spatial gene expression map prediction method according to claim 1, wherein the loss function of the map prediction model is given by five expressions: [formula]; [formula]; [formula]; [formula]; [formula]; whose symbols denote, in order of definition: the overall loss function of the map prediction model; the four non-negative weight coefficients; the zero-inflated negative binomial distribution loss term; the Pearson correlation loss term; the Spearman rank correlation loss term; the total variation regularization loss term; the number of spatial positions; the number of training target genes; i, the spatial position index; g, the index of the g-th gene; the probability of observing a gene expression value given the model parameters under the zero-inflated negative binomial distribution model; the true expression value of the g-th gene at the i-th spatial position; the mean parameter predicted by the model; the dispersion parameter corresponding to the g-th gene; the zero-inflation probability corresponding to the g-th gene; the Pearson correlation coefficient between the predicted and true expression of the g-th gene; the Spearman rank correlation coefficient between the predicted and true expression of the g-th gene; s, the scale index; S, the set of scales; the smoothing weight at scale s; the spatial coordinate index of the spatial expression map in the horizontal direction; the spatial coordinate index of the spatial expression map in the vertical direction; and the predicted spatial expression map of the g-th gene at scale s.
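
The zero-inflated negative binomial term named in claim 10 can be illustrated with a minimal Python sketch. It uses the standard textbook form of the distribution, with a predicted mean per position and gene and a dispersion and zero-inflation probability per gene, as the claim's definitions describe. The function names, the plain-list data layout and the averaging over positions and genes are assumptions made for illustration; this is not a reproduction of the claimed formula.

```python
import math


def nb_log_pmf(y, mu, theta):
    """Log-probability of count y under a negative binomial
    with mean mu (> 0) and dispersion theta (> 0)."""
    return (
        math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
        + theta * math.log(theta / (theta + mu))
        + y * math.log(mu / (theta + mu))
    )


def zinb_log_pmf(y, mu, theta, pi):
    """Log-probability under a zero-inflated negative binomial:
    with probability pi the count is a structural zero,
    otherwise it is drawn from the negative binomial."""
    nb = nb_log_pmf(y, mu, theta)
    if y == 0:
        return math.log(pi + (1.0 - pi) * math.exp(nb))
    return math.log(1.0 - pi) + nb


def zinb_nll(counts, means, thetas, pis):
    """Negative log-likelihood averaged over positions i and genes g
    (the averaging is an assumption of this sketch).
    counts[i][g]: observed expression; means[i][g]: predicted mean;
    thetas[g], pis[g]: per-gene dispersion and zero-inflation probability."""
    total, n = 0.0, 0
    for row_y, row_mu in zip(counts, means):
        for g, (y, mu) in enumerate(zip(row_y, row_mu)):
            total -= zinb_log_pmf(y, mu, thetas[g], pis[g])
            n += 1
    return total / n


# Two spatial positions, two genes (toy numbers, not from the patent).
loss = zinb_nll(
    counts=[[0, 3], [1, 0]],
    means=[[0.5, 2.0], [1.0, 0.8]],
    thetas=[1.5, 2.0],
    pis=[0.3, 0.1],
)
```

The zero-inflation branch matters for sparse data: at y = 0 the probability is the sum of the structural-zero mass and the negative binomial's own zero mass, so a position with no reads is penalized less than under a plain negative binomial.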

Description

Spatial gene expression map prediction method based on pathological images

Technical Field

This specification relates to the technical field of biomedicine, and in particular to a spatial gene expression map prediction method based on pathological images.

Background

In recent years, the rapid development of spatial transcriptomics (ST) sequencing has enabled researchers to observe gene expression and tissue morphology simultaneously in the two-dimensional plane of a tissue section, revealing the spatial relationships among tumor heterogeneity, the immune microenvironment and pathological structures. The technology provides a new means of studying the spatial molecular mechanisms of disease, particularly cancer. However, existing ST platforms (e.g., 10x Genomics Visium, Slide-seq and Stereo-seq) still have significant limitations in clinical and large-scale sample studies: limited spatial resolution, high sequencing cost, long experimental cycles and limited data throughput, as well as difficulty in covering complex biological tissue that spans heterogeneous regions, such as solid tumors. Achieving high-resolution reconstruction of spatial molecular maps across a variety of tissue types therefore remains a technical bottleneck.

With the development of digital pathology and image analysis, inferring the spatial molecular information of tissue from conventional hematoxylin and eosin (H&E) stained pathological images has become a new research direction. H&E images are easy to obtain, inexpensive and rich in information, which makes it possible to predict the spatial distribution of gene expression without sequencing.

However, most existing deep-learning methods for image-to-transcriptome mapping are built on data of conventional resolution (such as the 55 μm spots of the Visium platform) or on a limited range of cancer types, and lack dedicated high-resolution inference models for highly heterogeneous tissues (such as head and neck cancer, breast cancer and pancreatic cancer). Such tissues show marked heterogeneity and a complex microenvironment, and place extremely high demands on spatial resolution and prediction accuracy. It is therefore necessary to establish a more general mapping framework that incorporates high-resolution (e.g., 2 μm-scale) spatial transcriptome data, in order to achieve finer, cell-scale reconstruction of spatial expression.

Disclosure of Invention

In view of the above shortcomings of the prior art, the invention provides a spatial gene expression map prediction method based on pathological images, which addresses the low computational accuracy of the prior art in high-resolution reconstruction, sparse distribution modeling and spatial consistency optimization.

To achieve the above aim, the technical scheme adopted by the invention is a spatial gene expression map prediction method based on pathological images, comprising the following steps:
S1, performing high-resolution digital scanning on a pathological image to obtain an original stained image;
S2, aggregating the original stained image and a spatial expression matrix to obtain a training data set;
S3, training a map prediction model with the training data set to obtain a trained map prediction model;
S4, analyzing the original stained image with the trained map prediction model to obtain spatial expression heat maps;
S5, post-processing the spatial expression heat maps with a graph-structure optimization module and stitching the processed heat maps to obtain a spatial gene expression map prediction result, thereby completing the prediction of the spatial gene expression map.

The beneficial effect of the method is that global structural features are extracted with a shifted-window (Swin) Transformer network and local texture information is extracted with a convolutional neural network, so that global and local features are fused at multiple levels.

Further, step S1 includes: performing high-resolution digital scanning and color normalization on the pathological image, cutting it into fixed-size blocks with a sliding window, and saving the spatial coordinates of each block in the original image; and, based on the blocks of the pathological image, removing background regions with an image edge detection algorithm and retaining tissue regions to obtain the original stained image.

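
The block-cutting part of step S1 can be sketched in plain Python as follows. The sketch cuts a grayscale image, held as a list of rows, into fixed-size blocks with a sliding window, records each block's top-left coordinate, and keeps only blocks that contain enough tissue. Note one deliberate substitution: the specification removes background with an image edge detection algorithm, whereas this sketch uses a simple brightness threshold as a stand-in. All function names, the threshold of 220 and the 20% tissue fraction are hypothetical values chosen for illustration.

```python
def tile_coordinates(height, width, tile, stride):
    """Return the (top, left) corner of every full tile visited
    by a sliding window over a height x width image."""
    coords = []
    for top in range(0, height - tile + 1, stride):
        for left in range(0, width - tile + 1, stride):
            coords.append((top, left))
    return coords


def is_tissue(patch, background_level=220, min_tissue_fraction=0.2):
    """Treat pixels darker than background_level as tissue.
    This threshold test is a stand-in for the edge-detection-based
    background removal described in the specification."""
    pixels = [value for row in patch for value in row]
    tissue = sum(1 for value in pixels if value < background_level)
    return tissue / len(pixels) >= min_tissue_fraction


def extract_tiles(image, tile, stride):
    """Cut the image into tiles and keep those containing tissue,
    together with their coordinates in the original image."""
    kept = []
    for top, left in tile_coordinates(len(image), len(image[0]), tile, stride):
        patch = [row[left:left + tile] for row in image[top:top + tile]]
        if is_tissue(patch):
            kept.append(((top, left), patch))
    return kept


# Toy 4 x 4 image: left half stained tissue (dark), right half blank glass (bright).
image = [[100, 100, 255, 255] for _ in range(4)]
tiles = extract_tiles(image, tile=2, stride=2)
kept_coords = [coord for coord, _ in tiles]
```

Keeping the coordinates alongside each block is what later allows the per-block heat maps to be stitched back into a whole-slide map in step S5.
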
The method supports high-resolution (e.g., 2 μm-scale) input and output based on H&E images, overcomes the limitation of traditional spot-level methods, and achieves reconstruction of cell-level spatial expression maps.

Further, step S2 includes: aggregating the original stained image and the spatial expression matrix so that each image block corresponds to one