CN-121983111-A - Spatially resolved single-cell state modeling method and system based on domain adaptation and hierarchical fine-tuning


Abstract

The invention relates to a spatially resolved single-cell state modeling method and system based on domain adaptation and hierarchical fine-tuning. The method comprises: obtaining multiplex immunofluorescence images and single-cell segmentation masks and constructing an unlabeled dataset; constructing a masked autoencoder consisting of a ViT encoder and a linear decoder, prepending a classification token to the image feature sequence, performing domain-adaptive training of the masked autoencoder on the unlabeled dataset, during which the classification token is learned, and obtaining domain-adapted weights for the ViT encoder; acquiring a labeled dataset; constructing a state embedding generation model comprising a shared ViT backbone network and a two-stage classifier; prepending the classification token to the image feature sequence in the labeled dataset and performing hierarchical training of the state embedding generation model, during which the classification token is learned; and inputting a cell image block into the trained state embedding generation model, outputting a classification result and a cell state embedding, and performing interpretability analysis. Compared with the prior art, the invention achieves accurate cell classification and generates biologically interpretable cell state representations.
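The domain-adaptive pre-training step summarized above (random masking of a cell image block, followed by reconstruction scored with mean squared error) can be illustrated with a minimal, framework-free sketch. It is not the patented implementation: the function names `random_mask` and `masked_mse`, the 75% mask ratio, the toy patch values, and the choice to score only the masked patches are all assumptions made for illustration, and the all-zero "reconstruction" merely stands in for the output of a ViT encoder and linear decoder.

```python
import random


def random_mask(num_patches, mask_ratio, rng):
    """Split patch indices into (visible, masked) index lists."""
    num_masked = int(round(num_patches * mask_ratio))
    order = list(range(num_patches))
    rng.shuffle(order)
    masked = sorted(order[:num_masked])
    visible = sorted(order[num_masked:])
    return visible, masked


def masked_mse(original, reconstructed, masked):
    """Mean squared error over the values of the masked patches only."""
    total, count = 0.0, 0
    for i in masked:
        for x, y in zip(original[i], reconstructed[i]):
            total += (x - y) ** 2
            count += 1
    return total / count if count else 0.0


rng = random.Random(0)
# 16 toy patches with 4 values each (hypothetical data, not real mIF pixels).
patches = [[float(p + c) for c in range(4)] for p in range(16)]
visible, masked = random_mask(len(patches), mask_ratio=0.75, rng=rng)

# Stand-in for the decoder output: predicts zero for every value.
recon = [[0.0] * 4 for _ in patches]
loss = masked_mse(patches, recon, masked)
```

In a real training loop only the `visible` patches would be passed to the encoder, and `loss` would be minimized by gradient descent on the encoder and decoder weights.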

Inventors

LV HUI
ZHANG YUJIA
ZHU YICHENG
KONG YAN
Zhong Bingxu

Assignees

Shanghai Jiao Tong University (上海交通大学)

Dates

Publication Date: 20260505
Application Date: 20251229

Claims (10)

1. A spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning, characterized by comprising the following steps: acquiring multiplex immunofluorescence images and corresponding single-cell segmentation masks generated by a cell segmentation algorithm, and performing image preprocessing based on the single-cell segmentation masks to construct an unlabeled dataset; constructing a masked autoencoder consisting of a ViT encoder and a linear decoder, wherein the ViT encoder serves as the backbone network of the final state embedding generation model, a classification token is prepended to the image feature sequence in the unlabeled dataset, the masked autoencoder undergoes domain-adaptive training on the unlabeled dataset, and the classification token is learned during training, yielding domain-adapted weights for the ViT encoder; obtaining all available annotated datasets, merging them completely, and deduplicating to obtain a labeled dataset; constructing a state embedding generation model comprising a shared ViT backbone network and a two-stage classifier, wherein the shared ViT backbone network loads the domain-adapted weights of the ViT encoder obtained by the domain-adaptive training, and the two-stage classifier comprises a coarse classifier and a fine classifier; prepending the classification token to the image feature sequence in the labeled dataset, and performing hierarchical training of the state embedding generation model on the labeled dataset, wherein the classification token is learned during training; and inputting any cell image block into the trained state embedding generation model, outputting a classification result from the two-stage classifier, obtaining a cell state embedding from the output vector of the classification token, and performing interpretability analysis.
2. The spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning according to claim 1, wherein the preprocessing specifically comprises: traversing all available multiplex immunofluorescence images, extracting each individual cell, according to the cell segmentation mask, as a standard-sized multi-channel image block centered on its centroid, and storing each image block as an individual binary file.
3. The spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning according to claim 1, wherein the domain-adaptive training of the masked autoencoder on the unlabeled dataset specifically comprises: for each training batch, randomly masking the cell image blocks in the unlabeled training set and inputting them to the ViT encoder; outputting, by the ViT encoder, a feature representation of the visible regions of the image blocks; processing the feature representation with the linear decoder to reconstruct the original pixels of the masked regions; and training the masked autoencoder on the mean squared error between the reconstruction result and the original input image.
4. The spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning according to claim 1, wherein the hierarchical training specifically comprises: unfreezing the parameters of the last several Transformer blocks and the final normalization layer of the ViT backbone network, and freezing the remaining parameters; setting a preset small learning rate for the unfrozen ViT backbone network layers and a preset large learning rate for the newly initialized two-stage classifier; training the ViT backbone network and the coarse classifier on the labeled dataset so that the coarse classifier can distinguish epithelial cells, immune cells, and other cells; and, after the coarse classifier training is completed, retaining the updated ViT backbone network weights and training the ViT backbone network and the fine classifier on the labeled dataset.
5. The spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning according to claim 1, wherein the interpretability analysis specifically comprises: attention map visualization, namely extracting the attention map of the last encoder block of the ViT backbone network and highlighting, in the form of a heat map, the subcellular regions that the state embedding generation model attends to when making decisions; and feature space visualization, namely extracting the classification tokens generated by the state embedding generation model for all sub-regions of the cell image block, reducing their dimensionality by principal component analysis, mapping them to the RGB color space, and generating a semantic segmentation map that visually reflects the model's understanding of subcellular structure.
6. A spatially resolved single-cell state modeling system based on domain adaptation and hierarchical fine-tuning, comprising: a data acquisition module for acquiring multiplex immunofluorescence images and corresponding single-cell segmentation masks generated by a cell segmentation algorithm; an image preprocessing module for preprocessing the images based on the single-cell segmentation masks to construct an unlabeled dataset; a dataset construction module for obtaining all available annotated datasets, merging them completely, and deduplicating to obtain a labeled dataset; a model training module for executing domain-adaptive training and hierarchical training, wherein the domain-adaptive training comprises constructing a masked autoencoder consisting of a ViT encoder and a linear decoder, the ViT encoder serving as the backbone network of the final state embedding generation model, prepending a classification token to the image feature sequence in the unlabeled dataset, performing domain-adaptive training of the masked autoencoder on the unlabeled dataset, and learning the classification token during training to obtain domain-adapted weights for the ViT encoder, and wherein the hierarchical training comprises constructing a state embedding generation model comprising a shared ViT backbone network and a two-stage classifier, the shared ViT backbone network loading the domain-adapted weights of the ViT encoder obtained by the domain-adaptive training, and the two-stage classifier comprising a coarse classifier and a fine classifier; an embedding generation module for inputting any cell image block into the trained state embedding generation model, outputting a classification result from the two-stage classifier, and obtaining a cell state embedding from the output vector of the classification token; and an interpretability analysis module for performing interpretability analysis based on the output of the embedding generation module.
7. The spatially resolved single-cell state modeling system according to claim 6, wherein the preprocessing specifically comprises: traversing all available multiplex immunofluorescence images, extracting each individual cell, according to the cell segmentation mask, as a standard-sized multi-channel image block centered on its centroid, and storing each image block as an individual binary file.
8. The spatially resolved single-cell state modeling system based on domain adaptation and hierarchical fine-tuning according to claim 6, wherein the domain-adaptive training of the masked autoencoder on the unlabeled dataset specifically comprises: for each training batch, randomly masking the cell image blocks in the unlabeled training set and inputting them to the ViT encoder; outputting, by the ViT encoder, a feature representation of the visible regions of the image blocks; processing the feature representation with the linear decoder to reconstruct the original pixels of the masked regions; and training the masked autoencoder on the mean squared error between the reconstruction result and the original input image.
9. The spatially resolved single-cell state modeling system based on domain adaptation and hierarchical fine-tuning according to claim 6, wherein the hierarchical training specifically comprises: unfreezing the parameters of the last several Transformer blocks and the final normalization layer of the ViT backbone network, and freezing the remaining parameters; setting a preset small learning rate for the unfrozen ViT backbone network layers and a preset large learning rate for the newly initialized two-stage classifier; training the ViT backbone network and the coarse classifier on the labeled dataset so that the coarse classifier can distinguish epithelial cells, immune cells, and other cells; and, after the coarse classifier training is completed, retaining the updated ViT backbone network weights and training the ViT backbone network and the fine classifier on the labeled dataset.
10. The spatially resolved single-cell state modeling system based on domain adaptation and hierarchical fine-tuning according to claim 6, wherein the interpretability analysis specifically comprises: attention map visualization, namely extracting the attention map of the last encoder block of the ViT backbone network and highlighting, in the form of a heat map, the subcellular regions that the state embedding generation model attends to when making decisions; and feature space visualization, namely extracting the classification tokens generated by the state embedding generation model for all sub-regions of the cell image block, reducing their dimensionality by principal component analysis, mapping them to the RGB color space, and generating a semantic segmentation map that visually reflects the model's understanding of subcellular structure.
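The hierarchical training recited in claims 4 and 9 (unfreeze the last several Transformer blocks and the final normalization layer, use a small learning rate for the backbone and a large one for the classifier, train the coarse classifier first and the fine classifier second) can be sketched as a framework-independent planning function. This is a sketch under stated assumptions, not the patented implementation: the function `plan_finetuning`, the parameter names (`blocks.N...`, `norm`, `cls_token`, `coarse_head`, `fine_head`), and the learning rates 1e-5 and 1e-3 are hypothetical. Keeping the classification token trainable is an assumption based on claim 1, which states that the token is learned during training.

```python
def plan_finetuning(param_names, num_blocks, unfreeze_last, stage,
                    backbone_lr=1e-5, head_lr=1e-3):
    """Map each parameter name to a learning rate, or None if it stays frozen.

    stage is "coarse" or "fine"; only that classifier head is trained.
    """
    if stage not in ("coarse", "fine"):
        raise ValueError("stage must be 'coarse' or 'fine'")
    first_unfrozen = num_blocks - unfreeze_last
    plan = {}
    for name in param_names:
        parts = name.split(".")
        if parts[0] == "blocks":
            # Only the last `unfreeze_last` Transformer blocks are trainable.
            trainable = int(parts[1]) >= first_unfrozen
            plan[name] = backbone_lr if trainable else None
        elif parts[0] in ("norm", "cls_token"):
            # Final normalization layer; classification token (assumption).
            plan[name] = backbone_lr
        elif parts[0] == stage + "_head":
            # Newly initialized classifier for the current stage.
            plan[name] = head_lr
        else:
            # Everything else (e.g. patch embedding, the other head) is frozen.
            plan[name] = None
    return plan


# Hypothetical parameter names for a 12-block ViT with two classifier heads.
names = [
    "patch_embed.proj.weight",
    "cls_token",
    "blocks.0.attn.qkv.weight",
    "blocks.10.mlp.fc1.weight",
    "blocks.11.mlp.fc1.weight",
    "norm.weight",
    "coarse_head.weight",
    "fine_head.weight",
]
coarse_plan = plan_finetuning(names, num_blocks=12, unfreeze_last=2, stage="coarse")
fine_plan = plan_finetuning(names, num_blocks=12, unfreeze_last=2, stage="fine")
```

Running the coarse stage first and then the fine stage on the same backbone weights mirrors the order in the claims. In a deep learning framework, each non-`None` entry would typically become an optimizer parameter group with its own learning rate.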

Description

Spatially resolved single-cell state modeling method and system based on domain adaptation and hierarchical fine-tuning

Technical Field

The invention relates to the field of biological image analysis, and in particular to a spatially resolved single-cell state modeling method and system based on domain adaptation and hierarchical fine-tuning.

Background

Cells are the fundamental unit of life. A cell's "state" is a complex concept defined by its protein expression, the subcellular localization of those proteins, and the microenvironment in which the cell is located, and it determines the function, development, and disease progression of a tissue. Traditional methods simplify this complexity by assigning discrete, static labels (e.g., "T cell", "tumor cell") to cells. However, this simplification ignores two core characteristics of cell state: continuity (e.g., a T cell's progression from resting to activated) and context dependence (the state of a cell is strongly influenced by its neighboring cells).

Chinese patent CN118800318A discloses a cell-state-based modeling method, device, electronic device, and storage medium. It adopts a single-cell state modeling method that fuses high-dimensional and low-dimensional dual paths: a cell state matrix is obtained, RNA velocities in a first dimension and a second dimension are calculated, a transition probability matrix is obtained through a mutual learning module, and a dual-path model for calculating RNA velocity is constructed. The method relies entirely on single-cell omics data for mathematical calculation and model derivation, and the cell state it generates is an abstract mathematical vector. The process is decoupled from in situ information such as the original spatial location of the cells in the tissue, their morphological structure, and their protein expression, so the modeled "cell state" cannot be directly linked to an intuitive biological phenotype and lacks biological interpretability grounded in real, visual evidence.

In recent years, advances in spatial omics techniques such as multiplex immunofluorescence (mIF) have enabled simultaneous measurement of multiple protein markers at unprecedented resolution while preserving tissue spatial information. However, integrating these high-dimensional image data into a unified model that accurately and quantitatively describes each cell's continuous, context-dependent state remains a core challenge in the field. In particular, the prior art has the following drawbacks:

1. Discretization and static treatment of cell states. Existing methods mostly assign cells to predefined, discrete "categories", an oversimplification of biological reality. Such "snapshot" labels fail to capture the intermediate, continuous transitions of cells during differentiation, activation, or malignant transformation, and so lose a significant amount of dynamic information.
2. Poor model generalization (the domain shift problem). When a deep learning model pre-trained on natural images is applied directly to mIF data from a specific laboratory or a specific cancer type, its performance drops sharply because of the domain shift caused by differences in staining protocol, scanner model, tissue source, and the like, and it cannot generalize effectively.
3. Insufficient recognition of rare cell categories. In a complex tumor microenvironment, the numbers of cells in different subsets differ greatly. Traditional "flat" single-stage classification models tend to be dominated by the majority cell classes during training, resulting in poor recognition of cell subsets that are rare but biologically significant (e.g., regulatory T cells).
4. Lack of interpretability and biological insight. Most deep learning models behave like "black boxes" whose decision process is not transparent. The prior art has difficulty revealing which key biological features (such as the localization of a specific protein in the cell membrane or nucleus) a model bases its judgments on, which limits the value of such models in aiding scientific discovery.

Disclosure of Invention

The invention aims to provide a spatially resolved single-cell state modeling method and system based on domain adaptation and hierarchical fine-tuning which, through a two-stage training strategy of domain-adaptive training followed by hierarchical training, not only achieves accurate cell classification but also learns, for each single cell, a continuous, biologically interpretable state representation that integrates multi-channel protein expression information.

The aim of the invention can be achieved by the following technical scheme:

A spatially resolved single-cell state modeling method based on domain adaptation and hierarchical fine-tuning comprises the following steps: acquiring multiplex immunofluorescence images and corresponding single-cell segmentation masks generated