
CN-121999273-A - Interpretation method and interpretation device for full-field slice image

CN121999273A

Abstract

The application discloses an interpretation method for full-field slice images. A trained neural network model performs inference on a full-field slice image to obtain first classification information, which characterizes the global classification result of the full-field slice image, and second classification information, which characterizes the classification results of the regions of interest in the full-field slice image. The neural network model comprises a lesion cell recognition model, a concatenation layer, a first branch model, and a second branch model. The interpretation result of the full-field slice image is obtained by fusing the first classification information and the second classification information output by the trained neural network model. Embodiments of the application help improve the accuracy of whole-slide graded interpretation.

Inventors

  • YANG ZHIMING
  • Request for anonymity

Assignees

  • 深思考人工智能机器人科技(北京)有限公司

Dates

Publication Date
2026-05-08
Application Date
2025-12-23

Claims (10)

  1. A method for interpreting full-field slice images, characterized in that a full-field slice image is inferred through a trained neural network model, wherein the neural network model comprises: a lesion cell recognition model for recognizing lesion cells in the full-field slice image; a concatenation layer for concatenating the region-of-interest features of lesion cells in each tile of the full-field slice image, output by the lesion cell recognition model, to obtain a region-of-interest feature sequence; a first branch model for obtaining context information of the region-of-interest feature sequence from the concatenation layer to predict first classification information of the full-field slice image, the first classification information characterizing a global classification result of the full-field slice image; and a second branch model for performing multi-instance learning on the region-of-interest feature sequence from the concatenation layer to predict second classification information of the full-field slice image, the second classification information characterizing a global classification result aggregated from the classification results of the regions of interest in the full-field slice image; and in that the first classification information and the second classification information output by the trained neural network model are fused to obtain the interpretation result of the full-field slice image.
  2. The interpretation method of claim 1, wherein the length of the region-of-interest feature sequence depends on the number of regions of interest of lesion cells identified in the full-field slice image by the lesion cell recognition model, so that the lengths of the region-of-interest feature sequences of different full-field slice images differ; and wherein the multi-instance learning treats a region-of-interest feature sequence as an instance bag and each region-of-interest feature in the sequence as an instance.
  3. The interpretation method of claim 1, wherein the second branch model comprises a multi-instance model for predicting the second classification information by multi-instance learning, and the first branch model comprises a transformation model for predicting the first classification information based on the context information of the region-of-interest feature sequence; and wherein the region-of-interest features of lesion cells in each tile are extracted as follows: extracting multi-level features of each tile of the full-field slice image through a feature pyramid network in the lesion cell recognition model; obtaining candidate boxes of lesion cells in each tile, based on the extracted multi-level features of all tiles, through a region proposal network in the lesion cell recognition model; performing a region-of-interest feature alignment operation or pooling operation on the multi-level features within each candidate box to obtain a region-of-interest feature block for each candidate box; and concatenating the region-of-interest feature blocks of each candidate box into a region-of-interest feature through a concatenation layer in the lesion cell recognition model, the region-of-interest feature representing the feature vector of the lesion cell in the candidate box.
  4. The interpretation method of claim 3, wherein the first branch model further comprises a pooling layer for reducing the feature dimension of the region-of-interest feature sequence from the concatenation layer, the pooling layer pooling the region-of-interest feature sequence and inputting the result to the transformation model.
  5. The interpretation method of claim 3, wherein each layer encoder of the transformation model further comprises a normalization layer for normalizing the attention weights from the self-attention layer in that encoder, the attention weights characterizing correlations between features in the region-of-interest feature sequence.
  6. The interpretation method of claim 5, wherein the neural network model is trained as follows: training the lesion cell recognition model with sample full-field slice images to obtain a trained lesion cell recognition model; obtaining sample feature sequences for training the branch models using the trained lesion cell recognition model; inputting the sample feature sequences into the first branch model and the second branch model respectively to obtain first sample classification information and second sample classification information; aggregating the second sample classification information to obtain third sample classification information, the third sample classification information representing instance-bag classification information; computing a total loss function value of the branch models from the first sample classification information, the second sample classification information, and the third sample classification information; and adjusting model parameters of the branch models according to the total loss function value until training is finished; wherein the total loss function value comprises: a consistency loss function value characterizing the consistency between the first sample classification information and the third sample classification information; a first global loss function value characterizing the difference between the first sample classification information and first expected classification information; a region-of-interest loss function value characterizing the difference between the second sample classification information and second expected classification information; and a second global loss function value characterizing the difference between the third sample classification information and third expected classification information.
  7. The interpretation method of claim 6, wherein inputting the sample feature sequences into the first branch model and the second branch model respectively to obtain the first sample classification information and the second sample classification information further comprises: constructing fourth sample classification information based on the first sample classification information, the fourth sample classification information representing global negative/positive classification information; and constructing fifth sample classification information based on the second sample classification information, the fifth sample classification information representing negative/positive classification information at the instance-bag level; and wherein the total loss function value further comprises: a classification loss function value characterizing the difference between the fourth sample classification information and the first expected classification information, and a classification loss function value characterizing the difference between the fifth sample classification information and the second expected classification information.
  8. The method according to claim 6, wherein obtaining the sample feature sequences for training the branch models using the trained lesion cell recognition model comprises: inputting a sample full-field slice image into the trained lesion cell recognition model, and obtaining, through inference by the trained lesion cell recognition model, the region-of-interest features of lesion cells in each tile of the sample full-field slice image; concatenating the region-of-interest features of lesion cells in each tile of the sample full-field slice image to obtain the sample region-of-interest feature sequence of the sample full-field slice image; and, for the sample region-of-interest feature sequence of any sample full-field slice image, when its length is smaller than the sample length upper limit, appending mask features so that the sequence length reaches the sample length upper limit, thereby obtaining the sample feature sequence of the sample full-field slice image; wherein the dimensions of the mask features are the same as the dimensions of the region-of-interest features in the sample region-of-interest feature sequence.
  9. The interpretation method of claim 5, wherein during training the normalization layer performs the following operation on the attention weights from the self-attention layer in the encoder: when the length of the sample region-of-interest feature sequence is less than or equal to the sample length upper limit, the attention weight between the i-th region-of-interest feature and the j-th region-of-interest feature in the sequence is the ratio of the exponential attention score of the i-th feature relative to the j-th feature to the sum of the exponential attention scores, where i and j are natural numbers and the exponential attention score is determined from the attention weight of the i-th feature relative to the j-th feature output by the self-attention layer of the encoder; and when the length of the sample region-of-interest feature sequence is greater than the sample length upper limit, the attention weight between the i-th region-of-interest feature and the j-th region-of-interest feature in the sequence is 0.
  10. An interpretation apparatus for full-field slice images, characterized in that the apparatus comprises: an inference module for inferring a full-field slice image through a trained neural network model to obtain first classification information characterizing a global classification result of the full-field slice image and second classification information characterizing a global classification result aggregated from the classification results of the regions of interest in the full-field slice image, wherein the neural network model comprises: a lesion cell recognition model for recognizing lesion cells in the full-field slice image; a concatenation layer for concatenating the region-of-interest features of lesion cells in each tile of the full-field slice image, output by the lesion cell recognition model, to obtain a region-of-interest feature sequence; a first branch model for obtaining context information of the region-of-interest feature sequence from the concatenation layer to predict the first classification information of the full-field slice image; and a second branch model for predicting the second classification information of the full-field slice image by multi-instance learning on the region-of-interest feature sequence from the concatenation layer; and a fusion module for fusing the first classification information and the second classification information output by the trained neural network model to obtain the interpretation result of the full-field slice image.
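The two-branch fusion in claims 1 and 10 can be sketched in a few lines of plain Python. This is a hypothetical illustration, assuming the fusion step is a convex combination of the two branches' class probabilities with a weight `w`; the claims do not fix a particular fusion rule, and the names `fuse`, `first_logits`, and `second_logits` are illustrative only.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of scores.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse(first_logits, second_logits, w=0.5):
    # Hypothetical fusion rule: convex combination of the global-branch
    # probabilities (first classification information) and the MIL-branch
    # probabilities (second classification information). The patent does
    # not specify the fusion operator.
    p1 = softmax(first_logits)
    p2 = softmax(second_logits)
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

# Example with three grading classes: the fused distribution plays the
# role of the interpretation result, and its argmax is the predicted grade.
probs = fuse([2.0, 0.5, -1.0], [1.0, 1.5, -0.5])
pred = max(range(len(probs)), key=probs.__getitem__)
```

A learned gating weight instead of the fixed `w` would be another plausible reading of "fusing"; the claims leave this open.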

Description

Interpretation method and interpretation device for full-field slice image

Technical Field
The application relates to the field of biological image recognition, and in particular to an interpretation method for full-field slice images.

Background
Cytological examination is an important diagnostic technique widely used in clinical pathology. It identifies early tumors, infectious lesions, and inflammatory diseases through morphological observation and interpretation of a patient's body fluids or exfoliated cells. Common cytology types include cervical cytology, urinary exfoliative cytology, airway cytology, pleural effusion cytology, fine-needle aspiration cytology, and the like. The basic workflow is to prepare a slide specimen by fixing and staining the patient's body fluids or exfoliated cells; a pathologist or cytology specialist then observes the nuclear-to-cytoplasmic ratio, morphological structure, and chromatin characteristics of the cells under a microscope and makes a differential diagnosis according to a corresponding interpretation system such as the TBS (The Bethesda System) or the Paris System. Traditional manual interpretation relies on visual observation under a microscope and suffers from strong subjectivity, low efficiency, and poor inter-observer consistency; in high-throughput screening it is especially prone to false-negative or false-positive results. With the recent development of digital pathology and artificial intelligence, slide specimens can be converted into high-resolution whole-slide images (full-field slice images, WSI), and deep learning models can perform automatic detection and classification, assisting doctors with preliminary screening and grading.
Current mainstream methods infer whole-slide classification based on convolutional neural network (CNN) or multi-instance learning (MIL) frameworks, by voting over, statistically summarizing, or aggregating region-of-interest (ROI) features from local regions. Such interpretation yields results of limited accuracy.

Disclosure of Invention
Embodiments of the application provide an interpretation method for full-field slice images that improves the accuracy of whole-slide graded interpretation. A first aspect of the embodiments provides a method for interpreting a full-field slice image, in which the full-field slice image is inferred through a trained neural network model, wherein the neural network model comprises: a lesion cell recognition model for recognizing lesion cells in the full-field slice image; a concatenation layer for concatenating the region-of-interest features of lesion cells in each tile of the full-field slice image, output by the lesion cell recognition model, to obtain a region-of-interest feature sequence; a first branch model for obtaining context information of the region-of-interest feature sequence from the concatenation layer to predict first classification information of the full-field slice image, the first classification information characterizing a global classification result of the full-field slice image; and a second branch model for performing multi-instance learning on the region-of-interest feature sequence from the concatenation layer to predict second classification information of the full-field slice image, the second classification information characterizing a global classification result aggregated from the classification results of the regions of interest in the full-field slice image; and in which the first classification information and the second classification information output by the trained neural network model are fused to obtain the interpretation result of the full-field slice image.

As a possible implementation, the length of the region-of-interest feature sequence depends on the number of regions of interest of lesion cells identified in the full-field slice image by the lesion cell recognition model, and the lengths of the region-of-interest feature sequences of different full-field slice images differ. As a possible implementation, the multi-instance learning treats a region-of-interest feature sequence as an instance bag and each region-of-interest feature in the sequence as an instance. As a possible implementation, the first branch model comprises a transformation model for predicting the first classification information based on the context information of the region-of-interest feature sequence, and the second branch model comprises a multi-instance model for predicting the second classification information by multi-instance learning. The region-of-interest features of lesion cells in each tile are extracted as follows: extracting multi-level features of each tile of the full-field slice image through a feature pyramid network in the lesion cell recognition model
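The multi-instance learning step described above, with the ROI feature sequence as an instance bag and each ROI feature as an instance, is often realized with attention pooling. The sketch below is a minimal, assumed aggregator (linear scoring plus softmax weights over instances); the description does not fix the MIL aggregation function, and `attention_mil_pool` and the scoring vector `w` are illustrative names.

```python
import math

def attention_mil_pool(instances, w):
    # instances: list of ROI feature vectors forming one instance bag;
    # w: a scoring vector (in practice a learned parameter).
    # Score each instance, softmax-normalize the scores, and return the
    # attention-weighted sum of instances as the bag-level feature.
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in instances]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    alphas = [e / z for e in exps]
    dim = len(instances[0])
    return [sum(a * x[d] for a, x in zip(alphas, instances))
            for d in range(dim)]

# A bag of two 2-d ROI features; the scoring vector strongly favors the
# first instance, so the pooled bag feature stays close to it.
bag = [[1.0, 0.0], [0.0, 1.0]]
pooled = attention_mil_pool(bag, [10.0, 0.0])
```

A bag-level classifier applied to `pooled` would then produce the per-bag (second) classification information that the second branch model predicts.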