CN-121725232-B - Method and system for extracting regions of interest from whole-slide pathology images of lung disease


Abstract

The invention discloses a method and a system for extracting regions of interest from whole-slide pathology images of lung disease. It relates to region-of-interest extraction and aims to solve the technical problem of low extraction accuracy in the prior art. The method comprises: acquiring lung image data and computing a semantic similarity score and a visual representation vector for each block; obtaining an optimal threshold from the semantic similarity scores and screening an index set; clustering the visual representation vectors of the blocks in the index set and selecting the block with the highest semantic similarity score in each cluster as a representative block to obtain a representative index set; and, based on the representative index set, extracting the feature matrix of the representative blocks and tracing back through the indices to the spatial coordinates of the representative blocks in the corresponding lung image data to obtain the regions of interest. Adaptive threshold filtering and identification are performed according to the semantic similarity scores, and the selected regions of interest preserve the heterogeneous distribution of different invasiveness characteristics within the slide.

Inventors

WANG CHENGDI
SHAO JUN
ZHANG ZHIHAN
LUO GUOFENG
SHI FENG
LI YUAN
He Yichu
SUN TIANYANG

Assignees

West China Hospital, Sichuan University (四川大学华西医院)
West China Precision Medicine Industry Innovation Center Co., Ltd. (华西精准医学产业创新中心有限公司)

Dates

Publication Date: 20260508
Application Date: 20260226

Claims (7)

1. A method for extracting a region of interest from a whole-slide pathology image of lung disease, characterized by comprising the following steps:
Step 1, acquiring lung image data: acquiring lung image data and generating, for the i-th block in the lung image data, a corresponding semantic similarity score $s_i$ and visual representation vector $v_i$; when generating the semantic similarity score $s_i$, the CONCH, PLIP, or UNI pathology visual foundation model is used to generate the semantic similarity score between each block in the lung slice and the input prompt;
Step 2, screening an index set: according to the semantic similarity scores $s_i$, obtaining an optimal threshold by an adaptive threshold filtering method, a percentile threshold method, a bimodal detection method, or an entropy-based threshold selection method, and screening an index set $\mathcal{I}$ according to the optimal threshold;
Step 3, feature space clustering: clustering the visual representation vectors $v_i$ of the blocks in the index set $\mathcal{I}$ to obtain the cluster label $c_i$ of each block and the member index set $\mathcal{C}_k$ of each cluster, $\mathcal{C}_k = \{\, i \in \mathcal{I} : c_i = k \,\}$;
Step 4, extracting the region of interest: selecting the block with the highest semantic similarity score in each cluster as the representative block to obtain an index set $\mathcal{R}$; based on the index set $\mathcal{R}$, extracting the feature matrix of the representative blocks, and tracing back through the indices to the spatial coordinates of the representative blocks in the corresponding lung image data to obtain the region of interest;
wherein $s_i$ denotes the semantic similarity score of the i-th block, $v_i$ denotes the visual representation vector of the i-th block, $c_i$ denotes the cluster number to which the i-th block belongs, $k$ denotes the k-th cluster, $K$ denotes the number of clusters, and $\mathcal{C}_k$ denotes the set of blocks that belong to the index set $\mathcal{I}$ and whose cluster number is $k$.
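Of the threshold options named in step 2 of claim 1, the percentile threshold method is the simplest to illustrate. The following is a minimal sketch only: the nearest-rank percentile definition, the 70th-percentile setting, the "score at or above threshold" screening rule, and the function names `percentile_threshold` and `screen_indices` are all assumptions made for illustration and are not specified in the claim.

```python
import math


def percentile_threshold(scores, pct):
    """Return the pct-th percentile of scores using the nearest-rank method.

    pct is an integer in 1..100 (an illustrative choice, not from the claim).
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(scores)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[rank - 1]


def screen_indices(scores, threshold):
    """Keep the indices of blocks whose score is at or above the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


# Hypothetical semantic similarity scores for ten blocks.
scores = [0.12, 0.80, 0.45, 0.91, 0.33, 0.67, 0.05, 0.74, 0.58, 0.29]
threshold = percentile_threshold(scores, 70)
kept = screen_indices(scores, threshold)
print(threshold, kept)
```

A fixed percentile always keeps roughly the same fraction of blocks regardless of how the scores are distributed, which is one reason a data-driven threshold may be preferred when slides differ widely in lesion content.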
2. The method for extracting a region of interest from a whole-slide pathology image of lung disease according to claim 1, wherein in step 2, when the optimal threshold is obtained by the adaptive threshold filtering method, the adaptive threshold filtering method comprises the following specific steps:
Step 2-1, constructing a histogram: dividing the score range of the semantic similarity scores $s_i$ equally into 256 bins and counting the frequency of each bin to form a normalized histogram $p_n$, with $p_n \ge 0$ and $\sum_n p_n = 1$;
Step 2-2, calculating cumulative quantities: for each possible threshold $t$, calculating the cumulative weight $\omega(t)$ of the class below the threshold, the cumulative mean $\mu(t)$ of the class below the threshold, and the global mean $\mu_T$;
Step 2-3, calculating the inter-class variance: calculating the inter-class variance $\sigma_B^2(t)$ from the cumulative weight $\omega(t)$, the cumulative mean $\mu(t)$, and the global mean $\mu_T$;
Step 2-4, determining the optimal threshold: determining the optimal threshold $t^*$ according to the inter-class variance $\sigma_B^2(t)$;
Step 2-5, adding a constraint: adding to the optimal threshold $t^*$ a lower-limit constraint based on the mean and the standard deviation to obtain the final threshold $\tau$;
Step 2-6, screening an index set: screening out the blocks i whose semantic similarity score $s_i$ satisfies the screening condition defined by the final threshold $\tau$; the indices satisfying all conditions constitute the index set $\mathcal{I}$;
wherein $s_i$ denotes the semantic similarity score of the i-th block.
3. The method for extracting a region of interest from a whole-slide pathology image of lung disease according to claim 2, wherein:
in step 2-2, the cumulative weight of the class below the threshold is calculated as $\omega(t) = \sum_{n \le t} p_n$; the cumulative mean of the class below the threshold is calculated as $\mu(t) = \sum_{n \le t} n\,p_n$; and the global mean is calculated as $\mu_T = \sum_{n} n\,p_n$, summed over all 256 bins;
in step 2-3, the inter-class variance is calculated as $\sigma_B^2(t) = \dfrac{[\mu_T\,\omega(t) - \mu(t)]^2}{\omega(t)\,[1 - \omega(t)]}$;
in step 2-4, the optimal threshold is calculated as $t^* = \arg\max_t \sigma_B^2(t)$;
in step 2-5, the final threshold $\tau$ is obtained by applying to $t^*$ a lower bound determined by $\mu_s$ and $\sigma_s$;
wherein $p_n$ denotes the normalized histogram value of the n-th bin, $t$ denotes the threshold, $\mu_s$ denotes the mean of the similarity scores of all M blocks, and $\sigma_s$ denotes the standard deviation of the similarity scores of all M blocks.
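The histogram, cumulative quantities, inter-class variance, and threshold selection of steps 2-1 to 2-4 correspond to the classical Otsu procedure, which can be sketched as below. This is an illustrative sketch under stated assumptions, not the patented implementation: the function name `otsu_threshold`, the mapping of the winning bin to its upper edge as the score threshold, and the "score at or above threshold" screening rule are assumptions. The lower-limit constraint of step 2-5 is left out because its formula is not recoverable from the text.

```python
def otsu_threshold(scores, bins=256):
    """Pick the score threshold that maximizes the inter-class variance."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo  # all scores equal: no meaningful split
    width = (hi - lo) / bins

    # Step 2-1: histogram over equal-width bins (kept as integer counts).
    hist = [0] * bins
    for s in scores:
        n = min(int((s - lo) / width), bins - 1)
        hist[n] += 1
    total = len(scores)
    mu_total = sum(n * h for n, h in enumerate(hist)) / total  # global mean

    best_bin, best_var = 0, -1.0
    cum_count = 0  # number of samples in bins <= n
    cum_sum = 0    # sum of bin indices of those samples
    for n in range(bins - 1):
        cum_count += hist[n]
        cum_sum += n * hist[n]
        if cum_count == 0 or cum_count == total:
            continue  # one class is empty, variance undefined
        # Step 2-2: cumulative weight and cumulative mean below the threshold.
        omega = cum_count / total
        mu = cum_sum / total
        # Step 2-3: inter-class variance.
        var_between = (mu_total * omega - mu) ** 2 / (omega * (1.0 - omega))
        # Step 2-4: keep the first bin that attains the maximum.
        if var_between > best_var:
            best_bin, best_var = n, var_between

    # Assumption: use the upper edge of the winning bin as the score threshold.
    return lo + (best_bin + 1) * width


# Hypothetical bimodal scores: five low-similarity and five high-similarity blocks.
scores = [0.10, 0.12, 0.16, 0.11, 0.14, 0.80, 0.85, 0.90, 0.82, 0.88]
t_star = otsu_threshold(scores)
kept = [i for i, s in enumerate(scores) if s >= t_star]
print(t_star, kept)
```

Accumulating integer counts rather than floating-point probabilities keeps the empty-class check exact, so the division by $\omega(t)[1-\omega(t)]$ is never performed with a zero denominator.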
4. The method for extracting a region of interest from a whole-slide pathology image of lung disease according to claim 1, wherein in step 3, the specific steps of feature space clustering are as follows:
Step 3-1, normalizing the feature matrix: combining the visual representation vectors $v_i$ of the blocks in the index set $\mathcal{I}$ into a feature matrix, performing row-wise L2 normalization on the feature matrix, and projecting the normalized feature vectors onto the unit hypersphere;
Step 3-2, configuring clustering parameters: configuring the clustering parameters, including the number of clusters $K$, the initialization strategy, the distance metric, the batch size, the maximum number of iterations, and the convergence tolerance;
Step 3-3, iterative optimization: executing the iterative process with the MiniBatch K-Means algorithm until convergence or until the maximum number of iterations is reached, to obtain the final centroids $\mu_k$;
Step 3-4, global cluster allocation: using the final centroids $\mu_k$ to perform cluster allocation on all M samples in the candidate set, and outputting the cluster label $c_i$ of each block in the index set $\mathcal{I}$ and the member index set $\mathcal{C}_k$ of each cluster;
wherein $\mu_k$ denotes the centroid vector of the k-th cluster, and $\mathcal{C}_k$ denotes the coordinate indices of all members included in the k-th cluster.
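One reason for the row-wise L2 normalization in step 3-1 is that, on the unit hypersphere, squared Euclidean distance and cosine similarity are tied by the identity $\lVert a - b \rVert^2 = 2 - 2\cos(a, b)$, so a Euclidean K-Means effectively clusters by cosine similarity. The short sketch below checks this numerically; the helper names and the two example vectors are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Two hypothetical visual representation vectors, normalized row by row.
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([1.0, 2.0, 2.0])

lhs = squared_distance(a, b)
rhs = 2.0 - 2.0 * cosine_similarity(a, b)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```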
5. The method for extracting a region of interest from a whole-slide pathology image of lung disease according to claim 4, wherein:
in step 3-1, the L2 normalization is calculated as $\hat{v}_i = v_i / \lVert v_i \rVert_2$; the normalized feature vectors are projected onto the unit hypersphere, where a monotonic mapping exists between the squared Euclidean distance of any two normalized feature vectors and their cosine similarity, namely $\lVert \hat{v}_i - \hat{v}_j \rVert_2^2 = 2 - 2\cos(\hat{v}_i, \hat{v}_j)$;
in step 3-2, the clustering parameters are configured as follows: number of clusters $K$: set to the designated target sampling number N; initialization strategy: K-means++; distance metric: Euclidean distance; batch size: $\min(1024, M)$; maximum number of iterations: 300; convergence tolerance: $10^{-4}$;
in step 3-3, the specific steps of iterative optimization are as follows:
Step 3-3-1, initialization: selecting K initial centroids $\mu_k^{(0)}$ using the K-means++ strategy;
Step 3-3-2, random batch sampling: randomly drawing a subset $\mathcal{B}$ of the configured batch size from the set of feature vectors corresponding to the index set $\mathcal{I}$;
Step 3-3-3, cluster allocation: assigning each sample $x$ in the batch to the cluster of its closest centroid, $c(x) = \arg\min_k \lVert x - \mu_k \rVert_2$;
Step 3-3-4, incremental centroid update: for each cluster $k$, updating the centroid from $\mu_k^{(q)}$ to $\mu_k^{(q+1)}$ based on the samples $\mathcal{B}_k$ in the current batch assigned to the cluster, using the learning rate $\eta_k$;
Step 3-3-5, convergence judgment: if the displacement norms of the centroids are not smaller than the convergence tolerance and the maximum number of iterations has not been reached, returning to step 3-3-1 and reselecting K initial centroids; if the displacement norms of all centroids are smaller than the convergence tolerance or the maximum number of iterations has been reached, terminating the iteration to obtain the final centroids $\mu_k$;
in step 3-4, the specific method of global cluster allocation is as follows: using the final centroids $\mu_k$ to perform cluster allocation on all M samples in the candidate set to obtain the cluster labels $c_i$, calculated as $c_i = \arg\min_k \lVert \hat{v}_i - \mu_k \rVert_2$; after the cluster allocation is finished, outputting the cluster label $c_i$ of each block in the index set $\mathcal{I}$ and the member index set $\mathcal{C}_k$ of each cluster;
wherein $v_i$ denotes the visual representation vector of the i-th block, $\hat{v}_i$ denotes the normalized visual feature vector of the i-th block, $\hat{v}_j$ denotes the normalized visual feature vector of the j-th block, $\mu_k^{(0)}$ denotes the initial centroid of the k-th cluster, $\mu_k$ denotes the centroid of the k-th cluster, $\mathcal{B}_k$ denotes the set of samples in the current batch assigned to cluster k, $\eta_k$ denotes the learning rate associated with the number of historical samples of the cluster, $\mu_k^{(q)}$ denotes the centroid of the k-th cluster at the q-th iteration, and $\mu_k^{(q+1)}$ denotes the centroid of the k-th cluster at the (q+1)-th iteration.
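The iterative procedure of steps 3-3-1 to 3-4 can be sketched in plain Python as follows. This is a hedged illustration, not the claimed implementation: the per-sample update with learning rate 1/(number of samples seen by the cluster) is the commonly used mini-batch K-Means rule and is an assumption here, since the claim's update formula is not recoverable from the text. The sketch also loops back to the batch-sampling step when not converged, as in the standard formulation, whereas the claim wording refers back to step 3-3-1. All function names are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, centroids):
    """Index of the centroid closest to x."""
    return min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))


def kmeans_pp_init(points, k, rng):
    """K-means++ seeding: later centroids are drawn proportionally to d^2."""
    centroids = [list(rng.choice(points))]
    while len(centroids) < k:
        d2 = [min(sq_dist(p, c) for c in centroids) for p in points]
        total = sum(d2)
        if total == 0.0:  # every point already coincides with a centroid
            centroids.append(list(rng.choice(points)))
            continue
        r = rng.uniform(0.0, total)
        acc = 0.0
        for p, d in zip(points, d2):
            acc += d
            if d > 0.0 and acc >= r:
                centroids.append(list(p))
                break
    return centroids


def minibatch_kmeans(points, k, batch_size=1024, max_iter=300, tol=1e-4, seed=0):
    """Mini-batch K-Means; returns (centroids, labels for all points)."""
    rng = random.Random(seed)
    centroids = kmeans_pp_init(points, k, rng)
    counts = [0] * k  # historical sample count per cluster
    b = min(batch_size, len(points))
    for _ in range(max_iter):
        batch = rng.sample(points, b)
        old = [c[:] for c in centroids]
        # Assign the whole batch first, then update centroids incrementally.
        assigned = [nearest(x, centroids) for x in batch]
        for x, j in zip(batch, assigned):
            counts[j] += 1
            eta = 1.0 / counts[j]  # assumed learning-rate schedule
            centroids[j] = [(1.0 - eta) * c + eta * xi
                            for c, xi in zip(centroids[j], x)]
        shift = max(sq_dist(o, c) ** 0.5 for o, c in zip(old, centroids))
        if shift < tol:
            break
    # Global allocation of every sample to its nearest final centroid.
    labels = [nearest(x, centroids) for x in points]
    return centroids, labels


# Hypothetical L2-normalized 2-D features forming two loose groups.
features = [[1.0, 0.0], [0.995, 0.0999], [0.98, 0.199],
            [0.0, 1.0], [0.0999, 0.995], [0.199, 0.98]]
centroids, labels = minibatch_kmeans(features, k=2)
print(labels)
```

In practice a library implementation would normally be used for this step; the pure-Python version is only meant to make the sampling, assignment, update, and convergence check visible.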
6. The method for extracting a region of interest from a whole-slide pathology image of lung disease according to claim 1, wherein in step 4, when extracting the region of interest, the following specific steps are performed for each cluster number $k$:
Step 4-1, acquiring the cluster member index set: acquiring the member index set of the cluster, $\mathcal{C}_k = \{\, i \in \mathcal{I} : c_i = k \,\}$;
Step 4-2, empty-cluster judgment: if $\mathcal{C}_k = \varnothing$, skipping the cluster without generating a representative; if the set $\mathcal{C}_k$ is non-empty, executing step 4-3;
Step 4-3, extremum retrieval: performing extremum retrieval within the non-empty cluster, $i_k^* = \arg\max_{i \in \mathcal{C}_k} s_i$;
Step 4-4, generating the region of interest: adding $i_k^*$ to the final representative index set $\mathcal{R}$; based on the index set $\mathcal{R}$, extracting the feature matrix of the representative blocks, and tracing back through the indices to the spatial coordinates of the representative blocks in the original lung slice to obtain the region of interest;
wherein $c_i$ denotes the cluster number to which the i-th block belongs, $k$ denotes the k-th cluster, $s_i$ denotes the semantic similarity score of the i-th block, and $i_k^*$ denotes the index of the optimal block selected in the k-th cluster.
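Steps 4-1 to 4-4 amount to a per-cluster argmax over the semantic similarity scores followed by a coordinate lookup. A minimal sketch is given below; the function names, the dictionary layout of the cluster labels, and the example block coordinates are hypothetical and chosen only for illustration.

```python
def select_representatives(scores, labels, candidate_indices, num_clusters):
    """Return, per non-empty cluster, the candidate index with the highest score.

    scores: score per block index; labels: dict mapping a candidate block
    index to its cluster number; empty clusters are skipped.
    """
    members = {k: [] for k in range(num_clusters)}
    for i in candidate_indices:          # Step 4-1: build member index sets
        members[labels[i]].append(i)

    representatives = []
    for k in range(num_clusters):
        if not members[k]:               # Step 4-2: skip empty clusters
            continue
        best = max(members[k], key=lambda i: scores[i])  # Step 4-3: argmax
        representatives.append(best)     # Step 4-4: add to representative set
    return representatives


def trace_coordinates(representatives, coords):
    """Map representative block indices back to their slide coordinates."""
    return [coords[i] for i in representatives]


# Hypothetical data: eight blocks, five of which passed the threshold screening.
scores = [0.1, 0.7, 0.2, 0.9, 0.6, 0.3, 0.8, 0.5]
candidates = [1, 3, 4, 6, 7]
labels = {1: 0, 3: 0, 4: 2, 6: 2, 7: 2}   # cluster 1 has no members
coords = [(0, 0), (256, 0), (512, 0), (768, 0),
          (0, 256), (256, 256), (512, 256), (768, 256)]

reps = select_representatives(scores, labels, candidates, num_clusters=3)
roi_coords = trace_coordinates(reps, coords)
print(reps, roi_coords)
```

Because an empty cluster contributes no representative, the number of returned regions can be smaller than the configured number of clusters.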
7. A system for extracting a region of interest from a whole-slide pathology image of lung disease, comprising:
a lung image data acquisition module, configured to acquire lung image data and generate, for the i-th block in the lung image data, a corresponding semantic similarity score $s_i$ and visual representation vector $v_i$; when generating the semantic similarity score $s_i$, the CONCH, PLIP, or UNI pathology visual foundation model is used to generate the semantic similarity score between each block in the lung slice and the input prompt;
an index set screening module, configured to obtain, according to the semantic similarity scores $s_i$, an optimal threshold by an adaptive threshold filtering method, a percentile threshold method, a bimodal detection method, or an entropy-based threshold selection method, and to screen an index set $\mathcal{I}$ according to the optimal threshold;
a feature space clustering module, configured to cluster the visual representation vectors $v_i$ of the blocks in the index set $\mathcal{I}$ to obtain the cluster label $c_i$ of each block and the member index set $\mathcal{C}_k$ of each cluster, $\mathcal{C}_k = \{\, i \in \mathcal{I} : c_i = k \,\}$;
a region-of-interest extraction module, configured to select the block with the highest semantic similarity score in each cluster as the representative block to obtain an index set $\mathcal{R}$, to extract the feature matrix of the representative blocks based on the index set $\mathcal{R}$, and to trace back through the indices to the spatial coordinates of the representative blocks in the corresponding lung image data to obtain the region of interest;
wherein $s_i$ denotes the semantic similarity score of the i-th block, $v_i$ denotes the visual representation vector of the i-th block, $c_i$ denotes the cluster number to which the i-th block belongs, $k$ denotes the k-th cluster, $K$ denotes the number of clusters, and $\mathcal{C}_k$ denotes the set of blocks that belong to the index set $\mathcal{I}$ and whose cluster number is $k$.

Description

Method and system for extracting regions of interest from whole-slide pathology images of lung disease

Technical Field

The invention belongs to the technical field of artificial intelligence, relates to region-of-interest extraction, and in particular relates to a method and a system for extracting regions of interest from whole-slide pathology images of lung disease.

Background

Digital pathology is an emerging discipline that digitizes and informatizes the examination of tissue sections under conventional optical microscopy, and it is regarded as a key supporting technology for the transition of pathology from experience-based to precision medicine. Its core data carrier, the whole-slide image, is produced by digitizing an entire tissue section with a high-resolution scanner, yielding a very large medical image of up to gigapixel scale. In both traditional manual slide reading and emerging computer-aided diagnosis, identifying and locating the region of interest (ROI) is an indispensable core step. Pathologists must quickly locate regions of diagnostic value (such as foci of tumor invasion, special differentiation patterns, and vascular invasion) within massive tissue structures. In artificial intelligence pipelines, deep learning models are limited by computational resources and cannot directly process images with billions of pixels, so the analysis range must be reduced to a computable scale by a region-of-interest extraction step. For example, a typical lung adenocarcinoma pathological section scanned at 40× magnification can yield an image of about 100,000 × 100,000 pixels containing 5,000 to 50,000 blocks, but a large proportion of these blocks are non-diagnostic areas (e.g., normal alveolar tissue, fibrous interstitium, blood vessels, dead space, and technical artifacts).
Pathologists need to browse, locate, and annotate repeatedly at different magnifications, which consumes a great deal of time, and region-of-interest selection is also subject to inter-observer variability arising from each doctor's personal experience, professional background, and cognitive preferences. This labor-intensive annotation approach therefore faces serious sustainability challenges given the demands of dataset construction in big-data settings and of downstream artificial intelligence diagnosis and treatment tasks. It is thus important to provide an automatic, deep-learning-based method for extracting regions of interest. Patent application No. 202111255965.1 discloses an aortic region-of-interest extraction method, electronic device, and storage medium. That method first segments the acquired medical image to be extracted to obtain a lung mask image; obtains, from the lung mask image, the lower boundary position of the lung region and the coordinates of the minimum horizontal bounding rectangle of the lung region; obtains, from the lower boundary position of the lung region, the physical distances from the starting image layer and the terminating image layer of the medical image to the lower boundary of the lung region; obtains, according to the starting and terminating image layers of the medical image, the upper and lower boundary positions of the corresponding aortic region of interest; and extracts the aortic region of interest according to the coordinates of the minimum horizontal bounding rectangle and the upper and lower boundary positions of the aortic region of interest.
That method can automatically and accurately extract the aortic region of interest, with low cost and high efficiency. The above patent application identifies and extracts the region of interest of the aorta, but extracting regions of interest for lung adenocarcinoma (particularly invasive lung adenocarcinoma) is a different problem. Invasive lung adenocarcinoma can present five major histological growth patterns, namely lepidic, acinar, papillary, micropapillary, and solid. A key pathological fact is that about 70%-80% of invasive lung adenocarcinomas contain two or more histological patterns simultaneously, and the pathology report must record the area percentage of each pattern in 5% increments. For example, a report may read "acinar (60%), lepidic (20%), and micropapillary (20%)", and different patterns carry markedly different prognostic significance (e.g., micropapillary and solid patterns indicate greater aggressiveness and poorer prognosis). Thus, in addition to the major components th