CN-122023870-A - Whole-slide image target detection method based on deep learning and tumor detection system applying the method
Abstract
The invention belongs to the technical field of artificial intelligence and medical image analysis, and in particular relates to a whole-slide image target detection method based on deep learning and a tumor detection system applying the method. The method comprises: preprocessing a whole-slide image and processing it into patch-level images; constructing a single-stage target detection model that takes patch-level tumor images as input and outputs the classification and regression results of lesion tumors; introducing LS convolution, which simulates large-scale observation and focusing and improves the efficiency of tumor detection; improving the GD multi-scale fusion module so that cross-level information is gathered and distributed in the C2f multi-scale feature fusion network; and outputting the detection results of lesion tumors in the patch-level tumor images. The invention can automatically, efficiently and accurately detect lesion tumors from whole-slide images and contributes to disease screening.
Inventors
- Yang Lingling
- Mei Aoshuang
- Han Xue
- Wu Ze
- Ou Jiali
- Xie Xing
Assignees
- Nantong University (南通大学)
Dates
- Publication Date: 2026-05-12
- Application Date: 2025-12-23
Claims (10)
- 1. A whole-slide image target detection method based on deep learning, characterized by comprising the following steps: S1, preprocessing a whole-slide image, dividing it into a plurality of image blocks, and screening and enhancing the image blocks to obtain an input image block set; S2, constructing a single-stage target detection model that takes the input image blocks as input and outputs the category and position information of target areas; S3, introducing an LS convolution module into the neck network and detection head of the detection model, wherein the LS convolution module comprises a large-kernel perception module and a small-kernel aggregation module and is used to realize multi-scale feature perception and fusion; S4, introducing an improved GD multi-scale fusion module into the detection model, which gathers and distributes cross-level information in the feature pyramid network through a FAM feature alignment module, an IFM information fusion module and an Inject information injection module; S5, outputting the detection results of the target areas in the input image blocks based on the single-stage target detection model.
- 2. The method according to claim 1, wherein step S1 specifically comprises: S1-1, acquiring a whole-slide image and cutting it into image blocks of uniform size using a sliding-window algorithm; S1-2, constructing labeling information for the image blocks and removing image blocks that do not contain a target area from the training data set; S1-3, performing data enhancement on the image blocks; S1-4, dividing the processed data set into a training set, a validation set and a test set (an illustrative tiling sketch follows the claims).
- 3. The method according to claim 1, wherein in step S2 the loss function of the single-stage target detection model comprises a target confidence loss, a classification loss and a bounding-box regression loss, the total loss being a weighted sum of the three; the target confidence loss and the classification loss are computed with a binary cross-entropy function; the bounding-box regression loss is computed with a composite function based on intersection-over-union, center-point distance and aspect-ratio consistency (a hedged loss sketch follows the claims).
- 4. The method according to claim 1, wherein step S3 specifically comprises: in the neck network, using the large-kernel perception module to perceive the global context of high-level feature maps and injecting the perception result into low-level feature maps through the small-kernel aggregation module; and in the detection head, applying the LS convolution module to the fused feature maps, using the large-kernel perception module to capture the overall context information of the detection region and the small-kernel aggregation module to refine local feature details.
- 5. The method according to claim 4, wherein: the large-kernel perception module applies large-kernel depthwise separable convolution to perform wide-range context modeling on the input feature map, namely first compressing the number of channels to 1/2 of the original channel count through pointwise convolution, then capturing long-range spatial dependencies with the large-kernel depthwise convolution, and finally generating context-adaptive weights through pointwise convolution to guide feature fusion in the small-kernel aggregation stage; the small-kernel aggregation module uses grouped dynamic convolution to finely aggregate local detail features based on the adaptive weights generated by the large-kernel perception module: the input channels are divided into G groups with G = number of channels / 8, the aggregation weights are shared within each group, and the output of the large-kernel perception module is reshaped into a weight matrix whose spatial extent is the small kernel size; adaptive fusion of the features of highly correlated regions is realized by dynamically weighted convolution of the local features within each group (a hedged sketch of this LS convolution follows the claims).
- 6. The method according to claim 1, wherein step S4 specifically comprises: in the C2f multi-scale feature fusion network, the high-level and low-level feature maps are fused through a low-stage gather-and-distribute branch and a high-stage gather-and-distribute branch respectively. In the low-stage branch, the LowFAM module takes the size of a reference feature map as the alignment target, downsamples the larger feature maps through a pooling layer and expands the smaller ones to the same size by bilinear interpolation, and then splices them along the channel dimension; the LowFAM output is fed into the LowIFM module, which consists of a fusion convolution, structurally re-parameterized RepConv blocks and a split operation, and splits the processed information into two outputs along the channel dimension so that information of different levels is fused. The computation flow is: F_align = LowFAM(·); F_fuse = RepConv(Conv(F_align)); the split operation then divides F_fuse into the level-wise global outputs along the channel dimension, where F_align denotes the output of the LowFAM module and F_fuse denotes the feature map after information-fusion processing. In the high-stage branch, the HighFAM module takes the size of a reference feature map as the alignment target and aligns the remaining feature maps to it; the HighIFM module cascades several self-attention, MLP and split operations, with an analogous flow: F_align = HighFAM(·), F_fuse is produced by the self-attention/MLP stack, and a channel-wise split yields the level-wise global outputs. The Inject module then integrates the global information X_Global generated by the IFM with the local information X_local of each level: the receptive range is enlarged by means of the low-LAF and high-LAF modules, the local and global information are processed by convolution, sampling and interpolation, information splicing is completed in the channel dimension, and the final output is obtained after processing by the C2f module (hedged sketches of the alignment/fusion and injection steps follow the claims).
- 7. A deep learning-based tumor detection system for whole-slide images, the system comprising: an image preprocessing and screening module configured to preprocess the whole-slide image WSI, process it into patch-level images, and screen and enhance the patch-level images to obtain patch-level tumor images for model input; a detection model construction module configured to construct a single-stage target detection model that takes the patch-level tumor images as input and outputs the classification and regression results of lesion tumors; an LS convolution module configured to introduce LS convolution, which comprises a large-kernel perception module LKP and a small-kernel aggregation module SKA and realizes feature perception and fusion of large-scale observation and focusing in the neck network and detection head of the detection network; a GD multi-scale fusion module configured to introduce an improved GD multi-scale fusion module into the detection model, which gathers and distributes cross-level information in the feature pyramid network through a FAM feature alignment module, an IFM information fusion module and an Inject information injection module; and a result output module configured to output the detection results of lesion tumors in the patch-level tumor images.
- 8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1 to 6.
- 9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program is configured to implement the steps of the method of any one of claims 1 to 6 when called by a processor.
- 10. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 6.
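To make the patch-extraction step of claim 2 concrete, the following is a minimal sliding-window tiling sketch in Python. It assumes the openslide-python library for reading the WSI and uses a simple non-white-pixel ratio to screen out background tiles; the tile size, stride, pyramid level and threshold are illustrative values, not parameters fixed by the claims.

```python
# Minimal sliding-window tiling sketch for a whole-slide image (assumes openslide-python).
# Tile size, stride, level and the tissue threshold are illustrative, not values from the claims.
import numpy as np
import openslide

def extract_patches(wsi_path, tile=640, stride=640, level=0, tissue_thresh=0.2):
    slide = openslide.OpenSlide(wsi_path)
    width, height = slide.level_dimensions[level]
    scale = slide.level_downsamples[level]        # read_region expects level-0 coordinates
    patches, coords = [], []
    for y in range(0, height - tile + 1, stride):
        for x in range(0, width - tile + 1, stride):
            region = slide.read_region((int(x * scale), int(y * scale)),
                                       level, (tile, tile)).convert("RGB")
            rgb = np.asarray(region)
            # Screen out mostly-background tiles: count non-white pixels as "tissue".
            tissue_ratio = (rgb.min(axis=2) < 220).mean()
            if tissue_ratio >= tissue_thresh:
                patches.append(rgb)
                coords.append((x, y))
    slide.close()
    return patches, coords
```

A full pipeline would add the labeling of step S1-2, the data enhancement of step S1-3 and the training/validation/test split of step S1-4.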
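Claim 3 combines binary cross-entropy terms for target confidence and classification with a box regression term based on intersection-over-union, center-point distance and aspect-ratio consistency, which reads like a CIoU-style loss. The PyTorch sketch below shows one plausible reading of that weighted sum; the weights w_obj, w_cls, w_box and the use of torchvision's complete_box_iou_loss are assumptions, not values or APIs stated in the patent.

```python
# Hedged sketch of the composite detection loss described in claim 3 (PyTorch).
# The weights w_obj/w_cls/w_box and the CIoU implementation choice are assumptions.
import torch
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss

def detection_loss(pred_obj, pred_cls, pred_box, tgt_obj, tgt_cls, tgt_box,
                   w_obj=1.0, w_cls=0.5, w_box=7.5):
    # Confidence and classification terms: binary cross-entropy on raw logits.
    obj_loss = F.binary_cross_entropy_with_logits(pred_obj, tgt_obj)
    cls_loss = F.binary_cross_entropy_with_logits(pred_cls, tgt_cls)
    # Box term: complete IoU loss = IoU + center-point distance + aspect-ratio consistency.
    box_loss = complete_box_iou_loss(pred_box, tgt_box, reduction="mean")
    return w_obj * obj_loss + w_cls * cls_loss + w_box * box_loss
```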
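Claim 5 describes the LS convolution as a large-kernel perception (LKP) branch that generates context-adaptive weights and a small-kernel aggregation (SKA) branch that applies them as a grouped dynamic convolution. The sketch below follows that structure; the kernel sizes (7 for the large kernel, 3 for the small kernel), the unfold-based implementation of the dynamic convolution and the tensor layout are assumptions rather than details given in the patent.

```python
# Hedged sketch of an LS-convolution block (large-kernel perception + small-kernel
# aggregation) following claim 5. Assumes the channel count is divisible by 8;
# kernel sizes and layout are illustrative choices.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LSConv(nn.Module):
    def __init__(self, channels, lk=7, sk=3):
        super().__init__()
        self.groups = max(channels // 8, 1)       # G = channels / 8 per the claim
        self.sk = sk
        mid = channels // 2                       # compress to 1/2 of the channels
        self.lkp = nn.Sequential(
            nn.Conv2d(channels, mid, 1),                           # pointwise compression
            nn.Conv2d(mid, mid, lk, padding=lk // 2, groups=mid),  # large-kernel depthwise conv
            nn.Conv2d(mid, self.groups * sk * sk, 1),              # pointwise -> adaptive weights
        )

    def forward(self, x):
        b, c, h, w = x.shape
        # Context-adaptive weights from the large-kernel perception branch:
        # one (sk*sk)-dim weight vector per group and per spatial position.
        wgt = self.lkp(x).view(b, self.groups, self.sk * self.sk, h, w)
        # Unfold sk x sk neighbourhoods of the input (small-kernel aggregation).
        patches = F.unfold(x, self.sk, padding=self.sk // 2)       # (b, c*sk*sk, h*w)
        patches = patches.view(b, self.groups, c // self.groups, self.sk * self.sk, h, w)
        # Weights are shared by all channels within a group (dynamic weighted convolution).
        out = (patches * wgt.unsqueeze(2)).sum(dim=3)
        return out.reshape(b, c, h, w)
```

With x = torch.randn(1, 64, 32, 32), LSConv(64)(x) preserves the spatial size and channel count, which is the behaviour required of a drop-in module for the neck network and detection head.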
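The low-stage gather branch of claim 6 aligns multi-level feature maps to a reference size, splices them along the channel dimension and fuses them before splitting the result. The sketch below follows that description; the level names, channel widths, reference index and the ordinary convolution standing in for the structurally re-parameterized RepConv block are assumptions.

```python
# Hedged sketch of the LowFAM alignment + LowIFM fusion flow of claim 6.
# Channel widths and the plain convolution standing in for the RepConv block are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LowFAM(nn.Module):
    """Align all levels to the spatial size of a chosen reference level and concatenate."""
    def forward(self, feats, ref_idx=1):
        ref_h, ref_w = feats[ref_idx].shape[-2:]
        aligned = []
        for f in feats:
            if f.shape[-2:] != (ref_h, ref_w):
                # Larger maps are pooled down, smaller maps are bilinearly upsampled.
                f = F.adaptive_avg_pool2d(f, (ref_h, ref_w)) if f.shape[-1] > ref_w \
                    else F.interpolate(f, size=(ref_h, ref_w), mode="bilinear", align_corners=False)
            aligned.append(f)
        return torch.cat(aligned, dim=1)          # channel-wise splice

class LowIFM(nn.Module):
    """Fuse the aligned tensor and split it into two level-wise global outputs."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(in_ch, 2 * out_ch, 1),
            nn.Conv2d(2 * out_ch, 2 * out_ch, 3, padding=1),   # stand-in for the RepConv block
            nn.BatchNorm2d(2 * out_ch), nn.SiLU(),
        )

    def forward(self, x):
        fused = self.fuse(x)
        return fused.chunk(2, dim=1)              # split along the channel dimension
```

For example, feats = [torch.randn(1, c, s, s) for c, s in [(64, 80), (128, 40), (256, 20)]] can be aligned with LowFAM()(feats) and fused with LowIFM(448, 128), which returns two 128-channel global outputs.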
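Claim 6 ends with an Inject step that merges the global information X_Global produced by the IFM with the local information X_local of each level after resizing. The sketch below is one minimal reading of that step; the sigmoid gating, the 1x1 projections and the channel widths are assumptions and not details stated in the claim.

```python
# Hedged sketch of an information-injection (Inject) step from claim 6.
# The gating formulation and channel widths are assumptions; only the idea of resizing
# the global feature to the local level and fusing the two follows the claim text.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Inject(nn.Module):
    def __init__(self, local_ch, global_ch, out_ch):
        super().__init__()
        self.local_proj = nn.Conv2d(local_ch, out_ch, 1)
        self.global_proj = nn.Conv2d(global_ch, out_ch, 1)
        self.global_gate = nn.Conv2d(global_ch, out_ch, 1)
        self.refine = nn.Sequential(nn.Conv2d(out_ch, out_ch, 3, padding=1),
                                    nn.BatchNorm2d(out_ch), nn.SiLU())

    def forward(self, x_local, x_global):
        size = x_local.shape[-2:]
        # Resize the global information to the local level (interpolation handles either direction).
        g = F.interpolate(x_global, size=size, mode="bilinear", align_corners=False)
        gate = torch.sigmoid(F.interpolate(self.global_gate(x_global), size=size,
                                           mode="bilinear", align_corners=False))
        # Inject: gated local features plus projected global context, then refinement.
        out = self.local_proj(x_local) * gate + self.global_proj(g)
        return self.refine(out)
```

Inject(local_ch=128, global_ch=128, out_ch=128) then maps a local map and a (possibly differently sized) global map to a fused map at the local resolution.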
Description
Whole-slide image target detection method based on deep learning and tumor detection system applying the method

Technical Field
The invention belongs to the technical field of artificial intelligence and medical image analysis, and in particular relates to a whole-slide image target detection method based on deep learning and a tumor detection system applying the method.

Background
WSI (Whole-Slide Image) plays a key role in tumor pathology diagnosis, providing pathologists with detailed cell and tissue morphology information at extremely high resolution. However, conventional manual review relies mainly on a pathologist examining the WSI region by region under a microscope, which demands considerable time and effort; a complete evaluation of one WSI often takes several hours or more, readily causing low diagnostic efficiency and case backlogs when large numbers of clinical samples must be processed. Manual review also depends heavily on the experience and expertise of the pathologist: diagnostic results may differ markedly between doctors, and fatigue or lapses of attention during long working hours can lead to missed micro tumor foci or misclassification of benign and malignant tissue, making diagnostic accuracy and consistency difficult to guarantee. In addition, manual review lacks objective quantitative criteria, so complex tumor morphologies such as highly heterogeneous cancers or borderline lesions are hard to discriminate accurately, which severely restricts the standardization of pathological diagnosis. With the wide application of deep learning in medical imaging, whole-slide image tumor detection methods based on convolutional neural networks and other models have gradually become a research hotspot. By automatically learning features from the image, such methods can improve detection efficiency to some extent and reduce manual intervention. However, owing to the particular characteristics of whole-slide images, existing methods still face a number of challenges. On the one hand, tumors in a WSI show pronounced multi-scale characteristics, ranging from single abnormal cells to large cancer-nest tissues; the features at different scales differ greatly, conventional deep learning models struggle to capture cross-scale information associations effectively, and missed detection of small tumor cells or incomplete segmentation of large tumor regions easily occurs. On the other hand, the ultra-high resolution of a WSI means that feeding it directly into a deep learning model requires enormous computing resources and memory; conventional tiling can relieve the memory pressure but breaks the spatial continuity of tumor tissue, preventing the model from correctly understanding context, while the repeated computation involved in tiling also increases time cost and hurts the real-time performance of detection.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a whole-slide image target detection method based on deep learning and a tumor detection system applying the method, so that lesion tumors can be detected automatically, efficiently and accurately from the whole-slide image, contributing to disease screening. To solve this technical problem, the invention adopts the following technical scheme. First, the invention provides a whole-slide image target detection method based on deep learning, comprising the following steps: S1, preprocessing a whole-slide image, dividing it into a plurality of image blocks, and screening and enhancing the image blocks to obtain an input image block set; S2, constructing a single-stage target detection model that takes the input image blocks as input and outputs the category and position information of target areas; S3, introducing an LS convolution module into the neck network and detection head of the detection model, wherein the LS convolution module comprises a large-kernel perception module and a small-kernel aggregation module and is used to realize multi-scale feature perception and fusion; S4, introducing an improved GD multi-scale fusion module into the detection model, which gathers and distributes cross-level information in the feature pyramid network through a FAM feature alignment module, an IFM information fusion module and an Inject information injection module; S5, outputting the detection results of the target areas in the input image blocks based on the single-stage target detection model. Preferably, step S1 specifically includes: S1-1, acquiring a whole-slide image, an