CN-121983286-A - Machine learning analysis method, system and electronic equipment based on prostate tumor multidimensional disorder data

Abstract

The invention provides a machine learning analysis method, system and electronic equipment based on prostate tumor multidimensional disorder data. The method comprises: S1, acquiring original image data, original clinical data and pathological results of prostate dominant lesion tissue; S2, screening to obtain target radiomics features and establishing a radiomics model from the target radiomics features and the pathological results, establishing a deep learning model from the original image data and the pathological results, and screening to obtain target clinical features and establishing a clinical prediction model from the target clinical features and the pathological results; S3, performing fusion analysis on the prediction results that the radiomics model, the deep learning model and the clinical prediction model produce for the prostate tumor multidimensional disorder data, screening to obtain target fusion features, and establishing a multimodal fusion model from the target fusion features and the pathological results; and S4, establishing a two-dimensional nomogram based on the multimodal fusion model.

Inventors

  • LI ZIYAO
  • YE LIN
  • YE BOWEN

Assignees

  • 上海市东方医院(同济大学附属东方医院)

Dates

Publication Date
20260505
Application Date
20260126

Claims (10)

  1. A machine learning analysis method based on prostate tumor multidimensional disorder data, comprising the steps of: S1, acquiring original image data and original clinical data of the prostate dominant lesion tissue of individuals in a group, preprocessing the original image data, and acquiring in advance the corresponding pathological results of the prostate dominant lesion tissue of the individuals in the group; S2, screening the original image data to obtain target radiomics features, establishing a radiomics model from the target radiomics features in combination with the pathological results, establishing a deep learning model from the original image data in combination with the pathological results, screening the original clinical data to obtain target clinical features, and establishing a clinical prediction model from the target clinical features in combination with the pathological results; S3, performing fusion analysis on the prediction results of the radiomics model, the deep learning model and the clinical prediction model for the prostate tumor multidimensional disorder data, screening to obtain target fusion features, and establishing a multimodal fusion model from the target fusion features in combination with the pathological results; and S4, establishing a two-dimensional nomogram based on the multimodal fusion model, wherein the multimodal fusion model and the nomogram are each used to analyze and predict the prostate tumor multidimensional disorder data of a target individual by different approaches.
  2. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 1, wherein locating the prostate dominant lesion tissue of the individuals in the group in S1 comprises the step of: S11, acquiring digital images of the multiple prostate lesion tissues of the individuals in the group by multiparametric magnetic resonance imaging, selecting the prostate lesion tissue with the largest volume, scoring that lesion tissue according to the Prostate Imaging Reporting and Data System (PI-RADS), and marking it as the prostate dominant lesion tissue of the individual in the group.
  3. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 1, wherein the original image data comprise T2-weighted imaging, diffusion-weighted imaging and apparent diffusion coefficient images, and preprocessing the original image data in S1 comprises the step of: S12, marking a region of interest in each of the T2-weighted imaging, diffusion-weighted imaging and apparent diffusion coefficient images, wherein the region of interest comprises a three-dimensional format and a two-dimensional format, extracting geometric features of the prostate tumor from the three-dimensional shape within the region of interest of each image type, extracting intensity features of the prostate tumor from the intensity distribution of the voxels inside the prostate tumor, and extracting texture features of the prostate tumor by means of a gray-level co-occurrence matrix, a gray-level run-length matrix, a gray-level dependence matrix, a gray-level size-zone matrix or a neighborhood gray-tone difference matrix.
  4. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 1, wherein screening the original image data to obtain the target radiomics features and screening the original clinical data to obtain the target clinical features in S2 comprises the steps of: S21, screening all radiomics features by a t-test and/or a Mann-Whitney U test, and retaining the radiomics features with a p value less than 0.05; calculating the correlation between different radiomics features by the Spearman rank correlation coefficient, and deleting one of any two radiomics features as redundant when the rho value between them is greater than or equal to 0.9; screening the radiomics features by a LASSO-Cox regression model to obtain the target radiomics features, wherein the number of target radiomics features is limited to 26, and calculating, based on the target radiomics features, a radiomics feature scoring algorithm that predicts the benign/malignant risk value of the prostate tumor from the geometric features, intensity features and texture features of the prostate dominant lesion tissue; S22, screening all clinical features by a baseline statistical method, and retaining the clinical features with a p value less than 0.05; and screening the clinical features by a Logistic regression model to obtain the target clinical features.
  5. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 4, wherein screening the clinical features by a Logistic regression model in S22 comprises the step of: S221, evaluating, in the Logistic regression model, the independent predictive ability of each clinical feature, screening to obtain the target clinical features, and calculating the distribution of influence weights of the target clinical features within the optimal predictor combination that they form.
  6. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 5, wherein the target clinical features in the optimal predictor combination include the Prostate Imaging Reporting and Data System (PI-RADS) score, the ratio of free to total prostate-specific antigen, and body surface area.
  7. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 4, wherein, in S2, establishing the radiomics model from the target radiomics features in combination with the pathological results, establishing the deep learning model from the original image data in combination with the pathological results, and establishing the clinical prediction model from the target clinical features in combination with the pathological results comprises the steps of: S23, inputting the target radiomics features and the corresponding pathological results into a logistic regression model, a support vector machine model, a K-nearest-neighbor model, a decision tree model, a random forest model, an XGBoost model or a LightGBM model, training the model by 5-fold cross-validation, and establishing the radiomics model; S24, inputting the two-dimensional-format region of interest of the original image data and the corresponding pathological results into a ResNet deep learning model for training, and establishing the deep learning model; screening the deep learning features automatically extracted by the deep learning model by a LASSO-Cox regression model to obtain target deep learning features, wherein the number of target deep learning features is limited to 33, and calculating, based on the target deep learning features, a deep learning feature scoring algorithm that predicts the benign/malignant risk value of the prostate tumor from the target deep learning features of the original image data; S25, inputting the target clinical features and the corresponding pathological results into a logistic regression model, a support vector machine model, a K-nearest-neighbor model, a decision tree model, a random forest model, an XGBoost model or a LightGBM model, training the model by 5-fold cross-validation, and establishing the clinical prediction model.
  8. The machine learning analysis method based on prostate tumor multidimensional disorder data according to claim 7, wherein establishing the multimodal fusion model from the target fusion features in combination with the pathological results in S3 comprises the step of: S31, outputting, from each of the radiomics model, the deep learning model and the clinical prediction model, a benign/malignant risk value prediction for the same prostate tumor multidimensional disorder data, and performing probability fusion on the three predictions by a weighted average or a meta-classifier to generate a comprehensive prediction probability; and screening, by a LASSO-Cox regression model, the data features from different sources that affect the comprehensive prediction probability to obtain the target fusion features, wherein the number of target fusion features is limited to 64, and calculating, based on the target fusion features, a comprehensive scoring algorithm that combines the target radiomics features, the target deep learning features and the target clinical features to predict the benign/malignant risk value of the prostate tumor.
  9. A machine learning analysis system based on prostate tumor multidimensional disorder data, characterized in that it is configured to perform the machine learning analysis method based on prostate tumor multidimensional disorder data according to any one of claims 1 to 8, and comprises: a data acquisition module for acquiring the original image data, the original clinical data and the corresponding pathological results of the prostate dominant lesion tissue of the individuals in the group, and for preprocessing the original image data; a feature extraction module for extracting the target radiomics features and the target clinical features from the original image data and the original clinical data respectively; an independent model building module for building the radiomics model, the deep learning model and the clinical prediction model; a comprehensive model building module for building the multimodal fusion model from the radiomics model, the deep learning model and the clinical prediction model; and a nomogram generation module for generating the two-dimensional nomogram from the multimodal fusion model.
  10. An electronic device, characterized by comprising a memory and a processor, wherein the memory is used for storing computer instructions, and the processor is used for invoking the computer instructions stored in the memory to cause the electronic device to perform the machine learning analysis method based on prostate tumor multidimensional disorder data according to any one of claims 1 to 8.
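
The statistical screening in S21 of claim 4 can be illustrated with a minimal Python sketch. It assumes one binary benign/malignant label per lesion, uses a Mann-Whitney U test with a normal approximation (no tie or continuity correction) instead of a statistics library, and keeps the first feature of any pair whose Spearman rho is at least 0.9. The function names, the data layout and the rule for which feature of a correlated pair is dropped are assumptions made for illustration, not details taken from the claims. The later LASSO-Cox step is not shown.

```python
import math
from statistics import mean


def rank(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of rank positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rank(x), rank(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)  # undefined (division by zero) for constant features


def mann_whitney_p(a, b):
    """Two-sided p-value from the normal approximation of the U statistic."""
    ranks = rank(list(a) + list(b))
    n1, n2 = len(a), len(b)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def screen_features(features, labels, p_max=0.05, rho_max=0.9):
    """features: dict of name -> list of values; labels: list of 0 (benign) / 1 (malignant)."""
    # Step 1: keep features that differ between the two groups (p < p_max).
    kept = []
    for name, values in features.items():
        benign = [v for v, y in zip(values, labels) if y == 0]
        malignant = [v for v, y in zip(values, labels) if y == 1]
        if mann_whitney_p(benign, malignant) < p_max:
            kept.append(name)
    # Step 2: drop a feature if its rho with an already selected one is >= rho_max.
    selected = []
    for name in kept:
        if all(spearman_rho(features[name], features[s]) < rho_max for s in selected):
            selected.append(name)
    return selected
```

Calling `screen_features` with a dictionary of per-lesion feature values and the matching label list returns the names of the features that survive both filters, in their original order.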

Description

Machine learning analysis method, system and electronic equipment based on prostate tumor multidimensional disorder data

Technical Field

The invention belongs to the technical field of automatic processing of medical data, and particularly relates to a machine learning analysis method, system and electronic equipment based on prostate tumor multidimensional disorder data.

Background

Prostate cancer (PCa) is one of the most common malignant tumors in men worldwide. As its incidence rises year by year, early diagnosis and treatment have become key to improving patient prognosis and survival. However, conventional diagnostic methods such as tissue biopsy are invasive and limited, and cannot fully reflect tumor heterogeneity. Conventional manual analysis of medical image data or clinical data relies on manually designed radiomics features, simple textures or clinical features, which are difficult to distinguish efficiently and accurately by hand, so the efficiency and accuracy of manual analysis are low. A scheme for automatic and accurate processing of medical image data or clinical data is therefore urgently needed.

Disclosure of Invention

The invention provides a machine learning analysis method, system and electronic equipment based on prostate tumor multidimensional disorder data, to solve the technical problems in the prior art that conventional prostate puncture biopsy is invasive to human tissue and that conventional manual analysis of medical image data or clinical data has low efficiency and accuracy.
To solve the above problems, the technical scheme of the invention is a machine learning analysis method based on prostate tumor multidimensional disorder data, comprising the following steps: S1, acquiring original image data and original clinical data of the prostate dominant lesion tissue of individuals in a group, preprocessing the original image data, and acquiring in advance the corresponding pathological results of the prostate dominant lesion tissue of the individuals in the group; S2, screening the original image data to obtain target radiomics features, establishing a radiomics model from the target radiomics features in combination with the pathological results, establishing a deep learning model from the original image data in combination with the pathological results, screening the original clinical data to obtain target clinical features, and establishing a clinical prediction model from the target clinical features in combination with the pathological results; S3, performing fusion analysis on the prediction results of the radiomics model, the deep learning model and the clinical prediction model for the prostate tumor multidimensional disorder data, screening to obtain target fusion features, and establishing a multimodal fusion model from the target fusion features in combination with the pathological results; and S4, establishing a two-dimensional nomogram based on the multimodal fusion model, wherein the multimodal fusion model and the nomogram are each used to analyze and predict the prostate tumor multidimensional disorder data of a target individual by different approaches.
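As one illustration of the fusion analysis in S3, the sketch below combines the predicted malignancy probabilities of the three models by a weighted average, which is one of the two fusion options named in claim 8. The function name, the equal default weights and the input checks are assumptions made for illustration; the text here does not state how the weights are chosen.

```python
def fuse_probabilities(p_radiomics, p_deep, p_clinical, weights=(1.0, 1.0, 1.0)):
    """Weighted average of three predicted malignancy probabilities.

    p_radiomics, p_deep, p_clinical: outputs of the radiomics ("image
    histology"), deep learning and clinical models for the same lesion.
    weights: hypothetical non-negative model weights, normalized internally.
    """
    probs = (p_radiomics, p_deep, p_clinical)
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    if len(weights) != 3 or any(w < 0 for w in weights) or sum(weights) == 0:
        raise ValueError("need three non-negative weights, not all zero")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probs)) / total
```

With equal weights the result is the plain mean of the three probabilities. Unequal weights let a more reliable model dominate, and a meta-classifier would replace this fixed rule with a learned one.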
Preferably, locating the prostate dominant lesion tissue of the individuals in the group in S1 comprises the step of: S11, acquiring digital images of the multiple prostate lesion tissues of the individuals in the group by multiparametric magnetic resonance imaging, selecting the prostate lesion tissue with the largest volume, scoring that lesion tissue according to the Prostate Imaging Reporting and Data System (PI-RADS), and marking it as the prostate dominant lesion tissue of the individual in the group.
Preferably, the original image data comprise T2-weighted imaging, diffusion-weighted imaging and apparent diffusion coefficient images, and preprocessing the original image data in S1 comprises the step of: S12, marking a region of interest in each of the T2-weighted imaging, diffusion-weighted imaging and apparent diffusion coefficient images, wherein the region of interest comprises a three-dimensional format and a two-dimensional format, extracting geometric features of the prostate tumor from the three-dimensional shape within the region of interest of each image type, extracting intensity features of the prostate tumor from the intensity distribution of the voxels inside the prostate tumor, and extracting texture characteristics of the prostat