
CN-121983337-A - Colorectal advanced adenoma data prediction method, system, terminal and storage medium based on machine learning


Abstract

The invention relates to the technical field of data prediction and discloses a machine-learning-based colorectal advanced adenoma data prediction method, system, terminal and storage medium. The method comprises: screening out candidate prediction factors according to the colorectal advanced adenoma data of each target sample; dividing the candidate prediction factors into a training set and a validation set; constructing a target machine learning model using the training set; inputting the validation set into a plurality of decision trees of the target machine learning model and outputting an original prediction probability; and converting the original prediction probability into a classification probability to obtain a prediction result. The method uses an efficient, interpretable and visualizable machine learning early-warning model to generate a visual decision path, can effectively integrate conventional non-invasive indicators, and achieves individualized prediction of colorectal advanced adenoma data.
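The final step of the abstract, converting an original prediction probability into a classification probability and then into a prediction result, can be sketched as follows. This is a minimal illustration only: the source does not reproduce the conversion formula, so the logistic sigmoid used here is an assumption, and the function names and the 0.5 threshold are hypothetical.

```python
import math


def to_classification_probability(raw_score: float) -> float:
    """Map a raw ensemble score to (0, 1) with the logistic sigmoid.

    Assumption: the patent's "preset function" is the sigmoid; the source
    text does not reproduce the actual formula.
    """
    # Use a numerically stable form so large negative scores do not overflow.
    if raw_score >= 0:
        return 1.0 / (1.0 + math.exp(-raw_score))
    z = math.exp(raw_score)
    return z / (1.0 + z)


def classify(raw_score: float, threshold: float = 0.5) -> str:
    """Return "high-risk" only when the probability is strictly above the threshold."""
    probability = to_classification_probability(raw_score)
    return "high-risk" if probability > threshold else "low-risk"


if __name__ == "__main__":
    for score in (-2.0, 0.0, 2.0):
        print(score, round(to_classification_probability(score), 4), classify(score))
```

The strict comparison mirrors the wording "higher than a preset probability value": a probability exactly equal to the threshold is treated as low risk.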

Inventors

  • ZHAO LI
  • HUANG XIAOYANG
  • WU WEIQING
  • HE HUI
  • PENG RUI

Assignees

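  • Shenzhen People's Hospital (深圳市人民医院)
  • Cancer Hospital of the Chinese Academy of Medical Sciences, Shenzhen Hospital (中国医学科学院肿瘤医院深圳医院)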

Dates

Publication Date
2026-05-05
Application Date
2026-01-23

Claims (10)

  1. A machine-learning-based colorectal advanced adenoma data prediction method, comprising: obtaining a plurality of target samples, and screening out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample; dividing all the candidate prediction factors into a training set and a validation set, training a target machine learning model using the training set, inputting the validation set into a plurality of decision trees of the target machine learning model, and outputting an original prediction probability for each target object; and converting each original prediction probability into a classification probability, and converting each classification probability into a corresponding prediction result using a binary classification algorithm.
  2. The machine-learning-based colorectal advanced adenoma data prediction method of claim 1, wherein obtaining a plurality of target samples and screening out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample specifically comprises: obtaining target samples of a plurality of colorectal advanced adenomas, and extracting the colorectal advanced adenoma data of each target sample; for the colorectal advanced adenoma data of each target sample, extracting a plurality of indexes with a missing rate greater than a preset completeness; and screening a plurality of prediction factors from each index to serve as candidate prediction factors; wherein the indexes comprise basic information, physical examination information, blood examination information, urine information and fecal information.
  3. The machine-learning-based colorectal advanced adenoma data prediction method of claim 1, wherein, after obtaining a plurality of target samples and screening out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample, the method further comprises: constructing a logistic regression model, inputting all the candidate prediction factors into the logistic regression model one by one, and measuring the goodness of fit of the logistic regression model; if the goodness of fit exceeds a preset goodness, defining the candidate prediction factor corresponding to that goodness of fit as an optimal variable; and defining all the optimal variables as an optimal variable combination, and adding the goodness-of-fit values corresponding to the optimal variable combination into the logistic regression model to obtain a target logistic regression model.
  4. The machine-learning-based colorectal advanced adenoma data prediction method of claim 1, wherein dividing all the candidate prediction factors into a training set and a validation set, training a target machine learning model using the training set, inputting the validation set into a plurality of decision trees of the target machine learning model, and outputting the original prediction probability of each target object specifically comprises: screening, from all the candidate prediction factors of a target object, a plurality of target prediction factors with the minimum cross-validation mean square error, dividing the data corresponding to all the target prediction factors into a training set and a validation set, and training the constructed target machine learning model using the training set; acquiring a plurality of decision trees constructed by the target machine learning model, and, for each decision tree, assigning each piece of data of each target prediction factor in the validation set to a corresponding leaf node of the decision tree; for each leaf node, calculating a real label mean and a predicted label mean of each target prediction factor, and constructing a loss function according to the real label mean and the predicted label mean: [formula omitted in source]; where L represents the loss function, m represents the real label mean, and [symbol omitted] represents the predicted label mean; determining the weight of the leaf node according to the loss function, and calculating the leaf prediction probability of the leaf node according to the weight and the real label mean, until the leaf prediction probabilities of all the leaf nodes are obtained; calculating the decision tree prediction probability of each decision tree according to all the leaf prediction probabilities and the corresponding weights; and calculating the original prediction probability of the target object according to the tree weight of each decision tree: [formula omitted in source]; where V represents the original prediction probability, [symbol omitted] represents the decision tree prediction probability of the j-th decision tree, and [symbol omitted] represents the weight of the leaf with a real label mean of m.
  5. The machine-learning-based colorectal advanced adenoma data prediction method of claim 4, wherein determining the weight of the leaf node according to the loss function and calculating the leaf prediction probability of the leaf node according to the weight and the real label mean specifically comprises: defining the first derivative of the loss function with respect to the predicted label as the gradient of the target prediction factor, constructing a second derivative from the gradient of each piece of data on each leaf node, and calculating the weight of the leaf node according to the gradient and the second derivative: [three formulas omitted in source]; wherein [symbol omitted] represents the first derivative on leaf node i, [symbol omitted] represents the second derivative on leaf node i, [symbol omitted] represents the complexity of the decision tree, [symbol omitted] represents the i-th real label, [symbol omitted] represents the i-th predicted label of the decision tree preceding the current decision tree, and t represents the current decision tree; and constructing the leaf prediction probability of the leaf node according to the weight and the real label mean of the target prediction factor: [formula omitted in source]; wherein [symbol omitted] represents the leaf prediction probability.
  6. The machine-learning-based colorectal advanced adenoma data prediction method of claim 1, wherein converting each original prediction probability into a classification probability and converting each classification probability into a corresponding prediction result using a binary classification algorithm comprises: for each original prediction probability, converting the original prediction probability into a classification probability using a preset function: [formula omitted in source]; wherein P represents the classification probability, V represents the original prediction probability, and exp represents the preset function; if the classification probability is higher than a preset probability value, defining the target object as a high-risk object; and if the classification probability is not higher than the preset probability value, defining the target object as a low-risk object.
  7. The machine-learning-based colorectal advanced adenoma data prediction method of claim 3, wherein, after converting each original prediction probability into a classification probability and converting each classification probability into a corresponding prediction result using a binary classification algorithm, the method further comprises: inputting the validation set into the target logistic regression model, and outputting the logistic regression prediction probability of each target object in the validation set; constructing a logistic regression AUC index of the target logistic regression model according to each logistic regression prediction probability; constructing a target AUC index of the target machine learning model according to each original prediction probability; and comparing the logistic regression AUC index with the target AUC index to obtain a performance comparison result.
  8. A machine-learning-based colorectal advanced adenoma data prediction system for implementing the machine-learning-based colorectal advanced adenoma data prediction method of any of claims 1-7, the system comprising: a data acquisition module, configured to obtain a plurality of target samples and screen out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample; a prediction module, configured to divide all the candidate prediction factors into a training set and a validation set, train a target machine learning model using the training set, input the validation set into a plurality of decision trees of the target machine learning model, and output the original prediction probability of each target object; and a result output module, configured to convert each original prediction probability into a classification probability and convert each classification probability into a corresponding prediction result using a binary classification algorithm.
  9. A terminal comprising a memory, a processor, and a machine-learning-based colorectal advanced adenoma data prediction program stored in the memory and executable on the processor, wherein the program, when executed by the processor, implements the steps of the machine-learning-based colorectal advanced adenoma data prediction method of any of claims 1-7.
  10. A computer-readable storage medium storing a machine-learning-based colorectal advanced adenoma data prediction program which, when executed by a processor, implements the steps of the machine-learning-based colorectal advanced adenoma data prediction method of any of claims 1-7.
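Claims 4 and 5 describe deriving a leaf weight from the first and second derivatives of a loss function, but the formulas themselves are not reproduced in this record. The sketch below is therefore not the claimed formula. It assumes the standard second-order gradient boosting form (as used in XGBoost-style tree boosting) with a logistic loss, where the regularization term `reg_lambda` stands in for the claim's "complexity" term. All names are hypothetical.

```python
import math


def sigmoid(score: float) -> float:
    """Logistic sigmoid, written to avoid overflow for large negative scores."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)


def leaf_weight(labels, previous_scores, reg_lambda=1.0):
    """Second-order leaf weight for the samples that fall into one leaf.

    Assumed logistic loss per sample: -[y*log(p) + (1-y)*log(1-p)], with
    p = sigmoid(previous score). Its derivatives with respect to the score are
    gradient g = p - y and second derivative h = p * (1 - p).
    Assumed weight: -(sum of g) / (sum of h + reg_lambda).
    """
    gradients = []
    hessians = []
    for label, score in zip(labels, previous_scores):
        p = sigmoid(score)
        gradients.append(p - label)      # first derivative of the loss
        hessians.append(p * (1.0 - p))   # second derivative of the loss
    return -sum(gradients) / (sum(hessians) + reg_lambda)


if __name__ == "__main__":
    # Three samples in one leaf, all with a previous-round score of 0.
    print(leaf_weight([1, 1, 0], [0.0, 0.0, 0.0]))
```

In this form a leaf dominated by positive labels receives a positive weight, and a larger `reg_lambda` shrinks every weight toward zero, which is one common way to penalize tree complexity.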

Description

Colorectal advanced adenoma data prediction method, system, terminal and storage medium based on machine learning

Technical Field

The invention relates to the technical field of data prediction, and in particular to a machine-learning-based colorectal advanced adenoma data prediction method, system, terminal and computer-readable storage medium.

Background

Automatic, real-time analysis of colorectal tissue morphology has important application value for improving the efficiency of related health examinations. Currently, automated analysis techniques in this field face a number of technical bottlenecks. First, mainstream technical schemes depend on complex computational models whose internal decision logic is opaque, so the results lack interpretability and reliability. Second, in application scenarios that require instant feedback, existing algorithms face a trade-off between computational efficiency and recognition accuracy: a complex model structure often brings higher computational latency, making real-time requirements difficult to meet. In addition, conventional examination methods that can provide high-precision information are limited in wide application by factors such as operational complexity and cost. Together, these factors restrict the standardized, large-scale deployment and practical performance improvement of automated analysis techniques in related fields.

Accordingly, the prior art is still in need of improvement and development.
Disclosure of Invention

The main purpose of the invention is to provide a machine-learning-based colorectal advanced adenoma data prediction method, system, terminal and computer storage medium, aiming to solve the problems in the prior art that automated analysis techniques for colorectal tissue morphology have opaque model decision logic and poor interpretability, and that high accuracy and low computational latency are difficult to guarantee simultaneously in real-time application scenarios.

In order to achieve the above object, the present invention provides a machine-learning-based colorectal advanced adenoma data prediction method, comprising the steps of:

Obtaining a plurality of target samples, and screening out corresponding candidate prediction factors according to target data of each target sample;

Dividing all the candidate prediction factors into a training set and a validation set, training a target machine learning model using the training set, inputting the validation set into a plurality of decision trees of the target machine learning model, and outputting the original prediction probability of each target object;

And converting each original prediction probability into a classification probability, and converting each classification probability into a corresponding prediction result using a binary classification algorithm.
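The division into a training set and a validation set, and the combination of several decision trees into one original prediction value, can be illustrated with a short sketch. It is written under stated assumptions rather than taken from the patent: the 30% validation fraction, the fixed seed, the feature names `age` and `hemoglobin`, the two hand-written single-split trees and their weights are all hypothetical, and a weighted sum is only one plausible reading of "according to the tree weight of each decision tree".

```python
import random


def split_samples(samples, validation_fraction=0.3, seed=0):
    """Shuffle a copy of the samples and split it into (training, validation)."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_validation = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]


def raw_score(sample, trees, tree_weights):
    """Weighted sum of per-tree outputs for one sample (assumed aggregation)."""
    return sum(weight * tree(sample) for tree, weight in zip(trees, tree_weights))


# Two hypothetical single-split trees over hypothetical features.
def tree_age(sample):
    return 0.4 if sample["age"] > 50 else -0.4


def tree_hemoglobin(sample):
    return 0.3 if sample["hemoglobin"] < 120 else -0.1


if __name__ == "__main__":
    samples = [{"age": 40 + i, "hemoglobin": 110 + 2 * i} for i in range(10)]
    training, validation = split_samples(samples)
    for sample in validation:
        print(sample, raw_score(sample, [tree_age, tree_hemoglobin], [1.0, 0.5]))
```

Because each tree contributes a single readable rule, the path a sample takes through every tree can be listed alongside its score, which is the kind of inspectable decision path the disclosure emphasizes.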
Optionally, in the machine-learning-based colorectal advanced adenoma data prediction method, obtaining a plurality of target samples and screening out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample specifically includes:

Obtaining target samples of a plurality of colorectal advanced adenomas, and extracting the colorectal advanced adenoma data of each target sample;

For the colorectal advanced adenoma data of each target sample, extracting a plurality of indexes with a missing rate greater than a preset completeness;

Screening a plurality of prediction factors from each index to serve as candidate prediction factors; the indexes comprise basic information, physical examination information, blood examination information, urine information and fecal information.

Optionally, in the machine-learning-based colorectal advanced adenoma data prediction method, after obtaining a plurality of target samples and screening out corresponding candidate prediction factors according to the colorectal advanced adenoma data of each target sample, the method further includes:

Constructing a logistic regression model, inputting all the candidate prediction factors into the logistic regression model one by one, and measuring the goodness of fit of the logistic regression model;

If the goodness of fit exceeds a preset goodness, defining the candidate prediction factor corresponding to that goodness of fit as an optimal variable;

And defining all the optimal variables as an optimal variable combination, and adding the goodness-of-fit values corresponding to the optimal variable combination into the logistic regression