CN-122025147-A - Brain disease risk prediction method and system based on big data analysis
Abstract
The invention discloses a brain disease risk prediction method and system based on big data analysis, belonging to the technical field of brain disease risk prediction. The method comprises: performing standardized preprocessing and labeled classification on big data related to brain diseases to generate feature data sets; mining the specific onset features and risk factors in those sets and organizing the association rules between them; training a risk prediction sub-model for each disease type based on the data and constructing a multi-sub-model hierarchical prediction system; and collecting data on the object to be predicted, matching the disease type, and calling the corresponding sub-model to complete the risk assessment. The system comprises a plurality of cooperating modules that implement a full-flow closed loop of data storage, feature processing, model management and result output. The method improves the pertinence, accuracy and efficiency of risk prediction, achieves full-flow traceability and dynamic model optimization, and provides reliable technical support for early screening and risk early warning of brain diseases.
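The last two steps of the method (one sub-model per disease type, then matching the object to be predicted and calling the matching sub-model) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent: the class `SubModelRegistry`, the function `match_disease_type`, the disease labels, the feature names and the toy scoring weights. The toy sub-models are plain functions with made-up weights and have no clinical meaning; the patent itself describes trained models.

```python
from typing import Callable, Dict, Mapping, Set

Features = Mapping[str, float]
SubModel = Callable[[Features], float]  # returns a risk score in [0, 1]


class SubModelRegistry:
    """Hypothetical stand-in for the multi-sub-model hierarchical prediction system."""

    def __init__(self) -> None:
        self._models: Dict[str, SubModel] = {}

    def register(self, disease_type: str, model: SubModel) -> None:
        # Keep the mapping between disease type and sub-model unique.
        if disease_type in self._models:
            raise ValueError(f"sub-model already registered for {disease_type!r}")
        self._models[disease_type] = model

    def assess(self, disease_type: str, features: Features) -> float:
        # Retrieve the sub-model on demand and run it on the input features.
        return self._models[disease_type](features)


def match_disease_type(features: Features, signatures: Mapping[str, Set[str]]) -> str:
    """Pick the disease type whose onset features overlap most with the input.

    This overlap count is an assumed matching rule, chosen only for simplicity.
    """
    present = {name for name, value in features.items() if value > 0}
    return max(signatures, key=lambda d: len(signatures[d] & present))


# Toy sub-models with invented weights (illustration only).
def stroke_model(f: Features) -> float:
    return min(1.0, 0.5 * f.get("hypertension", 0.0) + 0.25 * f.get("atrial_fibrillation", 0.0))


def dementia_model(f: Features) -> float:
    return min(1.0, 0.5 * f.get("memory_complaint", 0.0) + 0.25 * f.get("age_over_65", 0.0))


registry = SubModelRegistry()
registry.register("stroke", stroke_model)
registry.register("dementia", dementia_model)

signatures = {
    "stroke": {"hypertension", "atrial_fibrillation"},
    "dementia": {"memory_complaint", "age_over_65"},
}
subject = {"hypertension": 1.0, "atrial_fibrillation": 1.0, "age_over_65": 1.0}

matched = match_disease_type(subject, signatures)
risk = registry.assess(matched, subject)
print(matched, risk)
```

Keeping the registry keyed by disease type makes the one-to-one mapping explicit and lets each sub-model be replaced or retrained without touching the others.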
Inventors
- YIN XUEJING
- DING YONGJUN
- LIANG SANCHENG
- WANG ZHENYU
- ZHANG HENGXING
- CHEN KAIJUN
- ZONG XUECHAO
- XIE DONG
- TIAN XIAO
- LI MENGYA
- HE YINLEI
- ZHAO PU
- CHENG FUCHUAN
Assignees
- 中国通信建设第四工程局有限公司
Dates
- Publication Date
- 20260512
- Application Date
- 20260311
Claims (10)
- 1. A brain disease risk prediction method based on big data analysis, characterized by comprising the following steps: S1, performing standardized preprocessing on the collected big data related to brain diseases, removing invalid data, and then completing labeled classification according to the specific types of brain diseases to generate a feature data set corresponding to each type of brain disease; S2, performing a feature mining operation on the feature data set corresponding to each type of brain disease, extracting the specific onset features and core risk factors of each type of brain disease, and organizing the association rules between the specific onset features and the core risk factors through an association analysis module to generate an association rule data set; S3, based on the feature data set corresponding to each type of brain disease, combined with the extracted specific onset features, core risk factors and association rule data set, training a corresponding risk prediction sub-model for each type of brain disease, the sub-model being used to perform risk feature matching and evaluation operations for that type of brain disease, and constructing a multi-sub-model hierarchical brain disease risk prediction system that integrates the risk prediction sub-models corresponding to each type of brain disease and enables on-demand retrieval of the risk prediction sub-models, wherein each risk prediction sub-model forms a unique mapping relation with its corresponding brain disease type; S4, collecting the basic health features and physical sign data of the object to be predicted, matching the corresponding brain disease type, calling the risk prediction sub-model corresponding to that type from the multi-sub-model hierarchical brain disease risk prediction system, and inputting the basic health features and physical sign data into the risk prediction sub-model to complete the risk assessment operation for the corresponding brain disease type.
- 2. The method according to claim 1, characterized in that step S1 comprises the sub-steps of: S1.1, performing format unification on the big data related to brain diseases, converting data stored in heterogeneous formats into a preset unified data format; S1.2, performing missing value supplementation on the format-unified big data related to brain diseases, completing the data with missing values by an interpolation method; S1.3, identifying and eliminating outliers in the big data related to brain diseases after missing value supplementation to generate preprocessed big data related to brain diseases; S1.4, adding a disease type label to the preprocessed big data related to brain diseases according to the clinical classification standards of brain diseases, completing the labeled classification and generating a feature data set corresponding to each type of brain disease.
- 3. The method according to claim 1, characterized in that step S2 comprises the sub-steps of: S2.1, performing preliminary feature screening on the feature data set corresponding to each type of brain disease, removing redundant feature data not associated with the onset of brain diseases, and retaining the effective feature data; S2.2, performing feature quantization on the retained effective feature data, converting non-numerical effective feature data into numerical effective feature data; S2.3, ranking the numerical effective feature data by feature importance, and extracting the top-ranked feature data as the specific onset features and core risk factors of each type of brain disease; S2.4, inputting the extracted specific onset features and core risk factors into an association analysis module, mining the associations between them through the module, organizing the associations by brain disease type to form a corresponding association rule data set, and synchronously outputting the association rule data set.
- 4. The method according to claim 1, characterized in that step S3 comprises the sub-steps of: S3.1, dividing the feature data set corresponding to each type of brain disease into a training data set and a verification data set, wherein both data sets comprise the complete data dimensions of the specific onset features, core risk factors and association rule data set; S3.2, constructing the network structure of an initial prediction model, wherein the initial prediction model is subsequently trained to obtain the risk prediction sub-model corresponding to each type of brain disease, setting the loss function and optimizer parameters for model training, iteratively training the initial prediction model with the training data set, and dynamically adjusting the weight parameters and bias parameters of the model; S3.3, inputting the verification data set into the trained initial prediction model, verifying the degree of fit and generalization capability of the model through a verification module, and generating a risk prediction sub-model to be verified corresponding to each type of brain disease, for the preliminary risk assessment operation for each type of brain disease; S3.4, retaining the risk prediction sub-models to be verified, classifying the sub-models by brain disease type, building a multi-sub-model hierarchical brain disease risk prediction system composed of multi-level sub-models, integrating the risk prediction sub-models corresponding to each type of brain disease, enabling on-demand retrieval of the risk prediction sub-models, and confirming the unique mapping relation between each level of sub-model and its corresponding disease type.
- 5. The method according to claim 1, characterized in that step S4 comprises the sub-steps of: S4.1, collecting the basic health features and physical sign data of the object to be predicted, and standardizing the collected data according to the preprocessing standard of S1 to generate standardized data to be predicted; S4.2, performing a feature matching operation between the standardized data to be predicted and the specific onset features of each type of brain disease, and determining the brain disease type corresponding to the object to be predicted according to the operation result; S4.3, calling the risk prediction sub-model corresponding to the matching result from the multi-sub-model hierarchical brain disease risk prediction system, and establishing an operation connection between the standardized data to be predicted and the risk prediction sub-model; S4.4, inputting the standardized data to be predicted into the risk prediction sub-model with which the operation connection has been established, and generating the risk assessment result for the brain disease type corresponding to the object to be predicted through the forward propagation operation of the model.
- 6. The method according to claim 1, wherein in step S1, after the labeled classification is completed, a feature data storage module is built, the feature data sets corresponding to each type of brain disease are stored in separate data partitions of the feature data storage module, a unique disease type identifier is set for each data partition so that the feature data sets are stored in isolated partitions, and the feature data storage module establishes a real-time data transmission connection with the feature mining operation after the preprocessing of S1 is completed.
- 7. The method according to claim 6, wherein in step S2, when the association rules between the specific onset features and the core risk factors are organized, an association weight value is configured for each association relation through a weight calculation module, the association weight value is entered into the association rule data set together with the corresponding association rule, the association rule data set is associated one-to-one with the feature data sets of each type of brain disease, and the operation data of the weight calculation module are synchronously stored in the feature data storage module.
- 8. The method according to claim 7, wherein in step S3, a multi-sub-model management module is built, the module forms a bidirectional data connection with the multi-sub-model hierarchical brain disease risk prediction system, the multi-sub-model management module performs independent parameter retrieval, model updating and performance detection on each risk prediction sub-model, the training log, weight parameters and bias parameters of each risk prediction sub-model are synchronously stored in the multi-sub-model management module, and the multi-sub-model management module establishes a data interaction connection with the feature data storage module.
- 9. The method according to claim 8, wherein in step S4, after the risk assessment operation is completed, the risk assessment result is input to a result output module for structuring, the feature matching data and parameter operation data of the model operation are retained during structuring, the result output module establishes a real-time data transmission connection with the risk prediction sub-model, and the result output module synchronously receives the intermediate operation data and the final risk assessment result of the model and synchronously exchanges the operation data with the multi-sub-model management module.
- 10. A brain disease risk prediction system based on big data analysis, for executing the brain disease risk prediction method based on big data analysis according to any one of claims 1 to 9, characterized by comprising a data preprocessing module, a feature mining module, a model training module, a risk assessment module, a feature data storage module, a weight calculation module, a multi-sub-model management module and a result output module; wherein the data preprocessing module is in one-way connection with the feature data storage module and outputs the feature data set corresponding to each type of brain disease to the feature data storage module for isolated partition storage; the feature mining module is in two-way connection with the feature data storage module and with the weight calculation module; the weight calculation module is in one-way connection with the feature mining module and enters the association weight values configured for the association relations into the association rule data set; the model training module is in two-way connection with the feature data storage module, the weight calculation module and the multi-sub-model management module, and trains, based on the feature data sets and the association rule data set, the risk prediction sub-models used for risk feature matching and evaluation operations for each type of brain disease; the multi-sub-model management module performs whole-process management of each risk prediction sub-model and synchronously stores model-related data; the risk assessment module is in two-way connection with the multi-sub-model management module and with the result output module, retrieves the corresponding risk prediction sub-model from the multi-sub-model management module to complete the assessment operation, and outputs the operation result to the result output module.
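The preprocessing sub-steps S1.1 to S1.4 of claim 2 (format unification, interpolation of missing values, outlier elimination, labeling into per-disease feature data sets) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the claims name only "an interpolation method" and "outlier identification", so linear interpolation and a median-absolute-deviation rule are stand-ins chosen here. The field name `systolic_bp`, the disease labels and the function names are hypothetical, and the records are assumed to be ordered so that interpolating between neighbours is meaningful.

```python
from collections import defaultdict
from statistics import median
from typing import Dict, List, Optional


def interpolate_missing(values: List[Optional[float]]) -> List[float]:
    """S1.2 (assumed variant): linear interpolation; edge gaps copy the nearest known value."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            t = (i - left) / (right - left)
            filled[i] = values[left] + t * (values[right] - values[left])
    return filled


def outlier_mask(values: List[float], threshold: float = 3.5) -> List[bool]:
    """S1.3 (assumed variant): flag values whose modified z-score exceeds the threshold."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [False] * len(values)
    return [0.6745 * abs(v - med) / mad > threshold for v in values]


def preprocess(records: List[dict], key: str = "systolic_bp") -> Dict[str, List[dict]]:
    # S1.1: unify heterogeneous formats (here: numeric strings and numbers become floats).
    raw = [None if r.get(key) in (None, "") else float(r[key]) for r in records]
    filled = interpolate_missing(raw)
    mask = outlier_mask(filled)
    # S1.4: group the surviving records by disease type label into separate feature data sets.
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for record, value, is_outlier in zip(records, filled, mask):
        if not is_outlier:
            partitions[record["disease_type"]].append({**record, key: value})
    return dict(partitions)


# Hypothetical input: mixed formats, one missing value, one implausible reading.
records = [
    {"disease_type": "stroke", "systolic_bp": "120"},
    {"disease_type": "stroke", "systolic_bp": 130},
    {"disease_type": "stroke", "systolic_bp": None},
    {"disease_type": "epilepsy", "systolic_bp": 150},
    {"disease_type": "epilepsy", "systolic_bp": 900},
    {"disease_type": "epilepsy", "systolic_bp": 125},
]
partitions = preprocess(records)
for disease_type, rows in partitions.items():
    print(disease_type, [row["systolic_bp"] for row in rows])
```

The sketch follows the claimed order, filling missing values before outlier elimination. A real pipeline might prefer to screen raw values first so that an outlier cannot influence an interpolated neighbour.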
Description
Brain disease risk prediction method and system based on big data analysis
Technical Field
The invention relates to the technical field of brain disease risk prediction, in particular to a brain disease risk prediction method and system based on big data analysis.
Background
With the advance of medical informatization and the deep penetration of big data and artificial intelligence technology into the medical field, brain disease risk prediction is gradually moving in a data-driven, intelligent direction. At present, the channels for acquiring data related to brain diseases are increasingly abundant, covering clinical diagnosis and treatment data, daily health monitoring data, medical history records and other types, which provides a massive data basis for risk prediction. Meanwhile, algorithm models such as machine learning and deep learning are increasingly widely applied in disease risk assessment, and various prediction models attempt to identify the potential onset risk of brain diseases and assist medical staff in early screening and intervention. Different types of brain diseases differ markedly in pathogenesis, influencing factors and so on, and related research has begun to focus on how the specificity of disease types affects prediction results. The industry's demand for accurate and targeted risk prediction schemes continues to grow, in the hope that technical means can improve the scientific rigor and practicality of risk prediction and provide stronger support for brain health management. However, the existing brain disease risk prediction technology still has many problems that urgently need to be solved.
Firstly, because data sources are diverse, the acquired big data related to brain diseases often have heterogeneous formats and uneven quality; a unified standardized preprocessing flow is lacking, and the data are not systematically labeled and classified by disease type, so they are difficult to use directly for efficient feature mining and model training. Secondly, the prior art handles feature extraction in a generalized way and cannot accurately mine the specific onset features and risk factors of different types of brain diseases; the association rules between features and risk factors are not clearly organized, so the input data for model training are insufficiently targeted. Moreover, most current prediction models use a single-model architecture that tries to cover the prediction requirements of many brain diseases at once; such models struggle to account for the specificity of different diseases, which limits prediction accuracy, and without a unified model management and calling system, models cannot be called quickly on demand. Finally, conventional schemes lack a standardized flow for matching the object to be predicted to a disease type and for risk assessment, so the data to be predicted are difficult to connect accurately to the prediction model of the corresponding disease, which affects the efficiency and accuracy of risk assessment. Because of these problems, the overall effect of brain disease risk prediction falls short of expectations, and it is difficult to meet the demands of clinical application and health management for accurate and efficient risk prediction.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a brain disease risk prediction method and system based on big data analysis.
The aim of the invention is achieved by the following technical scheme: a brain disease risk prediction method based on big data analysis is provided, the method comprising the following steps: S1, performing standardized preprocessing on the collected big data related to brain diseases, removing invalid data, and then completing labeled classification according to the specific types of brain diseases to generate a feature data set corresponding to each type of brain disease; S2, performing a feature mining operation on the feature data set corresponding to each type of brain disease, extracting the specific onset features and core risk factors of each type of brain disease, and organizing the association rules between the specific onset features and the core risk factors through an association analysis module to generate an association rule data set; S3, based on the feature data set corresponding to each type of brain disease, combined with the extracted specific onset features, core risk factors and association rule data set, training a corresponding risk prediction sub-model for each type of brain disease, performing risk feature matching and evaluation operation on each type of brain