CN-119920468-B - Professional health examination result prediction method based on expert system and machine learning
Abstract
The invention discloses a professional health examination result prediction method based on an expert system and machine learning. The method comprises: obtaining physical examination file information, cleaning and standardizing it, and adding visit frequency information to form an original data set; extracting important features through deep learning and recombining them to obtain a new data set; identifying all hazard factor types of each examined individual in the new data set and applying a logic judgment to each hazard factor according to occupational health standards to obtain a corresponding prediction label, the labels together forming a decision set; marking the decision set for occupational contraindications and suspected occupational diseases, and adding the resulting features SOD and OC to the new data set; randomly dividing the new data set that has passed through the expert system stage into a training set and a test set; selecting more than two machine learning models and training each on the training set; and taking a weighted average of the models' prediction results to obtain the final output. The invention identifies occupational health risks more effectively.
Inventors
- LIN LI
- HUANG JIAMING
- XIONG JINBO
- SHEN BO
- LIU PEIFANG
- JIN BIAO
- LIU YUAN
- LI ZHIRUI
Assignees
- Fujian Normal University (福建师范大学)
Dates
- Publication Date: 20260508
- Application Date: 20250102
Claims (9)
- 1. A professional health examination result prediction method based on an expert system and machine learning, characterized by comprising the following steps: Step 1, a data processing stage: acquiring physical examination archive information, cleaning and normalizing it, and then adding visit frequency information to form an original data set; Step 2, identifying all hazard factor types of each person in the new data set, applying a logic judgment to each hazard factor according to the provisions for that hazard factor in the specified occupational health standard to obtain a corresponding prediction label, summarizing the labels to form a decision set, marking the decision set for occupational contraindications and suspected occupational diseases respectively, and adding the features SOD and OC corresponding to the marking results to the new data set, wherein the specific steps of Step 2 are as follows: Step 2-1, identifying all hazard factor types of the physical examination personnel in the data set: $F = \{f_1, f_2, \ldots, f_n\}$, where $n$ represents the number of single hazard factor features and $f_i$ represents the $i$-th single hazard factor feature; traversing the full hazard factor feature set $F$ for each physical examination record to obtain the single hazard factor feature set $G$ corresponding to that record, the single hazard factor feature set of the $j$-th physical examination record being $G_j = \{g_1, g_2, \ldots, g_m\}$, where $m$ is the number of single hazard factor features whose value equals 1; Step 2-2, identifying and acquiring the physical examination features $t$ corresponding to the hazard factors $g$ of each physical examination record, $T$ being the set of physical examination features $t$, the physical examination feature set of the $i$-th physical examination record being $T_i = \{t_1, t_2, \ldots, t_k\}$, where $k$ is the number of physical examination features; Step 2-3, for each hazard factor $g_i$, constructing a logic function $H$, whose specific expression is not reproduced in this text; Step 2-4, establishing a corresponding decision set $D$ for the single hazard factor feature set $G$: $D_j = \{d_1, d_2, \ldots, d_m\}$, the decision set of the $j$-th physical examination record, in which $d_i$ is the value that the logic function $H$ determines for the single hazard factor $g_i$ of the corresponding feature set $G_j$; Step 2-5, marking the decision set for occupational contraindications and suspected occupational diseases respectively to obtain the new features SOD and OC, and adding them to the new data set, wherein the feature SOD represents the expert system's simulated expert judgment on suspected occupational disease, 1 indicating presence and 0 absence, and the feature OC represents the expert system's simulated expert judgment on occupational contraindications, 1 indicating presence and 0 absence; and Step 3, hybrid model training: randomly dividing the new data set that has passed through the expert system stage into a training set and a test set, selecting more than two machine learning models, training each on the training set to obtain its prediction results, and taking a weighted average of the prediction results of the more than two models by a weighted voting method to obtain the final output result.
- 2. The professional health examination result prediction method based on the expert system and the machine learning as set forth in claim 1, wherein Step 1 specifically includes the following steps: Step 1-1, traversing the physical examination archive information and deleting redundant feature information irrelevant to the physical examination conclusion, wherein the redundant feature information is personal information features, comprising names, physical examination numbers, employer names, and the regions to which the employers belong; Step 1-2, extracting hazard factor information from the physical examination archive information, converting each single hazard factor in the hazard factor information into a corresponding dummy variable to form a dummy variable set, and mapping the single hazard factor name of each dummy variable to the hazard factor name under which it is classified in the specified occupational health standard, wherein the hazard factor information comprises physical examination hazard factors and contact hazard factors; Step 1-3, identifying and acquiring time feature information from the physical examination archive information, uniformly converting it into a number of months, and storing it in integer form; Step 1-4, supplementing the information extracted from the physical examination archive with visit frequency information, the two together forming the original data set; Step 1-5, constructing a neural network model, inputting part of the feature data of the original data set into the neural network model, extracting important features through deep learning, and combining the important features with the remaining feature data of the original data set to form a new feature data set.
- 3. The method for predicting professional health examination results based on expert system and machine learning as set forth in claim 2, wherein the time feature information in Step 1-3 comprises total working years, hazard-exposure working years, and start and end dates.
- 4. The professional health examination result prediction method based on the expert system and the machine learning as claimed in claim 2, wherein the neural network model in Step 1-5 comprises an input layer, a hidden layer and an output layer; the input layer receives data whose dimension equals the number of features in the training set; the hidden layer comprises two linear layers, wherein the first linear layer maps the input features to a 256-dimensional feature vector and is followed by Dropout regularization with a dropout rate of 0.2, and the second linear layer compresses the 256-dimensional feature vector output by the first linear layer into a 128-dimensional feature vector, the features output by the second linear layer serving as the important features; each linear layer is followed by a ReLU activation function to introduce nonlinearity and enhance the expressive capability of the model; and the linear layer of the output layer maps the 128-dimensional feature vector to the number of target categories.
- 5. The professional health examination result prediction method based on the expert system and the machine learning as set forth in claim 1, wherein the neural network model in Step 1-5 is trained using a cross-entropy loss function and the Adam optimizer.
- 6. The method for predicting professional health examination results based on expert system and machine learning according to claim 1, wherein the prediction labels in Step 2 include "re-examination", "suspected occupational disease", "occupational contraindication" and "other diseases or abnormalities".
- 7. The professional health examination result prediction method based on the expert system and the machine learning according to claim 1, wherein in Step 3, the accuracy of each learning model on the test set is used as the evaluation index, and grid search is used to find the hyperparameters of each learning model.
- 8. The professional health examination result prediction method based on the expert system and the machine learning as set forth in claim 1, wherein the specific steps of Step 3 are as follows: Step 3-1, randomly dividing the new data set that has passed through the expert system stage into a training set and a test set in a 7:3 ratio, training more than two machine learning models on the training set, and, within a set hyperparameter range for each learning model, using grid search to find the model's hyperparameters with the model's prediction accuracy on the test set as the evaluation index; Step 3-2, taking a weighted average of the prediction results of the more than two models by a weighted voting method to obtain the final output result, wherein the accuracy of each model on the test set is used as the evaluation index for allocating the weight of each model.
- 9. The professional health examination result prediction method based on the expert system and the machine learning according to any one of claims 1, 7 and 8, wherein the formula of the weighted voting is as follows: $\hat{y} = \sum_{i} w_i \hat{y}_i$; wherein $\hat{y}_i$ is the prediction result of the $i$-th model, and $w_i$ is the weight of the $i$-th model, the weights being assigned according to the performance of each model on the test set.
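The expert system stage of claim 1 (Steps 2-1 to 2-5) can be illustrated with a minimal Python sketch. The logic function H is not given in this text, so the two rule functions, their feature names (`hearing_threshold_db`, `white_cell_count`), their thresholds and the label "no abnormality" are purely hypothetical placeholders, not values from any occupational health standard.

```python
from typing import Callable, Dict, List

Exam = Dict[str, float]

# Hypothetical logic functions H, one per hazard factor. The feature names and
# thresholds are placeholders, not values from any occupational health standard.
def noise_rule(t: Exam) -> str:
    if t.get("hearing_threshold_db", 0.0) >= 40.0:
        return "suspected occupational disease"
    return "no abnormality"

def benzene_rule(t: Exam) -> str:
    if t.get("white_cell_count", 10.0) < 4.0:
        return "occupational contraindication"
    return "no abnormality"

RULES: Dict[str, Callable[[Exam], str]] = {
    "noise": noise_rule,
    "benzene": benzene_rule,
}

def expert_system(hazards: Dict[str, int], exam: Exam) -> Dict[str, object]:
    # Step 2-1: G keeps only the hazard factor features whose value is 1.
    g: List[str] = [name for name, flag in hazards.items() if flag == 1]
    # Steps 2-3 and 2-4: apply each factor's logic function to the exam
    # features T to build the decision set D (factors without a rule are skipped).
    d: List[str] = [RULES[name](exam) for name in g if name in RULES]
    # Step 2-5: derive the binary features SOD and OC from the decision set.
    sod = int("suspected occupational disease" in d)
    oc = int("occupational contraindication" in d)
    return {"G": g, "D": d, "SOD": sod, "OC": oc}

result = expert_system(
    {"noise": 1, "benzene": 1, "dust": 0},
    {"hearing_threshold_db": 45.0, "white_cell_count": 5.2},
)
print(result)
```

Keeping one rule function per hazard factor mirrors the claim's per-factor logic judgment and lets the SOD and OC features be appended to each record as ordinary columns.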
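Steps 1-1 to 1-4 of claim 2 can be sketched in plain Python as follows. The record schema is an assumption: the field names (`person_id`, `exam_number`, `total_service` and so on) and the `STANDARD_NAMES` mapping are hypothetical, and the deep learning feature extraction of Step 1-5 is left out.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical field names; the source does not specify a record schema.
PERSONAL_FIELDS = {"name", "exam_number", "employer_name", "employer_region"}
# Hypothetical mapping from raw hazard names to names used in a standard.
STANDARD_NAMES = {"noise exposure": "noise", "benzene vapour": "benzene"}

def preprocess(records: List[dict]) -> List[dict]:
    # Collect the standardized hazard factor vocabulary for the dummy variables.
    vocab = sorted({STANDARD_NAMES.get(h, h) for r in records for h in r["hazards"]})
    visits: Counter = Counter()
    rows = []
    for r in records:  # assumed to be in chronological order
        hazards = {STANDARD_NAMES.get(h, h) for h in r["hazards"]}
        # Step 1-1: drop personal information (person_id is used only for counting).
        row = {k: v for k, v in r.items()
               if k not in PERSONAL_FIELDS
               and k not in ("person_id", "hazards", "total_service")}
        # Step 1-2: one 0/1 dummy variable per standardized hazard factor.
        for h in vocab:
            row["hazard_" + h] = int(h in hazards)
        # Step 1-3: store durations as an integer number of months.
        years, months = r["total_service"]
        row["total_service_months"] = years * 12 + months
        # Step 1-4: running visit count for this person.
        visits[r["person_id"]] += 1
        row["visit_count"] = visits[r["person_id"]]
        rows.append(row)
    return rows

records = [
    {"person_id": "p1", "name": "A", "exam_number": "E001", "employer_name": "X",
     "employer_region": "R", "hazards": ["noise exposure"], "total_service": (2, 3),
     "hearing_threshold_db": 30.0},
    {"person_id": "p1", "name": "A", "exam_number": "E002", "employer_name": "X",
     "employer_region": "R", "hazards": ["noise exposure", "benzene vapour"],
     "total_service": (3, 3), "hearing_threshold_db": 35.0},
]
rows = preprocess(records)
print(rows[1])
```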
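The layer structure described in claim 4 can be written out as a forward pass. The symbols below are introduced here for illustration only: $x \in \mathbb{R}^{d}$ is one input record with $d$ features, $C$ is the number of target categories, and $W_1, W_2, W_3, b_1, b_2, b_3$ are the layer parameters. The claim does not state whether dropout is applied before or after the ReLU of the first layer; applying it after the ReLU is an assumption.

```latex
\begin{aligned}
h_1 &= \mathrm{Dropout}_{p=0.2}\bigl(\mathrm{ReLU}(W_1 x + b_1)\bigr), & W_1 &\in \mathbb{R}^{256 \times d},\\
h_2 &= \mathrm{ReLU}(W_2 h_1 + b_2), & W_2 &\in \mathbb{R}^{128 \times 256},\\
z   &= W_3 h_2 + b_3, & W_3 &\in \mathbb{R}^{C \times 128}.
\end{aligned}
```

Under this reading, the 128-dimensional vector $h_2$ is what the claim calls the important features, and the logits $z$ are used only while training the network.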
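Step 3-1 of claim 8 (a random 7:3 split followed by grid search scored by test-set accuracy) can be sketched with the standard library alone. The helper names `split_7_3`, `accuracy`, `grid_search` and the toy `fit_threshold` model are hypothetical; a real setup would plug in actual learning models. Selecting hyperparameters on the test set follows the wording of the claim.

```python
import itertools
import random
from typing import Callable, Dict, List, Sequence, Tuple

Sample = Tuple[List[float], int]
Predictor = Callable[[List[float]], int]

def split_7_3(data: Sequence[Sample], seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    # Shuffle a copy, then cut at 70% using integer arithmetic.
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 7 // 10
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict: Predictor, data: Sequence[Sample]) -> float:
    return sum(1 for x, y in data if predict(x) == y) / len(data)

def grid_search(fit: Callable[..., Predictor], grid: Dict[str, list],
                train: Sequence[Sample], test: Sequence[Sample]):
    # Try every hyperparameter combination; keep the one with the best test accuracy.
    best_params, best_acc = None, -1.0
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        params = dict(zip(keys, values))
        acc = accuracy(fit(train, **params), test)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc

def fit_threshold(train: Sequence[Sample], threshold: float) -> Predictor:
    # Toy stand-in for a learning model: it ignores the training data and
    # predicts 1 when the first feature reaches the threshold.
    return lambda x: int(x[0] >= threshold)

data = [([float(v)], int(v >= 5)) for v in range(10)]
train, test = split_7_3(data, seed=0)
best_params, best_acc = grid_search(fit_threshold, {"threshold": [2.0, 5.0, 8.0]}, train, test)
print(best_params, best_acc)
```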
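The weighted voting of claims 8 and 9 can be sketched as below. Two choices here are assumptions rather than details from the text: the weights are taken proportional to each model's test accuracy and normalized to sum to 1, and each model's prediction is represented as a list of class probabilities.

```python
from typing import List, Tuple

def accuracy_weights(accuracies: List[float]) -> List[float]:
    # Assumed weighting scheme: weight proportional to test accuracy.
    total = sum(accuracies)
    return [a / total for a in accuracies]

def weighted_vote(probabilities: List[List[float]],
                  weights: List[float]) -> Tuple[int, List[float]]:
    # Weighted average of the per-model class probabilities, then argmax.
    n_classes = len(probabilities[0])
    combined = [
        sum(w * p[c] for w, p in zip(weights, probabilities))
        for c in range(n_classes)
    ]
    return combined.index(max(combined)), combined

weights = accuracy_weights([0.9, 0.8, 0.7])
label, combined = weighted_vote([[0.6, 0.4], [0.3, 0.7], [0.2, 0.8]], weights)
print(label, combined)
```

In this example the most accurate model votes for class 0, but the weighted average still favours class 1 because the other two models agree with each other.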
Description
Professional health examination result prediction method based on expert system and machine learning
Technical Field
The invention relates to the field of predictive medicine, in particular to a professional health examination result prediction method based on an expert system and machine learning.
Background
The basic task of predictive medicine is to predict the likelihood that an individual will contract a disease. This is crucial for establishing effective preventive measures, whose aim is to prevent the disease completely or at least reduce its impact on the patient. In recent years, data mining and machine learning techniques have made significant progress on problems in predictive medicine and bioinformatics. However, because data sets are diverse and predictive medical tasks are highly specific, relatively few studies address data analysis, processing, mining, and predictive modeling for the diagnostic conclusions of professional health examinations. Although some analysis and mining work has been performed on other related data sets, that research often does not examine the characteristics of the data in depth, and the models used predict poorly and are of limited use in practice.
The performance of prior-art k-nearest neighbor (KNN) algorithms depends to a large extent on the quality and quantity of the training data. If the training data are biased or incomplete, the prediction accuracy of the model suffers. Second, when processing a large-scale data set, the KNN algorithm must calculate the distance between the sample to be predicted and every training sample, so the computational complexity is high and prediction is slow. Furthermore, KNN algorithms are highly sensitive to feature scale and therefore typically require feature normalization before application.
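The scale sensitivity of KNN noted above can be seen in a small Python sketch. The four records and their two features (service time in months and a laboratory value between 0 and 1) are made-up illustration data, and min-max scaling is just one common normalization choice.

```python
import math
from typing import List

def min_max_scale(rows: List[List[float]]) -> List[List[float]]:
    # Rescale every column to the range [0, 1].
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest(query_idx: int, rows: List[List[float]]) -> int:
    # Index of the row closest to rows[query_idx] by Euclidean distance.
    others = (i for i in range(len(rows)) if i != query_idx)
    return min(others, key=lambda i: math.dist(rows[query_idx], rows[i]))

# Made-up records: [service time in months, laboratory value].
rows = [[120.0, 0.10], [118.0, 0.90], [100.0, 0.12], [20.0, 0.50]]

raw_neighbor = nearest(0, rows)
scaled_neighbor = nearest(0, min_max_scale(rows))
print(raw_neighbor, scaled_neighbor)
```

On the raw data the month-valued feature dominates the distance, so record 1 is the nearest neighbor of record 0 despite a very different laboratory value; after scaling, record 2 becomes the nearest neighbor.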
Support Vector Machine (SVM) based algorithms, for their part, are very sensitive to parameter selection, especially the regularization parameter C and the kernel parameters. Selecting these parameters typically requires optimization by cross-validation or similar methods. Second, SVMs may suffer from computational efficiency problems when processing large-scale data sets, especially when the feature dimension is very high. Furthermore, SVM models are relatively hard to interpret (the so-called "black box" problem), which can be a challenge in occupational health examination and diagnosis, where model interpretability is required.
While Random Forest (RF) based algorithms show significant advantages in predicting occupational health examination results and diagnosing occupational diseases, they also face challenges and limitations in practical applications. First, a random forest model is complex and may contain a large number of decision trees, which not only increases training time but may also lead to higher memory consumption. In addition, the interpretability of the model suffers to some extent: because the predictions of many decision trees are combined, the model's decision process is not transparent enough and is difficult to explain to non-professional users. The performance of random forests also depends to a large extent on the quality and representativeness of the training data, and biased or incomplete data can reduce the prediction accuracy of the model. At the same time, although random forests reduce the risk of overfitting through ensembling, overfitting can still occur if model parameters, such as the number of trees and the number of features considered at each split, are not properly tuned.
Disclosure of Invention
The invention aims to provide a professional health examination result prediction method based on an expert system and machine learning. The technical scheme adopted by the invention is as follows:
The professional health examination result prediction method based on the expert system and the machine learning comprises the following steps:
Step 1, a data processing stage: acquiring physical examination archive information, cleaning and normalizing it, and then adding visit frequency information to form an original data set.
Further, Step 1 specifically includes the following steps:
Step 1-1, traversing the physical examination archive information and deleting redundant feature information irrelevant to the physical examination conclusion. Further, the redundant feature information is personal information features, including names, physical examination numbers, employer names, and the regions to which the employers belong.
Step 1-2, extracting hazard factor information from physical examination archive information, converting each single hazard factor in the hazard factor i