CN-121997012-A - Feature selection method and system for screening influence factors of chemical accidents
Abstract
The invention discloses a feature selection method and system for screening the influence factors of chemical accidents. The method comprises: determining all features that cause chemical accidents according to historical chemical data of a target chemical device area, and screening from them a plurality of features that meet preset importance requirements for chemical accidents as the preselected features for chemical accidents in the current period; inputting the preselected features into a preset feature subset optimization model to obtain an optimal feature subset; and, for each of the preselected features, obtaining evaluation results based on expert knowledge that respectively represent the importance of the feature in the current period and in the historical period, evaluating the reliability of each evaluation result according to its source, and combining the optimal feature subset to generate a comprehensive score value representing the actual importance of each feature to chemical accidents in the current period, thereby obtaining the optimal features applicable to chemical accidents in the current period. The invention enables the features that cause chemical accidents to be selected scientifically.
Inventors
- HOU XIAOJING
- HOU XIAOBO
- WU GUANJUN
- MAO WENFENG
- WANG WEIQIANG
- ZHANG JIALIANG
Assignees
- 中国石油化工股份有限公司
- 中石化安全工程研究院有限公司
Dates
- Publication Date: 20260508
- Application Date: 20241101
Claims (12)
- 1. A feature selection method for screening influence factors of chemical accidents, characterized by comprising the following steps: determining, according to historical chemical data of a target chemical device area, all features that cause chemical accidents, and screening from them a plurality of features that meet preset importance requirements for chemical accidents as preselected features for chemical accidents in the current period; inputting the preselected features into a preset feature subset optimization model to obtain an optimal feature subset; and, for each of the preselected features, acquiring a plurality of first evaluation results, made based on expert knowledge, that represent the importance of the feature in the current period and a plurality of second evaluation results that represent the importance of the feature in the historical period, evaluating the reliability of each evaluation result according to its source, and then generating, in combination with the optimal feature subset, a comprehensive score value that represents the actual importance of each feature to chemical accidents in the current period, thereby obtaining the optimal features applicable to chemical accidents in the current period.
- 2. The feature selection method according to claim 1, wherein the step of screening a plurality of features that meet preset importance requirements for chemical accidents as preselected features for chemical accidents in the current period comprises: performing sparsity evaluation, correlation evaluation, and redundancy evaluation on all the features, and taking the features that simultaneously meet a preset sparsity requirement, a preset correlation requirement, and a preset redundancy requirement as the plurality of features that meet the preset importance requirements for chemical accidents, thereby obtaining the preselected features for chemical accidents in the current period.
- 3. The feature selection method according to claim 1 or 2, characterized in that the preset feature subset optimization model adopts an ML model.
- 4. The feature selection method according to any one of claims 1 to 3, wherein each first evaluation result and each second evaluation result is one of important, non-important, or indeterminate, and wherein the step of generating a comprehensive score value representing the actual importance of each feature to chemical accidents in the current period comprises: configuring a score value for each evaluation result according to its degree of importance; for each of the preselected features, marking the score value of each first evaluation result as a first score value; for each of the preselected features, extracting from the plurality of second evaluation results those made by experts with experience in evaluating influence factors of chemical accidents, marking the score value of each extracted second evaluation result as a second score value, determining the reliability of each extracted second evaluation result according to the professional field of the expert who made it, assigning a weight to the corresponding second evaluation result according to that reliability, and, based on the weight assignment, calculating from the first score values and the second score values a total score value for the comprehensive importance of each feature to chemical accidents in the current period; and correcting the total score value according to the coincidence between the preselected features and the optimal feature subset, so as to obtain the comprehensive score value.
- 5. The feature selection method according to claim 4, wherein the total score value is calculated using the following expression: where f denotes the total score value, f_d denotes a first score value, b_1 and b_2 denote coefficients, m denotes the total number of first and second evaluation results, N denotes the number of evaluation results, among all the evaluation results, that are the same as the actual evaluation result, f_j denotes a second score value, w denotes a weight, N denotes the total number of first evaluation results, and x denotes the number of first evaluation results.
- 6. The feature selection method according to claim 4 or 5, characterized in that determining the reliability of each extracted second evaluation result according to the professional field of the expert who made it comprises: if the professional field is chemical industry, judging the reliability of the extracted second evaluation result to be the highest; if the professional field is emergency, judging the reliability of the extracted second evaluation result to be moderate; and if the professional field is neither chemical industry nor emergency, judging the reliability of the extracted second evaluation result to be the lowest.
- 7. The feature selection method according to claim 6, wherein assigning weights to the respective second evaluation results comprises: assigning a weight of 2 to a second evaluation result with the highest reliability; assigning a weight of 1.5 to a second evaluation result with moderate reliability; and assigning a weight of 1 to a second evaluation result with the lowest reliability.
- 8. The feature selection method according to any one of claims 4 to 7, characterized in that configuring the score value for each evaluation result according to its degree of importance comprises: for the evaluation results important, non-important, and indeterminate, setting the score values to 1 point, 0.5 point, and 0 point, respectively.
- 9. The feature selection method according to any one of claims 4 to 8, characterized in that obtaining the comprehensive score value comprises: comparing the preselected features with the optimal feature subset, assigning different score values to the coincident features and the non-coincident features between the preselected features and the optimal feature subset, and correcting the corresponding total score values using the assigned score values to obtain the corresponding comprehensive score values.
- 10. The feature selection method according to claim 9, characterized in that the score value assigned to each coincident feature is 1, and the score value assigned to each non-coincident feature is 0.
- 11. A computer-readable storage medium containing a series of instructions for performing the steps of the feature selection method for screening influence factors of chemical accidents according to any one of claims 1 to 10.
- 12. A feature selection system for screening influence factors of chemical accidents, characterized by comprising the following modules: a feature preselection module for determining, according to historical chemical data of a target chemical device area, all features that cause chemical accidents, and screening from them a plurality of features that meet preset importance requirements for chemical accidents as preselected features for chemical accidents in the current period; a feature subset optimization module for inputting the preselected features into a preset feature subset optimization model to obtain an optimal feature subset; and an optimal feature selection module for acquiring, for each of the preselected features, a plurality of first evaluation results representing the importance of the feature in the current period and a plurality of second evaluation results representing the importance of the feature in the historical period, both made based on expert knowledge, evaluating the reliability of each evaluation result according to its source, generating, in combination with the optimal feature subset, a comprehensive score value representing the actual importance of each feature to chemical accidents in the current period, and obtaining, based on the first evaluation results and the second evaluation results, the optimal features applicable to chemical accidents in the current period.
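The fixed values recited in claims 6 to 8 and claim 10 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names `SCORE_BY_LABEL`, `WEIGHT_BY_FIELD`, `first_scores`, `weighted_second_scores`, and `coincidence_values` are hypothetical and do not come from the patent, and the field strings `"chemical"` and `"emergency"` are an assumed encoding of the professional fields. Because the expression of claim 5 is not reproduced in this text, the sketch stops at pairing each score with its weight and does not compute the total or comprehensive score value.

```python
# Score values from claim 8 (evaluation result -> points).
SCORE_BY_LABEL = {"important": 1.0, "non-important": 0.5, "indeterminate": 0.0}

# Weights from claims 6 and 7 (expert's professional field -> weight).
# The field strings are an assumed encoding, not taken from the patent.
WEIGHT_BY_FIELD = {"chemical": 2.0, "emergency": 1.5}
DEFAULT_WEIGHT = 1.0  # any field that is neither chemical nor emergency


def first_scores(labels):
    """Map current-period (first) evaluation results to first score values."""
    return [SCORE_BY_LABEL[label] for label in labels]


def weighted_second_scores(evaluations):
    """Map (label, expert_field) pairs to (second score value, weight) pairs."""
    return [
        (SCORE_BY_LABEL[label], WEIGHT_BY_FIELD.get(field, DEFAULT_WEIGHT))
        for label, field in evaluations
    ]


def coincidence_values(preselected, optimal_subset):
    """Claim 10: 1 for features also in the optimal subset, 0 otherwise."""
    optimal = set(optimal_subset)
    return {feature: 1 if feature in optimal else 0 for feature in preselected}
```

For example, `weighted_second_scores([("important", "chemical")])` pairs a score of 1 with a weight of 2. How these pairs and the coincidence values enter the total and comprehensive score values is governed by the expression of claim 5 and by claim 9, which this sketch does not attempt to reproduce.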
Description
Feature selection method and system for screening influence factors of chemical accidents
Technical Field
The invention belongs to the technical field of chemical engineering safety, and particularly relates to a feature selection method and a feature selection system for screening influence factors of chemical accidents.
Background
Selecting appropriate descriptors or features is one of the core issues when machine learning models are used to explore the relationship between chemical plant parameters and chemical accidents. For real, complex chemical accident processes, the mapping between the independent variables and the dependent variable is difficult to characterize uniformly with a single linear or nonlinear model.
The prior art discloses a soft measurement method, based on multilayer feature selection, for the dioxin (DXN) emission concentration in a municipal solid waste incineration (MSWI) process, which belongs to the field of soft measurement. First, a comprehensive evaluation index is constructed from the perspective of the correlation between a single feature and DXN by combining correlation coefficients and mutual information, which realizes layer 1 feature selection of the process variables of the MSWI subsystems. Second, from the perspective of multi-feature redundancy and feature selection robustness, a feature selection algorithm based on GA-PLS is run multiple times to realize layer 2 feature selection. Finally, layer 3 feature selection is carried out by combining the statistical frequency of the features selected by the upper layers, the model prediction performance, and mechanism knowledge, and a DXN emission concentration soft measurement model is constructed, thereby realizing effective measurement of the DXN emission concentration.
The prior art also discloses a cancer gene classification method, equipment and storage medium based on two-stage deep feature selection, which comprises training a cancer gene classification model and performing cancer gene classification.
In the process of training a cancer gene classification model, training data is obtained, three feature selection algorithms are integrated in the first stage to perform overall feature selection to obtain a feature subset, an unsupervised neural network is used in the second stage to obtain the optimal representation of the feature subset, and the optimal representation of the feature subset is divided into a training set and a testing set and is input into the neural network to be trained. In the cancer gene classification process, the cancer gene data to be detected is preprocessed and then input into a trained cancer gene classification model, so that the cancer gene classification is realized. The cancer gene classification method realizes feature selection considering all aspects by using an integrated feature selection method, and obtains cleaner gene features by extracting the optimal representation of the features by using an unsupervised neural network, thereby improving classification accuracy. 
In addition, the prior art also discloses a method, based on the analytic hierarchy process, for evaluating the suitability of site selection for bottom-sowing scallop culture areas. The method comprises the following steps: constructing an index system for site selection evaluation; quantizing all secondary indexes using a plurality of normalization methods and establishing a quantitative evaluation standard for each secondary index; adding a classification and quantization process for the secondary index system; calculating the normalized values of all secondary indexes step by step using the quantitative evaluation standards; calculating the normalized values of the primary indexes from the normalized values of the secondary indexes; calculating an evaluation result value from the normalized values and the weight values of the primary indexes; and giving a suitability grade evaluation result by weighted linear combination, thereby realizing the suitability evaluation of site selection for the target sea area. The evaluation method makes the marine ranch site selection evaluation process objective, targeted, scientific, and comprehensive, and enables efficient calculation and visual presentation of the evaluation results.
Feature selection (FS) mainly includes three types of methods: stability selection, recursive feature elimination, and univariate feature selection based on mutual information. Although these methods yield models with good generalization performance, they require a series of parameter adjustments and model selections. In view of the foregoing prior art, the inventors have found that current feature selection algorithms typically require cumbersome hyperparameter tuning while ignoring experts' prior knowledge about the relevance of certain features, which may result in some key features being deleted.
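As a generic illustration of the univariate feature selection based on mutual information mentioned above, the following Python sketch ranks discrete features by their mutual information with a discrete accident label. It is a minimal sketch and not the method of the invention or of the cited prior art: the function names `mutual_information` and `rank_features` are hypothetical, and the sketch assumes that features and labels are already discrete (continuous process variables would first need binning or a dedicated estimator).

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written with raw counts.
        mi += (c / n) * log(c * n / (count_x[x] * count_y[y]))
    return mi


def rank_features(features, target):
    """Return feature names sorted by mutual information with the target, highest first."""
    scores = {name: mutual_information(column, target) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

With a hypothetical label sequence `[0, 0, 1, 1]`, a feature column identical to the label ranks ahead of a column such as `[0, 1, 0, 1]`, which is statistically independent of the label in this sample. A ranking of this kind still needs a cut-off threshold or a number of features to keep, which is the sort of parameter adjustment the passage above refers to.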
In the face of complex and variabl