CN-121997203-A - Shale organic matter enrichment main control factor quantitative characterization method and system based on machine learning
Abstract
The invention relates to a machine-learning-based method and system for quantitatively characterizing the main controlling factors of shale organic matter enrichment, and belongs to the technical field of unconventional energy exploration and evaluation. The invention constructs a parameter system of factors influencing shale organic matter enrichment that covers paleoclimate, paleoproductivity, redox conditions, terrigenous input and deposition rate; resolves collinearity among the indexes through principal component analysis (PCA) dimension reduction; uses a gradient boosting tree model to establish the nonlinear relationship between multi-source geological parameters and the degree of organic matter enrichment; integrates four classes of feature importance assessment methods (statistical analysis, information theory, model structure and model interpretation) for systematic analysis; and converts the comprehensive influence values into standardized percentage output. The method improves the accuracy and reliability with which the main controlling factors of shale organic matter enrichment are identified, moves the analysis from qualitative description to quantitative characterization, and provides quantitative technical support for shale gas resource evaluation and exploration deployment.
Inventors
- CHENG YUNHAO
- CHEN LEI
- ZHANG ZUYOU
- Wu Shuaicai
- Liao Chongjie
- XIONG MIN
- LIU XIANGYU
- CHEN XIN
Assignees
- Southwest Petroleum University (西南石油大学)
Dates
- Publication Date
- 20260508
- Application Date
- 20260127
Claims (10)
- 1. A machine-learning-based method for quantitatively characterizing the main controlling factors of shale organic matter enrichment, characterized by comprising the following steps: constructing a parameter system of factors influencing shale organic matter enrichment, the parameter system comprising a paleoclimate index, a paleoproductivity index, a redox index, a terrigenous input intensity index and a deposition rate change index; performing index correlation analysis and collinear feature identification on the parameter system; performing feature dimension reduction on the paleoproductivity indexes and the redox indexes that exhibit multicollinearity, using principal component analysis to obtain principal component variables; taking total organic carbon (TOC) content as the parameter characterizing the degree of shale organic matter enrichment, constructing a gradient-boosting-tree machine learning model from the dimension-reduced principal component variables and the environmental indexes not subjected to dimension reduction, and modeling the nonlinear relationship between the multi-source depositional and environmental factors and the degree of shale organic matter enrichment; systematically evaluating the degree of influence of each input feature parameter in the machine learning model using a plurality of feature importance assessment methods, the plurality of methods comprising a statistical analysis method, an information-theoretic analysis method, a model-structure-based method and a model interpretation method; normalizing the results obtained by the different feature importance assessment methods and performing weighted fusion according to preset weights to obtain a comprehensive influence value of each input feature parameter; and converting the comprehensive influence value into percentage form and outputting the quantitative characterization result of the main controlling factors of shale organic matter enrichment.
- 2. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein in the step of constructing the parameter system of factors influencing shale organic matter enrichment, the preferred indexes include a paleoclimate index, a paleoproductivity index, a redox index, a terrigenous input intensity index and a deposition rate change index.
- 3. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein in the feature dimension reduction step, principal component analysis is applied separately to the paleoproductivity indexes and to the redox indexes, and the first several principal components whose cumulative explained variance reaches 70%-80% are selected as comprehensive variables.
- 4. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein the machine learning model is a regression model based on the XGBoost algorithm, in which the maximum tree depth is 3-10, the learning rate is 0.01-0.03, the number of weak learners is 50-500, and the sample subsampling ratio is 0.5-1.0.
- 5. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein the plurality of feature importance assessment methods comprise: a statistical analysis method, using Pearson correlation coefficient analysis and analysis of variance (ANOVA); an information-theoretic analysis method, using mutual information regression analysis; a model-structure-based method, calculating the decision tree gain value and the permutation feature importance based on the machine learning model; and a model interpretation method, using SHAP value analysis.
- 6. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein in the step of weighted fusion of the results obtained by the different feature importance assessment methods, the weight of the SHAP value analysis result is 30%, the weight of the permutation feature importance result is 25%, the weight of the ANOVA result is 20%, the weight of the decision tree gain result is 15%, and the weight of the mutual information regression result is 10%.
- 7. The method for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 1, wherein in the step of outputting the quantitative characterization result, the result is stored and displayed as a data table, a graphical result or an electronic data format.
- 8. A machine-learning-based system for quantitatively characterizing the main controlling factors of shale organic matter enrichment, characterized in that the system is used to implement the method according to any one of claims 1 to 7 and comprises: a data acquisition module for acquiring the multi-source sedimentological, geochemical and mineralogical parameters corresponding to the shale samples; a parameter screening and preprocessing module for selecting preferred indexes from the acquired multi-source parameters; a feature dimension reduction module for performing correlation analysis and feature dimension reduction on the selected index parameters so as to construct an input feature parameter set for machine learning analysis; a machine learning modeling module for constructing a shale organic matter enrichment machine learning model from the input feature parameter set, and for training and validating the model; a feature importance analysis module for calculating the degree of influence of each input feature parameter on the degree of shale organic matter enrichment using a plurality of feature importance assessment methods; and a fusion and output module for fusing the results of the different feature importance assessments, converting them into percentage form, and outputting the quantitative characterization result of the main controlling factors of shale organic matter enrichment.
- 9. The system for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 8, wherein the machine learning model used in the machine learning modeling module is a regression model based on the XGBoost algorithm, in which the maximum tree depth is 3-10, the learning rate is 0.01-0.03, the number of weak learners is 50-500, and the sample subsampling ratio is 0.5-1.0.
- 10. The system for quantitatively characterizing the main controlling factors of shale organic matter enrichment according to claim 8, wherein the weights used in the fusion and output module for the weighted fusion of the different feature importance assessment results are set to 30% for the SHAP value analysis result, 25% for the permutation feature importance result, 20% for the ANOVA result, 15% for the decision tree gain result and 10% for the mutual information regression result.
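The normalization, weighted fusion and percentage conversion recited in claims 1, 6 and 10 can be illustrated with a minimal Python sketch. It rests on several assumptions that are not in the claims: each method's scores are normalized by dividing absolute values by their sum (the claims do not name a normalization scheme), the dictionary keys and feature names are hypothetical labels, and the numeric scores are made up purely for illustration. Pearson correlation, listed in claim 5, has no weight in claim 6 and is therefore left out.

```python
# Weights from claim 6, written as fractions of 1. The dictionary keys are
# labels invented for this sketch, not names used in the patent.
CLAIM6_WEIGHTS = {
    "shap": 0.30,
    "permutation": 0.25,
    "anova": 0.20,
    "tree_gain": 0.15,
    "mutual_info": 0.10,
}


def normalize(scores):
    """Scale absolute scores so they sum to 1 (an assumed normalization)."""
    total = sum(abs(v) for v in scores.values())
    if total == 0:
        return {name: 0.0 for name in scores}
    return {name: abs(v) / total for name, v in scores.items()}


def fuse_importance(method_scores, weights=CLAIM6_WEIGHTS):
    """Return each feature's comprehensive influence value as a percentage."""
    features = list(next(iter(method_scores.values())))
    fused = {name: 0.0 for name in features}
    for method, weight in weights.items():
        normed = normalize(method_scores[method])
        for name in features:
            fused[name] += weight * normed[name]
    total = sum(fused.values())
    if total == 0:
        return {name: 0.0 for name in features}
    return {name: 100.0 * value / total for name, value in fused.items()}


# Made-up scores for three hypothetical input features.
example_scores = {
    "shap": {"redox_pc1": 0.42, "productivity_pc1": 0.31, "paleoclimate": 0.12},
    "permutation": {"redox_pc1": 0.20, "productivity_pc1": 0.11, "paleoclimate": 0.05},
    "anova": {"redox_pc1": 35.0, "productivity_pc1": 22.0, "paleoclimate": 6.0},
    "tree_gain": {"redox_pc1": 110.0, "productivity_pc1": 80.0, "paleoclimate": 30.0},
    "mutual_info": {"redox_pc1": 0.55, "productivity_pc1": 0.40, "paleoclimate": 0.15},
}

percentages = fuse_importance(example_scores)
for name, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {pct:.1f}%")
```

Because every normalized score vector sums to 1 and the claim 6 weights sum to 100%, the fused values already sum to 1 before the final rescaling; the division is kept so that the sketch still yields percentages if a different weight set is passed in.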
Description
Shale organic matter enrichment main control factor quantitative characterization method and system based on machine learning
Technical Field
The invention relates to the technical field of unconventional energy exploration and evaluation, and in particular to a machine-learning-based method and system for quantitatively characterizing the main controlling factors of shale organic matter enrichment.
Background
Shale gas is an important unconventional energy resource, and its exploration and development value is closely tied to the degree of shale organic matter enrichment. Organic matter enrichment is a key link in the formation and preservation of shale gas, and its degree is jointly controlled by multiple geological factors, including paleoclimate, paleoproductivity, redox conditions, terrigenous input intensity and deposition rate. Accurately identifying and quantitatively characterizing the main controlling factors of shale organic matter enrichment therefore has important guiding significance for shale gas resource evaluation and exploration deployment.
Currently, studies of the main controlling factors of shale organic matter enrichment rely primarily on geostatistical methods and simple correlation analysis. Traditional methods generally use a single index or a simple linear regression model to analyze the relationship between each factor and organic matter content, and have the following clear limitations:
First, multicollinearity is common among multi-source geological parameters, which makes the results of conventional correlation analysis unreliable. For example, paleoproductivity indexes (e.g., Ba-bio (biogenic barium), P/Ti, P/Al) are often highly correlated with redox indexes (e.g., U/Th, V/(V+Ni)), making it difficult to accurately evaluate the contribution of any single index.
Second, the relationship between shale organic matter enrichment and its multiple controlling factors is markedly nonlinear, and traditional linear models struggle to describe it accurately. The influence of geological and environmental factors on organic matter enrichment is often not a simple linear superposition; the factors interact in complex ways.
Third, existing studies lack a systematic mechanism for assessing feature importance. A single feature importance assessment method (such as the Pearson correlation coefficient) cannot fully reflect the contribution of each factor to organic matter enrichment, different assessment methods give different results, and there is no effective mechanism for fusing them.
Fourth, quantitative characterization methods lack a standardized procedure, which makes results from different studies difficult to compare and integrate. Current research remains at the stage of qualitative description and rarely provides the quantitative contribution of each main controlling factor, which limits its application in actual exploration and development.
In recent years, machine learning methods have been applied increasingly in geology, and algorithms such as random forests and support vector machines have been tried for geological parameter prediction. However, for the quantitative characterization of the main controlling factors of shale organic matter enrichment, existing research still has the following deficiencies: 1) multicollinearity among geological parameters is not handled effectively; 2) multiple feature importance assessment methods are not combined for systematic analysis; 3) no principled mechanism is used to fuse the results of different assessment methods; and 4) no standardized quantitative characterization workflow has been established.
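The multicollinearity problem raised in the first limitation above can be made concrete with a small pairwise Pearson correlation screen. The following is a minimal sketch, not the procedure of the invention: the 0.8 flagging threshold, the function names and the sample values are all assumptions made for illustration, and "CIA" is used only as a placeholder label for a paleoclimate index.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # a constant column carries no correlation information
    return cov / (norm_x * norm_y)


def collinear_pairs(columns, threshold=0.8):
    """List index pairs whose |r| meets the (assumed) threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(columns.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, r))
    return flagged


# Made-up index values for five hypothetical shale samples.
samples = {
    "P/Ti": [1.0, 2.0, 3.0, 4.0, 5.0],
    "U/Th": [2.1, 3.9, 6.2, 8.0, 9.9],
    "CIA": [5.0, 1.0, 4.0, 2.0, 3.0],
}

flagged = collinear_pairs(samples)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

Pairs flagged this way are the candidates that a dimension reduction step such as principal component analysis would combine into a smaller set of uncorrelated variables.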
Therefore, there is a need for a method of quantitatively characterizing the main controlling factors of shale organic matter enrichment that can effectively handle the complex relationships among multi-source geological parameters and accurately quantify the contribution of each factor, so as to improve the precision and efficiency of shale gas resource evaluation.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides a machine-learning-based method and system for quantitatively characterizing the main controlling factors of shale organic matter enrichment. The method aims to solve the problems in the prior art of inadequate handling of multicollinearity, inaccurate modeling of nonlinear relationships, an incomplete feature importance assessment system, and the lack of a standardized workflow for quantitative characterization.
To achieve the above object, the invention adopts the following technical scheme:
A shale organic matter enrichment main control factor quantitative characterization method based on machine learning comprises the following steps: Constructing a shale organic matter enrichment influence factor param