CN-120633905-B - Multi-factor coupling-based mountain area extreme storm event prediction and analysis method

Abstract

The invention discloses a multi-factor coupling-based method for predicting and analyzing extreme storm events in a hilly area. The method comprises: S1, data collection and preprocessing, in which the preprocessed data are divided into a training data set and a test data set; S2, multi-factor feature extraction, in which the main meteorological features are extracted and a topographic index is calculated to represent the influence of terrain on precipitation; S3, screening of key influencing factors; S4, construction and training of a prediction model, in which the extreme storm event prediction model is built with a support vector machine algorithm; and S5, evaluation of the prediction results using accuracy, recall, precision, and the F1 value, a composite index that balances recall and precision. The method reduces data redundancy and noise, improves the efficiency and accuracy of the prediction model, and captures the complex relationships among multiple factors more accurately, thereby improving the prediction accuracy for extreme storm events.

Inventors

  • LIU XIAO
  • LIU RONGHUA
  • TIAN JIYANG
  • LIU QI
  • SUN CHAOXING
  • LIU XIAOWAN

Assignees

  • China Institute of Water Resources and Hydropower Research (中国水利水电科学研究院)

Dates

Publication Date
20260512
Application Date
20250522

Claims (8)

  1. A multi-factor coupling-based method for predicting and analyzing extreme storm events in a hilly area, characterized by comprising the following steps:

     Step S1, data collection and preprocessing: collecting the ERA5 data set of the European Centre for Medium-Range Weather Forecasts (ECMWF), measured data from weather stations in the study area, and digital elevation model (DEM) data reflecting the topography and landforms of the study area; preprocessing the collected data, including quality control and standardization, wherein the quality control applies, in sequence, a space-time coupled dynamic threshold method, a multivariate joint test, space-time density clustering verification, and physical mechanism inversion verification to remove outliers, and then fills missing values; and dividing the preprocessed data into a training data set and a test data set. The physical mechanism inversion verification performs a water vapor flux check on the outliers pending removal and judges whether the flux exceeds the 95th percentile of the historical extremes of the study area; if so, the value is judged to be a real extreme event and the data are retained.

     Step S2, multi-factor feature extraction: performing dimension reduction with principal component analysis to extract the main meteorological features, and calculating a topographic index $TWI$ to represent the influence of terrain on precipitation:

     $TWI = \ln\left(\frac{a}{\tan b}\right)$

     where $a$ is the confluence area and $b$ is the gradient.

     Step S3, screening of key influencing factors: ranking the extracted multi-factor features by importance with a random forest algorithm and selecting the features with higher importance scores as the key influencing factors of extreme storm events, comprising:

     Step S31, calculating the overall importance: the random forest builds a plurality of decision trees to classify or regress the data, each decision tree being trained on a randomly selected subset of samples and features; after training, the overall importance is evaluated as the average impurity reduction of each feature over all decision trees. Let the importance of feature $x_j$ in the $t$-th decision tree be $I_j^{(t)}$; the overall importance of feature $x_j$ is then:

     $I_j = \frac{1}{T} \sum_{t=1}^{T} I_j^{(t)}$

     where $I_j$ is the overall importance of feature $x_j$, $T$ is the number of decision trees, and $I_j^{(t)}$ is the importance of feature $x_j$ in the $t$-th decision tree.

     Step S32, selecting the key influencing factors: sorting the multi-factor features extracted in step S2 in descending order of overall importance and selecting the top $m$ features as the key influencing factors of extreme storm events, where $1 \le m \le n$ and $n$ is the total number of multi-factor features extracted in step S2. When the top $m$ features are selected, the cumulative importance must be greater than or equal to a second preset threshold, the cumulative importance being expressed as:

     $\text{Cumulative Importance} = \frac{\sum_{i=1}^{m} I_{(i)}}{\sum_{i=1}^{n} I_{(i)}}$

     where $I_{(i)}$ is the overall importance of the $i$-th feature after sorting in descending order.

     Step S4, construction and training of the prediction model: building the extreme storm event prediction model with a support vector machine algorithm, using the screened key influencing factors of extreme storm events as the input features and "whether an extreme storm event occurs" as the output label, and training the prediction model until model training is complete.

     Step S5, evaluation of the prediction results: evaluating the extreme storm event prediction model trained in step S4 on the test data set of step S1, the evaluation indexes comprising accuracy, recall, precision, and the F1 value, a composite index that balances recall and precision.

     The specific steps for removing outliers in step S1 are as follows.

     First, single-variable space-time dynamic threshold detection: a space-time sliding window is constructed for each meteorological variable and a dynamic threshold $Q_{upper}(t,s)$ is calculated: [formula not reproduced] where $Q_{75}(t,s)$ is the 75th percentile within the space-time window; $IQR(t,s)$ is the interquartile range, $IQR(t,s) = Q_{75}(t,s) - Q_{25}(t,s)$; $Q_{25}(t,s)$ is the 25th percentile within the space-time window; $\Delta h$ is the mean elevation difference between the station and its neighborhood; $h_{ref}$ is a reference elevation parameter; $k(h)$ is a terrain adjustment factor; $h$ is the station elevation; $h_{mean}$ is the mean elevation of the study area; and $\sigma_h$ is the standard deviation of elevation. Suspected outliers exceeding the threshold are verified by the curvature of the time series: [formula not reproduced] A value is determined to be an outlier if the following condition is satisfied: [formula not reproduced] The quantities in these formulas are the time step at the current moment, the observed value of the meteorological variable at time $t$ and position $s$, the observations one time step after and one time step before time $t$, the standard deviation of the curvature, and the seasonal mean.

     Second, multivariate joint test: the multivariate Mahalanobis distance $D_M$ is calculated:

     $D_M = \sqrt{(\mathbf{x} - \boldsymbol{\mu})^{T} \Sigma^{-1} (\mathbf{x} - \boldsymbol{\mu})}$

     where $\mathbf{x}$ is the multivariate observation vector, $\boldsymbol{\mu}$ is the mean vector, $T$ is the transpose operator for a matrix or vector, and $\Sigma$ is the covariance matrix. A terrain-weighted correction threshold is introduced: [formula not reproduced] where $p$ is the number of variables, $\chi^2_{p,0.99}$ is the 99% quantile of the chi-square distribution with $p$ degrees of freedom, $TWI$ is the topographic index at the current point, $TWI_{max}$ is the maximum topographic index in the study area, $a$ is the confluence area, and $b$ is the gradient. If the condition [formula not reproduced] is met, the value is determined to be a multivariate coupled outlier.

     Third, space-time density clustering verification: a composite space-time distance measure is calculated: [formula not reproduced] The quantities in this formula are the spatial Euclidean distance, the time interval, the maximum Euclidean distance between all pairs of spatial points in the study area, the maximum time span within the study period, the Pearson correlation coefficient of the variable trends, and three adjustable weight coefficients. The adaptive density parameters are then calculated: [formula not reproduced] The quantities in this formula are the neighborhood radius, the mean and the standard deviation of the distance distribution, the minimum number of points per cluster (minPts), and the total number of samples. A data sample point that belongs to no density cluster and is not covered by more than 80% of its adjacent points is judged to be an outlier.

     Fourth, physical mechanism inversion verification: a water vapor flux check is performed on the outliers pending removal: [formula not reproduced] If the following condition is met, the value is judged to be a real extreme event and the data are retained: [formula not reproduced] The quantities in these formulas are the 95th percentile of the historical extremes of the study area; the specific humidity, i.e. the mass of water vapor per unit mass of moist air; the three-dimensional wind velocity vector; the unit vectors in the east-west, north-south, and vertical directions; the east-west wind speed, positive eastward; the north-south wind speed, positive northward; and the vertical wind speed, positive upward.
  2. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein the filling of missing values in step S1 is performed by linear interpolation. Let $x_i$ be a missing value in the original data sequence, and let $x_p$ and $x_q$ ($p < i < q$) be the valid data immediately before and immediately after it. The linear interpolation formula is:

     $x_i = x_p + \frac{i - p}{q - p} (x_q - x_p)$

     where $x_i$ is the missing value, $x_p$ is the valid datum immediately before $x_i$, $x_q$ is the next valid datum after $x_i$, $i - p$ is the number of intervals from the previous valid datum to the position of the missing value, and $q - p$ is the number of intervals between the two valid data.
  3. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein the standardization in step S1 converts the values of each variable to a standard normal distribution with a mean of 0 and a standard deviation of 1, to eliminate the influence of the different dimensions (units) of the variables. The standardization formula is:

     $z_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j}$

     where $z_{ij}$ is the standardized value of the $j$-th variable of the $i$-th sample, $x_{ij}$ is the value of the $j$-th variable of the $i$-th sample, $\mu_j$ is the mean of the $j$-th variable, and $\sigma_j$ is the standard deviation of the $j$-th variable.
  4. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein the multi-factor feature extraction in step S2 comprises the following sub-steps:

     Step S21, calculating the covariance matrix: the weather station measured data of the training data set from step S1 form an $n \times p$ data matrix $X$, where $n$ is the number of samples and $p$ is the total number of meteorological features. The covariance matrix $S$ is calculated as:

     $S = \frac{1}{n-1} X^{T} X$

     where $T$ is the transpose operator for a matrix or vector, $X^{T}$ is the transpose of $X$, and the covariance matrix $S$ is a $p \times p$ real symmetric matrix.

     Step S22, eigenvalue decomposition: for the covariance matrix $S$ obtained in step S21 there exist an orthogonal matrix $Q$ and a diagonal matrix $\Lambda$ such that:

     $S = Q \Lambda Q^{T}$

     where the diagonal entries of $\Lambda$ are the eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_p$ of the covariance matrix $S$, and the columns of the orthogonal matrix $Q$ are the corresponding orthonormal eigenvectors $q_1, q_2, \dots, q_p$.

     Step S23, dimension reduction: the eigenvectors corresponding to the first $k$ eigenvalues are taken as column vectors to form the projection matrix $W$:

     $W = [q_1, q_2, \dots, q_k]$

     where $1 \le k \le p$. When the first $k$ eigenvalues are selected, the cumulative variance contribution rate must be greater than or equal to a first preset threshold, the cumulative variance contribution rate being expressed as:

     $\text{Cumulative Variance Ratio} = \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{p} \lambda_i}$

     where $\lambda_i$ is the $i$-th eigenvalue after sorting in descending order. The weather station measured data matrix $X$ of the training data set from step S1 is projected into the low-dimensional space by the projection matrix $W$ to obtain the principal component matrix $Y$:

     $Y = X W$

     so that the first $k$ main meteorological features are extracted.

     Step S24, calculating the topographic index: the topographic index $TWI$ is calculated on the digital elevation model data of the training data set from step S1.
  5. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein, for the linearly separable case, the support vector machine algorithm in step S4 solves the optimization problem:

     $\min_{w,b} \frac{1}{2} \|w\|^2 \quad \text{s.t.} \quad y_i \left( w^{T} \phi(x_i) + b \right) \ge 1, \quad i = 1, \dots, N$

     where $w$ is the weight vector, $b$ is the bias term, $x_i$ is an input sample, $\phi(x_i)$ is the function mapping input samples to the high-dimensional feature space, and $y_i$ is the class label of the sample, $y_i \in \{-1, +1\}$. For the linearly inseparable case, the support vector machine solves the optimization problem:

     $\min_{w,b,\xi} \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N$

     where $\xi_i$ are the slack variables and $C$ is the penalty factor.
  6. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein, in training the prediction model in step S4 until model training is complete, the model parameters are optimized during training by a grid search that traverses combinations of kernel function parameters, combined with cross-validation, to select the optimal penalty factor $C$ and kernel function parameters. Model training is considered complete according to the following criteria: the optimization algorithm meets its convergence condition, namely the change in the objective function between adjacent iterations is smaller than a third preset threshold and the training samples satisfy the Karush-Kuhn-Tucker (KKT) conditions; if the optimization algorithm has not converged within a fixed number of iterations, it is forcibly stopped to avoid an infinite loop; performance on held-out data is stable, namely, if model performance under cross-validation or on the test data set does not improve for a specified number of consecutive rounds, training is stopped early to prevent overfitting; and parameter optimization is complete, namely, once the grid search has found the optimal penalty factor $C$ and kernel function parameters and the solution of the model on the training set satisfies the constraints of the optimization problem, training is considered complete.
  7. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein the accuracy in step S5 is calculated as:

     $Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$

     where $TP$ is the number of true positives, $TN$ is the number of true negatives, $FP$ is the number of false positives, and $FN$ is the number of false negatives. $Accuracy$ ranges over $[0, 1]$: 0 indicates that the extreme storm event predictions are completely incorrect, and 1 indicates that they are completely correct.
  8. The method for predicting and analyzing extreme storm events in a hilly area based on multi-factor coupling as set forth in claim 1, wherein the F1 value in step S5 is:

     $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$

     where the $F1$ value is the harmonic mean of $Precision$ and $Recall$, treating the two indexes as equally important. $F1$ ranges over $[0, 1]$: 0 means that at least one of $Precision$ and $Recall$ is 0, and 1 means that both $Precision$ and $Recall$ are 1.
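The evaluation indexes of claims 7 and 8 can be illustrated with a minimal sketch that computes them from confusion-matrix counts. The function name and the counts are hypothetical and are not taken from the patent; the zero-denominator handling is an assumption added so the sketch always returns a number.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall; it is 0 when either is 0.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts for a test set of 100 samples with 18 real events.
metrics = classification_metrics(tp=8, tn=80, fp=2, fn=10)
```

With rare events such as extreme storms, accuracy can stay high while recall is low, which is why the claims report recall, precision, and F1 alongside accuracy.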
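The preprocessing of claims 2 and 3 (linear interpolation of missing values, then standardization to mean 0 and standard deviation 1) might look like the following sketch. The function names and the sample series are hypothetical. Leaving leading and trailing gaps unfilled and using the population standard deviation are assumptions, since the claims do not specify either choice.

```python
from statistics import mean, pstdev


def fill_missing_linear(series):
    """Fill interior None gaps by linear interpolation between valid neighbours."""
    filled = list(series)
    valid = [i for i, v in enumerate(filled) if v is not None]
    for p, q in zip(valid, valid[1:]):
        # q - p is the number of intervals between two adjacent valid values.
        for i in range(p + 1, q):
            filled[i] = filled[p] + (i - p) / (q - p) * (filled[q] - filled[p])
    # Leading or trailing gaps have no valid neighbour on one side and stay None.
    return filled


def standardize(values):
    """Z-score: subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation (an assumption)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


rain = [2.0, None, None, 8.0, 6.0]  # hypothetical hourly rainfall in mm
filled = fill_missing_linear(rain)  # the two gaps become 4.0 and 6.0
z = standardize(filled)
```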
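Claim 1 computes the topographic index from the confluence area a and the gradient b. The sketch below assumes the widely used topographic wetness index form ln(a / tan b), with the slope given in degrees. The function name, units, and sample values are hypothetical and not taken from the patent.

```python
import math


def topographic_index(area, slope_deg):
    """Return ln(a / tan b) for confluence area a and slope b in degrees."""
    tan_b = math.tan(math.radians(slope_deg))
    if area <= 0 or tan_b <= 0:
        # Flat cells (tan b = 0) need special handling that is not sketched here.
        raise ValueError("area and slope must be positive")
    return math.log(area / tan_b)


# Hypothetical cells: same confluence area, gentle versus steep slope.
gentle = topographic_index(area=500.0, slope_deg=5.0)
steep = topographic_index(area=500.0, slope_deg=30.0)
```

For the same confluence area, the gentler slope yields the larger index, reflecting that water tends to accumulate where large upslope areas drain onto flat ground.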
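Steps S31 and S32 of claim 1 can be sketched as follows: per-tree importances are averaged into an overall importance, and the top-ranked features are kept until their cumulative share of importance reaches a threshold. The function names, feature names, per-tree values, and the threshold of 0.85 are all hypothetical. Treating cumulative importance as a share of the total is an assumption. In practice the per-tree importances would come from a trained random forest rather than being typed in by hand.

```python
def overall_importance(per_tree):
    """Average each feature's importance over all decision trees."""
    n_trees = len(per_tree)
    names = per_tree[0].keys()
    return {name: sum(tree[name] for tree in per_tree) / n_trees
            for name in names}


def select_key_factors(importance, threshold):
    """Return the fewest top-ranked features whose cumulative share >= threshold."""
    ranked = sorted(importance.items(), key=lambda kv: kv[1], reverse=True)
    total = sum(importance.values())
    selected, running = [], 0.0
    for name, score in ranked:
        selected.append(name)
        running += score
        if running / total >= threshold:
            break
    return selected


# Hypothetical per-tree importances for three trees and four features.
trees = [
    {"humidity": 0.5, "twi": 0.3, "wind": 0.1, "pressure": 0.1},
    {"humidity": 0.4, "twi": 0.4, "wind": 0.1, "pressure": 0.1},
    {"humidity": 0.6, "twi": 0.2, "wind": 0.2, "pressure": 0.0},
]
scores = overall_importance(trees)
key_factors = select_key_factors(scores, threshold=0.85)
```

Here the first two features together carry about 80% of the total importance, so a third feature is needed to reach the 0.85 threshold.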

Description

Multi-factor coupling-based mountain area extreme storm event prediction and analysis method

Technical Field

The invention relates to the technical field of meteorological disaster prediction, and in particular to a multi-factor coupling-based method for predicting and analyzing extreme storm events in a hilly area.

Background

Against the background of global climate change, extreme storm events occur frequently. They bring serious natural disasters such as landslides, debris flows, and floods to hilly areas and pose a major threat to the safety of people's lives and property and to the ecological environment. AI and machine learning are widely used in weather prediction, and prior studies have extensively explored their use in rainfall prediction and in disaster prevention and mitigation. However, most existing storm prediction methods focus on a single meteorological element or a small number of factors. For mountainous and complex terrain in particular, the prior art fails to fully consider the coupling among the complex topography of hilly areas, atmospheric circulation, water vapor transport, and other factors, so the prediction accuracy and reliability for extreme storm events are insufficient. Developing a method for predicting and analyzing extreme storm events in hilly areas that comprehensively considers the influence of multiple factors is therefore of practical importance.

Disclosure of Invention

To solve the above problems, the invention provides a multi-factor coupling-based method for predicting and analyzing extreme storm events in a hilly area. The method comprehensively analyzes multiple meteorological and geographic factors and builds a prediction model by combining principal component analysis, random forests, and a support vector machine, so as to improve the prediction accuracy for extreme storm events in hilly areas.
The invention discloses a multi-factor coupling-based method for predicting and analyzing extreme storm events in a hilly area, comprising the following steps:

Step S1, data collection and preprocessing: collecting the ERA5 data set of the European Centre for Medium-Range Weather Forecasts (ECMWF), measured data from weather stations in the study area, and digital elevation model (DEM) data reflecting the topography and landforms of the study area; preprocessing the collected data, including quality control and standardization, wherein the quality control uses a mixed strategy of a space-time coupled dynamic threshold method and a multivariate joint test to remove outliers, and then fills missing values; and dividing the preprocessed data into a training data set and a test data set.

Step S2, multi-factor feature extraction: performing dimension reduction with principal component analysis to extract the main meteorological features, and calculating a topographic index to represent the influence of terrain on precipitation.

Step S3, screening of key influencing factors: ranking the extracted multi-factor features by importance with a random forest algorithm and selecting the features with higher importance scores as the key influencing factors of extreme storm events.

Step S4, construction and training of the prediction model: building the extreme storm event prediction model with a support vector machine algorithm, using the screened key influencing factors of extreme storm events as the input features and "whether an extreme storm event occurs" as the output label, and training the prediction model until model training is complete.

Step S5, evaluation of the prediction results: evaluating the extreme storm event prediction model trained in step S4 on the test data set of step S1, the evaluation indexes comprising accuracy, recall, precision, and the F1 value, a composite index that balances recall and precision.

Further, the specific steps for removing outliers in step S1 include:

First, single-variable space-time dynamic threshold detection: a space-time sliding window is constructed for each meteorological variable and a dynamic threshold $Q_{upper}(t,s)$ is calculated: [formula not reproduced] where $Q_{75}(t,s)$ is the 75th percentile within the space-time window; $IQR(t,s)$ is the interquartile range, $IQR(t,s) = Q_{75}(t,s) - Q_{25}(t,s)$; $Q_{25}(t,s)$ is the 25th percentile within the space-time window; $\Delta h$ is the mean elevation difference between the weather station and its neighborhood; $h_{ref}$ is a reference elevation parameter; $k(h)$ is a terrain adjustment factor; $h$ is the elevation of the weather station; $h_{mean}$ is the mean elevation of the study area; and $\sigma_h$ is the standard deviation of elevation. Suspected outliers exceeding the threshold are verified by the curvature of the time series, with the formula: [formula not reproduced] If th