CN-122020172-A - Lightning density construction system and method based on multisource meteorological factors and machine learning
Abstract
The invention belongs to the technical field of meteorological data processing and earth system information, and provides a lightning density construction system and method based on multi-source meteorological factors and machine learning. Ground lightning observation data serve as the modeling target, and variables related to thermodynamic conditions, dynamic conditions and cloud microphysics taken from the analysis meteorological data serve as input features. Several machine learning models are trained separately, their predictions are fused with an ensemble method, and global lightning density data are output at a unified spatial resolution and time scale. In addition, accuracy evaluation metrics are computed at grid-point scale to quantify the stability and consistency of the model predictions, yielding a global lightning density data set that is spatially continuous, temporally consistent and more stable.
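The fusion and grid-point evaluation described in the abstract can be sketched as follows. The abstract does not state the fusion rule or the accuracy metric, so this sketch assumes a weighted mean across models and a per-cell root-mean-square error over time; the function names `fuse_predictions` and `gridpoint_rmse`, the model labels and the sample numbers are all hypothetical.

```python
from math import sqrt


def fuse_predictions(predictions, weights=None):
    """Weighted average of per-model prediction fields (assumed fusion rule).

    predictions: dict model_name -> list over time steps of list over grid cells.
    weights: dict model_name -> non-negative weight; equal weights if omitted.
    """
    names = sorted(predictions)
    if weights is None:
        weights = {name: 1.0 for name in names}
    total = sum(weights[name] for name in names)
    n_time = len(predictions[names[0]])
    n_cell = len(predictions[names[0]][0])
    fused = []
    for t in range(n_time):
        row = []
        for c in range(n_cell):
            value = sum(weights[n] * predictions[n][t][c] for n in names) / total
            row.append(value)
        fused.append(row)
    return fused


def gridpoint_rmse(predicted, observed):
    """RMSE along the time axis, computed separately for every grid cell."""
    n_time = len(observed)
    n_cell = len(observed[0])
    return [
        sqrt(sum((predicted[t][c] - observed[t][c]) ** 2 for t in range(n_time)) / n_time)
        for c in range(n_cell)
    ]


# Hypothetical toy data: 2 time steps x 2 grid cells, two models.
observed = [[1.0, 4.0], [3.0, 8.0]]
predictions = {
    "gbdt": [[1.0, 5.0], [3.0, 9.0]],
    "forest": [[1.0, 3.0], [3.0, 9.0]],
}
fused = fuse_predictions(predictions)
rmse_field = gridpoint_rmse(fused, observed)
```

Here `rmse_field` holds one value per grid cell, which corresponds to the idea of a grid-point-level evaluation field; other metrics (for example a correlation coefficient per cell) could be substituted without changing the structure.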
Inventors
- WANG JUN
- ZHENG HAO
Assignees
- Nanjing University (南京大学)
Dates
- Publication Date
- 20260512
- Application Date
- 20260130
Claims (4)
- 1. A lightning density construction system based on multi-source meteorological factors and machine learning, characterized by comprising a data acquisition module, a data preprocessing module, a feature construction module, a machine learning modeling module, an accuracy evaluation module, a key factor analysis module, and a model integration and data output module, wherein: the data acquisition module is used for acquiring lightning observation data; the data preprocessing module is used for preprocessing the lightning observation data and outputting standardized meteorological environment variable data and lightning target data on a unified space-time grid; the feature construction module is used for encoding the standardized meteorological environment variable data and the lightning target data into model-learnable input to obtain a standardized feature matrix and corresponding lightning density label data; the machine learning modeling module is used for capturing the evolution pattern of lightning density through the learning biases of different algorithms to obtain a prediction result; the accuracy evaluation module is used for generating grid-point-level evaluation fields and statistical tables from the prediction result and outputting an evaluation report and log records; the key factor analysis module is used for outputting a key factor list, contribution measurement results and explanatory analysis charts according to the prediction result and the output of the accuracy evaluation module; and the model integration and data output module is used for storing the multi-model prediction results and the final output data.
- 2. A lightning density construction method based on multi-source meteorological factors and machine learning, characterized by comprising the following steps: Step 1, the data acquisition module acquires lightning observation data; Step 2, the data preprocessing module preprocesses the lightning observation data and outputs standardized meteorological environment variable data and lightning target data on a unified space-time grid; Step 3, the feature construction module encodes the standardized meteorological environment variable data and the lightning target data into model-learnable input to obtain a standardized feature matrix and corresponding lightning density label data; Step 4, the machine learning modeling module captures the evolution pattern of lightning density through the learning biases of different algorithms to obtain a prediction result; Step 5, the accuracy evaluation module generates grid-point-level evaluation fields and statistical tables from the prediction result and outputs an evaluation report and log records; Step 6, the key factor analysis module outputs a key factor list, contribution measurement results and explanatory analysis charts according to the prediction result and the output of the accuracy evaluation module; and Step 7, the model integration and data output module stores the multi-model prediction results and the final output data.
- 3. The lightning density construction method based on multi-source meteorological factors and machine learning according to claim 2, wherein step 4 comprises: constructing a plurality of machine learning models of different structural types and carrying out lightning density prediction modeling in parallel, the models being divided into model types according to their structural characteristics, the model types comprising an ensemble learning model based on gradient boosting decision trees, a random forest model based on the bagging method, and a deep learning model based on a multi-layer neural network structure; during the training of each model, determining the key parameter set of its model structure, introducing automatic parameter optimization within a preset parameter space, and iteratively adjusting the model parameters according to prediction performance feedback on validation data; a model screening and overfitting control step based on stability judgment, comprising evaluating, through a cross-validation mechanism in the model training stage, how the model prediction performance varies under different sample partitions; and a model freezing and invocation preparation step, comprising, after model training, parameter optimization and stability judgment are completed, freezing the trained model, serializing and storing the model structure information and corresponding parameters, and establishing a model index relation.
- 4. The lightning density construction method based on multi-source meteorological factors and machine learning according to claim 2, wherein step 6 comprises: a model screening and interpretation path selection step, comprising, after model training and ensemble prediction are completed, first removing models with relatively low prediction performance and then, based on the evaluation results, determining a feature contribution analysis path for each remaining model according to its structure type, wherein a feature contribution decomposition method is selected for models based on the gradient boosting decision tree structure; a model-structure-based preliminary feature contribution evaluation step, comprising calculating, for each model, the overall contribution of the input features to the prediction result by means of feature contribution quantification, feature importance measurement or feature replacement, wherein a game-theory-based feature contribution quantification method is adopted for models based on gradient boosting decision trees; a cross-model key feature set determination step, comprising summarizing and cross-comparing the features with higher contribution in each model based on the feature contribution rankings obtained from the different models, selecting the features with higher contribution in multiple models, and constructing a cross-model consistent key feature set; and, after the model with the best prediction performance and its corresponding key feature set are determined, selecting the top-ranked key features by feature contribution, analyzing how their contribution to the prediction result changes over different value intervals based on the sample-level feature contribution decomposition results, and finally obtaining the dependency relationships and response trends between the key features and the prediction result, thereby identifying how key meteorological factors influence lightning density changes under different conditions.
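The cross-model key feature set of claim 4 can be illustrated with a minimal sketch. The claim does not fix how many top-ranked features are taken per model or in how many models a feature must rank highly, so the parameters `k` and `min_models` are assumptions, as are the function names, the feature names (`cape`, `cloud_ice`, `w500`, `t2m`) and the contribution scores, which stand in for whatever per-model contribution measure has been computed.

```python
from collections import Counter


def top_k_features(contributions, k):
    """Return the k feature names with the largest contribution scores.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(contributions, key=lambda name: (-contributions[name], name))
    return ranked[:k]


def cross_model_key_features(per_model_contributions, k=3, min_models=2):
    """Features that rank in the top k of at least `min_models` models."""
    counts = Counter()
    for contributions in per_model_contributions.values():
        counts.update(top_k_features(contributions, k))
    return sorted(name for name, n in counts.items() if n >= min_models)


# Hypothetical contribution scores for three model types.
scores = {
    "gbdt": {"cape": 0.40, "cloud_ice": 0.30, "w500": 0.20, "t2m": 0.10},
    "forest": {"cape": 0.35, "cloud_ice": 0.15, "w500": 0.10, "t2m": 0.40},
    "mlp": {"cape": 0.50, "cloud_ice": 0.25, "w500": 0.05, "t2m": 0.20},
}
key_features = cross_model_key_features(scores, k=2, min_models=2)
```

With these toy scores, `cape` is in the top two of all three models and `cloud_ice` in two of them, so both are kept, while `t2m` ranks highly in only one model and is dropped.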
Description
Lightning density construction system and method based on multisource meteorological factors and machine learning
Technical Field
The invention belongs to the technical field of meteorological data processing and earth system information, and particularly relates to a lightning density construction system and method based on multisource meteorological factors and machine learning.
Background
Lightning activity is an important direct indicator of strong convective weather processes. It is closely related to precipitation, ice-phase microphysics, deep convective latent heat release and the charge structure of the upper troposphere, and to extreme hazards such as thunderstorm gales, hail and short-duration heavy precipitation. For applications such as climate diagnosis, extreme event assessment and risk management, there is a pressing need for a global lightning density data set with a long time series, spatial continuity, high resolution and consistent definitions. Such a data set would characterize the multi-year variability of lightning and its regional differences, and would provide data support for improving convection parameterization in climate models, for statistical analysis, for training and validating data-driven models, and for operational applications such as weather services, insurance assessment and infrastructure lightning protection. However, existing lightning observation methods and related data products struggle to meet these requirements simultaneously in terms of spatial coverage, temporal continuity and data consistency, mainly in the following respects:
(I) Spatial coverage and sampling conditions are limited. Optical lightning detection from low-orbit satellites (such as OTD/LIS) is constrained by orbit characteristics and the instantaneous field of view, so its temporal and spatial sampling is markedly non-uniform; effective samples are insufficient, especially at high latitudes.
The existing LIS/OTD climatology products are mainly provided as grid-point statistics in a multi-year average sense. They are essentially climatological descriptions of lightning activity and cannot directly provide multi-year, month-by-month global lightning density grid data with continuous temporal variation. Their spatial resolution is also relatively coarse, so they cannot meet the requirements of fine-scale climate diagnosis and long-term change analysis.
(II) The ground-based global lightning location network has detection biases. The World Wide Lightning Location Network (WWLLN), a global ground-based network based on very low frequency radio signals, provides long-term, near-real-time global observation of lightning activity and can generate lightning density products at higher resolution. However, this type of network is selective with respect to lightning type and intensity, and its detection efficiency varies with time, region and sensor site configuration. In the early operation stage in particular (2005-2012), the gradual increase in the number of stations tends to introduce non-physical temporal trends. Although follow-up products (e.g., WGLC) have corrected for factors such as detection efficiency, their time series are still relatively short and cannot cover earlier historical periods.
(III) The analysis data lack directly usable lightning density products. Existing meteorological analysis data provide abundant variables related to the convective environment, dynamic conditions and cloud microphysics, but generally do not directly contain long-term global lightning density grid data that are strictly consistent in definition with the observed data.
Related research relies on empirical parameterization or statistical methods that use analysis variables as proxy indicators of lightning activity, but a unified, systematic technique for constructing lightning density data sets is still lacking.
In view of the foregoing, there is a need for a generation technique that comprehensively uses multi-source meteorological, cloud microphysical and dynamic environmental factors to construct a lightning density data set with a uniform spatial grid and time resolution, global coverage and a long time series, so as to better support related scientific research and operational applications.
Disclosure of Invention
To solve the above technical problems, the invention provides a lightning density construction system and method based on multi-source meteorological factors and machine learning. The technical scheme adopted by the invention is as follows: the lightning density construction system based on multi-source meteorological factors and machine learning comprises a data acquisition module, a data preprocessing module,