CN-121980535-A - Sample expansion method, device and equipment of geographic regression model and storage medium
Abstract
The invention provides a sample expansion method, device, equipment and storage medium for a geographic regression model. The method comprises: pairing the samples in an initial sample set of the geographic regression model two by two to obtain an expanded sample set containing a plurality of paired samples; constructing a plurality of panel data items based on the paired samples; for any panel data item, performing interpolation simulation on its independent variable variation quantity and combining the result with its independent variable state quantity to generate a plurality of pieces of simulation input data; inputting the simulation input data into a trained machine learning regression model to obtain a plurality of pieces of simulation output data; and adding the simulation input data to the independent variable state quantity of the panel data item, and the simulation output data to its dependent variable state quantity, to obtain a plurality of simulated geographic samples. The scale of the training sample set can thereby be effectively expanded, and the training effect of the geographic regression model improved.
Inventors
- YE SIJING
- WANG JILONG
- SONG CHANGQING
- GAO PEICHAO
- MU WANGSHU
Assignees
- Beijing Normal University (北京师范大学)
Dates
- Publication Date
- 20260505
- Application Date
- 20251223
Claims (10)
- 1. A method for sample expansion of a geographic regression model, comprising: pairing the samples in an initial sample set of a geographic regression model two by two to obtain an expanded sample set containing a plurality of paired samples, wherein the initial sample set contains a plurality of geographic samples, and each geographic sample comprises at least one independent variable state quantity and at least one dependent variable state quantity; constructing a plurality of panel data items based on the paired samples, wherein the panel data items comprise independent variable state quantities, independent variable variation quantities, dependent variable state quantities and dependent variable variation quantities of the paired samples; for any panel data item, performing interpolation simulation on the independent variable variation quantity of the panel data item, generating a plurality of pieces of simulation input data by combining the independent variable state quantity of the panel data item, and inputting the plurality of pieces of simulation input data into a trained machine learning regression model to obtain a plurality of pieces of simulation output data; adding the plurality of pieces of simulation input data to the independent variable state quantity of the panel data item, and adding the plurality of pieces of simulation output data to the dependent variable state quantity of the panel data item, to obtain a plurality of simulated geographic samples; wherein the machine learning regression model is trained based on the plurality of panel data items, the independent variable state quantity and the independent variable variation quantity of each panel data item being used as the input of the machine learning regression model, and the dependent variable variation quantity being used as its output.
- 2. The method for sample expansion of a geographic regression model of claim 1, wherein each geographic sample further comprises spatial coordinates, and constructing the plurality of panel data items based on the plurality of paired samples comprises: determining a range threshold based on a variogram cloud plot of the dependent variable state quantity; screening, from the expanded sample set, the paired samples whose spatial Euclidean distance is smaller than the range threshold to obtain a valid sample set; and constructing the panel data items based on the plurality of paired samples in the valid sample set.
- 3. The method for sample expansion of a geographic regression model of claim 1, wherein constructing a plurality of panel data items based on the plurality of paired samples comprises: constructing a first panel data item and a second panel data item based on a paired sample consisting of a first geographic sample and a second geographic sample; wherein the first panel data item includes an independent variable variation quantity obtained by subtracting the independent variable state quantity of the second geographic sample from that of the first geographic sample, a dependent variable variation quantity obtained by subtracting the dependent variable state quantity of the second geographic sample from that of the first geographic sample, and the independent variable state quantity of the second geographic sample; the second panel data item includes an independent variable variation quantity obtained by subtracting the independent variable state quantity of the first geographic sample from that of the second geographic sample, and the independent variable state quantity of the first geographic sample; and the output data of the second panel data item includes a dependent variable variation quantity obtained by subtracting the dependent variable state quantity of the first geographic sample from that of the second geographic sample, and the dependent variable state quantity of the first geographic sample.
- 4. The method for sample expansion of a geographic regression model according to any one of claims 1 to 3, further comprising: dividing the panel data items into a plurality of categories based on the geographic samples corresponding to the independent variable state quantities in the panel data items; and sorting the simulated geographic samples generated from the panel data items of each category by their independent variable state quantity, and determining credible simulated geographic samples based on the confidence interval of each category.
- 5. The method for sample expansion of a geographic regression model of claim 4, further comprising: calculating and/or visually displaying a change trend based on the independent variable state quantity and the dependent variable state quantity of the credible simulated geographic samples.
- 6. The method for sample expansion of a geographic regression model according to claim 5, wherein calculating and/or visually displaying the change trend based on the independent variable state quantity and the dependent variable state quantity of the credible simulated geographic samples comprises: drawing a separate dot-line plot for the credible simulated geographic samples generated from each category of panel data items, with the independent variable state quantity of the credible simulated geographic samples on the horizontal axis and their dependent variable state quantity on the vertical axis; and extracting change trend characteristics from the dot-line plots based on a statistical method, wherein the change trend characteristics reflect the action mechanism between the target independent variable and the dependent variable.
- 7. A sample expansion device for a geographic regression model, comprising: a pairing module, used for pairing the samples in an initial sample set of the geographic regression model two by two to obtain an expanded sample set containing a plurality of paired samples, wherein the initial sample set contains a plurality of geographic samples, and each geographic sample comprises at least one independent variable state quantity and at least one dependent variable state quantity; a construction module, used for constructing a plurality of panel data items based on the paired samples, wherein the panel data items comprise independent variable state quantities, independent variable variation quantities, dependent variable state quantities and dependent variable variation quantities of the paired samples; a simulation module, used for performing interpolation simulation on the independent variable variation quantity of any panel data item, generating a plurality of pieces of simulation input data by combining the independent variable state quantity of the panel data item, and inputting the plurality of pieces of simulation input data into a trained machine learning regression model to obtain a plurality of pieces of simulation output data; and an acquisition module, used for adding the plurality of pieces of simulation input data to the independent variable state quantity of the panel data item, and adding the plurality of pieces of simulation output data to the dependent variable state quantity of the panel data item, to obtain a plurality of simulated geographic samples; wherein the machine learning regression model is trained based on the plurality of panel data items, the independent variable state quantity and the independent variable variation quantity of each panel data item being used as the input of the machine learning regression model, and the dependent variable variation quantity being used as its output.
- 8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the sample expansion method of the geographic regression model of any one of claims 1 to 6 when executing the computer program.
- 9. A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the sample expansion method of the geographic regression model according to any one of claims 1 to 6.
- 10. A computer program product comprising a computer program which, when executed by a processor, implements the sample expansion method of the geographic regression model according to any one of claims 1 to 6.
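The pairing, distance screening and panel data construction recited in claims 1 to 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `build_panel_entries` and the dictionary keys are hypothetical, the range threshold is passed in by the caller rather than derived from a variogram cloud plot, and the sign convention (variation = one sample minus the sample supplying the state quantities) is one reading of claim 3.

```python
import numpy as np
from itertools import combinations

def build_panel_entries(X, Y, coords=None, range_threshold=None):
    """Pair samples two by two and build two panel data items per pair.

    X: (n, p) independent variable state quantities.
    Y: (n,) dependent variable state quantities.
    coords, range_threshold: optional spatial screening (assumed inputs).
    """
    entries = []
    for i, j in combinations(range(len(X)), 2):
        # Optional screening: keep only pairs closer than the range threshold.
        if coords is not None and range_threshold is not None:
            if np.linalg.norm(coords[i] - coords[j]) >= range_threshold:
                continue
        # Each pair yields two items, one per choice of base sample b.
        for a, b in ((i, j), (j, i)):
            entries.append({
                "base": b,             # sample supplying the state quantities
                "x_state": X[b],       # independent variable state quantity
                "dx": X[a] - X[b],     # independent variable variation quantity
                "y_state": Y[b],       # dependent variable state quantity
                "dy": Y[a] - Y[b],     # dependent variable variation quantity
            })
    return entries
```

With this convention, adding the full variation quantity to the state quantity of an item recovers the other sample of the pair, so interpolated fractions of the variation fall between the two paired samples.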
Description
Sample expansion method, device and equipment of geographic regression model and storage medium
Technical Field
The present invention relates to the field of sample expansion technologies, and in particular to a method, an apparatus, a device, and a storage medium for sample expansion of a geographic regression model.
Background
Mechanism research in geography is a key link between geographic pattern analysis and geographic process simulation, and is important for understanding the causes of geographic phenomena and predicting their development trends. At present, such research relies on models such as multiple linear regression, Geographically Weighted Regression (GWR) and Multiscale Geographically Weighted Regression (MGWR). Although these methods offer a degree of interpretability, their core limitation is the strict assumption of a linear relationship between independent and dependent variables, which makes it difficult to describe the complex nonlinear effects that are common in geographic processes.
In recent years, machine learning methods such as support vector regression, random forest regression, gradient boosting regression and neural networks have gradually been introduced. They fit nonlinear relationships well, but their application remains limited by a serious shortage of training samples. Building a robust, high-precision machine learning model depends on a large number of high-quality samples, yet geospatial data are often scarce because of high acquisition cost and limited coverage. In data-scarce scenarios the generalization capability and credibility of the model drop markedly, which weakens the certainty of mechanism inference and severely restricts the reliability and depth of geographic mechanism research.
Disclosure of Invention
The invention provides a sample expansion method, device, equipment and storage medium for a geographic regression model, which address the shortage of training samples described above.
The invention provides a sample expansion method of a geographic regression model, comprising the following steps: pairing the samples in an initial sample set of a geographic regression model two by two to obtain an expanded sample set containing a plurality of paired samples, wherein the initial sample set contains a plurality of geographic samples, and each geographic sample comprises at least one independent variable state quantity and at least one dependent variable state quantity; constructing a plurality of panel data items based on the paired samples, wherein the panel data items comprise independent variable state quantities, independent variable variation quantities, dependent variable state quantities and dependent variable variation quantities of the paired samples; for any panel data item, performing interpolation simulation on the independent variable variation quantity of the panel data item, generating a plurality of pieces of simulation input data by combining the independent variable state quantity of the panel data item, and inputting the plurality of pieces of simulation input data into a trained machine learning regression model to obtain a plurality of pieces of simulation output data; and adding the plurality of pieces of simulation input data to the independent variable state quantity of the panel data item, and adding the plurality of pieces of simulation output data to the dependent variable state quantity of the panel data item, to obtain a plurality of simulated geographic samples. The machine learning regression model is trained based on the plurality of panel data items, wherein the independent variable state quantity and the independent variable variation quantity of each panel data item are used as the input of the machine learning regression model, and the dependent variable variation quantity is used as its output.
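The training and simulation steps of the method can be sketched as follows. This is an illustrative sketch only: `fit_delta_model` and `simulate_samples` are hypothetical names, a least-squares linear fit stands in for whichever machine learning regression model is actually used, and the evenly spaced linear fractions are an assumed form of the interpolation simulation. Following the abstract, the predicted variation is added back to the dependent variable state quantity.

```python
import numpy as np

def fit_delta_model(x_state, dx, dy):
    """Fit a stand-in regressor mapping [x_state, dx] -> dy.

    x_state, dx: (m, p) arrays from the panel data items; dy: (m,) array.
    A linear least-squares fit is used here purely for illustration.
    """
    A = np.hstack([x_state, dx, np.ones((len(dx), 1))])
    coef, *_ = np.linalg.lstsq(A, dy, rcond=None)

    def predict(xs, d):
        return np.hstack([xs, d, np.ones((len(d), 1))]) @ coef

    return predict

def simulate_samples(x_state, dx, y_state, predict, n_steps=4):
    """Generate simulated geographic samples from one panel data item.

    x_state, dx: (p,) arrays; y_state: scalar; predict: fitted model.
    """
    # Interior fractions of the variation quantity (assumed interpolation scheme).
    fractions = np.linspace(0.0, 1.0, n_steps + 2)[1:-1]
    dx_sim = fractions[:, None] * dx[None, :]            # interpolated variations
    xs = np.repeat(x_state[None, :], len(fractions), axis=0)
    dy_sim = predict(xs, dx_sim)                         # simulation output data
    # Map variations back to state quantities to obtain simulated samples.
    return xs + dx_sim, y_state + dy_sim
```

Because the model predicts a variation rather than a state, each simulated sample stays anchored to the observed state quantities of a real geographic sample.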
According to the sample expansion method of the geographic regression model provided by the invention, each geographic sample further comprises spatial coordinates, and constructing the panel data items based on the plurality of paired samples comprises: determining a range threshold based on a variogram cloud plot of the dependent variable state quantity; screening, from the expanded sample set, the paired samples whose spatial Euclidean distance is smaller than the range threshold to obtain a valid sample set; and constructing the panel data items based on the plurality of paired samples in the valid sample set. According to the sample expansion method of the geographic regression model provided by the invention, constructing a plurality of panel data items based on the plurality of paired samples comprises: constructing a first panel data item and a second panel data item based on a paired sample consisting of a first geographic sample and a second geographic sample; Wherei