CN-122020603-A - Model auxiliary calibration estimation method, device, electronic equipment and medium

Abstract

The invention relates to a model-assisted calibration estimation method, device, electronic equipment and medium in the technical field of data processing. The method comprises: extracting a non-probability sample from a target population and establishing a multiple linear regression model between a study variable and the covariates of the non-probability sample; estimating the model parameters of the multiple linear regression model and obtaining estimates of the study variable from the estimated parameters; obtaining a diagonal matrix corresponding to the original weights of the sample units in the non-probability sample; obtaining the Kullback-Leibler (KL) distance between the diagonal matrix and a preset calibration weight; constructing calibration constraints for the target calibration weights based on the estimates of the study variable; minimizing the KL distance under the calibration constraints to obtain the target calibration weights; and determining an estimate of the population mean of the study variable based on the target calibration weights. The invention improves the calibration estimation of non-probability samples.
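The abstract only states that the regression parameters are estimated; claims 3 and 4 narrow this to a penalized regression solved through a Lagrangian. The following is a minimal sketch of that step, assuming the LASSO (L1) penalty in its Lagrangian form, (1/2n)·RSS + α·‖β‖₁, fitted by cyclic coordinate descent. The penalty choice and the hypothetical function `lasso_coordinate_descent` and its parameters are illustrative assumptions, not the patent's specification. The fitted values b0 + x_iᵀb would then serve as the estimates of the study variable for every population unit.

```python
import numpy as np


def lasso_coordinate_descent(X, y, alpha, n_iter=500, tol=1e-8):
    """Hypothetical helper: fit y ~ b0 + X @ b by minimising
    (1 / (2n)) * ||y - b0 - X b||^2 + alpha * ||b||_1 (intercept unpenalised)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centring removes the intercept from the problem
    col_sq = (Xc**2).sum(axis=0) / n
    beta = np.zeros(p)
    resid = yc.copy()  # resid = yc - Xc @ beta is kept up to date
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at zero
            # Correlation of column j with the partial residual (beta_j added back).
            rho = Xc[:, j] @ resid / n + col_sq[j] * beta[j]
            # Soft-thresholding: the closed-form minimiser along coordinate j.
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]
            resid -= Xc[:, j] * (new - beta[j])
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    intercept = y_mean - x_mean @ beta
    return intercept, beta
```

The constrained form ‖β‖₁ ≤ t and this penalized form are linked through the Lagrange multiplier α. Other penalties named in the background (SCAD, adaptive LASSO) would mainly change the univariate thresholding step.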

Inventors

  • Hou Lanbao
  • YANG YIRAN
  • Lv Muxi
  • DU HONGJI

Assignees

  • Jingchu University of Technology (荆楚理工学院)

Dates

Publication Date
2026-05-12
Application Date
2025-12-31

Claims (10)

  1. A model-assisted calibration estimation method, comprising: extracting a non-probability sample from a target population, and establishing a multiple linear regression model between a study variable and covariates of the non-probability sample; estimating the model parameters of the multiple linear regression model to obtain estimated model parameters, and obtaining estimates of the study variable based on the estimated model parameters; obtaining a diagonal matrix corresponding to the original weights of the sample units in the non-probability sample, wherein the original weights are obtained from the inclusion probabilities of the sample units; obtaining a Kullback-Leibler distance between the diagonal matrix and a preset calibration weight; constructing calibration constraints for target calibration weights based on the estimates of the study variable; under the calibration constraints, calibrating the calibration weights by minimizing the Kullback-Leibler distance to obtain the target calibration weights; and determining an estimate of the population mean based on the target calibration weights.
  2. The model-assisted calibration estimation method of claim 1, wherein the multiple linear regression model is expressed as $y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i$, where $y_i$ represents the study variable, $\mathbf{x}_i$ represents the covariates, $\boldsymbol{\beta}$ represents the parameters of the multiple linear regression model, and $\varepsilon_i$ represents the error term.
  3. The model-assisted calibration estimation method of claim 1, wherein estimating the model parameters in the multiple linear regression model to obtain estimated model parameters comprises: estimating the model parameters in the multiple linear regression model with a penalized regression algorithm to obtain the estimated model parameters.
  4. The model-assisted calibration estimation method of claim 3, wherein estimating the model parameters in the multiple linear regression model with a penalized regression algorithm to obtain estimated model parameters comprises: obtaining the penalty function of the penalized regression algorithm and the sum of squared deviations (residuals) of the study variable; solving, with the Lagrange multiplier method, for the target model parameters that minimize the sum of squared deviations of the study variable subject to the penalty-function constraint; and estimating the target model parameters to obtain the estimated model parameters.
  5. The model-assisted calibration estimation method of claim 1, wherein constructing calibration constraints for the target calibration weights based on the estimates of the study variable comprises: obtaining a size constraint on the target calibration weights and a constraint on the estimates of the study variable; and taking the size constraint and the constraint on the estimates of the study variable as the calibration constraints for the target calibration weights.
  6. The model-assisted calibration estimation method of claim 1, wherein calibrating the calibration weights by minimizing the Kullback-Leibler distance under the calibration constraints to obtain the target calibration weights comprises: minimizing the Kullback-Leibler distance under the calibration constraints with the Lagrange multiplier method to obtain the target calibration weights.
  7. The model-assisted calibration estimation method of claim 1, wherein the estimate of the population mean is given by a formula whose terms are: the estimate of the population mean; N, the total number of units in the target population; i, the i-th sample unit; S, the non-probability sample; the study variable; the constraint function of the covariates; the Lagrange parameter; the set of units in the target population; the initial weight of sample unit i; and the model parameters obtained by the penalized regression algorithm.
  8. A model-assisted calibration estimation apparatus, comprising: a multiple linear regression model construction module, configured to extract a non-probability sample from a target population and establish a multiple linear regression model between a study variable and covariates of the non-probability sample; a study variable estimation module, configured to estimate the model parameters of the multiple linear regression model to obtain estimated model parameters, and to obtain estimates of the study variable based on the estimated model parameters; a diagonal matrix acquisition module, configured to obtain a diagonal matrix corresponding to the original weights of the sample units in the non-probability sample, wherein the original weights are obtained from the inclusion probabilities of the sample units; a distance acquisition module, configured to obtain the Kullback-Leibler distance between the diagonal matrix and a preset calibration weight; a constraint determination module, configured to construct calibration constraints for target calibration weights based on the estimates of the study variable; a target calibration weight determination module, configured to calibrate the calibration weights by minimizing the Kullback-Leibler distance under the calibration constraints to obtain the target calibration weights; and a population mean estimation module, configured to determine an estimate of the population mean based on the target calibration weights.
  9. An electronic device, comprising a memory and a processor, wherein the memory is configured to store a program, and the processor, coupled to the memory, is configured to execute the program stored in the memory to implement the steps of the model-assisted calibration estimation method according to any one of claims 1 to 7.
  10. A computer-readable storage medium storing a computer-readable program or instructions which, when executed by a processor, carry out the steps of the model-assisted calibration estimation method according to any one of claims 1 to 7.
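Claims 5 to 7 describe calibration under a size constraint and a constraint on the model predictions, solved with a Lagrangian. The following is a minimal sketch under stated assumptions: (i) the backward Kullback-Leibler distance takes the form $\sum_{i\in S}[d_i\log(d_i/w_i) + w_i - d_i]$, whose stationarity condition gives $w_i = d_i/(1 - \boldsymbol\lambda^{\top}\mathbf{u}_i)$; (ii) the calibration variables are $\mathbf{u}_i = (1, \hat y_i)$ with population totals $(N, \sum_{U}\hat y_i)$; and (iii) the estimator is $N^{-1}\sum_{i\in S} w_i y_i$. The patent's "corrected" backward KL distance and the exact formula of claim 7 may differ, and `backward_kl_calibrate` and `calibrated_mean` are hypothetical names. The multiplier λ is found by damped Newton steps on the concave dual, halving each step until all weights stay positive.

```python
import numpy as np


def backward_kl_calibrate(d, u, totals, tol=1e-10, max_iter=100):
    """Hypothetical helper: calibrate weights under an assumed backward KL distance.

    Minimises sum_i [d_i log(d_i / w_i) + w_i - d_i] subject to
    sum_i w_i u_i = totals. Stationarity of the Lagrangian gives
    w_i = d_i / (1 - lam @ u_i); lam is found by damped Newton steps.
    """
    d = np.asarray(d, dtype=float)
    u = np.asarray(u, dtype=float)
    totals = np.asarray(totals, dtype=float)
    lam = np.zeros(u.shape[1])
    scale = max(1.0, float(np.max(np.abs(totals))))  # relative convergence tolerance

    def dual(l):
        # Concave dual objective; its gradient is totals - sum_i w_i u_i.
        return np.sum(d * np.log(1.0 - u @ l)) + l @ totals

    for _ in range(max_iter):
        denom = 1.0 - u @ lam
        w = d / denom
        g = u.T @ w - totals  # calibration residuals
        if np.max(np.abs(g)) < tol * scale:
            return w, lam
        jac = (u * (d / denom**2)[:, None]).T @ u  # derivative of g with respect to lam
        step = np.linalg.solve(jac, g)
        t, current = 1.0, dual(lam)
        # Halve the step until all weights stay positive and the dual does not fall.
        while t > 1e-12:
            cand = lam - t * step
            if np.all(1.0 - u @ cand > 0) and dual(cand) >= current - 1e-12 * (1.0 + abs(current)):
                lam = cand
                break
            t /= 2.0
        else:
            raise RuntimeError("line search failed; constraints may be infeasible")
    raise RuntimeError("Newton iterations did not converge")


def calibrated_mean(y_s, d, yhat_s, yhat_u):
    """Hypothetical helper: calibrate on (1, yhat) and return (mean estimate, weights)."""
    yhat_s = np.asarray(yhat_s, dtype=float)
    yhat_u = np.asarray(yhat_u, dtype=float)
    n_pop = yhat_u.size
    u = np.column_stack([np.ones_like(yhat_s), yhat_s])  # size + prediction constraints
    totals = np.array([n_pop, yhat_u.sum()])
    w, _ = backward_kl_calibrate(d, u, totals)
    return float(np.sum(w * np.asarray(y_s, dtype=float)) / n_pop), w


if __name__ == "__main__":
    # Synthetic population; inclusion probabilities rise with x (selection bias).
    rng = np.random.default_rng(0)
    n_pop = 5000
    x = rng.uniform(0.0, 2.0, n_pop)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 1.0, n_pop)
    pi = 0.02 + 0.04 * x
    s = rng.uniform(size=n_pop) < pi
    d = 1.0 / pi[s]  # original weights; known here, estimated in practice
    X_u = np.column_stack([np.ones(n_pop), x])
    beta = np.linalg.lstsq(X_u[s], y[s], rcond=None)[0]  # OLS stand-in for the penalized fit
    est, w = calibrated_mean(y[s], d, X_u[s] @ beta, X_u @ beta)
    print(f"calibrated mean {est:.3f}, true mean {y.mean():.3f}")
```

Because the weights satisfy $\sum_{S} w_i\hat y_i = \sum_{U}\hat y_i$, the estimate $N^{-1}\sum_{S} w_i y_i$ equals the difference form $N^{-1}[\sum_{U}\hat y_i + \sum_{S} w_i(y_i - \hat y_i)]$, which is one way the penalized-regression parameters can enter an estimator such as that of claim 7.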

Description

Model auxiliary calibration estimation method, device, electronic equipment and medium

Technical Field

The present invention relates to the field of data processing technology, and in particular to a model-assisted calibration estimation method and apparatus, an electronic device, and a medium.

Background

With the development of big data and the Internet, online surveys have become increasingly popular and are now an important sampling survey method, widely used in market research, opinion polling, academic research and other fields. However, most online survey samples are non-probability samples whose inclusion probabilities are unknown, which makes statistical inference about the population difficult; how to make population inferences from non-probability samples has become a prominent research problem.

Depending on whether the sample is assumed to come from a superpopulation model, current methods for population estimation from non-probability samples fall into two main categories: pseudo-design-based estimation and model-based estimation. If the target variable of the finite population is treated as fixed, a method that estimates the inclusion probabilities of the non-probability sample units and uses them to make inferences about the population is a pseudo-design-based method. If the finite population is assumed to be drawn from an infinite superpopulation, the target variable is treated as a random variable, and a method that infers the population by building a superpopulation model for the target variable is a model-based method. Model-assisted calibration estimation for non-probability samples has received relatively little discussion in the prior art.
Only Chen et al. (2018) discussed model-assisted calibration estimation of non-probability samples under the chi-square distance; Pan Yingli et al. (2021) discussed model-assisted SCAD calibration and adaptive LASSO (ALASSO) calibration of non-probability samples under the chi-square distance; and Wang Xiaoning et al. (2025) explored fusion inference of non-probability and probability samples using adaptive LASSO model-assisted calibration under the chi-square distance. However, Wu and Lu (2016) showed that calibration estimation under the corrected backward Kullback-Leibler distance performs better than under the chi-square distance. Liu et al. (2024) compared calibration estimation of non-probability samples under different distances but did not examine the corrected backward Kullback-Leibler distance. The prior art therefore lacks a model-assisted calibration estimation method for non-probability samples under the corrected backward Kullback-Leibler distance that would improve their calibration estimation.

Disclosure of Invention

In view of the foregoing, it is desirable to provide a model-assisted calibration estimation method, apparatus, electronic device and medium for the calibration estimation of non-probability samples.
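To make the contrast between the two distances concrete, the following sketch evaluates a chi-square distance and a backward Kullback-Leibler distance between calibrated weights w and original weights d, in the assumed forms $\sum_i (w_i - d_i)^2/(2d_i)$ and $\sum_i [d_i\log(d_i/w_i) + w_i - d_i]$; the exact (for example "corrected") variants used in the cited works may differ, and the function names are hypothetical. It also computes the closed-form chi-square calibration weights $w_i = d_i(1 + \boldsymbol\lambda^{\top}\mathbf{u}_i)$ on a toy four-unit sample, where one weight turns negative. The backward KL distance is undefined for non-positive weights, so its minimizer keeps every weight positive. This illustrates one structural difference between the distances, not the efficiency comparison reported by Wu and Lu (2016).

```python
import numpy as np


def chi_square_distance(w, d):
    # Assumed chi-square form: sum_i (w_i - d_i)^2 / (2 d_i).
    w, d = np.asarray(w, float), np.asarray(d, float)
    return float(np.sum((w - d) ** 2 / (2.0 * d)))


def backward_kl_distance(w, d):
    # Assumed backward KL form: sum_i [d_i log(d_i / w_i) + w_i - d_i].
    # It is undefined for w_i <= 0; this sketch returns infinity in that case.
    w, d = np.asarray(w, float), np.asarray(d, float)
    if np.any(w <= 0):
        return float("inf")
    return float(np.sum(d * np.log(d / w) + w - d))


def chi_square_calibrate(d, u, totals):
    # Minimising the chi-square distance subject to sum_i w_i u_i = totals
    # gives w_i = d_i (1 + lam @ u_i), where lam solves a linear system.
    d, u = np.asarray(d, float), np.asarray(u, float)
    m = (u * d[:, None]).T @ u
    lam = np.linalg.solve(m, np.asarray(totals, float) - u.T @ d)
    return d * (1.0 + u @ lam)


d = np.ones(4)
u = np.column_stack([np.ones(4), np.arange(4.0)])  # calibrate on (1, x_i) with x = 0..3
w_chi = chi_square_calibrate(d, u, [4.0, 10.0])
print(w_chi)  # approximately [-0.2, 0.6, 1.4, 2.2]: the first weight is negative
# Finite chi-square distance; the backward KL distance is inf because a weight is negative.
print(chi_square_distance(w_chi, d), backward_kl_distance(w_chi, d))
```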
To achieve the above object, in a first aspect, the present invention provides a model-assisted calibration estimation method, comprising: extracting a non-probability sample from a target population, and establishing a multiple linear regression model between a study variable and covariates of the non-probability sample; estimating the model parameters of the multiple linear regression model to obtain estimated model parameters, and obtaining estimates of the study variable based on the estimated model parameters; obtaining a diagonal matrix corresponding to the original weights of the sample units in the non-probability sample, wherein the original weights are obtained from the inclusion probabilities of the sample units; obtaining a Kullback-Leibler distance between the diagonal matrix and a preset calibration weight; constructing calibration constraints for target calibration weights based on the estimates of the study variable; under the calibration constraints, calibrating the calibration weights by minimizing the Kullback-Leibler distance to obtain the target calibration weights; and determining an estimate of the population mean based on the target calibration weights. In one possible implementation, the multiple linear regression model is expressed as:

$$y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i,$$

where $y_i$ represents the study variable, $\mathbf{x}_i$ represents the covariates, $\boldsymbol{\beta}$ represents the parameters of the multiple linear regression model, and $\varepsilon_i$ represents the error term. In one possible implementation, estimating the model parameters in the multiple linear regression model to obtain estimated model parameters comprises: estimating model parameters in the multiple linea