CN-122022161-A - Robust inference method for knowledge cracks across multiple school districts based on generative quantile regression


Abstract

The invention discloses a robust inference method for knowledge cracks across multiple school districts based on generative quantile regression. The method comprises the following steps:

- tensor modeling and feature construction of multi-school-district academic data;
- constructing a local quantile regression function and a local loss function, then training and optimizing the local quantile regression function to obtain the local regression parameter vector of the ith school district;
- generating proxy samples for the central node;
- constructing a global quantile regression function and estimating the global knowledge-mastery quantile surface;
- constructing a knowledge crack tensor from the local and global quantile regression functions and calculating a composite crack index;
- structurally smoothing the composite crack indices based on the prerequisite relations among knowledge points and testing crack significance, which forms for each school district a crack feature vector carrying structural features;
- clustering the crack feature vectors of the school districts;
- constructing a crack matrix to obtain the robust inference result set for knowledge cracks across multiple school districts.

Inventors

BAI XIANGYU
LI YONGFU

Assignees

四川启鸣达人科技有限公司

Dates

Publication Date: 20260512
Application Date: 20260129

Claims (10)

1. A robust inference method for knowledge cracks across multiple school districts based on generative quantile regression, characterized by comprising the following steps: performing tensor modeling and feature construction on the academic data of multiple school districts; constructing a local quantile regression function and a local loss function in the ith school district, and training and optimizing the local quantile regression function to obtain the local regression parameter vector of the ith school district; taking the local regression parameter vector of the ith school district as a conditional distribution generator and generating proxy samples for the central node; constructing a global quantile regression function and estimating the global knowledge-mastery quantile surface; constructing a knowledge crack tensor from the local quantile regression function and the global quantile regression function, and calculating a composite crack index; structurally smoothing the composite crack indices based on the prerequisite relations among knowledge points and testing crack significance, to form for any school district a crack feature vector carrying structural features; clustering the crack feature vectors of the school districts; and constructing a crack matrix to obtain the robust inference result set for knowledge cracks across multiple school districts.
2. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 1, wherein performing tensor modeling and feature construction on the multi-school-district academic data comprises the following steps: modeling the observation indices of the academic data of the multiple school districts to obtain the academic performance index $y_{i,j,k,t}$ of school district $i$, student $j$, knowledge point $k$ and time step $t$, where $i \in \{1, \dots, I\}$ and $I$ denotes the number of school districts, $j \in \{1, \dots, J_i\}$ and $J_i$ denotes the number of students in school district $i$, $k \in \{1, \dots, K\}$ and $K$ denotes the total number of knowledge points, and $t \in \{1, \dots, T\}$ and $T$ denotes the number of evaluation rounds, i.e. the total number of time steps; constructing, for the academic performance index $y_{i,j,k,t}$ of any school district $i$, student $j$, knowledge point $k$ and time step $t$, a feature vector $\mathbf{x}_{i,j,k,t} = (x_{i,j,k,t}^{(1)}, \dots, x_{i,j,k,t}^{(d)})^{\top} \in \mathbb{R}^{d}$, where $d$ denotes the feature dimension and $\top$ denotes matrix transposition; building an observation tensor $\mathcal{Y}$ from the academic performance indices $y_{i,j,k,t}$; and building a feature tensor $\mathcal{X}$ from the feature vectors $\mathbf{x}_{i,j,k,t}$.
3. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 2, wherein constructing a local quantile regression function and a local loss function in the ith school district, and training and optimizing the local quantile regression function to obtain the local regression parameter vector of the ith school district, comprises the following steps: presetting a set of quantile levels $\{\tau_1, \tau_2, \dots, \tau_M\}$ with $0 < \tau_1 < \dots < \tau_M < 1$, where $M$ denotes the number of quantile levels; introducing the local regression parameter vector $\boldsymbol{\theta}_i$ of school district $i$ and constructing the local quantile regression function $\hat{q}_{i,k}(\tau \mid \mathbf{x}) = f(\mathbf{x}; k, \tau, \boldsymbol{\theta}_i)$, which denotes the $\tau$-quantile of the academic performance of school district $i$ at knowledge point $k$ given the input feature vector $\mathbf{x} \in \mathbb{R}^{d}$, where $f$ is a regression function chosen from a linear model, a kernel method and a shallow neural network, and $\mathbb{R}^{d}$ denotes the $d$-dimensional real vector space; taking the set of all samples of school district $i$, $\mathcal{D}_i = \{(\mathbf{x}_{i,j,k,t}, y_{i,j,k,t})\}$, and constructing the local loss function
$$\mathcal{L}_i(\boldsymbol{\theta}_i) = \sum_{(\mathbf{x}_{i,j,k,t},\, y_{i,j,k,t}) \in \mathcal{D}_i} \sum_{m=1}^{M} \rho_{\tau_m}\big(y_{i,j,k,t} - \hat{q}_{i,k}(\tau_m \mid \mathbf{x}_{i,j,k,t})\big) + \lambda\, \Omega(\boldsymbol{\theta}_i),$$
where $\mathcal{L}_i$ denotes the local loss function of school district $i$; $\rho_{\tau_m}(u) = u\,(\tau_m - \mathbb{1}\{u < 0\})$ denotes the pinball loss for quantile level $\tau_m$, $m \in \{1, \dots, M\}$; $\tau_m$ denotes the $m$th quantile level; $\lambda$ denotes the regularization parameter; and $\Omega(\boldsymbol{\theta}_i)$ denotes the regularization term acting on the local regression parameter vector $\boldsymbol{\theta}_i$; and minimizing the local loss function $\mathcal{L}_i$ to optimize the local regression parameter vector $\boldsymbol{\theta}_i$ of school district $i$.
4. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 3, wherein taking the local regression parameter vector of the ith school district as a conditional distribution generator, generating proxy samples for the central node, constructing a global quantile regression function and estimating the global knowledge-mastery quantile surface comprises the following steps: setting a reference feature scene set $\{\mathbf{x}_1^{\mathrm{ref}}, \dots, \mathbf{x}_R^{\mathrm{ref}}\}$, where $R$ denotes the number of reference scenes; using the trained local quantile regression function $\hat{q}_{i,k}(\tau_m \mid \cdot)$ of school district $i$ at knowledge point $k$ and quantile level $\tau_m$ to compute the proxy scores $\hat{q}_{i,k,m,r} = \hat{q}_{i,k}(\tau_m \mid \mathbf{x}_r^{\mathrm{ref}})$, where $\hat{q}_{i,k,m,r}$ denotes the predicted quantile value of school district $i$ at knowledge point $k$ and quantile level $\tau_m$ under reference scene $r$, and $\mathbf{x}_r^{\mathrm{ref}}$ denotes the $r$th reference feature vector, $r \in \{1, \dots, R\}$; generating the proxy samples of the central node, $\{(\mathbf{x}_r^{\mathrm{ref}}, \hat{q}_{i,k,m,r})\}$, over all school districts $i$, quantile levels $\tau_m$ and reference scenes $r$; and, based on the proxy samples of the central node, constructing for any knowledge point $k$ and quantile level $\tau_m$ the global quantile regression function $\hat{q}_{k}^{\mathrm{glob}}(\tau_m \mid \mathbf{x}) = g(\mathbf{x}; \tau_m, \boldsymbol{\beta}_k)$, where $g$ denotes the regression function adopted by the central node and $\boldsymbol{\beta}_k$ denotes the global parameter vector of knowledge point $k$.
5. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 4, further comprising: introducing a linear perturbation based on the residual standard deviation into the predicted quantile value $\hat{q}_{i,k,m,r}$ of school district $i$ at knowledge point $k$ and quantile level $\tau_m$ under reference scene $r$, with the expression $\tilde{q}_{i,k,m,r} = \hat{q}_{i,k,m,r} + \varepsilon\, \hat{\sigma}_{i,k,m}$. In this expression, $\tilde{q}_{i,k,m,r}$ denotes that predicted quantile value after the linear perturbation based on the residual standard deviation has been introduced. $\varepsilon$ denotes a small-amplitude random variable with zero mean. $\hat{\sigma}_{i,k,m}$ denotes the residual standard deviation estimate of school district $i$ at knowledge point $k$ and quantile level $\tau_m$, obtained in the local training stage.
6. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 4, further comprising: introducing a perturbation based on the spacing between quantile levels, wherein $\hat{q}_{i,k}(\tau_m \mid \cdot)$ denotes the local quantile regression function of school district $i$ at knowledge point $k$ and quantile level $\tau_m$, $\varepsilon$ denotes a small-amplitude random variable with zero mean, and $\delta_{i,k,m}$ denotes the perturbation amount configured according to the natural spacing between quantile levels.
7. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 4, 5 or 6, wherein constructing a knowledge crack tensor using the local quantile regression function and the global quantile regression function and calculating a composite crack index comprises: constructing the knowledge crack tensor $\Delta = [\Delta_{i,k,m,r}]$, where $\Delta_{i,k,m,r}$ denotes the quantile difference between the local and global quantile regression functions for school district $i$, knowledge point $k$ and quantile level $\tau_m$ under reference scene $r$; and calculating the composite crack index $D_{i,k}$ of school district $i$ at knowledge point $k$, in which the quantile differences $\Delta_{i,k,m,r}$ are weighted by non-negative weight coefficients $w_m$.
8. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 7, wherein structurally smoothing the composite crack indices based on the prerequisite relations among knowledge points, and testing crack significance, to form a crack feature vector with structural features for any school district, comprises: letting all knowledge points form the node set $V = \{1, \dots, K\}$; presetting the prerequisite relations among knowledge points to form the edge set $E$, where $(k', k) \in E$ indicates that knowledge point $k'$ is prerequisite content of knowledge point $k$; and introducing optimization variables to perform structured smoothing, wherein $\tilde{D}_{i,k}$ denotes the smoothed composite crack index of school district $i$ at knowledge point $k$, $s_{i,k}$ denotes the smoothed crack value of school district $i$ for knowledge point $k$, and $s_{i,k'}$ and $s_{i,k}$ denote the smoothed crack values of school district $i$ for the knowledge points $k'$ and $k$ of a prerequisite pair $(k', k) \in E$.
9. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 8, wherein clustering the crack feature vectors of any school district comprises the following steps: constructing the crack feature vector $\mathbf{v}_i$ of school district $i$; setting the number of school-district clusters to $C$ with cluster centers $\boldsymbol{\mu}_1, \dots, \boldsymbol{\mu}_C$, and constructing a school-district clustering objective function $J_{\mathrm{sch}}$ over the distances between each $\mathbf{v}_i$ and the center of its assigned cluster, where $c_i$ denotes the cluster index of school district $i$ and $\boldsymbol{\mu}_{c_i}$ denotes the center vector of school-district cluster $c_i$; and constructing the crack feature vector $\mathbf{u}_k$ of knowledge point $k$ and a knowledge-point clustering objective function $J_{\mathrm{kp}}$ defined in the same way, where $b_k$ denotes the cluster index of knowledge point $k$ and $\boldsymbol{\nu}_{b_k}$ denotes the center vector of knowledge-point cluster $b_k$.
10. The robust inference method for knowledge cracks across multiple school districts based on generative quantile regression according to claim 9, wherein constructing a crack matrix and obtaining the robust inference result set for knowledge cracks across multiple school districts comprises the following steps: constructing a crack matrix indexed by school district $i$ and knowledge point $k$; obtaining a crack significance grade label based on a significance statistic, where $\hat{\sigma}_{i,k}$ denotes the standard deviation estimate for school district $i$ and knowledge point $k$, and $z_{i,k}$ denotes the test statistic of crack significance; and presetting crack thresholds and, in combination with the test statistic $z_{i,k}$, classifying the knowledge cracks into several grades.
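The tensor modeling of claim 2 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the claimed implementation. The record layout, the function name build_tensors and the use of nested lists instead of a dense array are all hypothetical choices. Indices are zero-based. Because the number of students $J_i$ differs between school districts, each district gets its own $J_i \times K \times T$ array, and unobserved cells are left as None.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical record layout: (district i, student j, knowledge point k, time step t, score y, features x)
Record = Tuple[int, int, int, int, float, Sequence[float]]


def build_tensors(records: List[Record], n_students: Dict[int, int], K: int, T: int, d: int):
    """Build per-district observation tensors Y[i][j][k][t] and feature tensors X[i][j][k][t].

    Districts have different student counts J_i, so each district gets its own
    J_i x K x T array; cells without an assessment stay None.
    """
    Y: Dict[int, List[List[List[Optional[float]]]]] = {}
    X: Dict[int, List[List[List[Optional[List[float]]]]]] = {}
    for i, J in n_students.items():
        Y[i] = [[[None] * T for _ in range(K)] for _ in range(J)]
        X[i] = [[[None] * T for _ in range(K)] for _ in range(J)]
    for i, j, k, t, y, x in records:
        if len(x) != d:
            raise ValueError(f"feature vector must have dimension d={d}, got {len(x)}")
        Y[i][j][k][t] = float(y)
        X[i][j][k][t] = list(x)
    return Y, X


# Usage: two districts with 3 and 4 students, K=2 knowledge points, T=2 rounds, d=2 features.
records = [(0, 0, 1, 0, 78.0, [0.5, 1.0]), (1, 2, 0, 1, 92.0, [0.8, 0.0])]
Y, X = build_tensors(records, {0: 3, 1: 4}, K=2, T=2, d=2)
```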
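For claim 3, the sketch below assumes the simplest of the three model families the claim lists, a linear model $f(\mathbf{x}) = \mathbf{w}^{\top}\mathbf{x} + b$. It uses a separate parameter vector for each quantile level, so the loss summed over $m$ splits into one fit per $\tau_m$, and it takes a squared L2 penalty on the weights as $\Omega$. The functions pinball, fit_linear_quantile and local_loss, and the decaying step-size schedule, are hypothetical. A kernel or shallow-network model, or an off-the-shelf quantile regression solver, could take the place of the subgradient loop.

```python
import random
from typing import List, Sequence


def pinball(u: float, tau: float) -> float:
    """Pinball (check) loss rho_tau(u) = u * (tau - 1{u < 0}) for the residual u = y - q."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def fit_linear_quantile(xs: List[Sequence[float]], ys: List[float], tau: float,
                        lam: float = 1e-3, lr: float = 0.05, epochs: int = 2000,
                        seed: int = 0) -> List[float]:
    """Fit q(x) = w.x + b at level tau by stochastic subgradient descent on
    pinball loss + lam * ||w||^2. Returns [w_1, ..., w_d, b]."""
    rng = random.Random(seed)
    d = len(xs[0])
    theta = [0.0] * (d + 1)
    order = list(range(len(xs)))
    for epoch in range(epochs):
        rng.shuffle(order)
        step = lr / (1.0 + 0.01 * epoch)  # decaying step size
        for n in order:
            x, y = xs[n], ys[n]
            q = sum(w * v for w, v in zip(theta, x)) + theta[-1]
            # subgradient of rho_tau(y - q) with respect to q
            g = -(tau - (1.0 if y - q < 0 else 0.0))
            for p in range(d):
                theta[p] -= step * (g * x[p] + 2.0 * lam * theta[p])
            theta[-1] -= step * g  # the intercept is not regularized
    return theta


def local_loss(models: List[List[float]], xs: List[Sequence[float]], ys: List[float],
               taus: List[float], lam: float) -> float:
    """Summed pinball loss over all samples and levels plus an L2 penalty on the
    weights; models[m] holds the linear parameters for taus[m]."""
    total = 0.0
    for theta, tau in zip(models, taus):
        for x, y in zip(xs, ys):
            q = sum(w * v for w, v in zip(theta, x)) + theta[-1]
            total += pinball(y - q, tau)
        total += lam * sum(w * w for w in theta[:-1])
    return total


# Usage: with an all-zero feature the weight stays 0 and the intercept tracks the
# empirical 0.9-quantile of the scores 1..100 (about 90).
xs = [[0.0] for _ in range(100)]
ys = [float(v) for v in range(1, 101)]
theta_90 = fit_linear_quantile(xs, ys, tau=0.9, epochs=500)
```

The pinball loss penalizes under-prediction with weight $\tau$ and over-prediction with weight $1-\tau$. For this reason its minimizer is a quantile rather than a mean, and this is the property the method relies on for tail-sensitive comparisons.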
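The proxy-sample step of claim 4 appears intended to let each school district report only its local model's predictions at shared reference scenes, rather than raw student records. In the sketch below the local models are plain Python callables. The central regression $g$ is replaced by a deliberately simple stand-in: a weighted mean of the district proxy values for each (knowledge point, quantile level, reference scene). This is the weighted least-squares fit of a model that takes a free value at every reference scene. Both choices and all names (proxy_samples, global_surface) are assumptions for illustration.

```python
from typing import Callable, Dict, Iterable, List, Optional, Sequence, Tuple

# Hypothetical interface: local_models[i](x, k, tau) -> predicted tau-quantile of district i.
LocalModel = Callable[[Sequence[float], int, float], float]
Key = Tuple[int, int, int, int]  # (district i, knowledge point k, level index m, scene r)


def proxy_samples(local_models: Dict[int, LocalModel], scenes: List[Sequence[float]],
                  K: int, taus: List[float]) -> Dict[Key, float]:
    """q_hat[(i, k, m, r)] = local quantile prediction of district i at reference scene x_r."""
    return {(i, k, m, r): f(x_r, k, tau)
            for i, f in local_models.items()
            for k in range(K)
            for m, tau in enumerate(taus)
            for r, x_r in enumerate(scenes)}


def global_surface(q_hat: Dict[Key, float], districts: Iterable[int], K: int, M: int, R: int,
                   weights: Optional[Dict[int, float]] = None) -> Dict[Tuple[int, int, int], float]:
    """Stand-in for the central regression g: weighted mean of the district proxy
    values for each (k, m, r)."""
    districts = list(districts)
    if weights is None:
        weights = {i: 1.0 for i in districts}
    total = sum(weights[i] for i in districts)
    return {(k, m, r): sum(weights[i] * q_hat[(i, k, m, r)] for i in districts) / total
            for k in range(K) for m in range(M) for r in range(R)}


# Usage: two toy districts whose local models differ by a constant 10 points.
local_models = {0: lambda x, k, tau: 60.0 + 20.0 * tau + x[0],
                1: lambda x, k, tau: 70.0 + 20.0 * tau + x[0]}
q_hat = proxy_samples(local_models, scenes=[[0.0], [10.0]], K=1, taus=[0.5])
q_glob = global_surface(q_hat, districts=[0, 1], K=1, M=1, R=2)
```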
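Claims 5 and 6 add zero-mean, small-amplitude noise to the proxy quantile values. In claim 5 the noise is scaled by a residual standard deviation, and in claim 6 it is scaled by the spacing between quantile levels. The sketch below makes the following assumptions, none of which comes from the claims:

- Gaussian noise for claim 5 and uniform noise for claim 6 (the claims only require zero mean and small amplitude).
- The gap to the next quantile level is taken as the "natural spacing".
- The perturbed values are re-sorted so that the quantiles stay monotone.

The re-sorting step and all function names are likewise assumptions.

```python
import random
import statistics
from typing import List


def residual_sd(ys: List[float], preds: List[float]) -> float:
    """Residual standard deviation of a local fit, computed during local training."""
    return statistics.stdev([y - p for y, p in zip(ys, preds)])


def perturb_residual(q_hat: float, sigma_hat: float, rng: random.Random,
                     scale: float = 0.05) -> float:
    """Claim-5 style: q_tilde = q_hat + eps * sigma_hat, with eps ~ N(0, scale^2)."""
    return q_hat + rng.gauss(0.0, scale) * sigma_hat


def quantile_gaps(q: List[float]) -> List[float]:
    """Natural spacing of a monotone quantile vector q(tau_1) <= ... <= q(tau_M):
    the gap to the next level, with the last level reusing the previous gap."""
    gaps = [q[m + 1] - q[m] for m in range(len(q) - 1)]
    return gaps + [gaps[-1]] if gaps else [0.0]


def perturb_spacing(q: List[float], rng: random.Random, scale: float = 0.1) -> List[float]:
    """Claim-6 style: each level moves by eps_m * gap_m with eps_m ~ U(-scale, scale);
    the result is sorted so the perturbed quantiles cannot cross."""
    gaps = quantile_gaps(q)
    return sorted(v + rng.uniform(-scale, scale) * g for v, g in zip(q, gaps))
```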
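The sketch below covers claim 7. It assumes that the crack value is the global prediction minus the local prediction, so a positive value means the school district lies below the global surface. It also assumes that the composite index averages over reference scenes before the non-negative level weights $w_m$ are applied. The claim fixes neither the sign convention nor the averaging, so both are assumptions, as are the names crack_tensor and composite_index.

```python
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, int, int, int]  # (district i, knowledge point k, level index m, scene r)


def crack_tensor(q_local: Dict[Key, float], q_global: Dict[Tuple[int, int, int], float],
                 districts: Sequence[int], K: int, M: int, R: int) -> Dict[Key, float]:
    """Delta[i,k,m,r] = global minus local prediction (assumed sign: positive = district below)."""
    return {(i, k, m, r): q_global[(k, m, r)] - q_local[(i, k, m, r)]
            for i in districts for k in range(K) for m in range(M) for r in range(R)}


def composite_index(delta: Dict[Key, float], districts: Sequence[int], K: int,
                    w: List[float], R: int) -> Dict[Tuple[int, int], float]:
    """D[i,k] = sum_m w_m * (mean over scenes r of Delta[i,k,m,r]), with w_m >= 0."""
    if any(wm < 0 for wm in w):
        raise ValueError("level weights must be non-negative")
    return {(i, k): sum(w[m] * sum(delta[(i, k, m, r)] for r in range(R)) / R
                        for m in range(len(w)))
            for i in districts for k in range(K)}


# Usage: one district, one knowledge point, two quantile levels, two reference scenes.
q_local = {(0, 0, 0, 0): 60.0, (0, 0, 0, 1): 62.0, (0, 0, 1, 0): 80.0, (0, 0, 1, 1): 82.0}
q_global = {(0, 0, 0): 65.0, (0, 0, 1): 65.0, (0, 1, 0): 85.0, (0, 1, 1): 83.0}
delta = crack_tensor(q_local, q_global, districts=[0], K=1, M=2, R=2)
D = composite_index(delta, districts=[0], K=1, w=[0.7, 0.3], R=2)
```

The example puts more weight on the lower quantile level. This makes the index more sensitive to how the weaker students in a district compare with the global surface; the weighting is an illustrative choice only.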
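For the structured smoothing of claim 8, one common choice is a quadratic objective for each school district, $\min_{s} \sum_{k} (s_{k} - D_{k})^2 + \lambda \sum_{(k',k) \in E} (s_{k} - s_{k'})^2$, which treats every prerequisite edge as undirected. This objective is assumed here rather than taken from the claim. Setting its gradient to zero gives $(I + \lambda L)\,s = D$, where $L$ is the Laplacian of the prerequisite graph. The sketch solves this system with Gauss-Seidel sweeps, which converge because the matrix is symmetric positive definite. The function name is hypothetical.

```python
from typing import Dict, List, Tuple


def smooth_on_prerequisites(D: List[float], edges: List[Tuple[int, int]], lam: float = 1.0,
                            iters: int = 500, tol: float = 1e-10) -> List[float]:
    """Minimize sum_k (s_k - D_k)^2 + lam * sum_{(a,b) in E} (s_a - s_b)^2 for one district.

    Each coordinate update solves s_k * (1 + lam * deg_k) = D_k + lam * sum of neighbor values,
    which is the k-th row of (I + lam * L) s = D.
    """
    K = len(D)
    nbrs: Dict[int, List[int]] = {k: [] for k in range(K)}
    for a, b in edges:  # prerequisite edges (k', k), treated as undirected
        nbrs[a].append(b)
        nbrs[b].append(a)
    s = list(D)
    for _ in range(iters):
        change = 0.0
        for k in range(K):
            new = (D[k] + lam * sum(s[n] for n in nbrs[k])) / (1.0 + lam * len(nbrs[k]))
            change = max(change, abs(new - s[k]))
            s[k] = new
        if change < tol:
            break
    return s


# Usage: knowledge point 0 is a prerequisite of 1; point 2 has no prerequisite links.
s_pair = smooth_on_prerequisites([4.0, 0.0], [(0, 1)], lam=1.0)
s_iso = smooth_on_prerequisites([1.0, 2.0, 3.0], [(0, 1)], lam=1.0)
```

Because the rows of a graph Laplacian sum to zero, this choice preserves the total of the crack indices within each connected component. A large crack on one knowledge point is partly shared with its prerequisite neighbors, and isolated knowledge points are left unchanged.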
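Both clustering objectives in claim 9 measure the distance between each crack feature vector and the center of its assigned cluster. If that distance is taken to be squared Euclidean, which turns both objectives into the k-means objective, Lloyd's algorithm below finds a local minimum. The same function can cluster the school-district vectors $\mathbf{v}_i$ or the knowledge-point vectors $\mathbf{u}_k$. The distance choice, the random initialization and the name kmeans are illustrative assumptions.

```python
import random
from typing import List, Sequence, Tuple


def kmeans(points: List[Sequence[float]], n_clusters: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[int], List[List[float]]]:
    """Lloyd's algorithm for J = sum_i ||v_i - mu_{c_i}||^2; returns (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, n_clusters)]  # randomly chosen data points
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each vector goes to its nearest center (squared Euclidean distance).
        new_labels = [min(range(n_clusters),
                          key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers already match them
        labels = new_labels
        # Update step: move each center to the mean of its members (keep it if the cluster is empty).
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Usage: four school-district crack feature vectors forming two well-separated groups.
district_vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
labels, centers = kmeans(district_vectors, n_clusters=2)
```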
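The sketch below covers claim 10. It assumes that the test statistic is the crack index divided by its standard deviation estimate, $z_{i,k} = D_{i,k} / \hat{\sigma}_{i,k}$. As hypothetical crack thresholds it uses the one-sided standard-normal critical values 1.645, 2.326 and 3.090 (the 5%, 1% and 0.1% levels). The claim leaves the form of the statistic, the thresholds and the number of grades open, and the names crack_matrix and crack_grade are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def crack_matrix(D: Dict[Tuple[int, int], float], n_districts: int, K: int) -> List[List[float]]:
    """Arrange crack indices as a matrix: row i = school district, column k = knowledge point."""
    return [[D[(i, k)] for k in range(K)] for i in range(n_districts)]


def crack_grade(d_ik: float, sd_ik: float,
                thresholds: Sequence[float] = (1.645, 2.326, 3.090)) -> int:
    """z = d / sd; the grade is how many ascending thresholds z reaches (0 = not significant)."""
    if sd_ik <= 0:
        raise ValueError("standard deviation estimate must be positive")
    z = d_ik / sd_ik
    return sum(1 for c in thresholds if z >= c)


# Usage: two districts x two knowledge points with a common standard deviation estimate of 1.
D = {(0, 0): 0.5, (0, 1): 3.0, (1, 0): -1.0, (1, 1): 1.8}
matrix = crack_matrix(D, n_districts=2, K=2)
grades = [[crack_grade(v, 1.0) for v in row] for row in matrix]
```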

Description

Robust inference method for knowledge cracks across multiple school districts based on generative quantile regression

Technical Field

The invention relates to the technical field of online education, and in particular to a robust inference method for knowledge cracks across multiple school districts based on generative quantile regression.

Background

As online education deepens and regional education becomes increasingly digitalized, a parallel teaching pattern spanning multiple school districts, multiple years and multiple platforms has become common within the same education group or administrative region. Each school district accumulates large volumes of academic data from daily teaching, homework assignment and examination assessment. These data, however, differ from a traditional centralized data warehouse in both physical form and statistical characteristics. As a result, the prior art can hardly serve directly the need of regional teaching and research departments to diagnose systematic differences accurately at the "school district-knowledge point" level. The difficulties lie in three aspects.

First, the data come from multiple heterogeneous sources and are stored in a distributed manner. On the one hand, different school districts may use item-bank, examination and homework systems from different vendors or in different versions, whose data structures, field semantics and encoding schemes differ. On the other hand, academic data are usually stored locally in each school district or in the database of each business system, for reasons such as management boundaries, network structure and compliance requirements. Together these factors produce a multi-source, heterogeneous, distributed storage pattern. Consequently, simply aggregating all the data for unified modeling is costly in practice and is not feasible at all in some scenarios.
Second, the data exhibit heavy-tailed distributions and high heterogeneity. In real teaching environments, student scores often show both characteristics. On the one hand, some students miss exams, answer at random or have others sit the test for them, which produces many extremely high or extremely low scores. On the other hand, school districts differ in student intake, class settings and tiered-teaching arrangements. As a result, the score distributions of different school districts on the same knowledge point differ, and the difference between two school districts is not a simple overall upward or downward shift. Horizontal comparison that relies only on indicators such as the average score or average accuracy of a single assessment is easily distorted by extreme values. It also can hardly reflect the real differences among school districts for student groups at different quantiles of the score distribution.

Third, cross-school-district academic analysis must be carried out at knowledge-point granularity. Education administration departments, teaching and research institutions and group academic affairs departments are not concerned only with the performance of individual students within a single school district. They also need to judge, from the "school district-knowledge point" perspective, whether different school districts differ systematically in their mastery of specific knowledge points or knowledge modules. For example, a school district's overall mastery of knowledge modules such as functions and solid geometry may remain persistently below the regional average or the level of comparable schools.
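The sensitivity of averages to extreme values noted in the second point can be seen in a toy example (the scores are invented for illustration). Ten typical scores are compared with the same ten scores plus two zeros recorded for absent students. The mean falls by about 13.5 points, while the median moves by only 2 points. The helper name mean_and_median is hypothetical.

```python
import statistics
from typing import List, Tuple


def mean_and_median(scores: List[float]) -> Tuple[float, float]:
    """Return (mean, median) of a list of scores."""
    return statistics.mean(scores), statistics.median(scores)


typical = [70, 72, 75, 78, 80, 82, 85, 88, 90, 92]
with_absentees = typical + [0, 0]  # two absent students recorded as 0

mean_before, median_before = mean_and_median(typical)
mean_after, median_after = mean_and_median(with_absentees)
```

The median is the 0.5-quantile. The same robustness argument applies to other quantile levels, which is why comparisons across several quantile levels are less exposed to a handful of extreme samples than mean-based comparisons.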
Unlike individual-oriented recommendation and personalized learning-path planning, such needs emphasize comparing the overall mastery of different school districts at knowledge-point granularity and identifying "systematic knowledge cracks", that is, cases in which a school district remains consistently weaker than the overall level on some knowledge points over a long period. At present, the conventional approach is to compute the average score and average accuracy of each school district per knowledge point, or to perform simple regression analysis of scores against item difficulty on centralized data. These approaches have the following technical problems: (1) school-district comparison based only on means outputs only the central tendency, cannot describe the high-score and low-score tails, and is easily disturbed by extreme samples; (2) single-point regression or tree models give only conditional expectations, can hardly provide distributional difference information at the quantile level, and depend on centralized data, so they cannot adapt to the reality of distributed storage; (3) a personaliz