CN-122022061-A - Real-time student score prediction method based on a big-data-driven improved regression model

Abstract

The invention relates to the technical field of educational data analysis and artificial intelligence, and in particular to a real-time student score prediction method based on a big-data-driven improved regression model. The method collects multi-source data on students, such as historical scores, learning behavior, knowledge point difficulty and staged tests; constructs basic features covering learning investment, learning effect and stability; and generates several types of cross features using rules such as normalization, logarithmic decay, saturation mapping and confidence-interval scaling. A consistency index is built from the mean and variance of each feature's importance across different student populations, enabling cross-population feature stability screening, and redundant features are suppressed using feature correlation. A regression prediction model that fuses a tree model and a neural network is trained on the screened features; an incremental update is performed whenever newly added data exceeds a threshold, providing real-time score prediction and risk prompts. The invention can significantly improve prediction accuracy and model robustness, and is suitable for online education and intelligent teaching systems.
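The cross-population screening described above can be sketched as follows. This is a minimal Python illustration, not the patented implementation: the consistency index is assumed here to be the mean importance divided by its standard deviation across groups (the source does not give a legible formula), the redundancy step is a greedy pass from highest to lowest index, and the names `consistency_scores`, `select_features`, `c_min` and `rho_max` are hypothetical.

```python
import math
from statistics import mean, pvariance


def consistency_scores(importance, eps=1e-6):
    """Map feature -> (mean, variance, consistency index).

    importance: feature name -> list of importance values, one per student group.
    ASSUMPTION: consistency index = mean / (std + eps); a higher value means the
    feature matters to a similar degree in every group.
    """
    scores = {}
    for name, values in importance.items():
        mu = mean(values)
        var = pvariance(values, mu)  # population variance across groups
        scores[name] = (mu, var, mu / (math.sqrt(var) + eps))
    return scores


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(importance, columns, c_min=2.0, rho_max=0.9):
    """Keep stable features, then drop the lower-index member of highly correlated pairs.

    columns: feature name -> per-student values used for the correlation check.
    c_min and rho_max are hypothetical thresholds, not values from the patent.
    """
    scores = consistency_scores(importance)
    stable = [f for f, (_, _, c) in scores.items() if c >= c_min]
    stable.sort(key=lambda f: scores[f][2], reverse=True)  # highest index first
    kept = []
    for f in stable:
        # Greedy: a feature survives only if it is not too correlated with any kept one.
        if all(abs(pearson(columns[f], columns[g])) <= rho_max for g in kept):
            kept.append(f)
    return kept
```

For example, with three groups, a feature whose importance is 0.30, 0.32 and 0.31 gets a high index, while one scoring 0.05, 0.40 and 0.01 falls below `c_min` and is dropped.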

Inventors

  • JIANG YU
  • HU JIN

Assignees

  • 北京悦活教育科技有限责任公司

Dates

Publication Date
2026-05-12
Application Date
2026-02-11
Priority Date
2025-11-26

Claims (10)

  1. A real-time student score prediction method based on a big-data-driven improved regression model, characterized by comprising the following steps: S1, acquiring students' historical score data, learning behavior data, knowledge point difficulty data, staged test data and learning-process time-series data, and taking the acquired data as raw data; S2, extracting basic features from the raw data, including learning duration, number of reviews, chapter difficulty coefficient, assignment completion rate, number of wrong answers, score change, score fluctuation and the like; S3, constructing cross features from at least two basic features; S4, combining the basic features and the cross features into feature vectors for training; S5, running a feature screening algorithm on the feature vectors to obtain a target feature set; S6, training an improved regression prediction model on the target feature set; and S7, inputting the feature vector of a target student into the regression prediction model and outputting a predicted score.
  2. The method according to claim 1, wherein the cross features in step S3 comprise the following three types: investment-difficulty cross features, comprising combinations of learning duration and the difficulty coefficient of the corresponding knowledge points; investment-effect cross features, comprising combinations of the number of reviews and the historical score improvement; and process-stability cross features, comprising combinations of the variance of the most recent N quiz scores and the stage average score.
  3. The method according to claim 2, wherein the feature construction rules used in step S3 comprise: applying interval normalization to the basic features that participate in the cross calculation; scaling learning-duration features with a logarithmic decay function to model diminishing returns; applying a saturation upper threshold N_1 to review-count features, so that counts above N_1 are mapped through a slowed-growth function; and applying a confidence-interval scaling strategy to score-fluctuation features, increasing the feature weight when the fluctuation exceeds a preset threshold T_1.
  4. The method according to claim 1, wherein step S5 comprises: S51, training multiple sub-models on different student groups to obtain the importance value of each feature in each group; S52, calculating, for each feature, the mean and the variance of its importance values across the groups; S53, constructing from the mean and the variance a consistency index that characterizes how stable the feature's importance is across groups; and S54, when the consistency index satisfies the preset condition, determining the corresponding feature to be a highly stable cross-population feature.
  5. The method according to claim 4, wherein step S5 further comprises: based on the feature correlation matrix, determining any two features whose correlation coefficient exceeds a preset threshold, retaining only the one with the higher consistency index, and thereby constructing a redundancy-removed target feature set.
  6. The method according to claim 1, wherein the improved regression prediction model comprises a fusion of a tree-model regression structure and a neural-network regression structure, and the weights of the fusion model are adaptively updated based on the validation-set error or the loss function.
  7. The method according to claim 1, wherein, when the number of newly added examination records exceeds a set threshold, the parameters of the regression prediction model are incrementally updated on the newly added data set.
  8. The method according to claim 1, wherein step S7 further comprises: when the predicted score falls below the preset thresholds, outputting risk prompt information of the corresponding level.
  9. A student score prediction system, characterized in that the system is used to implement the real-time student score prediction method based on a big-data-driven improved regression model according to any one of claims 1 to 8, and comprises a data acquisition module, a feature construction module, a feature screening module, a model training module and a real-time prediction module; the data acquisition module acquires students' historical score data, learning behavior data, knowledge point difficulty data, staged examination data and learning-process time-series data from the teaching management system, the online learning platform and the examination system; the feature construction module preprocesses and standardizes the data, extracts basic features such as learning duration, number of reviews, chapter difficulty coefficient, assignment completion rate, number of wrong answers, score change and score fluctuation, constructs investment-difficulty, investment-effect and process-stability cross features from the basic features, and generates the corresponding feature vectors; the feature screening module calculates the mean and the variance of each feature's importance, constructs a consistency index, and eliminates redundant features based on the correlation matrix to obtain the target feature set; the model training module trains the improved regression prediction model on the target feature set and performs an incremental update of the model when the amount of newly added data exceeds a set threshold; and the real-time prediction module acquires the current feature vector of the target student, inputs it into the regression prediction model, outputs the corresponding predicted score, and generates risk prompt information of the corresponding level when the predicted score is lower than a preset threshold.
  10. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the real-time student score prediction method based on a big-data-driven improved regression model according to any one of claims 1 to 8.
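Claims 6 to 8 can be illustrated with a minimal Python sketch. The patent does not specify the weighting rule, the update routine or the threshold values, so the following are assumptions: fusion weights are taken proportional to each sub-model's inverse validation MSE, the incremental update is delegated to a caller-supplied `update_fn` (for example a warm-start or partial-fit step), and the two risk thresholds `t_high`/`t_low` are hypothetical. All function and class names are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


def mse(pred, truth):
    """Mean squared error of two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)


def fusion_weights(tree_val_pred, nn_val_pred, y_val, eps=1e-9):
    """ASSUMPTION: weight each sub-model by its inverse validation MSE, then normalize."""
    inv_tree = 1.0 / (mse(tree_val_pred, y_val) + eps)
    inv_nn = 1.0 / (mse(nn_val_pred, y_val) + eps)
    total = inv_tree + inv_nn
    return inv_tree / total, inv_nn / total


def fused_prediction(tree_pred, nn_pred, weights):
    """Weighted combination of the tree-model and neural-network predictions."""
    w_tree, w_nn = weights
    return w_tree * tree_pred + w_nn * nn_pred


def risk_level(score, t_high=60.0, t_low=45.0):
    """Hypothetical two-threshold risk tiering for a predicted score."""
    if score < t_low:
        return "high"
    if score < t_high:
        return "medium"
    return "none"


@dataclass
class IncrementalUpdater:
    threshold: int  # hypothetical record-count threshold for triggering an update
    update_fn: Callable[[List[Tuple]], None]  # caller-supplied incremental training step
    buffer: List[Tuple] = field(default_factory=list)

    def add_record(self, record) -> bool:
        """Buffer a new exam record; run the update once the count exceeds the threshold."""
        self.buffer.append(record)
        if len(self.buffer) > self.threshold:
            self.update_fn(self.buffer)
            self.buffer = []  # start a fresh batch; the old list was handed to update_fn
            return True
        return False
```

With validation MSEs of 1 and 4, the tree and neural-network parts receive weights of about 0.8 and 0.2, so sub-model predictions of 65 and 70 fuse to 66, which triggers no risk prompt under the example thresholds.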

Description

Real-time student score prediction method based on a big-data-driven improved regression model

Technical Field

The invention relates to the technical field of educational data analysis and artificial intelligence, and in particular to a real-time student score prediction method based on a big-data-driven improved regression model.

Background

With the rapid development of intelligent education and online learning platforms, students generate large amounts of multi-source, multi-dimensional data during learning, including historical scores, learning behavior data, chapter difficulty, test results and time-series learning records. At the same time, data distributions differ markedly across classes, grades and ability groups, and the same feature often matters to different degrees in different groups; this leads to poor feature stability, weak model generalization and susceptibility to feature drift. In addition, high-dimensional educational data commonly suffers from feature redundancy and high inter-feature correlation, which makes models prone to overfitting and lowers prediction accuracy. Existing methods also cannot respond in real time to newly added data, so models gradually become ineffective over time. To address these problems, a real-time student score prediction method based on a big-data-driven improved regression model is provided.

Disclosure of Invention

The invention aims to provide a real-time student score prediction method based on a big-data-driven improved regression model, so as to solve the problems described in the background.
To achieve the above purpose, the invention provides the following technical solution: a real-time student score prediction method based on a big-data-driven improved regression model, comprising the following steps: S1, acquiring students' historical score data, learning behavior data, knowledge point difficulty data, staged test data and learning-process time-series data, and taking the acquired data as raw data; S2, extracting basic features from the raw data, including learning duration, number of reviews, chapter difficulty coefficient, assignment completion rate, number of wrong answers, score change, score fluctuation and the like; S3, constructing cross features from at least two basic features; S4, combining the basic features and the cross features into feature vectors for training; S5, running a feature screening algorithm on the feature vectors to obtain a target feature set; S6, training an improved regression prediction model on the target feature set; and S7, inputting the feature vector of a target student into the regression prediction model and outputting a predicted score. Preferably, the cross features in step S3 include the following three types: investment-difficulty cross features, comprising combinations of learning duration and the difficulty coefficient of the corresponding knowledge points; investment-effect cross features, comprising combinations of the number of reviews and the historical score improvement; and process-stability cross features, comprising combinations of the variance of the most recent N quiz scores and the stage average score.
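The three cross-feature types, together with the construction rules of claim 3, can be sketched in Python as below. This is an illustrative sketch only: the decay scale, the saturation threshold `n1`, the fluctuation threshold `t1`, the boost factor, the normalization bounds, and the use of a product (or, for stability, a ratio) as the "combination" are all assumptions, since the source does not fix them. The confidence-interval scaling is simplified to a constant boost above `t1`, and every function name is hypothetical.

```python
import math
from statistics import pvariance


def minmax(x, lo, hi):
    """Interval normalization to [0, 1] with clipping; lo/hi are assumed bounds."""
    if hi == lo:
        return 0.0
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def log_decay(minutes, scale=60.0):
    """Logarithmic decay: each extra minute adds less (diminishing returns). scale is hypothetical."""
    return math.log1p(minutes / scale)


def saturate_reviews(n, n1=5, k=0.2):
    """Identity up to the saturation threshold N_1, slowed (logarithmic) growth beyond it."""
    if n <= n1:
        return float(n)
    return n1 + k * math.log1p(n - n1)


def ci_scale(fluctuation, t1=10.0, boost=1.5):
    """Simplified stand-in for confidence-interval scaling: up-weight fluctuation above T_1."""
    return fluctuation * boost if fluctuation > t1 else fluctuation


def cross_features(minutes, difficulty, reviews, score_gain, recent_scores, stage_avg):
    """Return the three cross features for one student and knowledge point (bounds assumed)."""
    duration = minmax(log_decay(minutes), 0.0, log_decay(600.0))  # assume <= 600 minutes
    investment_difficulty = duration * minmax(difficulty, 0.0, 1.0)

    reviews_scaled = minmax(saturate_reviews(reviews), 0.0, saturate_reviews(50))  # assume <= 50
    gain = minmax(score_gain, -20.0, 20.0)  # assume gain lies in [-20, 20] points
    investment_effect = reviews_scaled * gain

    # The claim names the variance of the last N quiz scores; the standard deviation is used
    # here so the fluctuation has the same units as the stage average it is divided by.
    fluctuation = ci_scale(math.sqrt(pvariance(recent_scores)))
    process_stability = fluctuation / max(stage_avg, 1e-9)

    return {
        "investment_difficulty": investment_difficulty,
        "investment_effect": investment_effect,
        "process_stability": process_stability,
    }
```

A student who studied 600 minutes on the hardest knowledge point, reviewed 50 times and gained 20 points reaches the upper bound of 1.0 on both investment features, while identical recent quiz scores give a process-stability value of 0.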
Preferably, the feature construction rules used in step S3 include: applying interval normalization to the basic features that participate in the cross calculation; scaling learning-duration features with a logarithmic decay function to model diminishing returns; applying a saturation upper threshold N_1 to review-count features, so that counts above N_1 are mapped through a slowed-growth function; and applying a confidence-interval scaling strategy to score-fluctuation features, increasing the feature weight when the fluctuation exceeds a preset threshold T_1. Preferably, step S5 includes: S51, training multiple sub-models on different student groups to obtain the importance value of each feature in each group; S52, calculating, for each feature, the mean and the variance of its importance across the groups; S53, constructing from the mean and the variance a consistency index that characterizes how stable the feature's importance is across groups; and S54, when the consistency index satisfies the preset condition, determining the corresponding feature to be a highly stable cross-population feature. Preferably, step S5 further includes: based on the feature correlation matrix, determining any two features whose correlation coefficient exceeds a preset threshold, retaining only the one with the higher consistency index, and constructing a redundancy-removed target feature set by using the higher featu