CN-121999407-A - Campus teaching video intelligent classification method and system


Abstract

The invention discloses an intelligent classification method and system for campus teaching videos, belonging to the technical field of education. The method comprises: setting up a classification model based on hybrid feature engineering and performing primary classification on the raw data; processing video data of new editions of teaching materials based on an incremental random forest to update the classification model; and setting up an educational knowledge graph verification mechanism to verify the primary classification result. The hybrid-feature classification model performs the initial classification of the data and meets the processing targets. The incremental random forest updates the model as new editions of teaching materials are released, so the model keeps pace with the development of teaching materials and suits the education field. The knowledge graph verification of the primary classification result further improves classification accuracy and keeps the results consistent with the teaching workflow.

Inventors

HUANG JIALIANG
YANG XI
HU DONGLAN
CHEN ZHIHONG

Assignees

广西壮族自治区公众信息产业有限公司 (Guangxi Zhuang Autonomous Region Public Information Industry Co., Ltd.)

Dates

Publication Date: 20260508
Application Date: 20251218

Claims (10)

1. An intelligent campus teaching video classification method, characterized by comprising the following steps: setting up a classification model based on hybrid feature engineering, and performing primary classification on raw data; processing video data of new editions of teaching materials based on an incremental random forest to update the classification model; and setting up an educational knowledge graph verification mechanism to verify the primary classification result.
2. The intelligent campus teaching video classification method according to claim 1, wherein the hybrid feature engineering comprises: injecting a subject-specific terminology dictionary into a general-purpose Chinese word segmenter, and filtering with a teaching-scenario stop-word list, so as to segment key teaching terms accurately; for the structured features, setting up a joint hierarchical encoding mechanism over grade, subject and teaching-material edition, so as to map discrete metadata into low-dimensional dense vectors; and calculating, through Jaccard similarity, the degree of match between text labels and a preset knowledge point base to generate an additional feature vector, so as to ensure logical consistency between the classification result and the teaching knowledge system.
3. The intelligent campus teaching video classification method according to claim 2, wherein processing the video data of new editions of teaching materials based on the incremental random forest to update the classification model comprises: in the model initialization stage, constructing, from historical data, a random forest model with a base number of decision trees; when data for a newly added teaching-material edition arrives, expanding the capacity of the model by adding decision trees in batches, a fixed number of new trees being trained in each batch; inheriting the existing forest structure through the warm_start parameter, so as to avoid recomputing over the full data set; introducing a weight decay strategy that progressively decays the weights of the decision trees corresponding to old editions of teaching materials; defining a version-priority sampling rule that assigns higher sample weights to data of the latest teaching-material edition, so as to speed up the model's adaptation to new teaching content; and using the start of each school term as a trigger that automatically launches an incremental training task, the training batch size being adjusted dynamically according to teaching-material edition update flags, so as to keep the model synchronized with teaching practice in real time.
4. The intelligent campus teaching video classification method according to claim 3, wherein the educational knowledge graph verification comprises: constructing a subject knowledge point hierarchy and a dynamic rule engine, wherein the subject knowledge point hierarchy comprises a three-level subject-unit-knowledge point tree structure, and the node attributes of the tree structure comprise the school stage to which each knowledge point belongs, its frequency in the curriculum standard, and the associated question types.
5. The intelligent campus teaching video classification method according to claim 4, wherein verifying the primary classification result comprises: performing hierarchical compliance verification, wherein, if a predicted label output by the classification model falls outside the range of knowledge points allowed for the school stage of the current video, automatic replacement logic is triggered and the out-of-range predicted label is marked as an out-of-scope label; and retrieving from the educational knowledge graph the valid label semantically closest to the out-of-scope label, and, using a knowledge generalization strategy, mapping the predicted label to an adjacent high-frequency parent node in the graph, so as to improve the robustness of the classification model on sparse data.
6. An intelligent campus teaching video classification system, characterized by comprising: a first module, configured to set up a classification model based on hybrid feature engineering and perform primary classification on raw data; a second module, configured to process video data of new editions of teaching materials based on an incremental random forest to update the classification model; and a third module, configured to set up an educational knowledge graph verification mechanism to verify the primary classification result.
7. The intelligent campus teaching video classification system according to claim 6, wherein the hybrid feature engineering comprises: injecting a subject-specific terminology dictionary into a general-purpose Chinese word segmenter, and filtering with a teaching-scenario stop-word list, so as to segment key teaching terms accurately; for the structured features, setting up a joint hierarchical encoding mechanism over grade, subject and teaching-material edition, so as to map discrete metadata into low-dimensional dense vectors; and calculating, through Jaccard similarity, the degree of match between text labels and a preset knowledge point base to generate an additional feature vector, so as to ensure logical consistency between the classification result and the teaching knowledge system.
8. The intelligent campus teaching video classification system according to claim 7, wherein processing the video data of new editions of teaching materials based on the incremental random forest to update the classification model comprises: in the model initialization stage, constructing, from historical data, a random forest model with a base number of decision trees; when data for a newly added teaching-material edition arrives, expanding the capacity of the model by adding decision trees in batches, a fixed number of new trees being trained in each batch; inheriting the existing forest structure through the warm_start parameter, so as to avoid recomputing over the full data set; introducing a weight decay strategy that progressively decays the weights of the decision trees corresponding to old editions of teaching materials; defining a version-priority sampling rule that assigns higher sample weights to data of the latest teaching-material edition, so as to speed up the model's adaptation to new teaching content; and using the start of each school term as a trigger that automatically launches an incremental training task, the training batch size being adjusted dynamically according to teaching-material edition update flags, so as to keep the model synchronized with teaching practice in real time.
9. The intelligent campus teaching video classification system according to claim 8, wherein the educational knowledge graph verification comprises: constructing a subject knowledge point hierarchy and a dynamic rule engine, wherein the subject knowledge point hierarchy comprises a three-level subject-unit-knowledge point tree structure, and the node attributes of the tree structure comprise the school stage to which each knowledge point belongs, its frequency in the curriculum standard, and the associated question types.
10. The intelligent campus teaching video classification system according to claim 9, wherein verifying the primary classification result comprises: performing hierarchical compliance verification, wherein, if a predicted label output by the classification model falls outside the range of knowledge points allowed for the school stage of the current video, automatic replacement logic is triggered and the out-of-range predicted label is marked as an out-of-scope label; and retrieving from the educational knowledge graph the valid label semantically closest to the out-of-scope label, and, using a knowledge generalization strategy, mapping the predicted label to an adjacent high-frequency parent node in the graph, so as to improve the robustness of the classification model on sparse data.
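Claims 3 and 8 grow the forest in batches and inherit the existing trees through a warm-start parameter. scikit-learn's `RandomForestClassifier` provides such an option, `warm_start`. With it enabled, raising `n_estimators` and calling `fit` again trains only the additional trees. The sketch below assumes scikit-learn. Everything else is hypothetical and not taken from the patent: the `IncrementalForest` class, its parameters (`base_trees`, `trees_per_batch`, `decay`, `new_edition_weight`), and the manual weighted averaging over `estimators_`. The manual averaging is needed because scikit-learn averages trees uniformly, so a per-tree weight decay has to be applied separately. This is a minimal illustration under those assumptions, not the patented implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class IncrementalForest:
    """Hypothetical wrapper: batch-grown random forest with decayed tree weights."""

    def __init__(self, base_trees=50, trees_per_batch=10, decay=0.8,
                 new_edition_weight=3.0, random_state=0):
        self.trees_per_batch = trees_per_batch
        self.decay = decay                      # multiplier applied to old trees per batch
        self.new_edition_weight = new_edition_weight
        # warm_start=True makes later fit() calls keep the trees already grown
        self.forest = RandomForestClassifier(
            n_estimators=base_trees, warm_start=True, random_state=random_state)
        self.tree_weights = []

    def fit_initial(self, X, y):
        """Initialization stage: base forest trained on historical data."""
        self.forest.fit(X, y)
        self.tree_weights = [1.0] * len(self.forest.estimators_)

    def add_batch(self, X, y, is_latest_edition, n_new_trees=None):
        """Grow the forest with new trees trained only on this batch.

        The batch must contain the same classes as the historical data,
        because fit() re-derives classes_ from y on every call.
        """
        n_new = n_new_trees or self.trees_per_batch   # allows a dynamic batch size
        # Version-priority sampling: samples from the latest edition weigh more.
        sample_weight = np.where(is_latest_edition, self.new_edition_weight, 1.0)
        # Progressive decay of every tree that already exists.
        self.tree_weights = [w * self.decay for w in self.tree_weights]
        self.forest.set_params(n_estimators=self.forest.n_estimators + n_new)
        self.forest.fit(X, y, sample_weight=sample_weight)
        self.tree_weights += [1.0] * n_new

    def predict_proba(self, X):
        # scikit-learn averages trees uniformly, so apply the tree weights by hand.
        probs = np.zeros((len(X), len(self.forest.classes_)))
        for tree, w in zip(self.forest.estimators_, self.tree_weights):
            probs += w * tree.predict_proba(X)
        return probs / sum(self.tree_weights)

    def predict(self, X):
        return self.forest.classes_[np.argmax(self.predict_proba(X), axis=1)]
```

Scheduling is not shown. The start-of-term trigger and the edition update flag would decide when `add_batch` is called and what value is passed as `n_new_trees`. Trees from older batches are multiplied by `decay` once per batch. Their influence therefore shrinks geometrically, but they are never discarded.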
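Claims 4 and 5 (mirrored by claims 9 and 10) describe a subject-unit-knowledge point tree and a compliance check against the school stage of the video. The patent does not specify the semantic-similarity measure or the generalization rule. The sketch below assumes both:

- Word-level Jaccard similarity stands in for semantic similarity.
- A replacement whose curriculum frequency falls below a hypothetical `min_frequency` threshold is generalized to its parent unit. This is one possible reading of "mapping to an adjacent high-frequency parent node".

`Node`, `verify_label`, `word_jaccard` and the example tree are all illustrative names and data, not part of the patent. The dynamic rule engine is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    """One node of the subject -> unit -> knowledge-point tree."""
    name: str
    level: str                      # "subject", "unit" or "knowledge_point"
    stage: str = ""                 # school stage the node belongs to
    frequency: int = 0              # frequency in the curriculum standard
    question_types: List[str] = field(default_factory=list)
    parent: Optional["Node"] = None


def word_jaccard(a: str, b: str) -> float:
    # Hypothetical stand-in for semantic similarity: Jaccard over lowercase words.
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def verify_label(label: str, video_stage: str, kps: Dict[str, Node],
                 min_frequency: int = 3) -> Tuple[str, bool]:
    """Return (final_label, was_out_of_scope)."""
    node = kps[label]
    if node.stage == video_stage:
        return label, False                     # hierarchical compliance passed
    # Out-of-scope label: search the valid labels of the video's school stage.
    candidates = [n for n in kps.values() if n.stage == video_stage]
    if not candidates:
        return label, True
    best = max(candidates, key=lambda n: word_jaccard(label, n.name))
    # Knowledge generalization: back off a sparse label to its parent unit.
    if best.frequency < min_frequency and best.parent is not None:
        return best.parent.name, True
    return best.name, True


# Hypothetical example tree (names, stages and frequencies are illustrative).
math_subject = Node("Math", "subject")
functions = Node("Functions", "unit", stage="junior", frequency=20, parent=math_subject)
equations = Node("Equations", "unit", stage="junior", frequency=15, parent=math_subject)
trigonometry = Node("Trigonometry", "unit", stage="senior", frequency=12, parent=math_subject)
kps = {n.name: n for n in [
    Node("Linear functions", "knowledge_point", "junior", 8, ["choice"], functions),
    Node("Inverse proportion functions", "knowledge_point", "junior", 1, ["fill-in"], functions),
    Node("Linear equations", "knowledge_point", "junior", 6, ["solution"], equations),
    Node("Trigonometric functions", "knowledge_point", "senior", 9, ["choice"], trigonometry),
    Node("Inverse trigonometric functions", "knowledge_point", "senior", 2, ["choice"], trigonometry),
]}
```

With this data, the three possible outcomes are:

- "Trigonometric functions" predicted for a junior-stage video is replaced by "Linear functions".
- "Inverse trigonometric functions" first matches the sparse label "Inverse proportion functions" and is then generalized to the unit "Functions".
- "Linear equations" is in scope and passes through unchanged.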

Description

Campus teaching video intelligent classification method and system

Technical Field

The invention relates to the technical field of education, and in particular to an intelligent classification method and system for campus teaching videos.

Background

In the field of intelligent classification of campus teaching videos, conventional techniques generally train models on single-modality text label features (such as video titles, subject names and grade information). They extract keywords with general-purpose natural language processing tools and classify with shallow machine learning models (such as logistic regression and SVM). This approach has four clear limitations:

First, text feature processing is not optimized for educational scenarios. A general-purpose Chinese word segmentation tool (such as standard Jieba) cannot accurately recognize subject terms (such as Lenz's law and trigonometric functions), so key semantic information is lost.

Second, structured metadata is one-hot encoded. This ignores the relationships between fields, makes the feature dimension explode, and cannot express semantic similarity between teaching-material editions.

Third, model training relies on retraining over the full data set. This suits poorly the frequent teaching-material edition updates and incremental resource uploads of educational settings. With tens of thousands of new videos added each month, retraining takes a long time, which severely limits how current the model can be.

Fourth, classification results are not checked for logical consistency against the curriculum-standard knowledge system. As a result, labels outside the scope of the school stage and semantic deviations of labels occur easily.

Disclosure of Invention

The invention provides an intelligent classification method and system for campus teaching videos, aiming to overcome the above limitations of the prior art.
To achieve the above purpose, the invention adopts the following technical solution. An intelligent classification method for campus teaching videos comprises: setting up a classification model based on hybrid feature engineering and performing primary classification on raw data; processing video data of new editions of teaching materials based on an incremental random forest to update the classification model; and setting up an educational knowledge graph verification mechanism to verify the primary classification result. The hybrid feature engineering comprises: injecting a subject-specific terminology dictionary into a general-purpose Chinese word segmenter and filtering with a teaching-scenario stop-word list, so as to segment key teaching terms accurately; setting up a joint hierarchical encoding mechanism over grade, subject and teaching-material edition, so as to map discrete metadata into low-dimensional dense vectors; and calculating, through Jaccard similarity, the degree of match between text labels and a preset knowledge point base to generate an additional feature vector, so as to ensure logical consistency between the classification result and the teaching knowledge system.
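The text side of the hybrid feature engineering can be illustrated with a small, self-contained sketch. A production system would load the subject dictionary into a real segmenter, for example through jieba's `load_userdict`. To stay dependency-free, this sketch instead uses forward maximum matching as a stand-in for dictionary-injected segmentation. The term dictionary, stop words and knowledge-point base are hypothetical examples. The joint grade-subject-edition encoding is not shown.

```python
from typing import Dict, List, Set

# Hypothetical subject-term dictionary injected into the segmenter.
TERM_DICT: Set[str] = {"楞次定律", "电磁感应", "三角函数", "讲解", "应用"}
# Hypothetical teaching-scenario stop words (filler words common in lesson titles).
STOP_WORDS: Set[str] = {"的", "讲解"}
# Hypothetical knowledge-point base: knowledge point id -> associated terms.
KP_BASE: Dict[str, Set[str]] = {
    "physics/electromagnetic-induction": {"楞次定律", "电磁感应"},
    "math/trigonometry": {"三角函数", "诱导公式"},
}


def fmm_segment(text: str, dictionary: Set[str], max_len: int = 6) -> List[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Single characters are always accepted, so the loop always advances.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def clean_tokens(text: str, dictionary: Set[str], stop_words: Set[str]) -> List[str]:
    """Segment with the injected dictionary, then drop teaching-scenario stop words."""
    return [t for t in fmm_segment(text, dictionary) if t not in stop_words]


def jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0


def knowledge_features(tokens: List[str], kp_base: Dict[str, Set[str]]) -> List[float]:
    """One Jaccard score per knowledge point, in the base's insertion order."""
    token_set = set(tokens)
    return [jaccard(token_set, terms) for terms in kp_base.values()]
```

Take the title 讲解楞次定律的应用 ("explaining applications of Lenz's law"). Segmentation yields ["讲解", "楞次定律", "的", "应用"], and stop-word filtering leaves ["楞次定律", "应用"]. The resulting knowledge-point feature vector is [1/3, 0.0], which would be appended to the other features of the video.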
Further, processing the video data of new editions of teaching materials based on the incremental random forest to update the classification model comprises: in the model initialization stage, constructing, from historical data, a random forest model with a base number of decision trees; when data for a new teaching-material edition arrives, expanding model capacity by adding decision trees in batches, with a fixed number of new trees trained in each batch; inheriting the existing forest structure through the warm_start parameter to avoid recomputing over the full data set; introducing a weight decay strategy that progressively decays the weights of the decision trees corresponding to old teaching materials; defining a version-priority sampling rule that assigns higher sample weights to data of the latest teaching materials, so as to speed up the model's adaptation to new teaching materials; and using the start of each school term as a trigger that automatically launches an incremental training task, with the training batch size adjusted dynamically according to teaching-material edition update flags, so as to keep the model synchronized with teaching practice in real time.

Further, the educational knowledge graph verification comprises constructing a subject knowledge point hierarchy and a dynamic rule engine, wherein the subject knowledge point hierarchy comprises a three-level subject-unit-knowledge point tree structure, and the node attributes of the tree structure comprise the school stage to which each knowledge point belongs, its frequency in the curriculum standard, and the associated question types.

Further, the verification of the primary classification result comprises the steps of performing hierarchical compliance verification, triggering automatic replacemen