CN-121681846-B - Knowledge system construction method based on multi-source teaching materials
Abstract
The invention discloses a knowledge system construction method based on multi-source teaching materials, and belongs to the field of knowledge system construction. The method comprises: extracting a course list of a target specialty, and enhancing and verifying the course list through expert collaboration to form an authoritative course tag library; for teaching materials of the target specialty from different sources, after preprocessing, automatically locating and structurally parsing the catalog content of each target teaching material, reconstructing the chapter nesting relations, and mapping the target teaching material to a unique course tag in the authoritative course tag library; and, for the different teaching materials under the same course tag, performing bottom-up semantic clustering and structure-backtracking fusion, and outputting a structured knowledge system based on the teaching order of the course tags. The invention achieves cross-textbook semantic alignment and structural fusion, and ultimately generates a unified, authoritative, clearly layered and semantically consistent knowledge system.
Inventors
- ZHANG SIJIA
- MA XIAOYONG
- LI XUEYAO
- CHI SHENGQIANG
- ZHANG YING
Assignees
- Zhejiang Lab (之江实验室)
Dates
- Publication Date
- 20260508
- Application Date
- 20260209
Claims (8)
- 1. A knowledge system construction method based on multi-source teaching materials, characterized by comprising the following steps: extracting a course list of a target specialty, and enhancing and verifying the course list through expert collaboration to form an authoritative course tag library; for teaching materials of the target specialty from different sources, after preprocessing, automatically locating and structurally parsing the catalog content of each target teaching material, reconstructing the chapter nesting relations, and mapping the target teaching material to a unique course tag in the authoritative course tag library; and, for the different teaching materials under the same course tag, performing bottom-up semantic clustering and structure-backtracking fusion, and outputting a structured knowledge system based on the teaching order of the course tags, including: before the semantic clustering and structure-backtracking fusion, performing a multidimensional evaluation of the different teaching materials under the same course tag based on a large language model, the dimensions comprising knowledge point coverage, content depth and authority; based on the multidimensional evaluation, generating a comprehensive quality score for each teaching material, assigning the teaching materials whose comprehensive quality scores rank within a preset top range to a high-quality teaching material set, and taking the high-quality teaching material set as the input for the subsequent semantic clustering and structure-backtracking fusion; extracting, from the catalog structures of the teaching materials in the high-quality teaching material set, the leaf node titles that have no sub-chapters, to form an initial knowledge unit set; converting the leaf node titles in the initial knowledge unit set into semantic vectors, clustering them based on semantic similarity to form an initial leaf title cluster set, and generating a unified knowledge point name for each cluster to form a knowledge point set; backtracking, for each title in a cluster, its parent node title in the original teaching material catalog, combining the parent node titles with the single-layer knowledge points to form a new title set, and performing the semantic clustering and standardization operations on the new title set; and stopping the iteration when no higher-level parent node can be traced back, and outputting the complete knowledge system.
- 2. The method of claim 1, wherein the course list of the target specialty includes compulsory courses, core courses, and recommended courses of the target specialty.
- 3. The method of claim 1, wherein the enhancing and verifying comprises: integrity checking, namely checking the completeness of the courses in the course list against the actual teaching practice of the target specialty, and, if any courses are missing, adding them to the course list; frontier updating, namely determining, according to the development trends of the target specialty, whether to add emerging courses; and teaching-logic sequencing, namely organizing the courses of the target specialty hierarchically and optimizing their order based on progressive knowledge logic, according to expert understanding of the target specialty.
- 4. The method of claim 1, wherein the preprocessing comprises: for teaching materials in different formats, uniformly converting the contents of the teaching materials from different sources into plain-text representations using a multimodal document processing pipeline, finely dividing the full text into blocks by heading level using an intelligent document processing framework, and retaining the heading and content information of each block; and, if a teaching material is a text-selectable PDF, extracting its original text and layout information in combination with a structure analysis tool to obtain the text content.
- 5. The method of claim 4, wherein automatically locating and structurally parsing the catalog content of the teaching materials from different sources and reconstructing the chapter nesting relations comprises: locating a potential catalog start position in the preprocessed block sequence by multilingual keyword matching; starting from that position, inputting the subsequent blocks one by one into a large language model, judging whether each block belongs to the catalog content using a multi-objective collaborative judgment mode, and extracting the catalog to obtain an original catalog; reconstructing the hierarchical structure of the original catalog to build a complete catalog structure; and deleting non-knowledge noise from the catalog after the hierarchical reconstruction, retaining only the knowledge unit titles, and uniformly translating the knowledge unit titles into Chinese to obtain a structured catalog tree.
- 6. The method of claim 5, wherein, in the multi-objective collaborative judgment mode, the large language model judges each block against judgment targets including text length, format regularity, content semantics and numbering continuity, and, if the number of judgment targets satisfied is greater than a preset value, the block is judged to belong to the catalog content.
- 7. The method of claim 5, wherein mapping the target teaching material to a unique course tag in the authoritative course tag library comprises: performing deep semantic understanding of the target teaching material based on the large language model, mapping the target teaching material to the single course tag with the highest matching degree in the authoritative course tag library, and attaching a matching confidence score; and, if the matching confidence score is lower than a preset value, having an expert perform a secondary confirmation.
- 8. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-7.
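Illustrative sketch (not part of the claims): the bottom-up clustering and structure-backtracking loop of claim 1 might look roughly like the following Python. Everything named here is an assumption made for illustration: the functions `similarity`, `cluster`, `merge` and `fuse`, the 0.5 threshold, the token-overlap similarity that stands in for semantic vectors, and the shortest-title rule that stands in for the unified knowledge point names. Catalog nodes are represented as tuples of titles from the top-level chapter down to the leaf.

```python
def similarity(a, b):
    """Token-overlap (Jaccard) similarity; a stand-in for embedding similarity."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def cluster(entries, threshold):
    """Greedily group (path, subtree) entries whose last path title is similar."""
    clusters = []
    for entry in entries:
        title = entry[0][-1]
        for members in clusters:
            # Compare against the first member of each existing cluster.
            if similarity(title, members[0][0][-1]) >= threshold:
                members.append(entry)
                break
        else:
            clusters.append([entry])
    return clusters


def merge(dst, src):
    """Recursively merge the nested dict src into dst and return dst."""
    for name, sub in src.items():
        merge(dst.setdefault(name, {}), sub)
    return dst


def fuse(leaf_paths, threshold=0.5):
    """Fuse catalog leaves bottom-up into one nested dict of knowledge points."""
    # Each frontier entry maps an original catalog path to the subtree fused beneath it.
    frontier = {tuple(path): {} for path in leaf_paths}
    while True:
        next_frontier, traced_parent = {}, False
        for members in cluster(list(frontier.items()), threshold):
            # Hypothetical naming rule: the shortest member title names the cluster.
            name = min((path[-1] for path, _ in members), key=len)
            subtree = {}
            for _, sub in members:
                merge(subtree, sub)
            # Backtrack to the parent path of every member that has one.
            parents = list(dict.fromkeys(p[:-1] for p, _ in members if len(p) > 1))
            if parents:
                traced_parent = True
                for parent in parents:
                    merge(next_frontier.setdefault(parent, {}), {name: subtree})
            else:
                # Single-layer unit: carry it into the next round unchanged.
                merge(next_frontier.setdefault((name,), {}), subtree)
        if not traced_parent:
            # No higher-level parent remains, so the iteration stops.
            return {path[0]: sub for path, sub in next_frontier.items()}
        frontier = next_frontier


# Leaf titles from two hypothetical textbooks under the same course tag.
catalog_leaves = [
    ("Linear Structures", "Linked Lists"),
    ("Linear Structures", "Stacks"),
    ("Trees", "Binary Trees"),
    ("Linear Data Structures", "Singly Linked Lists"),
    ("Linear Data Structures", "Stacks"),
    ("Trees", "Binary Trees"),
]
knowledge_system = fuse(catalog_leaves)
print(knowledge_system)
```

In this sketch a cluster whose members have different parents is attached under each of them, on the expectation that those parents are themselves merged in the next round. A real system would presumably replace `similarity` with vector similarity over semantic embeddings and generate the cluster names with a language model, as the claim describes.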
Description
Knowledge system construction method based on multi-source teaching materials
Technical Field
The invention relates to the field of knowledge system construction, in particular to a knowledge system construction method based on multi-source teaching materials.
Background
With the accelerating digital transformation of education, building a structured, standardized and computable course knowledge system has become core infrastructure for applications such as intelligent education, personalized learning and AI teaching assistance systems. In recent years, artificial intelligence techniques such as knowledge graphs and large language models (LLMs) have been widely tried for knowledge organization. However, current mainstream methods focus on knowledge extraction from a single teaching material or on general semantic modeling; they lack deep alignment with educational standards and struggle with the heterogeneity of wording, structure and versions across multi-source teaching materials. In higher education in particular, the course system is large and complex, teaching materials are updated frequently, and teaching logic differs markedly between them, so automated knowledge system construction has long faced the dilemma of having no standard, being difficult to align, and offering low reusability. A technical scheme is therefore needed that integrates authoritative educational standards, expert knowledge and multi-source teaching material content, and automatically generates a high-fidelity, highly consistent knowledge system.
Existing techniques for knowledge system construction or course knowledge graph generation have three main defects:
(1) The lack of a unified, authoritative course semantic benchmark results in a chaotic and non-reusable tag system.
Most current educational knowledge graphs rely on manual definition, or extract course or knowledge point names directly from teaching materials, and are not tied to officially mandated teaching standards. Across different projects, platforms, and even different versions of the same platform, the naming, boundaries and meaning of courses are defined differently, causing serious semantic drift. In addition, course tags are often annotated by non-domain personnel and lack comprehensive verification and frontier supplementation, so they are difficult to use as reliable anchor points for subsequent automated processing.
(2) The parsing of teaching material catalogs relies heavily on manual work or rule templates and adapts poorly to the format noise and structural heterogeneity of real publications. The prior art mostly uses regular expressions, fixed templates or simple OCR post-processing to extract teaching material catalogs, and its accuracy drops rapidly when facing common problems such as scanned PDFs, disordered typesetting, mixed-language layout, embedded page numbers and missing chapter numbers. More importantly, traditional methods cannot recover the chapter nesting relations lost in format conversion, so the hierarchy of the subsequent knowledge units is broken and construction of a structured knowledge tree cannot be supported.
(3) Multi-textbook knowledge fusion methods separate semantic consistency from the integrity of teaching logic, and the quality of the fusion results is low. Current knowledge fusion mostly adopts pure semantic clustering; although knowledge points with similar wording can be merged, the parent-child hierarchy in the original teaching materials is completely ignored, so the fused knowledge tree exhibits logical jumps or structural collapse.
Conversely, if the structure of a single teaching material is forcibly retained, cross-textbook redundancy and ambiguity cannot be eliminated, and generalization is poor. The prior art has not effectively balanced the core tension between semantic deduplication and structural fidelity.
Disclosure of Invention
In view of the defects of the prior art, the invention aims to provide a knowledge system construction method based on multi-source teaching materials, which solves the problem that existing methods cannot fully cover the three-level collaboration of teaching standards, teaching materials and knowledge.
The object of the invention is achieved by the following technical scheme: a knowledge system construction method based on multi-source teaching materials, comprising the following steps: extracting a course list of a target specialty, and enhancing and verifying the course list through expert collaboration to form an authoritative course tag library; for teaching materials of the target specialty from different sources, after preprocessing, automatically locating and structurally parsing the catalog content of the target teaching materials, reconstructing the chapter nesting relations, and mapping the target teaching materials