CN-121980016-A - Question bank knowledge point screening method and system based on text analysis
Abstract
The invention relates to the technical field of educational information processing, and in particular to a question bank knowledge point screening method and system based on text analysis. The method comprises the following steps. First, it extracts from a new outline map a plurality of candidate nodes corresponding to the old node on which a test question to be processed is mounted, and obtains a text vector of the test question. Second, it generates a unique feature vector for each candidate node from the words that distinguish that node from the other candidate nodes, and determines a text similarity from the test question text vector and each unique feature vector. Third, it determines a topology matching degree from the matching relation between a history associated node set, constructed from historical test paper records, and the topology neighborhood set of each candidate node. Fourth, it generates a mutual exclusion mark according to whether each candidate node belongs to the history associated node set, and fuses the text similarity, the topology matching degree and the mutual exclusion mark into a comprehensive cost. Finally, it compares the minimum comprehensive cost with a preset cost threshold and, according to the result, either updates the test question to a target node or isolates it, thereby achieving accurate screening of stock test questions.
Inventors
- Zhao Chenmeng
Assignees
- 北京博思创成技术发展有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260409
Claims (10)
- 1. A method for screening question bank knowledge points based on text analysis, the method comprising: acquiring a test question to be processed and its historical test paper records, wherein the test question to be processed is mounted on an old node of an old outline map, and the old outline map and the new outline map referred to below are structured sets of knowledge point nodes and their association relations; extracting a plurality of candidate nodes corresponding to the old node in the new outline map, and obtaining a test question text vector of the test question; generating a unique feature vector for each candidate node according to the words unique to that candidate node, i.e. not shared with the other candidate nodes, and determining the text similarity of the test question relative to each candidate node according to the similarity between the test question text vector and each unique feature vector; extracting, from the historical test paper records, the nodes on which the other test questions are mounted to construct a history associated node set, determining a topology neighborhood set of each candidate node in the new outline map, and determining a topology matching degree according to the matching relation between the history associated node set and the topology neighborhood set; generating a mutual exclusion mark according to whether each candidate node belongs to the history associated node set, and fusing the text similarity, the topology matching degree and the mutual exclusion mark to determine the comprehensive cost value of updating the test question to each candidate node; and comparing the minimum comprehensive cost value with a preset cost threshold, determining according to the comparison result whether to perform a target node update or an isolation operation on the test question, and thereby completing test question screening.
- 2. The method for screening question bank knowledge points based on text analysis according to claim 1, characterized in that each knowledge point node in the map comprises at least the following attribute information: a node unique identifier, a node name, a node definition text, a parent node identifier, the identifier of the subject to which the node belongs, and the identifier of the outline version to which the node belongs.
- 3. The method for screening question bank knowledge points based on text analysis according to claim 1, wherein extracting a plurality of candidate nodes corresponding to the old node in the new outline map and obtaining the test question text vector of the test question comprises: traversing the nodes in the old outline map, comparing them with the new outline map, and locating old nodes that exist in the old outline map but are split into a plurality of child nodes in the new outline map; determining the plurality of child nodes as the candidate nodes corresponding to the old node; acquiring the question stem text and the analysis text of the test question to be processed, and combining them to obtain the complete text of the test question to be processed; and inputting the complete text into a pre-trained natural language processing model, which converts it into a fixed-dimension numerical vector that serves as the test question text vector of the test question to be processed.
- 4. The method for screening question bank knowledge points based on text analysis according to claim 2, wherein the unique feature vectors are generated by: acquiring the node definition texts of all candidate nodes, performing word segmentation on the definition texts, and removing words without actual semantics using a preset filtering vocabulary, to obtain an initial vocabulary set for each candidate node; collecting the words that occur in the initial vocabulary sets of at least two candidate nodes to form a common vocabulary set; for each candidate node, removing the words belonging to the common vocabulary set from its initial vocabulary set, the remaining words forming the unique vocabulary set of that candidate node; if the unique vocabulary set is not empty, converting each word in it into a word vector and computing the dimension-wise arithmetic mean of all these word vectors as the unique feature vector of the candidate node; and if the unique vocabulary set is empty, inputting the node name of the candidate node into the pre-trained natural language processing model and using the resulting numerical vector as its unique feature vector, wherein the unique feature vector of each candidate node has the same number of dimensions as the test question text vector.
- 5. The method for screening question bank knowledge points based on text analysis according to claim 1, wherein the text similarity is generated by: for each candidate node, calculating the cosine similarity between the test question text vector and the unique feature vector of the candidate node; and mapping the cosine similarity into the range 0 to 1 by a linear transformation to obtain the text similarity, which represents the degree to which the text content of the test question to be processed agrees semantically with the text of the candidate node.
- 6. The method for screening question bank knowledge points based on text analysis according to claim 2, wherein the history associated node set is constructed by: searching a preset test paper database for all historical test paper records containing the test question to be processed; for each historical test paper record, extracting the test questions other than the test question to be processed, and taking the knowledge point nodes on which those other test questions are currently mounted as history associated nodes; and de-duplicating all history associated nodes to form the history associated node set corresponding to the test question to be processed.
- 7. The method for screening question bank knowledge points based on text analysis according to claim 6, wherein the topology neighborhood set comprises the neighbor nodes that, taking the candidate node as the center in the new outline map, have a parent-child or sibling relationship with it or lie within a preset number of hops of it; and the topology matching degree is determined by: for each history associated node in the history associated node set and each neighbor node in the topology neighborhood set, calculating the shortest-path hop count between them in the new outline map, wherein if the subject identifier of the history associated node differs from that of the neighbor node, or no connecting path exists between them in the new outline map, the association weight between them is directly set to zero and the hop count calculation is skipped; otherwise, adding 1 to the shortest-path hop count and taking the reciprocal as the association weight between the history associated node and the neighbor node; searching for the optimal one-to-one matching combination between the history associated node set and the topology neighborhood set according to the two sets and the association weights; calculating the sum of the association weights of all matching edges in the optimal matching combination as the total association weight; and taking the smaller of the numbers of nodes in the history associated node set and in the topology neighborhood set as the target node number, and dividing the total association weight by the target node number to obtain the topology matching degree of the test question to be processed relative to each candidate node, the topology matching degree being set directly to zero if the target node number is zero.
- 8. The method for screening question bank knowledge points based on text analysis according to claim 7, wherein searching for the optimal one-to-one matching combination between the history associated node set and the topology neighborhood set according to the two sets and the association weights comprises: pairing each history associated node in the history associated node set with each neighbor node in the topology neighborhood set to form node pairs; constructing a bipartite graph with the history associated node set as the left node set, the topology neighborhood set as the right node set, and the association weight of each node pair as the edge weight; if the left and right node sets differ in size, padding the bipartite graph into a square matrix by adding virtual nodes connected by zero-weight edges; solving the matrix with the Hungarian algorithm to find the optimal matching edge set that maximizes the sum of the edge weights of all matching edges under the constraint that no two matching edges share a node, each matching edge representing one node pair; and taking the optimal matching edge set as the optimal matching combination.
- 9. The method for screening question bank knowledge points based on text analysis according to claim 1, wherein the mutual exclusion mark indicates whether each candidate node has a mutual exclusion relation with the history associated node set of the test question to be processed, a mutual exclusion relation meaning that the candidate node is present in that history associated node set; and the comprehensive cost value is determined by: multiplying the text similarity of each candidate node by its topology matching degree and adding a preset small constant to the product to obtain a forward matching term; setting a penalty amplification term that takes a preset penalty value greater than 1 if the mutual exclusion mark indicates a mutual exclusion relation between the candidate node and the test question to be processed, and equals 1 otherwise; and dividing the penalty amplification term by the forward matching term to obtain the comprehensive cost value of updating the test question to be processed to the candidate node.
- 10. A question bank knowledge point screening system based on text analysis, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 9 when executing the computer program.
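Claim 3 does not say how a split is detected when the old and new outline maps are compared. The Python sketch below assumes, hypothetically, that a split can be recognised because at least two nodes that exist only in the new map carry the old node's identifier as their parent node identifier. `KnowledgeNode`, `find_split_candidates` and `complete_question_text` are illustrative names, and the encoding of the merged text by a pre-trained model is deliberately left out.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class KnowledgeNode:
    # A subset of the claim 2 attributes; field names are illustrative only.
    node_id: str
    name: str
    parent_id: Optional[str]
    subject_id: str


def find_split_candidates(
    old_map: Dict[str, KnowledgeNode], new_map: Dict[str, KnowledgeNode]
) -> Dict[str, List[str]]:
    """Return {old node ID: sorted IDs of its child nodes in the new outline map}.

    Hypothetical split rule: an old node counts as split when at least two nodes
    that exist only in the new map name it as their parent.
    """
    children: Dict[str, List[str]] = {}
    for node in new_map.values():
        if node.node_id not in old_map and node.parent_id in old_map:
            children.setdefault(node.parent_id, []).append(node.node_id)
    return {old_id: sorted(ids) for old_id, ids in children.items() if len(ids) >= 2}


def complete_question_text(stem: str, analysis: str) -> str:
    # Claim 3 merges the stem and the analysis before encoding; the separator is a choice.
    return stem.strip() + "\n" + analysis.strip()
```

With this rule, an old node that gains a single new child is not treated as split; whether the patented method handles that case differently is not stated.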
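Claims 4 and 5 can be illustrated with a minimal pure-Python sketch. It assumes the definition texts are already segmented into word lists, and it takes `word_vec` (a word-embedding lookup) and `encode` (a sentence encoder) as hypothetical callables standing in for the pre-trained models, which the claims do not name. The linear map from cosine similarity onto the range 0 to 1 is assumed to be (cos + 1) / 2.

```python
import math
from typing import Callable, Dict, List, Sequence, Set

Vector = List[float]


def unique_feature_vectors(
    definition_tokens: Dict[str, List[str]],  # candidate ID -> segmented definition text
    node_names: Dict[str, str],               # candidate ID -> node name
    filter_words: Set[str],                   # preset words without actual semantics
    word_vec: Callable[[str], Vector],        # hypothetical word-embedding lookup
    encode: Callable[[str], Vector],          # hypothetical pre-trained text encoder
) -> Dict[str, Vector]:
    # Initial vocabulary set of each candidate node.
    vocab = {nid: set(tokens) - filter_words for nid, tokens in definition_tokens.items()}

    # Common vocabulary: words found in the sets of at least two candidate nodes.
    counts: Dict[str, int] = {}
    for words in vocab.values():
        for word in words:
            counts[word] = counts.get(word, 0) + 1
    common = {word for word, c in counts.items() if c >= 2}

    features: Dict[str, Vector] = {}
    for nid, words in vocab.items():
        unique = sorted(words - common)
        if unique:
            vectors = [word_vec(word) for word in unique]
            # Dimension-wise arithmetic mean of the unique words' vectors.
            features[nid] = [sum(column) / len(vectors) for column in zip(*vectors)]
        else:
            # Empty unique vocabulary: fall back to encoding the node name.
            features[nid] = encode(node_names[nid])
    return features


def text_similarity(question_vec: Sequence[float], feature_vec: Sequence[float]) -> float:
    """Cosine similarity mapped linearly from [-1, 1] onto [0, 1]."""
    dot = sum(a * b for a, b in zip(question_vec, feature_vec))
    norm = math.sqrt(sum(a * a for a in question_vec)) * math.sqrt(sum(b * b for b in feature_vec))
    if norm == 0.0:
        return 0.5  # hypothetical convention: treat a zero vector as cosine 0
    return (dot / norm + 1.0) / 2.0
```

Removing the common vocabulary is what keeps sibling nodes apart: a word such as "function" that appears in every candidate's definition contributes nothing to any unique feature vector.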
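The history associated node set of claim 6 reduces to a filtered set union. The sketch below assumes, hypothetically, that each historical test paper record is a list of question identifiers and that a mapping from each question to its currently mounted node is available; questions with no mounted node are simply skipped.

```python
from typing import Dict, Iterable, List, Set


def history_associated_nodes(
    question_id: str,
    papers: Iterable[List[str]],   # hypothetical: each historical paper as a list of question IDs
    mounted_node: Dict[str, str],  # hypothetical: question ID -> currently mounted node ID
) -> Set[str]:
    nodes: Set[str] = set()  # collecting into a set performs the de-duplication step
    for paper in papers:
        if question_id not in paper:
            continue  # only papers that contain the question being processed
        for other in paper:
            if other != question_id and other in mounted_node:
                nodes.add(mounted_node[other])
    return nodes
```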
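Claims 7 and 8 combine breadth-first hop counts with a maximum-weight assignment. The sketch below takes the topology neighborhood set as already computed, treats the new outline map as a hypothetical undirected adjacency mapping, and implements the standard O(n^3) Hungarian algorithm in its minimisation form, applied to negated weights so that the total association weight is maximised. Function names are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def hop_distance(adj: Dict[str, Set[str]], src: str, dst: str) -> Optional[int]:
    """Shortest-path hop count in the new outline map, or None if no path exists."""
    if src == dst:
        return 0
    seen = {src}
    queue = deque([(src, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in adj.get(node, ()):
            if nxt == dst:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def hungarian_min(cost: List[List[float]]) -> List[int]:
    """Hungarian algorithm on a square cost matrix; returns the column assigned to each row."""
    n = len(cost)
    inf = float("inf")
    u, v = [0.0] * (n + 1), [0.0] * (n + 1)  # row and column potentials
    p, way = [0] * (n + 1), [0] * (n + 1)    # p[j]: row matched to column j (1-based)
    for i in range(1, n + 1):
        p[0], j0 = i, 0
        minv, used = [inf] * (n + 1), [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # augment along the alternating path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [0] * n
    for j in range(1, n + 1):
        assignment[p[j] - 1] = j - 1
    return assignment


def topology_matching_degree(
    history: Set[str], neighborhood: Set[str], adj: Dict[str, Set[str]], subject: Dict[str, str]
) -> float:
    hist, neigh = sorted(history), sorted(neighborhood)
    target = min(len(hist), len(neigh))
    if target == 0:
        return 0.0
    size = max(len(hist), len(neigh))
    # Weight matrix padded with zero-weight virtual nodes into a square (claim 8).
    weight = [[0.0] * size for _ in range(size)]
    for i, h in enumerate(hist):
        for j, nb in enumerate(neigh):
            if subject[h] != subject[nb]:
                continue  # different subject: weight stays zero, hop count skipped
            d = hop_distance(adj, h, nb)
            if d is not None:  # no path: weight stays zero
                weight[i][j] = 1.0 / (d + 1)
    cols = hungarian_min([[-w for w in row] for row in weight])
    total = sum(weight[i][cols[i]] for i in range(size))
    return total / target
```

In a small tree where history nodes `A1` and `B1` are matched against neighbors `A` and `A2`, the pairing A1-A (1 hop, weight 0.5) plus B1-A2 (4 hops, weight 0.2) beats the alternative (1/3 + 1/4), so the degree is 0.7 / 2 = 0.35. Dividing by the smaller set size keeps the degree in the range 0 to 1, since every matched weight is at most 1.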
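The fusion of claim 9 and the final decision of claim 1 can be sketched as follows. The numeric penalty applied to mutually exclusive candidates is not recoverable from the text, so `penalty` is a hypothetical parameter (it must exceed 1 to act as a penalty), and `eps` stands for the preset small constant. Treating a minimum cost at or below the threshold as "update" and anything above it as "isolate" is likewise an assumption about the comparison direction.

```python
from typing import Dict, Optional, Set, Tuple


def comprehensive_cost(
    text_sim: float, topo_match: float, exclusive: bool, penalty: float, eps: float = 1e-6
) -> float:
    forward = text_sim * topo_match + eps      # forward matching term
    amplifier = penalty if exclusive else 1.0  # penalty amplification term
    return amplifier / forward


def screen_question(
    scores: Dict[str, Tuple[float, float]],  # candidate ID -> (text similarity, topology matching degree)
    history_nodes: Set[str],                 # history associated node set of the question
    cost_threshold: float,
    penalty: float,                          # hypothetical penalty value (> 1)
    eps: float = 1e-6,
) -> Optional[str]:
    """Return the target node ID to update to, or None to isolate the question."""
    if not scores:
        return None
    costs = {
        nid: comprehensive_cost(sim, topo, nid in history_nodes, penalty, eps)
        for nid, (sim, topo) in scores.items()
    }
    best = min(costs, key=costs.get)
    # Assumed direction: a minimum cost at or below the threshold triggers the update.
    return best if costs[best] <= cost_threshold else None
```

Because the cost is a ratio, a candidate with strong text similarity can still lose if it already appears elsewhere on the same historical paper: with `penalty=10`, scores of (0.9, 0.5) for an excluded node cost about 22.2, while (0.6, 0.4) for a non-excluded node costs about 4.2.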
Description
Question bank knowledge point screening method and system based on text analysis
Technical Field
The invention relates to the technical field of educational information processing, and in particular to a question bank knowledge point screening method and system based on text analysis.
Background
In an online educational administration system, the knowledge graph of the teaching outline is iterated periodically, and during iteration a single knowledge point node of the old outline is often split into several finer-grained nodes of the new outline. The stock test questions mounted on the old node in the underlying question bank must therefore be redistributed to the corresponding new nodes. Conventional test question reclassification methods, however, have clear shortcomings. Conventional natural language processing models extract text features from the question stem alone; because the definition texts of the new nodes share many common words and question stems are short, the features tend to become homogenized, which biases classification. Meanwhile, the conventional verification method based on co-occurrence frequency in test paper records readily absorbs erroneous data from manual mislabeling, which contaminates the results and in turn causes problems such as overlapping or mutually exclusive test points during automatic paper assembly from the question bank. Existing text-analysis-based approaches to question bank knowledge point screening do not effectively solve the problems of text classification offset and noise interference. The technical problem to be solved is therefore how to use noisy data to correct text classification offset while avoiding interference from that noise, without re-labeling the training set.
Disclosure of Invention
To solve the technical problem of screening question bank knowledge points by using noisy data to correct text classification offset without re-labeling the training set, the invention provides a question bank knowledge point screening method and system based on text analysis, adopting the following technical solution. The invention provides a question bank knowledge point screening method based on text analysis, comprising the following steps: extracting a plurality of candidate nodes corresponding to the old node in the new outline map, and obtaining the test question text vector of the test question; generating a unique feature vector for each candidate node according to the words unique to that node, i.e. not shared with the other candidate nodes, and determining the text similarity of the test question relative to each candidate node according to the similarity between the test question text vector and each unique feature vector; extracting, from the historical test paper records, the nodes on which other test questions are mounted to construct a history associated node set, determining a topology neighborhood set of each candidate node in the new outline map, and determining a topology matching degree according to the matching relation between the history associated node set and the topology neighborhood set; generating a mutual exclusion mark according to whether each candidate node belongs to the history associated node set, and fusing the text similarity, the topology matching degree and the mutual exclusion mark to determine the comprehensive cost value of updating the test question to each candidate node; and comparing the minimum comprehensive cost value with a preset cost threshold, determining according to the comparison result whether to perform a target node update or an isolation operation on the test question, and thereby completing test question screening.
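Collecting the quantities defined in claims 5, 7 and 9, the per-candidate computation summarized above can be written compactly. In this notation, which is introduced here for illustration, q is the test question text vector, f_i the unique feature vector of candidate c_i, d(h, n) the shortest-path hop count, W_i the total association weight of the optimal matching, H the history associated node set, N_i the topology neighborhood set, ε the preset small constant and τ the preset cost threshold. The form (cos + 1)/2 of the linear map, the penalty symbol λ (its value is not stated, only that it should exceed 1 to act as a penalty) and the direction of the threshold comparison are assumptions.

```latex
S_i = \frac{\cos(\mathbf{q}, \mathbf{f}_i) + 1}{2}, \qquad
w(h, n) =
\begin{cases}
  \dfrac{1}{d(h, n) + 1} & \text{same subject and a path exists} \\
  0 & \text{otherwise}
\end{cases}

T_i =
\begin{cases}
  \dfrac{W_i}{\min(|H|, |N_i|)} & \min(|H|, |N_i|) > 0 \\
  0 & \text{otherwise}
\end{cases}, \qquad
C_i = \frac{P_i}{S_i T_i + \varepsilon}, \qquad
P_i =
\begin{cases}
  \lambda & c_i \in H \\
  1 & c_i \notin H
\end{cases}

i^{*} = \arg\min_i C_i, \qquad
\text{update to } c_{i^{*}} \text{ if } C_{i^{*}} \le \tau, \text{ otherwise isolate}
```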
The old outline map and the new outline map are structured sets of knowledge point nodes and their association relations, wherein each knowledge point node comprises at least the following attribute information: a node unique identifier, a node name, a node definition text, a parent node identifier, the identifier of the subject to which the node belongs, and the identifier of the outline version to which the node belongs. Further, extracting a plurality of candidate nodes corresponding to the old node in the new outline map and obtaining the test question text vector of the test question comprises: traversing the nodes in the old outline map, comparing them with the new outline map, and locating old nodes that exist in the old outline map but are split into a plurality of child nodes in the new outline map; determining the plurality of child nodes as the candidate nodes corresponding to the old node; acquiring the question stem text and the analysis text of the test question to be processed, and combining them to obtain the complete text of the test question to be processed; inputting the complete text into a pre-trained natural language processing model, converting the complete text into a numerical v