US-20260127973-A1 - ONLINE TEST METHOD AND APPARATUS
Abstract
This disclosure provides an online test method and apparatus. The method includes obtaining a test question library, where the test question library includes a plurality of collected test questions. The method also includes obtaining a test model based on the test question library and a policy optimization algorithm, where the test model may be used to select at least one test question from the test question library in an online test process. The test model may include a state encoder and a recommender: the state encoder is configured to obtain a difference between input test questions to generate a state code, and the recommender may be configured to output a test question based on the state code and an optimization objective, where the optimization objective includes at least one of novelty or diversity.
Inventors
- Wei Xia
- Hangyu Wang
- Ruiming Tang
- Weinan Zhang
- Yong Yu
Assignees
- HUAWEI TECHNOLOGIES CO., LTD.
Dates
- Publication Date: 2026-05-07
- Application Date: 2025-12-29
- Priority Date: 2023-06-30
Claims (20)
- 1. An online test method, comprising: obtaining a test question library, wherein the test question library comprises a plurality of test questions; and obtaining a test model based on the test question library and a policy optimization algorithm, wherein the test model is used to select at least one test question from the test question library, the test model comprises a state encoder and a recommender, the state encoder is configured to generate a state code based on a difference between the test questions, the recommender is configured to output a test question based on the state code and an optimization objective of the policy optimization algorithm, the optimization objective comprises at least one of novelty or diversity, a factor for measuring the novelty comprises an exposure rate, and a factor for measuring the diversity comprises whether there is an added knowledge point.
- 2. The method according to claim 1, wherein the optimization objective of the policy optimization algorithm comprises a reward function used to update the test model.
- 3. The method according to claim 2, wherein the reward function is a reward function in a plurality of dimensions, and the reward function in the plurality of dimensions comprises at least two of quality, diversity, and novelty.
- 4. The method according to claim 3, wherein the reward function in the plurality of dimensions comprises a quality reward, a diversity reward, and/or a novelty reward, wherein the quality reward is determined based on output accuracy of testing of the test model in the test question library, the diversity reward is determined based on whether a new knowledge point is added to a test question selected by the test model from the test question library for a current time relative to a test question selected by the test model from the test question library for at least one previous time, the novelty reward is determined based on whether the test question selected by the test model from the test question library for the current time is a hot test question, the test question selected by the test model from the test question library is classified into one of the hot test question and a non-hot test question, and a quantity of historical selection times of the hot test question is greater than a quantity of historical selection times of the non-hot test question.
- 5. The method according to claim 1, wherein the test model further comprises a relationship-aware aggregator, an input of the relationship-aware aggregator comprises at least one of a prerequisite graph or a correlation graph, and the method further comprises: obtaining, by the relationship-aware aggregator, an embedding representation of a relationship between knowledge points or an embedding representation of a relationship between a test question and a knowledge point based on the input, the prerequisite graph represents a sequential relationship between knowledge points in an input test question, and the correlation graph represents a correlation relationship between the test question and the knowledge point; and extracting, by the state encoder, an association relationship between the test question and the knowledge point based on data output by the relationship-aware aggregator, and generating the state code based on the association relationship.
- 6. The method according to claim 1, wherein the obtaining the test model based on the test question library and the policy optimization algorithm comprises: selecting the at least one test question from the test question library via the test model; and performing reinforcement learning on the test model based on an answering record of the at least one test question, to obtain the test model obtained through the reinforcement learning.
- 7. The method according to claim 6, wherein the method further comprises: obtaining the answering record of the at least one test question from the test question library; or receiving online answering data obtained by performing an operation on the at least one test question by a user, and obtaining the answering record of the at least one test question based on the online answering data.
- 8. The method according to claim 1, wherein the test question library is divided into a candidate set and a meta-question set, the test question selected by the test model is a test question in the candidate set, the test question selected by the test model is further used to train the test model, and the meta-question set is used to calculate a reward in a plurality of dimensions; and the method further comprises: performing, by the policy optimization algorithm, reinforcement learning comprised in the policy optimization algorithm, the reinforcement learning comprises a test phase and a verification phase, the candidate set is used to train the test model in the test phase, and the meta-question set is used to calculate the reward in the plurality of dimensions in the verification phase.
- 9. The method according to claim 7, wherein performing the reinforcement learning comprises: selecting, in the test phase, the at least one test question from a candidate set via the test model, and after receiving a response to the at least one test question, obtaining a capability evaluation value based on the response to the at least one test question, wherein the capability evaluation value represents a degree of correctness of answering the test question; calculating, in the verification phase, a reward in the plurality of dimensions based on the capability evaluation value and a verification set; and updating the test model based on the reward in the plurality of dimensions, to obtain the test model obtained through current iterative learning.
- 10. The method according to claim 1, wherein the obtaining the test model based on the test question library and the policy optimization algorithm comprises: performing supervised learning based on the test question library, to obtain the test model, wherein the test question library comprises label data labeled with the diversity and/or the novelty, and performing the supervised learning comprises performing supervised learning on an initial test model based on the label data, to obtain a trained test model.
- 11. The method according to claim 1, further comprising: obtaining, by the state encoder, the difference between the input test questions and at least one capability evaluation value to generate the state code.
- 12. An online test apparatus, comprising: a memory storing program instructions; and a processor, coupled to the memory, configured to execute the program instructions stored in the memory, to cause the online test apparatus to: obtain a test question library, wherein the test question library comprises a plurality of test questions; and obtain a test model based on the test question library and a policy optimization algorithm, wherein the test model is used to select at least one test question from the test question library, the test model comprises a state encoder and a recommender, the state encoder is configured to generate a state code based on a difference between the test questions, the recommender is configured to output a test question based on the state code and an optimization objective of the policy optimization algorithm, the optimization objective comprises at least one of novelty or diversity, a factor for measuring the novelty comprises an exposure rate, and a factor for measuring the diversity comprises whether there is an added knowledge point.
- 13. The online test apparatus according to claim 12, wherein the optimization objective of the policy optimization algorithm comprises a reward function used to update the test model.
- 14. The online test apparatus according to claim 13, wherein the reward function is a reward function in a plurality of dimensions, and the reward function in the plurality of dimensions comprises at least two of quality, diversity, and novelty.
- 15. The online test apparatus according to claim 14, wherein the reward function in the plurality of dimensions comprises a quality reward, a diversity reward, and/or a novelty reward, wherein the quality reward is determined based on output accuracy of testing of the test model in the test question library, the diversity reward is determined based on whether a new knowledge point is added to a test question selected by the test model from the test question library for a current time relative to a test question selected by the test model from the test question library for at least one previous time, the novelty reward is determined based on whether the test question selected by the test model from the test question library for the current time is a hot test question, the test question selected by the test model from the test question library is classified into one of the hot test question and a non-hot test question, and a quantity of historical selection times of the hot test question is greater than a quantity of historical selection times of the non-hot test question.
- 16. The online test apparatus according to claim 12, wherein the test model further comprises a relationship-aware aggregator, an input of the relationship-aware aggregator comprises at least one of a prerequisite graph or a correlation graph, and the processor is further configured to cause the online test apparatus to: obtain, by the relationship-aware aggregator, an embedding representation of a relationship between knowledge points or an embedding representation of a relationship between a test question and a knowledge point based on the input, the prerequisite graph represents a sequential relationship between knowledge points in an input test question, and the correlation graph represents a correlation relationship between the test question and the knowledge point; and extract, by the state encoder, an association relationship between the test question and the knowledge point based on data output by the relationship-aware aggregator, and generate the state code based on the association relationship.
- 17. The online test apparatus according to claim 12, wherein, to obtain the test model based on the test question library and the policy optimization algorithm, the processor is configured to cause the online test apparatus to: select the at least one test question from the test question library via the test model; and perform reinforcement learning on the test model based on an answering record of the at least one test question, to obtain the test model obtained through the reinforcement learning.
- 18. The online test apparatus according to claim 17, wherein the processor is further configured to cause the online test apparatus to: obtain the answering record of the at least one test question from the test question library; or receive online answering data obtained by performing an operation on the at least one test question by a user, and obtain the answering record of the at least one test question based on the online answering data.
- 19. The online test apparatus according to claim 12, wherein the test question library is divided into a candidate set and a meta-question set, the test question selected by the test model is a test question in the candidate set, the test question selected by the test model is further used to train the test model, and the meta-question set is used to calculate a reward in a plurality of dimensions; and the processor is further configured to cause the online test apparatus to: perform, by the policy optimization algorithm, reinforcement learning comprised in the policy optimization algorithm, the reinforcement learning comprises a test phase and a verification phase, the candidate set is used to train the test model in the test phase, and the meta-question set is used to calculate the reward in the plurality of dimensions in the verification phase.
- 20. A non-transitory computer-readable storage medium, comprising a program, wherein when the program is executed by a processor, the processor is configured to perform operations, comprising: obtaining a test question library, wherein the test question library comprises a plurality of test questions; and obtaining a test model based on the test question library and a policy optimization algorithm, wherein the test model is used to select at least one test question from the test question library, the test model comprises a state encoder and a recommender, the state encoder is configured to generate a state code based on a difference between the test questions, the recommender is configured to output a test question based on the state code and an optimization objective of the policy optimization algorithm, the optimization objective comprises at least one of novelty or diversity, a factor for measuring the novelty comprises an exposure rate, and a factor for measuring the diversity comprises whether there is an added knowledge point.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2024/090884, filed on Apr. 30, 2024, which claims priority to Chinese Patent Application No. 202310802114.7, filed on Jun. 30, 2023. The disclosures of the aforementioned applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This disclosure relates to the test field, and in particular, to an online test method and apparatus.

BACKGROUND

With the rapid development of internet technologies, people are gradually moving away from the repetitive work of the paper-and-pen test mode, and computerized adaptive testing (CAT) has been proposed as an alternative. CAT is an online test that can accurately measure the capabilities of students by continuously providing the most appropriate test questions to them. CAT has been applied in many large-scale education and examination scenarios, for example, the Test of English as a Foreign Language (TOEFL) and postgraduate entrance examinations. The basic logic of CAT is to obtain the most comprehensive capability evaluation of a testee with a minimum quantity of questions. For example, for a testee whose capability level is low, a highly difficult question cannot help evaluate that testee's capability level. Providing questions whose difficulty matches the capability level of each testee yields more accurate test results. This avoids selecting questions that deviate greatly from the testees' capabilities, avoids wasting question-answering opportunities, and avoids test-oriented rote practice. Existing online test manners can implement high-quality test question selection. However, if only the quality of the test questions is considered, the selected test questions may not reflect the actual capabilities of the testees.

SUMMARY

This disclosure provides an online test method and apparatus that perform reinforcement learning in a plurality of dimensions, so as to select test questions for a user along those dimensions and ensure that the test result better reflects the actual capability of the user.

In view of this, according to a first aspect, this disclosure provides an online test method, including: obtaining a test question library, where the test question library includes a plurality of collected test questions; and obtaining a test model based on the test question library and a policy optimization algorithm. The test model may be used to select at least one test question from the test question library in an online test process. The test model may specifically include a state encoder and a recommender: the state encoder is configured to obtain a difference between input test questions to generate a state code, and the recommender may be configured to output a test question based on the state code and an optimization objective. The optimization objective includes at least one of novelty or diversity, a factor for measuring the novelty includes an exposure rate, and a factor for measuring the diversity includes whether there is an added knowledge point.

In an embodiment of this disclosure, in an online test scenario, a test question may be selected along dimensions such as novelty and/or diversity, so that a test question with novelty and/or diversity is selected for a user and the selected test question tests the user's answering capability more comprehensively.
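To make the two-component architecture concrete, the following is a minimal sketch of how a state encoder and recommender of this kind could be composed. It is illustrative only: the use of PyTorch, the GRU-based encoder, the tensor shapes, and all class and parameter names are assumptions for the sake of the example, not the implementation described in this disclosure.

```python
import torch
import torch.nn as nn

class StateEncoder(nn.Module):
    """Encodes the questions asked so far (and the per-step capability
    evaluation values) into a fixed-size state code. Hypothetical sketch."""

    def __init__(self, num_questions: int, emb_dim: int = 64, state_dim: int = 128):
        super().__init__()
        self.question_emb = nn.Embedding(num_questions, emb_dim)
        # One extra input feature per step for the capability evaluation value.
        self.rnn = nn.GRU(emb_dim + 1, state_dim, batch_first=True)

    def forward(self, question_ids, capability_values):
        # question_ids: (batch, steps) long; capability_values: (batch, steps) float
        q = self.question_emb(question_ids)                      # (batch, steps, emb_dim)
        x = torch.cat([q, capability_values.unsqueeze(-1)], -1)  # append evaluation values
        _, h = self.rnn(x)                                       # final hidden state
        return h.squeeze(0)                                      # (batch, state_dim) state code

class Recommender(nn.Module):
    """Maps a state code to a distribution over the question library,
    from which the next test question is sampled."""

    def __init__(self, num_questions: int, state_dim: int = 128):
        super().__init__()
        self.head = nn.Linear(state_dim, num_questions)

    def forward(self, state_code, asked_mask=None):
        logits = self.head(state_code)
        if asked_mask is not None:
            # Optionally exclude questions that were already selected.
            logits = logits.masked_fill(asked_mask, float("-inf"))
        return torch.distributions.Categorical(logits=logits)
```

In this sketch the state code summarizes the differences among the questions answered so far together with the capability evaluation values, and the recommender turns that code into a policy over the test question library from which the next question is drawn.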
In addition, the novelty means that a knowledge point corresponding to the test question selected by the test model has a novel feature, and the novelty may be measured based on an exposure rate of the knowledge point. For example, in each round of question selection, the test model needs to select, for the user, a test question corresponding to a knowledge point with a low exposure rate. The diversity means that the knowledge point corresponding to the test question selected by the test model has a diverse feature, and the diversity may be measured based on coverage of the knowledge point. For example, the test model may select, for the user, a test question containing more knowledge points.

In a possible embodiment, an optimization objective of the policy optimization algorithm may include a reward function used to update the test model. Therefore, when the test model is updated, a required optimization objective may be set for learning, to obtain a test model that can output a test question matching the optimization objective.

In a possible embodiment, the foregoing reward function may include a reward function in a plurality of dimensions. The reward function in the plurality of dimensions is used to update the test model, and the plurality of dimensions may include but are not limited to at least two of quality, diversity, and novelty. The novelty indicates controlling the exposure rate of an output test question, and the diversity indicates that a plurality of output test questions contain a plurality of knowledge points. In other words, the test model may be optimized based on the reward function in the plurality of dimensions.
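As a worked illustration of such a reward in a plurality of dimensions, the sketch below combines a quality term, a diversity term, and a novelty term into one scalar reward. The weighting scheme, the hotness threshold used to separate hot from non-hot questions, and the helper structures (knowledge points per question, historical selection counts) are all illustrative assumptions rather than values fixed by this disclosure.

```python
from typing import Dict, Set

def multi_dimensional_reward(
    selected_q: int,
    covered_kps: Set[str],                 # knowledge points already covered this session
    question_kps: Dict[int, Set[str]],     # knowledge points per question (assumed structure)
    selection_counts: Dict[int, int],      # historical selection times per question
    meta_set_accuracy: float,              # accuracy of the capability estimate on the meta-question set
    hot_threshold: int = 100,              # assumed cut-off between hot and non-hot questions
    weights: tuple = (1.0, 0.5, 0.5),
) -> float:
    """Illustrative reward combining quality, diversity, and novelty."""
    w_quality, w_diversity, w_novelty = weights

    # Quality: how well the current capability estimate predicts answers
    # on the held-out meta-question set.
    r_quality = meta_set_accuracy

    # Diversity: 1 if the selected question adds a knowledge point that the
    # previously selected questions did not cover, else 0.
    r_diversity = 1.0 if question_kps[selected_q] - covered_kps else 0.0

    # Novelty: reward questions with a low exposure rate, i.e. non-hot
    # questions selected fewer times than the assumed threshold.
    r_novelty = 1.0 if selection_counts.get(selected_q, 0) < hot_threshold else 0.0

    return w_quality * r_quality + w_diversity * r_diversity + w_novelty * r_novelty
```

In a reinforcement-learning loop of the kind described in claims 8 and 9, a reward of this shape would be computed in the verification phase on the meta-question set and then used to update the recommender's policy.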