CN-121981285-A - Self-adaptive fine-tuning intelligent question reasoning and feedback method and system
Abstract
The invention discloses an adaptive fine-tuning intelligent question reasoning and feedback method and system, and relates to the technical field of intelligent text analysis. The method comprises: initializing the system; receiving user input and extracting the question text and mathematical formulas; converting the formulas into LaTeX and natural language descriptions and integrating them with the question text into structured data; generating a problem-solving thinking chain covering question understanding, step decomposition, step-by-step reasoning, and result verification; locating erroneous steps and judging their types through semantic similarity calculation, sequence alignment, and step matching degree evaluation; deriving an overall answer-correctness conclusion together with missing-step information; generating improvement suggestions and outputting them in multimodal form; and, when trigger conditions are met, adaptively fine-tuning the base model with LoRA so that it adapts to the user's question style and difficulty. The invention improves correction accuracy and personalization while balancing an interpretable reasoning process, controllable deployment cost, and multimodal input processing capability.
Inventors
- YU XIANGYU
- WANG HAORAN
- SHAO XINPING
- HAN TIANXING
Assignees
- 华院计算技术(上海)股份有限公司
Dates
- Publication Date
- 20260505
- Application Date
- 20260402
Claims (15)
- 1. An adaptive fine-tuning intelligent question reasoning and feedback method, characterized by comprising the following steps: S1, system initialization, wherein the system comprises a basic problem-solving model, an OCR model, a formula translation model, and a multi-agent system, and the multi-agent system constructs agents according to the functions of question analysis, problem-solving reasoning, answer comparison, feedback generation, and adaptive learning; S2, receiving user input and extracting the question text and mathematical formulas; S3, multimodal input processing, namely converting the mathematical formulas into LaTeX formulas and natural language descriptions and integrating them with the question text to form structured data serving as the chain-of-thought reasoning frame; S4, generating a problem-solving thinking chain by analyzing the structured data, wherein the reasoning steps of the thinking chain comprise the links of question understanding, step decomposition, step-by-step reasoning, and result verification; S5, performing semantic similarity calculation, step matching degree evaluation, and erroneous-step localization on the answer submitted by the user against the problem-solving thinking chain, comprehensively judging the results according to a preset fusion strategy to obtain an answer-correctness conclusion together with erroneous-step, error-type, and missing-step information, and generating improvement suggestions; S6, feedback output, namely outputting the structured result generated in step S5 to the user in the form of text feedback, visual feedback, and/or voice feedback; and S7, adaptive fine-tuning, namely setting trigger conditions according to data accumulation and performance change, and automatically triggering LoRA fine-tuning of the model to maintain correction quality and adapt the model to the user's question style and difficulty level.
- 2. The method according to claim 1, wherein in step S1: the basic problem-solving model is a LLaMA large language model with 3B parameters fine-tuned on Kaggle mathematical datasets; the OCR model is the Nougat-small model; the formula translation model is obtained by fine-tuning T5-small; and the multi-agent system comprises a question analysis agent, a problem-solving reasoning agent, an answer comparison agent, an adaptive learning agent, and a feedback generation agent.
- 3. The method according to claim 1, wherein in step S3 the structured data includes a question ID, the question text, a list of LaTeX formulas, a list of natural language descriptions of the formulas, an answer area, and a solving-step field, wherein the answer area is the answering region reserved in the question and corresponds to a reference to the standard answer, and the solving-step field stores the problem-solving steps generated in step S4.
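The structured record of claim 3 can be sketched as a small builder function. This is an illustrative assumption: the patent names the fields but not their keys or types, so every identifier below is hypothetical.

```python
# Hypothetical sketch of the structured data record of claim 3.
# Field names are illustrative assumptions, not taken from the patent.
def build_structured_question(question_id, text, latex_formulas, descriptions):
    """Bundle a parsed question into the structured record used as the
    chain-of-thought reasoning frame: question ID, question text, LaTeX
    formula list, natural-language descriptions, answer area, and a
    solving-step field to be populated later by step S4."""
    assert len(latex_formulas) == len(descriptions)
    return {
        "question_id": question_id,
        "question_text": text,
        "latex_formulas": list(latex_formulas),
        "formula_descriptions": list(descriptions),
        "answer_area": None,   # reference to the standard answer
        "solving_steps": [],   # filled by the reasoning step (S4)
    }

record = build_structured_question(
    "Q001",
    "Solve the quadratic equation.",
    [r"x^2 - 5x + 6 = 0"],
    ["x squared minus five x plus six equals zero"],
)
```

Keeping the solving-step field empty at construction time mirrors the claim's separation between input processing (S3) and reasoning (S4).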
- 4. The method according to claim 1, wherein in step S4 the reasoning steps of the thinking chain specifically include: question understanding, namely converting the question into a natural language description and identifying the question type and key information; step decomposition, namely decomposing the solving process into a plurality of ordered reasoning steps; step-by-step reasoning, namely using chain-of-thought prompt engineering to generate the reasoning process and intermediate result of each step; and result verification, comprising self-checking of the reasoning chain by a large language model and/or consistency verification by preset rules, so as to ensure the integrity of the reasoning chain; each reasoning step contains a reasoning process and an intermediate result.
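The four links of claim 4 amount to a chain-of-thought prompt template. A minimal sketch, assuming a dict input with `question_text` and `latex_formulas` keys; the prompt wording is invented for illustration and is not the patent's actual prompt.

```python
def build_cot_prompt(structured):
    """Assemble a chain-of-thought prompt covering the four links of
    claim 4: question understanding, step decomposition, step-by-step
    reasoning, and result verification. Wording is illustrative."""
    formulas = "\n".join(structured["latex_formulas"])
    return (
        f"Question: {structured['question_text']}\n"
        f"Formulas (LaTeX):\n{formulas}\n\n"
        "1. Understanding: restate the question; identify its type and key information.\n"
        "2. Decomposition: break the solution into ordered reasoning steps.\n"
        "3. Step-by-step reasoning: for each step, give the reasoning process "
        "and its intermediate result.\n"
        "4. Verification: self-check the reasoning chain for consistency."
    )
```

The model's response to such a prompt is what gets stored step by step in the solving-step field of the structured data.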
- 5. The method according to claim 1, wherein the comparison in step S5 specifically includes: semantic similarity calculation, namely using a semantic understanding model to compute the overall semantic similarity between the user answer and the problem-solving thinking chain as well as the step-level semantic similarity between each step of the user answer and each step of the thinking chain, forming a step-level similarity matrix; step matching degree evaluation, namely performing sequence alignment between the user's solving steps and the thinking-chain steps based on the step-level similarity matrix to obtain a step alignment relation; erroneous-step localization and error-type judgment, namely identifying erroneous and missing steps in the user answer according to the step alignment relation, the step-level similarity, and conclusion consistency, and classifying the error type of each erroneous step, the error types including at least one of calculation error, misuse of a formula or theorem, logical reasoning error, missing key step, and redundant or irrelevant step; comprehensive judgment, namely combining the semantic similarity, step matching degree, and erroneous-step localization results according to a preset fusion strategy to obtain the answer-correctness conclusion, the fusion strategy comprising hierarchical judgment and/or weighted fusion; and generating improvement suggestions, namely producing specific suggestions and hints according to the erroneous steps, error types, and missing steps.
- 6. The method of claim 5, wherein: in the semantic similarity calculation, the overall semantic similarity is the vector similarity between the user answer and the full text or conclusion of the problem-solving thinking chain, and the step-level semantic similarity is computed by encoding each step of the user answer and each step of the thinking chain with the semantic understanding model; in the step matching degree evaluation, step sequence alignment adopts dynamic-programming sequence alignment or bipartite graph matching, deriving the correspondence between user steps and standard steps from the step-level similarity matrix, and the step matching degree is the proportion of correctly aligned steps to the total number of thinking-chain steps, or a harmonic mean based on the number of correctly aligned steps; in the erroneous-step localization and error-type judgment, for aligned pairs of user and standard steps, a user step is marked as erroneous when its step-level similarity is below a set threshold or its conclusion is inconsistent; user steps not aligned to the thinking chain are marked as erroneous or redundant according to their consistency with the standard conclusion; thinking-chain steps not aligned to the user answer are marked as missing; and the error type is judged by rules and/or a classification model according to the content and context of the step; the comprehensive judgment is performed either hierarchically, by judging the answer incorrect if any calculation error, misuse of a formula or theorem, or logical reasoning error exists, judging it partially correct or incomplete if no such error exists but the step matching degree or step completeness is below a threshold, and judging it correct if no error exists and the step matching degree, step completeness, and overall semantic similarity all exceed their thresholds; or by weighted fusion, computing a composite score from the overall semantic similarity, step matching degree, step completeness, and an error penalty term, and deriving the correctness conclusion by comparing the composite score with a threshold.
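The comparison pipeline of claims 5 and 6 can be sketched end to end on toy step embeddings. This is a minimal stand-in: it uses greedy one-to-one matching instead of the dynamic-programming or bipartite matching the claim names, and all thresholds are illustrative.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def similarity_matrix(user_steps, ref_steps):
    """Step-level similarity matrix between user-answer step embeddings
    and reference thinking-chain step embeddings."""
    return [[cosine(u, r) for r in ref_steps] for u in user_steps]

def align_steps(matrix, threshold=0.7):
    """Greedy one-to-one alignment (a simple stand-in for DP sequence
    alignment / bipartite matching). Returns the alignment map and the
    indices of reference steps left unmatched, i.e. missing steps."""
    alignment, used = {}, set()
    for i, row in enumerate(matrix):
        candidates = [j for j in range(len(row)) if j not in used]
        if not candidates:
            continue
        best = max(candidates, key=lambda j: row[j])
        if row[best] >= threshold:
            alignment[i] = best
            used.add(best)
    n_ref = len(matrix[0]) if matrix else 0
    missing = [j for j in range(n_ref) if j not in used]
    return alignment, missing

def judge(has_fatal_error, match_degree, completeness, overall_sim,
          t_match=0.8, t_sim=0.75):
    """Hierarchical judgment per claim 6: any fatal error -> incorrect;
    low matching degree or completeness -> partially correct; otherwise
    correct once overall semantic similarity also clears its threshold."""
    if has_fatal_error:
        return "incorrect"
    if match_degree < t_match or completeness < t_match:
        return "partially correct"
    return "correct" if overall_sim >= t_sim else "partially correct"
```

With two user steps matching two of three reference steps, the matching degree is 2/3, below the 0.8 threshold, so the hierarchical rule yields "partially correct" even without an explicit error.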
- 7. The method according to claim 1, wherein the trigger conditions in step S7 include: time triggering, namely periodic triggering with a weekly, biweekly, or monthly period; data-volume triggering, when the accumulated user operation data reaches a preset threshold of 500-2000 items; performance triggering, when the decline in correction accuracy exceeds a preset threshold of 5%-10%, the accuracy being measured on a validation sample set with standard answers and/or manual spot-check labels; and active user triggering, namely manual triggering by the user.
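The four trigger conditions of claim 7 reduce to a simple disjunction. A minimal sketch, using the biweekly period, the 1000-item data threshold (within the claimed 500-2000 range), and the 5% accuracy-drop floor as example values:

```python
from datetime import date

def should_trigger_finetune(last_run, today, new_samples, accuracy_drop,
                            user_requested, period_days=14,
                            data_threshold=1000, drop_threshold=0.05):
    """Evaluate the trigger conditions of step S7: manual user trigger,
    periodic (here biweekly) trigger, accumulated data volume, and an
    accuracy decline exceeding the threshold. Any one condition suffices."""
    if user_requested:
        return True
    if (today - last_run).days >= period_days:
        return True
    if new_samples >= data_threshold:
        return True
    if accuracy_drop >= drop_threshold:
        return True
    return False
```

In a deployment this check would run on a schedule; when it returns true, the adaptive learning agent launches the LoRA fine-tuning pipeline of claim 9.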
- 8. The method of claim 1, wherein the data volume trigger threshold is 1000.
- 9. The method according to claim 1, wherein the model fine-tuning process in step S7 comprises: data collection, namely collecting the questions uploaded by users and the corresponding solving steps, and forming a training dataset through manual verification/auditing by experts; data cleaning, namely cleaning the training data to remove duplicate, abnormal, and low-quality data; data labeling, namely labeling the question type, difficulty level, and solving steps of the training data; LoRA fine-tuning, namely fine-tuning the large language model in the basic problem-solving model with the low-rank adaptation method, the LoRA rank r being 8-32; model evaluation, namely evaluating the performance of the fine-tuned model on a validation set, the evaluation metrics including correction accuracy, reasoning quality, and response speed; and model deployment, namely deploying the new model to replace the old one when the fine-tuned model meets the preset performance requirements.
- 10. The method of claim 9, wherein the LoRA rank r is 16 and the LoRA fine-tuning runs on a single consumer-grade GPU.
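Why a low rank keeps fine-tuning within a single consumer GPU: LoRA freezes the base weight matrix W and trains only two small factors, merging them as W' = W + (α/r)·B·A. A dependency-free toy sketch of that merge (real fine-tuning would use a training framework; this only illustrates the reparameterization):

```python
def matmul(A, B):
    """Naive matrix multiply for small illustrative matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def lora_merge(W, A, B, alpha, r):
    """Merge a LoRA adapter into a frozen weight matrix:
    W' = W + (alpha / r) * (B @ A), where A is r x d_in and B is
    d_out x r. Trainable parameters scale with the rank r (8-32 per
    claim 9, r = 16 per claim 10) instead of with the full weight size."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

For a 3B-parameter model, rank-16 adapters on the attention projections train well under 1% of the weights, which is what makes the single-GPU constraint of claim 10 plausible.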
- 11. An adaptive fine-tuning intelligent question reasoning and feedback system, wherein the modules and agents in the system operate cooperatively to implement the steps of the method of any one of claims 1-10, the system architecture being: an application layer, comprising user-facing application modules; an agent layer, comprising a question analysis agent, an adaptive learning agent, a problem-solving reasoning agent, an answer comparison agent, and a feedback generation agent; a data processing layer, comprising a data collection module, a data cleaning module, and a data labeling module; a model layer, comprising a LoRA fine-tuning module, a model evaluation module, the OCR model, the formula translation model, the large language model, and the semantic understanding model; a hardware layer, comprising a single consumer-grade GPU; and a storage layer, comprising a training data storage module, a model parameter storage module, and a user data storage module.
- 12. The system of claim 11, wherein the functional layout of the system is: the application layer provides the user interaction interface and uniformly invokes each agent following the overall workflow; the agent layer recognizes and understands questions, generates problem-solving reasoning steps, compares student answers with standard answers, generates correction feedback, and adaptively learns to continuously optimize system performance; the data processing layer collects training data, cleans and preprocesses the data, and labels training samples; the model layer fine-tunes model parameters, evaluates model performance, recognizes handwritten/printed questions, converts mathematical formulas and text, performs reasoning and generation, analyzes answer semantics, and generates voice feedback; the hardware layer provides a single consumer-grade GPU computing resource supporting model inference and LoRA fine-tuning; and the storage layer stores the cleaned and labeled data, the fine-tuned model parameters, and the user operation and correction records.
- 13. A multi-agent collaboration architecture for implementing the steps of the method of any one of claims 1-10, comprising the following agents: a question analysis agent, which recognizes the question type, extracts key information, and converts the question into structured data through the OCR model and natural language understanding; a problem-solving reasoning agent, which generates a problem-solving thinking chain, performs step-by-step reasoning, and provides a solving approach through the fine-tuned large language model and chain-of-thought prompt engineering; an answer comparison agent, which compares the user's answer with the standard approach, identifies erroneous steps, and evaluates correctness through semantic similarity calculation, rule matching, and model judgment; an adaptive learning agent, which monitors data accumulation, triggers fine-tuning tasks, and optimizes model parameters through LoRA fine-tuning, data management, and model version control; and a feedback generation agent, which performs rendering, typesetting, speech synthesis, and interface presentation of the structured result of step S5.
- 14. The multi-agent collaboration architecture of claim 13, wherein the multi-agent collaboration comprises: a collaborative pipeline of user input, question analysis agent, problem-solving reasoning agent, answer comparison agent, feedback generation agent, and output result, the output result being fed into the adaptive learning agent upon triggering; and the agents communicating in a structured data format, the output of each agent containing a confidence score, supporting feedback loops between agents.
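The inter-agent contract of claim 14 (structured messages carrying a confidence score, enabling feedback loops) can be sketched minimally. Message keys and the confidence cutoff are illustrative assumptions, not taken from the patent.

```python
def make_agent_message(sender, payload, confidence):
    """Illustrative structured message exchanged between agents: each
    agent's output carries a confidence score in [0, 1] so downstream
    agents can decide whether to accept it or loop back."""
    assert 0.0 <= confidence <= 1.0
    return {"sender": sender, "payload": payload, "confidence": confidence}

def needs_feedback_loop(message, min_confidence=0.6):
    """A downstream agent routes low-confidence output back upstream for
    re-analysis, realizing the feedback loops of claim 14."""
    return message["confidence"] < min_confidence
```

For example, if the answer comparison agent emits a verdict with confidence 0.4, the pipeline would return the item to the question analysis or reasoning agent rather than passing it to feedback generation.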
- 15. A computer program product, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1-10.
Description
Self-adaptive fine-tuning intelligent question reasoning and feedback method and system
Technical Field
The invention relates to the technical field of intelligent text analysis, in particular to an intelligent question reasoning and feedback method and system with adaptive fine-tuning.
Background
With the rapid development of artificial intelligence technology, intelligent education auxiliary systems are increasingly applied in teaching scenarios. Traditional homework correction methods fall into two main types: rule-based correction systems, which correct through preset rules and templates but lack flexibility and learning capability; and correction by a pre-trained large language model, whose parameters are fixed and cannot be adaptively optimized for the usage scenario and the user's question style. Existing intelligent correction systems have the following technical problems. Static model: most existing systems adopt a fixed pre-trained model that cannot be dynamically adjusted to the question types, difficulty distribution, and user habits of the actual usage scenario, so correction accuracy is difficult to improve continuously. Lack of reasoning process: existing systems usually give answers directly or make simple right/wrong judgments, cannot provide detailed solving ideas and reasoning processes, and do not help cultivate students' problem-solving ability. Cost-performance contradiction: large-scale models perform excellently but have high deployment costs and require substantial computing resources, while small-scale models are cheap but limited in performance and struggle to meet practical requirements.
Existing systems also have limited recognition and understanding capability when processing homework containing complex content such as mathematical formulas and charts, and in particular handle LaTeX-format formulas inaccurately. Furthermore, they cannot be personalized according to users' usage data (such as preferred question styles and difficulty levels), making the educational concept of "teaching students in accordance with their aptitude" difficult to realize. Therefore, there is an urgent need for an intelligent homework correction method that can adapt automatically, optimize continuously, provide a derivation process, and remain cost-effective.
Disclosure of Invention
Aiming at the technical defects of the prior art, namely the static model, the lack of a reasoning process, the contradiction between cost and performance, insufficient multimodal processing, and the lack of personalized adaptation, the invention provides an intelligent homework correction method that can adapt automatically, optimize continuously, provide a detailed reasoning process, and has low deployment cost.
In a first aspect, the present invention provides an adaptive fine-tuning intelligent question reasoning and feedback method and system, comprising the following steps: S1, system initialization, wherein the system comprises a basic problem-solving model, an OCR model, a formula translation model, and a multi-agent system, and the multi-agent system constructs agents according to the functions of question analysis, problem-solving reasoning, answer comparison, feedback generation, and adaptive learning; S2, receiving user input and extracting the question text and mathematical formulas; S3, multimodal input processing, namely converting the mathematical formulas into LaTeX formulas and natural language descriptions and integrating them with the question text to form structured data serving as the chain-of-thought reasoning frame; S4, generating a problem-solving thinking chain by analyzing the structured data, wherein the reasoning steps of the thinking chain comprise the links of question understanding, step decomposition, step-by-step reasoning, and result verification; S5, performing semantic similarity calculation, step matching degree evaluation, and erroneous-step localization on the answer submitted by the user against the problem-solving thinking chain, comprehensively judging the results according to a preset fusion strategy to obtain an answer-correctness conclusion together with erroneous-step, error-type, and missing-step information, and thereby generating improvement suggestions; S6, feedback output, namely outputting the structured result generated in step S5 to the user in the form of text feedback, visual feedback, and/or voice feedback, wherein step S6 no longer judges the answer; and S7, adaptive fine-tuning, namely setting trigger conditions according to data accumulation and performance change, and automatically triggering LoRA fine-tuning of the model to maintain correction quality and adapt the model to the user's question style and difficulty level.