CN-121998083-A - Mathematical and logical problem processing method, system, equipment and medium
Abstract
The invention provides a mathematical and logical problem processing method, system, equipment and medium, relating to the technical fields of artificial intelligence and natural language processing. The method comprises: acquiring a plurality of pieces of mathematical theorem information corresponding to a target problem, wherein the target problem is a mathematical or logical problem expressed in natural language; generating a word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information; inputting the word vector sequence into a custom attention layer and repeatedly executing a multi-round attention process to obtain a context representation carrying mathematical theorem knowledge, wherein the custom attention layer is formed based on a cross-attention module; and generating a target answer corresponding to the target problem based on the context representation. The invention lets theorem content participate in the model's reasoning process, providing directional guidance and context enhancement for the current problem, effectively mitigating knowledge hallucination and improving reasoning consistency and answer reliability.
Inventors
- ZHU BINGKE
- LI YU
- CHEN YINGYING
- WANG JINQIAO
Assignees
- 中国科学院自动化研究所
Dates
- Publication Date
- 20260508
- Application Date
- 20251230
Claims (10)
- 1. A method for processing mathematical and logical problems, comprising: acquiring a plurality of pieces of mathematical theorem information corresponding to a target problem, wherein the target problem is a mathematical or logical problem in natural language form; generating a word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information; inputting the word vector sequence into a custom attention layer and repeatedly executing a multi-round attention process to obtain a context representation carrying mathematical theorem knowledge, wherein the custom attention layer is formed based on a cross-attention module; and generating a target answer corresponding to the target problem based on the context representation.
- 2. The method for processing mathematical and logical problems of claim 1, wherein acquiring the plurality of pieces of mathematical theorem information corresponding to the target problem comprises: performing, based on a large language model or a vector retrieval module, semantic similarity matching between the target problem and theorem knowledge in a structured theorem knowledge base to obtain Top-K theorem knowledge; and taking the Top-K theorem knowledge as the plurality of pieces of mathematical theorem information corresponding to the target problem.
- 3. The method for processing mathematical and logical problems of claim 1, wherein generating the word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information comprises: tokenizing, based on a preset tokenizer, the target problem and the plurality of pieces of mathematical theorem information respectively, to obtain a tokenized target problem and tokenized mathematical theorem information; embedding, based on an embedding layer of a large language model, the tokenized target problem and the tokenized mathematical theorem information respectively, to obtain a problem vector sequence and a theorem vector sequence; and generating the word vector sequence from the problem vector sequence and the theorem vector sequence.
- 4. The method for processing mathematical and logical problems of claim 3, wherein the custom attention layer comprises a plurality of attention sub-modules connected in series, each attention sub-module comprising at least one self-attention head and at least one cross-attention head that compute in parallel, the attention sub-modules being used to model the interaction between the problem vector sequence and the theorem vector sequence.
- 5. The method of claim 4, wherein inputting the word vector sequence into the custom attention layer and repeatedly executing the multi-round attention process to obtain the context representation carrying mathematical theorem knowledge comprises: taking the problem vector sequence as the Query and the theorem vector sequence as the Key and the Value, inputting them into the custom attention layer, and repeatedly executing the multi-round attention process to perform attention weighting, thereby obtaining the context representation.
- 6. The method for processing mathematical and logical problems of claim 5, wherein repeatedly executing the multi-round attention process to perform attention weighting specifically comprises: after the attention sub-module of the previous layer performs attention weighting, outputting the context representation of the previous layer; inputting the context representation of the previous layer into the attention sub-module of the current layer for attention weighting, the attention sub-module of the current layer outputting the context representation of the current layer; and after determining that all attention sub-modules in the custom attention layer have completed attention weighting, taking the context representation output by the attention sub-module of the last layer in the custom attention layer as the context representation.
- 7. The method of claim 1, wherein generating the target answer corresponding to the target problem based on the context representation comprises: computing, based on the context representation, a vocabulary prediction probability distribution at each inference time step; and generating the target answer step by step in the inference stage based on a preset search strategy and the vocabulary prediction probability distribution.
- 8. A mathematical and logical problem processing system, comprising: a retrieval module for acquiring a plurality of pieces of mathematical theorem information corresponding to a target problem, wherein the target problem is a mathematical or logical problem in natural language form; a vector sequence generating module for generating a word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information; a multi-round attention module for inputting the word vector sequence into a custom attention layer and repeatedly executing a multi-round attention process to obtain a context representation carrying mathematical theorem knowledge, wherein the custom attention layer is formed based on a cross-attention module; and an answer generation module for generating a target answer corresponding to the target problem based on the context representation.
- 9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the mathematical and logical problem processing method of any one of claims 1 to 7.
- 10. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements the mathematical and logical problem processing method of any of claims 1 to 7.
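The Top-K retrieval step of claim 2 can be illustrated with a minimal sketch. It assumes the target problem and each theorem have already been embedded as vectors and uses cosine similarity as the semantic similarity measure; the claim does not fix a particular measure. The helper names `cosine` and `top_k_theorems`, the toy knowledge base and all vector values are hypothetical and are not taken from the patent.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)

def top_k_theorems(question_vec, knowledge_base, k):
    """Return the names of the k theorems most similar to the question.

    knowledge_base maps a theorem name to its (hypothetical) embedding.
    """
    scored = [(cosine(question_vec, vec), name)
              for name, vec in knowledge_base.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:k]]

# Toy hand-made embeddings; a real system would obtain them from an encoder.
knowledge_base = {
    "Pythagorean theorem": [0.9, 0.1, 0.0],
    "Law of cosines": [0.7, 0.3, 0.1],
    "Modus ponens": [0.0, 0.1, 0.9],
}
question_vec = [1.0, 0.2, 0.0]  # stand-in for an embedded geometry question

selected = top_k_theorems(question_vec, knowledge_base, k=2)
print(selected)  # ['Pythagorean theorem', 'Law of cosines']
```

The selected theorems would then play the role of the plurality of pieces of mathematical theorem information passed on to the later steps.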
Description
Mathematical and logical problem processing method, system, equipment and medium
Technical Field
The invention relates to the technical fields of artificial intelligence and natural language processing, and in particular to a method, system, equipment and medium for processing mathematical and logical problems.
Background
As large language models (LLMs) have made remarkable progress in tasks such as general question answering and text generation, they are increasingly applied to mathematics and logical reasoning. However, conventional end-to-end trained large language models often show several deficiencies when facing problems that rely on systems of axioms and theorems. Specifically: knowledge hallucination, in which the model, lacking explicit support, fabricates a derivation or answer that appears reasonable but is actually wrong; context forgetting and broken reasoning, in which the model has difficulty continuously tracking key premises or theorems during long-chain derivation, so that errors in intermediate steps accumulate; lack of interpretability, in which the answers output by the model cannot be traced back to specific mathematical principles or theorems, which hinders application in high-reliability scenarios such as education and scientific research; and weak generalization, in which the model performs unstably on question types or combinations not seen in the training data, particularly in tasks that require several theorems to be invoked for joint reasoning.
Existing solutions mostly adopt strategies such as prompt engineering, chain-of-thought or fine-tuning. Although these have some effect, they still do not fundamentally solve the problem of having the large language model actively use, and be dynamically guided by, structured knowledge. Some studies attempt to introduce external knowledge bases, such as mathematical formula databases, but these often serve only as static retrieval sources and are not deeply fused with the internal reasoning mechanism of the large language model.
Accordingly, there is a need for a method, system, equipment and medium for processing mathematical and logical problems.
Disclosure of Invention
In view of the problems existing in the prior art, the invention provides a method, system, equipment and medium for processing mathematical and logical problems.
The invention provides a mathematical and logical problem processing method, comprising: acquiring a plurality of pieces of mathematical theorem information corresponding to a target problem, wherein the target problem is a mathematical or logical problem in natural language form; generating a word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information; inputting the word vector sequence into a custom attention layer and repeatedly executing a multi-round attention process to obtain a context representation carrying mathematical theorem knowledge, wherein the custom attention layer is formed based on a cross-attention module; and generating a target answer corresponding to the target problem based on the context representation.
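The multi-round attention step can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: it uses plain scaled dot-product cross-attention with the problem vectors as Query and the theorem vectors as Key and Value (as recited in claim 5), and it omits the learned projection matrices, the multiple heads and the parallel self-attention heads of claim 4. The function names and toy vectors are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key rows."""
    scale = math.sqrt(len(keys[0]))
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # Each output row is the attention-weighted sum of the value rows.
        output.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return output

def multi_round_attention(problem_vecs, theorem_vecs, rounds):
    """Chain `rounds` cross-attention sub-modules in series.

    The context output by one round is the Query of the next round; the
    theorem vectors serve as Key and Value in every round.
    """
    context = problem_vecs
    for _ in range(rounds):
        context = cross_attention(context, theorem_vecs, theorem_vecs)
    return context

# Toy 2-dimensional vectors standing in for embedded tokens.
problem_vecs = [[1.0, 0.0], [0.0, 1.0]]
theorem_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

context = multi_round_attention(problem_vecs, theorem_vecs, rounds=3)
print(context)
```

In this simplified form the output of the last round plays the role of the context representation; a trained model would additionally apply learned weights in every sub-module.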
According to the method for processing mathematical and logical problems provided by the invention, acquiring the plurality of pieces of mathematical theorem information corresponding to the target problem comprises: performing, based on a large language model or a vector retrieval module, semantic similarity matching between the target problem and theorem knowledge in a structured theorem knowledge base to obtain Top-K theorem knowledge; and taking the Top-K theorem knowledge as the plurality of pieces of mathematical theorem information corresponding to the target problem.
According to the method for processing mathematical and logical problems provided by the invention, generating the word vector sequence based on the target problem and the plurality of pieces of mathematical theorem information comprises: tokenizing, based on a preset tokenizer, the target problem and the plurality of pieces of mathematical theorem information respectively, to obtain a tokenized target problem and tokenized mathematical theorem information; embedding, based on an embedding layer of a large language model, the tokenized target problem and the tokenized mathematical theorem information respectively, to obtain a problem vector sequence and a theorem vector sequence; and generating the word vector sequence from the problem vector sequence and the theorem vector sequence. According to the mathematical and logical proble