CN-122021907-A - Translation-based multilingual reasoning method, device, apparatus, storage medium, and program product


Abstract

The invention provides a translation-based multilingual reasoning method, apparatus, device, storage medium, and program product, relating to the technical field of translation reasoning. The method comprises: obtaining a large language model that comprises a cross-language reasoning layer, an autonomous translation layer, and a target language reasoning layer; processing a target English question through the cross-language reasoning layer, based on a language prompt, to generate a first target language answer; translating the target English question through the autonomous translation layer to generate a self-translated question; reasoning over the self-translated question through the target language reasoning layer to generate a second target language answer; and training the large language model based on the first target language answer and the second target language answer to obtain a target large language model, which is then used to generate multilingual reasoning results. The invention can improve the multilingual reasoning capability of large language models.

Inventors

HUANG SHUJIAN
LIU JUNXIAO
LI YIXIAO
HUANG XIN
HAN XUE
FENG JUNLAN

Assignees

中移九天人工智能科技(北京)有限公司
中国移动通信集团有限公司
中国移动通信集团江苏有限公司
南京大学

Dates

Publication Date: 20260512
Application Date: 20260130

Claims (10)

1. A translation-based multilingual reasoning method, comprising: acquiring a large language model, wherein the large language model comprises a cross-language reasoning layer, an autonomous translation layer, and a target language reasoning layer; processing a target English question through the cross-language reasoning layer, based on a language prompt, to generate a first target language answer; translating the target English question through the autonomous translation layer to generate a self-translated question; reasoning over the self-translated question through the target language reasoning layer to generate a second target language answer; and training the large language model based on the first target language answer and the second target language answer to obtain a target large language model, so as to generate a multilingual reasoning result based on the target large language model.
2. The translation-based multilingual reasoning method according to claim 1, wherein, before the target English question is processed through the cross-language reasoning layer to generate the first target language answer, the method further comprises: randomly selecting an initial English question from an English reasoning data set; sampling answers to the initial English question through the cross-language reasoning layer based on the language prompt, and outputting a plurality of initial answers corresponding to the initial English question, wherein the language prompt guides the cross-language reasoning layer to answer English questions in the target language; and calculating, based on a standard answer, an answer accuracy for the initial English question from the plurality of initial answers, and determining an initial English question whose answer accuracy is greater than a predefined threshold to be a target English question.
3. The method of claim 1, wherein training the large language model based on the first target language answer and the second target language answer to obtain a target large language model comprises: calculating, based on a standard answer, a first-stage reward value from the first target language answer, and a second-stage reward value and a third-stage reward value from the second target language answer; calculating a comprehensive reward value based on the first-stage reward value, the second-stage reward value, and the third-stage reward value; and training the large language model based on the comprehensive reward value to obtain the target large language model.
4. The translation-based multilingual reasoning method of claim 3, wherein calculating, based on the standard answer, the first-stage reward value from the first target language answer, and the second-stage reward value and the third-stage reward value from the second target language answer, comprises: calculating the first-stage reward value, based on the standard answer, from a first format reward, a first language-consistency reward, a first accuracy reward, and a first repetition reward of the first target language answer; calculating the second-stage reward value, based on the standard answer, from the answer accuracy of the second target language answer; and calculating the third-stage reward value, based on the standard answer, from a second format reward, a second language-consistency reward, a second accuracy reward, and a second repetition reward of the second target language answer.
5. The method of claim 3, wherein training the large language model based on the comprehensive reward value to obtain a target large language model comprises: performing, based on the comprehensive reward value, reinforcement learning training on the large language model through a Group Relative Policy Optimization (GRPO) algorithm to obtain the target large language model.
6. The method of claim 5, wherein performing reinforcement learning training on the large language model through the Group Relative Policy Optimization algorithm, based on the comprehensive reward value, to obtain the target large language model comprises: performing policy sampling on the target English question multiple times through the large language model to generate a plurality of different output groups, wherein each output group comprises a first target language answer and a second target language answer; calculating the comprehensive reward value corresponding to the first target language answer and the second target language answer in each output group; calculating a relative advantage score for each output group based on the comprehensive reward values of all output groups for the same target English question; and calculating a policy gradient based on the relative advantage scores, and iteratively updating model parameters of the large language model based on the policy gradient until the model converges, to obtain the target large language model.
7. A translation-based multilingual reasoning apparatus, comprising: an acquisition module configured to acquire a large language model, wherein the large language model comprises a cross-language reasoning layer, an autonomous translation layer, and a target language reasoning layer; a generation module configured to process a target English question through the cross-language reasoning layer, based on a language prompt, to generate a first target language answer; the generation module being further configured to translate the target English question through the autonomous translation layer to generate a self-translated question; the generation module being further configured to reason over the self-translated question through the target language reasoning layer to generate a second target language answer; and a training module configured to train the large language model based on the first target language answer and the second target language answer to obtain a target large language model, so as to generate a multilingual reasoning result based on the target large language model.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and running on the processor, characterized in that the processor implements the translation-based multilingual reasoning method as claimed in any of claims 1 to 6 when the computer program is executed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor implements a translation-based multilingual reasoning method as claimed in any of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements a translation-based multilingual reasoning method as claimed in any one of claims 1 to 6.
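
The reward aggregation of claim 3 and the relative advantage scores of claim 6 can be illustrated with a minimal Python sketch. The claims do not state how the three stage rewards are combined, so the weighted sum and the default weights in `composite_reward` are assumptions, as are the function names and the sample reward values. The advantage computation uses the mean and standard deviation normalization commonly associated with Group Relative Policy Optimization; whether the invention uses exactly this form is not stated in the claims.

```python
from statistics import mean, pstdev


def composite_reward(stage1, stage2, stage3, weights=(1.0, 1.0, 1.0)):
    """Combine the three stage rewards into one comprehensive reward.

    Assumption: a weighted sum; the claims do not specify the formula.
    """
    w1, w2, w3 = weights
    return w1 * stage1 + w2 * stage2 + w3 * stage3


def group_relative_advantages(rewards, eps=1e-8):
    """Score each output group relative to the other groups sampled
    for the same target English question."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every group has the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical (stage 1, stage 2, stage 3) rewards for four output groups
# sampled for a single target English question.
group = [(1.0, 1.0, 1.0), (1.0, 0.0, 0.5), (0.0, 0.0, 0.0), (0.5, 1.0, 0.5)]
rewards = [composite_reward(*g) for g in group]
advantages = group_relative_advantages(rewards)
```

Output groups with an above-average comprehensive reward receive a positive advantage and those below average a negative one, so the policy gradient pushes the model toward the better groups without requiring a separate value model.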

Description

Translation-based multilingual reasoning method, device, apparatus, storage medium, and program product

Technical Field

The present invention relates to the field of translation reasoning, and in particular to a translation-based multilingual reasoning method, apparatus, device, storage medium, and program product.

Background

With the rapid development of Large Language Models (LLMs), reasoning models with long chain-of-thought (Long CoT) capability (Long Reasoning Models, LRMs, such as OpenAI o1 and DeepSeek-R1 [2]) have made significant breakthroughs in complex tasks such as mathematics, coding, and logical reasoning. However, because English dominates the pre-training data, the performance of these models on non-English tasks (especially in low-resource languages) tends to lag significantly behind their performance in English. They are also prone to "language inconsistency": the input is non-English, but the model thinks and answers primarily in English, which degrades performance. This severely limits the multilingual reasoning capability of large language models.

Disclosure of Invention

The invention provides a translation-based multilingual reasoning method, apparatus, device, storage medium, and program product, which address the defect in the prior art that the multilingual reasoning capability of large language models is limited by the lack of training data in low-resource languages.
The invention provides a translation-based multilingual reasoning method, comprising: acquiring a large language model, wherein the large language model comprises a cross-language reasoning layer, an autonomous translation layer, and a target language reasoning layer; processing a target English question through the cross-language reasoning layer, based on a language prompt, to generate a first target language answer; translating the target English question through the autonomous translation layer to generate a self-translated question; reasoning over the self-translated question through the target language reasoning layer to generate a second target language answer; and training the large language model based on the first target language answer and the second target language answer to obtain a target large language model, so as to generate a multilingual reasoning result based on the target large language model.

According to the translation-based multilingual reasoning method provided by the invention, before the target English question is processed through the cross-language reasoning layer to generate the first target language answer, the method further comprises: randomly selecting an initial English question from an English reasoning data set; sampling answers to the initial English question through the cross-language reasoning layer based on the language prompt, and outputting a plurality of initial answers corresponding to the initial English question, wherein the language prompt guides the cross-language reasoning layer to answer English questions in the target language; and calculating, based on a standard answer, an answer accuracy for the initial English question from the plurality of initial answers, and determining an initial English question whose answer accuracy is greater than a predefined threshold to be a target English question.
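
The question-selection step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the invention's implementation: `sample_answers` is a hypothetical stand-in for sampling from the cross-language reasoning layer, answers are compared to the standard answer by exact string match, the threshold value is arbitrary, and the random choice of initial questions is omitted so that the example stays deterministic.

```python
from typing import Callable, Dict, List


def answer_accuracy(answers: List[str], reference: str) -> float:
    """Fraction of sampled answers that match the standard answer.

    Assumption: exact string match after stripping whitespace.
    """
    if not answers:
        return 0.0
    correct = sum(1 for a in answers if a.strip() == reference.strip())
    return correct / len(answers)


def select_target_questions(
    dataset: List[Dict[str, str]],
    sample_answers: Callable[[str], List[str]],
    threshold: float = 0.5,
) -> List[Dict[str, str]]:
    """Keep the questions whose answer accuracy is strictly greater
    than the predefined threshold."""
    selected = []
    for item in dataset:
        answers = sample_answers(item["question"])
        if answer_accuracy(answers, item["answer"]) > threshold:
            selected.append(item)
    return selected


# Hypothetical sampled answers standing in for model outputs.
fake_samples = {
    "2+2?": ["4", "4", "5", "4"],  # accuracy 0.75
    "3*3?": ["6", "9", "8", "7"],  # accuracy 0.25
}
dataset = [
    {"question": "2+2?", "answer": "4"},
    {"question": "3*3?", "answer": "9"},
]
selected = select_target_questions(dataset, lambda q: fake_samples[q])
```

With the threshold of 0.5 assumed here, only the first question is kept as a target English question, because three of its four sampled answers match the standard answer.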
According to the translation-based multilingual reasoning method provided by the invention, training the large language model based on the first target language answer and the second target language answer to obtain a target large language model comprises: calculating, based on a standard answer, a first-stage reward value from the first target language answer, and a second-stage reward value and a third-stage reward value from the second target language answer; calculating a comprehensive reward value based on the first-stage reward value, the second-stage reward value, and the third-stage reward value; and training the large language model based on the comprehensive reward value to obtain the target large language model.

According to the translation-based multilingual reasoning method provided by the invention, calculating, based on the standard answer, the first-stage reward value from the first target language answer, and the second-stage reward value and the third-stage reward value from the second target language answer, comprises: calculating the first-stage reward value, based on the standard answer, from a first format reward, a first language-consistency reward, a first accuracy reward, and a first repetition reward of the first target language answer; calculating the second-stage reward value, based on the standard answer, from the answer accuracy of the second target language answer; and calculating a third-stage reward