CN-122021672-A - Machine translation method, device, equipment, storage medium and program product based on reinforcement learning

Abstract

The invention provides a machine translation method, device, equipment, storage medium and program product based on reinforcement learning, and relates to the technical field of translation. The method comprises: obtaining a large language model that comprises a translation task layer and a post-editing task layer; processing first tuple data through the translation task layer to obtain a candidate translation text set, and determining a first dominance value according to the candidate translation text set; processing second tuple data through the post-editing task layer to obtain a post-editing text set, and determining a second dominance value according to the post-editing text set; and performing reinforcement learning training on the large language model based on the first dominance value and the second dominance value to obtain a target large language model, so that a machine translation result can be generated based on the target large language model. The invention can effectively improve machine translation quality.

Inventors

HUANG SHUJIAN
Shen Yunzhi
HUANG XIN
HAN XUE
FENG JUNLAN

Assignees

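China Mobile Jiutian Artificial Intelligence Technology (Beijing) Co., Ltd. (中移九天人工智能科技(北京)有限公司)
China Mobile Communications Group Co., Ltd. (中国移动通信集团有限公司)
China Mobile Communications Group Jiangsu Co., Ltd. (中国移动通信集团江苏有限公司)
Nanjing University (南京大学)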

Dates

Publication Date: 2026-05-12
Application Date: 2026-01-30

Claims (10)

1. A machine translation method based on reinforcement learning, comprising: acquiring a large language model, wherein the large language model comprises a translation task layer and a post-editing task layer; processing first tuple data through the translation task layer to obtain a candidate translation text set, and determining a first dominance value according to the candidate translation text set; processing second tuple data through the post-editing task layer to obtain a post-editing text set, and determining a second dominance value according to the post-editing text set; and performing reinforcement learning training on the large language model based on the first dominance value and the second dominance value to obtain a target large language model, so as to generate a machine translation result based on the target large language model.
2. The reinforcement learning-based machine translation method of claim 1, wherein the first tuple data comprises a first source language text and a first reference translation, and wherein the processing first tuple data through the translation task layer to obtain a candidate translation text set, and determining a first dominance value according to the candidate translation text set comprises: processing the first source language text through the translation task layer to obtain the candidate translation text set; and determining the first dominance value through a quality estimation model based on the first reference translation and the candidate translation text set.
3. The reinforcement learning-based machine translation method of claim 2, wherein the determining the first dominance value through a quality estimation model based on the first reference translation and the candidate translation text set comprises: scoring each candidate translation text in the candidate translation text set through the quality estimation model based on the first reference translation to obtain a first scoring result set; determining a first average value and a first standard deviation according to the first scoring result set; and determining a first dominance value for each candidate translation text in the candidate translation text set based on the first scoring result set, the first average value, and the first standard deviation.
4. The reinforcement learning-based machine translation method of claim 1, wherein the second tuple data comprises a second source language text, a second language text to be edited, and a second reference translation, and wherein the processing second tuple data through the post-editing task layer to obtain a post-editing text set, and determining a second dominance value according to the post-editing text set comprises: processing the second language text to be edited through the post-editing task layer based on the second reference translation to obtain the post-editing text set; scoring the second language text to be edited through a quality estimation model to obtain a base scoring result; and determining the second dominance value through the quality estimation model based on the base scoring result, the second source language text, the second reference translation, and the post-editing text set.
5. The reinforcement learning-based machine translation method of claim 4, wherein the determining the second dominance value through the quality estimation model based on the base scoring result, the second source language text, the second reference translation, and the post-editing text set comprises: scoring each post-editing text in the post-editing text set through the quality estimation model based on the second reference translation to obtain a second scoring result set; determining a second average value and a second standard deviation according to the base scoring result and the second scoring result set; and determining a second dominance value for each post-editing text in the post-editing text set based on the base scoring result, the second scoring result set, the second average value, and the second standard deviation.
6. The reinforcement learning-based machine translation method of claim 1, wherein the performing reinforcement learning training on the large language model based on the first dominance value and the second dominance value to obtain a target large language model comprises: based on the first dominance value or the second dominance value, maximizing a policy optimization objective function and iteratively updating model parameters of the large language model until the model converges, to obtain the target large language model.
7. A reinforcement learning-based machine translation device, comprising: an acquisition module, used for acquiring a large language model, wherein the large language model comprises a translation task layer and a post-editing task layer; a determining module, used for processing first tuple data through the translation task layer to obtain a candidate translation text set, and determining a first dominance value according to the candidate translation text set; the determining module being further used for processing second tuple data through the post-editing task layer to obtain a post-editing text set, and determining a second dominance value according to the post-editing text set; and a training module, used for performing reinforcement learning training on the large language model based on the first dominance value and the second dominance value to obtain a target large language model, so as to generate a machine translation result based on the target large language model.
8. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the reinforcement learning-based machine translation method of any one of claims 1 to 6.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the reinforcement learning-based machine translation method of any one of claims 1 to 6.
10. A computer program product comprising a computer program which, when executed by a processor, implements the reinforcement learning-based machine translation method of any one of claims 1 to 6.
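As an illustration only, and not part of the claims, the group statistics recited in claims 3 and 5 could be computed roughly as follows. The claims name the inputs (scoring results, average value, standard deviation, base scoring result) but do not give a formula, so this sketch assumes the z-score normalization commonly used in GRPO-style training, where the "dominance value" would correspond to what the reinforcement learning literature usually calls an advantage. The function names, the use of the population standard deviation, the `EPS` guard, and the choice to pool the base score with the post-edit scores are all assumptions.

```python
from statistics import mean, pstdev
from typing import List

EPS = 1e-8  # assumption: avoids division by zero when all scores are identical


def first_dominance_values(scores: List[float]) -> List[float]:
    """Claim 3 sketch: normalize each candidate score by the group mean and std.

    `scores` stands in for the first scoring result set produced by a
    quality estimation model (not modeled here).
    """
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population standard deviation
    return [(s - mu) / (sigma + EPS) for s in scores]


def second_dominance_values(base_score: float, scores: List[float]) -> List[float]:
    """Claim 5 sketch: the base score of the text to be edited enters the statistics.

    Assumption: the base score is pooled with the post-edit scores when
    computing the mean and std, so a post-edit that scores below the
    unedited text tends to receive a negative value.
    """
    group = [base_score] + scores
    mu = mean(group)
    sigma = pstdev(group)
    return [(s - mu) / (sigma + EPS) for s in scores]
```

Under these assumptions, a candidate scoring above its group average receives a positive value and one scoring below receives a negative value; a group with identical scores yields all zeros.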

Description

Machine translation method, device, equipment, storage medium and program product based on reinforcement learning

Technical Field

The present invention relates to the field of translation technologies, and in particular to a machine translation method, apparatus, device, storage medium, and program product based on reinforcement learning.

Background

With the rapid development of Large Language Models (LLMs), post-training techniques based on reinforcement learning have become a key means of improving machine translation quality. Currently, mainstream reinforcement learning approaches (such as the GRPO training paradigm with rewards based on automatic evaluation metrics) focus mainly on optimizing a single translation task, i.e. direct conversion from the source language to the target language. However, an optimization approach that focuses only on a single translation task tends to limit the model to a surface-level mapping between the source language and the target language, which makes it difficult for the model to learn more focused expressions and thereby affects the final machine translation quality.

Disclosure of Invention

The invention provides a mixed training technique for machine translation and post-editing based on reinforcement learning, which addresses the problem in the prior art that training a large language model with conventional reinforcement learning limits its machine translation quality.
The invention provides a machine translation method based on reinforcement learning, which comprises the following steps: acquiring a large language model, wherein the large language model comprises a translation task layer and a post-editing task layer; processing first tuple data through the translation task layer to obtain a candidate translation text set, and determining a first dominance value according to the candidate translation text set; processing second tuple data through the post-editing task layer to obtain a post-editing text set, and determining a second dominance value according to the post-editing text set; and performing reinforcement learning training on the large language model based on the first dominance value and the second dominance value to obtain a target large language model, so as to generate a machine translation result based on the target large language model.

According to the machine translation method based on reinforcement learning provided by the invention, the first tuple data comprises a first source language text and a first reference translation, and the processing of the first tuple data through the translation task layer to obtain a candidate translation text set and the determining of a first dominance value according to the candidate translation text set comprise the following steps: processing the first source language text through the translation task layer to obtain the candidate translation text set; and determining the first dominance value through a quality estimation model based on the first reference translation and the candidate translation text set.
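A minimal sketch of the translation branch described above may make the data flow concrete: a first tuple is expanded into a candidate translation text set, and each candidate is scored against the first reference translation. Everything named here is hypothetical. The `FirstTuple` class, the `Sampler` and `Scorer` interfaces, and the toy stand-ins for the translation task layer and the quality estimation model are assumptions for illustration; the source does not specify these interfaces, and a real system would call the large language model and a trained quality estimation model instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class FirstTuple:
    """First tuple data: a source language text and its reference translation."""
    source_text: str
    reference: str


# Hypothetical interfaces; the source does not define them.
Sampler = Callable[[str, int], List[str]]  # (source text, k) -> k candidates
Scorer = Callable[[str, str], float]       # (candidate, reference) -> score


def score_candidate_set(item: FirstTuple, sample: Sampler, score: Scorer,
                        k: int = 4) -> List[Tuple[str, float]]:
    """Generate k candidate translations and score each against the reference."""
    candidates = sample(item.source_text, k)
    return [(c, score(c, item.reference)) for c in candidates]


def toy_sampler(source_text: str, k: int) -> List[str]:
    """Stand-in for the translation task layer: returns fixed candidates."""
    pool = ["the cat sat", "a cat sits", "cat the sat on", "dog"]
    return pool[:k]


def toy_scorer(candidate: str, reference: str) -> float:
    """Stand-in for a quality estimation model: word overlap with the reference."""
    cand, ref = set(candidate.split()), set(reference.split())
    return len(cand & ref) / max(len(ref), 1)
```

A call such as `score_candidate_set(FirstTuple("le chat s'est assis", "the cat sat"), toy_sampler, toy_scorer, k=3)` returns the candidates paired with their scores, which is the scoring result set from which the first dominance value would then be derived.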
According to the machine translation method based on reinforcement learning provided by the invention, the determining of the first dominance value through a quality estimation model based on the first reference translation and the candidate translation text set comprises the following steps: scoring each candidate translation text in the candidate translation text set through the quality estimation model based on the first reference translation to obtain a first scoring result set; determining a first average value and a first standard deviation according to the first scoring result set; and determining a first dominance value for each candidate translation text in the candidate translation text set based on the first scoring result set, the first average value, and the first standard deviation.

According to the machine translation method based on reinforcement learning provided by the invention, the second tuple data comprises a second source language text, a second language text to be edited, and a second reference translation, and the processing of the second tuple data through the post-editing task layer to obtain a post-editing text set and the determining of a second dominance value according to the post-editing text set comprise the following steps: processing the second language text to be edited through the post-editing task layer based on the second reference translation to obtain the post-editing text set; scoring the second language text to be edited through a quality estimation model to obtain a base scoring result; and determining the second dominance value through the quality estimation model based on the base scoring result, the second source language text, the second reference translation, and the post-editing text set.

The machine translation method based on reinforcement learning provided by the invention is characterized in that the determining a second