CN-115358245-B - Translation model training method and device, translation method and device
Abstract
The invention provides a training method and device for a translation model, and a translation method and device. The training method comprises: obtaining training data of M translation tasks and a preset translation model; for any two of the M translation tasks, determining a target affinity between the two tasks from their training data and the preset translation model; clustering the translation tasks based on the target affinities to obtain a plurality of target task clusters; and training a target translation model for each target task cluster on the training data of the translation tasks in that cluster. Because the target task clusters are formed from the pairwise target affinities, and a separate target translation model is trained for each cluster, the accuracy of multilingual translation by the target translation models can be improved.
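The abstract does not say how a target affinity is measured. Claim 1 below defines it through a lookahead update:

1. Record the first task's loss at time t.
2. Update the model on the second task.
3. Measure the first task's loss again at time t+1.
4. Take the difference as the affinity at time t.

The following Python sketch illustrates that procedure. It rests on several assumptions that the patent does not state:

- A one-weight linear regressor (`ToyModel`, hypothetical) stands in for the preset translation model.
- The lookahead update is a single plain gradient step applied to a copy of the model.
- Between measurements, the shared model is trained on the union of both tasks.
- "Convergence" means the joint loss changes by less than `tol`.
- The per-step affinities are aggregated by averaging.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy task: (x, y) pairs to be fit by y ≈ w * x.
Task = Sequence[Tuple[float, float]]


class ToyModel:
    """Hypothetical stand-in for the preset translation model: one weight w."""

    def __init__(self, w: float = 0.0) -> None:
        self.w = w

    def loss(self, data: Task) -> float:
        # Mean squared error of the model on one task's training data.
        return sum((self.w * x - y) ** 2 for x, y in data) / len(data)

    def gradient_step(self, data: Task, lr: float) -> "ToyModel":
        # Return a NEW model after one gradient-descent step on `data`.
        grad = sum(2 * (self.w * x - y) * x for x, y in data) / len(data)
        return ToyModel(self.w - lr * grad)


def target_affinity(first: Task, second: Task, lr: float = 0.05,
                    tol: float = 1e-10, max_steps: int = 10_000) -> float:
    """Average effect of an update on `second` on the loss of `first`."""
    model = ToyModel()
    joint = list(first) + list(second)
    per_step: List[float] = []
    prev_joint_loss = float("inf")
    for _ in range(max_steps):
        loss_t = model.loss(first)                              # first loss value (time t)
        loss_t1 = model.gradient_step(second, lr).loss(first)   # second loss value (time t+1)
        per_step.append(loss_t - loss_t1)                       # affinity at time t
        model = model.gradient_step(joint, lr)                  # train on both tasks
        joint_loss = model.loss(joint)
        if abs(prev_joint_loss - joint_loss) < tol:             # assumed convergence test
            break
        prev_joint_loss = joint_loss
    return sum(per_step) / len(per_step)                        # assumed aggregation: mean


def affinity_table(tasks: Dict[str, Task]) -> Dict[Tuple[str, str], float]:
    # (i, j) -> affinity of task j onto task i, for every ordered pair i != j.
    return {(i, j): target_affinity(tasks[i], tasks[j])
            for i in tasks for j in tasks if i != j}


tasks = {
    "A": [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],     # y = 2x
    "B": [(1.0, 2.2), (2.0, 4.4), (3.0, 6.6)],     # y = 2.2x, similar to A
    "C": [(1.0, -2.0), (2.0, -4.0), (3.0, -6.0)],  # y = -2x, conflicts with A
}
table = affinity_table(tasks)
```

Here `table[("A", "B")]` comes out positive, because updating on B lowers A's loss on average. `table[("A", "C")]` comes out negative. In the patent's terms, B would therefore qualify as an auxiliary task for A and C would not. The affinity is directional: `(i, j)` and `(j, i)` are computed separately.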
Inventors
- ZHANG JIAJUN
- WANG QIAN
- ZONG CHENGQING
Assignees
- Institute of Automation, Chinese Academy of Sciences (中国科学院自动化研究所)
Dates
- Publication Date
- 20260508
- Application Date
- 20220706
Claims (10)
- 1. A method for training a translation model, comprising: acquiring training data of M translation tasks and a preset translation model, wherein M is greater than 1; for any two translation tasks among the M translation tasks, determining a target affinity between the two translation tasks based on their training data and the preset translation model, wherein the target affinity is obtained by training the preset translation model on the two translation tasks; clustering the translation tasks based on the target affinities to obtain a plurality of target task clusters; and training a target translation model corresponding to each target task cluster based on the training data of the translation tasks in that target task cluster; wherein determining the target affinity between the two translation tasks comprises: inputting training data of a first translation task of the two translation tasks into the preset translation model to obtain a first loss function value of the preset translation model for the first translation task at time t; updating the preset translation model based on training data of a second translation task of the two translation tasks, and inputting the training data of the first translation task into the updated preset translation model to obtain a second loss function value of the updated preset translation model for the first translation task at time t+1; calculating the difference between the first loss function value and the second loss function value to determine the affinity between the first translation task and the second translation task at time t; and, when the preset translation model reaches a convergence condition, determining the target affinity based on the affinities between the first translation task and the second translation task at all time steps.
- 2. The method for training a translation model according to claim 1, wherein clustering the translation tasks based on the target affinities to obtain a plurality of target task clusters comprises: determining, based on the target affinities, auxiliary translation tasks corresponding to each translation task, wherein the target affinity between an auxiliary translation task and its translation task is greater than 0; grouping each translation task with its corresponding auxiliary translation tasks to obtain M initial task clusters; and clustering the initial task clusters to obtain the plurality of target task clusters.
- 3. The method for training a translation model according to claim 2, wherein clustering the initial task clusters to obtain a plurality of target task clusters comprises: for any two initial task clusters among the M initial task clusters, calculating an overall affinity between the two initial task clusters; and iteratively merging the two initial task clusters with the greatest overall affinity until a preset stopping condition is reached, thereby obtaining the plurality of target task clusters.
- 4. The method for training a translation model according to claim 1, wherein training the target translation model corresponding to each target task cluster based on training data of each translation task in each target task cluster comprises: sampling the training data of each translation task and auxiliary translation task in each target task cluster according to a preset sampling rule to obtain sampled training data for each target task cluster; and inputting the sampled training data of each target task cluster into the corresponding target translation model for training until a training stopping condition is reached.
- 5. A translation method, comprising: acquiring at least one text to be translated; and inputting the text to be translated into a target translation model corresponding to the text to be translated to obtain a multilingual translation result output by the target translation model; wherein the target translation model is trained by the training method of the translation model according to any one of claims 1 to 4.
- 6. A training device for a translation model, comprising: a first acquisition module, configured to acquire training data of M translation tasks and a preset translation model, wherein M is greater than 1; a determining module, configured to determine, for any two translation tasks among the M translation tasks, a target affinity between the two translation tasks based on their training data and the preset translation model, wherein the target affinity is obtained by training the preset translation model on the two translation tasks; a processing module, configured to cluster the translation tasks based on the target affinities to obtain a plurality of target task clusters; and a training module, configured to train a target translation model corresponding to each target task cluster based on the training data of the translation tasks in that target task cluster; wherein determining the target affinity between the two translation tasks comprises: inputting training data of a first translation task of the two translation tasks into the preset translation model to obtain a first loss function value of the preset translation model for the first translation task at time t; updating the preset translation model based on training data of a second translation task of the two translation tasks, and inputting the training data of the first translation task into the updated preset translation model to obtain a second loss function value of the updated preset translation model for the first translation task at time t+1; calculating the difference between the first loss function value and the second loss function value to determine the affinity between the first translation task and the second translation task at time t; and, when the preset translation model reaches a convergence condition, determining the target affinity based on the affinities between the first translation task and the second translation task at all time steps.
- 7. A translation device, comprising: a second acquisition module, configured to acquire at least one text to be translated; and an input module, configured to input the text to be translated into a target translation model corresponding to the text to be translated to obtain a multilingual translation result output by the target translation model; wherein the target translation model is trained by the training method of the translation model according to any one of claims 1 to 4.
- 8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the training method of the translation model according to any one of claims 1 to 4 or the translation method according to claim 5 when executing the program.
- 9. A non-transitory computer readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements a method of training a translation model according to any of claims 1 to 4, or a translation method according to claim 5.
- 10. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements a method for training a translation model according to any one of claims 1 to 4 or a translation method according to claim 5.
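Claims 2 to 4 leave three things unspecified: the overall affinity between clusters, the stopping condition, and the sampling rule. The sketch below fills each gap with a labeled assumption so the control flow can be followed:

- The overall affinity of two clusters is the mean affinity over ordered task pairs drawn from them.
- Merging stops when the best overall affinity is no longer positive, or when `num_clusters` clusters remain (`num_clusters` is a hypothetical parameter).
- The per-cluster sampling rule is temperature-based sampling, a common choice in multilingual NMT, with probabilities proportional to (n_i / N)^(1/T).

Affinities follow the convention of claim 1: `aff[(i, j)]` is the affinity of task j onto task i.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Affinity = Dict[Tuple[str, str], float]  # (i, j) -> affinity of task j onto task i
Cluster = FrozenSet[str]


def initial_clusters(tasks: List[str], aff: Affinity) -> List[Cluster]:
    # Claim 2: each task plus its auxiliary tasks (target affinity > 0) -> M clusters.
    return [frozenset([i] + [j for j in tasks if j != i and aff[(i, j)] > 0])
            for i in tasks]


def overall_affinity(c1: Cluster, c2: Cluster, aff: Affinity) -> float:
    # Assumption: mean affinity over ordered pairs (a, b), a in c1, b in c2, a != b.
    values = [aff[(a, b)] for a in c1 for b in c2 if a != b]
    return sum(values) / len(values) if values else float("-inf")


def target_clusters(tasks: List[str], aff: Affinity,
                    num_clusters: int = 1) -> List[Cluster]:
    # Claim 3: repeatedly merge the two clusters with the greatest overall affinity.
    clusters = initial_clusters(tasks, aff)
    while len(clusters) > max(num_clusters, 1):
        (p, q), best = max(
            (((a, b), overall_affinity(clusters[a], clusters[b], aff))
             for a, b in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1])
        if best <= 0:  # assumed stopping condition: no merge with positive affinity
            break
        merged = clusters[p] | clusters[q]
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)] + [merged]
    return clusters


def sampling_probs(sizes: Dict[str, int], temperature: float = 5.0) -> Dict[str, float]:
    # Assumed sampling rule for claim 4: p_i proportional to (n_i / N) ** (1 / T).
    total = sum(sizes.values())
    scaled = {k: (n / total) ** (1.0 / temperature) for k, n in sizes.items()}
    norm = sum(scaled.values())
    return {k: v / norm for k, v in scaled.items()}


# Toy affinities: A and B help each other, C and D help each other, cross pairs conflict.
names = ["A", "B", "C", "D"]
aff = {(i, j): (1.0 if {i, j} in ({"A", "B"}, {"C", "D"}) else -1.0)
       for i in names for j in names if i != j}
clusters = target_clusters(names, aff)
probs = sampling_probs({"A": 900, "B": 100})
```

With these toy affinities, the two mutually helpful pairs {A, B} and {C, D} end up as separate target clusters. Merging them would require a positive overall affinity, and theirs is negative. With T = 5, the smaller task B is sampled about 39% of the time instead of its 10% share of the data. In this sketch, initial clusters may overlap, so a task can appear in more than one target cluster.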
Description
Translation model training method and device, translation method and device
Technical Field
The present invention relates to the field of natural language processing, and in particular to a training method and device for a translation model, and a translation method and device.
Background
With the development of deep learning, machine translation has become a central problem in natural language processing. Machine translation uses computers to automatically translate text in one language into text in another. Neural machine translation achieves marked improvements over traditional statistical machine translation. By extending the source-side or target-side language data, a neural machine translation model can directly model the mapping from one set of languages to another. It can then handle translation among multiple languages within a single model, which makes deployment more efficient.
In the related art, a common practice in multilingual translation research is to build a single unified multilingual translation model that supports all languages to be translated. However, there are thousands of languages in the world. Building one unified model for all of them causes interference between different languages, which degrades the translation quality of the multilingual model. How to improve the accuracy of multilingual translation is therefore an important problem for the industry.
Disclosure of Invention
To address these problems in the prior art, the invention provides a training method and device for a translation model, and a translation method and device.
The invention provides a training method for a translation model, comprising: acquiring training data of M translation tasks and a preset translation model, wherein M is greater than 1; for any two translation tasks among the M translation tasks, determining a target affinity between the two translation tasks based on their training data and the preset translation model, wherein the target affinity is obtained by training the preset model on the two translation tasks; clustering the translation tasks based on the target affinities to obtain a plurality of target task clusters; and training a target translation model corresponding to each target task cluster based on the training data of the translation tasks in that target task cluster.
Optionally, determining the target affinity between any two translation tasks of the M translation tasks based on their training data and the preset translation model comprises: inputting training data of a first translation task of the two translation tasks into the preset translation model to obtain a first loss function value of the preset translation model for the first translation task at time t; updating the preset translation model based on training data of a second translation task of the two translation tasks, and inputting the training data of the first translation task into the updated preset translation model to obtain a second loss function value of the updated preset translation model for the first translation task at time t+1; calculating the difference between the first loss function value and the second loss function value to determine the affinity between the first translation task and the second translation task at time t; and, when the preset translation model reaches a convergence condition, determining the target affinity based on the affinities between the first translation task and the second translation task at all time steps.
Optionally, clustering the translation tasks based on the target affinities to obtain a plurality of target task clusters comprises: determining, based on the target affinities, auxiliary translation tasks corresponding to each translation task, wherein the target affinity between an auxiliary translation task and its translation task is greater than 0; grouping each translation task with its corresponding auxiliary translation tasks to obtain M initial task clusters; and clustering the initial task clusters to obtain the plurality of target task clusters.
Optionally, clustering the initial task clusters to obtain a plurality of target task clusters comprises: for any two initial task clusters among the M initial task clusters, calculating the overall affinity between the two initial task clusters; and iteratively merging the two initial task clusters with the greatest overall affinity until a p