CN-116151277-B - Machine translation method and device based on imitation learning and terminal equipment

Abstract

The invention belongs to the technical field of machine translation and provides a machine translation method, device, and terminal equipment based on imitation learning. The method comprises: inputting the original text to be translated into a non-autoregressive model to obtain a first initial translation; when the first initial translation passes through the deletion classification decoder, deleting the wrong words in the first initial translation and outputting a second initial translation; when the second initial translation passes through the insertion classification decoder, filling the positions of missing words in the second initial translation with placeholders and outputting a third initial translation; and when the third initial translation passes through the word classification decoder, replacing the placeholders with the correct words and outputting the final translation. The invention combines the advantages of the autoregressive and non-autoregressive models, ensuring translation quality while enabling parallel acceleration.

Inventors

  • Zhu Xianchao
  • Han Bing
  • Huo Zhanyu

Assignees

  • 四川语言桥信息技术有限公司

Dates

Publication Date
2026-05-08
Application Date
2023-02-16

Claims (8)

  1. A machine translation method based on imitation learning, comprising: inputting the original text to be translated into a non-autoregressive model to obtain a first initial translation; when the first initial translation passes through the deletion classification decoder, deleting the wrong words in the first initial translation and outputting a second initial translation; when the second initial translation passes through the insertion classification decoder, filling the positions of missing words in the second initial translation with placeholders and outputting a third initial translation; and when the third initial translation passes through the word classification decoder, replacing the placeholders with the correct words and outputting a final translation; wherein, before inputting the original text to be translated into the non-autoregressive model to obtain the first initial translation, the method comprises: acquiring translated text and a reference translation corresponding to the translated text; and training a basic translation model using the translated text and the reference translation as training data to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder; wherein training the basic translation model using the translated text and the reference translation as training data to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder comprises: training based on a non-autoregressive model to obtain a baseline translation model; translating the original text to be translated using the baseline translation model to generate a baseline translation; processing the baseline translation and the reference translation with an edit distance algorithm to construct classifier data for the deletion classification decoder, classifier data for the insertion classification decoder, and classifier data for the word classification decoder; and training the baseline translation model with an imitation learning algorithm using the classifier data of the deletion classification decoder, the classifier data of the insertion classification decoder, and the classifier data of the word classification decoder, to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder.
  2. The machine translation method based on imitation learning according to claim 1, wherein, when the translated text and the reference translation are used as training data, the translated text and the reference translation are converted into an original translation pair.
  3. The machine translation method based on imitation learning of claim 1, wherein processing said baseline translation and said reference translation with an edit distance algorithm comprises: calculating the shortest distance from the baseline translation to the reference translation using the edit distance algorithm.
  4. The machine translation method based on imitation learning of claim 3, wherein constructing the classifier data of the deletion classification decoder and the classifier data of the insertion classification decoder comprises: constructing the classifier data of the deletion classification decoder and the classifier data of the insertion classification decoder according to the shortest distance.
  5. The machine translation method based on imitation learning of claim 3, wherein constructing the classifier data of the word classification decoder comprises: constructing the classifier data of the word classification decoder according to the word substitutions from the baseline translation to the reference translation under the shortest distance.
  6. A machine translation device based on imitation learning, comprising: an initial translation module, configured to input the original text to be translated into a non-autoregressive model to obtain a first initial translation; a first decoding module, configured to delete the wrong words in the first initial translation when the first initial translation passes through the deletion classification decoder, and to output a second initial translation; a second decoding module, configured to fill the positions of missing words in the second initial translation with placeholders when the second initial translation passes through the insertion classification decoder, and to output a third initial translation; and a third decoding module, configured to replace the placeholders with the correct words when the third initial translation passes through the word classification decoder, and to output a final translation; wherein, before the original text to be translated is input into the non-autoregressive model to obtain the first initial translation: translated text and a reference translation corresponding to the translated text are acquired; and a basic translation model is trained using the translated text and the reference translation as training data to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder; wherein training the basic translation model using the translated text and the reference translation as training data comprises: training based on a non-autoregressive model to obtain a baseline translation model; translating the original text to be translated using the baseline translation model to generate a baseline translation; processing the baseline translation and the reference translation with an edit distance algorithm to construct classifier data for the deletion classification decoder, classifier data for the insertion classification decoder, and classifier data for the word classification decoder; and training the baseline translation model with an imitation learning algorithm using the classifier data of the deletion classification decoder, the classifier data of the insertion classification decoder, and the classifier data of the word classification decoder, to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder.
  7. A terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the machine translation method based on imitation learning as claimed in any one of claims 1 to 5.
  8. A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the machine translation method based on imitation learning as claimed in any one of claims 1 to 5.

Description

Machine translation method and device based on imitation learning and terminal equipment

Technical Field

The present invention relates to the field of machine translation technologies, and in particular to a machine translation method, apparatus, and terminal device based on imitation learning.

Background

Currently, machine translation mostly adopts neural sequence-to-sequence generation models working in an autoregressive mode. An autoregressive model based on a recurrent neural network (RNN) or on the multi-head attention network Transformer generates the translation word by word during decoding; that is, to generate the next word, the previously generated word must be fed back into the model. The translation thus generated has high accuracy, but because each word depends on the word before it, the autoregressive model (AT model) decodes at a slower rate. To alleviate the slow decoding of the autoregressive model, a non-autoregressive model (NAT model) is generally used to generate the whole translation in parallel; although this improves decoding speed, the quality of the generated translation is inferior to that of autoregressive translation.

Disclosure of Invention

The invention mainly aims to provide a machine translation method, device, and terminal equipment based on imitation learning, solving the problem that conventional machine translation cannot combine the translation efficiency of a non-autoregressive model with the translation accuracy of an autoregressive model.
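The AT/NAT trade-off described above can be illustrated with a minimal sketch. The `step` callables below are stand-ins for a model's prediction function, not the patent's models; all names are illustrative.

```python
def at_decode(length, step):
    """Autoregressive (AT) decoding: each token is predicted from the
    previously generated token, so the loop is inherently sequential."""
    out = ["<bos>"]
    for _ in range(length):
        out.append(step(out[-1]))  # must wait for the last token
    return out[1:]

def nat_decode(length, step):
    """Non-autoregressive (NAT) decoding: every position is predicted
    independently of the other outputs, so all calls could run in parallel."""
    return [step(i) for i in range(length)]  # no cross-token dependency
```

The sequential dependency in `at_decode` is what makes AT models accurate but slow; `nat_decode` removes it at the cost of translation quality, which is the gap the invention addresses.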
To achieve the above object, according to a first aspect of the present invention, there is provided a machine translation method based on imitation learning, comprising: inputting the original text to be translated into a non-autoregressive model to obtain a first initial translation; when the first initial translation passes through the deletion classification decoder, deleting the wrong words in the first initial translation and outputting a second initial translation; when the second initial translation passes through the insertion classification decoder, filling the positions of missing words in the second initial translation with placeholders and outputting a third initial translation; and when the third initial translation passes through the word classification decoder, replacing the placeholders with the correct words and outputting the final translation. With reference to the first aspect of the present invention, in a first embodiment, before inputting the original text to be translated into the non-autoregressive model to obtain the first initial translation, the method includes: acquiring translated text and a reference translation corresponding to the translated text; and training a basic translation model using the translated text and the reference translation as training data to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder. With reference to the first embodiment of the first aspect, in a second embodiment, when the translated text and the reference translation are used as training data, the translated text and the reference translation are converted into an original translation pair.
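The three refinement passes described above (delete wrong words, insert placeholders for missing words, fill placeholders) can be sketched as follows. The three callables stand in for the trained deletion, insertion, and word classification decoders; this is a hypothetical illustration of the control flow, not the patent's implementation.

```python
PLH = "<plh>"  # placeholder token inserted for missing words

def refine(first_draft, should_delete, insertions_after, fill_word):
    """Apply the three classification passes to a first initial translation."""
    # Pass 1: the deletion classifier drops words flagged as wrong.
    second = [w for w in first_draft if not should_delete(w)]
    # Pass 2: the insertion classifier adds placeholders at missing positions.
    third = []
    for i, w in enumerate(second):
        third.append(w)
        third.extend([PLH] * insertions_after(i, second))
    # Pass 3: the word classifier replaces each placeholder with a word.
    return [fill_word(i, third) if w == PLH else w
            for i, w in enumerate(third)]
```

For example, given the draft `["the", "dog", "sat"]` where "dog" is wrong and a word is missing after "the", the passes yield `["the", "sat"]`, then `["the", "<plh>", "sat"]`, then a fully filled translation.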
With reference to the first embodiment of the first aspect of the present invention, in a third embodiment, training a basic translation model using the translated text and the reference translation as training data to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder includes: training based on a non-autoregressive model to obtain a baseline translation model; translating the original text to be translated using the baseline translation model to generate a baseline translation; processing the baseline translation and the reference translation with an edit distance algorithm to construct classifier data for the deletion classification decoder, classifier data for the insertion classification decoder, and classifier data for the word classification decoder; and training the baseline translation model with an imitation learning algorithm using this classifier data, to obtain the deletion classification decoder, the insertion classification decoder, and the word classification decoder. With reference to the third embodiment of the first aspect of the present invention, in a fourth embodiment, processing the baseline translation and the reference translation with an edit distance algorithm includes: calculating the shortest distance from the baseline translation to the reference translation using the edit distance algorithm.
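The edit-distance step above can be sketched with a standard Levenshtein alignment: the delete, insert, and substitute operations that turn the baseline translation into the reference translation are exactly the kind of signal the three classifiers would be trained on. This is a generic edit-distance implementation for illustration, not the patent's exact data-construction procedure.

```python
def edit_ops(src, ref):
    """Return (distance, ops), where ops are ('del', w), ('ins', w), or
    ('sub', a, b) steps turning the token list src into ref."""
    n, m = len(src), len(ref)
    # dp[i][j] = minimum edits to turn src[:i] into ref[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i
    for j in range(1, m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == ref[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # delete src[i-1]
                           dp[i][j - 1] + 1,        # insert ref[j-1]
                           dp[i - 1][j - 1] + cost) # keep or substitute
    # Backtrack to recover the operation sequence.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and src[i - 1] == ref[j - 1] \
                and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                      # tokens match: keep
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(("sub", src[i - 1], ref[j - 1])); i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", src[i - 1])); i -= 1
        else:
            ops.append(("ins", ref[j - 1])); j -= 1
    return dp[n][m], list(reversed(ops))
```

Under this reading, the `del` operations supply labels for the deletion classification decoder, the `ins` operations for the insertion classification decoder, and the `sub` operations (word substitutions along the shortest path) for the word classification decoder, matching claims 4 and 5.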