CN-121981131-A - Machine translation method, related device, electronic equipment and storage medium

Abstract

The application discloses a machine translation method, a related device, electronic equipment and a storage medium. The machine translation method comprises: performing element recognition on a source language text to obtain text elements in the source language text and the element categories of those text elements; determining, based on whether the element category of a text element is translatable, whether to replace the text element with a placeholder, thereby obtaining a text to be translated; constructing a first mapping relation based on the placeholders and the text elements they replace; performing a translation operation on the text to be translated to obtain a first target language text; and performing text restoration on the placeholders in the first target language text based on the first mapping relation to obtain a second target language text. With this scheme, the rich text format structure can be effectively protected during machine translation.

Inventors

  • ZHANG LINFANG
  • LONG MINGKANG
  • LIU KUN
  • JIAN SHENGQI
  • WU JIANGZHAO

Assignees

  • 合肥智能语音创新发展有限公司

Dates

Publication Date
2026-05-05
Application Date
2025-12-31

Claims (16)

  1. A machine translation method, comprising: performing element recognition on a source language text to obtain text elements in the source language text and element categories of the text elements; determining, based on whether the element category of a text element is characterized as translatable, whether to replace the text element with a placeholder, so as to obtain a text to be translated, wherein different text elements that need to be replaced use different placeholders; constructing a first mapping relation based on the placeholders and the text elements they replace, and performing a translation operation on the text to be translated to obtain a first target language text, wherein the placeholders are constrained to remain unchanged during the translation operation; and performing text restoration on the placeholders in the first target language text based on the first mapping relation to obtain a second target language text.
  2. The method of claim 1, wherein the translation operation is implemented by a large translation model, and wherein, before the translation operation is performed on the text to be translated, the method further comprises: performing term matching on the text to be translated against a professional term library to obtain original terms in the text to be translated, standard translations of the original terms, and term definitions of the standard translations in their applicable fields; obtaining a relevance score of the standard translation based on the semantic similarity between the original term in the context of the text to be translated and the term definition; determining, based on the relevance score of the standard translation, a constraint mode for the original term during the translation operation; and performing rule injection into the large-model instruction of the large translation model based on the constraint mode of the original term.
  3. The method of claim 1, wherein the translation operation is implemented by a large translation model, and wherein, before the translation operation is performed on the text to be translated, the method further comprises: performing style extraction on the text to be translated to obtain target style features, wherein the target style features comprise at least one of passive voice frequency, long sentence density, professional term density, and syntactic complexity; extracting, from a reference translation text, sentence pairs that conform to the target style features as reference sentence pairs, wherein each sentence pair comprises a source language sentence and a target language sentence; and performing example injection into the large-model instruction of the large translation model based on the reference sentence pairs.
  4. The method of claim 1, wherein, before the translation operation is performed on the text to be translated to obtain the first target language text, the method further comprises: performing a complexity measurement on the text to be translated to obtain a predicted complexity of performing the translation operation on the text to be translated; and selecting, from a plurality of preset translation models, a preset translation model matching the predicted complexity as the large translation model for performing the translation operation, wherein the plurality of preset translation models comprise a first translation model, a second translation model, and a third translation model, the parameter count of the first translation model is lower than that of the second translation model, and the parameter count of the second translation model is lower than that of the third translation model.
  5. The method of claim 4, wherein the predicted complexity is obtained by weighting a plurality of complexity metrics, the complexity metrics including at least one of length complexity, term rarity, syntactic complexity, and context dependency; and/or each preset translation model has its own applicable complexity interval, and when the predicted complexity falls within the complexity interval of a preset translation model, that preset translation model is used as the large translation model for performing the translation operation.
  6. The method of claim 4, wherein, in the case that the first translation model is selected as the large translation model, the method further comprises: in response to the translation operation failing, switching to the second translation model as the large translation model to re-perform the translation operation; and/or, in the case that the second translation model is selected as the large translation model, the method further comprises: determining, based on a first confidence of an initial translation of the text to be translated by the large translation model, whether to enable chain-of-thought reasoning of the large translation model to formally perform the translation operation; in response to the translation operation based on the second translation model timing out, switching to the first translation model as the large translation model to re-perform the translation operation; and in response to the translation quality of the translation operation based on the second translation model not meeting a quality requirement, switching to the third translation model as the large translation model to re-perform the translation operation; and/or, in the case that the third translation model is selected as the large translation model, the method further comprises: in response to the translation operation failing, switching to the second translation model as the large translation model to re-perform the translation operation; and in response to the re-performed translation operation based on the second translation model failing, recording an exception log and returning error information.
  7. The method of claim 1, wherein the translation operation is implemented by a large translation model, and performing the translation operation on the text to be translated to obtain the first target language text comprises: constructing a second mapping relation and adding it to a dynamic term set, based on source language terms and target language terms extracted respectively from the source language paragraph before the translation operation is performed and the target language paragraph after it is performed, wherein a term frequency is recorded in the second mapping relation; selecting, based on the term frequency of the second mapping relation outside a sliding window maintained during the translation operation, either to retain the second mapping relation in a global term set of the translation operation or to remove the second mapping relation from the dynamic term set, wherein the sliding window moves with the translation progress of the translation operation; and in response to a translation operation on a new source language paragraph, querying the dynamic term set, the global term set, and the professional term library according to a preset priority to obtain a historical translation of a source language term in the new source language paragraph, and performing requirement injection into the large-model instruction of the large translation model based on the historical translation of the source language term in the new source language paragraph.
  8. The method of claim 7, wherein the second mapping relation further records a second confidence of the term translation, and wherein, after the requirement injection into the large-model instruction of the large translation model based on the historical translation of the source language term in the new source language paragraph, in the case that the model translation of the source language term in the new source language paragraph by the large translation model is detected to be inconsistent with the historical translation, the method further comprises: in response to the second confidence of the historical translation being higher than an upper limit confidence, replacing the model translation of the source language term in the new source language paragraph with the historical translation; in response to the second confidence of the historical translation being between a lower limit confidence and the upper limit confidence, marking the model translation of the source language term in the new source language paragraph as awaiting manual review; and in response to the second confidence of the historical translation being lower than the lower limit confidence, updating the dynamic term set based on the source language term in the new source language paragraph and its model translation.
  9. The method of claim 7, wherein selecting, based on the term frequency of the second mapping relation outside the sliding window maintained during the translation operation, either to retain the second mapping relation in the global term set of the translation operation or to remove the second mapping relation from the dynamic term set comprises: in response to the term frequency of the second mapping relation outside the sliding window satisfying a screening condition with respect to a frequency threshold, retaining the second mapping relation in the global term set; and in response to the term frequency of the second mapping relation outside the sliding window not satisfying the screening condition with respect to the frequency threshold, removing the second mapping relation from the dynamic term set.
  10. The method of claim 7, wherein the preset priority, from high to low, is the dynamic term set, the global term set, and the professional term library, and querying the dynamic term set, the global term set, and the professional term library according to the preset priority to obtain the historical translation of the source language term in the new source language paragraph comprises: querying whether the historical translation exists in the dynamic term set; in response to the historical translation existing in the dynamic term set, no longer querying the global term set or the professional term library; and in response to the historical translation not existing in the dynamic term set, continuing to query whether the historical translation exists in the global term set, wherein if the historical translation exists in the global term set, the professional term library is no longer queried, and if the historical translation does not exist in the global term set, the professional term library is then queried.
  11. The method of claim 1, wherein, after the translation operation is performed on the text to be translated to obtain the first target language text, the method further comprises: performing a translation quality inspection based on the text to be translated and the first target language text to obtain a quality inspection result for a target sub-text in the first target language text, wherein the quality inspection result of the target sub-text comprises the severity of the translation problem of the target sub-text and a third confidence of a corrected sub-text of the target sub-text; and selecting, from a plurality of preset correction modes, a target correction mode applicable to the target sub-text based on the severity and the third confidence corresponding to the target sub-text.
  12. The method of claim 11, wherein the translation operation is implemented by a large translation model, and the plurality of preset correction modes comprise: performing a replacement operation based on the corrected sub-text; re-translating the target sub-text with the large translation model with reference to the translation problem and the context of the target sub-text; re-translating with the large translation model after assigning weights to a plurality of translation problems of the same severity; and marking as awaiting manual review.
  13. The method of any one of claims 1 to 12, wherein the placeholder comprises a prefix identifier distinct from natural language, a type identifier characterizing the element category of the replaced text element, and an identity identifier for distinguishing different placeholders; and/or the translation operation is implemented by a large translation model, and during the translation operation a mandatory rule is injected into the large-model instruction of the large translation model, wherein the mandatory rule comprises at least one of: the position of a placeholder may be adjusted to fit the translation but its format must not change; and a placeholder must be kept as it is and must not be translated, deleted, or modified; and/or, in the case that the placeholders in the text to be translated and in the first target language text do not satisfy a target condition, a placeholder repair operation is performed on the first target language text before the text restoration, wherein the target condition comprises the number and the content of the placeholders being completely consistent.
  14. A machine translation device, comprising: an element recognition module configured to perform element recognition on a source language text to obtain text elements in the source language text and element categories of the text elements; an element replacement module configured to determine, based on whether the element category of a text element is characterized as translatable, whether to replace the text element with a placeholder, so as to obtain a text to be translated, wherein different text elements that need to be replaced use different placeholders; a construction and translation module configured to construct a first mapping relation based on the placeholders and the text elements they replace, and to perform a translation operation on the text to be translated to obtain a first target language text; and a text restoration module configured to perform text restoration on the placeholders in the first target language text based on the first mapping relation to obtain a second target language text.
  15. An electronic device, comprising at least a memory and a processor coupled to each other, the memory storing at least program instructions, and the processor being configured to execute the program instructions to implement the machine translation method of any one of claims 1 to 13.
  16. A computer-readable storage medium storing program instructions executable by a processor, the program instructions being used to implement the machine translation method of any one of claims 1 to 13.
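
As an illustration of the term-consistency logic in claims 8 and 10, the following is a minimal Python sketch, not the applicant's implementation. It assumes each term store is a plain dictionary, and the names `TermEntry`, `lookup_history`, `resolve_conflict`, the threshold values `UPPER` and `LOWER`, and the `new_confidence` parameter are all hypothetical; the claims do not specify concrete data structures, threshold values, or the confidence assigned to a newly stored model translation.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass
class TermEntry:
    translation: str
    confidence: float  # the "second confidence" recorded with the mapping


# Hypothetical thresholds; the claims only name an upper and a lower limit.
UPPER = 0.9
LOWER = 0.5


def lookup_history(
    term: str,
    dynamic_terms: Dict[str, TermEntry],
    global_terms: Dict[str, TermEntry],
    term_library: Dict[str, TermEntry],
) -> Optional[Tuple[str, TermEntry]]:
    """Query the stores from highest to lowest priority and stop at the first hit."""
    stores = (
        ("dynamic", dynamic_terms),
        ("global", global_terms),
        ("library", term_library),
    )
    for name, store in stores:
        entry = store.get(term)
        if entry is not None:
            return name, entry
    return None


def resolve_conflict(
    term: str,
    model_translation: str,
    history: TermEntry,
    dynamic_terms: Dict[str, TermEntry],
    new_confidence: float = 0.5,  # assumption: not specified in the source
) -> Tuple[str, str]:
    """Return (translation to use, action taken) when the model output may differ from history."""
    if model_translation == history.translation:
        return model_translation, "consistent"
    if history.confidence > UPPER:
        # High-confidence history overrides the model output.
        return history.translation, "replaced_with_history"
    if history.confidence < LOWER:
        # Low-confidence history is superseded by the model output.
        dynamic_terms[term] = TermEntry(model_translation, new_confidence)
        return model_translation, "dynamic_set_updated"
    # Confidence between the two limits: keep the model output but flag it.
    return model_translation, "manual_review"
```

Returning the action label alongside the chosen translation is a design choice of this sketch; it lets a caller route the "manual_review" case to a review queue without re-deriving the decision.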

Description

Machine translation method, related device, electronic equipment and storage medium

Technical Field

The present application relates to the field of machine translation technologies, and in particular to a machine translation method, a related device, an electronic device, and a storage medium.

Background

With accelerating globalization and the expansion of cross-border business, demand for long-document translation in professional fields keeps growing. In vertical fields such as law, medicine, finance, and technical documentation, translation quality requirements cover not only linguistic accuracy but also preservation of the formatting of special elements in the original document, such as markup language. In the prior art, when a document containing a markup language such as Markdown or HTML (HyperText Markup Language) is translated, problems such as lost tags, nesting errors, and misplaced positions readily occur; that is, the rich text format structure is difficult to protect effectively during machine translation. In view of this, how to effectively protect rich text format structures during machine translation is a problem to be solved.

Disclosure of Invention

The technical problem mainly addressed by the application is to provide a machine translation method, a related device, electronic equipment and a storage medium that can effectively protect the rich text format structure during machine translation.
To solve the above technical problem, a first aspect of the application provides a machine translation method, comprising: performing element recognition on a source language text to obtain text elements in the source language text and the element categories of those text elements; determining, based on whether the element category of a text element is translatable, whether to replace the text element with a placeholder, so as to obtain a text to be translated, wherein different text elements that need to be replaced use different placeholders; constructing a first mapping relation based on the placeholders and the text elements they replace, and performing a translation operation on the text to be translated to obtain a first target language text, wherein the placeholders are constrained to remain unchanged during the translation operation; and performing text restoration on the placeholders in the first target language text based on the first mapping relation to obtain a second target language text.
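
The steps of the first aspect can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the method as implemented by the applicant: the regular expressions standing in for element recognition, the three non-translatable categories (`CODE`, `TAG`, `URL`), the `@@PH_<TYPE>_<ID>@@` placeholder format, and the function names are all hypothetical, and `translate` is any caller-supplied function that is expected to leave placeholders untouched. Where the application describes repairing damaged placeholders, this sketch simply raises an error.

```python
import re
from typing import Callable, Dict, Tuple

# Assumed element recognition: one alternation, each named group is an element category.
ELEMENT = re.compile(
    r"(?P<CODE>`[^`\n]*`)"
    r"|(?P<TAG></?[A-Za-z][^>]*>)"
    r"|(?P<URL>https?://[^\s<>()]+)"
)
# Assumed placeholder shape: prefix, type identifier, identity identifier.
PLACEHOLDER = re.compile(r"@@PH_[A-Z]+_\d+@@")


def mask(source: str) -> Tuple[str, Dict[str, str]]:
    """Replace non-translatable elements with unique placeholders and build the mapping."""
    mapping: Dict[str, str] = {}

    def _replace(match: re.Match) -> str:
        placeholder = f"@@PH_{match.lastgroup}_{len(mapping) + 1}@@"
        mapping[placeholder] = match.group(0)
        return placeholder

    return ELEMENT.sub(_replace, source), mapping


def placeholders_intact(masked: str, translated: str) -> bool:
    """Check that the translation contains exactly the same placeholders as the input."""
    return sorted(PLACEHOLDER.findall(masked)) == sorted(PLACEHOLDER.findall(translated))


def restore(translated: str, mapping: Dict[str, str]) -> str:
    """Substitute each placeholder with the original element it stands for."""
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), translated)


def translate_protected(source: str, translate: Callable[[str], str]) -> str:
    masked, mapping = mask(source)           # text to be translated + first mapping relation
    first_target = translate(masked)         # first target language text
    if not placeholders_intact(masked, first_target):
        # Simplification: the source describes a repair step here instead.
        raise ValueError("placeholders were altered during translation")
    return restore(first_target, mapping)    # second target language text
```

With this sketch, `mask("Click <b>here</b> or run `pip install x`.")` yields the text `Click @@PH_TAG_1@@here@@PH_TAG_2@@ or run @@PH_CODE_3@@.`, so the translator only ever sees the translatable words plus opaque tokens.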
To solve the above technical problem, a second aspect of the application provides a machine translation device comprising an element recognition module, an element replacement module, a construction and translation module, and a text restoration module. The element recognition module is configured to perform element recognition on a source language text to obtain text elements in the source language text and the element categories of those text elements. The element replacement module is configured to determine, based on whether the element category of a text element is translatable, whether to replace the text element with a placeholder, so as to obtain a text to be translated, wherein different text elements that need to be replaced use different placeholders. The construction and translation module is configured to construct a first mapping relation based on the placeholders and the text elements they replace, and to perform a translation operation on the text to be translated to obtain a first target language text, wherein the placeholders are constrained to remain unchanged during the translation operation. The text restoration module is configured to perform text restoration on the placeholders in the first target language text based on the first mapping relation to obtain a second target language text.
To solve the above technical problem, a third aspect of the present application provides an electronic device comprising at least a memory and a processor coupled to each other, wherein the memory stores at least program instructions and the processor is configured to execute the program instructions to implement the machine translation method of the first aspect.
To solve the above technical problem, a fourth aspect of the present application provides a computer-readable storage medium storing program instructions executable by a processor, the program instructions being used to implement the machine translation method of the first aspect.
In the above scheme, element identification is performed based on a source language text to obtain a text element and an element category of the text element in the source language text, whether a substitution operation is performed