CN-121997952-A - Document translation method, device, electronic equipment and storage medium


Abstract

The disclosure provides a document translation method, a document translation device, an electronic device, and a storage medium. The method comprises: obtaining a first document to be translated, wherein the first document comprises a first text and format information describing the first text; extracting the first text and instructing a first model to translate the first text to obtain a first translated text; instructing the first model to reconstruct a first tag based on the format information and inserting the first tag into the first translated text to obtain a translated document corresponding to the first document; and displaying the translated document corresponding to the first document. High-quality translation of tagged documents is thereby achieved.

Inventors

Wang Nijianqiao
Zhang Hong
Wu Guohua
Yang Wenhai

Assignees

Beijing Zitiao Network Technology Co., Ltd. (北京字跳网络技术有限公司)

Dates

Publication Date: 2026-05-08
Application Date: 2026-01-28

Claims (10)

1. A document translation method, comprising: acquiring a first document to be translated, wherein the first document comprises a first text and format information for describing the first text; extracting the first text, and instructing a first model to translate the first text to obtain a first translated text; instructing the first model to reconstruct a first tag based on the format information, and inserting the first tag into the first translated text to obtain a translated document corresponding to the first document; and displaying the translated document corresponding to the first document.
2. The method of claim 1, wherein extracting the first text and instructing the first model to translate the first text to obtain the first translated text comprises: in response to the first document meeting a first condition, extracting the first text, and instructing the first model to translate the first text to obtain the first translated text; wherein the first condition comprises that the length of the first document is greater than or equal to a preset first length threshold, or that the proportion of the format information in the first document is greater than or equal to a preset first proportion threshold.
3. The method of claim 1, wherein the method further comprises: generating first prompt information in response to the first document meeting a second condition, wherein the first prompt information comprises the first document and a first instruction, and the first instruction is used for instructing translation of the first document; and inputting the first prompt information to the first model to obtain a translated document corresponding to the first document; wherein the second condition comprises that the length of the first document is less than a preset length threshold, or that the proportion of the format information in the first document is less than a preset first proportion threshold.
4. The method of claim 1, wherein the first model is trained by: instructing a second model to translate a second text in a second document to obtain a second translated text, and instructing the second model to insert a second tag in the second document into a reference translated text of the second text to obtain a second translated document, wherein the second tag is used for describing the second text; updating parameters of the second model based on the second translated text, the second translated document, and a reference translated document of the second document to obtain a third model; instructing the third model to translate a third document to obtain a third translated document, wherein the third document comprises a third text and a third tag for describing the third text; and updating parameters of the third model based on the third translated document and a reference translated document of the third document to obtain the first model.
5. The method of claim 4, wherein the second document is obtained by: segmenting the second text into a plurality of tokens; inserting the second tag among the plurality of tokens to obtain a fourth document; verifying the fourth document based on a preset first verification policy; and in response to the fourth document passing the verification, determining the fourth document as the second document.
6. The method of claim 5, wherein inserting the second tag among the plurality of tokens to obtain the fourth document comprises: determining key phrases in the second text based on the weight of each token in the second text; analyzing the syntactic dependencies of the second text to obtain a parse tree, wherein the parse tree describes the syntactic relationships among different sentence constituents in the second text; determining, based on the position of the key phrases in the parse tree, a first position in the second text at which the second tag is to be inserted; and inserting the second tag at the first position to obtain the fourth document.
7. The method of claim 4, wherein the reference translated document of the third document is obtained by: instructing the third model to translate the third text to obtain a third translated text; instructing the third model to insert the third tag into the third translated text to obtain a fifth translated document; verifying the fifth translated document based on a preset second verification policy; and in response to the fifth translated document passing the verification, determining the fifth translated document as the reference translated document of the third document.
8. A document translation apparatus, comprising: an acquisition module configured to acquire a first document to be translated, wherein the first document comprises a first text and format information for describing the first text; a first translation module configured to extract the first text, instruct a first model to translate the first text to obtain a first translated text, instruct the first model to reconstruct a first tag based on the format information, and insert the first tag into the first translated text to obtain a translated document corresponding to the first document; and a display module configured to display the translated document corresponding to the first document.
9. An electronic device, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the document translation method of any one of claims 1 to 7.
10. A computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the document translation method of any one of claims 1 to 7.
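
Claims 2 and 3 choose between the two-stage path and a single direct prompt according to document length and the proportion of format information. The following is a minimal Python sketch of that choice, not the claimed implementation. It assumes HTML-like tags, measures the proportion as tag characters over total characters, and uses made-up threshold values; the names TAG_PATTERN, LENGTH_THRESHOLD, FORMAT_PROPORTION_THRESHOLD, format_proportion, and choose_strategy are all hypothetical. Because the two conditions as worded can both hold for the same document, the sketch gives the first condition priority, which is also an assumption.

```python
import re

# Assumption: format information is carried by HTML-like tags such as <b>...</b>.
TAG_PATTERN = re.compile(r"<[^>]+>")

# Hypothetical values; the source only states that the thresholds are preset.
LENGTH_THRESHOLD = 200
FORMAT_PROPORTION_THRESHOLD = 0.3


def format_proportion(document: str) -> float:
    """Return the fraction of characters that belong to tags (0.0 if empty)."""
    if not document:
        return 0.0
    tag_chars = sum(len(match.group(0)) for match in TAG_PATTERN.finditer(document))
    return tag_chars / len(document)


def choose_strategy(document: str) -> str:
    """Return "two_stage" when the first condition holds, otherwise "direct"."""
    is_long = len(document) >= LENGTH_THRESHOLD
    is_tag_heavy = format_proportion(document) >= FORMAT_PROPORTION_THRESHOLD
    if is_long or is_tag_heavy:
        return "two_stage"  # extract text, translate, then reinsert tags
    return "direct"  # send the whole document with a translate instruction
```

Under these assumed thresholds, a short untagged sentence takes the direct path, while a short but tag-dense fragment takes the two-stage path.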

Description

Document translation method, device, electronic equipment and storage medium

Technical Field

The present specification relates to the field of natural language processing technologies, and in particular to a document translation method, a device, an electronic device, and a storage medium.

Background

Document translation is a typical task in the field of natural language processing and faces several challenges. When translating documents that contain tags, such as Hypertext Markup Language (HTML) documents, lightweight markup language (e.g., Markdown) documents, and documents that contain custom anchor tags, the tags often interfere with semantic understanding of the document and thereby degrade translation quality.

Disclosure of Invention

An object of the embodiments of the present specification is to provide a document translation method, apparatus, electronic device, and storage medium for achieving high-quality translation of tagged documents.

To achieve the above object, the embodiments of the present specification adopt the following technical solutions.

In a first aspect, a document translation method is provided, including: acquiring a first document to be translated, wherein the first document comprises a first text and format information for describing the first text; extracting the first text, and instructing a first model to translate the first text to obtain a first translated text; instructing the first model to reconstruct a first tag based on the format information, and inserting the first tag into the first translated text to obtain a translated document corresponding to the first document; and displaying the translated document corresponding to the first document.

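The first-aspect method can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: it assumes HTML-like tags, treats the model as a plain function from prompt string to output string, and invents the prompt wording. The names TAG_PATTERN, Model, split_document, translate_document, and stub_model are hypothetical, and stub_model returns canned strings in place of a real model so that the example runs on its own.

```python
import re
from typing import Callable, List, Tuple

# Assumption: format information is carried by HTML-like tags.
TAG_PATTERN = re.compile(r"<[^>]+>")

# Hypothetical model interface: prompt string in, output string out.
Model = Callable[[str], str]


def split_document(document: str) -> Tuple[str, List[str]]:
    """Separate the plain text from the tags (the format information)."""
    tags = TAG_PATTERN.findall(document)
    text = TAG_PATTERN.sub("", document)
    return text, tags


def translate_document(document: str, model: Model, target_language: str) -> str:
    """Two-stage translation: translate tag-free text, then restore the tags."""
    text, tags = split_document(document)

    # Stage 1: the model translates only the extracted text.
    translated_text = model(
        f"Translate the following text into {target_language}:\n{text}"
    )

    # Stage 2: the same model reconstructs the tags and inserts them.
    restore_prompt = (
        "Reconstruct the tags listed below and insert them into the "
        "translation at the positions matching the source document.\n"
        f"Tags: {' '.join(tags)}\n"
        f"Source document: {document}\n"
        f"Translation: {translated_text}"
    )
    return model(restore_prompt)


def stub_model(prompt: str) -> str:
    """Stand-in for a real model; returns canned outputs for this demo only."""
    if prompt.startswith("Translate"):
        return "Bonjour le monde"
    return "<b>Bonjour</b> le monde"


# Display step: print the translated document.
print(translate_document("<b>Hello</b> world", stub_model, "French"))
```

Both stages call the same model, which mirrors the text above; swapping stub_model for a real model client would only require a function with the same signature.
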
In a second aspect, a document translation apparatus is provided, comprising: an acquisition module configured to acquire a first document to be translated, wherein the first document comprises a first text and format information for describing the first text; a first translation module configured to extract the first text, instruct a first model to translate the first text to obtain a first translated text, instruct the first model to reconstruct a first tag based on the format information, and insert the first tag into the first translated text to obtain a translated document corresponding to the first document; and a display module configured to display the translated document corresponding to the first document.

In a third aspect, an electronic device is provided, comprising: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the document translation method provided in the first aspect.

In a fourth aspect, a computer-readable storage medium is provided, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the document translation method provided in the first aspect.

The embodiments of this specification have the following advantages. The first document is translated with a two-stage strategy. In the first stage, the plain-text translation capability of the first model is used: the first model is instructed to translate the first text in the first document to obtain the first translated text. In the second stage, the tag reconstruction and insertion capability of the first model is used: the first model is instructed to reconstruct the first tag based on the format information in the first document, and the first tag is inserted into the first translated text to obtain the translated document corresponding to the first document. The method can therefore address core problems of traditional translation methods, such as tag damage, symmetry failure, and semantic drift; it significantly improves translation quality, tag preservation, and robustness, and provides a complete technical solution for translating tagged documents.

Drawings

The accompanying drawings are included to provide a further understanding of the specification. The exemplary embodiments of the present specification and their descriptions serve to explain the specification and do not unduly limit it. In the drawings:

FIG. 1 is a schematic flow chart of a document translation method according to an embodiment of the present disclosure;
FIG. 2 is a schematic flow chart of a model training method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of a first stage model training process provided in one embodiment of the present disclosure;
FIG. 4 is a schematic diagram of a second stage model training process provided by one embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a document translation appa