CN-121981129-A - Real-time speech translation method, device, equipment and storage medium


Abstract

The application relates to the technical field of real-time translation and discloses a real-time speech translation method, device, equipment and storage medium. The method comprises: performing streaming recognition on received real-time speech to obtain a recognition text; matching the recognition text against the source terms in each memory unit, and determining a target term corresponding to the recognition text according to the matching result; obtaining a translation result of the recognition text; and replacing, in the translation result, the translated term corresponding to the source term with the target term to obtain a replaced translation text, which is then output. The real-time speech is recognized in a streaming manner, and the recognition text is matched against the source terms in the memory units in real time. User-defined target terms then replace the corresponding parts of the general-purpose translation result. As a result, the user's translation preferences are respected while real-time performance is preserved, and term translation becomes markedly more consistent and better personalized.

Inventors

CHEN QIANG
AN KANG
TAO HAIPENG
TONG ZIWEI
HUANG HONGHONG
WANG MINGHUI
QIN WANLI

Assignees

Goertek Inc. (歌尔股份有限公司)

Dates

Publication Date: 20260505
Application Date: 20251225

Claims (10)

1. A real-time speech translation method, characterized in that the method comprises: performing streaming recognition on received real-time speech to obtain a recognition text; matching the recognition text against the source terms in each memory unit, and determining a target term corresponding to the recognition text according to the matching result; obtaining a translation result of the recognition text; and replacing, in the translation result, the translated term corresponding to the source term with the target term to obtain a replaced translation text, and outputting the translation text.
2. The real-time speech translation method according to claim 1, wherein the step of determining the target term corresponding to the recognition text according to the matching result comprises: in response to the matching result being a successful match, obtaining, from all memory units stored in a memory model, the context labels corresponding to the successfully matched source terms, wherein a source term corresponds to different candidate terms under different context labels; determining a domain label of the recognition text according to the context labels; and determining, according to the domain label, the target term corresponding to the recognition text from the candidate terms.
3. The real-time speech translation method according to claim 2, wherein the step of determining the domain label of the recognition text according to the context labels comprises: obtaining a confidence weight of each context label; calculating a composite score for each context label according to the number of occurrences of that context label and its corresponding confidence weight; and determining the context label with the highest composite score as the domain label of the recognition text.
4. The real-time speech translation method according to claim 1, wherein the step of determining the target term corresponding to the recognition text according to the matching result comprises: in response to the matching result being a failed match, matching the recognition text against the context labels in the memory units, and selecting, according to the matching result, pending terms to be replaced from the recognition text; calculating the semantic similarity between each pending term and each source term; and, when the semantic similarity is greater than a preset similarity threshold, taking the candidate term corresponding to that source term as the target term of the recognition text.
5. The real-time speech translation method according to claim 1, wherein the step of obtaining the translation result of the recognition text comprises: obtaining the domain adaptation attribute of each candidate translation model; determining, from the candidate translation models and according to the domain adaptation attributes, a translation model that matches the domain label of the recognition text; and translating the recognition text with that translation model to obtain the translation result.
6. The real-time speech translation method according to any one of claims 1 to 5, wherein, after the step of replacing the translated term corresponding to the source term in the translation result with the target term and outputting the replaced translation text, the method further comprises: receiving a correction instruction fed back by a user for the translation text, wherein the correction instruction indicates a correction to a specific term in the translation text; in response to the source term corresponding to the specific term existing in a memory unit, updating the memory unit corresponding to that source term; in response to the source term corresponding to the specific term not existing in any memory unit, creating a memory unit under the domain label of the recognition text according to the correction instruction; and correcting the translation of the specific term in historical translation texts that contain it.
7. The real-time speech translation method according to claim 6, wherein, after the step of correcting the translation of the specific term in the historical translation texts, the method further comprises: detecting whether the updated memory units contain a conflict, a conflict being the same source term corresponding to different candidate terms under the same context label; if a conflict exists, obtaining the usage frequency and timestamp of each memory unit; determining a confidence priority of each memory unit according to its usage frequency; and determining the memory unit to be deleted according to the confidence priority and/or the timestamp.
8. A real-time speech translation apparatus, the apparatus comprising: a text recognition module, configured to perform streaming recognition on received real-time speech to obtain a recognition text; a term determination module, configured to match the recognition text against the source terms in each memory unit and determine a target term corresponding to the recognition text according to the matching result; a text translation module, configured to obtain a translation result of the recognition text; and a translation output module, configured to replace the translated term corresponding to the source term in the translation result with the target term and output the replaced translation text.
9. A real-time speech translation device, characterized in that the device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, the computer program being configured to implement the steps of the real-time speech translation method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium is a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the real-time speech translation method according to any one of claims 1 to 7.
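Claims 6 and 7 describe how user corrections flow back into the memory and how conflicting entries are pruned. The Python sketch below is one possible reading under stated assumptions, not the claimed implementation. It assumes that memory units are looked up by source term together with context label, that historical outputs are corrected by plain string replacement, and that a conflict is resolved by keeping the unit with the highest usage count, with the newer timestamp breaking ties. The names `MemoryUnit`, `apply_correction`, `resolve_conflicts`, `use_count` and `timestamp` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MemoryUnit:
    """Hypothetical memory unit with the bookkeeping that claim 7 relies on."""
    source_term: str
    context_label: str
    candidate_term: str
    use_count: int = 0      # usage frequency, basis of the confidence priority
    timestamp: float = 0.0  # time of last update


def apply_correction(memory: List[MemoryUnit], history: List[str], source_term: str,
                     wrong_term: str, corrected_term: str, label: str, now: float) -> None:
    # Claim 6: update the existing unit for this source term under this label, if any.
    for unit in memory:
        if unit.source_term == source_term and unit.context_label == label:
            unit.candidate_term, unit.timestamp = corrected_term, now
            break
    else:
        # Otherwise create a new unit under the domain label of the recognition text.
        memory.append(MemoryUnit(source_term, label, corrected_term, timestamp=now))
    # Retroactively fix historical translation texts that contain the wrong rendering.
    history[:] = [text.replace(wrong_term, corrected_term) for text in history]


def resolve_conflicts(memory: List[MemoryUnit]) -> List[MemoryUnit]:
    # Claim 7: a conflict is one (source term, context label) pair with several candidate terms.
    groups: Dict[Tuple[str, str], List[MemoryUnit]] = defaultdict(list)
    for unit in memory:
        groups[(unit.source_term, unit.context_label)].append(unit)
    kept: List[MemoryUnit] = []
    for units in groups.values():
        if len({u.candidate_term for u in units}) > 1:
            # Keep the highest-priority unit (most used, then newest); drop the rest (assumed policy).
            kept.append(max(units, key=lambda u: (u.use_count, u.timestamp)))
        else:
            kept.extend(units)
    memory[:] = kept
    return memory
```

In this sketch `apply_correction` edits matching units in place. Conflicts therefore appear only when duplicate units already exist, for example if units were imported or merged, which is why `resolve_conflicts` scans the whole memory and does not inspect just the unit that was updated.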

Description

Real-time speech translation method, device, equipment and storage medium
Technical Field
The present application relates to the field of real-time translation technologies, and in particular to a real-time speech translation method, device, equipment and storage medium.
Background
When processing long dialogues or streaming text, existing mainstream machine translation systems cannot remember translation preferences that were previously confirmed or corrected. The same term may therefore be mistranslated again in later translations, and consistency is lost.
Disclosure of Invention
The main purpose of the present application is to provide a real-time speech translation method, device, equipment and storage medium, so as to solve the technical problem that existing mainstream machine translation systems cannot remember previously confirmed or corrected translation preferences.
To achieve the above object, the present application provides a real-time speech translation method, comprising: performing streaming recognition on received real-time speech to obtain a recognition text; matching the recognition text against the source terms in each memory unit, and determining a target term corresponding to the recognition text according to the matching result; obtaining a translation result of the recognition text; and replacing, in the translation result, the translated term corresponding to the source term with the target term to obtain a replaced translation text, and outputting the translation text.
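The matching and replacement steps above can be illustrated with a minimal Python sketch. Purely for illustration, it assumes that each memory unit also stores the generic rendering a general-purpose translator usually produces for the source term, so that "the translated term corresponding to the source term" can be located by plain string search. A real system might instead rely on word alignment. The names `MemoryUnit`, `translate_with_memory` and `generic_translation` are hypothetical, the translator is passed in as a stub, and streaming recognition is out of scope: the function receives an already recognized text segment.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MemoryUnit:
    """Hypothetical memory unit holding one user-confirmed term mapping."""
    source_term: str          # term as it appears in the recognition text
    target_term: str          # translation the user prefers
    generic_translation: str  # rendering a general MT engine tends to output (assumed stored)


def translate_with_memory(
    recognition_text: str,
    memory: List[MemoryUnit],
    translate: Callable[[str], str],
) -> str:
    # Match the recognition text against the source term of every memory unit.
    matched = [u for u in memory if u.source_term in recognition_text]
    # Obtain the general-purpose translation of the whole segment.
    result = translate(recognition_text)
    # Swap each matched generic rendering for the user's target term,
    # longest rendering first (a simple heuristic for overlapping terms).
    for unit in sorted(matched, key=lambda u: len(u.generic_translation), reverse=True):
        result = result.replace(unit.generic_translation, unit.target_term)
    return result
```

For example, suppose a memory unit maps 降噪模式 to "ANC mode" and a stub translator returns "Please turn on the noise reduction mode" for 请打开降噪模式. The function then returns "Please turn on the ANC mode". Because the translator is injected as a callable, any translation back end can be substituted without touching the term-memory logic.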
Optionally, the step of determining the target term corresponding to the recognition text according to the matching result includes: in response to the matching result being a successful match, obtaining, from all memory units stored in a memory model, the context labels corresponding to the successfully matched source terms, wherein a source term corresponds to different candidate terms under different context labels; determining a domain label of the recognition text according to the context labels; and determining, according to the domain label, the target term corresponding to the recognition text from the candidate terms.
Optionally, the step of determining the domain label of the recognition text according to the context labels includes: obtaining a confidence weight of each context label; calculating a composite score for each context label according to the number of occurrences of that context label and its corresponding confidence weight; and determining the context label with the highest composite score as the domain label of the recognition text.
Optionally, the step of determining the target term corresponding to the recognition text according to the matching result includes: in response to the matching result being a failed match, matching the recognition text against the context labels in the memory units, and selecting, according to the matching result, pending terms to be replaced from the recognition text; calculating the semantic similarity between each pending term and each source term; and, when the semantic similarity is greater than a preset similarity threshold, taking the candidate term corresponding to that source term as the target term of the recognition text.
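A minimal Python sketch of the successful-match and failed-match branches above follows. Everything beyond the claim wording is an assumption:

- The composite score is taken to be occurrence count multiplied by confidence weight.
- The character-level `difflib.SequenceMatcher.ratio()` stands in for the semantic similarity measure, which in practice would more likely come from a learned model.
- The threshold of 0.6 is arbitrary.
- In the failed-match branch, the caller supplies the pending term and the context label it matched.

`MemoryUnit`, `domain_label`, `target_terms` and `fallback_target` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class MemoryUnit:
    """Hypothetical memory unit: one source term filed under one context label."""
    source_term: str
    context_label: str   # e.g. "audio" or "aviation" (illustrative labels)
    candidate_term: str  # translation to use under this label


def domain_label(matched: List[MemoryUnit], weights: Dict[str, float]) -> str:
    # Composite score per label = occurrences among matched units * confidence weight (assumed formula).
    counts = Counter(u.context_label for u in matched)
    return max(counts, key=lambda label: counts[label] * weights.get(label, 1.0))


def target_terms(text: str, memory: List[MemoryUnit], weights: Dict[str, float]) -> Dict[str, str]:
    # Successful-match branch: collect every unit whose source term occurs in the text.
    matched = [u for u in memory if u.source_term in text]
    if not matched:
        return {}
    label = domain_label(matched, weights)
    # Keep only candidates filed under the winning label; other terms keep their generic translation.
    return {u.source_term: u.candidate_term for u in matched if u.context_label == label}


def fallback_target(
    pending_term: str,
    memory: List[MemoryUnit],
    label: Optional[str] = None,
    threshold: float = 0.6,
) -> Optional[str]:
    # Failed-match branch: compare the pending term with source terms under the matched label.
    pool = [u for u in memory if label is None or u.context_label == label]
    scored = [(SequenceMatcher(None, pending_term, u.source_term).ratio(), u) for u in pool]
    if not scored:
        return None
    score, best = max(scored, key=lambda pair: pair[0])
    # Accept the candidate only when the similarity exceeds the preset threshold.
    return best.candidate_term if score > threshold else None
```

Weighting lets a trusted domain win even when the matched terms are spread evenly across labels. For example, take two source terms that each have an "audio" and an "aviation" entry, with weights of 0.9 and 0.5 respectively. Both labels occur twice, so "audio" wins with a score of 1.8 against 1.0.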
Optionally, the step of obtaining the translation result of the recognition text includes: obtaining the domain adaptation attribute of each candidate translation model; determining, from the candidate translation models and according to the domain adaptation attributes, a translation model that matches the domain label of the recognition text; and translating the recognition text with that translation model to obtain the translation result.
Optionally, after the step of replacing the translated term corresponding to the source term in the translation result with the target term and outputting the replaced translation text, the method further includes: receiving a correction instruction fed back by a user for the translation text, wherein the correction instruction indicates a correction to a specific term in the translation text; in response to the source term corresponding to the specific term existing in a memory unit, updating the memory unit corresponding to that source term; in response to the source term corresponding to the specific term not existing in any memory unit, creating a memory unit under the domain label of the recognition text according to the correction instruction; and correcting the translation of the specific term in historical translation texts that contain it.
Optionally, after the step of correcting the translation of the specific term in the historical translation texts, the method further includes: detecting whether the updated memory units contain a conflict, a conflict being the same source term corresponding to different candidate terms under the same context label; if the conflict e