CN-121996190-A - Audio processing method and device, electronic equipment and storage medium


Abstract

The embodiments of the application disclose an audio processing method, an audio processing device, an electronic device and a storage medium. The method comprises: acquiring first audio collected by a first electronic device; translating the first audio to obtain second audio; and playing the second audio through a second electronic device. With this method, a user's audio is collected by the first electronic device, the audio is translated, and the translated audio is played through the second electronic device. This enables cross-language audio transmission and real-time translation, which facilitates communication between users who speak different languages. Real-time translation is also achieved without the other party having to wear headphones, which protects user privacy and makes real-time translation more convenient.

Inventors

NIE DONG

Assignees

广东小天才科技有限公司

Dates

Publication Date: 20260508
Application Date: 20241105

Claims (13)

1. An audio processing method applied to a first electronic device, the method comprising: acquiring first audio collected by the first electronic device, wherein the first audio comprises audio of a first user, and the distance between the head of the first user and an acquisition device of the first electronic device is less than or equal to a first threshold; translating the first audio to obtain second audio; and playing the second audio through a second electronic device, wherein the first electronic device and the second electronic device are communicatively connected, and the distance between an audio playing device of the second electronic device and the head of a second user is less than or equal to a second threshold.
2. The method of claim 1, wherein the first electronic device is an earphone, the second electronic device is a wearable device, the first user is a user wearing the earphone and the wearable device, and the first user talks with the second user through the second audio; or the first electronic device is an earphone, the second electronic device is a terminal device, and the first user is a user wearing the earphone and using the terminal device.
3. The method of claim 1, wherein the first electronic device is a wearable device, the second electronic device is an earphone, the second user is a user wearing the earphone and the wearable device, and the second user talks with the first user through the second audio; or the first electronic device is a terminal device, the second electronic device is an earphone, and the second user is a user wearing the earphone and using the terminal device.
4. The method of claim 1, wherein before the acquiring of the first audio collected by the first electronic device, the method further comprises: acquiring a first voiceprint feature of the first user and/or a second voiceprint feature of the second user; and the acquiring of the first audio collected by the first electronic device comprises: acquiring original audio collected by the first electronic device; and obtaining the first audio from the original audio according to the first voiceprint feature and/or the second voiceprint feature.
5. The method of claim 4, wherein the obtaining of the first audio from the original audio according to the first voiceprint feature and/or the second voiceprint feature comprises: extracting the audio corresponding to the first voiceprint feature from the original audio, and/or removing the audio corresponding to the second voiceprint feature from the original audio, to obtain the first audio.
6. The method according to claim 1 or 5, wherein the translating of the first audio to obtain the second audio comprises: converting the first audio into first text information through a speech recognition module; translating the first text information from a first language into second text information in a second language, wherein the first language is the language used by the first user and the second language is the language used by the second user; and converting the second text information into the second audio through a speech synthesis module according to the first voiceprint feature.
7. The method of claim 6, wherein the converting of the second text information into the second audio through the speech synthesis module according to the first voiceprint feature comprises: converting the second text information into third audio through the speech synthesis module according to the first voiceprint feature; acquiring environmental noise from the original audio; and superposing the third audio and the environmental noise to obtain the second audio.
8. The method of claim 1, wherein the acquiring of the first audio collected by the first electronic device comprises: acquiring original audio collected by the first electronic device; converting the original audio into original text information; acquiring a degree of association between the original text information and historical text information, wherein the historical text information comprises text information corresponding to previously acquired first audio or second audio, and the degree of association is determined according to the semantics of the text information; and determining the original audio as the first audio when the degree of association is greater than a preset degree of association.
9. The method of claim 1, wherein, in the case where the first electronic device is an earphone and the second electronic device is a wearable device or a terminal device, the first electronic device is detachably accommodated in the second electronic device, the method further comprising: establishing a communication connection with the second electronic device when the first electronic device is separated from the second electronic device.
10. An audio processing apparatus applied to a first electronic device, the apparatus comprising: an audio acquisition module, configured to acquire first audio collected by the first electronic device, wherein the first audio comprises audio of a first user, and the distance between the head of the first user and an acquisition device of the first electronic device is less than or equal to a first threshold; an audio processing module, configured to translate the first audio to obtain second audio; and an audio playing module, configured to play the second audio through a second electronic device, wherein the first electronic device and the second electronic device are communicatively connected, and the distance between an audio playing device of the second electronic device and the head of a second user is less than or equal to a second threshold.
11. An electronic device comprising a processor and a memory for storing code instructions, the processor for executing the code instructions to perform the method of any one of claims 1 to 9.
12. A computer readable storage medium storing a computer program comprising instructions for implementing the method of any one of claims 1 to 9.
13. A computer program product comprising computer program code for causing a computer to carry out the method according to any one of claims 1 to 9 when the computer program code is run on the computer.
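
As a non-limiting illustration, the selection, translation and superposition steps recited in claims 5 to 7 can be sketched in Python as follows. The sketch rests on several assumptions that do not come from the claims: voiceprint features are represented as plain embedding vectors compared by cosine similarity, the value of `match_threshold` is arbitrary, audio is a list of float samples in the range [-1, 1], and the `recognize`, `translate_text` and `synthesize` callables are hypothetical stand-ins for the speech recognition, text translation and speech synthesis modules.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_first_audio(segments, first_voiceprint, second_voiceprint=None,
                       match_threshold=0.8):
    """Claim 5 sketch: keep segments matching the first user's voiceprint and
    drop segments matching the second user's voiceprint.

    `segments` is a list of (embedding, samples) pairs; how the original audio
    is segmented and embedded is assumed, not specified by the source.
    """
    kept = []
    for embedding, samples in segments:
        if (second_voiceprint is not None and
                cosine_similarity(embedding, second_voiceprint) >= match_threshold):
            continue  # remove audio corresponding to the second voiceprint
        if cosine_similarity(embedding, first_voiceprint) >= match_threshold:
            kept.append(samples)  # extract audio corresponding to the first voiceprint
    return kept


def translate_audio(first_audio, recognize, translate_text, synthesize,
                    first_voiceprint):
    """Claim 6 sketch: speech recognition, text translation, then speech
    synthesis conditioned on the first user's voiceprint."""
    first_text = recognize(first_audio)
    second_text = translate_text(first_text)
    return synthesize(second_text, first_voiceprint)


def superpose(third_audio, noise, noise_gain=1.0):
    """Claim 7 sketch: sample-wise sum of synthesized speech and environmental
    noise, zero-padding the shorter signal and clipping to [-1, 1]."""
    length = max(len(third_audio), len(noise))
    mixed = []
    for i in range(length):
        s = third_audio[i] if i < len(third_audio) else 0.0
        n = noise[i] if i < len(noise) else 0.0
        mixed.append(max(-1.0, min(1.0, s + noise_gain * n)))
    return mixed
```

In this sketch the second audio would be obtained as `superpose(translate_audio(...), noise)`. Clipping is only one possible way to keep the mixed signal in range; the claims do not specify how the superposition is performed.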

Description

Audio processing method and device, electronic equipment and storage medium

Technical Field

The present invention relates to the field of audio processing technologies, and in particular to an audio processing method, an audio processing device, an electronic device, and a storage medium.

Background

Simultaneous interpretation is widely used, especially in scenarios that require real-time cross-language communication: while the speaker is talking, an interpreter or a terminal device translates the speech into the target language in real time. With the development of technology, simultaneous interpretation is increasingly performed by electronic devices rather than by people. In the related art, simultaneous interpretation often requires the conversation partner to wear an earphone, which makes the process cumbersome. Because the conversation partner has to wear an earphone that does not belong to them, privacy and hygiene concerns also arise. How to make simultaneous interpretation more convenient without raising privacy and hygiene concerns for users is a problem that urgently needs to be solved.

Disclosure of Invention

The embodiments of the application disclose an audio processing method in which a first electronic device collects a user's audio, the audio is translated, and the translated audio is played through a second electronic device. This enables cross-language audio transmission and real-time translation, making it easier for users who speak different languages to communicate. The method also enables real-time translation without the other party wearing headphones, which protects user privacy and makes real-time translation more convenient.
The application discloses an audio processing method applied to a first electronic device, the method comprising: acquiring first audio collected by the first electronic device, wherein the first audio comprises audio of a first user, and the distance between the head of the first user and an acquisition device of the first electronic device is less than or equal to a first threshold; translating the first audio to obtain second audio; and playing the second audio through a second electronic device, wherein the first electronic device and the second electronic device are communicatively connected, and the distance between an audio playing device of the second electronic device and the head of a second user is less than or equal to a second threshold. In this technical solution, the party wearing the first electronic device and the second electronic device can input audio through the first electronic device and have it translated into a language the conversation partner can understand and output through the second electronic device. Likewise, the conversation partner can input audio through the first electronic device and have it translated into a language the wearer can understand and output through the second electronic device. This achieves real-time cross-language translation, and because the conversation partner does not need to wear an electronic device that does not belong to them, privacy and hygiene problems are avoided.
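As a non-limiting illustration, the three steps of the method (acquire, translate, play) can be sketched in Python as follows. The names `Capture` and `process_audio`, the use of metres, and the default threshold values are hypothetical. Treating the two distance conditions as runtime checks is also an assumption of this sketch; the source states them only as conditions on the acquisition and playback devices.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Capture:
    """Audio collected by the first electronic device (hypothetical container)."""
    samples: List[float]
    head_distance_m: float  # distance from the speaker's head to the acquisition device


def process_audio(capture: Capture,
                  translate: Callable[[List[float]], List[float]],
                  play: Callable[[List[float]], None],
                  playback_distance_m: float,
                  first_threshold_m: float = 0.3,
                  second_threshold_m: float = 0.5) -> Optional[List[float]]:
    """Acquire, translate, then play. Returns the second audio, or None if
    either distance condition is not met (threshold values are placeholders)."""
    # Step 1: accept the capture as first audio only if the speaker's head is
    # within the first threshold of the acquisition device.
    if capture.head_distance_m > first_threshold_m:
        return None
    # The playback device must be within the second threshold of the listener's head.
    if playback_distance_m > second_threshold_m:
        return None
    # Step 2: translate the first audio to obtain the second audio.
    second_audio = translate(capture.samples)
    # Step 3: play the second audio through the second electronic device.
    play(second_audio)
    return second_audio
```

Here `translate` and `play` are passed in as callables so that the same flow applies whichever device acts as the first or second electronic device.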
In a possible implementation, the first electronic device is an earphone, the second electronic device is a wearable device, the first user is a user wearing the earphone and the wearable device, and the first user talks with the second user through the second audio; or the first electronic device is an earphone, the second electronic device is a terminal device, and the first user is a user wearing the earphone and using the terminal device. In this technical solution, in scenarios that combine an earphone with a wearable device or a terminal device, the method lets the first user talk with the second user through the second audio, which improves communication between the users; in multi-device scenarios in particular, the user can flexibly choose which devices to interact through.
In a possible implementation, the first electronic device is a wearable device, the second electronic device is an earphone, the second user is a user wearing the earphone and the wearable device, and the second user talks with the first user through the second audio; or the first electronic device is a terminal device, the second electronic device is an earphone, and the second user is a user wearing the earphone and using the terminal device. In the technical scheme, in the case that the f