WO-2026095467-A1 - ELECTRONIC DEVICE AND METHOD FOR IMPROVING VOICE RECOGNITION RATE OF AUDIO SIGNAL


Abstract

An electronic device according to various embodiments of the present document may comprise at least one processor and a memory. The memory may store instructions which may be executed by the at least one processor and, when executed, instruct the electronic device to: acquire a first microphone signal and a second microphone signal, both of which include a user voice; generate a first mixed signal by mixing, on the basis of a first mixed parameter, a first input signal which is generated on the basis of the first microphone signal and a second input signal which is generated on the basis of the second microphone signal; generate a noise-canceled signal by canceling noise from the first mixed signal; generate a second mixed signal by mixing, on the basis of a second mixed parameter, the noise-canceled signal and the first mixed signal; and output the second mixed signal. Various other embodiments are possible.
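
The signal flow in the abstract can be sketched roughly as follows. This is only an illustrative sketch, not the applicant's implementation: it assumes each mixing parameter is a single weight in [0, 1] applied as a linear cross-fade, and the names `mix`, `enhance`, and `toy_denoise` are hypothetical. The denoiser is a stand-in that merely attenuates the signal.

```python
from typing import Callable, List

Signal = List[float]


def mix(a: Signal, b: Signal, weight_a: float) -> Signal:
    """Assumed mixing rule: weight_a * a + (1 - weight_a) * b, sample by sample."""
    if len(a) != len(b):
        raise ValueError("signals must have the same length")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must be in [0, 1]")
    return [weight_a * x + (1.0 - weight_a) * y for x, y in zip(a, b)]


def enhance(first_input: Signal, second_input: Signal,
            first_param: float, second_param: float,
            denoise: Callable[[Signal], Signal]) -> Signal:
    """Two-stage mixing around a noise-removal step, following the abstract."""
    # Stage 1: mix the two microphone-derived input signals.
    first_mixed = mix(first_input, second_input, first_param)
    # Stage 2: remove noise from the first mixed signal.
    noise_removed = denoise(first_mixed)
    # Stage 3: blend the noise-removed signal back with the first mixed signal.
    return mix(noise_removed, first_mixed, second_param)


def toy_denoise(signal: Signal) -> Signal:
    """Hypothetical placeholder for a real noise-removal stage: halves every sample."""
    return [0.5 * s for s in signal]


out = enhance([1.0, 1.0], [0.0, 2.0], first_param=0.75, second_param=0.5,
              denoise=toy_denoise)
print(out)  # [0.5625, 0.9375]
```

Blending the noise-removed signal with the first mixed signal in the last stage lets the second parameter trade noise suppression against preservation of the original voice.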

Inventors

KIM, MINSEUNG
KIM, BOSUNG
KIM, GEEYEUN
MOON, Hangil
BANG, KYOUNGHO
BAEK, Soonho
LEE, GUNWOO

Assignees

Samsung Electronics Co., Ltd. (삼성전자 주식회사)

Dates

Publication Date: 2026-05-07
Application Date: 2025-10-21
Priority Date: 2024-10-28

Claims (15)

An electronic device comprising: at least one processor; and memory storing a plurality of instructions, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: acquire a first microphone signal and a second microphone signal that include a user's voice; generate a first mixed signal by mixing, based on a first mixing parameter, a first input signal generated based on the first microphone signal and a second input signal generated based on the second microphone signal; generate a noise-removed signal by removing noise from the first mixed signal; generate a second mixed signal by mixing the noise-removed signal and the first mixed signal based on a second mixing parameter; and output the second mixed signal.
The electronic device of claim 1, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to perform the operation of generating the first mixed signal and/or the operation of generating the second mixed signal while a real-time translation function is running during a voice call.
The electronic device of claim 1, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to: further acquire a third microphone signal that includes the user's voice; generate a beamforming signal by applying beamforming to the second microphone signal and the third microphone signal; generate a third mixed signal by mixing the beamforming signal and the second microphone signal based on a third mixing parameter; and generate the first mixed signal by mixing the first input signal and the third mixed signal based on the first mixing parameter.
The electronic device of claim 3, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to apply the beamforming after performing a predetermined preprocessing process on signals obtained by applying a short-time Fourier transform to the second microphone signal and the third microphone signal.
The electronic device of claim 1, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to generate the first input signal by performing a predetermined preprocessing process on a signal obtained by applying a short-time Fourier transform to the first microphone signal.
The electronic device of claim 1, wherein the plurality of instructions, when executed individually or collectively by the at least one processor, cause the electronic device to input the first mixed signal into a machine learning model and generate the noise-removed signal as an output of the machine learning model.
The electronic device of claim 1, wherein the first mixing parameter includes a first weight applied to the first input signal and a second weight applied to the second input signal, and wherein the higher the intensity of noise detected in the first microphone signal and/or the second microphone signal, the lower the first weight and the higher the second weight.
The electronic device of claim 1, wherein at least one of the first mixing parameter, the second mixing parameter, or the third mixing parameter is determined through a parameter tuning model.
The electronic device of claim 8, wherein the parameter tuning model, to determine at least one of the first mixing parameter, the second mixing parameter, or the third mixing parameter, measures a voice recognition rate and a voice quality by applying a plurality of parameters to a test vector having a predetermined noise intensity, and selects one of the plurality of parameters based on the measured voice recognition rate and voice quality.
The electronic device of claim 9, wherein the parameter tuning model calculates a weighted average of the measured voice recognition rate and voice quality based on weights determined by user selection, and selects, from among the plurality of parameters, the parameter for which the calculated weighted average is highest.
The electronic device of claim 1, wherein the first microphone signal comprises a signal acquired by an internal microphone, and the second microphone signal comprises a signal acquired by an external microphone.
The electronic device of claim 1, wherein the electronic device is an audio device wearable by the user, and comprises a first microphone that contacts the user's ear when the device is worn and acquires the first microphone signal, and a second microphone that is positioned opposite the user's ear when the device is worn and acquires the second microphone signal.
The electronic device of claim 1, wherein the electronic device is a mobile device that provides voice calls, and comprises a communication circuit for receiving the first microphone signal and the second microphone signal from an external audio device.
A method performed by an electronic device, the method comprising: acquiring a first microphone signal and a second microphone signal that include a user's voice; generating a first mixed signal by mixing, based on a first mixing parameter, a first input signal generated based on the first microphone signal and a second input signal generated based on the second microphone signal; generating a noise-removed signal by removing noise from the first mixed signal; generating a second mixed signal by mixing the noise-removed signal and the first mixed signal based on a second mixing parameter; and outputting the second mixed signal.
A non-transitory computer-readable recording medium storing instructions for performing operations comprising: acquiring a first microphone signal and a second microphone signal that include a user's voice; generating a first mixed signal by mixing, based on a first mixing parameter, a first input signal generated based on the first microphone signal and a second input signal generated based on the second microphone signal; generating a noise-removed signal by removing noise from the first mixed signal; generating a second mixed signal by mixing the noise-removed signal and the first mixed signal based on a second mixing parameter; and outputting the second mixed signal.
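
As a rough illustration of the noise-dependent weights and the parameter tuning model recited in the claims above, the following sketch shows one possible reading. Everything concrete here is an assumption: the linear mapping in `weights_from_noise`, the normalization to [0, 1], the function names, and the score table, whose values are placeholders rather than measured results. The claims only require that the first weight falls and the second weight rises as noise intensity grows, and that the selected parameter maximizes a weighted average of recognition rate and voice quality.

```python
from typing import Callable, Dict, Sequence, Tuple


def weights_from_noise(noise_intensity: float,
                       max_intensity: float = 1.0) -> Tuple[float, float]:
    """Hypothetical monotone mapping: more noise gives a lower first weight
    and a higher second weight. The linear form is an assumption."""
    ratio = min(max(noise_intensity / max_intensity, 0.0), 1.0)
    first_weight = 1.0 - ratio
    second_weight = ratio
    return first_weight, second_weight


def select_parameter(candidates: Sequence[float],
                     measure: Callable[[float], Tuple[float, float]],
                     recognition_weight: float) -> float:
    """Return the candidate with the highest weighted average of
    (recognition rate, voice quality) as reported by `measure`."""
    if not 0.0 <= recognition_weight <= 1.0:
        raise ValueError("recognition_weight must be in [0, 1]")

    def score(parameter: float) -> float:
        rate, quality = measure(parameter)
        return recognition_weight * rate + (1.0 - recognition_weight) * quality

    return max(candidates, key=score)


# Placeholder (recognition rate, voice quality) pairs per candidate parameter.
# Illustrative values only; in practice they would come from running each
# candidate on a test vector with a predetermined noise intensity.
PLACEHOLDER_SCORES: Dict[float, Tuple[float, float]] = {
    0.2: (0.90, 0.60),
    0.5: (0.85, 0.80),
    0.8: (0.70, 0.90),
}


def lookup(parameter: float) -> Tuple[float, float]:
    return PLACEHOLDER_SCORES[parameter]


best = select_parameter(list(PLACEHOLDER_SCORES), lookup, recognition_weight=0.5)
print(best)  # 0.5
```

Setting `recognition_weight` closer to 1.0 favors candidates with a higher recognition rate, while a value closer to 0.0 favors voice quality, which mirrors the user-selected weights described in the claims.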

Description

Electronic device and method for improving voice recognition rate of audio signal
This document relates to an electronic device and, for example, to a method for improving the voice recognition rate of an audio signal that includes a user voice acquired through a microphone.
Mobile devices, such as smartphones, can perform audio functions, such as voice calls, by using external audio devices that provide audio input and output. During voice calls, background noise can degrade voice quality and clarity. Voice enhancement technology is therefore required that preserves the voice as much as possible while removing the noise acquired together with the user's voice signal. When a mobile device conducts a voice call through an external audio device, the environmental and structural conditions of the external audio device may need to be taken into account.
Mobile devices can provide voice recognition during voice calls. For example, a function may be required that extracts the user's voice from an incoming audio signal and converts it into text, such as when performing real-time translation during a call. When a mobile device requires real-time voice recognition during a voice call, the voice signal needs to be processed in a way that improves the voice recognition rate while maintaining stable call quality.
FIG. 1 is a block diagram of an electronic device in a network environment according to various embodiments.
FIG. 2 illustrates a mobile device and an audio device according to one embodiment.
FIG. 3 is a block diagram of a mobile device according to one embodiment.
FIG. 4 is a block diagram of an audio device according to one embodiment.
FIG. 5 is a block diagram showing the process of an electronic device processing an audio signal according to one embodiment.
FIG. 6 is a block diagram showing the process of an electronic device processing an audio signal according to one embodiment.
FIG. 7 is a flowchart of a method for enhancing a voice signal of an electronic device according to one embodiment.
FIG. 8 illustrates models for determining mixing parameters according to one embodiment.
FIGS. 9A, 9B, and 9C illustrate a foldable device according to one embodiment.
FIGS. 10A, 10B, 10C, and 10D illustrate a multi-foldable device according to one embodiment.
Hereinafter, embodiments of the present disclosure are described in detail with reference to the drawings so that those skilled in the art can easily practice them. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein. In the description of the drawings, the same or similar reference numerals may be used for identical or similar components. Furthermore, in the drawings and related descriptions, descriptions of well-known functions and configurations may be omitted for clarity and brevity.
FIG. 1 is a block diagram of an electronic device (101) in a network environment (100) according to various embodiments. Referring to FIG. 1, in a network environment (100), an electronic device (101) may communicate with an electronic device (102) through a first network (198) (e.g., a short-range wireless communication network) or with at least one of an electronic device (104) or a server (108) through a second network (199) (e.g., a long-range wireless communication network). According to one embodiment, the electronic device (101) may communicate with the electronic device (104) through the server (108).
According to one embodiment, the electronic device (101) may include a processor (120), memory (130), input module (150), sound output module (155), display module (160), audio module (170), sensor module (176), interface (177), connection terminal (178), haptic module (179), camera module (180), power management module (188), battery (189), communication module (190), subscriber identification module (196), or antenna module (197). In some embodiments, at least one of these components (e.g., connection terminal (178)) may be omitted from the electronic device (101), or one or more other components may be added. In some embodiments, some of these components (e.g., sensor module (176), camera module (180), or antenna module (197)) may be integrated into a single component (e.g., display module (160)).
The processor (120) can control at least one other component (e.g., a hardware or software component) of the electronic device (101) connected to the processor (120) by executing software (e.g., a program (140)), and can perform various data processing or operations. According to one embodiment, as at least part of the data processing or operations, the processor (120) can store commands or data received from other components (e.g., a sensor module (176) or a communication module (190)) in volatile memory (132), process the commands or data stored in volatile memory (132), and store the resulting