KR-102961992-B1 - Electronic device and Method for processing the audio signal thereof

Abstract

An electronic device and a method for processing an audio signal thereof are provided. The electronic device according to the present disclosure comprises a first processor for preprocessing an input audio signal, a memory for storing the preprocessed audio signal, and a second processor for inputting the preprocessed audio signal to a trained neural network model to obtain mask data for separating the source of the preprocessed audio signal and for storing the obtained mask data in the memory. The first processor preprocesses the input audio signal and, after a delay of a preset time, separates the source of the preprocessed audio signal using the mask data stored in the memory and postprocesses the audio signal from which the source has been separated.
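
The data flow summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the disclosed implementation: `SharedMemory`, the DC-offset preprocessing, the threshold-based mask (a placeholder for the trained neural network model), and the clipping postprocess are all hypothetical, and a per-sample mask is assumed because this section does not specify the signal representation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SharedMemory:
    """Hypothetical stand-in for the memory both processors access."""
    preprocessed: Optional[List[float]] = None
    mask: Optional[List[float]] = None


def first_processor_preprocess(frame: List[float], mem: SharedMemory) -> None:
    """Assumed preprocessing: remove the DC offset, then store the result."""
    mean = sum(frame) / len(frame)
    mem.preprocessed = [x - mean for x in frame]


def second_processor_estimate_mask(mem: SharedMemory, threshold: float = 0.5) -> None:
    """Placeholder for the trained neural network model (assumption).

    Produces a binary mask that keeps samples whose magnitude exceeds
    `threshold`, and stores the mask in the shared memory.
    """
    if mem.preprocessed is None:
        raise ValueError("no preprocessed audio in memory")
    mem.mask = [1.0 if abs(x) > threshold else 0.0 for x in mem.preprocessed]


def first_processor_separate_and_postprocess(mem: SharedMemory, gain: float = 1.0) -> List[float]:
    """Apply the stored mask, then clip to [-1, 1] as an assumed postprocess."""
    if mem.preprocessed is None or mem.mask is None:
        raise ValueError("memory does not hold both audio and mask data")
    separated = [x * m for x, m in zip(mem.preprocessed, mem.mask)]
    return [max(-1.0, min(1.0, gain * x)) for x in separated]
```

Calling the three functions in order on one frame mirrors the sequence in the abstract: the first processor writes preprocessed audio, the second processor writes mask data, and the first processor reads both back to separate and postprocess.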

Inventors

황인우
김완진
김기범
김선민

Assignees

삼성전자주식회사

Dates

Publication Date: 2026-05-08
Application Date: 2020-11-06

Claims (13)

1. An electronic device comprising: a first processor configured to preprocess an input audio signal; a memory configured to store the preprocessed audio signal; and a second processor configured to input the preprocessed audio signal into a trained neural network model to obtain mask data for separating a source of the preprocessed audio signal, and to store the obtained mask data in the memory, wherein the first processor is configured to preprocess the input audio signal, to separate the source of the preprocessed audio signal using the mask data stored in the memory after a delay of a preset time, and to postprocess the audio signal from which the source has been separated, and wherein the preset time is determined such that the sum of the preprocessing time of the input audio signal, the postprocessing time of the audio signal from which the source has been separated, and the preset time is smaller than the time of a single audio frame.
2. (Deleted)
3. (Deleted)
4. The electronic device of claim 1, wherein the first processor is configured to separate the source of an (n+1)-th audio frame using mask data obtained from the n-th audio frame or from an audio frame preceding the n-th audio frame.
5. The electronic device of claim 1, wherein the first processor is configured to delay for a predetermined time by performing, based on a predetermined command, operations for the predetermined time that do not generate a valid result value.
6. The electronic device of claim 1, wherein the first processor is a digital signal processor (DSP) and the second processor is a neural processing unit (NPU).
7. The electronic device of claim 1, wherein the first processor, the memory, and the second processor are implemented as a single chip.
8. A method for processing an audio signal of an electronic device, the method comprising: preprocessing, by a digital signal processor (DSP), an input audio signal; storing the preprocessed audio signal in a memory; obtaining, by a neural processing unit (NPU), mask data for separating a source of the preprocessed audio signal by inputting the preprocessed audio signal into a trained neural network model; storing the obtained mask data in the memory; and, after the input audio signal has been preprocessed and a delay of a preset time has elapsed, separating the source of the preprocessed audio signal using the mask data stored in the memory and postprocessing the audio signal from which the source has been separated, wherein the preset time is determined such that the sum of the preprocessing time of the input audio signal, the postprocessing time of the audio signal from which the source has been separated, and the preset time is smaller than the time of a single audio frame.
9. (Deleted)
10. (Deleted)
11. The method of claim 8, wherein the postprocessing step comprises separating the source of an (n+1)-th audio frame using mask data obtained from the n-th audio frame or from an audio frame preceding the n-th audio frame, and postprocessing the audio signal from which the source has been separated.
12. The method of claim 8, wherein the DSP delays for a predetermined time by performing, based on a predetermined command, operations for the predetermined time that do not generate a valid result value.
13. The method of claim 8, wherein the DSP, the memory, and the NPU are implemented as a single chip.
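
The timing condition and the frame-lagged use of mask data recited in the claims above can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, millisecond units, identity preprocessing, and pass-through of the first frame (before any mask exists) are hypothetical, the busy-wait loop is merely a software analogy for performing operations that produce no valid result, and the loop runs sequentially whereas the two processors in the claims are separate hardware units.

```python
import time
from typing import Callable, List, Optional

Frame = List[float]


def delay_is_valid(frame_ms: float, pre_ms: float, post_ms: float, delay_ms: float) -> bool:
    """Check the claimed condition: pre + post + delay < one audio frame time."""
    return delay_ms >= 0 and (pre_ms + post_ms + delay_ms) < frame_ms


def busy_delay(seconds: float) -> None:
    """Wait by spinning on a loop that computes nothing useful (analogy only)."""
    deadline = time.perf_counter() + seconds
    while time.perf_counter() < deadline:
        pass  # no valid result is produced; the loop only consumes time


def process_stream(frames: List[Frame], estimate_mask: Callable[[Frame], Frame]) -> List[Frame]:
    """Separate frame n+1 with the mask obtained from frame n.

    `estimate_mask` is a hypothetical stand-in for the second processor
    running the trained neural network model.
    """
    stored_mask: Optional[Frame] = None  # mask data held in the shared memory
    outputs: List[Frame] = []
    for frame in frames:
        preprocessed = list(frame)  # identity preprocessing (assumption)
        if stored_mask is None:
            separated = preprocessed  # assumption: first frame passes through
        else:
            separated = [x * m for x, m in zip(preprocessed, stored_mask)]
        outputs.append(separated)
        # The mask computed from this frame becomes available for the next one.
        stored_mask = estimate_mask(preprocessed)
    return outputs
```

With a 10 ms frame, 2 ms of preprocessing, and 3 ms of postprocessing, any delay below 5 ms satisfies the condition, while a delay of exactly 5 ms does not because the sum must be strictly smaller than the frame time.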

Description

Electronic device and Method for processing the audio signal thereof
The present disclosure relates to an electronic device and a method for processing an audio signal thereof, and more specifically, to an electronic device capable of separating the source of an audio signal using mask data and a method for processing an audio signal thereof.
Audio source separation is one of the representative fields of audio signal processing. It is used in a variety of applications because it enables effects such as voice signal preprocessing, call quality improvement, vocal and instrument separation, noise reduction, and speech clarity enhancement.
Recently, with advances in audio signal processing based on machine learning and deep learning, together with the mass production of processors with high computational capability, audio source separation technologies with improved performance have been developed. In particular, technologies that separate audio sources using processors with high computational power, such as neural processing units (NPUs), have been developed in recent years.
However, existing technologies simply use an NPU to separate audio sources; technology for using an NPU to separate the sources of audio signals transmitted in real time has not been developed. There is therefore a need for methods of processing audio signals transmitted in real time using an NPU.
FIG. 1 is a block diagram briefly illustrating the configuration of an electronic device according to one embodiment of the present disclosure.
FIG. 2 is a diagram illustrating a method by which an electronic device separates an audio source according to one embodiment of the present disclosure.
FIG. 3 is a diagram illustrating an embodiment in which audio sources are separated without a delay operation.
FIG. 4 is a diagram illustrating an embodiment in which an audio source is separated by performing a delay operation according to one embodiment of the present disclosure.
FIG. 5 is a flowchart illustrating an audio signal processing method of an electronic device according to one embodiment of the present disclosure.
FIG. 6 is a block diagram illustrating in detail the configuration of an electronic device according to one embodiment of the present disclosure.
The embodiments described herein are subject to various modifications and may take various forms; specific embodiments are illustrated in the drawings and described in detail in the detailed description. However, this is not intended to limit the scope to the specific embodiments, and the disclosure should be understood to include various modifications, equivalents, and/or alternatives of the embodiments of the present disclosure. In relation to the description of the drawings, similar reference numerals may be used for similar components.
In describing the present disclosure, if it is determined that a detailed description of related known functions or configurations could unnecessarily obscure the essence of the present disclosure, such detailed description is omitted.
Additionally, the following embodiments may be modified in various other forms, and the scope of the technical concept of the present disclosure is not limited to the following embodiments. Rather, these embodiments are provided to make the present disclosure more thorough and complete and to fully convey the technical concept of the present disclosure to those skilled in the art.
The terms used in this disclosure are used merely to describe specific embodiments and are not intended to limit the scope of the rights. A singular expression includes the plural expression unless the context clearly indicates otherwise.
In the present disclosure, expressions such as "have," "may have," "include," or "may include" indicate the presence of the corresponding feature (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.
In the present disclosure, expressions such as "A or B," "at least one of A or/and B," or "one or more of A or/and B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to cases including (1) at least one A, (2) at least one B, or (3) both at least one A and at least one B.
Expressions such as "1st," "2nd," "first," or "second" used in this disclosure may modify various components regardless of order and/or importance, and are used only to distinguish one component from another and do not limit said components.
Where it is stated that a certain component (e.g., a first component) is "(operatively or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the certain component may be directly connected to the other component or connected through yet another component (e.g., a third component). On t