US-12621620-B2 - Sound signal downmix method, sound signal coding method, sound signal downmix apparatus, sound signal coding apparatus, program
Abstract
A sound signal downmixing method includes a step of obtaining, for each of two channels, a signal obtained by adding an input sound signal of one channel to a signal obtained by delaying an input sound signal of the other channel and multiplying the delayed input sound signal by a weight value as a delayed crosstalk-added signal of the one channel, a step of obtaining preceding channel information and a left-right correlation value, and a step of obtaining a downmix signal by performing weighted addition on the input sound signals of the two channels based on the left-right correlation value and the preceding channel information such that more of a signal derived from an input sound signal of a preceding channel among the signals derived from the input sound signals of the two channels is included as the left-right correlation value becomes larger.
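The three steps in the abstract can be illustrated with a minimal pure-Python sketch. The abstract does not say how the preceding channel, the left-right correlation value, or the downmix weights are computed, so those parts are assumptions here. The sketch takes the correlation value gamma to be the largest absolute normalized cross-correlation between the delayed crosstalk-added signals over a small lag range, decides the preceding channel from the sign of the best lag, and weights the preceding channel by (1 + gamma)/2 and the other channel by (1 - gamma)/2. The function names and the values a1 = a2 = 1 and w1 = w2 = 0.5 are hypothetical.

```python
import math
import random


def delayed_crosstalk_add(x_l, x_r, a1, a2, w1, w2):
    """y_L(t) = x_L(t) + w1 * x_R(t - a1) and y_R(t) = x_R(t) + w2 * x_L(t - a2).
    Samples before t = 0 are assumed to be zero."""
    n = len(x_l)
    y_l = [x_l[t] + (w1 * x_r[t - a1] if t >= a1 else 0.0) for t in range(n)]
    y_r = [x_r[t] + (w2 * x_l[t - a2] if t >= a2 else 0.0) for t in range(n)]
    return y_l, y_r


def left_right_relation(y_l, y_r, max_lag):
    """Hypothetical estimator returning (preceding, gamma).
    gamma is the largest absolute normalized cross-correlation over lags
    -max_lag..max_lag. Lag tau pairs y_R(t) with y_L(t - tau), so a peak at
    tau > 0 means the left channel leads."""
    n = len(y_l)
    best_tau, best_c = 0, 0.0
    for tau in range(-max_lag, max_lag + 1):
        ts = range(max(0, tau), min(n, n + tau))
        num = sum(y_r[t] * y_l[t - tau] for t in ts)
        e_r = sum(y_r[t] ** 2 for t in ts)
        e_l = sum(y_l[t - tau] ** 2 for t in ts)
        c = abs(num) / math.sqrt(e_r * e_l) if e_r > 0 and e_l > 0 else 0.0
        if c > best_c:
            best_tau, best_c = tau, c
    preceding = "left" if best_tau > 0 else "right" if best_tau < 0 else "none"
    return preceding, best_c


def downmix(x_l, x_r, preceding, gamma):
    """Hypothetical weighting: the preceding channel gets (1 + gamma) / 2 and
    the other channel (1 - gamma) / 2; with no preceding channel, plain average."""
    if preceding == "left":
        g_l, g_r = (1 + gamma) / 2, (1 - gamma) / 2
    elif preceding == "right":
        g_l, g_r = (1 - gamma) / 2, (1 + gamma) / 2
    else:
        g_l = g_r = 0.5
    # The weighted addition uses the input signals, not y_L / y_R.
    return [g_l * l + g_r * r for l, r in zip(x_l, x_r)]


# Usage: the right channel is the left channel delayed by 3 samples.
rng = random.Random(0)
x_l = [rng.gauss(0.0, 1.0) for _ in range(2000)]
x_r = [0.0] * 3 + x_l[:-3]
y_l, y_r = delayed_crosstalk_add(x_l, x_r, a1=1, a2=1, w1=0.5, w2=0.5)
preceding, gamma = left_right_relation(y_l, y_r, max_lag=8)
m = downmix(x_l, x_r, preceding, gamma)
print(preceding, round(gamma, 2))
```

With the right channel lagging by three samples, the sketch should report the left channel as preceding, with gamma well above zero, so the downmix leans toward the left input. The crosstalk-added signals serve only to estimate the left-right relationship; the weighted addition itself operates on the input signals, as the abstract states.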
Inventors
- Takehiro Moriya
- Yutaka Kamamoto
- Ryosuke SUGIURA
Assignees
- NTT, INC.
Dates
- Publication Date
- 20260505
- Application Date
- 20210901
Claims (10)
- 1 . A sound signal downmixing method for obtaining a downmix signal that is a monaural sound signal from input sound signals of two channels, the method comprising: a delayed crosstalk addition step of obtaining, for each of the two channels, a signal obtained by adding an input sound signal of one channel to a signal obtained by delaying an input sound signal of the other channel and multiplying the delayed input sound signal by a weight value that is a predetermined value having an absolute value smaller than 1, as a delayed crosstalk-added signal of the one channel; a left-right relationship information acquisition step of obtaining preceding channel information that is information indicating which of the delayed crosstalk-added signals of the two channels is preceding and a left-right correlation value that is a value indicating a magnitude of correlation between the delayed crosstalk-added signals of the two channels; and a downmixing step of obtaining the downmix signal by performing weighted addition on the input sound signals of the two channels based on the left-right correlation value and the preceding channel information such that more of a signal derived from an input sound signal of a preceding channel among the signals derived from the input sound signals of the two channels is included as the left-right correlation value becomes larger.
- 2 . The sound signal downmixing method according to claim 1, wherein, in the delayed crosstalk addition step, when the input sound signals of the two channels are respectively a left channel input sound signal and a right channel input sound signal, the delayed crosstalk-added signals of the two channels are respectively a left channel delayed crosstalk-added signal and a right channel delayed crosstalk-added signal, a sample number is $t$, each sample of the left channel input sound signal is $x_L(t)$, each sample of the right channel input sound signal is $x_R(t)$, each sample of the left channel delayed crosstalk-added signal is $y_L(t)$, each sample of the right channel delayed crosstalk-added signal is $y_R(t)$, predetermined positive values are $a_1$ and $a_2$, and predetermined values having an absolute value smaller than 1 are $w_1$ and $w_2$, each sample $y_L(t)$ of the left channel delayed crosstalk-added signal is obtained by the following expression [Math. 17]: $y_L(t) = x_L(t) + w_1 \times x_R(t - a_1)$, and each sample $y_R(t)$ of the right channel delayed crosstalk-added signal is obtained by the following expression [Math. 18]: $y_R(t) = x_R(t) + w_2 \times x_L(t - a_2)$.
- 3 . The sound signal downmixing method according to claim 1, wherein, in the delayed crosstalk addition step, when the input sound signals of the two channels are respectively a left channel input sound signal and a right channel input sound signal, the delayed crosstalk-added signals of the two channels are respectively a left channel delayed crosstalk-added signal and a right channel delayed crosstalk-added signal, a frequency number is $k$, each frequency spectrum sample of a frequency spectrum obtained by performing Fourier transform on the left channel input sound signal for each frame is $X_L(k)$, each frequency spectrum sample of a frequency spectrum obtained by performing Fourier transform on the right channel input sound signal for each frame is $X_R(k)$, each frequency spectrum sample of the left channel delayed crosstalk-added signal in a frequency domain for each frame is $Y_L(k)$, each frequency spectrum sample of the right channel delayed crosstalk-added signal in the frequency domain for each frame is $Y_R(k)$, predetermined positive values are $a_1$ and $a_2$, and predetermined values having an absolute value smaller than 1 are $w_1$ and $w_2$, each frequency spectrum sample $Y_L(k)$ of the left channel delayed crosstalk-added signal in the frequency domain for each frame is obtained by the following expression [Math. 19]: $Y_L(k) = X_L(k) + w_1 \times X_R(k) \times e^{-j\frac{2 a_1 \pi}{T}k}$, and each frequency spectrum sample $Y_R(k)$ of the right channel delayed crosstalk-added signal in the frequency domain for each frame is obtained by the following expression [Math. 20]: $Y_R(k) = X_R(k) + w_2 \times X_L(k) \times e^{-j\frac{2 a_2 \pi}{T}k}$.
- 4 . A sound signal encoding method comprising the sound signal downmixing method according to claim 1 as a sound signal downmixing step, wherein the sound signal encoding method further comprises: a monaural encoding step of encoding the downmix signal obtained in the downmixing step to obtain a monaural code; and a stereo encoding step of encoding the input sound signals of the two channels to obtain a stereo code.
- 5 . A non-transitory computer readable medium that stores a program for causing a computer to execute processing of each step of the sound signal encoding method according to claim 4.
- 6 . A non-transitory computer readable medium that stores a program for causing a computer to execute processing of each step of the sound signal downmixing method according to claim 1.
- 7 . A sound signal downmixing apparatus for obtaining a downmix signal that is a monaural sound signal from input sound signals of two channels, the sound signal downmixing apparatus comprising processing circuitry configured to: obtain, for each of the two channels, a signal obtained by adding an input sound signal of one channel to a signal obtained by delaying an input sound signal of the other channel and multiplying the delayed input sound signal by a weight value that is a predetermined value having an absolute value smaller than 1, as a delayed crosstalk-added signal of the one channel; obtain preceding channel information that is information indicating which of the delayed crosstalk-added signals of the two channels is preceding and a left-right correlation value that is a value indicating a magnitude of correlation between the delayed crosstalk-added signals of the two channels; and obtain the downmix signal by performing weighted addition on the input sound signals of the two channels based on the left-right correlation value and the preceding channel information such that more of a signal derived from an input sound signal of a preceding channel among the signals derived from the input sound signals of the two channels is included as the left-right correlation value becomes larger.
- 8 . The sound signal downmixing apparatus according to claim 7, wherein, in the processing circuitry, when the input sound signals of the two channels are respectively a left channel input sound signal and a right channel input sound signal, the delayed crosstalk-added signals of the two channels are respectively a left channel delayed crosstalk-added signal and a right channel delayed crosstalk-added signal, a sample number is $t$, each sample of the left channel input sound signal is $x_L(t)$, each sample of the right channel input sound signal is $x_R(t)$, each sample of the left channel delayed crosstalk-added signal is $y_L(t)$, each sample of the right channel delayed crosstalk-added signal is $y_R(t)$, predetermined positive values are $a_1$ and $a_2$, and predetermined values having an absolute value smaller than 1 are $w_1$ and $w_2$, each sample $y_L(t)$ of the left channel delayed crosstalk-added signal is obtained by the following expression [Math. 21]: $y_L(t) = x_L(t) + w_1 \times x_R(t - a_1)$, and each sample $y_R(t)$ of the right channel delayed crosstalk-added signal is obtained by the following expression [Math. 22]: $y_R(t) = x_R(t) + w_2 \times x_L(t - a_2)$.
- 9 . The sound signal downmixing apparatus according to claim 7, wherein, in the processing circuitry, when the input sound signals of the two channels are respectively a left channel input sound signal and a right channel input sound signal, the delayed crosstalk-added signals of the two channels are respectively a left channel delayed crosstalk-added signal and a right channel delayed crosstalk-added signal, a frequency number is $k$, each frequency spectrum sample of a frequency spectrum obtained by performing Fourier transform on the left channel input sound signal for each frame is $X_L(k)$, each frequency spectrum sample of a frequency spectrum obtained by performing Fourier transform on the right channel input sound signal for each frame is $X_R(k)$, each frequency spectrum sample of the left channel delayed crosstalk-added signal in a frequency domain for each frame is $Y_L(k)$, each frequency spectrum sample of the right channel delayed crosstalk-added signal in the frequency domain for each frame is $Y_R(k)$, predetermined positive values are $a_1$ and $a_2$, and predetermined values having an absolute value smaller than 1 are $w_1$ and $w_2$, each frequency spectrum sample $Y_L(k)$ of the left channel delayed crosstalk-added signal in the frequency domain for each frame is obtained by the following expression [Math. 23]: $Y_L(k) = X_L(k) + w_1 \times X_R(k) \times e^{-j\frac{2 a_1 \pi}{T}k}$, and each frequency spectrum sample $Y_R(k)$ of the right channel delayed crosstalk-added signal in the frequency domain for each frame is obtained by the following expression [Math. 24]: $Y_R(k) = X_R(k) + w_2 \times X_L(k) \times e^{-j\frac{2 a_2 \pi}{T}k}$.
- 10 . A sound signal encoding apparatus comprising the sound signal downmixing apparatus according to claim 7, wherein the sound signal encoding apparatus further comprises processing circuitry configured to: encode the downmix signal obtained by the sound signal downmixing apparatus to obtain a monaural code; and encode the input sound signals of the two channels to obtain a stereo code.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a U.S. 371 Application of International Patent Application No. PCT/JP2021/032080, filed on 1 Sep. 2021, the disclosure of which is hereby incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present invention relates to a technique for obtaining a monaural sound signal from a two-channel sound signal in order to encode the sound signal in monaural, encode it by using both monaural encoding and stereo encoding, process it in monaural, or perform signal processing on a stereo sound signal using a monaural sound signal.
BACKGROUND ART
Patent Literature 1 discloses a technique for obtaining a monaural sound signal from a two-channel sound signal and performing embedded encoding and decoding of the two-channel sound signal and the monaural sound signal. In that technique, a monaural signal is obtained by averaging the input left channel sound signal and the input right channel sound signal sample by sample; the monaural signal is encoded (monaural encoding) to obtain a monaural code; the monaural code is decoded (monaural decoding) to obtain a monaural local decoded signal; and, for each of the left channel and the right channel, the difference (prediction residual signal) between the input sound signal and a prediction signal obtained from the monaural local decoded signal is encoded.
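The Patent Literature 1 baseline described above (sample-by-sample averaging, then a per-channel prediction residual) can be sketched in a few lines of Python. This is an illustration under assumptions, not Patent Literature 1's implementation: no monaural codec is modeled, so the uncoded average stands in for the monaural local decoded signal; the delay is chosen by maximizing the cross-correlation between the channel and the delayed monaural signal; and the amplitude ratio is a least-squares gain. All function names are hypothetical.

```python
def average_downmix(x_l, x_r):
    """Monaural signal as in Patent Literature 1: sample-by-sample average."""
    return [(l + r) / 2 for l, r in zip(x_l, x_r)]


def delayed(m, d):
    """m(t - d), with samples before t = 0 taken as zero."""
    return [m[t - d] if t >= d else 0.0 for t in range(len(m))]


def prediction_residual(x, m, max_delay):
    """Pick the delay d in 0..max_delay maximizing sum_t x(t) * m(t - d),
    then a least-squares amplitude ratio g (assumption), and return
    (d, g, residual) with residual(t) = x(t) - g * m(t - d)."""
    best_d, best_xc, best_p = 0, None, None
    for d in range(max_delay + 1):
        p = delayed(m, d)
        xc = sum(a * b for a, b in zip(x, p))
        if best_xc is None or xc > best_xc:
            best_d, best_xc, best_p = d, xc, p
    energy = sum(v * v for v in best_p)
    g = best_xc / energy if energy > 0 else 0.0
    residual = [a - g * b for a, b in zip(x, best_p)]
    return best_d, g, residual


x_l = [1.0, -2.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
x_r = [0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
m = average_downmix(x_l, x_r)         # stands in for the monaural local decoded signal
x = [2.0 * v for v in delayed(m, 2)]  # a channel that is exactly 2 * m(t - 2)
d, g, residual = prediction_residual(x, m, max_delay=4)
print(d, g, residual)
```

For this constructed channel the search recovers d = 2 and g = 2.0 with an all-zero residual. Because the cross-correlation is not normalized here, zero samples shifted in at the frame start can bias the search toward small delays on real signals; a normalized correlation is a common alternative.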
In the technique of Patent Literature 1, for each channel, a signal obtained by delaying the monaural local decoded signal and applying an amplitude ratio to it is used as the prediction signal. Either the prediction signal whose delay and amplitude ratio minimize the error between the input sound signal and the prediction signal is selected, or the prediction signal whose delay and amplitude ratio maximize the cross-correlation between the input sound signal and the monaural local decoded signal is used. The prediction signal is subtracted from the input sound signal to obtain a prediction residual signal, and the prediction residual signal is the target of encoding and decoding, which suppresses deterioration of the sound quality of the decoded sound signal of each channel.
CITATION LIST
Patent Literature
Patent Literature 1: WO 2006/070751 A
SUMMARY OF INVENTION
Technical Problem
In the technique of Patent Literature 1, the coding efficiency of each channel can be improved by optimizing the delay and the amplitude ratio given to the monaural local decoded signal when obtaining the prediction signal. However, the monaural local decoded signal is obtained by encoding and decoding a monaural signal obtained by averaging the left channel sound signal and the right channel sound signal. That is, the technique of Patent Literature 1 has the problem that it is not designed to obtain, from a two-channel sound signal, a monaural signal that is useful for signal processing such as encoding processing.
An object of the present invention is to provide a technique for obtaining, from a two-channel sound signal, a monaural signal useful for signal processing such as encoding processing.
Solution to Problem
One aspect of the present invention is a sound signal downmixing method for obtaining a downmix signal that is a monaural sound signal from input sound signals of two channels, the method including: a delayed crosstalk addition step of obtaining, for each of the two channels, a signal obtained by adding an input sound signal of one channel to a signal obtained by delaying an input sound signal of the other channel and multiplying the delayed input sound signal by a weight value that is a predetermined value having an absolute value smaller than 1, as a delayed crosstalk-added signal of the one channel; a left-right relationship information acquisition step of obtaining preceding channel information that is information indicating which of the delayed crosstalk-added signals of the two channels is preceding and a left-right correlation value that is a value indicating a magnitude of correlation between the delayed crosstalk-added signals of the two channels; and a downmixing step of obtaining the downmix signal by performing weighted addition on the input sound signals of the two channels based on the left-right correlation value and the preceding channel information such that more of the input sound signal of the preceding channel among the input sound signals of the two channels is included as the left-right correlation value becomes larger.
Another aspect of the present invention is a sound signal encoding method including the above sound signal downmixing method as a sound signal downmixing step, the sound signal encoding method further including: a monaural encoding step of encoding the downmix signal obtained in the downmixing step to obtain a monaural code; and a stereo encoding step of encoding the input sound signals of the two channels to obtain a stereo code.
Advantageous Effects of Invention
Accor