
JP-7857393-B2 - Method for processing an audio signal, signal processing unit, binaural renderer, audio encoder, and audio decoder

JP 7857393 B2

Inventors

  • Simone Füg
  • Jan Plogsties

Assignees

  • Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Dates

Publication Date
2026-05-12
Application Date
2024-12-27
Priority Date
2013-07-22

Claims (20)

  1. A method for processing an audio signal in accordance with a room impulse response, the room impulse response comprising an initial part and a late reverberation, the late reverberation of the room impulse response being replaced by a synthetic reverberation, the method comprising: processing the audio signal separately with the initial part of the room impulse response and with the synthetic reverberation, wherein processing the audio signal with the synthetic reverberation comprises generating a scaled reverberation signal, the scaling being dependent on the audio signal; and combining the audio signal processed with the initial part of the room impulse response and the scaled reverberation signal, wherein generating the scaled reverberation signal comprises: setting a gain factor in accordance with a predefined correlation measure of the audio signal, the predefined correlation measure having a fixed value determined empirically based on an analysis of a plurality of audio signals, and applying the gain factor to the reverberation signal; or obtaining the gain factor using a correlation analysis of the audio signal and applying the gain factor to the reverberation signal.
  2. The method according to claim 1, wherein the scaling depends on the state of one or more input channels of the audio signal.
  3. The method according to claim 2, wherein the state of the one or more input channels of the audio signal includes one or more of the number of input channels, the number of active input channels, and the activity in the input channels.
  4. The method according to claim 2 or 3, wherein the gain factor is determined based on the state of one or more input channels of the audio signal.
  5. The method according to claim 4, wherein the gain factor is determined as follows: g = c_u + ρ · (c_c − c_u), where ρ = a predefined or calculated correlation coefficient for the audio signal, and c_u, c_c = factors indicative of the state of one or more input channels of the audio signal, c_u referring to channels that are uncorrelated as a whole and c_c referring to channels that are correlated as a whole.
  6. The method according to claim 5, wherein c_u and c_c are determined as follows: c_u = √(K_in) and c_c = K_in, where K_in = the number of active or fixed downmix channels.
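Read together, claims 4 to 6 describe a gain that interpolates between the fully uncorrelated and fully correlated channel cases. A minimal sketch in Python (not part of the patent; the closed forms c_u = √K_in and c_c = K_in are assumptions motivated by the amplitude of a sum of K_in uncorrelated or fully correlated unit-power channels):

```python
import math

def reverb_gain(rho: float, k_in: int) -> float:
    """Gain factor g = c_u + rho * (c_c - c_u) per claim 5.

    Assumes c_u = sqrt(K_in) (amplitude of a sum of K_in uncorrelated
    unit-power channels) and c_c = K_in (amplitude of a sum of K_in
    fully correlated unit-power channels); see claim 6.
    """
    c_u = math.sqrt(k_in)  # channels uncorrelated as a whole
    c_c = float(k_in)      # channels correlated as a whole
    return c_u + rho * (c_c - c_u)
```

For K_in = 4, the gain moves from 2 (ρ = 0, uncorrelated) to 4 (ρ = 1, correlated) as the measured correlation of the input channels increases.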
  7. The method according to any one of claims 4 to 6, wherein the gain factor is subjected to low-pass filtering across multiple audio frames.
  8. The method according to claim 7, wherein the gain factor is subjected to low-pass filtering as follows: g_s(t_i) = c_s,old · g_s(t_i − 1) + c_s,new · g, with c_s,old = e^(−1/(f_s · t_s / k)) and c_s,new = 1 − c_s,old, where t_s = time constant of the low-pass filter, t_i = the i-th audio frame, g_s = smoothed gain factor, k = frame size, and f_s = sampling frequency.
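The frame-rate smoothing of claims 7 and 8 can be sketched as a one-pole low-pass filter. The recursion and the pole value derived from the time constant are assumptions consistent with the symbols listed in claim 8, not a verbatim excerpt of the patent:

```python
import math

def smooth_gain(gains, t_s=0.1, f_s=48000, k=2048):
    """One-pole low-pass filtering of per-frame gain factors.

    t_s: filter time constant [s]; f_s: sampling rate [Hz];
    k: frame size [samples], so a new gain arrives every k / f_s s.
    """
    c_old = math.exp(-1.0 / (f_s * t_s / k))  # pole from the time constant
    c_new = 1.0 - c_old
    g_s, out = gains[0], []
    for g in gains:
        g_s = c_old * g_s + c_new * g  # smoothed gain for this frame
        out.append(g_s)
    return out
```

On a step from 1.0 to 2.0 the smoothed gain rises gradually toward 2.0 instead of jumping, which avoids audible level discontinuities between frames.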
  9. The method according to any one of claims 1 to 8, wherein the correlation analysis of the audio signal comprises determining a composite correlation measure for an audio frame of the audio signal, the composite correlation measure being calculated by combining correlation coefficients for a plurality of channel combinations of one audio frame, each audio frame comprising one or more time slots.
  10. The method according to claim 9, wherein combining the correlation coefficients comprises averaging the plurality of correlation coefficients of one audio frame.
  11. The method according to claim 9 or 10, wherein determining the composite correlation measure comprises: (i) calculating an overall mean value for every channel of the one audio frame; (ii) calculating a zero-mean audio frame by subtracting the mean values from the corresponding channels; (iii) calculating the correlation coefficient for a plurality of channel combinations; and (iv) calculating the composite correlation measure as the mean of the plurality of correlation coefficients.
  12. The method according to any one of claims 9 to 11, wherein the respective correlation coefficients for the channel combinations are calculated as follows: ρ[m, n] = (1/N) · Σ_{i∈[1,N]} { Σ_j x_m[j, i] · x_n*[j, i] / ( σ(x_m[j]) · σ(x_n[j]) ) }, where ρ[m, n] = correlation coefficient, σ(x_m[j]) = standard deviation over one time slot j of channel m, σ(x_n[j]) = standard deviation over one time slot j of channel n, x_m, x_n = zero-mean variables, i ∈ [1, N] = frequency bands, j ∈ [1, M] = time slots, m, n ∈ [1, K] = channels, and * = complex conjugate.
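The analysis of claims 9 to 12 can be sketched as follows: remove each channel's mean, compute a normalized correlation coefficient for every channel pair over the (complex, e.g. QMF-domain) frame, and average the coefficients. The exact normalization over bands and time slots in the patent's formula is an assumption here; the four steps follow claim 11:

```python
import itertools
import numpy as np

def combined_correlation(frame: np.ndarray) -> float:
    """Composite correlation measure for one audio frame.

    frame: complex or real array of shape (K channels, M time slots,
    N frequency bands) for a single audio frame.
    """
    K = frame.shape[0]
    # (i) + (ii): zero-mean audio frame, per channel
    zm = frame - frame.mean(axis=(1, 2), keepdims=True)
    coeffs = []
    # (iii): normalized correlation coefficient per channel pair
    for m, n in itertools.combinations(range(K), 2):
        x_m, x_n = zm[m].ravel(), zm[n].ravel()
        num = np.abs(np.sum(x_m * np.conj(x_n)))
        den = np.linalg.norm(x_m) * np.linalg.norm(x_n)
        coeffs.append(num / den if den > 0 else 0.0)
    # (iv): composite measure as the mean of the coefficients
    return float(np.mean(coeffs))
```

Two identical channels yield a measure of 1.0; independent noise channels yield a value near 0, steering the gain of claim 5 toward c_u.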
  13. The method according to any one of claims 1 to 12, comprising delaying the scaled reverberation signal such that the start of the scaled reverberation signal coincides with the transition point from the early reflections to the late reverberation in the room impulse response.
  14. The method according to any one of claims 1 to 13, wherein processing the audio signal with the synthetic reverberation comprises downmixing the audio signal and applying the downmixed audio signal to a reverberator.
  15. A computer-readable medium storing instructions for performing the method according to any one of claims 1 to 14 when executed by a computer.
  16. A signal processing unit, comprising: an input for receiving an audio signal; an initial part processor for processing the received audio signal in accordance with an initial part of a room impulse response, the room impulse response comprising the initial part and a late reverberation, the late reverberation of the room impulse response being replaced by a synthetic reverberation; a late reverberation processor for processing the received audio signal in accordance with the synthetic reverberation, the late reverberation processor being configured to generate a scaled reverberation signal, the scaling being dependent on the received audio signal; and an output for combining the processed initial part of the received audio signal and the scaled reverberation signal into an output audio signal, wherein the late reverberation processor scales the reverberation signal by: setting a gain factor in accordance with a predefined correlation measure of the audio signal, the predefined correlation measure having a fixed value determined empirically based on an analysis of a plurality of audio signals, and applying the gain factor to the reverberation signal; or obtaining the gain factor using a correlation analysis of the audio signal and applying the gain factor to the reverberation signal.
  17. The signal processing unit according to claim 16, comprising a correlation analyzer that generates the gain factor dependent on the audio signal.
  18. The signal processing unit according to claim 16 or 17, wherein the late reverberation processor comprises: a reverberator receiving the audio signal and generating a reverberation signal; and a gain stage coupled to an input or an output of the reverberator and controlled by the gain factor.
  19. The signal processing unit according to claim 18, further comprising at least one of: a low-pass filter coupled to the gain stage; and a delay element coupled between the gain stage and an adder, wherein the adder is further coupled to the initial part processor and the output.
  20. A binaural renderer comprising the signal processing unit according to any one of claims 16 to 19.
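Claims 16 to 19 together describe a late-reverberation chain: reverberator, gain stage controlled by the (optionally low-pass filtered) gain factor, delay element, and an adder that combines the result with the initial-part output. A structural sketch in Python (the naive convolution stands in for the patent's reverberator, and all names are illustrative, not the patent's):

```python
def process_late_reverb(downmix, gain, delay_samples, reverb_tail):
    """Late-reverberation chain (illustrative).

    downmix: mono downmix samples; reverb_tail: an impulse response
    standing in for the reverberator; gain: (smoothed) gain factor;
    delay_samples: aligns the tail with the early-to-late transition.
    """
    # reverberator: convolve the downmix with the tail
    n = len(downmix) + len(reverb_tail) - 1
    wet = [0.0] * n
    for i, x in enumerate(downmix):
        for j, h in enumerate(reverb_tail):
            wet[i + j] += x * h
    # gain stage at the reverberator output
    wet = [gain * w for w in wet]
    # delay element before the adder
    return [0.0] * delay_samples + wet

def add_early_and_late(early, late):
    """Adder combining initial-part output and scaled reverberation."""
    n = max(len(early), len(late))
    early = early + [0.0] * (n - len(early))
    late = late + [0.0] * (n - len(late))
    return [e + l for e, l in zip(early, late)]
```

An impulse through the chain illustrates the ordering: the tail appears only after `delay_samples`, scaled by the gain, on top of the untouched initial-part output.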

Description

This invention relates to the field of audio encoding/decoding, and in particular to spatial audio coding and spatial audio object coding, for example in 3D audio codec systems. Embodiments of the invention relate to methods for processing an audio signal in accordance with a room impulse response, as well as to a signal processing unit, a binaural renderer, an audio encoder, and an audio decoder.

Spatial audio coding tools are well known in the art and are standardized, for example, in the MPEG Surround standard. Spatial audio coding starts from a plurality of original input channels, e.g., five or seven, which are identified by their placement in a reproduction setup as, for example, a left channel, a center channel, a right channel, a left surround channel, a right surround channel, and a low-frequency enhancement channel. A spatial audio encoder may derive one or more downmix channels from the original channels and, in addition, parametric data relating to spatial cues, such as inter-channel level differences, inter-channel phase differences, inter-channel time differences, and inter-channel coherence values. The one or more downmix channels are transmitted, together with the parametric side information indicating the spatial cues, to a spatial audio decoder, which decodes the downmix channels and the associated parametric data to finally obtain output channels that are an approximated version of the original input channels. The placement of the channels in the output setup may be fixed, e.g., a 5.1 format or a 7.1 format.

Furthermore, spatial audio object coding tools are well known in the art and are standardized, for example, in the MPEG SAOC standard (SAOC = Spatial Audio Object Coding). In contrast to spatial audio coding, which starts from the original channels, spatial audio object coding starts from audio objects that are not automatically dedicated to a certain rendering reproduction setup.
Rather, the placement of the audio objects in the reproduction scene is flexible and may be set by a user, for example by inputting certain rendering information into a spatial audio object coding decoder. Alternatively or additionally, rendering information may be transmitted as additional side information or metadata; the rendering information may include information on where in the reproduction setup a certain audio object is to be placed (e.g., over time). To obtain a certain data compression, a number of audio objects is encoded using an SAOC encoder, which calculates one or more transport channels from the input objects by downmixing the objects in accordance with certain downmix information. In addition, the SAOC encoder calculates parametric side information representing inter-object cues, such as object level differences (OLD) and object coherence values. As in SAC (Spatial Audio Coding), the inter-object parametric data is calculated for individual time/frequency tiles: for a frame of, e.g., 1024 or 2048 samples of the audio signal, a plurality of frequency bands (e.g., 24, 32, or 64 bands) is considered, so that parametric data is provided for each frame and each frequency band. For example, when an audio piece has 20 frames and each frame is subdivided into 32 frequency bands, the number of time/frequency tiles is 640.

In 3D audio systems, it may be desirable to provide a spatial impression of an audio signal as if it were heard in a specific room. In such a situation, a room impulse response of the specific room is provided, for example on the basis of a measurement, and is used for processing the audio signal upon presenting it to a listener. In such a presentation, it may be desirable to process the direct sound and the early reflections separately from the late reverberation.
The accompanying drawings show:

  • an overview of a 3D audio encoder of a 3D audio system;
  • an overview of a 3D audio decoder of a 3D audio system;
  • an example for implementing a format converter that may be implemented in the 3D audio decoder of Figure 2;
  • an embodiment of a binaural renderer that may be implemented in the 3D audio decoder of Figure 2;
  • an example of a room impulse response h(t);
  • different possibilities for processing an audio input signal with a room impulse response: processing of the complete audio signal in accordance with the room impulse response;
  • different possibilities for processing an audio input signal with a room impulse response: separate processing of the initial part and the late reverberation;
  • a block diagram of a signal processing unit, such as a binaural renderer, operating in accordance with the teachings of the present invention;
  • a schematic representation of the binaural processing of an audio signal in a binaural renderer in accordance with an embodiment of the present invention;
  • a schematic representation of the processing in the frequency-domain reverberator of the binaural renderer of Figure 8 in accordance with an embodiment of the present invention.

Next, embodiments of the present invention will be described.