JP-2026076313-A - Multi-signal encoder, multi-signal decoder, and related methods using signal whitening or signal post-processing


Abstract

[Problem] To provide an improved and more flexible concept for multi-signal encoding or decoding. [Solution] A multi-signal encoder includes a signal preprocessor (100) for individually preprocessing each audio signal to obtain at least three preprocessed audio signals. The preprocessing is performed such that each preprocessed audio signal is whitened relative to the signal before preprocessing. The encoder also includes an adaptive joint signal processor (200) for obtaining at least three jointly processed signals, or at least two jointly processed signals and an unprocessed signal; a signal encoder (300) for encoding each signal to obtain one or more encoded signals; and an output interface (400) for transmitting or storing an encoded multi-signal audio signal that includes the one or more encoded signals, side information relating to the preprocessing, and side information relating to the processing. [Selected Drawing] Figure 5a

Inventors

Fotopoulou, Eleni
Multrus, Markus
Dick, Sascha
Markovic, Goran
Maben, Pallavi
Korse, Srikanth
Bayer, Stefan
Disch, Sascha
Herre, Jürgen

Assignees

Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.

Dates

Publication Date: 20260511
Application Date: 20260213
Priority Date: 20180704

Claims (20)

A multi-signal encoder for encoding at least three audio signals, comprising: a signal preprocessor (100) for individually preprocessing each audio signal to obtain at least three preprocessed audio signals, wherein the preprocessing is performed such that each preprocessed audio signal is whitened relative to the signal before preprocessing; an adaptive joint signal processor (200) for performing processing of the at least three preprocessed audio signals to obtain at least three jointly processed signals, or at least two jointly processed signals and an unprocessed signal; a signal encoder (300) for encoding each signal to obtain one or more encoded signals; and an output interface (400) for transmitting or storing an encoded multi-signal audio signal comprising the one or more encoded signals, side information relating to the preprocessing, and side information relating to the processing.
The multi-signal encoder according to claim 1, wherein the adaptive joint signal processor (200) is configured to perform a broadband energy normalization (210) of the at least three preprocessed audio signals such that each preprocessed audio signal has a normalized energy, and wherein the output interface (400) is configured to include, as further side information, a broadband energy normalization value (534) for each preprocessed audio signal.
The multi-signal encoder according to claim 2, wherein the adaptive joint signal processor (200) is configured to: calculate (212) information on an average energy of the preprocessed audio signals; calculate (211) information on an energy of each preprocessed audio signal; and calculate (213, 214) the energy normalization value based on the information on the average energy and the information on the energy of a specific preprocessed audio signal.
The multi-signal encoder according to any one of claims 1 to 3, wherein the adaptive joint signal processor (200) is configured to calculate (213, 214) a scaling ratio (534b) for a specific preprocessed audio signal from the average energy and the energy of that preprocessed audio signal, and to determine a flag (534a) indicating whether the scaling ratio is for upscaling or downscaling, and wherein the flag for each signal is included in the encoded signal.
The multi-signal encoder according to claim 4, wherein the adaptive joint signal processor (200) is configured to quantize (214) the scaling ratio into the same quantization range regardless of whether the scaling is upscaling or downscaling.
The multi-signal encoder according to any one of claims 1 to 5, wherein the adaptive joint signal processor (200) is configured to: normalize (210) each preprocessed audio signal relative to a reference energy to obtain at least three normalized signals; calculate (220) a cross-correlation value for each possible signal pair of the at least three normalized signals; select (229) the signal pair having the highest cross-correlation value; determine (232a) a joint stereo processing mode for the selected signal pair; and joint stereo process (232b) the selected signal pair according to the determined joint stereo processing mode to obtain a processed signal pair.
The multi-signal encoder according to claim 6, wherein the adaptive joint signal processor (200) is configured to apply a cascaded signal pair processing or a non-cascaded signal pair processing, wherein, in the cascaded signal pair processing, the signals of a processed signal pair are selectable in a further iteration step comprising calculating updated cross-correlation values, selecting the signal pair having the highest cross-correlation value, determining the joint stereo processing mode for the selected signal pair, and joint stereo processing the selected signal pair according to the determined joint stereo processing mode, and wherein, in the non-cascaded signal pair processing, the signals of a processed signal pair are not selectable in a further selecting of the signal pair having the highest cross-correlation value, determining of the joint stereo processing mode for the selected signal pair, and joint stereo processing of the selected signal pair according to the determined joint stereo processing mode.
The multi-signal encoder according to any one of claims 1 to 7, wherein the adaptive joint signal processor (200) is configured to determine, as a signal to be encoded individually, a signal remaining after a pairwise processing procedure, and wherein the adaptive joint signal processor (200) is configured to modify, for example revert (237), or at least partly revert, an energy normalization applied to that signal before the pairwise processing procedure was performed.
The multi-signal encoder according to any one of claims 1 to 8, wherein the adaptive joint signal processor (200) is configured to determine bit distribution information (536) for each signal to be processed by the signal encoder (300), and wherein the output interface (400) is configured to introduce the bit distribution information (536) for each signal into the encoded signal.
The multi-signal encoder according to any one of claims 1 to 9, wherein the adaptive joint signal processor (200) is configured to: calculate (282) signal energy information for each signal to be processed by the signal encoder (300); calculate (284) total energy information for the plurality of signals to be encoded by the signal encoder (300); and calculate (286) bit distribution information (536) for each signal based on the signal energy information and the total energy information, and wherein the output interface (400) is configured to introduce the bit distribution information for each signal into the encoded signal.
The multi-signal encoder according to claim 10, wherein the adaptive joint signal processor (200) is configured to optionally assign (290) an initial number of bits to each signal, to assign (291) a number of bits based on the bit distribution information, to optionally perform a further refinement step (292), or to optionally perform a final donation step (292), and wherein the signal encoder (300) is configured to perform the signal encoding using the bits assigned to each signal.
The multi-signal encoder according to any one of claims 1 to 11, wherein the signal preprocessor (100) is configured to perform, for each audio signal: a time-to-spectrum conversion operation (108, 110, 112) to obtain a spectrum for each audio signal; and a temporal noise shaping operation (114a, 114b) and/or a frequency-domain noise shaping operation (116) for each signal spectrum, wherein the signal preprocessor (100) is configured to supply the signal spectra to the adaptive joint signal processor (200) following the temporal noise shaping operation and/or the frequency-domain noise shaping operation, and wherein the adaptive joint signal processor (200) is configured to perform the joint signal processing on the received signal spectra.
The multi-signal encoder according to any one of claims 1 to 12, wherein the adaptive joint signal processor (200) is configured to: determine, for each signal of a selected signal pair, a bit rate required for a full-band separate encoding mode such as L/R, a bit rate required for a full-band joint encoding mode such as M/S, or a bit rate for a band-wise joint encoding mode such as M/S plus the bits required for band-wise signaling, such as an M/S mask; and determine the separate encoding mode or the joint encoding mode as the specific mode for all bands of the signal pair when a majority of the bands has been determined for that specific mode and a minority of the bands, being less than 10% of all bands, has been determined for the other encoding mode, or determine the encoding mode requiring the lowest number of bits, and wherein the output interface (400) is configured to include, in the encoded signal, an indication of the specific mode for all bands of a frame instead of an encoding mode mask for the frame.
The multi-signal encoder according to any one of claims 1 to 14, wherein the signal encoder (300) comprises a rate loop processor for each individual signal or across two or more signals, the rate loop processor being configured to receive and use the bit distribution information (536) for the specific signal or for the two or more signals.
The multi-signal encoder according to any one of claims 1 to 15, wherein the adaptive joint signal processor (200) is configured to adaptively select signal pairs for joint coding, or to determine, for each selected signal pair, a band-wise mid/side coding mode, a full-band mid/side coding mode, or a full-band left/right coding mode, and wherein the output interface (400) is configured to indicate the selected coding mode in the encoded multi-signal audio signal as side information (532).
The multi-signal encoder according to any one of claims 1 to 16, wherein the adaptive joint signal processor (200) is configured to form a band-wise mid/side versus left/right decision based on an estimated bit rate in each band when coded in the mid/side mode or in the left/right mode, and wherein a final joint coding mode is determined based on the results of the band-wise mid/side versus left/right decision.
The multi-signal encoder according to any one of claims 1 to 17, wherein the adaptive joint signal processor (200) is configured to perform a spectral band replication processing or an intelligent gap filling processing (260) to determine parametric side information for the spectral band replication processing or the intelligent gap filling processing, and wherein the output interface (400) is configured to include the spectral band replication or intelligent gap filling side information (532) as additional side information in the encoded signal.
The multi-signal encoder according to claim 18, wherein the adaptive joint signal processor (200) is configured to perform a stereo intelligent gap filling processing for an encoded signal pair and to perform a single-signal intelligent gap filling processing for at least one of the signals to be encoded individually.
The multi-signal encoder according to any one of claims 1 to 19, wherein the at least three audio signals include a low-frequency enhancement (LFE) signal, wherein the adaptive joint signal processor (200) is configured to apply a signal mask indicating for which signals the adaptive joint signal processor (200) is to be active, and wherein the signal mask indicates that the low-frequency enhancement signal is not to be used in the pairwise processing of the at least three preprocessed audio signals.
The multi-signal encoder according to any one of claims 1 to 5, wherein the adaptive joint signal processor (200) is configured to calculate, as the information on the energy of a signal, an energy of an MDCT spectrum of the signal, or to calculate, as the information on the average energy of the at least three preprocessed audio signals, an average energy of the MDCT spectra of the at least three preprocessed audio signals.
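For illustration only, and not as part of the claims, the broadband energy normalization, the selection of the most correlated signal pair, and the energy-based bit distribution recited above can be sketched in Python. Everything in this sketch is an assumption made for readability: the function names, the square-root gain, the storage of the scaling ratio as a value in (0, 1] together with an upscaling flag, the absolute normalized cross-correlation, and the rule that leftover bits go to the highest-energy signal are not taken from the specification. Spectra are plain lists of MDCT-like coefficients.

```python
import math
from itertools import combinations


def energy(spectrum):
    """Energy of one spectrum: sum of squared coefficients."""
    return sum(x * x for x in spectrum)


def normalize_energies(spectra):
    """Scale every spectrum to the mean energy of all spectra.

    Returns the scaled spectra and, per signal, (upscaling_flag, ratio).
    Keeping the ratio in (0, 1] for both directions is an assumption that
    mirrors the idea of one quantization range for up- and downscaling.
    """
    energies = [energy(s) for s in spectra]
    mean_energy = sum(energies) / len(energies)
    scaled, side_info = [], []
    for s, e in zip(spectra, energies):
        gain = math.sqrt(mean_energy / e) if e > 0 else 1.0
        scaled.append([gain * x for x in s])
        upscaling = gain > 1.0
        side_info.append((upscaling, 1.0 / gain if upscaling else gain))
    return scaled, side_info


def cross_correlation(a, b):
    """Absolute normalized cross-correlation at lag zero (assumed measure)."""
    denom = math.sqrt(energy(a) * energy(b))
    return abs(sum(x * y for x, y in zip(a, b))) / denom if denom > 0 else 0.0


def select_pair(spectra, excluded=()):
    """Index pair with the highest cross-correlation.

    `excluded` plays the role of a signal mask, e.g. to keep an LFE
    signal out of the pairwise processing.
    """
    candidates = [i for i in range(len(spectra)) if i not in excluded]
    return max(
        combinations(candidates, 2),
        key=lambda p: cross_correlation(spectra[p[0]], spectra[p[1]]),
    )


def distribute_bits(spectra, total_bits):
    """Split a bit budget in proportion to each signal's share of total energy."""
    energies = [energy(s) for s in spectra]
    total = sum(energies)
    n = len(energies)
    if total > 0:
        bits = [math.floor(total_bits * e / total) for e in energies]
    else:
        bits = [total_bits // n] * n
    # Assumption: bits lost to rounding go to the highest-energy signal.
    bits[energies.index(max(energies))] += total_bits - sum(bits)
    return bits


if __name__ == "__main__":
    spectra = [[1.0, 0.0, 1.0, 0.0], [2.0, 0.0, 2.0, 0.0], [0.0, 3.0, 0.0, -3.0]]
    scaled, side_info = normalize_energies(spectra)
    print(side_info)
    print(select_pair(scaled))
    print(distribute_bits(spectra, 280))
```

In a cascaded variant, the output signals of a processed pair would be put back into the candidate set and `select_pair` called again with updated correlations. In a non-cascaded variant, their indices would be added to `excluded` instead.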

Description

The embodiments relate to an MDCT-based multi-signal encoding and decoding system with signal-adaptive joint channel processing. A signal may be a channel, in which case the multi-signal is a multi-channel signal, or it may instead be an audio signal that is a component of a sound field representation, such as an Ambisonics component, i.e., W, X, Y, Z of first-order Ambisonics, or any other component of a higher-order Ambisonics representation. The signals may also be signals of a sound field representation in A-format, B-format, or any other format.
- In MPEG USAC [1], joint stereo coding of two channels is performed using complex prediction with band-limited or full-band residual signals, MPS 2-1-2, or unified stereo.
- MPEG Surround [2] hierarchically combines OTT and TTT boxes for joint coding of multi-channel audio, with or without transmission of residual signals.
- MPEG-H Quad Channel Elements [3] hierarchically apply MPS 2-1-2 stereo boxes followed by complex prediction/MS stereo boxes, building a "fixed" 4x4 remixing tree.
- AC-4 [4] introduces new 3-channel, 4-channel, and 5-channel elements that allow remixing of the transmitted channels via a transmitted mix matrix and subsequent joint stereo coding information.
- Earlier publications proposed using orthogonal transforms such as the Karhunen-Loeve transform (KLT) for enhanced multi-channel audio coding [5].
- The Multichannel Coding Tool (MCT) [6], which supports joint coding of more than two channels, enables flexible, signal-adaptive joint channel coding in the MDCT domain. This is achieved by iteratively combining and concatenating stereo coding techniques, such as real-valued or complex stereo prediction and rotation stereo coding (KLT), applied to two designated channels.
In the context of 3D audio, loudspeaker channels are distributed across several height layers, resulting in horizontal and vertical channel pairs.
Joint coding of only two channels, as defined in USAC, is insufficient to account for the spatial and perceptual relationships between channels. MPEG Surround is applied in an additional pre- and post-processing step, and its residual signals are transmitted individually, without the possibility of joint stereo coding that would exploit dependencies between, for example, left and right vertical residual signals. AC-4 introduces dedicated N-channel elements that allow efficient coding of joint coding parameters, but these fail for generic loudspeaker setups with more channels, as proposed for new immersive playback scenarios (7.1+4, 22.2). The MPEG-H Quad Channel Element is likewise limited to four channels and cannot be applied dynamically to arbitrary channels, but only to a pre-configured, fixed number of channels. MCT introduces the flexibility of signal-adaptive joint channel coding for arbitrary channels, but its stereo processing is performed on windowed and transformed, non-normalized (non-whitened) signals. Furthermore, encoding the prediction coefficients or angles in each frequency band of each stereo box requires a large number of bits.
A block diagram of single-channel preprocessing in a preferred implementation is shown.
A preferred implementation of a block diagram of a multi-signal encoder is shown.
Figure 2 shows a preferred implementation of the cross-correlation vector and channel pair selection procedure.
A channel pair indexing scheme in a preferred implementation is shown.
A preferred implementation of the multi-signal encoder according to the present invention is shown.
A schematic diagram of an encoded multi-channel audio signal frame is shown.
The procedure performed by the adaptive joint signal processor shown in Figure 5a is illustrated.
Figure 8 shows a preferred implementation performed by the adaptive joint signal processor.
Figure 5 shows another preferred implementation performed by the adaptive joint signal processor.
Another procedure for performing the bit allocation used by the quantization and coding processor shown in Figure 5 is shown.
A block diagram of a preferred implementation of a multi-signal decoder is shown.
Figure 10 shows a preferred implementation performed by the joint signal processor.
Figure 10 shows a preferred implementation of the signal decoder.
Another preferred implementation of the joint signal processor in the context of bandwidth extension or intelligent gap filling (IGF) is shown.
Figure 10 shows a further preferred implementation of the joint signal processor.
Figure 10 shows preferred processing blocks executed by the signal decoder and the joint signal processor.
An implementation of a post-processor for performing a de-whitening operation and other optional procedures is shown.
Figure 5 shows a preferred implementation of a multi-signal encoder for encoding at least three audio signals. The at least three audio signals are input to a signal preprocessor 100 for individually preprocessing each audio signal to obtain at least