RU-2861330-C2 - DEVICE AND METHOD FOR COMPRESSING HEAD-RELATED TRANSFER FUNCTIONS
Abstract
FIELD: audio signal coding. SUBSTANCE: invention relates to audio signal processing and audio signal decoding, in particular to a device and method for binaural rendering, and more specifically to a device and method for compressing and expanding head-related transfer functions (HRTFs). The claimed device comprises a rendering information processor (110) configured to modify original binaural rendering information depending on direction information so as to obtain modified binaural rendering information in which the spectral distortion is adjusted. EFFECT: improved concepts for binaural rendering, providing enhanced localization and externalization without distorting the spectra of input signals. 15 cl, 5 dwg
Inventors
- WOLF, FELIX
- SCHEUREGGER, Oliver
- NEUKAM, SIMONE
Dates
- Publication Date
- 20260504
- Application Date
- 20230217
- Priority Date
- 20220218
Claims (20)
- 1. A device for binaural rendering, wherein the device comprises:
- - a rendering information processor (110),
- wherein the rendering information processor (110) is configured to receive direction information, and
- wherein the rendering information processor (110) is configured to modify the original binaural rendering information using the direction information in such a way that the spectral distortion is adjusted, so as to obtain modified binaural rendering information.
- 2. The device according to claim 1,
- - wherein the rendering information processor (110) is configured to modify the original binaural rendering information in such a way that the degree to which the spectral distortion is adjusted depends on the direction information.
- 3. The device according to claim 1 or 2,
- - wherein the binaural rendering information is suitable for use in processing one or more input audio signals so as to obtain a binaural signal comprising two audio channels.
- 4. The device according to claim 3,
- - wherein the device further comprises a signal processor (120) configured to process one or more input audio signals depending on the modified binaural rendering information so as to obtain a binaural signal comprising two audio channels.
- 5. The device according to one of the preceding claims,
- - wherein the original binaural rendering information comprises one or more pairs of original head-related transfer functions, wherein each of the one or more pairs of original head-related transfer functions comprises a first original head-related transfer function and a second original head-related transfer function,
- wherein, depending on the direction information, the rendering information processor (110) is configured to modify the first original head-related transfer function and/or the second original head-related transfer function of each of the one or more pairs of original head-related transfer functions to obtain a first modified head-related transfer function and/or a second modified head-related transfer function of each of one or more pairs of modified head-related transfer functions.
- 6. The device according to claim 5,
- - wherein the rendering information processor (110) is configured to determine a coefficient depending on the direction information, and
- - wherein the rendering information processor (110) is configured to apply the coefficient to the first original head-related transfer function and/or to the second original head-related transfer function of at least one of the one or more pairs of original head-related transfer functions in order to obtain a first modified head-related transfer function and/or a second modified head-related transfer function of at least one of the one or more pairs of modified head-related transfer functions.
- 7. The device according to claim 6,
- - wherein the coefficient represents a compression ratio.
- 8. The device according to one of claims 5-7,
- - wherein the rendering information processor (110) is configured to modify, depending on the direction information, the first original head-related transfer function and/or the second original head-related transfer function of each of the one or more pairs of original head-related transfer functions in such a way that at least one magnitude difference between two frequency bands of the first original head-related transfer function and/or the second original head-related transfer function of said pair is modified.
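Claims 5 to 8 leave open how the direction-dependent coefficient is derived and applied. The following Python sketch illustrates one possible reading under stated assumptions, not the claimed implementation: each HRTF of a pair is given as magnitudes per frequency band in dB, the coefficient scales every band's deviation from the mean dB level (which scales the level difference between any two bands by the same factor, in the spirit of claim 8), and the azimuth-to-coefficient mapping in the hypothetical function `direction_to_coefficient` is invented purely for illustration. None of the function names or formulas are taken from the claims.

```python
import math


def direction_to_coefficient(azimuth_deg: float) -> float:
    """HYPOTHETICAL mapping from source azimuth to a compression coefficient.

    Frontal sources (0 deg) get strong compression (coefficient 0.5);
    fully lateral sources (+/-90 deg) keep their magnitudes (coefficient 1.0).
    """
    lateral = abs(math.sin(math.radians(azimuth_deg)))
    return 0.5 + 0.5 * lateral


def compress_magnitudes_db(mags_db, coefficient):
    """Scale each band's deviation from the mean dB level by `coefficient`.

    A coefficient below 1 shrinks all band-to-band differences (compression),
    a coefficient above 1 enlarges them (expansion); the mean level is kept.
    """
    mean_db = sum(mags_db) / len(mags_db)
    return [mean_db + coefficient * (m - mean_db) for m in mags_db]


def modify_hrtf_pair(left_db, right_db, azimuth_deg):
    """Apply one direction-dependent coefficient to both HRTFs of a pair."""
    c = direction_to_coefficient(azimuth_deg)
    return compress_magnitudes_db(left_db, c), compress_magnitudes_db(right_db, c)


left, right = modify_hrtf_pair([0.0, 6.0, -6.0, 0.0], [2.0, 2.0, 2.0, 2.0], 0.0)
print(left, right)
```

For a frontal source (azimuth 0°) the hypothetical mapping yields a coefficient of 0.5, so the 12 dB difference between the second and third bands of the left HRTF shrinks to 6 dB; for a fully lateral source (90°) the magnitudes are left unchanged. Whether the actual device compresses more for frontal or for lateral directions is not stated in these claims.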
Description
The present invention relates to audio signal coding, audio signal processing and audio signal decoding, and in particular to a device and method for binaural rendering, and more particularly to a device and method for compression and expansion of head-related transfer functions (HRTFs).
When sound waves emitted by loudspeakers travel to the listener's ears, the sound is modified in many ways, for example by reflections of the sound waves off the walls. Thus, in addition to music and speech, the sound that reaches the auricle also contains, for example, information about the listening environment. Furthermore, sound arriving from different directions is modified differently by the listener's head and auricles. Using this information, the listener's brain can determine the approximate direction and distance of the sound source.
However, when headphones are used, all this information is typically missing, since the audio is delivered almost directly to the listener's eardrums. This creates the impression that the sound originates inside the listener's head, which can be perceived as uncomfortable and can cause spectral coloration, particularly when earbuds are used for extended periods.
It has been found that the above-described modifications of the sound waves on their way to the listener's auricle and eardrum can be measured and replicated using digital filters, such as head-related impulse responses, head-related transfer functions, binaural room impulse responses and binaural room transfer functions. When such filters are applied to audio signals to be reproduced through headphones or small earbuds, spatial sound is produced that gives a realistic auditory experience. This processing of audio signals is called "binaural processing" or "binaural rendering."
Head-related transfer functions (HRTFs) are the acoustic transfer functions from a sound source to the two ears. HRTFs contain information about the location of the corresponding sound sources.
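Applying such filters to an audio signal amounts to convolving it with one impulse response per ear, which turns a single input into a two-channel binaural signal. The following dependency-free Python sketch shows this for one mono input; the function names are illustrative, and the toy impulse responses in the example are made up rather than measured. Practical renderers usually replace this direct O(N*M) loop with FFT-based block convolution.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution; output length is len(x) + len(h) - 1."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def binauralize(mono, hrir_left, hrir_right):
    """Render a mono signal as a (left, right) binaural signal pair."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy example: the right-ear response is delayed and attenuated,
# loosely mimicking a source on the listener's left (made-up values).
left, right = binauralize([1.0, 0.0, -1.0], [1.0, 0.5], [0.0, 0.0, 0.6])
print(left)
print(right)
```

Interaural time and level differences appear here only as the delay and gain of the toy right-ear response; a measured HRIR pair would additionally carry the direction-dependent spectral shaping discussed in this description.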
When listening through headphones, a virtual sound from a specific direction can be formed by convolving the audio signal with the corresponding HRTFs. To create spatial sound through binaural rendering, the HRTFs of relevant locations around the listener are measured and stored. The HRTF magnitudes are frequency-dependent and provide essential psychoacoustic cues for a sufficiently plausible binaural effect. However, these frequency variations inevitably lead to spectral distortion of the audio signal after binauralization. How much spectral distortion is acceptable depends on a number of factors, such as the type of input signal (e.g., speech, music, surround sound, special effects, etc.), the frequency spectrum of the signal, the frequency spectrum of the HRTF, whether dynamic head tracking is used during binaural playback, and the distribution of the signals around the head.
Signal distortion can be reduced by smoothing the HRTF magnitudes over frequency. This smoothing is referred to herein as "HRTF compression." Conversely, the spectral magnitude variations can be emphasized by the inverse operation (referred to herein as "HRTF expansion"), which increases the spectral distortion of the input signal. To avoid redundancy, and since expansion is simply "negative compression," the term "HRTF compression" as used below encompasses both HRTF compression and HRTF expansion; there is no algorithmic distinction between the two.
There is thus a trade-off between, on the one hand, using uncompressed HRTFs, which provide the full set of cues for sufficiently plausible binaural rendering but risk spectrally distorting the binaural audio signal, and, on the other hand, using compressed HRTFs, which provide weaker cues for plausible binaural rendering but cause less spectral distortion of the binaural audio signal.
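The description defines HRTF compression only as smoothing the HRTF magnitudes over frequency, with expansion as "negative compression," and gives no formula at this point. The following Python sketch uses an assumed formulation: the natural-log magnitude of each frequency bin is pulled toward the mean log magnitude by the factor (1 - compression), while each bin's phase is kept. Under this assumption, `compression = 0` leaves the HRTF unchanged, `compression = 1` flattens the magnitude completely, and a negative value expands it, so a single function covers both cases. The function `compress_hrtf` and its parameter are hypothetical.

```python
import cmath
import math


def compress_hrtf(hrtf_bins, compression):
    """Compress (compression > 0) or expand (compression < 0) HRTF magnitudes.

    HYPOTHETICAL formulation: each bin's log-magnitude deviation from the
    mean log-magnitude is scaled by (1 - compression); phases are kept.
    """
    # Floor tiny magnitudes so that log() is defined for (near-)zero bins.
    log_mags = [math.log(max(abs(h), 1e-12)) for h in hrtf_bins]
    mean_log = sum(log_mags) / len(log_mags)
    scale = 1.0 - compression
    result = []
    for h, log_mag in zip(hrtf_bins, log_mags):
        new_mag = math.exp(mean_log + scale * (log_mag - mean_log))
        # Rebuild the complex bin from the new magnitude and the original phase.
        result.append(cmath.rect(new_mag, cmath.phase(h)))
    return result


bins = [2.0, 0.5j]  # two toy complex frequency bins, magnitude ratio 4 (about 12 dB)
print(compress_hrtf(bins, 0.5))   # ratio about 2 (about 6 dB): compression
print(compress_hrtf(bins, -1.0))  # ratio about 16 (about 24 dB): expansion
```

Working in the log domain means that the same compression value shrinks every dB difference between bins by the same proportion, and the fully compressed result equals the geometric mean of the original magnitudes.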
[1] and [2] describe a modification of HRTF filters to reduce unwanted timbral effects; this technique reduces the variation of the root mean square (RMS) of the HRTF spectrum to reduce unwanted timbral coloration. [3] describes the effect of magnitude compression or smoothing on perceptual attributes such as externalization. [4] and [5] present concepts for virtualizing a single-channel audio signal only partially binaurally through filtering; a control parameter enables a smooth transition between fully binaural, HRTF-based virtualization and non-binaural, panning-based virtualization. [6], [7] and [8] present concepts that generally attempt to reduce spectral distortion, although such concepts appear to affect the overall spatial impression and require complex operations or transformations such as principal component analysis (PCA). [8] and [9] describe concepts that attempt to compress HRTFs to reduce redundancy and decrease the amount of data that must be stored. A representation of the HRTF that requires less data space than