EP-4742239-A1 - DECODING DEVICE, DECODING METHOD, PROGRAM, AND ENCODING DEVICE

Abstract

The present disclosure relates to a decoding device, a decoding method, a program, and an encoding device that can maintain good sound quality even at a low bit rate. A demultiplexing unit separates, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band, a band decoding unit generates a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data, and a band synthesis unit generates the content signal by synthesizing the band-divided signals of respective bands. The technology according to the present disclosure can be applied to, for example, an audio signal transmission system that transmits encoded bitstreams.
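
As a rough illustration of the decoding flow summarized above, the following Python sketch recombines a per-band envelope with a per-band flattened waveform and then combines the bands. It assumes that "synthesizing" the two components means sample-wise multiplication and that band synthesis is a plain sum standing in for a synthesis filter bank; the text quoted here specifies neither. The function names `decode_band` and `synthesize_bands` and the toy values are hypothetical.

```python
from typing import List, Sequence


def decode_band(envelope: Sequence[float], flattened: Sequence[float]) -> List[float]:
    """Recombine one band. Assumption: envelope and flattened waveform
    are combined by sample-wise multiplication."""
    if len(envelope) != len(flattened):
        raise ValueError("envelope and flattened waveform must have equal length")
    return [e * f for e, f in zip(envelope, flattened)]


def synthesize_bands(band_signals: Sequence[Sequence[float]]) -> List[float]:
    """Combine band-divided signals. Assumption: a plain sum stands in
    for the band synthesis unit."""
    if not band_signals:
        return []
    length = len(band_signals[0])
    if any(len(band) != length for band in band_signals):
        raise ValueError("all band-divided signals must have equal length")
    return [sum(samples) for samples in zip(*band_signals)]


# Toy data for two bands (already decoded from the first and second encoded data).
bands = [
    {"envelope": [1.0, 0.5, 0.25, 0.125], "flattened": [1.0, -1.0, 1.0, -1.0]},
    {"envelope": [0.5, 0.5, 0.5, 0.5], "flattened": [1.0, 1.0, -1.0, -1.0]},
]

decoded = synthesize_bands(
    [decode_band(b["envelope"], b["flattened"]) for b in bands]
)
print(decoded)  # [1.5, 0.0, -0.25, -0.625]
```

In this toy model the slowly varying envelope carries the amplitude contour while the flattened waveform carries only sign and fine structure, which gives an intuition for why the two parts might be coded with different numbers of bits.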

Inventors

  • TOGURI YASUHIRO
  • HATTORI TAKAFUMI
  • SUZUKI SHIRO
  • KOTANI RINA
  • MATSUMURA YUUKI
  • KENMOCHI CHISATO

Assignees

  • Sony Group Corporation

Dates

Publication Date: 2026-05-13
Application Date: 2024-06-18

Claims (20)

  1. A decoding device comprising: a demultiplexing unit configured to separate, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band; a band decoding unit configured to generate a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data; and a band synthesis unit configured to generate the content signal by synthesizing the band-divided signals of respective bands.
  2. The decoding device according to claim 1, wherein the content signal is at least any of an audio signal, a video signal, and a tactile signal.
  3. The decoding device according to claim 1, wherein the first encoded data is a frequency spectrum obtained by performing frequency transform on the envelope component, and the second encoded data is a parameter obtained by performing parameter encoding on the flattened waveform.
  4. The decoding device according to claim 1, wherein the demultiplexing unit separates, for each band, third encoded data that is a frequency spectrum obtained by performing frequency transform on the band-divided signal from the encoded bitstream, and the band decoding unit outputs the band-divided signal decoded from the third encoded data.
  5. The decoding device according to claim 4, wherein the demultiplexing unit determines whether to separate the first encoded data and the second encoded data or to separate only the third encoded data for each band in accordance with a flag included in the encoded bitstream.
  6. The decoding device according to claim 1, wherein the band decoding unit performs weighting processing on the first encoded data of each band.
  7. The decoding device according to claim 6, wherein the band decoding unit multiplies each of frequency bins of the first encoded data of each band by a different weighting coefficient.
  8. The decoding device according to claim 7, wherein the band decoding unit performs the weighting processing using a trained neural network.
  9. The decoding device according to claim 1, wherein the band decoding unit performs weighting processing on a specific frequency range of the envelope component decoded from the first encoded data of each band.
  10. The decoding device according to claim 9, wherein the band decoding unit multiplies a band signal, obtained by dividing the envelope component by a first filter bank, by a weighting coefficient, and then re-synthesizes the band signals by a second filter bank.
  11. The decoding device according to claim 1, further comprising a weight information determination unit configured to determine weight information for setting a weighting coefficient in weighting processing related to the first encoded data of each band, by using any of parameters prepared in advance.
  12. The decoding device according to claim 11, wherein the weight information determination unit selects a parameter of the parameters based on at least any of a content type of the content signal, profile information of a user, terminal information of the decoding device, and a status of a network in which the encoded bitstream is transmitted.
  13. The decoding device according to claim 11, wherein the encoded bitstream is audio object encoded data including audio object data and metadata of an audio object, and a rendering unit configured to perform rendering processing for the audio object on the band-divided signal of each band is further included.
  14. The decoding device according to claim 13, wherein the metadata includes at least any of attribute information, position information, and priority information of the audio object.
  15. The decoding device according to claim 13, wherein the rendering processing is rendering processing using at least any of Vector-based Amplitude Panning (VBAP), Head Related Transfer Function (HRTF), Room Impulse Response (RIR), and Higher Order Ambisonics (HOA).
  16. The decoding device according to claim 13, wherein the weight information determination unit selects a parameter of the parameters based on the metadata.
  17. The decoding device according to claim 13, wherein the rendering unit performs the rendering processing based on the metadata.
  18. A decoding method comprising: by a decoding device, separating, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band; by the decoding device, generating a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data; and by the decoding device, generating the content signal by synthesizing the band-divided signals of respective bands.
  19. A program causing a computer to perform processing of: separating, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band; generating a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data; and generating the content signal by synthesizing the band-divided signals of respective bands.
  20. An encoding device comprising: a band dividing unit configured to divide a content signal into band-divided signals of respective bands; a band encoding unit configured to separate the band-divided signal into an envelope component and a flattened waveform, encode the envelope component through frequency transform, and encode the flattened waveform with the number of bits different from the number of bits for the envelope component; and a multiplexing unit configured to generate an encoded bitstream by multiplexing first encoded data obtained by encoding the envelope component and second encoded data obtained by encoding the flattened waveform.
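
Claims 4 to 7 describe a per-band choice between two decoding paths, signalled by a flag, together with a per-bin weighting of the first encoded data. A minimal Python sketch of that control flow follows. The dictionary layout, the field names (`split_flag`, `envelope_spectrum`, `flattened_params`, `band_spectrum`) and the helper names are hypothetical, since the claims as quoted do not define a bitstream syntax. The inverse frequency transform and the parameter decoding are deliberately left out.

```python
from typing import Dict, List, Sequence


def apply_bin_weights(spectrum: Sequence[float], weights: Sequence[float]) -> List[float]:
    """Multiply each frequency bin by its own weighting coefficient (cf. claim 7)."""
    if len(spectrum) != len(weights):
        raise ValueError("one weighting coefficient is required per frequency bin")
    return [s * w for s, w in zip(spectrum, weights)]


def select_band_payload(band: Dict, weights: Sequence[float]) -> Dict:
    """Choose what to decode for one band according to a flag (cf. claim 5).

    Assumption: a true flag means the band carries first and second
    encoded data; a false flag means it carries only third encoded data.
    """
    if band["split_flag"]:
        return {
            "mode": "envelope+flattened",
            "envelope_spectrum": apply_bin_weights(band["envelope_spectrum"], weights),
            "flattened_params": band["flattened_params"],
        }
    return {"mode": "full-band", "band_spectrum": band["band_spectrum"]}


# Hypothetical frame with two bands, one per decoding path.
frame = [
    {"split_flag": True, "envelope_spectrum": [2.0, 4.0, 8.0], "flattened_params": [3, 1]},
    {"split_flag": False, "band_spectrum": [0.5, -0.5, 0.25]},
]
weights = [1.0, 0.5, 0.25]  # illustrative per-bin coefficients

payloads = [select_band_payload(band, weights) for band in frame]
for payload in payloads:
    print(payload)
```

Here the weights are a fixed list. Claim 8 instead obtains the weighting from a trained neural network, and claims 11 and 12 select among parameters prepared in advance; neither is modelled in this sketch.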

Description

Technical Field

The present disclosure relates to a decoding device, a decoding method, a program, and an encoding device, and particularly relates to a decoding device, a decoding method, a program, and an encoding device that enable good sound quality to be maintained even at a low bit rate.

Background Art

In recent years, the number of remote live events has been increasing with the expansion of network transmission bandwidths. The term "remote live" refers to real-time distribution of video and audio of a live performance at an entertainment event such as a concert or a play, given either by performers alone or by performers with spectators present at the live event venue, to spectators outside the venue (remote spectators).

In distribution such as remote live, high-quality audio signals need to be transmitted simultaneously to the terminals of a large number of remote spectators without losing the sense of presence at the live event venue.

The existing waveform encoding scheme, however, has limitations in compression performance, and it is difficult to significantly lower the bit rate while maintaining the quality. On the other hand, with parameter encoding, in which the audio is modeled and synthesized as parameters, the bit rate can be lowered significantly, but the target of the processing is limited to voice and the quality of general acoustic signals is degraded, making it difficult to perform transmission and reconstruction without losing the sense of presence at the live event venue.

In order to improve encoding efficiency, there is a method of extracting and encoding envelope components of the signal waveform or frequency spectrum. For example, PTL 1 discloses a technique of outputting an index by using a codebook of vector quantization for encoding envelope information.

Citation List

Patent Literature

PTL 1: Patent Application Laid-open No. 9-146593

Summary

Technical Problem

Meanwhile, it has recently been found that envelope components of audio signals significantly affect auditory perception. However, in the existing encoding scheme, envelope components tend to be distorted at low bit rates, leading to audible degradation.

The present disclosure has been made in view of such circumstances, and is intended to achieve satisfactory sound quality even at low bit rates.

Solution to Problem

A decoding device according to a first aspect of the present disclosure is a decoding device including a demultiplexing unit configured to separate, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with a number of bits different from the number of bits for the envelope component, for each band, a band decoding unit configured to generate a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data, and a band synthesis unit configured to generate the content signal by synthesizing the band-divided signals of respective bands.

A decoding method according to the first aspect of the present disclosure is a decoding method including: by a decoding device, separating, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band; by the decoding device, generating a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data; and by the decoding device, generating the content signal by synthesizing the band-divided signals of respective bands.

A program according to the first aspect of the present disclosure is a program causing a computer to perform processing of separating, from an encoded bitstream of a content signal, first encoded data obtained by encoding an envelope component of the content signal through frequency transform and second encoded data obtained by encoding a flattened waveform of the content signal with the number of bits different from the number of bits for the envelope component, for each band, generating a band-divided signal by synthesizing the envelope component decoded from the first encoded data and the flattened waveform decoded from the second encoded data, and generating the content signal by synthesizing the band-divided signals of respective bands.

An encoding device according to a second aspect of the present disclosure is an encoding device including a band dividing unit configured to divide a content signal into band-divided signals of respective bands, a band encoding unit configu