US-12621447-B2 - Adaptive predictive encoding


Abstract

A method performed in an encoder for selecting a coding mode for a current frame includes obtaining bit rates for absolute coding and predictive coding. The method includes calculating a bit rate difference based on the bit rates obtained. The method includes low-pass filtering the bit rate difference. The method includes selecting a coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and a predictive mode counter.

Inventors

Erik Norvell
Fredrik Jansson

Assignees

TELEFONAKTIEBOLAGET LM ERICSSON (PUBL)

Dates

Publication Date: 20260505
Application Date: 20211215

Claims (20)

1. A method performed in an encoder for selecting a coding mode for a current frame, the method comprising: obtaining bit rates for absolute coding and predictive coding; calculating a bit rate difference based on the bit rates for absolute coding and predictive coding; low-pass filtering the bit rate difference according to $\mathrm{bitdiff}_{LP}(m) = \gamma\,\mathrm{bitdiff}(m) + (1-\gamma)\,\mathrm{bitdiff}_{LP}(m-1)$, where $\gamma$ is a low-pass filter coefficient; and selecting a coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and a predictive mode counter.
2. The method of claim 1, further comprising: responsive to selecting a predictive coding mode: setting the coding mode to be the predictive coding mode; and incrementing the predictive mode counter.
3. The method of claim 1, further comprising: responsive to selecting an absolute coding mode: setting the coding mode to be the absolute coding mode; and resetting the predictive mode counter to an initial value.
4. The method of claim 1, wherein determining whether predictive coding is allowed comprises determining if a flag indicates predictive coding is allowed.
5. The method of claim 4, wherein determining if the flag indicates predictive coding is allowed comprises: determining if a predictive count is below a maximum number of predictive frames; responsive to determining that the predictive count is less than the maximum number of predictive frames, setting the flag to indicate predictive coding is allowed; and responsive to determining that the predictive count is greater than the maximum number of predictive frames, setting the flag to indicate predictive coding is not allowed.
6. The method of claim 1, further comprising: determining whether predictive coding is allowed; responsive to determining that predictive coding is not allowed: setting the coding mode to be an absolute coding mode; and setting the predictive mode counter to an initial value.
7. The method of claim 1, wherein calculating the bit rate difference comprises calculating the bit rate difference according to: $\mathrm{bitdiff}(m) = \mathrm{nbits}_{\mathrm{abs}}(m) - \mathrm{nbits}_{\mathrm{pred}}(m)$, where $\mathrm{nbits}_{\mathrm{abs}}(m)$ is the bit rate for absolute coding and $\mathrm{nbits}_{\mathrm{pred}}(m)$ is the bit rate for predictive coding.
8. The method of claim 1, wherein selecting the coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and the predictive mode counter comprises selecting the coding mode according to:
$$g_{\mathrm{mode}}(m) = \begin{cases} \mathrm{PREDICTIVE}, & \mathrm{predcond}(m) \\ \mathrm{ABSOLUTE}, & \text{otherwise} \end{cases}$$
where
$$\mathrm{predcond}(m) = \mathrm{bitdiff}(m) > C \cdot \mathrm{bitdiff}_{LP}(m) \cdot \frac{\mathrm{predcount}(m-1)}{\mathrm{MAX\_PRED\_STREAK}},$$
$$\mathrm{bitdiff}(m) = \mathrm{nbits}_{\mathrm{abs}}(m) - \mathrm{nbits}_{\mathrm{pred}}(m), \text{ and}$$
$$\mathrm{bitdiff}_{LP}(m) = \gamma\,\mathrm{bitdiff}(m) + (1-\gamma)\,\mathrm{bitdiff}_{LP}(m-1),$$
where $g_{\mathrm{mode}}(m)$ is the coding mode, $\mathrm{predcount}(m-1)$ is a number of frames since a last ABSOLUTE coded frame, $C$ is a tuning constant, $\mathrm{bitdiff}(m)$ is the bit rate difference, $\mathrm{nbits}_{\mathrm{pred}}(m)$ is a number of bits estimated to be used for predictive coding, $\mathrm{nbits}_{\mathrm{abs}}(m)$ is a number of bits estimated to be used for absolute coding, $\mathrm{bitdiff}_{LP}(m)$ is the low-pass filtered bit rate difference, $\gamma$ is the low-pass filter coefficient, and MAX_PRED_STREAK is a maximum number of predictive successive frames allowed.
9. The method of claim 8, wherein
$$\mathrm{predcount}(m) = \begin{cases} 0, & g_{\mathrm{mode}}(m) = \mathrm{ABSOLUTE} \\ \mathrm{predcount}(m-1) + \mathrm{step}(m), & g_{\mathrm{mode}}(m) = \mathrm{PREDICTIVE} \end{cases}$$
$$\mathrm{step}(m) = \begin{cases} 1, & \text{active frame} \\ 8, & \text{SID frame} \end{cases}$$
where SID frame is a silence insertion descriptor frame.
10. The method of claim 8, wherein $\gamma \in [0.01, 0.20]$ and $C \in [0, 1]$.
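11. The method of claim 8, wherein $g_{\mathrm{mode}}(m)$ is further based on $\mathrm{pred\_allowed}(m)$ according to:
$$g_{\mathrm{mode}}(m) = \begin{cases} \mathrm{PREDICTIVE}, & \mathrm{pred\_allowed}(m) \ \mathrm{AND} \ \mathrm{predcond}(m) \\ \mathrm{ABSOLUTE}, & \text{otherwise} \end{cases}$$
where
$$\mathrm{pred\_allowed}(m) = \begin{cases} \mathrm{TRUE}, & \mathrm{predcount}(m-1) < \mathrm{MAX\_PRED\_STREAK} \\ \mathrm{FALSE}, & \text{otherwise.} \end{cases}$$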
12. The method of claim 1, wherein selecting the coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and the predictive mode counter comprises selecting the coding mode according to:
$$g_{\mathrm{mode}}(m) = \begin{cases} \mathrm{PREDICTIVE}, & \mathrm{pred\_allowed}(m) \ \mathrm{AND} \ \mathrm{nbits}_{\mathrm{pred}}(m) < \mathrm{nbits}_{\mathrm{abs}}(m) - A \cdot \mathrm{predcount}(m-1) \\ \mathrm{ABSOLUTE}, & \text{otherwise} \end{cases}$$
$$\mathrm{pred\_allowed}(m) = \begin{cases} \mathrm{TRUE}, & \mathrm{predcount}(m-1) < \mathrm{MAX\_PRED\_STREAK} \\ \mathrm{FALSE}, & \text{otherwise,} \end{cases}$$
where $g_{\mathrm{mode}}(m)$ is the coding mode, $\mathrm{predcount}(m)$ is a number of frames since a last ABSOLUTE coded frame, $\mathrm{nbits}_{\mathrm{pred}}(m)$ is a number of bits estimated to be used for predictive coding, $\mathrm{nbits}_{\mathrm{abs}}(m)$ is a number of bits estimated to be used for absolute coding, $A$ is a constant, and MAX_PRED_STREAK is a maximum number of predictive successive frames allowed.
13. The method of claim 12, wherein $A \in (0, 2]$.
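14. The method of claim 1, wherein selecting the coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and the predictive mode counter comprises selecting the coding mode according to:
$$g_{\mathrm{mode}}(m) = \begin{cases} \mathrm{PREDICTIVE}, & \mathrm{pred\_allowed}(m) \ \mathrm{AND} \ \mathrm{nbits}_{\mathrm{pred}}(m) < \mathrm{nbits}_{\mathrm{abs}}(m) - G\,\alpha_{\mathrm{acc}}(m-1) \\ \mathrm{ABSOLUTE}, & \text{otherwise} \end{cases}$$
$$\mathrm{pred\_allowed}(m) = \begin{cases} \mathrm{TRUE}, & \varepsilon_{\mathrm{rel}}(m-1) > \varepsilon_{\mathrm{THR}} \ \mathrm{AND} \ \alpha_{\mathrm{acc}}(m-1) < \alpha_{\mathrm{acc,max}} \\ \mathrm{FALSE}, & \text{otherwise,} \end{cases}$$
$$\alpha_{\mathrm{acc}}(m) = \begin{cases} 0, & g_{\mathrm{mode}}(m) = \mathrm{ABSOLUTE} \\ \alpha(m) + \alpha_{\mathrm{acc}}(m-1), & g_{\mathrm{mode}}(m) = \mathrm{PREDICTIVE} \end{cases}$$
where $g_{\mathrm{mode}}(m)$ is the coding mode, $\mathrm{nbits}_{\mathrm{pred}}(m)$ is a number of bits estimated to be used for predictive coding, $\mathrm{nbits}_{\mathrm{abs}}(m)$ is a number of bits estimated to be used for absolute coding, $G$ is a constant, $\alpha_{\mathrm{acc}}(m)$ is a weighting factor, $\varepsilon_{\mathrm{rel}}(m)$ is a relative error, and $\varepsilon_{\mathrm{THR}}$ is a threshold value for the relative error.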
15. An encoder for selecting a coding mode for a current frame, the encoder adapted to perform operations comprising: obtaining bit rates for absolute coding and predictive coding; calculating a bit rate difference based on the bit rates for absolute coding and predictive coding; low-pass filtering the bit rate difference according to $\mathrm{bitdiff}_{LP}(m) = \gamma\,\mathrm{bitdiff}(m) + (1-\gamma)\,\mathrm{bitdiff}_{LP}(m-1)$, where $\gamma$ is a low-pass filter coefficient; and selecting a coding mode based on the bit rate difference, the low-pass filtered bit rate difference, and a predictive mode counter.
16. The encoder of claim 15, wherein the encoder is adapted to perform further operations comprising: responsive to selecting a predictive coding mode: setting the coding mode to be the predictive coding mode; and incrementing the predictive mode counter.
17. The encoder of claim 15, further comprising: responsive to selecting an absolute coding mode: setting the coding mode to be the absolute coding mode; and setting the predictive mode counter to an initial value.
18. The encoder of claim 15, wherein determining whether predictive coding is allowed comprises determining if a flag indicates predictive coding is allowed.
19. The encoder of claim 18, wherein determining if the flag indicates predictive coding is allowed comprises: determining if a predictive count is below a maximum number of predictive frames; responsive to determining that the predictive count is less than the maximum number of predictive frames, setting the flag to indicate predictive coding is allowed; and responsive to determining that the predictive count is greater than the maximum number of predictive frames, setting the flag to indicate predictive coding is not allowed.
20. The encoder of claim 15, further comprising: determining whether predictive coding is allowed; responsive to determining that predictive coding is not allowed: setting the coding mode to be an absolute coding mode; and setting the predictive mode counter to an initial value.
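
The decision rules recited in claims 8, 9, 11 and 12 can be illustrated with a minimal Python sketch. It is not the patented implementation: the class and function names, the default values of `gamma`, `c`, `a` and `max_pred_streak`, the zero initial filter state, and the choice to update the low-pass filter on every frame (including frames where prediction is not allowed) are all assumptions made for illustration. Only the formulas and the step sizes 1 and 8 come from the claims.

```python
from dataclasses import dataclass

ABSOLUTE = "ABSOLUTE"
PREDICTIVE = "PREDICTIVE"


@dataclass
class ModeSelector:
    """Frame-by-frame mode decision sketched from claims 8, 9 and 11."""
    gamma: float = 0.1         # assumed; claim 10 recites the range [0.01, 0.20]
    c: float = 0.5             # assumed; claim 10 recites the range [0, 1]
    max_pred_streak: int = 10  # hypothetical value, not given in the claims
    bitdiff_lp: float = 0.0    # assumed initial low-pass filter state
    predcount: int = 0         # predictive mode counter, 0 after an ABSOLUTE frame

    def select(self, nbits_abs: float, nbits_pred: float, is_sid: bool = False) -> str:
        # Claim 7: bit rate difference between absolute and predictive coding.
        bitdiff = nbits_abs - nbits_pred
        # Claim 1: first-order low-pass filter of the bit rate difference.
        self.bitdiff_lp = self.gamma * bitdiff + (1.0 - self.gamma) * self.bitdiff_lp
        # Claim 11: prediction is allowed only below the maximum streak.
        pred_allowed = self.predcount < self.max_pred_streak
        # Claim 8: the threshold grows with the length of the predictive streak.
        threshold = self.c * self.bitdiff_lp * self.predcount / self.max_pred_streak
        if pred_allowed and bitdiff > threshold:
            # Claim 9: step is 1 for an active frame and 8 for a SID frame.
            self.predcount += 8 if is_sid else 1
            return PREDICTIVE
        self.predcount = 0
        return ABSOLUTE


def select_mode_linear_penalty(nbits_abs: float, nbits_pred: float, predcount: int,
                               a: float = 1.0, max_pred_streak: int = 10) -> str:
    """Claim 12 variant: penalty of `a` bits per frame of predictive streak.

    The default a = 1.0 is an assumed value inside the range (0, 2] of claim 13.
    """
    pred_allowed = predcount < max_pred_streak
    if pred_allowed and nbits_pred < nbits_abs - a * predcount:
        return PREDICTIVE
    return ABSOLUTE


# Constant 10-bit advantage for predictive coding, streak capped at 3 frames.
selector = ModeSelector(max_pred_streak=3)
modes = [selector.select(nbits_abs=50, nbits_pred=40) for _ in range(4)]
print(modes)
```

With these assumed settings the selector chooses PREDICTIVE for three frames and then forces an ABSOLUTE frame, because the counter has reached `max_pred_streak`. A single SID frame advances the counter by 8, so the streak limit is reached much sooner during speech pauses.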

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a 35 U.S.C. § 371 national stage application of PCT International Application No. PCT/EP2021/085932 filed on Dec. 15, 2021, the disclosure and content of which is incorporated by reference herein in its entirety.

TECHNICAL FIELD

The present disclosure relates generally to communications, and more particularly to communication methods and related devices and nodes supporting wireless communications.

BACKGROUND

In communications networks, there may be a challenge to obtain good performance and capacity for a given communications protocol, its parameters and the physical environment in which the communications network is deployed.
For example, although the capacity in telecommunication networks is continuously increasing, it is still of interest to limit the required resource usage per user. In mobile telecommunication networks, less required resource usage per call means that the mobile telecommunication network can service a larger number of users in parallel. Lowering the resource usage also yields lower power consumption in both the devices at the user-side (such as in terminal devices) and the devices at the network-side (such as in network nodes). This translates to energy and cost saving for the network operator, whilst enabling prolonged battery life and increased talk-time to be experienced in the terminal devices.
One mechanism for reducing the required resource usage for speech communication applications in mobile telecommunication networks is to exploit natural pauses in the speech. In more detail, in most conversations only one party is active at a time, and thus the speech pauses in one communication direction will typically occupy more than half of the signal. One way to utilize this property in order to decrease the required resource usage is to employ a Discontinuous Transmission (DTX) system, where the active signal encoding is discontinued during speech pauses.
During speech pauses it is common to transmit a very low bit rate encoding of the background noise to allow a Comfort Noise Generator (CNG) system at the receiving end to fill pauses with a background noise having characteristics similar to those of the original noise. The CNG makes the sound more natural compared to having silence in the speech pauses, since the background noise is maintained and not switched on and off together with the speech. Complete silence in the speech pauses is commonly perceived as annoying and often leads to the misconception that the call has been disconnected.
A DTX system might further rely on a Voice Activity Detector (VAD), which indicates to the transmitting device whether to use active signal encoding or low rate background noise encoding. In this respect, the transmitting device might be configured to discriminate between other source types by using a (Generic) Sound Activity Detector (GSAD or SAD), which not only discriminates speech from background noise but also might be configured to detect music or other signal types that are deemed relevant.
Communication services may be further enhanced by supporting stereo or multichannel audio transmission. In these cases, the DTX/CNG system might also consider the spatial characteristics of the signal in order to provide a comfort noise that is pleasant-sounding.
A common mechanism to generate comfort noise is to transmit information about the energy and spectral shape of the background noise in the speech pauses. This can be done using a significantly lower number of bits than the regular coding of speech segments. At the receiving device side, the comfort noise is generated by creating a pseudo random signal and then shaping the spectrum of the signal with a filter based on information received from the transmitting device. The signal generation and spectral shaping can be performed in the time or the frequency domain.
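
The receiver-side comfort noise generation described above can be sketched in the time domain as follows. This is a minimal illustration under stated assumptions, not the method of any particular codec: the text only says that energy and spectral shape information is transmitted, so representing the spectral shape as all-pole filter coefficients (`lpc_coeffs`), the function name, the Gaussian excitation and the fixed seed are all hypothetical choices.

```python
import math
import random


def generate_comfort_noise(num_samples, target_energy, lpc_coeffs, seed=0):
    """Shape a pseudo random signal with an all-pole filter and scale its energy.

    lpc_coeffs holds assumed filter coefficients a[0..K-1] of the recursion
    y[n] = x[n] - sum_k a[k] * y[n-1-k]; target_energy is the desired mean
    squared sample value of the output.
    """
    rng = random.Random(seed)
    # Pseudo random excitation (white Gaussian noise in this sketch).
    excitation = [rng.gauss(0.0, 1.0) for _ in range(num_samples)]

    # Spectral shaping with the all-pole synthesis filter.
    shaped = []
    for n, x in enumerate(excitation):
        y = x
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                y -= a * shaped[n - 1 - k]
        shaped.append(y)

    # Scale so the mean squared value matches the transmitted energy.
    energy = sum(s * s for s in shaped) / num_samples
    gain = math.sqrt(target_energy / energy) if energy > 0.0 else 0.0
    return [gain * s for s in shaped]


# One hypothetical 320-sample frame with a low-pass tilted spectrum.
noise = generate_comfort_noise(320, target_energy=0.01, lpc_coeffs=[-0.9])
```

The same shaping could equally be done in the frequency domain by scaling the bins of a random spectrum, as the text notes; the time-domain form is used here only because it needs no external libraries.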
For stereo operation, additional parameters are transmitted to the receiving side. In a typical stereo signal, the channel pair shows a high degree of similarity, or correlation. Current state-of-the-art stereo coding schemes exploit this correlation by employing parametric coding, where a single channel is encoded with high quality and complemented with a parametric description that allows the full stereo image to be reconstructed. The process of reducing the channel pair into a single channel is often called a down-mix, and the resulting channel is called the down-mix channel. The down-mix procedure typically tries to maintain the energy by aligning inter-channel time differences (ITD) and inter-channel phase differences (IPD) before mixing the channels. To maintain the energy balance of the input signal, the inter-channel level difference (ILD) is also measured. The ITD, IPD and ILD are then encoded and may be used in a reversed up-mix procedure when reconstructing the stereo channel pair at a decoder.
Most audio coding systems process the input audio signal in segments, often called frames. For stabl