US-12627939-B2 - Stereo audio signal processing method, encoding device, and storage medium
Abstract
A method for processing a stereo audio signal, performed by an encoding device, includes: determining an initial first threshold Thresh0 1 and an initial second threshold Thresh0 2 of a current frame of the stereo audio signal, where Thresh0 1 ∈(−1,0), and Thresh0 2 ∈(0,1); determining an offset value Delta; determining a first threshold Thresh1 and a second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to a de-correlation manner for a previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame; and performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame.
Inventors
- Shuo Gao
Assignees
- BEIJING XIAOMI MOBILE SOFTWARE CO., LTD.
Dates
- Publication Date
- 20260512
- Application Date
- 20211203
Claims (18)
- 1. A method for processing a stereo audio signal, performed by an encoding device, comprising: determining an initial first threshold Thresh0 1 and an initial second threshold Thresh0 2 of a current frame of the stereo audio signal, wherein Thresh0 1 ∈(−1,0), and Thresh0 2 ∈(0,1); determining an offset value Delta; determining a first threshold Thresh1 and a second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to a de-correlation manner for a previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame; and performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame; wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a first formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is performing the de-correlation with a first de-correlation manner, wherein the first formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 + \mathrm{Delta} \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh1 and Thresh2 represent the first threshold and the second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 1|).
- 2. The method of claim 1, wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a second formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is performing the de-correlation with a second de-correlation manner, wherein the second formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 - \mathrm{Delta} \end{cases}$ wherein Thresh1 and Thresh2 represent a first threshold of the current frame and a second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 2|).
- 3. The method of claim 2, wherein the second de-correlation manner comprises a second Mid/Sid down-mixing processing comprising: obtaining a Mid-channel signal and a Sid-channel signal by processing a left channel signal and a right channel signal of the previous frame according to a seventh formula, wherein the seventh formula is: $\begin{cases} \mathrm{Mid}(n) = \dfrac{L(n) - R(n)}{2} \\ \mathrm{Sid}(n) = L(n) - R(n) \end{cases}$ wherein Mid(n) represents a Mid-channel signal of the previous frame, Sid(n) represents a Sid-channel signal of the previous frame, L(n) represents the left channel signal of the previous frame, and R(n) represents the right channel signal of the previous frame.
- 4. The method of claim 1, wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a third formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is not performing the de-correlation, and a reason of not performing the de-correlation is that a first cross-correlation coefficient for a left channel signal and a right channel signal of the previous frame is greater than or equal to a first threshold Thresh2 1 corresponding to the previous frame and is less than or equal to a second threshold Thresh2 2 corresponding to the previous frame, wherein the third formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh1 and Thresh2 represent a first threshold of the current frame and a second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively.
- 5. The method of claim 4, wherein determining the first cross-correlation coefficient comprises: determining the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame according to an eighth formula of $\begin{cases} \eta(LR) = \dfrac{\sum_{n=1}^{N} (L(n) - \bar{L}) \times (R(n) - \bar{R})}{\sqrt{\sum_{n=1}^{N} (L(n) - \bar{L})^2 \times \sum_{n=1}^{N} (R(n) - \bar{R})^2}} \\ \bar{L} = \dfrac{\sum_{n=1}^{N} L(n)}{N} \\ \bar{R} = \dfrac{\sum_{n=1}^{N} R(n)}{N} \end{cases}$ wherein η(LR) represents the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame, L(n) represents an n-th sample point of the left channel signal of the previous frame, $\bar{L}$ represents an average value of all sample points of the left channel signal of the previous frame, R(n) represents an n-th sample point of the right channel signal of the previous frame, $\bar{R}$ represents an average value of all sample points of the right channel signal of the previous frame, N represents a total number of sample points of the left channel signal or the right channel signal of the previous frame.
- 6. The method of claim 1, wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a fourth formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is not performing the de-correlation, a reason of not performing the de-correlation is that a first cross-correlation coefficient for a left channel signal and a right channel signal of the previous frame is less than a first threshold Thresh2 1 corresponding to the previous frame, and the first cross-correlation coefficient is greater than or equal to a second cross-correlation coefficient, wherein the second cross-correlation coefficient is a cross-correlation coefficient for de-correlated signals obtained by performing a first de-correlation on signals of the previous frame with a first de-correlation manner, wherein the fourth formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 - \mathrm{Delta} \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh1 and Thresh2 represent a first threshold of the current frame and a second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 1|).
- 7. The method of claim 6, wherein the first de-correlation manner comprises a first Mid/Sid down-mixing processing comprising: obtaining a Mid-channel signal and a Sid-channel signal by processing the left channel signal and the right channel signal of the previous frame according to a sixth formula, wherein the sixth formula is: $\begin{cases} \mathrm{Mid}(n) = \dfrac{L(n) - R(n)}{2} \\ \mathrm{Sid}(n) = L(n) + R(n) \end{cases}$ wherein Mid(n) represents a Mid-channel signal of the previous frame, Sid(n) represents a Sid-channel signal of the previous frame, L(n) represents the left channel signal of the previous frame, and R(n) represents the right channel signal of the previous frame.
- 8. The method of claim 6, wherein the de-correlated signals comprise a Mid-channel signal and a Sid-channel signal, and calculating the second cross-correlation coefficient for the de-correlated signals comprises: determining the second cross-correlation coefficient for the de-correlated signals according to a ninth formula of $\begin{cases} \eta(MS) = \dfrac{\sum_{n=1}^{N} (\mathrm{Mid}(n) - \overline{\mathrm{Mid}}) \times (\mathrm{Sid}(n) - \overline{\mathrm{Sid}})}{\sqrt{\sum_{n=1}^{N} (\mathrm{Mid}(n) - \overline{\mathrm{Mid}})^2 \times \sum_{n=1}^{N} (\mathrm{Sid}(n) - \overline{\mathrm{Sid}})^2}} \\ \overline{\mathrm{Mid}} = \dfrac{\sum_{n=1}^{N} \mathrm{Mid}(n)}{N} \\ \overline{\mathrm{Sid}} = \dfrac{\sum_{n=1}^{N} \mathrm{Sid}(n)}{N} \end{cases}$ wherein η(MS) represents the second cross-correlation coefficient, Mid(n) represents an n-th sample point of the Mid-channel signal in the de-correlated signals, $\overline{\mathrm{Mid}}$ represents an average value of all sample points of the Mid-channel signal in the de-correlated signals, Sid(n) represents an n-th sample point of the Sid-channel signal in the de-correlated signals, $\overline{\mathrm{Sid}}$ represents an average value of all sample points of the Sid-channel signal in the de-correlated signals, N represents a total number of sample points of the Mid-channel signal or the Sid-channel signal of the previous frame.
- 9. The method of claim 6, wherein determining the first cross-correlation coefficient comprises: determining the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame according to an eighth formula of $\begin{cases} \eta(LR) = \dfrac{\sum_{n=1}^{N} (L(n) - \bar{L}) \times (R(n) - \bar{R})}{\sqrt{\sum_{n=1}^{N} (L(n) - \bar{L})^2 \times \sum_{n=1}^{N} (R(n) - \bar{R})^2}} \\ \bar{L} = \dfrac{\sum_{n=1}^{N} L(n)}{N} \\ \bar{R} = \dfrac{\sum_{n=1}^{N} R(n)}{N} \end{cases}$ wherein η(LR) represents the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame, L(n) represents an n-th sample point of the left channel signal of the previous frame, $\bar{L}$ represents an average value of all sample points of the left channel signal of the previous frame, R(n) represents an n-th sample point of the right channel signal of the previous frame, $\bar{R}$ represents an average value of all sample points of the right channel signal of the previous frame, N represents a total number of sample points of the left channel signal or the right channel signal of the previous frame.
- 10. The method of claim 1, wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a fifth formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is not performing the de-correlation, and a reason of not performing the de-correlation is that a first cross-correlation coefficient for a left channel signal and a right channel signal of the previous frame is greater than a second threshold Thresh2 2 corresponding to the previous frame, and the first cross-correlation coefficient is less than or equal to a third cross-correlation coefficient, wherein the third cross-correlation coefficient is a cross-correlation coefficient for de-correlated signals obtained by performing a second de-correlation on signals of the previous frame with a second de-correlation manner, wherein the fifth formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 + \mathrm{Delta} \end{cases}$ wherein Thresh1 and Thresh2 represent a first threshold of the current frame and a second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 2|).
- 11. The method of claim 10, wherein the second de-correlation manner comprises a second Mid/Sid down-mixing processing comprising: obtaining a Mid-channel signal and a Sid-channel signal by processing the left channel signal and the right channel signal of the previous frame according to a seventh formula, wherein the seventh formula is: $\begin{cases} \mathrm{Mid}(n) = \dfrac{L(n) - R(n)}{2} \\ \mathrm{Sid}(n) = L(n) - R(n) \end{cases}$ wherein Mid(n) represents a Mid-channel signal of the previous frame, Sid(n) represents a Sid-channel signal of the previous frame, L(n) represents the left channel signal of the previous frame, and R(n) represents the right channel signal of the previous frame.
- 12. The method of claim 10, wherein determining the first cross-correlation coefficient comprises: determining the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame according to an eighth formula of $\begin{cases} \eta(LR) = \dfrac{\sum_{n=1}^{N} (L(n) - \bar{L}) \times (R(n) - \bar{R})}{\sqrt{\sum_{n=1}^{N} (L(n) - \bar{L})^2 \times \sum_{n=1}^{N} (R(n) - \bar{R})^2}} \\ \bar{L} = \dfrac{\sum_{n=1}^{N} L(n)}{N} \\ \bar{R} = \dfrac{\sum_{n=1}^{N} R(n)}{N} \end{cases}$ wherein η(LR) represents the first cross-correlation coefficient for the left channel signal and the right channel signal of the previous frame, L(n) represents an n-th sample point of the left channel signal of the previous frame, $\bar{L}$ represents an average value of all sample points of the left channel signal of the previous frame, R(n) represents an n-th sample point of the right channel signal of the previous frame, $\bar{R}$ represents an average value of all sample points of the right channel signal of the previous frame, N represents a total number of sample points of the left channel signal or the right channel signal of the previous frame.
- 13. The method of claim 10, wherein the de-correlated signals comprise a Mid-channel signal and a Sid-channel signal, and calculating the third cross-correlation coefficient for the de-correlated signals comprises: determining the third cross-correlation coefficient for the de-correlated signals according to a ninth formula of $\begin{cases} \eta(MS) = \dfrac{\sum_{n=1}^{N} (\mathrm{Mid}(n) - \overline{\mathrm{Mid}}) \times (\mathrm{Sid}(n) - \overline{\mathrm{Sid}})}{\sqrt{\sum_{n=1}^{N} (\mathrm{Mid}(n) - \overline{\mathrm{Mid}})^2 \times \sum_{n=1}^{N} (\mathrm{Sid}(n) - \overline{\mathrm{Sid}})^2}} \\ \overline{\mathrm{Mid}} = \dfrac{\sum_{n=1}^{N} \mathrm{Mid}(n)}{N} \\ \overline{\mathrm{Sid}} = \dfrac{\sum_{n=1}^{N} \mathrm{Sid}(n)}{N} \end{cases}$ wherein η(MS) represents the third cross-correlation coefficient, Mid(n) represents an n-th sample point of the Mid-channel signal in the de-correlated signals, $\overline{\mathrm{Mid}}$ represents an average value of all sample points of the Mid-channel signal in the de-correlated signals, Sid(n) represents an n-th sample point of the Sid-channel signal in the de-correlated signals, $\overline{\mathrm{Sid}}$ represents an average value of all sample points of the Sid-channel signal in the de-correlated signals, N represents a total number of sample points of the Mid-channel signal or the Sid-channel signal of the previous frame.
- 14. The method of claim 1, wherein the first de-correlation manner comprises a first Mid/Sid down-mixing processing comprising: obtaining a Mid-channel signal and a Sid-channel signal by processing a left channel signal and a right channel signal of the previous frame according to a sixth formula, wherein the sixth formula is: $\begin{cases} \mathrm{Mid}(n) = \dfrac{L(n) - R(n)}{2} \\ \mathrm{Sid}(n) = L(n) + R(n) \end{cases}$ wherein Mid(n) represents a Mid-channel signal of the previous frame, Sid(n) represents a Sid-channel signal of the previous frame, L(n) represents the left channel signal of the previous frame, and R(n) represents the right channel signal of the previous frame.
- 15. The method of claim 1, further comprising: determining an initial first threshold Thresh0 1 and an initial second threshold Thresh0 2 of a first frame of the stereo audio signal; and determining a first threshold Thresh3 1 and a second threshold Thresh3 2 corresponding to the first frame according to a tenth formula of $\begin{cases} \mathrm{Thresh3}_1 = \mathrm{Thresh0}_1 \\ \mathrm{Thresh3}_2 = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh3 1 and Thresh3 2 represent a first threshold of the first frame and a second threshold of the first frame respectively, and Thresh0 1 and Thresh0 2 represent an initial first threshold of the first frame and an initial second threshold of the first frame respectively.
- 16. An encoding device, comprising: a processor; and a memory having stored therein a computer program that, when executed by the processor, causes the encoding device to implement: determining an initial first threshold Thresh0 1 and an initial second threshold Thresh0 2 of a current frame of a stereo audio signal, wherein Thresh0 1 ∈(−1,0), and Thresh0 2 ∈(0,1); determining an offset value Delta; determining a first threshold Thresh1 and a second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to a de-correlation manner for a previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame; and performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame; wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a first formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is performing the de-correlation with a first de-correlation manner, wherein the first formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 + \mathrm{Delta} \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh1 and Thresh2 represent the first threshold and the second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 1|).
- 17. An encoding device, comprising a processor and an interface circuit; wherein the interface circuit is configured to receive a code instruction and transmit the code instruction to the processor; and the processor is configured to run the code instruction to implement: determining an initial first threshold Thresh0 1 and an initial second threshold Thresh0 2 of a current frame of a stereo audio signal, wherein Thresh0 1 ∈(−1,0), and Thresh0 2 ∈(0,1); determining an offset value Delta; determining a first threshold Thresh1 and a second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to a de-correlation manner for a previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame; and performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame; wherein determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to the de-correlation manner for the previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh0 1 of the current frame, and the initial second threshold Thresh0 2 of the current frame comprises: determining the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame according to a first formula, wherein the de-correlation manner for the previous frame of the stereo audio signal is performing the de-correlation with a first de-correlation manner, wherein the first formula is: $\begin{cases} \mathrm{Thresh1} = \mathrm{Thresh0}_1 + \mathrm{Delta} \\ \mathrm{Thresh2} = \mathrm{Thresh0}_2 \end{cases}$ wherein Thresh1 and Thresh2 represent the first threshold and the second threshold of the current frame respectively, Thresh0 1 and Thresh0 2 represent the initial first threshold of the current frame and the initial second threshold of the current frame respectively, and Delta represents an offset value, and Delta∈(0, |Thresh0 1|).
- 18. A non-transitory computer-readable storage medium having stored therein instructions that, when executed, cause the method of claim 1 to be implemented.
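The threshold update rules recited in claims 1, 2, 4, 6, 10 and 15 (the first through fifth formulas and the tenth formula) can be read as a single lookup on how the previous frame was handled. The following Python sketch is illustrative only: the enum `PrevOutcome`, its member names, the function `update_thresholds`, and the use of `None` to mark the first frame are hypothetical names chosen here, not terms from the claims.

```python
from enum import Enum, auto


class PrevOutcome(Enum):
    """How the previous frame was handled (hypothetical labels)."""
    FIRST_MANNER = auto()          # de-correlated with the first manner
    SECOND_MANNER = auto()         # de-correlated with the second manner
    SKIPPED_IN_RANGE = auto()      # skipped: coefficient between both thresholds
    SKIPPED_BELOW_FIRST = auto()   # skipped: coefficient below the first threshold
    SKIPPED_ABOVE_SECOND = auto()  # skipped: coefficient above the second threshold


def update_thresholds(prev, thresh0_1, thresh0_2, delta):
    """Return (Thresh1, Thresh2) for the current frame.

    prev is a PrevOutcome, or None for the first frame (tenth formula).
    """
    if not -1.0 < thresh0_1 < 0.0:
        raise ValueError("Thresh0_1 must lie in (-1, 0)")
    if not 0.0 < thresh0_2 < 1.0:
        raise ValueError("Thresh0_2 must lie in (0, 1)")

    # Third and tenth formulas: the initial thresholds are used unchanged.
    if prev is None or prev is PrevOutcome.SKIPPED_IN_RANGE:
        return thresh0_1, thresh0_2

    # Formulas one and four bound Delta by |Thresh0_1|,
    # formulas two and five bound it by |Thresh0_2|.
    lower_side = prev in (PrevOutcome.FIRST_MANNER,
                          PrevOutcome.SKIPPED_BELOW_FIRST)
    bound = abs(thresh0_1) if lower_side else abs(thresh0_2)
    if not 0.0 < delta < bound:
        raise ValueError("Delta is outside the range allowed for this case")

    if prev is PrevOutcome.FIRST_MANNER:          # first formula
        return thresh0_1 + delta, thresh0_2
    if prev is PrevOutcome.SECOND_MANNER:         # second formula
        return thresh0_1, thresh0_2 - delta
    if prev is PrevOutcome.SKIPPED_BELOW_FIRST:   # fourth formula
        return thresh0_1 - delta, thresh0_2
    return thresh0_1, thresh0_2 + delta           # fifth formula
```

For example, with initial thresholds −0.5 and 0.5 and Delta = 0.25, a previous frame de-correlated with the first manner gives thresholds (−0.25, 0.5) for the current frame.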
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
The present application is a U.S. national phase of International Application No. PCT/CN2021/135514, filed on Dec. 3, 2021, the entire disclosure of which is incorporated herein by reference for all purposes.
FIELD
The present disclosure relates to the field of communication technologies, and in particular to a stereo audio signal processing method, an encoding device and a storage medium.
BACKGROUND
Lossless encoding is widely applied because it enables high-quality audio playback and lossless storage. When lossless encoding is performed on stereo audio signals, de-correlation is usually performed on the stereo audio signals to improve the encoding compression rate.
In the related art, de-correlation is normally performed by setting a threshold, calculating a correlation coefficient for a left channel signal and a right channel signal of a current frame of a stereo audio signal, determining a correlation between the left channel signal and the right channel signal of the current frame based on the correlation coefficient and the threshold, and performing the de-correlation on the current frame by adopting an optimal de-correlation manner based on the determined correlation.
However, in the related art, the threshold corresponding to each frame of the stereo audio signal is fixed and cannot be updated adaptively, which affects the accuracy of determining the correlation for different frames. As a result, it is hard to select an optimal threshold for each frame accurately, which in turn limits the encoding compression rate.
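The correlation coefficient between the left and right channel signals of a frame, which the background describes and the claims call the eighth formula, can be sketched in Python as follows. This is a minimal sketch under two assumptions that are not stated in the text: the denominator is taken under a square root (the usual normalized form, consistent with thresholds lying in (−1, 1)), and a frame in which either channel is constant returns 0.0 instead of dividing by zero. The function name `cross_correlation` is hypothetical.

```python
import math


def cross_correlation(left, right):
    """Normalized cross-correlation coefficient of two equal-length frames."""
    n = len(left)
    if n == 0 or n != len(right):
        raise ValueError("frames must be non-empty and of equal length")

    # Mean of each channel over the N sample points of the frame.
    mean_l = sum(left) / n
    mean_r = sum(right) / n

    # Numerator: sum of products of the mean-removed samples.
    num = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))

    # Denominator: square root of the product of the two energy sums
    # (the square root is an assumption, see the lead-in).
    energy_l = sum((l - mean_l) ** 2 for l in left)
    energy_r = sum((r - mean_r) ** 2 for r in right)
    den = math.sqrt(energy_l * energy_r)

    # Assumption: treat a constant channel as uncorrelated.
    return 0.0 if den == 0.0 else num / den
```

The same function applies unchanged to a Mid-channel and Sid-channel pair, which is how the ninth formula in the claims is structured.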
SUMMARY
According to an aspect of the present disclosure, there is provided a method for processing a stereo audio signal, performed by an encoding device, including: determining an initial first threshold Thresh01 and an initial second threshold Thresh02 of a current frame of the stereo audio signal, where Thresh01∈(−1,0), and Thresh02∈(0,1); determining an offset value Delta; determining a first threshold Thresh1 and a second threshold Thresh2 corresponding to the current frame of the stereo audio signal according to a de-correlation manner for a previous frame of the stereo audio signal, the offset value Delta, the initial first threshold Thresh01 of the current frame, and the initial second threshold Thresh02 of the current frame; and performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2 corresponding to the current frame.
According to a further aspect of the present disclosure, there is provided an encoding device, including: a processor; and a memory having stored therein a computer program that, when executed by the processor, causes the encoding device to implement the method of embodiments of the above aspect.
According to a further aspect of the present disclosure, there is provided an encoding device, including: a processor and an interface circuit. The interface circuit is configured to receive a code instruction and transmit the code instruction to the processor. The processor is configured to run the code instruction to implement the method of embodiments of the above aspect.
According to a further aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium having stored therein instructions that, when executed, cause the method of embodiments of the above aspect to be implemented.
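One way to read "performing de-correlation on the current frame according to the first threshold Thresh1 and the second threshold Thresh2" is as a three-way decision on the frame's left/right correlation coefficient. The Python sketch below is a hedged interpretation, not the disclosed procedure. The three conditions that lead to no de-correlation mirror the reasons recited in claims 4, 6 and 10 for the previous frame, and it is assumed here that the same tests apply to the current frame. The branches that select the first or second manner are an assumed complement of those conditions. The function name `choose_manner`, its parameters, and the string labels are hypothetical.

```python
def choose_manner(eta_lr, thresh1, thresh2, eta_ms_first, eta_ms_second):
    """Pick "none", "first" or "second" for the current frame.

    eta_lr        -- coefficient of the left/right channel signals
    eta_ms_first  -- coefficient of the signals de-correlated with the first manner
    eta_ms_second -- coefficient of the signals de-correlated with the second manner
    """
    # Coefficient lies between the two thresholds: no de-correlation.
    if thresh1 <= eta_lr <= thresh2:
        return "none"

    if eta_lr < thresh1:
        # Below the first threshold: skip when the coefficient is greater
        # than or equal to that of the first-manner output.
        # Assumption: otherwise the first manner is used.
        return "none" if eta_lr >= eta_ms_first else "first"

    # Above the second threshold: skip when the coefficient is less than
    # or equal to that of the second-manner output.
    # Assumption: otherwise the second manner is used.
    return "none" if eta_lr <= eta_ms_second else "second"
```

The returned label can then drive the threshold update for the next frame, since the next frame's thresholds depend on which of these outcomes occurred.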
BRIEF DESCRIPTION OF THE DRAWINGS
The above and/or additional aspects and advantages of the present disclosure will become apparent and more readily appreciated from the following descriptions made with reference to the drawings, in which:
FIG. 1A is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 1B is a block diagram illustrating a flow of obtaining an encoded code stream based on de-correlated signals provided by an embodiment of the present disclosure;
FIG. 2 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 3 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 5 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 6 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 7 is a flowchart of a method for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of an apparatus for processing a stereo audio signal provided by an embodiment of the present disclosure;
FIG. 9 is a block diagram of a user equipment provided by an embodiment of the present disclosure; and
FIG. 10 is a block diagram of a network side