EP-4738348-A1 - AUDIO DATA PROCESSING METHOD AND APPARATUS, CHIP, AND ELECTRONIC DEVICE


Abstract

An audio data processing method and apparatus (600), a chip, and an electronic device (800), relating to the technical field of data processing. The method comprises: obtaining a data matrix of audio data, wherein the data matrix is an MDFT data matrix corresponding to the audio data in an AVS encoder (101); determining a target parameter value of an exponentiation parameter corresponding to each column vector in the data matrix (102); on the basis of the target parameter value, performing power transformation on each column vector in the data matrix to obtain a power-transformed matrix corresponding to the data matrix, wherein the power transformation is used for concentrating the variance of the data matrix in the direction of a principal component, and increasing the feature percentage of the principal component in the data matrix (103); and using a PCA dimension reduction algorithm to perform data dimension reduction processing on the power-transformed matrix to obtain a data dimension reduction result for the audio data (104). The accuracy of data dimension reduction by the PCA module in the AVS encoder can thereby be improved.

Inventors

WANG, Di
WANG, BIN
LIU, YONG

Assignees

Beijing Xiaomi Mobile Software Co., Ltd.

Dates

Publication Date: 2026-05-06
Application Date: 2023-06-30

Claims (11)

A method for processing audio data, comprising: obtaining a data matrix of the audio data, wherein the data matrix is a modulated discrete Fourier transform (MDFT) data matrix corresponding to the audio data in an audio video coding standard (AVS) encoder; determining a target parameter value of a power operation parameter corresponding to each column vector in the data matrix; obtaining a power transformation matrix corresponding to the data matrix by performing a power transformation on each column vector in the data matrix based on the target parameter value, wherein the power transformation is used to concentrate a variance of the data matrix in a direction of principal components to enhance a feature proportion of the principal components in the data matrix; and obtaining a data dimensionality reduction result of the audio data by performing data dimensionality reduction on the power transformation matrix using a principal component analysis (PCA) dimensionality reduction algorithm.
The method of claim 1, wherein determining the target parameter value of the power operation parameter corresponding to each column vector in the data matrix comprises: determining a power operation parameter vector λ corresponding to the data matrix, wherein the power operation parameter vector λ comprises the power operation parameter of each column vector in the data matrix, and different column vectors correspond to different power operation parameters; determining an initial power function matrix corresponding to the data matrix based on an initial power function equation and the power operation parameter vector λ; determining a first covariance matrix of the initial power function matrix; obtaining a plurality of first eigenvalues corresponding to a plurality of column vectors by performing eigenvalue decomposition on the first covariance matrix, wherein the plurality of first eigenvalues comprise the power operation parameter vector λ, wherein the first eigenvalue is configured to represent a variation relationship between an eigenvector variance distribution of the data matrix and the power operation parameter vector λ in the power transformation; and determining a PCA performance indicator based on the plurality of first eigenvalues, and determining the target parameter value of the power operation parameter corresponding to each column vector according to the PCA performance indicator.
The method of claim 2, wherein the initial power function equation is characterized by: $|y^{(\lambda)}| = \begin{cases} |y|^{\lambda}, & \lambda \neq 0 \\ \ln(|y| + 1), & \lambda = 0 \end{cases}$, where $y$ represents a column element in a column vector, $|y^{(\lambda)}|$ represents an absolute value of a power operation result corresponding to the column element, a sign of $y^{(\lambda)}$ is the same as a sign of $y$, and $\lambda$ represents the power operation parameter vector.
The method of claim 2, wherein determining the PCA performance indicator based on the plurality of first eigenvalues, and determining the target parameter value of the power operation parameter corresponding to each column vector according to the PCA performance indicator comprises: repeatedly performing the following determination process until it is determined that the PCA performance indicator reaches a maximum value, and extracting the target parameter value of the power operation parameter corresponding to each column vector when the PCA performance indicator reaches the maximum value: determining the PCA performance indicator by inputting the power operation parameter vector into a PCA performance indicator estimation equation, selecting a preset number of first eigenvalues among the plurality of first eigenvalues, and inputting the preset number of first eigenvalues into the PCA performance indicator estimation equation.
The method of claim 4, wherein the PCA performance indicator estimation equation is characterized by: $L(\lambda) = \dfrac{\mathrm{eval}_1(\lambda) + \mathrm{eval}_2(\lambda) + \cdots + \mathrm{eval}_k(\lambda)}{\mathrm{evalsum}(\lambda)}$, $\lambda = \{\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_m\}$, where $L(\lambda)$ represents the PCA performance indicator, $\mathrm{eval}_1(\lambda), \mathrm{eval}_2(\lambda), \cdots, \mathrm{eval}_k(\lambda)$ represent the preset number of first eigenvalues among the plurality of first eigenvalues, the preset number $k$ is a target dimension of the data dimensionality reduction, $\mathrm{evalsum}(\lambda)$ represents a sum of absolute values of the plurality of first eigenvalues, $\lambda = \{\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_m\}$ represents the power operation parameter vector, $\lambda_1, \lambda_2, \lambda_3, \cdots, \lambda_m$ represent the power operation parameter corresponding to each column vector in the power operation parameter vector, and $m$ represents a number of column vectors.
The method of claim 2, wherein obtaining the power transformation matrix corresponding to the data matrix by performing the power transformation on each column vector in the data matrix based on the target parameter value comprises: determining a power operation result corresponding to each column element by substituting the target parameter value of the power operation parameter corresponding to each column vector into the initial power function equation; and obtaining the power transformation matrix corresponding to the data matrix after determining the power operation result for each column element.
The method of claim 1, wherein obtaining the data dimensionality reduction result of the audio data by performing the data dimensionality reduction on the power transformation matrix using the PCA dimensionality reduction algorithm comprises: performing standardization processing on the power transformation matrix, so that each column vector in the power transformation matrix has a mean of 0 and a variance of 1; determining a second covariance matrix between column vectors based on a standardized power transformation matrix; obtaining a plurality of second eigenvalues corresponding to a plurality of column vectors and a plurality of second eigenvectors corresponding to the plurality of second eigenvalues by performing eigenvalue decomposition on the second covariance matrix, wherein a second eigenvalue is configured to represent a variance of a second eigenvector of the data matrix after the power transformation; forming a principal component matrix by selecting a preset number of second eigenvectors from the plurality of second eigenvectors according to a descending order of the second eigenvalues; and obtaining the data dimensionality reduction result of the audio data by projecting the audio data onto the principal component matrix.
An apparatus for processing audio data, comprising: an obtaining module, configured to obtain a data matrix of the audio data, wherein the data matrix is a modulated discrete Fourier transform (MDFT) data matrix corresponding to the audio data in an audio video coding standard (AVS) encoder; a determining module, configured to determine a target parameter value of a power operation parameter corresponding to each column vector in the data matrix; a power transformation module, configured to obtain a power transformation matrix corresponding to the data matrix by performing a power transformation on each column vector in the data matrix based on the target parameter value, wherein the power transformation is used to concentrate a variance of the data matrix in a direction of principal components to enhance a feature proportion of the principal components in the data matrix; and a processing module, configured to obtain a data dimensionality reduction result of the audio data by performing data dimensionality reduction on the power transformation matrix using a principal component analysis (PCA) dimensionality reduction algorithm.
An electronic device, comprising: at least one processor; and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor that, when executed by the at least one processor, cause the at least one processor to implement the method of any one of claims 1 to 7.
A non-transitory computer-readable storage medium for storing computer instructions, wherein the computer instructions are configured to cause a computer to implement the method of any one of claims 1 to 7.
A chip, comprising: one or more interface circuits; and one or more processors, wherein the interface circuits are configured to receive a signal from a memory of an electronic device and send the signal to the processors, and the signal comprises computer instructions stored in the memory that, when executed by the processor, cause the electronic device to implement the method of any one of claims 1 to 7.
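As an illustration only, the power transformation of claim 3 and the performance indicator of claim 5 might be sketched in Python with NumPy as follows. The sketch assumes a real-valued data matrix, reads the power function as |y|^λ for λ ≠ 0 and ln(|y| + 1) for λ = 0 with the sign of y restored afterwards, and restricts λ to non-negative values so that zero-valued elements stay finite. The function names, the candidate grid, and the coordinate-wise search are hypothetical choices made for this sketch; the claims only require repeating the evaluation until the indicator reaches its maximum and do not prescribe a search strategy.

```python
import numpy as np

def power_transform(Y, lam):
    """Signed power transform applied column by column.

    Y   : (n, m) real matrix (hypothetical stand-in for the MDFT data matrix)
    lam : (m,) one power operation parameter per column (assumed >= 0 here)
    """
    Y = np.asarray(Y, dtype=float)
    lam = np.asarray(lam, dtype=float)
    mag = np.abs(Y)
    out = np.empty_like(Y)
    for j, l in enumerate(lam):
        if l == 0.0:
            col = np.log1p(mag[:, j])        # ln(|y| + 1)
        else:
            col = mag[:, j] ** l             # |y|^lambda
        out[:, j] = np.sign(Y[:, j]) * col   # result keeps the sign of y
    return out

def pca_indicator(Y, lam, k):
    """L(lambda): sum of the k largest covariance eigenvalues of the
    transformed matrix divided by the sum of absolute eigenvalues."""
    T = power_transform(Y, lam)
    cov = np.cov(T, rowvar=False)            # (m, m) covariance between columns
    evals = np.linalg.eigvalsh(cov)[::-1]    # eigenvalues in descending order
    return evals[:k].sum() / np.abs(evals).sum()

def search_lambda(Y, k, grid=(0.0, 0.25, 0.5, 1.0, 2.0), sweeps=2):
    """Hypothetical coordinate-wise grid search that keeps any change
    to a single lambda_j that increases L(lambda)."""
    Y = np.asarray(Y, dtype=float)
    lam = np.ones(Y.shape[1])                # lambda = 1 leaves the data unchanged
    best = pca_indicator(Y, lam, k)
    for _ in range(sweeps):
        for j in range(Y.shape[1]):
            for g in grid:
                trial = lam.copy()
                trial[j] = g
                score = pca_indicator(Y, trial, k)
                if score > best:
                    best, lam = score, trial
    return lam, best
```

Calling `search_lambda(Y, k)` returns a candidate parameter vector and the indicator value it achieves; `power_transform(Y, lam)` then yields the matrix that would be passed to the PCA step. A grid search finds only the best value among the candidates tried, so it approximates the maximum the claims refer to.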

Description

TECHNICAL FIELD

The present disclosure relates to the field of data processing technology, and in particular, to a method and an apparatus for processing audio data, a chip, and an electronic device.

BACKGROUND

With the popularization of audio services, new-generation audio video coding standard (AVS) encoders are widely used. When an AVS encoder encodes audio data, a principal component analysis (PCA) dimensionality reduction algorithm may be applied to the modulated discrete Fourier transform (MDFT) data matrix obtained from the audio data after the MDFT transformation. This maps the audio data from a high-dimensional space to a low-dimensional space, which may help extract key information from the audio data and reduce its data size.

However, when the PCA dimensionality reduction algorithm selects principal components, it mainly considers how much of the variance each component explains. In some cases, principal component features with a small variance but significant importance may be ignored, which lowers the accuracy of the data dimensionality reduction performed by the PCA module in the AVS encoder.

SUMMARY

The present disclosure provides a method and an apparatus for processing audio data, a chip, and an electronic device, which may, via power transformation processing, enable a PCA module in an AVS encoder to select a principal component matrix with a larger variance proportion when selecting principal components, thus addressing the technical problem of lower accuracy of data dimensionality reduction by the PCA module in the AVS encoder.
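For orientation, the PCA dimensionality reduction step referred to above (standardize the columns, decompose the covariance matrix, keep the eigenvectors with the largest eigenvalues, project) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the encoder's implementation: the input is taken to be a real-valued matrix with no constant columns, the function name `pca_reduce` is hypothetical, and the projection is applied to the standardized matrix.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n samples by m columns) onto its top-k principal components.

    Returns the (n, k) reduced data and the fraction of total variance
    carried by the k retained components.
    """
    X = np.asarray(X, dtype=float)
    # Standardize each column to mean 0 and variance 1.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix between columns, then its eigendecomposition.
    cov = np.cov(Z, rowvar=False)
    evals, evecs = np.linalg.eigh(cov)       # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1]          # reorder to descending
    evals, evecs = evals[order], evecs[:, order]
    W = evecs[:, :k]                         # principal component matrix (m, k)
    explained = evals[:k].sum() / evals.sum()
    return Z @ W, explained
```

The returned variance fraction is the quantity the disclosure aims to increase: the closer it is to 1, the less variance is discarded when only `k` components are kept.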
According to a first aspect of the embodiments of the present disclosure, a method for processing audio data is provided, including: obtaining a data matrix of the audio data, in which the data matrix is an MDFT data matrix corresponding to the audio data in an AVS encoder; determining a target parameter value of a power operation parameter corresponding to each column vector in the data matrix; obtaining a power transformation matrix corresponding to the data matrix by performing a power transformation on each column vector in the data matrix based on the target parameter value, in which the power transformation is used to concentrate a variance of the data matrix in a direction of principal components to enhance a feature proportion of the principal components in the data matrix; and obtaining a data dimensionality reduction result of the audio data by performing data dimensionality reduction on the power transformation matrix using a PCA dimensionality reduction algorithm.

According to a second aspect of the embodiments of the present disclosure, an apparatus for processing audio data is provided, including: an obtaining module, configured to obtain a data matrix of the audio data, in which the data matrix is a modulated discrete Fourier transform (MDFT) data matrix corresponding to the audio data in an AVS encoder; a determining module, configured to determine a target parameter value of a power operation parameter corresponding to each column vector in the data matrix; a power transformation module, configured to obtain a power transformation matrix corresponding to the data matrix by performing a power transformation on each column vector in the data matrix based on the target parameter value, in which the power transformation is used to concentrate a variance of the data matrix in a direction of principal components to enhance a feature proportion of the principal components in the data matrix; and a processing module, configured to obtain a data dimensionality reduction result of the audio data by performing data dimensionality reduction on the power transformation matrix using a PCA dimensionality reduction algorithm.

According to a third aspect of the embodiments of the present disclosure, an electronic device is provided, including: at least one processor; and a memory communicatively connected to the at least one processor, in which the memory stores instructions executable by the at least one processor that, when executed by the at least one processor, cause the at least one processor to implement the method described in the first aspect of the embodiments of the present disclosure.

According to a fourth aspect of the embodiments of the present disclosure, a non-transitory computer-readable storage medium for storing computer instructions is provided, in which the computer instructions are configured to cause a computer to implement the method described in the first aspect of the embodiments of the present disclosure.

According to a fifth aspect of the embodiments of the present disclosure, a chip is provided, including: one or more interface circuits; and one or more processors, in which the interface circuits are configured to receive a signal from a memory of an electronic device and send the signal to the processors, and the signal includes computer instructions stored in the memory that, when exec