CN-121985283-A - Spatial audio signal generation method and headphone


Abstract

The invention provides a spatial audio signal generation method and a headphone. The method comprises: acquiring an audio signal in a game type and a multi-dimensional pose signal of a user; converting the audio signal into spectral features, and mapping the multi-dimensional pose signal into a corresponding multi-dimensional feature vector; embedding the multi-dimensional feature vector into the spectral features to obtain a fusion feature matrix; and performing feature rendering on the fusion feature matrix through a sound field rendering model to generate a spatial audio signal. Because the multi-dimensional feature vector mapped from the pose signal is embedded into the spectral features of the audio signal, the audio signal and the pose signal are fused, and feature rendering is performed on the fusion feature matrix. This reduces the delay of sound field switching when the user's head moves, improves azimuth accuracy, and meets the rapidly changing audio requirements in the game type.

Inventors

ZHU MING

Assignees

歌尔科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260105

Claims (10)

1. A spatial audio signal generation method, the method comprising: acquiring an audio signal in a game type and a multi-dimensional pose signal of a user; converting the audio signal into spectral features, and mapping the multi-dimensional pose signal into a corresponding multi-dimensional feature vector; embedding the multi-dimensional feature vector into the spectral features to obtain a fusion feature matrix; and performing feature rendering on the fusion feature matrix through a sound field rendering model to generate a spatial audio signal.
2. The spatial audio signal generation method according to claim 1, wherein the embedding the multi-dimensional feature vector into the spectral features to obtain a fusion feature matrix comprises: acquiring the game type of the currently running game; determining, according to the game type, a correlation weight between the spectral features and the feature vector of each dimension in the multi-dimensional feature vector; and fusing the feature vector of each dimension in the multi-dimensional feature vector into the spectral features according to the correlation weight to obtain the fusion feature matrix.
3. The spatial audio signal generation method according to claim 1, wherein the multi-dimensional pose signal is a three-dimensional pose parameter signal, and wherein the acquiring the audio signal in the game type and the multi-dimensional pose signal of the user comprises: acquiring pose data of the user through an inertial sensor; performing noise reduction on the pose data through a Kalman filtering algorithm; decomposing the noise-reduced pose data into the three-dimensional pose parameter signal through quaternion conversion; acquiring an initial audio signal in the game type; and performing pre-emphasis on high-frequency components in the initial audio signal to obtain the audio signal.
4. The spatial audio signal generation method according to claim 3, wherein the performing noise reduction on the pose data through a Kalman filtering algorithm comprises: acquiring the game type of the currently running game; adjusting a filter coefficient of the Kalman filtering algorithm according to the game type; and performing noise reduction on the pose data by using the Kalman filtering algorithm with the adjusted filter coefficient.
5. The spatial audio signal generation method according to claim 1, wherein the spatial audio signal generation method further comprises: assigning unified timestamps to each frame of the multi-dimensional pose signal and each frame of the audio signal through a timestamp alignment mechanism; and performing linear interpolation on the timestamped multi-dimensional pose signal by a linear interpolation algorithm, taking the signal frequency of the audio signal as a reference; wherein the converting the audio signal into spectral features and mapping the multi-dimensional pose signal into a corresponding multi-dimensional feature vector comprises: converting the timestamped audio signal into spectral features by using a short-time Fourier transform, and mapping the linearly interpolated multi-dimensional pose signal into the corresponding multi-dimensional feature vector.
6. The spatial audio signal generation method according to claim 1, wherein the spatial audio signal generation method further comprises: acquiring an initial sound field rendering model for performing feature rendering; acquiring stored audio feature templates of different game types; setting a scene embedding layer in the initial sound field rendering model; embedding a target audio feature template matched with the fusion feature matrix into the initial sound field rendering model by using the scene embedding layer; and adjusting rendering parameters through the target audio feature template to obtain the sound field rendering model.
7. The spatial audio signal generation method according to claim 6, further comprising, after the acquiring the initial sound field rendering model for performing feature rendering: adjusting an input layer and a decoder of the initial sound field rendering model according to the fusion feature matrix; and performing lightweight processing on an encoder in the adjusted initial sound field rendering model to obtain an optimized initial sound field rendering model.
8. The spatial audio signal generation method according to claim 7, wherein the adjusting the input layer and the decoder of the initial sound field rendering model according to the fusion feature matrix comprises: adding audio feature extraction branches to the input layer of the initial sound field rendering model according to the audio dynamic features in the fusion feature matrix; and adjusting the number of decoding layers in the decoder of the initial sound field rendering model according to the audio feature extraction branches to obtain an optimized initial sound field rendering model, wherein the number of decoding layers is the same as the number of audio feature extraction branches.
9. The spatial audio signal generation method according to claim 7, wherein the performing lightweight processing on the encoder in the adjusted initial sound field rendering model to obtain an optimized initial sound field rendering model comprises: configuring, according to the game type and within each layer of the encoder of the initial sound field rendering model, the number of first self-attention mechanisms, which attend to frequency correlation within the audio signal, and the number of second self-attention mechanisms, which attend to correlation between the audio signal and the multi-dimensional pose signal; and reducing the convolution dimension of a feedforward neural network in the encoder after the self-attention mechanisms are configured, to obtain the optimized initial sound field rendering model.
10. A headphone for performing the spatial audio signal generation method of any one of claims 1 to 9, wherein the headphone comprises a pose sensor, an audio input module, a clock module, a processor, a fusion chip and a sound-producing module; the pose sensor, the audio input module and the clock module are all connected to the fusion chip, the fusion chip is also connected to the processor, and the processor is also connected to the sound-producing module; and a sound field rendering model is provided in the processor.
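
The pose preprocessing recited in claims 3 to 5 (Kalman noise reduction tuned by game type, quaternion conversion, and linear interpolation onto the audio frame timeline) might be sketched as follows. This is only an illustrative sketch under stated assumptions, not the applicant's implementation: the function names, the `PROCESS_NOISE` table and its values, the scalar constant-state Kalman model, and the ZYX Euler convention are all hypothetical choices that do not appear in the source.

```python
import math
from bisect import bisect_right

# Hypothetical process-noise values per game type; the source gives no numbers.
PROCESS_NOISE = {"fps": 1e-2, "moba": 1e-3}


def kalman_smooth(measurements, game_type, measurement_noise=1e-2):
    """Smooth one scalar pose channel with a constant-state Kalman filter."""
    if not measurements:
        return []
    q = PROCESS_NOISE[game_type]  # filter coefficient chosen by game type
    estimate, variance = measurements[0], 1.0
    smoothed = [estimate]
    for z in measurements[1:]:
        variance += q  # predict: uncertainty grows by the process noise
        gain = variance / (variance + measurement_noise)
        estimate += gain * (z - estimate)  # update toward the new measurement
        variance *= 1.0 - gain
        smoothed.append(estimate)
    return smoothed


def quaternion_to_euler(w, x, y, z):
    """Convert a unit quaternion to (yaw, pitch, roll) in radians (ZYX order)."""
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))  # clamp rounding error
    pitch = math.asin(sin_pitch)
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    return yaw, pitch, roll


def interpolate_pose(pose_times, pose_values, frame_times):
    """Linearly interpolate pose samples onto audio frame timestamps.

    pose_times must be ascending; pose_values holds one (yaw, pitch, roll)
    tuple per timestamp. Frame times outside the sampled range are clamped
    to the nearest pose sample.
    """
    resampled = []
    for t in frame_times:
        if t <= pose_times[0]:
            resampled.append(tuple(pose_values[0]))
        elif t >= pose_times[-1]:
            resampled.append(tuple(pose_values[-1]))
        else:
            i = bisect_right(pose_times, t)  # pose_times[i-1] <= t < pose_times[i]
            t0, t1 = pose_times[i - 1], pose_times[i]
            a = (t - t0) / (t1 - t0)
            resampled.append(tuple(
                (1.0 - a) * p0 + a * p1
                for p0, p1 in zip(pose_values[i - 1], pose_values[i])
            ))
    return resampled
```

In this sketch a larger process-noise value makes the filter follow fast head turns more closely at the cost of passing more sensor noise, which is one plausible reading of adjusting the filter coefficient by game type. Note that interpolating Euler angles linearly does not handle wrap-around at ±π; a practical system would need to unwrap the angles or interpolate the quaternions directly.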

Description

Spatial audio signal generation method and headphone

Technical Field

The present invention relates to the field of headphones, and in particular to a spatial audio signal generation method and a headphone.

Background

A player's accurate perception of audio direction in a game directly affects the game experience. This is especially true of first-person shooter (FPS) games, multiplayer online battle arena (MOBA) games and similar types of games, in which recognizing the direction of footsteps, explosions, teammate voices and other audio is the key basis on which the player judges the state of battle and makes decisions.

Existing headphone spatial audio technology mostly adopts fixed sound field rendering or simple head pose mapping. In these approaches, head pose data is input into the sound field rendering model as an independent parameter for rendering, so sound field switching is delayed when the head rotates and the azimuth mapping accuracy is low. As a result, the rapidly changing audio requirements in the game type cannot be met.

Disclosure of Invention

The main object of the invention is to provide a spatial audio signal generation method and a headphone, aiming to solve the problems that sound field switching is delayed when the head rotates, that azimuth mapping accuracy is low, and that the rapidly changing audio requirements in the game type cannot be met.
To achieve the above object, the present invention provides a spatial audio signal generation method, the method comprising: acquiring an audio signal in a game type and a multi-dimensional pose signal of a user; converting the audio signal into spectral features, and mapping the multi-dimensional pose signal into a corresponding multi-dimensional feature vector; embedding the multi-dimensional feature vector into the spectral features to obtain a fusion feature matrix; and performing feature rendering on the fusion feature matrix through a sound field rendering model to generate a spatial audio signal.

Optionally, the embedding the multi-dimensional feature vector into the spectral features to obtain a fusion feature matrix includes: acquiring the game type of the currently running game; determining, according to the game type, a correlation weight between the spectral features and the feature vector of each dimension in the multi-dimensional feature vector; and fusing the feature vector of each dimension in the multi-dimensional feature vector into the spectral features according to the correlation weight to obtain the fusion feature matrix.

Optionally, the multi-dimensional pose signal is a three-dimensional pose parameter signal, and the acquiring the audio signal in the game type and the multi-dimensional pose signal of the user includes: acquiring pose data of the user through an inertial sensor; performing noise reduction on the pose data through a Kalman filtering algorithm; decomposing the noise-reduced pose data into the three-dimensional pose parameter signal through quaternion conversion; acquiring an initial audio signal in the game type; and performing pre-emphasis on high-frequency components in the initial audio signal to obtain the audio signal.
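
The audio side of the method described above (pre-emphasis, conversion to spectral features, and weighted fusion with the pose feature vector) might be sketched as follows. This is a minimal sketch under stated assumptions rather than the disclosed implementation: the source does not specify how the feature vector is "embedded" into the spectral features, so the sketch simply appends the weighted pose values to each spectral frame. The function names, the `GAME_TYPE_WEIGHTS` table and its values, the pre-emphasis coefficient 0.97, the Hann window, and the frame sizes are all hypothetical, and the transform is a naive DFT chosen for readability, not speed.

```python
import cmath
import math


def pre_emphasis(samples, alpha=0.97):
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]


def stft_magnitude(samples, frame_len=8, hop=4):
    """Naive short-time Fourier transform with a Hann window.

    Returns one magnitude spectrum (frame_len // 2 + 1 bins) per frame.
    """
    window = [
        0.5 - 0.5 * math.cos(2.0 * math.pi * n / frame_len)
        for n in range(frame_len)
    ]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        segment = [samples[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                segment[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Hypothetical correlation weights for (yaw, pitch, roll) per game type;
# the source does not give concrete values.
GAME_TYPE_WEIGHTS = {"fps": (1.0, 0.6, 0.2), "moba": (0.7, 0.3, 0.1)}


def fuse(spectral_frames, pose_frames, game_type):
    """Build a fusion feature matrix: one row per frame, holding the spectral
    bins followed by the game-type-weighted pose values (an assumed form of
    embedding)."""
    if len(spectral_frames) != len(pose_frames):
        raise ValueError("audio and pose must have the same number of frames")
    weights = GAME_TYPE_WEIGHTS[game_type]
    return [
        list(spectrum) + [w * p for w, p in zip(weights, pose)]
        for spectrum, pose in zip(spectral_frames, pose_frames)
    ]
```

For example, a 16-sample signal with `frame_len=8` and `hop=4` yields three frames of five bins each, so fusing with three `(yaw, pitch, roll)` tuples gives a 3 × 8 matrix that a rendering model could take as input.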
Optionally, the performing noise reduction on the pose data through a Kalman filtering algorithm includes: acquiring the game type of the currently running game; adjusting a filter coefficient of the Kalman filtering algorithm according to the game type; and performing noise reduction on the pose data by using the Kalman filtering algorithm with the adjusted filter coefficient.

Optionally, the spatial audio signal generation method further includes: assigning unified timestamps to each frame of the multi-dimensional pose signal and each frame of the audio signal through a timestamp alignment mechanism; and performing linear interpolation on the timestamped multi-dimensional pose signal by a linear interpolation algorithm, taking the signal frequency of the audio signal as a reference. The converting the audio signal into spectral features and mapping the multi-dimensional pose signal into a corresponding multi-dimensional feature vector includes: converting the timestamped audio signal into spectral features by using a short-time Fourier transform, and mapping the linearly interpolated multi-dimensional pose signal into the corresponding multi-dimensional feature vector.

Optionally, the spatial audio signal generation method further includes: acquiring an initial sound field rendering model for performing feature rendering; acquiring stored audio feature templates of different game types; setting a scene embedding layer in the initial sound field rendering model; embedding a target audio feature template matched with the fusion feature matrix into the initial sound field rendering model by utilizing the scen