CN-122027940-A - Earphone dialogue method and system based on reinforcement learning
Abstract
The invention provides an earphone dialogue method and system based on reinforcement learning. The method comprises: obtaining an initial test signal and performing compensation preprocessing on it to obtain a compensated voice signal; obtaining an original environment sound signal and performing dynamic sound scene reconstruction on it to obtain a sound scene reconstruction audio signal; obtaining user feedback at the device side and adjusting the processing parameters in S1 and S2 through a local policy network, while generating a local policy gene and uploading it to the cloud; obtaining decision optimization parameters from the local policy genes with an evolutionary algorithm; and transmitting the decision optimization parameters back to the device side to update the local policy network.
Inventors
- OuYang Shikui
Assignees
- Dongguan Oumu Technology Co., Ltd. (东莞市欧木科技有限公司)
Dates
- Publication Date: 2026-05-12
- Application Date: 2026-02-04
Claims (9)
- 1. An earphone dialogue method based on reinforcement learning, comprising: S1, acquiring an initial test signal, and performing compensation preprocessing on the initial test signal to obtain a compensated voice signal; S2, acquiring an original environment sound signal, and performing dynamic sound scene reconstruction on the original environment sound signal to obtain a sound scene reconstruction audio signal; and S3, at the device side, acquiring user feedback, adjusting the processing parameters in S1 and S2 through a local policy network, generating a local policy gene and uploading it to the cloud, obtaining decision optimization parameters from the local policy genes by an evolutionary algorithm, and transmitting the decision optimization parameters to the device side to update the local policy network.
- 2. The earphone dialogue method based on reinforcement learning according to claim 1, wherein step S1 comprises: acquiring an initial test signal based on a trigger signal, wherein the initial test signal comprises an inward test signal and an outward test signal, and performing distortion-feature coding on the initial test signal to obtain an acoustic feature vector; acquiring an inward voice signal and extracting corresponding shallow speech features, and feeding the acoustic feature vector and the shallow speech features into a filter coefficient network to obtain real-time filter coefficients, wherein the parameters of the filter coefficient network are dynamically adjusted based on the action data of the local policy network in step S3; and applying the real-time filter coefficients in a filtering algorithm and processing the inward voice signal with the filtering algorithm to obtain a compensated voice signal.
- 3. The earphone dialogue method based on reinforcement learning according to claim 1, wherein step S2 comprises: acquiring an original environment sound signal and a media audio signal, and obtaining a scene tag and a corresponding spatial diffuse-field sample based on the original environment sound signal; performing sound source separation on the original environment sound signal to obtain a target sound and base noise; performing style transfer on the base noise to generate a comfort background sound; performing spatial processing through a head-related transfer function based on head posture data and the target sound to obtain a spatialized target sound; and mixing the media audio signal, the spatialized target sound, the comfort background sound and the spatial diffuse-field sample based on user preference parameters to generate a sound scene reconstruction audio signal, wherein the user preference parameters are dynamically adjusted based on the action data of the local policy network in step S3.
- 4. The earphone dialogue method based on reinforcement learning according to claim 3, wherein performing spatial processing through the head-related transfer function based on the head posture data and the target sound to obtain the spatialized target sound comprises: acquiring the spatial orientation of the target sound based on the scene tag and the original environment sound signal; acquiring head posture data, and calculating the real-time spatial orientation of the target sound relative to the head from the head posture data; acquiring the corresponding binaural filters from a head-related transfer function database based on the real-time spatial orientation; and processing the target sound with the binaural filters to generate the spatialized target sound.
- 5. The earphone dialogue method based on reinforcement learning according to claim 1, wherein step S3 comprises: at the device side, acquiring a state vector, inputting the state vector into the local policy network to obtain action data, executing the action data, acquiring user feedback, generating a local policy gene based on the state vector, the local policy network and the user feedback, and uploading the local policy gene to the cloud; at the cloud, constructing and updating a policy population based on the local policy genes from a plurality of device sides, and optimizing the policy population with an evolutionary algorithm to generate decision optimization parameters; and transmitting the decision optimization parameters to the device side to update the local policy network.
- 6. The earphone dialogue method based on reinforcement learning according to claim 5, wherein, at the device side, acquiring the state vector, inputting the state vector into the local policy network to obtain the action data, executing the action data, acquiring the user feedback, generating the local policy gene based on the state vector, the local policy network and the user feedback, and uploading the local policy gene to the cloud comprises: forming the state vector from the acoustic feature vector, the scene tag, the real-time filter coefficients and the user preference parameters; inputting the state vector into the local policy network to obtain action data, wherein the action data comprises parameters for the filter coefficient network and adjustment instructions for the user preference parameters; executing the action data, updating the parameters of the filter coefficient network in step S1 and simultaneously adjusting the user preference parameters in step S2; and processing subsequent audio signals based on the updated filter coefficient network parameters and user preference parameters, and collecting user feedback.
- 7. The method of claim 5, wherein constructing and updating the policy population based on the local policy genes from the plurality of device sides at the cloud, optimizing the policy population with the evolutionary algorithm, and generating the decision optimization parameters comprises: maintaining a world model at the cloud, wherein the world model is trained on the state vectors, action data and user feedback extracted from the local policy genes and is used to predict expected user feedback from an input state vector and action data; evaluating, based on the world model, a cloud fitness for each individual in the policy population, quantified by the expected user feedback obtained when the individual executes a multi-step interaction in the world model, and ranking all individuals by cloud fitness and dividing them into a plurality of groups; within each group, identifying low-fitness individuals and performing imagination training on them to optimize the corresponding policy network parameters; after the in-group optimization is completed, re-shuffling all individuals into new groups and carrying out multiple rounds of iterative optimization; and selecting elite individuals from the policy population after the multi-round iterative optimization, using the elite individuals as teacher networks, training a lightweight student network by knowledge distillation, and delivering the lightweight student network to the device side as the decision optimization parameters.
- 8. The reinforcement-learning-based earphone dialogue method of claim 7, wherein performing imagination training on a low-fitness individual to optimize the corresponding policy network parameters comprises: initializing a corresponding stream agent for the low-fitness individual, using the individual's policy network as the actor network of the stream agent, and initializing a rollout starting state based on the state space of the world model; simulating, through the world model, the actor network's interaction from the rollout starting state, predicting the expected user feedback over multiple future steps, and calculating feedback rewards; and optimizing the parameters of the actor network by gradient descent based on the calculated feedback rewards, so as to maximize the cumulative predicted reward.
- 9. A reinforcement-learning-based earphone dialogue system applying the method of any one of claims 1 to 8, comprising: an ear canal acoustic compensation unit, configured to acquire an initial test signal and perform compensation preprocessing on the initial test signal to obtain a compensated voice signal; a sound scene reconstruction unit, configured to acquire an original environment sound signal and perform dynamic sound scene reconstruction on the original environment sound signal to obtain a sound scene reconstruction audio signal; a device-side policy management unit, configured to acquire user feedback at the device side, adjust the processing parameters in the ear canal acoustic compensation unit and the sound scene reconstruction unit through the local policy network, generate a local policy gene based on the state vector, the local policy network and the user feedback, and upload the local policy gene to the cloud policy evolution unit; and a cloud policy evolution unit, configured to receive the local policy genes of a plurality of device sides, construct and update a policy population based on the local policy genes, optimize the policy population with an evolutionary algorithm to generate decision optimization parameters, and deliver the decision optimization parameters to the device sides to update the local policy networks.
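The spatialization step of claims 3 and 4 (real-time orientation relative to the head, binaural filter lookup, convolution) can be illustrated with a minimal sketch. The 30-degree azimuth grid, 64-tap random filters, and yaw-only head pose below are simplifying assumptions; a real system would use a measured HRTF database and full 3-D orientation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical HRTF database: one (left, right) FIR pair per 30-degree azimuth.
AZIMUTHS = np.arange(0, 360, 30)
HRTF_DB = {az: (rng.standard_normal(64), rng.standard_normal(64)) for az in AZIMUTHS}

def realtime_azimuth(target_az_deg, head_yaw_deg):
    # Orientation of the target sound relative to the rotated head.
    return (target_az_deg - head_yaw_deg) % 360

def nearest_filters(az_deg):
    # Pick the database entry with the smallest signed angular distance.
    diffs = np.abs(((AZIMUTHS - az_deg + 180) % 360) - 180)
    return HRTF_DB[AZIMUTHS[int(np.argmin(diffs))]]

def spatialize(target_sound, target_az_deg, head_yaw_deg):
    az = realtime_azimuth(target_az_deg, head_yaw_deg)
    hl, hr = nearest_filters(az)
    left = np.convolve(target_sound, hl, mode="same")
    right = np.convolve(target_sound, hr, mode="same")
    return np.stack([left, right])  # binaural (2, N) output

mono = rng.standard_normal(1024)                       # separated target sound
binaural = spatialize(mono, target_az_deg=90.0, head_yaw_deg=25.0)
```

As the head turns (yaw increases), the relative azimuth decreases, so the selected filter pair changes and the target sound stays anchored to its world position.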
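The final mixing step of claim 3 is a preference-weighted sum of four streams. A minimal sketch, assuming the preference parameters are simply per-stream gains (the preference keys and buffer shapes are illustrative):

```python
import numpy as np

def mix_scene(media, target_spat, comfort_bg, diffuse, prefs):
    """Preference-weighted mix of the four streams.

    All streams are equal-length binaural (2, N) buffers; `prefs` maps
    hypothetical preference keys to gains adjusted by the policy network.
    """
    w = {k: float(prefs.get(k, 0.0)) for k in ("media", "target", "comfort", "diffuse")}
    return (w["media"] * media + w["target"] * target_spat
            + w["comfort"] * comfort_bg + w["diffuse"] * diffuse)

n = 512
streams = [np.zeros((2, n)) for _ in range(4)]
streams[1] = np.ones((2, n))   # only the spatialized target stream is non-zero here
out = mix_scene(*streams, prefs={"media": 0.5, "target": 0.8,
                                 "comfort": 0.3, "diffuse": 0.2})
```

Because only the target stream carries signal in this toy input, the output is that stream scaled by its preference gain of 0.8.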
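The device-side loop of claims 5 and 6 (state in, action out, gene uploaded) can be sketched as follows. The linear policy, state/action dimensions, and the gene's field names are all illustrative assumptions; the patent does not specify the network architecture or gene encoding:

```python
import numpy as np

rng = np.random.default_rng(2)

STATE_DIM, ACTION_DIM = 12, 4   # hypothetical sizes

class LocalPolicyNetwork:
    """Toy stand-in for the local policy network: a bounded linear map."""
    def __init__(self):
        self.W = rng.standard_normal((STATE_DIM, ACTION_DIM)) * 0.1

    def act(self, state):
        return np.tanh(state @ self.W)   # bounded adjustment actions

def device_step(policy, state, user_feedback):
    action = policy.act(state)
    # By claim 6, the action data splits into two kinds of adjustments:
    #   action[:2] -> nudges to the filter coefficient network parameters (S1)
    #   action[2:] -> adjustments to the user preference parameters (S2)
    gene = {                              # "local policy gene" uploaded to the cloud
        "state": state.tolist(),
        "weights": policy.W.copy(),
        "feedback": user_feedback,
    }
    return action, gene

policy = LocalPolicyNetwork()
# State vector per claim 6: acoustic features, scene tag, filter coeffs, prefs.
state = rng.standard_normal(STATE_DIM)
action, gene = device_step(policy, state, user_feedback=1.0)
```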
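The cloud-side loop of claims 7 and 8 can be sketched in heavily simplified form: fitness is measured by multi-step rollouts in a world model, and low-fitness individuals are refined by gradient-based "imagination training". The quadratic world model, population size, group split, and learning rate are all illustrative assumptions, and the knowledge-distillation step is omitted:

```python
import numpy as np

rng = np.random.default_rng(3)
STATE_DIM = 4

# Toy world model: predicted user feedback is higher the closer an action is
# to a hidden "ideal" response to the state (stands in for the learned model).
IDEAL = rng.standard_normal((STATE_DIM, STATE_DIM))

def world_model_reward(state, action):
    return -np.sum((action - state @ IDEAL) ** 2)

def rollout_fitness(W, steps=5):
    # Cloud fitness: expected feedback over a multi-step imagined interaction.
    total = 0.0
    for _ in range(steps):
        state = rng.standard_normal(STATE_DIM)   # toy state transition
        total += world_model_reward(state, state @ W)
    return total

def imagination_training(W, lr=0.05, iters=50):
    # Gradient descent on imagined prediction error (analytic gradient for
    # this linear actor / quadratic reward), maximizing predicted reward.
    for _ in range(iters):
        s = rng.standard_normal(STATE_DIM)
        err = s @ W - s @ IDEAL
        W = W - lr * np.outer(s, err)
    return W

population = [rng.standard_normal((STATE_DIM, STATE_DIM)) for _ in range(8)]
for _ in range(3):                               # multiple iterative rounds
    fits = [rollout_fitness(W) for W in population]
    order = np.argsort(fits)                     # low fitness first
    for i in order[:4]:                          # refine the weaker half
        population[i] = imagination_training(population[i])
elite = population[int(np.argmax([rollout_fitness(W) for W in population]))]
```

In the full scheme, the elite individual would serve as a teacher for distilling a lightweight student network that is sent back to devices as the decision optimization parameters.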
Description
Earphone dialogue method and system based on reinforcement learning

Technical Field

The invention relates to the technical field of audio processing, and in particular to an earphone dialogue method and system based on reinforcement learning.

Background

In recent years, noise-reduction headphones have greatly improved the audio experience in noisy environments; however, users face a double dilemma when they want both the quiet of noise reduction and clear calls. First, when a user wears an in-ear earphone during a call, the ear canal is physically occluded, which makes the user's own voice sound boomy and unnatural (the occlusion, or "canned voice", effect). Such distorted self-hearing may force the user to involuntarily raise their voice or change their way of speaking, which is fatiguing and may degrade the speech quality delivered to the other party. Conventional schemes typically ignore this. Second, in a state of complete acoustic isolation from ambient sound, users feel psychological discomfort and anxiety; prolonged isolation deprives them of basic awareness of their surroundings, producing a sense of detachment and potential unease, contrary to the instinct of retaining ambient awareness when talking in a natural environment. The "pass-through mode" offered by the prior art can reintroduce ambient sound, but it simply amplifies the whole background, noise included, which seriously interferes with speech clarity during a call and is therefore not a satisfactory strategy. In short, the noise-reduction function both distorts the user's perception of their own voice and, over time, induces a sense of isolation; how to design a scheme that dynamically resolves this combined physical-acoustic and psycho-acoustic dilemma has become an urgent technical problem in the field.
Disclosure of the Invention

To solve the above technical problems, the invention provides an earphone dialogue method and system based on reinforcement learning. A first aspect of the invention provides an earphone dialogue method based on reinforcement learning, comprising: S1, acquiring an initial test signal, and performing compensation preprocessing on the initial test signal to obtain a compensated voice signal; S2, acquiring an original environment sound signal, and performing dynamic sound scene reconstruction on the original environment sound signal to obtain a sound scene reconstruction audio signal; and S3, at the device side, acquiring user feedback, adjusting the processing parameters in S1 and S2 through a local policy network, generating a local policy gene and uploading it to the cloud, obtaining decision optimization parameters from the local policy genes by an evolutionary algorithm, and transmitting the decision optimization parameters to the device side to update the local policy network. As a preferred technical solution, S1 comprises: acquiring an initial test signal based on a trigger signal, wherein the initial test signal comprises an inward test signal and an outward test signal, and performing distortion-feature coding on the initial test signal to obtain an acoustic feature vector; acquiring an inward voice signal and extracting corresponding shallow speech features, and feeding the acoustic feature vector and the shallow speech features into a filter coefficient network to obtain real-time filter coefficients, wherein the parameters of the filter coefficient network are dynamically adjusted based on the action data of the local policy network in step S3; and applying the real-time filter coefficients in a filtering algorithm and processing the inward voice signal with the filtering algorithm to obtain a compensated voice signal.
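The S1 compensation path (features in, real-time FIR coefficients out, filter applied to the occluded voice) can be sketched numerically. The two-layer network, feature dimensions, tap count and frame length below are illustrative assumptions, not values specified by the invention:

```python
import numpy as np

rng = np.random.default_rng(0)

class FilterCoefficientNetwork:
    """Tiny MLP mapping the acoustic feature vector plus shallow speech
    features to real-time FIR filter taps (hypothetical sizes)."""
    def __init__(self, in_dim, hidden, n_taps):
        self.W1 = rng.standard_normal((in_dim, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, n_taps)) * 0.1
        self.b2 = np.zeros(n_taps)

    def __call__(self, acoustic_vec, shallow_feats):
        x = np.concatenate([acoustic_vec, shallow_feats])
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2       # real-time filter coefficients

def compensate(inward_voice, taps):
    # Apply the real-time FIR filter to the inward (occluded) voice frame.
    return np.convolve(inward_voice, taps, mode="same")

net = FilterCoefficientNetwork(in_dim=16 + 8, hidden=32, n_taps=33)
acoustic_vec = rng.standard_normal(16)     # distortion-feature coding of test signals
shallow = rng.standard_normal(8)           # shallow features of the inward voice
taps = net(acoustic_vec, shallow)
voice = rng.standard_normal(480)           # one 10 ms frame at 48 kHz
compensated = compensate(voice, taps)
```

In the full scheme, the action data from the S3 policy network would nudge `W1`/`W2` between frames, so the compensation tracks user feedback.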
As a preferred technical solution, S2 comprises: acquiring an original environment sound signal and a media audio signal, and obtaining a scene tag and a corresponding spatial diffuse-field sample based on the original environment sound signal; performing sound source separation on the original environment sound signal to obtain a target sound and base noise; performing style transfer on the base noise to generate a comfort background sound; performing spatial processing through a head-related transfer function based on head posture data and the target sound to obtain a spatialized target sound; and mixing the media audio signal, the spatialized target sound, the comfort background sound and the spatial diffuse-field sample based on user preference parameters to generate a sound scene reconstruction audio signal, wherein the user preference parameters are dynamically adjusted based on the action data of the local policy network in step S3. As a preferred technical solution, based on the head posture data and the target sound, performing a