EP-3834429-B1 - AUDIO DEVICE COMPRISING A BATTERY AND A MICROPHONE ARRAY

Inventors

GANESHKUMAR, ALAGANANDAN
CARRERAS, Ricardo

Dates

Publication Date: 20260506
Application Date: 20190802

Claims (11)

An audio device (212), comprising: - a battery power source (218) configured to power the audio device (212); - a plurality of microphones that are configurable into a microphone array (214) and that are adapted to receive sound from a sound field and create outputs; and - a processing system that is responsive to the outputs of the plurality of microphones and is configured to: - use a signal processing algorithm to detect speech in the outputs, the signal processing algorithm comprising a normally off adaptive beamformer that is configured to use multiple microphone outputs, said adaptive beamformer comprising a plurality of beamformer coefficients; - detect a predefined trigger event in one of the outputs, called single output, indicating a possible change in the sound field, said trigger event comprising an increase in noise in the sound field; and - wake-up, modify the signal processing algorithm upon the detection of the predefined trigger event, said modification comprising determining beamformer coefficients, store new beamformer coefficients and go back to sleep.
The audio device of claim 1, wherein a single microphone of the plurality of microphones is configured to monitor the sound field and create the single output, said single output being provided to a processor of the processing system.
The audio device (212) of claim 2, wherein the processor is a low-power digital sound processor (220) configured to periodically wake up, and to determine the noise level from the single output.
The audio device (212) of claim 3, wherein: - the processing system comprises a beamformer digital sound processor configured to modify the beamformer coefficients and to store them; and - the low-power digital sound processor (220) is configured, upon the detection of an increase of the noise level above the previous level, to: - wake-up the beamformer digital sound processor so that it modifies the beamformer coefficients and stores them, - let the beamformer digital sound processor go back to sleep after the storage of the coefficients.
The audio device (212) of claims 1 to 4, wherein the predefined trigger event further comprises the passing of a predetermined amount of time.
The audio device (212) of claim 5, wherein the predefined amount of time is variable.
The audio device (212) of claim 6, wherein a variation in the predefined amount of time is based on a sound field in the past.
The audio device (212) of claims 1 to 4, wherein the sound field is monitored in only select frequencies of the sound field.
The audio device (212) of claim 8, wherein, if the noise increases in the select frequencies, beamformer coefficients are calculated by the processing system.
The audio device (212) of claims 1 to 4, wherein the predefined trigger event comprises input from a sensor device.
The audio device (212) of claim 10, wherein the sensor device comprises a motion sensor and the input from the motion sensor is interpreted to detect motion of the audio device, wherein motion is based on GPS location.
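
As an informal illustration only, the control flow recited in claims 1 to 4 might be sketched as follows. Every name here (`LowPowerMonitor`, `BeamformerProcessor`, `level_db`, `margin_db`) is hypothetical, the 3 dB margin and the behaviour on the first wake-up are assumptions, and the coefficient computation is left as a caller-supplied placeholder. The claims do not prescribe any of these details.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


def level_db(frame: List[float]) -> float:
    """Mean power of one frame of samples, in dB (small floor avoids log(0))."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(power + 1e-12)


@dataclass
class BeamformerProcessor:
    """Hypothetical stand-in for the normally off beamformer processor."""
    awake: bool = False
    stored_coefficients: List[float] = field(default_factory=list)
    adaptations: int = 0

    def pre_adapt(self, compute: Callable[[], List[float]]) -> None:
        self.awake = True                     # wake up
        self.stored_coefficients = compute()  # determine and store coefficients
        self.adaptations += 1
        self.awake = False                    # go back to sleep


@dataclass
class LowPowerMonitor:
    """Hypothetical low-power processor watching a single microphone output."""
    beamformer: BeamformerProcessor
    margin_db: float = 3.0                # assumed margin, not from the source
    previous_db: Optional[float] = None   # no trigger on the very first wake-up

    def on_periodic_wake(self, frame: List[float],
                         compute: Callable[[], List[float]]) -> bool:
        current_db = level_db(frame)
        triggered = (self.previous_db is not None
                     and current_db > self.previous_db + self.margin_db)
        if triggered:
            self.beamformer.pre_adapt(compute)
        self.previous_db = current_db
        return triggered


# Usage: two quiet frames, then a louder one that acts as the trigger event.
quiet = [0.01] * 160
loud = [0.5] * 160
bf = BeamformerProcessor()
monitor = LowPowerMonitor(bf)
results = [monitor.on_periodic_wake(f, lambda: [0.5, 0.5])
           for f in (quiet, quiet, loud)]
```

In this sketch the beamformer processor is awake only for the duration of `pre_adapt`, which mirrors the wake, modify, store, and sleep sequence of the claims.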

Description

BACKGROUND

This disclosure relates to an audio device with a microphone. Audio devices that use one or more microphones to continuously monitor the sound field for a spoken wakeup word and spoken commands can use signal processing algorithms, such as beamformers, to increase spoken word detection rates in noisy environments. However, beamforming and other complex signal processing algorithms can use substantial amounts of power. For battery-operated audio devices, the resultant battery drain can become a use limitation. WO 2017/029044 A1 discloses a prior art audio device.

SUMMARY

The invention is defined in appended independent claim 1. The dependent claims thereof define preferred embodiments of the invention.

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a schematic block diagram of an audio device with pre-adaptation.
Figure 2 is a more detailed block diagram of an audio device with pre-adaptation.
Figure 3 is a representation of a user wearing headphones that comprise an audio device with pre-adaptation.

DETAILED DESCRIPTION

For devices with voice-controlled user interfaces (e.g., to activate a virtual personal assistant (VPA)), the device has to be constantly listening for the proper cue. In some such devices, a special word or phrase, which is sometimes called a "wakeup word," is used to activate the speech-recognition features of the device. The user often speaks command(s) following the wakeup word. In some examples, the present audio device with pre-adaptation utilizes one or more microphones to constantly listen for a wakeup word. The microphones and processors used to detect a wakeup word and spoken commands use power. In battery-operated devices, power use can shorten battery life and thus negatively impact the user experience.
However, devices need to detect wakeup words and spoken commands accurately, or the user experience will be degraded. For example, there may be false positives, where a device thinks a wakeup word or command has been spoken when it has not, or false negatives, where a device misses a wakeup word or command that has been spoken. Either can be problematic and annoying for the user. An adaptive algorithm, such as an adaptive beamformer, can be used to help detect a wakeup word and/or spoken commands in the presence of noise. Typical adaptive algorithms require a noise-only adaptation period to maximize the extraction of speech from a noisy environment. In noisy environments the optimal adaptation period can be in the range of 0.5 to 1 second. During the adaptation period the algorithm calculates updated beamformer filter coefficients that are used by the algorithm in the speech recognition process. Beamformer filter coefficients are well understood by those skilled in the technical field, and so will not be further described herein.
In order to adapt and then work well, beamformers require the user to pause after saying the wakeup word (e.g., "OK Google") so that the beamformer can adapt to the current noise conditions. Only after the adaptation should the user then speak a command. The pause should be sufficiently long for the beamformer to adapt. If the beamformer is always running, the adaptation can be run essentially continuously; this allows the beamformer to work well even without an extended pause after the wakeup word. However, in low-power audio devices (e.g., those that run on batteries), constantly running the beamformer so that it can be adapted and ready to detect voice results in reduced battery life.
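
The noise-only adaptation step described above can be illustrated with a small sketch. The disclosure does not name a particular beamformer, so the example below assumes a two-microphone MVDR (minimum variance distortionless response) design at a single frequency bin, chosen only because it is a common adaptive beamformer. The function names, the 16 kHz sample rate, the 256-sample hop, the diagonal loading value, and the synthetic noise scene are all assumptions made for illustration.

```python
import cmath
import random


def frames_in_period(period_s, sample_rate_hz=16000, hop=256):
    """Number of analysis frames that fit in an adaptation period."""
    return int(period_s * sample_rate_hz / hop)


def noise_covariance(snapshots, loading=1e-6):
    """Average x x^H over noise-only snapshots (2 mics), with diagonal loading."""
    n = len(snapshots)
    r00 = sum(abs(x0) ** 2 for x0, _ in snapshots) / n + loading
    r11 = sum(abs(x1) ** 2 for _, x1 in snapshots) / n + loading
    r01 = sum(x0 * x1.conjugate() for x0, x1 in snapshots) / n
    return r00, r01, r11  # r10 is the conjugate of r01


def mvdr_weights(cov, steering):
    """w = R^-1 d / (d^H R^-1 d), using the closed-form 2x2 inverse."""
    r00, r01, r11 = cov
    r10 = r01.conjugate()
    det = r00 * r11 - r01 * r10
    d0, d1 = steering
    u0 = (r11 * d0 - r01 * d1) / det    # first element of R^-1 d
    u1 = (-r10 * d0 + r00 * d1) / det   # second element of R^-1 d
    denom = d0.conjugate() * u0 + d1.conjugate() * u1  # d^H R^-1 d
    return u0 / denom, u1 / denom


def beamform(weights, x):
    """Beamformer output y = w^H x for one snapshot."""
    return weights[0].conjugate() * x[0] + weights[1].conjugate() * x[1]


def cgauss(std):
    return complex(random.gauss(0.0, std), random.gauss(0.0, std))


# Synthetic noise-only scene: one directional interferer plus sensor noise.
random.seed(0)
look = (1 + 0j, 1 + 0j)                 # assumed steering vector toward the talker
interferer = (1 + 0j, cmath.exp(2.0j))  # assumed steering vector of the noise
noise = []
for _ in range(frames_in_period(0.5)):  # 0.5 s adaptation period
    a = cgauss(1.0)
    noise.append((a * interferer[0] + cgauss(0.05),
                  a * interferer[1] + cgauss(0.05)))

weights = mvdr_weights(noise_covariance(noise), look)
gain_on_look = beamform(weights, look)  # distortionless constraint: close to 1
power_in = sum(abs(x[0]) ** 2 for x in noise) / len(noise)
power_out = sum(abs(beamform(weights, x)) ** 2 for x in noise) / len(noise)
```

Under these assumptions, the adapted weights pass the look direction with unit gain while reducing the noise power relative to a single microphone. A real device would repeat this per frequency bin and would need its own steering model.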
In order to both maintain battery life and have a well-adapted beamformer, the present disclosure contemplates adapting the beamformer when the environment within the expected sound detection range or sound field of the audio device has changed in some manner such that updated beamformer filter coefficients are possibly or likely required for the beamformer to work well. Such prospective beamformer adaptation may be termed "pre-adaptation." An environmental change that may be indicative of a possible change in the sound field (sometimes termed herein a "trigger event") can be detected and used to trigger a beamformer pre-adaptation. The types of trigger events detected are typically but not necessarily predefined. Pre-adaptation of the beamformer allows the beamformer to be normally off, and then turned on and adapted only as necessary, resulting in less power use and thus longer battery life. Pre-adaptation of beamformer filter coefficients will establish coefficients that are closer to the ideal coefficients for whenever the user speaks the wakeup word. Pre-adaptation thus can help the audio device to be better able to detect the wakeup word. Also, any time needed for the system to adapt to current noise conditions should be decreased, resulting in a shorter adaptation period before the system is ready to receive speech signals such as commands. Ideally, any needed adaptation period will be in the range of the normal pause a person