US-12621598-B2 - Automatic keyword pass-through system

Abstract

At least one embodiment is directed to a method for automatically activating ambient sound pass-through in an earphone in response to a detected keyword in the ambient sound field of the earphone user, the steps of the method comprising at least receiving at least one ambient sound microphone (ASM) signal; receiving at least one audio content (AC) signal; and comparing the ASM signal to a keyword and if the ASM signal matches a keyword then an AC gain is created.
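The comparison step can be illustrated with a minimal sketch. It borrows the spectral-pattern criterion recited in claim 15 (a match within a threshold average value, e.g., +/−3 dB, of a stored word pattern) and reads that criterion as a mean absolute per-band difference, which is an assumption. The function names, the band layout, and the gain values are hypothetical and do not come from the patent.

```python
import math


def band_levels_db(power_spectrum, eps=1e-12):
    """Convert per-band power values to decibel levels."""
    return [10.0 * math.log10(max(p, eps)) for p in power_spectrum]


def matches_keyword(observed_db, stored_db, threshold_db=3.0):
    """Return True when the mean absolute per-band difference between the
    observed ASM pattern and the stored keyword pattern is within threshold_db.

    Assumption: "threshold average value" is read as a mean absolute
    deviation in dB; the patent does not spell out the exact metric.
    """
    if len(observed_db) != len(stored_db):
        raise ValueError("patterns must cover the same frequency bands")
    total = sum(abs(o - s) for o, s in zip(observed_db, stored_db))
    return total / len(stored_db) <= threshold_db


def ac_gain_for(observed_db, stored_db, ducked_gain=0.2, normal_gain=1.0):
    """Create an AC gain: ducked on a keyword match, unchanged otherwise.

    The gain values are illustrative placeholders.
    """
    if matches_keyword(observed_db, stored_db):
        return ducked_gain
    return normal_gain
```

A caller would compute `band_levels_db` on each ASM frame, pass the result with a stored keyword pattern to `ac_gain_for`, and apply the returned gain to the audio content signal.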

Inventors

John Usher

Assignees

ST FAMTECH, LLC

Dates

Publication Date: 2026-05-05
Application Date: 2022-04-29

Claims (18)

1. A method for modifying audio content and ambient passthrough in response to a keyword comprising: receiving an ear canal microphone (ECM) signal, wherein the ECM is part of an earphone; receiving an ambient sound microphone (ASM) signal, wherein the ASM is part of an earphone; receiving a second ambient sound microphone (ASM2) signal; receiving an audio content input (AC) signal; comparing the ECM signal to the ASM signal to detect a user's voice activity, and if the user's voice activity is detected then analyzing the ASM signal to detect a keyword; verifying the detection of the keyword using the ASM2 signal; increasing a previous ASM gain to generate a new ASM gain if the keyword is detected; reducing a previous AC gain to generate a new AC gain if the keyword is detected; applying the new ASM gain to a modified ASM signal to generate a new modified ASM signal; applying the new AC gain to a modified AC signal to generate a new modified AC signal; mixing the new modified ASM signal and the new modified AC signal to generate a mixed signal; sending the mixed signal to a speaker of the earphone; and maintaining the new AC gain and the new ASM gain following cessation of the user's voice activity until a pre-fade delay has expired, wherein the pre-fade delay is less than 10 seconds, and wherein after the pre-fade delay the new AC gain is set to the previous AC gain and the new AC gain is set to the previous AC gain.
2. The method according to claim 1 further including: sending the new modified ASM signal to a speaker.
3. The method according to claim 1, wherein the new modified AC signal has less volume compared to the previous modified AC signal when emitted from the speaker.
4. The method according to claim 3 wherein the volume is 0.
5. The method according to claim 1 wherein the new modified ASM signal has a greater volume compared to the previous modified ASM signal when emitted from the speaker.
6. The method according to claim 1 wherein the keyword is a word spoken by the user.
7. The method according to claim 1, wherein the new AC gain is frequency dependent.
8. The method according to claim 1, wherein the new ASM gain is frequency dependent.
9. The method according to claim 1, wherein the keyword is composed of more than one word.
10. The method according to claim 9, wherein the keyword is a voice command.
11. The method according to claim 10, wherein the voice command is to call a phone number.
12. The method according to claim 11, wherein the voice command includes the phone number to call.
13. The method according to claim 12, further comprising: calling the phone number.
14. The method according to claim 1, further comprising: automatically initiating a phone call when a keyword matches a predetermined word or phrase.
15. The method of claim 1, further comprising: identifying the keyword if the spectral pattern of the keyword match within a threshold average value (e.g., +/−3 dB) of a stored word spectral pattern.
16. The method of claim 15, wherein the spectral pattern is a spectrogram.
17. The method of claim 15 wherein the spectral patterns are patterns limited to a previous determined frequency range.
18. The method of claim 1, wherein the keyword is considered as detected when the keyword is uttered by a user wearing the earphone otherwise the keyword is considered as not detected.
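The gain handling recited in claim 1 (raise the ASM gain and lower the AC gain on keyword detection, hold both until a pre-fade delay of less than 10 seconds has elapsed after the user's voice activity ceases, then restore the previous gains, and mix the two scaled signals) can be sketched as a small state machine. This is an illustrative sketch only: the class name, the default gain values, the frame-based timing, and the immediate (non-ramped) restore are assumptions, and voice-activity and keyword detection are taken as inputs rather than implemented.

```python
from dataclasses import dataclass


@dataclass
class PassThroughController:
    # All default values are hypothetical placeholders.
    prev_asm_gain: float = 0.1      # ASM gain before keyword detection
    prev_ac_gain: float = 1.0       # AC gain before keyword detection
    boosted_asm_gain: float = 1.0   # "new ASM gain" after detection
    ducked_ac_gain: float = 0.2     # "new AC gain" after detection
    pre_fade_delay_s: float = 5.0   # claim 1: less than 10 seconds

    def __post_init__(self):
        if not 0.0 <= self.pre_fade_delay_s < 10.0:
            raise ValueError("pre-fade delay must be in [0, 10) seconds")
        self.asm_gain = self.prev_asm_gain
        self.ac_gain = self.prev_ac_gain
        self._active = False   # True while the new gains are being held
        self._silence_s = 0.0  # time since the user's voice activity ceased

    def update(self, keyword_detected, voice_active, frame_s):
        """Advance by one audio frame and return (asm_gain, ac_gain)."""
        if keyword_detected:
            self._active = True
            self._silence_s = 0.0
            self.asm_gain = self.boosted_asm_gain
            self.ac_gain = self.ducked_ac_gain
        elif self._active:
            if voice_active:
                self._silence_s = 0.0  # user still talking: keep holding
            else:
                self._silence_s += frame_s
                if self._silence_s >= self.pre_fade_delay_s:
                    # Delay expired: restore the previous gains.
                    self._active = False
                    self.asm_gain = self.prev_asm_gain
                    self.ac_gain = self.prev_ac_gain
        return self.asm_gain, self.ac_gain

    def mix(self, asm_frame, ac_frame):
        """Mix gain-scaled ASM and AC frames sample by sample."""
        return [self.asm_gain * a + self.ac_gain * c
                for a, c in zip(asm_frame, ac_frame)]
```

In use, `update` would be called once per frame with the detector outputs, and `mix` would produce the samples sent to the earphone speaker. A practical implementation would likely ramp the gains rather than switch them instantly.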

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation in part of and claims priority benefit to U.S. patent application Ser. No. 17/172,065, filed 9 Feb. 2021, which is a continuation of and claims priority benefit to U.S. patent application Ser. No. 16/555,824, filed 29 Aug. 2019, which is a continuation of and claims priority to U.S. patent application Ser. No. 16/168,752, filed 23 Oct. 2018, which is a non-provisional of and claims priority to U.S. Patent Application Ser. No. 62/575,713 filed 23 Oct. 2017, the disclosures of which are herein incorporated by reference in their entirety.

FIELD OF THE INVENTION

The present invention relates to acoustic keyword detection and pass-through and, in particular though not exclusively, to devices that can be acoustically controlled or interacted with.

BACKGROUND OF THE INVENTION

Sound isolating (SI) earphones and headsets are becoming increasingly popular for music listening and voice communication. SI earphones enable the user to hear and experience an incoming audio content signal (be it speech from a phone call or music audio from a music player) clearly in loud ambient noise environments, by attenuating the level of ambient sound in the user's ear canal.

The disadvantage of such SI earphones/headsets is that the user is acoustically detached from their local sound environment, and communication with people in their immediate environment is therefore impaired. If a second individual in the SI earphone user's ambient environment wishes to talk with the SI earphone wearer, the second individual must often shout loudly in close proximity to the SI earphone wearer, or otherwise attract the attention of said SI earphone wearer, e.g., by being in visual range. Such a process can be time-consuming, dangerous or difficult in critical situations. A need therefore exists for a “hands-free” mode of operation to enable an SI earphone wearer to detect when a second individual in their environment wishes to communicate with them.
WO2007085307 describes a system for directing ambient sound through an earphone by non-electronic means via a channel, and using a switch to select whether the channel is open or closed. Application US 2011/0206217 A1 describes a system to electronically direct ambient sound to a loudspeaker in an earphone, and to disable this ambient sound pass-through during a phone call. US 2008/0260180 A1 describes an earphone with an ear-canal microphone and ambient sound microphone to detect user voice activity. U.S. Pat. No. 7,672,845 B2 describes a method and system to monitor speech and detect keywords or phrases in the speech, such as, for example, monitored calls in a call center or speakers/presenters using teleprompters. US 2007/0189544 describes a method to detect a characteristic form in an ambient signal and perform a volume reduction of the playing media audio signal for a time delay before checking for a characteristic form again. U.S. Pat. No. 8,150,044 describes adjusting audio sent to an ear canal based on a detected target sound.

But the above art does not describe a method to automatically pass through ambient sound to an SI earphone wearer when a keyword is spoken to the SI earphone wearer, nor using two microphones to detect a user's voice.

BRIEF DESCRIPTION OF THE DRAWINGS

Exemplary embodiments of the present invention will become more fully understood from the detailed description and the accompanying drawings, wherein:

FIG. 1 illustrates an audio hardware system;
FIG. 2 illustrates a method for mixing ambient sound microphone with audio content;
FIG. 3 illustrates a method for keyword detection to adjust audio gain; and
FIG. 4 illustrates a method for keyword detection to make a phone call.

DETAILED DESCRIPTION OF EXEMPLARY EMBODIMENTS

The following description of exemplary embodiment(s) is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
At least one embodiment is directed to a system for detecting a keyword spoken in a sound environment and alerting a user to the spoken keyword. One embodiment is an earphone system: an earphone typically occludes the earphone user's ear, reducing the ambient sound level in the user's ear canal. An audio content signal reproduced in the earphone by a loudspeaker, e.g., incoming speech audio or music, further reduces the earphone user's ability to understand, detect or otherwise experience keywords in their environment, e.g., the earphone user's name as vocalized by someone who is in the user's close proximity. At least one ambient sound microphone, e.g., located on the earphone or a mobile computing device, directs ambient sound to a keyword analysis system, e.g., an automatic speech recognition system. When the keyword analysis system detects a keyword, the system directs sound from an ambient sound microphone to the earphone loudspeaker and (optionally) reduces the level of audio content reproduced on the earphone loudspeaker, thereby allowing the earphone wearer to