US-12626723-B2 - System and method of determining auditory context information

Abstract

A device includes a memory configured to store a captured audio input signal and one or more processors configured to process the captured audio input signal to determine auditory context information within the captured audio input signal. The one or more processors are configured to determine an audio quality enhancement level to be applied to the captured audio input signal based on the determined auditory context information, and perform audio quality enhancement on the captured audio input signal based on the determined audio quality enhancement level, wherein the audio quality enhancement level is dynamically adjusted during the storing of the captured audio input signal according to the determined auditory context information.

Inventors

Te-Won Lee
Khaled Helmi El-Maleh
Heejong Yoo
Jongwon Shin

Assignees

QUALCOMM INCORPORATED

Dates

Publication Date: 20260512
Application Date: 20210511

Claims (20)

1 . A device comprising: a first processor configured to: process, while a second processor, in the device, is in an idle mode, a captured audio input signal, based on the characteristics of an audio input signal, to determine auditory context information is a keyword within the captured audio input signal; the second processor configured to: transition from the idle mode to an active audio logging mode in response to the keyword detected, by the first processor; determine additional auditory contextual information in the captured audio signal after the keyword is detected; determine an audio quality enhancement level to be applied to the captured audio input signal based on determined additional auditory context information after the keyword is detected; dynamically adjust the audio quality enhancement level in accordance to the determined additional auditory context information; and perform audio quality enhancement on the captured audio input signal is based on the dynamically adjusted audio quality enhancement level.
2 . The device of claim 1 , wherein the determined identified additional auditory context information after the keyword is detected comprises an activity of a user of the device.
3 . The device of claim 1 , wherein the determined identified additional auditory context information after the keyword is detected comprises an environment of a user of the device.
4 . The device of claim 1 , wherein the determined identified additional auditory context information after the keyword is detected comprises an activity of a user of the device and an environment of a user of the device.
5 . The device of claim 2 , wherein the activity of the user of the device is determined based on: i) speech parameters, ii) music signal parameters, or iii) both speech parameters and music signal parameters.
6 . The device of claim 2 , wherein the determined identified additional auditory context information after the keyword is detected comprises that the activity of the user of the device is running or walking.
7 . The device of claim 3 , wherein the determined identified additional auditory context information after the keyword is detected comprises that the environment of a user of the device is an office, car, restaurant, subway, or ball park.
8 . The device of claim 1 , wherein the dynamically adjusted audio quality enhancement level is dynamically adjusted from a lower level of audio quality enhancement to a higher level of audio quality enhancement in response to the determined additional auditory context information after the keyword is detected.
9 . The device of claim 1 , wherein the dynamically adjusted audio quality enhancement level, after the keyword is detected, is based on the indicated change in level of background noise.
10 . The device of claim 1 , comprising a plurality of microphones including at least a first microphone and a second microphone, the device configured to capture the audio input signal with one or both of the first and second microphones and wherein a number of active microphones is dynamically adjusted in accordance to the determined additional auditory context information after the keyword is detected.
11 . The device of claim 10 , wherein the number of active microphones is dynamically adjusted in accordance to the determined additional auditory context information comprises an increase in the number of active microphones after the keyword is detected.
12 . The device of claim 10 , wherein the number of active microphones is dynamically adjusted in accordance to the determined additional auditory context information comprises a decrease in the number of active microphones after the keyword is detected.
13 . The device of claim 1 , wherein the second processor is configured to perform the audio quality enhancement on the captured audio input signal after the keyword is detected based on one of a plurality of audio quality enhancement levels including a no-enhancement level, a low-enhancement level, and a high-enhancement level.
14 . The device of claim 1 , wherein the second processor is configured to dynamically select audio compression parameters for compression of the captured audio input signal in accordance with the determined additional auditory context information after the keyword is detected.
15 . The device of claim 14 , wherein the second processor is configured to dynamically adjust audio compression parameters compression level between a plurality of compression levels.
16 . The device of claim 1 , wherein the second processor is configured to dynamically adjust a coding format for the captured audio input signal in accordance with the determined additional auditory context information.
17 . The device of claim 1 , wherein the step of perform audio quality enhancement on the captured audio input signal includes one of: i) acoustic echo cancellation, ii) active noise cancellation, iii) noise suppression, iv) acoustic gain control, v) acoustic volume control, and vi) acoustic dynamic range control.
18 . The device of claim 1 , wherein the determined additional auditory context information after the keyword is detected indicates that a background noise of the captured audio input signal has changed from a stationary type to a non-stationary type.
19 . The device of claim 1 , wherein the determined additional auditory context information after the keyword is detected indicates that a background noise level has changed.
20 . The device of claim 1 , wherein the second processor is configured to log, to a buffer in a memory, the additional auditory contextual information after the keyword is detected.
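The control flow recited in claims 1, 13, 18, and 19 can be illustrated with a minimal sketch. It is not the patented implementation: the `Frame` fields, the decibel thresholds, and the rule that maps non-stationary noise to the high-enhancement level are all assumptions chosen for illustration. The sketch only shows the idle-to-active transition on keyword detection and the per-frame re-selection of an enhancement level afterwards.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    ACTIVE_LOGGING = "active_logging"


class Enhancement(Enum):
    NONE = 0   # no-enhancement level (claim 13)
    LOW = 1    # low-enhancement level
    HIGH = 2   # high-enhancement level


@dataclass
class Frame:
    # All three fields are hypothetical stand-ins for real signal analysis.
    has_keyword: bool        # result of the first processor's keyword detector
    noise_level_db: float    # estimated background noise level
    noise_stationary: bool   # True if the background noise is stationary


def select_enhancement(frame, low_db=40.0, high_db=65.0):
    """Pick an enhancement level from context; thresholds are assumed values."""
    if not frame.noise_stationary or frame.noise_level_db >= high_db:
        return Enhancement.HIGH
    if frame.noise_level_db >= low_db:
        return Enhancement.LOW
    return Enhancement.NONE


def run(frames):
    """Return the final mode and the enhancement level chosen for each logged frame."""
    mode = Mode.IDLE
    levels = []
    for frame in frames:
        if mode is Mode.IDLE:
            # Only keyword detection runs while the second processor is idle.
            if frame.has_keyword:
                mode = Mode.ACTIVE_LOGGING
            continue
        # Active logging: the level is re-evaluated on every frame,
        # which is what makes the adjustment dynamic.
        levels.append(select_enhancement(frame))
    return mode, levels
```

In this sketch, frames that arrive before the keyword are never enhanced or logged, and a rise in noise level or a change to non-stationary noise raises the level on the next frame.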

Description

RELATED APPLICATIONS

This application is a continuation of, and claims priority to, U.S. Non-Provisional application Ser. No. 14/802,088, filed Jul. 17, 2015, entitled “SYSTEM AND METHOD OF SMART AUDIO LOGGING FOR MOBILE DEVICES”, which claims the benefit of priority to U.S. Non-Provisional application Ser. No. 13/076,242, filed Mar. 30, 2011, entitled “SYSTEM AND METHOD OF SMART AUDIO LOGGING FOR MOBILE DEVICES”, which claims the benefit of priority to U.S. Provisional Application No. 61/322,176, entitled “SMART AUDIO LOGGING”, filed Apr. 8, 2010, and assigned to the assignee hereof and hereby expressly incorporated by reference herein.

BACKGROUND

I. Field

The present disclosure generally relates to audio and speech signal capturing. More specifically, the disclosure relates to mobile devices capable of initiating and/or terminating audio and speech signal capturing operations, or interchangeably logging operations, based on the analysis of audio context information.

II. Description of Related Art

Thanks to advances in power control technology for Application Specific Integrated Circuits (ASICs) and the increased computational power of mobile processors such as Digital Signal Processors (DSPs) or microprocessors, an increasing number of mobile devices are now capable of enabling much more complex features that were not regarded as feasible until recently due to the lack of required computational power or hardware (HW) support. For example, mobile stations (MS) or mobile phones were initially developed to enable voice or speech communication over traditional circuit-based wireless cellular networks. Thus, the MS was originally designed to address fundamental voice applications like voice compression, acoustic echo cancellation (AEC), noise suppression (NS), and voice recording.
The process of implementing a voice compression algorithm is known as vocoding and the implementing apparatus is known as a vocoder or “speech coder.” Several standardized vocoding algorithms exist in support of the different digital communication systems which require speech communication. The 3rd Generation Partnership Project 2 (3GPP2) is an example standardization organization which specifies Code Division Multiple Access (CDMA) technology such as IS-95, CDMA2000 1× Radio Transmission Technology (1×RTT), and CDMA2000 Evolution-Data Optimized (EV-DO) communication systems. The 3rd Generation Partnership Project (3GPP) is another example standardization organization which specifies the Global System for Mobile Communications (GSM), Universal Mobile Telecommunications System (UMTS), High-Speed Downlink Packet Access (HSDPA), High-Speed Uplink Packet Access (HSUPA), High-Speed Packet Access Evolution (HSPA+), and Long Term Evolution (LTE). The Voice over Internet Protocol (VoIP) is an example protocol used in the communication systems defined in 3GPP and 3GPP2, as well as others. Examples of vocoders employed in such communication systems and protocols include International Telecommunications Union (ITU)-T G.729, Adaptive Multi-Rate (AMR) codec, and Enhanced Variable Rate Codec (EVRC) speech service options 3, 68, and 70.

Voice recording is an application to record human voice. Voice recording is often referred to as voice logging or voice memory interchangeably. Voice recording allows users to save some portion of a speech signal picked up by one or more microphones into a memory space. The saved voice recording can be played later in the same device or it can be transmitted to a different device through a voice communication system. Although voice recorders can record some music signals, the quality of recorded music is typically not superb because the voice recorder is optimized for speech characteristics uttered by a human vocal tract.
Audio recording or audio logging is sometimes used interchangeably with voice recording but it is sometimes understood as a different application to record any audible sound including human voice, instruments and music because of its ability to capture higher frequency signals than that generated by the human vocal tract. In the context of the present application, “audio logging” or “audio recording” terminology will be broadly used to refer to voice recording or audio recording. Audio logging enables recording of all or some portions of an audio signal of interest which are typically picked up by one or more microphones in one or more mobile devices. Audio logging is sometimes referred to as audio recording or audio memo interchangeably.

SUMMARY

This document describes a device configured to process a digital audio signal for a device. The device includes a memory configured to store a captured audio input signal and one or more processors configured to process the captured audio signal to determine auditory context information within the captured audio input signal. The one or more processors are configured to determine an audio quality enhancement level to be applied to the captured audio input signal b