US-12621604-B1 - Reduction of loudspeaker distortion


Abstract

A system configured to reduce loudspeaker distortion by performing nonlinear signal processing is provided. A device may include preprocessing component(s) that apply nonlinear signal correction prior to sending a playback audio signal to a driver in order to compensate for a nonlinear response of the driver. While the driver response may be nonlinear, a combination of the preprocessing and the nonlinear driver response results in a combined response that is linear and/or compensates for the nonlinear driver response. For example, applying the nonlinear driver response to a processed audio signal may result in output audio generated by the driver accurately reproducing the playback audio signal input to the preprocessing components. To train the preprocessing components to apply the nonlinear signal correction, a deep neural network (DNN) is trained to model the driver response.
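The idea that preprocessing combined with a nonlinear driver response can yield a linear overall response may be illustrated with a toy example. The following is a minimal sketch only, assuming a hypothetical memoryless saturation curve for the driver (`driver_response`, with an arbitrary `DRIVE` constant) whose inverse is known in closed form. The disclosure itself describes training the preprocessing with a DNN model of the driver rather than using an analytic inverse.

```python
import math

DRIVE = 2.0  # hypothetical strength of the toy nonlinearity (assumption)


def driver_response(x: float) -> float:
    """Toy nonlinear driver: soft saturation, scaled so that 1.0 maps to 1.0."""
    return math.tanh(DRIVE * x) / math.tanh(DRIVE)


def preprocess(x: float) -> float:
    """Analytic inverse of the toy driver, valid for inputs in [-1, 1]."""
    return math.atanh(x * math.tanh(DRIVE)) / DRIVE


if __name__ == "__main__":
    for x in (-0.9, -0.5, 0.0, 0.5, 0.9):
        raw = driver_response(x)                   # driver alone distorts the signal
        corrected = driver_response(preprocess(x)) # preprocessing + driver
        print(f"in={x:+.2f} driver_only={raw:+.4f} combined={corrected:+.4f}")
```

In this sketch the driver alone maps 0.5 to roughly 0.79, while the combined chain returns the input value to within floating-point error.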

Inventors

Guillermo Daniel Garcia
Shobha Devi Kuruba Buchannagari
Carlo Murgia

Assignees

AMAZON TECHNOLOGIES, INC.

Dates

Publication Date: 20260505
Application Date: 20230224

Claims (19)

1. A computer-implemented method comprising: receiving, by an electronic device, first audio data representing a signal to be output by a loudspeaker of the electronic device; determining first data representing a first amplitude of the first audio data; determining, using the first data, a first model, and first weight values, second audio data, the first model trained using a driver response of the loudspeaker; determining, using the second audio data and a second model, third audio data, the second model corresponding to simulation of output audio generated by the loudspeaker, wherein the third audio data represents the signal and distortion caused by the driver response of the loudspeaker; determining fourth audio data using the first audio data and the third audio data, the fourth audio data representing the distortion; and determining second weight values associated with the first model by minimizing a first difference between the third audio data and the first audio data.
2. The computer-implemented method of claim 1, further comprising: receiving, by the electronic device, fifth audio data; determining, using the fifth audio data, the first model, and the second weight values, sixth audio data; and generating, by the loudspeaker, first output audio using the sixth audio data.
3. The computer-implemented method of claim 1, further comprising: generating, by the loudspeaker, first output audio using the second audio data; determining second data representing a second amplitude of the first audio data; determining, using the second data, the first model, and the second weight values, fifth audio data; and generating, by the loudspeaker, second output audio using the fifth audio data.
4. The computer-implemented method of claim 1, wherein the first weight values and the second audio data correspond to a first range of frequencies, the computer-implemented method further comprising: determining second data representing a second amplitude of the first audio data; determining, using the second data, the first model, and third weight values that are different than the first weight values, fifth audio data, wherein the third weight values and the fifth audio data correspond to a second range of frequencies; determining, using the fifth audio data and the second model, sixth audio data; and determining, using the first audio data and the sixth audio data, fourth weight values associated with the first model, the fourth weight values corresponding to the second range of frequencies.
5. The computer-implemented method of claim 1, wherein determining the third audio data further comprises: determining a first portion of the second audio data, the first portion of the second audio data corresponding to a first range of frequencies; determining a first portion of the third audio data using the second model, third weight values, and the first portion of the second audio data, wherein the third weight values correspond to a first portion of the driver response that is associated with the first range of frequencies; determining a second portion of the second audio data, the second portion of the second audio data corresponding to a second range of frequencies; and determining a second portion of the third audio data using the second model, fourth weight values, and the second portion of the second audio data, wherein the fourth weight values correspond to a second portion of the driver response that is associated with the second range of frequencies.
6. The computer-implemented method of claim 1, further comprising: determining first loudness data representing first sound pressure levels of the first audio data; determining second loudness data representing second sound pressure levels of the third audio data; determining a first function corresponding to minimizing a second difference between the second loudness data and the first loudness data; determining a second function corresponding to maximizing the second sound pressure levels; and determining a cost function using the first function and the second function, wherein the cost function includes a first association between the first function and a first value and a second association between the second function and a second value and the second weight values are determined using the cost function.
7. The computer-implemented method of claim 1, wherein determining the second audio data further comprises: determining, using the first data, second data corresponding to temperature of the loudspeaker; generating, using first parameter values, first gain data by reducing a second amplitude of the second data; generating fifth audio data using the first audio data and the first gain data; and determining the second audio data using the first model, the fifth audio data, and the first weight values, wherein the computer-implemented method further comprises: determining, using the fourth audio data, second parameter values.
8. The computer-implemented method of claim 7, further comprising: receiving, by the electronic device, sixth audio data; determining, using the sixth audio data, third data corresponding to the temperature of the loudspeaker; generating, using the second parameter values, second gain data by reducing a third amplitude of the third data; generating seventh audio data using the sixth audio data and the second gain data; determining eighth audio data using the first model, the seventh audio data, and the second weight values; and generating, by the loudspeaker, first output audio using the eighth audio data.
9. The computer-implemented method of claim 1, wherein determining the second audio data further comprises: determining, using the first data, second data corresponding to excursion of a membrane of the loudspeaker; generating, using first parameter values, first gain data by reducing a second amplitude of the second data; generating fifth audio data using the first audio data and the first gain data; and determining the second audio data using the first model, the fifth audio data, and the first weight values, wherein the computer-implemented method further comprises: determining, using the fourth audio data, second parameter values.
10. The computer-implemented method of claim 9, further comprising: receiving, by the electronic device, sixth audio data; determining, using the sixth audio data, third data corresponding to the excursion of the membrane; generating, using the second parameter values, second gain data by reducing a third amplitude of the third data; generating seventh audio data using the sixth audio data and the second gain data; determining eighth audio data using the first model, the seventh audio data, and the second weight values; and generating, by the loudspeaker, first output audio using the eighth audio data.
11. A system comprising: at least one processor; and memory including instructions operable to be executed by the at least one processor to cause the system to: receive, by an electronic device, first audio data representing a signal to be output by a loudspeaker of the electronic device; determine, based on a first amplitude of the first audio data, first data corresponding to temperature of the loudspeaker; generate, using a first component and first parameter values, first gain data by reducing a second amplitude of the first data; generate second audio data using the first audio data and the first gain data; determine, based on a third amplitude of the second audio data, second data corresponding to excursion of a membrane of the loudspeaker; generate, using a second component and second parameter values, second gain data by reducing a fourth amplitude of the second data; generate third audio data using the second audio data and the second gain data; determine, using a first model and the third audio data, fourth audio data, the first model corresponding to simulation of output audio generated by the loudspeaker, wherein the fourth audio data represents the signal and distortion caused by a driver response of the loudspeaker; determine fifth audio data using the first audio data and the fourth audio data, the fifth audio data representing the distortion; and determine, using the fifth audio data, third parameter values associated with the first component by minimizing a first difference between the fourth audio data and the first audio data.
12. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using the fifth audio data, fourth parameter values associated with the second component; receive, by the electronic device, sixth audio data; determine seventh audio data using the first component, the sixth audio data, and the third parameter values; determine eighth audio data using the second component, the seventh audio data, and the fourth parameter values; and generate, by the loudspeaker, first output audio using the eighth audio data.
13. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using the fifth audio data, fourth parameter values associated with the second component; generate, by the loudspeaker, first output audio using the third audio data, the first output audio corresponding to a first portion of the first audio data; determine sixth audio data using the first component, a second portion of the first audio data, and the third parameter values; determine seventh audio data using the second component, the sixth audio data, and the fourth parameter values; and generate, by the loudspeaker, second output audio using the seventh audio data.
14. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine a first portion of the third audio data, the first portion of the third audio data corresponding to a first range of frequencies; determine, using the first model, first values, and the first portion of the third audio data, a first portion of the fourth audio data, wherein the first values correspond to a first portion of a driver response of the loudspeaker; determine a second portion of the third audio data, the second portion of the third audio data corresponding to a second range of frequencies; and determine, using the first model, second values, and the second portion of the third audio data, a second portion of the fourth audio data, wherein the second values correspond to a second portion of the driver response.
15. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine first loudness data representing first sound pressure levels of the first audio data; determine second loudness data representing second sound pressure levels of the fourth audio data; determine a first function corresponding to minimizing a second difference between the second loudness data and the first loudness data; determine a second function corresponding to maximizing the second sound pressure levels; and determine a cost function using the first function and the second function, wherein the cost function includes a first association between the first function and a first value and a second association between the second function and a second value and the third parameter values are determined using the cost function.
16. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: generate sixth audio data using the second audio data and the second gain data; determine, using a second model, the sixth audio data, and first values, the third audio data, the second model trained using a driver response of the loudspeaker; and determine, using the fifth audio data, second values associated with the second model.
17. The system of claim 16, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using the fifth audio data, fourth parameter values associated with the second component; receive, by the electronic device, sixth audio data; determine seventh audio data using the first component, the sixth audio data, and the third parameter values; determine eighth audio data using the second component, the seventh audio data, and the fourth parameter values; determine ninth audio data using the second model, the eighth audio data, and the second values; and generate, by the loudspeaker, first output audio using the ninth audio data.
18. The system of claim 11, wherein the memory further comprises instructions that, when executed by the at least one processor, further cause the system to: determine, using the first amplitude of the first audio data, a first average voltage level associated with a portion of the first audio data; determine, using a thermal model associated with the first component, a first temperature value of the loudspeaker corresponding to the first average voltage level, the first data including the first temperature value; determine, using the third amplitude of the second audio data, a second voltage level associated with a portion of the second audio data; and determine, using an excursion model associated with the second component, a first excursion value of the membrane corresponding to the second voltage level, the second data including the first excursion value.
19. A computer-implemented method comprising: receiving, by an electronic device, first audio data; determining first data representing a first amplitude of the first audio data; determining, using the first data, a first model, and first weight values, second audio data, the first model trained using a driver response of a loudspeaker of the electronic device; determining, using the second audio data and a second model, third audio data, the second model corresponding to simulation of output audio generated by the loudspeaker; determining fourth audio data using the first audio data and the third audio data; determining, using the fourth audio data, second weight values associated with the first model; receiving, by the electronic device, fifth audio data; determining, using the fifth audio data, the first model, and the second weight values, sixth audio data; and generating, by the loudspeaker, first output audio using the sixth audio data.
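By way of illustration only, the weight update recited in claim 1 (pass the signal through a first model, pass the result through a second model that simulates the loudspeaker, and choose new weights that minimize the difference from the original signal) may be sketched as follows. Everything in this sketch is an assumption made for readability: the one-weight polynomial `preprocess` stands in for the first model, the fixed cubic `simulate_driver` stands in for the second model, samples are fed directly rather than via separate amplitude data, and a finite-difference gradient descent replaces whatever optimizer an actual embodiment would use.

```python
from typing import Sequence

# Hypothetical test signal: evenly spaced sample values in [-0.8, 0.8].
SIGNAL = [-0.8 + 0.05 * i for i in range(33)]


def simulate_driver(x: float) -> float:
    """Stand-in for the second model: fixed simulation with cubic compression."""
    return x - 0.2 * x ** 3


def preprocess(x: float, w: float) -> float:
    """Stand-in for the first model: a one-weight polynomial predistorter."""
    return x + w * x ** 3


def loss(w: float, signal: Sequence[float]) -> float:
    """Mean squared difference between simulated output and original signal."""
    total = 0.0
    for x in signal:
        distortion = simulate_driver(preprocess(x, w)) - x
        total += distortion ** 2
    return total / len(signal)


def train(signal: Sequence[float], w: float = 0.0, lr: float = 5.0,
          steps: int = 200, eps: float = 1e-6) -> float:
    """Gradient descent on the single weight using a central-difference gradient."""
    for _ in range(steps):
        grad = (loss(w + eps, signal) - loss(w - eps, signal)) / (2 * eps)
        w -= lr * grad
    return w


if __name__ == "__main__":
    w_new = train(SIGNAL)
    print("loss before:", loss(0.0, SIGNAL))
    print("loss after: ", loss(w_new, SIGNAL), "with weight", w_new)
```

The updated weight would then be used for subsequent audio, in the manner of claims 2 and 19. A single polynomial weight cannot cancel the toy distortion exactly, so the residual loss is reduced rather than eliminated.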

Description

CROSS-REFERENCE TO RELATED APPLICATION DATA

This application claims priority to U.S. Provisional Patent Application Ser. No. 63/336,541, entitled “Reduction of Loudspeaker Distortion,” filed on Apr. 29, 2022, in the names of Guillermo Daniel Garcia, et al. The above provisional application is herein incorporated by reference in its entirety.

BACKGROUND

With the advancement of technology, the use and popularity of electronic devices have increased considerably. Electronic devices are commonly used to receive audio data and generate output audio based on the received audio data. Described herein are technological improvements to such systems.

BRIEF DESCRIPTION OF DRAWINGS

For a more complete understanding of the present disclosure, reference is now made to the following description taken in conjunction with the accompanying drawings.

FIG. 1 illustrates a system for reducing loudspeaker distortion by applying nonlinear signal correction according to embodiments of the present disclosure.

FIG. 2 illustrates an example component diagram for applying nonlinear signal correction using a neural network according to embodiments of the present disclosure.

FIG. 3 illustrates an example component diagram for applying nonlinear signal correction using a thermal compressor and an excursion limiter according to embodiments of the present disclosure.

FIG. 4 illustrates an example component diagram for applying nonlinear signal correction using a combination of a thermal compressor, an excursion limiter, and a neural network according to embodiments of the present disclosure.

FIGS. 5A-5C illustrate examples of applying nonlinear signal correction while generating output audio according to embodiments of the present disclosure.

FIG. 6 illustrates an example component diagram for adaptively applying nonlinear signal correction while generating output audio according to embodiments of the present disclosure.

FIGS. 7A-7B illustrate examples of an excursion limiter and a joint voltage-excursion limiter according to embodiments of the present disclosure.

FIG. 8 is a block diagram conceptually illustrating example components of a system for managing temperature and excursion effects of a loudspeaker according to embodiments of the present disclosure.

DETAILED DESCRIPTION

Electronic devices such as smart loudspeakers, cellular telephones, tablets, laptop computers, and other such devices, are becoming smaller and/or more portable. As the sizes of these devices shrink, the sizes of audio-output devices (i.e., loudspeakers) associated with the devices also shrink. As the sizes of the loudspeakers shrink, however, the quality of the audio output by the loudspeakers decreases, especially low-frequency audio output (i.e., bass).

The loudspeakers may be constructed using a frame, magnet, voice coil, and diaphragm (e.g., semi-rigid membrane). Electrical current moves through the voice coil, which causes a magnetic force to be applied to the voice coil; this force causes the membrane attached to the voice coil to move in accordance with the electrical current and thereby emit audible sound waves. The movement of the diaphragm is referred to herein as excursion. The membrane may, however, have a maximum excursion that, when reached, causes the sound output to be distorted. In addition, as the current in the loudspeaker flows through the voice coil, some of its energy is converted into heat instead of sound. If the temperature becomes too high, this heating can damage the voice coil. Equalization, filtering, or similar pre-processing may be used to limit the excursion and/or temperature and thereby prevent or minimize the distortion and/or damage.
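As a rough illustration of pre-processing that limits excursion, the following minimal sketch scales down any sample whose estimated excursion would exceed a maximum. The static model (`mm_per_volt`) and the limit (`max_excursion_mm`) are hypothetical values chosen only for the example, and a practical limiter would typically smooth the gain over time rather than compute it per sample.

```python
from typing import List, Sequence


def limiter_gain(estimate: float, limit: float) -> float:
    """Return a gain in (0, 1] that scales an over-limit estimate back to the limit."""
    if estimate <= limit:
        return 1.0
    return limit / estimate


def limit_excursion(samples: Sequence[float],
                    mm_per_volt: float = 0.5,
                    max_excursion_mm: float = 0.4) -> List[float]:
    """Apply a per-sample gain so the estimated excursion stays within the limit."""
    limited = []
    for v in samples:
        excursion = mm_per_volt * abs(v)  # hypothetical static excursion model
        limited.append(v * limiter_gain(excursion, max_excursion_mm))
    return limited


if __name__ == "__main__":
    print(limit_excursion([0.2, 1.0, -2.0]))
```

The same gain helper could be driven by an estimated temperature instead of an estimated excursion.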
However, to protect the loudspeaker across all related factors, such as loudspeaker variations, operating conditions, and audio signals, the filtering is conservative, such that under typical conditions the loudspeaker does not operate at its optimal output.

Electronic devices may be used to receive audio data and generate audio corresponding to the audio data. For example, an electronic device may receive audio data from various audio sources (e.g., content providers) and may generate the audio using loudspeakers. The audio data may have large level changes (e.g., large changes in volume) within a song, from one song to another song, between different voices during a conversation, from one content provider to another content provider, and/or the like. For example, a first portion of the audio data may correspond to music and may have a high volume level (e.g., extremely loud volume), whereas a second portion of the audio data may correspond to a talk radio station and may have a low volume level (e.g., quiet volume). These high volume levels may cause excursion beyond an upper limit (i.e., over-excursion) and/or temperature beyond an upper limit, which may cause distortion in the output audio. To improve a user experience and reduce driver distortion, devices, systems and methods are disclosed that p