US-20260129361-A1 - AI-Based Multi-Band Loudspeaker Control


Abstract

In one embodiment, a method includes filtering an audio signal into multiple frequency bands including a high-frequency band and a low-frequency band; and converting the audio signal in the high-frequency band to a target sound pressure, and converting the audio signal in the low-frequency band to a target speaker displacement. The method further includes predicting, by a trained HF neural network and based on the target sound pressure corresponding to the audio signal in the high-frequency band, a first output voltage for playing the audio signal by a loudspeaker; predicting, by a trained LF neural network and based on the target speaker displacement corresponding to the audio signal in the low-frequency band, a second output voltage for playing the audio signal by the loudspeaker; and combining the first and second output voltages to obtain a final output voltage for playing the audio signal by the loudspeaker.
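The signal flow summarized in the abstract can be sketched as follows. This is a minimal, hypothetical illustration, not the disclosed implementation: the crossover is assumed to be a first-order complementary split (the source only says "a crossover filter"), the 850 Hz cutoff is an illustrative value, the conversions to target sound pressure and target displacement are assumed to be plain scalar gains, and `hf_net` / `lf_net` are placeholder callables standing in for the trained HF and LF neural networks.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical stand-in for a trained network: maps a target sequence to voltages.
Predictor = Callable[[List[float]], List[float]]


def one_pole_lowpass(x: Sequence[float], cutoff_hz: float, fs: float) -> List[float]:
    """First-order IIR low-pass filter (assumed crossover topology)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / fs)
    y: List[float] = []
    state = 0.0
    for sample in x:
        state += alpha * (sample - state)
        y.append(state)
    return y


def complementary_crossover(x: Sequence[float], cutoff_hz: float, fs: float):
    """Split x into LF and HF bands whose sample-wise sum equals x."""
    low = one_pole_lowpass(x, cutoff_hz, fs)
    high = [s - l for s, l in zip(x, low)]
    return low, high


def control_voltage(
    u: Sequence[float],
    fs: float,
    hf_net: Predictor,
    lf_net: Predictor,
    cutoff_hz: float = 850.0,        # hypothetical crossover frequency
    pressure_gain: float = 1.0,      # hypothetical HF band -> target sound pressure
    displacement_gain: float = 1.0,  # hypothetical LF band -> target displacement
) -> List[float]:
    low, high = complementary_crossover(u, cutoff_hz, fs)
    target_pressure = [pressure_gain * s for s in high]
    target_displacement = [displacement_gain * s for s in low]
    v_hf = hf_net(target_pressure)       # first output voltage
    v_lf = lf_net(target_displacement)   # second output voltage
    # Combine by summation, as in claims 3, 9, and 16.
    return [a + b for a, b in zip(v_hf, v_lf)]
```

With identity stand-ins and unit gains, `control_voltage(u, 48000.0, list, list)` returns the input unchanged because the two bands are complementary; in a real system the trained HF and LF networks would replace the identity functions. The complementary split was chosen here only because it guarantees that the bands sum back to the input.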

Inventors

Yuan Li
Pascal Brunet

Assignees

SAMSUNG ELECTRONICS CO., LTD.

Dates

Publication Date: 20260507
Application Date: 20241106

Claims (20)

1 . A method comprising: filtering an audio signal into a plurality of frequency bands comprising a high-frequency band and a low-frequency band; converting the audio signal in the high-frequency band to a target sound pressure; converting the audio signal in the low-frequency band to a target speaker displacement; predicting, by a trained HF neural network and based on the target sound pressure corresponding to the audio signal in the high-frequency band, a first output voltage for playing the audio signal by a loudspeaker; predicting, by a trained LF neural network and based on the target speaker displacement corresponding to the audio signal in the low-frequency band, a second output voltage for playing the audio signal by the loudspeaker; and combining the first and second output voltages to obtain a final output voltage for playing the audio signal by the loudspeaker.
2 . The method of claim 1 , wherein the audio signal is filtered into the high-frequency band and the low-frequency band by a crossover filter.
3 . The method of claim 1 , wherein combining the first and second output voltages to obtain a final output voltage comprises summing the first and second output voltages.
4 . The method of claim 1 , wherein the loudspeaker comprises a speaker of a smartphone.
5 . The method of claim 1 , wherein the loudspeaker comprises a speaker of a headphone.
6 . The method of claim 1 , wherein the trained HF neural network is trained to predict ground-truth control voltages from input, recorded sound pressures caused by those ground-truth control voltages.
7 . The method of claim 1 , wherein the trained LF neural network is trained to predict ground-truth control voltages from input, recorded speaker displacements caused by those ground-truth control voltages.
8 . One or more non-transitory computer readable storage media storing instructions that are operable when executed to: filter an audio signal into a plurality of frequency bands comprising a high-frequency band and a low-frequency band; convert the audio signal in the high-frequency band to a target sound pressure; convert the audio signal in the low-frequency band to a target speaker displacement; predict, by a trained HF neural network and based on the target sound pressure corresponding to the audio signal in the high-frequency band, a first output voltage for playing the audio signal by a loudspeaker; predict, by a trained LF neural network and based on the target speaker displacement corresponding to the audio signal in the low-frequency band, a second output voltage for playing the audio signal by the loudspeaker; and combine the first and second output voltages to obtain a final output voltage for playing the audio signal by the loudspeaker.
9 . The media of claim 8 , wherein combining the first and second output voltages to obtain a final output voltage comprises summing the first and second output voltages.
10 . The media of claim 8 , wherein the loudspeaker comprises a speaker of a smartphone.
11 . The media of claim 8 , wherein the loudspeaker comprises a speaker of a headphone.
12 . The media of claim 8 , wherein the trained HF neural network is trained to predict ground-truth control voltages from input, recorded sound pressures caused by those ground-truth control voltages.
13 . The media of claim 8 , wherein the trained LF neural network is trained to predict ground-truth control voltages from input, recorded speaker displacements caused by those ground-truth control voltages.
14 . A system comprising: one or more non-transitory computer readable storage media storing instructions; and one or more processors coupled to the one or more non-transitory computer readable storage media and operable to execute the instructions to: filter an audio signal into a plurality of frequency bands comprising a high-frequency band and a low-frequency band; convert the audio signal in the high-frequency band to a target sound pressure; convert the audio signal in the low-frequency band to a target speaker displacement; predict, by a trained HF neural network and based on the target sound pressure corresponding to the audio signal in the high-frequency band, a first output voltage for playing the audio signal by a loudspeaker; predict, by a trained LF neural network and based on the target speaker displacement corresponding to the audio signal in the low-frequency band, a second output voltage for playing the audio signal by the loudspeaker; and combine the first and second output voltages to obtain a final output voltage for playing the audio signal by the loudspeaker.
15 . The system of claim 14 , wherein the audio signal is filtered into the high-frequency band and the low-frequency band by a crossover filter.
16 . The system of claim 14 , wherein combining the first and second output voltages to obtain a final output voltage comprises summing the first and second output voltages.
17 . The system of claim 14 , wherein the loudspeaker comprises a speaker of a smartphone.
18 . The system of claim 14 , wherein the loudspeaker comprises a speaker of a headphone.
19 . The system of claim 14 , wherein the trained HF neural network is trained to predict ground-truth control voltages from input, recorded sound pressures caused by those ground-truth control voltages.
20 . The system of claim 14 , wherein the trained LF neural network is trained to predict ground-truth control voltages from input, recorded speaker displacements caused by those ground-truth control voltages.
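Claims 6, 7, 12, 13, 19, and 20 describe training each network as an inverse model: the input is a recorded response (sound pressure or displacement) and the target is the ground-truth control voltage that caused it. The sketch below shows only that data arrangement. As a loudly stated substitution, it replaces the neural network with a linear least-squares FIR regression, and the "loudspeaker" is a made-up linear response p[n] = v[n] + 0.5 v[n-1]; the function names are hypothetical.

```python
import numpy as np


def lagged_matrix(signal: np.ndarray, n_taps: int) -> np.ndarray:
    """Row n holds signal[n], signal[n-1], ..., signal[n-n_taps+1] (zeros before n=0)."""
    n = len(signal)
    X = np.zeros((n, n_taps))
    for k in range(n_taps):
        X[k:, k] = signal[: n - k]
    return X


def fit_inverse_model(recorded: np.ndarray, voltages: np.ndarray, n_taps: int = 32) -> np.ndarray:
    """Fit weights so that recorded responses predict the voltages that caused them.

    Linear stand-in for training the HF/LF networks (inputs: recorded
    responses; targets: ground-truth control voltages).
    """
    X = lagged_matrix(recorded, n_taps)
    weights, *_ = np.linalg.lstsq(X, voltages, rcond=None)
    return weights


def predict_voltage(target_response: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Predict the control voltage needed to produce a target response."""
    return lagged_matrix(target_response, len(weights)) @ weights


# Synthetic training data from a hypothetical linear "speaker".
rng = np.random.default_rng(0)
v_train = rng.standard_normal(4000)   # ground-truth control voltages
p_train = v_train.copy()
p_train[1:] += 0.5 * v_train[:-1]     # recorded response caused by those voltages
w = fit_inverse_model(p_train, v_train)
```

A real loudspeaker is nonlinear, which is why the source uses neural networks rather than a linear regression; the linear version only makes the input/target pairing of the training data easy to check.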

Description

TECHNICAL FIELD

This application generally relates to AI-based multi-band loudspeaker control.

BACKGROUND

A loudspeaker converts an electrical audio signal into a corresponding sound. Loudspeakers can be used for playing music, listening to audio content corresponding to video content (e.g., the audio of a TV show or a movie), etc. Loudspeakers can include one or more speakers in an entertainment system or one or more speakers integrated into another electronic device (e.g., speakers in a smartphone, tablet, personal computer, wearable device, or headphones such as earbuds). A loudspeaker includes a linear electric motor connected to a diaphragm. The loudspeaker uses voltage to move the diaphragm and thus create acoustic waves that produce sound. The exact relationship between the reproduced sound and the voltage used to drive the loudspeaker is complex, difficult to model, and specific to the loudspeaker and its enclosure. Furthermore, that relationship is nonlinear and time-varying, and it can be particularly complex for audio that spans a broad range of frequencies, from low bass to high treble.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an example method for determining the control voltage of a loudspeaker.
FIG. 2 illustrates speaker displacement and sound pressure as a function of frequency.
FIG. 3 illustrates an example implementation of the method of FIG. 1.
FIG. 4 illustrates an example computing system.

DESCRIPTION OF EXAMPLE EMBODIMENTS

Existing solutions for nonlinear control of loudspeakers are complex and difficult to implement and set up. Their precision is limited because they rely on incomplete physical models of an audio system, and such models do not fully capture the complexity of that system. The parameters of an audio system can be frequency-dependent, time-varying, and nonlinear, which makes them difficult to measure, model, and estimate.
This is particularly true for speakers designed to cover a broad range of frequencies, from low bass (e.g., 20 Hz) to high treble (e.g., 20 kHz), although it also holds for loudspeakers that play audio in a narrower frequency range. For example, the elastic properties (e.g., stiffness) of the surround (the flexible material that attaches the speaker diaphragm to the speaker basket) vary nonlinearly as a function of the diaphragm's excursion, and the stiffness of the surround affects the sound produced by the diaphragm in response to a control voltage. As another example, the efficiency of the loudspeaker motor (i.e., how well the motor converts input electrical power to mechanical power) also varies nonlinearly as a function of the diaphragm's excursion, and the motor's efficiency affects the sound produced by the loudspeaker in response to an input control voltage. As a third example, the inductance of the voice coil varies as a function of the input current, and that inductance affects the sound produced by the loudspeaker in response to an input control voltage. These are just a few examples of the complex, nonlinear behavior of a real loudspeaker, which make it difficult to precisely predict the sound a real loudspeaker outputs in response to an input voltage. The techniques of this disclosure account for such nonlinearities and other complexities by using parallel neural networks to determine a control voltage for a loudspeaker based on the input audio signal that the speaker will play.

FIG. 1 illustrates an example method for determining the control voltage of a loudspeaker. Step 110 of the example method of FIG. 1 includes filtering an audio signal into multiple frequency bands that include (1) a high-frequency band and (2) a low-frequency band. For example, a crossover filter may be used to filter an audio signal u(t) (an input voltage) into multiple frequency bands.
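To make the surround-stiffness example above concrete, the sketch below uses a polynomial stiffness K(x) = k0 + k1·x + k2·x², a common way of describing suspension nonlinearity. The model form and all coefficients are assumptions for illustration (arbitrary units); the source gives neither. The point it shows is that the restoring force is not proportional to excursion, so doubling the excursion does not simply double the force, and positive and negative excursions of equal size are not symmetric.

```python
def surround_stiffness(x: float, k0: float = 1.0, k1: float = 0.05, k2: float = 0.2) -> float:
    """Hypothetical polynomial stiffness K(x) = k0 + k1*x + k2*x**2 (arbitrary units)."""
    return k0 + k1 * x + k2 * x ** 2


def restoring_force(x: float) -> float:
    """Suspension restoring force F(x) = K(x) * x for excursion x (arbitrary units)."""
    return surround_stiffness(x) * x


f1 = restoring_force(1.0)    # 1.25
f2 = restoring_force(2.0)    # about 3.8, i.e. roughly 3x f1 rather than 2x
f_neg = restoring_force(-1.0)  # about -1.15, not -1.25: the response is asymmetric
```

Because of effects like this, a fixed linear gain from target displacement to voltage cannot hold across all excursions, which is the kind of behavior the trained networks are meant to capture.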
In particular embodiments, the frequency bands may be a low-frequency band and a high-frequency band. In particular embodiments, the low-frequency band may include frequencies up to at least 1 kHz, and the high-frequency band may include frequencies above 700 Hz; however, these values may be specific to the driver used in a particular loudspeaker (e.g., they may depend on the driver's size). As described more fully below, the multiple frequency bands may include more than two frequency bands. U.S. Pat. No. 11,356,773 describes an approach to determining loudspeaker control voltage by inputting speaker displacement values to a trained neural network. However, as illustrated in FIG. 2, the displacement of a full-range driver falls off rapidly above a certain frequency threshold, and that threshold depends on the particular parameters of the loudspeaker. For example, curve 210 of FIG. 2 illustrates driver displacement as a function of input frequency. In the example of FIG. 2, the displacement falls to about −50 dB at around 800 Hz, and frequencies above 1 kHz produce a displacement of only about 1 μm/V. Displacement values this small have small signal-to-noise (SNR)