US-12620407-B2 - Neural modeler of audio systems

US12620407B2US 12620407 B2US12620407 B2US 12620407B2US-12620407-B2

Abstract

A process is provided for training a neural network that digitally models an audio system. A sound source is utilized to electrically couple a test signal into an input of a reference audio system. The output of the reference audio system is collected into an audio interface coupled to a computer. A neural network is then trained using the test signal and the captured information to derive a set of weight vectors with appropriate values such that the overall output of the neural network converges towards an output representative of the reference audio system, and a signal in the time domain from a musical instrument is processed through the trained neural network with a latency under 20 milliseconds. A graphical user interface then outputs a graphical representation of the trained neural network, where the graphical representation visually displays at least one virtual control for interaction by a user.

Inventors

  • Douglas Andres Castro Borquez
  • Eero-Pekka Damskägg
  • Athanasios Gotsopoulos
  • Lauri Juvela
  • Thomas William Sherson

Assignees

  • Neural DSP Technologies Oy

Dates

Publication Date
2026-05-05
Application Date
2022-12-16

Claims (20)

  1. A process for training a neural network that digitally models an audio system, comprising: utilizing a sound source to electrically couple a test signal into an input of a reference audio system; collecting an output of the reference audio system into an audio interface coupled to a computer to store captured information; training a neural network using the test signal and the captured information to derive a set of weight vectors with appropriate values such that: the overall output of the neural network converges towards an output representative of the reference audio system; and a digital signal representing an analog musical instrument signal is processed through the trained neural network in the time domain with an algorithmic latency under 20 milliseconds; outputting to a graphical user interface, a graphical representation of the trained neural network, the graphical representation visually displaying at least one virtual control; and enabling a user to interact with the virtual control of the graphical representation of the trained neural network via the graphical user interface to define a virtualization of the reference audio system.
  2. The process of claim 1, wherein the training is carried out on the computer.
  3. The process of claim 1, wherein the training is carried out on a separate, remote computer.
  4. The process of claim 1, wherein training the neural network further comprises: setting a stopping condition that determines when training ends by performing at least one of: setting a user-initiated stopping condition; or processing a number of iterations of the training data as the stopping condition.
  5. The process of claim 1, wherein training the neural network further comprises: setting a stopping condition that determines when training ends by processing a perceptual loss function where the perceptual loss function serves as an indicator of the stopping condition.
  6. The process of claim 1, wherein: outputting to the graphical user interface, the graphical representation of the trained neural network further comprises outputting to the graphical user interface, a graphical representation of an effects processor that is not within a native capability of the reference audio system.
  7. The process of claim 6, wherein: outputting to the graphical user interface, the graphical representation of the effects processor that is not within the native capability of the reference audio system comprises outputting to the graphical user interface, a graphical representation of an equalizer for equalization, wherein the equalizer is not part of the reference audio system.
  8. The process of claim 6, wherein: outputting to the graphical user interface, the graphical representation of the effects processor that is not within the native capability of the reference audio system comprises outputting to the graphical user interface, a graphical representation of a dynamics processor, wherein the dynamics processor is not part of the reference audio system.
  9. The process of claim 1, wherein: outputting to the graphical user interface, the graphical representation of the effects processor that is not within the native capability of the reference audio system comprises outputting to the graphical user interface, a graphical representation of a time-based processor, wherein the time-based processor is not part of the reference audio system.
  10. A process for training a neural network that digitally models an audio system, comprising: utilizing a sound source to electrically couple a test signal into an input of a reference audio system; collecting an output of the reference audio system into an audio interface coupled to a computer to store captured information; training a neural network using the test signal and the captured information to derive a set of weight vectors with appropriate values such that the overall output of the neural network converges towards an output representative of the reference audio system, and a signal in the time domain from a musical instrument is processed through the trained neural network with a latency under 20 milliseconds; outputting to a graphical user interface, a graphical representation of the trained neural network, the graphical representation visually displaying at least one virtual control; and enabling a user to interact with the virtual control of the graphical representation of the trained neural network via the graphical user interface to define a virtualization of the reference audio system; wherein training the neural network further comprises modeling a non-linear behavior of the reference audio system, modeling a first linear aspect of the reference audio system, and modeling a second linear aspect of the reference audio system.
  11. The process of claim 1, wherein: outputting to the graphical user interface, the graphical representation of the trained neural network further comprises outputting to the graphical user interface, a graphical representation of at least one of an emulation of a speaker and an emulation of a speaker cabinet, wherein the emulation is not part of the reference audio system.
  12. The process of claim 1, wherein: training the neural network models a non-linear behavior and a linear aspect of the reference audio system.
  13. The process of claim 1, wherein: training the neural network further comprises creating a model file that includes sufficient data such that when read out and processed by a modeling audio system, a functioning model of the reference audio system is realized; wherein: the virtualization comprises a framework that enables the modeling audio system to model the reference audio system by loading the model file into the modeling audio system.
  14. A process for creating digital audio systems, comprising: utilizing a sound source to electrically couple a test signal into an input of a reference audio system; collecting an output of the reference audio system into an audio interface coupled to a computer to store captured information; training a neural network using the test signal and the captured information to derive a set of weight vectors with appropriate values such that the overall output of the neural network converges towards an output representative of the reference audio system, wherein the training digitally models a non-linear behavior of the reference audio system, models a first linear aspect of the reference audio system, and models a second linear aspect of the reference audio system, the training carried out by repeatedly performing operations comprising: predicting by the neural network, a model output based upon an input, where the output approximates an expected output of the reference audio system; computing an error in the prediction; and adjusting the weight vectors to minimize the computed error; outputting a neural network model file upon training; outputting to a graphical user interface, a graphical representation of the trained neural network in the neural network model file, the graphical representation visually displaying at least one virtual control; and enabling a user to interact with the virtual control of the graphical representation of the trained neural network via the graphical user interface to define a virtualization of the reference audio system.
  15. The process of claim 14, wherein predicting by the neural network, the model output comprises carrying out the prediction in the time domain.
  16. The process of claim 14, wherein: modeling the non-linear behavior of the reference audio system, modeling the first linear aspect of the reference audio system, and modeling the second linear aspect of the reference audio system are arranged in series, parallel, or a combination thereof.
  17. The process of claim 16, further comprising modeling a temporal dependency of the reference audio system in addition to, or in lieu of, modeling the first linear aspect of the reference audio system.
  18. The process of claim 14, wherein the training further comprises: applying a perceptual loss function to the neural network based upon a determined psychoacoustic property, wherein the perceptual loss function is applied in the frequency domain; and adjusting the neural network responsive to the output of the perceptual loss function.
  19. The process of claim 18, wherein applying the perceptual loss function to the neural network comprises establishing a loudness threshold of hearing for each of multiple frequency bins such that a signal below the loudness threshold is not optimized further, wherein: for each frequency bin, a loudness threshold is independently set under which a signal is not optimized further in order to optimize further that particular frequency bin.
  20. The process of claim 18, wherein applying the perceptual loss function to the neural network comprises: implementing frequency masking such that a frequency component is not further processed if a computed error is below a masking threshold, where the masking threshold is based upon a target signal.
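The repeated operations recited in claim 14 (predict a model output, compute the error in the prediction, adjust the weight vectors) describe ordinary gradient-descent training. A minimal sketch of that loop follows; the one-weight "neural model" and the soft-clipping stand-in for the reference audio system are invented for illustration and are not from the patent:

```python
import numpy as np

# Sketch of the claim-14 loop: predict, compute error, adjust weights.
# Both the "reference audio system" and the one-weight model below are
# hypothetical stand-ins for illustration.

rng = np.random.default_rng(0)

def reference_system(x):
    return np.tanh(2.0 * x)          # stand-in: a soft-clipping gain stage

test_signal = rng.uniform(-1.0, 1.0, 4096)   # test signal coupled into the system
target = reference_system(test_signal)       # captured reference output

w, lr = 0.1, 0.5                     # single trainable weight, learning rate
for _ in range(2000):
    pred = np.tanh(w * test_signal)                            # predict model output
    err = pred - target                                        # error in the prediction
    grad = np.mean(2.0 * err * (1.0 - pred**2) * test_signal)  # dMSE/dw
    w -= lr * grad                                             # adjust weight to minimize error

print(round(w, 2))  # converges to the reference gain, 2.0
```

Because the toy model family contains the reference exactly (w = 2 reproduces it), the loop converges to the true gain; a real neural model would converge only towards an output representative of the reference system, as the claims put it.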

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/738,512, filed Jan. 9, 2020, now allowed, which claims the benefit of U.S. Provisional Patent Application Ser. No. 62/941,986, filed Nov. 29, 2019, entitled NEURAL MODELER OF AUDIO SYSTEMS, the disclosure of which is hereby incorporated by reference.

BACKGROUND

Various aspects of the present disclosure relate generally to modeling audio systems, and more specifically to the use of artificial neural networks to model audio systems. Amplifier modeling is a process by which a physical amplifier, e.g., a guitar amplifier, is emulated by a modeling platform. Amplifier modeling can be implemented using analog circuitry, digital signal processing, or a combination thereof. In this regard, amplifier modeling provides a flexible way to emulate a large number of different physical amplifiers using a common platform of hardware and software.

BRIEF SUMMARY

According to aspects of the present disclosure, a process is provided for training a neural network that digitally models an audio system. The process comprises utilizing a sound source to electrically couple a test signal into an input of a reference audio system. The process also comprises collecting an output of the reference audio system into an audio interface coupled to a computer to store captured information. The process further comprises training a neural network using the test signal and the captured information. The training is carried out to derive a set of weight vectors with appropriate values such that the overall output of the neural network converges towards an output representative of the reference audio system, and such that a signal in the time domain from a musical instrument is processed through the trained neural network with a latency under 20 milliseconds.
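For context on the sub-20-millisecond latency figure above: the algorithmic latency of block-based audio processing is set by the buffer size divided by the sample rate. A quick back-of-envelope illustration, assuming a 48 kHz sample rate and common buffer sizes (the specific figures are assumptions, not values from the patent):

```python
# Algorithmic latency of one processing block: buffer_samples / sample_rate.
# 48 kHz and these buffer sizes are illustrative assumptions.
sample_rate_hz = 48_000

for buffer_samples in (64, 128, 256, 512):
    latency_ms = 1000.0 * buffer_samples / sample_rate_hz
    print(f"{buffer_samples:4d} samples -> {latency_ms:5.2f} ms")

# Even a 512-sample buffer (~10.67 ms) fits comfortably inside a 20 ms budget.
```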
Additionally, the process comprises outputting to a graphical user interface, a graphical representation of the trained neural network, the graphical representation visually displaying at least one virtual control. Further, the process comprises enabling a user to interact with the virtual control of the graphical representation of the trained neural network via the graphical user interface to define a virtualization of the reference audio system.

According to further aspects of the present disclosure, a process is provided for creating digital audio systems. The process comprises utilizing a sound source to electrically couple a test signal into an input of a reference audio system. The process also comprises collecting an output of the reference audio system into an audio interface coupled to a computer to store captured information. Additionally, the process comprises training a neural network using the test signal and the captured information to derive a set of weight vectors with appropriate values such that the overall output of the neural network converges towards an output representative of the reference audio system. Here, the training digitally models a non-linear behavior of the reference audio system, models a first linear aspect of the reference audio system, and models a second linear aspect of the reference audio system. Moreover, the training is carried out by repeatedly performing operations comprising predicting by the neural network, a model output based upon an input, where the output approximates an expected output of the reference audio system, determining an error, and adjusting the weight vectors to minimize the error. The process still further comprises outputting a neural model file upon training, and outputting to a graphical user interface, a graphical representation of the trained neural network in the neural model file, the graphical representation visually displaying at least one virtual control.
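One classic topology consistent with modeling "a first linear aspect, a non-linear behavior, and a second linear aspect" in series is a Wiener-Hammerstein-style cascade (linear filter, static nonlinearity, linear filter). The sketch below is illustrative only; the FIR taps, the tanh nonlinearity, and the drive factor are invented for the example, and the patent also allows parallel or combined arrangements:

```python
import numpy as np

# Illustrative Wiener-Hammerstein-style cascade: first linear stage,
# non-linear stage, second linear stage, in series. All coefficients
# are hypothetical.

h_pre = np.array([0.6, 0.3, 0.1])    # first linear aspect (e.g., input filtering)
h_post = np.array([0.8, 0.2])        # second linear aspect (e.g., output filtering)

def model(x):
    x = np.convolve(x, h_pre, mode="full")[: len(x)]   # first linear stage
    x = np.tanh(3.0 * x)                               # non-linear behavior
    x = np.convolve(x, h_post, mode="full")[: len(x)]  # second linear stage
    return x

# 10 ms of a 440 Hz sine at 48 kHz as a stand-in instrument signal.
y = model(np.sin(2 * np.pi * 440 * np.arange(480) / 48_000))
print(y.shape)  # (480,)
```

In a trained model the stages would be learned jointly from the test signal and the captured reference output rather than hand-specified as here.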
The process also comprises enabling a user to interact with the virtual control of the graphical representation of the trained neural network via the graphical user interface to define a virtualization of the reference audio system.

According to additional aspects of the present disclosure, a process for creating digital audio systems is provided. The process comprises training a neural network that digitally models a reference audio system. Training the neural network is carried out by repeatedly performing predicting, evaluating, and updating operations. The prediction operation comprises predicting by the neural network, a model output based upon an input, where the output approximates an expected output of the reference audio system. Here, the prediction is carried out in the time domain. The evaluation operation comprises applying a perceptual loss function to the neural network based upon a determined psychoacoustic property, where the perceptual loss function is applied in the frequency domain. The update operation comprises adjusting the neural network responsive to the output of the perceptual loss function, e.g., changing at least one parameter o
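A frequency-domain perceptual loss with a masking threshold derived from the target signal, in the spirit of claims 18-20, might be sketched as follows. The magnitude-spectrum formulation and the mask_ratio parameter are assumptions made for illustration, not the patent's specific psychoacoustic model:

```python
import numpy as np

# Hedged sketch of a frequency-domain loss with per-bin masking:
# bins whose error falls below a threshold derived from the target
# spectrum contribute nothing further to the loss, a crude stand-in
# for the frequency masking of claim 20. mask_ratio is hypothetical.

def perceptual_loss(pred, target, mask_ratio=0.01):
    P = np.abs(np.fft.rfft(pred))              # prediction magnitude spectrum
    T = np.abs(np.fft.rfft(target))            # target magnitude spectrum
    err = np.abs(P - T)                        # per-bin spectral error
    threshold = mask_ratio * T                 # masking threshold from target signal
    err = np.where(err < threshold, 0.0, err)  # masked bins not optimized further
    return float(np.mean(err**2))

t = np.arange(1024) / 48_000
target = np.sin(2 * np.pi * 440 * t)
print(perceptual_loss(target, target))         # 0.0: a perfect match is fully masked
print(perceptual_loss(0.9 * target, target) > 0.0)  # True: audible error remains
```

A production implementation would typically set per-bin thresholds from a psychoacoustic model (absolute threshold of hearing, masking curves) rather than a flat ratio of the target magnitude.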