EP-4371309-B1 - ENVIRONMENTAL SOUND LOUDSPEAKER

Inventors

OOMEN, Paulus
DE KLERK, Leendert

Dates

Publication Date: 2026-05-06
Application Date: 2022-07-14

Claims (15)

An environmental sound loudspeaker (100), comprising: a loudspeaker driver (102); a first microphone pair (104₁, 104₃), the first microphone pair comprising a first microphone (104₁) and a second microphone (104₃) being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of the loudspeaker driver; and a signal processor (110) configured to: - receive a first input signal (113₂) from the first microphone and a second input signal (113₁) from the second microphone, each input signal representing a recorded sound; - determine an output signal (129) based on the first and second input signals; and - provide the output signal to the loudspeaker driver; wherein the determination of the output signal comprises: - inverting (114₁) the first input signal and combining (116₁) the inverted first input signal (115₁) with the second input signal into a combined signal (117₁); and - selectively amplifying (122) the combined signal and/or the first and second input signals to obtain a high-fidelity signal of environmental sounds captured by the first and/or second microphones for frequencies in an audible frequency range, the selectively amplifying comprising attenuating signals with a frequency higher than a first transition frequency and boosting signals with a frequency lower than a second transition frequency, the first and second transition frequencies being based on the distance d between the first and second microphones.
The environmental sound loudspeaker as claimed in claim 1, further comprising: one or more additional microphone pairs, each additional microphone pair comprising a first additional microphone and a second additional microphone positioned the distance d apart, the first and second additional microphones in each additional microphone pair being positioned diametrically opposite each other relative to the centre of the loudspeaker driver, the first microphone pair and the one or more additional microphone pairs being arranged symmetrically around the centre of the loudspeaker driver; wherein the signal processor is further configured to, for each of the one or more additional microphone pairs, receive a first additional input signal from the first additional microphone and a second additional input signal from the second additional microphone; wherein the determination of the output signal further comprises, for each additional microphone pair: - inverting the first additional input signal and combining the inverted first additional input signal with the second additional input signal into a combined additional signal; - applying a phase shift to the combined additional signal, the phase shift being based on an angle between an axis between the first and second microphones and an additional axis between the first and second additional microphones; and - combining the phase-shifted additional signal with the combined signal; and wherein the second transition frequency is further based on the number of microphone pairs.
The environmental sound loudspeaker as claimed in claim 2, wherein the first microphone pair and the one or more additional microphone pairs are equally distributed on a circle, a centre of the circle coinciding with the centre of the loudspeaker driver, the phase shift $\Delta\varphi_i$ for the i-th additional microphone pair being equal to $\Delta\varphi_i = i \times 360^\circ / N$, with $N$ the number of microphones, preferably the environmental sound loudspeaker comprising exactly one additional microphone pair placed orthogonally to the first microphone pair and the phase shift being equal to 90°; or wherein the first microphone pair and the one or more additional microphone pairs are equally distributed on a sphere, a centre of the sphere coinciding with the centre of the loudspeaker driver, preferably the environmental sound loudspeaker comprising exactly two additional microphone pairs, the first microphone pair and the two additional microphone pairs being placed on axes of a Cartesian coordinate system with an origin in the centre of the loudspeaker driver and the phase shift being equal to 90°.
The environmental sound loudspeaker as claimed in any one of the preceding claims, wherein the audible frequency range comprises all frequencies between 20 Hz - 15 kHz, preferably all frequencies between 15 Hz - 20 kHz.
The environmental sound loudspeaker as claimed in any one of the preceding claims, wherein the microphones are omnidirectional microphones.
The environmental sound loudspeaker as claimed in any one of the preceding claims, further comprising an acoustic module for sound manipulation, the sound manipulation preferably comprising adding reverberation and/or virtual acoustics to a signal provided to the acoustic module, and wherein the determination of the output signal further comprises the acoustic module modifying the output signal.
The environmental sound loudspeaker as claimed in any one of the preceding claims, further comprising an external signal input for receiving an external input signal, the external input signal encoding a sound, and wherein the determination of the output signal further comprises combining the external input signal with the output signal.
The environmental sound loudspeaker as claimed in any one of the preceding claims, wherein the selectively amplifying comprises attenuating signals with a frequency higher than the first transition frequency with -3 dB for the first microphone pair and, optionally, for each doubling of the number of microphone pairs; and/or wherein the selectively amplifying comprises boosting signals with a frequency lower than the second transition frequency with +6 dB per octave; and/or wherein the first transition frequency $f_{t,1}$ is defined by $f_{t,1} = \frac{v}{2d}$, and/or wherein the second transition frequency $f_{t,2}$ is approximately equal to $f_{t,2} = 0.4\,\frac{v}{N d}$, wherein $v$ denotes the speed of sound and $N$ denotes the number of microphones.
The environmental sound loudspeaker as claimed in any one of the preceding claims, wherein the selectively amplifying comprises applying a series of low-shelf filters, preferably the series of low-shelf filters being defined by a first transfer function $H_{\mathrm{low},n}(f) = 1 + G_0 \frac{f_n}{f / B^{Q} + f_n}$, wherein $G_0$ denotes a gain factor preferably equal to $G_0 = 1$, $B$ denotes a variable bandwidth preferably defined by $B = \frac{f_n}{f}$, $Q$ denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to $Q = 5$, and wherein $f_n$ denotes a central frequency of the n-th low-shelf filter, preferably $f_n$ being determined by $f_n = \frac{1}{2^n} \frac{v}{2 N d}$, wherein $v$ denotes the speed of sound, and wherein $N$ denotes the number of microphones; and/or wherein the selectively amplifying comprises applying a high-shelf filter, preferably the high-shelf filter being defined by a second transfer function $H_{\mathrm{high}}(f) = 1 - G_\infty + G_\infty \frac{f_h}{f / B^{Q} + f_h}$, wherein $G_\infty$ denotes a gain factor preferably equal to $G_\infty = 1 - \frac{1}{\sqrt{N}}$, wherein $N$ denotes the number of microphones, $B$ denotes a variable bandwidth preferably defined by $B = \frac{f_h}{f}$, $Q$ denotes a Q-factor determining the slope of the gain curve, the Q-factor preferably being equal to $Q = 5$, and wherein $f_h$ denotes a central frequency of the high-shelf filter, preferably $f_h$ being determined by $f_h = \frac{v}{2\sqrt{2}\,d}$, wherein $v$ denotes the speed of sound.
The environmental sound loudspeaker as claimed in any one of the preceding claims, wherein applying a phase shift $\Delta\varphi$ to a signal comprises: - creating a first copy and a second copy of the signal; - applying a Hilbert transform to the first copy to apply a 90° phase shift; - amplifying the first copy with a first factor $a$, and the second copy with a second factor $b$; and - combining the first and second copies and amplifying the combined copies with a third factor $c$; - wherein the factors $a$, $b$, and $c$ are selected such that $\Delta\varphi = \arctan(a/b)$ and $c = 1/\sqrt{a^2 + b^2}$; or wherein applying a phase shift $\Delta\varphi$ to a signal comprises: - creating a first copy and a second copy of the signal; - applying a first frequency-dependent phase shift $\theta_A(f)$ to the first copy using one or more first all-pass filters with associated first corner frequencies $f_{0,A}^{(i)}$ and first quality factors $Q_A^{(i)}$, preferably the first frequency-dependent phase shift $\theta_A(f)$ being given by $\theta_A(f) = -4 \sum_{i=1}^{n} \arctan\left(\frac{f}{f_{0,A}^{(i)} Q_A^{(i)}}\right)$; - applying a second frequency-dependent phase shift $\theta_B(f)$ to the second copy using one or more second all-pass filters with associated second corner frequencies $f_{0,B}^{(i)}$ and second quality factors $Q_B^{(i)}$, preferably the second frequency-dependent phase shift $\theta_B(f)$ being given by $\theta_B(f) = -4 \sum_{i=1}^{n} \arctan\left(\frac{f}{f_{0,B}^{(i)} Q_B^{(i)}}\right)$; - and taking a difference between the first and second phase-shifted copies; - wherein the first and second corner frequencies and/or the first and second quality factors are optimised such that $\Delta\varphi \approx \theta_A(f) - \theta_B(f)$ for all $f$ in the audible frequency range.
A method for recording, processing and immediately replaying sounds, the method comprising: - receiving a first input signal from a first microphone and a second input signal from a second microphone, each input signal representing a recorded sound, the first microphone and the second microphone forming a first microphone pair, the first microphone and the second microphone being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of a loudspeaker driver; - determining an output signal based on the first and second input signals; - optionally, manipulating the output signal, the manipulation preferably comprising adding reverberation and/or virtual acoustics to the output signal; and - providing the, optionally manipulated, output signal to the loudspeaker driver; wherein the determination of the output signal comprises: - inverting the first input signal and combining the inverted first input signal with the second input signal into a combined signal; and - selectively amplifying the combined signal and/or the first and second input signals to obtain a high-fidelity signal of environmental sounds captured by the first and/or second microphones for frequencies in an audible frequency range, the selectively amplifying comprising attenuating signals with a frequency higher than a first transition frequency and boosting signals with a frequency lower than a second transition frequency, the first and second transition frequencies being based on the distance d between the first and second microphones.
The method as claimed in claim 11, further comprising: receiving a first additional input signal from a first additional microphone and a second additional input signal from a second additional microphone from each of one or more additional microphone pairs, each additional microphone pair comprising a first additional microphone and a second additional microphone positioned the distance d apart, the first and second additional microphones in each additional microphone pair being positioned diametrically opposite each other relative to the centre of the loudspeaker driver, the first microphone pair and the one or more additional microphone pairs being arranged symmetrically around the centre of the loudspeaker driver; wherein the determination of the output signal further comprises, for each additional microphone pair: - inverting the first additional input signal and combining the inverted first additional input signal with the second additional input signal into a combined additional signal; - applying a phase shift to the combined additional signal, the phase shift being based on an angle between an axis between the first and second microphones and an additional axis between the first and second additional microphones; and - combining the phase-shifted additional signal with the combined signal; and wherein the second transition frequency is further based on the number of microphone pairs.
A computer comprising a computer readable storage medium having computer readable program code embodied therewith, and a processor, preferably a microprocessor, coupled to a loudspeaker driver (102) and a first microphone pair (104₁, 104₃), the first microphone pair comprising a first microphone (104₁) and a second microphone (104₃) being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of the loudspeaker driver, whereby the processor is further coupled to the computer readable storage medium, wherein responsive to executing the computer readable program code, the processor is configured to perform the method as claimed in claim 11 or 12.
A computer program or suite of computer programs comprising at least one software code portion or a computer program product storing at least one software code portion, the software code portion, when run on a computer system, being configured for executing the method as claimed in claim 11 or 12, whereby the computer system is coupled to a loudspeaker driver (102) and a first microphone pair (104₁, 104₃), the first microphone pair comprising a first microphone (104₁) and a second microphone (104₃) being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of the loudspeaker driver.
A non-transitory computer-readable storage medium storing at least one software code portion, the software code portion, when executed or processed by a computer, is configured to perform the method as claimed in claim 11 or 12, whereby the computer is coupled to a loudspeaker driver (102) and a first microphone pair (104₁, 104₃), the first microphone pair comprising a first microphone (104₁) and a second microphone (104₃) being positioned a distance d apart, the first microphone and the second microphone being positioned diametrically opposite each other and equidistant relative to a centre of the loudspeaker driver.
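
For illustration only, and not forming part of the claims, the inverting-and-combining step and the Hilbert-transform variant of the phase shift recited above may be sketched in Python as follows. The sketch assumes an on-axis plane wave, a speed of sound of 343 m/s, a microphone spacing d = 0.1 m and a 48 kHz sample rate, and it takes the first transition frequency as v/(2d). All function names are hypothetical, and the FFT-based Hilbert transform is merely one possible realisation that the source does not prescribe. The factors are chosen as a = sin Δφ and b = cos Δφ, which satisfies Δφ = arctan(a/b) and makes the normalising factor c equal to 1.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def first_transition_frequency(d, v=SPEED_OF_SOUND):
    """f_t1 = v / (2 d): half a wavelength spans the microphone spacing d."""
    return v / (2.0 * d)


def combine_pair(first, second):
    """Invert the first input signal and combine it with the second one."""
    return second + (-first)


def hilbert_90(x):
    """90 degree phase shift using an FFT-based Hilbert transform."""
    n = len(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(np.fft.fft(x) * weights)
    return analytic.imag                  # imaginary part is the Hilbert transform


def phase_shift(x, delta_phi_deg):
    """Combine a 90-degree-shifted copy and a plain copy with factors a, b, c."""
    a = np.sin(np.radians(delta_phi_deg))  # factor for the Hilbert-transformed copy
    b = np.cos(np.radians(delta_phi_deg))  # factor for the unshifted copy
    c = 1.0 / np.sqrt(a * a + b * b)       # normalisation; 1 for this choice of a, b
    return c * (a * hilbert_90(x) + b * x)


# Usage: an on-axis plane wave at f_t1 reaching two microphones d apart.
fs = 48_000.0
t = np.arange(4800) / fs                   # 0.1 s of samples
d = 0.1                                    # assumed spacing in metres
f_t1 = first_transition_frequency(d)       # about 1715 Hz for d = 0.1 m
delay = d / (2.0 * SPEED_OF_SOUND)         # travel time from the centre to a microphone
first = np.cos(2 * np.pi * f_t1 * (t + delay))
second = np.cos(2 * np.pi * f_t1 * (t - delay))
combined = combine_pair(first, second)     # peak amplitude close to 2 at f_t1

# A 1 kHz cosine shifted by 90 degrees becomes a 1 kHz sine (a phase lag).
tone = np.cos(2 * np.pi * 1000.0 * t)
shifted = phase_shift(tone, 90.0)
```

In this sketch the combined pair signal is doubled at the first transition frequency, where half a wavelength equals the spacing d, which is consistent with the claims placing a transition frequency at v/(2d). The tone in the phase-shift example completes a whole number of cycles within the block, so the FFT-based transform is exact there; a real-time implementation would need a causal filter approximation instead.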

Description

FIELD OF THE INVENTION

This disclosure relates to systems and methods for capturing environmental sounds and immediately replaying the captured sounds.

BACKGROUND

Applications for sound in Virtual and Augmented Reality (VR/AR) generally aim to provide a lifelike experience of a virtual environment and/or an acoustically augmented environment. They typically simulate events - taking place in the virtual and/or acoustically augmented environment - that a subject can interact with.

A common problem that such applications face is that the subject does not hear oneself (properly) reflected in the virtual environment, i.e., the real sounds produced by the subject, e.g., by means of voice and/or body movements, do not sound as if they take place in the virtual and/or augmented environment. Additionally, especially with regard to Mixed and/or Augmented Reality applications (XR/AR), one may also want to hear other environmental sounds, e.g., sounds produced by other (human) subjects and/or any other sound sources in the real environment that are reflected in the virtual environment. Thus, although the impression of a virtual environment may be convincingly simulated, as long as the subject does not hear oneself and the real environment reflected in the virtual and/or augmented environment, the simulation is perceptually incoherent. As a result, the experience is not lifelike and less physically and/or emotionally engaging than is desirable.

Many virtual, mixed, and augmented reality applications use a so-called closed system, wherein sounds are typically delivered to a user using headphones. In a closed system, capturing the sounds produced by the subject and/or by any other sources in the environment may involve a prohibitive number of microphones and/or sensors placed on the subject(s) and/or throughout the environment to accurately process the audio and spatial/movement data.
The data delivery to the subject would then further involve full simulation of each and every sound source in the real environment to be able to provide a convincing experience of each sound source in the virtual environment. This may require prior knowledge about the real environment and/or the sound sources present in the environment and/or the type of events occurring in the environment. Such a simulation may require a prohibitive amount of real-time data processing and may require data that is often impossible to obtain either beforehand or in real time.

An example of a closed system is known from US 2013/0236040 A1. This document discloses a system to combine environmental sounds and augmented reality (AR) sounds. The system comprises headphones with speakers on the inside (directed towards the user's ears) and microphones on the outside, positioned close to the speakers but acoustically insulated from the speakers. The microphones capture ambient sounds, which may be processed (e.g. enhanced or suppressed, depending on the sound and the circumstances) before being fed to the respective speakers.

It may be understood that the complexity of the problem of delivering a convincing audio experience, and thus of a closed system, increases exponentially when one considers applications that involve a larger number of subjects sharing the same physical environment. As an example, one may consider the effects of acoustically enhanced environments on various interacting groups of human subjects, e.g., a group of people discussing during an assembly meeting or an audience attending a live concert; as well as non-human subjects, e.g., a swarm of bees moving across a meadow of flowers, birds communicating with each other across trees or distributed plant growth in an open field.

Alternatively, one could consider an open delivery to all subjects at once, e.g., by means of adding loudspeakers to the environment.
This eliminates the necessity to capture data from each individual subject and/or sound source in the environment, as, instead, sound is captured on the level of the environment as a whole. However, a drawback of such an open system is the feedback that occurs when microphones are placed in the same environment as the loudspeakers and the captured environmental sound is played back in the same environment at a high gain in real time. This is especially the case when omnidirectional microphones are used. Although omnidirectional microphones are particularly suitable to capture sound on the level of the environment as a whole, they are known to be particularly prone to producing feedback when played back in the environment in real time. Consequently, omnidirectional microphones are not commonly used in the design of such systems.

US 5,524,059 A and US 4,837,829 A disclose acoustic sound systems addressing the feedback problem by applying a phase shift to one microphone signal of a pair of microphone signals.

Hence, there is a need in the art for a device that accurately captures environmental sound, i.e., the sound produced by subject(s) and any o