US-12621623-B2 - Locating a moving acoustic source

US 12621623 B2

Abstract

Processing sound signals acquired by at least one microphone, to locate a sound source emitting from a plurality of discrete positions at respective discrete points in time, in a space comprising at least one planar reflective surface. The method includes: obtaining: a first vector $\vec{u}_0^{(k)}$ determining a direction of a first acoustic path, direct between the source and the microphone, a second vector $\vec{u}_n^{(k)}$ representing a second acoustic path resulting from a specular reflection and arriving at the microphone, and a delay $\tau_n^{(k)}$ of the second path at the microphone, compared to the direct path; exploiting a property of the specular reflection according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or more same reflections, respectively at said two discrete points in time.

Inventors

Srdan Kitic
Jérôme Daniel

Assignees

ORANGE

Dates

Publication Date: 2026-05-05
Application Date: 2023-02-13
Priority Date: 2022-02-18

Claims (17)

1. A method for processing sound signals acquired by at least one microphone, in order to locate at least one sound source emitting from a plurality of discrete positions at respective discrete points in time, in a space comprising at least one planar reflective surface, the method being implemented by a device and comprising: receiving the sound signals; obtaining from the sound signals at least, for each point in time k: a first vector $\vec{u}_0^{(k)}$ determining a direction of a first acoustic path, direct between the source and the microphone, at least a second vector $\vec{u}_n^{(k)}$ representing a second acoustic path resulting from at least one specular reflection and arriving at the microphone, at least one delay $\tau_n^{(k)}$ of the second path at the microphone, compared to the direct first acoustic path; and exploiting at least one property of the specular reflection, according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or a plurality of same reflections, respectively at said two discrete points in time, in order to determine at least one position of the source relative to the microphone respectively at said plurality of discrete points in time, as a function of, for each point in time k: the first vector $\vec{u}_0^{(k)}$ in order to determine a direction of the direct first acoustic path, and both the delay $\tau_n^{(k)}$ and the second vector $\vec{u}_n^{(k)}$, in order to associate a distance $d^{(k)}$ between the source and the microphone with this direction of the direct first acoustic path.
2. The method according to claim 1, further comprising: exploiting, in addition to said property of specular reflection, a second geometric property according to which a projection on a chosen axis of said Euclidean distance between two positions of the source at two discrete points in time corresponds to a projection on the same chosen axis of the Euclidean distance between two respective positions of images of the source and derived from one or a plurality of same reflections, respectively at said two discrete points in time.
3. The method according to claim 2, wherein the chosen axis is parallel or perpendicular to said at least one surface.
4. The method according to claim 2, wherein the microphone is of the ambisonic type, and arranged so that the z axis along the height of the microphone is parallel to the chosen axis.
5. The method according to claim 2, wherein the exploitation of said property of specular reflection, combined with the exploitation of the second geometric property, generates an overdetermined system of equations in which the positions of the source relative to the microphone, for different points in time k, k′, are the unknowns.
6. The method according to claim 1, wherein the sound signals are acquired in a succession of frames over time, and wherein the first vector $\vec{u}_0^{(k)}$, the second vector $\vec{u}_n^{(k)}$, and the delay $\tau_n^{(k)}$ are obtained for a plurality of frames respectively corresponding to discrete points in time.
7. The method according to claim 1, wherein at least one parameter among the first vector $\vec{u}_0^{(k)}$, the second vector $\vec{u}_n^{(k)}$, and the delay $\tau_n^{(k)}$ is obtained from the expression of a generalized velocity vector, the method comprising: applying a time-frequency transform to the acquired signals, based on the acquired signals, expressing a generalized velocity vector in the frequency domain, for a plurality of discrete points in time, each generalized velocity vector for a given point in time k characterizing a composition between: the first acoustic path, direct between the source and the microphone, represented by the first vector $\vec{u}_0^{(k)}$, and having a delay $\tau_0^{(k)}$ between the emission of a sound by the source and the reception of this sound by the microphone, and at least the second acoustic path, represented by the second vector $\vec{u}_n^{(k)}$, and having delay $\tau_n^{(k)}$ at the microphone, relative to the direct first acoustic path.
8. The method according to claim 7, comprising: further applying an inverse transform, from frequency to time, to the generalized velocity vector in order to obtain, in the time domain, at least one peak linked to one or more reflections on one or more surfaces, in addition to a peak linked to an arrival of the sound along said direct first acoustic path, the peak linked to one or more reflections being shifted by delay $\tau_n^{(k)}$ relative to the peak linked to the arrival of the sound along the direct first acoustic path.
9. The method according to claim 1, wherein: a vector $\vec{r}_0^{(k)}$ between the source and the microphone, at a point in time k, is written as a function of the first vector $\vec{u}_0^{(k)}$: $\vec{r}_0^{(k)} = d^{(k)} \vec{u}_0^{(k)}$, where $d^{(k)}$ is the Euclidean distance at point in time k between the source and the microphone, a vector $\vec{r}_n^{(k)}$ between an image of the source and the microphone, at a point in time k, is written as a function of the second vector $\vec{u}_n^{(k)}$: $\vec{r}_n^{(k)} = (d^{(k)} + \delta_n^{(k)}) \vec{u}_n^{(k)}$, with $\delta_n^{(k)} = c \cdot \tau_n^{(k)}$, where c is the speed of sound, said property of specular reflection is expressed, for two discrete points in time k and k′, by an expression of the type: $\lVert \vec{r}_0^{(k)} - \vec{r}_0^{(k')} \rVert^2 = \lVert \vec{r}_n^{(k)} - \vec{r}_n^{(k')} \rVert^2$.
10. The method according to claim 9, wherein the expression $\lVert \vec{r}_0^{(k)} - \vec{r}_0^{(k')} \rVert^2 = \lVert \vec{r}_n^{(k)} - \vec{r}_n^{(k')} \rVert^2$ expands into: $\alpha_n^{(k,k')} d^{(k)} + \alpha_n^{(k',k)} d^{(k')} + \chi_n^{(k,k')} d^{(k)} d^{(k')} + \kappa_n^{(k,k')} = 0$, with: $\alpha_n^{(k,k')} = 2 (\delta_n^{(k)} - \xi_n^{(k,k')} \delta_n^{(k')})$, $\chi_n^{(k,k')} = 2 (\xi_0^{(k,k')} - \xi_n^{(k,k')})$, $\kappa_n^{(k,k')} = (\delta_n^{(k)})^2 + (\delta_n^{(k')})^2 - 2 \delta_n^{(k)} \delta_n^{(k')} \xi_n^{(k,k')}$, $\xi_n^{(k,k')} = \langle \vec{u}_n^{(k)}, \vec{u}_n^{(k')} \rangle$, where the notation $\langle x, y \rangle$ designates the dot product between two vectors x and y.
11. The method according to claim 3, wherein said second geometric property results in an expression of the type: $\langle \vec{u}_z, \vec{r}_n^{(k')} - \vec{r}_n^{(k)} \rangle^2 = \langle \vec{u}_z, \vec{r}_0^{(k')} - \vec{r}_0^{(k)} \rangle^2$, where: $\vec{u}_z$ is a unit vector parallel to said chosen axis, the notation $\langle x, y \rangle$ designates the dot product between two vectors x and y, $\vec{r}_0^{(k)}$ is a vector between the source and the microphone, at a point in time k, written in terms of the first vector $\vec{u}_0^{(k)}$: $\vec{r}_0^{(k)} = d^{(k)} \vec{u}_0^{(k)}$, where $d^{(k)}$ is the Euclidean distance at point in time k between the source and the microphone, $\vec{r}_n^{(k)}$ is a vector between an image of the source and the microphone, at a point in time k, written as a function of the second vector $\vec{u}_n^{(k)}$: $\vec{r}_n^{(k)} = (d^{(k)} + \delta_n^{(k)}) \vec{u}_n^{(k)}$, with $\delta_n^{(k)} = c \cdot \tau_n^{(k)}$, where c is the speed of sound, said expression $\langle \vec{u}_z, \vec{r}_n^{(k')} - \vec{r}_n^{(k)} \rangle^2 = \langle \vec{u}_z, \vec{r}_0^{(k')} - \vec{r}_0^{(k)} \rangle^2$ expanding into: $\mu_n^{(k,k')} d^{(k)} + \mu_n^{(k',k)} d^{(k')} + \omega_n^{(k)} (d^{(k)})^2 + \omega_n^{(k')} (d^{(k')})^2 + \rho_n^{(k,k')} d^{(k)} d^{(k')} + \zeta_n^{(k,k')} = 0$, where: $\mu_n^{(k,k')} = 2 z_n^{(k)} (\delta_n^{(k')} z_n^{(k')} - \delta_n^{(k)} z_n^{(k)})$; $\omega_n^{(k)} = (z_0^{(k)})^2 - (z_n^{(k)})^2$; $\rho_n^{(k,k')} = 2 (z_n^{(k)} z_n^{(k')} - z_0^{(k)} z_0^{(k')})$; $\zeta_n^{(k,k')} = 2 \delta_n^{(k)} \delta_n^{(k')} z_n^{(k)} z_n^{(k')} - (\delta_n^{(k)} z_n^{(k)})^2 - (\delta_n^{(k')} z_n^{(k')})^2$, and $z_i^{(k)}$ designates the dot product $\langle \vec{u}_z, \vec{r}_i^{(k)} \rangle$.
12. The method according to claim 11, wherein the respective expansions of the expressions $\lVert \vec{r}_0^{(k)} - \vec{r}_0^{(k')} \rVert^2 = \lVert \vec{r}_n^{(k)} - \vec{r}_n^{(k')} \rVert^2$ and $\langle \vec{u}_z, \vec{r}_n^{(k')} - \vec{r}_n^{(k)} \rangle^2 = \langle \vec{u}_z, \vec{r}_0^{(k')} - \vec{r}_0^{(k)} \rangle^2$ generate a system of bi-affine equations of the type: $M \begin{bmatrix} d \\ \operatorname{vtriu}(d d^\top) \end{bmatrix} + q = 0$, in which the variable d is a column vector having coefficients corresponding to the distances between the source and the microphone at different points in time 1, 2, . . . , K: $d = [\, d^{(1)} \; d^{(2)} \; \dots \; d^{(K)} \,]^\top$, and where the operator $\operatorname{vtriu}(d d^\top)$ extracts coefficients from a diagonal and above the diagonal of the matrix $d d^\top$ by concatenating them into a column vector.
13. The method according to claim 12, comprising a solving of said system of bi-affine equations by non-linear minimization of a cost function $\ell(\cdot)$, given by: $\hat{d} = \arg\min_{lb \le d \le ub} \ell(M f + q)$, knowing that $f = \begin{bmatrix} d \\ \operatorname{vtriu}(d d^\top) \end{bmatrix}$, where lb and ub are lower and upper limits given to the distances $d^{(k)}$.
14. The method according to claim 13, wherein an adjustment term $\lambda r(d)$ is added to $\ell(\cdot)$ to express the cost function as a whole, as follows: $\hat{d} = \arg\min_{lb \le d \le ub} \ell(M f + q) + \lambda r(d)$, the term $\lambda r(d)$ making it possible to adjust at least one smoothing structure applied to the coordinates of the vector d.
15. The method according to claim 13, wherein a diagonal weighting matrix $\operatorname{diag}(\psi)$ is applied in the cost function, as follows: $\hat{d} = \arg\min_{lb \le d \le ub} \ell(\operatorname{diag}(\psi)(M f + q)) + \lambda r(d)$.
16. A non-transitory computer readable storage medium on which a program is stored, said program comprising instructions for implementing the method according to claim 1, when said instructions are executed by a processor of a processing circuit of the device.
17. A computer device comprising: a processing circuit configured to implement a method of processing sound signals acquired by at least one microphone, in order to locate at least one sound source emitting from a plurality of discrete positions at respective discrete points in time, in a space comprising at least one planar reflective surface, the method comprising: receiving the sound signals; obtaining from the sound signals at least, for each point in time k: a first vector $\vec{u}_0^{(k)}$ determining a direction of a first acoustic path, direct between the source and the microphone, at least a second vector $\vec{u}_n^{(k)}$ representing a second acoustic path resulting from at least one specular reflection and arriving at the microphone, at least one delay $\tau_n^{(k)}$ of the second path at the microphone, compared to the direct first acoustic path; and exploiting at least one property of the specular reflection, according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or a plurality of same reflections, respectively at said two discrete points in time, in order to determine at least one position of the source relative to the microphone respectively at said plurality of discrete points in time, as a function of, for each point in time k: the first vector $\vec{u}_0^{(k)}$ in order to determine a direction of the direct first acoustic path, and both the delay $\tau_n^{(k)}$ and the second vector $\vec{u}_n^{(k)}$, in order to associate a distance $d^{(k)}$ between the source and the microphone with this direction of the direct first acoustic path.
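
Purely as an illustration, and not as the claimed implementation, the expansion recited in claims 9 and 10 and the stacked vector of claim 12 can be checked numerically on a synthetic scene. The sketch below rests on assumptions that are not in the claims: a microphone at the origin, a single hypothetical wall in the plane x = 3 m, a speed of sound of 343 m/s, a first-order reflection only, and a row-by-row ordering for the vtriu operator. The function names (`observe`, `pair_coefficients`, `residual`, `vtriu_outer`) are invented for the example.

```python
import math

C = 343.0  # assumed speed of sound (m/s)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def observe(src, wall_x):
    """Simulate (u_0, u_n, tau_n, d) for a microphone at the origin and a wall x = wall_x."""
    img = (2.0 * wall_x - src[0], src[1], src[2])  # first-order image source
    d = math.sqrt(dot(src, src))                   # true source-microphone distance
    tau_n = (math.sqrt(dot(img, img)) - d) / C     # delay of the reflection vs. the direct path
    return unit(src), unit(img), tau_n, d

def pair_coefficients(obs_k, obs_kp):
    """Coefficients alpha_n(k,k'), alpha_n(k',k), chi_n(k,k'), kappa_n(k,k') of claim 10."""
    u0_k, un_k, tau_k, _ = obs_k
    u0_p, un_p, tau_p, _ = obs_kp
    dl_k, dl_p = C * tau_k, C * tau_p              # delta_n = c * tau_n
    xi_0, xi_n = dot(u0_k, u0_p), dot(un_k, un_p)
    a_k = 2.0 * (dl_k - xi_n * dl_p)
    a_kp = 2.0 * (dl_p - xi_n * dl_k)
    chi = 2.0 * (xi_0 - xi_n)
    kappa = dl_k ** 2 + dl_p ** 2 - 2.0 * dl_k * dl_p * xi_n
    return a_k, a_kp, chi, kappa

def residual(obs_k, obs_kp, d_k, d_kp):
    """Left-hand side of the expanded equation for candidate distances d_k, d_kp."""
    a_k, a_kp, chi, kappa = pair_coefficients(obs_k, obs_kp)
    return a_k * d_k + a_kp * d_kp + chi * d_k * d_kp + kappa

def vtriu_outer(d):
    """Diagonal and upper-triangular entries of d d^T, row by row (ordering is an assumption)."""
    return [d[i] * d[j] for i in range(len(d)) for j in range(i, len(d))]

obs_k = observe((1.0, 0.5, 0.2), 3.0)
obs_kp = observe((1.5, -0.4, 0.3), 3.0)
d_k, d_kp = obs_k[3], obs_kp[3]
print(residual(obs_k, obs_kp, d_k, d_kp))              # close to zero at the true distances
print(residual(obs_k, obs_kp, d_k + 0.5, d_kp + 0.5))  # clearly non-zero otherwise

# For K = 2 the same equation is one row of M f + q = 0 with f = [d; vtriu(d d^T)].
a_k, a_kp, chi, kappa = pair_coefficients(obs_k, obs_kp)
f = [d_k, d_kp] + vtriu_outer([d_k, d_kp])             # [d1, d2, d1^2, d1*d2, d2^2]
row = [a_k, a_kp, 0.0, chi, 0.0]
print(dot(row, f) + kappa)                             # same value as the first residual
```

In this toy setting each pair of points in time (k, k′) and each reflection n contributes one such row, which is how a system with more equations than unknown distances can arise; a bounded non-linear solver of the kind recited in claim 13 would then search for the distances that drive all residuals towards zero.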

Description

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a Section 371 National Stage Application of International Application No. PCT/EP2023/053424, filed Feb. 13, 2023, and published as WO 2023/156316 A1 on Aug. 24, 2023, not in English, which claims priority to French Patent Application No. 2201475, filed Feb. 18, 2022, the contents of which are hereby incorporated by reference in their entireties.

FIELD OF THE DISCLOSURE

This description relates to the field of locating acoustic sources, in particular for the estimation of the acoustic direction of arrival (DoA) by a compact microphone system (for example a microphone capable of capturing sounds in “ambiphonic” or “ambisonic” representation, see below).

One possible application is beamforming, which involves a spatial separation of audio sources, in particular to improve speech recognition (for example for a virtual assistant via voice interaction). Such processing may also be involved in 3D audio coding (pre-analysis of a sound scene in order to code the main signals individually), or may allow spatial domain editing of immersive sound content, possibly audiovisual (for artistic purposes, radio, cinema, etc.). It also makes it possible to track which person is speaking in a teleconference, or to detect sound events (with or without associated video).

BACKGROUND OF THE DISCLOSURE

One approach was proposed in document WO-2021/074502, which uses the velocity vector of a sound to obtain in particular the sound's direction of arrival, its delay (therefore the distance from the source), as well as the delays related to any reflections on the surfaces of a room and the determination of the positions of such surfaces (possibly partitioning surfaces such as walls, the floor, the ceiling, but also reflective surfaces such as tables, screens, etc.).
Such an implementation makes it possible to model the interference between the direct wave and at least one indirect wave (from a reflection) and to exploit the expressions of this model on the entire velocity vector (its imaginary part as well as its real part).

An improvement to this approach was proposed in document FR2011874 by using a modified velocity vector, referred to as “generalized”, and constructed from the conventional velocity vector which is generally expressed as a function of an omnidirectional component in the denominator. The generalized velocity vector then replaces the conventional velocity vector within the meaning of document WO-2021/074502, but with a component in the denominator which is different from an omnidirectional component. This different component may in fact be more “selective” towards the direction of arrival of the sound.

In an embodiment presented in those documents, it is possible to obtain (from an ambisonic sensor for example) a succession of peaks characterizing an acoustic intensity or energy, and each linked to a reflection on at least one surface, in addition to a peak linked to the arrival of the sound along the direct path (DoA) of the sound from the source.

However, in certain cases of application where the sound source may be moving, a robust method is sought for determining the distance between the source and the microphone as the source moves about, particularly when the precise orientation of the surface(s) causing the reflection(s) at a given moment is not initially known.

SUMMARY

The present description improves this situation. For this purpose, it proposes relying in particular on the reflections from surfaces, at different discrete points in time.
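
The geometric fact that makes reflections at different points in time useful can be illustrated with a short sketch: mirroring a source across a plane is an isometry, so the distance between two successive source positions equals the distance between their images, including after a second reflection. The planes, coordinates, and the helper name `reflect` below are hypothetical values chosen only for the illustration.

```python
import math

def reflect(p, normal, offset):
    """Mirror point p across the plane {x : <normal, x> = offset}; normal must be unit length."""
    s = sum(a * b for a, b in zip(p, normal)) - offset  # signed distance from p to the plane
    return tuple(a - 2.0 * s * b for a, b in zip(p, normal))

wall = ((1.0, 0.0, 0.0), 3.0)     # hypothetical wall in the plane x = 3
floor = ((0.0, 0.0, 1.0), -1.2)   # hypothetical floor in the plane z = -1.2

p_k, p_kp = (1.0, 0.5, 0.2), (1.5, -0.4, 0.3)  # source positions at times k and k'

# First-order images (wall only), then second-order images (wall followed by floor).
img_k, img_kp = reflect(p_k, *wall), reflect(p_kp, *wall)
img2_k, img2_kp = reflect(img_k, *floor), reflect(img_kp, *floor)

# The three distances agree up to rounding.
print(math.dist(p_k, p_kp), math.dist(img_k, img_kp), math.dist(img2_k, img2_kp))
```

The equality holds only when both images are produced by the same sequence of reflections, which matches the "one or a plurality of same reflections" wording used throughout this description.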
This description therefore relates to a method for processing sound signals acquired by at least one microphone, in order to locate at least one sound source emitting from a plurality of discrete positions at respective discrete points in time (k, k′), in a space comprising at least one planar reflective surface, the method comprising: obtaining at least, for each point in time k: a first vector $\vec{u}_0^{(k)}$ determining a direction of arrival (DoA) of a first acoustic path, direct between the source and the microphone, at least a second vector $\vec{u}_n^{(k)}$ representing a second acoustic path resulting from at least one specular reflection and arriving at the microphone, at least one delay $\tau_n^{(k)}$ of the second path at the microphone, compared to the direct path; exploiting at least one property of the specular reflection, according to which a Euclidean distance between two positions of the source at two discrete points in time is equal to a Euclidean distance between two respective positions of images of the source and derived from one or a plurality of same reflections, respectively at said two discrete points in time, in order to determine at least one position ($d^{(k)}$, $d^{(k')}$) of the source relative to the microphone respectively at said plurality of discrete points in time (k, k′), as a function of, for each point in time k: the first vector $\vec{u}_0^{(k)}$, in order to determine a direction (DoA) of the direct path, and both the delay $\tau_n^{(k)}$ and t