
CN-116243244-B - Sound source positioning method, device, equipment and computer readable storage medium


Abstract

The application discloses a sound source positioning method, device, equipment and computer readable storage medium. The sound source positioning method comprises: collecting scene images and audio signals in an environment, and detecting all human body images included in the scene images; determining the current position corresponding to each human body image and determining the sound source angle of the audio signal; and detecting, among all the current positions, the target position matched with the sound source angle, and outputting the human body image corresponding to the target position. The application improves the accuracy of sound source positioning.

Inventors

  • DUAN JINHUI
  • CHA ZHIFU

Assignees

  • 深圳市悦尔创新科技有限公司

Dates

Publication Date
20260508
Application Date
20221229

Claims (7)

  1. A sound source localization method, characterized in that the sound source localization method comprises:
     acquiring scene images and audio signals in an environment, and detecting all human body images included in the scene images;
     determining the current position corresponding to each human body image and determining the sound source angle of the audio signal;
     obtaining a calibration position corresponding to the sound source angle from a first preset calibration relation, wherein the first preset calibration relation comprises the corresponding relation between different calibration positions of different sound source angles in the scene image;
     sequentially traversing all the current positions, and determining the distance from the traversed human body image corresponding to the current position to a preset camera according to the area of the traversed human body image corresponding to the current position;
     obtaining a position error value corresponding to the distance from a second preset calibration relation, wherein the second preset calibration relation comprises a corresponding relation between different distances from the human body image to a preset camera and different position error values;
     determining a second horizontal coordinate of the calibration position in a preset coordinate system, determining a target position range according to the second horizontal coordinate and the position error value, and detecting whether the traversed current position is in the target position range;
     after the traversed current position is within the target position range, determining that the traversed current position is successfully matched with the calibration position;
     and after the matching is successful, taking the traversed current position as a target position, and outputting a human body image corresponding to the target position.
  2. The sound source localization method of claim 1, wherein the step of detecting whether the traversed current position is within the target position range comprises:
     determining a first horizontal coordinate of the traversed current position in a preset coordinate system, and detecting whether the coordinate value of the first horizontal coordinate is in the target position range;
     and if the coordinate value of the first horizontal coordinate is in the target position range, determining that the traversed current position is in the target position range.
  3. The sound source localization method according to claim 1, wherein the step of determining a target position range from the second horizontal coordinate and the position error value comprises:
     taking the sum of the coordinate value of the second horizontal coordinate and the position error value as an upper limit position, and taking the difference between the coordinate value of the second horizontal coordinate and the position error value as a lower limit position;
     and taking the range between the upper limit position and the lower limit position as a target position range.
  4. The sound source localization method of claim 1, wherein the step of determining a sound source angle of the audio signal comprises:
     collecting the time difference information with which a plurality of microphones in a preset microphone array acquire the audio signal, inputting the time difference information into a pre-trained sound source positioning model, and outputting the sound source angle.
  5. A sound source localization device, the sound source localization device comprising:
     the acquisition module is used for acquiring scene images and audio signals in the environment and detecting all human body images included in the scene images;
     the determining module is used for determining the current position corresponding to each human body image and determining the sound source angle of the audio signal;
     an output module for:
     obtaining a calibration position corresponding to the sound source angle from a first preset calibration relation, wherein the first preset calibration relation comprises the corresponding relation between different calibration positions of different sound source angles in the scene image;
     sequentially traversing all the current positions, and determining the distance from the traversed human body image corresponding to the current position to a preset camera according to the area of the traversed human body image corresponding to the current position;
     obtaining a position error value corresponding to the distance from a second preset calibration relation, wherein the second preset calibration relation comprises a corresponding relation between different distances from the human body image to a preset camera and different position error values;
     determining a second horizontal coordinate of the calibration position in a preset coordinate system, determining a target position range according to the second horizontal coordinate and the position error value, and detecting whether the traversed current position is in the target position range;
     after the traversed current position is within the target position range, determining that the traversed current position is successfully matched with the calibration position;
     and after the matching is successful, taking the traversed current position as a target position, and outputting a human body image corresponding to the target position.
  6. A sound source localization device comprising a memory, a processor and a sound source localization program stored on the memory and executable on the processor, the sound source localization program when executed by the processor implementing the steps of the sound source localization method according to any one of claims 1 to 4.
  7. A computer-readable storage medium, on which a sound source localization program is stored, which when executed by a processor implements the steps of the sound source localization method as claimed in any one of claims 1 to 4.
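
The matching procedure of claims 1 to 3 can be sketched in Python as follows. This is only an illustrative sketch, not the applicant's implementation: the `Body` class, the function names, and the three callables standing in for the preset calibration relations and the area-to-distance rule are all hypothetical, and the choices of inclusive range bounds and of stopping at the first match are assumptions the claims do not spell out.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple


@dataclass
class Body:
    """One detected human body image (hypothetical representation)."""
    x: float     # first horizontal coordinate of the current position
    area: float  # area of the human body image, e.g. in pixels


def target_range(calib_x: float, error: float) -> Tuple[float, float]:
    """Claim 3: lower limit = calib_x - error, upper limit = calib_x + error."""
    return calib_x - error, calib_x + error


def match_sound_source(
    bodies: Sequence[Body],
    angle: float,
    angle_to_x: Callable[[float], float],         # stands in for the first calibration relation
    area_to_distance: Callable[[float], float],   # stands in for the area-based distance rule
    distance_to_error: Callable[[float], float],  # stands in for the second calibration relation
) -> Optional[Body]:
    """Return the first body whose position falls in the target range, else None."""
    # Second horizontal coordinate: the calibration position for this angle.
    calib_x = angle_to_x(angle)
    for body in bodies:  # traverse all current positions in order
        distance = area_to_distance(body.area)
        error = distance_to_error(distance)
        lower, upper = target_range(calib_x, error)
        # Assumption: the range is treated as inclusive at both ends.
        if lower <= body.x <= upper:
            return body  # assumption: stop at the first successful match
    return None
```

With placeholder relations such as `angle_to_x=lambda a: 400.0`, `area_to_distance=lambda area: 100000.0 / area` and `distance_to_error=lambda d: 0.5 * d`, a body at `x=410.0` with `area=2500.0` gets the range 380.0 to 420.0 and is matched, while a body at `x=120.0` is not. These numbers are invented for illustration only.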

Description

Sound source positioning method, device, equipment and computer readable storage medium

Technical Field

The present application relates to the field of positioning technologies, and in particular to a sound source positioning method, device, equipment and computer readable storage medium.

Background

Sound source localization is a technology that obtains the position of a sound source from the sound waves it emits, and it contributes to the intelligent development of machines. Traditional sound source localization uses a microphone array to receive sound source signals, estimates the time delay differences of the received signals among the microphones based on the correlation among the signals received by the microphones, calculates the distance differences between the sound source and the different microphones from the time delay differences and the sound velocity, establishes an equation set from the distance differences and the distances between the microphones and the sound source, and solves the equation set to obtain the position coordinates of the sound source.

However, the signal received by a microphone often includes components other than the signal emitted by the sound source, for example reverberation. This causes a large error in the delay difference estimated from the correlation of the sound source signals, which in turn makes the sound source localization inaccurate.

Disclosure of Invention

The main purpose of the application is to provide a sound source positioning method, device, equipment and computer readable storage medium, aiming to solve the technical problem of how to improve sound source positioning accuracy.
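
For reference, the first two steps of the traditional approach described in the Background (estimating a delay from the correlation of two microphone signals, then converting it to a distance difference using the sound velocity) can be sketched as below. This is a minimal pure-Python sketch under stated assumptions: the function names are hypothetical, the speed of sound is taken as 343 m/s, and a plain time-domain cross-correlation is used. It illustrates the background technique only, not the method of the application.

```python
from typing import Sequence

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at roughly 20 degrees Celsius


def estimate_delay_samples(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means `other` is a delayed copy of `ref`.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(other):
                score += r * other[j]
        if score > best_score:  # on ties, the earliest lag is kept
            best_lag, best_score = lag, score
    return best_lag


def distance_difference_m(delay_samples: int, sample_rate_hz: float) -> float:
    """Convert a delay in samples to a path-length difference in metres."""
    return delay_samples / sample_rate_hz * SPEED_OF_SOUND_M_S
```

Reverberation adds further peaks to the correlation, so the lag with the highest score may no longer correspond to the direct path. This is the error source the Background points to.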
In order to achieve the above object, the present application provides a sound source localization method including the steps of: acquiring scene images and audio signals in an environment, and detecting all human body images included in the scene images; determining the current position corresponding to each human body image and determining the sound source angle of the audio signal; and detecting target positions matched with the sound source angle in all the current positions, and outputting a human body image corresponding to the target positions.

Optionally, the step of detecting target positions matched with the sound source angle in all the current positions includes: acquiring a calibration position corresponding to the sound source angle from a first preset calibration relation, wherein the first preset calibration relation comprises the corresponding relation between different calibration positions of different sound source angles in the scene image; traversing all the current positions in sequence, and detecting whether the traversed current positions are matched with the calibration positions; and taking the traversed current position as a target position after successful matching.

Optionally, the step of detecting whether the traversed current position matches the calibration position includes: determining the distance from the traversed human body image corresponding to the current position to a preset camera; determining a target position range according to the calibration position and the distance, and detecting whether the traversed current position is in the target position range; and after the traversed current position is within the target position range, determining that the traversed current position is successfully matched with the calibration position.
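
One possible way to represent a preset calibration relation such as the first one (sound source angle to calibration position) is a sorted table with interpolation between calibrated entries. The sketch below is an assumption made for illustration: the text only states that the relation holds the correspondence, not how it is stored or queried, and the function name, the linear interpolation, and the clamping at the table ends are all hypothetical choices.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def lookup_calibration(table: Sequence[Tuple[float, float]], key: float) -> float:
    """Look up a value in a non-empty table of (key, value) pairs sorted by key.

    Values between calibrated keys are linearly interpolated; keys outside
    the table are clamped to the nearest end (both are assumptions).
    """
    keys = [k for k, _ in table]
    if key <= keys[0]:
        return table[0][1]
    if key >= keys[-1]:
        return table[-1][1]
    hi = bisect_left(keys, key)  # first index whose key is >= the query
    (k0, v0), (k1, v1) = table[hi - 1], table[hi]
    return v0 + (v1 - v0) * (key - k0) / (k1 - k0)


# Hypothetical calibration data: angle in degrees -> horizontal pixel coordinate.
ANGLE_TO_X = [(-60.0, 0.0), (0.0, 320.0), (60.0, 640.0)]
calibration_x = lookup_calibration(ANGLE_TO_X, 30.0)
```

The same helper could serve a distance-to-error table, since both relations map one scalar to another.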
Optionally, the step of detecting whether the traversed current position is within the target position range includes: determining a first horizontal coordinate of the traversed current position in a preset coordinate system, and detecting whether the coordinate value of the first horizontal coordinate is in the target position range; and if the coordinate value of the first horizontal coordinate is in the target position range, determining that the traversed current position is in the target position range.

Optionally, the step of determining the target position range according to the calibration position and the distance includes: acquiring a position error value corresponding to the distance from a second preset calibration relation, wherein the second preset calibration relation comprises the corresponding relation between different distances from the human body image to a preset camera and different position error values; and determining a second horizontal coordinate of the calibration position in a preset coordinate system, and determining a target position range according to the second horizontal coordinate and the position error value.

Optionally, the step of determining the target position range according to the second horizontal coordinate and the position error value includes: taking the sum of the coordinate value of the second horizontal coordinate and the