CN-121978626-A - Microphone array sound source positioning system and method for desktop robot


Abstract

The invention relates to the technical field of robots and discloses a microphone array sound source positioning system and method for a desktop robot. The system comprises a main control chip, acquisition microphones, and a power supply module. An ESP32 main control board is connected through dual I2S bus pins to a circular array formed by 4 acquisition microphones; the array diameter is 50 mm and the angle between adjacent acquisition microphones is 90 degrees. The acquisition microphones are grounded through a GND line, and the ESP32 main control board supplies them with 3.3 V through a power supply line. A 4-microphone circular array with a diameter of 50 mm, acquired synchronously by the ESP32, provides 360-degree full-horizontal positioning in a small hardware volume that suits the miniaturization requirements of a desktop companion robot. A VAD voice activity detection module is introduced so that the algorithm runs only when a human voice is detected, which keeps power consumption low in the dormant state and extends the robot's battery life.

Inventors

SUN QI
CHI CHE
FANG TAO

Assignees

杭州星梦岛科技有限公司

Dates

Publication Date: 20260505
Application Date: 20260209

Claims (9)

1. A desktop robot microphone array sound source positioning system, comprising a main control chip, acquisition microphones, and a power supply module, characterized in that: the main control chip is an ESP32 main control board; a WebServer module, a VAD voice activity detection module, an SRP-PHAT calculation module, an I2S driver module, and a WiFi module are built into the ESP32 main control board; the ESP32 main control board is connected through dual I2S bus pins to a circular array formed by 4 of the acquisition microphones; the array diameter of the 4 acquisition microphones is 50 mm, and the angle between adjacent acquisition microphones is 90 degrees; the acquisition microphones are grounded through a GND line; and the ESP32 main control board supplies 3.3 V to the acquisition microphones through a power supply line.
2. The desktop robot microphone array sound source positioning system of claim 1, wherein the threshold used by the VAD voice activity detection module for short-time energy analysis of the audio stream is set to -40 dBFS.
3. A method of using the desktop robot microphone array sound source positioning system according to any one of claims 1 to 2, comprising the following steps: S1, the ESP32 main control board synchronously acquires the time-domain audio signals of the 4 acquisition microphones through the dual I2S bus and arranges the acquired serial data into 4 independent parallel audio streams; S2, the ESP32 main control board sends the acquired audio streams to the VAD voice activity detection module, and the module, based on the basic computing power of the ESP32, sets the short-time energy analysis threshold of the audio stream to -40 dBFS to judge whether the current signal is a human voice signal: if the detection result is no, the process returns directly to the multi-channel synchronous audio acquisition step and a low-power standby state is maintained; if the detection result is yes, the core SRP-PHAT operation of the SRP-PHAT calculation module is triggered.
4. The method of using the desktop robot microphone array sound source positioning system according to claim 3, wherein, before step S1 is executed, a hardware initialization configuration is performed: the dual I2S bus of the ESP32 main control board is configured into an audio acquisition mode with a 16 kHz sampling rate and 16-bit depth; the physical parameter of the circular array of 4 acquisition microphones, namely the array diameter, is then calibrated, and communication matching between the array and the ESP32 main control board is completed; finally, the WiFi module of the ESP32 main control board is initialized and the WebSocket service is started to support subsequent data pushing.
5. The method of using the desktop robot microphone array sound source positioning system according to claim 3, further comprising step S1, wherein the SRP-PHAT algorithm in the SRP-PHAT calculation module is executed as follows: a Hanning window is applied to the 4 channels of time-domain audio; the hardware acceleration module of the ESP32 main control board performs a 256-point FFT to convert the time-domain audio into frequency-domain signals; PHAT weighting is applied to the frequency-domain signals so that only the phase information of the signals is retained and amplitude interference is eliminated; and the weighting is applied only to the core human voice band of 300-3400 Hz.
6. The method of using the desktop robot microphone array sound source positioning system according to claim 3, further comprising step S3: after the weighting is completed, a loop over the 360 degrees of direction is entered with a step of 1 degree; for each direction, the corresponding phase delay is calculated from the physical layout of the 4-microphone circular array and applied to the frequency-domain signals to realize digital beamforming, that is, to simulate steering the microphone array toward that direction; the energies of the frequency-domain signals of the 4 channels are then superposed, and the SRP (steered response power) value corresponding to that direction is calculated.
7. The method of claim 6, further comprising step S31: after all 360 degrees of direction have been traversed, the direction with the maximum SRP value is selected as the sound source direction to complete the positioning operation, and a positioning confidence is calculated; the positioning result is determined to be valid when the ratio of the maximum SRP value to the next-largest SRP value is not less than 3, and invalid otherwise.
8. The method of using the desktop robot microphone array sound source positioning system, characterized by further comprising step S4: after the positioning operation is completed, the positioning result is consolidated, and the sound source direction (0-359 degrees) and the confidence index are packaged into structured data; through a Web visualization data push link, the data is pushed to a Web client in real time over the WebSocket protocol by means of the WiFi module of the ESP32 main control board; wherein the WebServer module, the VAD module, the SRP-PHAT calculation module, and the I2S driver module in the ESP32 main control board form a software layer, which interacts with a hardware layer through a data bus and pushes the data to the Web client of an interaction layer through WebSocket communication.
9. The method of using the desktop robot microphone array sound source positioning system according to claim 8, further comprising step S41: after the Web client receives the data, the client displays the energy distribution over the 360-degree directions as a color gradient through a heat map rendering module, thereby visualizing the positioning result; after the visualization is completed, the process returns to step S1 and the next round of signal acquisition and detection is executed, forming a continuous closed-loop interaction.
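As an illustration of the weighting in claim 5 and of steps S3 and S31, the following Python sketch windows each channel, keeps only the phase of the 300-3400 Hz bins, scans 360 directions in 1-degree steps, and applies the ratio test. It is a minimal sketch under stated assumptions, not the claimed ESP32 firmware: the speed of sound (343 m/s), the microphone azimuths (0, 90, 180, 270 degrees), the far-field propagation model, the reading of the "next-largest SRP value" as the second-highest local peak, and all function names are assumptions made for illustration. A plain DFT over the in-band bins stands in for the hardware-accelerated FFT.

```python
import cmath
import math

FS = 16000        # sampling rate in Hz (claim 4)
N_FFT = 256       # frame and transform length (claim 5)
RADIUS = 0.025    # half of the 50 mm array diameter, in metres
C_SOUND = 343.0   # assumed speed of sound in m/s
MIC_AZ = [math.radians(a) for a in (0, 90, 180, 270)]  # assumed mic azimuths
# Transform bins whose centre frequency lies in the 300-3400 Hz voice band.
BINS = [k for k in range(N_FFT // 2 + 1) if 300 <= k * FS / N_FFT <= 3400]
HANN = [0.5 - 0.5 * math.cos(2 * math.pi * i / (N_FFT - 1)) for i in range(N_FFT)]


def mic_delay(mic_az, source_az):
    """Far-field arrival delay (s) at one mic, relative to the array centre."""
    return -RADIUS * math.cos(source_az - mic_az) / C_SOUND


def phat_spectrum(frame):
    """Hann-window one frame, compute the in-band DFT bins, keep phase only."""
    out = {}
    for k in BINS:
        x = sum(s * w * cmath.exp(-2j * math.pi * k * i / N_FFT)
                for i, (s, w) in enumerate(zip(frame, HANN)))
        out[k] = x / abs(x) if abs(x) > 1e-12 else 0j
    return out


def srp_scan(frames):
    """Steered response power for each of 360 one-degree directions."""
    spectra = [phat_spectrum(f) for f in frames]
    power = []
    for deg in range(360):
        delays = [mic_delay(m, math.radians(deg)) for m in MIC_AZ]
        total = 0.0
        for k in BINS:
            omega = 2 * math.pi * k * FS / N_FFT
            # Undo each channel's hypothesised delay, then sum the 4 channels.
            beam = sum(spec[k] * cmath.exp(1j * omega * d)
                       for spec, d in zip(spectra, delays))
            total += abs(beam) ** 2
        power.append(total)
    return power


def locate(frames, min_ratio=3.0):
    """Pick the strongest direction and apply the ratio test of claim 7."""
    power = srp_scan(frames)
    best = max(range(360), key=power.__getitem__)
    # Assumption: the "next-largest" value is the second-highest local peak.
    peaks = [power[d] for d in range(360)
             if d != best and power[d] > power[d - 1]
             and power[d] >= power[(d + 1) % 360]]
    second = max(peaks) if peaks else 0.0
    ratio = power[best] / second if second > 0 else float("inf")
    return best, ratio, ratio >= min_ratio


def simulate(source_deg):
    """Synthetic 4-channel frame: in-band tones delayed per microphone."""
    az = math.radians(source_deg)
    frames = []
    for m in MIC_AZ:
        d = mic_delay(m, az)
        frames.append([sum(math.cos(2 * math.pi * k * FS / N_FFT * (i / FS - d)
                                    + 0.37 * k * k) for k in BINS)
                       for i in range(N_FFT)])
    return frames
```

Calling `locate(simulate(60))` returns the estimated azimuth in degrees, the peak ratio, and the validity flag for a synthetic source placed at 60 degrees. `simulate` is only a test-signal generator and has no counterpart in the claims.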

Description

Microphone array sound source positioning system and method for desktop robot

Technical Field

The invention relates to the technical field of robots, in particular to a microphone array sound source positioning system and method for a desktop robot.

Background

At present, the sound source positioning function of desktop companion robots mostly relies on the following prior-art schemes.

Some desktop robots use a single microphone or a linear dual-microphone structure for sound source sensing. This scheme can only give a rough direction judgment (such as distinguishing left from right) and cannot cover the full 360-degree horizontal plane; positioning errors are usually more than 15 degrees, which makes it difficult to meet the requirements of accurate interaction.

Positioning schemes driven by high-computing-power chips: some high-precision sound source positioning systems rely on a PC, a Raspberry Pi, or a dedicated DSP chip to run SRP-PHAT and other positioning algorithms. Although accuracy can be ensured, such chips generally draw more than 50 mA and are relatively large, which is incompatible with the "miniaturization and low power consumption" design requirements of desktop companion robots.
Algorithm operation without intelligent triggering: most existing low-power sound source positioning schemes based on an MCU (such as the ESP32) do not integrate a voice activity detection (VAD) module, so the algorithm runs continuously at all times. As a result, MCU power consumption is high and the robot's battery life is shortened by more than 50%.

Deficient low-power adaptation of the SRP-PHAT algorithm: the traditional SRP-PHAT algorithm needs to process full-band audio data (0-20 kHz) and traverse 360 degrees in 1-degree steps. A single run on a low-power MCU such as the ESP32 takes more than 100 ms, so real-time performance is poor and the robot's real-time interactive response cannot be supported.

Limited visualization: existing sound source positioning visualization takes the form of standalone host-computer software, which requires an additional installer, is complex to operate, and does not suit the lightweight interaction scenarios of a desktop robot.
Disclosure of Invention

The invention aims to provide a microphone array sound source positioning system and method for a desktop robot that solve the problems described in the background. A 4-microphone circular array with a diameter of 50 mm is acquired synchronously over the dual I2S bus of the ESP32, which provides 360-degree full-horizontal positioning in a small hardware volume that suits the miniaturization requirements of a desktop companion robot. A VAD voice activity detection module is introduced so that the algorithm runs only when a human voice is detected, which keeps power consumption low in the dormant state and extends the robot's battery life. The SRP-PHAT algorithm is made lightweight, using a 256-point hardware-accelerated FFT and PHAT weighting restricted to the 300-3400 Hz band, so that it suits the low computing power of the ESP32. Real-time data pushing is implemented through the WebServer and WebSocket built into the ESP32, and the Web client renders the heat map directly, which suits the lightweight interaction requirements of a desktop robot.

In order to achieve the above purpose, the present invention provides the following technical solutions:

A desktop robot microphone array sound source positioning system comprises a main control chip, acquisition microphones, and a power supply module. The main control chip is an ESP32 main control board with a built-in WebServer module, VAD voice activity detection module, SRP-PHAT calculation module, I2S driver module, and WiFi module. The ESP32 main control board is connected through dual I2S bus pins to a circular array formed by 4 acquisition microphones; the array diameter of the 4 acquisition microphones is 50 mm, and the angle between adjacent acquisition microphones is 90 degrees. The acquisition microphones are grounded through a GND line, and the ESP32 main control board supplies 3.3 V to the acquisition microphones through a power supply line.
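The VAD gating that decides whether the SRP-PHAT operation runs can be illustrated with a short-time energy check against the -40 dBFS threshold stated in claim 2. The following Python sketch is a minimal illustration under stated assumptions, not the firmware itself: it assumes signed 16-bit samples (per the 16-bit depth in claim 4), measures the RMS level of a frame, and takes 0 dBFS as the level of a constant full-scale signal; the function names and the frame length used in the test tones are hypothetical.

```python
import math

FULL_SCALE = 32768.0      # full-scale magnitude of signed 16-bit PCM
THRESHOLD_DBFS = -40.0    # short-time energy threshold from claim 2


def frame_level_dbfs(samples):
    """RMS level of one frame, in dB relative to full scale (assumed reference)."""
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")   # digital silence has no finite level
    return 10.0 * math.log10(mean_square / FULL_SCALE ** 2)


def voice_active(samples, threshold_dbfs=THRESHOLD_DBFS):
    """True when the frame's short-time energy reaches the threshold."""
    return frame_level_dbfs(samples) >= threshold_dbfs


# Two 256-sample test tones at 250 Hz, sampled at 16 kHz (illustrative values).
loud = [round(3277 * math.sin(2 * math.pi * 250 * i / 16000)) for i in range(256)]
quiet = [round(100 * math.sin(2 * math.pi * 250 * i / 16000)) for i in range(256)]
print(voice_active(loud), voice_active(quiet))
```

An energy gate of this kind is cheap to evaluate on every frame, but on its own it reacts to any sufficiently loud sound, so how the module distinguishes a human voice from other sounds of similar level is not captured by this sketch.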
As a still further aspect of the present invention, the threshold used by the VAD voice activity detection module for short-time energy analysis of the audio stream is set to -40 dBFS.

A method of using the desktop robot microphone array sound source positioning system, comprising:

S1, the ESP32 main control board synchronously acquires the time-domain audio signals of the 4 acquisition microphones through the dual I2S bus and arranges the acquired serial data into 4 independent parallel audio streams;

S2, the ESP32 main control board sends the acquired audio streams to the VAD voice activity detection module, and the module sets a short-time energy analysis threshold value of the a