CN-122015640-A - Method and system for automatically positioning double-ear position of automobile sound field center


Abstract

The application discloses a method and a system for automatically locating binaural positions for an automobile sound field center, relating to the technical field of automotive electronics and audio signal processing. The method comprises: acquiring and preprocessing an image; using a convolutional neural network to identify the left-ear pixel coordinates and the right-ear pixel coordinates; mapping these pixel coordinates into vehicle-mounted three-dimensional space coordinates by combining the camera intrinsic matrix with depth information; and calculating, from the geometric relationship, the straight-line distance from the sound field center point to each loudspeaker to obtain the delay compensation time and the gain compensation coefficient. By deeply fusing visual tracking with digital signal processing, the application realizes real-time localization of the driver's binaural positions and adaptive reconstruction of the sound field, solves the auditory offset that a traditional fixed sound field produces as the driver's posture changes, and improves in-car sound localization accuracy and the immersive experience.

Inventors

YANG DONGYU
ZHENG HONGLI
WANG WEIMING

Assignees

中国第一汽车股份有限公司

Dates

Publication Date: 20260512
Application Date: 20251212

Claims (10)

1. A method for automatically locating binaural positions for an automotive sound field center, comprising the steps of: S1, acquiring original image data containing a head area of a driver through an image acquisition module; S2, inputting the original image data into a neural network model by a processor module, identifying and outputting a first pixel coordinate of a left ear characteristic point on an image plane and a second pixel coordinate of a right ear characteristic point on the image plane, converting the first pixel coordinate into a left ear space coordinate under a vehicle-mounted three-dimensional coordinate system by the processor module according to imaging parameters and depth information of a camera, and converting the second pixel coordinate into a right ear space coordinate under the vehicle-mounted three-dimensional coordinate system; S3, calculating a sound field center point by the processor module according to the left ear space coordinate and the right ear space coordinate; S4, an audio parameter self-adaptive calculation step, namely calculating the spatial distance from the sound field center point to a loudspeaker unit in a loudspeaker array by an audio signal processing module, and determining delay compensation time and gain compensation coefficients according to the spatial distance; the audio signal processing module further determines the head posture angle of the driver according to the left ear space coordinate and the right ear space coordinate, calculates the relative azimuth angle and the pitch angle of the speaker unit relative to the face orientation of the driver, and matches a corresponding impulse response function in a pre-stored head related transfer function database according to the relative azimuth angle and the pitch angle; and S5, a signal output step, namely processing an original audio signal by the digital signal processing module by utilizing the delay compensation time, the gain compensation coefficient and the impulse response function, generating an output signal and transmitting the output signal to the loudspeaker unit.
2. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein step S1 further comprises a preprocessing operation, the processor module performing the preprocessing operation on the original image data to generate preprocessed image data and using the preprocessed image data as the input to the neural network model in step S2; the preprocessing operation specifically comprises the following steps: calculating a weighted sum of the red channel component, green channel component and blue channel component of each pixel point in the original image data by a weighted average method, and converting the original image data into gray image data; and calculating a cumulative distribution function by counting the distribution frequency of gray levels in the gray image data, establishing a mapping relation between each original gray level and its equalized gray level according to the cumulative distribution function, and executing histogram equalization processing on the gray image data.
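For illustration only, the preprocessing recited in claim 2 might be sketched in Python as follows. The channel weights (the ITU-R BT.601 luma weights), the 256 gray levels, and the function names `to_gray` and `equalize` are assumptions made for the sketch; the claim itself only requires a weighted average and a mapping built from the cumulative distribution function.

```python
def to_gray(rgb_pixels, weights=(0.299, 0.587, 0.114)):
    """Convert (R, G, B) pixels to gray levels by a weighted sum of the channels.

    The default weights are an assumption (ITU-R BT.601 luma weights).
    """
    wr, wg, wb = weights
    return [min(255, round(wr * r + wg * g + wb * b)) for r, g, b in rgb_pixels]


def equalize(gray, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    if not gray:
        return []
    # Count how often each gray level occurs.
    hist = [0] * levels
    for g in gray:
        hist[g] += 1
    # Accumulate the counts into a cumulative distribution function.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    # Map each original level to an equalized level proportional to its CDF value.
    n = len(gray)
    mapping = [round((levels - 1) * c / n) for c in cdf]
    return [mapping[g] for g in gray]


pixels = [(255, 255, 255), (0, 0, 0), (255, 0, 0)]  # hypothetical sample pixels
gray = to_gray(pixels)
equalized = equalize(gray)
```

A production system would more likely call an image library for both operations; the pure-Python form is used here only to keep each step of the claim visible.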
3. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein the imaging parameters comprise a camera intrinsic matrix, and wherein in step S2 the step of the processor module converting the first pixel coordinate into the left ear space coordinate comprises: parsing the intrinsic matrix to obtain the camera principal point coordinates and the camera focal length; calculating a first difference by subtracting the abscissa of the first pixel coordinate from the abscissa of the camera principal point, and calculating a second difference by subtracting the ordinate of the first pixel coordinate from the ordinate of the camera principal point; calculating a ratio by dividing the depth information by the camera focal length; and multiplying the first difference by the ratio to obtain the abscissa of the left ear space coordinate, multiplying the second difference by the ratio to obtain the ordinate of the left ear space coordinate, and setting the depth information as the vertical coordinate of the left ear space coordinate.
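A minimal sketch of the conversion in claim 3 is given below, following the claim wording literally (principal point minus pixel coordinate). Note that the conventional pinhole back-projection uses the opposite order, pixel minus principal point, which would only flip the signs of the first two components. The function name `pixel_to_space`, the single focal length read from the first diagonal entry, and the sample intrinsic matrix are all assumptions for illustration.

```python
def pixel_to_space(pixel, depth, intrinsic):
    """Convert one pixel coordinate plus depth into a 3-D coordinate.

    `intrinsic` is a 3x3 matrix [[f, 0, cx], [0, f, cy], [0, 0, 1]];
    a single focal length f (in pixels) is assumed, as in the claim.
    """
    focal = intrinsic[0][0]
    cx, cy = intrinsic[0][2], intrinsic[1][2]
    u, v = pixel
    first_diff = cx - u    # principal point abscissa minus pixel abscissa
    second_diff = cy - v   # principal point ordinate minus pixel ordinate
    ratio = depth / focal  # depth divided by focal length
    return (first_diff * ratio, second_diff * ratio, depth)


# Hypothetical intrinsic matrix and left-ear detection, for illustration only.
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
left_ear = pixel_to_space((220.0, 240.0), 0.8, K)
```

The same function would be applied to the second pixel coordinate to obtain the right ear space coordinate.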
4. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein step S3 further comprises a step of smoothing the sound field center point, specifically comprising: calculating, through the Euclidean distance formula, the Euclidean distance between the sound field center point calculated for the current frame and the effective sound field center point determined for the previous frame; judging whether the Euclidean distance is smaller than a preset distance threshold; if the Euclidean distance is smaller than the distance threshold, keeping the effective sound field center point determined for the previous frame as the effective sound field center point of the current frame; and if the Euclidean distance is greater than or equal to the distance threshold, taking the sound field center point calculated for the current frame as the new effective sound field center point.
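The threshold test of claim 4 can be illustrated with the short Python sketch below. The function names, the handling of the very first frame, and the 0.01 m threshold are hypothetical. The `midpoint` helper assumes that the sound field center is the midpoint of the line joining the two ears, which the background section suggests but claim 1 does not state explicitly.

```python
import math


def midpoint(left_ear, right_ear):
    """Assumed definition of the sound field center: midpoint of the two ears."""
    return tuple((l + r) / 2.0 for l, r in zip(left_ear, right_ear))


def smooth_center(previous, current, threshold):
    """Keep the previous effective center unless the move reaches the threshold."""
    if previous is None:  # first frame: nothing to compare against (assumption)
        return current
    if math.dist(previous, current) < threshold:
        return previous   # small displacement: treat as vibration or jitter
    return current        # displacement >= threshold: accept the new center


THRESHOLD_M = 0.01  # hypothetical distance threshold, in metres
center = midpoint((0.1, 0.0, 1.0), (-0.1, 0.0, 1.0))
held = smooth_center((0.0, 0.0, 0.0), (0.001, 0.0, 0.0), THRESHOLD_M)
moved = smooth_center((0.0, 0.0, 0.0), (0.05, 0.0, 0.0), THRESHOLD_M)
```

Holding the previous value for sub-threshold movements is what prevents the audio parameters from being recomputed on every small vibration.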
5. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein in step S4, the step of the audio signal processing module calculating the spatial distance and determining the delay compensation time and the gain compensation coefficient specifically comprises: calculating, through the Euclidean distance formula, the straight-line distance from the sound field center point to a loudspeaker unit in the loudspeaker array as the spatial distance; determining, through numerical comparison, the maximum distance and the minimum distance among the straight-line distances from the sound field center point to all the loudspeaker units; calculating a difference by subtracting the straight-line distance of the loudspeaker unit from the maximum distance, and calculating the delay compensation time of the loudspeaker unit by dividing the difference by an acoustic wave propagation velocity constant; and calculating a ratio by dividing the straight-line distance of the loudspeaker unit by the minimum distance, and determining the ratio as the gain compensation coefficient of the loudspeaker unit.
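The delay and gain computation of claim 5 reduces to a few lines of arithmetic, sketched below in Python. The function name, the speaker positions, and the value of 343 m/s for the acoustic wave propagation velocity constant are assumptions; the claim does not give a numeric value.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value of the propagation velocity constant


def delay_and_gain(center, speakers, speed=SPEED_OF_SOUND):
    """Return per-speaker (distances, delays in seconds, gain coefficients)."""
    distances = [math.dist(center, s) for s in speakers]
    d_max, d_min = max(distances), min(distances)
    if d_min == 0:
        raise ValueError("sound field center coincides with a speaker")
    # Nearer speakers are delayed so that all wavefronts arrive together.
    delays = [(d_max - d) / speed for d in distances]
    # Farther speakers receive proportionally more gain.
    gains = [d / d_min for d in distances]
    return distances, delays, gains


# Hypothetical layout: two speakers at 1 m and 2 m from the center.
distances, delays, gains = delay_and_gain(
    (0.0, 0.0, 0.0), [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
)
```

With this rule the farthest speaker always has zero delay and the nearest speaker always has unit gain.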
6. The method for automatically locating binaural positions for an automotive sound field center according to claim 5, wherein the head posture angle comprises a head yaw angle, and wherein in step S4 the step of the audio signal processing module calculating the head yaw angle comprises: calculating a longitudinal coordinate difference by subtracting the ordinate of the left ear space coordinate from the ordinate of the right ear space coordinate, and calculating a transverse coordinate difference by subtracting the abscissa of the right ear space coordinate from the abscissa of the left ear space coordinate; calculating a slope ratio by dividing the longitudinal coordinate difference by the transverse coordinate difference; and calculating the arctangent of the slope ratio, and calculating the head yaw angle of the driver by subtracting the radian value corresponding to ninety degrees from the arctangent.
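As an illustration, the yaw computation of claim 6 could be written as below. The sketch follows the claim literally, using a plain arctangent of the slope ratio, and reports a zero transverse difference as an error rather than guessing a limiting value. The function name and the sample coordinates are hypothetical.

```python
import math


def head_yaw(left_ear, right_ear):
    """Head yaw angle in radians from the (x, y, z) coordinates of both ears."""
    longitudinal = right_ear[1] - left_ear[1]  # right ordinate minus left ordinate
    transverse = left_ear[0] - right_ear[0]    # left abscissa minus right abscissa
    if transverse == 0:
        raise ValueError("transverse difference is zero; slope ratio is undefined")
    slope_ratio = longitudinal / transverse
    return math.atan(slope_ratio) - math.pi / 2  # subtract ninety degrees in radians


# Hypothetical ear positions, for illustration only.
yaw = head_yaw((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

An implementation that must handle every head orientation might prefer `math.atan2`, which avoids the division; that would be a design change relative to the wording of the claim.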
7. The method for automatically locating binaural positions for an automotive sound field center according to claim 6, wherein in step S4, the step of the audio signal processing module calculating the relative azimuth angle and the pitch angle comprises: calculating a longitudinal distance by subtracting the ordinate of the sound field center point from the ordinate of the loudspeaker unit, calculating a transverse distance by subtracting the abscissa of the sound field center point from the abscissa of the loudspeaker unit, and calculating the arctangent of the ratio of the longitudinal distance to the transverse distance, thereby obtaining a global azimuth angle; calculating the relative azimuth angle by subtracting the head yaw angle from the global azimuth angle; and calculating a vertical distance by subtracting the vertical coordinate of the sound field center point from the vertical coordinate of the loudspeaker unit, and calculating the arcsine of the ratio of the vertical distance to the straight-line distance, thereby obtaining the pitch angle.
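The angle calculations of claim 7 can be sketched as follows. The function name and the sample positions are assumptions, and the degenerate cases (zero transverse distance, speaker at the center point) are raised as errors because the claim does not say how they are handled.

```python
import math


def speaker_angles(center, speaker, head_yaw):
    """Return (relative azimuth, pitch angle) in radians for one speaker unit."""
    transverse = speaker[0] - center[0]    # abscissa difference
    longitudinal = speaker[1] - center[1]  # ordinate difference
    vertical = speaker[2] - center[2]      # vertical coordinate difference
    distance = math.dist(center, speaker)  # straight-line distance
    if transverse == 0 or distance == 0:
        raise ValueError("degenerate geometry: angle is undefined")
    global_azimuth = math.atan(longitudinal / transverse)
    relative_azimuth = global_azimuth - head_yaw
    pitch = math.asin(vertical / distance)
    return relative_azimuth, pitch


# Hypothetical speaker positions relative to a center at the origin.
azimuth_a, pitch_a = speaker_angles((0.0, 0.0, 0.0), (1.0, 1.0, 0.0), 0.0)
azimuth_b, pitch_b = speaker_angles((0.0, 0.0, 0.0), (1.0, 0.0, 1.0), 0.0)
```

The resulting pair of angles is what claim 1 uses as the key for matching an impulse response function in the head related transfer function database.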
8. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein in step S5 the step of processing the original audio signal specifically comprises: modulating the original audio signal by using the delay compensation time and the gain compensation coefficient to obtain an intermediate signal, and performing a convolution operation on the intermediate signal by using the impulse response function to obtain the output signal; the convolution operation specifically comprises: acquiring the filter length of the impulse response function; selecting, on the time axis, the amplitudes of the intermediate signal corresponding to a plurality of historical time indexes before the current time index; obtaining a product result by multiplying each amplitude of the intermediate signal by the coefficient of the impulse response function at the corresponding index; and summing all the product results to obtain a summation result, and taking the summation result as the amplitude of the final output signal at the current time index.
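The signal path of claim 8 might be sketched as below: the delay and gain produce the intermediate signal, and a direct-form convolution with the impulse response produces the output. The function names, the 48 kHz sample rate, the rounding of the delay to whole samples, and the sample values are assumptions; the output is truncated to the length of the intermediate signal for simplicity.

```python
def apply_delay_and_gain(signal, delay_time, gain, sample_rate=48000):
    """Delay the signal by whole samples (rounded) and scale it by the gain."""
    delay_samples = round(delay_time * sample_rate)
    return [0.0] * delay_samples + [gain * x for x in signal]


def fir_convolve(intermediate, impulse_response):
    """y[n] = sum over k of h[k] * x[n - k], for k in [0, filter length)."""
    length = len(impulse_response)  # filter length of the impulse response
    output = []
    for n in range(len(intermediate)):
        acc = 0.0
        for k in range(length):
            if n - k >= 0:          # only current and historical samples exist
                acc += impulse_response[k] * intermediate[n - k]
        output.append(acc)
    return output


# Hypothetical two-sample signal, two-sample delay, and two-tap impulse response.
intermediate = apply_delay_and_gain([1.0, 2.0], 2 / 48000, 0.5)
output = fir_convolve([1.0, 2.0, 3.0], [1.0, 1.0])
```

Real head related impulse responses typically have many more taps than this example, so a practical system would likely use a block-based or FFT-based convolution instead of the double loop shown here.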
9. The method for automatically locating binaural positions for an automotive sound field center according to claim 1, wherein the depth information is obtained by a method comprising: the processor module performing stereo matching on a left viewpoint image and a right viewpoint image acquired by a binocular camera, and calculating the parallax value of the left ear characteristic point between the left viewpoint image and the right viewpoint image by comparing the abscissas of the corresponding characteristic points; and obtaining the depth information, according to the focal length value and the baseline distance of the binocular camera, by dividing the product of the focal length value and the baseline distance by the parallax value; and wherein in step S3, the processor module performs optimal estimation and trajectory smoothing on the position state of the sound field center point by using a Kalman filter algorithm.
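The two operations named in claim 9 can be illustrated with the sketch below. The depth formula follows the claim directly. The Kalman filter is a deliberately simple scalar, random-walk version applied to one coordinate axis; the claim does not specify the state model or the noise parameters, so the class `ScalarKalman`, its default values, and the sample numbers are assumptions.

```python
def stereo_depth(x_left, x_right, focal_px, baseline_m):
    """Depth = focal length * baseline / parallax, for one matched feature point."""
    parallax = x_left - x_right  # difference of abscissas in the two views
    if parallax <= 0:
        raise ValueError("parallax must be positive for a valid match")
    return focal_px * baseline_m / parallax


class ScalarKalman:
    """One-dimensional Kalman filter with a random-walk state model (assumed)."""

    def __init__(self, x0, p0=1.0, process_var=1e-4, measure_var=1e-2):
        self.x, self.p = x0, p0
        self.q, self.r = process_var, measure_var

    def update(self, z):
        self.p += self.q                      # predict: uncertainty grows
        gain = self.p / (self.p + self.r)     # Kalman gain
        self.x += gain * (z - self.x)         # correct with the measurement
        self.p *= (1.0 - gain)                # updated uncertainty
        return self.x


# Hypothetical numbers: 800 px focal length, 6 cm baseline, 40 px parallax.
depth = stereo_depth(420.0, 380.0, 800.0, 0.06)
kf = ScalarKalman(x0=0.0)
estimates = [kf.update(1.0) for _ in range(50)]
```

One such filter per coordinate axis would smooth the trajectory of the sound field center point; a full implementation might instead track position and velocity jointly in a single state vector.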
10. A system for automatically locating binaural positions for an automotive sound field center, characterized in that it applies the method for automatically locating binaural positions for an automotive sound field center according to any one of claims 1-9, and comprises: an image acquisition module configured to be installed within an automobile cockpit for acquiring original image data containing a head area of a driver; a processor module configured to receive the original image data and perform image preprocessing, binaural feature recognition, coordinate conversion, and sound field center point calculation steps, the processor module having a neural network model deployed therein; a speaker array including a plurality of speaker units distributed in the vehicle interior; and an audio signal processing module configured to execute an audio parameter self-adaptive calculation step and a signal output and loop control step, wherein a head related transfer function database is stored in the audio signal processing module, and the audio signal processing module is configured to retrieve the corresponding impulse response function according to the calculated relative azimuth angle and pitch angle and to perform convolution processing on an audio signal.

Description

Method and system for automatically positioning double-ear position of automobile sound field center

Technical Field

The application relates to the technical field of automotive electronics and audio signal processing, and in particular to a method and a system for automatically locating binaural positions for an automobile sound field center.

Background

As automobile cabins become more intelligent, the in-vehicle sound system has become an important measure of the riding experience. To create an immersive auditory experience with accurate sound image localization in the limited space of a vehicle interior, modern car audio systems use a digital signal processor to adjust the delay and gain of each speaker channel, constructing an optimal listening position with an ideal sound field distribution. In the prior art, this sound field optimization is preset based on the theoretical center position of the driver's seat, i.e. it assumes that the driver's head always stays at fixed spatial coordinates. In an actual driving scene, however, the driver's head is not stationary: the driver rotates, tilts, or shifts the head when checking the rearview mirror or the side window, or when adjusting the sitting posture.

Although some technologies have tried to introduce head tracking to address this problem, technical bottlenecks remain in practical applications.

First, the prior art is limited to overall tracking of the driver's head contour or facial region and lacks the ability to recognize and locate the positions of the two ears. Since the human ears are located on both sides of the head, the actual spatial position of the ears is geometrically displaced relative to the center of the head when the driver performs a yaw motion (e.g., turning the head) or a pitch motion.
The prior art cannot sense such small posture changes, so the center of the sound field cannot be truly aligned with the midpoint of the line connecting the two ears, and the relative azimuth angle and pitch angle data are difficult to acquire, so the head related transfer function cannot be accurately matched for frequency response and phase compensation. As a result, when the driver turns the head, the virtual sound source cannot follow the movement of the ears and stay accurately localized, the sound image becomes blurred or drifts, and the hearing deviation caused by the traditional fixed sound field cannot be eliminated.

Secondly, the in-vehicle environment is characterized by severe lighting changes (such as entering and leaving tunnels and direct sunlight) and frequent vehicle vibration, which places higher demands on the reliability of the visual tracking system. Existing camera-based schemes easily lose feature points under complex illumination conditions, so that identification fails. More importantly, the prior art lacks a smoothing mechanism for tiny displacements: when the vehicle vibrates while running, or the driver makes an unintentional slight movement, the system may misjudge it as an effective movement and adjust the sound field parameters frequently. Such oversensitive adjustment may cause discontinuous jumps or unstable fluctuations in the audio output, which instead cause auditory discomfort to the driver.

Finally, to achieve high-precision three-dimensional spatial localization, some solutions tend to employ infrared sensors, lidars, or complex binocular depth camera arrays, which increase the cost and integration difficulty of the vehicle hardware.
Low-cost schemes based on a monocular camera either have difficulty deriving three-dimensional depth information from a two-dimensional image, because they lack an effective geometric-optical calculation model, or use a calculation process so complex that the system delay becomes too high, so that a quick response cannot be achieved with limited computing resources. This slow reaction makes the sound field adjustment always lag behind the driver's movement, so a real-time adaptive surround sound effect cannot be realized.

Disclosure of Invention

The invention aims to provide a method and a system for automatically locating binaural positions for an automobile sound field center, which solve at least one of the following technical problems in the prior art: the positions of the two ears cannot be accurately located because only the head contour can be tracked; and, for lack of an adaptive phase and frequency response compensation mechanism for changes in the driver's head posture, the sound field center shifts, the sound image localization becomes blurred, and the auditory immersion is lost when the driver turns or shifts the head. In a first aspect of the present invention, a method for automatically locating a binaural position in a sound field center of an