US-12627941-B2 - Tracking control method and apparatus, storage medium, and computer program product
Abstract
Embodiments of this application disclose a tracking control method. When a sound source object makes a sound, a control device determines an azimuth θ1 of the sound source object relative to a first microphone array based on detection data of the first microphone array, and determines an azimuth θ2 of the sound source object relative to a second microphone array based on detection data of the second microphone array. The control device determines a location of the sound source object based on the azimuth θ1, the azimuth θ2, a location of the first microphone array, and a location of the second microphone array. The control device then controls, based on the location of the sound source object, a camera to shoot the sound source object and obtain a tracking video image. According to this application, a speaker can be accurately recognized, improving the accuracy of automatic tracking.
Inventors
- Lei Zhang
- Zhihui Liu
Assignees
- HUAWEI TECHNOLOGIES CO., LTD.
Dates
- Publication Date: 2026-05-12
- Application Date: 2024-05-24
- Priority Date: 2021-11-25
Claims (20)
- 1. A tracking control method, wherein the method is applied to a tracking control system, the tracking control system comprises a first microphone array, a second microphone array, a camera, and a control device, and the method comprises: determining, by the control device, a location of the first microphone array and a location of the camera; based on a sound source object making a sound, determining, by the control device, a location of the sound source object based on a location of the sound source object relative to the first microphone array, a location of the sound source object relative to the second microphone array, the location of the first microphone array, and a location of the second microphone array; and determining, by the control device, a tracking operation on the camera based on the location of the sound source object and the location of the camera.
- 2. The method according to claim 1, wherein the first microphone array is integrated with a first sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining the location of the first microphone array comprises: determining, by the control device, a distance D1 between the first sound emitter and the first microphone and a distance D2 between the first sound emitter and the second microphone based on a time at which the first microphone and the second microphone receive a sound signal from the first sound emitter and a time at which the first sound emitter emits the sound signal; and determining, by the control device, a location of the first microphone array relative to the second microphone array based on a location of the first microphone, a location of the second microphone, the distance D1, and the distance D2.
- 3. The method according to claim 1, wherein the tracking control system further comprises a second sound emitter and a third sound emitter, the second sound emitter and the third sound emitter are integrated on a same electronic screen as the second microphone array, and the determining the location of the first microphone array further comprises: obtaining, by the control device, an azimuth θ3 of the second sound emitter relative to the first microphone array and an azimuth θ4 of the third sound emitter relative to the first microphone array that are sent by the first microphone array; and determining, by the control device, an orientation of the first microphone array based on the azimuth θ3, the azimuth θ4, a location of the second sound emitter, and a location of the third sound emitter.
- 4. The method according to claim 1, wherein the camera is integrated with a fourth sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining the location of the camera comprises: determining, by the control device, a distance D3 between the first microphone and the fourth sound emitter and a distance D4 between the second microphone and the fourth sound emitter based on a time at which the first microphone and the second microphone receive a sound signal from the fourth sound emitter and a time at which the fourth sound emitter emits the sound signal; and determining, by the control device, a location of the camera relative to the second microphone array based on a location of the first microphone, a location of the second microphone, the distance D3, and the distance D4.
- 5. The method according to claim 3, wherein the first microphone array is integrated with a first sound emitter, the camera is integrated with a fourth sound emitter and a third microphone array, and the determining the location of the camera comprises: determining, by the control device, an azimuth θ6 of the first sound emitter relative to the third microphone array based on data detected by the third microphone array when the first sound emitter emits a sound signal, and determining an azimuth θ7 of the fourth sound emitter relative to the first microphone array based on data detected by the first microphone array when the fourth sound emitter emits a sound signal; and determining, by the control device, a deviation angle of the camera based on the azimuth θ6, the azimuth θ7, and the orientation of the first microphone array.
- 6. The method according to claim 3, wherein the first microphone array is integrated with a light emitter, the camera is integrated with a fourth sound emitter, and the determining the location of the camera comprises: determining, by the control device, a location of a light emitting point in an image shot by the camera, wherein the image is shot when the light emitter emits light, and determining an azimuth θ9 of the light emitter relative to the camera based on the location of the light emitting point in the image and a rotation angle of the camera; determining, by the control device, an azimuth θ7 of the fourth sound emitter relative to the first microphone array based on data detected by the first microphone array when the fourth sound emitter emits a sound signal; and determining, by the control device, an orientation of the camera based on the azimuth θ9, the azimuth θ7, and the orientation of the first microphone array.
- 7. The method according to claim 1, wherein the first microphone array is integrated with a first sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining a location of the first microphone array comprises: determining, by the control device, a distance D5 between the first sound emitter and the second microphone array and an azimuth θ10 of the first sound emitter relative to the second microphone array based on data detected by the second microphone array when the first sound emitter emits a sound signal; and determining, by the control device, the location of the first microphone array based on the distance D5, the azimuth θ10, and the location of the second microphone array.
- 8. The method according to claim 1, wherein the first microphone array is integrated with a first sound emitter, the second microphone array is integrated with a fifth sound emitter, and the determining the location of the first microphone array comprises: determining, by the control device, an azimuth θ10 of the first sound emitter relative to the second microphone array based on data detected by the second microphone array when the first sound emitter emits a sound signal, and determining an azimuth θ11 of the fifth sound emitter relative to the first microphone array based on data detected by the first microphone array when the fifth sound emitter emits a sound signal; and determining, by the control device, an orientation of the first microphone array based on the azimuth θ10, the azimuth θ11, and an orientation of the second microphone array.
- 9. The method according to claim 1, wherein the camera is integrated with a fourth sound emitter, and the method further comprises: determining, by the control device, a distance D6 between the first microphone array and the fourth sound emitter and a distance D7 between the second microphone array and the fourth sound emitter based on a time at which the first microphone array and the second microphone array receive a sound signal from the fourth sound emitter and a time at which the fourth sound emitter emits the sound signal; and determining, by the control device, the location of the camera based on the location of the first microphone array, the location of the second microphone array, the distance D6, and the distance D7.
- 10. The method according to claim 1, wherein the determining, by the control device, the tracking operation on the camera based on the location of the sound source object and the location of the camera comprises: determining, by the control device, an azimuth of the sound source object relative to the camera and a distance between the sound source object and the camera based on the location of the sound source object and the location of the camera; and determining, by the control device, a tracking rotation angle of the camera based on the azimuth of the sound source object relative to the camera, and determining a tracking focal length of the camera based on the distance between the sound source object and the camera.
- 11. The method according to claim 1, wherein the tracking control system further comprises another camera, and the determining, by the control device, the tracking operation on the camera based on the location of the sound source object and the location of the camera comprises: determining, by the control device based on the location of the sound source object and locations of the camera and the another camera, a target camera that is among the camera and the another camera and that is farther away from the sound source object, and determining a tracking operation on the target camera based on the location of the sound source object and the location of the target camera.
- 12. A computing device, wherein the computing device is applied to a tracking control system, the tracking control system comprises a first microphone array, a second microphone array, a camera, and a control device, wherein the computing device comprises a memory and a processor, the memory is configured to store computer instructions, and the processor is configured to execute the computer instructions stored in the memory, so that the computing device performs operations comprising: determining a location of the first microphone array and a location of the camera; based on a sound source object making a sound, determining a location of the sound source object based on a location of the sound source object relative to the first microphone array, a location of the sound source object relative to the second microphone array, the location of the first microphone array, and a location of the second microphone array; and determining a tracking operation on the camera based on the location of the sound source object and the location of the camera.
- 13. The computing device according to claim 12, wherein the first microphone array is integrated with a first sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining the location of the first microphone array comprises: determining a distance D1 between the first sound emitter and the first microphone and a distance D2 between the first sound emitter and the second microphone based on a time at which the first microphone and the second microphone receive a sound signal from the first sound emitter and a time at which the first sound emitter emits the sound signal; and determining a location of the first microphone array relative to the second microphone array based on a location of the first microphone, a location of the second microphone, the distance D1, and the distance D2.
- 14. The computing device according to claim 12, wherein the tracking control system further comprises a second sound emitter and a third sound emitter, the second sound emitter and the third sound emitter are integrated on a same electronic screen as the second microphone array, and the determining the location of the first microphone array comprises: obtaining an azimuth θ3 of the second sound emitter relative to the first microphone array and an azimuth θ4 of the third sound emitter relative to the first microphone array that are sent by the first microphone array; and determining an orientation of the first microphone array based on the azimuth θ3, the azimuth θ4, a location of the second sound emitter, and a location of the third sound emitter.
- 15. The computing device according to claim 12, wherein the camera is integrated with a fourth sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining the location of the camera comprises: determining a distance D3 between the first microphone and the fourth sound emitter and a distance D4 between the second microphone and the fourth sound emitter based on a time at which the first microphone and the second microphone receive a sound signal from the fourth sound emitter and a time at which the fourth sound emitter emits the sound signal; and determining a location of the camera relative to the second microphone array based on a location of the first microphone, a location of the second microphone, the distance D3, and the distance D4.
- 16. The computing device according to claim 14, wherein the first microphone array is integrated with a first sound emitter, the camera is integrated with a fourth sound emitter and a third microphone array, and the determining the location of the camera comprises: determining an azimuth θ6 of the first sound emitter relative to the third microphone array based on data detected by the third microphone array when the first sound emitter emits a sound signal, and determining an azimuth θ7 of the fourth sound emitter relative to the first microphone array based on data detected by the first microphone array when the fourth sound emitter emits a sound signal; and determining a deviation angle of the camera based on the azimuth θ6, the azimuth θ7, and the orientation of the first microphone array.
- 17. The computing device according to claim 14, wherein the first microphone array is integrated with a light emitter, the camera is integrated with a fourth sound emitter, and the determining the location of the camera comprises: determining a location of a light emitting point in an image shot by the camera, wherein the image is shot when the light emitter emits light, and determining an azimuth θ9 of the light emitter relative to the camera based on the location of the light emitting point in the image and a rotation angle of the camera; determining an azimuth θ7 of the fourth sound emitter relative to the first microphone array based on data detected by the first microphone array when the fourth sound emitter emits a sound signal; and determining an orientation of the camera based on the azimuth θ9, the azimuth θ7, and the orientation of the first microphone array.
- 18. A non-transitory computer-readable storage medium, wherein the computer-readable storage medium stores computer program code, and when the computer program code is executed by a computing device, the computing device performs operations applied to a tracking control system, the tracking control system comprises a first microphone array, a second microphone array, a camera, and a control device, and the operations comprise: determining a location of the first microphone array and a location of the camera; based on a sound source object making a sound, determining a location of the sound source object based on a location of the sound source object relative to the first microphone array, a location of the sound source object relative to the second microphone array, the location of the first microphone array, and a location of the second microphone array; and determining a tracking operation on the camera based on the location of the sound source object and the location of the camera.
- 19. The computer-readable storage medium according to claim 18, wherein the first microphone array is integrated with a first sound emitter, the second microphone array comprises a first microphone and a second microphone, and the determining the location of the first microphone array comprises: determining a distance D1 between the first sound emitter and the first microphone and a distance D2 between the first sound emitter and the second microphone based on a time at which the first microphone and the second microphone receive a sound signal from the first sound emitter and a time at which the first sound emitter emits the sound signal; and determining a location of the first microphone array relative to the second microphone array based on a location of the first microphone, a location of the second microphone, the distance D1, and the distance D2.
- 20. The computer-readable storage medium according to claim 18, wherein the tracking control system further comprises a second sound emitter and a third sound emitter, the second sound emitter and the third sound emitter are integrated on a same electronic screen as the second microphone array, and the determining the location of the first microphone array comprises: obtaining an azimuth θ3 of the second sound emitter relative to the first microphone array and an azimuth θ4 of the third sound emitter relative to the first microphone array that are sent by the first microphone array; and determining an orientation of the first microphone array based on the azimuth θ3, the azimuth θ4, a location of the second sound emitter, and a location of the third sound emitter.
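Claims 2, 9, 13, and 19 locate a device by converting the travel time of a sound signal into distances to two known microphones and then intersecting the resulting range circles. The patent excerpt gives no formulas, so the following is a minimal hypothetical 2-D sketch: the speed-of-sound constant, the function names, and the choice of which circle intersection to keep are all illustrative assumptions, not the claimed method.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value at room temperature (assumption)

def distance_from_time_of_flight(t_emit, t_receive):
    """Distance implied by a sound signal's travel time (cf. claims 2, 9, 13)."""
    return SPEED_OF_SOUND * (t_receive - t_emit)

def locate_emitter(mic1, mic2, d1, d2):
    """Intersect two circles centred on the microphones with radii d1 and d2.

    There are generally two mirror-image solutions; this sketch assumes the
    emitter lies on the side of the microphone baseline with the larger y
    coordinate, which a real system would resolve from its geometry.
    """
    (x1, y1), (x2, y2) = mic1, mic2
    d = math.hypot(x2 - x1, y2 - y1)          # baseline length between mics
    if d == 0 or d > d1 + d2 or d < abs(d1 - d2):
        raise ValueError("circles do not intersect")
    a = (d1**2 - d2**2 + d**2) / (2 * d)      # foot of emitter along baseline
    h = math.sqrt(max(d1**2 - a**2, 0.0))     # perpendicular offset
    xm = x1 + a * (x2 - x1) / d
    ym = y1 + a * (y2 - y1) / d
    p1 = (xm + h * (y2 - y1) / d, ym - h * (x2 - x1) / d)
    p2 = (xm - h * (y2 - y1) / d, ym + h * (x2 - x1) / d)
    return p1 if p1[1] > p2[1] else p2
```

For example, with microphones at (0, 0) and (2, 0) and both measured distances equal to √2, the sketch places the emitter at (1, 1).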
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of International Application No. PCT/CN2022/105499, filed on Jul. 13, 2022, which claims priority to Chinese Patent Application No. 202111415949.4, filed on Nov. 25, 2021, and Chinese Patent Application No. 202210119348.7, filed on Feb. 8, 2022. All of the aforementioned patent applications are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

This application relates to the field of communication technologies, and in particular, to a tracking control method and apparatus, a storage medium, and a computer program product.

BACKGROUND

Tracking means that a camera is controlled, based on a real-time shooting requirement, to shoot a key object (a person or an object) in a scene and output a video image during video shooting. For example, in a video conference, the camera may be controlled to shoot the current speaker and, when the speaker changes, to shoot the new speaker. In a tracking process, to obtain a video image that includes the key object, the shooting direction of the camera may be adjusted, a video image may be selected from the video images of a plurality of cameras, or a part of a video image may be cropped out.

With the development of computer technology, automatic tracking has advanced rapidly and is gradually replacing manual tracking. A typical automatic tracking process is as follows: a control device analyzes the video image shot by the camera in real time, recognizes an object (the foregoing key object) having a specified feature in the image, and controls the camera to shoot that object. For example, in a conference scenario, the control device may recognize a person who is standing or whose mouth is moving (speaking) in the real-time video image, determine that person to be the speaker, and then control the camera to shoot a close-up of the speaker for playback.
However, automatic tracking methods in the conventional technology have obvious limitations, and tracking accuracy is sometimes poor.

SUMMARY

Embodiments of this application provide a tracking control method to resolve the problem of poor tracking accuracy in the conventional technology. The technical solutions are as follows.

According to a first aspect, a tracking control method is provided. The method is applied to a tracking control system that includes a first microphone array, a second microphone array, a camera, and a control device. The method includes: the control device determines a location of the first microphone array and a location of the camera; when a sound source object makes a sound, the control device determines a location of the sound source object based on a location of the sound source object relative to the first microphone array, a location of the sound source object relative to the second microphone array, the location of the first microphone array, and a location of the second microphone array; and the control device determines a tracking operation on the camera based on the location of the sound source object and the location of the camera.

When a speaker speaks, each microphone in the first microphone array may detect corresponding audio data, and the first microphone array sends the audio data to the control device. The control device may perform sound source localization based on the audio data and determine an azimuth θ1 of the speaker relative to the first microphone array. The algorithm used for sound source localization may be a steered-response power (SRP) algorithm or the like. Similarly, the control device may perform sound source localization based on the audio data detected by the microphones in the second microphone array and determine an azimuth θ2 of the speaker relative to the second microphone array.
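The excerpt names steered-response power (SRP) as one possible localization algorithm but does not describe it. As a much simpler stand-in, the sketch below derives a far-field azimuth from the arrival-time difference at a single two-microphone pair; the function name, the broadside angle convention, and the far-field assumption itself are illustrative choices, not the patent's algorithm.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed constant)

def far_field_azimuth(delta_t, mic_spacing):
    """Azimuth of a distant source from the inter-microphone arrival delay.

    For a microphone pair with spacing mic_spacing (metres) and a measured
    time difference delta_t (seconds), the far-field approximation gives
    sin(theta) = c * delta_t / mic_spacing, with theta measured from the
    array's broadside direction. Returns theta in degrees.
    """
    s = SPEED_OF_SOUND * delta_t / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against measurement noise
    return math.degrees(math.asin(s))
```

A zero delay yields 0 degrees (source straight ahead), while a delay equal to the spacing divided by the speed of sound yields 90 degrees (source along the microphone axis).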
When the deviation angles of the first microphone array and the second microphone array are both 0 degrees, the control device may calculate the location of the speaker based on the azimuth θ1, the azimuth θ2, the location of the first microphone array, the location of the second microphone array, and the geometric relationship between the two microphone arrays and the speaker. When the deviation angles are not both 0 degrees, the control device may calculate the location of the speaker based on the deviation angle γ1 of the first microphone array, the deviation angle γ2 of the second microphone array, the azimuth θ1, the azimuth θ2, the locations of the two microphone arrays, and the same geometric relationship. After determining the location of the speaker, the control device may calculate an azimuth of the speaker relative to the camera and a distance between the speaker and the camera.
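The two-azimuth calculation described above amounts to intersecting two bearing rays, one from each microphone array. The sketch below is a hypothetical 2-D illustration, not the patent's exact geometry: the counter-clockwise-from-x-axis angle convention and the function names are assumptions, and a nonzero deviation angle γ would simply be added to the measured azimuth before the rays are intersected.

```python
import math

def triangulate(p1, theta1_deg, p2, theta2_deg):
    """Intersect two bearing rays to locate the sound source.

    p1 and p2 are the (x, y) locations of the two microphone arrays;
    theta1_deg and theta2_deg are the azimuths of the source measured at
    each array, taken counter-clockwise from the positive x axis (an
    assumed convention; the excerpt does not specify one).
    """
    t1 = math.radians(theta1_deg)
    t2 = math.radians(theta2_deg)
    d1 = (math.cos(t1), math.sin(t1))   # unit direction of ray from p1
    d2 = (math.cos(t2), math.sin(t2))   # unit direction of ray from p2
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        raise ValueError("bearings are parallel; no unique intersection")
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    s = (dx * d2[1] - dy * d2[0]) / denom   # distance along the first ray
    return (p1[0] + s * d1[0], p1[1] + s * d1[1])
```

For example, arrays at (0, 0) and (2, 0) reporting azimuths of 45 and 135 degrees respectively place the speaker at (1, 1); the camera's pan angle and focal length could then be derived from that point and the camera's own location, as the excerpt describes.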