US-12621627-B2 - Signal processing device, signal processing method, and program
Abstract
The present technology relates to a signal processing device, signal processing method, and program capable of providing a higher realistic feeling. A signal processing device includes: an acquisition unit that acquires audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object; and a signal generation unit that generates a reproduction signal for reproducing a sound of the audio object at a listening position on the basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data. The present technology is applicable to a transmission reproduction system.
Inventors
- Ryuichi Namba
- Makoto Akune
- Keiichi Aoyama
- Yoshiaki Oikawa
Assignees
- Sony Group Corporation
Dates
- Publication Date: 2026-05-05
- Application Date: 2024-05-16
- Priority Date: 2019-06-21
Claims (20)
- 1 . A signal processing device comprising: an audio data acquirer that acquires audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object; and a signal generator that generates a reproduction signal for reproducing a sound of the audio object at a listening position on a basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data, wherein the signal generator generates the reproduction signal on a basis of directional characteristic data indicating a directional characteristic of the audio object, the listening position information, the listener direction information, the position information, the direction information, and the audio data; and the directional characteristic data has a value of an ID indicating the audio object type, an azimuth angle and elevation angle indicating a direction viewed from the audio object, and a distance from the audio object as arguments of function.
- 2 . The signal processing device according to claim 1 , wherein the audio data acquirer acquires the metadata at predetermined time intervals.
- 3 . The signal processing device according to claim 1 , wherein the signal generator generates the reproduction signal on a basis of the directional characteristic data determined for a type of the audio object.
- 4 . The signal processing device according to claim 1 , wherein the direction information includes an azimuth angle indicating the direction of the audio object.
- 5 . The signal processing device according to claim 1 , wherein the direction information includes an azimuth angle and elevation angle indicating the direction of the audio object.
- 6 . The signal processing device according to claim 1 , wherein the direction information includes an azimuth angle and elevation angle indicating the direction of the audio object and a tilt angle indicating rotation of the audio object.
- 7 . The signal processing device according to claim 1 , wherein the listening position information indicates the listening position that is determined in advance and is fixed, and the listener direction information indicates the direction of the listener that is determined in advance and is fixed.
- 8 . The signal processing device according to claim 7 , wherein the position information includes an azimuth angle and elevation angle indicating the direction of the audio object viewed from the listening position and a radius indicating a distance from the listening position to the audio object.
- 9 . The signal processing device according to claim 1 , wherein the listening position information indicates the listening position that is arbitrarily determined, and the listener direction information indicates the direction of the listener that is arbitrarily determined.
- 10 . The signal processing device according to claim 9 , wherein the position information is coordinates of an orthogonal coordinate system indicating the position of the audio object.
- 11 . The signal processing device according to claim 1 , wherein the signal generator generates the reproduction signal on a basis of the directional characteristic data, relative distance information obtained on a basis of the listening position information and the position information and indicating a relative distance between the audio object and the listening position, relative direction information obtained on a basis of the listening position information, the listener direction information, the position information, and the direction information and indicating a relative direction between the audio object and the listener, and the audio data.
- 12 . The signal processing device according to claim 11 , wherein the relative direction information includes an azimuth angle and elevation angle indicating the relative direction between the audio object and the listener.
- 13 . The signal processing device according to claim 11 , wherein the relative direction information includes information indicating the direction of the listener viewed from the audio object and information indicating the direction of the audio object viewed from the listener.
- 14 . The signal processing device according to claim 13 , wherein the signal generator generates the reproduction signal on a basis of information indicating a transfer characteristic of the direction of the listener viewed from the audio object, the information being obtained on a basis of the directional characteristic data and the information indicating the direction of the listener viewed from the audio object.
- 15 . A signal processing method comprising: causing a signal processing device to acquire audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object, and generate a reproduction signal for reproducing a sound of the audio object at a listening position on a basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data, wherein the reproduction signal is generated on a basis of directional characteristic data indicating a directional characteristic of the audio object, the listening position information, the listener direction information, the position information, the direction information, and the audio data; and the directional characteristic data has a value of an ID indicating the audio object type, an azimuth angle and elevation angle indicating a direction viewed from the audio object, and a distance from the audio object as arguments of function.
- 16 . The signal processing device according to claim 15 , wherein the reproduction signal is generated on a basis of the directional characteristic data, relative distance information obtained on a basis of the listening position information and the position information and indicating a relative distance between the audio object and the listening position, relative direction information obtained on a basis of the listening position information, the listener direction information, the position information, and the direction information and indicating a relative direction between the audio object and the listener, and the audio data.
- 17 . The signal processing device according to claim 16 , wherein the relative direction information includes an azimuth angle and elevation angle indicating the relative direction between the audio object and the listener.
- 18 . A non-transitory computer readable medium storing instructions that, when executed by a computer, cause the computer to execute the processes of: acquiring audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object; and generating a reproduction signal for reproducing a sound of the audio object at a listening position on a basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data, wherein the reproduction signal is generated on a basis of directional characteristic data indicating a directional characteristic of the audio object, the listening position information, the listener direction information, the position information, the direction information, and the audio data; and the directional characteristic data has a value of an ID indicating the audio object type, an azimuth angle and elevation angle indicating a direction viewed from the audio object, and a distance from the audio object as arguments of function.
- 19 . The signal processing device according to claim 18 , wherein the reproduction signal is generated on a basis of the directional characteristic data, relative distance information obtained on a basis of the listening position information and the position information and indicating a relative distance between the audio object and the listening position, relative direction information obtained on a basis of the listening position information, the listener direction information, the position information, and the direction information and indicating a relative direction between the audio object and the listener, and the audio data.
- 20 . The signal processing device according to claim 19 , wherein the relative direction information includes an azimuth angle and elevation angle indicating the relative direction between the audio object and the listener.
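The relative distance, the relative direction, and the directional characteristic data recited in claims 1 and 11 to 14 can be illustrated with a minimal sketch. It is not the claimed implementation. Everything named here is an assumption made for illustration: the function names, the `DIRECTIVITY_WEIGHT` table, the cardioid-style pattern, the 1/distance attenuation, the orthogonal (x, y, z) coordinates, and the simplification that only the azimuth of the audio object's direction is used.

```python
import math

# Hypothetical table: object-type ID -> directional weight
# (0.0 = omnidirectional, 1.0 = cardioid). Illustrative values only,
# not data from the patent.
DIRECTIVITY_WEIGHT = {0: 0.0, 1: 1.0}


def relative_distance(listening_pos, object_pos):
    """Euclidean distance between the listening position and the audio object."""
    return math.dist(listening_pos, object_pos)


def direction_from(origin, target, facing_azimuth_deg=0.0):
    """Azimuth and elevation (degrees) of `target` viewed from `origin`.

    Azimuth is measured in the x-y plane relative to the viewer's facing
    azimuth; elevation is measured from the x-y plane. Only the viewer's
    azimuth is taken into account (simplifying assumption).
    """
    dx, dy, dz = (t - o for t, o in zip(target, origin))
    azimuth = math.degrees(math.atan2(dy, dx)) - facing_azimuth_deg
    azimuth = (azimuth + 180.0) % 360.0 - 180.0  # wrap into [-180, 180)
    elevation = math.degrees(math.atan2(dz, math.hypot(dx, dy)))
    return azimuth, elevation


def directivity_gain(type_id, azimuth_deg, elevation_deg, distance):
    """Hypothetical directional characteristic with (ID, azimuth, elevation, distance) as arguments."""
    w = DIRECTIVITY_WEIGHT[type_id]
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    # Cosine of the angle between the object's front and the given direction.
    cos_angle = math.cos(az) * math.cos(el)
    pattern = (1.0 - 0.5 * w) + 0.5 * w * cos_angle
    # Clamp the distance so the gain does not blow up near the source.
    return pattern / max(distance, 1.0)


def reproduction_gain(type_id, object_pos, object_azimuth_deg, listening_pos):
    """Gain applied to the audio data for a listener at `listening_pos`."""
    dist = relative_distance(listening_pos, object_pos)
    # Direction of the listener viewed from the audio object.
    az, el = direction_from(object_pos, listening_pos, object_azimuth_deg)
    return directivity_gain(type_id, az, el, dist)
```

In this sketch, an object of the hypothetical type 1 that faces the listener yields a larger gain than the same object turned away from the listener, whereas type 0 yields the same gain in both cases. The direction of the audio object viewed from the listener, which claim 13 also recites, could be obtained from the same helper as `direction_from(listening_pos, object_pos, listener_azimuth_deg)`.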
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims the benefit under 35 U.S.C. § 120 as a continuation application of U.S. application Ser. No. 17/619,179, filed on Dec. 14, 2021, now U.S. Pat. No. 11,997,472, which claims the benefit under 35 U.S.C. § 371 as a U.S. National Stage Entry of International Application No. PCT/JP2020/022787, filed in the Japanese Patent Office as a Receiving Office on Jun. 10, 2020, which claims priority to Japanese Patent Application Number JP2019-115406, filed in the Japanese Patent Office on Jun. 21, 2019, each of which is hereby incorporated by reference in its entirety.
TECHNICAL FIELD
The present technology relates to a signal processing device, signal processing method, and program, and more particularly relates to a signal processing device, signal processing method, and program capable of providing a higher realistic feeling.
BACKGROUND ART
For example, in order to reproduce a sound field from a free viewpoint such as a bird's-eye view or a walk-through, it is important to record a target sound such as a voice of a person, a motion sound of a player such as a ball kicking sound in sports, or a musical instrument sound in music at a signal to noise ratio (SNR) as high as possible. Further, at the same time, it is necessary to reproduce a sound with accurate localization for each sound source of the target sound and to cause sound image localization and the like to follow movement of a viewpoint or the sound source.
By the way, a technology capable of providing a higher realistic feeling in a free-viewpoint or fixed-viewpoint content has been desired, and a large number of such technologies have been proposed.
For example, as a technology regarding reproduction of a sound field from a free viewpoint, there is proposed a technology for, in a case where a user can freely designate a listening position, performing gain correction and frequency characteristic correction in accordance with a distance from a changed listening position to an audio object (see, for example, Patent Document 1).
CITATION LIST
Patent Document
Patent Document 1: WO 2015/107926 A
SUMMARY OF THE INVENTION
Problems to be Solved by the Invention
However, the technology cited above cannot provide a sufficiently high realistic feeling in some cases. For example, a sound source is not a point sound source in the real world, and a sound wave propagates from a sounding body having a size with a specific directional characteristic including reflection and diffraction caused by the sounding body.
A large number of attempts have been made to record a sound field in a target space. Currently, however, even in a case where recording is performed for each sound source, that is, for each audio object, a sufficiently high realistic feeling cannot be obtained in some cases because the direction of each audio object is not considered on the reproduction side.
The present technology has been made in view of such a situation, and an object thereof is to provide a higher realistic feeling.
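The distance-dependent gain correction mentioned in the background art can be pictured with a minimal sketch. It assumes a plain inverse-distance law. The function name `distance_gain_db`, the `min_distance` clamp, and the law itself are assumptions made for illustration and are not taken from Patent Document 1.

```python
import math


def distance_gain_db(original_distance, new_distance, min_distance=0.1):
    """Gain change in dB when the listening position moves.

    Assumes an inverse-distance law (hypothetical choice): the level drops by
    about 6 dB each time the distance to the audio object doubles.
    """
    # Clamp both distances so the logarithm stays finite near the source.
    d0 = max(original_distance, min_distance)
    d1 = max(new_distance, min_distance)
    return 20.0 * math.log10(d0 / d1)
```

The function takes no argument describing the direction of the audio object, so under this assumption a listener in front of a sounding body and a listener behind it at the same distance receive the same correction. That is the limitation the passage above describes.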
Solutions to Problems
A signal processing device according to one aspect of the present technology includes: an acquisition unit that acquires audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object; and a signal generation unit that generates a reproduction signal for reproducing a sound of the audio object at a listening position on the basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data.
A signal processing method or a program according to one aspect of the present technology includes: a step of acquiring audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object; and a step of generating a reproduction signal for reproducing a sound of the audio object at a listening position on the basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction information, and the audio data.
In one aspect of the present technology, audio data of an audio object and metadata including position information indicating a position of the audio object and direction information indicating a direction of the audio object are acquired, and a reproduction signal for reproducing a sound of the audio object at a listening position is generated on the basis of listening position information indicating the listening position, listener direction information indicating a direction of a listener at the listening position, the position information, the direction informat