EP-4738828-A1 - ENCODING DEVICE, DECODING DEVICE, ENCODING METHOD, AND DECODING METHOD
Abstract
In the present invention, an encoding device acquires a first three-dimensional data generation model corresponding to a first timepoint and a second three-dimensional data generation model corresponding to a second timepoint, and generates a bit stream by encoding the acquired first three-dimensional data generation model and second three-dimensional data generation model. A decoding device acquires a bit stream, and decodes, from the bit stream, a first three-dimensional data generation model corresponding to a first timepoint and a second three-dimensional data generation model corresponding to a second timepoint. When viewpoint information including a viewpoint and a line-of-sight direction is inputted, the first three-dimensional data generation model and the second three-dimensional data generation model each output a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.
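The viewpoint-conditioned behavior described in the abstract — a per-time generative model that, given viewpoint information (a viewpoint and a line-of-sight direction), outputs a two-dimensional image of the subject — can be sketched as follows. The patent does not specify the model architecture; this toy stand-in (the names `ViewpointModel` and `render` are illustrative, not from the patent) merely fills an image deterministically from the inputs so the interface and data flow are visible.

```python
import numpy as np

# Illustrative sketch only: a stand-in for a three-dimensional data
# generative model corresponding to one time instant.  A real model
# (e.g., a neural network as in claim 2) would synthesize the subject;
# here we just produce a deterministic image of the right shape.
class ViewpointModel:
    def __init__(self, time, height=4, width=4):
        self.time = time          # the time this model corresponds to
        self.height = height
        self.width = width

    def render(self, viewpoint, direction):
        """Return an (H, W, 3) image of the subject as viewed from
        `viewpoint` along `direction` (the viewpoint information)."""
        v = np.asarray(viewpoint, dtype=float)
        d = np.asarray(direction, dtype=float)
        d = d / np.linalg.norm(d)   # normalize the line-of-sight direction
        base = (v.sum() + d.sum() + self.time) % 1.0
        return np.full((self.height, self.width, 3), base)

# One model per time instant, as in claim 1 (first time / second time).
model_t1 = ViewpointModel(time=0.0)
img = model_t1.render(viewpoint=[0.0, 0.0, 1.0], direction=[0.0, 0.0, -1.0])
```

In the scheme claimed, the encoder transmits the models themselves rather than rendered frames, and the decoder queries the decoded models with arbitrary viewpoint information.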
Inventors
- SUGIO, TOSHIYASU
- IGUCHI, NORITAKA
- NISHI, TAKAHIRO
Assignees
- Panasonic Intellectual Property Corporation of America
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-06-25
Claims (20)
- An encoding device comprising: circuitry; and memory coupled to the circuitry, wherein in operation, the circuitry: obtains a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time; and generates a bitstream by encoding the first three-dimensional data generative model obtained and the second three-dimensional data generative model obtained, and when receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.
- The encoding device according to claim 1, wherein each of the first three-dimensional data generative model and the second three-dimensional data generative model is a learning model using a neural network.
- The encoding device according to claim 1, wherein the bitstream includes first time information indicating the first time and second time information indicating the second time.
- The encoding device according to claim 3, wherein the bitstream includes a first frame number corresponding to the first time and a second frame number corresponding to the second time.
- The encoding device according to any one of claims 1 to 4, wherein the bitstream includes frame rate information regarding a frame rate of a plurality of training images used to generate the first three-dimensional data generative model and the second three-dimensional data generative model, and the plurality of training images are two-dimensional images obtained by capturing the subject at different points in time.
- The encoding device according to any one of claims 1 to 4, wherein the bitstream includes viewpoint information including a viewpoint and a line-of-sight direction for a plurality of training images used to generate the first three-dimensional data generative model and the second three-dimensional data generative model.
- The encoding device according to claim 6, wherein the plurality of training images are two-dimensional images obtained by capturing the subject from mutually different viewpoints and mutually different line-of-sight directions, and the viewpoint information includes the mutually different viewpoints and the mutually different line-of-sight directions.
- The encoding device according to any one of claims 1 to 4, wherein in encoding the second three-dimensional data generative model, the circuitry calculates difference information indicating a difference between the first three-dimensional data generative model and the second three-dimensional data generative model, and the bitstream includes the difference information.
- The encoding device according to claim 8, wherein the difference includes a difference between a weight parameter associated with a node included in the first three-dimensional data generative model and a weight parameter associated with a node included in the second three-dimensional data generative model.
- The encoding device according to claim 8, wherein the bitstream includes reference information indicating that the difference information has been calculated with reference to the first three-dimensional data generative model.
- The encoding device according to any one of claims 1 to 4, wherein the first time corresponds to a random access point, and the first three-dimensional data generative model is encoded using intra prediction or using inter prediction with a predicted value of 0.
- The encoding device according to claim 11, wherein the first three-dimensional data generative model and the second three-dimensional data generative model are included in one group among a plurality of groups, and the first three-dimensional data generative model is placed first in data order of three-dimensional data generative models included in the one group.
- The encoding device according to claim 12, wherein in encoding each of the three-dimensional data generative models, the bitstream includes permission information indicating whether referring to another three-dimensional data generative model included in a different group is allowed for the three-dimensional data generative model.
- The encoding device according to any one of claims 1 to 4, wherein the first three-dimensional data generative model corresponds to a first period including the first time, and the second three-dimensional data generative model corresponds to a second period including the second time.
- The encoding device according to claim 14, wherein a plurality of first training images used to generate the first three-dimensional data generative model are two-dimensional images obtained by capturing the subject at different points in time during the first period.
- The encoding device according to claim 14, wherein when receiving a time included in the first period, the first three-dimensional data generative model outputs a two-dimensional image of the subject captured at the time received.
- The encoding device according to claim 14, wherein the bitstream includes count information indicating a maximum number of images to be generated by the first three-dimensional data generative model.
- The encoding device according to claim 15, wherein the bitstream includes first information regarding the plurality of first training images, and the first information includes a plurality of viewpoints, a plurality of line-of-sight directions, and a plurality of points in time, corresponding to the plurality of first training images.
- The encoding device according to claim 14, wherein the first period or the second period is dynamically determined according to the subject.
- The encoding device according to any one of claims 1 to 4, wherein the circuitry further: stores, in the memory, the first three-dimensional data generative model generated; and generates the second three-dimensional data generative model based on the first three-dimensional data generative model stored in the memory.
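Claims 8 to 10 describe encoding the second model as difference information relative to the first. A minimal sketch of that idea, under the assumption that each model's parameters are exposed as named weight tensors (quantization and entropy coding, which an actual bitstream would apply, are omitted; the function names are illustrative):

```python
import numpy as np

# Difference information per claims 8-10: encode the second model as the
# element-wise difference of its weight parameters from the first
# (reference) model, then reconstruct by adding the difference back.
def encode_delta(weights_ref, weights_cur):
    """Difference information: current minus reference, per weight tensor."""
    return {k: weights_cur[k] - weights_ref[k] for k in weights_cur}

def decode_delta(weights_ref, delta):
    """Reconstruct the second model from the reference plus the delta."""
    return {k: weights_ref[k] + delta[k] for k in delta}

# Toy weights for the first and second three-dimensional data generative
# models; successive models typically differ only slightly, so the delta
# is small and compresses well.
w1 = {"layer0": np.array([0.10, -0.20]), "layer1": np.array([0.50])}
w2 = {"layer0": np.array([0.12, -0.18]), "layer1": np.array([0.49])}

delta = encode_delta(w1, w2)
w2_rec = decode_delta(w1, delta)
```

Per claim 10, the bitstream would also carry reference information identifying the first model as the basis of the delta, so the decoder knows which stored model to add the difference to.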
Description
[Technical Field]
The present disclosure relates to an encoding device, a decoding device, an encoding method, and a decoding method.

[Background Art]
Devices and services utilizing three-dimensional data are expected to find widespread use in fields such as computer vision that enables autonomous operation of cars or robots, map information, monitoring, infrastructure inspection, and video distribution. Three-dimensional data is obtained through various means, including a distance sensor such as a rangefinder, a stereo camera, or a combination of a plurality of monocular cameras.

Methods of representing three-dimensional data include the point cloud scheme, which represents the shape of a three-dimensional structure by a point cloud in a three-dimensional space. In the point cloud scheme, the positions and colors of the points are stored. While point clouds are expected to become a mainstream method of representing three-dimensional data, the massive amount of data in a point cloud makes it necessary to compress the three-dimensional data by encoding for storage and transmission, as in the case of two-dimensional moving pictures (examples include Moving Picture Experts Group-4 Advanced Video Coding (MPEG-4 AVC) and High Efficiency Video Coding (HEVC), standardized by MPEG). Point cloud compression is partially supported by, for example, an open-source library for point cloud processing (Point Cloud Library). Furthermore, a technique for searching for and displaying a facility located in the surroundings of a vehicle by using three-dimensional map data is known (see, for example, Patent Literature (PTL) 1).
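The point cloud scheme described above stores, for each point, a three-dimensional position and a color. One plausible in-memory layout (purely illustrative; the patent and the Point Cloud Library define their own formats) is a pair of parallel arrays, whose raw size motivates the compression discussed in the background:

```python
import numpy as np

# Parallel arrays: one row per point.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.5]], dtype=np.float32)  # x, y, z per point
colors = np.array([[255, 0, 0],
                   [0, 255, 0],
                   [0, 0, 255]], dtype=np.uint8)           # R, G, B per point

num_points = positions.shape[0]
# Uncompressed size: 12 bytes of position + 3 bytes of color per point.
raw_bytes = positions.nbytes + colors.nbytes
```

At 15 bytes per point, a scan with hundreds of millions of points reaches gigabytes uncompressed, which is why encoding-based compression is needed for storage and transmission.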
[Citation List]
[Patent Literature]
[PTL 1] International Publication WO 2014/020663
[Non Patent Literature]
[NPL 1] ISO/IEC 15938-17:2022 (Information technology - Multimedia content description interface - Part 17: Compression of neural networks for multimedia content description and analysis) (https://www.iso.org/standard/78480.html)

[Summary of Invention]
[Technical Problem]
An object of the present disclosure is to provide an encoding device or the like that can reduce the amount of data from which a moving image from an arbitrary viewpoint is obtained.

[Solution to Problem]
An encoding device according to one aspect of the present disclosure includes circuitry and memory coupled to the circuitry. In operation, the circuitry: obtains a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time; and generates a bitstream by encoding the first three-dimensional data generative model obtained and the second three-dimensional data generative model obtained. When receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.

A decoding device according to one aspect of the present disclosure includes circuitry and memory coupled to the circuitry.
In operation, the circuitry: obtains a bitstream; and decodes, from the bitstream, a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time. When receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.

It is to be noted that these general or specific aspects may be implemented as a system, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or as any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.

[Advantageous Effects of Invention]
A decoding device and the like according to the present disclosure are capable of outputting three-dimensional data with different resolutions.

[Brief Description of Drawings]
[FIG. 1] FIG. 1 is a diagram illustrating a configuration example of a three-dimensional data encoding and decoding system in Embodiment 1.
[FIG. 2] FIG. 2 is a diagram illustrating an example of point cloud data in Embodiment 1.
[FIG. 3] FIG. 3 is a diagram illustrating a configuration example of a data file describing information of the point cloud data in Embodiment 1.
[FIG. 4] FIG. 4 is a diagram illustrating the configuration of three-dimensional mesh data in Embodiment 1.
[FIG. 5] FIG. 5 is a diagram illustrating a configuration example of a data file describing information of the three-dimensional mesh data in Embodiment 1.
[FIG. 6] FIG. 6 is a dia