EP-4738828-A1 - ENCODING DEVICE, DECODING DEVICE, ENCODING METHOD, AND DECODING METHOD
Abstract
In the present invention, an encoding device acquires a first three-dimensional data generation model corresponding to a first timepoint and a second three-dimensional data generation model corresponding to a second timepoint, and generates a bit stream by encoding the acquired first three-dimensional data generation model and second three-dimensional data generation model. A decoding device acquires a bit stream, and decodes, from the bit stream, a first three-dimensional data generation model corresponding to a first timepoint and a second three-dimensional data generation model corresponding to a second timepoint. When viewpoint information including a viewpoint and a line-of-sight direction is inputted, the first three-dimensional data generation model and the second three-dimensional data generation model each output a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.
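The viewpoint-conditioned behavior described in the abstract — a per-time generative model that, given viewpoint information (a viewpoint and a line-of-sight direction), outputs a two-dimensional image of the subject — can be sketched as follows. The patent does not specify the model architecture; this toy stand-in (the names `ViewpointModel` and `render` are illustrative, not from the patent) merely fills an image deterministically from the inputs so the interface and data flow are visible.

```python
import numpy as np

# Illustrative sketch only: a stand-in for a three-dimensional data
# generative model corresponding to one time instant.  A real model
# (e.g., a neural network as in claim 2) would synthesize the subject;
# here we just produce a deterministic image of the right shape.
class ViewpointModel:
    def __init__(self, time, height=4, width=4):
        self.time = time          # the time this model corresponds to
        self.height = height
        self.width = width

    def render(self, viewpoint, direction):
        """Return an (H, W, 3) image of the subject as viewed from
        `viewpoint` along `direction` (the viewpoint information)."""
        v = np.asarray(viewpoint, dtype=float)
        d = np.asarray(direction, dtype=float)
        d = d / np.linalg.norm(d)   # normalize the line-of-sight direction
        base = (v.sum() + d.sum() + self.time) % 1.0
        return np.full((self.height, self.width, 3), base)

# One model per time instant, as in claim 1 (first time / second time).
model_t1 = ViewpointModel(time=0.0)
img = model_t1.render(viewpoint=[0.0, 0.0, 1.0], direction=[0.0, 0.0, -1.0])
```

In the scheme claimed, the encoder transmits the models themselves rather than rendered frames, and the decoder queries the decoded models with arbitrary viewpoint information.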
Inventors
- SUGIO, TOSHIYASU
- IGUCHI, NORITAKA
- NISHI, TAKAHIRO
Assignees
- Panasonic Intellectual Property Corporation of America
Dates
- Publication Date: 2026-05-06
- Application Date: 2024-06-25
Claims (20)
- An encoding device comprising: circuitry; and memory coupled to the circuitry, wherein in operation, the circuitry: obtains a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time; and generates a bitstream by encoding the first three-dimensional data generative model obtained and the second three-dimensional data generative model obtained, and when receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.
- The encoding device according to claim 1, wherein each of the first three-dimensional data generative model and the second three-dimensional data generative model is a learning model using a neural network.
- The encoding device according to claim 1, wherein the bitstream includes first time information indicating the first time and second time information indicating the second time.
- The encoding device according to claim 3, wherein the bitstream includes a first frame number corresponding to the first time and a second frame number corresponding to the second time.
- The encoding device according to any one of claims 1 to 4, wherein the bitstream includes frame rate information regarding a frame rate of a plurality of training images used to generate the first three-dimensional data generative model and the second three-dimensional data generative model, and the plurality of training images are two-dimensional images obtained by capturing the subject at different points in time.
- The encoding device according to any one of claims 1 to 4, wherein the bitstream includes viewpoint information including a viewpoint and a line-of-sight direction for a plurality of training images used to generate the first three-dimensional data generative model and the second three-dimensional data generative model.
- The encoding device according to claim 6, wherein the plurality of training images are two-dimensional images obtained by capturing the subject from mutually different viewpoints and mutually different line-of-sight directions, and the viewpoint information includes the mutually different viewpoints and the mutually different line-of-sight directions.
- The encoding device according to any one of claims 1 to 4, wherein in encoding the second three-dimensional data generative model, the circuitry calculates difference information indicating a difference between the first three-dimensional data generative model and the second three-dimensional data generative model, and the bitstream includes the difference information.
- The encoding device according to claim 8, wherein the difference includes a difference between a weight parameter associated with a node included in the first three-dimensional data generative model and a weight parameter associated with a node included in the second three-dimensional data generative model.
- The encoding device according to claim 8, wherein the bitstream includes reference information indicating that the difference information has been calculated with reference to the first three-dimensional data generative model.
- The encoding device according to any one of claims 1 to 4, wherein the first time corresponds to a random access point, and the first three-dimensional data generative model is encoded using intra prediction or using inter prediction with a predicted value of 0.
- The encoding device according to claim 11, wherein the first three-dimensional data generative model and the second three-dimensional data generative model are included in one group among a plurality of groups, and the first three-dimensional data generative model is placed first in data order of three-dimensional data generative models included in the one group.
- The encoding device according to claim 12, wherein in encoding each of the three-dimensional data generative models, the bitstream includes permission information indicating whether referring to another three-dimensional data generative model included in a different group is allowed for the three-dimensional data generative model.
- The encoding device according to any one of claims 1 to 4, wherein the first three-dimensional data generative model corresponds to a first period including the first time, and the second three-dimensional data generative model corresponds to a second period including the second time.
- The encoding device according to claim 14, wherein a plurality of first training images used to generate the first three-dimensional data generative model are two-dimensional images obtained by capturing the subject at different points in time during the first period.
- The encoding device according to claim 14, wherein when receiving a time included in the first period, the first three-dimensional data generative model outputs a two-dimensional image of the subject captured at the time received.
- The encoding device according to claim 14, wherein the bitstream includes count information indicating a maximum number of images to be generated by the first three-dimensional data generative model.
- The encoding device according to claim 15, wherein the bitstream includes first information regarding the plurality of first training images, and the first information includes a plurality of viewpoints, a plurality of line-of-sight directions, and a plurality of points in time, corresponding to the plurality of first training images.
- The encoding device according to claim 14, wherein the first period or the second period is dynamically determined according to the subject.
- The encoding device according to any one of claims 1 to 4, wherein the circuitry further: stores, in the memory, the first three-dimensional data generative model generated; and generates the second three-dimensional data generative model based on the first three-dimensional data generative model stored in the memory.
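Claims 8 to 10 describe encoding the second model as difference information relative to the first. A minimal sketch of that idea, under the assumption that each model's parameters are exposed as named weight tensors (quantization and entropy coding, which an actual bitstream would apply, are omitted; the function names are illustrative):

```python
import numpy as np

# Difference information per claims 8-10: encode the second model as the
# element-wise difference of its weight parameters from the first
# (reference) model, then reconstruct by adding the difference back.
def encode_delta(weights_ref, weights_cur):
    """Difference information: current minus reference, per weight tensor."""
    return {k: weights_cur[k] - weights_ref[k] for k in weights_cur}

def decode_delta(weights_ref, delta):
    """Reconstruct the second model from the reference plus the delta."""
    return {k: weights_ref[k] + delta[k] for k in delta}

# Toy weights for the first and second three-dimensional data generative
# models; successive models typically differ only slightly, so the delta
# is small and compresses well.
w1 = {"layer0": np.array([0.10, -0.20]), "layer1": np.array([0.50])}
w2 = {"layer0": np.array([0.12, -0.18]), "layer1": np.array([0.49])}

delta = encode_delta(w1, w2)
w2_rec = decode_delta(w1, delta)
```

Per claim 10, the bitstream would also carry reference information identifying the first model as the basis of the delta, so the decoder knows which stored model to add the difference to.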
Description
[Technical Field]
The present disclosure relates to an encoding device, a decoding device, an encoding method, and a decoding method.

[Background Art]
Devices and services utilizing three-dimensional data are expected to find widespread use in fields such as computer vision that enables autonomous operation of cars or robots, map information, monitoring, infrastructure inspection, and video distribution. Three-dimensional data is obtained through various means, including a distance sensor such as a rangefinder, a stereo camera, or a combination of a plurality of monocular cameras.

Methods of representing three-dimensional data include the point cloud scheme, which represents the shape of a three-dimensional structure by a point cloud in a three-dimensional space. In the point cloud scheme, the positions and colors of the points are stored. While point clouds are expected to become a mainstream method of representing three-dimensional data, the massive amount of data in a point cloud makes it necessary to compress the three-dimensional data by encoding for storage and transmission, as in the case of two-dimensional moving pictures (examples include Moving Picture Experts Group-4 Advanced Video Coding (MPEG-4 AVC) and High Efficiency Video Coding (HEVC), standardized by MPEG). Point cloud compression is partially supported by, for example, an open-source library for point cloud processing (Point Cloud Library). Furthermore, a technique for searching for and displaying a facility located in the surroundings of a vehicle by using three-dimensional map data is known (see, for example, Patent Literature (PTL) 1).
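The point cloud scheme described above stores, for each point, a three-dimensional position and a color. One plausible in-memory layout (purely illustrative; the patent and the Point Cloud Library define their own formats) is a pair of parallel arrays, whose raw size motivates the compression discussed in the background:

```python
import numpy as np

# Parallel arrays: one row per point.
positions = np.array([[0.0, 0.0, 0.0],
                      [1.0, 0.0, 0.0],
                      [0.0, 1.0, 0.5]], dtype=np.float32)  # x, y, z per point
colors = np.array([[255, 0, 0],
                   [0, 255, 0],
                   [0, 0, 255]], dtype=np.uint8)           # R, G, B per point

num_points = positions.shape[0]
# Uncompressed size: 12 bytes of position + 3 bytes of color per point.
raw_bytes = positions.nbytes + colors.nbytes
```

At 15 bytes per point, a scan with hundreds of millions of points reaches gigabytes uncompressed, which is why encoding-based compression is needed for storage and transmission.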
[Citation List]
[Patent Literature]
[PTL 1] International Publication WO 2014/020663
[Non Patent Literature]
[NPL 1] ISO/IEC 15938-17:2022 (Information technology - Multimedia content description interface - Part 17: Compression of neural networks for multimedia content description and analysis) (https://www.iso.org/standard/78480.html)

[Summary of Invention]
[Technical Problem]
An object of the present disclosure is to provide an encoding device or the like that can reduce the amount of data from which a moving image from an arbitrary viewpoint is obtained.

[Solution to Problem]
An encoding device according to one aspect of the present disclosure includes circuitry and memory coupled to the circuitry. In operation, the circuitry: obtains a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time; and generates a bitstream by encoding the first three-dimensional data generative model obtained and the second three-dimensional data generative model obtained. When receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.

A decoding device according to one aspect of the present disclosure includes circuitry and memory coupled to the circuitry.
In operation, the circuitry: obtains a bitstream; and decodes, from the bitstream, a first three-dimensional data generative model corresponding to a first time and a second three-dimensional data generative model corresponding to a second time. When receiving viewpoint information including a viewpoint and a line-of-sight direction, each of the first three-dimensional data generative model and the second three-dimensional data generative model outputs a two-dimensional image of a subject as viewed from the viewpoint and the line-of-sight direction.

It is to be noted that these general or specific aspects may be implemented as a system, an integrated circuit, a computer program, or a computer-readable recording medium such as a CD-ROM, or as any combination of a system, a method, an integrated circuit, a computer program, and a recording medium.

[Advantageous Effects of Invention]
A decoding device and the like according to the present disclosure are capable of outputting three-dimensional data with different resolutions.

[Brief Description of Drawings]
[FIG. 1] FIG. 1 is a diagram illustrating a configuration example of a three-dimensional data encoding and decoding system in Embodiment 1.
[FIG. 2] FIG. 2 is a diagram illustrating an example of point cloud data in Embodiment 1.
[FIG. 3] FIG. 3 is a diagram illustrating a configuration example of a data file describing information of the point cloud data in Embodiment 1.
[FIG. 4] FIG. 4 is a diagram illustrating the configuration of three-dimensional mesh data in Embodiment 1.
[FIG. 5] FIG. 5 is a diagram illustrating a configuration example of a data file describing information of the three-dimensional mesh data in Embodiment 1.
[FIG. 6] FIG. 6 is a dia