EP-4242974-B1 - GENERATION OF VIRTUAL VIEWPOINT IMAGES

Inventors

HANDA, MASAHIRO

Dates

Publication Date: 20260506
Application Date: 20230301

Claims (15)

1. A system comprising an image processing apparatus (110), a specific imaging device (102), and a plurality of imaging devices (101) different from the specific imaging device (102), the image processing apparatus (110) comprising: first obtaining means (202) for obtaining a foreground image included in a captured image obtained by the specific imaging device (102) performing image capturing and indicating an area of a foreground; second obtaining means (206) for obtaining a virtual viewpoint image generated based on a plurality of captured images obtained by the plurality of imaging devices (101) performing image capturing and indicating an area of a foreground viewed from a virtual camera; and output means (210) for selectively outputting at least one of a first image generated based on the foreground image and a background image generated by rendering a background model and background texture as viewed by the specific imaging device and a second image generated based on the virtual viewpoint image and a background image generated by rendering the background model and background texture as viewed by the virtual camera.
2. The system according to claim 1, wherein the first obtaining means obtain the foreground image by extracting an area of a foreground included in a captured image obtained by the specific imaging device (102) performing image capturing and generating the foreground image and the second obtaining means obtain the virtual viewpoint image by generating the virtual viewpoint image based on shape data representing a three-dimensional shape of a foreground generated based on a plurality of captured images.
3. The system according to claim 1 or 2, wherein a viewing angle of the specific imaging device (102) is smaller than a viewing angle of at least one imaging device included in the plurality of imaging devices (101).
4. The system according to any one of claims 1 to 3, wherein the first image is generated by processing to enlarge an image obtained by combining the foreground image and the background image so that a size becomes an image size at the time of being output by the output means.
5. The system according to any one of claims 1 to 4, wherein the virtual viewpoint image is further generated based on a captured image obtained by the specific imaging device (102) performing image capturing.
6. The system according to any one of claims 1 to 5, wherein the output means switch between outputting the first image and outputting the second image based on instructions of a user.
7. The system according to any one of claims 1 to 5, wherein the output means determine whether to output the first image or output the second image based on a distance between a virtual viewpoint corresponding to the virtual viewpoint image and the foreground.
8. The system according to claim 7, wherein the output means determine to output the first image in a case where a distance between a virtual viewpoint corresponding to the virtual viewpoint image and the foreground is within a threshold value.
9. The system according to any one of claims 1 to 5, wherein the output means output the first image in a case where an area of the foreground is included in a captured image obtained by the specific imaging device (102) performing image capturing.
10. The system according to claim 9, wherein the output means output the first image in a case where a specific portion of the foreground is included in a captured image obtained by the specific imaging device (102) performing image capturing.
11. The system according to claim 10, wherein the output means output the first image in a case where a front face of a person who is the foreground is included in a captured image obtained by the specific imaging device (102) performing image capturing.
12. The system according to any one of claims 1 to 5, wherein the output means output the second image in a case where an area of the foreground is not included in a captured image obtained by the specific imaging device (102) performing image capturing.
13. The system according to any one of claims 1 to 12, further comprising: notification means for giving a notification to a user who sets a virtual viewpoint in a case where a distance between the virtual viewpoint corresponding to the virtual viewpoint image and the foreground is within a threshold value.
14. An image processing method comprising: a first obtaining step of obtaining a foreground image included in a captured image obtained by a specific imaging device (102) performing image capturing and indicating an area of a foreground; a second obtaining step of obtaining a virtual viewpoint image generated based on a plurality of captured images obtained by a plurality of imaging devices (101), different from the specific imaging device (102), performing image capturing and indicating an area of a foreground viewed from a virtual camera; and an output step of selectively outputting at least one of a first image generated based on the foreground image and a background image generated by rendering a background model and background texture as viewed by the specific imaging device and a second image generated based on the virtual viewpoint image and a background image generated by rendering the background model and background texture as viewed by the virtual camera.
15. A program for causing a computer to perform the image processing method according to claim 14.
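
The switching conditions recited in the claims on user instructions, the distance threshold, and foreground visibility can be illustrated with a short sketch. This is not the claimed implementation: the claims present these conditions as alternatives, and combining them into a single policy with the priority order shown here is an assumption made for illustration. The names `FrameState`, `select_output`, and the string labels "first" and "second" are hypothetical.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class FrameState:
    """Hypothetical per-frame inputs to the output decision."""
    virtual_viewpoint: Vec3            # position of the virtual camera
    foreground_position: Vec3          # representative position of the foreground
    foreground_in_specific_view: bool  # foreground visible to the specific imaging device?
    user_choice: Optional[str] = None  # "first", "second", or None (no instruction)


def select_output(state: FrameState, threshold: float) -> str:
    """Return "first" (specific-device image) or "second" (virtual viewpoint image).

    The priority order below is an assumption; the claims list the
    conditions as separate alternatives.
    """
    # User instruction: an explicit choice wins.
    if state.user_choice in ("first", "second"):
        return state.user_choice
    # Foreground absent from the specific device's captured image: second image.
    if not state.foreground_in_specific_view:
        return "second"
    # Virtual viewpoint within the threshold distance of the foreground: first image.
    if dist(state.virtual_viewpoint, state.foreground_position) <= threshold:
        return "first"
    return "second"
```

For example, with the virtual viewpoint at the origin and the foreground at (3, 4, 0), the distance is 5, so a threshold of 5.0 selects the first image while a threshold of 4.9 selects the second. The same distance test could also drive the notification to the user recited in the notification-means claim.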

Description

BACKGROUND

Field

The present disclosure relates to a technique of generating a virtual viewpoint image.

Description of the Related Art

In recent years, a technique that generates an image (virtual viewpoint image) representing an appearance from a virtual viewpoint has been attracting attention. The technique uses a plurality of captured images obtained by arranging a plurality of imaging devices at different positions and performing image capturing in synchronization. The generation of a virtual viewpoint image is implemented by gathering captured images in a server and the like, generating three-dimensional shape data of an object, and performing processing, such as rendering based on the virtual viewpoint.

Generally, by increasing the number of imaging devices to be arranged around an object and performing image capturing with a high resolution, it is possible to generate a virtual viewpoint image of a higher quality. On the other hand, increasing the number of imaging devices and the resolution of the captured image will increase the amount of data, resulting in an increase in the processing load in an image processing system. In this regard, Japanese Patent Laid-Open No. 2017-211828 discloses a technique to reduce the amount of data by extracting an object, which is taken as a foreground, from the captured image to create a foreground image and transmitting the foreground image to a server that generates three-dimensional shape data.

With the technique of Japanese Patent Laid-Open No. 2017-211828 described above, it is possible to reduce the amount of data per imaging device. However, it is not necessarily possible to provide an image appropriate for a viewer. For example, the desired image quality may be different between a case where it is desired to view a specific object from a position close thereto and a case where it is desired to view a plurality of objects en bloc. Alternatively, for example, there is a case where it is better to provide an image viewed from a viewpoint at which it is not possible for an actual imaging device to perform image capturing.

Furthermore, a method and program for enhancing accuracy of composited picture quality of free viewpoint picture using non-fixed zoom camera are disclosed in document JP 2012 185772 A. An information processing apparatus, information processing method, video processing system, and storage medium are described in document US 11 172 185 B2.

SUMMARY

The present disclosure has been made in view of the problem such as this and an object of the present disclosure is to provide a technique capable of outputting an appropriate image.

The present disclosure in its first aspect provides an image processing apparatus as specified in claims 1 to 13.

The present disclosure in its second aspect provides an image processing method as specified in claim 14.

The present disclosure in its third aspect provides a program as specified in claim 15.

Further features of the present disclosure will become apparent from the following description of exemplary embodiments with reference to the attached drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram showing an example of a generation configuration of an image processing system;
FIG. 2A is a block diagram showing a software configuration of an image processing server and FIG. 2B is a block diagram showing a hardware configuration of the image processing server;
FIG. 3 is a flowchart showing a flow of generation processing of a virtual viewpoint image;
FIG. 4 is a flowchart showing a flow of generation processing of a combined image;
FIG. 5 is a diagram explaining a generation process of a combined image;
FIG. 6 is a flowchart showing details of background image generation processing; and
FIG. 7 is a flowchart showing a flow of output image switching processing.
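
The claims describe the first image as an image obtained by combining the foreground image and the background image and then enlarging the result to the output image size. The details of the combined-image generation of FIG. 4 and FIG. 5 are not reproduced here, so the following is only a minimal sketch under stated assumptions: images are grayscale lists of rows, the foreground area is given as a boolean mask, and enlargement uses nearest-neighbor sampling. The function names `composite` and `enlarge_nearest` are hypothetical, and the interpolation method is an assumption, not something the source specifies.

```python
from typing import List

Image = List[List[int]]   # grayscale image as rows of pixel values (assumption)
Mask = List[List[bool]]   # True where a pixel belongs to the foreground


def composite(foreground: Image, mask: Mask, background: Image) -> Image:
    """Overlay foreground pixels on the background wherever the mask is True."""
    return [
        [f if m else b for f, m, b in zip(f_row, m_row, b_row)]
        for f_row, m_row, b_row in zip(foreground, mask, background)
    ]


def enlarge_nearest(image: Image, out_h: int, out_w: int) -> Image:
    """Resize to (out_h, out_w) by nearest-neighbor sampling (assumed method)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]


def make_first_image(foreground: Image, mask: Mask, background: Image,
                     out_h: int, out_w: int) -> Image:
    """Combine foreground and background, then enlarge to the output size."""
    return enlarge_nearest(composite(foreground, mask, background), out_h, out_w)
```

In this sketch the background passed to `composite` stands in for the background image rendered as viewed by the specific imaging device; rendering the background model and texture is outside the scope of the example.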
DESCRIPTION OF THE EMBODIMENTS

Hereinafter, with reference to the attached drawings, the present disclosure is explained in detail in accordance with preferred embodiments. Configurations shown in the following embodiments are merely exemplary and the present disclosure is not limited to the configurations shown schematically.

In the present specification, the virtual viewpoint image is an image that is generated by a user and/or a dedicated operator or the like freely operating the position and orientation of a virtual camera in the image capturing space, and is also called a free-viewpoint image, an arbitrary viewpoint image and the like. In this case, the virtual camera means a virtual imaging device that does not actually exist in the image capturing space and is distinguished from an imaging device (actual camera) that exists in the image capturing space. Further, unless specified otherwise, explanation is given by assuming that the term image includes both concepts of a moving image and a still image.

[First Embodiment]

<Explanation of problem>

Generally, a system that generates a virtual viewpoint image generates shape data (generally called "3D model" a