EP-4523417-B1 - METHODS, SERVERS AND DEVICES FOR TRANSMITTING AND RENDERING MULTIPLE VIEWS COMPRISING NON-DIFFUSE OBJECTS

Inventors

JUNG, Joël

Dates

Publication Date: 20260506
Application Date: 20220512

Claims (8)

A method implemented by a server (1SVR) for transmitting data used by a device (1CTL) for rendering multiple views (Vref, Vi) of a single scene, said multiple views comprising at least one basic view (Vref) and at least one alternate view (Vi), said method comprising steps of: - generating (SS10) a stream (STR) by encoding said views (Vref, Vi); - detecting (1SS20) non-diffuse pixels (NDP) representing pixels of a non-diffuse surface in said at least one alternate view (Vi); - selecting (1SS30) non-diffuse pixels (SNDP) among said detected non-diffuse pixels (NDP); - generating (1SS40) non-diffuse data (1DND) which, for each selected non-diffuse pixel (SNDP) of each alternate view (Vi), comprise (i) information identifying the selected pixel in the alternate view (Vi) and (ii) a texture difference ΔT(Vref,Vi) equal to the difference between the texture of said pixel in the alternate view (Vi) and the texture of a matching pixel in the basic view (Vref); and - transmitting (SS50) said stream (STR) and said non-diffuse data (1DND) to said device (1CTL), wherein a size of said non-diffuse data is a function of a pixel rate (PR) of said device (1CTL) and/or of a transmission rate (TR) between said server (1SVR) and said device (1CTL), wherein the pixel rate (PR) corresponds to a quantity of received data which can be decoded and rendered on a client side per unit of time.
The method according to claim 1, wherein said non-diffuse pixels (SNDP) are selected (1SS30) based on differences between: - textures of pixels of said at least one alternate view (Vi); and - textures of pixels of said at least one basic view (Vref) matching said pixels of said at least one alternate view (Vi).
The method according to claim 2, wherein said selected non-diffuse pixels (SNDP) are among those maximizing said differences.
The method according to claim 2 or 3, wherein said non-diffuse pixels (SNDP) are selected (1SS30) based on a comparison between said differences and at least one threshold depending on said pixel rate and/or on said transmission rate.
The method according to any of claims 1 to 4, wherein said selecting step (1SS30) comprises substeps of: - associating (1SS301) the pixels of said views (Vref, Vi) with epipolar plane image lines; - determining (1SS302), for each epipolar plane image line, at least one value representative of the form of this epipolar plane image line; and said non-diffuse pixels (SNDP) are selected (S30) based on said values.
A method implemented by a device (1CLT) for rendering multiple views (Vref, Vi) of a single scene, said multiple views comprising at least one basic view (Vref) and at least one alternate view (Vi), said method comprising steps of: - receiving (SD60), from a server (1SVR), a stream (STR) of encoded said views (Vref, Vi) and non-diffuse data (1DND) representative of selected non-diffuse pixels (SNDP) representing pixels of a non-diffuse surface of said at least one alternate view (Vi), wherein a size of the non-diffuse data (1DND) is a function of a pixel rate (PR) of the device (1CLT) and/or of a transmission rate (TR) between the server (1SVR) and the device (1CLT), the non-diffuse data (1DND) comprising, for each selected non-diffuse pixel (SNDP) of each alternate view (Vi), (i) information identifying the selected pixel in the alternate view (Vi) and (ii) a texture difference ΔT(Vref,Vi) equal to the difference between the texture of said pixel in the alternate view (Vi) and the texture of a matching pixel in the basic view (Vref); - rendering (1SD70) said views (Vref, Vi) based on said stream (STR) and said non-diffuse data (1DND), wherein the rendering comprises, for each alternate view (Vi), adding the received texture differences ΔT(Vref,Vi) at the identified pixel locations to the texture decoded from the stream for the matching pixel of the basic view (Vref), thereby rendering the selected non-diffuse pixels (SNDP) of the alternate views (Vi) with reflectance corresponding to the respective alternate view (Vi), wherein the pixel rate (PR) corresponds to a quantity of received data which can be decoded and rendered on a client side per unit of time.
The method according to any of claims 1 to 6, wherein said data (1DND, 2DND, 3DND) are transmitted in a Supplemental Enhancement Information (SEI) message.
A server (1SVR) for transmitting data used by a device (1CTL) for rendering multiple views (Vref, Vi) of a single scene, said multiple views comprising at least one basic view (Vref) and at least one alternate view (Vi), said server (1SVR) comprising : - a module (MS10) of generating a stream (STR) by encoding said views (Vref, Vi); - a module (1MS20) of detecting non-diffuse pixels (NDP) representing pixels of a non-diffuse surface in said at least one alternate view (Vi); - a module (1MS30) of selecting non-diffuse pixels (SNDP) among said detected non-diffuse pixels (NDP); - a module (1MS40) of generating non-diffuse data (1DND) which, for each selected non-diffuse pixel (SNDP) of each alternate view (Vi), comprise (i) information identifying the selected pixel in the alternate view (Vi) and (ii) a texture difference ΔT(Vref,Vi) equal to the difference between the texture of said pixel in the alternate view (Vi) and the texture of a matching pixel in the basic view (Vref); and - a module (MS50) of transmitting said stream (STR) and said non-diffuse data (1DND) to said device (1CTL), wherein a size of said non-diffuse data is a function of a pixel rate (PR) of said device (1CTL) and/or of a transmission rate (TR) between said server (1SVR) and said device (1CTL), wherein the pixel rate (PR) corresponds to a quantity of received data which can be decoded and rendered on a client side per unit of time.
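
The texture-difference mechanism recited in the claims above can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each view is a dictionary mapping a pixel position to an RGB texture, and that the matching between alternate-view pixels and basic-view pixels is already known. The names `generate_non_diffuse_data`, `render_alternate_view` and `matches` are hypothetical and do not appear in the source.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]         # (row, column) position in a view (assumed layout)
Texture = Tuple[int, int, int]  # RGB sample (assumed texture representation)
View = Dict[Pixel, Texture]


def generate_non_diffuse_data(
    basic_view: View,
    alternate_view: View,
    matches: Dict[Pixel, Pixel],
    selected: List[Pixel],
) -> List[Tuple[Pixel, Texture]]:
    """Server side: for each selected pixel of the alternate view, keep
    (i) its position and (ii) the texture difference with the matching
    pixel of the basic view."""
    data = []
    for pixel in selected:
        ref_texture = basic_view[matches[pixel]]
        alt_texture = alternate_view[pixel]
        delta = tuple(a - r for a, r in zip(alt_texture, ref_texture))
        data.append((pixel, delta))
    return data


def render_alternate_view(
    basic_view: View,
    matches: Dict[Pixel, Pixel],
    non_diffuse_data: List[Tuple[Pixel, Texture]],
) -> View:
    """Device side: start from the texture of the matching basic-view pixel,
    then add the received difference at each identified pixel position."""
    rendered = {pixel: basic_view[ref] for pixel, ref in matches.items()}
    for pixel, delta in non_diffuse_data:
        ref_texture = basic_view[matches[pixel]]
        rendered[pixel] = tuple(t + d for t, d in zip(ref_texture, delta))
    return rendered


# Toy example: pixel (0, 1) lies on a glossy surface and differs between views.
basic = {(0, 0): (100, 100, 100), (0, 1): (50, 60, 70)}
alternate = {(0, 0): (100, 100, 100), (0, 1): (200, 210, 220)}
matches = {(0, 0): (0, 0), (0, 1): (0, 1)}

non_diffuse = generate_non_diffuse_data(basic, alternate, matches, [(0, 1)])
rendered = render_alternate_view(basic, matches, non_diffuse)
```

In this toy case the rendered alternate view equals the original one, because the only pixel that differs between the views was selected and its difference transmitted.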

Description

Field of the invention

The invention relates to the field of computer graphics. It relates more particularly to a method for encoding and transmitting data for rendering non-diffuse pixels of multiple views of the same scene. The invention may be used in the context of immersive videos.

Diffuse surfaces have an apparent brightness that is the same regardless of the observer's point of view. They produce a diffuse reflectance of the light: the light is absorbed and re-emitted in all directions. Concrete, wood and wool are examples of such surfaces.

In contrast, non-diffuse surfaces have the property that the visible texture depends on the point of view. Almost all natural and high-quality synthetic scenes exhibit non-diffuse reflections. Typical examples of non-diffuse surfaces are mirrors, windows, and glossy surfaces.

The reflectance of a surface in a particular direction is the fraction of incident light which is reflected by this surface in this particular direction. Hereafter, the reflectance of a surface corresponding to a point of view refers to the reflectance of this surface in the direction of this point of view. For a non-diffuse surface, the reflectance varies with the point of view. Pixels representing a non-diffuse surface are called non-diffuse pixels.

Background of the invention

In an immersive video, a user can navigate in a scene through different views. Because of memory and/or computational limitations of the client device, these multiple views are received in a compressed form from a server. The transmission rate between the server and the client device is limited, such that efficient compression schemes are needed to reduce the amount of data to transmit. Current methods of compression and transmission of multiple views are not satisfactory for rendering non-diffuse pixels of these views.
In particular, MPEG Immersive Video (MIV), a recent codec (ISO/IEC MPEG Immersive Video (MIV) standard, MPEG-I Part 12) designed for immersive video, fails to handle view-dependent effects such as non-diffuse reflections. The MIV method consists in fully transmitting so-called "basic" views and pruning alternate views such that the client receiver can render them, while reducing the amount of data to be transmitted. Because of the pruning process, the reflectance of the non-diffuse pixels in the alternate views rendered by the client is not correct. An extension of the standard MIV method makes it possible to transmit additional data for rendering the texture of non-diffuse pixels of the alternate views. Nevertheless, this method requires a large quantity of transmitted data in order to render the views with a suitable quality.

There exists a need for a solution that makes it possible to transmit data for rendering non-diffuse pixels in multiple views.

Summary of the disclosure

A purpose of the present invention is to overcome all or some of the limitations of the prior art solutions, particularly those outlined above. The invention is defined by the appended claims.

In contrast to the prior art, a first aspect according to claim 1 takes into account the transmission rate and the pixel rate when transmitting data for rendering non-diffuse pixels. In particular, this first aspect of the disclosure proposes to generate data to be transmitted for a client to render non-diffuse pixels, such that the size of these data is a function of the pixel rate and/or the transmission rate. The transmission rate corresponds to the quantity of data which can be transmitted from the server to the client per unit of time. The pixel rate corresponds to the quantity of received data which can be decoded and rendered on the client side per unit of time.
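
As an illustration of how the size of the non-diffuse data could depend on these two rates, the sketch below derives a per-unit-of-time budget and keeps the pixels with the largest texture differences. The source does not specify the function linking the data size to the rates: the rule used here (the smaller of the two rates, minus the rate already consumed by the stream of encoded views) is an assumption, as are the names `non_diffuse_budget`, `select_non_diffuse_pixels`, `stream_rate` and `bytes_per_entry`, and the choice of expressing every rate in bytes per second.

```python
from typing import Dict, List, Tuple

Pixel = Tuple[int, int]  # (row, column) position in an alternate view


def non_diffuse_budget(
    transmission_rate: int,
    pixel_rate: int,
    stream_rate: int,
    bytes_per_entry: int,
) -> int:
    """Number of non-diffuse entries that fit per second.

    Assumed rule, not taken from the source: the usable rate is the smaller
    of the transmission rate and the pixel rate, minus the rate consumed by
    the stream of encoded views.
    """
    usable = min(transmission_rate, pixel_rate) - stream_rate
    return max(0, usable // bytes_per_entry)


def select_non_diffuse_pixels(
    differences: Dict[Pixel, int], budget: int
) -> List[Pixel]:
    """Keep at most `budget` detected pixels, largest texture difference first."""
    ranked = sorted(differences, key=differences.get, reverse=True)
    return ranked[:budget]


# Toy example: the device (pixel rate) is the bottleneck, not the link.
budget = non_diffuse_budget(
    transmission_rate=1000, pixel_rate=800, stream_rate=780, bytes_per_entry=10
)
detected = {(0, 0): 5, (0, 1): 90, (1, 0): 40}
selected = select_non_diffuse_pixels(detected, budget)
```

With these toy values the budget is two entries, so only the two detected pixels with the largest differences are kept. A threshold on the differences that depends on the rates would be an alternative selection rule.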
In one embodiment, the pixel rate and the transmission rate are negotiated during an initialisation phase of a communication session between the server and the client device. In another embodiment, the negotiation occurs just before the step of selecting the non-diffuse pixels.

Advantageously, generating a quantity of transmitted data according to the transmission rate and the pixel rate prevents these data from being transmitted very slowly, or even from being arbitrarily truncated during their transmission or during their decoding and rendering on the client side, which would degrade the user experience.

The first aspect of the disclosure thus makes it possible, for a same amount of transmitted data (this amount including the stream of encoded views and the data representative of the non-diffuse pixels), to improve the quality of rendered views on the client side. As a corollary, it makes it possible to transmit a limited amount of data while achieving a given quality of the rendered views.

The data representative of non-diffuse pixels are hereafter called "non-diffuse data".

Transmitting all non-diffuse pixels under a constrained transmission rate could cause the other pixels or the views encoded in the stream to be more heavily compressed, or could lead to a slow transmission thereof. Both consequences would have a negative impact on all rendered s