
JP-7855444-B2 - Video Processing and Playback System and Method


Inventors

  • Michael Adam Cavalierou
  • Rajeev Gupta
  • David Erwan Damien Uberti
  • Alexander Smith

Assignees

  • Sony Interactive Entertainment Inc.

Dates

Publication Date
2026-05-08
Application Date
2022-07-19
Priority Date
2021-07-16

Claims (17)

  1. A video processing method for processing an annular panoramic recorded video comprising an original field-of-view region having a first resolution and a further peripheral region, outside the original field of view, having a second resolution lower than the first resolution, wherein the original field-of-view region comprises a foveal region having a third resolution higher than the first resolution, the method comprising the steps of: spatially upscaling the further peripheral region to a resolution higher than the second resolution; and spatially upscaling the original field-of-view region to a resolution substantially equal to the third resolution.
  2. The method according to claim 1, wherein the spatial upscaling step upscales the further peripheral region to a resolution substantially equal to the first resolution.
  3. The method according to claim 1, wherein the annular panoramic recorded video comprises a first transition region between the foveal region and the original field-of-view region, and a second transition region between the original field-of-view region and the further peripheral region, the first transition region having a resolution intermediate between the third resolution and the first resolution, and the second transition region having a resolution intermediate between the first resolution and the second resolution.
  4. The method according to claim 1, wherein the spatial upscaling step is performed by a machine learning system trained with input image data at a lower one of the resolutions of the recording and with corresponding target image data at a higher one of the resolutions of the recording.
  5. The method according to claim 1, comprising the steps of: storing, for a predetermined number of preceding frames, the positions within each frame of at least a subset of the image data having a resolution higher than the second resolution; and, when upscaling a given portion of the current frame of the annular panoramic recorded video, using as an input the image data of one or more preceding frames having a higher resolution at the position of that portion of the current frame.
  6. The method according to claim 1, wherein the original field-of-view region comprises a foveal region having a third resolution higher than the first resolution, the method comprising the steps of: storing, for a predetermined number of preceding frames, the positions within each frame of the image data having the third resolution; and, when upscaling a given portion of the current frame of the annular panoramic recorded video, using as an input the image data of one or more preceding frames having the third resolution at the position of that portion of the current frame.
  7. The method according to claim 1, comprising the steps of: generating a reference annular panoramic image using at least a subset of the image data having a resolution higher than the second resolution in each of a predetermined number of preceding frames; and upscaling a given portion of the current frame of the annular panoramic recorded video using as an input image data from a corresponding portion of the reference annular panoramic image, wherein the reference annular panoramic image stores, for each direction on the image, the most recently rendered higher-resolution pixels.
  8. The method according to claim 7, wherein, in the reference annular panoramic image, pixel data relating to a higher-resolution region of a given image frame is stored in preference to pixel data relating to a lower-resolution region.
  9. The method according to claim 7, wherein the spatial upscaling step is performed by a machine learning system trained with input image data, together with corresponding input data from the reference annular panoramic image, at a lower one of the resolutions of the recording, and with corresponding target image data at a higher one of the resolutions of the recording.
  10. The method according to claim 1, wherein the annular panoramic recorded video is rendered using a cubemap, and the spatial upscaling step is performed by a plurality of machine learning systems each trained on one or more facets of the cubemap.
  11. The method according to claim 1, wherein the annular panoramic recorded video is cylindrical or spherical.
  12. A video output method comprising the steps of: obtaining an annular panoramic recorded video spatially upscaled according to the method of claim 1; and outputting the annular panoramic recorded video for display to a user.
  13. The method according to claim 12, wherein the annular panoramic recorded video includes the original field-of-view region in each frame, and wherein, if the user's field of view moves outside the original field of view by more than a predetermined amount during playback, a visual indication is displayed of where the original field of view is located within the annular panoramic recorded video.
  14. A computer program which causes a computer to execute the method according to claim 1.
  15. A video processor for spatially upscaling an annular panoramic recorded video comprising an original field-of-view region having a first resolution and a further peripheral region, outside the original field of view, having a second resolution lower than the first resolution, the original field-of-view region comprising a foveal region having a third resolution higher than the first resolution, the video processor comprising a spatial upscaling processor that spatially upscales the further peripheral region to a resolution higher than the second resolution and spatially upscales the original field-of-view region to a resolution substantially equal to the third resolution.
  16. A video playback device comprising: a playback processor that obtains an annular panoramic recorded video spatially upscaled according to the method of claim 1; and a graphics processor that outputs the annular panoramic recorded video for display to a user.
  17. The method according to claim 1, wherein the step of spatially upscaling the further peripheral region to a resolution substantially equal to the third resolution comprises the step of spatially upscaling the further peripheral region to a resolution higher than the second resolution.
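As a purely illustrative sketch (not the patented implementation), the spatial upscaling step of claims 1 and 2 — raising the further peripheral region from the second resolution toward the first — can be modelled minimally with a nearest-neighbour upscaler. The patent itself leaves the upscaler open (claim 4 suggests a trained machine learning system), and all identifiers below are assumptions chosen for illustration:

```python
# Illustrative sketch only: a trivial nearest-neighbour upscaler standing in
# for the (unspecified, possibly machine-learning based) spatial upscaler of
# the claims. Function and variable names are assumptions, not from the patent.

def upscale_nearest(tile, factor):
    """Spatially upscale a 2-D list of pixel values by an integer factor."""
    out = []
    for row in tile:
        wide = [p for p in row for _ in range(factor)]   # widen each row
        out.extend(list(wide) for _ in range(factor))    # repeat it vertically
    return out

# A region recorded at the second (lowest) resolution, e.g. the further
# peripheral region outside the original field of view.
peripheral = [[1, 2],
              [3, 4]]

# Upscale it so its pixel density matches the first resolution (claim 2).
upscaled = upscale_nearest(peripheral, 2)
# upscaled is now [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
```

Claims 5 to 7 would extend such a scheme by supplying additional inputs to the upscaler: higher-resolution pixels remembered from preceding frames at the same panoramic position, or from an accumulated reference panoramic image.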

Description

This disclosure relates to video processing and playback systems and methods.

Video game streaming services such as Twitch® and video hosting platforms such as YouTube® and Facebook® have enabled video game players to broadcast their gameplay to a wide audience. The major difference between playing a video game and watching a video recording of that gameplay lies in the passive nature of the viewing experience, both in terms of in-game decisions and in terms of the player's viewpoint (which is determined, for example, by player input). The latter problem is more serious when the game is a VR or AR game, because the player typically determines the viewpoint, at least in part, by their head or eye movements. When watching a live or recorded stream of such a VR or AR game, the displayed image therefore tracks the streamer's head and/or eye movements rather than the viewer's. This can make viewers feel nauseous, and can frustrate viewers who want to look in a different direction from the streamer.

A more complete understanding of this disclosure and its many advantages can be obtained by referring to the attached drawings while reading the following detailed description. The drawings include:

  • a schematic diagram of an HMD (head-mounted display) worn by a user;
  • a schematic plan view of an HMD;
  • a schematic diagram illustrating the formation of a virtual image by an HMD;
  • a schematic diagram of another type of display for use in an HMD;
  • a schematic diagram of a pair of stereoscopic images;
  • a schematic plan view of an HMD;
  • a schematic diagram of a near-eye tracking arrangement;
  • a schematic diagram of a remote tracking arrangement;
  • a schematic diagram of an eye-tracking environment;
  • a schematic diagram of an eye-tracking system;
  • a schematic diagram of the human eye;
  • a schematic graph of human visual acuity;
  • two schematic diagrams of foveal rendering;
  • two schematic diagrams illustrating changes in resolution;
  • two schematic diagrams of an extended rendering scheme according to an embodiment of the present invention;
  • a flowchart of a video processing method according to an embodiment of the present invention; and
  • a flowchart of a video playback method according to an embodiment of the present invention.

This specification discloses video recording and playback systems and methods. The following description provides specific details for the purpose of a thorough understanding of embodiments of the invention. It will be apparent to those skilled in the art, however, that these specific details are not essential to practising the invention. Conversely, certain details known to those skilled in the art are omitted where appropriate for clarity. In the drawings, identical or similar components are denoted by the same reference numerals.

In Figure 1, a user 10 wears an HMD 20 (in this example a conventional head-mountable display, though other head-mountable devices include audio headphones or a head-mountable light source) on the user's head 30. The HMD comprises a frame 40, in this example formed of a rear strap and a top strap, and a display portion 50. The HMD optionally includes associated headphone transducers or earpieces 60 that fit into the user's left and right ears 70. The earpieces 60 reproduce audio signals supplied from an external source, which may be the same as the video signal source that supplies video signals to the display.

In operation, a video signal is provided for display by the HMD. This may be supplied by an external video signal source 80 (for example, a video game console or a data processing device such as a personal computer), in which case the signal may be transmitted to the HMD over a wired or wireless connection 82; one suitable wireless connection is a Bluetooth® connection. The audio signal for the earpieces 60 may be carried by the same connection, as may any control signals passed from the HMD back to the video (audio) signal source. In addition, a power supply 83 (which may include one or more batteries and/or be connectable to a mains power outlet) may be linked to the HMD by a cable 84.

The arrangement of Figure 1 thus provides an example of a head-mountable display system comprising a frame mounted on the viewer's head and a display element mounted relative to the line of sight. The frame defines one or two line-of-sight positions, which are positioned in front of the viewer's eyes during use. The display element provides a virtual image of the video display signal from