US-12620177-B2 - Method and device for generating a synthesized reality reconstruction of flat video content
Abstract
In one implementation, a method includes: identifying a plurality of plot-effectuators and a plurality of environmental elements within a scene associated with a portion of video content; determining one or more spatial relationships between the plurality of plot-effectuators and the plurality of environmental elements within the scene; synthesizing a representation of the scene based at least in part on the one or more spatial relationships; extracting a plurality of action sequences corresponding to the plurality of plot-effectuators based at least in part on the portion of the video content; and generating a corresponding synthesized reality (SR) reconstruction of the scene by driving a plurality of digital assets, associated with the plurality of plot-effectuators, within the representation of the scene according to the plurality of action sequences.
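The abstract describes a five-step pipeline: identify plot-effectuators and environmental elements, determine their spatial relationships, synthesize a scene representation, extract per-effectuator action sequences, and drive digital assets through that representation. The sketch below is a minimal, hypothetical Python illustration of that flow; the detector and extractor functions are stubs standing in for the computer-vision components the patent leaves unspecified, and all names are invented for this example.

```python
from dataclasses import dataclass, field

@dataclass
class PlotEffectuator:
    # A character or object that advances the plot (humanoid, animal, vehicle, etc.)
    name: str
    actions: list = field(default_factory=list)

@dataclass
class SceneRepresentation:
    effectuators: list
    elements: list
    relations: dict  # (effectuator name, element) -> spatial relationship label

def identify_entities(video_segment):
    # Stub: a real system would run detectors over the video frames.
    return [PlotEffectuator("hero"), PlotEffectuator("sidekick")], ["door", "table"]

def determine_relations(effectuators, elements):
    # Stub: pair every effectuator with every element under a placeholder label.
    return {(e.name, el): "near" for e in effectuators for el in elements}

def extract_actions(video_segment, effectuators):
    # Stub: placeholder action sequences extracted "from" the video portion.
    for e in effectuators:
        e.actions = [f"{e.name}:walk", f"{e.name}:sit"]

def reconstruct(video_segment):
    effectuators, elements = identify_entities(video_segment)
    relations = determine_relations(effectuators, elements)
    scene = SceneRepresentation(effectuators, elements, relations)
    extract_actions(video_segment, scene.effectuators)
    # "Driving" the digital assets is reduced here to replaying the action lists.
    return [action for e in scene.effectuators for action in e.actions]

print(reconstruct("scene_01.mp4"))
# → ['hero:walk', 'hero:sit', 'sidekick:walk', 'sidekick:sit']
```

In a real implementation the stubs would be replaced by the recognition, mapping, and asset-driving subsystems the claims recite; only the overall data flow is illustrated here.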
Inventors
- Ian M. Richter
- Daniel Ulbricht
- Jean-Daniel E. Nahmias
- Omar Elafifi
- Peter Meier
Assignees
- APPLE INC.
Dates
- Publication Date
- 2026-05-05
- Application Date
- 2022-07-11
Claims (20)
- 1 . A method comprising: at a device including non-transitory memory and one or more processors coupled with the non-transitory memory: identifying a plurality of plot-effectuators and a plurality of environmental elements within a scene associated with a portion of video content; determining one or more spatial relationships between the plurality of plot-effectuators and the plurality of environmental elements within the scene; synthesizing a representation of the scene based at least in part on the one or more spatial relationships; extracting a plurality of action sequences corresponding to the plurality of plot-effectuators based at least in part on the portion of the video content; and generating a corresponding synthesized reality (SR) reconstruction of the scene, including instantiating a plurality of threads, each of which corresponds to a respective one of the plurality of plot-effectuators performing a respective plurality of action sequences in the scene, and driving a plurality of digital assets, associated with the plurality of plot-effectuators, within the representation of the scene according to the plurality of action sequences while tracking the plurality of threads.
- 2 . The method of claim 1 , further comprising: generating a map of an environment associated with the scene that includes the plurality of environmental elements within the scene, wherein synthesizing the representation of the scene includes synthesizing the representation of the scene based at least in part on the one or more spatial relationships and the map of the environment.
- 3 . The method of claim 2 , wherein the map of the environment associated with the scene corresponds to a three-dimensional map of the environment that localizes the plurality of plot-effectuators and the plurality of environmental elements within the environment associated with the scene.
- 4 . The method of claim 1 , wherein a first action sequence among the plurality of action sequences corresponds to a first plot-effectuator among the plurality of plot-effectuators, and wherein a trajectory of the first plot-effectuator within an environment is linked to the first action sequence for the first plot-effectuator.
- 5 . The method of claim 4 , wherein the first action sequence includes actions performed by the first plot-effectuator within the environment.
- 6 . The method of claim 4 , wherein the first plot-effectuator corresponds to one of a humanoid, animal, vehicle, android, or robot associated with the scene.
- 7 . The method of claim 1 , wherein a first digital asset among the plurality of digital assets corresponds to a first plot-effectuator, and wherein the first digital asset corresponds to a pre-existing video game model associated with the first plot-effectuator.
- 8 . The method of claim 1 , wherein a first digital asset among the plurality of digital assets corresponds to a first plot-effectuator, and wherein the first digital asset corresponds to a pre-existing skinned point cloud associated with the first plot-effectuator.
- 9 . The method of claim 1 , further comprising: obtaining the plurality of digital assets from a library of digital assets associated with the video content.
- 10 . The method of claim 1 , further comprising: generating the plurality of digital assets on-the-fly based at least in part on the video content and external data associated with the video content.
- 11 . A device comprising: one or more processors; a non-transitory memory; and one or more programs stored in the non-transitory memory, which, when executed by the one or more processors, cause the device to: identify a plurality of plot-effectuators and a plurality of environmental elements within a scene associated with a portion of video content; determine one or more spatial relationships between the plurality of plot-effectuators and the plurality of environmental elements within the scene; synthesize a representation of the scene based at least in part on the one or more spatial relationships; extract a plurality of action sequences corresponding to the plurality of plot-effectuators based at least in part on the portion of the video content; and generate a corresponding synthesized reality (SR) reconstruction of the scene, including instantiating a plurality of threads, each of which corresponds to a respective one of the plurality of plot-effectuators performing a respective plurality of action sequences in the scene, and driving a plurality of digital assets, associated with the plurality of plot-effectuators, within the representation of the scene according to the plurality of action sequences while tracking the plurality of threads.
- 12 . The device of claim 11 , wherein the one or more programs further cause the device to: generate a map of an environment associated with the scene that includes the plurality of environmental elements within the scene, wherein synthesizing the representation of the scene includes synthesizing the representation of the scene based at least in part on the one or more spatial relationships and the map of the environment.
- 13 . The device of claim 12 , wherein the map of the environment associated with the scene corresponds to a three-dimensional map of the environment that localizes the plurality of plot-effectuators and the plurality of environmental elements within the environment associated with the scene.
- 14 . The device of claim 11 , wherein a first action sequence among the plurality of action sequences corresponds to a first plot-effectuator among the plurality of plot-effectuators, and wherein a trajectory of the first plot-effectuator within an environment is linked to the first action sequence for the first plot-effectuator.
- 15 . The device of claim 14 , wherein the first action sequence includes actions performed by the first plot-effectuator within the environment.
- 16 . The device of claim 14 , wherein a first digital asset among the plurality of digital assets corresponds to the first plot-effectuator, and wherein the first digital asset corresponds to a pre-existing video game model associated with the first plot-effectuator.
- 17 . A non-transitory memory storing one or more programs, which, when executed by one or more processors of a device, cause the device to: identify a plurality of plot-effectuators and a plurality of environmental elements within a scene associated with a portion of video content; determine one or more spatial relationships between the plurality of plot-effectuators and the plurality of environmental elements within the scene; synthesize a representation of the scene based at least in part on the one or more spatial relationships; extract a plurality of action sequences corresponding to the plurality of plot-effectuators based at least in part on the portion of the video content; and generate a corresponding synthesized reality (SR) reconstruction of the scene, including instantiating a plurality of threads, each of which corresponds to a respective one of the plurality of plot-effectuators performing a respective plurality of action sequences in the scene, and driving a plurality of digital assets, associated with the plurality of plot-effectuators, within the representation of the scene according to the plurality of action sequences while tracking the plurality of threads.
- 18 . The non-transitory memory of claim 17 , wherein the one or more programs further cause the device to: generate a map of an environment associated with the scene that includes the plurality of environmental elements within the scene, wherein synthesizing the representation of the scene includes synthesizing the representation of the scene based at least in part on the one or more spatial relationships and the map of the environment.
- 19 . The non-transitory memory of claim 18 , wherein the map of the environment associated with the scene corresponds to a three-dimensional map of the environment that localizes the plurality of plot-effectuators and the plurality of environmental elements within the environment associated with the scene.
- 20 . The non-transitory memory of claim 17 , wherein a first action sequence among the plurality of action sequences corresponds to a first plot-effectuator among the plurality of plot-effectuators, and wherein a trajectory of the first plot-effectuator within an environment is linked to the first action sequence for the first plot-effectuator.
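Claims 1, 11, and 17 all recite instantiating one thread per plot-effectuator and driving each effectuator's digital asset while tracking the threads. The sketch below is a hedged, hypothetical illustration of that concurrency pattern using Python's standard `threading` module; the function names, the reduction of "driving an asset" to appending log entries, and the input format are all invented for this example.

```python
import threading

def drive_asset(name, actions, log, lock):
    # Each thread replays one plot-effectuator's action sequence on its digital asset.
    # Here "driving" is reduced to recording (effectuator, action) pairs.
    for action in actions:
        with lock:
            log.append((name, action))

def reconstruct_with_threads(action_sequences):
    log, lock = [], threading.Lock()
    # One thread per plot-effectuator, as the claims recite.
    threads = [threading.Thread(target=drive_asset, args=(name, actions, log, lock))
               for name, actions in action_sequences.items()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # "tracking the plurality of threads" to completion
    return log

log = reconstruct_with_threads({"hero": ["walk", "sit"], "robot": ["roll"]})
print(sorted(log))
# → [('hero', 'sit'), ('hero', 'walk'), ('robot', 'roll')]
```

Each effectuator's actions stay in order within its own thread, while the threads themselves may interleave; a real renderer would synchronize the threads against a shared scene clock rather than a simple lock.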
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. patent application Ser. No. 16/961,835, filed on Jul. 13, 2020, which claims priority to International Patent App. No. PCT/US2019/014260, filed on Jan. 18, 2019, U.S. Provisional Patent App. No. 62/734,061, filed on Sep. 20, 2018, and U.S. Provisional Patent App. No. 62/620,334, filed on Jan. 22, 2018, all of which are hereby incorporated by reference in their entireties.

TECHNICAL FIELD

The present disclosure generally relates to synthesized reality (SR), and in particular, to systems, methods, and devices for generating an SR reconstruction of flat video content.

BACKGROUND

Virtual reality (VR) and augmented reality (AR) are becoming more popular due to their remarkable ability to alter a user's perception of the world. For example, VR and AR are used for learning purposes, gaming purposes, content creation purposes, social media and interaction purposes, or the like. These technologies differ in the user's perception of his/her presence: VR transposes the user into a virtual space, so his/her VR perception differs from his/her real-world perception, whereas AR takes the user's real-world perception and adds something to it. These technologies are becoming more commonplace due to, for example, miniaturization of hardware components, improvements to hardware performance, and improvements to software efficiency. As one example, a user may experience AR content superimposed on a live video feed of the user's setting on a handheld display (e.g., an AR-enabled mobile phone or tablet with video pass-through). As another example, a user may experience AR content by wearing a head-mounted device (HMD) or head-mounted enclosure that still allows the user to see his/her surroundings (e.g., glasses with optical see-through). As yet another example, a user may experience VR content by using an HMD that encloses the user's field-of-view and is tethered to a computer.
BRIEF DESCRIPTION OF THE DRAWINGS

So that the present disclosure can be understood by those of ordinary skill in the art, a more detailed description may be had by reference to aspects of some illustrative implementations, some of which are shown in the accompanying drawings.

FIG. 1A is a block diagram of an example operating architecture in accordance with some implementations.
FIG. 1B is a block diagram of another example operating architecture in accordance with some implementations.
FIG. 2 is a block diagram of an example controller in accordance with some implementations.
FIG. 3 is a block diagram of an example electronic device in accordance with some implementations.
FIG. 4 is a block diagram of a synthesized reality (SR) content generation architecture in accordance with some implementations.
FIG. 5 illustrates a scene understanding spectrum in accordance with some implementations.
FIG. 6 illustrates an example SR content generation scenario in accordance with some implementations.
FIG. 7 is a flowchart representation of a method of generating an SR reconstruction of flat video content in accordance with some implementations.
FIG. 8 is a flowchart representation of a method of generating an SR reconstruction of flat video content in accordance with some implementations.

In accordance with common practice, the various features illustrated in the drawings may not be drawn to scale. Accordingly, the dimensions of the various features may be arbitrarily expanded or reduced for clarity. In addition, some of the drawings may not depict all of the components of a given system, method, or device. Finally, like reference numerals may be used to denote like features throughout the specification and figures.

SUMMARY

Various implementations disclosed herein include devices, systems, and methods for generating synthesized reality (SR) content from flat video content.
According to some implementations, the method is performed at a device including non-transitory memory and one or more processors coupled with the non-transitory memory. The method includes: identifying a first plot-effectuator within a scene associated with a portion of video content; synthesizing a scene description for the scene that corresponds to a trajectory of the first plot-effectuator within a setting associated with the scene and actions performed by the first plot-effectuator; and generating a corresponding SR reconstruction of the scene by driving a first digital asset associated with the first plot-effectuator according to the scene description for the scene.

In accordance with some implementations, a device includes one or more processors, a non-transitory memory, and one or more programs; the one or more programs are stored in the non-transitory memory and configured to be executed by the one or more processors, and the one or more programs include instructions for performing or causing performance of any of the methods described herein.

In accordance with some implementations, a non-transitory co