KR-20260064872-A - DEVICE AND METHOD FOR RECONSTRUCTING MOVING OBJECTS IN VIDEOS USING A HIERARCHICAL NEURAL DEFORMATION MODEL


Abstract

The present invention relates to a device for reconstructing moving objects within a video using a hierarchical neural deformation model, comprising: a video input unit for receiving a video; an embedding generation unit for generating canonical embedding and time embedding for an object in the video; a hierarchical neural deformation model unit for receiving the canonical embedding and time embedding, capturing hierarchical neural deformation from a coarse level to a fine level of the object, and outputting a time-independent skeletal structure and a time-dependent skeletal deformation of the object through a neural network; and a time-deformed object representation unit for representing the temporal deformation of the object as a time-deformed object in a canonical space based on the time-independent skeletal structure and the time-dependent skeletal deformation of the object.
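The coarse-to-fine hierarchy described in the abstract can be illustrated with ordinary forward kinematics over a bone tree. The following is a minimal sketch under stated assumptions, not the patented model. The neural network is replaced by hand-supplied per-bone rotation angles, the skeleton is 2D, and the names `Transform2D` and `forward_kinematics` are hypothetical. The rest offsets stand in for the time-independent skeletal structure, and the per-frame angles stand in for the time-dependent skeletal deformation. A rotation on a parent (coarse) bone moves every descendant, whereas a rotation on a leaf (fine) bone moves only that bone.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Transform2D:
    """Rigid 2D transform: rotate by `angle` radians, then translate by (tx, ty)."""
    angle: float
    tx: float
    ty: float

    def compose(self, other: Transform2D) -> Transform2D:
        """Return self o other, i.e. apply `other` first and then `self`."""
        c, s = math.cos(self.angle), math.sin(self.angle)
        return Transform2D(
            angle=self.angle + other.angle,
            tx=c * other.tx - s * other.ty + self.tx,
            ty=s * other.tx + c * other.ty + self.ty,
        )

    def apply(self, x: float, y: float) -> tuple[float, float]:
        """Map a point from this transform's local frame into its parent frame."""
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (c * x - s * y + self.tx, s * x + c * y + self.ty)


def forward_kinematics(parents, rest_offsets, local_angles):
    """Compose per-bone local motions from the root down to the leaves.

    parents[i]      -- parent index of bone i, or -1 for the root (time-independent)
    rest_offsets[i] -- (dx, dy) of bone i's joint in its parent's frame (time-independent)
    local_angles[i] -- rotation of bone i about its own joint at one frame (time-dependent)
    Bones must be ordered so that every parent precedes its children.
    Returns the world transform of every bone.
    """
    world: list[Transform2D] = []
    for i, p in enumerate(parents):
        local = Transform2D(local_angles[i], *rest_offsets[i])
        world.append(local if p < 0 else world[p].compose(local))
    return world


if __name__ == "__main__":
    parents = [-1, 0, 1]                            # root -> upper bone -> lower bone
    offsets = [(0.0, 0.0), (1.0, 0.0), (1.0, 0.0)]  # hypothetical rest skeleton
    # First a coarse (root) rotation, then a fine (leaf) rotation.
    for angles in ([math.pi / 2, 0.0, 0.0], [0.0, 0.0, math.pi / 2]):
        world = forward_kinematics(parents, offsets, angles)
        joints = [t.apply(0.0, 0.0) for t in world]
        tip = world[-1].apply(1.0, 0.0)             # far end of the last bone
        print([(round(x, 3), round(y, 3)) for x, y in joints],
              (round(tip[0], 3), round(tip[1], 3)))
```

In the demo, rotating the root by 90 degrees swings all three joints onto the y-axis, while rotating only the last bone leaves every joint in place and moves just that bone's tip. This is the parent/child split of large and small motions that the hierarchy is meant to capture.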

Inventors

김선주
전수빈
조 인
김민수
조웅오

Assignees

연세대학교 산학협력단 (Industry-Academic Cooperation Foundation, Yonsei University)

Dates

Publication Date: 20260508
Application Date: 20241030

Claims (12)

1. A device for reconstructing a moving object in a video using a hierarchical neural deformation model, comprising: a video input unit for receiving a video; an embedding generation unit that generates a canonical embedding and a time embedding for an object in the video; a hierarchical neural deformation model unit that receives the canonical embedding and the time embedding, captures hierarchical neural deformation of the object from a coarse level to a fine level, and outputs a time-independent skeletal structure and a time-dependent skeletal deformation of the object through a neural network; and a time-deformed object representation unit that represents the temporal deformation of the object as a time-deformed object in a canonical space based on the time-independent skeletal structure and the time-dependent skeletal deformation of the object.
2. The device of claim 1, wherein the embedding generation unit generates the canonical embedding as a time-independent vector representing the basic form and shape of the object, which serves as the reference for deformation in the canonical space.
3. The device of claim 2, wherein the embedding generation unit generates the time embedding as a vector representing the frame-by-frame change of the object in the video.
4. The device of claim 1, wherein the hierarchical neural deformation model unit combines the canonical embedding and the time embedding to generate a tree-structured skeleton for the object and models the movement of the object.
5. The device of claim 4, wherein the hierarchical neural deformation model unit generates the tree-structured skeleton as a parent-child bone structure, captures relatively large movements of the object through the parent bones, and captures relatively small movements of the object through the child bones.
6. The device of claim 5, wherein the hierarchical neural deformation model unit processes the coarse-level neural deformation of the object through the parent bones, processes the fine-level neural deformation of the object through the child bones, and learns the interaction between the parent and child bones through the neural network.
7. The device of claim 1, wherein the hierarchical neural deformation model unit determines the time-dependent skeletal deformation through skinning weights.
8. The device of claim 7, wherein the hierarchical neural deformation model unit uses the LBS (Linear Blend Skinning) technique to calculate skinning weights indicating how strongly each part of the object is affected by a specific bone.
9. The device of claim 1, wherein the time-deformed object representation unit reconstructs the object into a time-independent skeletal structure at the next time step through the temporal deformation of the object in the canonical space.
10. The device of claim 1, further comprising a bone mask generation unit that, for the time-dependent skeletal deformation, generates a bone mask indicating which region of the object a specific bone affects, using a BOF (Bone Occupancy Function).
11. The device of claim 1, further comprising a volume rendering unit that visualizes the time-deformed object through dimension transformation.
12. A method for reconstructing a moving object in a video using a hierarchical neural deformation model, performed by a device for reconstructing a moving object in a video using a hierarchical neural deformation model, the method comprising: a video input step of receiving a video; an embedding generation step of generating a canonical embedding and a time embedding for an object in the video; a hierarchical neural deformation model step of receiving the canonical embedding and the time embedding, capturing hierarchical neural deformation of the object from a coarse level to a fine level, and outputting a time-independent skeletal structure and a time-dependent skeletal deformation of the object through a neural network; and a time-deformed object representation step of representing the temporal deformation of the object as a time-deformed object in a canonical space based on the time-independent skeletal structure and the time-dependent skeletal deformation of the object.
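The claims above determine the time-dependent skeletal deformation through skinning weights computed with LBS, and optionally add a bone mask obtained through a BOF. The text here does not say how the weights or the BOF are computed, so the following is only a sketch under assumptions. The weights are a softmax over negative squared distances to hypothetical bone centers (a Gaussian-style assignment chosen for illustration), each bone's motion is a 2D rigid transform, and `bone_mask` simply thresholds one bone's weight as a hypothetical stand-in for a bone occupancy function. All function names are hypothetical.

```python
import math


def skinning_weights(point, bone_centers, temperature=0.1):
    """Assumed weighting: softmax over negative squared distances to bone centers."""
    logits = [-((point[0] - cx) ** 2 + (point[1] - cy) ** 2) / temperature
              for cx, cy in bone_centers]
    peak = max(logits)                       # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def lbs(point, weights, bone_transforms):
    """Linear blend skinning: x' = sum_b w_b * (R_b x + t_b).

    bone_transforms[b] is (angle, tx, ty): a 2D rotation followed by a translation.
    """
    x, y = point
    out_x = out_y = 0.0
    for w, (angle, tx, ty) in zip(weights, bone_transforms):
        c, s = math.cos(angle), math.sin(angle)
        out_x += w * (c * x - s * y + tx)
        out_y += w * (s * x + c * y + ty)
    return out_x, out_y


def bone_mask(points, bone_centers, bone_index, threshold=0.5, temperature=0.1):
    """Hypothetical stand-in for a bone occupancy function (BOF): True where the
    skinning weight of bone `bone_index` at a point exceeds `threshold`."""
    return [skinning_weights(p, bone_centers, temperature)[bone_index] > threshold
            for p in points]


if __name__ == "__main__":
    centers = [(0.0, 0.0), (2.0, 0.0)]           # two hypothetical bones
    motion = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]  # bone 0 fixed, bone 1 shifted +1 in y
    for p in [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]:
        w = skinning_weights(p, centers)
        print(p, [round(v, 3) for v in w], lbs(p, w, motion))
    print(bone_mask([(0.0, 0.0), (0.5, 0.0), (1.8, 0.0)], centers, bone_index=1))
```

A point halfway between the two bones receives equal weights and therefore moves by half of bone 1's shift. This blending is what lets LBS deform the surface smoothly across a joint, and the same weights show which region each bone controls.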
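The volume rendering unit recited in the claims visualizes the 3D time-deformed object as a 2D image. The claims do not specify the renderer. The sketch below assumes the standard emission-absorption quadrature used in NeRF-style neural rendering: sample i along a ray contributes T_i (1 - exp(-σ_i δ_i)) c_i, where T_i is the transmittance accumulated in front of it. `render_ray` is a hypothetical name.

```python
import math


def render_ray(densities, colors, deltas):
    """Emission-absorption volume rendering along a single camera ray.

    densities[i] -- volume density sigma_i at the i-th sample (front to back)
    colors[i]    -- (r, g, b) emitted at the i-th sample
    deltas[i]    -- distance between sample i and sample i + 1
    Returns the composited pixel color and the accumulated opacity.
    """
    transmittance = 1.0                         # T_i: fraction of light not yet absorbed
    pixel = [0.0, 0.0, 0.0]
    for sigma, rgb, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this ray segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * rgb[k]
        transmittance *= 1.0 - alpha            # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    return pixel, 1.0 - transmittance


if __name__ == "__main__":
    # A half-transparent red sample in front of an opaque blue one.
    print(render_ray([math.log(2.0), 1e3], [(1.0, 0.0, 0.0), (0.0, 0.0, 1.0)], [1.0, 1.0]))
```

Samples hidden behind an opaque sample receive almost no weight, so the pixel reflects the visible surface of the deformed object. The returned opacity can be compared against an object silhouette.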

Description

Device and Method for Reconstructing Moving Objects in Videos Using a Hierarchical Neural Deformation Model

The present invention relates to a technology for reconstructing moving objects in a video using a hierarchical neural deformation model. More specifically, it relates to an apparatus and method that, through a neural network, capture hierarchical neural deformation of an object from a coarse level to a fine level and represent the temporal deformation of the object as a time-deformed object in a canonical space based on the object's time-independent skeletal structure and time-dependent skeletal deformation.

AI-based video processing technology encompasses various techniques that analyze, modify, and transform videos using artificial intelligence (AI) and deep learning algorithms. These technologies are used to improve video quality, fill in missing data between frames, and generate new videos. Although similar to image processing, AI-based video processing requires more complex algorithms because it must account for temporal consistency and correlations between frames.

Korean Published Patent No. 10-2023-0131970 (September 14, 2023) describes, in one embodiment, a method comprising a step in which an object reconstruction module receives a first image and a second image. The first image includes a first region of an object, and the second image includes a second region of the object. The method also comprises a step in which the object reconstruction module identifies a transition image that includes both the first region and the second region of the object.
The method further comprises a step in which the object reconstruction module determines that the first region of the object in the transition image and the first region of the object in the first image are equivalent regions, and a step in which the object reconstruction module generates a reconstruction of the object using the first image and the transition image. The reconstruction includes the first region and the second region of the object and excludes the equivalent region.

FIG. 1 is a diagram illustrating a device for reconstructing moving objects in a video using a hierarchical neural deformation model according to one embodiment of the present invention.
FIG. 2 is a diagram illustrating the functional configuration of the moving object reconstruction device of FIG. 1.
FIG. 3 is a diagram illustrating the system configuration of the moving object reconstruction device of FIG. 1.
FIG. 4 is a flowchart illustrating a method for reconstructing moving objects in a video using a hierarchical neural deformation model according to the present invention.
FIG. 5 is a diagram of the skeletal hierarchy of the moving object reconstruction device of FIG. 1.
FIG. 6 is a diagram qualitatively comparing the moving object reconstruction device of FIG. 1 with template-free methods (ViSER, BANMo) and bone-based methods (CAMM, RAC).
FIG. 7 is a diagram qualitatively comparing (a) the neural rendering results and (b) the retargeted objects of the moving object reconstruction device of FIG. 1.
FIG. 8 is a diagram showing the results of manipulating objects of various categories with the moving object reconstruction device of FIG. 1.
FIG. 9 is a diagram showing (a) a visualization of the hierarchically structured skeleton at each depth and (b) qualitative ablation results for the skeleton regularization terms of the moving object reconstruction device of FIG. 1.

The description of the present invention is merely an example for structural or functional explanation; therefore, the scope of the present invention should not be interpreted as limited by the examples described herein. That is, since the examples may be modified in various ways and take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical concept. Furthermore, the objectives or effects presented in the present invention do not mean that a specific example must achieve all of them or only those effects; therefore, the scope of the present invention should not be understood as limited by them.

Meanwhile, the terms used in this application should be understood as follows. Terms such as "first," "second," etc., are intended to distinguish one component from another, and the scope of rights shall not be limited by