KR-102962398-B1 - APPARATUS AND METHOD FOR EXTRACTING QUALITATIVE CHARACTERISTICS OF VIDEO

KR 102962398 B1

Abstract

The method, apparatus, and recording medium for extracting qualitative features of a video disclosed in this invention include dividing a video into at least one video segment and generating video graph sequences, which are graph-form representations of the video segments, based on objects of the video segments and edges representing relationships between the objects. The video graph sequences may be generated based on an object feature vector that represents the qualitative features of the objects as a vector and an edge feature vector that represents the qualitative features of the edges as a vector.

Inventors

  • 임지연

Assignees

  • 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)

Dates

Publication Date
2026-05-08
Application Date
2023-04-27
Priority Date
2022-10-07

Claims (20)

  1. A method for extracting qualitative features of a video, performed by an apparatus for extracting qualitative features of a video, the method comprising: dividing, in a video segmentation module of the apparatus, the video into at least one video segment; and generating, in a graph embedding module of the apparatus, video graph sequences, which are graph-form representations of the video segment, based on objects of the video segment and edges representing relationships between the objects, wherein the video graph sequences are generated based on an object feature vector representing the qualitative features of the objects as a vector and an edge feature vector representing the qualitative features of the edges as a vector, and wherein the video graph sequences are generated using a graph expression G = (V, E, X_V, X_E), where V represents each of the objects, E represents the edges expressing the relationships between the objects, X_V represents the object feature vector expressing qualitative features of the objects, such as color, texture, age, and gender, and X_E represents the edge feature vector expressing qualitative features of the edges, such as geographical and behavioral relationships between the objects.
  2. The method of claim 1, wherein the video graph sequences are generated by matrix operations on an object feature matrix and an object adjacency matrix.
  3. The method of claim 2, wherein the object feature matrix is a k×n matrix generated based on the object feature vector, where k is the number of the objects and n is the number of features in the object feature vector.
  4. The method of claim 3, wherein the object adjacency matrix is a k×k matrix generated based on the edge feature vector.
  5. The method of claim 1, further comprising classifying, from the video graph sequences, graph sequences having similar qualitative features to form a single style.
  6. The method of claim 5, further comprising generating, in a video classification module of the apparatus, a feature matrix by extracting, based on the style, object feature vectors and edge feature vectors having values greater than or equal to a specific threshold from video graph sequences having the same style among the video graph sequences.
  7. The method of claim 5, further comprising identifying the style of the video segment based on the style, and storing the video segment in a database based on the style of the video segment.
  8. (Deleted)
  9. (Deleted)
  10. The method of claim 1, wherein the objects include objects, people, and terrain features of the background that constitute the video segment.
  11. An apparatus for extracting qualitative features of a video, comprising: a video segmentation module that divides a video into at least one video segment; and a graph embedding module that generates video graph sequences, which are graph-form representations of the video segment, based on objects of the video segment and edges representing relationships between the objects, wherein the video graph sequences are generated based on an object feature vector representing the qualitative features of the objects as a vector and an edge feature vector representing the qualitative features of the edges as a vector, and wherein the video graph sequences are generated using a graph expression G = (V, E, X_V, X_E), where V represents each of the objects, E represents the edges expressing the relationships between the objects, X_V represents the object feature vector expressing qualitative features of the objects, such as color, texture, age, and gender, and X_E represents the edge feature vector expressing qualitative features of the edges, such as geographical and behavioral relationships between the objects.
  12. The apparatus of claim 11, wherein the video graph sequences are generated by matrix operations on an object feature matrix and an object adjacency matrix.
  13. The apparatus of claim 12, wherein the object feature matrix is a k×n matrix generated based on the object feature vector, where k is the number of the objects and n is the number of features in the object feature vector.
  14. The apparatus of claim 13, wherein the object adjacency matrix is a k×k matrix generated based on the edge feature vector.
  15. The apparatus of claim 11, further comprising a video classification module that classifies, from the video graph sequences, graph sequences having similar qualitative features to form a single style.
  16. The apparatus of claim 15, wherein the video classification module comprises a feature extraction unit that generates a feature matrix by extracting, based on the style, object feature vectors and edge feature vectors having values greater than or equal to a specific threshold from video graph sequences having the same style among the video graph sequences.
  17. The apparatus of claim 15, wherein the video classification module comprises a video style tagging unit that identifies the style of the video segment based on the style and stores the video segment in a database based on the style of the video segment.
  18. The apparatus of claim 11, wherein the qualitative features of the objects include the color, texture, age, and gender of the objects.
  19. The apparatus of claim 11, wherein the qualitative features of the edges include geographical relationships and behavioral relationships between the objects.
  20. A computer-readable recording medium having recorded thereon a computer program for executing, on a computer, the method according to any one of claims 1 through 7 and claim 10.
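As a concrete illustration of claims 1 through 4, the graph representation G = (V, E, X_V, X_E) and the matrix operation over the object feature matrix and the object adjacency matrix can be sketched in Python as follows. The specific feature encodings, values, and the use of a simple matrix product are illustrative assumptions for the sketch; the patent does not specify them.

```python
import numpy as np

# Toy instance of G = (V, E, X_V, X_E) for one video segment with
# k = 3 objects, each described by n = 4 qualitative features
# (e.g. color, texture, age, gender encoded numerically).
X_V = np.array([
    [0.2, 0.8, 0.3, 1.0],   # object 0
    [0.9, 0.1, 0.6, 0.0],   # object 1
    [0.5, 0.5, 0.2, 1.0],   # object 2
])                           # object feature matrix, k x n (claim 3)

# Edge feature values (e.g. geographical / behavioral relationship
# strength between object pairs) folded into a k x k adjacency matrix.
A = np.array([
    [0.0, 0.7, 0.1],
    [0.7, 0.0, 0.4],
    [0.1, 0.4, 0.0],
])                           # object adjacency matrix, k x k (claim 4)

# Claim 2: the video graph sequence embedding is generated by a matrix
# operation on the object feature matrix and the adjacency matrix;
# here a plain matrix product aggregates each object's neighbors.
embedding = A @ X_V          # shape k x n
print(embedding.shape)       # prints (3, 4)
```

Each row of `embedding` is a relationship-weighted mixture of the other objects' feature vectors, which is one simple way a graph-form representation of a segment could be produced from X_V and the adjacency structure.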

Description

Apparatus and Method for Extracting Qualitative Characteristics of Video

The present disclosure relates to a method and apparatus for extracting qualitative characteristics of a video, such as the manner in which its narrative structure unfolds or the social and aesthetic characteristics of its visual elements. Conventional technology has evolved toward understanding context, such as the composition of a video's narrative. However, the social and aesthetic characteristics of a video, which can be described by the arrangement of objects or their colors, have a significant impact on consumers' desire to watch. According to 'The Influence of Character Image Realization Elements in Korean Films on Audience Viewing Satisfaction, Immersion, and Behavioral Intention' in the Journal of the Korean Society of Beauty Science [Lee Seo-hyun and Kwon Oh-hyuk (2022)], the way characters are portrayed in Korean films positively influences viewing satisfaction and immersion. Furthermore, 'Exploring the Educational Potential of Film as Visual Culture Art Education: Understanding and Utilizing Mise-en-scène' in the journal Art and Education [Lee Ji-yeon, Jang Yoon-kyung, and Kim Ji-eun (2019)] shows that aesthetic elements are used to express the value and intent of films and other video works; examples include types of shots, the sequences of scenes used to construct a story, and mise-en-scène. Therefore, if the qualitative characteristics of a video can be extracted in addition to its content or plot, the results are expected to better satisfy consumer needs ranging from viewing to production.
In the present disclosure, two technologies are primarily utilized to extract qualitative characteristics of a video: graph embedding, which stores data in a graph structure, and deep neural network technology, which can classify graph-structured data using a deep learning network. Conventional technologies for video understanding have focused on video metadata tagging: they automatically recognize information about objects, locations, and backgrounds within a video and perform tasks to understand its story context. However, such technologies have limitations in identifying qualitative factors, such as social, cultural, and artistic characteristics, which strongly influence the nature of a video. Factors that provide satisfaction and immersion when watching a video include the social, literary, and artistic characteristics that determine the feelings or emotions it conveys, yet it is difficult to find conventional technologies that quantitatively extract and utilize these factors. Therefore, this disclosure proposes a method for extracting qualitative factors from video data.

Figure 1 is a diagram illustrating a video-style search interface. Figure 2 is a conceptual diagram illustrating a method for extracting qualitative features of a video. Figure 3 is a diagram illustrating a context-based video scene segmentation method. Figure 4 is a diagram showing an example of video graph embedding. Figure 5 is a diagram illustrating an example of a graph embedding method. Figure 6 is a diagram illustrating an example of dividing a video into meaningful units. Figure 7 is a diagram showing examples of emotional words and related statistics. Figure 8 is a diagram illustrating a video classification method.
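The threshold-based feature extraction attributed to the video classification module (claims 6 and 16) might be sketched as follows. The function name, the dictionary-based data layout, and the threshold value are illustrative assumptions; the patent specifies only that object and edge feature vectors at or above a threshold are collected from same-style graph sequences into a feature matrix.

```python
import numpy as np

def extract_feature_matrix(graph_sequences, threshold=0.5):
    """Collect object and edge feature vectors whose maximum entry
    meets the threshold, across graph sequences sharing one style,
    and stack them into a feature matrix."""
    rows = []
    for seq in graph_sequences:
        for vec in seq["object_features"] + seq["edge_features"]:
            if max(vec) >= threshold:
                rows.append(vec)
    return np.array(rows)

# Two toy graph sequences assumed to share the same style.
sequences = [
    {"object_features": [[0.9, 0.2], [0.1, 0.3]],
     "edge_features":   [[0.6, 0.6]]},
    {"object_features": [[0.4, 0.8]],
     "edge_features":   [[0.2, 0.1]]},
]

F = extract_feature_matrix(sequences, threshold=0.5)
# Vectors [0.9, 0.2], [0.6, 0.6], and [0.4, 0.8] pass the threshold,
# so the resulting feature matrix has shape (3, 2).
print(F.shape)  # prints (3, 2)
```

The resulting matrix could then serve as input to a graph-structure classifier of the kind the disclosure associates with deep neural network technology.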
Hereinafter, embodiments of the present disclosure are described in detail with reference to the attached drawings so that those skilled in the art can easily implement them. However, the present disclosure may be embodied in various different forms and is not limited to the embodiments described herein. In describing the embodiments of the present disclosure, detailed descriptions of known configurations or functions are omitted if it is determined that such descriptions could obscure the essence of the present disclosure. Additionally, parts of the drawings unrelated to the description of the present disclosure have been omitted, and similar parts are denoted by similar reference numerals. In the present disclosure, when a component is described as being "connected," "combined," or "joined" with another component, this may include not only a direct connection but also an indirect connection in which another component exists in between. Furthermore, when a component is described as "comprising" or "having" another component, this means that, unless specifically stated otherwise, it does not exclude the other component but may include additional components. In the present disclosure, terms such as first, second, etc. are used solely for the purpose of distinguishing one component from another and do not limit the order or importance of the components unless specifically stated otherwise. Accordingly, within