KR-20260067769-A - VISUAL ARGUMENTATION REASONING DEVICE AND METHOD

Abstract

The present invention relates to a visual argumentation reasoning device comprising: a Visual Premises Unit (VPU) that receives an image and detects argument premises in the image to determine visual premises; a Commonsense Premises Unit (CPU) that extracts at least one item of background knowledge associated with the visual premises to determine commonsense premises; and a Conclusion Derivation Unit that derives at least one intermediate conclusion based on the commonsense premises and the visual premises, and derives a final conclusion through the logical connection between the at least one intermediate conclusion.

Inventors

유영재
정지완

Assignees

연세대학교 산학협력단

Dates

Publication Date: 20260513
Application Date: 20241106

Claims (11)

1. A visual argumentation reasoning device comprising: a Visual Premises Unit (VPU) that receives an image as input, detects argument premises in the image, and determines visual premises; a Commonsense Premises Unit (CPU) that determines commonsense premises by extracting at least one item of background knowledge associated with the visual premises; and a conclusion derivation unit that derives at least one intermediate conclusion based on the commonsense premises and the visual premises, and derives a final conclusion through the logical connection between the at least one intermediate conclusion.
2. The visual argumentation reasoning device of claim 1, wherein the Visual Premises Unit detects an object in the image and determines an argument premise object by determining whether the detected object can be used as an argument premise.
3. The visual argumentation reasoning device of claim 2, wherein the Visual Premises Unit evaluates the likelihood that the detected object is an argument premise based on at least one of the shape, color, size, and texture features of the detected object.
4. The visual argumentation reasoning device of claim 3, wherein the Visual Premises Unit evaluates the likelihood that the detected object is an argument premise based on an object placement relationship that considers the position of the detected object within the image.
5. The visual argumentation reasoning device of claim 2, wherein the Visual Premises Unit evaluates the likelihood that the detected object is an argument premise based on whether the detected object contains text or a symbol.
6. The visual argumentation reasoning device of claim 2, wherein the Visual Premises Unit determines the visual premises by performing semantic clustering through analysis of the similarity between the argument premise objects.
7. The visual argumentation reasoning device of claim 1, wherein the Commonsense Premises Unit generates a text representation of the visual premises and extracts candidate background knowledge by searching a knowledge base with the text representation.
8. The visual argumentation reasoning device of claim 7, wherein the Commonsense Premises Unit determines the at least one item of background knowledge by evaluating the logical validity of the candidate background knowledge with respect to the visual premises.
9. The visual argumentation reasoning device of claim 8, wherein the Commonsense Premises Unit determines the commonsense premises by calculating, for each of the at least one item of background knowledge, its correlation with the visual premises.
10. The visual argumentation reasoning device of claim 8, wherein the conclusion derivation unit determines the logical order of the at least one intermediate conclusion and integrates the at least one intermediate conclusion by selecting and removing intermediate conclusions in the process of determining the logical order.
11. A visual argumentation reasoning method performed in a visual argumentation reasoning device, the method comprising: a visual premise step of receiving an image as input, detecting argument premises in the image, and determining visual premises; a commonsense premise step of determining commonsense premises by extracting at least one item of background knowledge associated with the visual premises; and a conclusion derivation step of deriving at least one intermediate conclusion based on the commonsense premises and the visual premises, and deriving a final conclusion through the logical connection between the at least one intermediate conclusion.
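
For illustration only, the three steps recited in the claims can be sketched in Python as a toy pipeline. Everything in the sketch is an assumption made for the example and is not taken from the specification: the DetectedObject type, the weights in premise_score, the word-overlap measure in shared_words, the dictionary standing in for a knowledge base, and the string concatenation standing in for conclusion generation. An actual embodiment would likely use trained models for each of these steps.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectedObject:
    """Hypothetical output of an upstream object detector."""
    label: str
    box: tuple  # (x0, y0, x1, y1), normalized to [0, 1]
    has_text_or_symbol: bool = False


def premise_score(obj):
    """Assumed heuristic: large, central, or text-bearing objects score higher."""
    x0, y0, x1, y1 = obj.box
    area = (x1 - x0) * (y1 - y0)
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    centrality = 1.0 - (abs(cx - 0.5) + abs(cy - 0.5))
    bonus = 0.3 if obj.has_text_or_symbol else 0.0
    return 0.5 * area + 0.5 * centrality + bonus


def visual_premises(objects, threshold=0.5):
    """Visual premise step: keep objects judged usable as argument premises."""
    return [o.label for o in objects if premise_score(o) >= threshold]


def shared_words(fact, premises):
    """Crude stand-in for validity and correlation scoring: count shared words."""
    fact_words = set(fact.lower().rstrip(".").split())
    premise_words = {w for p in premises for w in p.lower().split()}
    return len(fact_words & premise_words)


def commonsense_premises(premises, knowledge_base):
    """Commonsense premise step: retrieve candidates, then drop unrelated ones."""
    candidates = [f for p in premises for f in knowledge_base.get(p, [])]
    unique = list(dict.fromkeys(candidates))  # de-duplicate, keep order
    return [f for f in unique if shared_words(f, premises) > 0]


def derive_conclusion(visual, commonsense):
    """Conclusion step: order the intermediate conclusions, then combine them."""
    # Assumed ordering rule: strongest overlap first (sorted() is stable for ties).
    ordered = sorted(commonsense, key=lambda f: shared_words(f, visual), reverse=True)
    intermediate = [f"Step {i}: {fact}" for i, fact in enumerate(ordered, start=1)]
    final = "Shown: " + ", ".join(visual) + ". " + " ".join(ordered)
    return {"intermediate": intermediate, "final": final}


# Toy inputs, invented for this example.
objects = [
    DetectedObject("polar bear", (0.3, 0.3, 0.7, 0.7)),
    DetectedObject("melting ice", (0.1, 0.5, 0.9, 1.0)),
    DetectedObject("cloud", (0.0, 0.0, 0.1, 0.1)),  # small and off-center
]
toy_kb = {
    "polar bear": ["Polar bears depend on sea ice to hunt.",
                   "Zoos attract many visitors."],  # unrelated, filtered out
    "melting ice": ["Sea ice melts as the climate warms."],
}

visual = visual_premises(objects)
commonsense = commonsense_premises(visual, toy_kb)
result = derive_conclusion(visual, commonsense)
print(result["final"])
```

In this toy run the small corner object is rejected as a premise and the unrelated knowledge-base entry is discarded before any conclusion is formed, which mirrors the selection and removal described in claims 2, 8, and 10 only in spirit.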

Description

Visual Argumentation Reasoning Device and Method

The present invention relates to visual argumentation reasoning technology, and more specifically, to a visual argumentation reasoning device and method capable of deriving at least one intermediate conclusion based on commonsense premises and visual premises, and deriving a final conclusion through the logical connection between the at least one intermediate conclusion.

Commonsense reasoning is an artificial intelligence technology that collects commonsense knowledge and finds ways to teach it to computers, so that they can understand common sense naturally and interact on that basis. Commonsense reasoning technology involves the following four stages.

In the commonsense knowledge extraction stage, commonsense information can be extracted from various data sources such as text corpora, web documents, images, and crowdsourcing, and expressed in the form of an ontology or a graph.

In the commonsense verification stage, the constructed commonsense information can be verified through question-and-answer sessions with people such as experts.

In the training data construction stage, training data for learning the commonsense reasoning model and benchmark data for validation can be constructed from data sources.

In the commonsense reasoning stage, a deep learning or probability-based reasoning model is trained, or situations that change according to common sense in response to a specific event are defined as standardized rules, so that a commonsensical answer to a question can be given.

Research is actively underway on building standardized commonsense knowledge from various sources and on benchmark data for reasoning over it.

Korean Published Patent No. 10-2010-0031039 (March 19, 2010) posits that if occupational groups and value systems functioning in society are filtered through the lens of history, they have the potential to contribute positively to social design.
However, the realization of such potential is not determined by the occupational groups and value systems themselves. It becomes possible when a network of diverse occupational groups and value systems is established to suit the problem-solving context. The individual capable of weaving such a network is one who possesses organizational thinking skills, and the purpose of various aptitude tests is to identify such individuals. The invention of the cited publication is a learning system for training organizational thinking skills, an essential competency for capable people in modern society. It achieves this objective by devising a method for visualizing and manipulating discussion structures, thereby efficiently analyzing discussion structures that are organic in nature. In summary, the invention of the cited publication relates to a discussion analysis training system that, based on a visualization tool and typological rules for discussion structure patterns, devises an operating method by which a problem solver can implicitly use a suitable visualization tool, together with cases in which the visualization tool and rules are applied.

FIG. 1 is a diagram illustrating a visual argumentation reasoning device according to one embodiment of the present invention.
FIG. 2 is a diagram illustrating the system configuration of the visual argumentation reasoning device of FIG. 1.
FIG. 3 is a flowchart illustrating a visual argumentation reasoning method according to the present invention.
FIG. 4 is a diagram showing human workers iteratively refining initial machine-generated data in the VisArgs annotation process according to the present invention.
FIG. 5 is a diagram showing the diversity of themes appearing in the visual premises and conclusions in VisArgs according to the present invention.
FIG. 6 is a diagram showing a case in which LLaVA-1.5 fails to identify premises, according to the present invention.
FIG. 7 is a diagram showing (left) OCR detection results according to the present invention and (right) actual text instances missed by the model (highlighted in red).

The description of the present invention is merely an embodiment for structural or functional explanation, and therefore the scope of the present invention should not be interpreted as being limited by the embodiments described in the text. That is, since the embodiments can be modified in various ways and may take various forms, the scope of the present invention should be understood to include equivalents capable of realizing its technical concept. Furthermore, the objectives or effects presented in the present invention do not mean that a specific embodiment must include all of them or only such effects; therefore, the scope of the present invention should not be understood as being limited by them.

Meanwhile, the meaning of the terms used in this application should be understood as follows.

Terms such as "first" and "second" are intended to distinguish one component from another, and the scope of rights shall not be limited by these terms. For example, the first co