KR-20260063061-A - Methods for training object detection networks that operate in domain-general situations
Abstract
A method for training an object detection network according to one embodiment of the present invention comprises: a step of generating augmented image data (I_A) by applying image data (I) to an image augmentation module; a step of extracting an original feature map (F) from the augmented image data (I_A); and a step of extracting an augmented feature map (F_A) by applying the original feature map (F) to a text-based feature map augmentation module. In this way, training of a domain-generalized object detection network that operates well even when the source data domain and the target data domain differ can be provided.
Inventors
- 김귀식
- 박하영
- 조충상
- 이영한
- 김태우
Assignees
- 한국전자기술연구원 (Korea Electronics Technology Institute)
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2024-10-30
Claims (12)
- A method for training an object detection network operating in a domain-general situation, comprising the steps of: the system acquiring image data (I); the system applying the acquired image data (I) to an image augmentation module to generate augmented image data (I_A); the system extracting an original feature map (F) from the augmented image data (I_A); and the system applying the original feature map (F) to a text-based feature map augmentation module to extract an augmented feature map (F_A).
- The training method of claim 1, wherein the step of generating the augmented image data (I_A) performs a blurring-based image augmentation operation in which a relatively higher weight is set for pixels in the central region of the image data (I) and a relatively lower weight is set for pixels in the outer region.
- The training method of claim 1, wherein the image data (I) is image data acquired in a first domain environment, the original feature map (F) is a feature map of the image data acquired in the first domain environment, and the step of extracting the augmented feature map (F_A) extracts the augmented feature map (F_A) by combining, with the original feature map (F), text data describing one or more cases whose domain environment differs from the first domain environment.
- The training method of claim 3, wherein the step of extracting the augmented feature map (F_A) applies an addition-operation technique or a style-conversion technique when combining the text data describing the one or more other cases with the original feature map (F).
- The training method of claim 4, wherein the step of extracting the augmented feature map (F_A) calculates, when combining the text data describing the one or more other cases with the original feature map (F), the augmented feature map (F_A) through the following Equation 1, where μ() and σ() are functions for calculating the mean and standard deviation, μ_A and σ_A are the mean and standard deviation factors used for augmentation, and d_A is an addition-based augmentation factor. (Equation 1)
- The training method of claim 5, further comprising the step of the system training the feature map augmentation module based on the augmented feature map (F_A), wherein the step of training the feature map augmentation module trains μ_A, σ_A, and d_A.
- The training method of claim 5, further comprising the step of training an object detection module that detects objects in the image data (I) or the augmented image data (I_A) based on the augmented feature map (F_A).
- The training method of claim 3, wherein the image augmentation module generates augmented image data (I_A) independent of the domain environment, and the feature map augmentation module extracts an augmented feature map (F_A) related to the domain environment.
- The training method of claim 8, wherein the image augmentation module and the feature map augmentation module are applied to the system simultaneously, each being responsible for domain-environment-independent and domain-environment-related augmentation respectively, such that a greater performance-enhancement effect occurs than when each is applied to the system separately.
- A training system for an object detection network operating in a domain-general situation, comprising: a communication unit for acquiring image data (I); and a processor that applies the acquired image data (I) to an image augmentation module to generate augmented image data (I_A), extracts an original feature map (F) from the augmented image data (I_A), and applies the original feature map (F) to a text-based feature map augmentation module to extract an augmented feature map (F_A).
- A method for training an object detection network operating in a domain-general situation, comprising the steps of: the system applying image data (I) to an image augmentation module to generate augmented image data (I_A); the system extracting an original feature map (F) from the augmented image data (I_A); and the system applying the original feature map (F) to a text-based feature map augmentation module to extract an augmented feature map (F_A).
- A training system for an object detection network operating in a domain-general situation, comprising: an image augmentation module that generates augmented image data (I_A) by receiving image data (I) as input data; a feature map extraction module that extracts an original feature map (F) from the augmented image data (I_A); and a feature map augmentation module that extracts an augmented feature map (F_A) by receiving the original feature map (F) as input data.
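Equation 1 of claim 5 is not reproduced in this text, but the symbols it defines (μ(), σ(), μ_A, σ_A, d_A) match a common AdaIN-style feature restyling of the form F_A = σ_A · (F − μ(F)) / σ(F) + μ_A + d_A. The sketch below implements that speculative reading; it is one plausible interpretation of the stated symbols, not the patent's actual equation.

```python
import numpy as np

def augment_feature_map(F, mu_A, sigma_A, d_A, eps=1e-5):
    """Hypothetical Equation-1-style augmentation (AdaIN form).

    F       : (C, H, W) original feature map
    mu_A    : (C,) mean factor used for augmentation
    sigma_A : (C,) standard-deviation factor used for augmentation
    d_A     : (C,) addition-based augmentation factor
    """
    mu = F.mean(axis=(1, 2), keepdims=True)    # per-channel mean, mu(F)
    sigma = F.std(axis=(1, 2), keepdims=True)  # per-channel std, sigma(F)
    normalized = (F - mu) / (sigma + eps)      # strip source-domain statistics
    # Re-style with the augmentation statistics and add the additive factor.
    return (sigma_A[:, None, None] * normalized
            + mu_A[:, None, None]
            + d_A[:, None, None])
```

Under this reading, training μ_A, σ_A, and d_A (claim 6) would amount to treating these per-channel arrays as learnable parameters of the feature map augmentation module.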
Description
Methods for training object detection networks that operate in domain-general situations

The present invention relates to a visual intelligence-based object detection technology, and more specifically, to a method for training a visual intelligence-based object detection network.

If there is a difference in domain (e.g., weather conditions) between the data used to train an AI network and the input data encountered in actual operation, performance degrades significantly in actual operation compared to tests on the domain of the training data. Furthermore, conventional training-data augmentation techniques do not simultaneously consider augmentation of the input images and augmentation of the intermediate results (features) within the network, and thus cannot effectively address the mutual influence between the two techniques.

FIG. 1 is a drawing provided to describe the configuration of a training system for an object detection network operating in a domain-general situation according to an embodiment of the present invention. FIG. 2 is a drawing provided for a more detailed description of the configuration of the processor illustrated in FIG. 1. FIG. 3 is a drawing provided to explain the process of training a feature map augmentation module or an object detection module through a training system for an object detection network operating in a domain-general situation according to an embodiment of the present invention, and FIG. 4 is a flowchart provided to describe a method for training an object detection network operating in a domain-general situation according to one embodiment of the present invention. The present invention will be described in more detail below with reference to the drawings.
To clearly explain the invention, parts unrelated to the description have been omitted from the drawings, and in the drawings, the width, length, thickness, etc. of the components may be exaggerated for convenience.

FIG. 1 is a diagram provided to describe the configuration of a training system for an object detection network operating in a domain-general situation according to one embodiment of the present invention. The training system for an object detection network operating in a domain-general situation according to the present embodiment (hereinafter, the 'system') can simultaneously consider a data augmentation method for the input data and an augmentation method for the network's intermediate features while training a visual intelligence-based object detection network. In other words, the system can provide a training method for an object detection network that performs well on target data even when there is a difference between the source data domain and the target data domain.

To this end, the system may include a communication unit (100), a processor (200), and a storage unit (300). The communication unit (100) is equipped with a communication module connected to a network and can acquire image data (I) for object detection. The storage unit (300) is provided to store programs and data necessary for the operation of the processor (200). The processor (200) handles everything needed to simultaneously consider the data augmentation method for the input data and the intermediate feature augmentation method while training the visual intelligence-based object detection network.
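The data flow described above (image data I → image augmentation module → feature map extraction → text-based feature map augmentation) can be illustrated with a toy sketch. All module bodies below are stand-ins chosen for brevity (a flip, an average-pool, a statistic shift), not the patent's actual network components.

```python
import numpy as np

def image_augment(I):
    """Image augmentation module stand-in: a domain-independent flip."""
    return I[:, ::-1, :]

def extract_features(I_A):
    """Feature map extraction stand-in: 2x2 average pooling per channel
    (assumes even height and width for simplicity)."""
    h, w, c = I_A.shape
    return I_A.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def text_feature_augment(F, style_shift):
    """Text-based feature map augmentation stand-in: shift the feature
    statistics by a (hypothetical) text-derived style offset."""
    return F + style_shift

I = np.random.default_rng(1).random((8, 8, 3))   # image data (I)
I_A = image_augment(I)                           # augmented image data (I_A)
F = extract_features(I_A)                        # original feature map (F)
F_A = text_feature_augment(F, 0.1)               # augmented feature map (F_A)
```

In the claimed system, the first stage would be domain-independent (claim 8) while the last stage injects domain-related variation, so both kinds of augmentation act on every training sample.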
Specifically, the processor (200) can apply the image data (I) to an image augmentation module to generate augmented image data (I_A), extract an original feature map (F) from the augmented image data (I_A), and apply the original feature map (F) to a text-based feature map augmentation module to extract an augmented feature map (F_A), thereby training the feature map augmentation module or the object detection module. A more detailed description of the processor (200) is given below with reference to FIGS. 2 and 3.

FIG. 2 is a drawing provided for a more detailed description of the configuration of the processor illustrated in FIG. 1, and FIG. 3 is a drawing provided to describe the process of training a feature map augmentation module or an object detection module through a training system for an object detection network operating in a domain-general situation according to an embodiment of the present invention.

Referring to FIG. 2, the processor (200) may include an image augmentation module (210) that generates augmented image data (I_A) by receiving image data (I) as input data, a feature map extraction module (220) that extracts an original feature map (F) from the augmented image data (I_A), and a feature map augmentation module (230) that extracts an augmented feature map (F_A) by receiving the original feature map (F) as input data. The image augmentation module (210) can generate augmented image data (I_A) independent of the domain environment. Specifically, the image
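The center-weighted blurring of claim 2 could, under one reading, blend the original image with a blurred copy using a radial weight mask, so that central pixels retain more of the original (higher weight) while outer pixels receive more blur (lower weight). The sketch below is a minimal illustration; the box blur, the mask shape, and the blending scheme are all assumptions, not details given in the patent.

```python
import numpy as np

def box_blur(img, k=5):
    """Naive k x k box blur with edge padding (per channel)."""
    pad = k // 2
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def center_weighted_blur(img):
    """Blend original and blurred image with a radial mask: weight 1 at
    the center (original kept), falling toward 0 at the corners (blurred)."""
    h, w = img.shape[:2]
    yy, xx = np.mgrid[0:h, 0:w]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    dist = np.sqrt(((yy - cy) / cy) ** 2 + ((xx - cx) / cx) ** 2)
    weight = np.clip(1.0 - dist / np.sqrt(2.0), 0.0, 1.0)[..., None]
    return weight * img + (1.0 - weight) * box_blur(img)
```

Because the blur acts regardless of weather or lighting conditions, an operation of this kind is a plausible example of the domain-environment-independent augmentation attributed to the image augmentation module (210).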