KR-20260064236-A - METHOD AND SERVER FOR TRAINING OBJECT DETECTION MODEL ADAPTIVE TO CAMERA INSTALLATION ENVIRONMENT
Abstract
A method for training an object detection model adaptive to the installation environment of a camera according to an embodiment of the present invention may include: obtaining first edge camera information including first viewpoint information corresponding to the installation environment of a first edge camera and first pose information for an object captured by the first edge camera; determining, using an artificial intelligence model, an entire training dataset including 6D pose information representing the 3D position and 3D rotation of an object from an image dataset; selecting at least two first training datasets corresponding to the first edge camera information from the entire training dataset; and training an object detection model adaptive to the installation environment of the first edge camera through backpropagation that minimizes a pose loss function determined based on poses grouped for each object, using the selected at least two first training datasets.
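The abstract's selection step — picking, from the entire training dataset, the samples whose inferred viewpoints best match a given edge camera — can be sketched as follows. This is a minimal illustration only; the class and function names, the Euler-angle viewpoint representation, and the nearest-neighbour selection rule are all assumptions, since the patent does not disclose a concrete implementation.

```python
import math
from dataclasses import dataclass

# Hypothetical stand-ins for the patent's "edge camera information":
# a viewpoint given as installation angles (yaw/pitch/roll, in degrees).
@dataclass
class CameraInfo:
    yaw: float
    pitch: float
    roll: float

@dataclass
class Sample:
    image_id: str
    viewpoint: CameraInfo
    pose_6d: tuple  # (x, y, z, yaw, pitch, roll) of the detected object

def viewpoint_distance(a: CameraInfo, b: CameraInfo) -> float:
    """Euclidean distance between two viewpoints (illustrative metric only)."""
    return math.sqrt((a.yaw - b.yaw) ** 2
                     + (a.pitch - b.pitch) ** 2
                     + (a.roll - b.roll) ** 2)

def select_training_samples(dataset, camera: CameraInfo, k: int = 2):
    """Pick the k samples whose inferred viewpoint best matches the edge camera,
    mirroring the claim's "at least two first training datasets"."""
    return sorted(dataset, key=lambda s: viewpoint_distance(s.viewpoint, camera))[:k]
```

In the actual system the viewpoints of the dataset images would themselves be inferred by the artificial intelligence model rather than given, but the matching logic would follow the same pattern.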
Inventors
- 김흥준
- 송봉섭
- 이현창
Assignees
- 주식회사 슈프리마
Dates
- Publication Date: 2026-05-07
- Application Date: 2024-10-31
Claims (20)
- A method for training an object detection model adaptive to a camera installation environment, the method comprising: obtaining first edge camera information including first viewpoint information corresponding to the installation environment of a first edge camera and first pose information for an object captured by the first edge camera; determining, using an artificial intelligence model, an entire training dataset including 6D pose information representing a 3D position and 3D rotation of an object from an image dataset; selecting at least two first training datasets corresponding to the first edge camera information from the entire training dataset; and training an object detection model adaptive to the installation environment of the first edge camera through backpropagation that minimizes a pose loss function determined based on poses grouped for each object, using the selected at least two first training datasets.
- The method of claim 1, wherein obtaining the first edge camera information comprises obtaining the first viewpoint information and the first pose information at an initialization time of the first edge camera or at preset intervals, using a first artificial intelligence model mounted on the first edge camera.
- The method of claim 1, wherein obtaining the first edge camera information comprises obtaining the first viewpoint information based on a movement direction and a perspective change of the object being captured.
- The method of claim 1, wherein determining the entire training dataset comprises: inferring camera viewpoints and object poses from images included in the image dataset using the artificial intelligence model; determining the 6D pose information based on the inferred camera viewpoints and object poses; and determining the entire training dataset by clustering the images included in the image dataset based on the 6D pose information.
- The method of claim 4, wherein determining the entire training dataset by clustering the images comprises determining the entire training dataset including at least one cluster corresponding to an object pose based on optimization of at least one first clustering parameter, and wherein the at least one first clustering parameter includes at least one of a cluster range, a number of clusters, and a proportion of objects in the image dataset corresponding to each cluster.
- The method of claim 5, wherein selecting the at least two first training datasets comprises: determining a sampling rate for each of the at least one cluster included in the entire training dataset based on the first viewpoint information and the first pose information; and determining the at least two first training datasets based on adjustment of the at least one first clustering parameter and the sampling rate.
- The method of claim 6, wherein determining the at least two first training datasets comprises determining the at least two first training datasets based on adjustment of at least one second clustering parameter, and wherein the at least one second clustering parameter includes at least one of an inter-cluster distance, an intra-cluster variance, and a cluster selection weight.
- The method of claim 1, wherein training the object detection model adaptive to the installation environment of the first edge camera comprises: augmenting the at least two first training datasets using a mosaic augmentation technique that maintains object poses; and further training the object detection model based on the at least two augmented first training datasets.
- The method of claim 1, further comprising: determining an optimal object detection model based on a performance evaluation of the object detection model trained using the at least two first training datasets; and distributing the optimal object detection model to the first edge camera.
- A server for training an object detection model adaptive to a camera installation environment, the server comprising: a memory storing an object detection model training program; and a processor configured to load the object detection model training program from the memory and execute it, wherein the processor: obtains first edge camera information including first viewpoint information corresponding to the installation environment of a first edge camera and first pose information for an object captured by the first edge camera; determines, using an artificial intelligence model, an entire training dataset including 6D pose information representing a 3D position and 3D rotation of an object from an image dataset; selects at least two first training datasets corresponding to the first edge camera information from the entire training dataset; and trains an object detection model adaptive to the installation environment of the first edge camera through backpropagation that minimizes a pose loss function determined based on poses grouped for each object, using the selected at least two first training datasets.
- The server of claim 10, wherein the processor obtains the first viewpoint information and the first pose information at an initialization time of the first edge camera or at preset intervals, using a first artificial intelligence model mounted on the first edge camera.
- The server of claim 10, wherein the processor obtains the first viewpoint information based on a movement direction and a perspective change of the object being captured.
- The server of claim 10, wherein the processor: infers camera viewpoints and object poses from images included in the image dataset using the artificial intelligence model; determines the 6D pose information based on the inferred camera viewpoints and object poses; and determines the entire training dataset by clustering the images included in the image dataset based on the 6D pose information.
- The server of claim 13, wherein the processor determines the entire training dataset including at least one cluster corresponding to an object pose based on optimization of at least one first clustering parameter, and wherein the at least one first clustering parameter includes at least one of a cluster range, a number of clusters, and a proportion of objects in the image dataset corresponding to each cluster.
- The server of claim 14, wherein the processor: determines a sampling rate for each of the at least one cluster included in the entire training dataset based on the first viewpoint information and the first pose information; and determines the at least two first training datasets based on adjustment of the at least one first clustering parameter and the sampling rate.
- The server of claim 15, wherein the processor determines the at least two first training datasets based on adjustment of at least one second clustering parameter, and wherein the at least one second clustering parameter includes at least one of an inter-cluster distance, an intra-cluster variance, and a cluster selection weight.
- The server of claim 10, wherein the processor: augments the at least two first training datasets using a mosaic augmentation technique that maintains object poses; and further trains the object detection model based on the at least two augmented first training datasets.
- The server of claim 10, wherein the processor: determines an optimal object detection model based on a performance evaluation of the object detection model trained using the at least two first training datasets; and distributes the optimal object detection model to the first edge camera.
- A computer-readable recording medium storing a computer program comprising instructions that, when executed by a processor, cause the processor to perform a method for training an object detection model adaptive to a camera installation environment, the method comprising: obtaining first edge camera information including first viewpoint information corresponding to the installation environment of a first edge camera and first pose information for an object captured by the first edge camera; determining, using an artificial intelligence model, an entire training dataset including 6D pose information representing a 3D position and 3D rotation of an object from an image dataset; selecting at least two first training datasets corresponding to the first edge camera information from the entire training dataset; and training an object detection model adaptive to the installation environment of the first edge camera through backpropagation that minimizes a pose loss function determined based on poses grouped for each object, using the selected at least two first training datasets.
- A computer program stored on a computer-readable recording medium, the computer program comprising instructions that, when executed by a processor, cause the processor to perform a method for training an object detection model adaptive to a camera installation environment, the method comprising: obtaining first edge camera information including first viewpoint information corresponding to the installation environment of a first edge camera and first pose information for an object captured by the first edge camera; determining, using an artificial intelligence model, an entire training dataset including 6D pose information representing a 3D position and 3D rotation of an object from an image dataset; selecting at least two first training datasets corresponding to the first edge camera information from the entire training dataset; and training an object detection model adaptive to the installation environment of the first edge camera through backpropagation that minimizes a pose loss function determined based on poses grouped for each object, using the selected at least two first training datasets.
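The claims repeatedly refer to a pose loss function "determined based on poses grouped for each object." One plausible reading — squared 6D-pose error averaged within each object's group of poses and then across groups — can be sketched as below. This is an assumed interpretation, not the patent's actual loss; all names are hypothetical.

```python
def grouped_pose_loss(pred_by_object, target_by_object):
    """Mean squared 6D-pose error, averaged per object group first.

    pred_by_object / target_by_object: dict mapping an object id to a list
    of 6D poses, each pose a tuple (x, y, z, yaw, pitch, roll).
    """
    group_losses = []
    for obj_id, preds in pred_by_object.items():
        targets = target_by_object[obj_id]
        squared_error = 0.0
        count = 0
        for p, t in zip(preds, targets):
            # Sum of squared differences over all six pose components.
            squared_error += sum((pi - ti) ** 2 for pi, ti in zip(p, t))
            count += 1
        group_losses.append(squared_error / count)  # mean within this object's group
    return sum(group_losses) / len(group_losses)    # mean across object groups
```

In the claimed method this scalar would be minimized through backpropagation during training; grouping per object keeps a frequently seen object from dominating the loss.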
Description
Method and Server for Training an Object Detection Model Adaptive to Camera Installation Environment

The present invention relates to a method and a server for training an object detection model adaptive to the installation environment of a camera.

To determine the viewpoint, that is, the angle at which a camera views a target object, edge-based image sensing devices (e.g., CCTVs, dashcams, kiosks) must be calibrated to compute the camera's intrinsic and extrinsic parameters. This calibration requires detailed specifications of the camera sensor and lens, calibration images, and a complex procedure for estimating approximate parameters.

Meanwhile, because the shape and characteristics of the target object in an image vary significantly with the camera's viewpoint, deploying a lightweight version of a generically trained object detection model on an edge-based image detection device equipped with a low-spec NPU or CPU results in significantly degraded detection performance.

One could consider using, on such a device, an object detection model trained on an image dataset classified according to the viewpoint of the camera's installation environment; however, since the image dataset includes millions of images captured by various camera models, manually classifying it by installation viewpoint requires a prohibitive amount of time and cost.

Accordingly, there is a need for a method that improves image detection performance by constructing a training dataset adaptive to the installation environment of edge-based image detection devices and training an object detection model on that dataset.

FIG. 1 is a block diagram showing a server according to an embodiment of the present invention. FIG. 2 is a block diagram conceptually illustrating the functions of an object detection model training program according to an embodiment of the present invention. FIG. 3 is a flowchart illustrating an object detection model training method according to an embodiment of the present invention. FIG. 4 is a diagram illustrating an exemplary system that uses a server according to an embodiment of the present invention to select a training dataset adaptive to the installation environment of an edge camera and to distribute an object detection model trained on the selected dataset to an edge camera device.

The advantages and features of the present invention, and the methods for achieving them, will become clear from the embodiments described below in detail together with the accompanying drawings. However, the present invention is not limited to the embodiments disclosed below and can be implemented in various different forms. These embodiments are provided merely to make the disclosure of the present invention complete and to fully inform those skilled in the art of the scope of the invention, which is defined only by the claims. In describing the embodiments, detailed descriptions of known functions or configurations are omitted where they would unnecessarily obscure the essence of the invention. The terms used below are defined in view of their functions in the embodiments of the present invention and may vary with the intentions or practices of users or operators, so their definitions should be based on the content of this specification as a whole.

Referring to FIG. 1, the server (100) may include a processor (110), an input/output device (120), and a memory (130).
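The server structure claimed above — a memory storing the training program and a processor that loads and executes it — can be sketched minimally as follows. The class and function names are illustrative assumptions; the patent describes hardware components, not this software organization.

```python
class ObjectDetectionTrainingServer:
    """Minimal sketch: a 'memory' that stores the training program and a
    'processor' role that loads and executes it (hypothetical names)."""

    def __init__(self):
        self.memory = {}  # stands in for the memory (130) storing the program

    def store_program(self, name, program):
        # Place a callable training program into memory.
        self.memory[name] = program

    def execute(self, name, *args):
        # The processor (110) loads the program from memory and runs it.
        program = self.memory[name]
        return program(*args)

def train_stub(dataset):
    # Placeholder for the actual model-training routine described in the claims.
    return {"trained_on": len(dataset)}
```

A usage example: `server.store_program("train", train_stub)` followed by `server.execute("train", samples)` mirrors the claimed load-then-execute flow.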
The processor (110) can control the overall operation of the server (100). Using the input/output device (120), the processor (110) can receive viewpoint information corresponding to the installation environment of an edge camera and pose information for an object captured by the edge camera.

In the present invention, an edge camera is a device equipped with an artificial intelligence model that processes captured data on its own, for example by analyzing video captured by the camera and recognizing specific actions; examples include CCTVs, dashcams, and kiosks. The viewpoint information corresponding to the installation environment of the edge camera is information about the camera installation angle, and may include yaw, pitch, and roll values as well as position information for a point corresponding to the field of view. In addition, in th