
JP-7856392-B2 - Image processing device, control method for image processing device, program

JP 7856392 B2

Inventors

  • 石栗 隆一

Assignees

  • キヤノン株式会社 (Canon Inc.)

Dates

Publication Date
2026-05-11
Application Date
2021-08-27

Claims (7)

  1. An information processing system that communicates with a terminal device that performs image recognition on captured images, the information processing system training a neural network image recognition model for said image recognition, the system comprising: receiving means for receiving a facial image transmitted from the terminal device; determination means for determining whether or not the facial image received by the receiving means is a learning target; training data acquisition means for acquiring, from a database of the information processing system as training data, a facial image corresponding to the facial image determined by the determination means to be a learning target, together with person name information corresponding to that facial image; learning means for training the image recognition model, using said training data, so as to recognize a person from the facial image determined to be a learning target; and transmission means for transmitting the trained image recognition model to the terminal device, wherein the determination means determines facial images received from the terminal device a certain number of times or more within a unit period to be learning targets, and does not determine facial images that do not meet this criterion to be learning targets.
  2. The information processing system according to claim 1, characterized in that the database stores image data received from the Internet.
  3. The information processing system according to claim 1, characterized in that the learning means performs training of an image recognition model when the receiving means receives an image from the terminal device.
  4. The information processing system according to claim 1, characterized in that the learning means performs training on the image recognition model at regular intervals.
  5. The information processing system according to any one of claims 1 to 4, characterized in that the training data acquired from the database includes attribute information of the person, and the attribute information includes any of age, gender, nationality, height, weight, occupation, affiliation, or hobbies.
  6. A control method for an information processing system that communicates with a terminal device that performs image recognition on captured images, the information processing system training a neural network image recognition model for said image recognition, the method comprising: a receiving step of receiving a facial image transmitted from the terminal device; a determination step of determining whether or not the facial image received in the receiving step is a learning target; a training data acquisition step of acquiring, from a database of the information processing system as training data, a facial image corresponding to the facial image determined in the determination step to be a learning target, together with person name information corresponding to that facial image; a learning step of training the image recognition model, using said training data, so as to recognize a person from the facial image determined to be a learning target; and a transmission step of transmitting the trained image recognition model to the terminal device, wherein in the determination step, facial images received from the terminal device a certain number of times or more within a unit period are determined to be learning targets, and facial images that do not meet this criterion are not determined to be learning targets.
  7. A computer-readable program for causing a computer to function as each means of the information processing system according to any one of claims 1 to 5.
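The threshold rule recited in claims 1 and 6 (only faces received at least a certain number of times within a unit period become learning targets) can be sketched as follows. This is an illustrative sketch, not the patented implementation; the class name, the `face_id` keying, and the threshold/period parameters are assumptions introduced here.

```python
from collections import defaultdict


class LearningTargetDeterminer:
    """Counts how often each face (keyed by an identifier) arrives
    within a unit period and marks faces at or above a threshold as
    learning targets. Names and structure are illustrative only."""

    def __init__(self, threshold, period_seconds):
        self.threshold = threshold          # minimum arrivals per period
        self.period = period_seconds        # length of the unit period
        self.arrivals = defaultdict(list)   # face_id -> arrival timestamps

    def record(self, face_id, timestamp):
        """Record one received facial image for this face identifier."""
        self.arrivals[face_id].append(timestamp)

    def is_learning_target(self, face_id, now):
        """A face is a learning target only if it arrived at least
        `threshold` times within the most recent unit period."""
        recent = [t for t in self.arrivals[face_id] if now - t <= self.period]
        self.arrivals[face_id] = recent     # drop arrivals outside the period
        return len(recent) >= self.threshold
```

A face seen three times in one minute would qualify, while a face seen once, or only long ago, would not.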

Description

This invention relates to an image processing device for recording a learning model.

In recent years, deep learning methods based on convolutional neural network (CNN) models have become widely known. To perform face recognition using such learning models, a large-scale database containing a large number of labeled face images is necessary. For example, Patent Document 1 describes a method for generating multiple training images from a source training image and using them as training data, in order to address the problem that training a CNN model requires a large number of labeled face images.

[Patent Document 1] Japanese Patent Publication No. 2018-195309

Brief description of the drawings (first embodiment):

  • A block diagram of the entire system.
  • A sequence diagram.
  • A flowchart of the person recognition server.
  • A flowchart of the attribute information server.
  • A flowchart of the management server.
  • A flowchart of the training server.
  • A flowchart of the person recognition glasses.

The embodiments for carrying out the present invention will be described in detail below with reference to the attached drawings. The embodiments described below are merely examples of means for realizing the present invention, and may be modified or changed as appropriate depending on the configuration of the apparatus to which the invention is applied and various conditions. Furthermore, the embodiments may be combined as appropriate.

[First Embodiment]
<Configuration of each device>
Figure 1 is a block diagram showing the overall configuration of a human face image recognition system using the image recognition system of the present invention. Here, a server PC and smart glasses are described as examples of the devices that make up the facial image recognition system, but the devices are not limited to these.
For example, the devices may be information processing devices such as portable PCs, tablet PCs, and media players.

First, the learning server A100 will be described. The control unit A101 controls each part of the learning server A100 according to input signals and a program described later. Alternatively, instead of the control unit A101 controlling the entire device, multiple hardware components may share the processing to control the entire device.

The non-volatile memory A102 is an electrically erasable and recordable non-volatile memory that stores the programs and other data executed by the control unit A101, as described later. The working memory A103 is used as a buffer memory for temporarily holding training data, as image display memory for the display unit A105, and as a working area for the control unit A101.

The operation unit A104 is used to receive instructions from the user for the learning server A100. The operation unit A104 includes, for example, the power button of the learning server A100, a keyboard, and a mouse. The display unit A105 displays learning data and a GUI (Graphical User Interface) for interactive operation. Note that the display unit A105 does not necessarily need to be built into the learning server A100; the learning server A100 can connect to an external display device and only needs to have a display control function for controlling what the display unit A105 shows.

The learning target determination unit A106 determines whether or not to use the facial image data recorded on the recording medium A110 (described later) as a learning target. The face image learning unit A107 performs learning processing on the face image data recorded on the recording medium A110. In this embodiment, a neural network and a machine learning algorithm are used to generate a learning model (image recognition model) that recognizes the face images recorded on the recording medium A110.
The neural network is used to predict output values from input values; by learning in advance from training data consisting of actual input and output value pairs, it can estimate output values for new input values. In this embodiment, it is assumed that a database of training data created from various face images is recorded on the recording medium A110, described later.

The recording medium A110 can record facial image data and training data. The recording medium A110 may be detachable from the learning server A100, or it may be built into the learning server A100; in other words, the learning server A100 only needs to have means to access the recording medium A110.

The communication unit A120 is an interface that performs wireless LAN communication compliant with, for example, the IEEE 802.11 standard. Wireless LAN communication enables wireless communication with an access point. Furthermore, higher-level protocols such as TCP/IP enable data transmission and reception between the access point and devices connected to the cloud network.
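The prediction behavior described above (learn from input/output pairs in advance, then estimate outputs for new inputs) can be illustrated with a minimal single-layer softmax classifier trained by gradient descent. This is a didactic sketch, not the patent's model: a real system would use a deep CNN over labeled face images, and all function and parameter names below are assumptions.

```python
import math


def train_softmax(samples, labels, classes, lr=0.5, epochs=200):
    """Train a single-layer softmax classifier with per-sample gradient
    descent on (input, label) pairs. Illustrative stand-in for the
    patent's neural network training; not the actual model."""
    dim = len(samples[0])
    w = [[0.0] * dim for _ in range(classes)]   # one weight row per class
    b = [0.0] * classes
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            logits = [sum(wi * xi for wi, xi in zip(w[c], x)) + b[c]
                      for c in range(classes)]
            m = max(logits)                      # stabilize exp()
            exps = [math.exp(v - m) for v in logits]
            total = sum(exps)
            probs = [e / total for e in exps]
            for c in range(classes):
                grad = probs[c] - (1.0 if c == y else 0.0)
                for i in range(dim):
                    w[c][i] -= lr * grad * x[i]
                b[c] -= lr * grad
    return w, b


def predict(w, b, x):
    """Estimate the class (e.g. person identity) for a new input."""
    logits = [sum(wi * xi for wi, xi in zip(row, x)) + bc
              for row, bc in zip(w, b)]
    return logits.index(max(logits))
```

After training on a handful of labeled feature vectors, `predict` returns the class of an unseen input, which is exactly the "estimate output values for new input values" behavior the description attributes to the learned model.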