JP-7855010-B2 - Apparatus and method for generating training data, and apparatus and method for generating training models
Inventors
- 大酒 正明
Assignees
- 富士フイルム株式会社
Dates
- Publication Date
- 2026-05-07
- Application Date
- 2022-10-26
- Priority Date
- 2021-11-22
Claims (20)
- A learning data generation device comprising a processor, wherein the processor acquires first image data and second image data each having a region of interest, and, if the positional relationship between the region of interest in the first image data and the region of interest in the second image data satisfies a predetermined condition, generates third image data by combining an image of a region including the region of interest in the first image data with an image of a region including the region of interest in the second image data, and wherein the predetermined condition includes that the region of interest in the first image data is located within a first region in the image and the region of interest in the second image data is located within a second region in the image different from the first region.
- The learning data generation device according to claim 1, wherein the predetermined condition includes that the region of interest in the first image data is located within the first region at a distance of at least a threshold from a boundary line separating the first region and the second region, and the region of interest in the second image data is located within the second region at a distance of at least the threshold from the boundary line.
- The learning data generation device according to claim 1, wherein the predetermined condition includes that a plurality of regions of interest in the first image data are located within the first region at a distance of at least a threshold from a boundary line separating the first region and the second region, and a plurality of regions of interest in the second image data are located within the second region at a distance of at least the threshold from the boundary line.
- The learning data generation device according to claim 2 or 3, wherein, when the learning data is used to train a neural network that uses convolution, the threshold is set based on the size of the receptive field of the first convolutional layer.
- The learning data generation device according to claim 1, wherein the processor generates the third image data by combining the image of the first region of the first image data with the image of the region of the second image data other than the first region.
- The learning data generation device according to claim 5, wherein the processor generates the third image data by overwriting the image of the region of the first image data other than the first region with the image of the region of the second image data other than the first region.
- The learning data generation device according to any one of claims 1 to 3, wherein the predetermined condition includes that the region of interest in the first image data and the region of interest in the second image data are separated by at least a threshold.
- The learning data generation device according to claim 7, wherein the processor sets a boundary line between the region of interest in the first image data and the region of interest in the second image data to divide the image into a plurality of regions, and generates the third image data by combining the image of the first image data in the region containing its region of interest, among the plurality of regions of the first image data divided by the boundary line, with the image of the second image data in the region containing its region of interest, among the plurality of regions of the second image data divided by the boundary line.
- The learning data generation device according to claim 8, wherein the processor generates the third image data by overwriting the image of the region of the first image data other than the region of interest with the image of the region of interest of the second image data.
- The learning data generation device according to claim 7, wherein, when the learning data is used to train a neural network that uses convolution, the threshold is set based on the size of the receptive field of the first convolutional layer.
- A learning data generation device comprising a processor, wherein the processor acquires first image data and second image data each having a region of interest, and, if the positional relationship between the region of interest in the first image data and the region of interest in the second image data satisfies a predetermined condition, generates third image data by combining an image of a region including the region of interest in the first image data with an image of a region including the region of interest in the second image data, and wherein the predetermined condition includes that the region of interest in the first image data and the region of interest in the second image data are separated by at least a threshold.
- The learning data generation device according to claim 1, 2, 3, or 11, wherein the processor acquires first correct answer data indicating the correct answer for the first image data and second correct answer data indicating the correct answer for the second image data, and generates third correct answer data indicating the correct answer for the third image data from the first correct answer data and the second correct answer data.
- The learning data generation device according to claim 12, wherein the processor generates the third correct answer data indicating the correct answer for the third image data from the first correct answer data and the second correct answer data in accordance with the conditions under which the third image data is generated from the first image data and the second image data.
- The learning data generation device according to claim 12, wherein the first correct answer data and the second correct answer data are mask data for the region of interest.
- A learning model generation device comprising a processor, wherein the processor acquires third image data generated by the learning data generation device according to claim 1, 2, 3, or 11, and trains a learning model using the third image data.
- The learning model generation device according to claim 15, wherein the processor further trains the learning model using at least one of the first image data and the second image data used to generate the third image data.
- The learning model generation device according to claim 16, wherein the processor performs learning using the third image data and learning using at least one of the first image data and the second image data.
- The learning model generation device according to claim 15, wherein the processor trains the learning model while excluding the boundary region of the image synthesis in the third image data.
- A learning data generation method for generating learning data, comprising: acquiring first image data and second image data each having a region of interest; determining whether the region of interest in the first image data and the region of interest in the second image data are in a specific positional relationship; and, if the positional relationship satisfies a predetermined condition, generating third image data by combining an image of a region including the region of interest in the first image data with an image of a region including the region of interest in the second image data, wherein the predetermined condition includes that the region of interest in the first image data is located within a first region in the image and the region of interest in the second image data is located within a second region in the image different from the first region.
- A learning data generation method for generating learning data, comprising: acquiring first image data and second image data each having a region of interest; determining whether the region of interest in the first image data and the region of interest in the second image data are in a specific positional relationship; and, if the positional relationship satisfies a predetermined condition, generating third image data by combining an image of a region including the region of interest in the first image data with an image of a region including the region of interest in the second image data, wherein the predetermined condition includes that the region of interest in the first image data and the region of interest in the second image data are separated by at least a threshold.
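The synthesis condition described in the claims above can be illustrated with a short sketch. Here the image is split into a left half (first region) and a right half (second region) by a fixed vertical boundary, two images are combined only when each region of interest lies far enough inside its own half, and the region other than the first region is overwritten as in claim 6. The function name, the NumPy representation, and the fixed vertical boundary are illustrative assumptions, not the patent's prescribed implementation; the claims also note that the threshold may be set from the receptive field size of the first convolutional layer (e.g., its kernel size).

```python
import numpy as np

def synthesize(img1, mask1, img2, mask2, threshold):
    """Combine two training images when their regions of interest (ROIs)
    satisfy the positional condition: ROI 1 inside the left half, ROI 2
    inside the right half, each at least `threshold` pixels from the
    vertical boundary at x = W // 2. Masks are assumed non-empty."""
    h, w = img1.shape[:2]
    boundary = w // 2

    xs1 = np.where(mask1.any(axis=0))[0]  # columns covered by ROI 1
    xs2 = np.where(mask2.any(axis=0))[0]  # columns covered by ROI 2

    # Predetermined condition: each ROI stays at least `threshold`
    # pixels away from the boundary, on its own side.
    if xs1.max() > boundary - threshold or xs2.min() < boundary + threshold:
        return None, None  # condition not met; skip this pair

    # Overwrite the region of img1 other than the first region with the
    # corresponding region of img2; do the same for the ground-truth
    # masks so the third correct answer data matches the third image.
    img3 = img1.copy()
    img3[:, boundary:] = img2[:, boundary:]
    mask3 = mask1.copy()
    mask3[:, boundary:] = mask2[:, boundary:]
    return img3, mask3
```

Because the composed mask is built with exactly the same overwrite as the composed image, the generated pair remains a valid training sample for segmentation.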
Description
The present invention relates to a learning data generation apparatus and method and a learning model generation apparatus and method, and more particularly to a learning data generation apparatus and method, and a learning model generation apparatus and method, for a learning model that performs image recognition.

In recent years, with the advent of deep learning (see Non-Patent Document 1, etc.), it has become possible to generate models with high recognition accuracy when a large amount of training data for image recognition is available. Patent Document 1 describes a technique for increasing training data by combining the image to be recognized with the image used as the input image during training. Patent Document 2 describes a technique for increasing the variety of training data by extracting images of specific parts from an image to be recognized, applying image transformation processing to the extracted part images, and then synthesizing them with the image to be recognized.

- Patent Document 1: Japanese Patent Publication No. 2021-157404
- Patent Document 2: Japanese Patent Publication No. 2020-60883
- Non-Patent Document 1: A. Krizhevsky, I. Sutskever, and G. Hinton. "ImageNet Classification with Deep Convolutional Neural Networks." In NIPS, 2012.

Brief description of the drawings:

- A diagram showing an example of training data
- Conceptual diagram for generating training data
- A diagram showing an example of image segmentation
- A diagram showing an example where the location of the lesion cannot be identified
- A figure showing an example of new image data
- A diagram showing an example of new correct answer data
- Block diagram showing an example of the hardware configuration of a training data generation device
- Block diagram of the main functions of the learning data generation device
- A flowchart illustrating an example of the procedure for generating new training data
- A diagram showing an example of combining four image data
- A diagram showing an example of dynamically changing the boundary line setting
- A diagram showing an example of the newly generated image data
- Conceptual diagram for determining whether synthesis is possible
- Block diagram of the main functions of the learning data generation device
- A flowchart illustrating an example of the procedure for generating new training data
- A diagram showing another example of boundary line setting
- A diagram showing an example of dynamically switching the boundary settings for each training data set being synthesized
- A figure showing an example of setting boundaries when there are multiple regions of interest
- Block diagram of the main functions of the learning model generation device

Preferred embodiments of the present invention will be described below with reference to the attached drawings.

[Training data generation device (training data generation method)]

[First Embodiment]

This section explains the process using the example of generating a learning model that recognizes lesions from images (endoscopic images) of tubular organs such as the stomach and large intestine.
In particular, it explains the process of generating a learning model that recognizes the area occupied by the lesion within the image, that is, a learning model that performs image segmentation (in particular, semantic segmentation). Examples of learning models that can be used for this purpose include U-Net, FCN (Fully Convolutional Network), SegNet, PSPNet (Pyramid Scene Parsing Network), and DeepLab v3+. These are types of neural networks that use convolution processing, i.e., convolutional neural networks (CNN or ConvNet).

Figure 1 shows an example of training data. As shown in the figure, the training data consists of pairs of image data and ground truth data. The image data is training image data, which includes the object to be recognized. As described above, in this embodiment a learning model that recognizes lesions in images taken with an endoscope is generated. The training image data therefore consists of image data taken with an endoscope, including image data that contains lesions. In particular, it consists of endoscopic image data of the organ targeted for image recognition; for example, when recognizing a lesion in the stomach, it consists of image data of the stomach taken with an endoscope.

The ground truth data is data that indicates the correct answers for the training image data. In this embodiment, it consists of image data in which the lesion is distinguished from the rest of the image. Figure 1 shows an example in which the ground truth data is constructed using so-called mask images, that is, image data of an image with the lesion masked (filled in). Image data of an image with the lesion masked is an example of mask data. Thus, training data consists of pairs of image data and ground truth data (image pairs).
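The image/ground-truth pairing described above can be sketched as a minimal data structure, assuming images and masks are NumPy arrays; the class and function names are illustrative, not taken from the patent.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TrainingPair:
    """One unit of training data: a training image and its ground
    truth. The ground truth is mask data: a binary image in which
    lesion pixels are filled in (True) and all other pixels are not."""
    image: np.ndarray         # H x W x 3 endoscopic training image
    ground_truth: np.ndarray  # H x W boolean lesion mask

def make_pair(image, lesion_mask):
    """Pair an image with its mask, checking that they cover the same
    pixel grid, as required for semantic segmentation training."""
    assert image.shape[:2] == lesion_mask.shape
    return TrainingPair(image=image, ground_truth=lesion_mask.astype(bool))
```

A dataset is then simply a list of such pairs, one per annotated endoscopic image.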
A large number of these image pairs are prepared to construct a dataset, and the training model is trained using this dataset.