EP-3762868-B1 - SYSTEMS AND METHODS FOR TRAINING GENERATIVE ADVERSARIAL NETWORKS AND USE OF TRAINED GENERATIVE ADVERSARIAL NETWORKS

Inventors

NGO DINH, Nhan
EVANGELISTI, Giulio
NAVARI, Flavio

Dates

Publication Date: 2026-05-06
Application Date: 2019-06-11

Claims (13)

1. A system for training a neural network system to detect an abnormality in one or more medical images of a gastro-intestinal organ, comprising:
   at least one memory configured to store instructions; and
   at least one processor (607) configured to execute the instructions to perform operations for training a neural network comprising a discriminator network and a generator network, the discriminator network comprising a perception branch having an object detection network for detecting the abnormality, and an adversarial branch, the operations comprising:
      selecting a first subset (403) of a plurality of videos including representations of the abnormality;
      applying (503) the perception branch of the discriminator network to frames of the first subset (403) of the plurality of videos to produce a first plurality of detections of the abnormality;
      selecting a second subset (411) of the plurality of videos;
      using the first plurality of detections (413) and frames from the second subset (411) of the plurality of videos, training (505) the generator network to generate a plurality of artificial representations of the abnormality;
      training (507) the adversarial branch of the discriminator network to differentiate between the artificial representations (417) of the abnormality and true representations of the abnormality in the second subset (411);
      applying (509) the adversarial branch of the discriminator network to the plurality of artificial representations and the true representations of the abnormality in the second subset (411) to produce difference indicators between the artificial representations of the abnormality and true representations of the abnormality included in frames of the second subset (411) of the plurality of videos;
      applying (511) the perception branch of the discriminator network to the artificial representations to produce a second plurality of detections (413) of the abnormality; and
      retraining (513) the perception branch of the discriminator network based on the difference indicators (419) and the second plurality of detections (413).
2. The system of claim 1, wherein the number of images in the second subset of the plurality of videos is at least 100 times larger than that included in the first subset of the plurality of videos.
3. The system of any preceding claim, wherein the generator network comprises a generative adversarial network.
4. The system of any preceding claim, wherein the discriminator network comprises a convolutional neural network.
5. The system of any preceding claim, wherein at least one of the first subset of the plurality of videos and the second subset of the plurality of videos comprise images from an imaging device used during at least one of a gastroscopy, a colonoscopy, an enteroscopy, or an upper endoscopy, the imaging device optionally including an endoscopy device.
6. The system of any preceding claim, wherein the abnormality comprises any one of: a change in human tissue from one type of cell to another type of cell, an absence of human tissue from a location where the human tissue is expected, and/or a formation on or of human tissue.
7. The system of claim 6, wherein the abnormality comprises a lesion, optionally including a polypoid lesion or a non-polypoid lesion.
8. A computer-implemented method (500) for training a neural network system to detect an abnormality in one or more medical images of a gastro-intestinal organ, the neural network system comprising a discriminator network and a generator network, the discriminator network comprising a perception branch having an object detection network for detecting the abnormality, and an adversarial branch, the method comprising the following steps performed by at least one processor (607):
      storing (501), in a database, a plurality of videos including representations of the abnormality;
      selecting a first subset (403) of the plurality of videos;
      applying (503) the perception branch of the discriminator network to frames of the first subset (403) of the plurality of videos to produce a first plurality of detections of the abnormality;
      selecting a second subset (411) of the plurality of videos;
      using the first plurality of detections (413) and frames from the second subset (411) of the plurality of videos, training (505) the generator network to generate a plurality of artificial representations of the abnormality;
      training (507) the adversarial branch of the discriminator network to differentiate between the artificial representations (417) of the abnormality and true representations of the abnormality in the second subset (411);
      applying (509) the adversarial branch of the discriminator network to the plurality of artificial representations and the true representations of the abnormality in the second subset (411) to produce difference indicators between the artificial representations of the abnormality and true representations of the abnormality included in frames of the second subset (411) of the plurality of videos;
      applying (511) the perception branch of the discriminator network to the artificial representations to produce a second plurality of detections (413) of the abnormality; and
      retraining (513) the perception branch of the discriminator network based on the difference indicators (419) and the second plurality of detections (413).
9. The method of claim 8, wherein the generator network comprises a generative adversarial network.
10. The method of claim 8 or 9, wherein the discriminator network comprises a convolutional neural network.
11. The method of any one of claims 8 to 10, wherein the abnormality comprises any one or more of: a change in human tissue from one type of cell to another type of cell, an absence of human tissue from a location where the human tissue is expected, and/or a formation on or of human tissue.
12. The method of claim 11, wherein the abnormality comprises a lesion, optionally including a polypoid lesion or a non-polypoid lesion.
13. The method of any of claims 8 to 12, wherein each artificial representation provides a false representation of the abnormality that is highly similar to a true representation of the abnormality.
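
Illustrative sketch (not part of the claims): the order of the training steps recited in the independent claims can be traced with a small control-flow example. This is a toy under stated assumptions, not the patented implementation. The names `run_training` and `ratio_ok`, the string frames, and the dummy boxes and scores are all hypothetical. The way the two subsets are chosen (first `n_first` videos versus the rest) is also an assumption, since the claims only say that the subsets are selected. No real networks are trained here; each step only produces placeholder data and records its reference numeral.

```python
from typing import List, Tuple

Frame = str                       # placeholder for one video frame
Box = Tuple[int, int, int, int]   # placeholder box (x, y, width, height)


def ratio_ok(first: List[Frame], second: List[Frame], factor: int = 100) -> bool:
    """One reading of the size relation in the dependent system claim:
    the second subset holds at least `factor` times as many images."""
    return len(second) >= factor * len(first)


def run_training(videos: List[List[Frame]], n_first: int) -> List[str]:
    """Return the reference numerals of the steps in the order they run."""
    trace: List[str] = []
    dummy_box: Box = (0, 0, 1, 1)

    # Select a first subset (assumption: simply the first n_first videos).
    first = [frame for video in videos[:n_first] for frame in video]
    # 503: apply the perception branch to frames of the first subset.
    detections_1 = [(frame, dummy_box) for frame in first]
    trace.append("503")

    # Select a second subset (assumption: all remaining videos).
    second = [frame for video in videos[n_first:] for frame in video]
    # 505: train the generator from the first detections and second-subset
    # frames; here it just emits one dummy artificial item per frame.
    artificial = [f"artificial:{frame}" for frame in second] if detections_1 else []
    trace.append("505")

    # 507: train the adversarial branch to tell artificial (0) from true (1).
    labelled = [(a, 0) for a in artificial] + [(t, 1) for t in second]
    trace.append("507")

    # 509: apply the adversarial branch to produce difference indicators.
    differences = [0.5 for _ in labelled]  # dummy scores
    trace.append("509")

    # 511: apply the perception branch to the artificial representations.
    detections_2 = [(a, dummy_box) for a in artificial]
    trace.append("511")

    # 513: retrain the perception branch; it needs both inputs from 509 and 511.
    if not (differences and detections_2):
        raise ValueError("step 513 needs difference indicators and detections")
    trace.append("513")
    return trace


if __name__ == "__main__":
    toy_videos = [["a0", "a1"], ["b0"], ["c0", "c1"]]
    print(run_training(toy_videos, n_first=1))
```

In a real system each placeholder list would be replaced by the output of an actual network, and step 501 (storing the videos in a database) would precede the calls shown here.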

Description

TECHNICAL FIELD

The present disclosure relates generally to the field of neural networks and the use of such networks for image analysis and object detection. More specifically, and without limitation, this disclosure relates to computer-implemented systems and methods for training generative adversarial networks and using the same. The systems and methods and trained neural networks disclosed herein may be used in various applications and vision systems, such as medical image analysis and systems that benefit from accurate object detection capabilities.

BACKGROUND

In many object detection systems, an object is detected in an image. An object of interest may be a person, place, or thing. In some applications, such as medical image analysis and diagnosis, the location of the object is important as well. However, computer-implemented systems that utilize image classifiers are typically unable to identify or provide the location of a detected object. Accordingly, extant systems that only use image classifiers are of limited use in such applications.

Furthermore, training techniques for object detection may rely on manually annotated training sets. Producing such annotations is time-consuming when the detection network being trained is bounding box-based, such as a You Only Look Once (YOLO) architecture, a Single Shot Detector (SSD) architecture, or the like. Accordingly, large datasets are difficult to annotate for training, often resulting in a neural network that is trained on a smaller dataset, which decreases accuracy.

Extant computer-implemented medical imaging systems are usually built on a single detector network. Accordingly, once a detection is made, the network simply outputs the detection, e.g., to a physician or other health care professional. However, such detections may be false positives, such as non-polyps in endoscopy or the like. Such systems do not provide a separate network for differentiating false positives from true positives.
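
As an illustration of the distinction drawn above between image classifiers and bounding box-based detectors, the following minimal sketch contrasts the two kinds of output. The class names, the label, the score, and the pixel coordinates are hypothetical values chosen for the example; they are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Classification:
    """Image-level result: says what is present, not where."""
    label: str
    score: float


@dataclass(frozen=True)
class Detection:
    """Bounding-box result: adds a location to the label and score."""
    label: str
    score: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

    def center(self) -> Tuple[float, float]:
        """Midpoint of the box, e.g. for placing an overlay marker."""
        x_min, y_min, x_max, y_max = self.box
        return ((x_min + x_max) / 2, (y_min + y_max) / 2)


# Hypothetical outputs for the same frame.
whole_image = Classification(label="polyp", score=0.91)
located = Detection(label="polyp", score=0.91, box=(120.0, 80.0, 200.0, 160.0))

print(located.center())  # (160.0, 120.0)
```

Only the second structure carries the location information that the passage describes as important for medical image analysis, and it is also the structure whose boxes must be annotated by hand when building a training set.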
Furthermore, object detectors based on neural networks usually feed features identified by a neural network into the detector, which may comprise a second neural network. However, such networks are often inaccurate because feature detection is performed by a generalized network, with only the detector portion being specialized.

Finally, many extant object detectors function on a delay. For example, medical images may be captured and stored before analysis. However, in some medical procedures, such as endoscopy, diagnosis is made in real time. Consequently, these systems are usually difficult to apply in the required real-time fashion.

AHN JUNGMO ET AL, "Finding Small-Bowel Lesions: Challenges in Endoscopy-Image-Based Learning Systems", COMPUTER, IEEE COMPUTER SOCIETY, USA, vol. 51, no. 5, doi:10.1109/MC.2018.2381116, ISSN 0018-9162 states that capsule endoscopy identifies damaged areas in a patient's small intestine but often outputs poor-quality images or misses lesions, leading to either misdiagnosis or repetition of the lengthy procedure. The authors propose applying deep-learning models to automatically process the captured images and identify lesions in real time, enabling the capsule to take additional images of a specific location, adjust its focus level, or improve image quality. The authors also describe the technical challenges in realizing a viable automated capsule-endoscopy system.

ROSS TOBIAS ET AL, "Exploiting the potential of unlabeled endoscopic video data with self-supervised learning", INTERNATIONAL JOURNAL OF COMPUTER ASSISTED RADIOLOGY AND SURGERY, SPRINGER, DE, vol. 13, no. 6, doi:10.1007/S11548-018-1772-0, ISSN 1861-6410 proposes the re-colorization of medical images with a conditional generative adversarial network (cGAN)-based architecture as an auxiliary task. A variant of the method involves a second pre-training step based on labeled data for the target task from a related domain. The authors validate both variants using medical instrument segmentation as the target task.

SUMMARY

The invention sets out a system for training a neural network system to detect an abnormality in one or more medical images of a gastro-intestinal organ according to claim 1 and a method for training a neural network system to detect an abnormality in one or more medical images of a gastro-intestinal organ according to claim 8. Preferred embodiments are set out in the dependent claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which comprise a part of this specification, illustrate several embodiments and, together with the description, serve to explain the principles and features of the disclosed embodiments. In the drawings:

FIG. 1 is a schematic representation of an exemplary computer-implemented system for overlaying object detections on a video feed, according to embodiments of the present disclosure.
FIG. 2 is an exemplary two phase training loop for an object detection network, according to embodiments of the present disclosure.
FIG. 3 is a flowchart of an exemplary