EP-4292062-B1 - LEARNING METHOD FOR A MACHINE LEARNING SYSTEM FOR DETECTING AND MODELLING AN OBJECT IN AN IMAGE, CORRESPONDING COMPUTER PROGRAM PRODUCT AND DEVICE


Inventors

NATUREL, XAVIER
CHOUKROUN, ARIEL

Dates

Publication Date: 2026-05-06
Application Date: 2022-02-10

Claims (11)

1. A method for learning a machine learning system for detecting and modeling at least one object represented in at least one given image and/or at least one characteristic region of said at least one given image, characterized in that said machine learning system performs:
- generating a plurality of augmented reality images comprising a real image and at least one virtual element representative of said at least one object and/or of said at least one characteristic region;
- obtaining, for each augmented reality image, learning information comprising, for at least one given virtual element of said augmented reality image:
  - a segmentation model of the given virtual element obtained from said given virtual element, and
  - either a set of contour points corresponding to a parametrization of the given virtual element, or said parametrization, obtained from said given virtual element; and
- learning from the plurality of augmented reality images and said learning information, delivering a set of parameters enabling said machine learning system to detect said at least one object and/or said at least one characteristic region in said at least one given image and to determine corresponding modeling information comprising:
  - a segmentation model of said at least one object and/or of said at least one characteristic region, and
  - either a set of contour points corresponding to a parametrization of said at least one object and/or of said at least one characteristic region, or said parametrization.
2. The learning method according to claim 1 wherein, for each augmented reality image, said learning of said machine learning system comprises joint learning from, on the one hand, said segmentation model of the given virtual element and, on the other hand, said set of contour points corresponding to a parametrization of the given virtual element.
3. The learning method according to claim 2, wherein said joint learning implements a cost function dependent on a linear combination between, on the one hand, a cross-entropy associated with said segmentation model of the given virtual element and, on the other hand, a Euclidean distance associated with said set of contour points corresponding to a parametrization of the given virtual element.
4. The learning method according to any one of claims 1 to 3, wherein said real image comprises the illustration of a face, and wherein said learning information comprises, for at least one contour point of said set of contour points corresponding to a parametrization of the given virtual element, visibility information indicating whether the contour point is visible or whether it is concealed by said face.
5. The learning method according to claim 4 when dependent on claim 2, wherein said cost function further depends on a binary cross-entropy associated with the visibility of said contour point.
6. A method for detecting and modeling at least one object represented in at least one image and/or at least one characteristic region of said at least one image, characterized in that a machine learning system, trained by implementing the learning method according to any one of claims 1 to 5, performs a detection of said at least one object and/or of said at least one characteristic region in said at least one image and performs a determination of the modeling information of said at least one object and/or of said at least one characteristic region.
7. The detection and modeling method according to claim 6, wherein said machine learning system is trained by implementing the learning method according to claim 2 or according to any one of claims 3 to 5 as dependent on claim 2, and wherein said determination comprises a joint determination:
- of said segmentation model of said at least one object and/or of said at least one characteristic region; and
- of said set of contour points corresponding to said parametrization of said at least one object and/or of said at least one characteristic region.
8. The detection and modeling method according to any one of claims 6 to 7, wherein said machine learning system is trained by implementing the learning method according to claim 4 or 5 as dependent on claim 4, wherein said at least one image comprises the representation of a given face, and wherein said machine learning system further determines, for at least one given contour point of said set of contour points corresponding to said parametrization of said at least one object and/or of said at least one characteristic region, visibility information indicating whether the given contour point is visible or whether it is concealed by said given face.
9. The detection and modeling method according to any one of claims 6 to 8, wherein said at least one image comprises a plurality of images each representing a different view of said at least one object and/or of said at least one characteristic region, and wherein said detection and said determination are implemented jointly for each of said plurality of images.
10. Computer program product comprising program code instructions for implementing the method according to any of claims 1 to 9, when said program is executed on a computer.
11. A device for detecting and modeling at least one object represented in at least one image and/or at least one characteristic region of said at least one image, characterized in that it comprises at least one processor and/or at least one dedicated computing machine configured to implement:
- generating a plurality of augmented reality images comprising a real image and at least one virtual element representative of said at least one object and/or of said at least one characteristic region;
- obtaining, for each augmented reality image, learning information comprising, for at least one given virtual element of said augmented reality image:
  - a segmentation model of the given virtual element obtained from said given virtual element, and
  - either a set of contour points corresponding to a parametrization of the given virtual element, or said parametrization, obtained from said given virtual element; and
- learning from the plurality of augmented reality images and said learning information, delivering a set of parameters enabling said machine learning system to detect said at least one object and/or said at least one characteristic region in at least one given image and to determine corresponding modeling information comprising:
  - a segmentation model of said at least one object and/or of said at least one characteristic region, and
  - either a set of contour points corresponding to a parametrization of said at least one object and/or of said at least one characteristic region, or said parametrization.
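
Illustrative note (not part of the claims): the claims describe a cost function that depends on a linear combination of a cross-entropy for the segmentation model and a Euclidean distance for the contour points, optionally extended by a binary cross-entropy on contour-point visibility. The following Python sketch shows one way such a cost could be computed. The function names, the weights w_seg, w_pts and w_vis, the two-class form of the segmentation cross-entropy, and the averaging over pixels and points are all assumptions made for illustration; the claims do not specify them.

```python
import math

def cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy between probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

def mean_euclidean(pred_pts, true_pts):
    """Mean Euclidean distance between paired 2D contour points."""
    distances = [math.dist(a, b) for a, b in zip(pred_pts, true_pts)]
    return sum(distances) / len(distances)

def joint_cost(seg_prob, seg_mask, pred_pts, true_pts,
               vis_prob=None, vis_true=None,
               w_seg=1.0, w_pts=1.0, w_vis=1.0):
    """Linear combination of segmentation, contour-point and visibility terms.

    seg_prob / seg_mask: flattened per-pixel foreground probabilities and the
        0/1 mask obtained from the virtual element (two-class case assumed).
    pred_pts / true_pts: predicted and reference contour points as (x, y) pairs.
    vis_prob / vis_true: optional per-point visibility probability and 0/1 label.
    w_seg, w_pts, w_vis: hypothetical weights of the linear combination.
    """
    cost = w_seg * cross_entropy(seg_prob, seg_mask)
    cost += w_pts * mean_euclidean(pred_pts, true_pts)
    if vis_prob is not None and vis_true is not None:
        cost += w_vis * cross_entropy(vis_prob, vis_true)
    return cost

# Example call on toy values: a 4-pixel mask and two contour points,
# the second of which is labelled as concealed.
example = joint_cost(
    seg_prob=[0.1, 0.9, 0.8, 0.2], seg_mask=[0, 1, 1, 0],
    pred_pts=[(10.0, 12.0), (30.5, 11.0)], true_pts=[(10.0, 11.0), (30.0, 11.0)],
    vis_prob=[0.9, 0.3], vis_true=[1, 0],
)
print(example)
```

Because the reference mask, contour points and visibility labels are all obtained from the same virtual element, the three terms can be evaluated on every augmented reality image without manual annotation.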

Description

Field of the invention

The field of the invention is image processing. The invention relates more particularly to a method for detecting and modeling an object and/or a characteristic region (e.g. eyes, a nose, etc.) detected in an image. The invention has numerous applications, in particular, but not exclusively, the virtual try-on of a pair of glasses.

Prior art and its drawbacks

The remainder of this document focuses more particularly on a problem in the field of virtual try-on of a pair of glasses that the inventors of the present patent application faced. The invention is of course not limited to this particular field of application, but is of interest for detecting and modeling any type of object represented in an image and/or any type of characteristic region (i.e. a part of interest of the image) of such an image.

It is known from the prior art to use characteristic points of certain objects and/or certain characteristic regions in order to detect the objects and/or characteristic regions in question. For example, the corner of the eyes is conventionally used as a characteristic point for detecting the eyes of an individual in an image. Other characteristic points can also be considered for detecting a face, such as the nose or the corner of the mouth. The quality of face detection generally depends on the number and position of the characteristic points used. Such techniques are described in particular in the French patent published under number FR 2955409 and in the international patent application published under number WO 2016/135078, both from the company filing the present patent application. For a manufactured object, edges or corners can for example be considered as characteristic points.
However, the use of such characteristic points can lead to a lack of precision in the detection, and therefore, where applicable, in the modeling of the objects and/or characteristic regions in question.

Alternatively, manual annotation of the images is sometimes considered in order to artificially generate the characteristic points of the objects and/or characteristic regions considered. Here again, however, a lack of precision is observed in the detection of the objects and/or characteristic regions in question. Where applicable, such imprecision can lead to problems in modeling the objects and/or characteristic regions thus detected.

WANG YATING ET AL: "Eyeglasses 3D Shape Reconstruction from a Single Face Image", 20 November 2020 (2020-11-20), ADVANCES IN INTELLIGENT DATA ANALYSIS XIX; [LECTURE NOTES IN COMPUTER SCIENCE; LECT.NOTES COMPUTER], PAGE(S) 372 - 387, discloses an automatic system that recovers the 3D shape of eyeglasses from a single face image with an arbitrary head pose. To achieve this, a neural network is first trained to jointly perform detection and segmentation of the eyeglasses landmarks, which carry the information on the shape of the eyeglasses.

YUAN XIAOYUN ET AL: "Magic Glasses: From 2D to 3D", IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, INSTITUTE OF ELECTRICAL AND ELECTRONICS ENGINEERS, US, vol. 27, no. 4, 1 April 2017 (2017-04-01), pages 843-854, discloses a 3D virtual eyeglasses try-on system based on a 2D Internet image of a human face wearing a pair of glasses.

There is thus a need for a technique for accurately detecting and modeling one (or more) object represented in an image and/or one (or more) characteristic region present in the image in question.
Summary of the invention

In one embodiment of the invention, a learning method is proposed for a machine learning system for detecting and modeling at least one object represented in at least one given image and/or at least one characteristic region of said at least one given image. According to such a method, the machine learning system performs:
- generating a plurality of augmented reality images comprising a real image and at least one virtual element representative of said at least one object and/or of said at least one characteristic region;
- obtaining, for each augmented reality image, learning information comprising, for at least one given virtual element of the augmented reality image:
  - a segmentation model of the given virtual element obtained from the given virtual element, and
  - either a set of contour points corresponding to a parametrization of the given virtual element, or said parametrization, obtained from said given virtual element; and
- learning from the plurality of augmented reality images and the learning information, delivering