US-12626822-B2 - Deep learning for automated smile design
Abstract
A method for displaying teeth after planned orthodontic treatment in order to show persons how their smiles will look after the treatment. The method includes receiving a digital 3D model of teeth or rendered images of teeth, and an image of a person such as a digital photo. The method uses a generator network to produce a generated image of the person showing the teeth of the person, i.e., the person's smile, after the planned orthodontic treatment. The method uses a discriminator network, which processes input images, generated images, and real images, to train the generator network through deep learning models to produce a photo-realistic image of the person after the planned treatment.
Inventors
- Cameron M. Fabbri
- Wenbo Dong
- James L. Graham, II
- Cody J. Olson
Assignees
- SOLVENTUM INTELLECTUAL PROPERTIES COMPANY
Dates
- Publication Date
- 2026-05-12
- Application Date
- 2022-08-05
Claims (20)
- 1 . A computer-implemented method for displaying teeth after planned orthodontic treatment, comprising: receiving, by a computing device, a digital three-dimensional (3D) model of teeth or rendered images of teeth, and an image of a person; processing, by a generator network implemented on the computing device, the digital 3D model and the image of the person to produce a photo-realistic generated image of the person showing teeth of the person after the planned orthodontic treatment of the teeth; training the generator network using adversarial learning by a discriminator network that processes input images, generated images, and real images to improve image realism; and displaying, on a graphical user interface of the computing device, the generated image of the person showing the results of the planned orthodontic treatment.
- 2 . The method of claim 1 , further comprising receiving a final alignment or stage of the teeth after the planned orthodontic treatment.
- 3 . The method of claim 2 , further comprising blocking out teeth of the person in the received image.
- 4 . The method of claim 3 , wherein using the generator network comprises filling in the blocked out teeth in the received image with the final alignment or stage of the teeth.
- 5 . The method of claim 1 , further comprising using a feature extracting network to extract features from the digital 3D model of teeth or rendered images of teeth for the generator network.
- 6 . The method of claim 1 , further comprising using a feature extracting network to extract features from the digital 3D model of teeth or rendered images of teeth for the discriminator network.
- 7 . The method of claim 1 , wherein the image comprises a digital photo.
- 8 . The method of claim 1 , wherein when the discriminator network is provided with real images, the discriminator network classifies the input images as real in order to train the generator network.
- 9 . The method of claim 1 , wherein when the discriminator network is provided with generated images, the discriminator network classifies the input images as fake in order to train the generator network.
- 10 . The method of claim 1 , wherein the planned orthodontic treatment comprises a final stage or setup.
- 11 . The method of claim 1 , wherein the planned orthodontic treatment comprises an intermediate stage or setup.
- 12 . A system for displaying teeth after planned orthodontic treatment, comprising a processor configured to: receive, by the system, a digital three-dimensional (3D) model of teeth or rendered images of teeth, and an image of a person; process, by a generator network implemented on the system, the digital 3D model and the image of the person to produce a photo-realistic generated image of the person showing teeth of the person after the planned orthodontic treatment of the teeth; train the generator network using adversarial learning by a discriminator network that processes input images, generated images, and real images to improve image realism; and display, on a graphical user interface of the system, the generated image of the person showing the results of the planned orthodontic treatment.
- 13 . The system of claim 12 , wherein the processor is further configured to receive a final alignment or stage of the teeth after the planned orthodontic treatment.
- 14 . The system of claim 13 , wherein the processor is further configured to block out teeth of the person in the received image.
- 15 . The system of claim 14 , wherein to use the generator network, the processor is configured to fill in the blocked out teeth in the received image with the final alignment or stage of the teeth.
- 16 . The system of claim 12 , wherein the processor is further configured to use a feature extracting network to extract features from the digital 3D model of teeth or rendered images of teeth for the generator network.
- 17 . The system of claim 12 , wherein the processor is further configured to use a feature extracting network to extract features from the digital 3D model of teeth or rendered images of teeth for the discriminator network.
- 18 . The system of claim 12 , wherein the image comprises a digital photo.
- 19 . The system of claim 12 , wherein when the discriminator network is provided with real images, the discriminator network classifies the input images as real in order to train the generator network.
- 20 . The system of claim 12 , wherein when the discriminator network is provided with generated images, the discriminator network classifies the input images as fake in order to train the generator network.
Description
CROSS-REFERENCE TO RELATED APPLICATIONS

This Application is a national stage filing under 35 U.S.C. § 371 of International Application No. PCT/IB2022/057323, filed 5 Aug. 2022, which claims the benefit of Provisional U.S. Patent Application No. 63/231,823, filed 11 Aug. 2021, the entire disclosure of each of which is incorporated herein by reference.

BACKGROUND

Orthodontic clear tray aligners allow patients to receive high-quality, customizable treatment options. A potential patient may browse past clinical cases when considering treatment. A high-level overview of one pipeline is as follows: a potential patient arrives at the doctor's office; the doctor takes a scan of their teeth, extracting a three-dimensional (3D) mesh; and this 3D mesh is processed by an algorithm to produce a mesh of the patient's teeth in their final alignment. A common question from patients is, "what would my new smile look like?" While they can view previous clinical cases and even the 3D mesh of their newly aligned teeth, neither option gives the patient a true feel for what their teeth and smile may look like after orthodontic treatment. Because of this, potential patients may not be fully committed to receiving treatment.

SUMMARY

A method for displaying teeth after planned orthodontic treatment includes receiving a digital 3D model of teeth or rendered images of teeth, and an image of a person. The method uses a generator network to produce a generated image of the person showing teeth of the person after the planned orthodontic treatment. The method uses a discriminator network, which processes input images, generated images, and real images, to train the generator network.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a diagram of a system for generating an image of a person's smile showing post-treatment results. FIG. 2 is a diagram of a generator network for the system. FIG. 3 is a diagram of a discriminator network for the system. FIG. 4 shows results after training the system.

DETAILED DESCRIPTION

Embodiments include an automated system to generate an image of a person's smile showing post-treatment aligner results, before treatment has begun. The system utilizes data including a person's image as well as their corresponding 3D scan in order to learn how to generate a photo-realistic image. Though the system is trained to generate a person's smile from their pre-treatment scan, the scan can be swapped out with a post-treatment scan in order to give the person the ability to view potential post-treatment results. Alternatively or in addition, the system can be used to show persons their appearance after each stage of treatment or after selected stages. The ability for a person to view a post-treatment photo of themselves smiling, before any treatment has begun, may give them confidence moving forward with the treatment process, as well as help convince those who may be uncertain. Additionally, the person can provide feedback to the doctor or practitioner if any aesthetic changes are requested, and the doctor or practitioner can modify the alignment of the mesh to meet the person's needs.

FIG. 1 is a diagram of a system 10 for generating an image of a person's smile showing post-treatment results (21). System 10 includes a processor 20 receiving a digital 3D model (mesh) or rendered images of teeth and an image (e.g., a digital photo) of the corresponding person (12). The digital 3D model can be generated from, for example, intra-oral 3D scans or scans of impressions of teeth. System 10 can also include an electronic display device 16, such as a liquid crystal display (LCD) device, and an input device 18 for receiving user commands or other information. Systems to generate digital 3D images or models based upon image sets from multiple views are disclosed in U.S. Pat. Nos. 7,956,862 and 7,605,817, both of which are incorporated herein by reference as if fully set forth. These systems can use an intra-oral scanner to obtain digital images from multiple views of teeth or other intra-oral structures, and those digital images are processed to generate a digital 3D model representing the scanned teeth and gingiva. System 10 can be implemented with, for example, a desktop, notebook, or tablet computer.

The system is built upon generative machine or deep learning models known as Generative Adversarial Networks (GANs). This class of algorithms contains a pair of differentiable functions, often deep neural networks, whose goal is to learn an unknown data distribution. The first function, known as the generator, produces a data sample given some input (e.g., random noise, a conditional class label, or others). The generator and feature extracting network also receive the pixel-wise difference between the generated and ground-truth image in the form of a loss function. The second function, known as the discriminator, attempts to classify the "fake"
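The adversarial training described above pairs a discriminator objective (score real photos as real, generated photos as fake) with a generator objective that adds a pixel-wise difference term against the ground-truth image. The following is an illustrative NumPy sketch, not code from the patent: the stand-in scores, the tiny 4x4 "images", and the `lambda_l1` weight are all assumptions for demonstration.

```python
import numpy as np

def bce(pred, target):
    # Binary cross-entropy between discriminator scores in (0, 1) and labels.
    eps = 1e-7
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def discriminator_loss(d_real, d_fake):
    # The discriminator is trained to score real photos as 1 ("real")
    # and generated photos as 0 ("fake").
    return bce(d_real, np.ones_like(d_real)) + bce(d_fake, np.zeros_like(d_fake))

def generator_loss(d_fake, generated, ground_truth, lambda_l1=100.0):
    # The generator is rewarded when the discriminator scores its output as
    # real, plus the pixel-wise difference between the generated and
    # ground-truth image, expressed here as an L1 loss term.
    adv = bce(d_fake, np.ones_like(d_fake))
    l1 = float(np.mean(np.abs(generated - ground_truth)))
    return adv + lambda_l1 * l1

# Stand-in data: discriminator scores and tiny 4x4 grayscale "images".
d_real = np.array([0.9, 0.8])   # scores assigned to real photos
d_fake = np.array([0.2, 0.3])   # scores assigned to generated photos
generated = np.full((4, 4), 0.5)
ground_truth = np.full((4, 4), 0.6)

print("D loss:", round(discriminator_loss(d_real, d_fake), 4))
print("G loss:", round(generator_loss(d_fake, generated, ground_truth), 4))
```

In an actual implementation the scores and images would come from the generator and discriminator networks, and both losses would be minimized alternately by gradient descent; the sketch only shows how the two objectives are composed.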