US-12620457-B2 - Retrosynthesis using neural networks

Abstract

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for performing retrosynthesis using a neural network. One of the methods includes generating a prediction of a set of a plurality of predicted reactants that are combinable to generate a target compound, the generating comprising processing, for each of a plurality of candidate sets of reactants, a network input characterizing the candidate set using a neural network, determining, for each candidate set of the plurality of candidate sets, a score using the generated probabilities; and selecting a particular candidate set of one or more reactants using the determined scores.

Inventors

Ruoxi Sun
Li Li
Bo Dai
Hanjun Dai

Assignees

GOOGLE LLC

Dates

Publication Date: 20260505
Application Date: 20210628

Claims (20)

1 . A method for generating a prediction of a set of a plurality of predicted reactants that are combinable to generate a target compound, the generating comprising: processing, for each of a plurality of candidate sets of reactants, a network input characterizing the candidate set using a neural network, the network input being generated from (i) data identifying the target compound and (ii) data identifying the candidate set of reactants, the processing comprising: processing the network input using a first subnetwork to generate a predicted probability of the candidate set of reactants that represents a likelihood of the candidate set of reactants according to a data distribution of sets of predicted reactants and target compounds; processing the network input using a second subnetwork to generate a predicted conditional probability of the target compound conditioned on the candidate set of reactants according to the data distribution; processing the network input using a third subnetwork to generate a predicted conditional probability of the candidate set of reactants conditioned on the target compound according to the data distribution; determining, for each candidate set of the plurality of candidate sets, a score using the generated probabilities; and selecting a particular candidate set of one or more reactants using the determined scores.
2 . The method of claim 1 , wherein the score for a particular candidate set is proportional to: log p(X) + log p(y|X) + log p(X|y), wherein X is the candidate set of reactants and y is the target compound.
3 . The method of claim 1 , wherein the neural network has been trained by: obtaining a training example characterizing i) a training set of reactants and ii) a training target compound, processing the training example using the neural network, comprising: processing the training example using the first subnetwork to generate a first subnetwork output, processing the training example using the second subnetwork to generate a second subnetwork output, and processing the training example using the third subnetwork to generate a third subnetwork output, and determining an update to a plurality of parameters of the neural network in order to minimize a dual loss characterizing, for the training set of reactants, a difference between i) a first predicted joint probability of the training set of reactants and the training target compound defined by the first subnetwork output and the second subnetwork output and ii) a second predicted joint probability of the training set of reactants and the training target compound defined by the third subnetwork output.
4 . The method of claim 3 , wherein the dual loss penalizes a KL divergence between i) a distribution corresponding to the first predicted joint probability and ii) a distribution corresponding to the second predicted joint probability.
5 . The method of claim 3 , wherein the training comprises: sampling the training example from an empirical data distribution; and determining the dual loss according to the empirical data distribution.
6 . The method of claim 5 , wherein the dual loss includes a term Ê[log p(X)+log p(y|X)], where Ê[⋅] indicates an expectation over the empirical data distribution.
7 . The method of claim 5 , wherein the dual loss includes a term Ê[log p(X|y)], where Ê[⋅] indicates an expectation over the empirical data distribution.
8 . The method of claim 5 , wherein the training comprises: sampling, from a predicted data distribution corresponding to the third subnetwork, a first example comprising i) a first set of reactants and ii) a first target compound; processing a first subnetwork input corresponding to the first example using the first subnetwork to generate the first subnetwork output; processing a second subnetwork input corresponding to the first example using the second subnetwork to generate the second subnetwork output; and determining the dual loss according to the predicted data distribution, the first subnetwork output, and the second subnetwork output.
9 . The method of claim 8 , wherein the dual loss includes a term β Ê_y E_{X|y}[log p(X) + log p(y|X)], wherein Ê_y[⋅] indicates an expectation over an empirical data distribution, E_{X|y}[⋅] indicates an expectation over the predicted data distribution corresponding to the third subnetwork, and β is a weight value.
10 . The method of claim 1 , wherein the first subnetwork is a transformer neural network comprising: an encoder configured to process a placeholder input and to generate an encoder output; and a decoder configured to process i) the encoder output and ii) an input characterizing a set of reactants to generate the first subnetwork output.
11 . The method of claim 1 , wherein the second subnetwork is a transformer neural network comprising: an encoder configured to process an input characterizing a set of reactants and to generate an encoder output; and a decoder configured to process i) the encoder output and ii) an input characterizing the target compound and to generate the second subnetwork output.
12 . The method of claim 1 , wherein the third subnetwork is a transformer neural network comprising: an encoder configured to process an input characterizing the target compound and to generate an encoder output; and a decoder configured to process i) the encoder output and ii) an input characterizing a set of reactants and to generate the third subnetwork output.
13 . The method of claim 1 , further comprising performing in silico screening of a plurality of candidate compounds to select the target compound.
14 . The method of claim 1 , wherein selecting a particular candidate set of one or more reactants comprises, for each candidate set of the plurality of candidate sets, determining a respective final score using the score corresponding to the candidate set and one or more of: a number of retrosynthesis steps, a selection of solvents and/or reagents, or a temperature of retrosynthesis.
15 . The method of claim 1 , further comprising providing data characterizing the particular candidate set of one or more reactants to a robotic synthesis system for synthesizing the target compound.
16 . A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations for generating a prediction of a set of a plurality of predicted reactants that are combinable to generate a target compound, the generating comprising: processing, for each of a plurality of candidate sets of reactants, a network input characterizing the candidate set using a neural network, the network input being generated from (i) data identifying the target compound and (ii) data identifying the candidate set of reactants, and the processing comprising: processing the network input using a first subnetwork to generate a predicted probability of the candidate set of reactants that represents a likelihood of the candidate set of reactants according to a data distribution of sets of predicted reactants and target compounds; processing the network input using a second subnetwork to generate a predicted conditional probability of the target compound conditioned on the candidate set of reactants according to the data distribution; processing the network input using a third subnetwork to generate a predicted conditional probability of the candidate set of reactants conditioned on the target compound according to the data distribution; determining, for each candidate set of the plurality of candidate sets, a score using the generated probabilities; and selecting a particular candidate set of one or more reactants using the determined scores.
17 . One or more non-transitory computer-readable storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for generating a prediction of a set of a plurality of predicted reactants that are combinable to generate a target compound, the generating comprising: processing, for each of a plurality of candidate sets of reactants, a network input characterizing the candidate set using a neural network, the network input being generated from (i) data identifying the target compound and (ii) data identifying the candidate set of reactants, and the processing comprising: processing the network input using a first subnetwork to generate a predicted probability of the candidate set of reactants that represents a likelihood of the candidate set of reactants according to a data distribution of sets of predicted reactants and target compounds; processing the network input using a second subnetwork to generate a predicted conditional probability of the target compound conditioned on the candidate set of reactants according to the data distribution; processing the network input using a third subnetwork to generate a predicted conditional probability of the candidate set of reactants conditioned on the target compound according to the data distribution; determining, for each candidate set of the plurality of candidate sets, a score using the generated probabilities; and selecting a particular candidate set of one or more reactants using the determined scores.
18 . The system of claim 16 , wherein the score for a particular candidate set is proportional to: log p(X) + log p(y|X) + log p(X|y), wherein X is the candidate set of reactants and y is the target compound.
19 . The system of claim 16 , wherein the neural network has been trained by: obtaining a training example characterizing i) a training set of reactants and ii) a training target compound, processing the training example using the neural network, comprising: processing the training example using the first subnetwork to generate a first subnetwork output, processing the training example using the second subnetwork to generate a second subnetwork output, and processing the training example using the third subnetwork to generate a third subnetwork output, and determining an update to a plurality of parameters of the neural network in order to minimize a dual loss characterizing, for the training set of reactants, a difference between i) a first predicted joint probability of the training set of reactants and the training target compound defined by the first subnetwork output and the second subnetwork output and ii) a second predicted joint probability of the training set of reactants and the training target compound defined by the third subnetwork output.
20 . The system of claim 19 , wherein the dual loss penalizes a KL divergence between i) a distribution corresponding to the first predicted joint probability and ii) a distribution corresponding to the second predicted joint probability.
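
The scoring and selection recited in claims 1 and 2 can be illustrated with a minimal sketch. It is not the claimed implementation: the three lookup tables below are hypothetical stand-ins for the outputs of the first, second, and third subnetworks, the probability values are invented for illustration, and the names TARGET, CANDIDATES, PRIOR, FORWARD, BACKWARD, score, and select_best are assumptions introduced here. The score is taken to be exactly the sum log p(X) + log p(y|X) + log p(X|y), which is one instance of a score proportional to that sum.

```python
import math

# Target compound y as a SMILES string (ethyl acetate), chosen for illustration.
TARGET = "CCOC(C)=O"

# Candidate reactant sets X, each a tuple of SMILES strings.
CANDIDATES = [
    ("CCO", "CC(=O)O"),   # ethanol + acetic acid
    ("CCO", "CC(=O)Cl"),  # ethanol + acetyl chloride
    ("CO", "CC(=O)O"),    # methanol + acetic acid
]

# Hypothetical subnetwork outputs for the fixed TARGET above.
# All values are made up; a real system would compute them with trained models.
PRIOR = dict(zip(CANDIDATES, [0.5, 0.3, 0.2]))      # p(X), first subnetwork
FORWARD = dict(zip(CANDIDATES, [0.6, 0.8, 0.01]))   # p(y|X), second subnetwork
BACKWARD = dict(zip(CANDIDATES, [0.5, 0.4, 0.1]))   # p(X|y), third subnetwork


def score(reactants):
    """Return log p(X) + log p(y|X) + log p(X|y) for one candidate set."""
    return (
        math.log(PRIOR[reactants])
        + math.log(FORWARD[reactants])
        + math.log(BACKWARD[reactants])
    )


def select_best(candidates):
    """Score every candidate set and return the highest-scoring one."""
    scores = [score(c) for c in candidates]
    best_index = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best_index], scores


best, scores = select_best(CANDIDATES)
print(best, [round(s, 3) for s in scores])
```

Working in log space turns the product of the three probabilities into a sum, which avoids numerical underflow when the probabilities are small. A final score that also accounts for further factors, as in claim 14, could be formed by adding extra terms to the value returned by score.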

Description

CROSS-REFERENCE TO RELATED APPLICATION

This application is a National Stage Application under 35 U.S.C. § 371 of International Application No. PCT/US2021/039416, filed on Jun. 28, 2021, which claims priority to U.S. Provisional Application Ser. No. 63/044,991, filed Jun. 26, 2020, the entirety of which are herein incorporated by reference.

BACKGROUND

This specification relates to neural networks.

Neural networks are machine learning models that employ one or more layers of nonlinear units to predict an output for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to one or more other layers in the network, i.e., one or more other hidden layers, the output layer, or both. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.

SUMMARY

This specification describes a system implemented as computer programs on one or more computers in one or more locations that performs retrosynthesis using a neural network. Retrosynthesis is the process of determining, from a target chemical compound, a set of chemical reactants for synthesizing the target compound.

In one aspect there is described a method for generating a prediction of a set of a plurality of predicted reactants that are combinable to generate a target compound. The generating comprises processing, for each of a plurality of candidate sets of reactants, a network input characterizing the candidate set using a neural network. The processing comprises processing the network input using a first subnetwork to generate a predicted prior probability of the candidate set of reactants, in particular a prior probability according to a data distribution of sets of predicted reactants and target compounds. For example the prior probability may be a prior probability that the set of a plurality of predicted reactants is combinable to generate the target compound. The processing also comprises processing the network input using a second subnetwork to generate a predicted conditional probability of the target compound conditioned on the candidate set of reactants according to the (empirical) data distribution. The processing also comprises processing the network input using a third subnetwork to generate a predicted conditional probability of the candidate set of reactants conditioned on the target compound according to the (e.g. empirical) data distribution. The method further comprises determining, for each candidate set of the plurality of candidate sets, a score using the generated probabilities, and selecting a particular candidate set of one or more reactants using the determined scores.

In some implementations the method performs one-step retrosynthesis but in such implementations multi-step retrosynthesis may be performed by applying the method recursively. That is, in some implementations the method generates a prediction of a set of a plurality of predicted reactants that are combinable in one-step without having first to make intermediate molecules, to generate the target compound.

The network input characterizing the candidate set of reactants may be e.g. any type of representation of the set of reactants. For example in some implementations the network input represents the set of reactants as a one-dimensional sequence of tokens e.g. each reactant may be represented as a string of characters. There are various standardized approaches for representing chemical structures in this way e.g. based around SMILES (simplified molecular-input line-entry system) or a variant thereof; or using other linear notations.

In implementations the method includes synthesizing the target compound, i.e. physically combining the particular candidate set of one or more reactants selected by the method to chemically synthesize, i.e. to physically generate, the target compound.

The above-described data distribution may be an empirical data distribution i.e. a data distribution that is determined by experimental data (from physical molecules) e.g. that has been learned during training.

The subject matter described in this specification can be implemented in particular embodiments so as to realize one or more of the following advantages.

Using techniques described in this specification, a system can automatically perform retrosynthesis in an efficient and accurate manner to determine an optimal set of reactants to use to synthesize a target compound. The search space of possible sets of reactants for synthesizing a particular target compound can be very large, growing exponentially with the number of individual reactants considered. For example, even when only considering 1000 unique reactants that can be included in a set of reactants to synthesize a target compound, the number of candidate sets that can be generated is on the order of 10^300. A system can use the techniques described herein to drasticall