US-20260128121-A1 - PLATFORMS, SYSTEMS, AND METHODS FOR COMPARATIVE ANALYSIS COMPATIBILITY

Abstract

A system may select a first feature and a second feature of the biologic product. The system may determine a first biologic parent having the first feature and not having the second feature, wherein the first feature is based on an aspect of the first biologic parent. The system may determine a second biologic parent having the second feature and not having the first feature, wherein the second feature is based on an aspect of the first biologic parent, and the aspect of the second biologic parent can be combined with the aspect of the first biologic parent. The system may determine a biologic product having the first feature and the second feature, wherein the biologic product is determined based on an evaluation of a set of combinations of the aspect of the first biologic parent and the aspect of the second biologic parent.

Inventors

  • John Ata Bachman
  • Laura Barker
  • Sanaa Mansoor

Assignees

  • X DEVELOPMENT LLC

Dates

Publication Date
20260507
Application Date
20251210

Claims (20)

  1. A method of generating a biologic product of a biologic synthesis process, comprising: selecting a first feature and a second feature of the biologic product; determining a first biologic parent having the first feature and not having the second feature, wherein the first feature is based on an aspect of the first biologic parent; determining a second biologic parent having the second feature and not having the first feature, wherein the second feature is based on an aspect of the first biologic parent, and the aspect of the second biologic parent can be combined with the aspect of the first biologic parent; and determining a biologic product having the first feature and the second feature, wherein the biologic product is determined based on an evaluation of a set of combinations of the aspect of the first biologic parent and the aspect of the second biologic parent.
  2. The method of claim 1, where the aspect of each biologic parent of the first biologic parent and the second biologic parent includes at least one of: a portion of the biologic parent, a structural feature of the biologic parent, a functional feature of the biologic parent, a behavior of the biologic parent, a source of the biologic parent, a metabolic pathway associated with the biologic product, or a biologic condition associated with the biologic product.
  3. The method of claim 1, wherein the determination that the aspect of the second biologic parent can be combined with the aspect of the first biologic parent is based on at least one of: a structural requirement of the aspect of each biologic parent, a functional requirement of the aspect of each biologic parent, an environmental requirement of the aspect of each biologic parent, a requirement of a source of the aspect of each biologic parent, a requirement of a metabolic pathway associated with the aspect of each biologic parent, or a requirement of a biologic condition associated with the aspect of each biologic parent.
  4. The method of claim 1, wherein the first biologic parent is determined by a machine learning model including an attention feature, and the attention feature associates the aspect of the first biologic parent with the first feature of the first biologic parent.
  5. The method of claim 1, wherein the second biologic parent is determined by a machine learning model including an attention feature, and the attention feature associates the aspect of the second biologic parent with the second feature of the second biologic parent.
  6. The method of claim 1, wherein determining the second biologic parent includes determining, by a machine learning model including an attention feature, that the aspect of the second biologic parent can be combined with the aspect of the first biologic parent.
  7. The method of claim 1, wherein, the determination that the aspect of the second biologic parent can be combined with the aspect of the first biologic parent includes determining a modification of at least one of: the aspect of the first biologic parent, the aspect of the second biologic parent, the biologic synthesis process, or the biologic product, the determination that the aspect of the second biologic parent cannot be combined with the aspect of the first biologic parent based on an absence of the modification, and the determination that the aspect of the second biologic parent can be combined with the aspect of the first biologic parent based on the modification.
  8. The method of claim 7, wherein determining the modification includes determining, by a machine learning model including an attention feature, and the attention feature associates the modification with at least one of: the aspect of the first biologic parent, the aspect of the second biologic parent, the biologic synthesis process, or the biologic product.
  9. The method of claim 1, wherein the biologic product includes at least one of an enzyme protein, a non-enzyme protein, a DNA sequence, an RNA sequence, a plasmid, a metabolite, a biologic strain, a bioreactor process, or a downstream purification process.
  10. The method of claim 1, wherein the biologic synthesis process includes at least one of a DNA synthesis process, an RNA synthesis process, a protein synthesis process, a metabolite synthesis process, a metabolic process, at least one pathway of a metabolic system, a plate growth process, or a fermentation process.
  11. The method of claim 1, wherein at least one of the first feature or the second feature includes at least one of a product expression feature, a product activation feature, a product reaction feature, an enzyme cleaning feature, a product stability feature, a product biocompatibility feature, a process rate feature, a process catalyzation rate feature, a process efficiency feature, a process cost feature, or a process yield feature.
  12. The method of claim 1, wherein the evaluation of the set of combinations of the first biologic parent and the second biologic parent includes conserving a distance of the set of combinations relative to the first biologic parent.
  13. The method of claim 12, wherein the distance includes at least one of an edit distance between the first biologic parent and each combination, a number of edits between the first biologic parent and each combination, a degree of edits between the first biologic parent and each combination, a difference between a measure of the first feature of each combination relative to a measurement of the first feature of the first biologic parent, a structural feature of each combination relative to a corresponding structural feature of the first biologic parent, or a viability score of each combination relative to a corresponding viability score of the first biologic parent.
  14. The method of claim 1, wherein the evaluation of the set of combinations of the first biologic parent and the second biologic parent includes selectively evaluating combinations that at least maintain the first feature of the first biologic parent.
  15. The method of claim 1, wherein the evaluation of the set of combinations of the first biologic parent and the second biologic parent includes selectively evaluating combinations based on a measurement of the second feature.
  16. The method of claim 1, wherein the evaluation of the set of combinations of the first biologic parent and the second biologic parent includes for a respective combination of the first biologic parent and the second biologic parent, jointly measuring the first feature of the respective combination and the second feature of the respective combination.
  17. The method of claim 16, wherein jointly measuring the first feature of the respective combination and the second feature of the respective combination includes, determining the first feature of the respective combination according to a first dimension of an evaluation space, determining the second feature of the respective combination according to a second dimension of the evaluation space, and evaluating the respective combination according to a vector representation in the evaluation space, wherein the vector representation is based on the first feature according to the first dimension of the evaluation space and the second feature according to the second dimension of the evaluation space.
  18. The method of claim 16, wherein jointly measuring the first feature of the respective combination and the second feature of the respective combination includes, generating a weighted evaluation of the first feature of the respective combination according to a first weight associated with the first feature, generating a weighted evaluation of the second feature of the respective combination according to a second weight associated with the second feature, and evaluating the respective combination according to a combination of the weighted evaluation of the first feature and the weighted evaluation of the second feature.
  19. A system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to perform operations for generating a biologic product of a biologic synthesis process, the operations comprising: selecting a first feature and a second feature of the biologic product; determining a first biologic parent having the first feature and not having the second feature, wherein the first feature is based on an aspect of the first biologic parent; determining a second biologic parent having the second feature and not having the first feature, wherein the second feature is based on an aspect of the first biologic parent, and the aspect of the second biologic parent can be combined with the aspect of the first biologic parent; and determining a biologic product having the first feature and the second feature, wherein the biologic product is determined based on an evaluation of a set of combinations of the aspect of the first biologic parent and the aspect of the second biologic parent.
  20. One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations for generating a biologic product of a biologic synthesis process, the operations comprising: selecting a first feature and a second feature of the biologic product; determining a first biologic parent having the first feature and not having the second feature, wherein the first feature is based on an aspect of the first biologic parent; determining a second biologic parent having the second feature and not having the first feature, wherein the second feature is based on an aspect of the first biologic parent, and the aspect of the second biologic parent can be combined with the aspect of the first biologic parent; and determining a biologic product having the first feature and the second feature, wherein the biologic product is determined based on an evaluation of a set of combinations of the aspect of the first biologic parent and the aspect of the second biologic parent.
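
Illustrative sketch (not part of the claims): the evaluation described in claims 12, 13, and 18 could be read as a search over combinations that stay within an edit-distance budget of the first parent and are ranked by a weighted sum of the two feature measurements. The following minimal Python sketch assumes that parents are equal-length sequences, that a "combination" takes each position from either parent, and that `measure_first` and `measure_second` are caller-supplied scoring functions. All names are hypothetical; the publication does not specify an implementation.

```python
from itertools import product


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, one of the distance options listed in claim 13."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def combinations(parent_a: str, parent_b: str) -> list[str]:
    """Assumed combination set: each position comes from either parent."""
    if len(parent_a) != len(parent_b):
        raise ValueError("this sketch assumes equal-length parents")
    choices = [sorted({x, y}) for x, y in zip(parent_a, parent_b)]
    return ["".join(c) for c in product(*choices)]


def weighted_score(f1: float, f2: float, w1: float, w2: float) -> float:
    """Weighted evaluation in the sense of claim 18."""
    return w1 * f1 + w2 * f2


def select(parent_a, parent_b, measure_first, measure_second,
           max_distance, w1=0.5, w2=0.5):
    """Return the best-scoring combination within the distance budget."""
    best, best_score = None, float("-inf")
    for cand in combinations(parent_a, parent_b):
        # Conserve distance relative to the first parent (claim 12).
        if edit_distance(parent_a, cand) > max_distance:
            continue
        score = weighted_score(measure_first(cand), measure_second(cand), w1, w2)
        if score > best_score:
            best, best_score = cand, score
    return best, best_score
```

Exhaustive enumeration grows exponentially with the number of differing positions, so a practical system would need a guided search; the sketch only shows how the distance constraint and the weighted joint measurement fit together.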

Description

CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of PCT Application No. PCT/US2025/031891, filed on Jun. 2, 2025, which claims priority to U.S. Provisional Patent Application No. 63/655,575, filed on Jun. 3, 2024, and U.S. Provisional Patent Application No. 63/803,471, filed on May 9, 2025. Each of the aforementioned earlier-filed applications is hereby incorporated by reference in its entirety.
BACKGROUND
Most synthetic biology work today is lab-driven, and hence capital intensive, painstaking, expensive, and uncertain. However, the rapid development of AI models in general, as well as in pharma and specific segments within the life sciences, is poised to spur rapid innovation in AI-driven synthetic biology. Competition will emerge as AI, LLMs, and supporting technologies accelerate. These advancements could reduce barriers to entry, contributing to the emergence of a rapidly evolving research and development landscape and marketplace.
SUMMARY
Embodiments include an AI-guided synthetic biology development platform, systems, and methods substantially as shown and described. Embodiments include a method for providing an AI-guided synthetic biology development platform, systems, and methods substantially as shown and described.
In embodiments, a computer-implemented method for data integration in an AI-guided analytic platform for development of biologic synthesis processes may comprise: receiving, by a platform, biologic data from a plurality of databases, wherein the biologic data use different data formats and/or semantics; converting the received biologic data into at least one standardized data format to create an integrated dataset; processing the integrated dataset through at least one data normalization process to minimize batch-specific systemic variation; storing the normalized biologic data in a structured format that describes biologic components and their relationships to other components; applying at least one machine learning method to the normalized biologic data to generate at least one predictive model for synthetic biology design; and outputting at least one specification for biologic system design based on the at least one predictive model.
In embodiments, the data normalization processes used by the platform may include: applying a Bayesian statistical model that incorporates prior knowledge about strain behavior; modeling different sources of variation, including biological effects and technical factors; estimating strain performance while accounting for batch effects and other sources of systematic variability; batch effect correction, wherein the batch effect correction addresses systematic variations across at least one of a plurality of experimental runs, equipment, or operators; multi-modal data integration; or some other type of data normalization process.
In embodiments, multi-modal data integration may include data relating to at least one of an enzyme level, a metabolite concentration, or a gene expression level.
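
As a rough illustration of the batch effect correction mentioned above, the sketch below removes each batch's mean offset from the overall mean. This is a deliberately simple stand-in, not the Bayesian model the description refers to, and it assumes every batch measures a comparable mix of strains so that a difference in batch means reflects technical rather than biological variation. The record keys `batch`, `strain`, and `value` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def center_batches(records):
    """Shift each batch so its mean matches the overall mean.

    records: list of dicts with hypothetical keys 'batch', 'strain', 'value'.
    Returns new dicts; the input is not modified.
    """
    global_mean = mean(r["value"] for r in records)

    # Group measurements by batch to estimate a per-batch offset.
    by_batch = defaultdict(list)
    for r in records:
        by_batch[r["batch"]].append(r["value"])
    offsets = {b: mean(vals) - global_mean for b, vals in by_batch.items()}

    # Subtract the batch offset from every measurement in that batch.
    return [{**r, "value": r["value"] - offsets[r["batch"]]} for r in records]


if __name__ == "__main__":
    runs = [
        {"batch": "A", "strain": "s1", "value": 1.0},
        {"batch": "A", "strain": "s2", "value": 3.0},
        {"batch": "B", "strain": "s1", "value": 11.0},
        {"batch": "B", "strain": "s2", "value": 13.0},
    ]
    for row in center_batches(runs):
        print(row)
```

After centering, the same strain measured in different batches lands on a common scale, while differences between strains within a batch are preserved.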
In embodiments, data normalization processes used by the platform may include standardized nomenclature across different data sources and/or quality control normalization, including flagging an anomalous data point and/or flagging a well or sample that failed during an experiment.
In embodiments, data normalization processes used by the platform may include experiment normalization, such as experiment normalization to account for a variation across a plurality of experimental runs using a similar strain or condition. Experiment normalization used by the platform may implement a statistical method to minimize the impact of technical variation, and/or may use a control sample and a spike-in standard for validation.
In embodiments, data normalization processes used by the platform may include cross-platform data harmonization, including but not limited to data harmonization that standardizes data from a plurality of experimental platforms and setups.
In embodiments, data normalization processes used by the platform may include time series data normalization, wherein the time series data normalization includes normalizing data relating to time-varying growth conditions and/or data relating to variations in a feed profile or fermentation parameter.
In embodiments, data normalization processes used by the platform may include knowledge graph-based normalization, including but not limited to knowledge graph-based normalization that represents biological entities and relationships in a standardized format, knowledge graph-based normalization that associates information across a plurality of experiments or organisms, and/or knowledge graph-based normalization that integrates a plurality of biological data types.
In embodiments, a predictive model used by the platform may include, but is not limited to, a long-short term mem