US-20260128131-A1 - MACHINE LEARNING FOR PERFORMANCE AND VIABILITY PREDICTION
Abstract
A machine learning system configured to: generate viability predictions by analyzing predicted performance and process conditions using one or more machine learning models; and prioritize development of entities based on the predicted performance and the viability predictions.
Inventors
- John Ata Bachman
- Relly Brandman
- Laura Barker
Assignees
- X DEVELOPMENT LLC
Dates
- Publication Date: 20260507
- Application Date: 20251230
Claims (20)
- 1 . A machine learning system comprising: one or more computers; and one or more storage devices communicatively coupled to the one or more computers, wherein the one or more storage devices store instructions that, when executed by the one or more computers, cause the one or more computers to implement: a data collection system configured to collect performance data for a plurality of entities and market data comprising costs for development inputs; a development system configured to predict performance of the entities under different process conditions; and a machine learning analysis system configured to: generate viability predictions by analyzing the predicted performance and process conditions using one or more machine learning models trained on economic data, wherein the economic data includes market data, and wherein the development system is further configured to prioritize development of entities based on the predicted performance and the viability predictions.
- 2 . The system of claim 1 , wherein prioritizing development comprises: generating risk-adjusted economic predictions for each entity; generating rankings of entities based on probability of commercial success; and adjusting development resource allocation based on the rankings.
- 3 . The system of claim 1 , wherein the market data further comprises one or more of feedstock costs, energy costs, labor costs, capital costs, equipment costs, or product market prices.
- 4 . The system of claim 1 , wherein the system is further configured to: identify thresholds for viability; monitor performance data with respect to the thresholds; and automatically adjust development priorities when the performance data indicates a particular threshold will not be met.
- 5 . The system of claim 1 , wherein the development system generates the viability predictions for a plurality of parallel development paths for multiple entities, wherein the development system is configured to dynamically allocate development resources between the parallel development paths based on comparing the viability predictions.
- 6 . The system of claim 1 , wherein the one or more machine learning models comprise one or more of a convolutional neural network, a long short-term memory (LSTM) network, and a transformer neural network.
- 7 . The system of claim 1 , wherein the performance data comprises one or more of yield data, titer data, productivity data, stability data, or growth rate data.
- 8 . The system of claim 1 , wherein the process conditions comprise one or more of temperature, pH, nutrient concentrations, dissolved oxygen levels, mixing speed, gas flow rates, or nutrient feeding rates.
- 9 . The system of claim 1 , wherein the economic data further comprises production data indicating relationships between production factors and economic outcomes.
- 10 . The system of claim 1 , wherein the system is further configured to simulate scale-up costs for different production scenarios.
- 11 . The system of claim 1 , wherein the system is further configured to predict market-dependent revenue potential.
- 12 . The system of claim 1 , wherein the system is further configured to calculate economic metrics, including return on investment and payback period.
- 13 . The system of claim 1 , wherein the data collection system continuously collects the performance data and the market data, and wherein the system continuously updates the viability predictions during development of the entities.
- 14 . A method performed by one or more computers, the method comprising: collecting performance data for a plurality of entities and market data comprising costs for development inputs; predicting performance of the entities under different process conditions; generating viability predictions by analyzing the predicted performance and process conditions using one or more machine learning models trained on economic data, wherein the economic data includes market data; and prioritizing development of entities based on the predicted performance and the viability predictions.
- 15 . The method of claim 14 , wherein prioritizing development comprises: generating risk-adjusted economic predictions for each entity; generating rankings of entities based on probability of commercial success; and adjusting development resource allocation based on the rankings.
- 16 . The method of claim 14 , wherein the market data further comprises one or more of feedstock costs, energy costs, labor costs, capital costs, equipment costs, or product market prices.
- 17 . The method of claim 14 , further comprising: identifying thresholds for viability; monitoring performance data with respect to the thresholds; and automatically adjusting development priorities when the performance data indicates a particular threshold will not be met.
- 18 . One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: collecting performance data for a plurality of entities and market data comprising costs for development inputs; predicting performance of the entities under different process conditions; generating viability predictions by analyzing the predicted performance and process conditions using one or more machine learning models trained on economic data, wherein the economic data includes market data; and prioritizing development of entities based on the predicted performance and the viability predictions.
- 19 . The non-transitory computer storage media of claim 18 , wherein prioritizing development comprises: generating risk-adjusted economic predictions for each entity; generating rankings of entities based on probability of commercial success; and adjusting development resource allocation based on the rankings.
- 20 . The non-transitory computer storage media of claim 18 , wherein the market data further comprises one or more of feedstock costs, energy costs, labor costs, capital costs, equipment costs, or product market prices.
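By way of non-limiting illustration, the economic metrics and rankings recited in the claims above (return on investment, payback period, risk-adjusted predictions, and ranking by probability of commercial success) might be sketched as follows. The `Candidate` fields, the undiscounted formulas, and the expected-value weighting are all assumptions made for this sketch; the claims do not specify how any of these quantities are computed.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one entity under development (illustrative only)."""
    name: str
    investment: float       # up-front development cost
    annual_net_cash: float  # projected yearly net cash flow once in production
    p_success: float        # predicted probability of commercial success, 0..1
    years: int = 5          # evaluation horizon


def roi(c: Candidate) -> float:
    # Simple undiscounted return on investment over the horizon.
    return (c.annual_net_cash * c.years - c.investment) / c.investment


def payback_years(c: Candidate) -> float:
    # Years needed for cumulative net cash flow to repay the investment.
    if c.annual_net_cash <= 0:
        return float("inf")
    return c.investment / c.annual_net_cash


def risk_adjusted_value(c: Candidate) -> float:
    # Assumed risk adjustment: weight projected cash flow by success probability.
    return c.p_success * c.annual_net_cash * c.years - c.investment


def rank(candidates: list[Candidate]) -> list[Candidate]:
    # Highest risk-adjusted value first.
    return sorted(candidates, key=risk_adjusted_value, reverse=True)


if __name__ == "__main__":
    pool = [
        Candidate("strain-A", 100.0, 50.0, 0.5),
        Candidate("strain-B", 100.0, 40.0, 0.9),
    ]
    for c in rank(pool):
        print(c.name, roi(c), payback_years(c), risk_adjusted_value(c))
```

In this sketch an entity with the higher raw return can still rank lower once its probability of success is taken into account, which is the effect a risk-adjusted ranking is meant to capture.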
Description
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is a continuation of U.S. patent application Ser. No. 19/338,517, filed on Sep. 24, 2025, which is a continuation of PCT Application No. PCT/US2025/031891, filed on Jun. 2, 2025, which claims priority to U.S. Provisional Patent Application No. 63/655,575, filed on Jun. 3, 2024, and U.S. Provisional Patent Application No. 63/803,471, filed on May 9, 2025. Each of the aforementioned earlier-filed applications is hereby incorporated by reference in its entirety.
BACKGROUND
Most synthetic biology work today is lab-driven, and hence capital-intensive, painstaking, expensive, and uncertain. However, the rapid development of AI models in general, as well as in pharma and specific segments within the life sciences, is poised to spur rapid innovation in AI-driven synthetic biology. Competition will emerge as AI, LLMs, and supporting technologies accelerate. These advancements could reduce barriers to entry, contributing to the emergence of a rapidly evolving research and development landscape and marketplace.
SUMMARY
Embodiments include an AI-guided synthetic biology development platform, systems, and methods substantially as shown and described. Embodiments include a method for providing an AI-guided synthetic biology development platform, systems, and methods substantially as shown and described.
In embodiments, a computer-implemented method for data integration in an AI-guided analytic platform for development of biologic synthesis processes may comprise: receiving, by a platform, biologic data from a plurality of databases, wherein the biologic data use different data formats and/or semantics; converting the received biologic data into at least one standardized data format to create an integrated dataset; processing the integrated dataset through at least one data normalization process to minimize batch-specific systematic variation; storing the normalized biologic data in a structured format that describes biologic components and their relationships to other components; applying at least one machine learning method to the normalized biologic data to generate at least one predictive model for synthetic biology design; and outputting at least one specification for biologic system design based on the at least one predictive model.
In embodiments, the data normalization processes used by the platform may include: applying a Bayesian statistical model that incorporates prior knowledge about strain behavior; modeling different sources of variation, including biological effects and technical factors; estimating strain performance while accounting for batch effects and other sources of systematic variability; batch effect correction, wherein the batch effect correction addresses systematic variations across at least one of a plurality of experimental runs, equipment, or operators; multi-modal data integration; or some other type of data normalization process. In embodiments, multi-modal data integration may include data relating to at least one of an enzyme level, a metabolite concentration, or a gene expression level.
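By way of non-limiting illustration, the batch effect correction described above might be sketched as a simple per-batch mean-centering step. This is a deliberately minimal stand-in and not the Bayesian statistical model the description refers to; the record fields (`strain`, `batch`, `titer`) and the function name `center_batches` are hypothetical and chosen only for this sketch.

```python
from collections import defaultdict
from statistics import mean


def center_batches(records: list[dict]) -> list[dict]:
    """Remove each batch's average offset from the overall mean.

    Each record is assumed to look like
    {"strain": str, "batch": str, "titer": float}.
    Returns new records with an added "titer_corrected" field.
    """
    # Group the raw measurements by batch.
    by_batch = defaultdict(list)
    for r in records:
        by_batch[r["batch"]].append(r["titer"])

    # A batch's offset is how far its mean sits from the grand mean.
    grand_mean = mean(r["titer"] for r in records)
    offset = {b: mean(vals) - grand_mean for b, vals in by_batch.items()}

    # Subtract the offset so that all batches share the same mean.
    return [
        {**r, "titer_corrected": r["titer"] - offset[r["batch"]]}
        for r in records
    ]


if __name__ == "__main__":
    runs = [
        {"strain": "A", "batch": "run1", "titer": 10.0},
        {"strain": "B", "batch": "run1", "titer": 12.0},
        {"strain": "A", "batch": "run2", "titer": 14.0},
        {"strain": "B", "batch": "run2", "titer": 16.0},
    ]
    for row in center_batches(runs):
        print(row["strain"], row["batch"], row["titer_corrected"])
```

Mean-centering of this kind is only reasonable when the same strains appear in every batch; if batches contain different strains, the subtraction would also remove genuine biological differences, which is one reason a model that separates biological effects from technical factors may be preferred.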
In embodiments, data normalization processes used by the platform may include standardized nomenclature across different data sources and/or quality control normalization, including flagging an anomalous data point and/or flagging a well or sample that failed during an experiment. In embodiments, data normalization processes used by the platform may include experiment normalization, such as experiment normalization to account for a variation across a plurality of experimental runs using a similar strain or condition. Experiment normalization used by the platform may implement a statistical method to minimize the impact of a technical variation, and/or may use a control sample and spike-in standard for validation. In embodiments, data normalization processes used by the platform may include cross-platform data harmonization, including but not limited to data harmonization that standardizes data from a plurality of experimental platforms and setups. In embodiments, data normalization processes used by the platform may include time series data normalization, wherein the time series data normalization includes normalizing data relating to time-varying growth conditions, and wherein the time series data normalization includes normalizing data relating to variations in a feed profile or fermentation parameter. In embodiments, data normalization processes used by the platform may include knowledge graph-based normalization, including but not limited to knowledge graph-based normalization that represents biological entities and relationships in a standardized format, knowledge graph-based normalization that associates information across a plurality of experiments or organisms, and/or knowledge graph-based normalization that integrates a plurality of biological data types. In embodiments, a